Introduction To Machine Learning
Instance Based Models - Use the complete data set to create a model
without parameterization
Both have labels and therefore fall under supervised learning problems.
More examples:
● Clustering
○ Automatically separate customers for better marketing campaign
○ Clustering as exploration tool to understand data to make informed decisions
● Dimensionality Reduction
○ Compress data
○ Visualize dataset in reduced space
○ Learn with missing labels; used for search engines and recommender
systems
● Anomaly Detection
○ Concerns learning which data points are outliers
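As an illustration of clustering for customer segmentation, a minimal 1-D k-means sketch on toy spending data (all data and names are illustrative assumptions, not from the source):

```python
# Minimal 1-D k-means sketch: alternate assignment and update steps.
def kmeans_1d(points, centers, iterations=10):
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its cluster.
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters

# Toy "customers": two spending levels suggest two marketing segments.
spend = [1.0, 1.2, 0.8, 9.5, 10.1, 9.9]
centers, clusters = kmeans_1d(spend, centers=[0.0, 5.0])
print(centers)  # roughly [1.0, 9.83]
```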
Agent (software)
● Learns a policy, taking actions to maximize a cumulative reward
● Has no teacher showing it which action is the best
Environment
● Changes to a new state for each agent action
● Updates the cumulative reward after each action
Reward and State
● The reward is a problem-dependent function that penalizes undesirable actions and rewards desirable actions
● The states are problem-specific
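The agent-environment loop above can be sketched on a toy 1-D walk; the environment, reward function, and random policy here are illustrative assumptions:

```python
import random

# Toy environment: the agent walks on a number line toward a goal state.
# The reward is problem-dependent: +1 for moving toward the goal, -1 otherwise.
GOAL = 5

def step(state, action):
    """Environment: returns the new state and the reward for this action."""
    new_state = state + action
    reward = 1 if abs(GOAL - new_state) < abs(GOAL - state) else -1
    return new_state, reward

random.seed(0)
state, cumulative_reward = 0, 0
for _ in range(20):
    action = random.choice([-1, 1])   # random policy; no teacher gives the best action
    state, reward = step(state, action)
    cumulative_reward += reward       # updated after each action
print(state, cumulative_reward)
```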
Semi-Supervised Learning
Definition:
● Has both labeled and unlabeled samples
● Learn model for complete data set
Approaches:
● Mixing supervised and unsupervised learning
● Train model, predict missing labels, and retrain
Example:
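A minimal self-training sketch of the train/predict/retrain approach, using a 1-nearest-neighbour rule on toy 1-D data (all data and names are illustrative assumptions):

```python
# Self-training sketch for semi-supervised learning: train on the labeled
# samples, predict labels for the unlabeled ones, then retrain on everything.
def predict(x, data):
    """1-nearest-neighbour rule: return the label of the closest sample."""
    return min(data, key=lambda d: abs(d[0] - x))[1]

labeled = [(0.0, "low"), (0.5, "low"), (9.0, "high")]
unlabeled = [1.0, 8.5, 9.5]

# Step 1: predict the missing labels with the model "trained" on labeled data.
pseudo = [(x, predict(x, labeled)) for x in unlabeled]

# Step 2: retrain on the combined (labeled + pseudo-labeled) data set.
full = labeled + pseudo
print(full)
```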
Active Learning
Definition:
● Learn model via interaction with a teacher
● Teacher can be human or program
● Subset of reinforcement learning
● Agent cannot alter the environment
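One common way to realize learning via interaction with a teacher is uncertainty sampling: the learner queries the teacher (here an oracle function) for the sample it is least sure about. The threshold model, oracle, and data below are illustrative assumptions:

```python
# Uncertainty-sampling sketch for active learning on toy 1-D data.
def oracle(x):
    """Teacher (human or program): provides the true label on request."""
    return 1 if x >= 4.0 else 0

pool = [0.5, 1.0, 3.8, 4.2, 7.0, 9.0]
labeled = {0.5: 0, 9.0: 1}     # start with two labeled samples

for _ in range(2):             # query budget: two teacher interactions
    threshold = sum(labeled) / len(labeled)          # crude midpoint-style model
    # Query the pool point closest to the decision boundary (most uncertain).
    query = min((x for x in pool if x not in labeled),
                key=lambda x: abs(x - threshold))
    labeled[query] = oracle(query)                   # teacher provides the label

print(sorted(labeled.items()))
```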
Transfer Learning
Definition:
● Transfers knowledge from one learning task to another
● Typical application: learning more efficiently from a small
dataset when a large, different dataset is available
○ model trained on large data
○ Pre-trained model A initializes model B
○ Train model B on small data
Why use transfer learning:
● Better overall performance in fewer training iterations
● For small training datasets
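The A-initializes-B scheme can be sketched with plain gradient descent on a slope-only model y = w·x: model A is pre-trained on the large data, and its weight warm-starts model B on the small data. All data and names are illustrative assumptions:

```python
# Transfer-learning sketch: pre-train on large data, fine-tune on small data.
def fit(xs, ys, w=0.0, lr=0.01, steps=200):
    """Gradient descent on mean squared error for y = w * x."""
    for _ in range(steps):
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
        w -= lr * grad
    return w

large_x = [1, 2, 3, 4, 5];  large_y = [2.1, 3.9, 6.0, 8.1, 9.9]   # slope ~2
small_x = [1, 2];           small_y = [2.2, 4.1]

w_a = fit(large_x, large_y)                     # pre-train model A on large data
w_b = fit(small_x, small_y, w=w_a, steps=20)    # model B starts from A's weight
print(round(w_a, 2), round(w_b, 2))
```

Starting model B at `w_a` means only a few fine-tuning steps are needed, which is the "better performance in fewer iterations" point above.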
Steps:
● Collect data X
● Analyse, clean and process data. Carry out Feature Engineering
● Choose algorithm suitable to solve the task
● Construct and validate the model. Adapt model if required
● Implement the model and predict ŷ = h(x)
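The steps above can be sketched end-to-end on toy data: collect, clean, choose a simple algorithm, validate, and predict ŷ = h(x). The data and the midpoint-threshold classifier are illustrative assumptions:

```python
# End-to-end workflow sketch on toy 1-D data with binary labels.
raw = [1.0, 1.2, None, 0.9, 5.1, 4.8, 5.3]          # collected data X
labels = [0, 0, 0, 0, 1, 1, 1]

# Clean: drop samples with missing values.
data = [(x, y) for x, y in zip(raw, labels) if x is not None]

# Algorithm: threshold at the midpoint between the two class means.
mean = lambda v: sum(v) / len(v)
m0 = mean([x for x, y in data if y == 0])
m1 = mean([x for x, y in data if y == 1])
h = lambda x: int(x > (m0 + m1) / 2)

# Validate the model (here on the training data for brevity), then predict.
accuracy = mean([h(x) == y for x, y in data])
print(accuracy, h(3.5))
```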
Data Collection
Rich datasets are crucial for a machine learning workflow and strongly influence its final success.
Data collection requires a lot of time, resources and preparation:
● Data collection design
● Which features?
● How many samples?
● How can we measure (sensors = automatic collection)?
● How can we record the data?
● Where do we save the data? On chip?
Data Science
Steps transforming data towards being usable for machine learning
● Data Cleaning
○ Missing values: delete or fill missing values (e.g. by interpolation)
○ Up and Downsampling
○ Filtering the data
● Data integration
○ merging data from different sources
● Data transformation
○ normalization → rescaling attributes to a joint scale
● Feature extraction
○ Mathematical part
○ Selecting structured data from unstructured datasets. Features are created
from existing measurements (most common: statistical features like mean,
max, min, and standard deviation)
○ we need it because:
■ Learning data without extracting features may require a high number
of variables
■ a high number of variables
● means a high amount of memory and computational power
● may cause overfitting: the model has so many degrees of
freedom that it will perfectly match the training data, but
perform poorly on the test data
● Feature selection
○ keeping features that improve the trained model and discarding the rest
● Feature reduction
○ Transforming a high-dimensional feature space into a lower-dimensional one
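The cleaning, transformation, and feature-extraction steps above can be sketched on a toy sensor series; the data and the interpolation/normalization choices are illustrative assumptions:

```python
# Data-science sketch: clean, normalize, and extract statistical features.
raw = [2.0, 4.0, None, 8.0, 6.0]

# Cleaning: fill the missing value by linear interpolation of its neighbours.
clean = raw[:]
for i, v in enumerate(clean):
    if v is None:
        clean[i] = (clean[i - 1] + clean[i + 1]) / 2

# Transformation: min-max normalization to a joint [0, 1] scale.
lo, hi = min(clean), max(clean)
norm = [(v - lo) / (hi - lo) for v in clean]

# Feature extraction: summarize the series with statistical features,
# instead of learning on all raw variables directly.
features = {"mean": sum(norm) / len(norm), "min": min(norm), "max": max(norm)}
print(norm, features)
```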
Algorithm selection
● Validation compares Y_test and Ŷ_test (calculates the error)
● Note: neglected here for now
○ Hyper Parameter-tuning
○ model selection
■ using a validation dataset
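The comparison of Y_test with Ŷ_test can be sketched with a standard error measure, here mean squared error on illustrative toy values:

```python
# Validation sketch: compare true test labels with the model's predictions.
y_test = [3.0, -0.5, 2.0, 7.0]
y_hat_test = [2.5, 0.0, 2.0, 8.0]

# Mean squared error between Y_test and Y_hat_test.
mse = sum((y - yh) ** 2 for y, yh in zip(y_test, y_hat_test)) / len(y_test)
print(mse)  # -> 0.375
```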