Machine learning is a form of applied statistics with increased emphasis on the use of computers to statistically estimate complicated functions and a decreased emphasis on proving confidence intervals around these functions (Goodfellow 2016).
Forms of Learning
-
Scenarios
-
Problems for which existing solutions require a lot of fine-tuning or long lists of rules
-
Complex problems for which using a traditional approach yields no good solution.
-
Fluctuating environments: ML systems can adapt to new data.
-
Getting insights about complex problems and large amounts of data.
Applications
-
Analyzing images of products on a production line to automatically classify them
- image classification → convolutional neural networks (CNNs)
-
Detecting tumors in brain scans
- semantic segmentation → CNNs
-
Automatically classifying news articles
- natural language processing (NLP) → recurrent neural networks (RNNs), CNNs, Transformers
-
Automatically flagging offensive comments on discussion forums
- NLP → RNNs, CNNs, Transformers
-
Summarizing long documents automatically
- NLP → text summarization → RNNs, CNNs, Transformers
-
Creating a chatbot or a personal assistant
- NLP → natural language understanding, question-answering modules
-
Forecasting company revenue based on several performance metrics
-
Regression → Linear Regression, Polynomial Regression, regression SVM, regression Random Forest, artificial neural network
-
To factor in sequences of past performance metrics → RNNs, CNNs, or Transformers
-
-
Making an app react to voice commands
- speech recognition → RNNs, CNNs, or Transformers
-
Detecting credit card fraud
- anomaly detection
-
Segmenting clients based on their purchases in order to design a different marketing strategy for each segment.
- clustering
-
Representing a complex, high-dimensional dataset in a clear and insightful diagram.
- data visualization → dimensionality reduction
-
Recommending a product that a client might be interested in, based on past purchases.
- recommender system → feed past purchases to an artificial neural network.
-
Building an intelligent bot for a game
- Reinforcement Learning
Types of ML Systems
Human Supervision
Supervised Learning
- The agent observes input-output pairs and learns a function that maps from input to output.
- Labeled data
- Classification
- k-Nearest Neighbors
- Linear Regression
- Logistic Regression
- Support Vector Machines (SVMs)
- Decision Trees and Random Forests
- Neural Networks
- Regression: predicting a target numeric value given a set of features called predictors
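To make "learning a function from input-output pairs" concrete, here is a minimal sketch of a k-Nearest Neighbors classifier using only the Python standard library. The function name `knn_predict`, the toy data, and the labels are made up for illustration; a real project would more likely use a library implementation.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k=3):
    """Predict a label for `query` by majority vote among its k nearest labeled examples."""
    # Euclidean distance from the query to every labeled training example.
    distances = [(math.dist(x, query), label) for x, label in zip(train_X, train_y)]
    # Keep the k closest examples and count their labels.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Toy labeled data (hypothetical): two features per example, one label each.
train_X = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (4.0, 4.2), (4.1, 3.9), (3.8, 4.0)]
train_y = ["small", "small", "small", "large", "large", "large"]

print(knn_predict(train_X, train_y, (1.1, 1.0)))  # small
print(knn_predict(train_X, train_y, (4.0, 4.0)))  # large
```

Note that k-NN does not fit any parameters: the "learned function" is the stored training set plus the voting rule.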
Unsupervised Learning
- The agent learns patterns in the input without any explicit feedback
- Unlabeled data
- In unsupervised learning, we are given a training set with no labeled outputs and must construct a model that captures the structure of the data.
Algorithms
-
Clustering
- K-means
- Input: the number of clusters K and an unlabeled training set.
- Algorithm:
- Randomly initialize the K cluster centroids.
- For each point, find the closest cluster centroid and assign the point to that cluster.
- For each cluster, move its centroid to the mean of the points assigned to it.
- Repeat the two steps above until the assignments stop changing.
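K-means is guaranteed to converge. To show this, define the distortion function $J(c, \mu) = \sum_{i=1}^{m} \lVert x^{(i)} - \mu_{c^{(i)}} \rVert^2$, where $x^{(i)}$ is the $i$-th of $m$ training points, $c^{(i)}$ is the index of the cluster it is assigned to, and $\mu_k$ is the centroid of cluster $k$. K-means is coordinate descent on $J$: the assignment step minimizes $J$ over $c$ with $\mu$ fixed, and the centroid update minimizes $J$ over $\mu$ with $c$ fixed. Since $J$ never increases and is bounded below by zero, the algorithm converges, although possibly to a local rather than a global minimum.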
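The steps above can be sketched in plain Python. This is a minimal illustration, not a reference implementation: the names `kmeans` and `distortion`, the choice to initialize centroids from randomly sampled data points, and the toy data are all assumptions. The sketch records the sum of squared distances after every iteration so the non-increasing behavior can be checked.

```python
import math
import random

def distortion(points, centroids, assignments):
    """Sum of squared distances from each point to its assigned centroid."""
    return sum(math.dist(p, centroids[a]) ** 2 for p, a in zip(points, assignments))

def kmeans(points, k, max_iters=100, seed=0):
    rng = random.Random(seed)
    # Assumption: initialize centroids as k distinct data points chosen at random.
    centroids = rng.sample(points, k)
    assignments = None
    history = []  # distortion after each full iteration
    for _ in range(max_iters):
        # Assignment step: index of the closest centroid for every point.
        new_assignments = [
            min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points
        ]
        if new_assignments == assignments:
            break  # assignments stopped changing, so the centroids will too
        assignments = new_assignments
        # Update step: move each centroid to the mean of its assigned points.
        for j in range(k):
            members = [p for p, a in zip(points, assignments) if a == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
        history.append(distortion(points, centroids, assignments))
    return centroids, assignments, history

# Toy data (hypothetical): two well-separated groups in 2D.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (8.5, 9.0), (9.0, 8.0)]
centroids, assignments, history = kmeans(points, k=2)
print(assignments)  # the first three points share one label, the last three the other
print(history)      # distortion values, non-increasing across iterations
```

Because the result depends on the initialization, it is common to run the algorithm from several random starts and keep the run with the lowest distortion.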
-
DBSCAN
-
Hierarchical Cluster Analysis (HCA)
-
Anomaly detection and novelty detection
-
One-class SVM
-
Isolation Forest
-
-
Visualization and dimensionality reduction
-
Principal Component Analysis (PCA)
-
Kernel PCA
-
Locally Linear Embedding (LLE)
-
t-Distributed Stochastic Neighbor Embedding (t-SNE)
TIP: It's often a good idea to reduce the dimension of your training data with a dimensionality reduction algorithm before feeding it to another ML algorithm (such as a supervised learning algorithm). It will run faster and take up less disk and memory space, and in some cases it may also perform better.
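As an illustration of the tip, here is a small pure-Python sketch that reduces 2D data to 1D by projecting onto the first principal component. It finds that component with power iteration on the covariance matrix, which is a swapped-in simplification (library PCA implementations typically use an SVD). The function names, the starting vector, and the toy data are assumptions for illustration only.

```python
import math

def first_principal_component(rows, iters=200):
    """Return (feature means, unit vector along the direction of largest variance)."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [[sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(d)]
           for a in range(d)]
    # Power iteration: repeatedly multiply by cov and renormalize.
    # Assumption: the all-ones start is not orthogonal to the top component.
    v = [1.0] * d
    for _ in range(iters):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return means, v

def project(rows, means, v):
    """Project each centered row onto v, giving one number per row."""
    return [sum((r[j] - means[j]) * v[j] for j in range(len(v))) for r in rows]

# Toy data (hypothetical): points lying close to the line y = 2x.
rows = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 7.8), (5.0, 10.1)]
means, v = first_principal_component(rows)
reduced = project(rows, means, v)
print(v)        # direction roughly proportional to (1, 2)
print(reduced)  # one coordinate per point instead of two
```

The one-dimensional `reduced` values could then be passed to a downstream model in place of the original two features.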
-
-
Association rule learning
-
Apriori
-
Eclat
-
Semisupervised Learning
- Deep Belief Networks (DBNs) → restricted Boltzmann machines (RBMs, unsupervised) stacked on top of each other → fine-tuned using supervised learning techniques.
Reinforcement Learning
- The agent learns from a series of reinforcements: rewards and punishments.
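A multi-armed bandit is about the simplest setting in which an agent learns from rewards and punishments. The sketch below uses an epsilon-greedy strategy; the function name `run_bandit`, the reward of +1 or -1, and the reward probabilities are all hypothetical choices made for illustration.

```python
import random

def run_bandit(reward_probs, steps=2000, epsilon=0.1, seed=0):
    """Epsilon-greedy agent: mostly pick the best-looking action, sometimes explore."""
    rng = random.Random(seed)
    n = len(reward_probs)
    estimates = [0.0] * n  # running average reward per action
    counts = [0] * n       # how often each action was taken
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.randrange(n)  # explore: random action
        else:
            action = max(range(n), key=lambda a: estimates[a])  # exploit
        # The environment answers with a reward (+1) or a punishment (-1).
        reward = 1.0 if rng.random() < reward_probs[action] else -1.0
        counts[action] += 1
        # Incremental update of the average reward for the chosen action.
        estimates[action] += (reward - estimates[action]) / counts[action]
    return estimates, counts

# Hypothetical environment: action 1 pays off far more often than action 0.
estimates, counts = run_bandit([0.1, 0.9])
print(estimates)  # estimated average reward per action
print(counts)     # the agent should end up choosing action 1 most of the time
```

There are no labeled correct answers here: the agent only sees the reward for the action it actually took, which is what separates this setting from supervised learning.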
Feature Engineering
Learning Incrementally
Online Learning
Batch Learning
Comparing new data points to known data points
Instance-based Learning
Model-based Learning
References
Géron 2019
Goodfellow 2016