# ML Notes
Machine learning is a form of applied statistics with increased emphasis on the use of computers to statistically estimate complicated functions and a decreased emphasis on proving confidence intervals around these functions (Goodfellow 2016).
# Forms of Learning
Machine learning is well suited to:
- Problems for which existing solutions require a lot of fine-tuning or long lists of rules.
- Complex problems for which a traditional approach yields no good solution.
- Fluctuating environments: ML systems can adapt to new data.
- Getting insights about complex problems and large amounts of data.
# Applications
Analyzing images of products on a production line to automatically classify them
- image classification –> convolutional neural networks (CNN)
Detecting tumors in brain scans
- semantic segmentation –> CNN
Automatically classifying news articles
- natural language processing (NLP) –> recurrent neural networks (RNNs), CNNs, Transformers
Automatically flagging offensive comments on discussion forums
- NLP –> RNNs, CNNs, Transformers
Summarizing long documents automatically
- NLP –> text summarization –> RNNs, CNNs, Transformers
Creating a chatbot or a personal assistant
- NLP –> natural language understanding, question-answering modules
Forecasting company revenue based on several performance metrics
- regression –> Linear Regression, Polynomial Regression, regression SVM, regression Random Forest, artificial neural network
- to factor in sequences of past performance metrics –> RNNs, CNNs, or Transformers
Reacting to voice commands
- speech recognition –> RNNs, CNNs, or Transformers
Detecting credit card fraud
- anomaly detection
Segmenting clients based on their purchases in order to design a different marketing strategy for each segment
- clustering
Representing a complex, high-dimensional dataset in a clear and insightful diagram
- data visualization –> dimensionality reduction
Recommending a product that a client might be interested in, based on past purchases
- recommender system –> feed past purchases to an artificial neural network
Building an intelligent bot for a game
- Reinforcement Learning
# Types of ML Systems
# Human Supervision
# Supervised Learning
- The agent observes input-output pairs and learns a function that maps from input to output.
- Labeled data
- Classification
- k-Nearest Neighbors
- Linear Regression
- Logistic Regression
- Support Vector Machines (SVMs)
- Decision Trees and Random Forests
- Neural Networks
- Regression: predicting a target numeric value given a set of features, called predictors
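
To make "learning a function from input-output pairs" concrete, here is a minimal sketch of regression with a single predictor, fitted by ordinary least squares in plain Python. The function names (`fit_line`, `predict`) and the tiny training set are made up for illustration; they are not from Géron or Goodfellow.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one predictor: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = covariance(x, y) / variance(x)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Apply the learned function to a new input."""
    slope, intercept = model
    return slope * x + intercept


# Hypothetical labeled data: each predictor value x comes with its target y.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]  # generated from y = 2x + 1, no noise

model = fit_line(xs, ys)
prediction = predict(model, 5.0)
print(model, prediction)
```

The labels `ys` are what make this supervised: the fit is driven entirely by the difference between predicted and known targets.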
# Unsupervised Learning
- The agent learns patterns in the input without any explicit feedback
- Unlabeled data
- In unsupervised learning, given a training set $S=(x_{1},…,x_{m})$, without a labeled output, one must construct a sufficient model of the data.
# Algorithms
Clustering
- K-means
- Input: $\{x^{(1)},x^{(2)},x^{(3)},…,x^{(m)}\}$
- Randomly initialize the cluster centroids.
- For each point, find the closest cluster centroid.
- Move each centroid to the mean of the points assigned to it.
- Repeat the last two steps until convergence.
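K-means is guaranteed to converge, though possibly to a local rather than the global minimum. To show this, define the distortion function $$J(c,\mu)=\sum_{i=1}^m||x^{(i)}-\mu_{c^{(i)}}||^2$$ where $c^{(i)}$ is the index of the cluster assigned to $x^{(i)}$ and $\mu_j$ is the centroid of cluster $j$. K-means is coordinate descent on $J$: the assignment step minimizes $J$ over $c$ with $\mu$ fixed, and the update step minimizes $J$ over $\mu$ with $c$ fixed. Since $J$ never increases and is bounded below by 0, its value converges.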
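
The K-means steps above can be sketched in plain Python. This is only an illustration under a few assumptions: centroids are initialized by sampling `k` of the input points, an empty cluster keeps its old centroid, and the function name `kmeans`, its parameters, and the toy points are invented for this note. The distortion $J$ is recorded after every assignment step so its behavior can be inspected.

```python
import random


def squared_distance(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def kmeans(points, k, max_iters=100, seed=0):
    """Return (centroids, assignments, history of the distortion J)."""
    rng = random.Random(seed)
    # Random initialization: pick k distinct input points as centroids.
    centroids = [list(p) for p in rng.sample(points, k)]
    assignments = None
    history = []
    for _ in range(max_iters):
        # Assignment step: index of the closest centroid for every point.
        new_assignments = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        # Distortion J for the current assignments and centroids.
        distortion = sum(
            squared_distance(p, centroids[c])
            for p, c in zip(points, new_assignments)
        )
        history.append(distortion)
        if new_assignments == assignments:
            break  # no point changed cluster: converged
        assignments = new_assignments
        # Update step: move each centroid to the mean of its points.
        for j in range(k):
            members = [p for p, c in zip(points, assignments) if c == j]
            if members:  # assumption: an empty cluster keeps its old centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assignments, history


# Hypothetical toy data: two well-separated groups of 2-D points.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centroids, assignments, history = kmeans(points, k=2)
print(assignments)
print(history)
```

Each entry of `history` should be no larger than the one before it, which is the monotonicity the convergence argument relies on.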
- DBSCAN
- Hierarchical Cluster Analysis (HCA)
Anomaly detection and novelty detection
- One-class SVM
- Isolation Forest
Visualization and dimensionality reduction
- Principal Component Analysis (PCA)
- Kernel PCA
- Locally Linear Embedding (LLE)
- t-Distributed Stochastic Neighbor Embedding (t-SNE)
TIP: It is often a good idea to reduce the dimension of your training data with a dimensionality reduction algorithm before feeding it to another ML algorithm (such as a supervised learning algorithm). It will run much faster and take up less disk and memory space, and in some cases it may also perform better.
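
As a rough illustration of the tip, the sketch below reduces 2-D data to 1-D by projecting onto the first principal component. It is a simplification, not how libraries implement PCA: the component is found by power iteration on the covariance matrix, which assumes the starting vector is not orthogonal to the leading eigenvector. All function names and the data are hypothetical.

```python
import math


def center(rows):
    """Subtract the per-feature mean from every row."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]


def covariance(centered):
    """Sample covariance matrix of mean-centered rows."""
    n, d = len(centered), len(centered[0])
    return [
        [sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]


def first_component(cov, iters=200):
    """Power iteration: approaches the eigenvector with the largest eigenvalue."""
    d = len(cov)
    v = [1.0 / math.sqrt(d)] * d  # assumed not orthogonal to the top eigenvector
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # zero covariance: nothing to project onto
        v = [x / norm for x in w]
    return v


def project(centered, component):
    """One coordinate per row: its position along the component."""
    return [sum(x * c for x, c in zip(row, component)) for row in centered]


# Hypothetical data lying close to the line y = 2x.
rows = [[1.0, 2.0], [2.0, 4.1], [3.0, 5.9], [4.0, 8.0]]
centered = center(rows)
component = first_component(covariance(centered))
reduced = project(centered, component)
print(component)
print(reduced)
```

The values in `reduced` are what would be passed on to the next algorithm in place of the original two features.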
Association rule learning
- Apriori
- Eclat
# Semisupervised Learning
- Deep belief networks (DBNs): restricted Boltzmann machines (RBMs), trained without supervision, are stacked on top of each other, and the whole system is then fine-tuned using supervised learning techniques.
# Reinforcement Learning
- The agent learns from a series of reinforcements: rewards and punishments.
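
A minimal sketch of learning from rewards and punishments, using tabular Q-learning with epsilon-greedy exploration as one possible technique (the notes do not name a specific algorithm). The environment is a hypothetical five-cell corridor: reaching the rightmost cell gives a reward, every other move gives a small punishment. All constants and names are assumptions chosen for illustration.

```python
import random

N_STATES = 5          # corridor cells 0..4; cell 4 is the goal (terminal)
ACTIONS = (-1, +1)    # action 0 moves left, action 1 moves right
GOAL_REWARD = 1.0     # reward for reaching the goal
STEP_PENALTY = -0.01  # small punishment for every other move


def step(state, action):
    """Environment dynamics: returns (next_state, reward, done)."""
    next_state = min(max(state + ACTIONS[action], 0), N_STATES - 1)
    if next_state == N_STATES - 1:
        return next_state, GOAL_REWARD, True
    return next_state, STEP_PENALTY, False


def train(episodes=1000, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning; q[state][action] estimates the long-term return."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _episode in range(episodes):
        state = 0
        for _step in range(100):  # cap the episode length
            if rng.random() < epsilon:
                action = rng.randrange(2)  # explore: random action
            else:
                # exploit: best known action (ties go right)
                action = 0 if q[state][0] > q[state][1] else 1
            next_state, reward, done = step(state, action)
            # Move the estimate toward reward + discounted best future value.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
            if done:
                break
    return q


q = train()
# Greedy policy for the non-terminal cells: 0 = left, 1 = right.
policy = [0 if q[s][0] > q[s][1] else 1 for s in range(N_STATES - 1)]
print(policy)
```

No input-output pairs are given here: the agent only sees the reward signal and has to discover on its own that moving right pays off.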
feature engineering
# Learning Incrementally
# Online Learning
# Batch Learning
# Comparing new data points to known data points
# Instance-based Learning
# Model-based Learning
# References
Géron 2019
Goodfellow 2016