DSEG 635: Learning From Data

Prof. Abdelkader Baggag

Hamad Bin Khalifa University (HBKU), Qatar

  • (+974) 4454-7250

  • abaggag [at] hbku.edu.qa

  • Doha, Qatar

Description

This course covers the theory, algorithms, and applications of computational learning. The technical topics include linear models, the theory of generalization, regularization and validation, neural networks, and support vector machines, as well as specialized techniques and a term-long project involving large datasets.

  • Prerequisites

    • ICT-605 Applied Data Analytics or DSEG-560 Machine Learning.
    • Ability to program and develop algorithms in a programming language of your choice.
  • Textbook

    • Learning From Data by Yaser S. Abu-Mostafa, Malik Magdon-Ismail, Hsuan-Tien Lin.

Content

THEORY

1 The Learning Problem: what is learning? is learning feasible?
2 Training versus Testing: can we learn? dichotomies; growth function; break point.
3 Theory of Generalization: the Vapnik-Chervonenkis inequality.
4 The VC Dimension and Learning: scope of VC analysis; utility of VC dimension; generalization bounds.
5 Bias-Variance Tradeoff: bias and variance; learning curves.
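The bias-variance tradeoff above can be made concrete with a short simulation, in the spirit of the textbook's running example: learning the target f(x) = sin(πx) on [-1, 1] from two-point datasets using the simple hypothesis h(x) = ax (a line through the origin). This is a hedged sketch, not course material; the dataset sizes and the hypothesis set are illustrative choices.

```python
# Sketch (illustrative, not from the course materials) of the
# bias-variance decomposition: fit h(x) = a*x to many 2-point
# samples of f(x) = sin(pi*x), then measure bias and variance
# of the resulting ensemble of hypotheses.
import numpy as np

rng = np.random.default_rng(0)

def target(x):
    return np.sin(np.pi * x)

# Draw many 2-point datasets and fit a line through the origin to each.
n_datasets = 10_000
slopes = np.empty(n_datasets)
for i in range(n_datasets):
    x = rng.uniform(-1, 1, size=2)
    y = target(x)
    # Least-squares slope for h(x) = a*x:  a = sum(x*y) / sum(x^2)
    slopes[i] = (x @ y) / (x @ x)

# The "average hypothesis" g_bar(x) = a_bar * x
a_bar = slopes.mean()

# Bias: squared deviation of g_bar from the target, averaged over x.
# Variance: spread of the individual hypotheses around g_bar;
# since g(x) = a*x, Var_D[g(x)] = Var(a) * x^2.
xs = np.linspace(-1, 1, 1001)
bias = np.mean((a_bar * xs - target(xs)) ** 2)
variance = np.var(slopes) * np.mean(xs ** 2)

print(f"bias ~ {bias:.2f}, variance ~ {variance:.2f}")
```

A more flexible hypothesis set (e.g., h(x) = ax + b) lowers the bias but raises the variance, which is exactly the tradeoff the learning curves in this unit illustrate.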

TECHNIQUES (MODELS & METHODS)

1 The Linear Model I: linear classification; linear regression; nonlinear transformation.
2 The Linear Model II: logistic regression; likelihood measure; gradient descent.
3 Maximum Likelihood: design and implementation of a max-likelihood solution beyond logistic regression, e.g., Cox regression.
4 Neural Networks: multilayer perceptrons; implementation of a simple neural network from scratch using backpropagation algorithm and stochastic gradient descent, without packages.
5 Overfitting & Regularization: constraining the model; weight decay; augmented error; how does regularization work; implementation of LASSO solver using Coordinate Descent.
6 Support Vector Machines: maximizing the margin; support vectors; nonlinear transforms; implementation of an SVM solver using Coordinate Descent and the SMO algorithm.
7 Kernel Methods: the kernel trick; soft-margin SVM.
8 Radial Basis Functions: RBF and nearest neighbors; RBF and neural networks; RBF and regularization.
9 Convex Optimization: Newton's method; Nesterov accelerated gradient.
10 Constrained Optimization: augmented Lagrangian multipliers; KKT conditions.
11 A Peek at Unsupervised Learning: k-means clustering; probability density estimation; Gaussian mixture models.
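As a taste of the from-scratch implementations in this unit, the sketch below trains logistic regression (The Linear Model II) with batch gradient descent on synthetic data. It minimizes the cross-entropy error E(w) = (1/N) Σₙ ln(1 + exp(−yₙ w·xₙ)) with labels yₙ ∈ {−1, +1}; the data, learning rate, and iteration count are illustrative assumptions, not course specifications.

```python
# Sketch (assumed synthetic data, not from the course) of logistic
# regression trained with batch gradient descent on the
# cross-entropy error E(w) = (1/N) * sum_n ln(1 + exp(-y_n w.x_n)).
import numpy as np

rng = np.random.default_rng(1)

# Synthetic, nearly separable data with an intercept column.
N = 200
X = np.column_stack([np.ones(N), rng.normal(size=(N, 2))])
true_w = np.array([0.5, 2.0, -1.0])            # hypothetical "true" weights
y = np.where(X @ true_w + 0.3 * rng.normal(size=N) > 0, 1.0, -1.0)

def cross_entropy(w, X, y):
    # logaddexp(0, -s) = ln(1 + exp(-s)), computed stably.
    return np.mean(np.logaddexp(0.0, -y * (X @ w)))

def gradient(w, X, y):
    # dE/dw = -(1/N) * sum_n y_n x_n / (1 + exp(y_n w.x_n))
    s = y * (X @ w)
    return -(X * (y / (1.0 + np.exp(s)))[:, None]).mean(axis=0)

w = np.zeros(3)
eta = 0.5                                      # fixed learning rate
for _ in range(2000):
    w -= eta * gradient(w, X, y)

# The error starts at E(0) = ln 2 and should drop well below it.
print(cross_entropy(w, X, y))
```

The same gradient-descent loop generalizes to the other maximum-likelihood models in this unit; swapping the loss and its gradient is all that changes.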

PARADIGMS

1 Supervised Learning
2 Unsupervised Learning
3 Reinforcement Learning
4 Active Learning
5 Online Learning
6 Bayesian Learning
7 Graphical Models

Resources

Machine learning is a subject with an abundance of high-quality tutorials and lectures available online. It is worth tapping into these resources, as they are well produced and often more condensed. However, we still recommend the in-class lectures, as they help build a stronger connection with the material. Students are highly encouraged to watch the video lectures of Prof. Yaser Abu-Mostafa.