Machine Learning & Pattern Recognition

CS 295-3, K hour: Tue, Thu 2:30-3:50 PM, CIT Lubrano

Instructor: Thomas Hofmann



Short Course Description

The course will provide a systematic introduction to pattern recognition & machine learning and covers both introductory and intermediate-level material. It is mainly geared towards graduate students, but may also be suitable for advanced undergraduate students with a solid mathematical background. There are no strictly enforced prerequisites, but familiarity with probability theory (for example CS 155), calculus, and linear algebra is a plus.

Covered topics include: decision theory, maximum likelihood estimation, Bayesian statistics, linear classifiers, support vector machines, nearest neighbor classification, Parzen windows, linear regression, regularization theory, neural networks, boosting, model selection, statistical learning theory, feature selection, graphical models, and various techniques for unsupervised learning.

Students from other departments are welcome.

Textbook & References

The course will use the following textbook:

Pattern Classification
by Richard O. Duda, Peter E. Hart, and David G. Stork
John Wiley & Sons, Inc., New York, Second Edition, 2001.

In addition, chapters and excerpts from the following books will be used in class:

An Introduction to Support Vector Machines
by Nello Cristianini and John Shawe-Taylor
Cambridge University Press,
Cambridge UK, 2000.
The Nature of Statistical Learning Theory
by Vladimir N. Vapnik
Springer Verlag, 2nd Edition, 1999.
A Probabilistic Theory of Pattern Recognition
by Luc Devroye, Laszlo Gyorfi, and Gabor Lugosi
Springer Verlag, 1996.
Neural Networks for Pattern Recognition
by Christopher M. Bishop
Oxford University Press, Oxford UK, 1995.
Machine Learning
by Tom Mitchell
McGraw-Hill, 1997.
Probabilistic Reasoning in Intelligent Systems
by Judea Pearl
Morgan Kaufmann Publishers,
San Francisco, CA, 2nd Edition, 1988.
Introduction to Bayesian Networks
by Finn V. Jensen
Springer Verlag 1996.
Parametric Statistical Inference
by J.K. Lindsey
Clarendon Press, Oxford, 1996.
The Elements of Statistical Learning
by T. Hastie, R. Tibshirani, J. Friedman
Springer Verlag 2001.

Course Organization and Grading Policy

Organization

Grading

Continuations


General Outline (subject to change)

Sep 4, 6 No class - European Conference on Machine Learning
Sep 11 Introduction
Bayesian Decision Theory (part I)
[DHS 2.1-2.2, A.4]
slides
Sep 13 Normal Distribution
Bayesian Decision Theory (part II)
[DHS 2.5, A.4]
[DHS 2.6, DHS 2.9]
slides
    Assignment 1: Decision Theory
Sep 18 Maximum Likelihood Parameter Estimation
Bayesian Parameter Estimation
[DHS 3.1-3.3]
[DHS 3.3-3.5]
slides
Sep 20 Theory of point estimators, Sufficient statistics
Cramer-Rao Theorem, Rao-Blackwell Theorem
[DHS 3.6; Lindsey 7.4 (copy at CIT 505)] slides
    Assignment 2: Parameter Estimation
    Programming Exercises A: Naive Bayes Classifier for Text Categorization
Sep 25 Discussion & Interaction
Exponential Family
[DHS 3.6]
slides
Sep 27 Fisher's Linear Discriminant
Perceptron Algorithm
[DHS 5.1,5.2, 3.8.2]
[CS 2.1.1 (copy at CIT 505)]
slides
Oct 2 Support Vector Machines for Classification [CS 6.1]
slides
Oct 4 Soft-Margin Classifiers
Quadratic Programming
[CS 5.1-5.3, DHS A.3] slides
Oct 9 Non-linear Discriminant Functions via Kernels
[CS 3.1-3.3] slides
    Assignment 3: Linear Discriminant Functions & SVMs
Oct 11 Discussion & Interaction
Nearest Neighbor Classifier
[DHS 4.5-4.6] slides
Oct 16 Linear Regression, Ridge Regression, Regularization [CS 2.2, HTF 3] slides
Oct 18 Radial Basis Function Networks, K-means, Interpolation [Bishop 5] slides
Oct 23 Regularization Networks, SVM regression [Bishop 5, CS 6.2] slides
  Programming Exercises B: Classification & Regression
Oct 25 Neural Networks, Multilayer Perceptrons, Backpropagation [Bishop 4, DHS 6.3,6.6] slides
Oct 30 Discussion & Interaction
Nov 1 Resampling for Model Evaluation & Ensemble Methods:
Bagging, Boosting, Jackknife, Bootstrap
[DHS 9.4-9.5] slides
Nov 6 Ensemble Methods: AdaBoost, Hierarchical Mixtures of Experts [HTF 9.5, 10] slides
Nov 8 Statistical Learning Theory [CS 4.1-4.2] slides
Nov 13 Graphical Models: Markov Networks, Bayesian Networks [Jordan, Bishop: Chapter 2]
(draft version, not for circulation)
pdf
Nov 15 Inference in Graphical Models
The Junction Tree Algorithm, Part I
[Jordan, Bishop: Chapter 14]
(draft version, not for circulation)
pdf
  Assignment 4: Boosting, Backpropagation & Learning Theory
Nov 20 The Junction Tree Algorithm, Part II
Hidden Markov Models
[Jordan, Bishop: Chapter 15]
(draft version, not for circulation)
pdf
    Assignment 5: Graphical Models
Nov 22 No class - Thanksgiving Recess
Nov 27 Hidden Markov Models (continued)
Mixture Models
[DHS 10.1-10.4]
Nov 29 Learning in Graphical Models
Expectation Maximization Algorithm
Tutorial by David Heckerman html
    Programming Exercises C: Unsupervised Learning
Dec 4 Discussion & Interaction (self-organized)
Dec 6 No class - Neural Information Processing Systems
Reading & preparation of student presentations
Dec 11 Student Presentations
Applications of Machine Learning Methods
[*]
Dec 13 Student Presentations
Applications of Machine Learning Methods
[*]