Class lectures: Tuesdays 12:35-2:25pm in Eng253, Mudd

Instructor: Sanjiv Kumar (Office Hours: Tuesdays 2:25pm - 3:25pm)
TAs: Jun Wang (Office Hours: Tuesdays 1:25-2:25pm) and Junfeng He (Office Hours: Tuesdays 4:25-5:25pm)

Driven by rapid advances in fields such as Biology, Finance, and Web Services, applications involving millions or even billions of data items, such as documents, user records, reviews, images, or videos, are increasingly common. Can we develop methods that learn efficiently from these massive amounts of potentially noisy data? There is an urgent need to revisit traditional machine learning methods and tools to bridge the wide gap between large-scale practical requirements and existing learning approaches.

The goal of this course is to introduce fundamental concepts of large-scale machine learning. Both theoretical and practical aspects will be discussed. The primary focus will be on analyzing the basic tools of large-scale learning, including the relevant theory and algorithms, rather than on specific machine learning techniques. Running examples will be drawn from real-world settings in fields including Vision and Information Retrieval. The course will prepare students to add a new dimension to how they develop models and optimization techniques for practical problems: scalability.

We will analyze tools for large-scale learning that apply to a variety of commonly used machine learning techniques for classification, regression, ranking, clustering, density estimation, and semi-supervised learning. Example applications of these tools to specific learning methods will also be provided. A tentative list of the tools we plan to discuss is given below; a small illustrative sketch of the first two items follows the list:

  1. Randomized Algorithms
  2. Matrix Approximations I (low-rank approximation, decomposition)
  3. Matrix Approximations II (sparse matrices, matrix completion)
  4. Approximate Nearest Neighbor Search I (trees)
  5. Approximate Nearest Neighbor Search II (hashes)
  6. Fast Optimization (first-order methods)
  7. Kernel Methods I (fast training)
  8. Kernel Methods II (fast testing)
  9. Dimensionality Reduction (linear and nonlinear methods)
  10. Sparse Methods/Streaming (sparse coding...)
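
To give a flavor of items 1 and 2, here is a minimal sketch (not course material) of randomized low-rank approximation, in the style of the Halko-Martinsson-Tropp randomized SVD. All function and parameter names below are illustrative, and only NumPy is assumed:

    import numpy as np

    def randomized_svd(A, rank, oversample=10, n_iter=2, seed=0):
        """Approximate the top-`rank` SVD of A using random projections."""
        rng = np.random.default_rng(seed)
        m, n = A.shape
        k = min(rank + oversample, n)
        # Sketch the range of A with a Gaussian test matrix.
        Y = A @ rng.standard_normal((n, k))
        # Power iterations sharpen the sketch when singular values decay slowly.
        for _ in range(n_iter):
            Y = A @ (A.T @ Y)
        Q, _ = np.linalg.qr(Y)   # orthonormal basis for the sketched range
        B = Q.T @ A              # small (k x n) projected problem
        Ub, s, Vt = np.linalg.svd(B, full_matrices=False)
        U = Q @ Ub
        return U[:, :rank], s[:rank], Vt[:rank]

    # Quick check on a synthetic low-rank matrix.
    rng = np.random.default_rng(1)
    A = rng.standard_normal((2000, 50)) @ rng.standard_normal((50, 1000))
    U, s, Vt = randomized_svd(A, rank=50)
    print(np.linalg.norm(A - (U * s) @ Vt) / np.linalg.norm(A))  # near 0

The point of the sketch is the cost profile: the expensive factorization is performed on a small k x n matrix rather than the full m x n data, which is what makes such randomized tools attractive at scale.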

Announcements

Please check this section frequently for new announcements.

Course Prerequisites

Grading