

Course Outcome:
After taking this course, students will be able to understand the algorithms of Unsupervised Machine Learning, implement them in Python, and apply them to real-world datasets.
Course Topics and Approach:
Unsupervised Machine Learning involves finding patterns in datasets. The core of this course is the study of the following algorithms:
Clustering: Hierarchical, DBSCAN, K Means & Gaussian Mixture Model
Dimension Reduction: Principal Component Analysis
Unlike many other courses, this course:
- Has a detailed presentation of the math underlying the above algorithms, including normal distributions, expectation maximization, and singular value decomposition
- Has a detailed explanation of how the algorithms are converted into Python code, with lectures on code design and the use of vectorization
- Has questions (programming and theory) with solutions that allow learners to get practice with the course material
The course codes are then applied to case studies with real-world data, performing dimension reduction and clustering on the Iris Flowers Dataset, the MNIST Digits Dataset (images), and the BBC Text Dataset (articles).
Course Audience:
This course is designed for:
- Scientists, engineers, programmers, and others interested in machine learning/data science
No prior experience with machine learning is needed, but students should have knowledge of:
- Basic linear algebra (vectors, transpose, matrices, matrix multiplication, inverses, determinants, linear spaces)
- Basic probability and statistics (mean, covariance matrices, normal distributions)
- Python 3 programming
Students should also have a Python installation, such as the Anaconda platform, on their machine, with the ability to run programs in the command window and in Jupyter Notebooks.
Teaching Style and Resources:
- The course includes many examples, with plots and animations used to help students get a better understanding of the material
- The course has many exercises with solutions (theoretical, Jupyter Notebook, and programming) that allow students to gain additional practice
- All resources (presentations, supplementary documents, demos, codes, solutions to exercises) are downloadable from the course GitHub site
2021.08.28 Update:
- Section 9.5: added an Autoencoder example
- Section 9.6: added this new section with an Autoencoder demo
2021.11.02 Update:
- Sections 2.3, 2.4, 3.4, 4.3: updated the codes to run in more recent versions of Python and matplotlib, and updated the presentations to point out the changes
- Added English captions to the course videos
Course Curriculum:
- Section 1.1: Introduction
Introduction to the Unsupervised Machine Learning with Python course
- Section 1.2: About this Course
Information about the course audience, prerequisites, and how to get the most from the course
- Section 1.3: Course Resources and Set Up
Information about the course GitHub site and resources, installing the Anaconda distribution if required, installing Python packages, and testing the set up
- Section 2.0: Python Demos
This brief section gives an overview of the demos in Section 2
- Section 2.1: Numpy Basic Demos
Jupyter notebook demo of basic numpy functionality used in the course
- Section 2.1: Exercises
Exercises for Section 2.1
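As a taste of the numpy functionality the Section 2.1 demos cover, here is a minimal sketch (illustrative only, not the course's demo code):

    import numpy as np

    # Create arrays and inspect their shape
    x = np.array([1.0, 2.0, 3.0])
    A = np.arange(6).reshape(2, 3)   # 2x3 matrix [[0, 1, 2], [3, 4, 5]]

    # Vectorized elementwise operations, no Python loops needed
    y = 2 * x + 1                    # [3., 5., 7.]

    # Slicing and aggregation
    print(A[:, 1])                   # second column: [1 4]
    print(A.sum(axis=0))             # column sums: [3 5 7]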
- Section 2.2: Numpy Matrix Operations Demo
Jupyter notebook demos of numpy matrix operations functionality used in the course
- Section 2.2: Exercises
Exercises for Section 2.2
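A minimal sketch of the kind of matrix operations the Section 2.2 demos exercise (again illustrative, not the course notebooks themselves):

    import numpy as np

    A = np.array([[2.0, 1.0],
                  [1.0, 3.0]])

    C = A @ A.T                    # matrix multiplication with the transpose
    Ainv = np.linalg.inv(A)        # inverse (A is nonsingular here)
    d = np.linalg.det(A)           # determinant: 2*3 - 1*1 = 5
    print(np.allclose(A @ Ainv, np.eye(2)))   # True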
- Section 2.3: Matplotlib Basic Demo
Jupyter notebook demos of basic matplotlib plotting functionality used in this course
- Section 2.3: Exercises
Exercises for Section 2.3
- Section 2.4: Matplotlib Cluster Plot and Animation Demo
Jupyter notebook demos of matplotlib colormesh, scatter plot, and animation functionality used in this course
- Section 2.4: Exercises
Exercises for Section 2.4
- Section 2.5: Pandas Demo
Jupyter notebook demo of basic pandas functionality for reading data from csv files
- Section 2.5: Exercises
Exercises for Section 2.5
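The heart of the Section 2.5 demo is reading a CSV file into a feature matrix; a minimal sketch (the file name here is hypothetical):

    import pandas as pd

    df = pd.read_csv("data.csv")   # hypothetical file, substitute your own CSV
    print(df.head())               # first five rows
    print(df.columns)              # column names
    X = df.to_numpy()              # numpy feature matrix for the clustering codes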
- Section 2.6: Sklearn Datasets Demo
Jupyter notebook demo of generating datasets using sklearn datasets functionality
- Section 3.0: Review of Mathematical Concepts
Review of what is covered in Section 3
- Section 3.1: What is Data in Unsupervised Learning
Description of data for Unsupervised Machine Learning and demo of using sklearn and wordcloud to process and visualize text. Students will be able to set up datasets for their applications and use basic sklearn functionality to convert text to feature matrices.
- Section 3.1: Exercises
Exercises for Section 3.1
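For a concrete idea of converting text to a feature matrix as in Section 3.1, here is a minimal sketch using sklearn's CountVectorizer (in scikit-learn versions before 1.0 the vocabulary method is get_feature_names instead):

    from sklearn.feature_extraction.text import CountVectorizer

    docs = ["the cat sat on the mat", "the dog sat on the log"]

    vectorizer = CountVectorizer()
    X = vectorizer.fit_transform(docs)         # sparse document-term count matrix

    print(vectorizer.get_feature_names_out())  # learned vocabulary
    print(X.toarray())                         # rows = documents, columns = term counts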
- Section 3.2: Computational Complexity
Review of computational complexity and its relevance to the algorithms, with demos using the numpy package. Students will be able to estimate a complexity power using numpy; a short sketch of the idea follows below.
- Section 3.2: Exercises
Exercises for Section 3.2
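One way to estimate a complexity power, in the spirit of Section 3.2: time an operation at two problem sizes and solve for the exponent p in t ~ c * n**p. A minimal sketch (not the course's exact code):

    import time
    import numpy as np

    def time_matmul(n):
        # Time one n x n matrix multiplication
        A = np.random.rand(n, n)
        B = np.random.rand(n, n)
        t0 = time.perf_counter()
        A @ B
        return time.perf_counter() - t0

    n1, n2 = 500, 1000
    t1, t2 = time_matmul(n1), time_matmul(n2)
    # If t ~ c * n**p, then p = log(t2/t1) / log(n2/n1)
    p = np.log(t2 / t1) / np.log(n2 / n1)
    print(f"estimated complexity power: {p:.2f}")   # ~3 in theory for matmul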
- Section 3.3: Distance Measures
Description of distance measures and how to compute them using numpy package functionality. Students will be able to compute distances between vectors using numpy.
- Section 3.3: Exercises
Exercises for Section 3.3
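A minimal sketch of computing common distance measures with numpy, as Section 3.3 discusses:

    import numpy as np

    u = np.array([1.0, 2.0, 3.0])
    v = np.array([4.0, 6.0, 3.0])

    euclidean = np.linalg.norm(u - v)   # sqrt(9 + 16 + 0) = 5.0
    manhattan = np.sum(np.abs(u - v))   # 3 + 4 + 0 = 7.0
    cosine = u @ v / (np.linalg.norm(u) * np.linalg.norm(v))   # cosine similarity
    print(euclidean, manhattan, cosine)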
- Section 3.4: Singular Value Decomposition
Description of the singular value decomposition and demo of how to compute the SVD using numpy. Students will understand what the singular value decomposition is, how to compute it, and how it is used in the course.
- Section 3.4: Exercises
Exercises for Section 3.4
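Computing the SVD with numpy, as the Section 3.4 demo shows, comes down to a single call; a minimal sketch:

    import numpy as np

    A = np.random.rand(5, 3)
    U, s, Vt = np.linalg.svd(A, full_matrices=False)   # A = U @ diag(s) @ Vt

    print(s)                                       # singular values, descending
    print(np.allclose(A, U @ np.diag(s) @ Vt))     # True: reconstruction works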
- Section 3.5: Mean, Variance, and Covariance
Review of mean, variance, and covariance, which are used in various unsupervised machine learning algorithms. Demo shows how to use numpy functions to compute mean, variance, and covariance.
- Section 3.5: Exercises
Exercises for Section 3.5
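A minimal sketch of the numpy mean, variance, and covariance calls that Section 3.5 covers:

    import numpy as np

    X = np.random.rand(100, 3)     # 100 samples of 3-dimensional data (rows = samples)

    mu = X.mean(axis=0)            # per-feature mean, shape (3,)
    var = X.var(axis=0)            # per-feature variance
    C = np.cov(X, rowvar=False)    # 3x3 covariance matrix
    print(mu, var, C)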
- Section 4.1: Hierarchical Clustering Algorithm
Description of the Hierarchical Clustering Algorithm. Students will be able to understand the algorithm, its complexity, and its strengths and weaknesses.
- Section 4.1: Exercises
Exercises for Section 4.1
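The course builds its own Hierarchical Clustering implementation in Sections 4.2-4.3; for a preview of the algorithm's behavior, here is a sketch using scipy's agglomerative clustering instead (an assumption of this example, not the course code):

    import numpy as np
    from scipy.cluster.hierarchy import linkage, fcluster

    # Two well-separated blobs of 2-D points
    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(0, 0.3, (20, 2)),
                   rng.normal(3, 0.3, (20, 2))])

    Z = linkage(X, method="single")                   # agglomerative merge tree
    labels = fcluster(Z, t=2, criterion="maxclust")   # cut the tree into 2 clusters
    print(labels)                                     # 20 ones followed by 20 twos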
- Section 4.2: Hierarchical Clustering Code Design
Description of course code design for the Hierarchical Clustering Algorithm. Given this code design, students will be able to implement the algorithm using Python.
- Section 4.3: Hierarchical Clustering Code Walkthrough
Walkthrough of course Hierarchical Clustering code. Students will be able to understand and use the course Hierarchical Clustering code.
- Section 4.3: Exercises
Exercises for Section 4.3
- Section 5.1: DBSCAN Algorithm
Description of the DBSCAN algorithm
- Section 5.1: Exercises
Exercises for Section 5.1
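Again, the course implements DBSCAN itself in Sections 5.2-5.3; as a preview of what the algorithm does, a sketch using sklearn's implementation (an assumption of this example, not the course code):

    from sklearn.cluster import DBSCAN
    from sklearn.datasets import make_moons

    X, _ = make_moons(n_samples=200, noise=0.05, random_state=0)

    # eps = neighborhood radius, min_samples = density threshold for core points
    labels = DBSCAN(eps=0.2, min_samples=5).fit_predict(X)
    print(set(labels))   # the two moons' cluster ids; -1 would mark noise points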
- Section 5.2: DBSCAN Code Design
Review of the course DBSCAN code design
- Section 5.3: DBSCAN Code Walkthrough
Walkthrough of course DBSCAN code
- Section 5.3: Exercises
Exercises for Section 5.3
- Section 6.1: K Means Algorithm
Description of the K Means Clustering Algorithm. Students will be able to understand the algorithm, its complexity, and its strengths and weaknesses; a short sketch of the core iteration follows below.
- Section 6.1: Exercises
Exercises for Section 6.1
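The two alternating steps of K Means (assign points to the nearest center, move each center to the mean of its points) fit in a few vectorized numpy lines; a minimal sketch, not the course's code design:

    import numpy as np

    def kmeans(X, k, n_iter=100, seed=0):
        rng = np.random.default_rng(seed)
        centers = X[rng.choice(len(X), k, replace=False)]  # random initial centers
        for _ in range(n_iter):
            # Assignment step: nearest center for every point (vectorized)
            d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
            labels = d.argmin(axis=1)
            # Update step: each center moves to the mean of its assigned points
            # (assumes no cluster goes empty, which is fine for an illustration)
            new_centers = np.array([X[labels == j].mean(axis=0) for j in range(k)])
            if np.allclose(new_centers, centers):   # converged
                break
            centers = new_centers
        return labels, centers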
- Section 6.2: K Means Code Design
Review of course K Means code design
- Section 6.3: K Means Code Walkthrough
Walkthrough of course K Means code
- Section 6.3: Exercises
Exercises for Section 6.3
- Section 7.1: Normal Distribution Probability Density Function
Description of the Normal Distribution Probability Density Function for one dimension and multiple dimensions.
- Section 7.1: Exercises
Exercises for Section 7.1
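The multivariate normal density covered in Section 7.1 is p(x) = exp(-(x-mu)^T Sigma^{-1} (x-mu)/2) / sqrt((2 pi)^d det(Sigma)); a minimal numpy sketch of evaluating it directly:

    import numpy as np

    def normal_pdf(x, mu, Sigma):
        # Multivariate normal density at x with mean mu and covariance Sigma
        d = len(mu)
        diff = x - mu
        expo = -0.5 * diff @ np.linalg.solve(Sigma, diff)
        norm = np.sqrt((2 * np.pi) ** d * np.linalg.det(Sigma))
        return np.exp(expo) / norm

    x = np.array([0.5, -0.2])
    print(normal_pdf(x, np.zeros(2), np.eye(2)))   # matches scipy's multivariate_normal.pdf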
- Section 7.2: Gaussian Mixture Model Algorithm
Description of the Gaussian Mixture Model Clustering Algorithm. Students will be able to understand the algorithm, its complexity, and its strengths and weaknesses.
- Section 7.2: Exercises
Exercises for Section 7.2
- Section 7.3: Gaussian Mixture Model Code Design
Review of course Gaussian Mixture Model code design
- Section 7.4: Gaussian Mixture Model Code Walkthrough
Walkthrough of course Gaussian Mixture Model code
- Section 7.4: Exercises
Exercises for Section 7.4
- Section 8.1: Metrics for Measuring Quality of Clustering
Description of the Silhouette Index for measuring the quality of a clustering
- Section 8.1: Exercises
Exercises for Section 8.1
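A minimal sketch of using the Silhouette Index to compare cluster counts, here via sklearn's silhouette_score (the course may compute the index differently):

    from sklearn.cluster import KMeans
    from sklearn.datasets import make_blobs
    from sklearn.metrics import silhouette_score

    X, _ = make_blobs(n_samples=300, centers=3, random_state=0)

    for k in (2, 3, 4, 5):
        labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
        # Silhouette lies in [-1, 1]; higher means better-separated clusters
        print(k, silhouette_score(X, labels))   # k = 3 should score highest here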
- Section 8.2: Comparison of Algorithms
Comparison of the DBSCAN, K Means, and Gaussian Mixture Model clustering algorithms on six sklearn datasets
- Section 9.0: Dimension Reduction Overview
Overview of the dimension reduction algorithms
- Section 9.1: Principal Component Analysis Algorithm
Description of the Principal Component Analysis Algorithm and Jupyter Notebook demo.
- Section 9.1: Exercises
Exercises for Section 9.1
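PCA reduces to an SVD of the centered data matrix (which is why Section 3.4 matters here); a minimal sketch, not the course's code design:

    import numpy as np

    def pca(X, n_components):
        # Center the data, then take the top right singular vectors as components
        Xc = X - X.mean(axis=0)
        U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
        components = Vt[:n_components]                # principal directions
        scores = Xc @ components.T                    # data in the reduced space
        explained_var = s[:n_components] ** 2 / (len(X) - 1)
        return scores, components, explained_var

    X = np.random.rand(100, 5)
    scores, components, ev = pca(X, 2)
    print(scores.shape)   # (100, 2)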
- Section 9.2: Principal Component Analysis Code Design
Review of design for Principal Component Analysis code.
- Section 9.3: Principal Component Analysis Code Walkthrough
Walkthrough of Principal Component Analysis code.
- Section 9.3: Exercises
Exercises for Section 9.3
- Section 9.4: PCA Applied to MNIST Digits Dataset
Application of Principal Component Analysis to MNIST Digits Dataset.
- Section 9.4: Exercises
Exercises for Section 9.4
- Section 9.5: Autoencoders
Description of how Autoencoders can be used for dimension reduction.
- Section 9.6: Autoencoder Demo (Optional)
This optional section has a demo on using autoencoders for dimension reduction.
- Section 10.1: Clustering Quality Metrics
Description of the Purity and Bar Chart metrics for measuring the quality of clustering, plus a demo and code walkthrough of the Python implementation
- Section 10.2: Clustering for Iris Flower Dataset
Discussion of using clustering algorithms and PCA to reduce dimension to find clusters in the Iris Flower Dataset
- Section 10.2: Exercises
Exercises for Section 10.2
- Section 10.3: Clustering for MNIST Digits Dataset
Discussion of using clustering algorithms and PCA to reduce dimension and find clusters in the MNIST Digits Dataset
- Section 10.3: Exercises
Exercises for Section 10.3
- Section 10.4: Clustering for BBC Text Dataset
Discussion of using clustering algorithms and PCA to reduce dimension to group articles for the BBC Text dataset
- Section 10.4: Exercises
Exercises for Section 10.4