EE3001 - Machine Learning (Fall 2022)

Basic Information

Instructor: Jie Wang
Email: jiewangx@ustc.edu.cn
Time and Location: Tue. and Thu., 14:00 - 15:35 (3C304)
TAs:
- Xize Liang (xizeliang@miralab.ai)
- Zijie Geng (zijiegeng@miralab.ai)
- Haoyang Liu (haoyangliu@miralab.ai)

Lectures

All course materials will be shared via this page.

Index	Date	Topic	Lecture Notes	Homework
00	Aug 30, 2022	Introduction	Lec00-Introduction.pdf
01	Sept 01, 2022	Basics of Analysis	LecA1-AnalysisBasics.pdf
02	Sept 06, 2022	Linear Regression I	Lec01-LinearRegression.pdf, Lec01-LinearRegression_slides.pdf
03	Sept 08, 2022	Linear Regression II
04	Sept 13, 2022	Bias-Variance Decomposition I	Lec02-BiasVarianceDecomposition.pdf
05	Sept 15, 2022	Basics of Linear Algebra	LecA2-LinearAlgebraBasics.pdf	HW01.pdf
06	Sept 20, 2022	Bias-Variance Decomposition II
07	Sept 22, 2022	Elementary Probability Theory	LecA3-ElementaryProbabilityTheory.pdf
08	Sept 27, 2022	Bayesian Linear Regression	Lec03-BayesianLinearRegression.pdf
09	Sept 29, 2022	Convex Sets I	Lec04-ConvexSets.pdf
10	Oct 11, 2022	Convex Sets II		HW02.pdf
11	Oct 13, 2022	Separation Theorems I	Lec05-SeparationTheorems.pdf
12	Oct 18, 2022	Separation Theorems II
13	Oct 20, 2022	Convex Functions I	Lec06-ConvexFunctions.pdf
14	Oct 25, 2022	Convex Functions II		HW03.pdf
15	Oct 27, 2022	Subdifferential I	Lec07-Subdifferential.pdf
16	Nov 1, 2022	Subdifferential II
17	Nov 3, 2022	Convex Optimization Problems	Lec08-ConvexOptimizationProblems.pdf	HW04.pdf
18	Nov 8, 2022	Decision Tree	Lec09-DecisionTree.pdf
19	Nov 10, 2022	Naive Bayes Classifier	Lec10-NaiveBayesClassifier.pdf
20	Nov 15, 2022	Logistic Regression I	Lec11-LogisticRegression.pdf
21	Nov 17, 2022	Mid-term Exam
22	Nov 22, 2022	Logistic Regression II
23	Nov 24, 2022	SVM I	Lec12-SVM1.pdf	HW05.pdf
24	Nov 29, 2022	SVM I
25	Dec 1, 2022	SVM II	Lec13-SVM2.pdf
26	Dec 6, 2022	Neural Networks	Lec14-NeuralNetworks.pdf
27	Dec 8, 2022	Convolutional Neural Networks	Lec15-ConvolutionalNeuralNetworks.pdf	HW06.pdf
28	Dec 13, 2022	Principal Component Analysis I	Lec16-PrincipalComponentAnalysis.pdf
29	Dec 15, 2022	Principal Component Analysis II
30	Dec 20, 2022	Reinforcement Learning I	Lec17-RL_DeterministicEnvironment.pdf
31	Dec 22, 2022	Reinforcement Learning II	Lec18-RL_StochasticEnvironment.pdf	HW07.pdf

Project

Description

Recommending clothes of a suitable size to customers, based on information about the clothes and the users, is very important for e-commerce platforms. In this project, you are expected to implement a classifier that predicts customers’ fit feedback (“Large”, “True to Size”, or “Small”) using a dataset collected from RentTheRunWay.

Dataset

You can download the training data (in JSON format) from here (updated November 24, 2022). The dataset contains 87766 samples. Each sample is a dictionary corresponding to a rental or purchase record, as shown in the following example.

Key	Value
fit	True to Size
item_name	Lilly Pulitzer Lettie Eyelet Top
size	XS
price	$158
user_name	Alexandra
rented_for	Everyday
usually_wear	4
age	58
height	5’ 2''
weight	116LBS
body_type	PEAR
bust_size	32B
review_summary	Unexpected wow
review	Thought this would be a whatever shirt. Really cute!! Color popped and fit well. Definitely rent.
rating	5

Note: Some training samples are missing features, and some are even missing labels. Before building a model, you may want to clean the data or decide how to use the samples whose features or labels are missing.
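One possible way to approach the cleaning step is to convert the raw string fields into numbers and keep missing values explicit. The sketch below is only an illustration: the helper names (`parse_number`, `parse_height_inches`, `clean_record`, `split_by_label`) are hypothetical, and it assumes that a missing value shows up as an absent key, a null, or an empty string, which the actual data file may or may not follow.

```python
import re


def parse_number(value):
    """Return the first number in a field such as "$158" or "116LBS", or None."""
    if value is None:
        return None
    # Drop thousands separators so that "$1,158" would parse as 1158.
    text = str(value).replace(",", "")
    match = re.search(r"\d+(?:\.\d+)?", text)
    return float(match.group()) if match else None


def parse_height_inches(value):
    """Convert a feet-and-inches field such as 5' 2'' to inches, or None."""
    if value is None:
        return None
    numbers = re.findall(r"\d+", str(value))
    if len(numbers) < 2:
        return None  # treat anything without both parts as missing
    return int(numbers[0]) * 12 + int(numbers[1])


def clean_record(record):
    """Build a dictionary of numeric features; missing values stay None."""
    return {
        "price": parse_number(record.get("price")),
        "weight_lbs": parse_number(record.get("weight")),
        "height_in": parse_height_inches(record.get("height")),
        "age": parse_number(record.get("age")),
        "usually_wear": parse_number(record.get("usually_wear")),
    }


def split_by_label(records):
    """Separate records that have a non-empty "fit" label from those that do not."""
    labeled = [r for r in records if r.get("fit")]
    unlabeled = [r for r in records if not r.get("fit")]
    return labeled, unlabeled
```

Keeping `None` for missing values leaves the later choice open: such entries can be imputed (for example with a per-feature mean), flagged with an extra indicator feature, or dropped.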

Your model will be tested on a dataset (also in JSON format) of 9751 samples. In the testing dataset, the labels “Small”, “True to Size”, and “Large” are encoded as 1, 2, and 3, respectively. Each testing sample is a dictionary like the following.

Key	Value
fit	1
item_name	Amanda Uprichard Tweed Aldridge Blazer
size	S
price	$319
rented_for	Work
usually_wear	4
age	40
height	5’ 3''
weight	130LBS
body_type	STRAIGHT & NARROW
bust_size	34B
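Because the training labels are strings while the testing labels are integers, it may help to map both to one encoding early on. The sketch below uses the mapping stated above. The names `LABEL_TO_ID`, `TEST_FEATURE_KEYS`, `encode_label`, and `select_test_features` are hypothetical, and the key list is read off the single testing example shown here, on the assumption that all testing samples share the same keys. In the two examples, the testing sample lacks `user_name`, `review_summary`, `review`, and `rating`, so a model that relies on those fields could not use them at test time.

```python
# Mapping stated in the project description.
LABEL_TO_ID = {"Small": 1, "True to Size": 2, "Large": 3}
ID_TO_LABEL = {label_id: name for name, label_id in LABEL_TO_ID.items()}

# Keys seen in the testing example (assumption: every testing sample has them).
TEST_FEATURE_KEYS = (
    "item_name", "size", "price", "rented_for", "usually_wear",
    "age", "height", "weight", "body_type", "bust_size",
)


def encode_label(fit):
    """Map a training string or a testing integer to 1, 2, or 3; else None."""
    if isinstance(fit, bool):
        return None  # bool is a subclass of int, so exclude it explicitly
    if isinstance(fit, int):
        return fit if fit in ID_TO_LABEL else None
    if isinstance(fit, str):
        return LABEL_TO_ID.get(fit.strip())
    return None


def select_test_features(record):
    """Keep only the keys that also appear in the testing example."""
    return {key: record.get(key) for key in TEST_FEATURE_KEYS}
```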

Requirements

Do NOT use any autograd tool or any optimization tool from machine learning packages. You are expected to implement your algorithm from scratch. For example, if you use a neural network, you must implement both the forward and backward passes yourself. You may use the packages in the Whitelist. The TAs will update the Whitelist if your request is reasonable.
You can work in a team of no more than three members in total. Please list the percentage of each member’s contribution in your report, e.g., {San Zhang: 30%, Si Li: 35%, Wu Wang: 35%}.
We define a default base class called PB20000000 in PB20000000.py. You are expected to implement your algorithm in the [your student ID] directory. We provide example code here. For detailed requirements, please refer to the comments in our code.
Send a package named [your student ID].zip, which contains the [your student ID] directory organized as follows, to ml2022fall_ustc@163.com.
```
[your student ID]
  |- [your student ID].py 
  |- [your student ID]-report.pdf (your report)
  |- ... (your code and model)
```
For a team project, please use the team leader’s student ID in the package name and have the team leader submit the package.
Remember to save your trained model and send it to the e-mail address above.
Please submit a detailed report. The report should include all the details of your project, e.g., the implementation, the experimental settings, and the analysis of your results.
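To illustrate what "from scratch" can look like under the first requirement, here is a minimal sketch of multiclass softmax regression with hand-derived gradients, written with the Python standard library only. It is not the provided example code and does not follow the PB20000000 base class, whose interface is not shown here. The function names and hyperparameters are hypothetical. Labels are assumed to be 0, 1, ..., K-1, so the integer labels 1, 2, 3 of the testing format would need to be shifted by one.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def linear_scores(W, b, x):
    """Forward pass: one score per class, W[k] . x + b[k]."""
    return [sum(w * v for w, v in zip(W[k], x)) + b[k] for k in range(len(W))]


def train_softmax_regression(X, y, num_classes, lr=0.1, epochs=200, seed=0):
    """Full-batch gradient descent on the mean cross-entropy loss."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    W = [[rng.uniform(-0.01, 0.01) for _ in range(d)] for _ in range(num_classes)]
    b = [0.0] * num_classes
    for _ in range(epochs):
        grad_W = [[0.0] * d for _ in range(num_classes)]
        grad_b = [0.0] * num_classes
        for x, label in zip(X, y):
            probs = softmax(linear_scores(W, b, x))
            for k in range(num_classes):
                # Backward pass: d(loss)/d(score_k) = p_k - 1[k == label].
                delta = probs[k] - (1.0 if k == label else 0.0)
                grad_b[k] += delta
                for j in range(d):
                    grad_W[k][j] += delta * x[j]
        for k in range(num_classes):
            b[k] -= lr * grad_b[k] / n
            for j in range(d):
                W[k][j] -= lr * grad_W[k][j] / n
    return W, b


def predict(W, b, x):
    """Return the index of the class with the highest score."""
    scores = linear_scores(W, b, x)
    return max(range(len(scores)), key=scores.__getitem__)
```

Calling `train_softmax_regression(X, y, num_classes=3)` on a list of numeric feature vectors returns the weights and biases, which `predict` then uses. Pure-Python loops are slow on tens of thousands of samples, so a real submission would likely rely on an array package, provided it is on the Whitelist.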

Grading

Total score = min(base score (up to 20 pts) + bonus (up to 5 pts), 20 pts).
The base score is determined by the Macro F1-score evaluated by the TAs’ code.
The bonus considers the following three aspects.
- Your insights on the data and task.
- The novelty of your approach, which should be highlighted in your report.
- The readability of your code and report. Please make them easy to follow.
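Since the base score depends on the Macro F1-score, it can be useful to track the same metric on a held-out split during development. The sketch below computes the unweighted mean of the per-class F1 scores. The TAs’ evaluation code is not shown here, so details such as scoring a class with no true or predicted samples as 0 are assumptions, and the function name `macro_f1` is hypothetical.

```python
def macro_f1(y_true, y_pred, labels=(1, 2, 3)):
    """Unweighted mean of per-class F1 scores over the given labels."""
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        # F1 = 2*TP / (2*TP + FP + FN); assumed to be 0 when undefined.
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)
```

Because every class counts equally, a classifier that always predicts the most frequent label scores poorly on this metric even if its accuracy looks acceptable.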

System Requirements

We will evaluate your model on a GeForce RTX 2080Ti (about 10 GB of memory) running Ubuntu 18.04. Please limit the size of your model to avoid out-of-memory (OOM) errors.

Hint

These papers may be helpful to you.

[1] Yoon Kim. 2014. Convolutional Neural Networks for Sentence Classification. In Proceedings of EMNLP 2014.

[2] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of NAACL 2019.

Due Dates

Team leaders should inform the TAs of their team members before 23:59, December 2, 2022.
Please submit your report, code, and trained model before 23:59, January 11, 2023.
No late submissions will be accepted.
