Basic Information
- Instructor: Jie Wang
- Email: jiewangx@ustc.edu.cn
- Time and Location: Tues., Thur. 14:00 - 15:35 (3C304)
- TAs:
  - Xize Liang (xizeliang@miralab.ai)
  - Zijie Geng (zijiegeng@miralab.ai)
  - Haoyang Liu (haoyangliu@miralab.ai)
Lectures
All course materials will be shared via this page.
Index | Date | Topic | Lecture Notes | Homework |
---|---|---|---|---|
00 | Aug 30, 2022 | Introduction | Lec00-Introduction.pdf | |
01 | Sept 01, 2022 | Basics of Analysis | LecA1-AnalysisBasics.pdf | |
02 | Sept 06, 2022 | Linear Regression I | Lec01-LinearRegression.pdf, Lec01-LinearRegression_slides.pdf | |
03 | Sept 08, 2022 | Linear Regression II | ||
04 | Sept 13, 2022 | Bias-Variance Decomposition I | Lec02-BiasVarianceDecomposition.pdf | |
05 | Sept 15, 2022 | Basics of Linear Algebra | LecA2-LinearAlgebraBasics.pdf | HW01.pdf |
06 | Sept 20, 2022 | Bias-Variance Decomposition II | ||
07 | Sept 22, 2022 | Elementary Probability Theory | LecA3-ElementaryProbabilityTheory.pdf | |
08 | Sept 27, 2022 | Bayesian Linear Regression | Lec03-BayesianLinearRegression.pdf | |
09 | Sept 29, 2022 | Convex Sets I | Lec04-ConvexSets.pdf | |
10 | Oct 11, 2022 | Convex Sets II | | HW02.pdf |
11 | Oct 13, 2022 | Separation Theorems I | Lec05-SeparationTheorems.pdf | |
12 | Oct 18, 2022 | Separation Theorems II | ||
13 | Oct 20, 2022 | Convex Functions I | Lec06-ConvexFunctions.pdf | |
14 | Oct 25, 2022 | Convex Functions II | | HW03.pdf |
15 | Oct 27, 2022 | Subdifferential I | Lec07-Subdifferential.pdf | |
16 | Nov 1, 2022 | Subdifferential II | ||
17 | Nov 3, 2022 | Convex Optimization Problems | Lec08-ConvexOptimizationProblems.pdf | HW04.pdf |
18 | Nov 8, 2022 | Decision Tree | Lec09-DecisionTree.pdf | |
19 | Nov 10, 2022 | Naive Bayes Classifier | Lec10-NaiveBayesClassifier.pdf | |
20 | Nov 15, 2022 | Logistic Regression I | Lec11-LogisticRegression.pdf | |
21 | Nov 17, 2022 | Mid-term Exam | ||
22 | Nov 22, 2022 | Logistic Regression II | ||
23 | Nov 24, 2022 | SVM I | Lec12-SVM1.pdf | HW05.pdf |
24 | Nov 29, 2022 | SVM I | ||
25 | Dec 1, 2022 | SVM II | Lec13-SVM2.pdf | |
26 | Dec 6, 2022 | Neural Networks | Lec14-NeuralNetworks.pdf | |
27 | Dec 8, 2022 | Convolutional Neural Networks | Lec15-ConvolutionalNeuralNetworks.pdf | HW06.pdf |
28 | Dec 13, 2022 | Principal Component Analysis I | Lec16-PrincipalComponentAnalysis.pdf | |
29 | Dec 15, 2022 | Principal Component Analysis II | ||
30 | Dec 20, 2022 | Reinforcement Learning I | Lec17-RL_DeterministicEnvironment.pdf | |
31 | Dec 22, 2022 | Reinforcement Learning II | Lec18-RL_StochasticEnvironment.pdf | HW07.pdf |
Project
Description
- Recommending clothes of suitable sizes to customers, based on information about the clothes and the users, is very important for e-commerce platforms. In this project, you are expected to implement a classifier to predict customers’ fit feedback (“Large”, “True to Size”, or “Small”) based on a dataset collected from RentTheRunWay.
Dataset
-
You can download the training data (in JSON format) from here (Updated November 24, 2022). The dataset contains 87766 samples. Each sample is a dictionary corresponding to a rental or purchase record, as shown in the following example.
Key | Value |
---|---|
fit | True to Size |
item_name | Lilly Pulitzer Lettie Eyelet Top |
size | XS |
price | $158 |
user_name | Alexandra |
rented_for | Everyday |
usually_wear | 4 |
age | 58 |
height | 5’ 2'' |
weight | 116LBS |
body_type | PEAR |
bust_size | 32B |
review_summary | Unexpected wow |
review | Thought this would be a whatever shirt. Really cute!! Color popped and fit well. Definitely rent. |
rating | 5 |
-
Notice: For some training samples, some features and even labels are missing. Before building a model, you may want to clean the data or think about how to use the data whose features or labels are missing.
-
Your model will be tested on a dataset (also in JSON format) of 9751 samples. In the testing dataset, the labels “Small”, “True to Size”, and “Large” are set to 1, 2, and 3, respectively. Each testing sample is a dictionary as follows.
Key | Value |
---|---|
fit | 1 |
item_name | Amanda Uprichard Tweed Aldridge Blazer |
size | S |
price | $319 |
rented_for | Work |
usually_wear | 4 |
age | 40 |
height | 5’ 3'' |
weight | 130LBS |
body_type | STRAIGHT & NARROW |
bust_size | 34B |
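The cleaning step mentioned in the notice above could start from something like the following sketch. It is only one possible approach, not part of the official starter code. It assumes the data file is a JSON array of dictionaries with the keys shown in the examples; the function names (`load_records`, `parse_height`, `parse_number`, `clean_record`) and the output feature names are hypothetical. Missing or unparseable values become `None` so that you can decide later whether to drop or impute them.

```python
import json
import re

# Label encoding stated in the project description for the testing dataset.
LABEL_TO_ID = {"Small": 1, "True to Size": 2, "Large": 3}


def load_records(path):
    """Load the dataset, assuming the file holds a JSON array of dictionaries."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def parse_height(text):
    """Convert a feet/inches height string to inches; None if it cannot be parsed."""
    if not text:
        return None
    numbers = re.findall(r"\d+", str(text))
    if len(numbers) < 2:
        return None
    return int(numbers[0]) * 12 + int(numbers[1])


def parse_number(value):
    """Extract the first number from values such as '116LBS', '$158', or 58."""
    if value is None:
        return None
    match = re.search(r"\d+(?:\.\d+)?", str(value))
    return float(match.group()) if match else None


def clean_record(record):
    """Return (features, label_id); label_id is None when the label is missing."""
    features = {
        "height_in": parse_height(record.get("height")),
        "weight_lbs": parse_number(record.get("weight")),
        "price": parse_number(record.get("price")),
        "age": parse_number(record.get("age")),
        "usually_wear": parse_number(record.get("usually_wear")),
    }
    label = record.get("fit")
    # Training labels are strings; testing labels are already 1, 2, or 3.
    label_id = LABEL_TO_ID.get(label) if isinstance(label, str) else label
    return features, label_id
```

Records whose `label_id` is `None` cannot be used for supervised training directly, but they may still be useful, for example when estimating feature statistics for imputation.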
Requirements
-
Do NOT use any autograd tool or any optimization tool from machine learning packages. You are supposed to implement your algorithm from scratch. For example, if you want to use a neural network, you are expected to implement both the forward and backward passes. You can use the packages in the whitelist. TAs will update the whitelist if your requests are reasonable.
-
You can work as a team with no more than three members in total. Please list the percentage of each member’s contribution in your report, e.g., {San Zhang: 30%, Si Li: 35%, Wu Wang: 35%}.
-
We define a default base class called PB20000000 in PB20000000.py. You are supposed to implement your algorithm in the [your student ID] directory. We provide an example code here. For detailed requirements, please refer to the comments in our code.
-
You are supposed to send a package named [your student ID].zip, which contains the [your student ID] directory organized as follows, to ml2022fall_ustc@163.com.

    [your student ID]
    |- [your student ID].py
    |- [your student ID]-report.pdf (your report)
    |- ... (your code and model)

For teamwork, please use the team leader’s student ID in the package name and have the team leader submit the package.
-
Remember to save the trained model. You are supposed to send your trained model to the aforementioned e-mail address.
-
Please submit a detailed report. The report should include all the details of your project, e.g., the implementation, the experimental settings, and the analysis of your results.
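To illustrate what "implementing both forward and backward passes" from the first requirement can look like, here is a minimal sketch of softmax regression with hand-derived gradients and plain gradient descent. It assumes NumPy is on the whitelist (please confirm this yourself), uses class indices 0 to K-1 internally (so the labels 1, 2, 3 would be shifted by one), and all function names are hypothetical. It does not follow the interface of the provided base class; refer to the comments in the example code for that.

```python
import numpy as np


def softmax(logits):
    """Row-wise softmax, subtracting the row maximum for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


def forward(X, W, b):
    """Forward pass: class probabilities for each row of X."""
    return softmax(X @ W + b)


def loss_and_grads(X, y, W, b):
    """Mean cross-entropy loss and its hand-derived gradients w.r.t. W and b.

    y holds integer class indices in {0, ..., K-1}.
    """
    n = X.shape[0]
    probs = forward(X, W, b)
    loss = -np.log(probs[np.arange(n), y] + 1e-12).mean()
    # Backward pass: d(loss)/d(logits) = (probs - one_hot(y)) / n
    dlogits = probs.copy()
    dlogits[np.arange(n), y] -= 1.0
    dlogits /= n
    grad_W = X.T @ dlogits
    grad_b = dlogits.sum(axis=0)
    return loss, grad_W, grad_b


def train(X, y, num_classes, lr=0.1, epochs=200):
    """Full-batch gradient descent written by hand, with no optimizer library."""
    W = np.zeros((X.shape[1], num_classes))
    b = np.zeros(num_classes)
    for _ in range(epochs):
        _, grad_W, grad_b = loss_and_grads(X, y, W, b)
        W -= lr * grad_W
        b -= lr * grad_b
    return W, b
```

A finite-difference check (perturb one entry of `W` and compare the change in loss with the corresponding entry of `grad_W`) is a cheap way to validate a hand-written backward pass before training a larger model.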
Grading
- The full points = min(Base score (up to 20pts) + Bonus (up to 5pts), 20pts).
- The base score is determined by the Macro F1-score evaluated by TAs’ code.
- The bonus involves three aspects as follows.
  - Your insights on the data and task.
  - The novelty of your approach, which should be highlighted in your report.
  - The readability of your code and report. Please make them easy to follow.
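For reference, the Macro F1-score used for the base score is the unweighted mean of the per-class F1 scores. The sketch below shows one common way to compute it so that you can monitor it on a validation split. The function name `macro_f1` is hypothetical, and the TAs' evaluation code may handle edge cases (such as a class that never appears) differently.

```python
def macro_f1(y_true, y_pred, labels=(1, 2, 3)):
    """Unweighted mean of per-class F1 scores over the given labels."""
    scores = []
    for label in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        denom = 2 * tp + fp + fn
        # Assumed convention: a class with no true or predicted samples scores 0.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)
```

Because every class contributes equally, a model that always predicts the most frequent label scores poorly on this metric even if its accuracy looks acceptable.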
System Requirements
- We will evaluate your model on a GeForce RTX 2080 Ti (about 10 GB of memory) running Ubuntu 18.04. Please limit the size of your model to avoid out-of-memory (OOM) errors.
Hint
-
These papers may be helpful to you.
[1] Yoon Kim. 2014. Convolutional Neural Networks for Sentence Classification. In Proceedings of EMNLP 2014.
[2] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of NAACL 2019.
Due Day
-
Team leaders should inform the TAs about their team members before 23:59, December 2, 2022.
-
Please submit your report, code, and trained model before 23:59, January 11, 2023.
-
No late submissions will be accepted.