EE3001 - Machine Learning (Fall 2022)

Basic Information

Lectures

All course materials will be shared via this page.

Index | Date | Topic | Lecture Notes | Homework
00 | Aug 30, 2022 | Introduction | Lec00-Introduction.pdf |
01 | Sept 01, 2022 | Basics of Analysis | LecA1-AnalysisBasics.pdf |
02 | Sept 06, 2022 | Linear Regression I | Lec01-LinearRegression.pdf, Lec01-LinearRegression_slides.pdf |
03 | Sept 08, 2022 | Linear Regression II | |
04 | Sept 13, 2022 | Bias-Variance Decomposition I | Lec02-BiasVarianceDecomposition.pdf |
05 | Sept 15, 2022 | Basics of Linear Algebra | LecA2-LinearAlgebraBasics.pdf | HW01.pdf
06 | Sept 20, 2022 | Bias-Variance Decomposition II | |
07 | Sept 22, 2022 | Elementary Probability Theory | LecA3-ElementaryProbabilityTheory.pdf |
08 | Sept 27, 2022 | Bayesian Linear Regression | Lec03-BayesianLinearRegression.pdf |
09 | Sept 29, 2022 | Convex Sets I | Lec04-ConvexSets.pdf |
10 | Oct 11, 2022 | Convex Sets II | | HW02.pdf
11 | Oct 13, 2022 | Separation Theorems I | Lec05-SeparationTheorems.pdf |
12 | Oct 18, 2022 | Separation Theorems II | |
13 | Oct 20, 2022 | Convex Functions I | Lec06-ConvexFunctions.pdf |
14 | Oct 25, 2022 | Convex Functions II | | HW03.pdf
15 | Oct 27, 2022 | Subdifferential I | Lec07-Subdifferential.pdf |
16 | Nov 1, 2022 | Subdifferential II | |
17 | Nov 3, 2022 | Convex Optimization Problems | Lec08-ConvexOptimizationProblems.pdf | HW04.pdf
18 | Nov 8, 2022 | Decision Tree | Lec09-DecisionTree.pdf |
19 | Nov 10, 2022 | Naive Bayes Classifier | Lec10-NaiveBayesClassifier.pdf |
20 | Nov 15, 2022 | Logistic Regression I | Lec11-LogisticRegression.pdf |
21 | Nov 17, 2022 | Mid-term Exam | |
22 | Nov 22, 2022 | Logistic Regression II | |
23 | Nov 24, 2022 | SVM I | Lec12-SVM1.pdf | HW05.pdf
24 | Nov 29, 2022 | SVM I | |
25 | Dec 1, 2022 | SVM II | Lec13-SVM2.pdf |
26 | Dec 6, 2022 | Neural Networks | Lec14-NeuralNetworks.pdf |
27 | Dec 8, 2022 | Convolutional Neural Networks | Lec15-ConvolutionalNeuralNetworks.pdf | HW06.pdf
28 | Dec 13, 2022 | Principal Component Analysis I | Lec16-PrincipalComponentAnalysis.pdf |
29 | Dec 15, 2022 | Principal Component Analysis II | |
30 | Dec 20, 2022 | Reinforcement Learning I | Lec17-RL_DeterministicEnvironment.pdf |
31 | Dec 22, 2022 | Reinforcement Learning II | Lec18-RL_StochasticEnvironment.pdf | HW07.pdf

Project

Description

  • Recommending clothes of suitable sizes to customers, based on information about the clothes and the users, is very important for e-commerce platforms. In this project, you are expected to implement a classifier that predicts customers’ fit feedback (“Large”, “True to Size”, or “Small”) using a dataset collected from RentTheRunWay.

Dataset

  • You can download the training data (in JSON format) from here (Updated November 24, 2022). The dataset contains 87766 samples. Each sample is a dictionary corresponding to a rental or purchase record, as shown in the following example.

    Key Value
    fit True to Size
    item_name Lilly Pulitzer Lettie Eyelet Top
    size XS
    price $158
    user_name Alexandra
    rented_for Everyday
    usually_wear 4
    age 58
    height 5’ 2''
    weight 116LBS
    body_type PEAR
    bust_size 32B
    review_summary Unexpected wow
    review Thought this would be a whatever shirt. Really cute!! Color popped and fit well. Definitely rent.
    rating 5
  • Note: Some training samples are missing features or even labels. Before building a model, you may want to clean the data or decide how to use the samples whose features or labels are missing.

  • Your model will be tested on a dataset (also in JSON format) of 9751 samples. In the testing dataset, the labels “Small”, “True to Size”, and “Large” are encoded as 1, 2, and 3, respectively. Each testing sample is a dictionary as follows.

    Key Value
    fit 1
    item_name Amanda Uprichard Tweed Aldridge Blazer
    size S
    price $319
    rented_for Work
    usually_wear 4
    age 40
    height 5’ 3''
    weight 130LBS
    body_type STRAIGHT & NARROW
    bust_size 34B
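
  The following is a minimal cleaning sketch for the records shown above, not the official preprocessing. It assumes that values such as price, height, and weight arrive as strings in the formats of the two examples; the actual JSON may store some of them differently. The names FIT_TO_LABEL, parse_number, parse_height_inches, and clean_record are hypothetical helpers introduced only for illustration. Missing values are returned as None so that you can decide later whether to drop or impute them.

```python
import re

# Label encoding stated for the testing dataset: Small -> 1, True to Size -> 2, Large -> 3.
FIT_TO_LABEL = {"Small": 1, "True to Size": 2, "Large": 3}


def parse_number(value):
    """Extract the first number from values such as '$158', '116LBS', or 58."""
    if value is None:
        return None
    match = re.search(r"\d+(?:\.\d+)?", str(value))
    return float(match.group()) if match else None


def parse_height_inches(value):
    """Convert a feet-and-inches string (e.g. 5 feet 2 inches) to total inches."""
    if value is None:
        return None
    numbers = re.findall(r"\d+", str(value))
    if not numbers:
        return None
    feet = int(numbers[0])
    inches = int(numbers[1]) if len(numbers) > 1 else 0
    return feet * 12 + inches


def clean_record(record):
    """Map one raw record (a dict) to numeric fields; missing values become None."""
    fit = record.get("fit")
    # Training labels are strings; testing labels are already 1, 2, or 3.
    label = FIT_TO_LABEL.get(fit, fit if fit in (1, 2, 3) else None)
    return {
        "label": label,
        "price": parse_number(record.get("price")),
        "age": parse_number(record.get("age")),
        "weight_lbs": parse_number(record.get("weight")),
        "height_in": parse_height_inches(record.get("height")),
    }
```

  Categorical fields such as size, rented_for, and body_type are left out here; how to encode them is a modeling decision.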

Requirements

  • Do NOT use any autograd or optimization tools from machine learning packages. You are expected to implement your algorithm from scratch. For example, if you use a neural network, you must implement both the forward and backward passes yourself. You may use the packages in the Whitelist. The TAs will update the Whitelist if your request is reasonable.

  • You can work as a team with no more than three members in total. Please list the percentage of each member’s contribution in your report, e.g., {San Zhang: 30%, Si Li: 35%, Wu Wang: 35%}.

  • We define a default base class called PB20000000 in PB20000000.py. Implement your algorithm in the [your student ID] directory. We provide example code here. For detailed requirements, please refer to the comments in our code.

  • Send a package named [your student ID].zip, which contains the [your student ID] directory organized as follows, to ml2022fall_ustc@163.com.

    [your student ID]
      |- [your student ID].py 
      |- [your student ID]-report.pdf (your report)
      |- ... (your code and model)
    

    For team projects, use the team leader’s student ID in the package name, and have the team leader submit the package.

  • Remember to save the trained model. You are supposed to send your trained model to the aforementioned e-mail address.

  • Please submit a detailed report. The report should include all the details of your project, e.g., the implementation, the experimental settings, and the analysis of your results.
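
  To illustrate what implementing "both forward and backward passes" from scratch can look like, here is a small sketch of softmax (multinomial logistic) regression for a single sample, written with the Python standard library only. It is one possible baseline under that requirement, not the expected solution. The function names and the plain-list representation of W and b are assumptions made for illustration; if NumPy is on the Whitelist, a vectorized version would be more practical.

```python
import math


def forward(W, b, x):
    """Forward pass: class probabilities softmax(W x + b) for one sample."""
    logits = [sum(w * xi for w, xi in zip(row, x)) + bk for row, bk in zip(W, b)]
    shift = max(logits)  # subtract the max logit for numerical stability
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def loss(W, b, x, y):
    """Cross-entropy loss for one sample whose true class index is y."""
    return -math.log(forward(W, b, x)[y])


def backward(W, b, x, y):
    """Backward pass: hand-derived gradients of the loss w.r.t. W and b."""
    probs = forward(W, b, x)
    # d(loss)/d(logit_k) = p_k - 1[k == y]
    delta = [p - (1.0 if k == y else 0.0) for k, p in enumerate(probs)]
    grad_W = [[d * xi for xi in x] for d in delta]
    grad_b = delta
    return grad_W, grad_b


def sgd_step(W, b, x, y, lr=0.1):
    """One plain gradient-descent update; returns new parameters."""
    grad_W, grad_b = backward(W, b, x, y)
    new_W = [[w - lr * g for w, g in zip(rw, rg)] for rw, rg in zip(W, grad_W)]
    new_b = [bk - lr * g for bk, g in zip(b, grad_b)]
    return new_W, new_b
```

  A useful habit when writing the backward pass by hand is to compare each analytic gradient entry against a finite-difference estimate of the loss before training on the full dataset.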

Grading

  • Final score = min(base score (up to 20 pts) + bonus (up to 5 pts), 20 pts).
  • The base score is determined by the Macro F1-score evaluated by TAs’ code.
  • The bonus involves three aspects as follows.
    • Your insights on the data and task.
    • The novelty of your approach, which should be highlighted in your report.
    • The readability of your code and report. Please make them easy to follow.
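
  Since the base score depends on the Macro F1-score, it may help to track the same metric during validation. The sketch below computes the standard macro-averaged F1 over the three encoded labels (1, 2, 3). It is an illustration only: the TAs’ evaluation code is not shown here and may differ in details, for example in how a class with no true or predicted samples is handled (this sketch assigns it a score of 0).

```python
def macro_f1(y_true, y_pred, labels=(1, 2, 3)):
    """Unweighted mean of the per-class F1 scores."""
    scores = []
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        # Assumed convention: a class that never appears scores 0.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)
```

  Because every class contributes equally to the average, a model that always predicts the most frequent label scores poorly on this metric even if its accuracy looks high.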

System Requirements

  • We will evaluate your model on a GeForce RTX 2080 Ti (about 10 GB of memory) running Ubuntu 18.04. Please limit the size of your model to avoid out-of-memory (OOM) errors.

Hint

  • These papers may be helpful to you.

    [1] Yoon Kim. 2014. Convolutional Neural Networks for Sentence Classification. In Proceedings of EMNLP 2014.

    [2] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of NAACL 2019.

Due Day

  • Team leaders should inform the TAs of their team members before 23:59, December 2, 2022.

  • Please submit your report, code, and trained model before 23:59, January 11, 2023.

  • No late submissions will be accepted.

