210709 - Introduction to Machine Learning (Spring 2021)

Basic Information

Lectures

All course materials will be shared via this page.

Index  Date          Topic                                   Lecture Notes               Homework
00     Mar 12, 2021  Introduction                            Lec00.pdf, 强化学习和表示学习简介.pdf
01     Mar 19, 2021  Linear Regression                       Lec01.pdf, Lec01slides.pdf  HW01.pdf
02     Mar 26, 2021  Elements of Convex Programming I        Lec02.pdf
03     Apr 02, 2021  Elements of Convex Programming II       Lec03.pdf                   HW02.pdf
04     Apr 09, 2021  Elements of Convex Programming III      Lec04.pdf                   HW03.pdf
05     Apr 16, 2021  Decision Tree + Naive Bayes Classifier  Lec05_1.pdf, Lec05_2.pdf
06     Apr 23, 2021  Logistic Regression + SGD               Lec06.pdf                   HW04.pdf
07     Apr 30, 2021  SVM + Lagrangian Duality I              Lec07.pdf
08     May 07, 2021  SVM + Lagrangian Duality II             Lec08.pdf                   HW05.pdf
09     May 14, 2021  SVM + Lagrangian Duality III            Lec09.pdf
10     May 21, 2021  Neural Networks                         Lec10.pdf, Lec11.pdf
11     May 28, 2021  Principal Component Analysis            Lec12.pdf                   HW06.pdf
12     Jun 04, 2021  Elementary Reinforcement Learning       Lec13.pdf, Lec14.pdf        HW07.pdf
13     Jun 11, 2021  Discussion on Homeworks
14     Jun 18, 2021  Discussion on Homeworks

Project

Description

In this project, you are expected to implement a classifier that predicts the star rating (from 1 to 5) of an Amazon customer comment.

Dataset

You can download the training data from here (Updated April 18, 2021). The data consists of 107 CSV files with roughly 100,000 samples. Each CSV file corresponds to one commodity. The following table shows an example comment.

Column name                                             Example
CommentsTitle                                           A good purchase
CommentsStars (label)                                   5
CommentsAuthor                                          San Zhang
CommentsDate                                            Reviewed in China January 19, 2020
CommentsContent (main feature)                          I love this bag so much! I will go traveling and I am taking this bag as my carry-on because of the durability, versatility and the ability to carry so much!
PurchasemModel_Size (other attributes of a commodity)   Color: Black Red

Notice: You may want to clean the data before building a model.
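
As one possible starting point for the cleaning step, the sketch below reads a CSV file with Python's standard csv module and keeps only usable (text, label) pairs. The column names come from the table above. The function names and the cleaning rules (lowercasing, collapsing whitespace, dropping rows with empty text or a rating outside 1 to 5) are illustrative assumptions, and the raw files may store the star field in a different form than a bare number.

```python
import csv
import io
import re


def parse_stars(raw):
    """Return the rating as an int in 1..5, or None if it cannot be parsed.

    Assumption: the CommentsStars field holds a bare number such as "5".
    """
    try:
        value = float((raw or "").strip())
        stars = int(value)
    except (ValueError, OverflowError):
        return None
    return stars if stars == value and 1 <= stars <= 5 else None


def clean_text(text):
    """Lowercase the text and collapse runs of whitespace into single spaces."""
    return re.sub(r"\s+", " ", text or "").strip().lower()


def load_comments(csv_file):
    """Yield (text, stars) pairs from an open CSV file, skipping unusable rows."""
    for row in csv.DictReader(csv_file):
        stars = parse_stars(row.get("CommentsStars"))
        text = clean_text(row.get("CommentsContent"))
        if stars is None or not text:
            continue  # drop rows with a missing label or empty comment
        yield text, stars


# Tiny in-memory stand-in for one of the CSV files (made-up rows).
sample = io.StringIO(
    "CommentsTitle,CommentsStars,CommentsContent\n"
    "A good purchase,5,  I love this bag   so much! \n"
    "Empty,4,\n"
    "Bad label,five,Nice\n"
)
print(list(load_comments(sample)))
```

For a real file, the same generator can be used with open(path, newline="", encoding="utf-8"), where the encoding is also an assumption to check against the data.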

Requirements

  • Do NOT use any autograd tools or any optimization tools from machine learning packages. You are supposed to implement your algorithm from scratch. For example, if you want to use a neural network, you are expected to implement both the forward and backward passes. You may use the libraries in the Whitelist. The TAs will update the Whitelist if your request is reasonable.
  • You can work as a team with no more than three members in total. Please list the percentage of each member’s contribution in your report, e.g., {San Zhang: 30%, Si Li: 35%, Wu Wang: 35%}.
  • We define a default base class called PB18000000 in PB18000000.py. You are supposed to implement your algorithm in the [your student ID] directory. We provide example code here. For detailed requirements, please refer to the comments in our code.
  • You are supposed to send a package named [your student ID].zip to ml_homework@163.com, which contains the [your student ID] directory organized as follows. For a team, please use the team leader's student ID in the package name and have the team leader submit the package.
    [your student ID]
      |- [your student ID].py 
      |- [your student ID]-report.pdf (your report)
      |- ... (your code and model)
    
  • Remember to save the trained model. You are supposed to send your trained model to the aforementioned e-mail address.
  • Please submit a detailed report. The report should include all the details of your project, e.g., the implementation, the experimental settings and the analysis of your results.
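
To illustrate what implementing both the forward and backward passes by hand can look like, here is a minimal sketch of a softmax (multinomial logistic regression) classifier in pure Python, with a finite-difference check of the hand-derived gradient. It is not the interface of the provided base class: the function names, the toy features and the learning rate are all assumptions made for illustration.

```python
import math
import random


def forward(W, b, x):
    """Return class probabilities softmax(W x + b) for one feature vector x."""
    logits = [sum(w_j * x_j for w_j, x_j in zip(row, x)) + b_k
              for row, b_k in zip(W, b)]
    top = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def loss(W, b, x, y):
    """Cross-entropy loss for one example with true class index y."""
    return -math.log(forward(W, b, x)[y])


def backward(W, b, x, y):
    """Return (dW, db), the hand-derived gradients of the loss."""
    probs = forward(W, b, x)
    # d(loss)/d(logit_k) = p_k - 1[k == y]
    delta = [p - (1.0 if k == y else 0.0) for k, p in enumerate(probs)]
    dW = [[d * x_j for x_j in x] for d in delta]
    return dW, delta


def sgd_step(W, b, x, y, lr=0.1):
    """Update W and b in place with one stochastic gradient step."""
    dW, db = backward(W, b, x, y)
    for k in range(len(W)):
        b[k] -= lr * db[k]
        for j in range(len(x)):
            W[k][j] -= lr * dW[k][j]


def numeric_grad(W, b, x, y, k, j, eps=1e-6):
    """Central finite-difference estimate of d(loss)/dW[k][j]."""
    W[k][j] += eps
    up = loss(W, b, x, y)
    W[k][j] -= 2 * eps
    down = loss(W, b, x, y)
    W[k][j] += eps  # restore the original weight
    return (up - down) / (2 * eps)


random.seed(0)
num_classes, num_features = 5, 4  # 5 star ratings, 4 toy features
W = [[random.uniform(-0.1, 0.1) for _ in range(num_features)]
     for _ in range(num_classes)]
b = [0.0] * num_classes
x, y = [1.0, 0.0, 2.0, 1.0], 4  # toy example; class index 4 stands for 5 stars

# Gradient check: the analytic and numeric gradients should agree closely.
dW, _ = backward(W, b, x, y)
print(abs(dW[4][2] - numeric_grad(W, b, x, y, 4, 2)) < 1e-6)

# A few SGD steps on the single toy example should lower its loss.
before = loss(W, b, x, y)
for _ in range(50):
    sgd_step(W, b, x, y)
print(loss(W, b, x, y) < before)
```

A gradient check of this kind is a cheap way to catch mistakes in a hand-written backward pass before training on the full data.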

Grading

  • Total points = min(Base score (20 pts) + Bonus (5 pts), 20 pts).
  • The base score is determined by the Macro F1-score evaluated by the TAs' code. The test data consists of two parts. The first part is randomly sampled from the original data. The second part consists of comments on other commodities.
  • The bonus also consists of two parts. The first part is determined by the novelty of your approach, which should be highlighted in your report. The second part is related to the readability of your code and report. Please make them easy to read.
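
For reference, the Macro F1-score is the unweighted mean of the per-class F1 scores, so rare star ratings count as much as common ones. The sketch below computes it in pure Python and also encodes the total-points formula above. The function names are hypothetical, and the handling of a class with no true or predicted samples (scored as 0 here) is an assumption; the TAs' evaluation code may differ in such details.

```python
def macro_f1(y_true, y_pred, labels=(1, 2, 3, 4, 5)):
    """Average the per-class F1 scores, giving every class equal weight."""
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # Assumption: a class with no true or predicted samples scores 0.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def total_points(base_score, bonus):
    """Total points = min(base score + bonus, 20)."""
    return min(base_score + bonus, 20)


# Made-up labels and predictions for six comments.
y_true = [5, 5, 4, 3, 1, 2]
y_pred = [5, 4, 4, 3, 1, 1]
print(round(macro_f1(y_true, y_pred), 4))
print(total_points(18, 5))
```

Because every class has equal weight, a model that only predicts the most frequent rating scores poorly under this metric even when its accuracy looks high.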

System Requirements

  • We will evaluate your model on a GeForce RTX 2080 Ti (about 10 GB of memory) running Ubuntu 18.04. Please limit the size of your model to avoid out-of-memory (OOM) errors.

Hint

  • The following papers may be helpful to you.

    [1] Yoon Kim. 2014. Convolutional neural networks for sentence classification. In Proceedings of EMNLP 2014.

    [2] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of NAACL 2019.

Due Day

  • Team leaders should inform the TAs about their team members before 23:59, April 16, 2021.
  • Please submit your report, code and trained model before 23:59, July 02, 2021.
  • No late submissions will be accepted.
