210709 - Introduction to Machine Learning (Spring 2021)

Basic Information

Instructor: Jie Wang
Email: jiewangx@ustc.edu.cn
Time and Location: F 3:55 PM - 6:20 PM (3C203)
TAs:

Lectures

All course materials will be shared via this page.

Index	Date	Topic	Lecture Notes	Homework
00	Mar 12, 2021	Introduction	Lec00.pdf, 强化学习和表示学习简介.pdf
01	Mar 19, 2021	Linear Regression	Lec01.pdf, Lec01slides.pdf	HW01.pdf
02	Mar 26, 2021	Elements of Convex Programming I	Lec02.pdf
03	Apr 02, 2021	Elements of Convex Programming II	Lec03.pdf	HW02.pdf
04	Apr 09, 2021	Elements of Convex Programming III	Lec04.pdf	HW03.pdf
05	Apr 16, 2021	Decision Tree + Naive Bayes Classifier	Lec05_1.pdf, Lec05_2.pdf
06	Apr 23, 2021	Logistic Regression + SGD	Lec06.pdf	HW04.pdf
07	Apr 30, 2021	SVM + Lagrangian Duality I	Lec07.pdf
08	May 07, 2021	SVM + Lagrangian Duality II	Lec08.pdf	HW05.pdf
09	May 14, 2021	SVM + Lagrangian Duality III	Lec09.pdf
10	May 21, 2021	Neural Networks	Lec10.pdf, Lec11.pdf
11	May 28, 2021	Principal Component Analysis	Lec12.pdf	HW06.pdf
12	Jun 04, 2021	Elementary Reinforcement Learning	Lec13.pdf, Lec14.pdf	HW07.pdf
13	Jun 11, 2021	Discussion on Homeworks
14	Jun 18, 2021	Discussion on Homeworks

Project

Description

In this project, you are expected to implement a classifier that predicts the star rating (from 1 to 5) of a product from Amazon customer comments.


Dataset

You can download the training data from here (Updated April 18, 2021). The data consists of 107 CSV files with roughly 100,000 samples. Each CSV file corresponds to one commodity. The following table shows an example comment.

Column name	Example
CommentsTitle	A good purchase
CommentsStars (label)	5
CommentsAuthor	San Zhang
CommentsDate	Reviewed in China January 19, 2020
CommentsContent (main feature)	I love this bag so much! I will go traveling and I am taking this bag as my carry-on because of the durability, versatility and the ability to carry so much!
PurchasemModel_Size (other attributes of a commodity)	Color: Black Red

Notice: You may want to clean the data before building a model.
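As an illustration of the kind of cleaning meant here, the following is a minimal sketch using only the Python standard library. The column names come from the table above; everything else is an assumption: the helper names (clean_text, parse_stars, load_samples) are hypothetical, and the raw format of the star field and the presence of HTML residue or duplicate rows have not been checked against the actual files.

```python
import csv
import html
import io
import re

_TAG_RE = re.compile(r"<[^>]+>")
_SPACE_RE = re.compile(r"\s+")
_STAR_RE = re.compile(r"[1-5]")


def clean_text(text):
    """Unescape HTML entities, drop tags, collapse whitespace, lowercase."""
    text = html.unescape(text or "")
    text = _TAG_RE.sub(" ", text)
    return _SPACE_RE.sub(" ", text).strip().lower()


def parse_stars(raw):
    """Return the first digit 1-5 found in the field, or None if absent.

    Assumption: the label may be stored as "5" or as a longer string
    such as "5.0 out of 5 stars"; adjust to the real data.
    """
    match = _STAR_RE.search(raw or "")
    return int(match.group()) if match else None


def load_samples(csv_file):
    """Yield (text, stars) pairs, skipping unusable and duplicate rows."""
    seen = set()
    for row in csv.DictReader(csv_file):
        stars = parse_stars(row.get("CommentsStars"))
        text = clean_text(row.get("CommentsContent"))
        if stars is None or not text or (text, stars) in seen:
            continue
        seen.add((text, stars))
        yield text, stars


# Tiny in-memory example standing in for one of the CSV files.
demo = io.StringIO(
    "CommentsTitle,CommentsStars,CommentsContent\n"
    "A good purchase,5,I love   this bag &amp; so much!\n"
    "A good purchase,5,I love   this bag &amp; so much!\n"
    "No label,,Missing stars\n"
)
samples = list(load_samples(demo))
```

For a real file, pass an open file object instead of the in-memory demo, e.g. open(path, newline="", encoding="utf-8"); the encoding is also an assumption.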

Requirements

Do NOT use any autograd tools or any optimization tools from machine learning packages. You are supposed to implement your algorithm from scratch. For example, if you want to use a neural network, you are expected to implement both the forward and backward passes yourself. You can use the libraries in the Whitelist. TAs will update the Whitelist if your requests are reasonable.
You can work as a team with no more than three members in total. Please list the percentage of each member’s contribution in your report, e.g., {San Zhang: 30%, Si Li: 35%, Wu Wang: 35%}.
We define a default base class called PB18000000 in PB18000000.py. You are supposed to implement your algorithm in the [your student ID] directory. We provide example code here. For detailed requirements, please refer to the comments in our code.
You are supposed to send a package named [your student ID].zip to ml_homework@163.com, which contains the [your student ID] directory organized as follows. For a team, please use the team leader’s student ID in the package name and have the team leader submit the package.
```
[your student ID]
  |- [your student ID].py 
  |- [your student ID]-report.pdf (your report)
  |- ... (your code and model)
```
Remember to save the trained model. You are supposed to send your trained model to the aforementioned e-mail address.
Please submit a detailed report. The report should include all the details of your project, e.g., the implementation, the experimental settings, and the analysis of your results.
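To make the "from scratch" requirement concrete, here is a minimal sketch of a hand-written forward and backward pass for a softmax (multinomial logistic) regression classifier, using only the Python standard library. The model choice, function names, toy data, and learning rate are all illustrative assumptions, not the course's reference solution. The finite-difference comparison at the end is one way to check hand-derived gradients without any autograd tool.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of class scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def forward(W, b, x):
    """Forward pass: class probabilities for one feature vector x."""
    scores = [sum(w_kj * x_j for w_kj, x_j in zip(W[k], x)) + b[k]
              for k in range(len(W))]
    return softmax(scores)


def loss(W, b, x, y):
    """Cross-entropy loss for true class index y."""
    return -math.log(forward(W, b, x)[y])


def backward(W, b, x, y):
    """Backward pass: hand-derived gradients of the cross-entropy loss.

    For softmax + cross-entropy, dL/dscore_k = p_k - 1[k == y].
    """
    probs = forward(W, b, x)
    delta = [p - (1.0 if k == y else 0.0) for k, p in enumerate(probs)]
    grad_W = [[d * x_j for x_j in x] for d in delta]
    grad_b = delta
    return grad_W, grad_b


def sgd_step(W, b, x, y, lr=0.1):
    """Update W and b in place with one stochastic gradient step."""
    grad_W, grad_b = backward(W, b, x, y)
    for k in range(len(W)):
        b[k] -= lr * grad_b[k]
        for j in range(len(x)):
            W[k][j] -= lr * grad_W[k][j]


# Toy setup: 5 classes (stars 1-5 mapped to indices 0-4), 3 features.
random.seed(0)
W = [[random.uniform(-0.1, 0.1) for _ in range(3)] for _ in range(5)]
b = [0.0] * 5
x, y = [1.0, 0.5, -2.0], 4

# Compare one analytic gradient entry against a central finite difference.
grad_W, _ = backward(W, b, x, y)
eps = 1e-6
W[2][1] += eps
loss_plus = loss(W, b, x, y)
W[2][1] -= 2 * eps
loss_minus = loss(W, b, x, y)
W[2][1] += eps  # restore the original weight
numeric = (loss_plus - loss_minus) / (2 * eps)

loss_before = loss(W, b, x, y)
sgd_step(W, b, x, y)
loss_after = loss(W, b, x, y)
```

If numeric and grad_W[2][1] disagree beyond a small tolerance, the backward pass likely contains an error. The same check applies to deeper models, one parameter at a time.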

Grading

The total score = min(base score (up to 20 pts) + bonus (up to 5 pts), 20 pts).
The base score is determined by the macro F1-score computed by the TAs' evaluation code. The test data consists of two parts. The first part is randomly sampled from the original data. The second part consists of comments on other commodities.
The bonus also consists of two parts. The first part is determined by the novelty of your approach, which should be highlighted in your report. The second part is related to the readability of your code and report. Please make them easy to read.
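For reference, the macro F1-score is commonly defined as the unweighted mean of the per-class F1 scores. The sketch below follows that standard definition so you can estimate your score on a held-out split. It is not the TAs' evaluation code; the function name, the fixed label set 1 to 5, and the handling of classes with no samples are assumptions.

```python
def macro_f1(y_true, y_pred, labels=(1, 2, 3, 4, 5)):
    """Unweighted mean of per-class F1 scores over the given labels."""
    f1_scores = []
    for label in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        denom = 2 * tp + fp + fn
        # Assumption: a class with no true or predicted samples scores 0.
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


# Hypothetical labels and predictions for six comments.
y_true = [5, 5, 4, 1, 3, 2]
y_pred = [5, 4, 4, 1, 3, 3]
score = macro_f1(y_true, y_pred)
```

Because every class contributes equally to the mean, rare star ratings matter as much as frequent ones, which is worth keeping in mind if the label distribution is skewed.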

System Requirements

We will evaluate your model on a GeForce RTX 2080 Ti (about 10 GB of memory) running Ubuntu 18.04. Please limit the size of your model to avoid out-of-memory (OOM) errors.

Hint

The following papers may be helpful to you.

[1] Yoon Kim. 2014. Convolutional neural networks for sentence classification. In Proceedings of EMNLP 2014.

[2] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of NAACL 2019.

Due Dates

Team leaders should inform the TAs of their team members before 23:59, April 16, 2021.
Please submit your report, code, and trained model before 23:59, July 02, 2021.
No late submissions will be accepted.
