Basic Information
- Instructor: Jie Wang
- Email: jiewangx@ustc.edu.cn
- Time and Location: Friday, 3:55 PM - 6:20 PM (3C203)
- TAs:
Lectures
All course materials will be shared via this page.
Index | Date | Topic | Lecture Notes | Homework |
---|---|---|---|---|
00 | Mar 12, 2021 | Introduction | Lec00.pdf, 强化学习和表示学习简介.pdf | |
01 | Mar 19, 2021 | Linear Regression | Lec01.pdf, Lec01slides.pdf | HW01.pdf |
02 | Mar 26, 2021 | Elements of Convex Programming I | Lec02.pdf | |
03 | Apr 02, 2021 | Elements of Convex Programming II | Lec03.pdf | HW02.pdf |
04 | Apr 09, 2021 | Elements of Convex Programming III | Lec04.pdf | HW03.pdf |
05 | Apr 16, 2021 | Decision Tree + Naive Bayes Classifier | Lec05_1.pdf, Lec05_2.pdf | |
06 | Apr 23, 2021 | Logistic Regression + SGD | Lec06.pdf | HW04.pdf |
07 | Apr 30, 2021 | SVM + Lagrangian Duality I | Lec07.pdf | |
08 | May 07, 2021 | SVM + Lagrangian Duality II | Lec08.pdf | HW05.pdf |
09 | May 14, 2021 | SVM + Lagrangian Duality III | Lec09.pdf | |
10 | May 21, 2021 | Neural Networks | Lec10.pdf, Lec11.pdf | |
11 | May 28, 2021 | Principal Component Analysis | Lec12.pdf | HW06.pdf |
12 | Jun 04, 2021 | Elementary Reinforcement Learning | Lec13.pdf, Lec14.pdf | HW07.pdf |
13 | Jun 11, 2021 | Discussion on Homeworks | | |
14 | Jun 18, 2021 | Discussion on Homeworks | | |
Project
Description
In this project, you are expected to implement a classifier that predicts the star rating (from 1 to 5) of an Amazon customer comment.
Dataset
You can download the training data from here (Updated April 18, 2021). The data contains 107 CSV files with roughly 100,000 samples. Each CSV file corresponds to one commodity. The following table shows an example comment.
Column name | Example |
---|---|
CommentsTitle | A good purchase |
CommentsStars (label) | 5 |
CommentsAuthor | San Zhang |
CommentsDate | Reviewed in China January 19, 2020 |
CommentsContent (main feature) | I love this bag so much! I will go traveling and I am taking this bag as my carry-on because of the durability, versatility and the ability to carry so much! |
PurchasemModel_Size (other attributes of a commodity) | Color: Black Red |
Notice: You may want to clean the data before building a model.
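As an illustration of what cleaning might involve, here is a minimal sketch using only the Python standard library. It assumes the column names shown in the table above (`CommentsContent`, `CommentsStars`). The rules it applies (dropping empty comments, invalid labels, and exact duplicates, and accepting labels written as `5.0`) are assumptions for illustration, not requirements of the project. The helper names `clean_rows` and `load_csv_text` are hypothetical.

```python
import csv
import io

VALID_STARS = {"1", "2", "3", "4", "5"}


def clean_rows(rows):
    """Return (text, stars) pairs for rows with usable text and a 1..5 label."""
    cleaned = []
    seen = set()
    for row in rows:
        # Collapse runs of whitespace; missing cells are treated as empty.
        text = " ".join((row.get("CommentsContent") or "").split())
        stars = (row.get("CommentsStars") or "").strip()
        # Assumption: a label may be stored as "5" or "5.0".
        if stars.endswith(".0"):
            stars = stars[:-2]
        if not text or stars not in VALID_STARS:
            continue
        key = (text, stars)
        if key in seen:  # drop exact duplicates
            continue
        seen.add(key)
        cleaned.append((text, int(stars)))
    return cleaned


def load_csv_text(csv_text):
    """Parse CSV text with a header row and clean it."""
    return clean_rows(csv.DictReader(io.StringIO(csv_text)))


# Tiny in-memory sample standing in for one of the commodity files.
sample = (
    "CommentsTitle,CommentsStars,CommentsContent\n"
    "A good purchase,5,I love   this bag so much!\n"
    "Empty,4,\n"
    "Bad label,five,Nice\n"
    "A good purchase,5,I love this bag so much!\n"
)
cleaned = load_csv_text(sample)
print(cleaned)
```

For the real files, the same `clean_rows` function could be fed a `csv.DictReader` opened on each file. The actual data may need further steps that this sketch does not cover.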
Requirements
- Do NOT use any autograd tools or any optimization tools from machine learning packages. You are supposed to implement your algorithm from scratch. For example, if you want to use a neural network, you are expected to implement both the forward and backward processes. You can use the libraries in the WhiteList. TAs will update the WhiteList if your requests are reasonable.
- You can work as a team with no more than three members in total. Please list the percentage of each member’s contribution in your report, e.g., {San Zhang: 30%, Si Li: 35%, Wu Wang: 35%}.
- We define a default base class called PB18000000 in PB18000000.py. You are supposed to implement your algorithm in the [your student ID] directory. We provide an example code here. For detailed requirements, please refer to the comments in our code.
- You are supposed to send a package named [your student ID].zip to ml_homework@163.com. For teamwork, please use the team leader’s student ID in the package name and have the team leader submit the package. The package contains the [your student ID] directory, organized as follows.

      [your student ID]
      |- [your student ID].py
      |- [your student ID]-report.pdf (your report)
      |- ... (your code and model)
- Remember to save the trained model. You are supposed to send your trained model to the aforementioned e-mail address.
- Please submit a detailed report. The report should include all the details of your project, e.g., the implementation, the experimental settings, and the analysis of your results.
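To illustrate what implementing "both the forward and backward processes" can look like without autograd, here is a minimal sketch of softmax regression for a single sample, written with the Python standard library only. It is not the course's reference code and does not use the PB18000000 base class. The model choice and the function names (`forward`, `loss`, `backward`, `sgd_step`) are assumptions made for illustration. An array library may make this far more efficient, but check the WhiteList before relying on one.

```python
import math


def forward(W, b, x):
    """Forward pass: class probabilities softmax(W x + b) for one sample."""
    logits = [sum(w * v for w, v in zip(row, x)) + b_k
              for row, b_k in zip(W, b)]
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def loss(W, b, x, y):
    """Cross-entropy loss for true class index y."""
    return -math.log(forward(W, b, x)[y])


def backward(W, b, x, y):
    """Backward pass: hand-derived gradients of the loss w.r.t. W and b.

    For softmax with cross-entropy, dL/dlogit_k = p_k - 1[k == y].
    """
    probs = forward(W, b, x)
    delta = [p - (1.0 if k == y else 0.0) for k, p in enumerate(probs)]
    grad_W = [[d * v for v in x] for d in delta]
    grad_b = list(delta)
    return grad_W, grad_b


def sgd_step(W, b, x, y, lr=0.1):
    """One stochastic gradient descent update; returns new parameters."""
    grad_W, grad_b = backward(W, b, x, y)
    new_W = [[w - lr * g for w, g in zip(w_row, g_row)]
             for w_row, g_row in zip(W, grad_W)]
    new_b = [b_k - lr * g for b_k, g in zip(b, grad_b)]
    return new_W, new_b


# Toy example: 3 classes, 2 features (values chosen arbitrarily).
W = [[0.1, -0.2], [0.0, 0.3], [0.2, 0.1]]
b = [0.0, 0.0, 0.0]
x, y = [1.0, 2.0], 1
W_new, b_new = sgd_step(W, b, x, y)
print(loss(W, b, x, y), "->", loss(W_new, b_new, x, y))
```

A useful habit when writing backward passes by hand is to compare each analytic gradient against a finite-difference estimate of the loss before training.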
Grading
- Full points = min(Base score (20 pts) + Bonus (5 pts), 20 pts).
- The base score is determined by the Macro F1-score evaluated by the TAs' code. The test data consists of two parts. The first part is randomly sampled from the original data. The second part consists of comments on other commodities.
- The bonus also consists of two parts. The first part is determined by the novelty of your approach, which should be highlighted in your report. The second part is related to the readability of your code and report. Please make them easy to read.
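For reference, the Macro F1-score is the unweighted mean of the per-class F1-scores. The sketch below shows one way to compute it for the five star classes, together with the capping rule from the first bullet. It is not the TAs' evaluation code. The function names `macro_f1` and `final_score` are hypothetical, and assigning an F1 of 0 to a class with no true and no predicted samples is an assumption. How the Macro F1-score is mapped to the 20-point base score is not specified here, so the sketch does not model it.

```python
def macro_f1(y_true, y_pred, labels=(1, 2, 3, 4, 5)):
    """Unweighted mean of per-class F1 over the given labels."""
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        # F1 = 2*TP / (2*TP + FP + FN); assumed 0 when the class never occurs.
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def final_score(base, bonus):
    """Full points = min(base + bonus, 20)."""
    return min(base + bonus, 20)


# Toy example: the only mistake is predicting 4 for a true 5.
y_true = [1, 2, 3, 4, 5]
y_pred = [1, 2, 3, 4, 4]
print(macro_f1(y_true, y_pred))
print(final_score(18, 5))
```

Because every class counts equally, a model that ignores a rare star rating is penalized as heavily as one that ignores a common rating, which is worth keeping in mind if the labels turn out to be imbalanced.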
System Requirements
- We will evaluate your model on a GeForce RTX 2080 Ti (about 10 GB of memory) running Ubuntu 18.04. Please limit the size of your model to avoid out-of-memory (OOM) errors.
Hint
- The following papers may be helpful to you.
  [1] Yoon Kim. 2014. Convolutional neural networks for sentence classification. In Proceedings of EMNLP 2014.
  [2] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of NAACL 2019.
Due Day
- Team leaders should inform the TAs about their team members before 23:59, April 16, 2021.
- Please submit your report, code and trained model before 23:59, July 02, 2021.
- No late submissions will be accepted.