Basic Information
- Instructor: Jie Wang
- Email: jiewangx@ustc.edu.cn
- Time and Location: M, W 9:45 AM - 11:20 AM (3A311)
- TAs:
- Xize Liang (xizeliang@miralab.ai)
- Runxin Liu (runxinliu@miralab.ai)
- Zijie Geng (zijiegeng@miralab.ai)
- Shuling Yang (shulingyang@miralab.ai)
Lectures
All course materials will be shared via this page.
Index | Date | Topic | Lecture Notes | Homework |
---|---|---|---|---|
00 | Sept 6, 2021 | Introduction | Lec00-Introduction.pdf | |
01 | Sept 8, 2021 | Linear Regression I | ||
02 | Sept 13, 2021 | Linear Regression II | Lec01-LinearRegression.pdf, Lec01slides.pdf | HW01.pdf |
03 | Sept 15, 2021 | Bias-Variance Decomposition | Lec02-BiasVarianceDecomposition.pdf | |
04 | Sept 18, 2021 | Review of Linear Algebra | ||
05 | Sept 22, 2021 | Bayesian Linear Regression I | Lec03-BayesianLinearRegression.pdf | |
06 | Sept 27, 2021 | Bayesian Linear Regression II & Basics of Analysis I | ||
07 | Sept 29, 2021 | Basics of Analysis II | Lec04-BasicsofAnalysis.pdf | HW02.pdf |
08 | Oct 11, 2021 | Convex Sets I | ||
09 | Oct 13, 2021 | Convex Sets II & Separation Theorems I | Lec05-ConvexSets.pdf | |
10 | Oct 18, 2021 | Separation Theorems II | Lec06-SeparationTheorems.pdf | |
11 | Oct 20, 2021 | Separation Theorems III | ||
12 | Oct 25, 2021 | Convex Functions I | Lec07-ConvexFunctions.pdf | HW03.pdf |
13 | Oct 27, 2021 | Convex Functions II | ||
14 | Nov 1, 2021 | Subdifferentials | Lec08-Subdifferentials.pdf | |
15 | Nov 3, 2021 | Convex Optimization Problems | Lec09-ConvexOptimizationProblems.pdf | HW04.pdf |
16 | Nov 8, 2021 | Decision Tree | Lec10-DecisionTree.pdf | |
17 | Nov 10, 2021 | Naive Bayes Classifier | Lec11-NaiveBayesClassifier.pdf | |
18 | Nov 15, 2021 | Mid-term Exam | ||
19 | Nov 17, 2021 | Logistic Regression | Lec12-LogisticRegression.pdf | HW05.pdf |
20 | Nov 22, 2021 | Analysis of the Mid-term Exam | ||
21 | Nov 24, 2021 | SVM I | Lec13-SVM I.pdf | |
22 | Nov 29, 2021 | SVM II | Lec14-SVM II.pdf | |
23 | Dec 1, 2021 | Neural Networks I | Lec15-NeuralNetworks.pdf | |
24 | Dec 6, 2021 | Neural Networks II | Lec16-ConvolutionalNeuralNetwork.pdf | HW06.pdf |
25 | Dec 8, 2021 | Principal Component Analysis | Lec17-PrincipalComponentAnalysis.pdf | |
26 | Dec 13, 2021 | Reinforcement Learning I | Lec18-ElementaryRL_DeterministicEnvironment.pdf | |
27 | Dec 15, 2021 | Reinforcement Learning II | Lec19-Multi-armedBandits.pdf | |
28 | Dec 20, 2021 | Reinforcement Learning III | Lec20-ElementaryRL_StochasticEnvironment.pdf, Lec21-ValueIteration.pdf | HW07.pdf |
Project
Description
In this project, you are expected to implement a Reinforcement Learning algorithm to teach an agent to play an Atari game, Pong.
Environment
Pong is provided as a reinforcement learning environment in OpenAI Gym. You can install Gym by running `pip install gym[atari]`. Moreover, you can find more information about the Pong environment in the Gym documentation.
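Gym environments expose a small `reset`/`step` interface, and the evaluation loop is the same regardless of the agent. The sketch below uses a stand-in environment so it runs without Gym or the Atari ROMs installed; the observation shape, action count, and reward are placeholders, not the real Pong values.

```python
import random

class DummyPongEnv:
    """Stand-in with the same reset/step shape as a Gym Atari env.
    Real Pong observations are RGB frames; here a short list is used
    as a placeholder so the sketch stays self-contained."""
    def __init__(self, episode_len=10):
        self.episode_len = episode_len
        self.t = 0
        self.n_actions = 6  # Atari Pong exposes 6 discrete actions

    def reset(self):
        self.t = 0
        return [0] * 4  # placeholder observation

    def step(self, action):
        self.t += 1
        obs = [self.t] * 4
        reward = 1.0 if action == 2 else 0.0  # toy reward for illustration
        done = self.t >= self.episode_len
        return obs, reward, done, {}

def run_episode(env, policy):
    """Generic Gym-style rollout: repeatedly query the policy and step."""
    obs, total, done = env.reset(), 0.0, False
    while not done:
        obs, reward, done, _ = env.step(policy(obs))
        total += reward
    return total

score = run_episode(DummyPongEnv(), policy=lambda obs: random.randrange(6))
```

With Gym installed, the same `run_episode` loop works if `DummyPongEnv()` is replaced by the environment created via `gym.make` (check the Gym documentation for the current Pong environment id).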
Agent
We provide a code framework for reference, which includes creating a Pong environment and evaluating the agent. You may obtain the code here. To run the code, go into the `Example` directory and execute `python scripts/main.py`.
We define a base class called `RL_alg` in `src/alg/RL_alg.py`. You are supposed to implement your algorithm in a `src/alg/[your student ID]` directory, inheriting from `RL_alg`. We give an example in `src/alg/PB00000000`, which takes a random action at each step.
In the end, you are supposed to upload the `src/alg/[your student ID]` directory. The TAs will use the provided code to evaluate your trained agent.
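The structure is analogous to the sketch below. Note that the real `RL_alg` base class lives in `src/alg/RL_alg.py` and its constructor and method names may differ from this illustration; a stub base class is defined here only so the example is self-contained.

```python
import random

class RL_alg:
    """Stub of the framework's base class (the real one is in
    src/alg/RL_alg.py -- adapt method names to the actual interface)."""
    def __init__(self, n_actions):
        self.n_actions = n_actions

    def step(self, observation):
        raise NotImplementedError

class MyAgent(RL_alg):
    """Minimal agent in the spirit of the src/alg/PB00000000 example:
    it ignores the observation and picks a uniformly random action."""
    def __init__(self, n_actions=6):
        super().__init__(n_actions)

    def step(self, observation):
        return random.randrange(self.n_actions)
```

Your own agent would replace the random choice with a learned policy while keeping the same interface, so the evaluation script can drive it unchanged.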
Requirements
- Do NOT use any autograd tools or any optimization tools from machine learning packages; you are supposed to implement your algorithm from scratch. For example, if you want to use a neural network, you are expected to implement both the forward and backward passes. You can use the libraries in the WhiteList, and the TAs will update the WhiteList if your requests are reasonable.
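"Implementing both passes" means deriving the gradients by hand instead of calling an autograd engine. A minimal sketch, using only the standard library: a single sigmoid unit with squared loss, where the backward pass applies the chain rule explicitly.

```python
import math

def forward(w, b, x):
    """Forward pass: p = sigmoid(w*x + b)."""
    return 1.0 / (1.0 + math.exp(-(w * x + b)))

def backward(w, b, x, y):
    """Manual backward pass for L = (p - y)^2.
    Chain rule: dL/dw = 2*(p - y) * p*(1 - p) * x,
                dL/db = 2*(p - y) * p*(1 - p)."""
    p = forward(w, b, x)
    dz = 2.0 * (p - y) * p * (1.0 - p)
    return dz * x, dz

def train(data, lr=0.5, epochs=200):
    """Plain SGD using the hand-derived gradients."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, y in data:
            dw, db = backward(w, b, x, y)
            w -= lr * dw
            b -= lr * db
    return w, b

# toy data: label 1 for positive x, 0 for negative x
data = [(1.0, 1.0), (2.0, 1.0), (-1.0, 0.0), (-2.0, 0.0)]
w, b = train(data)
```

A real network for this project would stack many such layers, but the principle is the same: every layer needs a hand-written gradient, and a numerical finite-difference check is a cheap way to verify each one.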
- You can work as a team with no more than three members in total. Please list the percentage of each member's contribution in your report, e.g., {San Zhang: 30%, Si Li: 35%, Wu Wang: 35%}.
- You are supposed to send a package named `[your student ID].zip` to ml_homework@163.com, which contains the `[your student ID]` directory organized as follows. For teamwork, please use the team leader's student ID in the package name and have the team leader submit the package.

  [your student ID]
  |- [your student ID]-report.pdf (your report)
  |- (your code and model)
- Remember to save the trained model. You are supposed to send your trained model to the aforementioned e-mail address. In the final test, you are supposed to use your trained agent to play Pong ten times and generate the animations on the spot.
- Please submit a detailed report. The report should include all the details of your project, e.g., the implementation, the experimental settings, and the analysis of your results. For example, if you use deep reinforcement learning algorithms, you are supposed to present a clear technical route covering image preprocessing, the structure of your neural networks, and the replay memory settings. Besides, the report should include a learning curve plot showing the performance of your algorithm: the x-axis should correspond to the number of time steps, and the y-axis should show the mean 100-episode reward as well as the best mean reward. Moreover, game animations of the trained agent are necessary.
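The two curves the plot asks for can be computed from the per-episode rewards alone. A small helper, sketched here with a plain sliding window (partial windows at the start are a choice, not a requirement):

```python
def learning_curve_points(episode_rewards, window=100):
    """For each episode, return the mean reward over the last `window`
    episodes and the best such mean seen so far -- the two y-axis
    quantities the report's learning curve should show."""
    means, best_means = [], []
    best = float("-inf")
    for i in range(len(episode_rewards)):
        recent = episode_rewards[max(0, i - window + 1): i + 1]
        mean = sum(recent) / len(recent)
        best = max(best, mean)
        means.append(mean)
        best_means.append(best)
    return means, best_means
```

Logging the cumulative time-step count alongside each episode's reward during training gives the x-axis values to plot these against.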
Grading
- The full score = min(base score (up to 20 pts) + bonus (up to 5 pts), 20 pts).
- The base score is determined by the average score over the ten games.
- The bonus consists of two parts. The first part is determined by the novelty of your approach, which should be highlighted in your report. The second part is related to the readability of your code and report; please make them easy to read.
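In other words, the bonus cannot push the total above 20 points; it can only compensate for base points lost. A one-liner makes the rule concrete:

```python
def final_score(base, bonus):
    """Grading rule: total is capped at 20 pts, so bonus points only
    help when the base score is below 20."""
    return min(min(base, 20.0) + min(bonus, 5.0), 20.0)
```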
System Requirements
- We will evaluate your model on a GeForce RTX 2080 Ti (about 10 GB of memory) under Ubuntu 18.04. Please limit the size of your model to avoid out-of-memory (OOM) errors.
Hint
- You can visit the Gym website to obtain more information about Gym, which is a widely used benchmark in reinforcement learning.
- The following papers may be helpful to you.
- [1] Mnih, V., Kavukcuoglu, K., Silver, D., et al. Playing Atari with Deep Reinforcement Learning. arXiv preprint arXiv:1312.5602, 2013.
- [2] Mnih, V., Kavukcuoglu, K., Silver, D., et al. Human-level control through deep reinforcement learning. Nature 518, 529–533 (2015). https://doi.org/10.1038/nature14236
- [3] van Hasselt, H., Guez, A., and Silver, D. Deep Reinforcement Learning with Double Q-Learning. Proceedings of the AAAI Conference on Artificial Intelligence, 30(1), 2016.
- [4] Wang, Z., Schaul, T., Hessel, M., van Hasselt, H., Lanctot, M., and de Freitas, N. Dueling Network Architectures for Deep Reinforcement Learning. Proceedings of the 33rd International Conference on Machine Learning, PMLR 48:1995–2003, 2016. https://proceedings.mlr.press/v48/wangf16.html
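The papers above all build on the Q-learning update, which DQN [1, 2] approximates with a neural network. As a hedged sketch of the underlying rule (tabular form, with a dict standing in for the Q-table; the network version replaces the table lookup with a forward pass):

```python
def q_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.99, done=False):
    """One Q-learning step on a dict-based Q-table:
    Q(s,a) <- Q(s,a) + alpha * (target - Q(s,a)),
    target = r + gamma * max_a' Q(s',a'), or just r at episode end."""
    old = Q.get((s, a), 0.0)
    if done:
        target = r
    else:
        target = r + gamma * max(Q.get((s_next, a2), 0.0) for a2 in actions)
    Q[(s, a)] = old + alpha * (target - old)
    return Q
```

DQN's main additions on top of this rule are experience replay and a separate target network [2]; Double Q-learning [3] and dueling architectures [4] refine how the target is computed and how Q is parameterized.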
Due Dates
- Team leaders should inform the TAs of your team members before 23:59, November 21, 2021.
- Please submit your report, code, and trained model before 23:59, January 23, 2022.
- No late submissions will be accepted.