Basic Information
- Instructor: Jie Wang
- Email: jiewangx@ustc.edu.cn
- Time and Location: Tues., Thur. 14:00 - 15:35 (GT-B111)
- TAs:
  - Haoyang Liu (haoyangliu@miralab.ai)
  - Qitan Lv (qitanlv@miralab.ai)
  - YinQi Bai (yinqibai@miralab.ai)
  - Haotong Huang (haotonghuang@miralab.ai)
Lectures
All course materials will be shared via this page.
Index | Date | Topic | Lecture Notes | Homework |
---|---|---|---|---|
00 | Sept 05, 2023 | Introduction | Lec00-Introduction.pdf | |
01 | Sept 12, 2023 | Review of Mathematics I | Lec01-MathematicalReview.pdf | HW01.pdf |
02 | Sept 14, 2023 | Review of Mathematics II | ||
03 | Sept 19, 2023 | Linear Regression I | Lec02-LinearRegression.pdf | |
04 | Sept 21, 2023 | Linear Regression II | ||
05 | Sept 26, 2023 | Bias-Variance Decomposition | Lec03-BiasVarianceDecomposition.pdf | HW02.pdf |
06 | Sept 28, 2023 | Convex Sets I | Lec04-ConvexSets.pdf | |
07 | Oct 03, 2023 | Convex Sets II | ||
08 | Oct 12, 2023 | Separation Theorems I | Lec05-SeparationTheorems.pdf | |
09 | Oct 17, 2023 | Separation Theorems II | ||
10 | Oct 19, 2023 | Convex Functions I | Lec06-ConvexFunctions.pdf | HW03.pdf |
11 | Oct 24, 2023 | Convex Functions II | ||
12 | Oct 26, 2023 | Subdifferential I | Lec07-Subdifferential.pdf | HW04.pdf |
13 | Oct 31, 2023 | Subdifferential II | ||
14 | Nov 02, 2023 | Convex Optimization Problems | Lec08-ConvexOptimizationProblems.pdf | |
15 | Nov 09, 2023 | Mid-term Exam | ||
16 | Nov 14, 2023 | Decision Tree | Lec09-DecisionTree.pdf | HW05.pdf |
17 | Nov 14, 2023 | Naive Bayes Classifier | Lec10-NaiveBayesClassifier.pdf | |
18 | Nov 16, 2023 | Logistic Regression I | Lec11-LogisticRegression.pdf | |
19 | Nov 21, 2023 | Logistic Regression II | ||
20 | Nov 23, 2023 | SVM I | Lec12-SVM1.pdf | |
21 | Nov 28, 2023 | SVM I | ||
22 | Nov 30, 2023 | SVM II | Lec13-SVM2.pdf | |
23 | Dec 01, 2023 | Neural Networks | Lec14-NeuralNetworks.pdf | |
24 | Dec 05, 2023 | Convolutional Neural Network | Lec15-ConvolutionalNeuralNetwork.pdf | HW06.pdf |
25 | Dec 07, 2023 | Principal Component Analysis I | Lec16-PrincipalComponentAnalysis.pdf | |
26 | Dec 12, 2023 | Principal Component Analysis II | ||
27 | Dec 14, 2023 | Reinforcement Learning I | Lec17-RL_DeterministicEnvironment.pdf | |
28 | Dec 19, 2023 | Reinforcement Learning II | Lec18-RL_StochasticEnvironment.pdf | HW07.pdf |
Project
Description
Weather prediction plays a crucial role in many aspects of daily life and planning. In this machine learning project, you will implement a classification algorithm that labels each day as either “Rainy” or “Not Rainy”.
Dataset
- You can download the training data from here. The dataset contains 40,774 weather records, each with 25 attributes. The meaning of each attribute is shown in the following table.
Attribute | Meaning |
---|---|
Time Stamp | The time stamp of the record (years are relabeled as 0001 to 0018) |
T | Atmospheric temperature at 2 meters above the ground |
Po | Atmospheric pressure at meteorological station level |
P | Atmospheric pressure at mean sea level |
Pa | Atmospheric pressure change over the past 3 hours |
U | Relative humidity at 2 meters above the ground |
DD | Wind direction at 10 to 12 meters above ground in the last 10 minutes |
Ff | Average wind speed at 10 to 12 meters above ground in the last 10 minutes |
ff10 | Maximum gusts at 10 to 12 meters above ground in the last 10 minutes |
ff3 | Maximum gusts at 10 to 12 meters above ground between two observations |
N | Total cloud amount |
WW | Current weather condition reported by the weather station |
W | Past weather between observations |
Tn | Lowest temperature in the past 12 hours |
Tx | Highest temperature in the past 12 hours |
Cl | Stratocumulus, stratus, and nimbostratus clouds |
Nh | Observed amount of the Cl cloud layer |
H | Height of the base of the lowest cloud layer |
Cm | Altostratus, altocumulus, and nimbostratus clouds |
Ch | Cirrus, cirrocumulus, and cirrostratus clouds |
VV | Horizontal visibility |
Td | Dew point temperature |
tR | Period over which the reported amount of rainfall (RRR) was accumulated |
RRR | Amount of rainfall |
- Notice:
- Some training samples have missing features, and some even have missing labels. Before building a model, you may want to clean the data or decide how to use the samples whose features or labels are missing.
- Each record describes the weather conditions at a specific time of day, but your objective is to predict whether it rained at any time during the entire day. A day should be classified as “Rainy” if any of its time segments shows rainfall. For example, if one segment shows no rain but others do, the day counts as rainy.
- Your model will be evaluated on a testing dataset, in which the labels “Rainy” and “Not Rainy” are encoded as 0 and 1, respectively. Test samples have the same attributes as training samples.
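The day-level labeling rule above can be sketched as follows. This is a minimal sketch under several assumptions not stated in the project description: each record has already been reduced to a (day, rainfall) pair, where the day key is taken from Time Stamp and rainfall is the RRR value parsed as a number (None when missing), and a time segment counts as rainy when its rainfall amount is positive. The function name `daily_rain_labels` is hypothetical. The 0/1 encoding follows the convention given for the testing dataset.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Tuple

# Encoding used by the testing dataset: "Rainy" -> 0, "Not Rainy" -> 1.
RAINY, NOT_RAINY = 0, 1


def daily_rain_labels(
    records: Iterable[Tuple[str, Optional[float]]],
) -> Dict[str, Optional[int]]:
    """Map each day to RAINY, NOT_RAINY, or None (no usable rainfall value).

    `records` holds hypothetical (day, rainfall) pairs: one pair per time
    segment, with rainfall = None when the RRR value is missing.
    """
    amounts_by_day = defaultdict(list)
    for day, amount in records:
        amounts_by_day[day].append(amount)

    labels: Dict[str, Optional[int]] = {}
    for day, amounts in amounts_by_day.items():
        observed = [a for a in amounts if a is not None]
        if not observed:
            labels[day] = None         # every segment missing: leave unlabeled
        elif any(a > 0 for a in observed):
            labels[day] = RAINY        # rain in any segment makes the day rainy
        else:
            labels[day] = NOT_RAINY    # all observed segments were dry
    return labels
```

For example, two segments on day "0001-01-01" with rainfall 0.0 and 2.5 give label 0 (Rainy). Treating a day as Not Rainy when some segments are missing but all observed ones are dry is only one possible choice; you may prefer to leave such days unlabeled.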
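One simple way to handle missing features is mean imputation. The sketch below assumes NumPy is on the WhiteList and that missing values have already been converted to NaN in a float feature matrix; the helper names are hypothetical, and mean imputation is just one option among many.

```python
import numpy as np


def fit_column_means(X: np.ndarray) -> np.ndarray:
    """Per-feature means computed over the non-missing (non-NaN) entries.

    A column that is entirely NaN yields NaN (with a RuntimeWarning), so
    such columns should be dropped before calling this.
    """
    return np.nanmean(X, axis=0)


def impute_with_means(X: np.ndarray, means: np.ndarray) -> np.ndarray:
    """Return a copy of X in which each NaN is replaced by its column mean."""
    X = X.copy()
    rows, cols = np.where(np.isnan(X))
    X[rows, cols] = means[cols]
    return X
```

Fit the means on the training data only and reuse the same `means` array when preprocessing the testing dataset, so that both go through identical preprocessing.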
Requirements
- Do NOT use any autograd tool or any optimization tool from machine learning packages. You are supposed to implement your algorithm from scratch. For example, if you use a neural network, you are expected to implement both the forward and backward passes yourself. You may use the packages in the WhiteList; the TAs will update the WhiteList upon reasonable requests.
- You may work in a team of no more than three members. Please list each member’s percentage contribution in your report, e.g., {San Zhang: 30%, Si Li: 35%, Wu Wang: 35%}.
- We define a default base class called `PB21000000` in `PB21000000.py`. You are supposed to implement your algorithm in the `[your student ID]` directory. We provide an example here. For detailed requirements, please refer to the comments in our code.
- Send a package named `[your student ID].zip` to ml2023fall_ustc@163.com. It should contain the `[your student ID]` directory, organized as follows:

  [your student ID]
  ├── main.py
  ├── [your student ID]-report.pdf
  └── ... (your code and model)

  For team projects, use the team leader’s student ID in the package name; the team leader should submit the package.
- Remember to save your trained model. You are supposed to send your trained model to the e-mail address above.
- Please submit a detailed report covering all details of your project, e.g., the implementation, the experimental settings, and the analysis of your results.
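To illustrate what implementing an algorithm “from scratch” can look like under the rule against autograd and optimization tools, here is a minimal logistic-regression sketch whose forward and backward passes are written by hand. For p = σ(Xw + b), the gradient of the mean binary cross-entropy is Xᵀ(p − y)/n for w and mean(p − y) for b. It assumes NumPy is on the WhiteList, a feature matrix without missing values, and labels in {0, 1} with 1 as the modeled class; the function names and hyperparameter defaults are illustrative, not prescribed by the course.

```python
import numpy as np


def sigmoid(z: np.ndarray) -> np.ndarray:
    # Clip to avoid overflow in np.exp for large |z|.
    return 1.0 / (1.0 + np.exp(-np.clip(z, -500, 500)))


def train_logistic_regression(X, y, lr=0.1, epochs=1000, l2=0.0):
    """Full-batch gradient descent on the mean binary cross-entropy.

    X: (n, d) float matrix; y: (n,) array of 0/1 labels.
    l2 is the coefficient of an optional (l2 / 2) * ||w||^2 penalty.
    """
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(epochs):
        p = sigmoid(X @ w + b)           # forward pass: P(y = 1 | x)
        err = p - y                      # dLoss/dz for each sample (times n)
        grad_w = X.T @ err / n + l2 * w  # backward pass for the weights
        grad_b = err.mean()              # backward pass for the bias
        w -= lr * grad_w                 # plain gradient-descent update
        b -= lr * grad_b
    return w, b


def predict(X, w, b, threshold=0.5):
    """Hard 0/1 predictions from the learned parameters."""
    return (sigmoid(X @ w + b) >= threshold).astype(int)
```

Because the gradients are derived analytically, no optimizer or autograd package is involved; the same pattern (explicit forward pass, hand-derived backward pass, manual update) extends to a neural network.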
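For a model whose parameters are a weight vector `w` and a bias `b`, one way to save the trained model is NumPy’s `.npz` format. This is a minimal sketch, assuming NumPy is on the WhiteList; the helper names are hypothetical, and models with more parameters can store additional arrays the same way.

```python
import numpy as np


def save_model(path: str, w: np.ndarray, b: float) -> None:
    # np.savez appends ".npz" to the file name if it does not already end with it.
    np.savez(path, w=w, b=np.array(b))


def load_model(path: str):
    # NpzFile supports the context-manager protocol; indexing reads each array.
    with np.load(path) as data:
        return data["w"], float(data["b"])
```

Naming the file with an explicit `.npz` suffix avoids surprises about the saved file name when the path is later hard-coded into your evaluation script.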
For TAs’ Testing
- The command we use to run your submitted Python script is: `python main.py --dataset=/home/hyliu/ML_Project/testing_dataset.xls`. The required output is the F1 score of your model.
- In the testing phase, your submitted Python script is treated as a black box that must accept the run command above and meet the output requirement. Note that the path from which your script loads the model should be set in advance in the submitted Python file as ‘/home/hyliu/ML_Project/your_model_path’.
Grading
- Total score = min(base score (up to 20 pts) + bonus (up to 5 pts), 20 pts).
- The base score is determined by the macro F1 score, precision, and recall, as computed by the TAs’ evaluation code.
- The bonus is based on three aspects:
- Your insights on the data and task.
- The novelty of your approach, which should be highlighted in your report.
- The readability of your code and report. Please make them easy to follow.
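The metrics named above can be computed without any machine learning package. This is a minimal sketch, assuming that “macro” applies to precision and recall as well as F1 (the text does not say so explicitly) and that undefined 0/0 ratios count as 0; the TAs’ code is authoritative and may handle these cases differently. Macro F1 here is the unweighted mean of per-class F1 scores, the same definition scikit-learn uses for `f1_score` with `average="macro"`. The function name is hypothetical.

```python
from typing import Dict, Sequence


def macro_scores(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Macro-averaged precision, recall, and F1 over all labels present.

    Each metric is computed per class (that class treated as positive)
    and then averaged with equal weight per class.
    """
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    precisions, recalls, f1s = [], [], []
    for c in labels:
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)
    k = len(labels)
    return {
        "precision": sum(precisions) / k,
        "recall": sum(recalls) / k,
        "f1": sum(f1s) / k,
    }
```

For example, with `y_true = [0, 0, 1, 1]` and `y_pred = [0, 1, 1, 1]`, the per-class F1 scores are 2/3 (class 0) and 4/5 (class 1), so macro F1 = 11/15 ≈ 0.733, macro precision = 5/6, and macro recall = 0.75. Because each class counts equally, a model that mostly predicts the majority class is penalized even if its accuracy looks high.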
System Requirements
- We will evaluate your model on a GeForce RTX 3090 Ti (about 24 GB of memory) running Ubuntu 18.04. Please limit the size of your model to avoid out-of-memory (OOM) errors.
Due Dates
- Team leaders should inform the TAs of their team members before 23:59, November 28, 2023.
- Please submit your report, code, and trained model before 23:59, January 19, 2024.
- No late submissions will be accepted.