In the final project, we hope everyone can think of themselves as a real-world data scientist. Your goal is to come up with some interesting questions, find the right datasets, and implement a data-science pipeline to answer those questions. To achieve this, please follow these steps:
The following table summarizes the TODO list of the final project.
ID | Task | Deadline | What to do |
---|---|---|---|
1 | Proposal | Thu 02/16 at 11:59 PM | Submit the filled form to CourSys |
2 | Milestone | Thursday 03/9 at 9:30 AM | Submit your poster and presentation video to CourSys |
3 | Poster Presentation | Tuesday 04/11 at 8:00 AM; Tuesday 04/11 at 10:45 AM | Submit your poster to CourSys; Present your project in ASB 10900 |
4 | Final Report | Wednesday 04/12 at 11:59 PM | Submit your video and report to CourSys |
To evaluate whether your project is good or not, please ask yourself the following three questions:
A good project should be important, challenging, and able to push you to learn something that you did not know before.
Note that you need to conduct a deep analysis of the data. By deep analysis, I mean you have to think deeply about your analysis results, and report some insightful and reliable findings.
Below is a list of project ideas from previous years. I do not recommend selecting from the list, since these projects have already been done by former students.
Machine learning based surveillance system using transfer learning for rare diseases (2021)
Explore the Impact of Weather on Short-time Demand Forecast for Fashion Retailers (2020)
Measuring Observable Influence and Impact of Scientific Research Beyond Academia (2019)
Automated Feature Detection of Aerial Imagery from South Pacific (2018)
Machine learning to detect misstated financial statements (2018)
Communication skills are super important for data scientists. Please use this opportunity to practice your communication skills.
You can think of this presentation as a mid-term report for your project. Your presentation should consist of three parts:
Motivation (2)
Progress Report (2)
Future work (2)
Imagine your manager (who knows little about the technical part of data science) is sitting in the audience. You need to explain your complex project to your manager in a simple way and make her/him feel excited about it.
Did you convey complex information in a simple way? (2)
Did you excite and motivate the audience? (2)
Search for "how to give a good talk" on Google. You will find a lot of good advice. Use it to improve your presentation.
Submission
This is showtime! Make a poster to present your data product. Please make your poster look as professional as possible. Here are a few things that you can put on the poster (10 points):
Why do you do this project?
What questions do you try to answer?
What's your methodology to get the answers?
What datasets/tools do you use?
What's your data-science pipeline like?
Why is your solution good? Why does your result make sense?
What's your data product?
What have you learned through the project?
What do you plan to do if you have more time?
Design tips:
During the poster session (10 points), you will be given 5 minutes to present your poster, after which the TAs and instructors will ask a few questions.
Submission
The poster session is scheduled for Tuesday, April 11th, 2023. Please upload your poster to CourSys before 8:00 AM.
Code (10 points)
Like CMPT 732, you must use a Git repository for your project. The department's GitLab server is a good way to get one (instructions at that link). Group members must commit their own contributions to the repo. You are encouraged to publicize and open-source your work on GitHub or similar.
In your repository, please include a file README.txt (or README.md if you prefer) explaining how we can test your project, along with any other notes about things we should look for. If you created some kind of web frontend, please include its URL in the README as well.
Report (10 points)
You need to submit a report giving an overview of your project. The report should be at least 2500 words long and have the following structure:
Project Title: Come up with an attractive project title (see this page for some tips).
Motivation and Background: Who cares about this project? Any related work?
Problem Statement: What questions do you want to answer? Why are they challenging?
Data Science Pipeline: What's your data-science pipeline like? Describe each component in detail.
Methodology: What tools or analysis methods did you use? Why did you choose them? How did you apply them to tackling each problem?
Evaluation: Why is your solution good? Why does your result make sense?
Data Product: What's your data product? Please demonstrate how it works.
Lessons Learnt: What did you learn from this project?
Summary: A high-level summary of your project. It should be self-contained and cover all the important aspects of your project.
Please choose A or B:
Video (10 points)
Please make an attractive video to introduce your project. Here are some requirements:
You can get some inspiration from the KDD 2017 Promotional Videos, KDD 2018 Promotional Videos, 2018 Project Showcase, and 2019 Project Showcase.
Submission
We will create a page on the course website to showcase your projects. For each project, the page will show the project title, a project summary, and three URLs linking to your codebase, video, and final report. Please submit your project title, project summary, final report (Medium URL or PDF), code (GitHub/GitLab URL), and video (YouTube URL) to CourSys.