Note that there are separate sets of assignments for CS 451/651 and CS
431/631. Make sure you work on the correct assignments!
CS 431/631 Assignments
Final Project
The final project is a requirement only for graduate students taking CS 631.
The topic of the final project can be on anything you wish in the
space of big data. Anything reasonably related to topics that are
covered in the course is within scope. For reference, there are four
types of projects you might consider:
- Learn additional capabilities (e.g., visualization) of Python
and Jupyter, and use them to build an interactive notebook for visualizing
or exploring a dataset of your choosing. Your interactive
notebook should interact with Spark, so that it will be capable
of supporting exploration of data sets that are too large to fit
in the memory of a single machine.
- Implement a big data algorithm in Spark: choose a
particular big data algorithm (for processing text, graphs,
relational data, etc.) and implement it. Ideally, the implementation
does not already exist in a library or open-source package: since we
want you to implement the algorithm from scratch, the temptation to
simply copy existing code might otherwise be too great (see the notes
on academic integrity).
- Learn and explore a (new) big data processing framework:
although we discussed a variety of processing frameworks in class,
the assignments focused on Spark. Here's your chance
to learn a new processing framework, e.g., Spark Streaming, GraphX,
Giraph, Flink, etc. The project would involve learning to use the
processing framework and doing something interesting with it. The
"something interesting" might be a data mining algorithm, although
the expectations would be lower than building something in
Spark, since learning the new framework would form an
essential component of the project.
- Perform some interesting data science. Is there a particular
dataset you'd like to explore or analyze? Your project could involve
performing interesting analytics on a dataset—here, the focus
would be the analytical product and the insights gleaned, as opposed
to the raw algorithms themselves. However, a superficial analysis
with existing machine-learning libraries is not enough.
You may work in groups of up to three, or by yourself if you wish.
The amount of effort devoted to the project
should be proportional to the number of people in the team. As a
guideline, the level of effort should be comparable to
two assignments per person.
When you are ready, send me (uwaterloo-bigdata-2019w-staff@googlegroups.com)
an email describing what you'd like to work on. I will provide you
with feedback on appropriateness and scope of your proposed project.
The "soft" deadline for this
proposal is March 15, 2019. There is no
penalty if you miss this deadline, but it is in your best
interest to not leave this proposal to the last minute.
The deliverable for the final project is a report. Use
the ACM
Templates. The contents of the report will vary depending
on the type of project you are doing. However, it should certainly
describe the goal of your project (what is your learning objective,
or what problem are you trying to solve), your methodology, and some
kind of evaluation of your results or progress.
Your project proposal should explicitly describe how your
project report (described above) will be organized: indicate what sections the report
will have, and what you expect to present in each section.
There are no hard limits on the length of your final report, but you
should target something in the range of 5-10 pages.
The (hard) deadline for submission of your project report is 1pm on
April 19, 2019.
Evaluation
Your final project will be evaluated according to the following
criteria, with roughly equal weight placed on each one.
- Scope/Relevance: Is the objective clear? Is
the project course-related and substantial enough?
- Methodology: Is the methodology appropriate
and clearly described?
- Evaluation: Did you evaluate your work? Did
you achieve your objective? If not, did you explain why not?
- Presentation: Is your report well organized
and clearly written?
Your report should clearly indicate where you obtained any data that
you used in your project. Include a link to the data if possible.