Note that there are separate sets of assignments for CS 451/651 and CS 431/631. Make sure you work on the correct assignments!

CS 431/631 Assignments

Assignment 0: Warmup due 2:30pm January 22

This assignment is a warmup exercise to get you familiar with some of the basic tools you will need for the remaining assignments. In particular, we will be making use of Python (for programming) and Jupyter notebooks.

The general setup is as follows: for each assignment, you will be provided with a "starter" notebook, which will describe what needs to be done for that assignment. You'll complete the assignment in the notebook, and then submit your notebook to a private GitHub repo. Shortly after the assignment deadline, we'll pull your repo for marking.

I'm assuming you already have a GitHub account. If not, create one as soon as possible. Once you've signed up for an account, go and request an educational account. This will allow you to create private repos for free. Please do this as soon as possible since there may be delays in the request verification process.

Create a private repo called bigdata2019w. I'm assuming that you're already familiar with GitHub, but just in case, here is how you create a repo on GitHub. If you've successfully gotten an educational account (per above), you should be able to create private repos for free.

Python and Jupyter

All of the programming required for the assignments will be in Python. If you have never programmed in Python before, you will need to gradually bring yourself up to speed. There are many on-line resources that can help with this. A good place to start is In particular, you can start with their Python Tutorial. If you don't like that particular tutorial, there are many others to choose from. There are also many Python books to choose from, if you prefer to learn that way. Choose a book that fits your needs. For example, some books target people who are migrating to Python from other languages, while others are directed at novice programmers.

Most Python tutorials expect you to try out examples as you go along, i.e., they expect you to write and run Python code. This kind of active learning is definitely the way to go. The simplest way for you to run Python code is by using a Jupyter notebook running on the CS Jupyter hub (see below). This will allow you to run Python in a web browser, without having to install any software on your machine. If you wish, you can also install Python locally on your own machine. Python is freely available for a variety of platforms. Bear in mind that all assignments for CS431/631 will be done using notebooks, so it is not a bad idea to get used to them.

Jupyter Notebooks

For this course, you will be writing and running Python code in Jupyter notebooks. Each notebook consists of a sequence of cells. A cell can hold (formatted) text, Python code, or graphics. A great thing about notebooks is that you can open and run them in a web browser. This means that you can work on your own machine, using only a web browser, without having to install any additional software.

A Jupyter "hub" is a place to store and use Jupyter notebooks. For the CS431/631 assignments, you'll be using a hub operated by the School of Computer Science. To get started, go to Log in using your userid and password for the CS student computing environment (not your WatIAM password). Once you have logged in, you should see a list of folders and files - this is the contents of your home directory (folder) in the CS student computing environment.

It is a good idea to create a new folder to hold all of your work for this course, if you do not already have one. To do this, use the New dropdown on the top right to create a new folder, and call the folder cs431 (or whatever name you prefer). Then, open your new folder by clicking on it.

Once you are in your cs431 folder, try creating a new Jupyter notebook. To create a notebook, use the New dropdown to create a new Python 3 notebook. You should see something that looks like this:

[Figure: A New Jupyter Notebook]
This represents a notebook with a single, empty cell. Before going any further with your notebook, try out the following three basic things that you will need to be able to do:
  1. First, change the name of your notebook by clicking on the current name ("Untitled"), and entering a new name, say, Test Notebook.
  2. Next, save your notebook using Save and Checkpoint from the File menu. Saving a notebook saves its current state, so that you can stop working at any time, and resume later from where you left off.
  3. Finally, stop your notebook by selecting Close and Halt from the File menu. This should take you back to your list of files and folders. You should see a new file called Test Notebook.ipynb, which is your saved notebook. By clicking on that notebook file, you can start your notebook running again from the point at which you last saved (try it!).
Once you've tried out these basics, start your test notebook and spend some time familiarizing yourself with the notebook interface. Take the User Interface Tour, which you can launch from the Help menu of a running notebook.
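As a first experiment in your test notebook, you could type something like the following into an empty code cell and run it with Shift+Enter. This is only a minimal sketch to confirm that the notebook executes Python; the function name average and the list marks are made up for illustration and are not part of any assignment.

```python
# A hypothetical first cell: define a small function and call it.
def average(values):
    """Return the arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)

# Made-up sample data, purely for illustration.
marks = [72, 85, 90]
result = average(marks)

# In a notebook, the printed value appears directly below the cell.
print(result)
```

If the cell runs without errors and a number appears beneath it, your notebook is working.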

Assignment Workflow

The basic workflow for each assignment will be something like this:

  1. Download the starter notebook for the assignment, as well as any other required files, from the assignment web page to your computer.
  2. Use a web browser to log in to the CS Jupyter hub at
  3. Upload the starter notebook for the assignment, as well as any other required files, from your computer to the CS hub, into your cs431 folder.
  4. Launch the starter notebook that you just uploaded, and follow the instructions in the notebook to complete the assignment. Be sure to save your work.
  5. When you are finished with the assignment, download your notebook (the .ipynb file) from your cs431 folder on the hub to your computer, and submit it to the course staff by following the submission instructions.

Assignment 0

For the first assignment, you will do some simple analyses on the text of Shakespeare's plays. For this assignment, you will need to download three files to your local machine, and then upload them to the Jupyter hub. They are:

Files with names that end in .ipynb are Python notebook files. When you work in a notebook and save your work, your work is saved in the .ipynb file. You'll submit your saved A0.ipynb file to your GitHub repository when you are done with the assignment. That will allow us to open your notebook and review your work.

After you have uploaded these files to the hub, open A0.ipynb to get started on the assignment. The notebook itself describes what we expect you to do.
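To give a rough idea of what a simple text analysis can look like in Python, here is a small sketch that counts word frequencies in a short string. It is not one of the assignment questions (those are described in A0.ipynb); the function name word_frequencies, the tokenization rule, and the sample sentence are all assumptions chosen for illustration.

```python
import re
from collections import Counter


def word_frequencies(text):
    """Return a Counter mapping each lowercase word to its number of occurrences."""
    # Treat runs of letters and apostrophes as words; this is one simple
    # tokenization choice among many, not a rule taken from the assignment.
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words)


# A one-line sample standing in for a much longer text file.
sample = "To be, or not to be: that is the question."
freqs = word_frequencies(sample)

# Show the two most frequent words together with their counts.
print(freqs.most_common(2))
```

The same function could be applied to the contents of a file by reading it into a string first, for example with open(...).read().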

Submitting Assignment 0

Once you are done with the assignment, submit A0 using the following steps:

  1. Download your A0.ipynb file from the Jupyter hub to your computer.
  2. Submit your A0.ipynb file to your GitHub repository using the web interface. If you're not already familiar with GitHub, here is how you submit a new file to a repo on GitHub. Make sure your A0.ipynb file is committed to the master branch. Your assignment should be viewable in the web interface.
  3. Add the user bigdatateach as a collaborator to your repo so that we can access it. Here is how you add a collaborator to your repo.
  4. Finally, you need to tell us your GitHub account so we can link it to you. Submit your information here.
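Before uploading, it can be worth checking that the file you downloaded is a complete notebook and not a truncated download. A notebook file is stored as JSON, so one way to do this is to load it and count its cells. The sketch below is an optional sanity check, not part of the required submission steps; the function name summarize_notebook is hypothetical, and the code assumes the current notebook format (version 4), which keeps its cells in a top-level "cells" list.

```python
import json


def summarize_notebook(path):
    """Count the cells of each type (code, markdown, ...) in a notebook file."""
    # json.load raises an error if the file is not valid JSON,
    # which is itself a useful signal that the download went wrong.
    with open(path, encoding="utf-8") as f:
        notebook = json.load(f)

    counts = {}
    # Assumes notebook format version 4, where cells are a top-level list.
    for cell in notebook.get("cells", []):
        cell_type = cell.get("cell_type", "unknown")
        counts[cell_type] = counts.get(cell_type, 0) + 1
    return counts
```

Calling summarize_notebook("A0.ipynb") from the folder that contains your downloaded file returns a dictionary of cell counts by type. An empty dictionary suggests that the file does not contain the work you expected to submit.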
