Netflix Prize

About this project: I took part in the Netflix Prize, a competition to improve Netflix's predictions of how much someone will like a movie, based on how they rated other movies.

I did manage to make it onto the leaderboard briefly (Leaderboard for Oct 19, 2006 - I'm teamgreg), but fell off shortly after and never made it back on. So it goes! The strategy I used for that submission was a movie-based correlation, corrected for each user's mean rating.
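To make that strategy concrete, here is a minimal sketch of one common way to do a movie-based correlation with a per-user mean correction (often called adjusted cosine similarity). It is written in Python for brevity and is not the original C++ code. The function names, the (user, movie, rating) tuple layout, the similarity-weighted prediction rule and the clamp to the 1-5 range are all assumptions made for illustration.

```python
from collections import defaultdict
from math import sqrt


def user_means(ratings):
    """Mean rating per user, from (user, movie, rating) triples."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for user, _movie, rating in ratings:
        sums[user] += rating
        counts[user] += 1
    return {user: sums[user] / counts[user] for user in sums}


def center_by_user(ratings, means):
    """Subtract each user's mean from their ratings; index the result by movie."""
    centered = {}
    for user, movie, rating in ratings:
        centered.setdefault(movie, {})[user] = rating - means[user]
    return centered


def movie_similarity(centered, movie_a, movie_b):
    """Correlation between two movies over the users who rated both (assumed formula)."""
    a = centered.get(movie_a, {})
    b = centered.get(movie_b, {})
    common = a.keys() & b.keys()
    num = sum(a[u] * b[u] for u in common)
    den = sqrt(sum(a[u] ** 2 for u in common)) * sqrt(sum(b[u] ** 2 for u in common))
    return num / den if den else 0.0


def predict(centered, means, user, target):
    """User's mean plus a similarity-weighted average of their centered ratings."""
    num = 0.0
    den = 0.0
    for movie, by_user in centered.items():
        if movie == target or user not in by_user:
            continue
        sim = movie_similarity(centered, target, movie)
        num += sim * by_user[user]
        den += abs(sim)
    offset = num / den if den else 0.0
    # Clamp to the 1-5 star scale used by the ratings.
    return min(5.0, max(1.0, means[user] + offset))
```

Calling user_means and center_by_user once up front and then predict per (user, movie) pair keeps the sketch short. On the full dataset the movie-to-movie similarities would presumably need to be precomputed rather than recalculated for every prediction.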

Due to the large volume of data, I did this project mostly in C++. Here are my source files:

Makefile - Makefile for the C++ parts
Structures.h - basic structures used throughout
util.h, util.cpp - utility functions for reading and parsing data.
Algorithms.h - contains the algorithms to write binary versions of the supplied ratings (for speed), calculate movie and user average scores, and calculate similarities between movies.
ParseTest.h, ParseTest.cpp - calculates user or movie similarities for use in predicting ratings.
PredictRatings.cpp - reads in the ratings and related data, applies a prediction algorithm, and outputs predicted ratings.
averageRatings.py - averages the predictions in a movie and user file. This didn't work terribly well.
rateprobe.py - calculates the RMSE (root mean squared error) of predictions on the probe dataset.
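
For reference, RMSE is the square root of the mean squared difference between predicted and actual ratings, so lower is better. The sketch below shows only that calculation. It is not the contents of rateprobe.py: the function name is hypothetical, and it assumes the predictions and the known probe ratings have already been loaded into two parallel lists.

```python
from math import sqrt


def rmse(predicted, actual):
    """Root mean squared error between two equal-length sequences of ratings."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("need at least one rating")
    squared_error = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return sqrt(squared_error / len(actual))
```

Because the errors are squared before averaging, a few badly wrong predictions hurt the score more than many slightly wrong ones.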