Netflix Prize
About this project: I took part in the Netflix Prize, to try to improve Netflix's predictions of how much someone will like a movie, based on how they rated other movies.
I did manage to make it onto the leaderboard briefly (Leaderboard for Oct 19, 2006 - I'm teamgreg), but fell off shortly after and never made it back on. So it goes! The strategy I used for that submission was a movie-based correlation, corrected for the mean rating of each user.
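A minimal C++ sketch of one way to read that strategy, not the code from the source files listed on this page: subtract each user's mean rating from their ratings, then correlate two movies over the users who rated both (often called adjusted cosine similarity). The `Rating` struct and the `userMeans` and `movieSimilarity` functions are hypothetical names made up for this illustration, and the exact formula used in the submission may differ.

```cpp
#include <cmath>
#include <unordered_map>
#include <vector>

// Hypothetical record type for this sketch; the project's real
// structures are defined in Structures.h and may look different.
struct Rating {
    int user;
    int movie;
    double score;  // star rating, 1 to 5
};

// Compute the mean rating given by each user.
std::unordered_map<int, double> userMeans(const std::vector<Rating>& ratings) {
    std::unordered_map<int, double> sum;
    std::unordered_map<int, int> count;
    for (const Rating& r : ratings) {
        sum[r.user] += r.score;
        ++count[r.user];
    }
    for (auto& entry : sum) {
        entry.second /= count.at(entry.first);
    }
    return sum;  // now holds means, not sums
}

// Similarity between two movies over the users who rated both,
// using each rating's deviation from that user's mean.
// Assumes `means` was built from the same `ratings`.
// Returns 0 when there is no overlap or no variation to compare.
double movieSimilarity(const std::vector<Rating>& ratings,
                       const std::unordered_map<int, double>& means,
                       int movieA, int movieB) {
    std::unordered_map<int, double> devA, devB;  // user -> deviation
    for (const Rating& r : ratings) {
        const double dev = r.score - means.at(r.user);
        if (r.movie == movieA) devA[r.user] = dev;
        if (r.movie == movieB) devB[r.user] = dev;
    }

    double dot = 0.0, normA = 0.0, normB = 0.0;
    for (const auto& entry : devA) {
        const auto it = devB.find(entry.first);
        if (it == devB.end()) continue;  // user did not rate movieB
        dot += entry.second * it->second;
        normA += entry.second * entry.second;
        normB += it->second * it->second;
    }
    if (normA == 0.0 || normB == 0.0) return 0.0;
    return dot / std::sqrt(normA * normB);
}
```

For example, if one user rates three movies 5, 5, 2 and a second user rates them 4, 4, 1, both users have the same deviations from their own means (+1, +1, -2). The first two movies then get a similarity of 1 even though the second user rates everything a star lower. This sketch rescans every rating for each pair of movies, which is only practical for small inputs; with data as large as the Netflix set, the ratings would need to be indexed by movie first.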
Due to the large volume of data, I did this project mostly in C++. Here are my source files:
- Makefile - Makefile for the C++ parts
- Structures.h - basic structures used throughout
- util.h, util.cpp - utility functions for reading and parsing data.
- Algorithms.h - contains the algorithms to write binary versions of the supplied ratings (for speed), calculate movie and user average scores, and calculate similarities between movies.
- ParseTest.h, ParseTest.cpp - calculates user or movie similarities for use in predicting ratings
- PredictRatings.cpp - uses an algorithm to read in ratings, etc. and output predicted ratings.
- averageRatings.py - averages the predictions from a movie-based file and a user-based file. This didn't work terribly well.
- rateprobe.py - calculates the RMSE (root mean squared error) on the probe dataset.
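RMSE is the square root of the mean squared difference between predicted and actual ratings, so lower is better. The sketch below shows that calculation in C++ for consistency with the rest of the project. It is not the contents of rateprobe.py, and the `rmse` function name and the two-vector input format are assumptions made for illustration.

```cpp
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Root mean squared error between predicted and actual ratings.
// Both vectors must be non-empty, the same length, and in the same order.
double rmse(const std::vector<double>& predicted,
            const std::vector<double>& actual) {
    if (predicted.empty() || predicted.size() != actual.size()) {
        throw std::invalid_argument("rmse: inputs must be non-empty and equal in length");
    }
    double sumSquaredError = 0.0;
    for (std::size_t i = 0; i < predicted.size(); ++i) {
        const double error = predicted[i] - actual[i];
        sumSquaredError += error * error;
    }
    return std::sqrt(sumSquaredError / static_cast<double>(predicted.size()));
}
```

Because the errors are squared before averaging, one prediction that is off by two stars costs as much as four predictions that are each off by one star.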