Csc 8711, Databases and the Web - Project 5 & 6
Due: Sunday, May 3rd
Group Assignment. Maximum of 4 members per group.
Movie Recommendation System using Neo4J and MySQL
For this project, you will use the data from the MovieLens dataset to develop a movie recommendation system in Neo4J and MySQL and compare the performance. You will use two different criteria to generate recommendations. This project will be done in two parts.
Two Parts
Part 1 (Neo4J)
You will model the dataset as a graph and use Neo4J to build the recommendation system. Neo4J will be used as an embedded database in a Java application. You will parse the input data and store it in graph form in Neo4J. Write a separate Java program, called CreateNeo4JDB.java, to do this (the database folder may be created inside the Project5 folder, alongside dataset, src, and lib).
You are free to choose your data model, but make sure to explain it in the project report. You will then execute the two algorithms described below on the created database to generate movie recommendations. Name this program Neo4JRecommend.java.
Part 2 (MySQL)
You will use the same data and build the recommendation engine using MySQL and Java. Provide a SQL script file, called schema.sql, that contains the CREATE TABLE statements to create the relational database, and a Java program, CreateMySQLDB.java, that will load the dataset into the MySQL database. You will use JDBC to connect to the database. Name your recommendation program MySQLRecommend.java.
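As a rough illustration of the loading step, the sketch below inserts rating rows through a JDBC PreparedStatement batch. The table and column names (ratings, user_id, movie_id, rating, rated_at) and the class name RatingLoader are hypothetical and would have to match whatever your schema.sql defines. It also assumes the tab-separated u.data layout described in the dataset's README.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

/** Sketch of batched loading over JDBC; the table and column names are hypothetical. */
class RatingLoader {

    // Assumes schema.sql created a table like: ratings(user_id, movie_id, rating, rated_at).
    static final String INSERT_SQL =
        "INSERT INTO ratings (user_id, movie_id, rating, rated_at) VALUES (?, ?, ?, ?)";

    /** Inserts tab-separated u.data lines as one batch and returns the number of rows queued. */
    static int loadRatings(Connection conn, List<String> lines) throws SQLException {
        conn.setAutoCommit(false); // commit once at the end instead of once per row
        int queued = 0;
        try (PreparedStatement ps = conn.prepareStatement(INSERT_SQL)) {
            for (String line : lines) {
                String[] f = line.trim().split("\t");
                ps.setInt(1, Integer.parseInt(f[0]));   // user id
                ps.setInt(2, Integer.parseInt(f[1]));   // movie id
                ps.setInt(3, Integer.parseInt(f[2]));   // rating
                ps.setLong(4, Long.parseLong(f[3]));    // timestamp
                ps.addBatch();
                queued++;
            }
            ps.executeBatch();
        }
        conn.commit();
        return queued;
    }
}
```

A connection would typically come from DriverManager.getConnection with a URL of the form jdbc:mysql://host:port/database (the host, database name, and credentials are placeholders you supply), with the MySQL JDBC driver jar placed in the lib folder.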
Dataset
The dataset consists of movie ratings provided by a set of individuals for a set of movies. The details of each movie are given in the u.item file. Each movie has a release date, a video release date, and an IMDb URL, and belongs to one or more of the 19 genres given in the u.genre file. The u.user file contains demographic information about the users, such as age, gender, occupation, and zipcode. The movie ratings given by the users, along with their timestamps, are contained in the u.data file.
The included README file has further information.
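As a starting point for the parsing step, the sketch below reads one line of u.data and one line of u.user. It assumes the layouts described in the dataset's README: tab-separated user id, item id, rating, and timestamp for u.data, and pipe-separated user id, age, gender, occupation, and zip code for u.user. The class and method names are hypothetical.

```java
/** Hypothetical helper for parsing MovieLens lines; names are illustrative only. */
class MovieLensParser {

    /** One row of u.data (assumed tab-separated: user id, item id, rating, timestamp). */
    static final class Rating {
        final int userId, movieId, rating;
        final long timestamp;
        Rating(int userId, int movieId, int rating, long timestamp) {
            this.userId = userId;
            this.movieId = movieId;
            this.rating = rating;
            this.timestamp = timestamp;
        }
    }

    /** One row of u.user (assumed pipe-separated: id, age, gender, occupation, zip code). */
    static final class User {
        final int id, age;
        final String gender, occupation, zipcode;
        User(int id, int age, String gender, String occupation, String zipcode) {
            this.id = id;
            this.age = age;
            this.gender = gender;
            this.occupation = occupation;
            this.zipcode = zipcode; // kept as text: zip codes are not guaranteed to be numeric
        }
    }

    static Rating parseRating(String line) {
        String[] f = line.trim().split("\t");
        if (f.length != 4) {
            throw new IllegalArgumentException("Expected 4 tab-separated fields: " + line);
        }
        return new Rating(Integer.parseInt(f[0]), Integer.parseInt(f[1]),
                          Integer.parseInt(f[2]), Long.parseLong(f[3]));
    }

    static User parseUser(String line) {
        String[] f = line.trim().split("\\|"); // '|' must be escaped in a regex
        if (f.length != 5) {
            throw new IllegalArgumentException("Expected 5 pipe-separated fields: " + line);
        }
        return new User(Integer.parseInt(f[0]), Integer.parseInt(f[1]), f[2], f[3], f[4]);
    }

    public static void main(String[] args) {
        Rating r = parseRating("196\t242\t3\t881250949");
        User u = parseUser("1|24|M|technician|85711");
        System.out.println("user " + r.userId + " rated movie " + r.movieId + " with " + r.rating);
        System.out.println("user " + u.id + ": " + u.age + ", " + u.gender + ", " + u.occupation);
    }
}
```

The same parsed objects can feed either loader, which keeps the Neo4J and MySQL load times comparable.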
Ranking Criteria
Suppose we need to generate movie recommendations for user U1. You will use the following algorithms to rank the movies:
- Collaborative filtering: Analyze the similarities between user U1 and the other users who have also given high ratings (rating r or above, where r is the minimum rating entered by the user) to the same movies as U1. Rank these users based on their similarity (age, gender, occupation, zipcode) to U1. Extract the top x% of the users and explore the other movies (not rated by U1) rated highly by these most similar users. Rank the movies based on the overall ranking and display the top 10 movies as the recommendation list for U1.
- Item-based filtering: Analyze the different movies rated highly by U1. Try to find similarities (genre, release date) between them and extract other movies in the database that have similar characteristics. Rank the movies based on the overall rankings and display the top 10 movies.
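The exact similarity measures are left to you, so the following is only one possible sketch. It assumes a user similarity of one point per matching demographic attribute (with age counted as a match when within 5 years), a top-x% cut over the ranked users, and a Jaccard overlap of genre flags for item similarity. All names, weights, and thresholds here are assumptions, not requirements of the assignment.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Sketch of one possible similarity scheme; the weights and thresholds are assumptions. */
class SimilaritySketch {

    static final class User {
        final int id, age;
        final String gender, occupation, zipcode;
        User(int id, int age, String gender, String occupation, String zipcode) {
            this.id = id;
            this.age = age;
            this.gender = gender;
            this.occupation = occupation;
            this.zipcode = zipcode;
        }
    }

    /** Scores 0..4: one point per matching attribute (age matches if within 5 years). */
    static int userSimilarity(User a, User b) {
        int score = 0;
        if (Math.abs(a.age - b.age) <= 5) score++;
        if (a.gender.equals(b.gender)) score++;
        if (a.occupation.equals(b.occupation)) score++;
        if (a.zipcode.equals(b.zipcode)) score++;
        return score;
    }

    /** Returns the top percent (0-100) of candidates by similarity to target, most similar first. */
    static List<User> topPercent(User target, List<User> candidates, double percent) {
        List<User> sorted = new ArrayList<User>(candidates);
        if (sorted.isEmpty()) {
            return sorted;
        }
        // Descending by similarity; List.sort is stable, so ties keep their input order.
        sorted.sort((x, y) -> Integer.compare(userSimilarity(target, y), userSimilarity(target, x)));
        int keep = (int) Math.ceil(sorted.size() * percent / 100.0);
        keep = Math.min(sorted.size(), Math.max(1, keep)); // always keep at least one user
        return new ArrayList<User>(sorted.subList(0, keep));
    }

    /** Jaccard overlap of two equal-length genre flag arrays (shared genres / all genres present). */
    static double genreSimilarity(boolean[] a, boolean[] b) {
        int both = 0, either = 0;
        for (int i = 0; i < a.length; i++) {
            if (a[i] && b[i]) both++;
            if (a[i] || b[i]) either++;
        }
        return either == 0 ? 0.0 : (double) both / either;
    }

    public static void main(String[] args) {
        User u1 = new User(1, 24, "M", "technician", "85711");
        List<User> others = Arrays.asList(
            new User(2, 53, "F", "other", "94043"),
            new User(3, 23, "M", "writer", "32067"),
            new User(4, 24, "M", "technician", "43537"));
        for (User u : topPercent(u1, others, 50)) {
            System.out.println("user " + u.id + " similarity " + userSimilarity(u1, u));
        }
        // prints: user 4 similarity 3
        //         user 3 similarity 2
    }
}
```

Whatever scheme you choose, a small hand-checkable example like the one in main is a convenient basis for the manual walkthrough requested in the report.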
Report
Your project report should include the following:
- The data model used for Neo4J and MySQL
- Performance comparisons: load times and query response times for both algorithms
- A clear definition of the ranking criteria used for both algorithms. Work out both algorithms manually and show the output for a sample input.
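For the performance comparison, one simple option is to wrap each load or query call in a wall-clock timer. The sketch below uses System.nanoTime; the class name QueryTimer and the stand-in workload are hypothetical, and the lambda would be replaced by a call into your Neo4J or MySQL code.

```java
import java.util.function.Supplier;

/** Sketch of a wall-clock timer for load and query steps; names are hypothetical. */
class QueryTimer {

    /** Holds a task's result together with how long it took. */
    static final class Timed<T> {
        final T result;
        final long elapsedNanos;
        Timed(T result, long elapsedNanos) {
            this.result = result;
            this.elapsedNanos = elapsedNanos;
        }
        double millis() {
            return elapsedNanos / 1_000_000.0;
        }
    }

    /** Runs the task once and records the elapsed time. */
    static <T> Timed<T> time(Supplier<T> task) {
        long start = System.nanoTime();
        T result = task.get();
        return new Timed<T>(result, System.nanoTime() - start);
    }

    public static void main(String[] args) {
        // Stand-in workload; replace with a recommendation query against either database.
        Timed<Integer> t = time(() -> {
            int sum = 0;
            for (int i = 0; i < 1_000_000; i++) {
                sum += i % 7;
            }
            return sum;
        });
        System.out.printf("result=%d took %.3f ms%n", t.result, t.millis());
    }
}
```

A single measurement can be misleading because the first run includes JIT compilation and cold caches, so it is worth repeating each query several times and reporting an average or median.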
Submission Instructions
You will submit both projects using handin. Use the following directory structure:
Project5
|_ _ _ dataset (will contain the input files)
|_ _ _ src (will contain your code)
|_ _ _ lib (will contain the required jar files)
Project6
|_ _ _ dataset (will contain the input files)
|_ _ _ src (will contain your code)
|_ _ _ lib (will contain the required jar files)
|_ _ _ schema.sql (will contain table creation scripts)
Sample Run
Welcome to the Movie Recommendation System
Enter your query choice
1 collaborative filtering
2 item-based filtering
3 quit
Enter your choice: 1
Enter user id: 100
Enter minimum rating: 4
The recommended movies for user 100 are:
1. ......
2. ......
...
10. ......
Enter your query choice
1 collaborative filtering
2 item-based filtering
3 quit
Enter your choice: 2
Enter user id: 200
The recommended movies for user 200 are:
1. ......
2. ......
...
10. ......
Enter your query choice
1 collaborative filtering
2 item-based filtering
3 quit
Enter your choice: 3
Bye
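The interaction in the sample run could be driven by a console loop along the following lines. This is only a sketch: the class name RecommendMenu and the two recommend methods are hypothetical placeholders that return empty lists here, and a real program would call the Neo4J or MySQL queries instead. Input validation is omitted, so non-numeric ids would throw.

```java
import java.io.InputStream;
import java.io.PrintStream;
import java.util.Collections;
import java.util.List;
import java.util.Scanner;

/** Sketch of the console loop; the recommend* methods are hypothetical placeholders. */
class RecommendMenu {

    // Placeholder: would run the collaborative filtering query against the database.
    static List<String> recommendCollaborative(int userId, int minRating) {
        return Collections.emptyList();
    }

    // Placeholder: would run the item-based filtering query against the database.
    static List<String> recommendItemBased(int userId) {
        return Collections.emptyList();
    }

    static int readInt(Scanner sc, PrintStream out, String prompt) {
        out.print(prompt);
        return Integer.parseInt(sc.nextLine().trim());
    }

    static void printMovies(PrintStream out, int userId, List<String> movies) {
        out.println("The recommended movies for user " + userId + " are:");
        for (int i = 0; i < movies.size(); i++) {
            out.println((i + 1) + ". " + movies.get(i));
        }
    }

    /** Reads choices from in and writes prompts and results to out until the user quits. */
    static void run(InputStream in, PrintStream out) {
        Scanner sc = new Scanner(in);
        out.println("Welcome to the Movie Recommendation System");
        while (true) {
            out.println("Enter your query choice");
            out.println("1 collaborative filtering");
            out.println("2 item-based filtering");
            out.println("3 quit");
            out.print("Enter your choice: ");
            if (!sc.hasNextLine()) {
                return; // input ended without an explicit quit
            }
            String choice = sc.nextLine().trim();
            if (choice.equals("1")) {
                int userId = readInt(sc, out, "Enter user id: ");
                int minRating = readInt(sc, out, "Enter minimum rating: ");
                printMovies(out, userId, recommendCollaborative(userId, minRating));
            } else if (choice.equals("2")) {
                int userId = readInt(sc, out, "Enter user id: ");
                printMovies(out, userId, recommendItemBased(userId));
            } else if (choice.equals("3")) {
                out.println("Bye");
                return;
            } else {
                out.println("Unknown choice: " + choice);
            }
        }
    }

    public static void main(String[] args) {
        run(System.in, System.out);
    }
}
```

Passing the input and output streams into run, rather than using System.in and System.out directly, makes it possible to drive the same loop from a test or a script.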