Csc 8711, Databases and the Web - Project 5 & 6

Due: Sunday, May 3rd
Group Assignment. Maximum of 4 members per group.

Movie Recommendation System using Neo4J and MySQL

For this project, you will use the MovieLens dataset to develop a movie recommendation system in both Neo4J and MySQL and compare the performance of the two implementations. You will use two different criteria to generate recommendations. The project is done in two parts.

Two Parts

Part 1 (Neo4J)

You will model the dataset as a graph and use Neo4J to build the recommendation system. Neo4J will be used as an embedded database in a Java application. Write a separate Java program, called CreateNeo4JDB.java, that parses the input data and stores it as a graph in Neo4J (the database folder may be created inside the Project5 folder, alongside dataset, src, and lib). You are free to choose your data model, but make sure to explain it in the project report. You will then run the two algorithms described below against the created database to generate movie recommendations. Name this program Neo4JRecommend.java.

Part 2 (MySQL)

You will use the same data to build the recommendation engine using MySQL and Java. Provide a SQL script file, called schema.sql, that contains the CREATE TABLE statements for the relational database, and a Java program, CreateMySQLDB.java, that loads the dataset into the MySQL database. Use JDBC to connect to the database. Name the recommendation program MySQLRecommend.java.

Dataset

The dataset consists of movie ratings provided by a set of individuals for a set of movies. The details of each movie are given in the u.item file. Each movie record includes a release date, a video release date, and an IMDb URL, and belongs to one or more of the 19 genres listed in the u.genre file. The u.user file contains demographic information about the users: age, gender, occupation, and zip code. The ratings given by the users, along with their timestamps, are contained in the u.data file. The included README file has further information.
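As a starting point for the loaders, the sketch below parses one line of u.data. It assumes the layout described in the MovieLens 100K README (user id, item id, rating, timestamp, separated by tabs); check the README included with your copy of the dataset before relying on it. The class name Rating and its field names are hypothetical and chosen only for illustration.

```java
/** One rating record. Assumed u.data layout: user id, item id, rating, timestamp (tab-separated). */
final class Rating {
    final int userId;
    final int movieId;
    final int rating;
    final long timestamp;

    Rating(int userId, int movieId, int rating, long timestamp) {
        this.userId = userId;
        this.movieId = movieId;
        this.rating = rating;
        this.timestamp = timestamp;
    }

    /**
     * Parses a single line of u.data.
     * Throws IllegalArgumentException if the line does not have exactly four
     * tab-separated fields or if a field is not a number
     * (NumberFormatException is a subclass of IllegalArgumentException).
     */
    static Rating parse(String line) {
        String[] fields = line.trim().split("\t");
        if (fields.length != 4) {
            throw new IllegalArgumentException("Expected 4 tab-separated fields: " + line);
        }
        return new Rating(
                Integer.parseInt(fields[0]),
                Integer.parseInt(fields[1]),
                Integer.parseInt(fields[2]),
                Long.parseLong(fields[3]));
    }
}
```

Both CreateNeo4JDB.java and CreateMySQLDB.java could share a parser like this, so that any difference in load time comes from the database and not from the parsing code.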

Ranking Criteria

Suppose we need to generate movie recommendations for user U1. You will use the following algorithms to rank the movies:
  1. Collaborative filtering: Find the other users who gave high ratings (at least rating r) to the same movies that U1 rated highly. Rank these users by their similarity to U1 (age, gender, occupation, zip code). Take the top x% of the users and explore the other movies (not rated by U1) that these most similar users rated highly. Rank the movies based on the overall ranking and display the top 10 movies as the recommendation list for U1.

  2. Item-based filtering: Analyze the movies rated highly by U1. Find similarities (genre, release date) between them and extract other movies in the database that have similar characteristics. Rank the movies based on the overall rankings and display the top 10 movies as the recommendation list for U1.

You may ask the user to input the minimum rating (r) to be considered for the collaborative filtering case. Try experimenting with different values. The similarities need not be an exact match. You can assign weights to users based on the number of attributes that match those of the target user. A similar approach can be used to rank the movies based on similarities.
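The attribute-count weighting suggested above can be sketched as follows. This is one possible scheme, not a required one. The class names UserProfile and SimilarityWeights, the 5-year age tolerance, and the decision to count each matching attribute as one point are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical holder for the four demographic attributes found in u.user. */
final class UserProfile {
    final int id;
    final int age;
    final String gender;
    final String occupation;
    final String zipCode;

    UserProfile(int id, int age, String gender, String occupation, String zipCode) {
        this.id = id;
        this.age = age;
        this.gender = gender;
        this.occupation = occupation;
        this.zipCode = zipCode;
    }
}

final class SimilarityWeights {
    /** Assumption: ages within 5 years of each other count as a match. */
    static final int AGE_TOLERANCE = 5;

    /** Returns the number of matching attributes between two users (0 to 4). */
    static int userWeight(UserProfile target, UserProfile other) {
        int weight = 0;
        if (Math.abs(target.age - other.age) <= AGE_TOLERANCE) weight++;
        if (target.gender.equals(other.gender)) weight++;
        if (target.occupation.equals(other.occupation)) weight++;
        if (target.zipCode.equals(other.zipCode)) weight++;
        return weight;
    }

    /**
     * Returns the top `percent` percent of candidates, most similar first.
     * The cut-off is rounded up, and ties keep their input order because
     * List.sort is stable.
     */
    static List<UserProfile> topPercent(UserProfile target, List<UserProfile> candidates, double percent) {
        if (percent < 0 || percent > 100) {
            throw new IllegalArgumentException("percent must be between 0 and 100");
        }
        List<UserProfile> sorted = new ArrayList<>(candidates);
        sorted.sort(Comparator.comparingInt((UserProfile u) -> userWeight(target, u)).reversed());
        int keep = (int) Math.ceil(sorted.size() * percent / 100.0);
        return new ArrayList<>(sorted.subList(0, Math.min(keep, sorted.size())));
    }
}
```

The same pattern carries over to item-based filtering: count shared genres and a close release year instead of demographic attributes. Whatever weights you choose, state them explicitly in the report.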

Report

Your project report should include the following:
  • The data model used for Neo4J and MySQL
  • Performance comparisons: load times and query response times for both algorithms
  • A clear definition of the ranking criteria used for both algorithms. Work out both algorithms manually and show the output for a sample input.
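For the performance comparison, a small timing helper such as the sketch below can be wrapped around the load step and around each query. The class name Timing and the warm-up and run counts are assumptions; the idea is simply to measure wall-clock time with System.nanoTime and to average several runs, since the first few runs on the JVM are often slower than later ones.

```java
final class Timing {
    /** Runs the task once and returns the elapsed wall-clock time in milliseconds. */
    static double millis(Runnable task) {
        long start = System.nanoTime();
        task.run();
        return (System.nanoTime() - start) / 1_000_000.0;
    }

    /**
     * Runs the task `warmup` times without measuring, then `runs` times
     * with measuring, and returns the average measured time in milliseconds.
     */
    static double averageMillis(Runnable task, int warmup, int runs) {
        if (runs <= 0) {
            throw new IllegalArgumentException("runs must be positive");
        }
        for (int i = 0; i < warmup; i++) {
            task.run();
        }
        double total = 0;
        for (int i = 0; i < runs; i++) {
            total += millis(task);
        }
        return total / runs;
    }
}
```

A load step is normally timed once with Timing.millis, while a query can be timed with Timing.averageMillis using the same user id and rating threshold for both Neo4J and MySQL, so that the reported numbers are comparable.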

Submission Instructions

You will submit both projects using handin. Use the following directory structure:

Project5
|_ _ _ dataset (will contain the input files)
|_ _ _ src (will contain your code)
|_ _ _ lib (will contain the required jar files)

Project6
|_ _ _ dataset (will contain the input files)
|_ _ _ src (will contain your code)
|_ _ _ lib (will contain the required jar files)
|_ _ _ schema.sql (will contain table creation scripts.)

Sample Run


Welcome to the Movie Recommendation System

Enter your query choice

1 collaborative filtering
2 item-based filtering
3 quit

Enter your choice: 1
Enter user id: 100
Enter minimum rating: 4

The recommended movies for user 100 are: 

1.  ......
2.  ......
...
10. ......

Enter your query choice

1 collaborative filtering
2 item-based filtering
3 quit

Enter your choice: 2
Enter user id: 200

The recommended movies for user 200 are:

1.  ......
2.  ......
...
10. ......

Enter your query choice

1 collaborative filtering
2 item-based filtering
3 quit

Enter your choice: 3
Bye
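
The interaction shown in the sample run could be driven by a loop like the sketch below. The class name RecommendMenu and the Recommender interface are hypothetical; in Neo4JRecommend.java and MySQLRecommend.java the two interface methods would be backed by your own graph or SQL queries. The sketch does no input validation beyond rejecting unknown menu choices.

```java
import java.io.PrintStream;
import java.util.List;
import java.util.Scanner;

final class RecommendMenu {
    /** Hypothetical hook for the two ranking algorithms. */
    interface Recommender {
        List<String> collaborative(int userId, int minRating);
        List<String> itemBased(int userId);
    }

    /** Prints the prompt and reads one integer from the next input line. */
    private static int readInt(Scanner in, PrintStream out, String prompt) {
        out.print(prompt);
        return Integer.parseInt(in.nextLine().trim());
    }

    /** Runs the menu until the user chooses 3 or the input ends. */
    static void run(Scanner in, PrintStream out, Recommender rec) {
        out.println("Welcome to the Movie Recommendation System");
        while (true) {
            out.println();
            out.println("Enter your query choice");
            out.println();
            out.println("1 collaborative filtering");
            out.println("2 item-based filtering");
            out.println("3 quit");
            out.println();
            out.print("Enter your choice: ");
            // Treat end of input the same as choosing "quit".
            String choice = in.hasNextLine() ? in.nextLine().trim() : "3";
            if (choice.equals("3")) {
                out.println("Bye");
                return;
            }
            if (!choice.equals("1") && !choice.equals("2")) {
                out.println("Invalid choice");
                continue;
            }
            int userId = readInt(in, out, "Enter user id: ");
            List<String> movies;
            if (choice.equals("1")) {
                int minRating = readInt(in, out, "Enter minimum rating: ");
                movies = rec.collaborative(userId, minRating);
            } else {
                movies = rec.itemBased(userId);
            }
            out.println();
            out.println("The recommended movies for user " + userId + " are:");
            out.println();
            for (int i = 0; i < movies.size(); i++) {
                out.println((i + 1) + ". " + movies.get(i));
            }
        }
    }
}
```

Calling RecommendMenu.run(new Scanner(System.in), System.out, recommender) from main would start the loop on the console, where recommender is your own implementation of the interface.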