Analyzing Movielens Data Part 2

This is Part 2 of our “Analyzing the Movielens data” series.

In Part 1, we did the following:

  • reviewed the organization of the data
  • outlined a set of questions we’d like to ask
  • created Juxt workflows to integrate the data from three different data sources

Now, let’s look at a couple of the questions.

  1. What is the average rating for each movie broken down by gender?
  2. What are the top 10 movies that men rate higher than women?

Average Rating by Gender

We start by fetching from the user DB, where we have already integrated the Users, Movies and Ratings data. This is done with the Fetch from User DB module using the key “movielens-dataset”.

Juxt flow for calculating average ratings by gender

Average rating by gender can be computed using the built-in Pivot Table library module. Since we want the average rating, we set the Value property to “rating” and the Aggregation property to “mean”. We Group By “title” and split the “gender” column values into new columns, one per gender.
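For readers who want to see what the pivot computes, here is a minimal plain-Python sketch of the same operation. It is not how Juxt implements the module: the sample rows and the `pivot_mean` helper are hypothetical, and the output column names simply follow the pattern used in the expression later in this post (for example “rating-mean_gender_M”).

```python
from collections import defaultdict

# Hypothetical sample rows standing in for the integrated Movielens dataset.
rows = [
    {"title": "Movie A", "gender": "M", "rating": 4},
    {"title": "Movie A", "gender": "M", "rating": 5},
    {"title": "Movie A", "gender": "F", "rating": 3},
    {"title": "Movie B", "gender": "M", "rating": 2},
    {"title": "Movie B", "gender": "F", "rating": 4},
    {"title": "Movie B", "gender": "F", "rating": 5},
]


def pivot_mean(rows, value, group_by, column):
    """Group by `group_by`, split `column` values into columns, average `value`."""
    sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(lambda: defaultdict(int))
    for row in rows:
        sums[row[group_by]][row[column]] += row[value]
        counts[row[group_by]][row[column]] += 1

    table = {}
    for key, per_column in sums.items():
        # One output column per distinct value of `column` (here: M and F).
        table[key] = {
            f"{value}-mean_{column}_{col}": total / counts[key][col]
            for col, total in per_column.items()
        }
    return table


avg_by_gender = pivot_mean(rows, value="rating", group_by="title", column="gender")
for title, means in sorted(avg_by_gender.items()):
    print(title, means)
```

Each movie ends up with one row and one mean-rating column per gender, which is the shape the next step relies on.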

Finally, we render the results as an HTML Data Table:

Average ratings by gender

Top 10 Movies that Men rate higher than Women

Now that we have the average ratings by gender, we can do the following:

  1. Calculate the difference in ratings from men and women for each movie
  2. Sort the movies in descending order by the difference in ratings
  3. Take the top 10

Ratings Difference by Gender

The Calculate New Column module adds a new column to the dataset based on an expression we specify. The expression can be any mathematical expression that references existing columns. In this case, we simply subtract the women’s mean rating from the men’s to create a new column, “difference”:

rating-mean_gender_M - rating-mean_gender_F

The Top N module, with a Feature of “difference” and a Count of 10, gives us the 10 movies with the largest “difference” values, that is, the movies that men rate highest relative to women.
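The two steps above can be sketched in plain Python as well. This is only an illustration under assumptions: the input rows are made-up values, and `add_difference` and `top_n` are hypothetical helpers standing in for the Calculate New Column and Top N modules, not Juxt’s actual implementation.

```python
# Hypothetical pivot output: mean rating per movie, one column per gender.
avg_by_gender = [
    {"title": "Movie A", "rating-mean_gender_M": 4.5, "rating-mean_gender_F": 3.0},
    {"title": "Movie B", "rating-mean_gender_M": 2.0, "rating-mean_gender_F": 4.5},
    {"title": "Movie C", "rating-mean_gender_M": 3.5, "rating-mean_gender_F": 3.25},
]


def add_difference(rows):
    """Return copies of the rows with a new "difference" column (men minus women)."""
    return [
        {**row, "difference": row["rating-mean_gender_M"] - row["rating-mean_gender_F"]}
        for row in rows
    ]


def top_n(rows, feature, count):
    """Sort rows by `feature` in descending order and keep the first `count`."""
    return sorted(rows, key=lambda row: row[feature], reverse=True)[:count]


# Count is 2 here only because the sample has three movies; the post uses 10.
top = top_n(add_difference(avg_by_gender), feature="difference", count=2)
for row in top:
    print(row["title"], row["difference"])
```

Sorting in descending order matters: a positive “difference” means men rated the movie higher, so the largest values come first. A movie rated by only one gender would have no value in one of the two columns and would need to be filtered out before this step.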

And the results are in:

Ratings Difference by Gender

Please check out our screencast of building these workflows in Juxt.io:
