This is Part 2 of our “Analyzing the Movielens data” series.

In Part 1, we did the following:

- reviewed the organization of the data
- outlined a set of questions we’d like to ask
- created Juxt workflows to integrate the data from three different data sources

Now, let’s look at a couple of the questions.

- What is the average rating for each movie broken down by gender?
- What are the top 10 movies that men rate higher than women?

## Average Rating by Gender

We start by fetching from the user DB, where we have already integrated the Users, Movies and Ratings data. This is done with the **Fetch from User DB** module using the key “movielens-dataset”.

Average rating by gender can be computed using the built-in **Pivot Table** library module. Since we want the average rating, we set the **Value** property to “rating” and the **Aggregation** property to “mean”. We **Group By** “title” and use “gender” as the **Column** property, so that each gender value becomes its own column.
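For readers who think in code, the same pivot can be sketched outside Juxt with pandas. This is only an illustration of the idea, not how the **Pivot Table** module is implemented; the tiny `ratings` frame is made-up sample data standing in for the integrated “movielens-dataset”, and the function name is hypothetical.

```python
import pandas as pd

# Made-up sample rows standing in for the integrated dataset (assumption).
ratings = pd.DataFrame({
    "title":  ["Alien", "Alien", "Alien", "Grease", "Grease", "Grease"],
    "gender": ["M", "M", "F", "M", "F", "F"],
    "rating": [5, 4, 3, 2, 4, 5],
})


def mean_rating_by_gender(df: pd.DataFrame) -> pd.DataFrame:
    """Average rating per title, with one column per gender value."""
    return df.pivot_table(
        values="rating",   # Value property
        index="title",     # Group By property
        columns="gender",  # Column property
        aggfunc="mean",    # Aggregation property
    )


mean_ratings = mean_rating_by_gender(ratings)
print(mean_ratings)
```

Note that pandas names the resulting columns “F” and “M”, whereas Juxt produces longer column names such as “rating-mean_gender_M”.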

Finally, we render the results as an **HTML Data Table**.

## Top 10 Movies that Men rate higher than Women

Now that we have the average ratings by gender, we can do the following:

- Calculate the difference in ratings from men and women for each movie
- Sort the movies in descending order by the difference in ratings
- Take the top 10

The **Calculate New Column** module adds a new column to the dataset based on an expression we specify. The expression can be any mathematical expression that references existing columns. In this case, we simply subtract the women’s mean rating from the men’s mean rating to create a new column, “difference”:

```
rating-mean_gender_M - rating-mean_gender_F
```

The **Top N** module, with a **Feature** of “difference” and a **Count** of 10, gives us the 10 movies with the largest difference in ratings, that is, the movies men rate highest relative to women.
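As a rough equivalent in pandas, the difference column and the top-N selection might look like the sketch below. The `mean_ratings` values are invented for illustration, the short column names “M” and “F” stand in for Juxt’s “rating-mean_gender_M” and “rating-mean_gender_F”, and `top_n_by_difference` is a hypothetical helper rather than anything Juxt provides.

```python
import pandas as pd

# Invented per-movie mean ratings, shaped like a pivot by gender (assumption).
mean_ratings = pd.DataFrame(
    {"F": [3.0, 4.5, 4.0], "M": [4.5, 2.0, 4.25]},
    index=pd.Index(["Alien", "Grease", "Fargo"], name="title"),
)


def top_n_by_difference(df: pd.DataFrame, n: int = 10) -> pd.DataFrame:
    """Add a men-minus-women 'difference' column and keep the n largest rows."""
    out = df.copy()
    out["difference"] = out["M"] - out["F"]
    # nlargest sorts in descending order and keeps the first n rows,
    # covering both the sort step and the "take the top N" step.
    return out.nlargest(n, "difference")


top = top_n_by_difference(mean_ratings, n=2)
print(top)
```

A positive “difference” means men rated the movie higher on average; swapping the operands (or taking the smallest values instead) would give the movies women rate higher than men.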

And the results are in:

Please check out our screencast of building these workflows in Juxt.io:
