Analyzing the Movielens Data – Part 4

This is Part 4 of our “Analyzing the Movielens data” series.

In Part 3, we answered the following question by building a Juxt flow:

  • List only the good movies – the ones that got an average rating of 4.3 or higher

In that example, we used the Select module to create custom filters.

Let’s build on that to address the next question:

  • What genres of movies do Programmers rate the most?

As always, let’s lay out the logical steps needed to answer it:

  • Filter the data to include only the movies rated by Programmers (occupation code = 12).
  • Group the filtered data by genres.
  • Iterate through the groups and count the number of items in each bucket.
  • Sort the genre buckets by count and take the top 10 genres.
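
Juxt flows are built visually, so the following is not Juxt code. It is only a rough Python sketch of the same four steps, assuming each rating is a record with hypothetical `genres` and `occupation` fields and using a few made-up sample rows.

```python
from collections import Counter

PROGRAMMER = 12  # occupation code for Programmers in the dataset

# Hypothetical sample rows; the real flow fetches these from the user DB.
ratings = [
    {"title": "Movie A", "genres": "Action|Sci-Fi", "occupation": 12, "rating": 5},
    {"title": "Movie B", "genres": "Comedy", "occupation": 12, "rating": 3},
    {"title": "Movie C", "genres": "Action|Sci-Fi", "occupation": 12, "rating": 4},
    {"title": "Movie D", "genres": "Drama", "occupation": 7, "rating": 4},
    {"title": "Movie B", "genres": "Comedy", "occupation": 7, "rating": 2},
    {"title": "Movie A", "genres": "Action|Sci-Fi", "occupation": 12, "rating": 4},
]


def top_genres(rows, occupation, n=10):
    """Filter by occupation, group by genres, count, and return the top n."""
    # Step 1: keep only the ratings made by the given occupation.
    selected = (row for row in rows if row["occupation"] == occupation)
    # Steps 2 and 3: bucket by genre combination and count each bucket.
    counts = Counter(row["genres"] for row in selected)
    # Step 4: sort by count, largest first, and keep the top n.
    return counts.most_common(n)


result = top_genres(ratings, PROGRAMMER)
print(result)
```

Each entry in the result pairs a genre combination with the number of ratings Programmers gave to it.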

The data flow for this is shown below.

Juxt Flow - Movies Programmers Rate Most

We first filter the dataset using the now-familiar Select module with a custom filter. The filter is simple: it looks up the occupation field and, if the value equals 12 (the occupation code for Programmers in the dataset), passes the record through to the next stage of analysis.

The figure below shows the filter logic.

Filter Logic to Select Only Programmer Ratings

The next step is to group the filtered dataset into buckets by genre. We do this with the Group By module, using genres as the grouping column. There are 294 genre combinations in the dataset, so the Group By operation creates 294 buckets, each containing the records that belong to one specific genre combination.

Now we need to iterate through the buckets and count the number of records in each one. We do that with the Collect module, which works much like Select: it takes in a collection of data and applies the user (or template) logic to each item in the collection. You simply pick the logic, called a Collector, from the drop-down menu.
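
To make the Group By and Collect idea concrete, here is a minimal Python sketch. The function names `group_by`, `collect` and `count_collector` are our own illustrations, not Juxt APIs, and the rows are made up.

```python
from collections import defaultdict


def group_by(rows, column):
    """Split rows into buckets keyed by the value found in `column`."""
    buckets = defaultdict(list)
    for row in rows:
        buckets[row[column]].append(row)
    return dict(buckets)


def count_collector(key, bucket):
    """Collector logic: count the entries in a bucket and name the result."""
    return {"genres": key, "count": len(bucket)}


def collect(buckets, collector):
    """Apply the chosen collector to every bucket in the collection."""
    return [collector(key, bucket) for key, bucket in buckets.items()]


# Hypothetical rows that have already passed the Programmer filter.
rows = [
    {"title": "Movie A", "genres": "Comedy"},
    {"title": "Movie B", "genres": "Drama"},
    {"title": "Movie C", "genres": "Comedy"},
]

buckets = group_by(rows, "genres")
counts = collect(buckets, count_collector)
print(counts)
```

Swapping `count_collector` for another function with the same arguments changes what is computed per bucket without touching the grouping step, which is roughly what picking a different Collector from the menu does.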

The figure below shows the collector logic for our use case. We simply look up each bucket, count the number of entries, and assign a name (key) to the result.

Collector Logic to Count Ratings in Every Data Group

Finally, the Top N module sorts the results by count and outputs the top 10 to an HTML table.

Results - Top Genres Programmers Rate

A two-minute video of the discussion can be seen here.

Analyzing the Movielens Data – Part 3

This is Part 3 of our “Analyzing the Movielens data” series.

In Part 2, we answered the following by building Juxt flows:

  • What is the average rating for each movie broken down by gender?
  • What are the top 10 movies that men rate higher than women?

Continuing on, let’s address the next one:

  • List only the good movies – the ones that got an average rating of 4.3 or higher

In the process, we’ll go over how to build custom filters using the Select building block.

The logical steps to address this question are:

  • Calculate the average rating for every movie title (total aggregate, not broken down by gender)
  • Select (filter) only the movies that meet the 4.3 cut-off.
Juxt Flow - Movies with Ratings > 4.3

As before, we start by fetching the data from the user DB with the Fetch from User DB module.

The average rating per title can be calculated using the built-in Rollup library module. (Recall that we used a Pivot Table in the last example to break the ratings down further by gender; here we have a simpler problem.)

The Rollup module outputs just two parameters: Title (the group-by parameter) and Mean-Rating (the aggregated feature).
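
As a rough illustration of what such a rollup computes, here is a small Python sketch. The helper `rollup_mean` and the sample ratings are hypothetical and stand in for the Rollup module; they are not its actual implementation.

```python
from collections import defaultdict
from statistics import mean


def rollup_mean(rows, group_col, value_col):
    """Group rows by `group_col` and average `value_col` within each group."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_col]].append(row[value_col])
    # One output entry per group: the group key plus the aggregated value.
    return [
        {group_col: key, "mean-rating": mean(values)}
        for key, values in groups.items()
    ]


# Hypothetical sample ratings.
ratings = [
    {"title": "Movie A", "rating": 5},
    {"title": "Movie B", "rating": 3},
    {"title": "Movie A", "rating": 4},
]

averages = rollup_mean(ratings, "title", "rating")
print(averages)
```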

Now we need a mechanism to go over each entry and compare it against our selection criterion: mean > 4.30.

We use the Select module for that. The Select module takes in the entries row by row and applies the user-specified filter logic to each. Our logic here is simple, but you can apply rather sophisticated logic with multiple parameters using this mechanism.

In addition to the input data, the Select module has two other inputs: Context Parameters, which lets users provide extra parameters needed by the logic, and a drop-down menu for picking the filter.
In our example, we use the filter called good movie selector.

Selector Logic: Juxt uses key-value stores. We use the Lookup module with a key of mean-rating and feed its output to a comparator block, If True, which compares the mean rating value with the preset value from Context Parameters, in this case the number 4.3.
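
In plain Python, the selector logic might look roughly like the sketch below. The names `select`, `good_movie_selector` and the `cutoff` context key are assumptions made for illustration. The comparison uses `>=` to match the "4.3 or higher" wording of the question; switch it to `>` if a strict comparison is intended.

```python
def good_movie_selector(entry, context):
    """Look up mean-rating and compare it with the cutoff from the context."""
    return entry["mean-rating"] >= context["cutoff"]


def select(entries, selector, context):
    """Pass each entry to the selector and keep those for which it is true."""
    return [entry for entry in entries if selector(entry, context)]


# Hypothetical rollup output: one key-value entry per movie title.
averages = [
    {"title": "Movie A", "mean-rating": 4.5},
    {"title": "Movie B", "mean-rating": 4.3},
    {"title": "Movie C", "mean-rating": 4.1},
]

good_movies = select(averages, good_movie_selector, {"cutoff": 4.3})
print([movie["title"] for movie in good_movies])
```

Keeping the cutoff in the context rather than hard-coding it in the selector means the same filter can be reused with a different threshold.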

Juxt Flow - Selector Logic to Filter Movies > 4.3

Finally, we render the results as an HTML Data Table.

Results - Movies with Ratings > 4.3

A two-minute video of our discussion can be seen here.