Industrial Analytics with Juxt

Machine-generated manufacturing data is growing exponentially in volume, variety and velocity. This creates a huge opportunity for manufacturers to increase operational efficiency, improve productivity and gain a competitive advantage. Equipment data is generated at every level, in different formats and granularities, and it is well suited to storage and analysis with modern data technology.

However, the vast majority of this data still goes unused, because today’s tools demand highly specialized technical expertise that most teams lack. Juxt makes it easy for non-technical process experts to unearth the value of their operational data.

Juxt is a data and process integration system that enables industrial companies to apply predictive analytics to preventive and prescriptive maintenance, yield optimization and operational efficiency.

Challenges to a Successful Data Driven Project

Many companies hesitate to invest significant money and time upfront in data-driven operations while they are unsure of the returns.

Companies are concerned about having to build an in-house team of technical experts for maintenance and enhancements after the initial implementation.

Companies that do invest in data projects often find that their implementations are a ‘black box’, unable to incorporate new learnings or evolving requirements.

Juxt Benefits

Modular – Juxt offers a highly modular approach to building data apps. This enables companies to start simple and extend the functionality in phases as the value becomes apparent. It also lets users respond quickly to changing requirements as they gain insights from their data.

‘No Code’ Visual Designer – Juxt reduces the need for a large team of technical experts through an easy-to-use visual, drag, drop and configure UX. With Juxt, process experts put together a process workflow from functional, Lego-like blocks, tie them together and hit run. They can integrate data from a variety of sources, run predictive machine learning algorithms, visualize the results and drive operational responses.

Distributed Deployment – Juxt is highly scalable and can be deployed across the hierarchy. Analytics can run at the edge where the data is generated, in the cloud, or in a hybrid of the two. Juxt has a very small footprint, with deployment options for embedded systems at the edge, close to the sensors, enabling real-time analytics right at the source. At the same time, Juxt scales to big data, with connectors to Hadoop data stores and compute clusters.

Juxt Technical Services

Juxt offers technical consulting services to kickstart your data projects. Please contact info@juxt.io to chat.

 

Analyzing the Movielens Data – Part 4

This is Part 4 of our “Analyzing the Movielens data” series.

In Part 3, we answered the following by building Juxt flows –

  • List only the good movies – the ones that got an average rating of 4.3 or higher

In that example, we went over the select module to create custom filters.

Let’s build on that to address the next question:

  • What genres of movies do Programmers rate the most?

As always, let’s lay down the logical steps needed to address this:

  • Filter the data to include only the movies rated by Programmers (occupation code = 12)
  • Group the filtered data by Genres.
  • Iterate through each group and count the number of items in each of the buckets
  • Sort the Genre buckets by the count and derive the top 10 genres
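
For readers who think in code, the four steps above can be sketched in plain Python. This is only an illustration of the logic, not how Juxt implements its modules: the function name `top_genres`, the row layout and the sample rows are all made up for this example.

```python
from collections import Counter

PROGRAMMER = 12  # occupation code for Programmers, as stated above


def top_genres(rows, occupation=PROGRAMMER, n=10):
    """Return the n most-rated genre combinations for one occupation."""
    # Step 1: keep only the ratings made by the chosen occupation.
    selected = [row for row in rows if row["occupation"] == occupation]
    # Steps 2 and 3: bucket by the genres string and count each bucket.
    counts = Counter(row["genres"] for row in selected)
    # Step 4: sort buckets by count (descending) and keep the first n.
    return counts.most_common(n)


# Made-up sample rows, for illustration only.
sample = [
    {"title": "Toy Story (1995)", "genres": "Animation|Children's|Comedy", "occupation": 12},
    {"title": "Toy Story (1995)", "genres": "Animation|Children's|Comedy", "occupation": 12},
    {"title": "Heat (1995)", "genres": "Action|Crime|Thriller", "occupation": 12},
    {"title": "Heat (1995)", "genres": "Action|Crime|Thriller", "occupation": 7},
]

for genres, count in top_genres(sample):
    print(genres, count)
```

Note that the grouping key is the full genres string, so each combination of genres forms its own bucket.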

The Juxt data flow for this is shown below.

Juxt Flow – Movies Programmers Rate Most

We first filter the data set using the now-familiar Select module with a custom filter. The filter is a simple one: it looks up the occupation and, if it equals 12 (the occupation code for Programmers in the dataset), passes the record through to the next stage of analysis.

The figure below shows the filter logic.

Filter Logic to Select Only Programmer Ratings

The next step is to group the filtered dataset into buckets by genre. This is done with the Group By module, using genres as the column to group by. There are 294 genre combinations in the dataset, so the Group By operation creates 294 buckets, each containing the data that belongs to one specific genre combination.

Now we need to iterate through those buckets and count the number of records in each. We do that with the Collect module. Collect works much like Select: it takes in a collection of data and applies the user (or template) logic to each item in the collection. You simply pick the user logic, or Collector, from the drop-down menu.

The figure below shows the collector logic for our use case. Here, we simply look up each bucket, count the number of entries and assign a name (key) to the result.

Collector Logic to Count Ratings in Every Data Group

Finally, the Top N module outputs the top 10 results, sorted by count, to an HTML table.

Results – Top Genres Programmers Rate

A two minute video of the discussion can be seen here

Analyzing the Movielens Data – Part 3

This is Part 3 of our “Analyzing the Movielens data” series.

In Part 2, we answered the following by building Juxt flows:

  • What is the average rating for each movie broken down by gender?
  • What are the top 10 movies that men rate higher than women?

Continuing on, let’s address the next one:

  • List only the good movies – the ones that got an average rating of 4.3 or higher

In the process of doing this, we’ll go over how to build custom filters using the Select building block.

The logical steps to address this question are:

  • Calculate the average rating for every movie title (total aggregate, not broken down by gender)
  • Select (filter) only the movies that meet the 4.3 cut-off.
Juxt Flow – Movies with Ratings > 4.3

As before, we start by fetching the data from the user DB with Fetch from User DB.

Average rating per title can be calculated using the built-in Rollup library module. (Recall that we used a Pivot Table in the last example to further break the average down by gender; the problem here is simpler.)

The Rollup module outputs just two parameters – Title (Group by parameter) and Mean-Rating (aggregated feature).

Now, we need a mechanism to go over each of the entries and compare it against our selection criterion: mean > 4.30.

We use the Select module for that. The Select module takes in the entries row by row and applies the user-specified filter logic. Our logic here is simple, but you can apply rather sophisticated logic with multiple parameters using this mechanism.

In addition to the input data, the Select module has two other inputs: Context Parameters, which lets users supply extra parameters needed by the logic, and a drop-down menu for picking the filter.
In our example, we use the filter called good movie selector.

Selector Logic – Juxt uses key-value stores. We use the Lookup module with the key mean-rating and feed its output to the comparator block If True, which compares the mean rating with the preset value from Context Parameters, in this case 4.3.

Juxt Flow – Selector Logic to Filter Movies > 4.3
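
As a rough plain-Python analogy (not Juxt's actual implementation), the Rollup and Select steps might look like the sketch below. The names `rollup_mean`, `select`, the context key `cutoff` and the sample rows are assumptions made for this example. The comparison uses "at least 4.3", following the "4.3 or higher" wording of the question; a strict "greater than" would be a one-character change.

```python
from collections import defaultdict
from statistics import mean


def rollup_mean(rows):
    """Group by title and average the rating, like the Rollup step."""
    ratings_by_title = defaultdict(list)
    for row in rows:
        ratings_by_title[row["title"]].append(row["rating"])
    return [
        {"title": title, "mean-rating": mean(values)}
        for title, values in ratings_by_title.items()
    ]


def good_movie_selector(entry, context):
    """Look up mean-rating and compare it with the cut-off from the context."""
    return entry["mean-rating"] >= context["cutoff"]


def select(entries, predicate, context):
    """Apply the chosen filter to each entry, row by row."""
    return [entry for entry in entries if predicate(entry, context)]


# Made-up sample rows, for illustration only.
rows = [
    {"title": "A", "rating": 5},
    {"title": "A", "rating": 4},
    {"title": "A", "rating": 4},
    {"title": "B", "rating": 4},
    {"title": "B", "rating": 3},
]

good = select(rollup_mean(rows), good_movie_selector, {"cutoff": 4.3})
print([entry["title"] for entry in good])
```

Passing the cut-off in through a context dictionary, rather than hard-coding it in the filter, mirrors the role that Context Parameters play in the Select module.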

Finally, we render the results as an HTML Data Table.

Results – Movies with Ratings > 4.3

A two minute video of our discussion can be seen here

Analyzing Movielens Data Part 2

This is Part 2 of our “Analyzing the Movielens data” series.

In Part 1, we did the following:

  • reviewed the organization of the data
  • outlined a set of questions we’d like to ask
  • created Juxt workflows to integrate the data from three different data sources

Now, let’s look at a couple of the questions.

  1. What is the average rating for each movie broken down by gender?
  2. What are the top 10 movies that men rate higher than women?

Average Rating by Gender

We start by fetching from the user DB, where we have already integrated the Users, Movies and Ratings data. This is done with the Fetch from User DB module using the key “movielens-dataset”.

Juxt flow for calculating average ratings by gender

Average rating by gender can be computed using the built-in Pivot Table library module. Since we want the average Ratings, we set the Value property to “rating” and the Aggregation property to “mean”. We Group By the “title”, and split the “gender” Column values into new columns.
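
To make the pivot concrete, here is a minimal plain-Python sketch of the same idea: group by title, split gender into columns, and take the mean of the ratings. It is an analogy only, not the Pivot Table module itself. The function name `pivot_mean_rating` and the sample rows are invented; the column names follow the `rating-mean_gender_M` pattern used in the expression later in this post.

```python
from collections import defaultdict
from statistics import mean


def pivot_mean_rating(rows):
    """Mean rating per title, with one column per gender value."""
    # Collect the ratings for every (title, gender) cell of the pivot.
    cells = defaultdict(list)
    for row in rows:
        cells[(row["title"], row["gender"])].append(row["rating"])

    # Turn each cell into a column named after the gender value.
    table = defaultdict(dict)
    for (title, gender), values in cells.items():
        table[title][f"rating-mean_gender_{gender}"] = mean(values)
    return dict(table)


# Made-up sample rows, for illustration only.
rows = [
    {"title": "X", "gender": "F", "rating": 4},
    {"title": "X", "gender": "F", "rating": 5},
    {"title": "X", "gender": "M", "rating": 3},
    {"title": "Y", "gender": "M", "rating": 2},
]

for title, columns in pivot_mean_rating(rows).items():
    print(title, columns)
```

A title rated by only one gender simply ends up with a single column in this sketch; how the real module fills such gaps is not covered here.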

Finally, we render the results as an HTML Data Table, which is as follows:

Average ratings by gender

Top 10 Movies that Men rate higher than Women

Now that we have the average ratings by gender, we can do the following:

  1. Calculate the difference in ratings from men and women for each movie
  2. Sort the movies in descending order by the difference in ratings
  3. Take the top 10
Ratings Difference by Gender

The Calculate New Column module adds a new column to the dataset based on an expression we specify. The expression can be any mathematical equation that references existing columns. In this case, we simply subtract the women's mean rating from the men's to create a new column, “difference”:

rating-mean_gender_M - rating-mean_gender_F

The Top N module, with a Feature of “difference” and a Count of 10, gives us the 10 movies that men rate highest relative to women.
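
The two steps can be mimicked in a few lines of plain Python. This is a sketch under assumptions, not the modules' real code: `add_difference` and `top_n` are hypothetical helper names, and the mean ratings below are made-up numbers rather than results from the dataset.

```python
def add_difference(rows):
    """Add a 'difference' column: men's mean rating minus women's."""
    return [
        {**row, "difference": row["rating-mean_gender_M"] - row["rating-mean_gender_F"]}
        for row in rows
    ]


def top_n(rows, feature, count):
    """Sort by the given feature in descending order and keep the first count rows."""
    return sorted(rows, key=lambda row: row[feature], reverse=True)[:count]


# Made-up mean ratings, for illustration only.
averages = [
    {"title": "P", "rating-mean_gender_M": 4.5, "rating-mean_gender_F": 3.0},
    {"title": "Q", "rating-mean_gender_M": 3.5, "rating-mean_gender_F": 4.0},
    {"title": "R", "rating-mean_gender_M": 4.0, "rating-mean_gender_F": 3.75},
]

for row in top_n(add_difference(averages), "difference", 10):
    print(row["title"], row["difference"])
```

Because the sort is on the signed difference, movies that women rate higher than men get negative values and fall to the bottom of the list.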

And the results are in:

Ratings Difference by Gender

Please check out our screencast of building these workflows in Juxt.io:

Analyzing Movielens Data Part 1

This might be familiar – a perennial question that keeps coming up in our home – What movie to watch tonight?

There are a ton of movie ratings from actual users in the Movielens dataset. Wouldn’t it be great to use all this data to help us pick the right movie every time?
We’ll use a Movielens dataset that contains 1,000,209 anonymous ratings of 3,900 movies made by 6,040 MovieLens users (data explained here).

From this, let’s say we want to ask the following –

  1. What is the average rating for each movie broken down by gender?
  2. List only movies that received at least 100 ratings
  3. Of those, list only the good ones – movies that got an average rating of 4.3 or higher
  4. List the top 10 movies that, on average, men rate higher than women
  5. What genres of movies do programmers like?

Organization of Data

The data is distributed across three disparate data stores.

Movie data is in a Comma Separated Value (CSV) file in Amazon S3. It contains MovieID, Title and Genres.

Users data is in a CSV file in a Dropbox data store. It contains UserID, Gender, Age, Occupation and Zip-code.

Ratings data is in a relational database table in PostgreSQL. It contains UserID, MovieID, Rating and a Timestamp.

Data Integration

Since the data is spread across multiple silos (Amazon S3, Dropbox, PostgreSQL) and multiple formats (CSV files, a relational table), we need to combine the relevant data into a form that is easier to work with.

In the figure below, functional modules are wired together to create the data integration. At a high level:

  • We load the data from the different sources
  • We combine (Join) them on a common feature or column to create a virtual data source: ratings are joined to users on user-id and to movies on movie-id
  • We store the combined data in a user DB (in-memory cache) with the ID “movielens-dataset”, from which subsequent modules can fetch it for further analytics
Data Import Flow
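
The join logic itself is easy to picture in plain Python. The sketch below is only an analogy: it skips the actual loading from Amazon S3, Dropbox and PostgreSQL and starts from small in-memory lists with made-up rows, it assumes an inner join (ratings without a matching user or movie are dropped), and it uses an ordinary dictionary as a stand-in for the user DB. The names `index_by` and `integrate` are hypothetical.

```python
def index_by(rows, key):
    """Build a lookup table from a key column to its row."""
    return {row[key]: row for row in rows}


def integrate(ratings, users, movies):
    """Join ratings to users on user-id and to movies on movie-id."""
    users_by_id = index_by(users, "user-id")
    movies_by_id = index_by(movies, "movie-id")
    combined = []
    for rating in ratings:
        user = users_by_id.get(rating["user-id"])
        movie = movies_by_id.get(rating["movie-id"])
        if user is None or movie is None:
            continue  # assumed inner join: skip ratings with no match
        combined.append({**rating, **user, **movie})
    return combined


# Made-up rows standing in for the three data sources.
movies = [{"movie-id": 1, "title": "Toy Story (1995)", "genres": "Animation|Children's|Comedy"}]
users = [{"user-id": 10, "gender": "F", "age": 25, "occupation": 12}]
ratings = [
    {"user-id": 10, "movie-id": 1, "rating": 5},
    {"user-id": 99, "movie-id": 1, "rating": 3},  # no such user, so it is dropped
]

# A plain dict stands in for the in-memory user DB.
user_db = {"movielens-dataset": integrate(ratings, users, movies)}
print(user_db["movielens-dataset"])
```

Once the combined rows sit under one key, every later analysis can start from a single fetch instead of repeating the joins.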

Please check out the video version of the data integration

In the next post, we will get into building the flows needed to answer the questions we started with.

Introducing the Juxt Data Workbench

Juxt is an interactive data workbench in the cloud that enables business users to build sophisticated data applications without writing any code. Juxt lets you build data apps much as you would describe a process on a whiteboard: identify the steps needed and sequence them into a complete end-to-end flow. For example, you could send promotional email offers to a target set of customer profiles based on past behavior, or forecast product demand based on seasonal trends.

Juxt – an Interactive Data Workbench

In Juxt, you create apps by dragging functional blocks from our component library onto a design canvas, wiring them together and pressing the “Run” button. Our built-in component library includes connector blocks for various data sources & types, components for data munging, augmentation & aggregation, statistical & predictive machine learning algorithms, and web APIs to popular online services.

Explore, Integrate and Operationalize Data

You can also build your own functional blocks using Juxt’s built-in library components or wrap existing R, Python, JavaScript & Clojure code assets. This is huge, because all the interesting work you’ve already done can start being applied across your company right away. Every block that you create becomes usable in all your projects and across your entire organization. Wheel reinvention problem … solved!

Once you’re done creating your cool data app, publish it and show your peers & users what a rockstar you are!!