# Pareto charts in CRM Analytics

In CRMA we have the flexibility to build lots of different charts to analyze our KPIs. In this article, we’ll learn how to create a Pareto chart, but first of all: What is a Pareto chart? And why would I need this chart in my analysis?

The Pareto chart (named after the Italian economist, Vilfredo Pareto) is a type of chart that uses combined bars and a line graph, where we plot our metric in descending order on the bars at the left side, and represent the total cumulative percentage of this (or a different) metric in the line.

This chart is used to demonstrate and highlight the most important ones among a set of factors. Taking the example of quality control, Pareto charts are useful to find, for example, defects to prioritize in order to observe the greatest overall improvement. Usually, the greater the number of factors (in this case, case types) analyzed, the clearer we can observe the Pareto principle, also known as the “80/20” rule, which states that for many different outcomes, roughly 80% of consequences or results will come from 20% of causes (known as the “vital few”). This chart is one of the seven basic tools of quality control and is used in several different industries and different control techniques, like Six Sigma and TQM.

So now that we know all this cool information about this chart, let’s learn how to create this using Salesforce CRM Analytics. It’s relatively easy to build one and we will require basically 2 things: Combo charts and windowing functions.

You can follow the tutorial using this dataset for your steps.

We’ll create 2 use cases. The first one will illustrate only the case count by type and the cumulative percentage of this count among case types. The second use case will be the average time to close a case by case type, and the cumulative percentage of case count, to illustrate the Pareto chart using 2 different metrics. This chart can only be built by using a custom SAQL query as it involves post-projection groupbys and projection. Also, it can be done in fewer steps, but I want to illustrate well the main concepts to build this.

### Use case 1: Case count by case type

In this first use case, we will generate a Pareto chart displaying the count of cases by case type, in vertical bars, in descending order, and the cumulative percentage of each type, until 100%, will be the line of our combo chart.

**Step 1** – Group your bars by your metric, and generate the count for rows and the cumulative count for all rows in the dataset. For this, we will use our first window function. Notice that we are not partitioning the function, cause we want the cumulative rows of the whole dataset, but in window functions, you can also partition your calculations within the groupings. Here is the SAQL code for this step:

```
q = load "Pareto_data";
q = group q by 'Case_Type';
q = foreach q generate 'Case_Type', count() as 'count', sum(count()) over ([..] partition by all) as 'Total rows';
```

And the data result:

**Step 2** – Divide the count of rows by the Total rows column. We’ll project, in a new line, the previous fields we generated, and we’ll calculate a new field for the division. This division’s purpose is to get the percentage of each Case Type count against the total. Here is the SAQL code for this step:

```
q = load "Pareto_data";
q = group q by 'Case_Type';
q = foreach q generate 'Case_Type', count() as 'count', sum(count()) over ([..] partition by all) as 'Total rows';
q = foreach q generate 'Case_Type', sum('count') as 'count', sum('count')/sum('Total rows') as 'Percentage by Case Type';
```

And the data result:

**Step 3** – Rollup the case percentage. Now, we’ll use another window function to roll up our previously calculated Percentage by Case Type field and order it using the count of rows field, in a decrescent order (to generate the bars at the left and the cumulative line at the right). Notice that we are regrouping our data, and if you try to do this step without regrouping, it will result in an error, since windowing functions only work in grouped data. Also, notice that this window function uses the [..0] syntax, meaning that we are cumulating all previous rows up to the current row. Here is the SAQL code for this step:

```
q = load "Pareto_data";
q = group q by 'Case_Type';
q = foreach q generate 'Case_Type' as 'Case_Type', count() as 'count', sum(count()) over ([..] partition by all) as 'Total rows';
q = foreach q generate 'Case_Type', sum('count') as 'count', sum('count')/sum('Total rows') as 'Percentage by Case Type';
q = group q by 'Case_Type';
q = foreach q generate 'Case_Type', sum('count') as 'count', sum('Percentage by Case Type') as 'Percentage by Case Type', sum(sum('Percentage by Case Type')) over([..0] partition by all order by sum('count') desc) as 'Cumulative percentage';
```

And the data result:

**Step 4** – Switch to the chart mode, change the percentage columns to the percent format, then, select a combo chart for your visualization. Hide the percentage by case type field from your chart projection and select the dual-axis mode for your combo chart.

And voilá! You have created your own Pareto chart.

What information can we drive out from this chart? As you can see, up to 80% of our cases come from just the first three categories and all the other three categories represent only 20% of our cases against the total count. Meaning that to be more efficient, a company should drive most of its efforts toward the first three case categories.

### Use case 2: Average time to solve and case count by case type

This second use case will be similar to the first one, but now we will use 2 different metrics in our Pareto chart. The vertical bars will be the average time to close a case in the bars, also by case type, and the cumulative line will be the case count. Why use 2 different metrics in our Pareto chart? To illustrate that not always the greatest effort should be directed to the highest bars.

**Step 1** – Just like our previous use case, we’ll group our query by case type, but now we’ll project the average time to close a case and the case count. As you can see, we can calculate directly the percentage here by dividing the case count by the cumulative case count in the windowing function, thus, skipping the first step of our first use case. Here is the SAQL code for this step:

```
q = load "Pareto_data";
q = group q by 'Case_Type';
q = foreach q generate 'Case_Type' as 'Case_Type', avg('Time_to_close_hours') as 'avg_Time_to_close_hours', count() as 'count', count()/sum(count()) over([..] partition by all) as 'Percentage by Case Type';
```

And the data result:

**Step 2** – Regroup the data and use a new windowing function to roll up the case percentage by type. This new calculated field will be the line of our Pareto chart. Here is the SAQL code for this step:

```
q = load "Pareto_data";
q = group q by 'Case_Type';
q = foreach q generate 'Case_Type' as 'Case_Type', avg('Time_to_close_hours') as 'avg_Time_to_close_hours', count() as 'count', count()/sum(count()) over([..] partition by all) as 'Percentage by Case Type';
q = group q by 'Case_Type';
q = foreach q generate 'Case_Type' as 'Case_Type', avg('avg_Time_to_close_hours') as 'avg_Time_to_close_hours', sum('count') as 'count', sum('Percentage by Case Type') as 'Percentage by Case Type', sum(sum('Percentage by Case Type')) over ([..0] partition by all order by avg('avg_Time_to_close_hours') desc) as 'Percentage rollup';
```

And the data result:

**Step 3** – Format our chart to display the selected metrics. Like the first use case’s step 4, format your chart to hide the undesired metrics and display only the correct ones. After you complete this step, this will be your resulting chart:

As you can see, cases from the first three categories take above 5 hours on average to be closed, but the two first categories correspond to only 21% of our total cases, and the “Other” category corresponds alone to 30,1% of our total cases, and it has a significantly greater time on average to close these cases. On first analysis, this means that if we want to be more efficient and decrease our cases’ average closing time, we should focus our efforts on cases from the “Other” category.