SAQL simply explained – Part 5

In the last blog of this SAQL series, we saw how powerful SAQL really can be by joining datasets together using the cogroup statement. But we are not stopping there, we can do more powerful stuff. Another way of joining different streams is with the union statement, so in this blog, we will uncover that concept.

Union, what?

Before I get to the use case in this blog let’s talk about what the union statement does. Actually, it’s simple, it’s just a fancy word for append. With this statement, we can append streams from one or multiple datasets in one query.

When we use the union statement it’s important we have projected the exact same columns with the same naming in each stream before we append (union) them. Hence if you are working with two different datasets you may very well have to manipulate your data a bit before you use the statement. This also means we keep the grain level intact in the process and we will keep a row for the group field(s) in every stream. If we look at the sample dataset from part 4 of this blog series then using the union statement on our opportunity and case dataset will result in the same account showing multiple times. You can see the illustration below.

What’s the use case?

Okay, to illustrate the union statement we will be looking to see how many cases and opportunities we have for each account. If you read the last blog, you are probably thinking “again?”. Yes, we did cover this use case previously, but the number of opportunities and cases were illustrated in the form of two measure giving us one for opportunities and one for cases.

Now if you want to show this table in a stacked bar, where you have one bar per Account Name showing both the number of opportunities and cases you are out of luck. Check my chart below to see the result.

So in the example for this blog we will see how we can convert the two measures into a new dimension called ‘Type’ indicating if the measure is counting opportunities or cases. Confused? I hope not, but below you can see what we are trying to achieve with the union statement.

Let’s look at that union example

I’ve said it before, but I’ll say it again, make the tool do the hard work and only switch to SAQL when you need to. Sometimes this can mean create multiple steps or lenses to get the queries.

First, let’s create a dashboard for the use case and add a query based on the ‘cases’ dataset from the Sales Analytics Templated App. If you haven’t installed it already and you want to follow along then that is a prerequisite. The query will keep the ‘Count of Rows’ but we need to group by ‘Account.Name’. Once you have that you can switch to ‘Query Mode’ by clicking the button in the top right of the explorer and copy the whole query – we need it shortly. Once you have it copied and perhaps pasted in a text editor then you can close down the new query and get ready to create a new one.

Alright, create one more query, but this time select the ‘opportunities’ dataset from the Sales Analytics Templated App. Similarly to before you should keep the ‘Count of Row’ but add the grouping on ‘Account.Name’ and switch to ‘Query Mode’ to see the SAQL. Now add the text you copied before underneath the existing query. Just remember that you need to have this step on your dashboard if not the query will return an error once you run it as explained in part 4 of this blog series.

You should end up with a query that looks like the one I have below.

1 q = load "opportunity";
2 q = group q by 'Account.Name';
3 q = foreach q generate 'Account.Name' as 'Account.Name', count() as 'count';
4 q = order q by 'Account.Name' asc;
5 q = limit q 2000;

6 q = load "case";
7 q = group q by 'Account.Name';
8 q = foreach q generate 'Account.Name' as 'Account.Name', count() as 'count';
9 q = order q by 'Account.Name' asc;
10 q = limit q 2000;

You can run the query now, but we have an issue if we do. We only get the result from the case dataset as the streams are called exactly the same and hence we overwrite the opportunity stream. Hence, we are renaming the streams; the opportunity stream will now be referred to as q1 and the case stream q2. Below you can see the changes. I’ve also removed line 4, 5, 9 and 10, in other words, the order and limit statements as we don’t need them at this point in our query, we can add them later if we wish, but we at least want to make sure we get all the rows from both datasets. Below you can see how the query should look after the changes.

1 q1 = load "opportunity";
2 q1 = group q1 by 'Account.Name';
3 q1 = foreach q1 generate 'Account.Name' as 'Account.Name', count() as 'count';

4 q2 = load "case";
5 q2 = group q2 by 'Account.Name';
6 q2 = foreach q2 generate 'Account.Name' as 'Account.Name', count() as 'count';

Note that we are lucky. Both of our datasets have exactly the same naming, which we need in order for the union to work. But let’s say ‘Account.Name’ was instead ‘Opportunity.Account.Name’ and ‘Case.Account.Name’ we would have to modify the projected field, the alias. See below how we would have done that.

...
3 q1 = foreach q1 generate 'Opportunity.Account.Name' as 'Account.Name', count() as 'count';
...
6 q2 = foreach q2 generate 'Case.Account.Name' as 'Account.Name', count() as 'count'

Static expression

Before we get to the union part remember the use case, we wanted to be able to group by a new ‘Record Type’ field that says if the record is an opportunity or a case. As you probably know or have guessed this field doesn’t exist in either of the datasets, but we can project a static value and give this value a name.

In the ‘foreach’ statements we will add a new expression and alias. The expression is our static value, which for the opportunity stream (q1) will be “Opportunity” – doh. We will give the new field an alias “Record Type”. The same should be done for the case stream (q2) only we will have “Case” as the expression. Remember the alias must be the same for the union to work. Check like 3 and 6 below to see how the query should look.

1 q1 = load "opportunity";
2 q1 = group q1 by 'Account.Name';
3 q1 = foreach q1 generate 'Account.Name' as 'Account.Name', "Opportunity" as 'Record Type', count() as 'count';

4 q2 = load "case";
5 q2 = group q2 by 'Account.Name';
6 q2 = foreach q2 generate 'Account.Name' as 'Account.Name', "Case" as 'Record Type', count() as 'count';

The union

Now it’s time for the magic, the union. It’s actually quite simple, all we have to do is use the union statement and name the streams to union. The syntax is added below.

result = union stream1, stream2, stream3, ...;

As you can see from the union syntax example you can union as many streams as you want. The example we are going through in this blog only has two streams but should you have more you can still use the statement. Below in line 7, you can see how I’ve started a new stream called ‘result’ where I use the ‘union’ statement to append q1 (opportunities) and q2 (cases).

1 q1 = load "opportunity";
2 q1 = group q1 by 'Account.Name';
3 q1 = foreach q1 generate 'Account.Name' as 'Account.Name', "Opportunity" as 'Record Type', count() as 'count';

4 q2 = load "case";
5 q2 = group q2 by 'Account.Name';
6 q2 = foreach q2 generate 'Account.Name' as 'Account.Name', "Case" as 'Record Type', count() as 'count';

7 result = union q1, q2;

And that’s it. If you run the query now you will see the new static expression we created with either “Opportunity” or “Case”, you will also see we have both streams in the result because of the ‘union’ statement. As we are doing an append the same account may appear twice, once from each dataset.

But we are not all done. We have a working query but the use case was to have a chart that is grouped by Account Name and Record Type. So let’s extend the query a little bit more.

After the ‘union’ statement we can take the result and manipulate it further. Remember a query is like reading a book; it’s read line by line and each line adds to a more rich and compelling story. As mentioned above we want to group by ‘Account.Name’ and ‘Record Type’, so after the union let’s add a ‘group’ statement with multiple fields as discussed in part 2 of this blog series. You can see the statement below.

result = group result by ('Account.Name', 'Record Type');

As we have discussed earlier in this blog series, it’s not enough to add a grouping you also need to project the fields you want to use in a ‘foreach’ statement. You can see that below. Notice that instead of using the count() function I am using the sum() function. Why? Well, using count() will give a wrong result as we have already gotten the ‘Count of Rows’ in q1 and q2, meaning if we count after the union we will get the result of 1. Instead, we want to sum the counted rows.

result = foreach result generate 'Account.Name', 'Record Type', sum('count') as 'count';

Finally, to ensure the correct order, I have added an order statement. If you wish you can also add a limit statement, but I’ve left it out in my query.

result = order result by 'Account.Name' asc;

Below you can see how the full and final query looks like.

1 q1 = load "opportunity";
2 q1 = group q1 by 'Account.Name';
3 q1 = foreach q1 generate 'Account.Name' as 'Account.Name', "Opportunity" as 'Record Type', count() as 'count';

4 q2 = load "case";
5 q2 = group q2 by 'Account.Name';
6 q2 = foreach q2 generate 'Account.Name' as 'Account.Name', "Case" as 'Record Type', count() as 'count';

7 result = union q1, q2;

8 result = group result by ('Account.Name', 'Record Type');
9 result = foreach result generate 'Account.Name', 'Record Type', sum('count') as 'count';
10 result = order result by 'Account.Name' asc;

You can now run your query, switch back to chart mode and select a stacked bar chart. You will see this chart looks better with the new ‘Record Type’ field.


Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.