SAQL simply explained – Part 3

By now we have covered most of the basics in a SAQL query, but there is one more thing that you will see a lot when you are working with SAQL and it is filters. So in this third part, we will look closer at filters.

Pre-projection Filters

When you apply a filter to a query in compact form and switch to SAQL the filter will automatically be right after the load statement (unless you choose an aggregated measure, but more about that later). It kinda makes sense because the following grouping and aggregation will in this case deal with less data. This type of filter is also known as a pre-projection filter, which is the most common type of filter.

In the below example on line 2, you will see how I have added a simple filter where Stage is equal to “Closed Won”.

1 q = load "opportunities";

2 q = filter q by 'StageName' == "Closed Won";
3 q = group q by 'StageName';

4 q = foreach q generate 'StageName' as 'StageName', sum('Amount') as 'sum_Amount';

5 q = order q by 'StageName' asc;

6 q = limit q 2000;

Dimension Filters

So what can we do with filters? Well, actually quite a few things. I guess the most common is the example from above where we want to look at a dimension in the dataset and then look at a specific value hence using the operator ‘==’. If we want that dimension to be one of several values the filter statement would look like below. Instead of using the ‘==’ we use ‘in’ and add the values in an array.

q = filter q by 'StageName' in ["Closed Won", "Negotiation"];

Let’s say you want to look at all values expect “Closed Lost” you can use the not equal to operator ‘!=’, which is a little simpler than adding all the possible values. In that case, the filter statement will look like below.

q = filter q by 'StageName' != "Closed Lost";

If we have a more than one value we don’t want to include in the query then we can use ‘not in’ as shown below.

q = filter q by 'StageName' not in ["Closed Won", "Closed Lost"];

Some dimensions do not have a defined list of values because they in the source system are open text fields. If you want to look for part of a text string you can use the ‘matches’ operator, which basically stands for “contains”.

q = filter q by 'Account.Name' matches "An";

In the above examples we are assuming the column we are using in the filter has actual values. However, sometimes we want to look for null values, which we can do by using ‘is null’ or ‘is not null’. ‘is null’ allow us to look for rows where that column is null. In the below example we will get all the rows where the Account Type has not been selected.

q = filter q by 'Account.Type' is null;

The ‘is not null’ does the opposite; it filters out all the null values. Meaning in the below values we will only see rows where an Account Type has been added.

q = filter q by 'Account.Type' is not null;

Measure Filters

In all the above examples we have been looking at filters using dimensions. But we can of course also use measures for our filter. This one is probably more logical as we use standard mathematical operators for equal, greater than, less than etc. See below for a list of the operators including an example of how they are being used.

Operator  Description Example
== Equal

q = filter q by ‘Amount’ == 100;

< Less than

q = filter q by ‘Amount’ < 100;

<= Less than or equal

q = filter q by ‘Amount’ <= 100;

> Greater than

q = filter q by ‘Amount’ > 100;

>= Greater than or equal

q = filter q by ‘Amount’ >= 100;

 

Note that you can also use the operators ‘!=’, ‘is null’ and ‘is not null’ when working with a measure filter.

Date Filters

Finally, we have dates. Dates are a little different as any date field is deconstructed in the dataflow and split into date parts. Using dates in the design layer we reconstruct the date in the format we need such as Year-Month or Year-Month-Day. In SAQL you will see how dates are reconstructed in the date() function. There are different ways of filtering with dates, you can look relative and absolute dates. First, looking at relative date ranges we need a start and an end date in an array using different date keywords. In the below example we are looking for opportunities that are closed in the current year.

q = filter q by date('CloseDate_Year', 'CloseDate_Month', 'CloseDate_Day') in ["current year".."current year"];

Note you can combine different keywords, meaning you do not need to start and end with the year. You can, for instance, combine year and day like in the example below.

q = filter q by date('CloseDate_Year', 'CloseDate_Month', 'CloseDate_Day') in ["current year".."current day"];

If you want to include everything in the start or the end date you can leave it open-ended, all you do is leave that part out of the filter. Hence if you want to include everything up until a current month the filter will look like below.

q = filter q by date('CloseDate_Year', 'CloseDate_Month', 'CloseDate_Day') in [.."current month"];

And if you want to look at anything after the current month the filter will look like below.

q = filter q by date('CloseDate_Year', 'CloseDate_Month', 'CloseDate_Day') in ["current month"..];

You can, of course, use any of the date keywords. When in doubt of the keywords to use please refer to the documentation.

A cool thing you can do is to add or subtract to a date. A typical one I get asked about is how do I look at the same date last year. Well, we can simply subtract 1 year from the current day like below.

q = filter q by date('CloseDate_Year', 'CloseDate_Month', 'CloseDate_Day') in ["1 year ago".."current day - 1 year"];

If you want to add another year you simply put ‘+ 1 year’.

q = filter q by date('CloseDate_Year', 'CloseDate_Month', 'CloseDate_Day') in ["1 year ahead".."current day + 1 year"];

Finally, it’s possible to filter on absolute dates. Personally, I don’t really use these but let’s say I want to look at opportunities closed from March 1st, 2019 to August 15th, 2019 I can use the dateRange() function. Here I need to specify the start and end date as actual dates in the function, which you can see in the example below.

q = filter q by date('CloseDate_Year', 'CloseDate_Month', 'CloseDate_Day') in [dateRange([2019,3,1], [2019,8,15])];

Filter Logic

When we have multiple filters in a SAQL query they are defaulted to have ‘AND’ between them, which means all filters are being applied. Have a look at the query below to see an example.

1 q = load "opportunities";
2 q = filter q by 'Account.Industry' == "Media";
3 q = filter q by 'Account.Type' == "Customer";
4 q = group q by 'StageName';
5 q = foreach q generate 'StageName' as 'StageName', sum('Amount') as 'sum_Amount';
6 q = order q by 'StageName' asc;
7 q = limit q 2000;

Sometimes this is not the behavior we want, instead, we want to apply filter logic to change this behavior to ‘or’ or even have a combination of ‘and’ and ‘or’. The easiest way to achieve this is actually to stay in compact form and apply this in the UI, which was made available in the Summer 19 release. Check out my Summer 19 release blog to learn more about this feature. If we look at SAQL we would add ‘&&’ for “and” and ‘||’ for “or”, but you can actually also just use the words ‘and’ and ‘or’. This is added between two filters like the examples below.

1 q = load "opportunities";
2 q = filter q by 'Account.Industry' == "Media" or 'Account.Type' == "Customer";
3 q = group q by 'StageName';
4 q = foreach q generate 'StageName' as 'StageName', sum('Amount') as 'sum_Amount';
5 q = order q by 'StageName' asc;
6 q = limit q 2000;

Post-projection Filter

Alright, we should now be able to understand how to add a filter in SAQL regardless if it is a dimension, measure or date filter. I also said that the most common filter is a pre-projection filter which adds the filter right after the load statement. However, sometimes it’s not where we want the filter to be placed, some times we want to filter after your projection also known as a post-projection filter.

Let’s say I am looking at all my accounts and I want to rank them by the highest opportunity amount. Using the compare table we can quickly use the rank function and switch to SAQL and you will see the following.

1 q = load "opportunities";
2 result = group q by 'Account.Name';
3 result = foreach result generate q.'Account.Name' as 'Account.Name', sum(q.'Amount') as 'A';
4 result = group result by 'Account.Name';
5 result = foreach result generate 'Account.Name', sum(A) as 'A', rank() over([..] partition by all order by sum(A) desc) as 'B';
6 result = order result by ('B' asc);
7 result = limit result 2000;

Taking this a little further I am interested in looking at accounts that contain “An” but I want to keep the overall ranking of my accounts. If I apply the filter after the load statement I will never get the ranking of the accounts that do not contain “An”, because I filter those out before I even get to rank them. In this case, I want to use a post projection filter. In other words, a filter that is applied after my foreach statement, which is what projects the ranking. So my final SAQL query looks like below.

1 q = load "opportunities";
2 result = group q by 'Account.Name';
3 result = foreach result generate q.'Account.Name' as 'Account.Name', sum(q.'Amount') as 'A';
4 result = group result by 'Account.Name';
5 result = foreach result generate 'Account.Name', sum(A) as 'A', rank() over([..] partition by all order by sum(A) desc) as 'B';
6 result = filter result by 'Account.Name' matches "An";
7 result = order result by ('B' asc);
8 result = limit result 2000;

One important thing to note is that you can only do post-projection filters with fields that are included in the query. Hence if I wanted to filter by Industry “Media” I would first need to group by Industry, project it and then apply my filter. If I did not want to show the industry I can do another projection after my filter dropping the industry field as in the example below.

1 q = load "opportunities";
2 result = group q by ('Account.Name', 'Account.Industry');
3 result = foreach result generate q.'Account.Name' as 'Account.Name', q.'Account.Industry' as 'Account.Industry', sum(q.'Amount') as 'A';
4 result = group result by ('Account.Name', 'Account.Industry');
5 result = foreach result generate 'Account.Name', 'Account.Industry', sum(A) as 'A', rank() over([..] partition by all order by sum(A) desc) as 'B';
6 result = filter result by 'Account.Industry' == "Media";
7 result = foreach result generate 'Account.Name', 'A', 'B';
8 result = order result by ('B' asc);
9 result = limit result 2000;

Finally, if you wish to do a post projection filter on a measure you can simply add it via the UI. When you go to select a filter you can choose the aggregated measure.

Depending on what you are doing with your filter you may actually need to switch to SAQL. The most common ways of filtering are possible to do in the UI, so remember only to switch to SAQL if you really need to.


Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.