Einstein Analytics: A robust approach for performing period over period analysis (using a single date picker)

Date dimensions in Salesforce Einstein Analytics (formerly Wave) have always been an enigma. When I was first introduced to EdgeSpring (the precursor to Wave) back in the summer of 2014, beyond the fresh approach to the analytics user experience built on an animated charting library, and the impressive performance of its columnar storage and inverted-index-based query engine, I was amazed by the summarization capabilities, especially date field summarization. I thought it was cool that one could group by ‘year-month’ and then easily pivot by ‘quarter-year’ (pivot by ‘quarter-year’, why the heck would one do that? stay tuned for a post on this pattern…). I wasn’t new to search-based analytics indices, but this fresh take blew me away.

As the product has evolved over the years, I’ve noticed that date dimensions continue to be a tad mysterious. Rikke and others have written about approaches ranging from dynamically switching date pivots and using lists for year-over-year analysis to fixed-timeframe and static-step-based analysis, all of which make it easier to use dates as filters and to perform year-over-year analysis. I have used these approaches in many of my EA implementations. However, one of my customers recently challenged me to improve on them and come up with a solution that:

  • Works with a single standard date widget
  • Works with absolute dates
  • Works with relative dates

With the added bonus:

  • Allow switching between “current period vs. prior period” (example: Jan-Feb 2019 vs. Nov-Dec 2018) & “current period vs same period prior year” (example: Jan-Feb 2019 vs. Jan-Feb 2018)
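To make the two comparison modes concrete, the underlying date arithmetic can be sketched in Python. This is purely illustrative (the function names are mine, not part of the product), but it shows how the two modes differ:

```python
from datetime import date, timedelta

def prior_period(start: date, end: date):
    """'Current vs. prior period': shift the selection back by its own length."""
    length = (end - start).days + 1            # inclusive day count
    return start - timedelta(days=length), start - timedelta(days=1)

def same_period_prior_year(start: date, end: date):
    """'Current vs. same period prior year': shift the selection back one year.
    (A Feb 29 endpoint would need clamping; omitted for brevity.)"""
    return start.replace(year=start.year - 1), end.replace(year=end.year - 1)

# Jan 1 - Feb 28, 2019 selected in the date picker:
print(prior_period(date(2019, 1, 1), date(2019, 2, 28)))
# → Nov 3 - Dec 31, 2018 (day-based shifting approximates the Nov-Dec window)
print(same_period_prior_year(date(2019, 1, 1), date(2019, 2, 28)))
# → Jan 1 - Feb 28, 2018
```

Note that shifting by day count gives an approximation of a calendar-month window; that trade-off comes up again in the approaches below.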

What do you say to such a request? Well, of course: challenge accepted.

Approach 1: use an external date translation file

By default, my first instinct is to solve this at ingest time. When a natural dimension is created at ingest, it not only moves the processing offline (into the dataflow) but also simplifies the front-end queries. This approach has served me well in the past, so there was no reason to deviate, and it worked. I quickly created a CSV file with about 50 columns (see below). In this CSV file, I precalculated a corresponding date 1 month ago, 2 months ago, and so on. I then augmented this data onto my main dataset using invoice date as the join key. Then it was SAQL time: I wrote a query that calculated the difference between the start and end dates in days and selected the appropriate date column to use as a filter, and it worked. The big idea is that when the Pop_x_Month field is used as a filter, where x = the difference in months between the start and end dates selected by the user in the date picker, the results are restricted to the period immediately prior to the user’s selected range.
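The translation worksheet itself is easy to script. Here is a minimal Python sketch that generates such a file; the column names (Pop_1_Month, Pop_2_Month, …) follow the Pop_x_Month convention described above, while the Date join column and the 24-month depth are assumptions on my part:

```python
import csv
from datetime import date, timedelta

def months_back(d: date, n: int) -> date:
    """Shift a date back by n calendar months, clamping the day
    to the last day of the target month when needed (e.g. Mar 31 -> Feb 28)."""
    total = (d.year * 12 + d.month - 1) - n
    year, month = divmod(total, 12)
    month += 1
    # find the last day of the target month via the first day of the next month
    next_month = date(year + (month == 12), (month % 12) + 1, 1)
    last_day = (next_month - timedelta(days=1)).day
    return date(year, month, min(d.day, last_day))

def write_translation_file(path: str, start: date, end: date, max_months: int = 24):
    """Write one row per calendar day with its shifted-back counterparts."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["Date"] + [f"Pop_{n}_Month" for n in range(1, max_months + 1)])
        d = start
        while d <= end:
            writer.writerow([d.isoformat()] +
                            [months_back(d, n).isoformat() for n in range(1, max_months + 1)])
            d += timedelta(days=1)
```

A spreadsheet formula achieves the same thing; the point is simply that every source date row carries its own precomputed prior-period dates.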

dataset based on a worksheet that precalculated dates from preceding months

Results from step_GD_date_range_diff_1 step
-- date_Current_GD_1 is the name of my date picker step
q_this = filter ds by
{{row(date_Current_GD_1.selection,[0],["min","max"]).asDateRange("date('date1_Year', 'date1_Month', 'date1_Day')")}};

-- step_GD_date_range_diff_1 is the name of my intermediate step that picks the
-- correct date dimension to use to retrieve prior period data (see image above)
q_last = filter ds by
{{row(date_Current_GD_1.selection,[0],["min","max"]).asDateRange(cell(step_GD_date_range_diff_1.result,0,"pop_date_string"))}};

While this approach worked and met almost all the requirements, it had a few limitations: it required an external file to provide the translated dates, and it added many extra dimensions that put an extra burden on the query engine.

Approach 2: dynamically translate date in a SAQL step

So I continued with my pursuit and decided to calculate these dates on the fly using SAQL. The trick with this approach is to use an intermediate step that calculates all the date ranges required for performing period over period calculations.

Intermediate step (Intermediate_Date_St_1)
1 q = load "myDataset";

-- the date filter is provided by the date picker with auto-binding
2 q = filter q by date('date_Year', 'date_Month', 'date_Day')
in [dateRange([2018, 1, 1], [2019, 12, 30])];

3 q = group q by all;

-- extract the lowest/min and highest/max date from the resultset
-- use the date_diff function to calculate the difference in days
4 q = foreach q generate
number_to_string(date_diff("day", toDate(min('date_sec_epoch')),
toDate(max('date_sec_epoch'))) + 1, "#")
as 'diff_month',
toDate(max('date_sec_epoch')) as 'current_max_date',
toDate(min('date_sec_epoch')) as 'current_min_date';

-- convert the day difference into seconds by multiplying days by 86400
-- calculate the prior min date by subtracting the date difference (in seconds)
-- from the current min date in epoch seconds; repeat the same for the prior max date
5 q = foreach q generate 'diff_month', 'current_max_date', 'current_min_date',
toDate(date_to_epoch('current_min_date') - (string_to_number('diff_month') * 86400)) as 'prior_min_date',
toDate(date_to_epoch('current_min_date') - (1 * 86400)) as 'prior_max_date';

-- format the calculated dates as an array that can be passed
-- to the dateRange function using format = "[yyyy,MM,dd]"
-- optionally, create additional fields using format = "MMM yyyy"
-- to be used for user-friendly messaging
6 q = foreach q generate 'diff_month',
date_to_string('current_max_date',"[yyyy,MM,dd]") as 'current_max_date',
date_to_string('current_min_date',"[yyyy,MM,dd]") as 'current_min_date',
date_to_string('prior_min_date',"[yyyy,MM,dd]") as 'prior_min_date',
date_to_string('prior_max_date',"[yyyy,MM,dd]") as 'prior_max_date';

7 q = limit q 2000;

I’ll skip the basic SAQL parts and focus on the key segments.

NOTE: When you place a date widget, let the dashboard auto bind it to this intermediate step (Intermediate_Date_St_1). Delete the date filter line from the query as the runtime will add it for you.

NOTE: the generate statement has been split into multiple statements for simplicity and should not have much impact on query performance.

We start by calculating the difference between the first and last date in the results of this query, which in turn is controlled by the date widget (see statement 4 in the query above). The intermediate step (Intermediate_Date_St_1) is what neutralizes the difference between a relative date and an absolute date selected by the user.

Once we have the number of days between the start and end dates, we can calculate the prior min and max dates (see statement 5 in the query above). Then it is a matter of formatting them as arrays so that they can be used with the dateRange() function (see statement 6). Now we use bindings in our target query and combine the min and max dates as shown in the example below:


-- notice the dateRange() binding in the current_period stream
current_period = filter q by date('date_Year', 'date_Month', 'date_Day') in [dateRange(
{{cell(Intermediate_Date_St_1.result,0,"current_min_date").asString()}},
{{cell(Intermediate_Date_St_1.result,0,"current_max_date").asString()}}
)];

-- notice the dateRange() binding in the prior_period stream
prior_period = filter q by date('date_Year', 'date_Month', 'date_Day') in [dateRange(
{{cell(Intermediate_Date_St_1.result,0,"prior_min_date").asString()}},
{{cell(Intermediate_Date_St_1.result,0,"prior_max_date").asString()}}
)];

-- use the above stream examples in separate steps to calculate
-- current numbers and prior numbers respectively
-- (as shown in the final screenshot below)
-- alternatively, combine them in a single query using
-- cogroup to calculate period over period numbers
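For anyone who wants to sanity-check the epoch-seconds arithmetic in statements 4 and 5 of the intermediate step, here is the same prior-period calculation sketched in Python. This is a minimal illustration of the math, not part of the dashboard:

```python
from datetime import datetime, timedelta, timezone

def prior_range(current_min: datetime, current_max: datetime):
    """Mirror statements 4 and 5 of the SAQL intermediate step:
    diff = inclusive day count, then shift the range back by diff days."""
    diff_days = (current_max - current_min).days + 1        # date_diff("day", min, max) + 1
    prior_min = current_min - timedelta(seconds=diff_days * 86400)
    prior_max = current_min - timedelta(seconds=1 * 86400)  # the day before the current range
    return prior_min, prior_max

cur_min = datetime(2019, 1, 1, tzinfo=timezone.utc)
cur_max = datetime(2019, 2, 28, tzinfo=timezone.utc)
print(prior_range(cur_min, cur_max))   # Nov 3 - Dec 31, 2018
```

The 86400 multiplier (seconds per day) is exactly what the SAQL does with date_to_epoch before converting back with toDate.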

With this intermediate-step pattern you can now perform current period vs. prior period analysis: a single date widget provides the current date range, and the rest is taken care of.

Let me know what you think of this approach. Stay tuned for the bonus approach to toggle between “vs. prior period” and “vs. same period prior year” in my next post. So long…

@kaushforce

