Considering Governance Limits – Data Orchestration Part 3
This blog is part of the Data Orchestration blog series. Having previously covered all the details around data sync and the local Salesforce connector, let’s now look at the learnings, as well as the role and responsibilities of the analytics developer, through a scenario.
Painting the Picture
Let’s consider a scenario in which a single Salesforce org has both a Sales App and a Service App. The two apps have different objectives and ultimately provide insight to different users. It is the analytics developer’s task to ensure that these objectives are met and to be aware of the governance limits when considering data sync and dataflows, and when setting up their schedules, to avoid conflicts, performance issues, or hitting limits.
Below you can see an illustration of the different stakeholders an analytics developer (Mary) interacts with, as well as the aspects Mary needs to consider when implementing the flow of data for the Sales and Service Apps.
We will walk through all the considerations, but let’s first look at the objectives of the two analytics apps.
Objectives of Sales App
Chris, the sales manager, has decided to use the Sales App, one of the templated apps, as it provides the insight he needs into the sales pipeline, opportunity trending information, whitespace analysis, etc. based on the sales data in Salesforce. Installing the app will create a series of dashboards and a dataflow, which transforms Salesforce data into datasets.
Key Objects Used: Accounts, Opportunities, Users, Products, Tasks, Events, Roles, Cases, PricebookEntry, OpportunityLineItem, Queue, Activities, Leads, Campaigns, Campaign Members, Opportunity Splits, Product Schedules.
Note: For more information on the app and what is included in it check out the Sales Analytics App in the Salesforce documentation.
Objectives of Service App
Lucy, the head of customer support, has decided to make use of the Service App template to make it easy for service managers and agents to use data to drive the success of the service business. Installing the app will create a series of dashboards and a dataflow, which transforms Salesforce data into datasets.
Key Objects Used: Contact, Opportunity, Task, Case, Account, UserRole, RecordType.
Note: For more information on the app and what is included in it check out the Service Analytics App in the Salesforce documentation.
Documenting your flow of data
Consideration should go into every dataflow or recipe that is being built – even those that come from templated apps. As an analytics developer, instead of going straight to building out a flow of data, you should create a process document that as a minimum aims to cover the following details:
- The objective of the dataflow or recipe.
- Who would be the owner of the dataflow or recipe?
- The required frequency of the data.
- Dataflow schedule time.
- Who would be end-users of the datasets?
- Data dictionary.
- Dataflow or recipe version management.
- The common naming convention to be followed.
This document should also list the processes, procedures, and conventions for how the dataflow will be updated, modified, tested, and maintained.
Establishing this document at the beginning makes the dataflow or recipe in question easy to maintain. In addition, you eliminate the dependency on the few individuals who are able to maintain and update the dataflow.
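One way to keep such a process document consistent across dataflows is to capture the minimum details as a small structured record. The sketch below is only an illustration: the class name, field names, and example values are hypothetical and are not part of any Salesforce API.

```python
from dataclasses import dataclass, field


@dataclass
class DataflowProcessDoc:
    """Minimum details to record for one dataflow or recipe (illustrative field names)."""

    name: str
    objective: str
    owner: str
    refresh_frequency: str  # how fresh the data needs to be, e.g. "hourly"
    schedule_time: str      # when the dataflow or recipe is scheduled to run
    end_users: list[str] = field(default_factory=list)
    data_dictionary: dict[str, str] = field(default_factory=dict)  # field -> description
    version: str = ""
    naming_convention: str = ""

    def missing_details(self) -> list[str]:
        """Return the names of the details that have not been filled in yet."""
        return [key for key, value in vars(self).items() if not value]


# Hypothetical example values, for illustration only.
doc = DataflowProcessDoc(
    name="Sales_Pipeline",
    objective="Pipeline insight for sales managers",
    owner="Mary",
    refresh_frequency="hourly",
    schedule_time="",
)
print(doc.missing_details())
```

Calling `missing_details()` before a dataflow goes live gives a quick checklist of what still has to be agreed with the stakeholders.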
Role of Analytics Developer
To create the process document described in the previous section, the analytics developer needs to engage with the stakeholders who will be using the analytics apps, both to understand the requirements and to manage expectations. Mary, the analytics developer, has been speaking to Chris and Lucy, and it is crucial that she understands the business outcomes of both the Sales and Service Apps so she is able to meet them when building out the analytics apps. Mary also needs to be aware that users of the analytics apps have different roles, and different roles might have different expectations for the app. So Mary has also been speaking to Chad, a regional director, who wants a holistic view of both sales and service.
Before Mary can start building out the data, she needs a clear understanding of the requirements and has to establish the process document described above. Mary also has to be aware of the governance limits and take them into consideration when designing the data and when managing expectations in discussions with her stakeholders. Below are some of the key governance limits to keep in mind:
- The maximum number of dataflow and recipe runs in a rolling 24-hour period: 60
- The maximum number of objects that can be enabled for data sync: 100
- The maximum number of concurrent dataflow runs: 2 (2 for production orgs with the Einstein Analytics Plus platform license, 1 for production orgs with the Einstein Analytics Growth platform license or sandbox orgs)
Note: We will be going into more details around the governance limits for dataflow and recipe runs in part 5 of this blog series.
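To make these limits concrete, here is a minimal sketch of how a planned schedule could be checked against them before anything is scheduled in the org. The limit values are the ones listed above; the function names and the way a plan is represented (a list of start and end times) are assumptions made for this example, and the numbers should be verified against your own org and license.

```python
from datetime import datetime, timedelta

# Limits quoted in the list above; verify them for your own org and license.
MAX_RUNS_PER_24H = 60
MAX_SYNCED_OBJECTS = 100
MAX_CONCURRENT_RUNS = 2  # 1 for Growth-license or sandbox orgs


def peak_runs_in_24h(start_times):
    """Largest number of run starts that fall in any rolling 24-hour window."""
    times = sorted(start_times)
    peak, left = 0, 0
    for right, current in enumerate(times):
        # Drop starts that are 24 hours or more older than the current one.
        while current - times[left] >= timedelta(hours=24):
            left += 1
        peak = max(peak, right - left + 1)
    return peak


def peak_concurrency(runs):
    """Largest number of runs in progress at once; runs are (start, end) pairs."""
    events = []
    for start, end in runs:
        events.append((start, 1))
        events.append((end, -1))
    # Process ends before starts at the same instant, so back-to-back runs
    # are not counted as overlapping.
    events.sort(key=lambda event: (event[0], event[1]))
    active = peak = 0
    for _, delta in events:
        active += delta
        peak = max(peak, active)
    return peak


def check_plan(runs, synced_object_count):
    """Return a list of limit violations for a planned schedule."""
    problems = []
    if peak_runs_in_24h([start for start, _ in runs]) > MAX_RUNS_PER_24H:
        problems.append("too many runs in a rolling 24-hour period")
    if peak_concurrency(runs) > MAX_CONCURRENT_RUNS:
        problems.append("too many concurrent runs")
    if synced_object_count > MAX_SYNCED_OBJECTS:
        problems.append("too many objects enabled for data sync")
    return problems


# Hypothetical plan: three dataflows, each run hourly for 20 minutes.
day = datetime(2024, 1, 1)
hourly = [(day + timedelta(hours=h), day + timedelta(hours=h, minutes=20)) for h in range(24)]
print(check_plan(hourly * 3, synced_object_count=24))
```

In this hypothetical plan, three hourly dataflows add up to 72 runs a day and three simultaneous runs, so both the run limit and the concurrency limit are flagged, while two hourly dataflows (48 runs) would pass.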
Things to keep in mind
As mentioned above, Mary should keep the governance limits in mind as she starts to build any solution for the business teams. This also means she shouldn’t create a dataflow or recipe for every little project down the line, as this can quickly cause complications when scheduling dataflows and recipes. Generally speaking, Mary would do well to remember that:
- Dataflows don’t need to be dedicated to creating datasets that answer just one team’s requirements. Dedicating a dataflow to each team will quickly result in hitting the 100 dataflow limit, and it makes dataflow and recipe runs difficult to schedule and prioritize.
- The data sync and dataflow schedules should be considered when creating dataflows: they affect how you structure the dataflows, and they determine what you can promise stakeholders in terms of data refresh.
For Mary, this means she shouldn’t automatically conclude that the Sales App and Service App should be two independent dataflows. In fact, there might be requirements that overlap between the two teams, and some datasets can be leveraged by both teams. This would optimize the design of data while keeping governance limits in mind. Hence it is absolutely crucial that Mary understands all the requirements of both teams to be able to make the best data design decisions.
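A quick way to spot such overlap is to compare the key objects of the two apps. The sketch below uses the object lists given earlier for the Sales App and the Service App. The Sales App names are rewritten in singular form so the two lists can be compared, which is an assumption on our side; in particular, "Roles" is assumed to refer to the UserRole object.

```python
# Key objects of the two apps, taken from the lists earlier in this post.
# Sales App names are normalized to singular form (an assumption), and
# "Roles" is assumed to mean the UserRole object.
SALES_OBJECTS = {
    "Account", "Opportunity", "User", "Product", "Task", "Event", "UserRole",
    "Case", "PricebookEntry", "OpportunityLineItem", "Queue", "Activity",
    "Lead", "Campaign", "CampaignMember", "OpportunitySplit", "ProductSchedule",
}
SERVICE_OBJECTS = {
    "Contact", "Opportunity", "Task", "Case", "Account", "UserRole", "RecordType",
}

shared = SALES_OBJECTS & SERVICE_OBJECTS        # needed by both apps
service_only = SERVICE_OBJECTS - SALES_OBJECTS  # needed by the Service App alone
all_objects = SALES_OBJECTS | SERVICE_OBJECTS   # everything that must be synced

print("Shared by both apps:", sorted(shared))
print("Service App only:", sorted(service_only))
print("Objects to sync in total:", len(all_objects))
```

Under these assumptions, five of the seven Service App objects are already used by the Sales App, which is a strong hint that the two apps can share synced objects and possibly datasets.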
Segregate Between Master Data and Transactional Data
Keeping all the limits in mind, Mary decides to segregate the objects into two buckets: master data and transactional data. As we learned previously in this blog series, master data is data that does not change with a high frequency, and transactional data is data that changes often. Mary wants to segregate them so that she can have different connections and have the data refreshed at different intervals.
The objects Mary has defined as master data are:
- Product Schedules.
The objects Mary has defined as transactional data are:
The image below shows how we can have separate connections for the master data objects and the transactional data objects. Notice that we use the periodic full sync for the master data objects: we know that the data in these objects does not change much, and the periodic full sync ensures that there is no data drift, as we will get a full data sync once a week.
For the transactional data objects, we are doing an incremental sync. We will only pull new, updated, and deleted records to match the changes in the Salesforce object since the previous sync. When we use this method, the data sync runs faster because, unlike a full sync, it does not refresh all of the cached data.
Once a second connector has been created and the synced objects have been split into master and transactional data as described above, Mary can define the schedule for the two connectors. One connector will contain data that should be refreshed with a low frequency, while the other will contain data that should be refreshed with a high frequency. With the requirements in mind, Mary decides that:
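The difference between the two sync modes can be illustrated with a toy model. This is not how Salesforce implements data sync internally; it is a minimal sketch in which the cache is a plain dictionary and the function names, record IDs, and change format are all hypothetical.

```python
def full_sync(source):
    """Replace the cached copy with everything currently in the source object."""
    return dict(source)


def incremental_sync(cache, changes):
    """Apply only the changes made since the previous sync to the cached copy.

    `changes` is a list of (operation, record_id, record) tuples, where the
    operation is "upsert" (new or updated record) or "delete".
    """
    updated = dict(cache)
    for operation, record_id, record in changes:
        if operation == "upsert":
            updated[record_id] = record
        elif operation == "delete":
            updated.pop(record_id, None)
        else:
            raise ValueError(f"unknown operation: {operation}")
    return updated


# Initial load: every record is copied.
cache = full_sync({"001": {"Stage": "Prospecting"}, "002": {"Stage": "Closed Won"}})

# Later sync: only the three changed records are touched.
changes = [
    ("upsert", "001", {"Stage": "Negotiation"}),  # updated record
    ("upsert", "003", {"Stage": "Prospecting"}),  # new record
    ("delete", "002", None),                      # deleted record
]
cache = incremental_sync(cache, changes)
print(cache)
```

The work done by the incremental sync grows with the number of changes rather than with the size of the object, which is why it suits frequently refreshed transactional data.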
- Connection 1: low frequency with master data – refreshed every day
- Connection 2: high frequency with transactional data – refreshed every hour.
With this example, hopefully it’s clear that, as an analytics developer setting up the flow of data, you need to be clear on the business requirements while keeping the governance limits in mind, and use both to manage expectations with your business users.
In the next part of this blog series, we will look at data from external data sources and data volume. Or head back to review the other blogs in the Data Orchestration blog series.