External Data and Data Volume – Data Orchestration Part 4
This blog is part of the Data Orchestration blog series. One of the reasons why Tableau CRM is so powerful is you can join Salesforce data with external data to get deeper insight. In this part, we will take a look at considerations you should have when bringing in external data followed by considerations around data volume.
Objects From External Connections
The data that we need for analysis could very well be residing outside of Salesforce and with the help of connectors, we can get that data into Tableau CRM. The number of connectors and type of connectors is changing with each release, hence it is highly recommended you to check out the official documentation to get the most updated list of connectors available.
Note: For the most current list of connectors available check the documentation for Data Connectors.
Types of External Connectors
There are various types of external connectors and each of these connectors will have its own set of limitations, which you should familiarize yourself with before leveraging them. Below is the list of various types of external connectors currently available.
- External Salesforce Connector: Salesforce External, Salesforce Marketing Cloud Contacts OAuth 2.0, Salesforce Marketing Cloud Contacts, Customer 360 Global Profile Data Connection (Beta)
- Application Connectors: Google Analytics, Microsoft Dynamics CRM, NetSuite, Oracle Eloqua
- Database Connectors: Amazon RDS, Amazon Redshift, Google BigQuery, Google BigQuery Standard SQL, Google Cloud Spanner, Heroku Postgres, Microsoft Azure SQL Data Warehouse, Microsoft Azure SQL Database, SAP HANA, Snowflake
- Analytics Mulesoft Connectors: Analytics Mulesoft Microsoft SQL Connector, Analytics Mulesoft MySQL Connector, Analytics Mulesoft Oracle Connector
- Object Store and No SQL Connectors: Amazon S3
- Pilot and Beta Connectors
Note: For the objects coming from external connections, the object connection mode is always Full Sync.
Note: Most connectors can sync up to 100 million rows or 50 GB per object, whichever limit it reaches first.
While the external limits have their own set of limitations they do work the same way as your local connector and can be scheduled to sync objects ready to be used in your dataflow or recipe.
Getting Data From External Connections
The maximum number of objects that we can sync is 100. This includes both the internal objects (SFDC Local) and objects coming from external connections. Hence it is important that we keep a check on the object limit as we could very quickly hit the limits and would be forced to remove objects from the data sync and ultimately rethink the data orchestration. Before we can remove an object from the data sync, we will have to ensure that there is no dataflow or recipe that is referencing that same object. Because of this, it can be complex and time-consuming to fix if this limit is hit.
Note: The maximum number of Objects we can sync is 100. This includes the objects from SFDC_Local connection as well as the Objects coming from external connections.
Use of Views or Interface Tables instead of Individual Objects
As we just discussed above it is important to keep a check on the number of objects that are being synced but the same can be said about reducing the number of rows that need to be pulled in via the external connections. To achieve this we can use views or interface tables to combine data from different objects to cut down the connected objects needed, add logic to filter the number of rows and use transformations such as casting, formatting, or concatenation to prepare the data fields on extract. The image below shows the advantage of using views, we get the required data into Tableau CRM as one object rather than have three objects which have the normalized full row data.
Pre-Process External Data Before Data Sync
Quite often we need to harmonize the data from external sources whether it be values, formats, or types so it can be used with data from other sources. Pre-processed data makes it easy for us to further combine the data that is coming in from other sources using dataflows and recipes, for example, we have a specific date format for transaction date or we have different account names in different systems but we of course wish to use only one set as master. It is generally faster to do this on the extract in other words before it reaches our dataflow. This ensures that the dataflow or recipe run time is shorter and more definitive as we do not have to accommodate for these modifications while preparing the data for datasets.
Managing Data Volume
Once we have decided the objects we need from both local and external objects, as well as the data sync frequency, we can choose what records to bring over including the specific fields that we want from each object for our analysis. By choosing the records and fields we need, we can control the data volume that is getting synced from each object. It is important to remember that all the records or all of the fields in an object are rarely required for your analysis and note that higher volume will impact your performance. We need to keep in mind that Tableau CRM should not be treated as a data repository.
Choosing Right Fields
When deciding what fields to include in the data sync, it is important to only choose the fields which are relevant for the analysis as this helps reduce the sync time as well as dataflow processing time, hence overall performance is impacted. As a general rule you should not include fields with long text, description fields or any free text as they would not add much value to your analysis (unless of course you are using sentiment analysis). Finally, you should keep in mind there is a 5,000 field limit per record that can be synced which includes 1,000 date fields.
Note: Information on how to Add and Remove Local Salesforce Objects and Fields from Data Sync.
Note: The maximum number of fields in a record: 5,000 (including up to 1,000 date fields).
Data Sync Filters
In some cases all data from an object may not be required, maybe records older than five years are not relevant for your analysis. In these cases, we can and should apply a filter on the object to just get the data that is needed. Earlier we saw how we can just select the required fields from an object to make the sync efficient and further by setting a data sync filter on the object we can make the sync even more efficient.
The data sync filter helps in filtering the data according to the defined criteria, hence we are only syncing the data which we need. This will also make the data sync faster as we are limiting the number of records that we bring over. Please note that the filter you set at the object level applies to all the dataflows and recipes that are referring to these objects. This ensures that the data considered for analysis is consistent across the organization.
Note: For more details check the documentation Filter Data from Objects.
A typical example is if your Salesforce environment is older you probably have years worth of opportunity data. Let’s say you are only interested in analyzing the data for the last five years opportunities for all your dashboards but you have data from the past 10 years. You could then set an object filter to ensure that you are getting just the last five years of data into the cache.
Some of the external connectors also allow you to apply a filter on the objects. In the image below we see the settings for filtering on external data in this case Redshift. Please check the documentation for the connector to see if filtering is supported as well as the language used to define the filter.
There can be instances where the filters on connectors would not work. One example is that query could time out due to the more complex query that is run against the data and as a result, the sync would fail, in such cases, one should resort to doing a full sync. The product team is constantly evaluating the connectors and working to improve the way these filters work with each release.
Considerations when Syncing Incrementally
Now, that we know the different types of data sync and the importance of data sync filters, let’s talk about some limitations of incremental sync and how we can use filters to accommodate it. There are few objects which do not support Incremental sync. And in these cases, we can use the object filters to ensure that we only have the data that is required before the object gets synced. Thereby we limit the data to be synced and the process will complete faster.
Note: Check the help documentation for a list of objects that do not support incremental sync.
Let’s look at an example with the “Campaign Member” object, this object is not supported for incremental sync. The campaign member object data is to be considered transactional data and hence the data volume is huge. But we might not need the entire data that is present in the Campaign Member object. We can make use of the data sync filters and filter only the data we need from the Campaign Member object. This ensures that the data sync is faster despite the object doing a full sync each time there the connection is run.
In the next part of Data Orchestration blog series, we will look at considerations for maintaining your data sync. This will be followed by looking at dataflows and recipes including the difference and scheduling. Or head back to review the other blogs in the Data Orchestration blog series.