Export your Einstein Analytics datasets

Have you ever needed to download a dataset from Einstein Analytics, perhaps to use it in another system or simply to have a backup of your data? It is possible, and Mohan Chinnappan has made it even easier with his Dataset Export Utils. This blog covers how to use the tool. It assumes you have experience with the Salesforce DX CLI; if not, check out this trail from Trailhead.

Creating a dataflow

Before we get to the actual export we need a dataflow. It can be complex or simple, as long as it contains the data you want. In my example I keep it simple and use just two nodes (data transformations): edgemart to bring in the dataset and export to export it.

Note: You will only have the export node if you have Einstein Discovery (Einstein Analytics Plus licenses).

Head to the data manager to set up your dataflow. I have created a brand new dataflow, but you can just as easily do this in an existing one. If you are doing this in a production environment rather than a demo org, make sure it fits with your general data orchestration.

As mentioned, my dataflow is simple. First I add an edgemart node to bring in an existing dataset of opportunities; all I need to do is give the node a name and pick the dataset I want to use. Next, I add an export node so I can export my data as a CSV file. Here I give the node a name, choose the edgemart as the source, and define a user to perform the action. You can leave the target as it is, “Einstein Discovery”. See the steps I took below.

Note: The user chosen in the export node must have the Einstein Analytics Admin permission set.

You may wonder why the target is Einstein Discovery. Well, this node is a bit of a legacy thing. Previously Einstein Discovery was not part of Analytics Studio, and users had to go elsewhere in the platform to make their predictions. The data manager was still a powerful tool for shaping data before making predictions, so the export node let you shape your data there and then hand it to Einstein Discovery by exporting a CSV file to Salesforce Core, where it could be picked up. Einstein Discovery no longer uses this mechanism, but we can use it for downloading full datasets.

Where is your data now?

When your dataflow runs, the data is exported to an sObject and stored for 48 hours, so you need to grab it before then. If the dataset exceeds 32 MB it is split into several parts, and you need to grab every part to get all your data. Let’s have a quick look at how this looks in Workbench.

In order to download the dataset, we need to know the id of the export. To find it, run a SOQL query against the DatasetExport object. The main fields to include are Id and PublisherInfo, so you know what to extract and which row is the relevant one. See the query and steps below.

SELECT Id,PublisherInfo FROM DatasetExport

Remember that the file can be split into several parts. To get the actual data we therefore need to query a different object, DatasetExportPart, using the Id from the query we just ran as a filter. You will end up with a query similar to this:

SELECT DatasetExportId,Id,Owner,PartNumber FROM DatasetExportPart WHERE DatasetExportId = '0PxB0000000TOnXKAW'
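If you would rather stay in the terminal than in Workbench, a similar query can be run with the Salesforce CLI. The sketch below is only an illustration: parts_query is a hypothetical helper name, SF_USERNAME is an assumed environment variable holding an already authenticated username, and the query is trimmed to the Id and PartNumber fields. It relies on the standard force:data:soql:query command and only calls it when sfdx is installed and SF_USERNAME is set.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the DatasetExportPart query for one export id.
parts_query() {
  printf "SELECT Id,PartNumber FROM DatasetExportPart WHERE DatasetExportId = '%s' ORDER BY PartNumber\n" "$1"
}

export_id="0PxB0000000TOnXKAW"   # the id used in this post; replace with your own
query="$(parts_query "$export_id")"
echo "$query"

# Run the query only if the CLI is available and a username was provided.
if command -v sfdx >/dev/null 2>&1 && [ -n "${SF_USERNAME:-}" ]; then
  sfdx force:data:soql:query -q "$query" -u "$SF_USERNAME"
fi
```

Ordering by PartNumber is a convenience so the parts come back in the order you will want to append them.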

With the part id(s) noted down we can now use the REST Explorer and the GET function to grab the data. The path we have to use is:

/services/data/v48.0/sobjects/DatasetExportPart/<InsertPartId>/DataFile

You need to replace <InsertPartId> with the id we just found by querying the DatasetExportPart object. As my id was “0PyB0000000TPsdKAG”, for me it will look like this:

/services/data/v48.0/sobjects/DatasetExportPart/0PyB0000000TPsdKAG/DataFile

Clicking “Execute” you will get the data, which you can copy.
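The same GET can also be scripted. As a rough sketch, the hypothetical part_path helper below builds the path for a given part id. The commented curl line assumes you have an access token and instance URL available in ACCESS_TOKEN and INSTANCE_URL (names chosen here for illustration), for example as shown by sfdx force:org:display.

```shell
#!/usr/bin/env bash
API_VERSION="v48.0"   # the API version used in this post

# Hypothetical helper: print the REST path for one DatasetExportPart id.
part_path() {
  printf '/services/data/%s/sobjects/DatasetExportPart/%s/DataFile\n' "$API_VERSION" "$1"
}

path="$(part_path 0PyB0000000TPsdKAG)"
echo "$path"

# Assumed usage with curl (not run here; the output file name is made up):
# curl -H "Authorization: Bearer $ACCESS_TOKEN" "$INSTANCE_URL$path" -o part-1.csv
```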

Note: If you have multiple parts you would need to repeat this step for each part and append the data afterward.
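Appending the parts can be done with a small loop. The sketch below uses made-up file names (part-1.csv, part-2.csv) and sample rows in a temporary folder, and simply concatenates the files in numeric order. It does not try to handle repeated header rows, so check what your own parts contain first.

```shell
#!/usr/bin/env bash
# Sample parts standing in for the data copied from each GET request.
workdir="$(mktemp -d)"
printf 'Id,Amount\n001,100\n' > "$workdir/part-1.csv"
printf '002,200\n' > "$workdir/part-2.csv"

# Append the parts in numeric order (1, 2, ...) into one file.
: > "$workdir/full.csv"
part=1
while [ -f "$workdir/part-$part.csv" ]; do
  cat "$workdir/part-$part.csv" >> "$workdir/full.csv"
  part=$((part + 1))
done

cat "$workdir/full.csv"
```

Counting up explicitly, rather than using a wildcard such as part-*.csv, avoids part-10.csv being sorted before part-2.csv.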

As you can see, this is a very manual process, so let’s look at the Dataset Export Utils mentioned in the introduction.

Installing the plugin

Before we install the plugin, note that you can find all the details about it here, including a list of all the commands it offers.

Note: You need to have node.js installed to leverage this plugin – download it from https://nodejs.org/en/download/.

In your command window (I am using Mac’s Terminal where I have already authenticated the org I will be using) enter the following command:

sfdx plugins:install sfdx-mohanc-plugins

When prompted to confirm the installation, simply enter y and the installation kicks off. Once it finishes, I run a command to confirm the plugin was installed successfully.

# To see if the new plugin is installed successfully
sfdx plugins

Okay, with the plugin installed, what commands can you use? There are two to highlight: exportList and export. The easiest way to understand them is to use the help option, which we will look at in the following sections.

exportList command

This command is useful for seeing all the export ids available to use when exporting your dataset. To see the exportList options enter the following in the command window:

sfdx mohanc:ea:dataset:exportList -h

This will give a list of options available for the exportList command as seen in the image below.

Username

Use the -u option to specify a username to use for your command.

# The option
sfdx mohanc:ea:dataset:exportList -u <insert username>

# Example
sfdx mohanc:ea:dataset:exportList -u rikke@demo.org

export command

This command is what you will use to grab the files we exported in the dataflow. To see the export options enter the following in the command window:

sfdx mohanc:ea:dataset:export -h

This will give a list of options available for the export command as seen in the image below.

Let’s try to put these options to use by looking at the most common options for exporting your datasets.

Username

Use the -u option to specify a username to use for your command.

# The option
sfdx mohanc:ea:dataset:export -u <insert username>

# Example
sfdx mohanc:ea:dataset:export -u rikke@demo.org

Export id

Use the -e option to specify the export id to grab. This is the Id of the DatasetExport record we previously queried in Workbench. Note that you can leave the -e option out, in which case the command takes the latest export instead of a specific id.

# The option
sfdx mohanc:ea:dataset:export -u <insert username> -e <insert DatasetExportId>

# Example
sfdx mohanc:ea:dataset:export -u rikke@demo.org -e 0PxB0000000TOnXKAW

It is also possible to define the file path, name, and extension, which I will show in the demo below.

Note: Before using the plugin, make sure to authenticate the org you want to use by running the command sfdx force:auth:web:login, which will open your browser and prompt you to log in.

Viewing exports available

Before we can export our data we may need to find the relevant export id (DatasetExport Id), especially if you have multiple export nodes across your dataflows. Of course, if you are not interested in using the -e option in the export command, you can skip this part. Otherwise, the exportList command makes it easy to find the DatasetExport Id, Owner Id, and Export Node Name. Let’s have a look at the steps to take.

Step 1 – use the exportList command from the plugin

sfdx mohanc:ea:dataset:exportList

Step 2 – define the username to use by adding the -u option

sfdx mohanc:ea:dataset:exportList -u rikke@demo.org

As you can see from the image above, the command returns a list of the available DatasetExport ids, along with the owner id (the user defined in the export node) and the name of the export node. Hence the first value is the DatasetExport id, the second is the owner id, and the third is the node name.

Looking at the result above, I am interested in the dataset that comes from the node export_Opportunities. All we need from the string is the first id, the part before the comma. The rest are merely attributes that help identify the dataset export.

0PxB0000000TOnXKAW,03CB0000002rbwTMAQ:export_Opportunities
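If you want to pick the pieces apart in a script rather than by eye, plain shell parameter expansion is enough. This is a minimal sketch that assumes every line has the same id,ownerId:nodeName layout as the one above; the variable names are my own.

```shell
#!/usr/bin/env bash
line='0PxB0000000TOnXKAW,03CB0000002rbwTMAQ:export_Opportunities'

export_id="${line%%,*}"    # everything before the first comma
rest="${line#*,}"          # owner id and node name
owner_id="${rest%%:*}"     # everything before the first colon
node_name="${rest#*:}"     # everything after the first colon

echo "export id: $export_id"
echo "owner id:  $owner_id"
echo "node name: $node_name"
```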

Exporting your dataset with the Dataset Export Utils

Having found the dataset export we are interested in, let’s look at how we export the dataset we created with the dataflow.

Taking the options from before into consideration let’s construct the command we want to use.

Step 1 – use the export command from the plugin

sfdx mohanc:ea:dataset:export

Step 2 – define the username to use by adding the -u option

sfdx mohanc:ea:dataset:export -u rikke@demo.org

Step 3 – as I had multiple exports in my org, I want to specify the DatasetExport id by adding the -e option. But remember you can leave this out and just get the latest exported dataset.

sfdx mohanc:ea:dataset:export -u rikke@demo.org -e 0PxB0000000TOnXKAW

Step 4 – I could technically use the above command; however, that would print my dataset in the command window, and I would much rather have a CSV file. Hence I am going to redirect the output to a path that includes the name and extension of my file.

sfdx mohanc:ea:dataset:export -u rikke@demo.org -e 0PxB0000000TOnXKAW > Downloads/Blog/Blog-Opportunities.csv
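One practical detail with the redirect: the shell will not create missing folders, so the target folder has to exist before the command runs. The sketch below shows one way to guard against that and to check that the file is not empty afterwards. To keep it runnable anywhere, it writes into a temporary folder and uses a printf line with sample rows as a stand-in for the real export command, which is left as a comment.

```shell
#!/usr/bin/env bash
out_dir="$(mktemp -d)/Downloads/Blog"   # stand-in for the folder used above
out_file="$out_dir/Blog-Opportunities.csv"

mkdir -p "$out_dir"   # the > redirect fails if the folder does not exist

# Stand-in for the real export so this sketch runs without an org:
# sfdx mohanc:ea:dataset:export -u rikke@demo.org -e 0PxB0000000TOnXKAW > "$out_file"
printf 'Id,Amount\n001,100\n' > "$out_file"

# -s is true only when the file exists and is not empty.
if [ -s "$out_file" ]; then
  echo "export written: $(wc -l < "$out_file" | tr -d ' ') lines"
else
  echo "export is empty or missing" >&2
fi
```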

And that’s it, that is how you can export your datasets from Einstein Analytics. It is worth mentioning that this plugin doesn’t have a file size limit, as all parts of the export are automatically downloaded and joined together.

