Export your Einstein Analytics datasets


Have you ever been in a situation where you need to download a dataset from Einstein Analytics? It may be to use it in another system or simply to have a backup of your data. It is actually possible to do, and Mohan Chinnappan has made it even easier with his Dataset Export Utils. This blog will cover how to use this tool; however, it assumes you have experience with the Salesforce DX CLI. If not, check out this trail from Trailhead.

Note: You need to have installed the Salesforce Command Line Interface (CLI). Check out this Trailhead module to install the CLI.

Creating a dataflow

Before we get to the actual export of data, we need a dataflow. This dataflow can be complex or simple, as long as it produces the data you want. In my example I will keep it simple and use just two nodes, or data transformations: edgemart to bring in the dataset and export to export it.

Note: You will only have the export node if you have Einstein Discovery (Einstein Analytics Plus licenses).

Head to your data manager to set up your dataflow. I have created a brand new dataflow, but you can easily do this in existing dataflows as well. If you are using this in a production environment rather than a demo org, make sure it works with your general data orchestration.

As mentioned, my dataflow is simple. First, I add an edgemart node, or transformation, to bring in an existing dataset of opportunities; all I need to do is give the node a name and pick the dataset I want to use. Next, I add the export node to be able to export my data as a CSV file. All I have to do here is give the node a name, choose the edgemart node as my source, and define a user to perform the action. You can leave the target as is, “Einstein Discovery”. See the steps I took below.

Note: The user chosen in the export node must have the Einstein Analytics Admin permission set.

You may wonder why the target is Einstein Discovery. Well, this node is a bit of a legacy feature. Previously, Einstein Discovery was not part of Analytics Studio, and users had to go elsewhere in the platform to make their predictions. The data manager, however, was a powerful tool for shaping data before making predictions, so the export node let you shape your data and then make it available to Einstein Discovery by exporting a CSV file to Salesforce Core, where it could be picked up. Einstein Discovery has since retired this mechanism, but we can now use it for downloading full datasets.

Where is your data now?

When your dataflow runs, the data is exported to an sObject and stored for 48 hours, so you need to grab it within that window. Your dataset is split into several parts if it exceeds 32 MB, and you need to grab all parts to get all your data. Let’s have a quick look in Workbench at how this looks.
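To get a feel for the 32 MB split, here is a quick sketch of the arithmetic; the 100 MB dataset size is just a made-up example:

```shell
# Sketch: a dataset larger than 32 MB is split into parts, so a
# hypothetical 100 MB export would yield ceil(100 / 32) = 4 parts.
SIZE_MB=100   # example dataset size, not from a real export
PART_MB=32    # maximum size of a DatasetExportPart
PARTS=$(( (SIZE_MB + PART_MB - 1) / PART_MB ))
echo "$PARTS parts"   # prints "4 parts"
```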

In order to download the dataset, we need to know the id of the export we created. To find it, you can run a SOQL query against the DatasetExport object. The main fields to include are Id and PublisherInfo, so you know which row is the relevant one and what to extract. See the query and steps below.

SELECT Id,PublisherInfo FROM DatasetExport

Remember that the file can be split into several parts. Hence, to get the actual data, we need to query a different object, DatasetExportPart. We can apply the Id from the query we just ran as a filter. You will end up with a query similar to this:

SELECT DatasetExportId,Id,Owner,PartNumber FROM DatasetExportPart WHERE DatasetExportId = '0PxB0000000TOnXKAW'
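If you save the query results as a CSV file, you can also pull out just the part ids in PartNumber order from the command line. A minimal sketch, where the file name parts.csv and its rows are made-up sample data mirroring the columns above:

```shell
# Hypothetical sample of DatasetExportPart query results saved as parts.csv
# (the file name and row values are assumptions for illustration):
cat > parts.csv <<'EOF'
DatasetExportId,Id,Owner,PartNumber
0PxB0000000TOnXKAW,0PyB0000000TPseKAG,005B0000000aaaa,2
0PxB0000000TOnXKAW,0PyB0000000TPsdKAG,005B0000000aaaa,1
EOF
# Skip the header row, sort numerically by PartNumber, print the part ids:
tail -n +2 parts.csv | sort -t, -k4,4n | cut -d, -f2
```

Sorting by PartNumber matters when appending the parts later, since part 1 is the one that carries the column headers.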

With the part id(s) noted down we can now use the REST Explorer and the GET method to grab the data. The path we have to use takes this shape (replace the API version with the one your org uses):

/services/data/vXX.0/sobjects/DatasetExportPart/&lt;InsertPartId&gt;/DataFile

You need to replace &lt;InsertPartId&gt; with the id we just found by querying the DatasetExportPart object. As my id was “0PyB0000000TPsdKAG”, for me it will look like this:

/services/data/v50.0/sobjects/DatasetExportPart/0PyB0000000TPsdKAG/DataFile
Click “Execute” and you will get the data, which you can copy.

Note: If you have multiple parts you would need to repeat this step for each part and append the data afterward.
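The append step itself can be sketched in the shell. Assume each part’s response was saved as part1.csv, part2.csv, and so on (the file names and sample rows below are made up), and that only part 1 carries the column headers; concatenating the files in PartNumber order then rebuilds the full file:

```shell
# Made-up sample parts for illustration; in practice these would be the
# responses copied from the REST Explorer for each DatasetExportPart:
printf 'Id,Amount\n0061,100\n' > part1.csv   # part 1 includes the header row
printf '0062,200\n' > part2.csv              # later parts are data only
# Append the parts in order to rebuild the complete CSV:
cat part1.csv part2.csv > combined.csv
```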

As you can see this is a very manual process, so let’s look at the Dataset Export Utils as mentioned in the introduction of the blog.

Installing the plugin

Before we install the plugin, note that you can find all the details about it here, including a list of all the commands it offers.

Note: You need to have node.js installed to leverage this plugin – download it from https://nodejs.org/en/download/.

In your command window (I am using Mac’s Terminal where I have already authenticated the org I will be using) enter the following command:

sfdx plugins:install sfdx-mohanc-plugins

When prompted to confirm the installation, simply enter y and the installation kicks off. Afterwards, I fire off a command to confirm the installation completed successfully.

--To see if the new plugin is installed successfully
sfdx plugins

Okay, with the plugin installed, what commands can you use? There are two commands to highlight: exportList and export. The easiest way to understand what they do is via the help option, which we will look at in the following section.

exportList command

This command is useful for seeing all the export ids available to use when exporting your dataset. To see the exportList options enter the following in the command window:

sfdx mohanc:ea:dataset:exportList -h

This will give a list of options available for the exportList command as seen in the image below.


Username

Use the -u option to specify a username to use for your command.

--The option
sfdx mohanc:ea:dataset:exportList -u <insert username>

sfdx mohanc:ea:dataset:exportList -u rikke@demo.org

export command

This command is what you will use to grab the files we exported in the dataflow. To see the export options enter the following in the command window:

sfdx mohanc:ea:dataset:export -h

This will give a list of options available for the export command as seen in the image below.

Let’s put these options to use by looking at the most common ones for exporting your datasets.


Username

Use the -u option to specify a username to use for your command.

--The option
sfdx mohanc:ea:dataset:export -u <insert username>

sfdx mohanc:ea:dataset:export -u rikke@demo.org

Export id

Use the -e option to specify the export id to grab. This refers to the DatasetExport object and the id we previously queried in Workbench. Note that you can leave the -e option out; instead of taking a specific id, the command then takes the latest export.

--The option
sfdx mohanc:ea:dataset:export -u <insert username> -e <insert DatasetExportId>

sfdx mohanc:ea:dataset:export -u rikke@demo.org -e 0PxB0000000TOnXKAW

It is also possible to define the file path, name, and extension, which I will show in the demo below.

Note: Before using the plugin, make sure to authenticate the org you want to use by running the command sfdx force:auth:web:login, which will open up your browser and prompt you to log in.

Viewing exports available

Before we can export our data, we may need to find the relevant export id (the DatasetExport id) to use, especially if you have multiple export nodes across your dataflows. Of course, if you are not interested in using the -e option in the export command, then you can skip this part. Using the exportList command, we can easily find the DatasetExport id, owner id, and export node name. Let’s have a look at the steps to take.

Step 1 – use the exportList command from the plugin

sfdx mohanc:ea:dataset:exportList

Step 2 – define the username to use by adding the -u option

sfdx mohanc:ea:dataset:exportList -u rikke@demo.org

As you can see from the image above, the command returns a list of the available DatasetExport ids, along with the owner id that was defined in the user parameter of the export node and the name of the export node. Hence the first id is the DatasetExport id, the second is the owner id, and the third is the node name.

Looking at the result above, I am interested in the dataset coming from the node export_Opportunities. All we need from the string is the first id, which is highlighted below. The rest are merely attributes to identify the dataset export.


Exporting your dataset with the Dataset Export Utils

Having found the dataset export we are interested in, let’s look at how we export the dataset we created with the dataflow.

Taking the options from before into consideration let’s construct the command we want to use.

Step 1 – use the export command from the plugin

sfdx mohanc:ea:dataset:export

Step 2 – define the username to use by adding the -u option

sfdx mohanc:ea:dataset:export -u rikke@demo.org

Step 3 – as I have multiple exports in my org, I want to specify the DatasetExport id by adding the -e option. But remember, you can leave this out and simply get the latest exported dataset.

sfdx mohanc:ea:dataset:export -u rikke@demo.org -e 0PxB0000000TOnXKAW

Step 4 – I could technically use the above command; however, that would print my dataset in the command window, and I would much rather have a CSV file. Hence I am going to add the path, including the name and extension of my file.

sfdx mohanc:ea:dataset:export -u rikke@demo.org -e 0PxB0000000TOnXKAW > Downloads/Blog/Blog-Opportunities.csv

And that’s it, that is how you can export your datasets from Einstein Analytics. It is worth mentioning that this plugin doesn’t have a file size limit, as all parts from the export are automatically downloaded and joined together.

If you want to import the dataset to another org, check out the blog on the load command.



21 thoughts on “Export your Einstein Analytics datasets”

  • 1
    Alicia White on October 8, 2020 Reply

    Hi Rikke, This is really helpful! I have found a bug with the plugin and am not sure where to report it so hopefully you can pass this on if this is not the right place. When I use the sfdx mohanc:ea:dataset:export command, the files are joined together but not by order of their PartNumber. DatasetExportPart 1 always has the column headers, so if it is not the first in the combined file, the data gets out of sort and doesn’t fall under the right heading. Appreciate if you can help pass this on to Mohan. Thanks!

    • 2
      Rikke on October 9, 2020 Reply

      Thanks for letting me know. Try and update the plugin; there should be an order built into the latest version.

      • 3
        Alicia White on October 9, 2020 Reply

        Thanks! I updated the plugin and it looks like that solved the issue.

        • 4
          Rikke on October 10, 2020 Reply

          That’s great!! Thanks for letting us know.

  • 5
    William Peck on October 9, 2020 Reply

    Glad to have found this place, as I am wrestling with Salesforce (data querying).

    But for point of reference, if I wanted to export (less than a million rows) from Oracle, I would write a simple query in PL/SQL Developer, then export to Excel. Boom. Done.

    • 6
      Terry Wilson on January 27, 2021 Reply

      William, your point of reference does not really apply. TCRM datasets are not a transactional, mutable data store; they are an end-user data index made for aggregate BI queries at scale and performance in the cloud. So the comparison is not an appropriate one. You could not scale thousands of users querying billions of rows of records in Excel, which is what our datasets are designed for. In general, we want people to use the data in TCRM, not export it to outside systems that lack security and freshness.

  • 7
    Kamil on November 25, 2020 Reply

    Hi Rikke,

    is there any way how to automate this steps?

  • 8
    Sidhu on February 23, 2021 Reply

    Hi Rikki,

    Is there a way to get the Dataset rows (similar like restore data) count history in workbench ?

  • 9
    Pranit Bhisade on March 11, 2021 Reply

    Hi Rikke,
    Thank you for the blog, I have a question, so is the limit of 32MB applies for data which we are exporting from a report(Values Table in TableauCRM) , I have read that only 10k rows are allowed to download as a csv, or excel file. Can we any how increase it. What could be the maximum downloadable size for a report in EA.
    Best Regards,

  • 10
    Luca Miglioli on March 24, 2021 Reply

    Hi Rikke,

    thanks for the post! Unfortunately I’m facing some error while exporting my dataset (rows: 30 millions):

    “The “chunk” argument must be of type string or an instance of Buffer or Uint8Array. Received an instance of Object”.

    May you please help?

    Thanks a lot!

    • 11
      Luca Miglioli on March 24, 2021 Reply

      Nevermind, resolved: I unchecked the “Specify CSV partition settings” in the Recipe

    • 12
      Daniel on October 4, 2022 Reply

      hi, I am getting the same issue, I have unchecked the “Specify CSV partition settings”, I am trying to export 27 million rows

  • 13
    andy on August 3, 2021 Reply

    You save my day Rikke

    Anyway, do you know how to delete DatasetExport?


  • 14
    Kenneth Torres on October 14, 2021 Reply

    One quick question, when I’m pulling data out of Tableau CRM I’m noticing that the data is a few days older than the dataflow we created above. Do you have any idea why that is?

  • 15
    Matteo on November 25, 2021 Reply

    Hi, is there any way how to schedule this steps? thanks for your help!

  • 16
    Archie on April 19, 2022 Reply

    Looks like even Salesforce help documentation found this blog entry useful


  • 17
    Yonas Mekuria on April 20, 2022 Reply

    Is there a way to automate this?

  • 18
    Ann Marie on September 7, 2022 Reply

    Thank you so very much. This is the first tutorial that has worked for us. I love that this process gives you a .csv with all of the rows instead of separate, chunked up ones. Thank you again!

  • 19
    Paul Liu on December 4, 2022 Reply

    Check out my new free Recipe Output for CRM Analytics app, a lot of questions and issues mentioned in the comments have been addressed in my app, enjoy it!

    Click here for the Demo Video

    Click here for the Configuration Guide – Exporting CRM analytics Recipe Output CSV Files Made Easy

    #Tableau CRM #Einstein Analytics

  • 20
    Nilu on March 23, 2023 Reply


    Is there any plugin/commands available to find field usage from receipe now?

  • 21
    dave on June 27, 2023 Reply

    Any idea how to change the location of the output json and .csv file? Default location appears to be windows\system32 folder which I can’t write too, so the output files fail to be written.
