Export node details of a dataflow job

When you open your Data Manager and check the data monitor, you get a list of all the jobs that have recently run in your Tableau CRM (Einstein Analytics) org. When looking at how the jobs are running – especially the dataflows – you can expand a job and see all the nodes that have run. This is helpful for several reasons: first, if a dataflow failed, you can quickly identify which node you need to correct; it is also useful when you want to see which nodes take a long time to run and identify opportunities for improvement.

Having this detail in the data monitor is great, but sometimes it’s nice to export these details for further analysis or maybe even a backup. In this blog, I will walk through how this is possible with Mohan Chinnappan’s analytics plugin. Please check out this blog for details on how to install or update the plugin.

Note: this blog uses sfdx-mohanc-plugins version 0.0.122. To see the latest details around the command, check out GitHub.

The dataflow jobs timing command

The main command for this blog is the dataflow jobs timing command. Let’s have a look at the options for the command by using the following:

sfdx mohanc:ea:dataflow:jobs:timing -h

Let’s have a closer look at the options for this command.

Username

Use the -u option to specify a username to use for your command.

--The option
sfdx mohanc:ea:dataflow:jobs:timing -u <insert username>

--Example
sfdx mohanc:ea:dataflow:jobs:timing -u rikke@demo.org

Dataflow job id

Use the -j option to specify a dataflow job id to use for your command.

--The option
sfdx mohanc:ea:dataflow:jobs:timing -u <insert username> -j <insert dataflow job id>

--Example
sfdx mohanc:ea:dataflow:jobs:timing -u rikke@demo.org -j 03CB000000383oAMAQ

The dataflow job list command

To use the dataflow jobs timing command we need a dataflow job id, which we can get by using the dataflow job list command. To see the options for this command, enter the following:

sfdx mohanc:ea:dataflow:jobs:list -h

Let’s have a closer look at the option for this command.

Username

Use the -u option to specify a username to use for your command.

--The option
sfdx mohanc:ea:dataflow:jobs:list -u <insert username>

--Example
sfdx mohanc:ea:dataflow:jobs:list -u rikke@demo.org

Export dataflow job details

Having looked at the dataflow jobs timing command as well as the dataflow jobs list command to get the dataflow job id, let’s walk through the steps to get a deeper look at how a given dataflow performs.

Note: Before using these commands you have to log in to the desired org by using the command sfdx force:auth:web:login, which will launch the login window in a browser.

Step 1 – use the dataflow:jobs:list command to extract the list of jobs run in the org.

sfdx mohanc:ea:dataflow:jobs:list

Step 2 – define the username for the target org by adding the -u option.

sfdx mohanc:ea:dataflow:jobs:list -u rikke@discovery.gs0

Step 3 – press enter.

Step 4 – find the dataflow job you want to export the details from and copy its id. I am saving the id in a text editor. Note that you see both dataflow and data sync jobs in the list, so it may be long. Essentially this list is identical to what you see in the data monitor in the Data Manager.

Step 5 – use the dataflow:jobs:timing command to export the timing and node details from a dataflow job.

sfdx mohanc:ea:dataflow:jobs:timing

Step 6 – define the username for the target org by adding the -u option.

sfdx mohanc:ea:dataflow:jobs:timing -u rikke@discovery.gs0

Step 7 – add the dataflow job id you copied earlier by using the -j option.

sfdx mohanc:ea:dataflow:jobs:timing -u rikke@discovery.gs0 -j 03CB000000383oAMAQ

Step 8 – press enter.

Once the command is done you will see three files being generated:

  1. A JSON file with the dataflow job id followed by ‘timing’ as the name.
  2. A CSV file with the dataflow job id followed by ‘timing’ as the name. This file is unformatted.
  3. A CSV with the title ‘DFTiming’, which has been formatted to be uploaded to Tableau CRM to visualize the timings of nodes.
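If you want to find the generated files from the command line, a small helper like this can list them. This is just a sketch: the `find_job_exports` name is mine, and the `<job id>*timing*` glob is based on the file naming described above, so verify it against the exact names your plugin version prints.

```shell
# Sketch: list the export files generated for a given dataflow job id.
# Assumes the files are named "<job id>...timing..." as described above --
# check the exact names printed in your command window.
find_job_exports() {
  job_id="$1"
  ls "${job_id}"*timing* 2>/dev/null
}
```

For example, `find_job_exports 03CB000000383oAMAQ` would list the JSON and CSV exports for that job in the current directory.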

The CSV files should open automatically on your computer; if they don’t, locate them (check the exact naming in the command window), open them up, and you will see all the details for the job.

A sample of the unformatted CSV file.
A sample of the formatted CSV file purposed for visualization in Tableau CRM.
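You don’t even need a spreadsheet to spot the slow nodes in the export. Here is a small sketch: the `top_slowest_nodes` helper is my own, and it assumes the CSV is comma-separated with one header row and a numeric duration in the last column, so adjust the column handling to match your actual file.

```shell
# Sketch: print the N rows with the largest value in the last column.
# Assumes: comma-separated file, one header row, numeric last column (duration).
top_slowest_nodes() {
  file="$1"
  n="${2:-5}"
  tail -n +2 "$file" |            # skip the header row
    awk -F, '{print $NF, $0}' |   # prefix each row with its last column
    sort -rn |                    # sort numerically, largest first
    cut -d" " -f2- |              # drop the prefix again
    head -n "$n"
}
```

For example, `top_slowest_nodes DFTiming.csv 10` would print the ten longest-running rows.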

Visualizing dataflow timings

You can take the file DFTiming.csv and upload it to Tableau CRM to visualize how each node performs. You can upload it within the platform in Analytics Studio or Data Manager, but you can also leverage the dataset load command from the plugin. For the latter, please refer to the blog Uploading datasets via CLI, which walks through all the steps.
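If you run this regularly, the two plugin commands can also be chained into one small script. This is a rough sketch, not a finished tool: the `export_job_timing` function name is mine, and it assumes the plugin is installed and you are already logged in to the org.

```shell
# Sketch: run both plugin commands for one org/job in a single call.
# Assumes the plugin is installed and you are already authenticated
# (via sfdx force:auth:web:login).
export_job_timing() {
  username="$1"
  job_id="$2"
  # List the jobs (normally you would pick the job id from this output)
  sfdx mohanc:ea:dataflow:jobs:list -u "$username"
  # Export the timing and node details for the chosen job
  sfdx mohanc:ea:dataflow:jobs:timing -u "$username" -j "$job_id"
}
```

For example: `export_job_timing rikke@demo.org 03CB000000383oAMAQ`.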


3 thoughts on “Export node details of a dataflow job”

  • Evan Emerson on November 20, 2020

    All of these recent blogs regarding Mohan’s additions to the tool have been incredibly helpful and incredibly insightful, thank you!!!

  • Kamil on November 25, 2020

    Hi Rikke,

    is there any way to automate these steps?

    • Rikke on November 25, 2020

      Should be possible with code.
