Next-gen Visual Data Prep: What’s Happening with Dataflows and Recipes?

If you’ve been keeping track, you may have noticed there’s been a lot of investment in each release to enhance features in Data Prep Recipes, the visual/low-code solution for preparing and transforming your data. If you caught the Summer 22 release preview, the new Data Manager with a brand new look is now generally available, the dataflow to recipe converter tool is Beta, and there are a whole lot of new features coming to Data Prep such as Date Configurations, Staged Data, new Multi-value functions, and a slew of enhancements to our recipe editor including new data sampling options, relative date filtering, enhanced global search, and more! With the new Data Manager, you may have noticed that some of the Dataflow assets such as the dataflow list view and the ability to create new dataflows have now moved into the Data Manager (Legacy) section.

This can be nerve-wracking news to admins who have yet to learn Data Prep or admins of organizations who have a lot of Dataflows that will need to be migrated to Recipes. How do you get up to speed with Data Prep? How do you move your dataflows to Recipes? What is the end-of-life (EOL) timeframe? We saw these questions appear online and heard them asked live during our webinars, and now we’re excited to be able to provide further guidance on what to expect going forward.

Here are FAQs on this topic from the CRMA Data Platform team as well as a few currently available resources to get you going on your Data Prep journey.

Wait, can you catch me up on what’s been happening? What’s everyone talking about?

CRMA Data Platform currently has a few overlapping options for generating datasets: (1) dataflows, (2) the dataset builder, and (3) recipes. Although recipes and dataflows both prepare data, each approach offers a unique set of transformations that manipulate data. Dataflows and recipes aren’t mutually exclusive, and you can use both to meet complex data preparation requirements. For example, you can use a dataflow to generate an intermediate dataset, and then use that dataset as the source for a recipe to perform additional transformations.

New CRMA users can find it daunting to create a dataset for the first time: where do you start, what do the nodes do, which tool should you use? The list of questions goes on. The Data Platform team acknowledged that:

  1. There were gaps in the data prep tools
  2. The entry for new users is difficult and long

Recipes gave CRMA users a more powerful and robust tool and made it more approachable for new users to get their data just right for their dashboards. Recipes provide an intuitive, visual interface that lets users point and click their way through building recipes that prepare data and load it into a target such as a dataset or a Salesforce object.

Compared to dataflows, recipes are newer and are recommended for performance, functionality, and ease of use. Recipes allow you to preview the data as you transform it, while dataflows only show your node schema. For example, recipes have more join types and transformations with built-in machine learning – such as Predict Missing Values and Detect Sentiment – that aren’t available in dataflows. Recipes can also aggregate data to a higher level. Data Prep Recipes has a lot to offer, and we aren’t slowing down on innovation either!

With a recipe you can:

  • Design complex data preparation flows with the visual editor.
  • Preview your data and how it changes as you apply each transformation.
  • Quickly remove columns or change column labels.
  • Analyze the quality of your data with column profiles.
  • Get smart suggestions about how to improve and transform your data.
  • Aggregate and join data.
  • Bucket values without having to write complex SAQL expressions.
  • Use built-in machine learning-based transforms to detect sentiment, perform data clustering, and generate time series-based forecasting.
  • Create calculated columns with a visual formula builder.
  • Perform calculations across rows to derive new data for trending analysis.
  • Use a point-and-click interface to easily transform values to ensure data consistency. For example, you can bucket, trim, split, and replace values without a formula.
  • See the history of all your changes, and back up or move forward to replay it.
  • Push your prepared data to other systems with output connectors.

But what about dataflows? Am I going to have to move all my existing dataflows into recipes?

We want to make Recipes your one-stop-shop for visual data prep. And we want to ensure that we can throw the full force of our developers into building awesome, new features and working on scaling up what we do have, which means we’re not going to release any new features on Dataflows. We don’t have a formal EOL announcement at this time, but we’re working on one as we write this, so expect more details to come soon. We want to give you advance notice so you have plenty of time to learn Recipes, review your existing dataflows, give us feedback, and get into the future!

The best way for you to maintain and future-proof your organization is to move your Dataflows to Recipes. That can be a large, daunting task for those of you who have become Dataflow rockstars or are looking at a giant pile of Dataflows that nobody has touched in 4 years. We know this will take time. That’s why we’re telling you now! We value transparency and feel that the best way for us to ensure a successful transition is to be open about our intent and let you tell us what you need to make this a reality.

With that in mind, here’s what we’ve delivered so far and what’s already on our roadmap for migration.

  • We’ve delivered a migration tool that turns a dataflow into recipes with a click of a button. There are additional enhancements to the tool in Winter ’23. There are plans to add more parity features to the migration tool as a follow-on.
  • We’ve published a comprehensive migration guide that details best practices on how to migrate dataflows to recipes.
  • We’re providing an option to enroll in dataflow migration academies. More details about the upcoming schedule will be published soon.
  • We will provide the ability to opt in to concurrent recipe runs (subject to approval). With the Winter ’23 release, look out for this option under Analytics Settings.

How does product end-of-life (EOL) work?

An EOL is broken down into 3 distinct phases:

  1. The EOL announcement
  2. The EOL transition
  3. The actual product EOL

Starting with the Winter ’23 release, we will formally announce CRMA Dataflows end-of-life as part of our official release notes, which will start a 15-month clock that gives you a window to ramp up your usage of the Dataflows replacement. After this 15-month period, which will conclude in March 2024, the Dataflows product will no longer be supported and will be marked as deprecated. Dataflows will continue to run, but at your own risk, and they won’t be supported if they stop working.

How much time do I have before Dataflows & Dataset Builder are retired? Will they just stop working one day?

At the moment, we don’t have an official timeline for all phases of retirement, and Dataflows still work. Shutting off Dataflows (your dataflows “just stop working”) would be the final step in the EOL process. Before that happens, we will begin by turning off the ability to create new dataflows, so your existing dataflows can continue to run for some time while being migrated to Recipes. That’s why starting with Recipes now is the best way to prepare. We also plan to incorporate community feedback and put our migration tool to the test as we work out specific timelines.

To summarize, here are our current timelines:

New Customers

After October 2022 (Winter ’23 release), all ‘net new customers’, i.e. all new orgs (signed up after 2022), will get the new Data Prep editor as the default option.

  • New customers will get Data Manager 3.0 by default, and by extension won’t be able to create Dataflows
  • Net new customers automatically get the ability to run concurrent recipes.

Existing Customers

Starting in October 2022 (Winter ’23), existing customers will get the following new capabilities:

  • A conversion tool to convert a legacy dataflow to a recipe becomes generally available (GA)
  • Customers with existing dataflows will continue to receive product support through the duration of this transition period.
  • Ability to share dataflow and recipe concurrency slots (subject to the existing total org limit).

After March 2024 (Spring ’24), customers can expect:

  • Official support for dataflow-related cases ends. After a subsequent release, we also plan to deprecate the dataflow feature completely.

These future dates are all forward-looking statements and are subject to change.

What can I do now to start preparing my org for migrating dataflows to recipes?

First and foremost, start building new use-cases in Recipes. If you’re working on a new project, take some extra time to try to implement it in Recipes and get comfortable. One of the best ways to do that is with our new Convert Dataflows to Recipes (Beta) tool.

Here are a few important points to keep in mind:

  • In the dataflows list page, the drop-down button for each dataflow will include a “Convert to Recipe (Beta)” link
    • Click that link, and the converted recipe opens in a new browser tab
    • You can then save it like a normal recipe
    • Dataflow nodes will be mapped to corresponding data prep transformations
  • Your dataflow remains unchanged
  • The converted recipes are not linked to the source dataflow and can be updated/deleted as needed.
  • The conversion does not impact existing scheduling or notifications; you can schedule the recipe in place of the source dataflow when you are ready.
  • As part of the Beta release, recipe JSON files are limited to 800 KB.

Note: This conversion changes the API names of the datasets created or updated through the recipe. Consider naming them carefully to avoid errors and confusion. After your conversion tests run successfully, consider reverting to the original API name of the dataset in the recipe.

What about recipe concurrency?

With the Winter ’23 release (safe harbor), existing customers will have the ability to opt in to let recipes use dataflow concurrency slots. The total job concurrency will remain unchanged.

For example, if an org has a dataflow concurrency of 2 and a recipe concurrency of 1:

  • The org can run 3 recipes concurrently
  • The org can run 2 recipes and 1 dataflow concurrently
  • The org can NOT run 3 dataflows concurrently

Dataflow concurrency will remain unchanged; based on the example above, you can continue to run 2 dataflows concurrently.
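
The slot-sharing rule in this example can be sketched in a few lines of Python. This is only an illustration of the arithmetic described above, not Salesforce code: the function name `can_start` and its parameters are hypothetical, and the default limits are the ones from the example (dataflow concurrency of 2, recipe concurrency of 1).

```python
def can_start(job_type, running_dataflows, running_recipes,
              dataflow_limit=2, recipe_limit=1):
    """Return True if one more job of job_type may start.

    Hypothetical model of the opt-in rule: recipes may borrow
    dataflow slots, dataflows may not borrow recipe slots, and the
    total number of concurrent jobs stays the same.
    """
    total_limit = dataflow_limit + recipe_limit  # total concurrency is unchanged
    if running_dataflows + running_recipes >= total_limit:
        return False  # every slot is already in use
    if job_type == "dataflow":
        # Dataflows stay capped at their own limit.
        return running_dataflows < dataflow_limit
    if job_type == "recipe":
        # Recipes can take any free slot.
        return True
    raise ValueError(f"unknown job type: {job_type!r}")


if __name__ == "__main__":
    print(can_start("recipe", running_dataflows=0, running_recipes=2))
    print(can_start("dataflow", running_dataflows=2, running_recipes=0))
```

Under these assumed limits, a third recipe can start while two recipes are running, but a third dataflow cannot start while two dataflows are running.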

My dataflows are complex; are recipes ready for that kind of complexity?

Yes, data prep transformations offer a wide assortment of new features and recipes are at functional parity with dataflows. Functional parity doesn’t mean there’s an exact equivalent; for example, field attribute overrides for precision/scale in dataflows are defined as Edit Attribute transforms. Instead of SAQL expressions and functions, recipes support SQL expressions and functions.

Some dataflow features are unsupported in Data Prep recipes, such as SOQL filter expressions in the sfdcDigest node. For example, in recipes, Direct Data simplifies the way you bring Salesforce data into our system. Unsupported features will be documented in Help & Training. Additionally, the conversion tool will move unsupported features into the recipe definition as node-level annotations, as shown in the GIF below.

How do recipes compare in performance to dataflows?

Generally, recipes will outperform dataflows in raw performance. The two platforms have different characteristics, so performance depends on the actual implementation. In general, because the unified data platform runs on Apache Spark, larger recipes benefit from distributed computing and will run faster than an equivalent dataflow that runs on a single host. Please contact Salesforce Support if your converted recipe experiences unexpected performance degradation.

How do I see the job execution details in data prep?

In Data Manager 3.0 you can investigate and optimize your recipe jobs with the more detailed Jobs Monitoring page. Data Manager displays historical job execution details for data prep recipes, including wait time, run time, input dataset row counts, processing time, transformation time, and output dataset row counts and creation time.

More details will be added to Data Manager over the next several releases.

Note: See “Troubleshoot Recipe Jobs with Expanded Monitoring.”

What are some limitations when using data prep?

  • Recipe formulas do not support self-referencing fields
  • The Flatten transformation supports a source dataset of up to 20 million rows; flattening more rows can lead to recipe failure
    • Consider the number of levels you’re flattening together with the row count of your dataset. The Flatten transformation can flatten:
      • Up to 300 levels with a dataset of fewer than 1 million rows.
      • Up to 100 levels with a 15 million-row dataset.
      • Up to 50 levels with a 20 million-row dataset.
  • SOQL filters in sfdcDigest need to be rebuilt using a Filter node or moved to Connected Object filtering
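
If you want to sanity-check a dataset before adding a Flatten node, the row and level limits above can be turned into a small lookup. This is a rough sketch, not an official check: the function name `max_flatten_levels` is hypothetical, and because the limits are only stated for three dataset sizes, it assumes the conservative reading that a dataset between two stated sizes gets the limit of the larger size.

```python
def max_flatten_levels(row_count):
    """Return the assumed maximum number of hierarchy levels for Flatten.

    Thresholds come from the limits listed above. Returns None when
    the dataset is larger than the 20 million-row maximum.
    """
    if row_count < 1_000_000:
        return 300
    if row_count <= 15_000_000:
        return 100  # conservative: use the limit stated for 15 million rows
    if row_count <= 20_000_000:
        return 50   # conservative: use the limit stated for 20 million rows
    return None     # above the supported source dataset size


def flatten_is_safe(row_count, levels):
    """True if the row count and level count fit the assumed limits."""
    limit = max_flatten_levels(row_count)
    return limit is not None and levels <= limit


if __name__ == "__main__":
    print(max_flatten_levels(500_000))
    print(flatten_is_safe(18_000_000, levels=80))
```

A dataset with 18 million rows and 80 hierarchy levels would be flagged here, since the sketch treats it like a 20 million-row dataset with a 50-level cap.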

Note: Using the flatten transformation you can flatten the Salesforce role hierarchy to implement row-level security on a dataset based on the role hierarchy. More details here.

Note: See “Limitations When Using Data Prep.”

How will Salesforce assist me in this transition?

  • Webinars: Starting June 2022, look out for enablement sessions focused on dataflow to data prep migration. We will conduct learning day sessions specifically focused on this topic.
  • Academies: Please reach out to your Account Executive/CSM for upcoming dates and the enrollment process for dataflow to recipes migration academies. Additionally, limited solution architect guidance is available upon request.
  • Blogs: Multiple posts highlighting the migration process, and new capabilities. Here’s one to help you get started. Look for MOAR to come.

Resources

We have the following resources available:

Closing Thoughts

As I mentioned earlier, we value transparency the most and feel that the best way for us to ensure a successful transition is to be open about our intent and let you tell us what you need to make this a reality. If you’re new to Recipes, start hitting the trails on Trailhead to learn more. Then, start building a very basic recipe. If you’re a Dataflow pro, take your complex dataflows and convert them to recipes. Do you have more questions about recipes, or do you have a use case that you can’t figure out how to implement in recipes? Ask us!

Please feel free to drop a comment below or message me on LinkedIn. As always, let me know what would be most helpful to you!

Forward-looking Statement

This content contains forward-looking statements that involve risks, uncertainties, and assumptions. If any such uncertainties materialize or if any of the assumptions prove incorrect, the results of salesforce.com, inc. could differ materially from the results expressed or implied by the forward-looking statements we make.

Any unreleased services or features referenced in this document or other presentations, press releases or public statements are not currently available and may not be delivered on time or at all. Customers who purchase our services should make the purchase decisions based upon features that are currently available. Salesforce.com, inc. assumes no obligation and does not intend to update these forward-looking statements.

4 thoughts on “Next-gen Visual Data Prep: What’s Happening with Dataflows and Recipes?”

  • 1
    Mark Tossell on May 24, 2022

    Fabulous post. Very informative.

  • 2
    Fabio Grossi on May 25, 2022

    We used Dataflows heavily in the past and moved to Recipes a while ago, but we still have to use both.
    Unfortunately, sometimes we still have to transform our data with a Recipe and then finalize it with a Dataflow, just because Recipes still seem unable to manage big datasets (hundreds of columns) and/or datasets containing big multi-value fields. In such cases the Recipe gets stuck for dozens of hours and then fails, while a dataflow takes 10 to 20 minutes to handle that. I’m talking about pure dataset opening and saving, with no transformation.
    We cannot migrate old complex dataflows for the same reason: Recipes seem to collapse in all such cases.
    Any idea when the Recipe capabilities will be brought to the same level as Dataflows?
    We just keep waiting for them to work, patiently 🙂

  • 4
    Greg Capoziello on June 24, 2022

    The information in the “Recipe Concurrency” section is misleading. I have a customer and in logging a support ticket to get increased recipe concurrency, this is the response we received:

    “This is a limit change requests are an exception to our standard product functionality, are subject to review, are not guaranteed, and are subject to change at any time.I have a raised a request with the Product team and waiting for their approval.”

    This is also a large customer (with 2700 SF licenses and 100 CRMA licenses). The lack of concurrency is the number 1 reason why we are still developing and using dataflows.
