Stop, Drop and Roll Out Your Datasets!

We have all been there. You’re building a brand new dataset for the business in CRM Analytics, and the requirements are, shall we say, fluid. You go in with a simple specification to use fifteen fields across three objects to deliver two specific metrics, and between changing requirements and scope creep, by the time you’re done you find that you’ve got 45 fields from seven objects delivering eight metrics - and not all of those fields are even being used! Now that you’ve got the dataset ready to go, it’s time to do some extra cleanup work so future you doesn’t hate present you.

Stop!

Leaving unneeded fields in the dataset can cost you both performance and complexity. The performance cost is straightforward: every field needs to be calculated for every row in the dataset. If you are calculating unneeded fields over enough rows, or with complex enough calculations (or both!), this can slow down your recipe substantially. Unneeded fields can be the difference between telling your business users that they can refresh their data once an hour or once every two hours. Faster is usually better, and unneeded fields make you slower.

The cost of complexity is subtler. If you’ve got 45 fields when you only really need 30, every time you come back to the dataset you’re going to have to refresh your memory as to what all these fields are and why you need them. And if that’s the effort for you, the person who wrote the recipe, imagine how much effort it will be for the new employee who is tasked with tweaking it in the future. They could absolutely end up using outdated fields that don’t do what they think they do. Do your future self a favor and remove those fields now, while it’s all fresh in your mind.

Drop!

There are two basic patterns for bulk dropping fields in CRM Analytics recipes. You will need to use them both.

[1] Put a DROP node right before the Output dataset.

This classic approach is quick and easy. Find your Output dataset in the recipe and add a Transform node right before it. Then add a “Drop” command to the Transform node, and drop every single field in the dataset that you aren’t using. You now have a dataset that only has your needed fields in it.

You don’t have to wait until the end of the recipe to take advantage of this approach. Imagine you have a transform node that creates many interim fields on the way to a final field value. You can put a drop at the end of the transform to get rid of all the interim fields as soon as you’re done with them.

The pluses of this approach are that it is very easy to implement and it almost completely removes the complexity cost. Anyone who is using the dataset, whether from a lens, a dashboard, or another recipe, will only see the fields you want them to see. You will also see some performance improvement, because those fields don’t need to be written to the final dataset. However, if an unneeded field is being carried through the whole recipe only to be dropped at the end, you’re still paying the performance cost for no net benefit. We can do better.
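If you still maintain legacy dataflows alongside your recipes, the same pattern exists there as a JSON transformation: a sliceDataset node in drop mode. Here’s a sketch - the node name, source name, and field names are placeholders, not anything from your org:

```json
"Drop_Interim_Fields": {
  "action": "sliceDataset",
  "parameters": {
    "mode": "drop",
    "source": "Compute_Final_Metric",
    "fields": [
      { "name": "InterimField1__c" },
      { "name": "InterimField2__c" }
    ]
  }
}
```

Placed right after the node that finished using those interim fields, this keeps them out of every downstream node, just like a Drop at the end of a Transform node.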

[2] Drop unneeded fields at every JOIN node.

Did you know that you can drop fields in every JOIN node? It’s easy! Select the JOIN node, then scroll through the list of fields. If you don’t need the field past the JOIN node, deselect the checkbox. Then click Apply. Bang! You’ve removed fields from the recipe and reduced the complexity of every downstream node in the entire recipe.

At a bare minimum, you probably don’t need two copies of the field you used to execute the join. Drop one of them.
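The legacy dataflow equivalent of the JOIN node, the augment transformation, bakes this idea in: its right_select parameter lists exactly which right-side fields survive the join. A sketch with placeholder node and object names:

```json
"Augment_Account_Opp": {
  "action": "augment",
  "parameters": {
    "left": "Extract_Opportunity",
    "left_key": [ "AccountId" ],
    "relationship": "Account",
    "right": "Extract_Account",
    "right_key": [ "Id" ],
    "right_select": [ "Name", "Industry" ]
  }
}
```

Because only Name and Industry appear in right_select, the right-side copy of the join key never enters the stream at all - the same result as deselecting it in a recipe JOIN node.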

Roll Out Your Datasets!

Using the approaches outlined above will improve the performance of your CRM Analytics recipes while reducing the complexity of your datasets. Both are very simple to implement and work in brand new recipes as well as old, battle-tested ones. Really, there’s no excuse not to start using both as you roll out new datasets and enhance old ones.