Calculation flow charts – Unravel the Complexity of Tableau workbooks using Python and GraphViz

Automated generation of dependency graphs for your calculated fields.

Tableau is a mighty tool for analyzing and visualizing all kind of data in many ways. Besides a variety of charting options it provides very powerful opportunities to manipulate and aggregate data. This often leads to complex, multi level calculations that can hardly be maintained. Any change feels like a play of jackstraws.

This Post will give you a tool at hand to dig into the complexity of your work, identify the main elements of interest and enable you taking back control.

What goes wrong:

The way to overcomplex, performance degrading workbooks is paved by the following mechanisms:

  1. You create calculated fields. You use these fields in other calculated fields. You use those in other calculated fields. You use…. You get the point. This problem is amplified by using multiple fields on different levels such that the result is no straight tree but resembles more the web of a drunken spider.
  2. You end up with a calculation where you hardly can retrace wich “atoms” (data coming directly from the source) have been used. Due to all the in-between calculations it takes a lot of effort to sort that out.
  3. You use Level of Detail calculations. This very powerful (and necessary) measure reduces performance and should be avoided if possible. If LOD calculations are not at the beginning or the end of a chain it is difficult to find and replace them without harm.
  4. Tableau offers limited information on the dependencies a field carries. This makes it difficult to judge how or whether it can be removed or altered.

These are the most obvious points every Tableau developer (I hope not being the only one) comes across if building dashboards.

What can help:

That’s quite obvious :

  • An image like a flow chart lining out the dependency of a field from all the other ones would be great.
  • Would be helpful to mark LOD calculations and atoms in the such an image
  • A list of atoms that have been used for a field resolving all the intermediate dependencies.
  • A list of calculated fields that are somehow depending an a single atom
  • A list of Atoms never used. A listof fields neither used in a chart nor in a downstream calculation. A list of …. you get the point.

The search:

In search for a solution the lists were the starting point. It turned out to be fairly simple by using the Python package provided by Tableau. It defines a workbook class letting you directly access all needed information. Digging into the dependencies down to the atoms required a recursive function but, hey, no risk no fun.

Drawing a reasonable useful chart of dependent nodes requires a decent piece of software. The search ended at GraphViz, an open source project that was build for doing exactly that. And, surprise, it provides a python package for easy interfacing.

What have we got:

To illustrate the results the freely available COVID 19 dashboard provided by Tableau (use the “download” button in the upper right to get your copy) is used. It is not overly complex but should do the trick.

Main goal: the images.

For each calculated field an image is generated and stored. Below the flow chart for “Difference in Cumulative or New Positive Cases to Previous Day %” is shown.

The colors of the bounding boxes mark what we are looking for:

  • Red: element is an “atom”, a value from the Data Source
  • Blue: element is a parameter.
  • Purple: This element is a LOD expression.
  • Black: calculated fields
  • Green: element is the field in question.

There are plenty of options to customize such an image, from the shape of the nodes to the order and distribution of elements in general.

Lists as wanted are generated as well. For sake of simplicity they are all put as sections into one file but it requires no rocket science to adjust the output format as wanted for further use.

Since lists are not as charming so I only cite some lines (mind, the bold face in the code blocks are for readability, the out-file is plain text):

  • Unused atoms: Well, a list of 4 atoms not of particular interest here.
  • Parameter: Cumulative or New was found. No surprise.
  • LOD expressions: Max Date: {MAX([REPORT_DATE])}Here we go: Max Date we already know from the image. The calculation is shown as well.
  • Made out of: The example of our image can be found, too:
Difference in Cumulative or New Deaths to Previous Day %: [‘Report Date’, ‘People Death Count’, ‘Cumulative or New’, ‘People Death New Count’]
  • Used in : Let us have a look where one of the above seen atom, People Positive New Cases Count, is used in:
People Positive New Cases Count: [Total or New Positive Cases, ‘Selected Metric’, ‘Current Day Cumulative or New Positive Cases’, ‘Current Day Select Metric’, ‘Difference in Cumulative or New Positive Cases Down’, ‘Difference in Cumulative or New Positive Cases Same’, ‘Difference in Cumulative or New Positive Cases Up’, Difference in Cumulative or New Positive Cases to Previous Day %’, ‘Difference in Cumulative or New to Previous Day Positive Cases’, ‘Difference in Select Metric Down’, ‘Difference in Select Metric to Previous Day %’, ‘Difference in Select Metric Same’, ‘Difference in Select Metric Up’, ‘Difference to Previous Select Metric’, ‘Last Date Positive Cases’, ‘Last Date Select Metric’, ‘Previous Day Cumulative or New Positive Cases’, ‘Previous Day Select Metric’]

You see that not only the direct successor but all “descendants” are listed.

Where to buy ?

Luckily all this comes for free. You do not even need a Tableau license as long as you have your workbook at hand.

  1. Install Python 3.x : There is help on that here.
  2. Install the required packages:
pip install tableaudocumentapi
pip install graphviz

3. Download the GraphViz executable. It is used to fire a command line command that generates the images.

4. The central script: Tableau Analyzer

https://gist.github.com/CK4444/054ad072718a9fe9c4314f833523980a.js

You only need to configure the source path:

source = ‘C:/Users/CKnoell/Desktop/Kram/TabAna/Coronavirus (COVID-19) Cases.twbx’ writeAna = True # write anlysis lists to a file
writeGVZ = False # write dependency gaph as GVZ file for further usage
writePic = True # write Pics for each calculated field in own directory

Changing the writeXY flags will switch on/off what is written to the disc.

A disclaimer: The source code is provided „as is“ – and a typical 80/20 result. So it is running fine but it was not made for being a show case and is certainly neither shiny nor bullet proof.

More important: the calculation logic and the algorithms to identify certain properties are not perfect, especially identifying parameter is a difficult task. I’d be happy for any hint and improvement proposal.

Conclusion

The above toolset enables you to easily dig into the most complex calculations of your Tableau workbooks. It provides valuable information for streamlining, debugging and reducing complexity of Tableau reports to increase maintainability and performance.

Thank you for reading.

PhD in Physics

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store