Thursday, September 3, 2015

Connecting the Tableau Server Background Jobs Table back to Content

There's some amazing work happening all around us.  One of the greatest things about Tableau is the awesome community, and when someone discovers something it's not long before the world hears about it.  

Today, I have the pleasure of introducing you to Mark Jacobson, a Slalom Consultant out of the Silicon Valley office.  Mark's connected some dots in the Tableau Server Postgres DB repository data model for us, allowing us to better understand how extracts are connected back to their workbooks - something that's really valuable if your Server is overburdened with extract jobs.  Check it out!

Connecting the Tableau Server Background Jobs Table back to Content

If you’ve ever tapped into the Postgres DB that lies at the heart of Tableau Server, you’ll find that it is a gold-mine of data about how your users interact with and experience your Tableau Server. Statistics on everything from how frequently users access workbooks, when they publish content and how well your server is performing can be gleaned from this data. And what better way to do that than by hooking up Tableau directly to get these insights.

Lots of posts have been written about how to access these tables and Tableau has published a data dictionary about what you’ll find within. There have also been numerous contributions from the Tableau Community showing what can be done with this data. Jeffrey Shaffer, who runs the amazing Tableau Blog data + science, has compiled a list of these on his Tableau Reference Guide.

Much of my client work is focused on helping organizations understand what’s going on inside their Tableau Server platforms. On a recent assignment, I had the challenge of needing to understand what was causing my client’s Extract Refresh jobs to experience long delays. I knew that by tapping into the background_jobs table I’d be able to find my answer. 


To see where the delays were, I created a heat map detailing the delays between the scheduled start time and the actual start time of each job. This showed me where the bottlenecks were but one piece was still eluding me. Specifically, I needed to connect the performance of each refresh job back to the workbook or datasource that was experiencing the delay and then to the project where it was located.

Looking for a Solution

At first this would seem like an easy task. I would just join the background_jobs table back to the workbooks table or datasources table on a common ID field and go on with my analysis. That was until I looked at the contents of these tables. The background jobs table does not contain a workbook_id or datasource_id like the other tables within the workgroup schema.

It does contain a “title” field which has the name value of the content and a “subtitle” field that displays either “Workbook” or “Datasource” if the background job is related to an Extract refresh but this is not unique.

What happens if you have a workbook in 2 different projects with the same name? How would you know you were getting the job results of the right workbook? This was indeed my problem so I began to look for a more unique way to link these tables. After all, Tableau Server has to be able to make this association so it can perform the desired operation, right?


Discovering the Link

I knew the answer had to be within the background_jobs table, so I went digging. Low and behold, I found my answer within the “args” field. For a Workbook’s Extract Refresh job, the “args” value looks something like this:

---
- Workbook
- 87994
- (string masked for confidentiality)
- 75596
- null

Could one of these values turn out to be the workbook_id or datasource_id I was looking for? Turns out the ID value after “Workbook” is the workbook_id value that will tie back to the workbooks table. Having found the workbook_id, I now needed it in a format that I could use in a join. The next step requires some custom SQL and a little knowledge about regular expressions.

Using the following statement, I was able to parse the “args” string into an object_id:

cast(split_part(
(regexp_replace(args,'---','')),'- ',3
) as integer) as object_id

The statement replaces the dashes and splits the string into its own field which is then cast into an integer so it can be successfully joined with either workbooks.id or datasources.id inside of Tableau’s Data Source editor.





Success! I now had a version of the background_jobs table complete with the ID of the object being refreshed that I could link back to my data!

If you’re like me and you’ve been stuck searching for way to do this, I hope you enjoyed the post. I’ve created a .tds file and hosted it on GitHub so you don’t have to recreate the SQL if that’s not your thing.

Happy hunting!


Mark Jacobson
Consultant, Slalom Consulting
Silicon Valley, California