Sunday, March 22, 2015

Airline Roots - Iron Viz Contest Entry

Details coming soon.

In the mean time, heres the viz:


Nelson

Friday, March 20, 2015

Wheels within Wheels: The History of Marvel Comics (Guest Post)

Hello everyone, I’m Chris Toomey.  I work with Nelson at Slalom and he has kindly let me borrow his blog for the day.  If you haven’t guessed already, this post is my entry into the Wiki Data Contest


For those of you who don’t know (or maybe forgot) – every year Tableau hosts a series of themed competitions to select competitors to the Iron Viz Championship.  The first is all about finding data on Wikipedia and make a viz out of it.  I’ve been reading comics since I was 10 and thought it would be cool to chart the History of Marvel Comics.  

Sources and Inspiration

In developing this viz, I came to appreciate how just how much Tableau’s technical depth supports the creative process.  There were many sources of inspiration, both technical and creative, and their work deserves mention. 

Jonathan Hickman – Writer for the Avengers and mastermind behind Marvel’s upcoming “Secret Wars” event.  His visual style is fantastic, and his storytelling is even better.  A primary theme in his work is the notion of circularity – think concentric circles and time travel.  If you decide to open up his Avengers #1, this is one of the first things you will see:



Jer Thorp – Co-founder of the Office for Creative Research, former NYU professor, and from 2010-2012, the Data Artist in Residence at the New York Times.  In 2012, he published a visual history of the Avengers, a selection of which can be seen below.  If you want to see master work in data visualization, read his stuff.



Bora Beran – Program Manager at Tableau, where he focuses on statistics and calculation features, query generation and technical partnerships.  Bora’s post on radial treemaps was what gave me the confidence that I could take what I’d seen in the work of Hickman and Thorp and translate it to Tableau.

There is also much more Marvel-related background, but you’ll need to explore the viz to find it! 

Getting the Data

While the specifics of the competition make it appear that you can simply CTRL-C + CTRL-V your way to a fabulous Wikipedia data set, the reality is much more cumbersome.  After working the design basics out in my handy notebook (paper still has its uses!), I knew exactly what kind of data I wanted.  This meant that I had to manually cut, paste, clean, and format the data.  It may not have been the most elegant solution, but it gave me the exact data structure that I needed to start. 

Once I had all my data, it still needed some polish so I turned to Alteryx.  In Alteryx I was able to shape, sort, aggregate, join, and union my way to a Tableau-usable dataset, without the pain of doing it all in Excel.  In a few instances I needed subsets of the data to serve as inputs or lookups.  Instead of writing that data out once, and bringing it back in as a full input, I kept it all inside the tool (which is why you see two sets of text inputs).  


Here’s how that trick works: insert a Browse tool into the workflow where you want to pull out data.  Once the workflow has run, open the Browse window and click the
icon in the top right and select “All Data with Headers.”  Then right-click anywhere in the workflow and select “Paste”.  The data will now be a Text Input and keeps everything self-contained. 

Design Notes and Community Musings

I knew that circles were going to be the dominant part of my viz, so I needed to brush up on my trigonometry.  To do so I spent some time dissecting Steven Carter’s wonderful Adoption Gap visualization, and playing around on the Desmos Graphing Calculator.  Desmos is a wonderful tool for writing and comparing any type of function.  Simply start typing and it will do the equation formatting for you, hit Enter and you’ve got a graph.  Very handy for benchmarking equations, particularly SIN and COS.

Once I had my basic math down, I visited Bora Beran’s radial treemap post.  The third of his three examples shows a radial bar chart - which I knew was the basic model I wanted for my viz. 
 I first replicated his design using my data – his methods require using a data densification trick that relies on binning and unions.  It’s handy and got me most of the way there.  I just needed each bar (representing a character/team pair) to be offset by the number of months from time zero that they joined their respective team.  I’d like to say that I came up with a magic trig method to do this – but it simply came down to changing or adding one variable at a time and seeing what happened.  Desmos was particular helpful here as I could make multiple equations and compare then quickly. 

Here’s the solution: Inside of the SIN/COS functions, I added the following: 

RUNNING_AVG(SUM([DiffFromStart])) * WINDOW_max(1.85*PI()))  

DiffFromStart is a field I calculated in Alteryx that does a DateDiff from each characters JoinDate and the first date in the dataset.  I’m calculating it at the team+character level, SUM(DiffFromStart) is the same as the unaggregated value.  I wrapped it in a RUNNING_AVG to account for densification – I didn’t want that value to change just because Tableau created new points. Finally, the WINDOW_MAX represents the amount of the circle I’m actually playing with.  The result is that I can push the polygon around the circle and keep it aligned properly. 

It is important to note that with the introduction of the “Spin the Wheel” filter, which operates before all the Table Calculations, this introduces some odd variations in arc length and position.  The values don’t change, but they move around in unexpected ways – which makes all the interactions all the more fun, you simply don’t know what’s coming!

Final Polish

The last thing I needed to do was finalize the formatting. A black background gives the colors the most pop, and I used Tableau 9’s handy hex color values to create consistent palettes.  The final piece was fonts – I spent so much time on the viz that I didn’t want to be undone by unreadable text.  Here’s the links to the best Tableau font resources I could find:

There are a number of other tricks to get the tooltips to function the way I wanted, but I’ll spare you the details and let you all pull things apart.

That’s it! Thanks to Nelson for loaning me his space, and thanks to you all for taking the time to read and interact with my work – I hope you enjoy using it as much as I enjoyed building it.
I’m active on Twitter, but the link to my LinkedIn page is on my Tableau Public profile – so feel free to reach out with comments, questions, or if you just want to talk comics.  I’m also at all the Seattle Tableau User Group events, and will be at TC15 in October.  

Monday, March 2, 2015

Demystifying Nested Table Calcs in Tableau

The farther you go with this thing we love called Tableau, the cooler and more advanced analysis you want the ability to do. Over the last few months I’ve had an opportunity to work on a number of really interesting projects. The one that’s been taking my free time has been for awesome cause: creating dashboards to help track and stop the Ebola spread in West Africa. It’s a part of the work that the Tableau Foundation is focused on with an organization called Dimagi, an award-winning global health software company, and the Columbia University Earth Institute. Fellow Zen Master Peter Gilks, of Slalom New York, has been the lead on this project and I would guess has spent well over 100 hours of his free time working to make it successful. If you know Peter, please thank him for the time he’s invested.  It’s certainly been a labor of love for all of us.  The other interesting work has been focused on my client in the travel industry, slicing and dicing data in ways that I honestly have never been able to do before - and what has been a big factor in the success of these projects? My head-long dive into Nested Table Calcs!

Even as a Zen Master, I’m always learning - and I have A LOT to learn. Different projects, different clients all require new things, and the depth of the analysis that is possible inside of Tableau becomes even more apparent the farther down the rabbit hole we go.  I would argue that beginning to understand today’s topic of Nested Table Calcs is the best way to take your Tableau game to the next level, and provide your users with even more in depth analysis.

We have to start with some fundamentals, but we’re also going to make a couple assumptions.  We’re going to assume you know at least a little bit about table calculations, though we’re certainly going to discuss how those are constructed as well. We also need to assume you want to perform some sort of analysis that at least a little bit complex and is going to require at least two or more steps of transformation/calculation.

So let’s start with the basics - what exactly is a Nested Table Calc? I’m so glad you asked! While there are some things in the world of technology that have names or acronyms that defy logic, this is not one of them. These things act just as you would think of Russian nested dolls: something small and simple is used to create something slightly larger and slightly more complex, which is then used to create something a little bit more advanced, and so on and so forth. 



The calculations build one on top of the other, allowing you to produce super complex analysis like:

Rank the top five states in each region by year-over-year change in average monthly profit margin

See? Pretty cool stuff, huh?  The orange parts are the table calculations. Stacking (or nesting) table calcs in each part of that analysis in just the right way allows us to do something super complex, and should begin to give you a little bit of insight into the power of what were actually talking about here.

You should also know there is a tremendous amount of awesome work from the likes of Joe Mako, Jonathan Drummey, Noah Salvaterra,  Keith Helfrich (not a Zen, but sure could be) and many others on this subject, and I would certainly encourage you to dig deeper by going through their work on this.  Without them, I’d still probably be lost in the dark when it comes to all this stuff.

So why don’t you say we take that little example we discussed before (which I came up with on-the-fly) and walk you right through it?  Sounds like fun! Here we go….

Just as we seemingly always do, we are start off with popping open Superstore Sales data (the one with English in the title). We’ll start by dragging region to the filter shelf and excluding International (I mean, we’re looking for states here). Alright - let’s begin by remembering the goal of the analysis:

Rank the top five states in each region by year-over-year change in average monthly profit margin

Got it? Because we’re doing nested table calcs we’re actually going to begin at the end.  Let’s get set up to show monthly profit margin.  Start with Order Date and throw both year and month on the column shelf. Also, go ahead and throw region on the row’s shelf (we will add states later on). At this point you should be looking something like this:


Now let’s quickly throw together the formula for profit margin: the sum of profit / the sum of sales:



We will simply drag it out onto the worksheet and we will quickly have a monthly profit margin:



OK, if we look back at what we are putting together you’ll see that we actually want to take an average of the monthly profit margin - but averages in Tableau can be kinda confusing.  If you simply use the average function (AVG) Tableau’s going to calculate the average at whatever the level of granularity in your data actually is at the row level. The problem for us is that our data is much more granular than a monthly look, so instead were going to have to use a window average table calculation (Window_Avg).  Using the window average allows us to set the granularity based on what is in the view.  Level of detail calculations in v9.0 (probably the next blog post you’ll see from me) go one step further and allow you to set the level of detail without concern for what is actually in your view (it’s going to be awesome). Now, before we actually create the table calculation let’s look and see what were actually hoping to get. By clicking on Worksheet>> Show Summary you will get this lovely little card that allows you to see interesting details about the data in your view or what you have selected.  Once you see it on your sheet, click and select the 12 months of 2010 for the Central region. Your summary card should update and now show that you have selected 12 marks with an average profit margin of 13.86%:



Note - this is a monthly average. What I mean by that is that we are adding up the percentages from the 12 months and dividing by 12 - these months are not weighted and this number is not a yearly sum of profit divided by a yearly sum of sales - that would be a yearly average, and, to clarify, that’s not what were interested in. Just thought you should know.

So now we know we’re looking for 13.86% in the Central region for the year 2010. Let’s write our Table Calc.  We’re going to leverage the Window_avg function and we’ll call it “Profit Margin - Win Avg”:



I like to try to keep the base name of the function first and then add in whatever aspect of the nest that calculated field is. Thus, we called it “Profit Margin - Win Avg” which should help us group the functions together and keep things straight going forward.  We’ll drag that pill out onto the view of the sheet, on top of the numbers that are already there, and we should now see two sets of numbers stacked on top of each other, like this:



The problem we should notice is that this average is the same for all months and all years - and it’s not the 13.86% we were expecting. What we wanted to see was an average that was the same for every month but restarted every year. In order to get there we click on the pill that is now located in the measure values and has the triangle on it – the triangle icon is letting you know that this is a table calculation. We click on it and go down the menu and hit “Edit Table Calculation” and it brings up the dialog box that causes trembling and fear in the hearts of even the most advanced Tableau user.

Have no fear amigos, we’ll walk through this together. 

Where it currently says “Table (Across)” (which is the typical default – and rarely what you want to use) click on it and go all the way down to the last option where it says Advanced.  Should now see this:



If you select Month of Order Date and click the arrow to send it to the right what you’re telling Tableau is “the thing I want you to average is the Month of Order Date - do this for each year and for each region (IE when region and year change I want you to restart your average)”.  Make sense? Even though there are easier ways to do it, I always go to the advanced tab on table calculations because I need to say out loud to myself how I’m planning for Tableau to do this calculation.  I would recommend you do the same until you become pretty good at it.

Once you apply those changes your sheet should now look like this, and you will notice we now see correctly that 13.86% is the monthly average in the Central region for the year 2010.  Perfect!



Now let’s head back to the original statement of analysis we’re working on:

Rank the top five states in each region by year-over-year change in average monthly profit margin

Well, average monthly profit margin is done so let’s continue to work backwards - and go after the year-over-year change.  This is another table calculation, and another simple one.  But at the same time it’s also the first one were going to nest the average that we took. So here’s the tricky part and how we do this: right-click on that first table calculation we created called “Profit Margin - Win Avg” and select Create Calculated Field. The new calculated field that we want to create is going to take the current year average profit margin and subtract it from the prior year average profit margin. Therefore the table calculation in Tableau we want to use is the lookup function that looks like this:



Again, we’re basically telling Tableau to take the difference between the current Year and the previous year and return the difference (for multiplying by 100 because these are percentages and we want to return the difference in the percentages in points - we do this because it’s easy to see when we’re right). Once you’ve created that calculated field, right-click on it and open up the Default Properties>> Number Format and force it to be a two decimal number with the string “ pts” on the end, like this:



Now, drag that out and placed on top of the other numbers in the table just like we did before.  You end up seeing a third row of numbers which sort of looks right but mostly have zeros. Again, we’re going to have to tell Tableau how we want this table calculation to be calculated. Go back to measure values and click on the third pill, “Profit Margin – Y/Y Diff…” and hit Edit Table Calculation again. You should notice a couple things: the default of Table (Across) is back again for the field we just brought in, but you should also notice we now have the ability to toggle between two different table calculations at the top. So long as the first table calculation we did is still in the view, you should see that if you switch to it it’s already preconfigured using what we gave it in the steps above.  You have the ability to switch between the two calculated fields because you are now leveraging a nested table calculation - this is because we leveraged the calculated field of a table calculation in another calculated field with another table calculation.  The power here is that we have the ability to set the configuration of each table calculation nesting separately, and you will see that this is very important for what we want to do now.



This time we bring over both Year of Order Date and Month of Order Date, making sure to put Year on top of Month, because once we set that part of the dialog box will tell the first dialog box we want to do this at the level of year and we do not want to restart the table calculation. This has the effect of forcing Tableau to ignore month completely as we do this table calculation because it is lower than the level of granularity that we are doing the table calculation.  Make sense?

Once we go back to the table, the effect is pretty obvious.  Everything from 2010 is now gone (which is logical because you’d be subtracting 2010 from 2009 but 2009 doesn’t exist - so it shows up as null). And we now see the same number show up for all months but different in each of the years 2011, 2012, in 2013 - just as we expected.  AND you’ve now written your first nested table calc. – you’re the bee’s knee’s!

Alright, let’s take another look at our statement of analysis we’re working on:

Rank the top five states in each region by year-over-year change in average monthly profit margin

Before we get to ranking the states, let’s limit the information that we’re looking at.  You should notice that the data for each year is different, but it’s the same for every month.  Therefore let’s limit our sheet to looking at only the first month of every year.  The easiest way to do this is to create a test - a true or false Boolean statement – to figure out if we’re looking at the first month of the year. Of course we can simply write something like Month(Date)=1, but that’s not a Table Calc., so what fun would that be?  Instead we’re going to use a function called “First” and it goes like this:



We create a calculated field called “First Month?” and is going to run a table calculation to figure out if what we’re looking at is the “first” based on the way we define it.  Notice that once you’ve created this calculated field it’s going to show up in Measures, not Dimensions even though it is a Boolean and returns true or false.  The reason for this is that table calculations are always aggregations, and because they’re aggregations they have to show up in Tableau as Measures. We are going to drag this new field onto the end of the column shelf, and Tableau is now going to label each column either true or false based on whether or not it’s the first in the column.  Therefore, very logically, the first column appears as true and all the others appear as false. Our goal however is to make the first month of each year show up as true and the other month as false. 

To get there, click on the triangle of the field you just drag it onto the column shelf, and go down and hit Edit Table Calculation. This time were going to put year and month to the right side (in that order).  After we hit OK, we’re going to configure the first dialog box to calculate the table calculation at the level of Month of Order Date, and restarting every Year of Order Date.  It should look like this:



Once that is done you should see a few more “trues” showing up every January. Now, simply right-click on one of the “falses” and select Hide.  Hide is important feature of Tableau when it comes to table calculations.  If we simply excluded, or filtere out all the other months, it would change our average to simply be whatever the profit margin was in January.  By using hide, the data is not excluded, it just doesn’t show up, and our averages remain correct. Your sheet should now look something like this:



We’re getting really close now. Looking back once again at our original statement of analysis:

Rank the top five states in each region by year-over-year change in average monthly profit margin

It's time to bring states. In the mapping items hierarchy click and drag States and drop it onto rows in between Region and Measure Names. You now have a bunch of rows, one for each of the three measures for each state that’s in the data.  The very last piece of this analysis is the ability to rank each of those states based on that year-over-year change so that we can identify the top five, in order, for each year, for each region. This is where we write our last part of our nested Table Calc.

Just as we did before, we’re going to right-click on the previous calculated field we created and select “Create Calculated Field”.  Here we leverage one more table calculation that we will use in order to rank our year-over-year differences:



Using rank unique guarantees that we’re going to get a different ranking for every row, even in the event that the numbers were exactly the same. Before we bring this calculated field into the view, right-click on it, and about two thirds of the way down, select “Convert to Discrete” - this is going to force Tableau to write the actual number rank rather than trying to create a visualization – which it would do if we left it as a continuous measure. Once you’ve done that, now click and drag that pill and place it in between Region and State. Again, it's time to open up the trusty Edit Table Calculation dialog boxes to get this last table calculation set up for us. At this point you should be a pro at getting these things set up, but think logically about what we’re looking for here: we want to rank the states (so address that), we wanted to restart for every year and for every region (so partition those), and we want to ignore the months (because they are all the same) - so place Month underneath State in the addressing box.  Once that’s all taking care of tell Tableau that were going to do this At the Level of State.  Everything should look like this:



Once this is done you’re going to see some funkiness for second.  The reason for this is that the ranking of the states that are not in the region that is specified are going to show up as null, and Tableau puts these at the top.  Simply right-click on the word “Null”, and click hide. Also, hide the year 2010 – we’re not going to care about it anymore going forward.  Now, press Control and left click and drag the rank function from the rows shelf over to the filter shelf. By clicking Control before you do this, it has the effect of copying and pasting the function with all the editions that we have made. When the dialog box pops up start by clicking “None” and then ticking the boxes for 1, 2, 3, 4 and 5 (because we want to see the top five States for each Region - that’s what we’re selecting here). Also, you no longer need the first two fields that are inside the measured values – “Profit Margin” and “Profit Margin – Win Avg”. Click and remove those two fields and the field “Profit Margin - Y/Y Diff” will shift up and now reside on the label shelf by itself. At this point are really close to our final result and you should be seeing something like this:



Because we have the State on the Rows Shelf, it's creating a new row for each of the states that ranks first for each of the years.  Therefore, in most cases, were seeing three rows for every ranking (one for each of the states). To fix this we’re simply going to take State and move it over to the Label Shelf. Doing that gets is 98% of the way there:



At this point all we’re going to do is add a little bit of visualization - this wouldn’t be Tableau without it!  Change the mark type to squares and Control, click and drag the “Profit Margin – Y/Y Diff…” pill from the label shelf to the color shelf as well. I've use the colorblind diverging color palette and set the start/center/end at 20/50/120 pts, to get this final view of a heat map highlighting the best and worst of our top five states for each year.  The final product looks like this:





That’s it! You did it! You've created your (possibly first) nested table calculation in Tableau! They are all really easy just like this :) Well, maybe not, but now you can begin to see the powerful analysis that nested table calculations give you the ability to understand. It's worth noting one last thing - the reason we left each step of the calculation in the view until the very end was that Tableau uses the configuration for each of the nested pieces that we had already set up. We could have gone straight to the last calculated field and brought that in, but we would have had to set each of the other nested pieces up from scratch, which can be tricky if you can’t see them. Removing them at the end is how I like to keep things straight.

Congratulations! You’ve made it all the way through this very long post. I hope you learn something about nesting table calculations. Thanks as always for stopping by!

Nelson