The greatest value of a picture is when it forces us to notice what we never expected to see. – John W. Tukey
Let’s assume that you have some data with you and you wish to garner some insights from it. Coding is not your forte and you don’t know how to get started.
Let me tell you this – you can make something as descriptive / insightful as the image below, with gestures as simple as drag and drop. And it doesn’t even require a single ounce of coding. Now that is the power of Tableau for you!
For all those reading this who have been acquainted with Tableau, can plot a few basic charts on it, and wish to learn more about its wide horizons, this article is meant for you.
As for those who have yet to be introduced to the beauty and simplicity of Tableau, quickly go through Tableau for Beginners first. Practice making a few simple visualisations and then rush back here!
In this article we are going to discuss a few core functionalities of Tableau which help in making really dynamic graphs. So let’s quickly get started!
Its not practical to store all data in a single table. In order to avoid anomalies related to updates, data is almost always distributed in multiple tables that have some relation with each other. Let’s understand the same with an example.
Consider the situation where a Superstore, on the verge of expansion, perceives that the number of Returned orders has been increasing by day. To ascertain the analysis and come up with the right action plan, they plotted the following chart to understand the products that were being returned :
As can be observed, Binders have the maximum number of items being returned. But judging by the color of the bars, Machines and Tables have the highest percent of return ( returned / bought ) :
This seems pretty similar to plotting just another chart, but the trick here was that it was created by using the combined data of two tables : Orders and Returns. This is known as a Join.
Let’s try making the same to get a better understanding :
The dataset is made up of 3 tables : Orders, People and Returns, and the ones that we are interested in at the moment are Orders and Returns.
For two tables to be joined, there has to be the presence of at least one common field. Here, Tableau automatically Inner Joined the two tables, based on the commonality of the column Order ID. By way of inner join, the combined data only consists of those rows that have the same Order ID in both the tables.
You can change the Join Type as well as the Joining Field in Tableau, but you need to ensure that it’s sensible.
Changing the Join type (Default Inner to Right) :
Changing the Join field :
See how I tried to join the two tables based on Row ID of Orders and Order ID of Returns? Since the two are not compatible, we don’t see any records, and plus that red mark near the circles shows an error.
Let’s get back to working on the chart now :
Here we have used Inner Join, but you can always choose between Inner, Right, Left and Full Outer based on your requirements.
Data blending is quite similar to Joins, with the difference being that Joining requires the data to be from the same data source. In the above example, we used different tables from the same Excel file. But Data Blending comes into the picture when you are working with different Data Sources. Let’s understand the same with an example.
The Superstore has another vertical in the form of a Coffeechain which is spread across as many states as the Superstore is. But they are considering shutting down some of the branches after observing the following plot :
As can be seen there are some branches that are doing equally well as the Superstore such as California and New York, while many are not, such as Iowa and New Mexico. Just like in Joins, here the trick is that both the datasets, dealing with the two verticals, were stored in different Data Sources, an Excel file and a TDE database.
Why not we plot the same as well to get a better understanding? We will begin by blending the Superstore data and the Sample – CoffeeChain database. You can find the data for the latter here as well:
Here you must have observed a few things; let’s take them up one by one :
Let’s rectify these null values by interchanging Steps 3 and 4 :
Now that your data is all ready, blended or joined, let’s start making some interesting dashboards. From here on out, we will be using only the Superstore data: Orders + Returns (Left-Joined).
Let’s start off by considering the example of a Survey analysis. In a survey of Food Consumption, under the section of Food Preferences, instead of ‘Low Fat’, you may have ‘LF’, or instead of ‘Regular’, you may have ‘reg’.
In such cases, during data visualisation, you face issues like the following:
As you can see, due to different nomenclatures, this visualisation is not ideal. So, one possible solution to this is Grouping where you can place ‘LF‘ and ‘Low Fat‘ in one group, and ‘reg‘ and ‘Regular‘ in another :
Let’s understand this a bit better with the help of the following Dashboard :
The above is the Returns Analysis across Categories and their Sub Categories. Although, it cannot be seen, in the bar graph, Copiers have the maximum Return percentage, followed by Furnishings :
As far as the Line Chart goes, it seems that the Sales Team had been right all along. The Returns were in fact increasing quite rapidly, but luckily from what we can see, the rise is slowly receding.
From the pie charts, you can clearly analyse the Returns of each Category. As can be observed, Technology suffered the maximum number of Returns.
Another analysis, which I am going to leave for you to make, could be the Return distribution across the various States. Once you finish learning how the above graphs were made, you can easily make this too. So, let’s get started :
This step automatically segregated the Sales of each Sub Category based on whether the Orders had the Null or Yes value under Returns or not.
The remaining steps are merely customisation. Let’s do those as well :
Also, in the Legends, you will most likely see ‘In/Out’ as the aliases. You can change this as per your requirements, by right clicking on the In/Out blue pill in the Marks Pane, and choosing Edit Alias.
Let’s shift to the pie chart that we had made. We are going to apply the same ReturnedOrNot group to this as well. First create two duplicates of this sheet, and work on one of them :
Its equally easy making the Line Chart :
All that is left now, is combining the above Worksheets into one Dashboard. Why don’t you try making the chart for State wise Returns distribution as well?
After taking note of the Returns analysis, your organisation decided that the increase in Returns was not that alarming, and that it should not be constituted as a reason for non – expansion.
But the Superstore is only going to expand in those States where the Sales and Profit both have crossed a certain margin, for example, 40,000 and 10,000 respectively :
So, Sets, as are created above, are really similar to groups. In Set, you group data that fulfils a particular set condition. Another interpretation could be: Groups help you attain a higher level hierarchy, as we had seen in the previous example, whereas Sets help you attain a lower level granularity.
Let’s understand this better by creating the above Dashboard :
This step joins the above two conditions for Sales and Profit, to get the requisite combined computation. To view the results :
You can always customise the above chart by changing the colour, adding labels etc.
The line chart is as easy to make as the one we had made previously for Trend of Return. Here we have excluded the States that belong to the No Expansion group, like we had excluded the Not Returned there.
The Superstore dataset is pretty comprehensive. It offers quite a lot of information and field sets. But like all data, there is always the possibility of extracting more features. Calculated Fields help you do exactly that while also allowing you to carry out both simple and complex calculations on the data.
So what is a Calculated Field?
To put it in simple words, it’s a formula that you apply to your data, where the various Measures act as the variables.
How to create one?
Just simply go to Analysis, click on Create Calculated Field and something like this will pop up :
Here is where you write your formulas, and as you can see, Tableau provides you with the various syntaxes too, so that you never feel lost! You can also apply ‘If-Else’ conditions, ‘Case’ conditions (as we shall see next) and of course the usual mathematical computations too, which we will explore now.
So what Calculation to start with? Let’s begin with something simple, that is, Average Sales associated with the Orders. The most apt formula for the same would be Total Sales / Total Number of Orders. To convert this formula in Tableau terms, Total Sales implies the SUM of Sales, whereas Total Number of Orders means their COUNT.
Let’s get to the application then :
So what you have basically accomplished with the Calculated Field is create a Measure of your own, which you can use just like Sales and Profits.
Obviously this was just a gist of what Calculated Fields can do. They can be used for various complex calculations as well, and the glimpse of one such instance can be seen in the following section.
Filters such as the following are a great way of interacting with the visualisations on Tableau :
Just like filters are measures through which you can look at various aspects of your data, a Parameter is another great feature. It can be used in place of Filters, and can exhibit its own dynamic property too.
So what are Parameters? These act as variables of an equation, which you can change to get different results each time.
Let’s try and understand this with the help of an example. So far we have been making separate graphs for separate Measures. Whenever we had to analyse Sales, Profits, Quantity or Discount of the various Categories of Products, we had to make different charts, everything being the same BUT the Measures.
With the power of Parameter Control, there is actually a pretty easy way of going about this, without the repetitions.
Consider this for Sales :
And the following for Profit :
Just with a simple click on the MeasureValue list, you are getting cumulative results across various Dimensions. Here, the MeasureValue is our Parameter, since we are able to change its value.
Let’s get to making one of our own now. Parameters rely heavily on Calculated Fields, so you are going to get a nice chance to practice what you learned above:
Do not expect the drop down list to start magically creating graphs on its own. There are still some computations to be done.
So far we have only allocated the names to the Parameter, but not the Values that they are supposed to take up. So for this purpose, we are going to create a Calculated Field.
There you go! Go ahead and try changing the MeasureValues now.
Now onto the chart we have created in the Dashboard. To see the lines for the individual Categories, simply drag that Dimension on top of the chart :
For creating other charts, do as you normally would, but instead of dragging the individual Measures onto the Rows / Columns, drag the NameOfMeasure parameter instead.
That brings us to the end of this article. But don’t worry, I will be back again with another article on Tableau!
Meanwhile, I think it’s customary to give you a Dashboard to make :
You may face a tad bit of difficulty in the beginning, but if you truly apply yourself, you are bound to get it. And of course if there are ever any doubts, or if you wish for me to cover any concept in the next article, do leave them as comments.
All the best to you Data Explorers!
This is the most in depth article I have seen about Tableau. I mostly use it for a few merged charts in one Dashboard, but this is so much more involved. Great work.
I am glad you appreciated the article Matt. Tableau can infact be used in a variety of ways to accumulate detailed and clear insights. Do try them out.
Great work pavleen this is what i was looking after your first article on tableau. Will help alot in cracking interviews. Will you post the advance level too after this?
Thanks Nikhil. Hope it helps too. And yes the advance version is in production :)
Hello Pavleen, great one! Request you to re-upload the coffee chain data as an excel or csv, the .tde file read is not supported by the tableau public edition that I am working off. thanks
Hey Hita, Thanks for pointing it out. I have made the change and you can find the data here as well.