This article was published as a part of the Data Science Blogathon.
How can you ensure that the many lines of code you write meet the objective? The critical phases of analyzing data are after you have the dataset and before you set up the environment, to write codes. There is a science to what data we have access to and will it suffice to answer questions the business is seeking. That is a discussion for another time. Here we will focus on what happens once data is available.
An effort that is critical to the success of any analysis exercise is to organize thought.
This is the phase before we begin to set up the environment to run the codes. The analysis begins much before we start writing the code. The process begins with understanding the question. Map it against the libraries of Python, Mathematics, statistics, color, language. That is not a comprehensive but a good starting point.
Let’s understand this with an example.
“The learning curve is an upward sloping line with a positive delta”
That line will be understood differently by each reader here. Take a pause and note your thoughts. Think about what will be an appropriate visual description, other than words. You are welcome to share your thoughts with Analytics Vidya and the author even before you complete reading. you can write at [email protected].
There are many layers rolled up in that statement. Towards the end of the article, we will enlist them. Here we visit how we organize thought and the impact of color on visualizations.
A data set (or dataset) is a collection of data. In a more technical sense, data are a set of values of qualitative or quantitative variables about one or more persons or objects, while a datum (singular of data) is a single value of a single variable. [1]
Tabulating is the first step towards organizing thoughts. It is a representation of something along rows and columns. Tabulation works great for numbers. Markdown cells are a good way to tabulate verbal descriptions.
Let’s look at the underlying data. Tabulating and quantifying aid understanding.
The temptation as a data scientist would be to say: value 1 performance at 25. If your first reaction was; why, who said so, can we make any conclusion basis this? Well, congratulations you are on your way to be a data scientist!
Visualization is a visual description of thought. Nonverbal descriptions are graphs.
Be aware that visualization of a learning curve need not always be a line graph. Nor does it have to be upward sloping. Data Science encourages us to build. Share your thoughts on other visualization possibilities of the learning curve.
No matter how detailed the visualization is, if presented on a paper or screen it will be a 2-dimensional. Colour will add the third dimension.
Some of the simplest visualizations around our bar graphs. They are simple to decipher. They are simple also because they breed familiarity. Tally marks and bar graphs go back at least 10 years for most of us. Let us use Bar Graphs to understand color as a dimension.
The dominant features above can be presented as follows:
No matter how detailed the visualization is, if presented on a paper or screen it will be a 2-dimensional. All conversations will be between the x-axis and the y-axis. Colour will add the third dimension.
Preconceived notions and cultural influences add depth to data analysis. Data and Datum can be numbers, words, and expanding to be pictures and even voice. So everything that affects anything is data.
If you can organize thought, then you will be able to identify the variables. Each variable will be the heading of a column in the table. All readings together (if numeric) will be distributed. Data science will then become an exercise of understanding the interplay between variables.
Let’s enlist the various layers rolled up in the statement.
“The learning curve is an upward sloping line with a positive delta”.
In conclusion: organizing thought forms the bedrock of data analysis. Tabulation helps organize thought. Visualization will generate insights about variables. In the journey of data, scientists enlist as many layers. Explore the interplay between variables. Enjoy the journey! Share your thoughts with the author at
https://www.linkedin.com/in/priyanka-krishna-sharma/
and
Very interesting and thought provoking blog. Probably adding a locational tag may add another dimension to the data set and take the learning curve much higher.
Very interesting
Good one priyanka!