How effectively can you convert a business problem into a data problem?
This question holds the key to unlocking the potential of your data science project. There is no one-size-fits-all approach here. This is a nontrivial effort with positive long-term results and hence deserves a great deal of focused collaboration across the product team, the data science team, and the engineering team.
Every leader knows that being able to measure progress is an invaluable aspect of any project. This understanding goes to an entirely different level when it comes to data science projects.
We discussed how to manage the different stakeholders in data science in my previous article (recap below). In this article, we are going to discuss the journey of translating the broad qualitative business requirements into tangible quantitative data-driven solutions.
One of the most tangible advantages of this approach, among many others, is that it establishes a common understanding of what ‘success’ means and how we can measure it. It also lays a framework for how progress will be tracked and communicated among the various internal and external stakeholders.
This is the second article of a four-article series that discusses my learnings from developing data-driven products from scratch and deploying them in real-world environments where their performance influences the client’s business/financial decisions. You can read articles one and three here:
Let me quickly recap what we covered in the first article of this series. It’s important to have this background before reading further, as it is the foundation on which this article builds.
We discussed the three key stakeholders in a data-driven product ecosystem and how the data-science-delivery leader has to align them with each other. The three main stakeholders are:
With that background, let’s dive into this article!
Consider the following mini-scenarios:
Did you find any of these questions ‘artificial’? Do re-read the scenarios and take a few seconds to think through. Most of us would find these questions to be perfectly natural!
What would certainly be artificial though is asking questions like:
In most scenarios, we express our asks in qualitative terms. This is true about business requirements as well.
Isn’t it more likely that the initial client ask will be “Build us a landing page which is aesthetically pleasing yet informative” versus “we need a landing page which is rated at least 8.5-out-of-10 by 1000 random visitors to our website on visual-appeal, navigability and product-information parameters”?
On the other hand, systems are built and evaluated against exact quantitative requirements. For example, the database query has to return in less than 30 milliseconds, the website has to fully load in less than 3 seconds on a typical 10 Mbps connection, and so on.
This gap between qualitative business requirements and quantitative machine requirements is exacerbated when it comes to data-driven products.
A typical business requirement for a data-driven product could be “develop an optimal digital marketing strategy to reach the likely target customer population”. Converting this to a quantifiable requirement has several non-trivial challenges. Some of these are:
To define customers ‘similar’ to our target population, we need to agree on a set of N dimensions that will be used for computing this similarity:
After that, we need to critically evaluate whether all the relevant data exists in an accessible format. If not, are there ways to infer at least parts of it?
Consider a business scenario where a company has a chatbot that handles customer queries automatically. When the chatbot fails to resolve a customer query, the call is transferred to a human expert.
It is fair to assume that the cost of a human expert manning a call center is higher than an automated chatbot resolving the customer query. Thus, the business problem can be stated as: Reduce the proportion of calls that reach a human expert.
The first barrier to cross is often the HiPPO Effect.
Simply put, the HiPPO (Highest Paid Person’s Opinion) effect states that the authority figure’s suggestions are interpreted as the final truth, and promptly implemented, even if the findings from the data are contrary.
For instance, in the above example, the HiPPO might be that calls are getting diverted to human experts due to time-out issues related to network connectivity within the chatbot’s workflow. A more prudent data-driven approach would be to list out all the possible reasons leading to call diversions, one of them being the connectivity issue.
Such a list can be derived from a combination of expert knowledge and some initial data log analysis. This step falls under what we call the ‘data-discovery’ phase.
The data-discovery phase, which is essentially an iterative process, systematizes the use of insights from the data to guide the expert’s intuition and to identify the next dimension of data to investigate.
The data-discovery phase also identifies if there are any gaps in the ‘ideal-data-needed’ vs. ‘actual-data-available’. For example, we may identify that the last interaction between the chatbot and the customer is not being stored in the database. This lack of data needs to be solved promptly by changing the data storage schema.
Let’s assume that this analysis of possible failure scenarios led to the following findings:
Armed with this information, the next step would be to dig deeper. For example:
The findings from this step will help rank the problems in terms of their prevalence and also identify systemic issues. If the failure of the speech-to-text component is one of the prevalent problems, the speech-to-text vendor needs to be approached to identify if the speech inputs are not being captured/transferred as per the norms/best-practices or if the speech-to-text system needs more context for better predictions.
Moving further along in this journey, translating qualitative data specific questions into quantitative model training strategies is also a nuanced topic, one that can have far-reaching consequences.
Continuing the conversation on speech-to-text issues, it may seem prudent to answer ‘who is the caller?’. At the surface level, it may seem synonymous with ‘is the caller Miss Y?’. But these two questions lead to totally different Machine Learning (ML) models.
The ‘who is the caller?’ question leads to a single N-class classification problem (where N is the number of possible callers), whereas ‘is the caller Miss Y?’ leads to N separate binary classifiers!
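The difference between the two framings can be made concrete with a small sketch. The caller names and labels below are hypothetical; the point is only how the same data yields one multiclass label set versus N binary label sets.

```python
# Hypothetical callers and training labels illustrating the two framings.
callers = ["miss_y", "mr_x", "ms_z"]

# Framing 1: "who is the caller?" -> ONE N-class classifier.
# Each training sample carries exactly one of the N caller labels.
multiclass_labels = ["miss_y", "mr_x", "miss_y", "ms_z"]

# Framing 2: "is the caller Miss Y?" -> N binary classifiers,
# one per caller, each trained on its own yes/no labels.
binary_label_sets = {
    caller: [1 if label == caller else 0 for label in multiclass_labels]
    for caller in callers
}

print(binary_label_sets["miss_y"])  # [1, 0, 1, 0]
```

The choice matters downstream: the multiclass model shares one decision boundary across all callers, while the N binary models can be trained, tuned, and retrained independently.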
While all of this may seem complex and data science-led, we cannot underestimate the role of the domain expert. While all errors are mathematically equal, some errors can be more damaging to the company’s finances and reputation than others.
Domain experts play a critical role in understanding the impact of these errors. Domain experts also help lay out the best practices in the industry, understand customer expectations, and adhere to regulatory requirements.
For example, even if the chatbot is 100% confident that the user has asked for a renewal of a relatively inexpensive service, the call may need to be routed to a human for regulatory compliance purposes depending on the nature of the service.
One of the final steps is to have a relevant subset of data labeled by human experts in a consistent manner.
At the vast scale of Big Data, we are talking about obtaining labels for hundreds of thousands of samples. This will need a huge team of human experts to provide the labels.
A more efficient way would be to sample the data in such a manner that only the most diverse set of samples are sent for labeling. One of the best ways to do this is to use stratified sampling. Domain experts will need to analyze which data dimensions get used for the stratification.
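A minimal sketch of stratified sampling for labeling, assuming hypothetical call logs stratified by a ‘reason’ field chosen by domain experts. Drawing a fixed number of samples per stratum keeps the labeling budget spread across rare failure modes instead of being dominated by the most common one.

```python
import random
from collections import defaultdict

def stratified_sample(records, strata_key, per_stratum, seed=0):
    """Draw up to `per_stratum` records from each stratum so every
    group is represented in the labeling set."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for rec in records:
        strata[rec[strata_key]].append(rec)
    sample = []
    for group in strata.values():
        k = min(per_stratum, len(group))
        sample.extend(rng.sample(group, k))
    return sample

# Hypothetical call logs: frequencies are deliberately skewed.
calls = (
    [{"id": i, "reason": "speech_to_text"} for i in range(80)]
    + [{"id": 100 + i, "reason": "timeout"} for i in range(15)]
    + [{"id": 200 + i, "reason": "unknown_intent"} for i in range(5)]
)
picked = stratified_sample(calls, "reason", per_stratum=5)
print(len(picked))  # 15: five from each of the three strata
```

A uniform random sample of 15 calls would, in expectation, contain fewer than one ‘unknown_intent’ example; the stratified draw guarantees five.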
Consistency in human labels is trickier than it may seem at first. If the existing automated techniques for label generation were 100% accurate, there would be no need to train any newer machine learning algorithms, and hence no need for human-labeled training samples (e.g., we do not need manual transcription of speech if speech-to-text systems are 100% accurate).
At the same time, if there is no subjectivity in human labeling, then it is just a matter of tabulating the list of steps that the human expert has followed and automating those steps. Almost all practical machine learning systems need training because they are not able to adequately capture the various nuances that humans apply in coming to a particular decision.
Thus, there will be a certain level of inherent subjectivity in the human labels that can’t be done away with.
The goal, however, should be to design label-capturing systems that minimize avenues for ‘extraneous’ subjectivity.
For example, if we are training a machine learning system to predict emotion from speech, the human labels will be generated by playing the speech signals and asking the human labeler to provide the predominant emotion.
One way to minimize extraneous subjectivity is to provide a drop-down of the possible emotion labels instead of letting the human labeler enter labels as free-form text. Similarly, even before the first sample gets labeled, there should be a normalization exercise among the human experts where they agree on the interpretation of each label (e.g., what is the difference between ‘sad’ and ‘angry’).
An objective way to check the subjectivity is ‘repeatability and reproducibility (R&R)’. Repeatability measures the impact of temporal context on human decisions. It is computed as follows:
Reproducibility measures how consistently the labels can be replicated across experts. It is computed as follows:
Conducting R&R evaluations on even a small scale of data can help identify process improvements as well as help gauge the complexity of the problem.
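One simple way to put numbers on R&R (a sketch of a common approach, not necessarily the exact formulas used on any given project) is a raw agreement rate: repeatability compares one expert’s labels on the same clips at two different times, while reproducibility compares two different experts on the same clips. All labels below are hypothetical.

```python
def agreement_rate(labels_a, labels_b):
    """Fraction of samples on which two label sequences agree."""
    matches = sum(a == b for a, b in zip(labels_a, labels_b))
    return matches / len(labels_a)

# Repeatability: the same expert labels the same 5 clips twice, days apart.
expert1_day1 = ["sad", "angry", "neutral", "sad", "happy"]
expert1_day2 = ["sad", "angry", "sad", "sad", "happy"]

# Reproducibility: a second expert labels the same 5 clips.
expert2 = ["sad", "neutral", "neutral", "sad", "happy"]

print(agreement_rate(expert1_day1, expert1_day2))  # 0.8
print(agreement_rate(expert1_day1, expert2))       # 0.8
```

In practice, a chance-corrected statistic such as Cohen’s kappa is often preferred over raw agreement, since some agreement happens by luck when the label set is small.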
Machine learning is typically ‘passive’. This means that the machine doesn’t proactively ask for human labels on the samples it finds most confusing. Instead, it is trained on whatever labeled samples are fed to the training algorithm.
A relatively new branch of machine learning called Active Learning tries to address this. It does so by:
The human labels are sought on priority for such ‘confusing samples’.
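A common way to pick such ‘confusing samples’ is least-confidence sampling: rank unlabeled samples by the model’s top-class probability and send the lowest-confidence ones to the human experts first. The sketch below uses a stand-in probability table in place of a real trained model.

```python
def least_confident(samples, predict_proba, k=2):
    """Return the k samples the model is least confident about,
    i.e. those with the lowest top-class probability."""
    scored = [(max(predict_proba(s)), s) for s in samples]
    scored.sort(key=lambda pair: pair[0])  # lowest confidence first
    return [s for _, s in scored[:k]]

# Stand-in for a trained model's class-probability output (hypothetical).
fake_probs = {
    "call_1": [0.98, 0.01, 0.01],  # very confident
    "call_2": [0.40, 0.35, 0.25],  # confusing
    "call_3": [0.34, 0.33, 0.33],  # most confusing
    "call_4": [0.90, 0.05, 0.05],
}
to_label = least_confident(fake_probs, lambda s: fake_probs[s], k=2)
print(to_label)  # ['call_3', 'call_2']
```

Labeling these borderline cases first tends to improve the model faster per labeling dollar than labeling a random sample of equal size.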
For all the pieces to come together, we need an “all-rounder” data science team:
We covered quite a lot of ground here. We discussed the nuances of translating a qualitative business requirement into tangible, quantitative data requirements.
Reach out to me in the comments section below if you have any questions. I would love to hear your experience on this topic.
In the third article of this series, we will discuss various deployment aspects as the data-driven product gets ready for real-world deployment. So watch this space!