This article was published as a part of the Data Science Blogathon
If the title made you guess that this article is all about odds, you are right. It covers odds and their variants: log of odds, odds ratio, and so on. I am not sure about you, but I have always been confused by these terms, which were taught using lengthy formulas and whatnot. If you are as confused as I was, this article should clear that up.
You might have come across the term odds while studying probability or statistics. It is also widely used in betting and horse racing. You might have heard the sentence, “What are the odds that my horse will win the race?” or “What are the odds that I will win the lottery?”. Odds are nothing but chance: when someone says “What are the odds?”, one can interpret it as “What are the chances?”.
Odds are the ratio of the number of times an event happens to the number of times it does not happen. Equivalently, they are the ratio of the probability of the event happening to the probability of the event not happening. Odds can be expressed as a ratio or a fraction.
Odds should not be confused with Probability. Probability is the ratio of the number of times an event happens to the total number of outcomes (event happening + event not happening). Odds can be derived from probabilities and vice versa, but the two are different quantities and it is important not to mix them up.
We can consider an example to demonstrate the difference between Odds and Probability. Consider a team that played 100 matches and won 25 of them and lost 75 of them. Now we can calculate the Odds and Probabilities as follows,
We can say that the Odds in favor of the team winning are 1:3 or 1/3 or 0.333. Since we have odds in favor of the team winning, we also have odds against the team winning, which are the “multiplicative inverse” of the odds in favor of the team winning. As you might have worked out, the odds against the team winning are 3:1 or 3/1 or 3.
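As a minimal sketch, the odds quoted above can be reproduced directly from the win and loss counts. The variable names are my own and are only for illustration.

```python
# Win/loss record taken from the 100-match example in the text.
wins = 25
losses = 75

# Odds in favor: wins relative to losses (not relative to all matches).
odds_in_favor = wins / losses

# Odds against: the multiplicative inverse of the odds in favor.
odds_against = losses / wins

print(f"Odds in favor of winning: {odds_in_favor:.3f}")
print(f"Odds against winning:     {odds_against:.3f}")
```

Multiplying the two values gives 1, which is what “multiplicative inverse” means here.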
We can also calculate the probability of the team winning and losing as follows,
As mentioned earlier, Odds can also be calculated from the probabilities; we will see how it is done below,
A few things to keep in mind about the odds,
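The sketch below works through the same 100-match example: it first computes the probabilities from the counts, then recovers the odds from those probabilities using odds = p / (1 - p), and finally goes back from odds to probability using p = odds / (1 + odds). The variable names are illustrative assumptions.

```python
# Counts from the 100-match example in the text.
wins, losses = 25, 75
total = wins + losses

# Probability = favorable outcomes / all outcomes.
p_win = wins / total
p_lose = losses / total

# Odds from probability: p / (1 - p).
odds_win_from_p = p_win / (1 - p_win)
odds_lose_from_p = p_lose / (1 - p_lose)

# Probability from odds: odds / (1 + odds).
p_win_from_odds = odds_win_from_p / (1 + odds_win_from_p)

print(f"P(win) = {p_win:.2f}, P(lose) = {p_lose:.2f}")
print(f"Odds of winning from probability: {odds_win_from_p:.3f}")
print(f"Odds of losing from probability:  {odds_lose_from_p:.3f}")
print(f"P(win) recovered from odds:       {p_win_from_odds:.2f}")
```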
As we have seen in the example above, the odds in favor of the team winning were 0.33, and the odds against the team winning were 3. That was a simple case of 100 matches being played. Now consider a hypothetical situation where 1000 matches are played, and a team wins only 25 of those and loses 975. In that case, the odds in favor of the team winning will be 0.0256 and the odds against the team winning (odds in favor of the team losing) will be 39. Odds below 1 are squeezed between 0 and 1, while odds above 1 can grow without bound, so the two values are hard to compare on the same scale. This is mainly the reason why we log transform the odds: log(0.0256) ≈ -3.66 and log(39) ≈ 3.66, the same magnitude with opposite signs. One more advantage of log transforming odds is that the resulting values are symmetric around zero, which is helpful in the case of binary classification problems.
If you consider the above example, after log transforming, the odds will be as follows,
This can also be calculated using the probabilities as below,
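Here is a minimal sketch of the log transformation for the 1000-match example (25 wins, 975 losses), computed once from the counts and once from the probabilities. It uses the natural logarithm, which is also what `np.log` uses in the code later in this article. The variable names are my own.

```python
import math

# Counts from the 1000-match example in the text.
wins, losses = 25, 975

# Log of odds computed directly from the counts.
log_odds_win = math.log(wins / losses)
log_odds_lose = math.log(losses / wins)

# The same quantity computed from the probability of winning.
p_win = wins / (wins + losses)
log_odds_win_from_p = math.log(p_win / (1 - p_win))

print(f"Log odds of winning: {log_odds_win:.4f}")
print(f"Log odds of losing:  {log_odds_lose:.4f}")
print(f"Log odds of winning from probability: {log_odds_win_from_p:.4f}")
```

The two log odds have the same magnitude and opposite signs, which is the symmetry that the raw odds (0.0256 versus 39) lack.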
It is also interesting to note that the formula that we just saw to calculate the Log of odds using the probabilities can also be written as below,
If you have seen the above formula somewhere, then you are right, it is the Logit transformation formula that we use in Logistic Regression. The need to Log transform the odds will be further illustrated using the example below.
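As a small sketch of the logit transformation mentioned above, the function below maps a probability to its log of odds, log(p / (1 - p)). The function name `logit` and the input check are my own choices for illustration, not something taken from a particular library.

```python
import math

def logit(p):
    """Return the log of odds, log(p / (1 - p)), for 0 < p < 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    return math.log(p / (1.0 - p))

# Probabilities of winning and losing from the 100-match example.
print(f"logit(0.25) = {logit(0.25):.4f}")
print(f"logit(0.75) = {logit(0.75):.4f}")
print(f"logit(0.50) = {logit(0.50):.4f}")
```

A probability of 0.5 (even odds) maps to zero, and complementary probabilities map to values of equal size and opposite sign.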
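Consider a team that played 1000 matches. If we take each of the 999 possible splits in which the team both wins and loses at least once (wins and losses adding up to 1000), calculate the corresponding odds, and plot the distribution of the log odds, we get a symmetric, bell-shaped curve centered at zero that resembles the normal distribution (strictly speaking it is closer to a logistic distribution, which has heavier tails). This is implemented using the Python code below,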
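# importing necessary libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns  # sns.displot (used below) needs seaborn 0.11 or newer
# Jupyter/IPython magic to render plots inline; remove this line when running as a plain script
%matplotlib inline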
# initializing the Win and Lose lists
Win = list(range(1,1000,1))
Lose = list(range(999,0,-1))
# creating empty data frame
df = pd.DataFrame()
# initializing the columns of the data frame with the two lists
df['Win'] = Win
df['Lose'] = Lose
# calculating the odds of winning and losing
df['Odds_Win'] = df['Win']/df['Lose']
df['Odds_Lose'] = df['Lose']/df['Win']
# calculating the log of odds of winning and Losing
df['Log_Odds_Win'] = np.log(df['Odds_Win'])
df['Log_Odds_Lose'] = np.log(df['Odds_Lose'])
# plotting the log odds of winning
sns.displot(df['Log_Odds_Win'],kde=True);
plt.title("Distribution of Log of Odds in Favor of Winning");
plt.xlabel("Log Odds");
plt.ylabel("Count");
You can see the histogram generated below,
As the name suggests, the Odds Ratio is just a “Ratio of Two Odds”. Although “Odds” is also a ratio, “Odds” and “Odds Ratio” are not the same. Odds are the ratio of an event happening to an event not happening, whereas the Odds Ratio is the ratio of two odds (Odds1 and Odds2). The Odds Ratio is useful while interpreting the output of the Logistic Regression algorithm, and it also measures the association between events. It is important to differentiate the terms “Odds” and “Odds Ratio” and not to confuse the two. The Odds Ratio is expressed by the formula given below (using the probabilities).
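A minimal sketch of the odds ratio written in terms of two probabilities, OR = [p1 / (1 - p1)] / [p2 / (1 - p2)]. The function name and the two example probabilities are hypothetical values chosen only to illustrate the calculation.

```python
def odds_ratio(p1, p2):
    """Ratio of the odds implied by p1 to the odds implied by p2."""
    odds1 = p1 / (1 - p1)
    odds2 = p2 / (1 - p2)
    return odds1 / odds2

# Hypothetical probabilities of the same event in two groups.
p_group1 = 0.6
p_group2 = 0.3

print(f"Odds ratio (group 1 vs group 2): {odds_ratio(p_group1, p_group2):.3f}")
print(f"Odds ratio (group 2 vs group 1): {odds_ratio(p_group2, p_group1):.3f}")
```

When both groups have the same probability, the odds ratio is exactly 1, which indicates no association.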
As we have seen in the case of Odds, values of the Odds Ratio range from 0 to infinity. When the numerator of an odds ratio is smaller than the denominator, the value lies between 0 and 1, and when the numerator is greater than the denominator, the value is greater than 1 (up to infinity). As with Odds, the two directions therefore sit on very different scales, so it is convenient to take the log of the ratio. Once we do that, the values become symmetric around zero: an odds ratio and its inverse have the same magnitude with opposite signs. Log of odds ratio can be defined using the formula below,
To illustrate the concept of the Log of Odds Ratio, we can reuse the win and lose problem described above and calculate and plot the Log of Odds Ratio using Python as below,
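As a small sketch, the log of an odds ratio is the difference of the two log odds, log(Odds1 / Odds2) = log(Odds1) - log(Odds2). The example reuses the odds of 3 and 1/3 from the 100-match example earlier in the article; the variable names are my own.

```python
import math

# Odds against and in favor of winning from the 100-match example.
odds1 = 3.0
odds2 = 1.0 / 3.0

# Log of the odds ratio, computed two equivalent ways.
log_or_direct = math.log(odds1 / odds2)
log_or_as_difference = math.log(odds1) - math.log(odds2)

# Swapping numerator and denominator only flips the sign.
log_or_swapped = math.log(odds2 / odds1)

print(f"log(odds1 / odds2)        = {log_or_direct:.4f}")
print(f"log(odds1) - log(odds2)   = {log_or_as_difference:.4f}")
print(f"log(odds2 / odds1)        = {log_or_swapped:.4f}")
```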
# calculating odds ratios
df['Odds_Ratio_Win_Lose'] = df['Odds_Win']/df['Odds_Lose']
df['Odds_Ratio_Lose_Win'] = df['Odds_Lose']/df['Odds_Win']
# calculating log of odds ratios
df['Log_Odds_Ratio_Win_Lose'] = np.log(df['Odds_Ratio_Win_Lose'])
df['Log_Odds_Ratio_Lose_Win'] = np.log(df['Odds_Ratio_Lose_Win'])
# plotting the log of odds ratio (win over lose)
sns.displot(df['Log_Odds_Ratio_Win_Lose'],kde=True);
plt.title("Distribution of Log of Odds Ratio Win Over Lose");
plt.xlabel("Log Odds Ratio");
plt.ylabel("Count");
The histogram of Log of Odds ratios is as below,
Consider the example of smoking and its effect on lung cancer. If we form a two-by-two table of smokers and non-smokers against patients with and without lung cancer, the table would look something like this,
Smoking status | Cancer | Non-Cancer | Totals |
Smoker | 100 | 60 | 160 |
Non-Smoker | 34 | 125 | 159 |
Totals | 134 | 185 | 319 |
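Out of the 319 patients, 100 smokers have cancer and 60 do not, while 34 non-smokers have cancer and 125 do not.
From the above information, if you want to calculate the Odds Ratio, you just have to cross multiply and take the ratio. Odds Ratio can be calculated as (100 × 125) / (60 × 34) = 12500 / 2040 ≈ 6.127.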
Now, what this odds ratio means is that the odds of having cancer among smokers are 6.127 times the odds of having cancer among non-smokers. To find out whether this Odds Ratio is statistically significant, we need to calculate the confidence intervals, which is outside the scope of this article. However, you can see a complete example of it in the link here.
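The cross-multiplication can be sketched in a few lines of Python using the four cell counts from the table. The variable names are my own, and the second calculation (odds of cancer in each group, then their ratio) is included only to show that it gives the same number.

```python
# Cell counts from the two-by-two table in the text.
smoker_cancer = 100
smoker_no_cancer = 60
nonsmoker_cancer = 34
nonsmoker_no_cancer = 125

# Cross multiply and take the ratio.
odds_ratio_cross = (smoker_cancer * nonsmoker_no_cancer) / (
    smoker_no_cancer * nonsmoker_cancer
)

# Equivalent view: odds of cancer in each group, then their ratio.
odds_cancer_smokers = smoker_cancer / smoker_no_cancer
odds_cancer_nonsmokers = nonsmoker_cancer / nonsmoker_no_cancer
odds_ratio_from_odds = odds_cancer_smokers / odds_cancer_nonsmokers

print(f"Odds of cancer among smokers:     {odds_cancer_smokers:.3f}")
print(f"Odds of cancer among non-smokers: {odds_cancer_nonsmokers:.3f}")
print(f"Odds ratio: {odds_ratio_cross:.3f}")
```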
Odds and Odds Ratios play a very important role in the medical domain, the betting industry, and elsewhere. In the medical domain they are especially important for checking the effect of certain exposures on certain outcomes. They are also important in the gambling industry, which relies heavily on odds and probabilities. I hope I have been able to effectively explain the intuition behind Odds, Odds Ratios, and log transforming the two.
As always, any improvement tips and suggestions are welcome.
I am a Software Test Engineer with a passion for Data Science, and I am looking to explore opportunities in the field of Data Science and Machine Learning. You can connect with me on GitHub and LinkedIn.
Another article on Data Scraping using Python and Selenium can be found here.
Some excellent resources on Odds and Odds ratios can be found here and here.
To convert log odds to probability, apply the formula:
probability = e^log_odds / (1 + e^log_odds)
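A minimal sketch of this conversion in Python. The function name is my own, and the simple form below is only meant for moderate inputs, since `math.exp` can overflow for very large log odds.

```python
import math

def log_odds_to_probability(log_odds):
    """Convert log odds to a probability using e^x / (1 + e^x)."""
    odds = math.exp(log_odds)  # undo the log to get back the odds
    return odds / (1.0 + odds)

# Log odds of winning from the 100-match example (odds of 1/3).
log_odds_win = math.log(1 / 3)

print(f"Probability of winning: {log_odds_to_probability(log_odds_win):.2f}")
print(f"Probability at log odds 0: {log_odds_to_probability(0.0):.2f}")
```

Log odds of zero correspond to even odds, so the probability comes out as 0.5.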
Weighted log odds (WLO) is a modified version of the log odds ratio (LOR) that considers the frequency of both the feature and the target variable, making it less sensitive to rare features and more sensitive to common ones. It’s often used in text analysis and machine learning to identify relevant features.
Bayes’ rule in log odds form simplifies calculations and provides a more intuitive understanding of relative likelihoods. It expresses the posterior log odds as the sum of the prior log odds and the log likelihood ratio.
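As a rough sketch of Bayes’ rule in log odds form, posterior log odds = prior log odds + log likelihood ratio. The prior probability of 0.25 reuses the team example from earlier; the likelihood ratio of 3 is a made-up value standing in for some piece of evidence, used here purely for illustration.

```python
import math

# Prior probability of winning, from the 100-match example.
prior_p = 0.25
prior_log_odds = math.log(prior_p / (1 - prior_p))

# Hypothetical likelihood ratio: the evidence is assumed to be
# 3 times as likely when the team wins as when it loses.
likelihood_ratio = 3.0

# Bayes' rule in log odds form is a simple addition.
posterior_log_odds = prior_log_odds + math.log(likelihood_ratio)

# Convert the posterior log odds back to a probability.
posterior_p = 1.0 / (1.0 + math.exp(-posterior_log_odds))

print(f"Prior log odds:     {prior_log_odds:.4f}")
print(f"Posterior log odds: {posterior_log_odds:.4f}")
print(f"Posterior probability: {posterior_p:.2f}")
```

With these numbers the evidence exactly cancels the prior odds of 1:3, so the posterior odds are even.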
The media shown in this article are not owned by Analytics Vidhya and are used at the Author’s discretion.