You’ve undoubtedly heard about gold seekers. In many of these stories, individuals discover enormous wealth with the aid of a gold detector and become overnight millionaires.
Your friend has a gold detector, and you have decided to join the ranks of the gold seekers to help him. So the two of you go to a mine containing around 1,000 stones, and you estimate that 1% of these stones are gold.
When gold is identified, your friend’s gold detector beeps. Its specifications are as follows:
The device detects gold and beeps continuously when it comes close to gold.
The device is 90% accurate at distinguishing gold from ordinary stone.
As you and your friend explore the mine, the device beeps in front of one of the rocks. If this stone is gold, its market value is about $1,000. Your friend offers you a deal: pay him $250 and the stone is yours. The deal seems appealing, because if the stone is gold, your profit is three times what you paid; after all, the detector’s accuracy is high, so the likelihood that the stone really is gold seems high too. These are the thoughts that finally persuade you to pay $250 and pick up the stone.
It is not a bad idea to take a step back from the world of gold seekers and return to the beautiful world of mathematics to examine the problem more closely:
Given that there are approximately 1000 stones in this mine and that 1% of them are gold, this means that there are about 10 gold stones in this mine.
As a result, approximately 990 stones in this mine have no unique material value.
The device’s accuracy in distinguishing gold from stone is 90%, which means that if we put the 990 stones we are certain are not gold in front of it, it will mistakenly beep for about 99 of them (990 × 0.10 = 99).
Given the foregoing, if we ran this device over every stone in the mine, it would beep about 109 times (10 real golds plus 99 false alarms), even though only 10 of those beeps would truly be gold. This means there is only about a 9% chance (10/109 ≈ 9.2%) that the stone we paid $250 for is gold. In other words, we did not get a good deal, and we probably wasted $250 on a worthless rock.
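The arithmetic above is simply Bayes’ theorem in disguise. A minimal Python sketch, using only the numbers stated in the text (1,000 stones, 1% gold, a 10% false-positive rate, and no false negatives):

```python
# P(gold | beep) for the gold-detector example.
# Assumptions from the text: 1,000 stones, 1% are gold,
# 10% false-positive rate, 0% false-negative rate.
n_stones = 1000
p_gold = 0.01
fp_rate = 0.10  # beeps on a worthless stone
fn_rate = 0.00  # never misses real gold

gold = n_stones * p_gold                      # 10 gold stones
worthless = n_stones - gold                   # 990 worthless stones

true_beeps = gold * (1 - fn_rate)             # 10 correct beeps
false_beeps = worthless * fp_rate             # ~99 false beeps
total_beeps = true_beeps + false_beeps        # ~109 beeps in all

p_gold_given_beep = true_beeps / total_beeps  # 10/109, roughly 9%
print(f"P(gold | beep) = {p_gold_given_beep:.1%}")
```

Running this confirms that a beep is gold only about 9% of the time, despite the detector’s 90% accuracy.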
After investigating the issue mathematically, we discover that the “measurement accuracy” parameter alone is insufficient for a reliable result; other factors must be considered. In statistics and data science, this phenomenon is known as the “false positive paradox.”
This paradox typically occurs when the probability of the event being measured is lower than the error rate of the instrument measuring it. For example, in the gold-seekers case, we used a device with 90% accuracy (a 10% error rate) to detect an event with a 1% prior probability, so the results were not very reliable.
Familiarity with Terminology
Before delving deeper into the “false positive paradox,” it’s a good idea to brush up on a few statistical terms. To make the notion concrete, assume you have taken a coronavirus test. The test has four possible outcomes:
True Positive: You are infected with the Coronavirus and the test is positive.
False Positive: You have not been infected with the Coronavirus, yet the test is positive.
True Negative: You have not been infected with the Coronavirus, and the test result is negative.
False Negative: You have been infected with the Coronavirus, but the test results are negative.
It should be mentioned that the coronavirus test, and medical tests in general, are used here only as examples; these four outcomes apply to any measurement in which there is a risk of error.
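These four outcomes are just the cells of a confusion matrix. A small, hypothetical helper function makes the terminology concrete:

```python
def outcome(infected: bool, test_positive: bool) -> str:
    """Classify one test result into the four confusion-matrix cells."""
    if infected and test_positive:
        return "true positive"
    if not infected and test_positive:
        return "false positive"
    if not infected and not test_positive:
        return "true negative"
    return "false negative"  # infected, but the test missed it

print(outcome(infected=True, test_positive=True))   # true positive
print(outcome(infected=False, test_positive=True))  # false positive
```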
In the case of the gold seekers, the device’s false-positive error rate (the stone is not gold, but the device beeps) was 10%, and its false-negative error rate (the stone is gold, but the device does not beep) was 0%. In the next sections, we will look at several different aspects of the “false positive paradox.”
Measurement Accuracy: Unknown Virus
A mysterious virus has struck a city of 10,000 people, infecting roughly 40% of the population. As the product manager, you focus on creating a viral detection kit as quickly as possible to distinguish infected people from healthy ones.
Your detection kit has a 5% false-positive error rate and a 0% false-negative error rate. The kit is now being used to identify sick people in the city, and the expected outcomes are as follows:
Estimated number of people with the disease: 40% of 10,000, or about 4,000. Because the false-negative rate is 0%, all 4,000 infected people test positive. Of the 6,000 healthy people, the 5% false-positive rate produces about 300 false alarms (6,000 × 0.05 = 300). A positive result is therefore correct in 4,000 of 4,300 cases, roughly 93% — here the paradox does not bite, because the disease’s prevalence (40%) far exceeds the kit’s error rate (5%).
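Under the stated assumptions (10,000 people, 40% prevalence, 5% false-positive rate, 0% false-negative rate), the kit’s expected output can be checked in a few lines of Python:

```python
# Virus-kit example: P(infected | positive test).
population = 10_000
prevalence = 0.40   # fraction of the city that is infected
fp_rate = 0.05      # healthy person tests positive
fn_rate = 0.00      # infected person never tests negative

infected = population * prevalence             # 4,000 infected people
healthy = population - infected                # 6,000 healthy people

true_pos = infected * (1 - fn_rate)            # 4,000 correct positives
false_pos = healthy * fp_rate                  # ~300 false alarms

precision = true_pos / (true_pos + false_pos)  # 4000/4300, about 93%
print(f"P(infected | positive) = {precision:.1%}")
```

Because the prevalence dwarfs the error rate, a positive result is trustworthy in this scenario.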
Measurement Accuracy: Terrorist Detection

Now consider a second scenario: a surveillance camera is installed at the entrance of a complex in a large city to spot terrorists, sounding a siren whenever it recognizes one. Given that the camera’s false-positive and false-negative error rates are both about 1% (99 per cent accuracy), the intuitive answer is that if the alarm goes off, there is a 99 per cent probability a terrorist has been detected. There is a complication, however: statistics and probability are not so simple!
We estimate that there are 500 terrorists in this metropolis of 1 million people; this assumption is plausible and supported by demographic and statistical data. Now back to the question: if the alarm goes off, what is the likelihood that a terrorist is actually within the complex? The following calculations give the answer.
Given the camera’s 99 per cent accuracy, if all 500 terrorists in the city pass in front of it, the siren will sound about 495 times (500 × 0.99 = 495).
The remaining 999,500 people are ordinary citizens (the city’s population minus the terrorists). If all of them pass in front of the camera, the 1% false-positive rate means the alarm rings about 9,995 times (999,500 × 0.01 ≈ 9,995).
As a result, if the entire city passes in front of this camera, the siren will ring about 10,490 times in total, yet only 495 of those alarms are correct. We can now compute what proportion of alarms actually correspond to a terrorist: 495 / 10,490 ≈ 4.7%.
According to these estimates, if the alarm goes off, it is more than 95 per cent likely to be a false alarm, with no terrorist inside the complex. This outcome is far from our first intuition. Most people, reading that the camera is 99 per cent accurate, assume that the majority of its output must be right, but we have once again shown that measurement accuracy alone is insufficient. In this situation, the prior likelihood of any given resident being a terrorist is only about 0.05 per cent (500 / 1,000,000), while the instrument’s error rate is about 1%. So we are once again facing the “false positive paradox,” which has produced an inefficient output.
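The whole camera calculation, using the figures given in the text (1 million residents, 500 terrorists, 99% accuracy), can be verified directly:

```python
# Surveillance-camera example: P(terrorist | alarm).
population = 1_000_000
terrorists = 500
accuracy = 0.99   # both error rates are about 1%

civilians = population - terrorists            # 999,500 ordinary people

true_alarms = terrorists * accuracy            # ~495 correct alarms
false_alarms = civilians * (1 - accuracy)      # ~9,995 false alarms
total_alarms = true_alarms + false_alarms      # ~10,490 alarms in all

p_terrorist = true_alarms / total_alarms       # about 4.7%
print(f"P(terrorist | alarm) = {p_terrorist:.1%}")
```

The script shows that fewer than 5 alarms in 100 point at an actual terrorist.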
Awareness Test
You have been entrusted with product management of an alert device that police will use to detect drivers who have consumed alcohol or drugs. The specifications of your team’s product are as follows:
The device has a 0% false-negative error rate, which means it correctly flags every person who has used alcohol or drugs.
The device has a false-positive error rate of roughly 5%: for people who have not used drugs or alcohol, the test result is negative in 95 per cent of cases, but positive in the remaining 5%.
Because you are adept at data science (you were a data scientist before moving into product management), you take some time before release and ask the police for a report on the incidence of alcohol and drug use among drivers.
Examining the data, you discover that, on average, 5 out of every 1,000 drivers have consumed alcohol or drugs. This is a problem, because if the police randomly test drivers with the current product, it could lead to disaster! We perform the following computations to understand why.
Five out of every thousand drivers have consumed alcohol or drugs, and because the device’s false-negative error rate is 0%, all five of them will test positive.
As previously stated, the device’s false-positive error rate is around 5%. This means that around 50 of the 995 sober drivers will also test positive (995 × 0.05 ≈ 50).
This means that only 5 of the roughly 55 positive tests among 1,000 drivers are correct. The test’s precision is therefore only about 9% (5/55), and if someone tests positive, it is more than 90% likely that they consumed nothing and are innocent!
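The random-testing scenario can be checked in Python, using the figures from the police report (5 users per 1,000 drivers, a 5% false-positive rate, no false negatives):

```python
# Random roadside testing: P(consumed | positive test).
drivers = 1000
p_consumed = 5 / 1000   # 5 in every 1,000 drivers
fp_rate = 0.05          # sober driver tests positive
fn_rate = 0.00          # user never tests negative

consumed = drivers * p_consumed                # 5 drivers
sober = drivers - consumed                     # 995 drivers

true_pos = consumed * (1 - fn_rate)            # 5 correct positives
false_pos = sober * fp_rate                    # 49.75, rounded to ~50 in the text

precision = true_pos / (true_pos + false_pos)  # about 9%
print(f"P(consumed | positive) = {precision:.1%}")
```

The exact figure is 5 / 54.75 ≈ 9.1%; the text rounds the 49.75 false positives up to 50, giving 5/55 ≈ 9%.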
As a result, it is evident that using this device at random would be highly error-prone and would badly harm the reputation of both your firm and the police department. To fix this, the pool of people tested must be narrowed: you need a procedure under which the probability that a person has consumed alcohol or drugs is significantly higher than the device’s error rate. So you draw up a code of conduct listing behaviours such that, if a driver exhibits any of them, there is a 60% likelihood they have consumed alcohol or drugs. This lends credence to the device’s output. To check, imagine police stop a group of 100 drivers suspected under this procedure.
Given that the chance of consumption among these drivers is 60%, in a group of 100 people approximately 60 have consumed, and all 60 of them will test positive (the false-negative rate is 0%).
Because the device has a 5% false-positive rate, about two of the remaining 40 people will also test positive (40 × 0.05 = 2). A positive result is now correct in 60 of 62 cases, about 97%.
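The effect of restricting the sample can be verified the same way; with a 60% prior, the very same 5% device becomes trustworthy:

```python
# Restricted pool: P(consumed | positive test) with a 60% prior.
stopped = 100
p_consumed = 0.60   # prior after applying the code of conduct
fp_rate = 0.05      # sober driver tests positive
# false-negative rate is 0%: every user tests positive

consumed = stopped * p_consumed                # 60 drivers
sober = stopped - consumed                     # 40 drivers

true_pos = consumed                            # all 60 users test positive
false_pos = sober * fp_rate                    # 2 false positives

precision = true_pos / (true_pos + false_pos)  # 60/62, about 97%
print(f"P(consumed | positive) = {precision:.1%}")
```

Compare this with the ~9% precision under random testing: the device did not change at all, only the sample space did.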
As a result, even a basic understanding of the sample space has a significant impact on the outcome. Of course, understanding your sample space is a broad topic that we will return to in future posts.
Conclusion
According to these findings, the measurement precision of a device alone cannot ensure reliable output; the sample space under consideration is perhaps even more important than the instrument’s accuracy. To avoid the “false positive paradox,” conditions must be constructed in which the probability of the event exceeds the device’s error rate. In the “awareness test” example, this produced a significant boost in output accuracy.