Analytics Vidhya has long been at the forefront of imparting data science knowledge to its community. To make learning data science more engaging for the community, we started a new initiative: “DataHour.”
DataHour is a series of webinars by top industry experts where they teach and democratize data science knowledge. On 4th June 2022, we were joined by Eshan Tiwari for a DataHour session on “Data Science in a FAANG company!”
Hold on! You may be wondering what FAANG is, or you may already have heard of it. FAANG is an acronym for Facebook (now Meta), Amazon, Apple, Netflix, and Google (now Alphabet).
One of the companies that has had a significant impact on the world of technology and on the way people interact is Facebook, which also owns Instagram. Remember how we all went gaga at its launch?
In this DataHour, Eshan will demonstrate how data science is used at Facebook and what is required to become a data scientist at Facebook.
Eshan Tiwari is Data Science Lead at Facebook with 9 years of experience in analytics. He is an experienced Analyst with a demonstrated history of working in the consulting and internet industry and an expert in experimentation and applied ML. He is skilled in Management, Matlab, SQL, Microsoft Excel, Tableau, and Python.
Previously, he has worked with companies like upGrad, ZS Associates and SGS.
There is always confusion among newcomers trying to enter the field. Everyone wants to get into the data science world, but there are too many options to choose from. Of all the data roles available, only a small share carry the data scientist title; everything else sits under a different data title. Facebook is a company of around 70,000 to 80,000 people. It has fewer than 1,000 data scientists, whereas it has around 15,000 data people. Hence, data scientists are not the majority. The same goes for other companies like Google, which has close to 1,000 data scientists in a company of around 160,000 people. So there are far more data roles than those titled data science, although data science titles are good too.
Generally, all the data roles in FAANG companies fall into two categories: tech and non-tech data roles. There’s not much difference in terms of skill set. The difference lies in which organization you become part of. When you are part of a product team, i.e., the people who develop the product, that team comprises product managers, technical program managers, and software engineers. Then there are data scientists and data engineers who enable data logging or analysis around the product. That entire organization is called tech because its leaders are usually engineers, product managers, or other tech people.
Then there are many other functions: sales, business, partnerships, operations, legal, HR (or people operations), and ops data analytics. There are many such teams, and they also require a lot of data work. Their leaders are not engineers or product managers, so these are called non-tech roles.
Ops Data Scientist/Business Analytics/Business Intelligence: There are a bunch of roles with titles such as operations data analyst, data analyst, business analyst, etc. They are all proper data roles. People in these roles develop a lot of dashboards, do a lot of reporting, and maintain metrics. For example, we may want to understand how much bad content we have taken down from our platform, what the revenue was last quarter, or how many new video creators we have onboarded. All of this work is done by people sitting in non-tech data roles.
Business Data Scientist: Another title we see in non-tech roles is business data scientist. These are sales data scientists, marketing data scientists, or partnership data scientists.
Data Developer: One more title is data developer. Data developers are very similar to data engineers. The data engineer title is used in tech; in non-tech, the equivalent title is data developer.
Data Scientist-Analytics: The first role is data scientist-analytics. These are people who develop the right set of metrics to measure product features and to evaluate a feature launch and its impact.
Data Scientist-ML: These are the people who tune the recommendations in the back-end algorithms. For instance, the kind of posts a person sees, the friend recommendations they get, and when an ad should pop up.
Machine Learning Engineers: This class comprises people who develop recommendation systems, classification systems, or anything related to deep learning or neural networks.
Data Engineer: These are people who are similar to software engineers, but they focus only on the world of data. For instance, billions of people use Facebook. Everything they do, such as comments, messages, and scrolls, is logged by data engineers.
These roles in FAANG companies are divided into tech and non-tech communities.
In tech data science, we have three kinds of designations.
Product Data Scientist/Product Analyst: These are people who work closely with the product and engineering teams.
Data Scientists: Next in line are data scientists. They are the people who work on long-term projects, use a lot of ML, develop new algorithms, or apply existing complex algorithms to come up with a solution. Finally, they work with machine learning engineers to operationalize the model.
Core Data Scientist: These are people who are generally post-docs, or at least hold a Ph.D., and are experts in their field. Their projects are their product. These projects are not about how Facebook, Google, or Amazon is doing today. They are more about what the company should do three to five years from now, where it should make investments, or whether the platform favors any particular political ideology. Therefore, the main work of core data scientists is proof-of-concept ideas, i.e., the things we have not even thought about yet.
At Facebook, we have an org whose role is to build new products and test them fast. They take an idea, quickly build a product around it, and then see if it is working. Core data science people work on questions such as whether Facebook should move into a marketplace kind of model; you may have seen Facebook Marketplace, where you can buy and sell things. Another question is whether, rather than going by likes and comments from users, Facebook should start thinking about how much meaningful time people invest in the platform. These are the kinds of problems the core data science team is attacking.
Similarly, in non-tech roles at FAANG companies, we have marketing or sales data scientists, legal data scientists, and partnership data scientists.
Marketing/Sales Data Scientist: After launching a new ad product, what has been the impact on revenue? When reaching out to advertisers, content creators, or video creators through digital marketing channels, how is the funnel performing? These are the questions answered by a marketing/sales data scientist.
Legal Data Scientist: Additionally, Facebook has a lot of PR issues and trust and safety challenges. For this, it has a team called legal data scientists, who look into complaints from the government or police and try to predict issues on the platform. For instance, things that are allowed in India might not be allowed in Dubai. So how can the platform proactively predict legal issues in India, Dubai, or Europe? This is what this team does.
Partnership Data Scientist: Similarly, there is a team called partnership data science. These are the people who try to understand, for example, whether Facebook should have telecom partners like Airtel in India, or AT&T or some other provider in the U.S. They generally help us understand which news media, mobile networks, video content creators, sports communities, and movies we should engage with by buying their exclusive rights and putting their content on our platform.
Product growth example: The presenter was working on making Facebook available in local languages. In 2015 and 2016, Facebook was unavailable in Indian languages like Marathi, Bengali, or Hindi. So the presenter worked with a team called the Internationalization team, which works on language translation. They had five or six variants of the Hindi text, out of which they had to choose the best one. For example, the prompt to send a friend request could read “dosto ya mitro ko judne ke liye request bhejo” (roughly, “send a request to connect with friends,” where “dosto” and “mitro” are alternative Hindi words for “friends”).
Product Building: Similarly, when launching a new product, the presenter worked with the Marketplace team to launch it in Thailand. They wanted to check what kind of goods people were selling on the Thailand marketplace. Is there anything that could put us at legal risk? For example, marijuana is banned in Thailand. If people are selling marijuana, which is allowed in parts of the U.S. but not in Thailand, this would be a legal risk.
Product Development: For instance, when adding new emoticons to our platform, we need to find the impact of those emoticons. Or take Instagram stories and Facebook stories: if you put a story on Instagram, you can now put it on Facebook as well. The presenter would want to know how the product is performing, how the metrics are moving, which features people use, and which they do not. If some changes are made, what will their impact be? All of that work falls under the remit of product data science.
Let’s imagine you are working with a business team. There is a sales data scientist for India. They would give direction to the sales team on what the target for the quarter should be and what kind of customers the team should or should not be reaching out to.
Suppose a program manager has developed a new program to increase sales in India. They would ask: we applied four new tactics; can you tell me how effective each tactic has been, and which tactics matter and which do not?
You must have seen that there is a lot of hate content, violence, and pornography on the platform, as well as people copying each other’s photos and impersonating others. This is where Facebook invests a lot of data science effort: in trust and safety.
One of the projects the presenter worked on was hate speech in Myanmar, where he was asked what percentage of content in Myanmar is hate content. When Cambridge Analytica happened, he was asked how many advertisements live on the Facebook platform came with a political agenda. Or somebody would say: we have launched new privacy filters where, let’s say, you can lock your photos; how effective has locking photos been in stopping impersonation? A lot of such asks go into this area.
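A question like “what percentage of content is hate content” is usually answered from a labeled sample rather than from every post. The sketch below is only an illustration of that idea, not the method the presenter’s team used: the function name `wilson_interval` and the sample numbers are assumptions, and it reports a 95% Wilson confidence interval around the sampled share.

```python
import math

def wilson_interval(positives, sample_size, z=1.96):
    """Wilson score interval for a proportion (z=1.96 gives roughly 95%)."""
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    p_hat = positives / sample_size
    z2 = z * z
    denom = 1 + z2 / sample_size
    center = (p_hat + z2 / (2 * sample_size)) / denom
    half_width = z * math.sqrt(
        p_hat * (1 - p_hat) / sample_size + z2 / (4 * sample_size ** 2)
    ) / denom
    return center - half_width, center + half_width

# Hypothetical sample: reviewers label 1,000 random posts and flag 30 as hate content.
low, high = wilson_interval(30, 1000)
print(f"Estimated prevalence: 3.0% (95% interval {low:.1%} to {high:.1%})")
```

Reporting the interval next to the point estimate makes it clear how much the answer could move if a different random sample had been labeled.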
Infrastructure growth is more about questions like how many new data centers should be opened, how much video consumption we expect in the future, and how much data capacity we need to add accordingly. Or, given the headcount growth we are seeing, how many laptops should we procure so that every employee or contractor gets one? These are the kinds of data science problems that infrastructure growth takes on.
You might have heard of something called the metaverse. When the metaverse was about to be launched, there would have been a data science team thinking about whether to launch it and what its audience would be. Recently, Google came up with something called Google for Education. During the pandemic, they must have seen the growth of Coursera and of Indian companies like Unacademy and upGrad, and decided to launch Google for Education. Now, what customer base or total addressable market can we get for Google for Education? These are the kinds of future investments where data scientists give input.
How should we rank ads so that it is not only the highest payer’s ad that gets shown? How do we increase the quality of ads? And what kind of inventory should we show at what time of day? A lot of work goes into ad ranking and ad auctions.
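To illustrate the idea that the highest bid alone should not decide which ad is shown, here is a minimal sketch that ranks ads by bid multiplied by a quality score. This is an assumed, simplified scoring rule for illustration only, not Facebook’s actual auction; the `Ad` class, the `rank_ads` function, and all numbers are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Ad:
    name: str
    bid: float      # what the advertiser is willing to pay (hypothetical units)
    quality: float  # assumed quality score between 0 and 1

def rank_ads(ads):
    """Order ads by bid * quality, so a low-quality ad cannot win on bid alone."""
    return sorted(ads, key=lambda ad: ad.bid * ad.quality, reverse=True)

ads = [
    Ad("high_bid_low_quality", bid=5.0, quality=0.2),   # score 1.0
    Ad("mid_bid_high_quality", bid=3.0, quality=0.8),   # score 2.4
    Ad("low_bid_mid_quality", bid=1.0, quality=0.5),    # score 0.5
]
ranked = [ad.name for ad in rank_ads(ads)]
print(ranked)
# ['mid_bid_high_quality', 'high_bid_low_quality', 'low_bid_mid_quality']
```

Here the ad with the highest bid ends up second because its quality score is low, which is the behavior the question above is asking for.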
Similarly, a lot of work goes into measuring the impact of programs. If we launch something related to safety, or something related to user awareness, how do we measure the impact of those launches? These are typical data science questions.
Getting into DS roles in FAANG is not rocket science. The process is fairly standard across organizations. The presenter himself has interviewed with all the major tech companies. He experienced four major themes:
About 90 to 95 percent of interviewers test your SQL. Why SQL in particular? Because all data these days is logged into databases and the cloud, and to access it you need to know a query language such as SQL. Internally, at least, the SQL we use is powerful, and a lot of stats can be done in it. Smaller companies, where you can’t do much stats on the SQL server, use a combination of SQL and Python. Python is mostly used for its data science libraries, whereas SQL is mostly used for data manipulation and extraction. Some companies also test R. Python or R is optional; in most interviews, they will just test SQL.
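As a rough illustration of that split, the sketch below uses SQL for extraction and aggregation and Python for the follow-up statistic. It runs on Python’s built-in `sqlite3` module; the `events` table, its columns, and the sample rows are made up for the example.

```python
import sqlite3
import statistics

def daily_active_users(rows):
    """Count distinct users per day with SQL.

    rows: iterable of (user_id, event_date) tuples (hypothetical schema).
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (user_id INTEGER, event_date TEXT)")
    conn.executemany("INSERT INTO events VALUES (?, ?)", rows)
    query = """
        SELECT event_date, COUNT(DISTINCT user_id) AS dau
        FROM events
        GROUP BY event_date
        ORDER BY event_date
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result

# Made-up event log: user 1 appears twice on the first day.
sample = [
    (1, "2022-06-01"), (2, "2022-06-01"), (1, "2022-06-01"),
    (1, "2022-06-02"), (3, "2022-06-02"), (4, "2022-06-02"),
    (2, "2022-06-03"),
]
dau = daily_active_users(sample)                          # SQL: manipulation and extraction
mean_dau = statistics.mean(count for _, count in dau)     # Python: the statistic
print(dau, mean_dau)
```

In an interview, the SQL part (grouping, distinct counts, joins, window functions) is usually what gets tested; the Python step is the optional extra.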
In each round, you can expect two to three case studies about FAANG companies. What do these case studies look like? Let’s say the number of Uber rides has been going down for the last seven days. How will you investigate? Let’s say Facebook has launched a new emoticon on WhatsApp. How will you measure its impact? Or let’s say we are building something called Facebook Dating. Should we launch the app or not? That calls for A/B test experimentation.
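For the A/B testing part of such a case study, one common way to compare two groups is a two-proportion z-test. The sketch below is a generic textbook version using only the standard library, not a description of any company’s internal tooling; the function name and the conversion numbers are assumptions.

```python
import math

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference in conversion rates (pooled variance)."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 0.0, 1.0  # no variation at all, so no evidence of a difference
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical experiment: control converts 100 of 1,000 users, treatment 130 of 1,000.
z, p_value = two_proportion_z_test(100, 1000, 130, 1000)
print(f"z = {z:.2f}, p-value = {p_value:.3f}")
```

In a case-study answer, the test itself is only one step; you would also be expected to say which metric you are testing, how users are randomized, and how long the experiment should run.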
This round is mostly about understanding what a type 1 error is, what a type 2 error is, the distribution of a data set, and the assumptions of linear regression. We don’t ask for deep ML. We check for logic, and basic knowledge of ML such as linear regression, logistic regression, and clustering is more than sufficient. Instead, we check probability, knowledge of distributions, and type 1 and type 2 errors. How would you handle data imbalance, missing data, imputation, etc.? These are the kinds of things we want to check.
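One way to build intuition for type 1 and type 2 errors is to simulate many experiments. In the sketch below, a type 1 error is a test that reports a difference when both groups have the same true rate, and a type 2 error is a test that misses a difference that really exists. The rates, sample sizes, and function names are illustrative assumptions.

```python
import math
import random

def z_test_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value of a pooled two-proportion z-test."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0
    z = (conv_b / n_b - conv_a / n_a) / se
    return math.erfc(abs(z) / math.sqrt(2))

def rejection_rate(rate_a, rate_b, n_per_arm, n_sims, alpha, rng):
    """Share of simulated experiments in which the test rejects 'no difference'."""
    rejections = 0
    for _ in range(n_sims):
        conv_a = sum(rng.random() < rate_a for _ in range(n_per_arm))
        conv_b = sum(rng.random() < rate_b for _ in range(n_per_arm))
        if z_test_p_value(conv_a, n_per_arm, conv_b, n_per_arm) < alpha:
            rejections += 1
    return rejections / n_sims

rng = random.Random(0)  # fixed seed so the run is repeatable

# Same true rate in both arms: every rejection is a false positive (type 1 error).
type1_rate = rejection_rate(0.30, 0.30, 200, 1000, 0.05, rng)

# Different true rates: every non-rejection is a miss (type 2 error).
power = rejection_rate(0.30, 0.45, 200, 1000, 0.05, rng)
type2_rate = 1 - power

print(f"type 1 error rate: {type1_rate:.3f}, type 2 error rate: {type2_rate:.3f}")
```

The simulated type 1 error rate should land close to the chosen significance level of 0.05, while the type 2 error rate depends on the true effect size and the sample size.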
As a data scientist, you will work with multiple cross-functional partners. You will actually be teaching people how to make data-driven decisions. This requires a lot of communication and influence, so behavioral interviews have become very popular. The kinds of questions you can be asked are: tell me about a time when you received difficult feedback; tell me about a time when people did not agree with your idea, and how you convinced them; tell me about a time when you helped someone grow. Basically, they want to understand your approach to work. A combination of these distinct things helps in an interview.
Generally, the interview has two parts:
Screening: Here, they just want to check whether you are a good candidate at all. So they run one SQL round and one small case study. After clearing this, you go on to the on-site.
On-Site: Here, they have two case studies, one round covering stats, ML, and SQL, and a behavioral interview. A few companies may add one or two more rounds, but this is the standard process.
I hope you enjoyed the session and understood it very well. I hope these tips will help you in the long run, and you will get selected for FAANG soon. All the best for the lined-up interviews and a new beginning.