Writing is one of the best ways to improve retention. Putting what you learn into your own words not only deepens your understanding but also sharpens your observation, which in turn feeds your curiosity.
In short, writing elevates your learning process to unfathomable levels.
Writing is at the core of Analytics Vidhya’s principles. We have always tried to deliver the best content possible, and 2020 was no different for us. With more than 500 articles published this year, the writing journey never stops for us.
In this article, we highlight the 10 most read articles by the Data Science community on our blog, published this year.
So let’s get the ball rolling!
The best performing article on our blog is based on one of the most fundamental questions asked of a data scientist or data analyst in an interview:
“How many data science projects have you completed so far?”
The answer makes all the difference. Data Science is not a field where theoretical understanding alone helps you stand apart. It is the projects you do and the practice you have that determine your probability of success.
Just doing courses or attaining certifications isn’t good enough. Almost everyone we know holds certifications in various aspects of data science. It adds no value to your resume if you don’t combine it with practical experience.
But which data science project should you choose? We at Analytics Vidhya love collecting the best data science projects every month, and in this article, we have collected the best open-source data science projects for the month of June 2020.
You can check it out here.
Feature scaling helps you convert variables with different units of measurement, such as kilograms, rupees, and years, into unitless measures. But the question is: which method of scaling should you use?
One of the obstacles every data scientist faces is choosing between normalization and standardization. Most courses do not focus on this topic. Feature scaling is one of the most important preprocessing steps, and applying it without proper knowledge may lead to an inaccurate or biased model.
The article also talks about why some machine learning models improve drastically with feature scaling while others barely change.
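To make the two options concrete, here is a minimal sketch using only the Python standard library. It assumes the usual textbook definitions: normalization as min-max rescaling to the range [0, 1], and standardization as the z-score (subtract the mean, divide by the standard deviation). The function names and the sample values are made up for illustration and are not taken from the original article.

```python
from statistics import mean, pstdev


def normalize(values):
    """Min-max normalization: rescale values to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no spread; map everything to 0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def standardize(values):
    """Standardization (z-score): zero mean, unit standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Hypothetical features measured in very different units.
weights_kg = [50.0, 60.0, 70.0, 80.0, 90.0]
incomes_rupees = [20000.0, 35000.0, 50000.0, 80000.0, 120000.0]

scaled_weights = normalize(weights_kg)
standardized_incomes = standardize(incomes_rupees)

print(scaled_weights)
print(standardized_incomes)
```

After scaling, both features are unitless, so a distance-based model no longer lets the rupee column dominate simply because its raw numbers are larger.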
You can read the article here.
“What are the best tools for performing data science tasks? And which tool should you pick up as a newcomer in data science?”
The essence of the article is covered in the question asked above. Once we identify what to learn at a personal level, or do at a professional level with the data, we need to identify the tools that best suit the task. This article is all about identifying the best fitting tool.
Data science is a vast field, and each area requires the data to be handled in its own way. And since your models can have a huge impact on the decisions of the organization, it is really important to identify which tools to use.
The article is divided into two parts. The first part focuses on tools for handling big data in terms of volume, variety, and velocity. The second part covers tools for data science in terms of reporting and business intelligence, predictive modelling and machine learning, and artificial intelligence.
You can read the article here.
2020 will go down in the history books as the year that changed all of humanity. Every facet of life was impacted by the Coronavirus, and it was imperative for people from all domains to come together and contribute towards solving this problem.
The article covers the use of Generative Adversarial Networks (GANs) as an oversampling technique on real-world skewed Covid-19 data for predicting the risk of mortality. It gives us a better understanding of how data preparation steps like handling imbalanced data can improve model performance.
The data and the core model for this article are taken from a recent study (July 2020), “COVID-19 Patient Health Prediction Using Boosted Random Forest Algorithm” by Celestine Iwendi, Ali Kashif Bashir, Atharva Peshkar, et al. This study used the Random Forest algorithm boosted by the AdaBoost model and predicted the mortality of individual patients with 94% accuracy. In this article, the same model and model parameters were used to analyze how much a GAN-based oversampling technique improves the existing model’s accuracy.
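To show what oversampling an imbalanced dataset means in practice, here is a minimal sketch. Note that it uses plain random oversampling (duplicating minority-class rows), not the GAN-based technique from the article, which generates new synthetic rows instead of copying existing ones. The function name and the toy patient data are hypothetical and exist only for illustration.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen minority-class rows until every class
    has as many rows as the largest class. This is a simple stand-in
    for GAN-based oversampling, not the GAN method itself."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for label, count in counts.items():
        # All original rows belonging to this class.
        pool = [r for r, l in zip(rows, labels) if l == label]
        for _ in range(target - count):
            out_rows.append(rng.choice(pool))
            out_labels.append(label)
    return out_rows, out_labels


# Hypothetical toy data: (age, has_chronic_disease); label 1 = died, 0 = recovered.
rows = [(34, 0), (51, 0), (45, 1), (29, 0), (62, 0), (38, 0), (71, 1), (80, 1)]
labels = [0, 0, 0, 0, 0, 0, 1, 1]

balanced_rows, balanced_labels = random_oversample(rows, labels)

print(Counter(labels))           # class counts before balancing
print(Counter(balanced_labels))  # class counts after balancing
```

A classifier trained on the balanced data sees the minority class as often as the majority class, which is the effect the article aims for with GAN-generated samples.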
You can read the article here.
Why deep learning?
This is a fair question. We are swamped with machine learning algorithms. There is no dearth of them, and almost any kind of data problem can be tackled with one of these algorithms.
Also, deep learning algorithms require huge computing power. So is it necessary to use these algorithms?
This article answers the queries that question the need for deep learning and its neural networks, such as convolutional neural networks (CNN), recurrent neural networks (RNN), and artificial neural networks (ANN). It argues that deep learning has an edge over classical machine learning in terms of decision boundaries and feature engineering.
You can read the article here.
Many of us still do not know the different domains in the data sector. We use terms like business analytics and data science interchangeably, and it causes great confusion during communication.
There is a surge in demand for both Business Analytics and Data Science. Their market size is expected to reach $100 billion and $140 billion respectively by 2025. Thus, it only makes sense to understand what both domains actually mean, what their responsibilities are, and which similarities lead to these terms being used interchangeably.
We at Analytics Vidhya have come across a lot of aspiring analytics professionals who want to choose “Business Analytics” or “Data Science” as their career, but they’re not even sure about the distinction between these two roles. Before diving into your own choice, you should be clear about which path you want to take, right? It could be a career-defining choice!
This article explores the similarities and differences between business analytics and data science and tries to give you a better picture.
You can read the article here.
Some of the simplest tasks, such as joining tables, may seem tricky in Python. This article is a simple guide to joining two tables seamlessly using the pandas library.
Our 7th best performing article will help you understand the different types of joins in pandas.
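As a quick illustration, the sketch below uses `pd.merge` with the `how` parameter to produce the four common join types (inner, left, right, and outer) on two small tables. The table names, column names, and values are invented for this example, and the selection of join types is an assumption about what the article covers.

```python
import pandas as pd

# Hypothetical tables: customer 1 has two orders, customers 2 and 3 have
# none, and one order belongs to a customer (4) missing from `customers`.
customers = pd.DataFrame(
    {"customer_id": [1, 2, 3], "name": ["Asha", "Ben", "Chen"]}
)
orders = pd.DataFrame(
    {"order_id": [101, 102, 103], "customer_id": [1, 1, 4], "amount": [250, 120, 90]}
)

# Inner join: only customer_ids present in both tables.
inner = pd.merge(customers, orders, on="customer_id", how="inner")

# Left join: every customer, with NaN where no order exists.
left = pd.merge(customers, orders, on="customer_id", how="left")

# Right join: every order, with NaN where the customer is unknown.
right = pd.merge(customers, orders, on="customer_id", how="right")

# Outer join: everything from both tables; `indicator=True` adds a
# `_merge` column showing which table each row came from.
outer = pd.merge(customers, orders, on="customer_id", how="outer", indicator=True)

for name, frame in [("inner", inner), ("left", left), ("right", right), ("outer", outer)]:
    print(name, len(frame))
```

The row counts differ because each join type treats unmatched keys differently, which is usually the first thing to check when a merge returns more or fewer rows than expected.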
You can read the article here.
This is the second data science open source project article to feature in this list. We take it as a clear sign that learning has not taken a backseat when it comes to data science aspirants.
This article covers the top open-source data science projects for the month of April.
You can read the article here.
Coding is a very personal experience for any data scientist, business analyst, data analyst, or any programmer.
We have all been at a point in our coding journey when we feel that a particular tool is detrimental to our efficiency. The reason can range from your style of coding to your position in the learning path, or anything else that makes the tool a poor fit for you.
That’s where identifying the right IDE comes in. An IDE helps us write and execute Python code for analytics, data science, software development, and a plethora of other tasks. There are multiple IDEs in the market right now, each with its own set of features, pros, and cons.
You can read the article here.
How do we represent that data in a way that’ll help our leadership team or decision-makers come to a consensus quickly?
The answer to the above question is a concise visualization. You cannot create a model in Excel or Python and simply expect the stakeholders to understand the implications.
Excel has been a market leader when it comes to EDA and visualization tasks for 35+ years now. It is trusted by the majority of businesses, especially small businesses, for its features.
In this article, we discuss several Excel dashboards.
You can read the article here.
The year 2020 was a leap forward for the machine learning community. I hope you find these data science articles fruitful in your learning journey. Let us know your thoughts in the comments below.
Keep Learning! And never stop writing!