“Your machine ran out of memory.”
Sound familiar? It certainly does to me, especially whenever I try to run a complex machine learning algorithm on my personal machine. It’s a frustrating experience that many data science professionals share. We don’t have the unlimited computing power of the tech behemoths, so what should we do?
This is where the power of the cloud has transformed data science. And Amazon, with its AWS offering, has conquered the data science market like nothing before.
Cloud computing has seen tremendous growth in the past few years. Almost every organization nowadays uses cloud computing for its wide range of services. 70% of all the money spent on tech is expected to go into cloud services by the end of 2020.
Did you know that AWS’s revenue in the first quarter of 2020 was $10 billion? That’s almost twice as much as its next closest competitor! Every data science professional, from a data scientist to a data analyst, needs to learn AWS and how it works.
So in this article, let’s dive into what AWS is and find out why it has come to the forefront of cloud computing services.
AWS is a cloud computing platform by Amazon that provides Infrastructure as a Service (IaaS), Platform as a Service (PaaS), and packaged Software as a Service (SaaS) on a pay-as-you-go basis. Its cloud products launched in 2006, but the infrastructure was originally used to handle Amazon’s online retail operations.
AWS has 3 main products:
AWS provides its consumers with many advantages:
AWS was initially launched in 2002, but it provided only a few services. In 2006, AWS launched its cloud products, which included Amazon S3 cloud storage, SQS (Simple Queue Service), and EC2, and in doing so marked its entry into the online core services industry.
In 2009, AWS expanded internationally to Europe, where S3 and EC2 were launched. Elastic Block Store (EBS) and Amazon CloudFront, a content delivery network, were also released and incorporated into AWS.
EBS provides block-level storage for use with Amazon EC2 instances. EBS volumes are network-attached and persist independently of the life of an instance.
Over the years, many services have been added to the AWS platform, making it cost-effective and highly scalable. AWS now has data centers all over the world, including in the United States, Japan, Europe, Australia, and Brazil.
AWS Global Infrastructure map
The following services are provided by AWS in the respective domains:
By now, you should have a broad understanding of what AWS is. So let’s shed some light on why companies require their data scientists to know AWS.
Remember the last time you sat idle, waiting for your system to respond? Here are some of the problems a local system has to overcome:
I am sure many of you are still wondering why you should use AWS. Why not go for something else (like Google’s GCP)? Let me answer by listing the following benefits of AWS:
AWS has a well-documented user interface and removes the need for on-site servers to meet IT demands. This makes it easier to deploy programs and software whenever you need to. AWS meets your every need.
Earlier in this article, we saw what a diverse range of services AWS has to offer. It’s an efficient, all-in-one solution for your IT and cloud requirements.
You don’t need to worry about whether large datasets will fit into your local machine’s memory.
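To get a feel for when a dataset stops fitting on a personal machine, here is a rough back-of-the-envelope sketch. It assumes a dense table of 8-byte numeric values (for example float64) and a hypothetical rule of thumb that the data should take up no more than half of your RAM; the function names and the headroom figure are my own illustration, not anything AWS-specific.

```python
def estimated_size_gb(rows: int, columns: int, bytes_per_value: int = 8) -> float:
    """Approximate in-memory size of a dense numeric table, in GiB."""
    return rows * columns * bytes_per_value / 1024**3


def fits_in_memory(rows: int, columns: int, ram_gb: float, headroom: float = 0.5) -> bool:
    """Check the estimate against RAM, keeping a fraction free for the model itself.

    The 0.5 headroom is an assumed rule of thumb, not a hard limit.
    """
    return estimated_size_gb(rows, columns) <= ram_gb * headroom


# Hypothetical dataset: 100 million rows, 50 numeric columns
rows, columns = 100_000_000, 50
print(f"Estimated size: {estimated_size_gb(rows, columns):.2f} GiB")
print("Fits on a 16 GB laptop:", fits_in_memory(rows, columns, ram_gb=16))
print("Fits on a 128 GB machine:", fits_in_memory(rows, columns, ram_gb=128))
```

When the estimate exceeds what your laptop can hold, renting a larger machine for a few hours is often simpler than reworking the analysis.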
The AWS Global Cloud Infrastructure is the most extensive and reliable cloud platform, offering over 175 fully-featured services from data centers globally. Whether you need to deploy your application workloads across the globe in a single click, or you want to build and deploy specific applications closer to your end-users with single-digit millisecond latency, AWS provides the cloud infrastructure where and when you need it.
I suspect this will be the most convincing point! AWS is one of the cheapest cloud platforms. This is really useful for small businesses, which can function and grow without tying up much working capital in servers.
2020 Gartner Magic Quadrant for Cloud Infrastructure and Platform Services
Whichever firm you work for, cloud infrastructure will become an important part of your daily data science routine, because companies are increasingly turning to cloud computing for solutions.
According to a report from Indeed.com, AWS rose from a 2.7% share in tech skills in 2014 to 14.2% in 2019. That’s a 418% change!
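If you want to check a growth figure like this yourself, percent change is simply the difference divided by the starting value. Here is a quick sketch using the two rounded shares quoted above. Note that the rounded inputs give roughly 426% rather than 418%; my assumption is that the report worked from unrounded shares, but I can't confirm that from the numbers here.

```python
def percent_change(old: float, new: float) -> float:
    """Relative change from old to new, expressed as a percentage."""
    return (new - old) / old * 100


share_2014 = 2.7   # percent, as quoted above (rounded)
share_2019 = 14.2  # percent, as quoted above (rounded)

change = percent_change(share_2014, share_2019)
print(f"Change from rounded shares: {change:.1f}%")
```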
AWS’s low cost comes from its pricing model. AWS works on a pay-as-you-go model and charges on either a per-hour or a per-second basis. It also provides an option to reserve a specific amount of computing capacity at discounted rates.
Additionally, AWS keeps in mind prospective users who can’t afford its services. For them, it provides the AWS Free Tier, which lets them gain hands-on experience with AWS services free of charge.
All businesses, whether big or small, want to save costs. Small companies save the cost of buying servers, and conglomerates gain authenticity and productivity. AWS services are also very powerful: where setting up a Hadoop cluster with Spark on your own can take days, AWS does it within a few minutes.
In today’s competitive world, hands-on experience with cloud services like AWS gives you a real edge in the data science race. AWS is now very popular among businesses, and experience with such cloud computing platforms makes your skills stand out during the recruitment process.
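To see how these billing options compare, here is a minimal sketch of the arithmetic. The hourly rate and the reserved-capacity discount are hypothetical placeholders I made up for illustration, not real AWS prices, and the functions ignore minimum charges, storage, data transfer, and the other details of an actual bill.

```python
SECONDS_PER_HOUR = 3600


def on_demand_cost(seconds_used: int, hourly_rate: float, per_second: bool = True) -> float:
    """Pay-as-you-go cost, billed per second or rounded up to whole hours."""
    if per_second:
        return seconds_used * hourly_rate / SECONDS_PER_HOUR
    hours_billed = -(-seconds_used // SECONDS_PER_HOUR)  # ceiling division
    return hours_billed * hourly_rate


def reserved_cost(hours_reserved: int, hourly_rate: float, discount: float) -> float:
    """Cost of reserving capacity at a fractional discount off the hourly rate."""
    return hours_reserved * hourly_rate * (1 - discount)


HYPOTHETICAL_RATE = 0.10      # dollars per hour, assumed for illustration
HYPOTHETICAL_DISCOUNT = 0.30  # 30% off for reserved capacity, assumed

training_run = 90 * 60  # a 90-minute training job, in seconds

per_second_bill = on_demand_cost(training_run, HYPOTHETICAL_RATE)
per_hour_bill = on_demand_cost(training_run, HYPOTHETICAL_RATE, per_second=False)
reserved_month = reserved_cost(24 * 30, HYPOTHETICAL_RATE, HYPOTHETICAL_DISCOUNT)

print(f"Per-second billing: ${per_second_bill:.2f}")
print(f"Per-hour billing:   ${per_hour_bill:.2f}")
print(f"Reserved, 30 days:  ${reserved_month:.2f}")
```

The takeaway is the shape of the trade-off: short, occasional jobs favor pay-as-you-go, while a machine that runs around the clock is the case where reserving capacity starts to pay off.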
I hope this article makes a solid case for why cloud computing is necessary for data scientists. Please use the comment section below if you have any thoughts to share or general queries.
I cannot disagree with this opinion article more. People spend their entire careers studying and administering systems of IT infrastructure. Others spend their entire careers studying mathematics, statistical methods, theory of computer science, and areas of domain expertise. Arguing that every data science professional needs to also have professional level sysadmin or devops skillsets implies that building valuable and differentiated models also requires operating and maintaining the underlying machinery for the tasks that they alone are qualified for: studying data closely, performing experiments efficiently, and documenting their research clearly. Requiring these skillsets in parallel implies cultures of heroism, overwork, and a lack of collaborative culture. Simply put: let scientists be scientists.
Thanks, gives a quick background of AWS.
Good article to understand AWS overview.