“Which programming language should I pick up to start my data science journey?”
If I had a nickel for every time I saw this question, I’d be a millionaire! It’s easily the most popular question asked by data science enthusiasts. The answer, as I’m sure you’ve seen, usually hovers between Python and R.
But here’s my question – why should we limit ourselves to these two languages? There is a whole world of programming languages we can pick up and apply in this field. And therein lies the beauty of data science – it transcends programming languages.
My aim here is to introduce a world beyond Python and R while keeping the core idea behind them. We will cover 6 powerful and useful programming languages for data science that I feel every data scientist should learn (or at least be aware of). All of these languages are open source.
And let’s face it – we love comparisons. Whether it’s Apple v Samsung, iOS v Android, MacOS v Windows (or Linux), these comparisons lead to intense discussions. So if this article sparks a debate among our community – that’s even better!
So, what are these languages and how are they used in the field of data science? Let’s find out!
Note: I have also provided open source libraries and free tutorials wherever possible to help you get started with each programming language.
Scala is a fairly common programming language. Chances are you’ve either worked with it or come across it at some point (especially if you’ve worked in IT).
Scala is a modern, open source, multi-paradigm programming language created by Martin Odersky in 2003. Scala stands for “Scalable Language”. It is designed to express common programming patterns in a concise, elegant and type-safe way.
Let’s put it this way – if you are aware of Java’s syntax, you’ll pick up Scala in a jiffy. In fact, learning Scala will be pretty smooth if you know programming languages like C, C++ or Python. I can already see your enthusiasm starting to light up!
So, why Scala? Well, Scala code is compiled to JVM bytecode and typically runs much faster than pure Python code (that is, plain Python, not specialized libraries like NumPy). I love Scala because of its stability, flexibility, high speed, and scalability. You can use Scala to develop useful products that work with Big Data.
Interested in learning Scala? We have the perfect article for you:
Julia is coming up big right now in the data science world. If you didn’t know this already, it’s time to get on board. A few experts are already calling it a rival to Python! It might be a little too soon for that, but it gives us an idea of how useful Julia is.
Julia is a refreshingly modern, expressive and high-performance programming language created by a group of computer scientists and mathematicians at MIT. It is open source and is commonly used for scientific computing and data manipulation.
You’ll pick up Julia quickly if you’ve worked with R, Python or Matlab before. There is even a scikit-learn library for Julia to ease your transition. What else could a data scientist ask for?
Again the question comes up: why Julia for data science? There are multiple reasons, but the primary one is that Julia’s execution speed is 10x-30x faster than that of Python and R.
You can refer to the below article to learn Julia for data science from scratch:
Calling all developers! If you were looking for a way into data science without wanting to learn a new language – JavaScript is your pathway to the jackpot.
JavaScript is a powerful, lightweight, and easy-to-implement programming language. It was first launched in Netscape 2.0 in 1995 under the moniker LiveScript.
It’s good to have some basic knowledge of HTML and prior exposure to object-oriented programming concepts if you want to pick up JavaScript. This will give you a basic idea of creating online applications. This comes in especially handy when you’re deploying your machine learning models in mobile apps or in the browser.
Apart from this, JavaScript has some excellent libraries for data visualization and creating dashboards. Machine learning applications like gesture recognition, object recognition, and music composition can be built using TensorFlow.js, a powerful JavaScript library for machine learning.
You can get started with machine learning in the browser by following the steps mentioned in the below article:
Are you an Apple fan? Do you love using their various devices and their tightly-knit iOS? Well, then you’ll love Swift.
Swift is an open source, easy, and flexible programming language developed by Apple for iOS and OS X apps. Swift builds on the best of C and Objective-C, without the constraints of C compatibility. It’s actually a friendly programming language for freshers because of its concise yet expressive syntax and the lightning speed at which its apps run.
Swift has recently started gaining traction among the data science community. It is highly endorsed by Jeremy Howard (fast.ai’s co-founder). There are various libraries for tasks like numerical computation, high-performance matrix math, digital signal processing, deep learning, and building machine learning models.
Refer to the below article to learn more about Swift for TensorFlow:
How could Google ever stay out of any data science related discussion?
Go, as the name suggests, is a programming language created by Google. Simple, reliable, and efficient software: that’s Go in a nutshell. What I like about Go is its singular focus. It keeps confusion at bay by favoring one clear way of doing something (as opposed to other languages where there are multiple ways to solve a problem).
There are a great number of open source tools, packages, and resources for performing data science tasks using Go. These cover data gathering, data organization, data parsing, arithmetic and statistical computations, exploratory data analysis (EDA), and building machine learning models.
Check out the below discussion to learn more about the important libraries in Go:
Spark is more of a framework than a language but you’ll soon see why it’s on my list. It is very popular among data engineers and data scientists.
Spark provides:
Spark is an open source, fast cluster computing framework used for processing, querying and analyzing Big Data. Its advantage over other big data frameworks is that it is based on in-memory computation: intermediate results are kept in memory instead of being written to disk between steps. This enables some computations to run up to a hundred times faster than with disk-based processing.
Basic knowledge of Python is good enough for you to pick up Spark quickly.
Spark can perform various data science and data engineering tasks, such as:
Here’s the perfect article to learn Apache Spark:
Don’t you love how vast the field is for data science languages? Python and R are wonderful in their own right. But my aim here was to bring out other languages that we can use to perform data science tasks.
Some of these languages you might even know already (I’m sure all you developers are aware of JavaScript!); you just didn’t realize you could use them for building awesome visualizations and designing models. Well, now you do!
Any language(s) you feel I should have included in the article? Connect with me in the comments section below. I look forward to hearing your thoughts, suggestions, and feedback!
I'm a B.Pharmacy graduate. Is it the right choice for me to choose data science to advance my career?
Hi Ayan, Data science careers are in high demand nowadays. To switch your career to data science, you first need to master a programming language and learn the concepts of statistics, probability, algebra, etc. I suggest you choose a career based on your knowledge and interest. For more clarity on this, you can read the answer given by "Boris Gorelik" here.
Java has its label everywhere, and it's a very good competitor to Python in every way except data science. Could you please explain where Java is not strong enough to support data science? Thank you.
Hi Prudhvi, Java is a strongly typed, flexible, and highly efficient compiled language, but it doesn't have as many data science libraries as Python. Thanks
A very precise, well put article. Thank you, much appreciated!
Thanks for the appreciation.