Python is widely considered the best and most effective language for data science. Most of the polls and surveys that I’ve come across in recent years peg Python as the market leader in this space.
But here’s the thing – data science is a vast and ever-evolving field. The languages we use to build our data science models have to evolve with it. Remember when R was the go-to language? That was swiftly overtaken by Python. Julia also came up last year for data science and now there’s another language that is blossoming.
Yes, I’m talking about Swift for data science.
“I always hope that when I start looking at a new language, there will be some mind-opening new ideas to find, and Swift definitely doesn’t disappoint. Swift tries to be expressive, flexible, concise, safe, easy to use, and fast. Most languages compromise significantly in at least one of these areas.” – Jeremy Howard
When Jeremy Howard endorses a language and starts using it for his daily data science work, you need to drop everything and listen.
In this article, we will learn about Swift as a programming language and how it fits into the data science space. If you’re a Python user, you’ll notice the subtle differences and the incredible similarities between the two. There’s a lot of code here as well so let’s get started!
You can also enroll in this free course on Swift, where we cover all these concepts in a structured manner along with an awesome bonus project: Learn Swift for Data Science
“PyTorch was created to overcome the gaps in Tensorflow. FastAI was built to fill gaps in tooling for PyTorch. But now we’re hitting the limits of Python, and Swift has the potential to bridge this gap”
– Jeremy Howard
There has been a lot of excitement and attention recently towards Swift as a language for data science. Everyone is talking about it. Here are a few reasons why you should learn Swift:
Here is Jeremy Howard articulating how good Swift is:
Before we start with the nitty-gritty details of performing data science using Swift, let’s get a brief introduction to the basics of the Swift programming language.
The current state of Swift for data science is primarily made up of two ecosystems: the open-source ecosystem and the Apple ecosystem.
The open-source ecosystem is one where we can download and run Swift on any operating system or machine. We can build machine learning applications using really cool Swift libraries, like Swift for TensorFlow, SwiftAI and SwiftPlot.
Swift also lets us seamlessly import mature data science libraries from Python like NumPy, pandas, matplotlib and scikit-learn. So if you had any hesitation about switching over to Swift from Python, you’re well covered!
The Apple ecosystem, on the other hand, is impressive in its own right. There are useful libraries like Core ML that let us train large models in Python and directly import them into Swift for inference. It also comes with a plethora of pre-trained, state-of-the-art models that we can use directly to build iOS/macOS applications.
There are other interesting libraries like Swift-CoreML-Transformers that let us run state-of-the-art NLP models such as GPT-2 and BERT on the iPhone.
And there are multiple other libraries that give a good level of functionality when you need to build machine learning-based applications for Apple devices.
There are multiple differences between the two ecosystems. But the most important one is that in order to use the Apple ecosystem, you need an Apple machine to work on, and you can only build for Apple platforms such as iOS and macOS.
Now that you have an overview of Swift as a language for data science, let’s get into the code!
The Swift language is available to use on Google Colab with both GPU and TPU versions. We will be using that so that you can quickly get up to speed with it without spending much time on the installation process.
You can follow the below steps to open a Colab notebook that is Swift-enabled:
print("hello world from Swift")
Sweet! If you want to work with Swift locally on your own system, here are a few links that you can follow:
Now, let’s quickly cover some basic Swift functions before jumping into the data science aspect of using it.
I’m sure you’ve already used this before. It works the very same way as it does in Python. Simply call print() with whatever you want to print inside the parentheses:
print("Swift is easy to learn!")
Swift provides two useful options to create variables: let and var. let is used to create a “constant” which is a variable whose value cannot change anywhere further in the program. var is very similar to the variables that we see in Python – you can change the value stored in it anytime further in the program.
Let’s look at an example to see the difference. Create two variables a and b:
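let a = "Analytics"   // constant
var b = "Vidhya"      // variable

b = "AV"   // OK: b was declared with var
a = "AV"   // error: cannot assign to value: 'a' is a 'let' constant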
Here’s a pro-tip: use var for temporary variables or variables you want to use for some intermediate calculations.
Similarly, use let for things like storing the training data, results, etc. – basically the values that you do not want to change or mess up.
var π = 3.14159265
Swift supports all the common data types, like Int, String, Float and Double. We can assign a value to any variable, and Swift will infer its type automatically:
let marks = 63
let percentage = 70.0
var name = "Sushil"
let weight: Double = 72.8
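To see which types Swift actually picks, you can ask with type(of:). Here is a minimal sketch that reuses the variable names from above; the values are just illustrative:

```swift
let marks = 63           // inferred as Int
let percentage = 70.0    // inferred as Double (decimal literals default to Double, not Float)
let name = "Sushil"      // inferred as String
let weight: Double = 72  // explicit annotation: the integer literal is stored as a Double

print(type(of: marks), type(of: percentage), type(of: name), type(of: weight))
// prints "Int Double String Double"
```

An explicit annotation is only needed when the type Swift would infer is not the one you want.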
Let’s have a quick quiz. Create a constant with an explicit type of `Float` and a value of 4 and post the solution in the comments below!
There’s a simple way to include values in strings – write the value in parentheses, and write a backslash (\) before the parentheses. For example:
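A minimal sketch of string interpolation, with made-up variable names and values:

```swift
let apples = 3
let oranges = 5

// Any expression can go inside \( ... ), not just a single variable.
let summary = "I have \(apples + oranges) pieces of fruit."
print(summary)
// prints "I have 8 pieces of fruit."
```

This plays the same role as f-strings in Python.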
You can use three double quotation marks (""") for strings that take up multiple lines.
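Here is a small sketch of a multi-line string (the text itself is just an illustration). Note that the line breaks right after the opening delimiter and right before the closing one are not part of the string:

```swift
let note = """
Swift is easy to learn.
It is also "fast".
"""

// Quotes inside a multi-line string do not need escaping.
print(note)
print(note.split(separator: "\n").count)  // number of lines in the string
```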
Swift supports both list (called arrays in Swift) and dictionary data structures, just like Python (there’s that comparison again!). One difference is that, unlike Python, we do not need separate syntax like “{}” for a dictionary and “[]” for a list: both use square brackets.
Let’s create a list and a dictionary in Swift:
var shoppingList = ["catfish", "water", "tulips", "blue paint"]
shoppingList[1] = "bottle of water"

var occupationsDict = [
    "Malcolm": "Captain",
    "Kaylee": "Mechanic",
]
We can access the elements of a List or a Dictionary by writing the index/key inside the “[]” brackets (similar to Python):
occupationsDict["Jayne"] = "Public Relations"
print(occupationsDict)
The above code will add the key-value pair of “Jayne” and “Public Relations” to the dictionary. This will be the output if you print the above dictionary:
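One detail worth knowing when you read values back: a dictionary lookup returns an Optional, because the key may not exist. The sketch below uses illustrative names and values (the "Zoe" key is deliberately missing) and also shows how to create empty collections:

```swift
var shoppingList = ["catfish", "water", "tulips"]
shoppingList.append("blue paint")          // grow the array
let firstItem = shoppingList[0]            // arrays are zero-indexed

var occupations = ["Malcolm": "Captain", "Kaylee": "Mechanic"]
occupations["Jayne"] = "Public Relations"  // insert a new key-value pair

// Lookups return an Optional; ?? supplies a fallback when the key is missing.
let knownRole = occupations["Kaylee"] ?? "Unknown"
let missingRole = occupations["Zoe"] ?? "Unknown"

// Empty collections need an explicit type, since there is nothing to infer from.
let emptyList: [String] = []
let emptyDict: [String: Float] = [:]

print(firstItem, knownRole, missingRole, shoppingList.count, occupations.count)
// prints "catfish Mechanic Unknown 4 3"
```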
Looping is one of the most important features of any programming language and Swift doesn’t disappoint here. It not only supports all the conventional looping mechanisms (for, while, etc.) but also implements some variations of its own.
Very similar to Python, you can use the for loop with Lists or with ranges in Swift:
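A minimal sketch of both forms, with made-up values: the first loop runs over a closed range, the second over a half-open range, and the third directly over an array.

```swift
// Closed range: 1...3 visits 1, 2 and 3.
var closedTotal = 0
for i in 1...3 {
    closedTotal += i
}

// Half-open range: 0..<3 visits 0, 1 and 2, but not 3.
var visited: [Int] = []
for i in 0..<3 {
    visited.append(i)
}

// Iterating directly over the elements of an array.
let scores = [75, 43, 103]
var passed = 0
for score in scores {
    if score > 50 {
        passed += 1
    }
}

print(closedTotal, visited, passed)
// prints "6 [0, 1, 2] 2"
```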
The three dots in the first example are used to denote a “range” in Swift. If we want to do something in the range of a to b (both included), we will use the syntax a...b.
Similarly, if we want to exclude the last number, we can just change the three dots to “..<” like a..<b. Try playing around with this and see how many times you get it right!
Another important point to note here is that unlike Python, Swift doesn’t use the concept of indentation but uses curly brackets “{}” to denote code hierarchy.
You can use the while and other types of loops in a similar fashion in Swift. You can learn more about loops in Swift here.
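As a quick sketch (the numbers are arbitrary), here is a while loop next to Swift's repeat-while, which checks its condition at the end and therefore always runs the body at least once:

```swift
// while: the condition is checked before every pass.
var n = 1
while n < 100 {
    n *= 2
}

// repeat-while: the body runs once before the condition is checked,
// so k is incremented even though it already exceeds 100.
var k = 200
repeat {
    k += 1
} while k < 100

print(n, k)
// prints "128 201"
```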
Swift supports conditional statements like if, if..else, if..else..if, nested if and even the switch statement (which Python lacked until the match statement arrived in Python 3.10). The syntax for an if statement is quite simple:
if boolean_expression {
    /* statement(s) will execute if the boolean expression is true */
}
The boolean_expression can be any comparison and the statements that you write inside the if block will only be executed if the result of the comparison or the expression evaluates to true. You can read about other conditionals here.
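Here is a short sketch of if..else and switch side by side. The function names and the grade boundaries are hypothetical, chosen only for illustration:

```swift
// if..else: exactly one of the two branches runs.
func verdict(for marks: Int) -> String {
    if marks >= 40 {
        return "pass"
    } else {
        return "fail"
    }
}

// switch: cases can match ranges, and the switch must cover every
// possible value, which is why the default case is required here.
func grade(for marks: Int) -> String {
    switch marks {
    case 90...100:
        return "A"
    case 75..<90:
        return "B"
    case 40..<75:
        return "C"
    case 0..<40:
        return "F"
    default:
        return "invalid"
    }
}

print(verdict(for: 63), grade(for: 63), grade(for: 120))
// prints "pass C invalid"
```

Unlike C, a Swift switch case does not fall through to the next one, so no break is needed.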
A Swift function looks syntactically very similar to a function in Python. The major difference here is that we use the func keyword instead of def and we explicitly mention the data types of the arguments and the return type of the function.
Here is how you can write a basic function in Swift:
Source: TechNotification.com
And just like conditionals, we use curly brackets “{}” to denote the code block that belongs to this function.
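To make this concrete, here is a minimal sketch with two hypothetical functions, greet and mean. Notice the typed parameters, the return type after the arrow, and the argument labels at the call site:

```swift
// Parameter types and the return type are written out explicitly.
func greet(person: String, day: String) -> String {
    return "Hello \(person), today is \(day)."
}

// "of" is the label used by the caller, "values" is the name used inside.
func mean(of values: [Double]) -> Double {
    guard !values.isEmpty else { return 0 }  // avoid dividing by zero
    return values.reduce(0, +) / Double(values.count)
}

print(greet(person: "Sushil", day: "Monday"))
// prints "Hello Sushil, today is Monday."
print(mean(of: [63, 70, 77]))
// prints "70.0"
```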
Writing comments is one of the most important aspects of good code. This is true across any industry and role you work in, so it is a habit worth building early!
Use comments to include text in your code, as a note or reminder to yourself. Comments are ignored by Swift.
Single-line comments begin with two forward-slashes (//):
// This is a comment.
Multiline comments start with a forward-slash followed by an asterisk (/*) and end with an asterisk followed by a forward-slash (*/):
/* This is also a comment
but is written over multiple lines. */
Now that you are familiar with the basics of Swift, let’s learn about an interesting feature – using Python libraries in Swift itself!
Swift supports interoperability with Python. What this means is you can import useful Python libraries from Swift, call their functions, and convert values between Swift and Python seamlessly.
This gives incredible power to Swift’s data science ecosystem. The ecosystem is still young and developing, but you can already use mature Python libraries like NumPy, pandas, and Matplotlib to fill the gaps in existing Swift offerings.
In order to use Python’s modules in Swift, you can just import Python right away and load whatever library you want to use!
import Python

// Load numpy from Python
let np = Python.import("numpy")

// Create an array of zeros
var zeros = np.zeros([2, 3])
print(zeros)
This is quite similar to the way you’d use NumPy in Python, isn’t it? You can do the same for other packages like matplotlib:
You have learned quite a bit about Swift already. It’s now time to build your first model!
Swift4Tensorflow is one of the most mature libraries in the open-source ecosystem of Swift. We can easily build machine learning and deep learning models using a very simple Keras-like syntax in native Swift.
It gets even more interesting! Swift4Tensorflow isn’t just a Swift wrapper around TensorFlow but it’s being developed as a feature of the language itself. It is widely expected to become a core part of the language in the near future.
What this means is that the amazing set of engineers from Apple’s Swift team and Google’s TensorFlow team will make sure that you are able to do high-performance machine learning in Swift.
The library also adds many useful features to Swift like native support for automatic differentiation (which reminds me of Autograd in PyTorch) to make it even more compatible with numeric computing use-cases.
Let’s understand the problem statement we’ll be working with in this section. You might be familiar with it if you’ve touched the deep learning field before.
We will be building a convolutional neural network (CNN) model to classify images into digits using the MNIST dataset. This dataset contains 60,000 training images and 10,000 testing images of handwritten digits that we can use for training image classification models:
This dataset is a fairly common dataset for working with Computer Vision problems so I am not going to describe it in great detail. If you want to know more about it, you can read about it here.
Before we can start building the model, we need to download the dataset and pre-process it. For your convenience, I have already created a GitHub repository with the pre-processing code and the data.
Let’s download the setup code, download the dataset and import the necessary libraries:
Your dataset will now be downloaded in Colab. Let’s load the dataset:
We will plot some images from the dataset to get an idea about what we’re working with:
This is what our images look like:
It seems pretty intuitive, right? The first digit is a handwritten 0 and the second one is a 4.
Let’s now define the architecture of the model. I am using the LeNet-5 architecture which is a fairly basic CNN model using 2 convolution layers with average pooling and 3 dense layers.
The last dense layer has a shape of 10 because we have 10 target classes, one for each digit from 0 to 9:
You will have noticed that the code looks very similar to how you create models in Python frameworks like Keras, PyTorch or TensorFlow.
The simplicity of writing code is one of the biggest selling points of Swift.
Swift4Tensorflow supports multiple layer types right out of the box and you can read more about them here.
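If you want to sanity-check how the layer sizes of a LeNet-style network fit together, plain arithmetic is enough. The sketch below is not Swift4Tensorflow code. It assumes the classic LeNet-5 configuration, which the text above does not spell out: a 28×28 input, 5×5 kernels with stride 1, "same" padding on the first convolution only, 6 and 16 filters, 2×2 pooling, and dense layers of 120, 84 and 10 units. The helper names are hypothetical:

```swift
// Output side length of a stride-1 convolution.
// "Same" padding keeps the size; "valid" padding shrinks it by kernel - 1.
func convOutput(size: Int, kernel: Int, samePadding: Bool) -> Int {
    return samePadding ? size : size - kernel + 1
}

// Output side length of non-overlapping pooling (stride equals the window).
func poolOutput(size: Int, window: Int) -> Int {
    return size / window
}

var side = 28                                                  // assumed MNIST image size
side = convOutput(size: side, kernel: 5, samePadding: true)   // 28
side = poolOutput(size: side, window: 2)                      // 14
side = convOutput(size: side, kernel: 5, samePadding: false)  // 10
side = poolOutput(size: side, window: 2)                      // 5

let filters = 16                       // assumed filter count of the second convolution
let flattened = side * side * filters  // input size of the first dense layer
let denseSizes = [flattened, 120, 84, 10]

print(denseSizes)
// prints "[400, 120, 84, 10]"
```

The final 10 matches the ten digit classes mentioned earlier.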
Similarly, we need an optimizer function to train our model. We are going to use stochastic gradient descent (SGD) which is available in Swift4Tensorflow:
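Under the hood, a plain SGD step just moves every parameter a small distance against its gradient. Here is a minimal sketch of that update rule in plain Swift. It is not the Swift4Tensorflow optimizer API, and sgdStep is a hypothetical helper:

```swift
// One SGD step: parameter <- parameter - learningRate * gradient.
func sgdStep(parameters: [Double], gradients: [Double], learningRate: Double) -> [Double] {
    precondition(parameters.count == gradients.count, "one gradient per parameter")
    return zip(parameters, gradients).map { parameter, gradient in
        parameter - learningRate * gradient
    }
}

// A positive gradient pushes the parameter down, a negative one pushes it up.
let updated = sgdStep(parameters: [1.0, -2.0], gradients: [0.5, -1.0], learningRate: 0.1)
print(updated)  // approximately [0.95, -1.9]
```

The learning rate controls the step size: too large and training can diverge, too small and it crawls.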
Swift4Tensorflow supports many additional optimizers. You can take your pick based on your project:
Now that everything is set up, let’s train the model!
The above code runs a training loop that feeds the dataset examples into the model to help it make better predictions. Here are the training steps that we follow:
Iterate over each example in the training Dataset, grabbing its features (x) and label (y). This is very important for the next step.
The epochCount variable is the number of times to loop over the dataset collection. Go ahead and give it a try!
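The shape of such a training loop is easier to see on a toy problem. The sketch below is plain Swift, not the MNIST model or the Swift4Tensorflow API. It fits a single weight to made-up data that follows y = 2x, and the learning rate and epoch count are arbitrary choices:

```swift
// Toy data following y = 2x, so the ideal weight is 2.
let features: [Double] = [1, 2, 3, 4]
let labels: [Double] = [2, 4, 6, 8]

var weight = 0.0                // the single trainable parameter
let learningRate = 0.01
let epochCount = 50             // number of passes over the dataset
var epochLosses: [Double] = []  // stats kept for visualization later

for _ in 1...epochCount {
    var epochLoss = 0.0
    // Grab each example's feature (x) and label (y).
    for (x, y) in zip(features, labels) {
        let error = weight * x - y         // prediction minus label
        epochLoss += error * error         // squared-error loss
        let gradient = 2 * error * x       // d(loss)/d(weight)
        weight -= learningRate * gradient  // gradient descent update
    }
    epochLosses.append(epochLoss / Double(features.count))
}

print(weight, epochLosses.first!, epochLosses.last!)
```

The weight ends up very close to 2, and the per-epoch losses stored in epochLosses are exactly the kind of stats you would plot afterwards.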
How many epochs did it take for you to achieve a 90%+ accuracy on the test set?
I was able to get 97%+ accuracy in both train and test sets in just 12 epochs.
Though it’s helpful to print out the model’s training progress, it is often more helpful to see this progress.
Let’s visualize the train and test stats that we captured during the training of the model.
This is how the train and test accuracies evolved during the training process:
The way industry experts are reacting to Swift is mind-boggling. It feels like a language that has the potential not only to become one of the mainstream languages for data science, but also to be used for building real-world applications based on machine learning.
Currently, it is in its infancy, and the libraries around data science and numeric computing are still developing. Yet it has strong industry backing, and I look forward to a future where it will have a rich ecosystem of tools and libraries, maybe even better than what Python has today.
Here are a few libraries of Swift that you can explore further:
All the code used in this article is available on GitHub.
Have you used Swift before? How did you find this article? I would love to hear your thoughts and ideas in the comments section below.
The biggest limitation with Swift, I feel, is that you can run it only if you have a Mac. You can't compile it on Linux or Windows. I think you haven't given this fact much importance in the article. A great article otherwise.
Hey Saiyad, that's a huge misconception! Swift is an open-source language that you can now use on Windows, Linux and Mac. It can even be used in Google Colab now! There are some cool features (libraries) of Swift that are exclusive to Apple and Macs, but that shouldn't stop you from appreciating the amazing performance that Swift brings to the table.
If swift is similar to python and Swift can import mature python libraries, why don't we just use Python? I guess the advantage is swift's integration with Apple ecosystem. For many business applications it may not be an advantage at all. What would be nice to see is implementing or inferencing an ML model on both python and swift and demonstrating the advantages of swift.
Hey Bala, The main advantage is that Swift is incredibly fast, has direct support for Automatic Differentiation and TensorFlow (unlike Python which just has a wrapper around TensorFlow) and is a safe (type safe) language. I personally feel it is like an improved version of Python. I do plan to write an article soon comparing Swift and Python's performances.
Not only is Swift blazing fast compared to Python (there is actually no comparison, as Swift is closer to C/C++ when it comes to speed), but adding your own fast library is also incredibly easy. With extensions, we can even modify the behavior of the core functionality itself. In Python, if we want to modify something, we may have to do it at the C level, and sometimes even that is not possible.