Create Natural Language Processing based Apps for iOS in Minutes! (using Apple’s Core ML 3)

Mohd Sanad Zaki Rizvi Last Updated : 14 Jun, 2020

Overview

Intrigued by Apple’s iOS apps? Learn how to build Natural Language Processing (NLP) iOS apps in this article
We’ll be using Apple’s Core ML 3 to build these NLP iOS apps
This is a hands-on step by step tutorial with code

Introduction

I love working in the Natural Language Processing (NLP) space. The last couple of years have been a goldmine for me – the level and quality of developments have been breathtaking.

But this comes with its own share of challenges. One of the biggest obstacles is converting NLP techniques into practical code. This is where my appreciation for Apple’s Natural Language framework (referred to in this article as the Natural Language Toolkit), which is built on top of Core ML 3, really grows.

It makes life for a developer, NLP engineer or data scientist remarkably easy! Core ML 3 empowers us to build impactful text-based intelligent iOS applications in a streamlined and easy-to-understand manner.

The process of starting out with Core ML 3 and building NLP iOS apps is as seamless as iOS itself!

Core ML 3 can run models built on some of the most advanced and relevant Natural Language Processing techniques, like ELMo, BERT, ULMFiT, GPT and Deep Speech, among others.

In this article, we will explore the nuts and bolts of the Natural Language Toolkit for iOS so that the next time you sit down to build NLP based apps for your favorite iOS device, you can do it in double-quick time!

Table of Contents

Setting up the Natural Language Toolkit using Core ML 3
Basic Text Processing: Tokenization and Lemmatization
Language Identification in iOS
Spell Checking and Correction
Part Of Speech (POS) Tagging
Identifying People, Organization, etc. from the Text (Named Entity Recognition)
Performing Sentiment Analysis on iOS
Word Embeddings

Setting up the Natural Language Toolkit using Core ML 3

If you already have Core ML 3 on your system, then you do not have to do any external installation for the Natural Language Toolkit. If you don’t, here’s a handy guide to walk you through the installation process:

Introduction to Apple’s Core ML 3 – Build Deep Learning Models for the iPhone (with code)

Also, note that we will be using Swift to code in this article, so if you are not familiar with it, here’s how you can get started:

A Comprehensive Guide to Learn Swift from Scratch for Data Science

You can also enroll in this free course to learn Swift in a comprehensive and structured manner. It covers this project as well:

Learn Swift for Data Science

We will be working in an Xcode Playground for this tutorial. Let’s see how to open one.

Open Xcode and select File->New->Playground:

In the next window, select “Blank” under iOS:

You can give it any name you like. You will see the Playground window next:

Source: Learnappmaking.com

The Playground interface is quite simple:

We write code in the central window area as marked in the above image
The right pane shows the live results, as and when we type code
The bottom pane is used to see the output or console when the program is executed
We use the play button to run our code

Now that we have successfully created a new Playground for iOS, we are all set to try Natural Language Processing (NLP) for it!

Basic Text Processing: Tokenization and Lemmatization

Raw text is an example of unstructured data, and that’s why we perform certain processing steps on it before performing any kind of analysis. The Natural Language Toolkit supports most of the common text processing operations. We’ll learn about them in this section.

Tokenization

Tokenization means splitting our text into minimal meaningful units.

This is an important pre-processing step in NLP. Once we get a piece of text, we can break it into meaningful chunks, or units, that can be processed together.

Chunks can be words, phrases, characters, etc. Their form depends on the kind of problem we are trying to solve. Here’s how you can do the same on iOS using Swift:
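A minimal sketch of word-level tokenization with NLTokenizer is shown here. The sample sentence and the variable names are assumptions chosen for illustration:

```swift
import NaturalLanguage

// Sample sentence chosen for illustration (an assumption, not from the article).
let text = "Natural language processing is fun"

// Create a tokenizer that splits on word boundaries and give it the text.
let tokenizer = NLTokenizer(unit: .word)
tokenizer.string = text

// Collect every token; returning true tells the tokenizer to keep going.
var tokens: [String] = []
tokenizer.enumerateTokens(in: text.startIndex..<text.endIndex) { tokenRange, _ in
    tokens.append(String(text[tokenRange]))
    return true
}

// Print one token per line.
for token in tokens {
    print(token)
}
```

Changing the unit passed to NLTokenizer is enough to split the same text at a different granularity.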

Here, we have just imported the NaturalLanguage library and created a new tokenizer using NLTokenizer. This tokenizer is then passed the text to be processed and we can then loop over the generated tokens and print them.

Keep in mind that the “unit” type is set to “.word” so we will get words as tokens in the output:

Besides .word, we can also pass .sentence, .paragraph or .document to “unit”. You can try this out on your own to see how it works.

Lemmatization

Lemmatization is the process of converting the words of a sentence to their dictionary forms.

Let’s take an example. Given the words amuses, amusing, and amused, the lemma for each of them would be “amuse”.

This is a technique that is used a lot in text processing to normalize the text data because even though these words have different spellings and tenses, the meaning that they convey is the same.

Just like tokenization, lemmatization is also pretty straightforward in Swift:
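As a rough sketch, lemmatization with NLTagger could look like the following. The sample sentence is our own assumption, built around the forms of “swim” discussed in this section:

```swift
import NaturalLanguage

// Sample sentence (an assumption) containing several forms of "swim".
let text = "He was swimming yesterday, he swam last week, and he will swim again tomorrow."

// Ask the tagger for the lemma scheme and give it the text.
let tagger = NLTagger(tagSchemes: [.lemma])
tagger.string = text

// Pair each word with its lemma, skipping whitespace and punctuation.
var lemmas: [(word: String, lemma: String)] = []
tagger.enumerateTags(in: text.startIndex..<text.endIndex,
                     unit: .word,
                     scheme: .lemma,
                     options: [.omitWhitespace, .omitPunctuation]) { tag, range in
    let word = String(text[range])
    // Fall back to the word itself when the tagger has no lemma for it.
    lemmas.append((word: word, lemma: tag?.rawValue ?? word))
    return true
}

for entry in lemmas {
    print("\(entry.word): \(entry.lemma)")
}
```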

We use NLTagger to tag each token with its lemma by passing .lemma in tagSchemes. We will see more of NLTagger when we deal with Named Entity Recognition (NER) and Part of Speech Tagging (PoS) later in the article.

Here’s the lemma or root word of the input text:

Notice how swimming, swam and swim all correspond to the same lemma “swim” which is indeed the root word according to the dictionary.

Now that we have learned how to do basic text processing using the Natural Language Toolkit, let’s see some interesting use cases of NLP that you can use for your iPhone or iOS apps.

Language Identification in iOS

One of the most useful features of the Natural Language Toolkit is that we can detect the language of any given text.

We use NLLanguageRecognizer from the Natural Language Toolkit to detect languages in a piece of text.

There are multiple options available for doing this:

We can find the dominant language of a multilingual text using the dominantLanguage property
We can also get a confidence score for each language that the model thinks is present in the text using languageHypotheses(withMaximum:). This is very useful when we have text in similar languages

Here is how you can identify the language of a text in Swift in just a few lines of code:
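A minimal sketch of both options, assuming two sample strings of our own (mostly English text with a Japanese greeting, and a short Arabic phrase meaning “Hello, world”):

```swift
import NaturalLanguage

let recognizer = NLLanguageRecognizer()

// Sample 1 (an assumption): mostly English with a short Japanese greeting.
let mixedText = "This sentence is mostly English with a little Japanese: こんにちは"
recognizer.processString(mixedText)
let dominant = recognizer.dominantLanguage
print("Dominant language:", dominant?.rawValue ?? "undetermined")

// Clear the recognizer's state before analysing a different string.
recognizer.reset()

// Sample 2 (an assumption): a short Arabic phrase.
let arabicText = "مرحبا بالعالم"
recognizer.processString(arabicText)

// Ask for at most two candidate languages with their probabilities.
let hypotheses = recognizer.languageHypotheses(withMaximum: 2)
for (language, probability) in hypotheses.sorted(by: { $0.value > $1.value }) {
    print(language.rawValue, probability)
}
```

Calling reset() between inputs matters because the recognizer accumulates evidence across successive processString calls.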

This is what Swift tells us on running the above code:

Notice that in the first case, it says that the dominant language is “en” or English even though some Japanese characters are also present. This is because the text contains more English words than Japanese ones.

In the second output, you will see that the probability of “ar” or Arabic is 99.9%. This is correct, but notice that it also gives a low probability to “ur” or Urdu, which is written in a closely related script. This is quite fascinating if you’re interested in how languages work (I am!).

Spell Checking and Correction

Spell Checking and Correction is another very important and popular application of NLP that has real-world value for any text-based apps that you might build.

The biggest example of this is Google search itself; it tries to correct our spelling to make sure we get the right results for our query.

It’s fairly easy to implement Spell Checking and Correction in Swift using the Natural Language Toolkit:
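A rough sketch using UITextChecker follows. The sample text, the "en_US" language code and the variable names are assumptions for illustration:

```swift
import UIKit

// Sample text with two deliberate misspellings (an assumption).
let text = "I recieved the primarry results yesterday."
let language = "en_US"

let checker = UITextChecker()
let nsText = text as NSString
let fullRange = NSRange(location: 0, length: nsText.length)

// Walk through the text, collecting each misspelled word and its suggestions.
var suggestions: [(word: String, guesses: [String])] = []
var offset = 0
while offset < nsText.length {
    let misspelledRange = checker.rangeOfMisspelledWord(in: text,
                                                        range: fullRange,
                                                        startingAt: offset,
                                                        wrap: false,
                                                        language: language)
    // NSNotFound means there are no more misspelled words.
    if misspelledRange.location == NSNotFound { break }

    let word = nsText.substring(with: misspelledRange)
    let guesses = checker.guesses(forWordRange: misspelledRange,
                                  in: text,
                                  language: language) ?? []
    suggestions.append((word: word, guesses: guesses))

    // Continue searching just past the word we found.
    offset = misspelledRange.location + misspelledRange.length
}

for entry in suggestions {
    print("\(entry.word): \(entry.guesses)")
}
```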

So far, we have only used the NaturalLanguage library, but now we are importing UIKit as well. That’s because UITextChecker, the main component of the spell checker, is part of UIKit itself.

The rangeOfMisspelledWord() function returns the range of the first misspelled word it finds in the given text, starting from a given offset.

Once that is done, we use the guesses() function, which gives us a list of correctly spelled words that are similar to the misspelled word.

If you run the above code on the given piece of text, you will get the wrongly spelled words and the possible correct spellings for them:

Notice that for each wrongly spelled word like “primarry” or “recieved”, our code tries to predict the nearest correct words. Love that!

Part Of Speech (POS) Tagging

Every word in a sentence is associated with a Part Of Speech (POS) tag – nouns, verbs, adjectives, adverbs, etc.

The POS tags define the usage and function of a word in the sentence.

We can simply use the same NLTagger that we saw earlier to find POS tags for our text. The only difference is that now our scheme will be .lexicalClass instead of the .lemma that we used earlier for Lemmatization:
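A minimal sketch of POS tagging, assuming a sample sentence of our own:

```swift
import NaturalLanguage

// Sample sentence (an assumption, not from the article).
let text = "The quick brown fox jumps over the lazy dog."

// Same NLTagger as before, this time with the lexical class scheme.
let tagger = NLTagger(tagSchemes: [.lexicalClass])
tagger.string = text

// Collect (word, tag) pairs, skipping whitespace and punctuation.
var posTags: [(word: String, tag: String)] = []
tagger.enumerateTags(in: text.startIndex..<text.endIndex,
                     unit: .word,
                     scheme: .lexicalClass,
                     options: [.omitWhitespace, .omitPunctuation]) { tag, range in
    if let tag = tag {
        posTags.append((word: String(text[range]), tag: tag.rawValue))
    }
    return true
}

for entry in posTags {
    print("\(entry.word): \(entry.tag)")
}
```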

Here is the output of the NLTagger for the above text:

POS tags are widely used in text analytics because they carry additional information about the text data at hand.

Identifying People, Organization, etc. from the Text (Named Entity Recognition)

Our text app can be more intelligent if we are able to identify named entities in natural language.

For example, consider a messaging app that can look for names of people and places in text in order to display related information, like contact information or directions.

The example Swift code below shows how to use NLTagger to loop over a given text and identify any named person, place, or organization:
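A sketch of this loop, using the example sentence from this section. The .joinNames option and the variable names are our own choices for illustration:

```swift
import NaturalLanguage

// Example sentence used in this section.
let text = "Apple is looking at buying U.K. startup for $1 billion."

let tagger = NLTagger(tagSchemes: [.nameType])
tagger.string = text

// Only keep tags that denote a person, place, or organization.
let entityTags: [NLTag] = [.personalName, .placeName, .organizationName]

// .joinNames treats multi-word names as a single token.
let options: NLTagger.Options = [.omitPunctuation, .omitWhitespace, .joinNames]

var entities: [(text: String, type: String)] = []
tagger.enumerateTags(in: text.startIndex..<text.endIndex,
                     unit: .word,
                     scheme: .nameType,
                     options: options) { tag, range in
    if let tag = tag, entityTags.contains(tag) {
        entities.append((text: String(text[range]), type: tag.rawValue))
    }
    return true
}

for entity in entities {
    print("\(entity.text): \(entity.type)")
}
```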

Notice that we are using the same NLTagger that we used for both Lemmatization and POS Tagging. But this time, the tagging scheme is .nameType that is used for NER.

Let’s take an input text:

Apple is looking at buying U.K. startup for $1 billion.

Here is how Swift identifies the correct Named Entities for this text:

Apple: OrganizationName
U.K.: PlaceName

Let’s go through another interesting and useful application of NLP in Swift – Sentiment Analysis!

Performing Sentiment Analysis on iOS

Sentiment Analysis is when we try to predict the sentiment of a given piece of text: is it positive, negative or neutral?

This is one of the most popular and widely used ideas in the NLP space. From understanding people’s views using eCommerce reviews to gauging the political mood using tweets, sentiment analysis is ubiquitous. It’s one of the first topics we learn when we delve into NLP.

Let’s take an example to understand how to do sentiment analysis for iOS apps. We want to build a small program that outputs a smiley based on the sentiment of the input text.

First, we will be using the same NLTagger that we have already been using for POS and NER but this time the scheme will be .sentimentScore:

let tagger = NLTagger(tagSchemes: [.sentimentScore])

The rest of the steps are similar to what we have already been doing – process the input text using the tagger and then fetch the tags:

let sentiment = tagger.tag(at: input.startIndex, unit: .paragraph, scheme: .sentimentScore).0

Note that here we are using .paragraph as the unit as we want the sentiment score for the entire piece of text. This is very different from when we were tagging POS and NER and that’s simply because there we needed tags at the individual word level.

Once we get our sentiment score, we can just write an if-else condition to print the appropriate smiley based on the sentiment score.

The range of a sentiment score is [-1.0, 1.0]. A score of 1.0 is the most positive, a score of -1.0 is the most negative, and a score of 0.0 is neutral.

Here’s the entire code to perform sentiment analysis:
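Putting the steps above together, a minimal sketch could look like this. The input sentence and the smiley(for:) helper are assumptions introduced for illustration:

```swift
import NaturalLanguage

// Hypothetical helper: map a sentiment score in [-1.0, 1.0] to a smiley.
func smiley(for score: Double) -> String {
    if score > 0 {
        return "🙂"
    } else if score < 0 {
        return "🙁"
    } else {
        return "😐"
    }
}

// Sample negative review (an assumption, not from the article).
let input = "I hated the movie. The plot was dull and the acting was terrible."

let tagger = NLTagger(tagSchemes: [.sentimentScore])
tagger.string = input

// Tag the whole paragraph; the tag's raw value is the score as a string.
let sentiment = tagger.tag(at: input.startIndex,
                           unit: .paragraph,
                           scheme: .sentimentScore).0

// Treat a missing or unparsable score as neutral.
let score = Double(sentiment?.rawValue ?? "0") ?? 0

print("Score:", score)
print(smiley(for: score))
```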

The above text is negative, so when we run this program in Xcode, we will get the sad smiley in the output:

Note that the sentiment score is less than 0 because it’s a negative sentiment.

One of the most interesting things about this sentiment analysis feature is that it supports 7 languages already:

English
Spanish
French
Italian
German
Portuguese
Simplified Chinese

We can build smart applications that are able to understand a user’s emotion in multiple languages right out of the box! I encourage you to play around with this code, change the input text, and see how the model performs. Let me know in the comments section below.

Word Embeddings

Word Embeddings have transformed the way we build NLP systems. Beyond Word2Vec and GloVe, we now also have embeddings from large transformer models like BERT, RoBERTa, etc.

The Natural Language Toolkit comes with useful embeddings of its own: OS Embeddings.

These embeddings are available for 7 languages and are optimized for all Apple platforms, including iOS, macOS, watchOS and so on.

Let’s start with a basic example around embeddings. We want to build a basic program that can fetch other words which are semantically “near” to the given word.

Let’s say we type “king”. We would expect words like “prince”, “crown”, “throne”, etc., right?

We can use NLEmbedding from the Natural Language Toolkit to get the OS Embeddings for a particular word. We use the neighbors(for:maximumCount:) function, which takes as input the word to search for and the number of related words we want.

The code is pretty straightforward:
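A minimal sketch, assuming the built-in English embedding is available on the device and using “cheese” as the query word:

```swift
import NaturalLanguage

var neighborWords: [String] = []

// wordEmbedding(for:) returns nil if no built-in embedding exists for the language.
if let embedding = NLEmbedding.wordEmbedding(for: .english) {
    // Fetch up to five nearest neighbors along with their distances.
    for (word, distance) in embedding.neighbors(for: "cheese", maximumCount: 5) {
        print(word, distance)
        neighborWords.append(word)
    }
} else {
    print("No built-in English word embedding is available on this platform.")
}
```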

On executing the above code with “cheese” as the input word, this is what I get:

We get different cheese types, such as “mozzarella”, “cheddar”, etc. in the output. Makes sense, right?

Apart from finding similar words, word embeddings in the Natural Language Toolkit support many other useful functions:

Given two words, the distance (or similarity) between them based on their embeddings
Given an embedding vector, find the words that are nearest to it
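Both operations can be sketched as follows. The word pair “cheese” and “cheddar” is an assumption chosen for illustration:

```swift
import NaturalLanguage

var nearestToVector: [String] = []

if let embedding = NLEmbedding.wordEmbedding(for: .english) {
    // Distance between two words (cosine distance by default; smaller means closer).
    let distance = embedding.distance(between: "cheese", and: "cheddar")
    print("Distance:", distance)

    // Look up the raw vector for a word, then find the words nearest to that vector.
    if let vector = embedding.vector(for: "cheese") {
        for (word, _) in embedding.neighbors(for: vector, maximumCount: 3) {
            nearestToVector.append(word)
        }
        print("Nearest to vector:", nearestToVector)
    }
} else {
    print("No built-in English word embedding is available on this platform.")
}
```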

Although OS Embeddings are really useful, other embeddings such as GloVe, Word2Vec, BERT and FastText work really well for certain use cases, and Apple has recognized this. That’s why we can import our own embeddings and use them in an iPhone or iOS app:
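As a hedged sketch, loading a custom embedding could look like the following. The resource name “CustomEmbedding” is hypothetical, and we assume the embedding has already been converted into a compiled model file and added to the app bundle:

```swift
import Foundation
import NaturalLanguage

// "CustomEmbedding.mlmodelc" is a hypothetical resource name; replace it
// with the compiled embedding file that ships in your own app bundle.
var customEmbedding: NLEmbedding? = nil

if let url = Bundle.main.url(forResource: "CustomEmbedding", withExtension: "mlmodelc") {
    do {
        // Load the embedding from disk; this throws if the file is not a valid embedding.
        customEmbedding = try NLEmbedding(contentsOf: url)
    } catch {
        print("Could not load the custom embedding:", error)
    }
} else {
    print("CustomEmbedding.mlmodelc was not found in the app bundle.")
}

// Once loaded, a custom embedding is queried like the built-in one.
if let embedding = customEmbedding {
    print(embedding.neighbors(for: "cheese", maximumCount: 5))
}
```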

In fact, we can even train our own custom embeddings using Create ML, a tool that we will cover in the next article. You can read more about embeddings in the NLEmbedding documentation.

End Notes

How fun was that? This article combined my love for NLP with the seamless coding experience of Swift. It was a joy to work on and bring this to the community.

I would again encourage you to play around with the code and try this out yourself. Another aspect we can work on is building Computer Vision-based iOS apps using Core ML 3 but I’ll leave that for a future article. I look forward to hearing your thoughts and experiences with building iOS apps in the comments section below.

Note: You can download all the code used in this blog on my GitHub.

If you’re new to Natural Language Processing and want to get your feet wet, here’s the perfect course to start:

Natural Language Processing (NLP) using Python

Mohd Sanad Zaki Rizvi

A computer science graduate, I have previously worked as a Research Assistant at the University of Southern California(USC-ICT) where I employed NLP and ML to make better virtual STEM mentors. My research interests include using AI and its allied fields of NLP and Computer Vision for tackling real-world problems.

Raymond Doctor

Hi, Fascinating. But a majority of databases used in the scripts available are for English/French/German. Hardly any for low resource languages such as Indic. Any solutions to that ? I have hunted for NLP databases in Indic which can be deployed for ML but have come a cropper.

Hey Raymond, Check out this interesting project: https://github.com/goru001/inltk Apart from that, you can also see StanfordNLP (supports 53 human languages!): https://www.analyticsvidhya.com/blog/2019/02/stanfordnlp-nlp-library-python/ Hope this helps! :)
