This article was published as a part of the Data Science Blogathon
Computers and machines work well with tabular data and spreadsheets. However, human beings generally communicate in words and sentences, not in tables, and most of what humans speak or write is unstructured. As a result, human language is hard for computers to interpret.
Therefore, in natural language processing (NLP), our aim is to make unstructured text understandable to computers and to retrieve meaningful information from it.
Let’s define Natural Language Processing (NLP) formally:
Natural Language Processing (NLP) is a subfield of artificial intelligence that deals with the interactions between computers and human (natural) languages.
In this article, we will discuss some of the basic concepts related to NLP.
This is Part 1 of the blog series, a Step by Step Guide to Natural Language Processing.
After some topics, there are practice (Test your Knowledge) questions. Try to solve them and post your answers in the comment box to check your understanding of the topic.
1. What is Natural Language Processing (NLP)?
2. Applications of Natural Language Processing
3. Understanding Natural Language Processing
4. Difference between Rule-based NLP and Statistical based NLP
5. Components of Natural Language Processing
6. Ambiguity and Uncertainty in Natural Language Processing
Natural Language Processing (NLP) is a subfield of Computer Science and Artificial Intelligence that deals with interactions between computers and human (natural) languages. This becomes crucial when we want to apply Machine Learning or Deep Learning Algorithms to a dataset that contains text and speech.
For example, we can use NLP to create AI systems such as:
Today, most smartphones have a speech recognition system. These smartphones use NLP to understand natural language and respond to it. Many people also use laptops whose operating systems have built-in speech recognition.
Which of the below options is the field of Natural Language Processing?
Some applications of Natural Language Processing are as follows:
The Microsoft Windows operating system has a virtual assistant named Cortana that can recognize natural speech. Its applications include:
If you want to read more about Cortana commands, refer to the link here.
Siri is the virtual assistant built into Apple Inc.’s iOS, watchOS, macOS, HomePod, and tvOS operating systems. With Siri, too, you can do a lot of things using voice commands:
Here is a complete list of all Siri commands.
Gmail, the popular email service developed by Google, uses spam detection to filter out spam emails. It processes the text of each incoming email to decide whether the email is spam or not.
Which of the below are NLP use cases?
For us humans, processing natural language is not a very difficult task, but even so, we are not perfect. We often mistake one thing for another and interpret the same words or sentences in different ways.
For instance, consider the following sentences and the different ways in which they can be interpreted:
Sentence: I saw a student on a hill with a microscope.
This sentence can be interpreted in several ways, depending on who is on the hill and who has the microscope.
Sentence: Can you help me with the can?
In the sentence above, the word “can” appears twice, with two different meanings:
The first “can” is a modal verb used to form a question.
The second “can”, at the end of the sentence, is a noun: a container that holds something such as food or liquid.
From the above two examples, we can observe that language processing is not “deterministic”: the same language does not always have the same interpretation, and something suitable for one person might not be suitable for another. Therefore, Natural Language Processing (NLP) takes a non-deterministic approach.
In simple words, we can use Natural Language Processing to create intelligent (AI) systems that understand language the way humans do and interpret it appropriately in different situations.
Natural Language Processing is divided into two different approaches: rule-based NLP and statistical NLP.
Rule-based NLP uses common-sense reasoning, encoded as hand-written rules, for processing tasks. However, this process can take more time and requires manual effort.
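To make the rule-based idea concrete, here is a minimal Python sketch that maps an utterance to an intent using hand-written regular expressions. The intent names, the patterns, and the `classify_intent` function are all assumptions invented for this illustration; they are not taken from any real assistant.

```python
import re

# Hypothetical hand-written rules: each intent is paired with a regular
# expression that a person has to design and maintain by hand.
RULES = [
    ("set_alarm", re.compile(r"\b(set|create)\b.*\balarm\b", re.IGNORECASE)),
    ("weather", re.compile(r"\b(weather|rain|temperature)\b", re.IGNORECASE)),
    ("greeting", re.compile(r"^\s*(hi|hello|hey)\b", re.IGNORECASE)),
]


def classify_intent(utterance: str) -> str:
    """Return the first intent whose rule matches, or 'unknown'."""
    for intent, pattern in RULES:
        if pattern.search(utterance):
            return intent
    return "unknown"


for text in ["Hello there", "Please set an alarm for 7 am", "Play some music"]:
    print(text, "->", classify_intent(text))
```

Every new phrasing that the rules do not anticipate (such as "Play some music" here) needs another hand-written rule, which is where the manual effort comes from.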
Statistical NLP uses large amounts of data and aims to derive conclusions from it. It uses machine learning algorithms to train NLP models. After training on large amounts of data, the trained model can draw conclusions about new text by inference.
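As a small sketch of the statistical approach, the following Python code trains a Naive Bayes text classifier with add-one smoothing on a tiny, made-up set of labelled messages. The training data, the labels, and the function names are assumptions for illustration only; a real system would learn from far more data and would not necessarily use this algorithm.

```python
import math
from collections import Counter, defaultdict

# Tiny, made-up training set (an assumption for illustration only).
TRAIN = [
    ("win a free prize now", "spam"),
    ("free money click now", "spam"),
    ("claim your free prize", "spam"),
    ("meeting at noon tomorrow", "ham"),
    ("lunch with the project team", "ham"),
    ("see you at the meeting", "ham"),
]


def train(examples):
    """Count words per label and documents per label."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    for text, label in examples:
        label_counts[label] += 1
        word_counts[label].update(text.lower().split())
    vocab = {word for counts in word_counts.values() for word in counts}
    return word_counts, label_counts, vocab


def predict(text, model):
    """Return the label with the highest log-probability (add-one smoothing)."""
    word_counts, label_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, float("-inf")
    for label in label_counts:
        # Start from the log prior of the label.
        score = math.log(label_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            # Add-one smoothing keeps unseen words from zeroing the score.
            count = word_counts[label][word] + 1
            score += math.log(count / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train(TRAIN)
print(predict("free prize inside", model))         # spam
print(predict("project meeting tomorrow", model))  # ham
```

Unlike the rule-based approach, nobody writes a rule for the word "free"; the model picks up its association with the spam label from the counts in the data.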
NLP can be divided into two basic components: Natural Language Understanding (NLU) and Natural Language Generation (NLG).
NLU tasks are generally harder than NLG tasks. Let’s discuss the challenges a machine faces while trying to understand natural language.
While learning or trying to interpret a language, there are a lot of ambiguities.
Sentence: He is looking for a match.
Here, what do you understand by “match”: a partner, or a cricket/football match?
Lexical ambiguity occurs when a word has more than one sense, so the sentence in which it is used can be interpreted differently depending on which sense is intended. To resolve these ambiguities to some extent, we can use parts-of-speech tagging techniques.
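As a rough illustration of how part-of-speech information can separate two senses of a word, here is a minimal Python sketch that tags each “can” in the earlier sentence “Can you help me with the can?” as a modal verb or a noun by looking at its neighbouring words. The word lists and the `tag_can` function are assumptions made up for this example; a real tagger is trained on annotated data and covers the whole vocabulary.

```python
# Hypothetical word lists used as context clues (assumptions for this sketch).
DETERMINERS = {"the", "a", "an", "this", "that"}
PRONOUNS = {"i", "you", "he", "she", "we", "they", "it"}


def tag_can(tokens):
    """Tag each occurrence of 'can' as MODAL or NOUN using its neighbours."""
    tags = []
    for i, token in enumerate(tokens):
        if token.lower() != "can":
            continue
        prev_word = tokens[i - 1].lower() if i > 0 else None
        next_word = tokens[i + 1].lower() if i + 1 < len(tokens) else None
        if prev_word in DETERMINERS:
            # "the can", "a can": a determiner is followed by a noun.
            tags.append((i, "NOUN"))
        elif next_word in PRONOUNS or prev_word in PRONOUNS:
            # "Can you ...", "you can ...": a modal verb next to its subject.
            tags.append((i, "MODAL"))
        else:
            tags.append((i, "UNKNOWN"))
    return tags


tokens = "Can you help me with the can ?".split()
print(tag_can(tokens))  # [(0, 'MODAL'), (6, 'NOUN')]
```

Note that this only helps when the senses differ in part of speech. In “He is looking for a match”, both senses of “match” are nouns, so tagging alone does not settle the question.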
Sentence: The chicken is ready to eat.
Is the chicken ready to eat its food, or is the chicken ready for someone else to eat it? You never know.
Syntactical ambiguity occurs when a sequence of words can have more than one meaning. It is also known as grammatical ambiguity.
Sentence: Chirag met Kshitiz and Dinesh. They went to a restaurant.
Here, does “they” refer to Kshitiz and Dinesh only, or to all three?
Referential ambiguity: a text very often mentions an entity (something or someone) and then refers to it again, possibly in a different sentence, using another word such as a pronoun. Such pronouns cause ambiguity when it is not clear which noun they refer to.
Natural Language Generation (NLG) is defined as the process of generating meaningful phrases and sentences in the form of natural language from some internal representation.
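A very simple way to picture generation from an internal representation is a template that is filled in from structured data. The sketch below is only an illustration under that assumption: the `weather_fact` dictionary, its field names, and the `realize_weather` function are hypothetical, and real NLG systems are considerably more elaborate.

```python
# Hypothetical internal representation of a fact the system wants to express.
weather_fact = {"city": "Springfield", "condition": "sunny", "temperature_c": 31}


def realize_weather(fact: dict) -> str:
    """Turn a structured weather fact into an English sentence via a template."""
    return (
        f"It is {fact['condition']} in {fact['city']} "
        f"with a temperature of {fact['temperature_c']} degrees Celsius."
    )


print(realize_weather(weather_fact))
# It is sunny in Springfield with a temperature of 31 degrees Celsius.
```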
This component involves three basic steps: text planning, sentence planning, and text realization.
Question-1: NLP is divided into two subfields:
Question-2: Which of the following is used for mapping sentence plans into sentence structure?
In natural language processing, ambiguity refers to the capability of being understood in more than one way. Natural language is very ambiguous.
NLP has the following five types of ambiguities: lexical, syntactic, semantic, anaphoric, and pragmatic.
Lexical ambiguity is the ambiguity of a single word.
For example, consider the following sentences:
She won two silver medals.
She made a silver speech.
His worries had silvered his hair.
In the above sentences, the word “silver” can be treated as a noun, an adjective, or a verb.
Syntactic ambiguity occurs when a sentence can be parsed in different ways.
For example, consider the sentence:
Sentence: The man saw the girl with the microscope
This sentence is ambiguous: did the man see the girl carrying a microscope, or did he see her through his microscope?
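The two readings correspond to two different parse trees. As a sketch, the Python code below counts the parse trees of the sentence with a CKY-style chart over a toy grammar in Chomsky normal form. The grammar, the lexicon, and the `count_parses` function are assumptions written just for this sentence, not a general English grammar.

```python
from collections import defaultdict

# Toy grammar (an assumption for this sketch): parent -> left right.
BINARY_RULES = [
    ("S", "NP", "VP"),
    ("VP", "V", "NP"),
    ("VP", "VP", "PP"),   # "saw ... with the microscope" (instrument reading)
    ("NP", "NP", "PP"),   # "the girl with the microscope" (carrying reading)
    ("NP", "Det", "N"),
    ("PP", "P", "NP"),
]
LEXICON = {
    "the": ["Det"],
    "man": ["N"],
    "girl": ["N"],
    "microscope": ["N"],
    "saw": ["V"],
    "with": ["P"],
}


def count_parses(words, start="S"):
    """Count the distinct parse trees of `words` rooted in `start`."""
    n = len(words)
    # chart[(i, j)][symbol] = number of parse trees for words[i:j]
    chart = defaultdict(lambda: defaultdict(int))
    for i, word in enumerate(words):
        for tag in LEXICON.get(word, []):
            chart[(i, i + 1)][tag] += 1
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            for k in range(i + 1, j):
                for parent, left, right in BINARY_RULES:
                    chart[(i, j)][parent] += (
                        chart[(i, k)][left] * chart[(k, j)][right]
                    )
    return chart[(0, n)][start]


sentence = "the man saw the girl with the microscope".split()
print(count_parses(sentence))  # 2
```

The count of 2 comes from the prepositional phrase “with the microscope” attaching either to the verb phrase or to the noun phrase “the girl”.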
Semantic ambiguity occurs when the meaning of the words themselves can be misinterpreted. In simple words, semantic ambiguity occurs when a sentence contains an ambiguous word or phrase.
For example, consider the sentence:
Sentence: The bus hit the pole while it was moving
The above sentence has semantic ambiguity because it can be interpreted in two ways: the bus was moving when it hit the pole, or the pole was moving when the bus hit it.
Anaphora is the use of an expression, such as a pronoun, that refers back to an entity mentioned earlier in the discourse. Anaphoric ambiguity occurs when it is unclear which earlier entity such an expression refers to.
For example, consider this group of sentences:
Sentence: The dog ran up the hill. It was very steep. It soon got tired.
Here, the anaphoric reference “it” in the second and third sentences causes ambiguity, because each “it” could refer to the dog or to the hill.
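To see why a program cannot resolve this by position alone, the following Python sketch lists, for each “it”, the nouns mentioned before it. The noun list and the `antecedent_candidates` function are assumptions for this example; a real system would use a part-of-speech tagger and a coreference model instead of a fixed lexicon.

```python
import re

# Hypothetical mini-lexicon of nouns (an assumption for this sketch).
KNOWN_NOUNS = {"dog", "hill"}

TEXT = "The dog ran up the hill. It was very steep. It soon got tired."


def antecedent_candidates(text, pronoun="it"):
    """For each occurrence of `pronoun`, list the known nouns seen before it."""
    tokens = re.findall(r"[a-z]+", text.lower())
    seen_nouns = []
    results = []
    for token in tokens:
        if token == pronoun:
            results.append(list(seen_nouns))
        elif token in KNOWN_NOUNS and token not in seen_nouns:
            seen_nouns.append(token)
    return results


print(antecedent_candidates(TEXT))  # [['dog', 'hill'], ['dog', 'hill']]
```

Both pronouns have the same two candidates, so choosing between them needs knowledge about the world: hills are steep, and dogs get tired.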
Pragmatic ambiguity occurs when the context of a phrase gives it multiple interpretations. In simple words, these ambiguities arise when the statement is not specific.
For example, consider the sentence:
Sentence: I like you too
This sentence can have multiple interpretations: I like you (just as you like me), or I like you (just as someone else does).
Thanks for reading!
If you liked this and want to know more, go visit my other articles on Data Science and Machine Learning by clicking on the Link
Please feel free to contact me on LinkedIn or by email.
Something not mentioned, or want to share your thoughts? Feel free to comment below and I’ll get back to you.
Currently, I am pursuing my Bachelor of Technology (B.Tech) in Computer Science and Engineering from the Indian Institute of Technology Jodhpur(IITJ). I am very enthusiastic about Machine learning, Deep Learning, and Artificial Intelligence.
The media shown in this article are not owned by Analytics Vidhya and are used at the Author’s discretion.