You’ve probably wondered how banks decide who gets a loan and who doesn’t. Despite dealing with thousands of different applicants, their systems are built to make consistent decisions, and to make them quickly.
By evaluating the characteristics of loan applicants, banks can decide precisely who they accept and reject. But how are they able to do this for every customer, given that each customer is different?
Well, to classify each customer, banks rely on a general rule. And the foundation of this general rule is concept learning.
Concept learning lays the foundation for many of the decision-making applications we use today, whether taking a medical test, using a credit card, or taking out a bank loan. Its importance lies in helping systems make quick and accurate decisions without being explicitly programmed for each scenario.
But how is concept learning able to do all of this?
Well, follow me in this blog, as I’ll help you understand how concept learning works. I will explain the basics of concept learning, how it works, and how it can be applied in the real world. And by the end of this article, you’ll have a more practical understanding of concept learning.
Learning Objectives:
This article was published as a part of the Data Science Blogathon.
Concept learning is the task of inferring a Boolean-valued function from a set of training examples. The purpose of inferring this function is to use it as a general rule for classifying unseen data.
Concept learning is based on a type of learning called inductive learning. In inductive learning, the learner learns by example. In other words, the learner discovers the rules of a particular concept by learning the examples of that concept. For example, if a student teaches himself algebra, the more he practices different types of examples and solutions, the more he will understand the general rules of algebra. The idea is the same for concept learning: a machine is taught the different examples of a concept, and by learning these examples, it will “discover” the general rule(s) that apply to that concept.
Concept learning thus involves learning a function (which is a rule) from a set of training examples.
So, concept learning aims to find a function or rule that truly represents the concept being learned. The function must be a true representation of the concept so that it can classify unseen data accurately. By “true representation”, we mean that the function must approximate the true value of the target concept. The target concept is what we’re trying to classify. It is a Boolean-valued function, denoted c(x), which takes exactly one of two values for each instance x: the instance either belongs to the concept or it does not. The aim is to determine which of these two categories a given object belongs to.
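To make the idea of a Boolean-valued target concept concrete, here is a minimal sketch in Python. The `Applicant` type, the `award_loan` function, and the thresholds are all assumptions made up for illustration; a real bank's rule is unknown and would be learned from data rather than written by hand.

```python
from typing import NamedTuple


class Applicant(NamedTuple):
    """Hypothetical instance x with two features."""
    age: int
    income: float


def award_loan(x: Applicant) -> bool:
    """Hypothetical target concept c(x): True means 'Loan Approved'."""
    # Assumed rule, for illustration only.
    return x.age > 18 and x.income > 4000


print(award_loan(Applicant(age=30, income=5200)))  # True
print(award_loan(Applicant(age=17, income=9000)))  # False
```

Whatever the rule is, the output is always one of exactly two values, which is what makes the function Boolean-valued.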
According to the Inductive Learning Hypothesis, if a function approximates the target concept well over a sufficiently large set of training examples, then it will also approximate the target concept well over unseen examples.
For example, suppose an algebra learner has gained an understanding of the general rules of algebra based on the examples they’ve practiced. In that case, they’ll be able to apply those rules to solve any new problems that they encounter. Similarly, in concept learning, an inferred function will be able to approximate and classify new data based on how well it has learned in the past.
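One way to picture the Inductive Learning Hypothesis is to measure how often a candidate function agrees with the labels, first on the training examples and then on examples it has not seen. The sketch below does this in Python. The data, the helper name `agreement`, and the candidate rule `h` are invented for illustration, and six examples are far too few to count as a sufficiently large training set.

```python
def agreement(h, examples):
    """Fraction of (x, label) pairs on which hypothesis h matches the label."""
    return sum(h(x) == label for x, label in examples) / len(examples)


# Hypothetical examples: ((age, income), approved?)
train = [((25, 5000), True), ((40, 7000), True),
         ((17, 6000), False), ((30, 2000), False)]
unseen = [((22, 4500), True), ((16, 3000), False)]


def h(x):
    """Assumed candidate rule: older than 18 and income above 4000."""
    age, income = x
    return age > 18 and income > 4000


print(agreement(h, train))   # 1.0
print(agreement(h, unseen))  # 1.0
```

Here the candidate fits the training examples and also happens to fit the unseen ones. The hypothesis says this is what we should expect when the training set is large and representative; it is not a guarantee for any particular small sample.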
Concept learning works in two ways. It works by:
Let’s go back to the bank loan example.
Suppose that the bank wants to classify customers according to five features: gender, age group, income, number of dependents, and loan amount.
And depending on these features, each applicant will be classified into one of two categories: Loan Approved or Not Approved.
To do this, the bank needs to use a training set consisting of example loan applicants. This training set will be used to infer a function that will act as a decision rule for classifying the applicants.
Let’s consider the training sample shown below:
Each row in the training set represents a single applicant (known as an instance). Each applicant has five features with differing values and one of two possible outcomes, “Yes” or “No”, depending on the values of those features.
Each customer is either a negative or positive example. The applicants whose loan application is accepted are the positive examples, while the applicants whose loan application is rejected are the negative examples.
These examples are determined according to the preferences and needs of the bank. The bank may have determined that the best customers have a combination of certain features.
The positive examples are the examples of applicants that the bank has deemed acceptable for awarding a loan. These applicants have a combination of features that have been determined as desirable to the bank in terms of the applicant being able to repay the loan without much trouble.
The negative examples are the examples of applicants that the bank deems unacceptable for awarding a loan. These applicants have a combination of features that the bank sees as undesirable. These are applicants that the bank deems will have difficulty repaying the loan.
The five features (gender, age group, income, dependents, and loan amount) together make up what’s called the feature space. A feature space is the space of all possible combinations of feature values, with one dimension per feature, and each applicant is a point in it. The dimension of a feature space depends on the number of features in the training set. For example, if there are two features, the feature space is two-dimensional. In our case, there are five features, so our feature space is five-dimensional.
The feature space can be thought of as a visual representation of the training set. It shows how the data is classified relative to its features. However, the larger the dimension of the feature space, the more difficult it is to visualize.
So, to better visualize how concept learning works, let’s suppose that the bank is classifying customers according to only two features: age and income. Then we have a two-dimensional feature space as shown below:
From this feature space, we can see a good visual representation of the training set. It shows where each applicant lies. Here, we have two classes, Loan Approved and Not Approved, and two features, age and income. We can see how the individual applicants are classified into each class according to their features.
Notice that there exists a pattern between the two classes of the feature space. Concept learning infers a function depending on the pattern in the data between the two classes. In other words, concept learning involves learning the pattern in the data and creating a function based on what has been learned. This function acts as a decision boundary that distinguishes between the two classes of data. It is what is used to approximate the target concept.
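As a rough sketch of what such a decision boundary could look like in code, the Python snippet below uses a straight line in the (age, income) plane. The weights and the offset are made-up values, not learned from any data, and the function names are hypothetical.

```python
def decision_boundary(age: float, income: float) -> float:
    """Signed score: positive on the 'Loan Approved' side of the line.

    The weights (20.0, 1.0) and offset (5000.0) are assumed values
    chosen only to illustrate the idea.
    """
    return 20.0 * age + 1.0 * income - 5000.0


def classify(age: float, income: float) -> str:
    """Assign a class depending on which side of the boundary the point lies."""
    if decision_boundary(age, income) > 0:
        return "Loan Approved"
    return "Not Approved"


print(classify(age=45, income=6000))  # Loan Approved
print(classify(age=22, income=3000))  # Not Approved
```

Learning the concept then amounts to choosing the boundary (here, the weights and offset) so that the training examples fall on the correct side.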
For example, consider the decision boundary separating the two classes:
Good Feature Space
The feature space that we are using is what we can call a good feature space. This is because the classes are well separated, which makes the decision boundary easy to learn. If the classes are highly mixed or overlapping, we have a poor feature space, and the decision boundary won’t be learned adequately.
Generalized Decision Boundary
A poor feature space will likely result in the decision boundary either overfitting or underfitting. An ideal feature space ensures that the decision boundary can generalize.
Concept learning can also be viewed as a search, where the goal is to find the function that best fits the training set.
In this case, concept learning aims to find a generalized decision boundary. But multiple possible generalized decision boundaries exist, so it aims to find the one that generalizes best.
To find the best generalized decision boundary, we have to search through a space of candidate decision boundaries. This space is called the hypothesis space, and a hypothesis is a single possible decision boundary.
Each hypothesis in the hypothesis space depends on a certain number of features. In our case, each hypothesis depends on two features. Each feature has a value associated with it, and each value is represented using a constraint. A constraint specifies which values of a feature the hypothesis accepts. There are three types of constraints: the single-value constraint (a particular value or condition that the feature must satisfy, such as “>18”), the general constraint “?” (any value is acceptable), and the specific constraint “0” (no value is acceptable).
A hypothesis is often represented as a vector of constraints. For example, suppose that the bank prefers applicants older than 18, with an income greater than 4000, regardless of the applicant’s gender, the loan amount, or the number of dependents. This hypothesis would be represented as:
< “?”, “>18”, “>4K”, “?”, “?”>
It is a vector of constraints, where the constraints can be interpreted as follows:
Two more types of hypotheses are used in searching for the best hypothesis. These are the most general hypothesis and the most specific hypothesis.
The Most General Hypothesis is named as such because it uses the general constraint for every feature in the hypothesis. The most general hypothesis is thus denoted as follows:
< “?”, “?”, “?”, “?”, “?”>
This hypothesis implies that any value is acceptable for any feature and that each applicant is a positive example.
The Most Specific Hypothesis uses the specific constraint for every feature in the hypothesis. The most specific hypothesis is represented as follows:
< “0”, “0”, “0”, “0”, “0”>
This hypothesis implies that none of the features’ values are acceptable and that none of the applicants is a positive example.
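The three kinds of hypothesis above can be sketched in Python as tuples of constraints, together with a function that checks whether an applicant satisfies a hypothesis. This is one possible encoding, not a standard one: the names `ANY`, `NONE`, `satisfies`, and `matches`, the use of callables for conditions such as “>18”, and the sample applicant are all assumptions for illustration.

```python
ANY = "?"    # general constraint: any value is acceptable
NONE = "0"   # specific constraint: no value is acceptable


def satisfies(value, constraint) -> bool:
    """Check one feature value against one constraint."""
    if constraint == ANY:
        return True
    if constraint == NONE:
        return False
    if callable(constraint):          # a condition such as "> 18"
        return constraint(value)
    return value == constraint        # a single required value


def matches(hypothesis, instance) -> bool:
    """True if the instance satisfies every constraint (a positive example)."""
    return all(satisfies(v, c) for v, c in zip(instance, hypothesis))


# Feature order: gender, age, income, dependents, loan amount
bank_rule = (ANY, lambda age: age > 18, lambda income: income > 4000, ANY, ANY)
most_general = (ANY,) * 5
most_specific = (NONE,) * 5

applicant = ("F", 35, 5200, 2, 10000)  # hypothetical applicant
print(matches(bank_rule, applicant))      # True
print(matches(most_general, applicant))   # True
print(matches(most_specific, applicant))  # False
```

The most general hypothesis accepts every applicant and the most specific hypothesis accepts none, which matches the descriptions above.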
There are many methods for searching for the best hypothesis. One such method is the Find-S algorithm, which finds the most specific hypothesis that is consistent with all the positive examples (called the maximally specific hypothesis). The idea behind this method is to compare the feature values in the current, most specific hypothesis to those of each positive example in the training set.
The algorithm visits each positive example in turn and ignores the negative ones. For each feature, it checks whether the value in the positive example is the same as the corresponding value in the hypothesis. If the values are the same, the hypothesis is left unchanged for that feature. If the values are different, the algorithm replaces the value in the hypothesis with the general constraint, “?”. The algorithm continues this process until it has processed the last positive example in the training set. The hypothesis that remains is the maximally specific hypothesis.
For a more practical example, let’s look at the steps of the algorithm.
Let’s consider another example.
Suppose that you want to play a sport and want to decide on which days you enjoy it. Each day has six features (sky, temperature, humidity, wind, water, and forecast), as shown in the training set:
To start with the Find-S algorithm, choose any positive example in the training dataset and initialize it as the most specific hypothesis (Let’s choose row 1):
Compare the values of the features to the first positive training example in row 1:
The values of the training example and the most specific hypothesis are the same, so we do nothing.
We move on to the next positive training example (in row 2), and compare it to the most specific hypothesis:
The values of the features for humidity are different, so we replace the feature in S1 with “?”. So now we have:
Row 3 is a negative example, so we ignore it. We then move on to the next positive training example (in row 4), and compare it to the most specific hypothesis, S2:
The values of the features for water are different, so we replace the feature in S2 with “?”. So now we have:
So now we have reached the last positive example, and we have the maximally specific hypothesis: <Sunny, Warm, ?, Strong, ?, Same>
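The walk-through above can be sketched as a short Python function. The training rows below are an assumed dataset chosen to be consistent with the steps described (humidity differs in row 2, row 3 is negative, water differs in row 4); they are illustrative and should not be read as the article’s original table. The function name `find_s` is likewise just a label for this sketch.

```python
def find_s(examples):
    """Return the maximally specific hypothesis for the positive examples.

    examples: list of (features, is_positive) pairs.
    Returns None if there are no positive examples.
    """
    hypothesis = None
    for features, is_positive in examples:
        if not is_positive:
            continue  # Find-S ignores negative examples
        if hypothesis is None:
            # Initialize with the first positive example
            hypothesis = list(features)
            continue
        for i, value in enumerate(features):
            if hypothesis[i] != value:
                hypothesis[i] = "?"  # generalize the mismatching feature
    return hypothesis


# Assumed data: (sky, temperature, humidity, wind, water, forecast), enjoy?
training_set = [
    (("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"), True),
    (("Sunny", "Warm", "High", "Strong", "Warm", "Same"), True),
    (("Rainy", "Cold", "High", "Strong", "Warm", "Change"), False),
    (("Sunny", "Warm", "High", "Strong", "Cool", "Same"), True),
]

print(find_s(training_set))
# ['Sunny', 'Warm', '?', 'Strong', '?', 'Same']
```

Because the function only ever replaces values with “?”, the hypothesis can become more general with each positive example but never more specific.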
As seen with the bank loan example, concept learning plays an important part in automated decision-making. Concept learning answers many business questions and enables organizations to take appropriate steps in their business. It helps organizations make quick and accurate classifications with large amounts of data.
In addition to bank loans, some other applications of concept learning are:
Spam Filtering
Customer Purchasing
School Admissions
Medical Diagnoses
The training examples, denoted D, are the set of positive and negative examples of the target concept.
Each training example in the training set is referred to as an instance, denoted x. The set of all possible instances is denoted X.
Each training example has features or attributes, such as (Gender, Age, Income, Dependents, Loan Amount).
Each training example is labeled with the value of the target concept, c(x). The target concept is the function we are searching for. It is a Boolean-valued function, for example, Award Loan: X → {0,1}
The hypothesis, h, is a vector of constraints, one for each attribute or feature.
A constraint represents each feature in the hypothesis.
The most general hypothesis is represented as < “?”, “?”, “?”, “?”, “?”> and the most specific hypothesis as < “0”, “0”, “0”, “0”, “0”>.
In conclusion, concept learning provides an efficient way of extracting knowledge from data. It helps machines quickly and accurately learn tasks from large amounts of data, with little human intervention required. Because of this, concept learning is an essential part of many real-world operations and processes, where accuracy, speed, and cost reduction are important. It is thus extensively used in the decision-making processes of many businesses and organizations that work with copious amounts of data.
Key Takeaways
If you have any questions, feel free to contact me on LinkedIn.
The media shown in this article is not owned by Analytics Vidhya and is used at the Author’s discretion.