The One-Class Support Vector Machine (OC-SVM) is a variant of the traditional SVM tailored to anomaly detection: its aim is to locate instances that deviate notably from the norm. Unlike conventional machine learning models focused on binary or multiclass classification, the one-class SVM specializes in outlier and novelty detection. In this article, you will learn how One-Class SVM differs from traditional SVM, how OC-SVM works, how to implement it, and how its hyperparameters affect the result.
Anomalies are observations that deviate significantly from a dataset’s normal behavior. These deviations can take various forms, such as outliers, noise, errors, or unexpected patterns. Anomalies are often worth studying because they can carry valuable insights, such as identifying fraudulent transactions, detecting equipment malfunctions, or uncovering novel phenomena. Outlier detection and novelty detection are the two techniques used to identify such abnormal or uncommon observations.
Support Vector Machines (SVMs) are a popular supervised learning algorithm for classification and regression tasks. SVMs work by finding the optimal hyperplane that separates different classes in feature space while maximizing the margin between them. This hyperplane is based on a subset of training data points called support vectors.
One-Class SVM and traditional SVM differ in their soft margin formulations and in the way they use them. (A soft margin allows some degree of misclassification or margin violation.)
One-class SVM aims to discover a hyperplane with maximum margin within the feature space by separating the mapped data from the origin. Consider a dataset Dn = {x1, . . . , xn} of n samples, where each xi ∈ X is a feature vector.
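For reference, the standard OC-SVM primal problem (in the form given by Schölkopf et al.) can be written with the symbols used in this section as follows. The feature map ϕ is borrowed from the traditional SVM notation used later in this section, so treat it as an assumption about notation rather than part of the original formula.

```latex
\begin{aligned}
\min_{w,\,\xi,\,\rho}\quad & \frac{1}{2}\lVert w \rVert^{2} \;+\; \frac{1}{\nu n}\sum_{i=1}^{n}\xi_i \;-\; \rho \\
\text{subject to}\quad & \langle w, \phi(x_i) \rangle \;\ge\; \rho - \xi_i, \qquad \xi_i \ge 0, \qquad i = 1, \dots, n
\end{aligned}
```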
This equation represents the primal problem formulation for OC-SVM, where w is the normal vector of the separating hyperplane, ρ is the offset from the origin, and ξi are slack variables. The slack variables allow for a soft margin, but every violation ξi is penalized. A hyperparameter ν ∈ (0, 1] controls the weight of the slack penalty and should be adjusted to the problem at hand. The objective is to minimize the norm of w while penalizing deviations from the margin, which allows a fraction of the data to fall within the margin or on the wrong side of the hyperplane.
w · x + b = 0 is the decision boundary, and the slack variables ξi penalize deviations from it.
Traditional Support Vector Machines (SVMs) use the soft margin formulation to tolerate misclassification errors, that is, data points that fall within the margin or on the wrong side of the decision boundary.
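The standard soft margin primal for a two-class SVM is sketched below using the symbols defined in the list that follows. The labels yi ∈ {−1, +1} are an added piece of notation that does not appear in that list.

```latex
\begin{aligned}
\min_{w,\,b,\,\xi}\quad & \frac{1}{2}\lVert w \rVert^{2} \;+\; C\sum_{i=1}^{n}\xi_i \\
\text{subject to}\quad & y_i\left( w^{\top}\phi(x_i) + b \right) \;\ge\; 1 - \xi_i, \qquad \xi_i \ge 0, \qquad i = 1, \dots, n
\end{aligned}
```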
Where:
w is the weight vector.
b is the bias term.
ξi are slack variables that allow for soft margin optimization.
C is the regularization parameter that controls the trade-off between maximizing the margin and minimizing the classification error.
ϕ(xi) represents the feature mapping function.
Traditional SVM is a supervised learning method that relies on class labels for separation and incorporates slack variables to permit a certain level of misclassification. Its primary objective is to separate data points of distinct classes using the decision boundary w · x + b = 0. The value of each slack variable depends on where the data point lies: it is 0 if the point is on the correct side of the boundary, on or beyond its margin; it falls between 0 and 1 if the point lies inside the margin but is still correctly classified; and it exceeds 1 if the point lies on the wrong side of the decision boundary.
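The three slack regimes can be checked numerically. The sketch below is illustrative only: the synthetic blobs, the linear kernel, and C=1.0 are assumptions chosen for the example, and the slack is computed after fitting as max(0, 1 − yi f(xi)) rather than read from the solver.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Two wide blobs, so that some points are likely to violate the margin (illustrative data).
X, y = make_blobs(n_samples=200, centers=2, cluster_std=2.5, random_state=0)
y_signed = 2 * y - 1  # map labels {0, 1} to {-1, +1}

clf = SVC(kernel="linear", C=1.0).fit(X, y)

# Functional margin y_i * f(x_i) and slack xi_i = max(0, 1 - y_i * f(x_i)).
functional_margin = y_signed * clf.decision_function(X)
slack = np.maximum(0.0, 1.0 - functional_margin)

misclassified = functional_margin < 0          # equivalent to slack > 1
inside_margin = (slack > 0) & ~misclassified   # 0 < slack <= 1, still correctly classified
outside_margin = slack == 0                    # on or beyond the margin, correct side

print("slack == 0      :", int(outside_margin.sum()))
print("0 < slack <= 1  :", int(inside_margin.sum()))
print("slack > 1       :", int(misclassified.sum()))
```

Points counted in the last group are exactly the ones the fitted classifier mislabels, which is what a slack value above 1 means geometrically.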
Both traditional SVMs and One-Class SVMs with soft margin formulations aim to minimize the norm of the weight vector. Still, they differ in their objectives and in how they handle misclassification errors or deviations from the decision boundary. Traditional SVMs trade margin width against classification error, through C, to separate labeled classes without overfitting, while One-Class SVMs focus on modeling a single target class and use ν to control the proportion of outliers or novel instances.
Kernel functions play a crucial role in One-Class SVM by allowing the algorithm to operate in higher-dimensional feature spaces without explicitly computing the transformations. In One-Class SVM, as in traditional SVMs, kernel functions are used to measure the similarity between pairs of data points in the input space. Common kernel functions used in One-Class SVM include Gaussian (RBF), polynomial, and sigmoid kernels. These kernels map the original input space into a higher-dimensional space, where data points become linearly separable or exhibit more distinct patterns, facilitating learning. By choosing an appropriate kernel function and tuning its parameters, One-Class SVM can effectively capture complex relationships and non-linear structures in the data, improving its ability to detect anomalies or outliers.
In cases where the data is not linearly separable, such as when dealing with complex or overlapping patterns, Support Vector Machines (SVMs) can employ a Radial Basis Function (RBF) kernel to segregate outliers from the rest of the data effectively. The RBF kernel transforms the input data into a higher-dimensional feature space that can be better separated.
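As a small illustration of what the RBF kernel computes, the sketch below evaluates K(x, x') = exp(−γ‖x − x'‖²) by hand and compares it with scikit-learn's `rbf_kernel`. The random points and the value gamma=0.35 are arbitrary choices for the example.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 2))  # five arbitrary 2-D points
gamma = 0.35

# Pairwise squared Euclidean distances, then the RBF similarity.
sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
K_manual = np.exp(-gamma * sq_dists)

K_sklearn = rbf_kernel(X, X, gamma=gamma)

print("kernels match:", np.allclose(K_manual, K_sklearn))
print("self-similarity:", np.diag(K_manual))
```

Each point has similarity 1 with itself, and the similarity decays toward 0 as two points move apart, which is why the kernel never needs the high-dimensional mapping to be computed explicitly.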
The concept of margin and support vectors in One-Class SVM is similar to that in traditional SVMs. In a traditional SVM, the margin refers to the region between the decision boundary (hyperplane) and the nearest data points from each class. In One-Class SVM, there is only one class, and the margin represents the region where most of the data points belonging to the target class lie. Maximizing the margin is crucial for One-Class SVM as it helps the model generalize well to new data points and improves its robustness. Support vectors are the data points that lie on or within the margin and contribute to defining the decision boundary.
In One-Class SVM, support vectors are the data points from the target class closest to the decision boundary. These support vectors play a significant role in determining the shape and orientation of the decision boundary and, thus, in the overall performance of the One-Class SVM model. By identifying the support vectors, One-Class SVM effectively learns the representation of the target class in the feature space and constructs a decision boundary that encapsulates most of the data points while minimizing the risk of including outliers or novel instances.
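To make the role of the support vectors concrete, the sketch below rebuilds the decision function of a fitted `OneClassSVM` from its support vectors alone, using the fitted attributes `support_vectors_`, `dual_coef_`, and `intercept_`. The synthetic training data and the values nu=0.1 and gamma=0.5 are assumptions made for the example.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
X_train = 0.3 * rng.normal(size=(200, 2)) + 2.0  # a single illustrative cluster
gamma = 0.5

clf = OneClassSVM(kernel="rbf", nu=0.1, gamma=gamma).fit(X_train)
print("support vectors:", clf.support_vectors_.shape[0], "of", X_train.shape[0])

# Score new points using only the support vectors and their dual coefficients:
# f(x) = sum_i alpha_i * K(sv_i, x) + intercept
X_new = rng.uniform(low=0.0, high=4.0, size=(10, 2))
K = rbf_kernel(clf.support_vectors_, X_new, gamma=gamma)  # shape (n_sv, 10)
manual_scores = (clf.dual_coef_ @ K).ravel() + clf.intercept_

print("matches decision_function:", np.allclose(manual_scores, clf.decision_function(X_new)))
```

Training points that are not support vectors have a zero dual coefficient, so dropping them leaves the learned frontier unchanged.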
One-Class SVM can detect anomalies through two techniques: outlier detection and novelty detection.
Outlier detection involves identifying observations in the training data that significantly deviate from the rest, often called outliers. Estimators for outlier detection aim to fit the areas where the training data is most concentrated, disregarding these deviant observations.
from sklearn.svm import OneClassSVM
from sklearn.datasets import load_wine
import matplotlib.pyplot as plt
import matplotlib.lines as mlines
from sklearn.inspection import DecisionBoundaryDisplay
# Load data
X = load_wine()["data"][:, [6, 9]] # "banana"-shaped
# Define estimators (One-Class SVM)
estimators_hard_margin = {
    "Hard Margin OCSVM": OneClassSVM(nu=0.01, gamma=0.35),  # Very small nu for hard margin
}
estimators_soft_margin = {
    "Soft Margin OCSVM": OneClassSVM(nu=0.25, gamma=0.35),  # Nu between 0 and 1 for soft margin
}
# Plotting setup
fig, axs = plt.subplots(1, 2, figsize=(12, 5))
colors = ["tab:blue", "tab:orange", "tab:red"]
legend_lines = []
# Hard Margin OCSVM
ax = axs[0]
for color, (name, estimator) in zip(colors, estimators_hard_margin.items()):
    estimator.fit(X)
    DecisionBoundaryDisplay.from_estimator(
        estimator,
        X,
        response_method="decision_function",
        plot_method="contour",
        levels=[0],
        colors=color,
        ax=ax,
    )
    legend_lines.append(mlines.Line2D([], [], color=color, label=name))
ax.scatter(X[:, 0], X[:, 1], color="black")
ax.legend(handles=legend_lines, loc="upper center")
ax.set(
    xlabel="flavanoids",
    ylabel="color_intensity",
    title="Hard Margin Outlier detection (wine recognition)",
)
# Soft Margin OCSVM
ax = axs[1]
legend_lines = []
for color, (name, estimator) in zip(colors, estimators_soft_margin.items()):
    estimator.fit(X)
    DecisionBoundaryDisplay.from_estimator(
        estimator,
        X,
        response_method="decision_function",
        plot_method="contour",
        levels=[0],
        colors=color,
        ax=ax,
    )
    legend_lines.append(mlines.Line2D([], [], color=color, label=name))
ax.scatter(X[:, 0], X[:, 1], color="black")
ax.legend(handles=legend_lines, loc="upper center")
ax.set(
    xlabel="flavanoids",
    ylabel="color_intensity",
    title="Soft Margin Outlier detection (wine recognition)",
)
plt.tight_layout()
plt.show()
The plots allow us to visually inspect the performance of the One-Class SVM models in detecting outliers in the Wine dataset.
By comparing the results of the hard margin and soft margin One-Class SVM models, we can observe how the choice of margin setting (the nu parameter) affects outlier detection.
The hard margin model with a very small nu value (0.01) treats almost every training point as normal, so its decision boundary stretches to enclose nearly all of the data and classifies very few points as outliers.
Conversely, the soft margin model with a larger nu value (0.25) may leave up to roughly a quarter of the training points outside the boundary, so the frontier contracts around the densest region of the data and captures more outliers.
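The visual comparison can be backed by a simple count. The sketch below refits the same two models on the same two wine features and counts how many training points each one labels as an outlier (`predict` returns -1 for outliers). The exact counts depend on the solver and library version, so none are quoted here.

```python
from sklearn.datasets import load_wine
from sklearn.svm import OneClassSVM

X = load_wine()["data"][:, [6, 9]]  # flavanoids, color_intensity

flagged = {}
for nu in (0.01, 0.25):
    model = OneClassSVM(nu=nu, gamma=0.35).fit(X)
    # predict() returns +1 for inliers and -1 for outliers.
    flagged[nu] = int((model.predict(X) == -1).sum())
    print(f"nu={nu}: {flagged[nu]} of {len(X)} training points flagged as outliers")
```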
Novelty detection, on the other hand, applies when the training data is free from outliers and the goal is to determine whether a new observation is rare, i.e., very different from known observations. Such a new observation is called a novelty.
import numpy as np
from sklearn import svm
# Generate train data
np.random.seed(30)
X = 0.3 * np.random.randn(100, 2)
X_train = np.r_[X + 2, X - 2]
# Generate some regular novel observations
X = 0.3 * np.random.randn(20, 2)
X_test = np.r_[X + 2, X - 2]
# Generate some abnormal novel observations
X_outliers = np.random.uniform(low=-4, high=4, size=(20, 2))
# fit the model
clf = svm.OneClassSVM(nu=0.1, kernel="rbf", gamma=0.1)
clf.fit(X_train)
y_pred_train = clf.predict(X_train)
y_pred_test = clf.predict(X_test)
y_pred_outliers = clf.predict(X_outliers)
n_error_train = y_pred_train[y_pred_train == -1].size
n_error_test = y_pred_test[y_pred_test == -1].size
n_error_outliers = y_pred_outliers[y_pred_outliers == 1].size
import matplotlib.font_manager
import matplotlib.lines as mlines
import matplotlib.pyplot as plt
from sklearn.inspection import DecisionBoundaryDisplay
_, ax = plt.subplots()
# generate grid for the boundary display
xx, yy = np.meshgrid(np.linspace(-5, 5, 10), np.linspace(-5, 5, 10))
X = np.concatenate([xx.reshape(-1, 1), yy.reshape(-1, 1)], axis=1)
DecisionBoundaryDisplay.from_estimator(
    clf,
    X,
    response_method="decision_function",
    plot_method="contourf",
    ax=ax,
    cmap="PuBu",
)
DecisionBoundaryDisplay.from_estimator(
    clf,
    X,
    response_method="decision_function",
    plot_method="contourf",
    ax=ax,
    levels=[0, 10000],
    colors="palevioletred",
)
DecisionBoundaryDisplay.from_estimator(
    clf,
    X,
    response_method="decision_function",
    plot_method="contour",
    ax=ax,
    levels=[0],
    colors="darkred",
    linewidths=2,
)
s = 40
b1 = ax.scatter(X_train[:, 0], X_train[:, 1], c="white", s=s, edgecolors="k")
b2 = ax.scatter(X_test[:, 0], X_test[:, 1], c="blueviolet", s=s, edgecolors="k")
c = ax.scatter(X_outliers[:, 0], X_outliers[:, 1], c="gold", s=s, edgecolors="k")
plt.legend(
    [mlines.Line2D([], [], color="darkred"), b1, b2, c],
    [
        "learned frontier",
        "training observations",
        "new regular observations",
        "new abnormal observations",
    ],
    loc="upper left",
    prop=matplotlib.font_manager.FontProperties(size=11),
)
ax.set(
    xlabel=(
        f"error train: {n_error_train}/200 ; errors novel regular: {n_error_test}/40 ;"
        f" errors novel abnormal: {n_error_outliers}/40"
    ),
    title="Novelty Detection",
    xlim=(-5, 5),
    ylim=(-5, 5),
)
plt.show()
For nu=0.5:
The “nu” value in One-class SVM controls the fraction of outliers tolerated by the model: it is an upper bound on the fraction of training errors and a lower bound on the fraction of support vectors. It directly affects the model’s ability to identify anomalies and thus influences the prediction. With nu=0.5 and 200 training points, the model allows up to about 100 training points to fall outside the learned frontier. A lower value of nu implies a stricter constraint on the allowed fraction of outliers. Because the choice of nu drives how many points are flagged, it requires careful tuning based on the application’s specific requirements and the dataset’s characteristics.
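The effect of nu can be sketched by refitting the model for several values and reporting the fraction of training points flagged and the fraction kept as support vectors. The data mirrors the two-cluster setup used above, but it is generated with `np.random.default_rng`, so the points are not identical to the earlier ones; the list of nu values is an arbitrary choice.

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(30)
X = 0.3 * rng.normal(size=(100, 2))
X_train = np.r_[X + 2, X - 2]  # two clusters, 200 points in total

results = {}
for nu in (0.05, 0.1, 0.5):
    clf = OneClassSVM(nu=nu, kernel="rbf", gamma=0.1).fit(X_train)
    frac_flagged = float(np.mean(clf.predict(X_train) == -1))
    frac_sv = clf.support_.size / len(X_train)
    results[nu] = (frac_flagged, frac_sv)
    print(f"nu={nu}: flagged={frac_flagged:.3f}, support vectors={frac_sv:.3f}")
```

In practice the flagged fraction tracks nu closely but can exceed it slightly because of solver tolerance, while the support vector fraction stays at or above nu.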
For gamma=0.5 and nu=0.5
In One-class SVM, the gamma hyperparameter represents the kernel coefficient for the ‘rbf’ kernel. This hyperparameter influences the shape of the decision boundary and, consequently, affects the model’s predictive performance.
When gamma is high, the influence of a single training example is limited to its immediate vicinity. This creates a more localized decision boundary, so data points must be closer to the support vectors to be considered normal. When gamma is low, each example's influence reaches further, which produces a smoother and broader boundary.
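A rough way to see this localization is to sweep gamma and watch how many support vectors the model needs and how many new regular observations it rejects. The sketch below assumes the same two-cluster setup as the novelty example (regenerated with `np.random.default_rng`), and the gamma values are arbitrary.

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(30)
X = 0.3 * rng.normal(size=(100, 2))
X_train = np.r_[X + 2, X - 2]
X_new = 0.3 * rng.normal(size=(20, 2))
X_test = np.r_[X_new + 2, X_new - 2]  # new observations from the same distribution

n_support = {}
for gamma in (0.1, 0.5, 100.0):
    clf = OneClassSVM(nu=0.1, kernel="rbf", gamma=gamma).fit(X_train)
    n_support[gamma] = int(clf.support_.size)
    n_rejected = int((clf.predict(X_test) == -1).sum())
    print(f"gamma={gamma}: support vectors={n_support[gamma]}, "
          f"regular test points rejected={n_rejected}/{len(X_test)}")
```

A very large gamma typically forces the model to keep many more support vectors, since each one only covers a small neighborhood, and it tends to reject regular points that fall between training examples.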
One-Class SVM, used for either outlier or novelty detection, offers a robust approach to anomaly detection across various domains. It helps in scenarios where labeled anomaly data is scarce or unavailable, which makes it particularly valuable in real-world applications where anomalies are rare and hard to define explicitly. Its use cases extend to domains such as cybersecurity and fault diagnosis, where anomalies have serious consequences. However, One-Class SVM only performs well when its hyperparameters are set to suit the data, and tuning them can be tedious.
A. One-Class SVM constructs a hyperplane in feature space which, with a kernel such as RBF, corresponds to a closed boundary around the normal data points in the input space. This hyperplane is positioned to maximize the margin between the mapped data and the origin. During testing or inference, data points are classified as normal (inside the boundary) or anomalous (outside the boundary).
A. One-class SVM is advantageous because it does not require labeled anomalies during training. It can learn from a dataset containing only regular instances, making it suitable for scenarios where anomalies are rare and labeled examples of them are hard to obtain.