• Latest
  • Trending
Implementing a Kernel Principal Component Analysis in Python

Implementing a Kernel Principal Component Analysis in Python

December 20, 2021
ATC Ghana supports Girls-In-ICT Program

ATC Ghana supports Girls-In-ICT Program

April 25, 2023
Vice President Dr. Bawumia inaugurates  ICT Hub

Vice President Dr. Bawumia inaugurates ICT Hub

April 2, 2023
Co-Creation Hub’s edtech accelerator puts $15M towards African startups

Co-Creation Hub’s edtech accelerator puts $15M towards African startups

February 20, 2023
Data Leak Hits Thousands of NHS Workers

Data Leak Hits Thousands of NHS Workers

February 20, 2023
EU Cybersecurity Agency Warns Against Chinese APTs

EU Cybersecurity Agency Warns Against Chinese APTs

February 20, 2023
How Your Storage System Will Still Be Viable in 5 Years’ Time?

How Your Storage System Will Still Be Viable in 5 Years’ Time?

February 20, 2023
The Broken Promises From Cybersecurity Vendors

Cloud Infrastructure Used By WIP26 For Espionage Attacks on Telcos

February 20, 2023
Instagram and Facebook to get paid-for verification

Instagram and Facebook to get paid-for verification

February 20, 2023
YouTube CEO Susan Wojcicki steps down after nine years

YouTube CEO Susan Wojcicki steps down after nine years

February 20, 2023
Inaugural AfCFTA Conference on Women and Youth in Trade

Inaugural AfCFTA Conference on Women and Youth in Trade

September 6, 2022
Instagram fined €405m over children’s data privacy

Instagram fined €405m over children’s data privacy

September 6, 2022
8 Most Common Causes of a Data Breach

5.7bn data entries found exposed on Chinese VPN

August 18, 2022
  • Consumer Watch
  • Kids Page
  • Directory
  • Events
  • Reviews
Thursday, 10 July, 2025
  • Login
itechnewsonline.com
  • Home
  • Tech
  • Africa Tech
  • InfoSEC
  • Data Science
  • Data Storage
  • Business
  • Opinion
Subscription
Advertise
No Result
View All Result
itechnewsonline.com
No Result
View All Result

Implementing a Kernel Principal Component Analysis in Python

by ITECHNEWS
December 20, 2021
in Data Science, Leading Stories
0 0
0
Implementing a Kernel Principal Component Analysis in Python

Many machine learning algorithms make assumptions about the linear separability of the input data. The perceptron even requires perfectly linearly separable training data to converge. Other algorithms that we have covered so far assume that the lack of perfect linear separability is due to noise: Adaline, logistic regression, and the (standard) SVM to just name a few. However, if we are dealing with nonlinear problems, which we may encounter rather frequently in real-world applications, linear transformation techniques for dimensionality reduction, such as PCA and LDA, may not be the best choice. 

In this section, we will take a look at a kernelized version of PCA, or KPCA. Using KPCA, we will learn how to transform data that is not linearly separable onto a new, lower-dimensional subspace that is suitable for linear classifiers.

YOU MAY ALSO LIKE

ATC Ghana supports Girls-In-ICT Program

Vice President Dr. Bawumia inaugurates ICT Hub

Using some SciPy and NumPy helper functions, we will see that implementing a KPCA is actually really simple:

from scipy.spatial.distance import pdist, squareform

from scipy import exp

from scipy.linalg import eigh

import numpy as np 

def rbf_kernel_pca(X, gamma, n_components):

    """
    RBF kernel PCA implementation.    

    Parameters
    ------------
    X: {NumPy ndarray}, shape = [n_examples, n_features]  

    gamma: float

        Tuning parameter of the RBF kernel    

    n_components: int

        Number of principal components to return    

    Returns
    ------------
    X_pc: {NumPy ndarray}, shape = [n_examples, k_features]

        Projected dataset   

    """

    # Calculate pairwise squared Euclidean distances

    # in the MxN dimensional dataset.

    sq_dists = pdist(X, 'sqeuclidean')    

    # Convert pairwise distances into a square matrix.

    mat_sq_dists = squareform(sq_dists)    

    # Compute the symmetric kernel matrix.

    K = exp(-gamma * mat_sq_dists)    

    # Center the kernel matrix.

    N = K.shape[0]

    one_n = np.ones((N,N)) / N

    K = K - one_n.dot(K) - K.dot(one_n) + one_n.dot(K).dot(one_n)    

    # Obtaining eigenpairs from the centered kernel matrix

    # scipy.linalg.eigh returns them in ascending order

    eigvals, eigvecs = eigh(K)

    eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]    

    # Collect the top k eigenvectors (projected examples)

    X_pc = np.column_stack([eigvecs[:, i]
                           for i in range(n_components)])    

    return X_pc

One downside of using an RBF KPCA for dimensionality reduction is that we have to specify the parameter a priori. Finding an appropriate value requires experimentation and is best done using algorithms for parameter tuning, for example, performing a grid search.

Example – separating half-moon shapes

Now, let us apply our rbf_kernel_pca on some nonlinear example datasets. We will start by creating a two-dimensional dataset of 100 example points representing two half-moon shapes:

>>> from sklearn.datasets import make_moons

>>> X, y = make_moons(n_examples=100, random_state=123)

>>> plt.scatter(X[y==0, 0], X[y==0, 1],

...     color='red', marker='^', alpha=0.5)

>>> plt.scatter(X[y==1, 0], X[y==1, 1],

...     color='blue', marker='o', alpha=0.5)

>>> plt.tight_layout()

>>> plt.show()

For the purposes of illustration, the half-moon of triangle symbols will represent one class, and the half-moon depicted by the circle symbols will represent the examples from another class:

Principal Component Analysis in Python

Clearly, these two half-moon shapes are not linearly separable, and our goal is to unfold the half-moons via KPCA so that the dataset can serve as a suitable input for a linear classifier. But first, let’s see how the dataset looks if we project it onto the principal components via standard PCA:

>>> from sklearn.decomposition import PCA

>>> scikit_pca = PCA(n_components=2)

>>> X_spca = scikit_pca.fit_transform(X)

>>> fig, ax = plt.subplots(nrows=1, ncols=2, figsize=(7,3))

>>> ax[0].scatter(X_spca[y==0, 0], X_spca[y==0, 1],
...               color='red', marker='^', alpha=0.5)

>>> ax[0].scatter(X_spca[y==1, 0], X_spca[y==1, 1],
...               color='blue', marker='o', alpha=0.5)

>>> ax[1].scatter(X_spca[y==0, 0], np.zeros((50,1))+0.02,
...               color='red', marker='^', alpha=0.5)

>>> ax[1].scatter(X_spca[y==1, 0], np.zeros((50,1))-0.02,
...               color='blue', marker='o', alpha=0.5)

>>> ax[0].set_xlabel('PC1')

>>> ax[0].set_ylabel('PC2')

>>> ax[1].set_ylim([-1, 1])

>>> ax[1].set_yticks([])

>>> ax[1].set_xlabel('PC1')

>>> plt.tight_layout()

>>> plt.show()

 

Clearly, we can see in the resulting figure that a linear classifier would be unable to perform well on the dataset transformed via standard PCA:

Principal Component Analysis in Python

Note that when we plotted the first principal component only (right subplot), we shifted the triangular examples slightly upwards and the circular examples slightly downwards to better visualize the class overlap. As the left subplot shows, the original half-moon shapes are only slightly sheared and flipped across the vertical center—this transformation would not help a linear classifier in discriminating between circles and triangles. Similarly, the circles and triangles corresponding to the two half-moon shapes are not linearly separable if we project the dataset onto a one-dimensional feature axis, as shown in the right subplot.

Now, let’s try out our kernel PCA function, rbf_kernel_pca, which we implemented in the previous subsection:

>>> X_kpca = rbf_kernel_pca(X, gamma=15, n_components=2)

>>> fig, ax = plt.subplots(nrows=1, ncols=2, figsize=(7, 3))

>>> ax[0].scatter(X_kpca[y==0, 0], X_kpca[y==0, 1],
...               color='red', marker='^', alpha=0.5)

>>> ax[0].scatter(X_kpca[y==1, 0], X_kpca[y==1, 1],
...               color='blue', marker='o', alpha=0.5)

>>> ax[1].scatter(X_kpca[y==0, 0], np.zeros((50,1))+0.02,
...               color='red', marker='^', alpha=0.5)

>>> ax[1].scatter(X_kpca[y==1, 0], np.zeros((50,1))-0.02,
...               color='blue', marker='o', alpha=0.5)

>>> ax[0].set_xlabel('PC1')

>>> ax[0].set_ylabel('PC2')

>>> ax[1].set_ylim([-1, 1])

>>> ax[1].set_yticks([])

>>> ax[1].set_xlabel('PC1')

>>> plt.tight_layout()

>>> plt.show()

We can now see that the two classes (circles and triangles) are linearly well separated so that we have a suitable training dataset for linear classifiers:

Principal Component Analysis in Python

Unfortunately, there is no universal value for the tuning parameter, , that works well for different datasets. Finding a value that is appropriate for a given problem requires experimentation. Here, I will use values for that I have found to produce good results.

Tags: KernelPython
ShareTweetShare
Plugin Install : Subscribe Push Notification need OneSignal plugin to be installed.

Search

No Result
View All Result

Recent News

ATC Ghana supports Girls-In-ICT Program

ATC Ghana supports Girls-In-ICT Program

April 25, 2023
Vice President Dr. Bawumia inaugurates  ICT Hub

Vice President Dr. Bawumia inaugurates ICT Hub

April 2, 2023
Co-Creation Hub’s edtech accelerator puts $15M towards African startups

Co-Creation Hub’s edtech accelerator puts $15M towards African startups

February 20, 2023

About What We Do

itechnewsonline.com

We bring you the best Premium Tech News.

Recent News With Image

ATC Ghana supports Girls-In-ICT Program

ATC Ghana supports Girls-In-ICT Program

April 25, 2023
Vice President Dr. Bawumia inaugurates  ICT Hub

Vice President Dr. Bawumia inaugurates ICT Hub

April 2, 2023

Recent News

  • ATC Ghana supports Girls-In-ICT Program April 25, 2023
  • Vice President Dr. Bawumia inaugurates ICT Hub April 2, 2023
  • Co-Creation Hub’s edtech accelerator puts $15M towards African startups February 20, 2023
  • Data Leak Hits Thousands of NHS Workers February 20, 2023
  • Home
  • InfoSec
  • Opinion
  • Africa Tech
  • Data Storage

© 2021-2022 iTechNewsOnline.Com - Powered by BackUPDataSystems

No Result
View All Result
  • Home
  • Tech
  • Africa Tech
  • InfoSEC
  • Data Science
  • Data Storage
  • Business
  • Opinion

© 2021-2022 iTechNewsOnline.Com - Powered by BackUPDataSystems

Welcome Back!

Login to your account below

Forgotten Password?

Retrieve your password

Please enter your username or email address to reset your password.

Log In
Go to mobile version