Flight Fare Prediction Using Machine Learning

by ITECHNEWS
January 19, 2022
in Data Science, Leading Stories

Takeaways from the blog

In this article, we prepare a flight fare dataset for prediction using machine learning. The main takeaways are:

  1. EDA: Learn the complete process of exploratory data analysis.
  2. Data analysis: Learn to draw insights from the dataset, both numerically and visually.
  3. Data visualization: Visualize the data to get better insight from it.
  4. Feature engineering: See what kind of transformations we can apply in the feature engineering part.

About the dataset

  1. Airline: The name of the airline, such as IndiGo, Jet Airways, Air India, and many more.
  2. Date_of_Journey: The date on which the passenger’s journey starts.
  3. Source: The place where the passenger’s journey starts.
  4. Destination: The place to which the passenger is travelling.
  5. Route: The route the flight takes from the source to the destination.
  6. Dep_Time: The time at which the flight departs from the source.
  7. Arrival_Time: The time at which the passenger reaches the destination.
  8. Duration: The total time the flight takes to complete its journey from source to destination.
  9. Total_Stops: The number of stops the flight makes during the whole journey.
  10. Additional_Info: Information about food and other amenities.
  11. Price: The price of the flight for the complete journey, including all expenses before boarding.

Importing Libraries

import numpy as np
import pandas as pd

import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error as mse
from sklearn.metrics import r2_score
from math import sqrt
from sklearn.linear_model import Ridge
from sklearn.linear_model import Lasso
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import KFold
from sklearn.model_selection import GridSearchCV
from sklearn.model_selection import RandomizedSearchCV

from prettytable import PrettyTable

Reading the training data of our dataset

train_df = pd.read_excel("Data_Train.xlsx")
train_df.head(10)

Output:


Output | Prediction Using Machine Learning

Exploratory Data Analysis (EDA)

First, let’s look at the columns our dataset has.

train_df.columns

Output:

Index(['Airline', 'Date_of_Journey', 'Source', 'Destination', 'Route',
       'Dep_Time', 'Arrival_Time', 'Duration', 'Total_Stops',
       'Additional_Info', 'Price'],
      dtype='object')

Here we can get more information about our dataset

train_df.info()

Output:

 

Exploratory Data Analysis

To know more about the dataset

train_df.describe()

Output:

Output 2 | Prediction Using Machine Learning

Using the isnull function, we can see which values in the first few rows are null (True marks a null value).

train_df.isnull().head()

Output:

 

Output 3

Combining the isnull and sum functions, we can see the number of null values in each column of our dataset.

train_df.isnull().sum()

Output:

Airline            0
Date_of_Journey    0
Source             0
Destination        0
Route              1
Dep_Time           0
Arrival_Time       0
Duration           0
Total_Stops        1
Additional_Info    0
Price              0
dtype: int64

Dropping NaN values

train_df.dropna(inplace = True)

Duplicate values

train_df[train_df.duplicated()].head()

Output:

 

Output 4 | Prediction Using Machine Learning

Here we remove the repeated rows from the dataset, keeping the first occurrence of each. Setting inplace to True applies the change directly to train_df instead of returning a new DataFrame.

train_df.drop_duplicates(keep='first',inplace=True)
train_df.head()

Output:

 

Output 5 | Prediction Using Machine Learning

train_df.shape

Output:

(10462, 11)
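
As a small illustration of what inplace does in the drop_duplicates call above, here is a minimal sketch on a hypothetical three-row frame (the values are made up for the example and are not taken from the dataset):

```python
import pandas as pd

# Tiny hypothetical frame with one exact duplicate row
df = pd.DataFrame({"Airline": ["IndiGo", "IndiGo", "GoAir"],
                   "Price": [3897, 3897, 5000]})

# Without inplace, a new frame is returned and df is left untouched
deduped = df.drop_duplicates(keep="first")
rows_before = len(df)
rows_new = len(deduped)

# With inplace=True, df itself is modified and the call returns None
result = df.drop_duplicates(keep="first", inplace=True)
rows_after = len(df)
```

Either style works; the inplace form simply saves reassigning the variable.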

Checking the Additional_Info column and counting how often each unique value occurs.

train_df["Additional_Info"].value_counts()

Output:

No info                         8182
In-flight meal not included     1926
No check-in baggage included     318
1 Long layover                    19
Change airports                    7
Business class                     4
No Info                            3
1 Short layover                    1
2 Long layover                     1
Red-eye flight                     1
Name: Additional_Info, dtype: int64
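
The counts above list "No info" and "No Info" as two separate values that differ only in capitalization. One possible way to merge them, shown here as a sketch on a hypothetical sample rather than as a step from the original workflow, is to normalize the text before counting:

```python
import pandas as pd

# Hypothetical sample mirroring the two spellings seen in the counts above
info = pd.Series(["No info", "No Info", "In-flight meal not included", "No info"])

# Strip whitespace and lower-case so both spellings collapse into one label
normalized = info.str.strip().str.lower()
counts = normalized.value_counts()
```

On the real data, the same idea would be train_df["Additional_Info"].str.strip().str.lower(), assuming the column holds only strings.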

Checking the different Airlines

train_df["Airline"].unique()

Output:

array(['IndiGo', 'Air India', 'Jet Airways', 'SpiceJet',
       'Multiple carriers', 'GoAir', 'Vistara', 'Air Asia',
       'Vistara Premium economy', 'Jet Airways Business',
       'Multiple carriers Premium economy', 'Trujet'], dtype=object)

Checking the different Airline Routes

train_df["Route"].unique()

Output: See the code.

Now let’s look at our testing dataset

test_df = pd.read_excel("Test_set.xlsx")
test_df.head(10)

Output:

 

Output 6 | Prediction Using Machine Learning

Now let’s look at the columns our testing data has. Note that it has no Price column, since that is the value we want to predict.

test_df.columns

Output:

Index(['Airline', 'Date_of_Journey', 'Source', 'Destination', 'Route',
       'Dep_Time', 'Arrival_Time', 'Duration', 'Total_Stops',
       'Additional_Info'],
      dtype='object')

Information about the dataset

test_df.info()

Output:

 

Prediction Using Machine Learning

To know more about the testing dataset

test_df.describe()

Output:

Prediction Using Machine Learning

Combining the isnull and sum functions, we can see the number of null values in each column of our testing data.

test_df.isnull().sum()

Output:

Airline            0
Date_of_Journey    0
Source             0
Destination        0
Route              0
Dep_Time           0
Arrival_Time       0
Duration           0
Total_Stops        0
Additional_Info    0
dtype: int64

Data Visualization

Plotting Price vs Airline plot

sns.catplot(y = "Price", x = "Airline", data = train_df.sort_values("Price", ascending = False), kind="boxen", height = 8, aspect = 3)
plt.show()

Output:

 

Data Visualization | Prediction Using Machine Learning

Inference: Here the cat plot draws a boxen plot (an enhanced box plot) of the flight price for each airline, and we can conclude that Jet Airways has the most outliers in terms of price.

Plotting Violin plot for Price vs Source

sns.catplot(y = "Price", x = "Source", data = train_df.sort_values("Price", ascending = False), kind="violin", height = 4, aspect = 3)
plt.show()

Output:

 

Output | Data Visualization

Inference: Here the cat plot draws a violin plot of the flight price for each source, i.e. the place from which passengers start their journey. We can see that Banglore as the source location has the most outliers, while Chennai has the least.

Plotting Box plot for Price vs Destination

sns.catplot(y = "Price", x = "Destination", data = train_df.sort_values("Price", ascending = False), kind="box", height = 4, aspect = 3)
plt.show()

Output:

 

Prediction Using Machine Learning

Inference: Here the cat plot draws a box plot of the flight price for each destination, and we can see that New Delhi has the most outliers while Kolkata has the least.
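
The outlier comparisons in these inferences are read off the plots by eye. To put rough numbers on them, one option is to count the points that fall outside 1.5 times the interquartile range in each group. The sketch below assumes that rule (the plots themselves may mark outliers slightly differently), and both the helper name count_iqr_outliers and the toy data are hypothetical:

```python
import pandas as pd

def count_iqr_outliers(df, group_col, value_col="Price"):
    """Count values outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR] within each group."""
    def _count(s):
        q1, q3 = s.quantile(0.25), s.quantile(0.75)
        iqr = q3 - q1
        return int(((s < q1 - 1.5 * iqr) | (s > q3 + 1.5 * iqr)).sum())
    # One count per group, largest first
    return df.groupby(group_col)[value_col].apply(_count).sort_values(ascending=False)

# Hypothetical toy data, not taken from the real dataset
toy = pd.DataFrame({
    "Airline": ["A"] * 6 + ["B"] * 5,
    "Price": [100, 110, 105, 95, 102, 900, 200, 210, 205, 195, 202],
})
outliers = count_iqr_outliers(toy, "Airline")
```

On the real data the call would be count_iqr_outliers(train_df, "Airline"), and likewise with "Source" or "Destination".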

Feature Engineering

Let’s see our processed data first

train_df.head()

Output:

 

Feature Engineering

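Here we convert the Duration column from hour-and-minute strings into a total number of minutes. The replacements turn each string into an arithmetic expression (hours times 60 plus minutes), which eval then computes.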

train_df['Duration'] = train_df['Duration'].str.replace("h", '*60').str.replace(' ','+').str.replace('m','*1').apply(eval)
test_df['Duration'] = test_df['Duration'].str.replace("h", '*60').str.replace(' ','+').str.replace('m','*1').apply(eval)
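
Because eval executes whatever text it is given, an explicit parser is a common alternative. The following is a minimal sketch, assuming durations look like "2h 50m", "19h" or "5m"; the function name duration_to_minutes is hypothetical and not part of the original code:

```python
import re

def duration_to_minutes(text):
    """Convert strings such as '2h 50m', '19h' or '5m' to total minutes."""
    hours = re.search(r"(\d+)\s*h", text)
    minutes = re.search(r"(\d+)\s*m", text)
    if hours is None and minutes is None:
        raise ValueError(f"Unrecognised duration: {text!r}")
    total = int(hours.group(1)) * 60 if hours else 0
    total += int(minutes.group(1)) if minutes else 0
    return total

examples = [duration_to_minutes(s) for s in ["2h 50m", "19h", "5m"]]
```

It could be applied with train_df['Duration'].apply(duration_to_minutes) in place of the chained replace and eval calls.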

Date_of_Journey: Here we split the date of journey into separate day and month columns, then drop the original column, so that the model can work with plain numbers.

train_df["Journey_day"] = train_df['Date_of_Journey'].str.split('/').str[0].astype(int)
train_df["Journey_month"] = train_df['Date_of_Journey'].str.split('/').str[1].astype(int)
train_df.drop(["Date_of_Journey"], axis = 1, inplace = True)

Dep_Time: Here we extract the departure hour and minute into separate columns.

train_df["Dep_hour"] = pd.to_datetime(train_df["Dep_Time"]).dt.hour
train_df["Dep_min"] = pd.to_datetime(train_df["Dep_Time"]).dt.minute
train_df.drop(["Dep_Time"], axis = 1, inplace = True)

Arrival_Time: Similarly, we extract the arrival hour and minute into separate columns.

train_df["Arrival_hour"] = pd.to_datetime(train_df.Arrival_Time).dt.hour
train_df["Arrival_min"] = pd.to_datetime(train_df.Arrival_Time).dt.minute
train_df.drop(["Arrival_Time"], axis = 1, inplace = True)
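
The three steps above are applied to train_df only, so test_df still holds the raw date and time strings at this point. One way to keep both sets in step is to wrap the logic in a helper and call it on each frame. This is a sketch rather than the author's code: the name add_time_features is hypothetical, and it assumes day/month/year dates and times that begin with HH:MM.

```python
import pandas as pd

def add_time_features(df):
    """Return a copy with day/month and hour/minute columns replacing the raw strings."""
    out = df.copy()
    # Date_of_Journey is assumed to be day/month/year, as in the split above
    parts = out["Date_of_Journey"].str.split("/", expand=True)
    out["Journey_day"] = parts[0].astype(int)
    out["Journey_month"] = parts[1].astype(int)
    # Take the first HH:MM pattern so any trailing text (such as a date) is ignored
    for col, prefix in [("Dep_Time", "Dep"), ("Arrival_Time", "Arrival")]:
        hm = out[col].str.extract(r"(\d{1,2}):(\d{2})")
        out[prefix + "_hour"] = hm[0].astype(int)
        out[prefix + "_min"] = hm[1].astype(int)
    return out.drop(columns=["Date_of_Journey", "Dep_Time", "Arrival_Time"])

# Hypothetical single-row sample for illustration
sample = pd.DataFrame({
    "Date_of_Journey": ["24/03/2019"],
    "Dep_Time": ["22:20"],
    "Arrival_Time": ["01:10 22 Mar"],
})
features = add_time_features(sample)
```

Calling the helper on the raw train_df and test_df would give both frames the same six new columns.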

Now after final preprocessing let’s see our dataset

train_df.head()

Output:

 

Output | Prediction Using Machine Learning

Plotting Bar chart for Journey Month vs Number of Flights

plt.figure(figsize = (10, 5))
plt.title('Count of flights month wise')
ax=sns.countplot(x = 'Journey_month', data = train_df)
plt.xlabel('Month')
plt.ylabel('Count of flights')
for p in ax.patches:
    ax.annotate(int(p.get_height()), (p.get_x()+0.25, p.get_height()+1), va='bottom', color= 'black')

Output:

 

Output | Prediction Using Machine Learning

Inference: In the above graph we have plotted a count plot of the number of flights in each journey month, and we can see that May has the most flights.

Plotting Bar chart for Types of Airline vs Number of Flights

plt.figure(figsize = (20,5))
plt.title('Count of flights with different Airlines')
ax=sns.countplot(x = 'Airline', data =train_df)
plt.xlabel('Airline')
plt.ylabel('Count of flights')
plt.xticks(rotation = 45)
for p in ax.patches:
    ax.annotate(int(p.get_height()), (p.get_x()+0.25, p.get_height()+1), va='bottom', color= 'black')

Output:

 

Prediction Using Machine Learning

Inference: From the above graph of airline versus count of flights, we can see that Jet Airways has the most flights.

Plotting Ticket Prices VS Airlines

plt.figure(figsize = (15,4))
plt.title('Price VS Airlines')
plt.scatter(train_df['Airline'], train_df['Price'])
plt.xlabel('Airline')
plt.ylabel('Price of ticket')
plt.xticks(rotation = 90)
plt.show()

Output:

 

Output | Prediction Using Machine Learning

Correlation between all Features

Plotting Correlation

plt.figure(figsize = (15,15))
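# Correlation is only defined for numeric columns, so select those first
sns.heatmap(train_df.select_dtypes(include = "number").corr(), annot = True, cmap = "RdYlGn")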
plt.show()

Output:

 

Correlation between all features
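
A heatmap shows every pair of features at once. When the main interest is how each numeric feature relates to Price, the same correlation matrix can be reduced to a single sorted column. The sketch below uses hypothetical toy values, not the real dataset:

```python
import pandas as pd

# Hypothetical toy frame: Price rises linearly with Duration here by construction
toy = pd.DataFrame({
    "Airline": ["A", "B", "A", "B"],
    "Duration": [60, 120, 180, 240],
    "Dep_hour": [9, 7, 8, 6],
    "Price": [3000, 5000, 7000, 9000],
})

# Keep numeric columns, take the Price column of the correlation matrix,
# drop Price's correlation with itself, and sort from strongest positive down
price_corr = (
    toy.select_dtypes(include="number")
       .corr()["Price"]
       .drop("Price")
       .sort_values(ascending=False)
)
```

Replacing toy with train_df gives the same ranking for the engineered features.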

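Separating the features from the target: Price is the value we want to predict, so we drop it from the feature set (it stays in train_df for later use as the label).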

data = train_df.drop(["Price"], axis=1)

Dealing with Categorical Data and Numerical Data

# "number" matches every numeric dtype, whichever integer width pandas produced
train_categorical_data = data.select_dtypes(exclude="number")
train_numerical_data = data.select_dtypes(include="number")

test_categorical_data = test_df.select_dtypes(exclude="number")
test_numerical_data = test_df.select_dtypes(include="number")
train_categorical_data.head()

Output:

 

Prediction Using Machine Learning

Label Encoding the Categorical Columns

# Each column gets its own encoder, fitted independently on the train and test data
train_categorical_data = train_categorical_data.apply(LabelEncoder().fit_transform)
test_categorical_data = test_categorical_data.apply(LabelEncoder().fit_transform)
train_categorical_data.head()

Output:

 

Prediction Using Machine Learning
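
The code above fits a separate LabelEncoder on the training and test columns, so the same category can receive different integer codes in each set whenever their unique values differ. One alternative, sketched here with pandas categoricals instead of LabelEncoder, derives the codes from the training values only; the helper name encode_with_train_categories and the sample values are hypothetical:

```python
import pandas as pd

def encode_with_train_categories(train_col, test_col):
    """Encode both columns with integer codes derived from the training values only."""
    categories = pd.Categorical(train_col).categories
    train_codes = pd.Categorical(train_col, categories=categories).codes
    # Values never seen in training receive the code -1
    test_codes = pd.Categorical(test_col, categories=categories).codes
    return train_codes, test_codes

# Hypothetical values for illustration
train_airline = pd.Series(["IndiGo", "Air India", "SpiceJet", "IndiGo"])
test_airline = pd.Series(["SpiceJet", "IndiGo", "Trujet"])
train_codes, test_codes = encode_with_train_categories(train_airline, test_airline)
```

With this approach IndiGo maps to the same code in both sets, and an airline that appears only in the test data is flagged with -1 instead of silently shifting the other codes.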

Concatenating both Categorical Data and Numerical Data

X = pd.concat([train_categorical_data, train_numerical_data], axis=1)
y = train_df['Price']
test_set = pd.concat([test_categorical_data, test_numerical_data], axis=1)
X.head()

Output:

 

Categorical Data | Prediction Using Machine Learning

y.head()

Output:

0     3897
1     7662
2    13882
3     6218
4    13302
Name: Price, dtype: int64

Conclusion

We have now walked through a complete EDA process: drawing insights from the data, visualizing it, and engineering features. With these steps done, one can move on to building a machine learning model for prediction.

Source: Aman Preet Gulati
Tags: Machine Learning