Cutting Down Implementation Time by Integrating Jupyter and KNIME

by ITECHNEWS
December 24, 2021

Data scientists are known for carving out their own niche within the 3I structure: Implement, Integrate, and Innovate. I personally lean towards the last two: integrating new technologies for constant experimentation, and innovating to achieve remarkable results.

I have been working with Jupyter Notebook for the last 4–5 years and feel very comfortable with it. At the same time, I share many projects with my teammate Paolo, an expert in building KNIME workflows. You might think this would be a problem… it's not!

KNIME Analytics Platform and Jupyter Notebook are both known for their visual approach to solving data analytics problems (Fig. 1). Jupyter Notebook provides a simplified, browser-based scripting interface for over 40 programming languages, but it is ultimately a coding platform, most popular among Python users. KNIME Analytics Platform runs entirely on a graphical user interface driven by drag-and-drop operations and visual programming. Because it represents an analysis as a visual, transparent workflow, the logic and structure of even complex analyses are quick to grasp. You can also write snippets of Python code in KNIME Analytics Platform, but Jupyter Notebook, with a UI optimized for coders, makes entering and executing code simpler. Now imagine the many applications one can build on the synergy of these two platforms.

In this article, we discuss two common real-life scenarios that require collaboration between Jupyter Notebook and KNIME Analytics Platform, and show how simple it is.

Collaboration between Jupyter Notebook and KNIME Analytics Platform

In this section, we describe two scenarios and show how to:

  • Integrate a Jupyter Notebook in a KNIME workflow (scenario 1)
  • Integrate a KNIME workflow into Jupyter code (scenario 2)

In scenario 1, my teammate Paolo was in charge of the project and, pressed for time, asked me to help implement a custom data transformation, which I wrote in Jupyter Notebook. Here, I will show how Paolo integrated my Jupyter script into his KNIME workflow.

In scenario 2, I was in charge of the project and, again pressed for time, asked Paolo to help me build a workflow to train a classification model, which he provided in KNIME Analytics Platform. Here, I will show how I integrated Paolo's workflow into my Python script in Jupyter Notebook.

Fig. 1. The user interfaces of the two data science tools: KNIME Analytics Platform on the left and Jupyter Notebook on the right.

Integrating a Jupyter Notebook in a KNIME Workflow — Scenario 1

Flight delay prediction with machine learning

Paolo was asked to classify flights in the Airline dataset by their departure delay. Each row of the dataset describes one flight through its origin, destination, scheduled departure time, and other attributes. Any flight with a departure delay of more than 15 minutes was labeled "delayed". The task was to train a machine learning model that predicts whether a flight will be delayed at departure, using all other suitable flight attributes as input features.
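The labeling rule above fits in a few lines of pandas. This is a minimal sketch: the column name DepDelay (departure delay in minutes), the helper name label_departure_delay, and the label values are assumptions for illustration, not necessarily what Paolo's workflow uses.

```python
import pandas as pd

# Threshold from the task definition: more than 15 minutes late counts as delayed
DELAY_THRESHOLD_MIN = 15


def label_departure_delay(flights: pd.DataFrame, delay_col: str = "DepDelay") -> pd.DataFrame:
    """Return a copy of `flights` with a 'Delayed' label column added.

    `delay_col` is a hypothetical column holding the departure delay in minutes.
    Missing delays compare as False and end up "on time", so drop or impute
    them first when working with real data.
    """
    labeled = flights.copy()  # leave the caller's table untouched
    is_delayed = labeled[delay_col] > DELAY_THRESHOLD_MIN  # strictly greater than 15
    labeled["Delayed"] = is_delayed.map({True: "delayed", False: "on time"})
    return labeled


# Usage with a toy table: a delay of exactly 15 minutes is still "on time"
toy = pd.DataFrame({"Origin": ["SFO", "JFK", "ORD"], "DepDelay": [3, 15, 42]})
print(label_departure_delay(toy)["Delayed"].tolist())  # ['on time', 'on time', 'delayed']
```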

The steps for building a workflow that trains and evaluates a machine learning (ML) model are usually the same: import the data; clean and transform it; partition it into a training set and a test set; train the ML model of choice on the training set; apply the trained model to the test set; and score its performance with the metric of choice. The resulting Run Jupyter in KNIME workflow looks more or less like the one in Fig. 2 (you can download it from the KNIME Hub).
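For readers coming from the Python side, the same partition, train, apply, and score sequence can be sketched with scikit-learn. This is only an analogy to the KNIME workflow, not a translation of it: the decision tree, accuracy as the metric, the 80/20 split, the helper name train_and_score, and the synthetic columns are all assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def train_and_score(df: pd.DataFrame, features: list[str], label: str,
                    test_size: float = 0.2, seed: int = 42) -> float:
    """Partition the data, train a model, apply it to the test set, and score it."""
    # Partition into training and test sets, stratified to keep the class ratio
    X_train, X_test, y_train, y_test = train_test_split(
        df[features], df[label], test_size=test_size,
        random_state=seed, stratify=df[label])

    # Train the model of choice (here a shallow decision tree) on the training set
    model = DecisionTreeClassifier(max_depth=5, random_state=seed)
    model.fit(X_train, y_train)

    # Apply the trained model to the test set
    predictions = model.predict(X_test)

    # Score the predictions with the metric of choice (here accuracy)
    return float(accuracy_score(y_test, predictions))


# Usage with synthetic data (hypothetical columns, not the Airline dataset)
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "CRSDepTime": rng.integers(0, 2400, 500),
    "Distance": rng.integers(100, 3000, 500),
})
demo["Delayed"] = np.where(demo["CRSDepTime"] > 1700, "delayed", "on time")
accuracy = train_and_score(demo, ["CRSDepTime", "Distance"], "Delayed")
print(f"accuracy: {accuracy:.3f}")
```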

Fig. 2. The Run Jupyter in KNIME workflow, which trains and evaluates an ML model to predict flight departure delays.

Data cleaning and data transformation are often time-consuming, since they also depend on the data domain. Pressed for time, Paolo asked me whether I could implement that part.

This seemed quite easy for me to do in Jupyter Notebook, and Paolo could then import my code into his workflow using a Python Script node. Let's look at this step by step.

  • Step 1. Write the Python code in Jupyter Notebook.
  • Step 2. Set up the Python environment in KNIME Analytics Platform.
  • Step 3. Execute the Jupyter Notebook code from the KNIME workflow.

 

Step 1. Write the Python code in Jupyter Notebook

I created a Python function called Custom_Transformation in Jupyter Notebook (Fig. 3). The function implements some basic feature engineering and returns the original feature set along with the transformed features.
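The actual body of Custom_Transformation is only shown in Fig. 3, so the following is a hypothetical stand-in that illustrates the contract rather than the author's code: take a pandas DataFrame, keep every original column, and append derived features. The input columns CRSDepTime (hhmm), Distance, and DayOfWeek, and the derived features, are assumptions.

```python
import numpy as np
import pandas as pd


def Custom_Transformation(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical feature engineering: keep all original columns, add derived ones."""
    out = df.copy()  # work on a copy so the input table is not modified in place
    # Scheduled departure time assumed to be encoded as hhmm (e.g. 1735 -> hour 17)
    out["DepHour"] = out["CRSDepTime"] // 100
    # Log-scale the distance to reduce skew; log1p keeps a distance of 0 at 0
    out["LogDistance"] = np.log1p(out["Distance"])
    # Weekend flag, assuming DayOfWeek uses 1 = Monday ... 7 = Sunday
    out["IsWeekend"] = out["DayOfWeek"].isin([6, 7]).astype(int)
    return out  # a pandas DataFrame, which is what the Python Script node expects
```

Returning a new DataFrame that still contains the original columns is what lets the KNIME side pass "original plus transformed features" downstream unchanged.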

Fig. 3. Python function for feature transformation.

Now, we need to import this code into Paolo’s KNIME workflow.

Step 2. Set up the Python environment in KNIME Analytics Platform

  • In KNIME Analytics Platform, click File → Preferences → KNIME → Python.
  • In the Python preference page (Fig. 4a), create a new Conda environment by clicking New environment for Python 2 or Python 3, whichever matches the Python version installed on your system.
  • The New Conda environment dialog opens (Fig. 4b). There:
      • Enter the environment name in the field highlighted in yellow.
      • Click Create new environment.
  • Once the environment is created, click Apply and Close on the Preferences page (Fig. 4a).
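Once the environment exists, it can help to confirm that its interpreter can import the packages your scripts rely on. A minimal sketch, meant to be run with the new environment's Python; the helper name missing_packages is hypothetical, and the package list (pandas and numpy) is an assumption based on the DataFrames used in this article, not the full list of requirements of the KNIME Python integration.

```python
import importlib.util
import sys


def missing_packages(required: list[str]) -> list[str]:
    """Return the top-level package names in `required` that this interpreter cannot find."""
    return [name for name in required if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Assumed minimum for this article's scripts
    missing = missing_packages(["pandas", "numpy"])
    print(f"Python {sys.version.split()[0]} at {sys.executable}")
    print("missing:", missing or "none")
```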

Fig. 4a. Python Preference window.

Fig. 4b. Dialog box for creating a new Conda environment.

For a step-by-step guide to installing the Python integration in KNIME and getting it working, see my article How to Set Up the Python Extension.

Step 3. Execute the Jupyter Notebook code from a KNIME workflow

In the KNIME workflow in Fig. 2, we introduced a Python Script node (the second node from the left, right after the Table Reader node). This node loads and runs my code from Jupyter Notebook (Fig. 5).

The key instruction in the script is:

my_notebook = knime_jupyter.load_notebook(...)

This line uses the load_notebook function of the knime_jupyter module, which is part of the KNIME Python integration, to locate the Jupyter notebook that defines Custom_Transformation and load it as a Python module into the my_notebook variable.

The next line calls the notebook's Custom_Transformation function on input_table and stores the returned pandas DataFrame in output_table.

Note. The notebook function called this way should always return a pandas DataFrame, since that is what output_table must contain.

Fig. 5. Configuration dialog of the Python Script node.

The script to run the Jupyter Notebook inside the Python Script node is shown below:

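import os
import knime_jupyter

# Full path to the notebook, passed in through the flow variable 'path_to_jupy'
notebook_location = flow_variables['path_to_jupy']

# Filename of the notebook
notebook_name = flow_variables['path_to_jupy (file name)']

# Path to the folder containing the notebook; os.path.dirname is safer than
# str.replace, which would also strip the file name if it occurred elsewhere in the path
notebook_directory = os.path.dirname(notebook_location)

# Load the notebook as a Python module
my_notebook = knime_jupyter.load_notebook(notebook_directory, notebook_name)

# Call the function 'Custom_Transformation' defined in the notebook;
# input_table and output_table are the node's input and output pandas DataFrames
output_table = my_notebook.Custom_Transformation(input_table)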

After the Python Script node has executed, its output contains the original features plus the transformed ones, ready to be passed to the next node in the KNIME workflow.
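To make "load a notebook as a Python module" less magical, here is a rough, standard-library-only sketch of the idea: read the .ipynb JSON, execute its code cells in a fresh module namespace, and return that module. It is not the knime_jupyter implementation, whose handling of magics, notebook versions, and cell tags may differ; the function name load_notebook_as_module is hypothetical, and dropping magic and shell lines is a deliberate simplification.

```python
import json
import types
from pathlib import Path


def load_notebook_as_module(notebook_directory: str, notebook_name: str) -> types.ModuleType:
    """Execute the code cells of an .ipynb file and return them as a module."""
    path = Path(notebook_directory) / notebook_name
    with open(path, encoding="utf-8") as f:
        nb = json.load(f)

    module = types.ModuleType(path.stem)
    module.__file__ = str(path)
    for cell in nb.get("cells", []):
        if cell.get("cell_type") != "code":
            continue  # skip markdown and raw cells
        source = cell.get("source", "")
        # The notebook format stores source either as one string or as a list of lines
        if isinstance(source, list):
            source = "".join(source)
        # Simplistic: drop IPython magics (%) and shell escapes (!), which plain Python cannot run
        lines = [ln for ln in source.splitlines() if not ln.lstrip().startswith(("%", "!"))]
        # Run the cell with the module's namespace as globals, so definitions become attributes
        exec(compile("\n".join(lines), str(path), "exec"), module.__dict__)
    return module
```

With this sketch, my_notebook = load_notebook_as_module(notebook_directory, notebook_name) followed by my_notebook.Custom_Transformation(input_table) mirrors the two key lines of the node script.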

Integrating a KNIME Workflow into Jupyter Code — Scenario 2

Flight delay prediction with machine learning

I was asked to implement a deployment application that predicts flight departure delays with the model previously trained on the flight delay dataset. I wanted to develop this application in Jupyter Notebook and, to save time, reuse a deployment workflow from Paolo's work. In other words, I wanted to integrate a KNIME workflow into my Jupyter notebook.

This takes three easy steps, mirroring the three steps used in scenario 1.

  • Step 1. Build the KNIME workflow
  • Step 2. Set up the KNIME package in Jupyter Notebook
  • Step 3. Execute the KNIME workflow from Jupyter Notebook

Step 1. Build the KNIME workflow

In our case, Paolo had already shared the KNIME workflow Run KNIME in Jupyter with me via the KNIME Hub. The workflow deploys the machine learning model to predict flight departure delays (Fig. 6).

Step 2. Set up the KNIME package in a Jupyter Notebook

In a command prompt (or the Anaconda Prompt), enter:

pip install knime

This installs the latest version of knimepy, the Python package published under the name knime, which enables Jupyter Notebook to read and run KNIME workflows.
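To confirm from a notebook cell that the installation worked, you can query the installed distribution with the standard library. This matters because the pip used in a command prompt does not always target the same interpreter as the notebook kernel. A small sketch; the helper name installed_version is hypothetical, and it relies only on importlib.metadata (Python 3.8+).

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of a pip distribution, or None if it is absent."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


# knimepy is distributed on PyPI under the name "knime"
print("knimepy:", installed_version("knime") or "not installed")
```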

Step 3. Execute the KNIME workflow from a Jupyter Notebook

Here is what I need to write in my Jupyter notebook (Fig. 7 and Fig. 8) to import and run the selected KNIME workflow:

  • Import the knime package.
  • Define the paths to the KNIME executable, the workspace, and the KNIME workflow.
  • The command knime.Workflow(...) displays the workflow. I use it to double-check that Jupyter is pointing to the intended workflow (Fig. 7).
  • The instruction wf.data_table_inputs[0] = data_set passes the external data, stored in a pandas DataFrame, from Jupyter to the KNIME workflow (Fig. 7).
  • The wf.execute() call executes the KNIME workflow.
  • After execution, the results are available by default in wf.data_table_outputs[0] (Fig. 8).

Note. Make sure the workflow you execute from Jupyter is not open in KNIME Analytics Platform at the same time; otherwise execution stalls, because the workflow is already open for editing.
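Put together, the steps above look roughly like the following sketch, which follows the usage pattern shown in the knimepy documentation (knime.executable_path, knime.Workflow(workflow_path=..., workspace_path=...), data_table_inputs, execute(), data_table_outputs). The function names show_workflow and run_knime_workflow and the example paths are placeholders of mine, and the knime import sits inside the functions only so the sketch can be loaded where knimepy is not yet installed.

```python
import pandas as pd


def show_workflow(workspace: str, workflow: str):
    """Return a knimepy Workflow; as the last expression in a cell, Jupyter displays it."""
    import knime  # knimepy (pip install knime)

    return knime.Workflow(workflow_path=workflow, workspace_path=workspace)


def run_knime_workflow(executable_path: str, workspace: str, workflow: str,
                       data_set: pd.DataFrame) -> pd.DataFrame:
    """Execute a local KNIME workflow on `data_set` and return its first output table."""
    import knime  # knimepy (pip install knime)

    # Path to the local KNIME executable (e.g. knime.exe on Windows)
    knime.executable_path = executable_path

    # Use the workflow as a context manager, as in the knimepy examples
    with knime.Workflow(workflow_path=workflow, workspace_path=workspace) as wf:
        wf.data_table_inputs[0] = data_set  # DataFrame -> the workflow's first input
        wf.execute()                        # run the workflow
        return wf.data_table_outputs[0]     # first output table as a pandas DataFrame


# Example call with placeholder paths (the workflow must not be open in KNIME):
# predictions = run_knime_workflow(
#     r"C:\Program Files\KNIME\knime.exe",
#     r"C:\Users\me\knime-workspace",
#     "Run KNIME in Jupyter",
#     data_set,
# )
```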

Fig. 6. Setting up the workspace in Jupyter Notebook and displaying the selected KNIME workflow, Run KNIME in Jupyter.

Fig. 7. Code for executing the workflow.

As of KNIME Analytics Platform 4.3, I can also call and execute KNIME workflows residing on a KNIME Server rather than on my local client installation. This has three main advantages:

  • Improved scalability: execution on a KNIME Server can draw on more computational power, whether from another machine on premises, a server in the cloud, parallel execution across several KNIME Executors, or KNIME Edge.
  • Greater accessibility: anyone with credentials and an internet connection can use the shared KNIME workflow deployed on the KNIME Server from their own Jupyter notebook.
  • Better security and versioning: combining Jupyter Notebook with KNIME Analytics Platform usually means data scientists with different backgrounds collaborating in an organization where data access may be restricted. KNIME Server offers a safe and consistent way to share workflows, data, and Jupyter notebooks, keeping track of every edit.

The only changes to the previous script are the path to the KNIME workflow on the KNIME Server and the username and password required to access it, as shown in Fig. 8. This script is also available in a GitHub repository. In this case, there is also no need to specify the path to the KNIME executable.
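A sketch of the server variant, under a clearly stated assumption: that knime.Workflow accepts the workflow's KNIME Server REST URL in place of a local path, together with username and password keyword arguments, as the description above suggests. The exact URL format and argument names should be checked against the knimepy documentation for your version; the function name run_server_workflow is hypothetical, and getpass keeps the password out of the notebook.

```python
from getpass import getpass

import pandas as pd


def run_server_workflow(workflow_url: str, username: str,
                        data_set: pd.DataFrame) -> pd.DataFrame:
    """Execute a workflow deployed on KNIME Server and return its first output table.

    ASSUMPTION: knime.Workflow accepts the workflow's server URL instead of a local
    path, plus username/password keywords; no executable_path is set here.
    """
    import knime  # knimepy (pip install knime)

    password = getpass("KNIME Server password: ")  # avoid hard-coding credentials
    with knime.Workflow(workflow_url, username=username, password=password) as wf:
        wf.data_table_inputs[0] = data_set  # same data exchange as in the local case
        wf.execute()
        return wf.data_table_outputs[0]
```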

Fig. 8. Code snippet to execute a workflow on KNIME Server from Jupyter Notebook.

Collaboration is key!

Not having to choose between Jupyter Notebook and KNIME Analytics Platform is an excellent way to foster collaboration within a team. By mixing and matching Jupyter Notebook scripts and KNIME workflows, we efficiently built a set of applications to train, apply, and deploy machine learning models for prediction.

Coupling this strategy with the KNIME Server enterprise features enables even greater collaboration among data scientists with different backgrounds and tools.

Collaboration is always a key factor in a data science lab. Whether you are looking for an open source or an enterprise solution, KNIME software can help, thanks to its flexibility in integrating Jupyter and many other tools.

Source: Mahantesh Pattadkal, Data Scientist @ KNIME

© 2021 iTechNewsOnline.Com - Powered by BackUpDataSystems