Develop and Deploy a ML Pipeline in 45 Minutes with Ploomber

by ITECHNEWS
December 14, 2021
in Data Science, Leading Stories

It’s standard industry practice to prototype Machine Learning pipelines in Jupyter notebooks, refactor them into Python modules and then deploy using production tools such as Airflow or Kubernetes. However, this process slows down development as it requires significant changes to the code.

Ploomber enables a leaner approach where data scientists can use Jupyter but still adhere to software development best practices such as code reviews or continuous integration. To show that this approach is a better alternative to the current "prototype in a notebook, then refactor" workflow, this presentation develops and deploys a Machine Learning pipeline in 45 minutes.

The rest of this post describes how Ploomber achieves such a lean workflow.

Break down logic into multiple files

One of the main issues with notebook-based pipelines is that they often live in a single notebook. Large notebooks are painful to debug, which makes pipelines hard to maintain. In contrast, Ploomber allows us to break down the logic into multiple, smaller steps that we declare in a pipeline.yaml file. For example, assume we're working on a model to predict user activity using demographics and past activity. Our training pipeline would look like this:

Figure 1. Example pipeline

To create such a pipeline, we create a pipeline.yaml file and list our tasks (source) with their corresponding outputs (product):

# pipeline.yaml
tasks:
    # get user demographics
    - source: get-demographics.py
      product:
        nb: output/demographics.ipynb
        data: output/demographics.csv

    # get user activity
    - source: get-activity.py
      product:
        nb: output/activity.ipynb
        data: output/activity.csv

    # features from user demographics
    - source: fts-demographics.py
      product:
        nb: output/fts-demographics.ipynb
        data: output/fts-demographics.csv

    # features from user activity
    - source: fts-activity.py
      product:
        nb: output/fts-activity.ipynb
        data: output/fts-activity.csv

    # train model
    - source: train.py
      product:
        nb: output/train.ipynb
        data: output/model.pickle

Since each .py file has a clearly defined objective, it is easier to maintain and test than a single notebook.

Write code in .py and interact with it using Jupyter

Jupyter is a fantastic tool to develop data pipelines. It allows us to get quick feedback such as metrics or visualizations, essential for understanding our data. However, traditional .ipynb files have a lot of problems. For example, they make code reviews difficult because comparing versions yields illegible results. The following image shows the diff view of a notebook whose only change is a new cell with a comment:

Figure 2. Illegible notebook diff on GitHub

To fix those problems, Ploomber allows users to open .py files as notebooks, which enables code reviews while still providing the power of interactive development with Jupyter. The following image shows the same .py file rendered as a notebook in Jupyter and as a script in VS Code:

Figure 3. Same .py file rendered as a notebook in Jupyter and script in VS Code

Ploomber still uses the .ipynb format, but as an output. Each .py executes as a notebook, generating a .ipynb file that we can use during a code review to check visual results such as tables or charts. Note that in the pipeline.yaml file, each task has a .ipynb file in the product section. See the fragment below:

# pipeline.yaml (fragment)
tasks:
    # the source script...
    - source: get-demographics.py
      product:
        # ...generates a notebook as output
        nb: output/demographics.ipynb
        data: output/demographics.csv

# pipeline.yaml continues...

Retrieve results from previous tasks

Another essential feature is how we establish execution order. For example, to generate features from activity data, we need the raw data:

Figure 4. Declaring upstream dependencies

To establish this dependency, we edit fts-activity.py and add a special upstream variable at the top of the file:

upstream = ['get-activity']

We are stating that get-activity.py must execute before fts-activity.py. Once we provide this information, Ploomber adds a new cell that gives us the location of our input files; we will see something like this:

# what we write
upstream = ['get-activity']


# what Ploomber adds in a new cell
upstream = {
    'get-activity': {
        # extracted from pipeline.yaml
        'nb': 'output/activity.ipynb',
        'data': 'output/activity.csv',
    }
}

No need to hardcode paths to files!
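
The two ideas in this section, deriving an execution order from upstream declarations and looking up input paths from declared products, can be sketched in plain Python. This is not Ploomber's implementation. It uses the standard library's `graphlib.TopologicalSorter`, assumes tasks are named after their source files without the extension, and assumes that train.py depends on both feature tasks; `execution_order` and `resolve_upstream` are hypothetical helper names.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Products per task, copied from the pipeline.yaml above.
# Assumption: task names are the source file names without ".py".
PRODUCTS = {
    "get-demographics": {"nb": "output/demographics.ipynb",
                         "data": "output/demographics.csv"},
    "get-activity": {"nb": "output/activity.ipynb",
                     "data": "output/activity.csv"},
    "fts-demographics": {"nb": "output/fts-demographics.ipynb",
                         "data": "output/fts-demographics.csv"},
    "fts-activity": {"nb": "output/fts-activity.ipynb",
                     "data": "output/fts-activity.csv"},
    "train": {"nb": "output/train.ipynb", "data": "output/model.pickle"},
}

# What each script would declare in its `upstream` variable.
# Assumption: train depends on both feature tasks.
UPSTREAM = {
    "get-demographics": [],
    "get-activity": [],
    "fts-demographics": ["get-demographics"],
    "fts-activity": ["get-activity"],
    "train": ["fts-demographics", "fts-activity"],
}


def execution_order(upstream):
    """Return task names so that every task comes after its dependencies."""
    return list(TopologicalSorter(upstream).static_order())


def resolve_upstream(task, upstream, products):
    """Build the dictionary of input locations for one task."""
    return {name: products[name] for name in upstream[task]}


order = execution_order(UPSTREAM)
inputs = resolve_upstream("fts-activity", UPSTREAM, PRODUCTS)
print(order)
print(inputs["get-activity"]["data"])
```

Inside fts-activity.py we would then read `upstream['get-activity']['data']` instead of typing the path by hand, so moving an output only requires a change in pipeline.yaml.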

Pipeline composition

A training pipeline and its serving counterpart have a lot of overlap. The only difference is that the training pipeline gets historical records, processes them, and trains a model, while the serving version gets new observations, processes them, and makes predictions.

Figure 5. The training and serving pipelines are mostly the same

All the data processing steps must be the same to prevent discrepancies at serving time. Once we have the training pipeline, we can easily create the serving version. The first step is to create a new file with our processing tasks:

# features.yaml - extracted from the original pipeline.yaml

# features from user demographics
- source: fts-demographics.py
  product:
    nb: output/fts-demographics.ipynb
    data: output/fts-demographics.csv

# features from user activity
- source: fts-activity.py
  product:
    nb: output/fts-activity.ipynb
    data: output/fts-activity.csv

Then we compose the training and serving pipelines by importing those tasks and adding the remaining ones.
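
As a rough sketch of the composition idea, the snippet below builds both pipelines from one shared list of feature tasks, using plain Python dictionaries in place of YAML files. It does not reproduce Ploomber's own syntax for importing tasks. The `task` and `compose` helpers are hypothetical, and so are the serving-only file names (get-new-demographics.py, get-new-activity.py, predict.py) and their output paths.

```python
def task(source, nb, data):
    """Build one task entry with the same shape as those in pipeline.yaml."""
    return {"source": source, "product": {"nb": nb, "data": data}}


# Shared processing steps: the content extracted into features.yaml.
FEATURE_TASKS = [
    task("fts-demographics.py",
         "output/fts-demographics.ipynb", "output/fts-demographics.csv"),
    task("fts-activity.py",
         "output/fts-activity.ipynb", "output/fts-activity.csv"),
]


def compose(before, shared, after):
    """Assemble a pipeline: input tasks, shared features, final step."""
    return {"tasks": [*before, *shared, *after]}


# Training: historical records -> features -> model.
training = compose(
    before=[
        task("get-demographics.py",
             "output/demographics.ipynb", "output/demographics.csv"),
        task("get-activity.py",
             "output/activity.ipynb", "output/activity.csv"),
    ],
    shared=FEATURE_TASKS,
    after=[task("train.py", "output/train.ipynb", "output/model.pickle")],
)

# Serving: new observations -> same features -> predictions.
# Every file name in `before` and `after` here is hypothetical.
serving = compose(
    before=[
        task("get-new-demographics.py",
             "output/new-demographics.ipynb", "output/new-demographics.csv"),
        task("get-new-activity.py",
             "output/new-activity.ipynb", "output/new-activity.csv"),
    ],
    shared=FEATURE_TASKS,
    after=[task("predict.py",
                "output/predict.ipynb", "output/predictions.csv")],
)

training_sources = [t["source"] for t in training["tasks"]]
serving_sources = [t["source"] for t in serving["tasks"]]
print(training_sources)
print(serving_sources)
```

Since both pipelines reference the same feature task definitions, a change to a processing step reaches training and serving together, which is what prevents discrepancies at serving time.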

We can now deploy our serving pipeline!

Deployment using Ploomber

Once we have our serving pipeline, we can deploy it to any of the supported production backends: Kubernetes (via Argo Workflows), Airflow, or AWS Batch. We do so with our second command-line tool, Soopervisor, which needs only a few additional configuration settings to create a Docker image and push our pipeline to production.

That’s it! Ploomber allows us to move back and forth between Jupyter and a production environment without any compromise on software engineering best practices.
