Develop and Deploy a ML Pipeline in 45 Minutes with Ploomber

by ITECHNEWS
December 14, 2021
in Data Science, Leading Stories

It’s standard industry practice to prototype Machine Learning pipelines in Jupyter notebooks, refactor them into Python modules and then deploy using production tools such as Airflow or Kubernetes. However, this process slows down development as it requires significant changes to the code.

Ploomber enables a leaner approach where data scientists can use Jupyter but still adhere to software development best practices such as code reviews and continuous integration. To show that this approach is a better alternative to the usual prototype-then-refactor workflow, this post develops and deploys a Machine Learning pipeline in 45 minutes.

The rest of this post describes how Ploomber achieves such a lean workflow.

Break down logic in multiple files

One of the main issues with notebook-based pipelines is that they often live in a single notebook. Debugging large notebooks is a nightmare, which makes such pipelines hard to maintain. In contrast, Ploomber allows us to break the logic down into multiple, smaller steps that we declare in a pipeline.yaml file. For example, assume we're working on a model to predict user activity using demographics and past activity. Our training pipeline would look like this:

Figure 1. Example pipeline

To create such a pipeline, we create a pipeline.yaml file and list our tasks (source) with their corresponding outputs (product):

# pipeline.yaml
tasks:
    # get user demographics
    - source: get-demographics.py
      product:
        nb: output/demographics.ipynb
        data: output/demographics.csv

    # get user activity
    - source: get-activity.py
      product:
        nb: output/activity.ipynb
        data: output/activity.csv

    # features from user demographics
    - source: fts-demographics.py
      product:
        nb: output/fts-demographics.ipynb
        data: output/fts-demographics.csv

    # features from user activity
    - source: fts-activity.py
      product:
        nb: output/fts-activity.ipynb
        data: output/fts-activity.csv

    # train model
    - source: train.py
      product:
        nb: output/train.ipynb
        data: output/model.pickle

Since each .py file has a clearly defined objective, the scripts are easier to maintain and test than a single monolithic notebook.

Write code in .py and interact with it using Jupyter

Jupyter is a fantastic tool for developing data pipelines. It gives us quick feedback such as metrics or visualizations, which is essential for understanding our data. However, traditional .ipynb files have a lot of problems. For example, they make code reviews difficult because comparing versions yields illegible results. The following image shows the diff view of a notebook whose only change is a new cell with a comment:

Figure 2. Illegible notebook diff on GitHub

To fix those problems, Ploomber allows users to open .py files as notebooks, which enables code reviews while still providing the power of interactive development with Jupyter. The following image shows the same .py file rendered as a notebook in Jupyter and as a script in VS Code:

Figure 3. The same .py file rendered as a notebook in Jupyter and as a script in VS Code
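These scripts use the percent cell format (the convention jupytext popularized), so a task such as get-demographics.py might look roughly like the following sketch; the records are a stand-in for real data:

```python
# %% tags=["parameters"]
# Ploomber replaces this cell at runtime, injecting `upstream` and `product`
upstream = None
product = None

# %%
# each "# %%" marker becomes a separate cell when the file opens in Jupyter
rows = [
    {"user_id": 1, "age": 34},
    {"user_id": 2, "age": 29},
]

# %%
print(f"fetched {len(rows)} demographic records")
```

Because this is plain Python, the file diffs cleanly in a code review while still behaving like a notebook inside Jupyter.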

However, Ploomber still leverages the .ipynb format as an output. Each .py file executes as a notebook, generating a .ipynb file that we can use during a code review to check visual results such as tables or charts. Note that in the pipeline.yaml file, each task has a .ipynb file in its product section. See the fragment below:

# pipeline.yaml (fragment)
tasks:
    # the source script...
    - source: get-demographics.py
      product:
        # ...generates a notebook as output
        nb: output/demographics.ipynb
        data: output/demographics.csv

# pipeline.yaml continues...

Retrieve results from previous tasks

Another essential feature is how we establish execution order. For example, to generate features from activity data, we need the raw data:

Figure 4. Declaring upstream dependencies

To establish this dependency, we edit fts-activity.py and add a special upstream variable at the top of the file:

upstream = ['get-activity']

We are stating that get-activity.py must execute before fts-activity.py. Once we provide this information, Ploomber adds a new cell that gives us the location of our input files; we will see something like this:

# what we write
upstream = ['get-activity']


# what Ploomber adds in a new cell
upstream = {
    'get-activity': {
        # extracted from pipeline.yaml
        'nb': 'output/activity.ipynb',
        'data': 'output/activity.csv'
    }
}

No need to hardcode paths to files!
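For intuition, these upstream declarations define a directed acyclic graph, and any topological sort of it is a valid execution order. A minimal stdlib sketch (an illustration of the idea, not Ploomber's actual scheduler) using the example pipeline's dependencies:

```python
from graphlib import TopologicalSorter

# task -> set of tasks that must run first (from the example pipeline.yaml)
dependencies = {
    "get-demographics": set(),
    "get-activity": set(),
    "fts-demographics": {"get-demographics"},
    "fts-activity": {"get-activity"},
    "train": {"fts-demographics", "fts-activity"},
}

# static_order() yields every task only after all of its dependencies
order = list(TopologicalSorter(dependencies).static_order())
print(order)
```

Declaring dependencies rather than hardcoding an order also lets independent branches (demographics and activity here) run in parallel.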

Pipeline composition

A training pipeline and its serving counterpart have a lot of overlap. The only difference is that the training pipeline gets historical records, processes them, and trains a model, while the serving version gets new observations, processes them, and makes predictions.

Figure 5. The training and serving pipelines are mostly the same

All the data processing steps must be identical to prevent discrepancies between training and serving (training-serving skew). Once we have the training pipeline, we can easily create the serving version. The first step is to create a new file with our processing tasks:

# features.yaml - extracted from the original pipeline.yaml

# features from user demographics
- source: fts-demographics.py
  product:
    nb: output/fts-demographics.ipynb
    data: output/fts-demographics.csv

# features from user activity
- source: fts-activity.py
  product:
    nb: output/fts-activity.ipynb
    data: output/fts-activity.csv

Then we compose the training and serving pipelines by importing those shared tasks and adding the remaining ones:

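One way to express this composition (a sketch, assuming a serving entry point named pipeline.serve.yaml and a hypothetical predict.py task) is Ploomber's import_tasks_from setting, which pulls the shared feature tasks from features.yaml into each pipeline:

```yaml
# pipeline.serve.yaml - sketch of the serving pipeline
meta:
    # reuse the shared processing tasks declared in features.yaml
    import_tasks_from: features.yaml

tasks:
    # serving-specific task: score new observations with the trained model
    - source: predict.py
      product:
        nb: output/predict.ipynb
        data: output/predictions.csv
```

The training pipeline.yaml would import the same features.yaml and add the train.py task instead, so the processing logic exists in exactly one place.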

We can now deploy our serving pipeline!

Deployment using Ploomber

Once we have our serving pipeline, we can deploy it to any of the supported production backends: Kubernetes (via Argo Workflows), Airflow, or AWS Batch, using our second command-line tool, Soopervisor. Soopervisor needs only a few additional configuration settings to build a Docker image and push our pipeline to production.

That’s it! Ploomber allows us to move back and forth between Jupyter and a production environment without any compromise on software engineering best practices.
