5 Tools for Getting Started with Data Science on GitHub

by ITECHNEWS
December 27, 2021
in Data Science, Leading Stories

Depending on who you ask, the definition of “data scientist” can vary from “Excel expert” to “deep learning engineer” to “MLOps practitioner” – working individually, or as part of a team. Given this broad spectrum of software engineering experience, it can be challenging for data scientists to ensure that their models and experiments are brought into production safely and sustainably. GitHub can help data scientists with their full end-to-end data science lifecycle, as they track and version control both data and code, reproduce experiments, collaborate effectively with their team members, and deploy models to production.

Below are five tools on GitHub that can help accelerate your machine learning development process:

VS Code extensions

First up, we have Visual Studio Code and its extension marketplace. VS Code is a free, lightweight code editor that was built with extensibility in mind: from the UI to the editing experience, almost every part of VS Code can be customized and enhanced. Below is a subset of my favorite extensions and features of VS Code, but make sure to check out the marketplace for thousands more:

– Python: This extension provides Python language IntelliSense, linting, debugging, code navigation, code formatting, refactoring, variable and test exploration, and much more.
– SQLTools: This database explorer is a collection of community-managed extensions that offer support for many common relational databases, including MySQL, SQLite, PostgreSQL, MariaDB, Microsoft SQL Server, and many more.
– Draw.io: An extension that lets you view and edit rich diagrams directly within the editor.
– Live Share: Real-time collaborative editing within VS Code (either locally or via the browser).
– GitHub Pull Requests: Lets you review and manage GitHub pull requests and issues in Visual Studio Code, including authenticating and connecting to GitHub, listing and browsing PRs from within VS Code, in-editor commenting, and more.
– Source Control Management: Perhaps my favorite feature in VS Code. If you’re not a fan of git via the command line, this feature gives you a graphical way to review and merge changes locally.

GitHub.dev

If you are browsing any repo on github.com, just pressing the . key on your keyboard will immediately launch you into github.dev, a browser-based editing environment for GitHub. This browser-based IDE gives you a quick way to edit and navigate code, and is especially useful if you want to edit multiple files at a time, or if you want to take advantage of the powerful code editing features of Visual Studio Code when making a change.

Many of the VS Code extensions listed in the previous section are web-enabled, and you can even use specialized compute within the browser. Personally, I have used github.dev with the Pyodide extension both for demos, and to run Python courses using the data science stack: it’s a painless way to create a free, transient Python scratch-pad.

Codespaces

GitHub Codespaces provides cloud-powered development environments for any activity – whether it’s a long-term project, or a short-term task like reviewing a pull request or testing a small change. You can work with Codespaces instances in VS Code locally, or in a browser-based editing environment directly from any GitHub repo – and, even better, all of the extensions for VS Code automatically work in Codespaces.


You can either use the out-of-the-box Codespace environment, or customize your Codespace instances on a per-project basis via a devcontainer.json file. Example customizations include:

– Setting the Linux-based operating system to use.
– Automatically installing various tools, runtimes, and frameworks.
– Forwarding commonly used ports.
– Setting environment variables.
– Configuring editor settings and installing preferred extensions.
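
To make the list above a little more concrete, here is a minimal sketch that builds a devcontainer.json from Python and prints it as JSON. The property names (image, forwardPorts, containerEnv, postCreateCommand, customizations) come from the dev container format; the project name, image tag, port, environment variable, settings, and extension IDs are assumptions picked for illustration, so swap in whatever your project actually needs.

```python
import json


def build_devcontainer(name, image, ports, env, extensions, settings, post_create):
    """Return a dict that mirrors the layout of a devcontainer.json file."""
    return {
        "name": name,
        # Linux-based container image to use as the environment.
        "image": image,
        # Ports forwarded from the Codespace to your local machine or browser.
        "forwardPorts": list(ports),
        # Environment variables set inside the container.
        "containerEnv": dict(env),
        # Command run once after the container is created (tool installation).
        "postCreateCommand": post_create,
        # Editor settings and extensions for VS Code.
        "customizations": {
            "vscode": {
                "settings": dict(settings),
                "extensions": list(extensions),
            }
        },
    }


# All values below are illustrative assumptions, not recommendations.
config = build_devcontainer(
    name="ds-project",
    image="mcr.microsoft.com/devcontainers/python:3.11",
    ports=[8888],  # e.g. a Jupyter server
    env={"PYTHONUNBUFFERED": "1"},
    extensions=["ms-python.python", "ms-toolsai.jupyter"],
    settings={"editor.formatOnSave": True},
    post_create="pip install -r requirements.txt",
)

devcontainer_json = json.dumps(config, indent=2)
print(devcontainer_json)
```

The printed JSON would typically be saved as .devcontainer/devcontainer.json in the repo; in practice most people simply write the file by hand.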

Your existing requirements.txt, Dockerfiles, and conda environment YAMLs are automatically understood by Codespaces, and can be referenced from devcontainer.json. If you aren’t a fan of VS Code, you can even use a variety of front-ends with Codespaces, such as Jupyter notebooks or JupyterLab. To create a new Codespace, just click the “Code” button on any GitHub repo, or head to codespaces.new.

Model and Data Templates

As you create experiments and machine learning models, it is important to clarify the intended use cases of your work and to minimize its use in contexts for which it is not well suited. AI ethics researchers are in the process of creating standards for these best practices, which can be included in your repos the same way as you would include a LICENSE.md or a CONTRIBUTING.md:

– Model Cards (Mitchell et al., 2018): a document that describes the model, its intended uses and potential limitations, the training parameters and experimental information, and the datasets used to train and evaluate results.
– Datasheets for Datasets (Gebru et al., 2021): a markdown file that describes a dataset’s motivation, composition, collection process, and recommended uses. These datasheets facilitate better communication between dataset creators and consumers, and encourage the machine learning community to prioritize transparency and accountability.

An example YAML section from a model card that specifies metadata:
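
As a minimal sketch of what such a metadata section can look like, the Python snippet below renders a small YAML header and prepends it to a model card skeleton. The field names (language, license, tags, datasets, metrics) and every value are illustrative assumptions rather than fields taken from a specific model card, and the tiny renderer only handles plain strings and lists of plain strings.

```python
def to_front_matter(metadata):
    """Render a flat dict (string or list-of-string values) as a YAML header.

    This is deliberately simplistic: values containing characters that are
    special in YAML (such as ':' or '#') would need quoting, which is not
    handled here.
    """
    lines = ["---"]
    for key, value in metadata.items():
        if isinstance(value, list):
            lines.append(f"{key}:")
            lines.extend(f"  - {item}" for item in value)
        else:
            lines.append(f"{key}: {value}")
    lines.append("---")
    return "\n".join(lines)


# Hypothetical metadata for a hypothetical model.
metadata = {
    "language": "en",
    "license": "mit",
    "tags": ["tabular-classification", "example"],
    "datasets": ["my-org/example-dataset"],
    "metrics": ["accuracy"],
}

# Section headings follow the topics a model card is meant to cover.
body = "# Model Card: example-model\n\n## Intended uses\n\n## Limitations\n"
model_card = to_front_matter(metadata) + "\n\n" + body
print(model_card)
```

The resulting text could be committed as a markdown file next to the model code, alongside the other top-level files in the repo.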

GitHub Actions

GitHub Actions allows you to automate, customize, and execute software development workflows directly in your repository. You can think of GitHub Actions as supercharged cron jobs that can be used for every step of your machine learning and data science development process, including:

– Consuming and transforming data.
– Appending new data in cloud storage buckets.
– Version controlling datasets.
– Retraining models, and storing performance metrics.
– Generating reports and dashboards.
– Deploying new models.
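
As one hedged illustration of the "storing performance metrics" and "generating reports" steps, here is a small Python script of the kind a workflow step could run after retraining. The labels and predictions are placeholder data and the metrics.json file name is an assumption; GITHUB_STEP_SUMMARY is the environment variable GitHub Actions sets to a file whose markdown contents appear on the job summary page, and the script simply skips that part when run outside of Actions.

```python
import json
import os
from pathlib import Path


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def write_metrics(metrics, path):
    """Store metrics as JSON so a later step can commit or upload the file."""
    Path(path).write_text(json.dumps(metrics, indent=2, sort_keys=True))


def append_job_summary(metrics):
    """Append a markdown table to the Actions job summary, if one is available."""
    summary_path = os.environ.get("GITHUB_STEP_SUMMARY")
    if not summary_path:
        return False  # not running inside GitHub Actions
    rows = ["| metric | value |", "| --- | --- |"]
    rows += [f"| {name} | {value:.3f} |" for name, value in sorted(metrics.items())]
    with open(summary_path, "a", encoding="utf-8") as fh:
        fh.write("\n".join(rows) + "\n")
    return True


# Placeholder evaluation data standing in for a real model's output.
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]

metrics = {"accuracy": accuracy(y_true, y_pred)}
write_metrics(metrics, "metrics.json")
append_job_summary(metrics)
```

A workflow would call a script like this from a run step on a schedule or on every push; keeping the logic in a plain Python file also means you can run it locally before wiring it into Actions.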

…and much, much more. You can view and search through Data and Machine Learning Actions in our marketplace, and be sure to take a look at our collection of resources on how to facilitate machine learning operations practices with GitHub.

Source: Paige Bailey
Via: ODSC Community
Tags: Data Science, GitHub, Tools