5 Tools for Getting Started with Data Science on GitHub

by ITECHNEWS
December 27, 2021
in Data Science, Leading Stories

Depending on who you ask, the definition of “data scientist” can vary from “Excel expert” to “deep learning engineer” to “MLOps practitioner” – working individually, or as part of a team. Given this broad spectrum of software engineering experience, it can be challenging for data scientists to ensure that their models and experiments are brought into production safely and sustainably. GitHub can help data scientists with their full end-to-end data science lifecycle, as they track and version control both data and code, reproduce experiments, collaborate effectively with their team members, and deploy models to production.

Below are five tools on GitHub that can help accelerate your machine learning development process:

VS Code extensions

First up, we have Visual Studio Code and its extension marketplace. VS Code is a free, lightweight code editor that was built with extensibility in mind: from the UI to the editing experience, almost every part of VS Code can be customized and enhanced. Below is a subset of my favorite extensions and features of VS Code, but make sure to check out the marketplace for thousands more:

– Python: This extension provides Python language IntelliSense, linting, debugging, code navigation, code formatting, refactoring, variable and test exploration, and much more.
– SQLTools: This database explorer is a collection of community-managed extensions that offer support for many common relational databases, including MySQL, SQLite, PostgreSQL, MariaDB, Microsoft SQL Server, and many more.
– Draw.io: An extension that lets you view and edit rich diagrams directly within the editor.
– Live Share: real-time collaborative editing within VS Code (either local, or via the browser).
– GitHub Pull Requests: allows you to review and manage GitHub pull requests and issues in Visual Studio Code, including authenticating and connecting to GitHub; listing and browsing PRs from within VS Code; in-editor commenting, and more.
– Source Control Management: perhaps my favorite feature in VS Code. If you’re not a fan of Git via the command line, this feature gives you a way to merge changes and create graphics locally.

GitHub.dev

If you are browsing any repo on github.com, pressing the . key will immediately launch github.dev, a browser-based editing environment for GitHub. This browser-based IDE gives you a quick way to edit and navigate code, and is especially useful if you want to edit multiple files at a time or take advantage of the powerful code editing features of Visual Studio Code when making a change.

Many of the VS Code extensions listed in the previous section are web-enabled, and you can even use specialized compute within the browser. Personally, I have used github.dev with the Pyodide extension both for demos, and to run Python courses using the data science stack: it’s a painless way to create a free, transient Python scratch-pad.

Codespaces

GitHub Codespaces provides cloud-powered development environments for any activity – whether it’s a long-term project, or a short-term task like reviewing a pull request or testing a small change. You can work with Codespaces instances in VS Code locally, or in a browser-based editing environment directly from any GitHub repo – and, even better, all of the extensions for VS Code automatically work in Codespaces.

You can either use the out-of-the-box Codespace environment, or customize your Codespace instances on a per-project basis, via something called a devcontainer.json file. Example customizations include:

– Setting the Linux-based operating system to use.
– Automatically installing various tools, runtimes, and frameworks.
– Forwarding commonly used ports.
– Setting environment variables.
– Configuring editor settings and installing preferred extensions.
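
As a rough illustration of the customizations listed above, the following Python sketch builds a devcontainer.json configuration and writes it to the `.devcontainer` folder of a repo. The property names (`image`, `postCreateCommand`, `forwardPorts`, `containerEnv`, `customizations`) are standard devcontainer.json properties; the project name, the port, the `DATA_DIR` variable, and the chosen image tag and extensions are assumptions made for this example, not recommendations from the article.

```python
import json
from pathlib import Path


def build_devcontainer_config() -> dict:
    """Return an example devcontainer.json configuration as a dictionary."""
    return {
        "name": "data-science-sketch",  # hypothetical project name
        # Linux-based image to use (assumed tag, adjust to your project)
        "image": "mcr.microsoft.com/devcontainers/python:3.11",
        # Install tools and libraries after the container is created
        "postCreateCommand": "pip install -r requirements.txt",
        # Forward a commonly used port (8888 is the usual Jupyter default)
        "forwardPorts": [8888],
        # Environment variables (DATA_DIR is a hypothetical variable)
        "containerEnv": {"DATA_DIR": "/workspaces/data"},
        # Editor settings and preferred extensions
        "customizations": {
            "vscode": {
                "settings": {
                    "python.defaultInterpreterPath": "/usr/local/bin/python"
                },
                "extensions": ["ms-python.python", "ms-toolsai.jupyter"],
            }
        },
    }


def write_devcontainer(repo_root: Path) -> Path:
    """Write the configuration to <repo_root>/.devcontainer/devcontainer.json."""
    target = repo_root / ".devcontainer" / "devcontainer.json"
    target.parent.mkdir(parents=True, exist_ok=True)
    text = json.dumps(build_devcontainer_config(), indent=2) + "\n"
    target.write_text(text, encoding="utf-8")
    return target


if __name__ == "__main__":
    # Print the JSON so it can be inspected before writing it anywhere.
    print(json.dumps(build_devcontainer_config(), indent=2))
```

Calling `write_devcontainer(Path("."))` from the root of a repo would create the file in place. Generating it from Python is only a convenience for this sketch; the same JSON can be written by hand.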

Your existing requirements.txt, Dockerfiles, and conda environment YAMLs are automatically understood by Codespaces, and can be referenced from devcontainer.json. If you aren’t a fan of VS Code, you can even use a variety of front-ends with Codespaces, such as Jupyter notebooks or JupyterLab. To create a new Codespace, just click the “Code” button on any GitHub repo, or head to codespaces.new.

Model and Data Templates

As you create experiments and machine learning models, it is important to clarify the intended use cases of your work and to minimize its usage in contexts for which it is not well suited. AI ethics researchers are in the process of creating standards for these best practices, which can be included in your repos the same way you would include a LICENSE.md or a CONTRIBUTING.md:

– Model Cards (Mitchell et al., 2018): describe the model, its intended uses and potential limitations, the training parameters and experimental information, and the datasets used to train and evaluate results.
– Datasheets for Datasets (Gebru et al., 2021): a markdown file that describes a dataset’s motivation, composition, collection process, and recommended uses. These datasheets facilitate better communication between dataset creators and consumers, and encourage the machine learning community to prioritize transparency and accountability.

An example YAML section from a model card that specifies metadata:
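
As a rough illustration of such a metadata section, the Python sketch below renders a small YAML front-matter block followed by Markdown headings for a model card. The field names and values (`license`, `language`, `tags`, `datasets`, `metrics`) and the section texts are hypothetical placeholders chosen for this example, and the tiny serializer only handles plain strings and lists of plain strings, so a real project would more likely use a YAML library.

```python
def to_front_matter(metadata: dict) -> str:
    """Render a flat dict (strings or lists of strings) as YAML front matter."""
    lines = ["---"]
    for key, value in metadata.items():
        if isinstance(value, list):
            lines.append(f"{key}:")
            lines.extend(f"  - {item}" for item in value)
        else:
            lines.append(f"{key}: {value}")
    lines.append("---")
    return "\n".join(lines) + "\n"


def render_model_card(title: str, metadata: dict, sections: dict) -> str:
    """Combine the front matter with one Markdown section per heading."""
    parts = [to_front_matter(metadata), f"\n# {title}\n"]
    for heading, body in sections.items():
        parts.append(f"\n## {heading}\n\n{body}\n")
    return "".join(parts)


# Hypothetical metadata and section contents, for illustration only.
EXAMPLE_METADATA = {
    "license": "mit",
    "language": "en",
    "tags": ["tabular-classification"],
    "datasets": ["example-dataset"],
    "metrics": ["accuracy"],
}
EXAMPLE_SECTIONS = {
    "Intended Use": "Describe the use cases the model was built for.",
    "Limitations": "Describe contexts in which the model should not be used.",
}

if __name__ == "__main__":
    print(render_model_card("Example Model", EXAMPLE_METADATA, EXAMPLE_SECTIONS))
```

The rendered text could be saved as a Markdown file next to the LICENSE.md in a repo, with the section headings extended to cover training parameters and evaluation data.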

GitHub Actions

GitHub Actions allows you to automate, customize, and execute software development workflows directly in your repository. You can think of GitHub Actions as supercharged cron jobs that can be used at every step of your machine learning and data science development process, including:

– Consuming and transforming data.
– Appending new data in cloud storage buckets.
– Version controlling datasets.
– Retraining models, and storing performance metrics.
– Generating reports and dashboards.
– Deploying new models.

…and much, much more. You can view and search through Data and Machine Learning Actions in our marketplace, and be sure to take a look at our collection of resources on how to facilitate machine learning operations practices with GitHub.
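
To make the "storing performance metrics" step more concrete, here is a minimal Python sketch of a script that a workflow step might run after retraining. It writes metrics to a JSON file and, when the `GITHUB_STEP_SUMMARY` environment variable is set (GitHub Actions sets it to a file whose Markdown content appears on the run's summary page), appends a small Markdown table to it. The metric names, values, and function names are hypothetical placeholders; the training itself is out of scope here.

```python
import json
import os
from pathlib import Path


def metrics_to_markdown(metrics: dict) -> str:
    """Format metrics as a Markdown table, sorted by metric name."""
    rows = ["| Metric | Value |", "| --- | --- |"]
    rows += [f"| {name} | {value:.4f} |" for name, value in sorted(metrics.items())]
    return "\n".join(rows) + "\n"


def store_metrics(metrics: dict, out_path: Path) -> None:
    """Save metrics as JSON and, inside GitHub Actions, add them to the job summary."""
    out_path.write_text(
        json.dumps(metrics, indent=2, sort_keys=True) + "\n", encoding="utf-8"
    )
    summary_file = os.environ.get("GITHUB_STEP_SUMMARY")
    if summary_file:  # only set when running inside a workflow job
        with open(summary_file, "a", encoding="utf-8") as handle:
            handle.write("### Model metrics\n\n" + metrics_to_markdown(metrics))


if __name__ == "__main__":
    # Placeholder values; a real script would compute these after retraining.
    print(metrics_to_markdown({"accuracy": 0.0, "f1": 0.0}))
```

A workflow step could call `store_metrics(...)` with the real evaluation results and then commit or upload the JSON file, which keeps a versioned record of how each retrained model performed.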

Source: Paige Bailey
Via: ODSC Community
Tags: Data Science, GitHub, Tools

© 2021-2022 iTechNewsOnline.Com - Powered by BackUPDataSystems
