Python is the most popular programming language for data science, and its packages, frameworks, and libraries are downloaded millions of times each month. Month to month, these downloads reflect growing trends in the field: as NLP is talked about more often, we can expect to see more packages pulled for NLP purposes. We took a look at PyPI stats over the past month to see which Python machine learning packages were downloaded the most; some of the results are expected, and some are quite surprising.
Please be aware that many of these downloads happen because a package is pulled in as a dependency of another package rather than installed directly.
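To get a feel for how dependency-driven downloads arise, you can inspect the requirements that an installed package declares. The snippet below is a minimal sketch using the standard library's `importlib.metadata`; the helper name `declared_dependencies` is our own, and the packages queried are only examples that may or may not be installed in your environment.

```python
from importlib import metadata


def declared_dependencies(package_name):
    """Return the requirement strings declared by an installed package.

    Returns an empty list if the package is not installed or declares none.
    """
    try:
        requirements = metadata.requires(package_name)
    except metadata.PackageNotFoundError:
        return []
    # metadata.requires() returns None when no requirements are declared.
    return requirements or []


if __name__ == "__main__":
    # Example package names only; uninstalled packages simply print [].
    for name in ("scikit-learn", "tensorflow", "keras"):
        print(name, "->", declared_dependencies(name))
```

Every requirement listed this way is fetched automatically when the parent package is installed, and each of those fetches counts toward the dependency's own download total.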
1. Scikit-learn: 22.3M
The Python machine learning package with the most downloads by far is scikit-learn with over 22 million downloads in the last month. This high-level Python machine learning package has become a go-to for ML developers, thanks to its broad ability to handle regression, classification, clustering, and more.
2. TensorFlow: 11.5M
While TensorFlow received about half as many downloads as scikit-learn, this popular machine learning framework is still beloved by the data science community. Originally developed by Google, this open-source framework comes with strong support for machine learning and deep learning, and its flexible numerical computation core is used across many other scientific domains. Its companion visualization toolkit, TensorBoard, saw about 10M downloads.
3. Keras: 5.6M
Leading the way for deep learning, Keras saw over 5M downloads in the past month. Built on top of TensorFlow 2.0, Keras makes it easy for teams to scale their deep learning processes, deploy them anywhere, and connect to the greater TensorFlow 2.0 ecosystem.
4. PyTorch: 4.8M
With just under 5M downloads, PyTorch is gaining popularity thanks to its ease of use and ability to quickly scale from research to production and deployment. Initially designed by Facebook AI, PyTorch sees regular use in machine learning, deep learning, computer vision, and NLP.
5. ONNX: 1.2M
ONNX has been part of the Python machine learning package discussion more and more, and last month it saw 1.2M downloads. As an open-source ecosystem for both machine learning and deep learning, ONNX has been seeing increased use in NLP, working together with Hugging Face for smoother, streamlined NLP pipelines.
6. MXNet: 655K
Although it didn’t hit the 1M mark, Apache MXNet is still a valuable framework for deep learning engineers, as it’s designed for efficiency, productivity, and scalability across different machines.
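To put the figures above in perspective, here is a small sketch that expresses each package's downloads as a share of the leader's. The counts are the rounded numbers quoted in this list, not exact PyPI figures, and the helper name `relative_to_leader` is purely illustrative.

```python
# Rounded monthly download counts as quoted in the list above.
downloads = {
    "scikit-learn": 22_300_000,
    "TensorFlow": 11_500_000,
    "Keras": 5_600_000,
    "PyTorch": 4_800_000,
    "ONNX": 1_200_000,
    "MXNet": 655_000,
}


def relative_to_leader(counts):
    """Express each count as a fraction of the largest count."""
    leader = max(counts.values())
    return {name: count / leader for name, count in counts.items()}


ratios = relative_to_leader(downloads)

# Print packages from most to least downloaded.
for name, ratio in sorted(ratios.items(), key=lambda item: item[1], reverse=True):
    millions = downloads[name] / 1e6
    print(f"{name:<13}{millions:>6.2f}M  {ratio:6.1%} of the leader")
```

On these rounded numbers, TensorFlow lands at roughly half of scikit-learn's total, which matches the comparison made earlier.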
Conclusion on Trending Python Machine Learning Packages
This list isn’t exhaustive, as plenty of new Python machine learning packages are emerging. This data was taken from the month of September 2021, so next month and next year may look completely different as updates are released, new frameworks emerge, and compatibilities change.
However, much of the Python ecosystem will remain the same for the foreseeable future. The best way to make sure you’re prepared to use these packages, and any new ones that emerge, is still to consider training.
ODSC West 2021, a hybrid in-person and virtual event taking place this November 16th-18th, will feature dozens of sessions on Python where you can learn more about Python machine learning packages. With topics ranging from machine learning frameworks to implementing particular tools for business analytics, ODSC West will have something for everyone. Highlighted sessions include:
- Applications of Modern Survival Modeling with Python: Brian Kent, PhD | Data Scientist/Founder | The Crosstab Kite
- Build a Question Answering System using DistilBERT in Python: Jayeeta Putatunda | Data Scientist | MediaMath
- Identifying Deepfake Images and Videos Using Python with Keras: Noah Giansiracusa, PhD | Assistant Professor of Mathematics/Data Science | Bentley University
- Introduction to Scikit-learn: Machine learning in Python: Thomas Fan | Senior Software Engineer | Quansight Lab
- Introduction to DL-based Natural Language Processing using TensorFlow and PyTorch: Magnus Ekman, PhD | Director | NVIDIA