As investment in AI innovation ramps up, organizations are encountering headwinds from the high costs associated with AI development, a reality recently highlighted by Gartner. That explains why many are turning to AI packages: collections of tools designed to streamline AI and machine learning development, making solutions more cost-effective, efficient, and feasible.

In this post, we explore the 10 most popular AI packages of 2024, based on the Orca Research Pod's analysis of millions of cloud assets.

What are AI packages?

Put simply, AI packages make it easier, faster, and more cost-effective to build custom AI or machine learning (ML) applications. AI packages contain a collection of pre-developed modules and tools that enable developers to create, train, and deploy AI models without developing brand new routines. 

AI packages work together with AI services and AI models. AI services are offerings from cloud providers that allow businesses to use or build AI applications in their cloud environments. AI models, on the other hand, are programs designed to autonomously perform a specific task based on learned patterns and relationships. AI packages accelerate the training of AI models.

Most used AI packages 

Below, we review the 10 most popular AI packages used in cloud environments and present an overview of their features and benefits. 

#1. scikit-learn

Among the most popular AI packages, scikit-learn claims the top spot, with 43% of organizations deploying it in their cloud environments. The AI package is an open-source Python module built on top of the SciPy, NumPy, and matplotlib libraries. scikit-learn enables data scientists and engineers to create machine learning models.

First released in February 2010, scikit-learn offers a variety of tools to assist with common tasks such as classification, regression, clustering, model selection, and preprocessing. Simple and efficient, it offers a user-friendly interface and numerous machine learning algorithms, including both supervised and unsupervised methods.
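To make the preprocessing and classification tasks mentioned above concrete, here is a minimal sketch, not taken from the Orca report. It assumes scikit-learn is installed and uses the library's bundled iris dataset; the function name `train_iris_classifier` and the choice of logistic regression are illustrative assumptions only.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def train_iris_classifier(seed: int = 0):
    """Hypothetical helper: train a small classifier and report held-out accuracy."""
    # Load the small example dataset that ships with scikit-learn.
    X, y = load_iris(return_X_y=True)

    # Hold out a quarter of the samples to estimate generalization.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=seed, stratify=y
    )

    # Chain preprocessing (feature scaling) and a supervised classifier.
    model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=200))
    model.fit(X_train, y_train)

    # score() returns mean accuracy on the held-out split.
    return model, model.score(X_test, y_test)


model, accuracy = train_iris_classifier()
print(f"held-out accuracy: {accuracy:.2f}")
```

The pipeline keeps scaling and classification in one object, so the same preprocessing is applied at training and prediction time.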

#2. NLTK

NLTK (Natural Language Toolkit) takes second place in our list, with 38% of organizations deploying the package. NLTK is an open-source platform used to analyze text and develop natural language processing (NLP) applications in Python. 

NLTK features user-friendly interfaces to text resources, as well as language processing libraries for parsing, classification, semantic reasoning, tokenization, and more. It also offers extensive support documentation and an active online community, together making it popular among data scientists, linguists, engineers, and others. 
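As a rough illustration of the tokenization and text-analysis features described above, the sketch below counts word stems in a sentence. It assumes NLTK is installed and deliberately uses only components that need no extra corpus downloads (`wordpunct_tokenize`, `PorterStemmer`, `FreqDist`); the helper name `top_stems` is hypothetical.

```python
from nltk import FreqDist
from nltk.stem import PorterStemmer
from nltk.tokenize import wordpunct_tokenize

stemmer = PorterStemmer()


def top_stems(text: str, n: int = 3):
    """Hypothetical helper: return the n most frequent word stems in text."""
    # Split text into word and punctuation tokens, then keep alphabetic ones.
    tokens = [tok for tok in wordpunct_tokenize(text) if tok.isalpha()]

    # Reduce each word to its stem so that inflected forms are counted together.
    stems = [stemmer.stem(tok) for tok in tokens]

    # FreqDist counts occurrences; most_common returns (stem, count) pairs.
    return FreqDist(stems).most_common(n)


print(top_stems("Models train models; training a model trains it.", n=2))
```

Other NLTK tokenizers and taggers may require downloading data with `nltk.download`, which is why this sketch sticks to rule-based components.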

#3. PyTorch

PyTorch rounds out the top three most popular AI packages, with 31% of organizations deploying it. PyTorch is a popular open-source machine learning framework based on the Torch library—an open-source ML library. 

First developed by Meta AI and now part of the Linux Foundation, the PyTorch framework offers a Python interface on top of a largely C++ core and enables machine learning developers and data scientists to build deep learning models. PyTorch simplifies the creation of machine learning programs known as artificial neural networks. It remains a popular choice among data scientists for its rapid prototyping and support for an extensive range of deep learning models.
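The rapid prototyping mentioned above can be sketched with a very small training loop. This is an illustrative example only, assuming PyTorch is installed; the function name `fit_line`, the synthetic data (points on the line y = 3x + 0.5), and the hyperparameters are all assumptions chosen for brevity.

```python
import torch
from torch import nn


def fit_line(steps: int = 500, lr: float = 0.1, seed: int = 0):
    """Hypothetical helper: fit a one-neuron network to points on y = 3x + 0.5."""
    torch.manual_seed(seed)

    # Synthetic training data: 64 points on a known line.
    x = torch.linspace(-1, 1, 64).unsqueeze(1)
    y = 3.0 * x + 0.5

    # A single linear layer is the smallest possible neural network.
    model = nn.Linear(1, 1)
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()

    for _ in range(steps):
        optimizer.zero_grad()        # clear gradients from the previous step
        loss = loss_fn(model(x), y)  # forward pass and loss
        loss.backward()              # backpropagate gradients
        optimizer.step()             # update weight and bias

    return model.weight.item(), model.bias.item(), loss.item()


weight, bias, final_loss = fit_line()
print(f"weight={weight:.3f} bias={bias:.3f} loss={final_loss:.6f}")
```

Larger models follow the same pattern: define layers, pick an optimizer and loss, then repeat the forward, backward, and update steps.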

#4. TensorFlow

Next up is TensorFlow, with 26% of organizations deploying it. TensorFlow is an open-source library for machine learning and deep learning projects. 

Developed by Google, TensorFlow can run on a variety of common hardware platforms and operating environments. Organizations can use the platform to develop models for natural language processing, image recognition, computation-based simulations, and more. TensorFlow is a popular choice for its range of applications and ability to simplify the creation of ML and deep learning projects.

#5. Transformers 

Transformers, created by Hugging Face, takes fifth place with 23% of organizations using it. Transformers is a Python library that enables developers to download and customize pretrained models from an extensive database. 

Transformers provides APIs to download, use, fine-tune, and share the pretrained models, which can be used across multiple modalities or a combination of them. For example, these models can work with text (classification, information extraction, Q&A, etc.), images (classification, segmentation, object detection, etc.), and audio (classification, speech recognition, etc.). Transformers integrates with JAX, PyTorch, and TensorFlow.

#6. LangChain

Trailing closely behind Transformers, LangChain finishes sixth, with 22% of organizations using it. LangChain is an open-source framework that enables developers to build applications using large language models (LLMs).

LangChain provides various libraries, templates, and tools to help developers more easily and effectively create custom applications powered by LLMs. Its popularity comes from its flexibility, simplicity, and support. For example, it enables organizations to tailor existing LLMs to perform specific applications without the need for fine-tuning or retraining. It also offers access to extensive tools for developers and an active online community.

#7. CUDA

CUDA (Compute Unified Device Architecture) takes the seventh spot, with 20% of organizations deploying it. Created by NVIDIA, CUDA is a parallel computing platform designed to enable developers to create applications that run on NVIDIA graphics processing units (GPUs).

GPUs play an important role in AI development, accelerating the training of AI applications, their inference speed, and more. CUDA is a popular option for its scalability, ease of use, integrations, and support.

#8. Keras

With 19% of organizations using it, Keras lands in eighth place. Keras is an open-source neural network library that developers use to create deep learning models. 

Python-based and developed by Google, Keras is an API that works with other AI packages, such as TensorFlow, PyTorch, and JAX. While it can be slower than other packages, Keras provides a frontend that makes it easier to learn and use.

#9. PyTorch Lightning

Following closely behind, PyTorch Lightning takes ninth place with 18% of organizations deploying it. PyTorch Lightning is an open-source Python library that provides a frontend interface for PyTorch.

PyTorch Lightning is a framework for developing deep learning models. It organizes PyTorch code to simplify and enhance the development process, giving developers greater flexibility, better reproducibility and readability, and more.

#10. Streamlit

Streamlit rounds out the top 10, with 11% of organizations using it. Streamlit is an open-source Python-based library that enables data scientists and machine learning engineers to create dynamic data applications without expertise in web development. 

Streamlit offers fast prototyping of applications and the ability to edit scripts and see changes to the live application in real time.

2024 State of AI Security Report

Data about the top 10 AI packages was published in Orca’s 2024 State of AI Security Report. Beyond this data and usage statistics on AI services and AI models, the report reveals the top AI security risks discovered in actual production environments spanning AWS, Azure, Google Cloud, Oracle Cloud, and Alibaba Cloud. Read more in the State of AI Security blog or download the full report.

AI packages and the Orca Cloud Security Platform

Earlier this year, Orca unveiled its AI Security Posture Management (AI-SPM) solution, which provides full visibility into and comprehensive risk detection across more than 50 AI packages and models, including those explored in this post.

In addition to its AI-SPM capabilities, the Orca Platform also offers its AI-Driven Remediation and AI-Driven Search solutions. The former simplifies and streamlines remediation of critical risks, providing on-demand instructions and code tailored to your specific remediation solution and process. The latter makes it easier to understand risks and resources across your cloud environment by letting you ask questions in plain language. With the integration of GPT-4o in July 2024, Orca now supports plain language queries in more than 50 languages.

To learn more about how Orca can help you fortify your AI security in the cloud, schedule a personalized 1:1 demo.