This document lists MLOps technologies by category, with a brief description of each, compiled from sources including Neptune.ai, DataCamp, and Awesome MLOps (GitHub). Where relevant online courses or tutorials were found, they are listed below the technology.
Google Cloud Vertex AI: Unified platform for ML development, deployment, and management on Google Cloud.
Amazon SageMaker: Fully managed service for building, training, and deploying ML models at scale on AWS.
Azure Machine Learning: Cloud-based environment for training, deploying, automating, managing, and tracking ML models on Azure.
Databricks Lakehouse Platform: Unified platform combining data warehousing and AI use cases on a lakehouse architecture; listed here for its MLOps capabilities.
DataRobot: Enterprise AI platform automating the end-to-end process for building, deploying, and managing ML models.
Dataiku: Collaborative data science platform for building, deploying, and monitoring ML models.
Kubeflow: Open-source ML toolkit that aims to make deploying ML workflows on Kubernetes simple, portable, and scalable.
MLflow: Open-source platform to manage the ML lifecycle, including experimentation, reproducibility, deployment, and a central model registry.
ClearML: Open-source platform automating and simplifying MLOps, including experiment management, orchestration, and data management.
Valohai: MLOps platform focused on deep learning, providing machine orchestration and pipeline management.
cnvrg.io: Operating system for AI and ML development, from research to production (an Intel company).
Iguazio (acquired by McKinsey): Data science platform for automating MLOps with real-time performance.
Domino Data Lab: Enterprise MLOps platform for model development, deployment, and management.
H2O AI Cloud: End-to-end platform for building, deploying, and operating AI models and applications.
MLReef: Open-source MLOps platform for collaborating on, reproducing, and sharing ML work.
Modzy: Platform to deploy, connect, run, and monitor machine learning models.
Neptune.ai: Metadata store for MLOps, built for research and production teams that run experiments.
Weights & Biases (W&B): MLOps platform for experiment tracking, data/model versioning, and model management.
Comet ML: Platform for tracking, comparing, explaining, and optimizing ML models and experiments.
MLflow Tracking: Component of MLflow for logging parameters, code versions, metrics, and output files.
TensorBoard: Visualization toolkit for TensorFlow, also usable with PyTorch and others for visualizing experiment metrics and model graphs.
DagsHub: Platform for data science collaboration and project management, built on open-source tools like Git, DVC, and MLflow.
Guild AI: Open-source tool for running, tracking, and comparing machine learning experiments.
Aim: Open-source, self-hostable AI metadata tracking tool.
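To make the experiment-tracking entries above more concrete, here is a minimal sketch of what such tools record for each run: parameters, per-step metrics, and a serialized record that can be compared later. It uses only the Python standard library. The `Run` class and its methods are hypothetical names chosen for illustration and are not the API of MLflow, Neptune.ai, W&B, or any other tool listed here.

```python
import json
import time
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class Run:
    """Hypothetical record of one experiment run (not any real tool's API)."""
    name: str
    params: dict = field(default_factory=dict)
    metrics: dict = field(default_factory=dict)  # metric name -> list of (step, value)
    started_at: float = field(default_factory=time.time)

    def log_param(self, key, value):
        self.params[key] = value

    def log_metric(self, key, value, step):
        self.metrics.setdefault(key, []).append((step, value))

    def best(self, key, mode="min"):
        """Return the best logged value of a metric."""
        values = [value for _, value in self.metrics[key]]
        return min(values) if mode == "min" else max(values)

    def save(self, directory):
        """Write the run as JSON so that runs can be compared later."""
        path = Path(directory) / f"{self.name}.json"
        path.write_text(json.dumps({
            "name": self.name,
            "params": self.params,
            "metrics": self.metrics,
            "started_at": self.started_at,
        }, indent=2))
        return path
```

A training loop would call `log_param` once per hyperparameter and `log_metric` once per step or epoch, then `save` at the end. Real trackers add a server, a UI, and artifact storage on top of this basic record.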
DVC (Data Version Control): Open-source version control system for ML projects, handling large files, data sets, and models.
Pachyderm: Data versioning and pipeline platform built on Kubernetes, providing data lineage.
Delta Lake: Open-source storage layer bringing ACID transactions to Apache Spark and big data workloads.
Dolt: SQL database that you can fork, clone, branch, merge, push, and pull like a Git repository.
lakeFS: Open-source platform providing Git-like operations (versioning, branching, merging) for data lakes.
Git LFS (Large File Storage): Git extension for versioning large files.
Hub (Activeloop, since renamed Deep Lake): Dataset format for AI, enabling rapid streaming and management of large datasets.
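The data versioning tools above share one underlying idea: identify each version of a large file by a hash of its content, keep the bytes in a separate store, and track only the small hash in source control. The sketch below illustrates that idea with the standard library. The function names and the cache layout are assumptions made for this example; they are not the actual implementation or on-disk format of DVC, Git LFS, or lakeFS.

```python
import hashlib
import shutil
from pathlib import Path


def add_to_cache(data_file, cache_dir):
    """Copy a file into a content-addressed cache and return its digest."""
    data_file, cache_dir = Path(data_file), Path(cache_dir)
    digest = hashlib.sha256(data_file.read_bytes()).hexdigest()
    target = cache_dir / digest[:2] / digest[2:]
    if not target.exists():  # identical content is stored only once
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copyfile(data_file, target)
    return digest


def checkout(digest, cache_dir, destination):
    """Restore the file version identified by digest to destination."""
    source = Path(cache_dir) / digest[:2] / digest[2:]
    shutil.copyfile(source, destination)
```

In this sketch the returned digest plays the role of the small pointer that would be committed to Git, while the cache directory could live on shared or remote storage.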
Labelbox: Training data platform for creating and managing labeled data for AI applications.
Scale AI: Data platform providing labeled training data for AI applications.
Amazon SageMaker Ground Truth: Data labeling service within SageMaker for building highly accurate training datasets.
Appen: Platform providing data sourcing, annotation, and model evaluation for AI.
SuperAnnotate: Platform to build datasets, automate annotation, and manage data pipelines.
Label Studio: Open-source data labeling tool supporting various data types.
Snorkel AI: Programmatic data labeling platform using weak supervision.
Great Expectations: Open-source tool for data validation, documentation, and profiling.
TFDV (TensorFlow Data Validation): Library for analyzing and validating ML data, part of TensorFlow Extended (TFX).
Deequ: Library built on Apache Spark for defining, measuring, and monitoring data quality.
Soda Core: Open-source framework for data reliability engineering and data quality management.
Pandera: Data validation library for pandas dataframes.
Cleanlab: Python library for finding and fixing errors in datasets and ML models.
WhyLabs / whylogs: Open-source standard for data logging and AI observability, enabling monitoring for data quality and drift.
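The data validation libraries above let you declare expectations about a dataset and then report which records violate them. A minimal standard-library sketch of that pattern follows. The `Check` class, the `validate` function, and the sample rows are hypothetical and do not reflect the API of Great Expectations, Pandera, or the other tools listed.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Check:
    """Hypothetical declarative rule applied to one column."""
    column: str
    description: str
    predicate: Callable[[object], bool]


def validate(rows, checks):
    """Return (row_index, column, description) for every failed check.

    A missing or None value counts as a failure in this sketch.
    """
    failures = []
    for index, row in enumerate(rows):
        for check in checks:
            value = row.get(check.column)
            if value is None or not check.predicate(value):
                failures.append((index, check.column, check.description))
    return failures


checks = [
    Check("age", "age must be between 0 and 120", lambda v: 0 <= v <= 120),
    Check("email", "email must contain '@'", lambda v: "@" in v),
]
rows = [
    {"age": 34, "email": "a@example.com"},
    {"age": -5, "email": "b@example.com"},
    {"age": 51, "email": None},
]
failures = validate(rows, checks)
```

Here `failures` identifies the second row (negative age) and the third row (missing email). Production tools typically add profiling, generated documentation, and integration with dataframes or Spark.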
Feast: Open-source feature store for managing, discovering, and serving features for ML models.
Tecton: Enterprise feature platform for automating the full lifecycle of features for operational ML.
Amazon SageMaker Feature Store: Fully managed repository to store, share, and manage features for ML models within SageMaker.
Google Cloud Vertex AI Feature Store: Managed service for storing, serving, managing, and sharing ML features on Google Cloud.
Databricks Feature Store: Integrated feature store within the Databricks platform.
Hopsworks: Platform with open-source and proprietary components, built around a feature store for ML.
Molecula (since rebranded as FeatureBase): Feature store focused on real-time feature computation and serving.
Apache Airflow: Open-source platform to programmatically author, schedule, and monitor workflows.
Kubeflow Pipelines: Platform for building and deploying portable, scalable ML workflows based on Docker containers on Kubernetes.
Prefect: Modern data stack orchestration tool for building, running, and monitoring data pipelines.
Dagster: Data orchestrator for developing and maintaining data assets, focusing on the entire development lifecycle.
Courses/Tutorials:
Transform Data into Insights with Dagster and Deepnote (Udemy)
Dagster vs Airflow: Comparing Top Data Orchestration Tools for... (DataCamp Blog)
Data Pipeline Automation and Orchestration with Python (Pluralsight - General Orchestration)
What is Dagster? Asset Based Orchestration [2hr full course] (YouTube)
Metaflow: Human-friendly Python library for building and managing real-life data science projects.
Kedro: Python framework for creating reproducible, maintainable, and modular data science code.
Argo Workflows: Open-source container-native workflow engine for orchestrating parallel jobs on Kubernetes.
Flyte: Kubernetes-native workflow automation platform for complex, mission-critical data and ML processes.
ZenML: Extensible, open-source MLOps framework to create production-ready ML pipelines.
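The orchestration tools above all model a workflow as a directed acyclic graph of tasks and run each task only after its upstream tasks have finished. The sketch below shows that core idea using `graphlib.TopologicalSorter` from the Python standard library (Python 3.9 or later). The `run_pipeline` function and the three toy tasks are hypothetical; real orchestrators add scheduling, retries, parallelism, and monitoring.

```python
from graphlib import TopologicalSorter


def run_pipeline(tasks, dependencies):
    """Run each task after all of its upstream tasks have finished.

    tasks: task name -> callable that receives a dict of upstream results
    dependencies: task name -> set of upstream task names
    """
    results = {}
    order = []
    for name in TopologicalSorter(dependencies).static_order():
        upstream = {dep: results[dep] for dep in dependencies.get(name, ())}
        results[name] = tasks[name](upstream)
        order.append(name)
    return results, order


# Toy tasks, for illustration only.
tasks = {
    "extract": lambda up: [1, 2, 3, 4],
    "transform": lambda up: [x * 10 for x in up["extract"]],
    "train": lambda up: sum(up["transform"]) / len(up["transform"]),
}
dependencies = {
    "extract": set(),
    "transform": {"extract"},
    "train": {"transform"},
}
results, order = run_pipeline(tasks, dependencies)
```

If the dependency graph contains a cycle, `TopologicalSorter` raises `graphlib.CycleError`, which is the same class of error an orchestrator reports for an invalid pipeline definition.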
Seldon Core: Open-source platform for deploying ML models on Kubernetes at scale.
KServe (formerly KFServing): Standardized model inference platform on Kubernetes, built for serverless inference.
BentoML: Open-source framework for building reliable, scalable, and cost-efficient AI applications.
TensorFlow Serving (TF Serving): Flexible, high-performance serving system for ML models, designed for production environments.
TorchServe: Flexible and easy-to-use tool for serving PyTorch models.
NVIDIA Triton Inference Server: Open-source inference serving software that streamlines AI inferencing.
Ray Serve: Scalable and programmable serving library built on Ray for deploying ML models.
Cortex: Open-source platform for deploying, managing, and scaling ML models in production.
Bodywork: MLOps tool that deploys ML projects from Git repos onto Kubernetes.
Arize AI: ML observability platform for monitoring models in production, troubleshooting issues, and improving performance.
Fiddler AI: Model Performance Management platform providing explainability and monitoring for models in production.
Evidently AI: Open-source Python library for evaluating, testing, and monitoring ML models in production.
Arthur: ML performance monitoring platform ensuring model fairness, explainability, and performance.
Grafana: Open-source platform for monitoring and observability, often used with Prometheus for infrastructure and application metrics.
Prometheus: Open-source systems monitoring and alerting toolkit.
WhyLabs: AI observability platform built on the whylogs open-source standard for monitoring data and models.
Superwise: Model observability platform for monitoring, analyzing, and optimizing ML models in production.
Aporia: Full-stack ML observability platform.
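One thing the monitoring platforms above commonly check is whether production inputs have drifted away from the data a model was trained on. As an illustration, the sketch below computes the Population Stability Index (PSI), one widely used drift statistic, for a single numeric feature. This is a technique chosen for the example, not a method attributed to any listed tool, and the equal-width binning and the small floor for empty bins are assumptions of this sketch.

```python
import math


def population_stability_index(expected, actual, bins=10):
    """PSI between a reference sample and a production sample of one numeric feature."""
    low, high = min(expected), max(expected)
    width = (high - low) / bins or 1.0  # fall back to 1.0 for constant reference data

    def proportions(values):
        counts = [0] * bins
        for value in values:
            # Clamp so values outside the reference range land in the edge bins.
            index = int((value - low) / width)
            counts[min(max(index, 0), bins - 1)] += 1
        # A small floor keeps the logarithm defined for empty bins.
        return [max(count / len(values), 1e-6) for count in counts]

    p, q = proportions(expected), proportions(actual)
    return sum((qi - pi) * math.log(qi / pi) for pi, qi in zip(p, q))
```

Identical samples give a PSI of zero, and the value grows as the two distributions diverge. The alerting threshold is application-specific and should be chosen from historical data.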
Deepchecks: Open-source Python package for testing and validating ML models and data.
Giskard: Open-source testing framework dedicated to ML models, from tabular to LLMs.
Robust Intelligence: Platform for ML integrity, providing testing and validation against security and operational risks.
AI Fairness 360 (AIF360): Open-source library with metrics to check for unwanted bias and algorithms to mitigate bias.
Fairlearn: Open-source Python package to assess and improve the fairness of ML models.
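As an example of the kind of bias metric the fairness libraries above provide, the sketch below computes the gap in positive-prediction rate between groups, often called the demographic parity difference. It is a standard-library illustration with made-up predictions and group labels, not the implementation used by AIF360 or Fairlearn.

```python
def demographic_parity_difference(predictions, groups):
    """Return (largest gap in positive-prediction rate, per-group rates)."""
    by_group = {}
    for prediction, group in zip(predictions, groups):
        by_group.setdefault(group, []).append(prediction)
    rates = {group: sum(values) / len(values) for group, values in by_group.items()}
    return max(rates.values()) - min(rates.values()), rates


# Hypothetical binary predictions for two groups of four people each.
predictions = [1, 1, 1, 0, 1, 0, 0, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
gap, rates = demographic_parity_difference(predictions, groups)
```

A gap of zero means every group receives positive predictions at the same rate. Whether that is the right fairness criterion depends on the application, and the libraries above offer several alternatives.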
SHAP (SHapley Additive exPlanations): Game-theoretic approach to explaining the output of any machine learning model.
LIME (Local Interpretable Model-agnostic Explanations): Technique explaining the predictions of any classifier in an interpretable manner.
Alibi Explain: Open-source Python library focused on ML model inspection and interpretation.
InterpretML: Open-source package incorporating state-of-the-art machine learning interpretability techniques.
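The game-theoretic idea behind SHAP can be shown on a tiny model by computing exact Shapley values: each feature's attribution is its average marginal contribution over all orderings in which features are switched from a baseline value to the instance's value. The sketch below is a brute-force illustration whose cost grows factorially with the number of features. It is not the `shap` library's API, which relies on much faster approximations, and the toy model is an assumption of this example.

```python
from itertools import permutations


def shapley_values(predict, instance, baseline):
    """Exact Shapley values, averaged over all feature orderings."""
    n = len(instance)
    totals = [0.0] * n
    orderings = list(permutations(range(n)))
    for ordering in orderings:
        current = list(baseline)
        previous = predict(current)
        for feature in ordering:
            current[feature] = instance[feature]  # reveal this feature's real value
            value = predict(current)
            totals[feature] += value - previous
            previous = value
    return [total / len(orderings) for total in totals]


def model(x):
    # Toy model with an interaction term, used only for illustration.
    return 2 * x[0] + 3 * x[1] + x[0] * x[2]


instance = [1.0, 2.0, 4.0]
baseline = [0.0, 0.0, 0.0]
values = shapley_values(model, instance, baseline)
```

The attributions always sum to the difference between the prediction for the instance and the prediction for the baseline, and the interaction term between the first and third features is split equally between them.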
TensorFlow Privacy: Python library including implementations of commonly used privacy-enhancing techniques.
PySyft: Open-source library for secure and private deep learning.
OpenDP: Open-source project developing tools for privacy-preserving statistical analysis.
Kubernetes: Open-source system for automating deployment, scaling, and management of containerized applications.
Docker: Platform for developing, shipping, and running applications in containers.
Ray: Open-source framework providing a simple, universal API for building distributed applications.
Run:ai: Platform for AI infrastructure orchestration and management, optimizing GPU resource utilization.
Determined AI (acquired by HPE): Open-source deep learning training platform with experiment tracking, resource management, and hyperparameter tuning.
LangChain: Framework for developing applications powered by language models.
Qdrant: Open-source vector similarity search engine and vector database.
Pinecone: Managed vector database for high-performance similarity search.
Weaviate: Open-source vector database.
Milvus: Open-source vector database for embedding similarity search and AI applications.
Chroma: Open-source embedding database.
LlamaIndex: Data framework for LLM applications to ingest, structure, and access private or domain-specific data.
Haystack: Open-source framework for building applications with LLMs and Transformers.
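The vector databases above store embeddings and return the items most similar to a query embedding. The sketch below shows the basic operation as a brute-force cosine-similarity search over an in-memory dictionary. The document IDs and three-dimensional vectors are made up for illustration; production systems generally use approximate nearest-neighbor indexes rather than scanning every vector, and none of the function names here belong to a listed product.

```python
import math


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query, index, k=2):
    """Return the k (id, score) pairs most similar to the query, best first."""
    scored = [(doc_id, cosine_similarity(query, vector)) for doc_id, vector in index.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Hypothetical embeddings; real ones come from an embedding model.
index = {
    "doc-a": [1.0, 0.0, 0.0],
    "doc-b": [0.9, 0.1, 0.0],
    "doc-c": [0.0, 1.0, 0.0],
}
results = top_k([1.0, 0.0, 0.0], index, k=2)
```

In a retrieval-augmented application built with a framework such as LangChain, LlamaIndex, or Haystack, the returned IDs would be used to fetch the matching text passages and pass them to the language model.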
AutoGluon: AutoML toolkit for deep learning, focusing on image, text, and tabular data.
Auto-Sklearn: Automated machine learning toolkit and a drop-in replacement for a scikit-learn estimator.
TPOT: Python Automated Machine Learning tool that optimizes ML pipelines using genetic programming.
H2O AutoML: Automates the ML workflow, including automatic training and tuning of models within the H2O platform.
FLAML: Fast and Lightweight AutoML library.
NNI (Neural Network Intelligence): Microsoft's open-source AutoML toolkit.
CML (Continuous Machine Learning): Open-source library for implementing CI/CD in ML projects using GitHub Actions or GitLab CI.
Jenkins: Open-source automation server widely used for CI/CD pipelines.
GitLab CI/CD: Integrated CI/CD capabilities within the GitLab platform.
GitHub Actions: CI/CD platform integrated within GitHub.
CircleCI: Cloud-based CI/CD platform.
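A common way to apply the CI/CD tools above to ML projects is a quality gate: a script that runs after training in the pipeline and fails the job when a model metric falls below an agreed minimum. The following is a minimal sketch under stated assumptions. The metrics file format (a flat JSON object of metric names to numbers), the function names, and the thresholds are all hypothetical and not part of CML or any CI platform.

```python
import json
import sys


def failed_gates(metrics, minimums):
    """Return a message for every metric that is missing or below its minimum."""
    problems = []
    for name, minimum in minimums.items():
        value = metrics.get(name)
        if value is None:
            problems.append(f"{name}: missing from metrics")
        elif value < minimum:
            problems.append(f"{name}: {value} is below the minimum {minimum}")
    return problems


def main(metrics_path, minimums):
    """Return 1 on failure; a non-zero exit code fails the job in most CI systems."""
    with open(metrics_path) as handle:
        metrics = json.load(handle)
    problems = failed_gates(metrics, minimums)
    for problem in problems:
        print(problem, file=sys.stderr)
    return 1 if problems else 0
```

A CI job could call `sys.exit(main("metrics.json", {"accuracy": 0.9}))` as its final step, where the file name and threshold are placeholders to adapt to the project.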