Overview
ShannonBase AutoML is a built-in, database-native automated machine learning subsystem designed to make advanced analytics and predictive modeling a first-class capability of the SQL engine. Unlike external ML pipelines, AutoML is tightly integrated with ShannonBase’s execution layer, metadata system, and storage engine, enabling models to be trained, managed, and executed directly inside the database.
The primary goals of ShannonBase AutoML are: (1) eliminate data movement between database and ML systems; (2) lower the barrier to applying machine learning using SQL; (3) automatically select efficient training and execution paths based on workload and data characteristics; and (4) provide predictable, production-ready ML inference with low operational overhead.
Conceptually, AutoML acts as an intelligent layer that bridges relational data processing and model-driven computation, transforming ShannonBase into an AI-native SQL engine rather than a traditional database with bolt-on ML features.
Core Components
AutoML Manager
The AutoML Manager is the central coordinator responsible for orchestrating the full lifecycle of machine learning models inside ShannonBase. It operates as a long-lived service within the database engine and integrates deeply with the SQL parser, optimizer, and execution framework. Its main responsibilities are:
- Managing model metadata and lifecycle (train, load, unload, drop)
- Scheduling and executing training jobs inside the database
- Selecting an appropriate execution backend: CPU, vectorized engine, or the in-memory column store (IMCS)
- Coordinating inference during SQL query execution
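The lifecycle operations listed above (train, load, unload, drop) can be pictured as a small state machine. The sketch below is a hypothetical illustration, not ShannonBase's implementation: the state names, the `ModelLifecycle` class, and the rule that dropping a loaded model implicitly unloads it are all assumptions made for the example.

```python
from enum import Enum, auto


class ModelState(Enum):
    """Hypothetical lifecycle states (not taken from the ShannonBase source)."""
    ABSENT = auto()   # no model registered under this handle
    TRAINED = auto()  # artifact persisted, not resident in memory
    LOADED = auto()   # resident in memory and usable for inference
    DROPPED = auto()  # removed from the registry


# Allowed (operation, current state) -> next state transitions.
TRANSITIONS = {
    ("train", ModelState.ABSENT): ModelState.TRAINED,
    ("load", ModelState.TRAINED): ModelState.LOADED,
    ("unload", ModelState.LOADED): ModelState.TRAINED,
    ("drop", ModelState.TRAINED): ModelState.DROPPED,
    ("drop", ModelState.LOADED): ModelState.DROPPED,  # assumed implicit unload
}


class ModelLifecycle:
    """Tracks the lifecycle state of each model handle."""

    def __init__(self) -> None:
        self.states: dict[str, ModelState] = {}

    def apply(self, handle: str, op: str) -> ModelState:
        current = self.states.get(handle, ModelState.ABSENT)
        nxt = TRANSITIONS.get((op, current))
        if nxt is None:
            raise ValueError(f"cannot {op} model {handle!r} in state {current.name}")
        if nxt is ModelState.DROPPED:
            # Forget the handle entirely so it can be trained again later.
            del self.states[handle]
        else:
            self.states[handle] = nxt
        return nxt
```

Rejecting invalid transitions up front (for example, loading a model that was never trained) is one plausible way a coordinator could keep its registry consistent.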
Key global state:
- Model registry and versioned metadata
- Feature schema and column mappings
- Training and inference execution statistics
- Loaded model cache and memory usage tracking
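To make the "loaded model cache and memory usage tracking" item concrete, here is a minimal sketch of a cache that keeps loaded models within a memory budget and evicts the least recently used model when space runs out. The LRU policy, the `LoadedModelCache` name, and the byte-based budget are assumptions; this document does not say which eviction policy ShannonBase uses, if any.

```python
from collections import OrderedDict


class LoadedModelCache:
    """Hypothetical LRU cache of loaded models bounded by a memory budget in bytes."""

    def __init__(self, budget_bytes: int) -> None:
        self.budget = budget_bytes
        self.used = 0
        self.evictions = 0
        self._models: OrderedDict = OrderedDict()  # handle -> (model, size_bytes)

    def load(self, handle: str, model: object, size_bytes: int) -> None:
        if size_bytes > self.budget:
            raise MemoryError(f"model {handle!r} ({size_bytes} B) exceeds the budget")
        if handle in self._models:
            self.unload(handle)  # reloading replaces the old copy
        # Evict least-recently-used models until the new one fits.
        while self.used + size_bytes > self.budget:
            victim = next(iter(self._models))
            self.unload(victim)
            self.evictions += 1
        self._models[handle] = (model, size_bytes)
        self.used += size_bytes

    def get(self, handle: str) -> object:
        model, _ = self._models[handle]  # raises KeyError if not loaded
        self._models.move_to_end(handle)  # mark as most recently used
        return model

    def unload(self, handle: str) -> None:
        _, size = self._models.pop(handle)
        self.used -= size

    def __contains__(self, handle: str) -> bool:
        return handle in self._models
```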
Model Metadata and Feature Mapping
Each AutoML model maintains a structured metadata description that binds relational columns to model features. This allows ShannonBase to automatically validate schemas, handle column evolution, and ensure that inference remains correct even as tables change over time. The metadata records:
- Feature list and data types
- Label definition and task type (classification / regression)
- Training parameters and hyperparameters
- Model format (e.g. LightGBM, ONNX)
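The schema validation described above can be sketched as a check of the recorded feature columns against a table's current columns. This is a hypothetical Python illustration: `ModelMetadata`, its field names, and the rule that extra columns are harmless while missing or retyped feature columns are errors are assumptions chosen to show the idea, not ShannonBase's actual metadata format.

```python
from dataclasses import dataclass, field


@dataclass
class ModelMetadata:
    """Hypothetical metadata record; field names are illustrative only."""
    name: str
    task: str                   # "classification" or "regression"
    label: str                  # target column
    features: dict[str, str]    # feature column -> SQL type seen at training time
    model_format: str = "LightGBM"
    hyperparams: dict[str, object] = field(default_factory=dict)


def validate_schema(meta: ModelMetadata, table_columns: dict[str, str]) -> list[str]:
    """Return a list of problems; an empty list means the table is compatible.

    Extra columns are ignored, so adding columns to a table does not invalidate
    the model; dropping or retyping a feature column does.
    """
    problems = []
    for col, expected in meta.features.items():
        actual = table_columns.get(col)
        if actual is None:
            problems.append(f"missing feature column {col!r}")
        elif actual.upper() != expected.upper():
            problems.append(f"column {col!r} is {actual}, model expects {expected}")
    return problems
```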
SQL-Native Workflow
ShannonBase AutoML exposes its functionality entirely through SQL extensions, allowing users to train and apply models using familiar database semantics.
- ML_TRAIN: trains a model directly from SQL query results
- ML_LOAD: loads a trained model into the execution engine
- ML_PREDICT: performs inference as part of a SQL query
- ...
During query execution, prediction functions are treated as first-class operators and can participate in query planning, vectorized execution, and pushdown into the Rapid Engine when applicable.
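Treating prediction as an operator means it can be composed with scans and filters like any relational operator. The sketch below models a pipeline roughly analogous to a query that filters rows on a predicted value: a scan emits columnar batches, a hypothetical `predict` operator appends a prediction column, and a filter keeps the matching rows. The operator names, the batch layout, and the toy threshold model are all illustrative assumptions, not ShannonBase internals.

```python
from typing import Callable, Iterable, Iterator

Batch = dict[str, list]  # column name -> values for the rows in this batch


def scan(rows: list[dict], batch_size: int) -> Iterator[Batch]:
    """Emit the input rows as columnar batches."""
    for start in range(0, len(rows), batch_size):
        chunk = rows[start:start + batch_size]
        yield {col: [r[col] for r in chunk] for col in chunk[0]}


def predict(child: Iterable[Batch], model: Callable[[Batch], list], out_col: str) -> Iterator[Batch]:
    """ML operator: scores a whole batch at once and appends a prediction column."""
    for batch in child:
        yield {**batch, out_col: model(batch)}


def filter_(child: Iterable[Batch], col: str, keep: Callable[[object], bool]) -> Iterator[Batch]:
    """Relational filter: keeps only rows whose value in `col` satisfies `keep`."""
    for batch in child:
        idx = [i for i, v in enumerate(batch[col]) if keep(v)]
        yield {c: [vals[i] for i in idx] for c, vals in batch.items()}
```

Because `predict` consumes and produces the same batch type as `scan` and `filter_`, a planner is free to reorder or fuse it with neighbouring operators like any other node in the plan.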
Execution and Optimization
AutoML models are executed using a tightly optimized inference path. Depending on the model type and deployment configuration, inference may run: (1) inside the MySQL execution engine; (2) within Rapid’s columnar execution layer; or (3) using vectorized or SIMD-accelerated operators.
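The benefit of batch inference over columnar data comes from the memory access pattern. This minimal sketch uses a linear scoring function as a stand-in for a real model such as LightGBM, and contrasts per-row scoring with a columnar loop that walks one contiguous feature column at a time. Pure Python cannot demonstrate the SIMD speedup itself; the function names and the linear model are assumptions for illustration only.

```python
from array import array


def score_row_at_a_time(rows: list[dict], weights: dict[str, float], bias: float) -> list[float]:
    """One model evaluation per row: each call gathers that row's features."""
    return [bias + sum(w * row[f] for f, w in weights.items()) for row in rows]


def score_columnar(columns: dict[str, array], weights: dict[str, float], bias: float) -> array:
    """One pass per feature column over a contiguous buffer of doubles.

    The inner loop touches a single column at a time, which is the access
    pattern a vectorized or SIMD kernel would exploit in a native engine.
    """
    n = len(next(iter(columns.values())))
    out = array("d", [bias]) * n  # one accumulator per row, initialised to the bias
    for feature, w in weights.items():
        col = columns[feature]
        for i in range(n):
            out[i] += w * col[i]
    return out
```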
The optimizer can reason about ML operators similarly to relational operators, enabling:
- Predicate pushdown with ML predictions
- Batch inference over columnar data
- Cost-aware model execution planning
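Cost-aware model execution planning can be illustrated with a toy cost model in which each backend has a fixed startup cost and a per-row cost, and the planner picks the cheapest available backend. The `Backend` class, the cost figures, and the linear cost formula are hypothetical; ShannonBase's real cost model is not described in this document.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Backend:
    """Hypothetical inference backend with a simple linear cost model."""
    name: str
    startup_cost: float    # fixed cost per query (e.g. batch setup), arbitrary units
    per_row_cost: float    # marginal cost per predicted row, arbitrary units
    available: bool = True  # e.g. False if the table is not loaded into the column store


def estimated_cost(backend: Backend, est_rows: int) -> float:
    return backend.startup_cost + backend.per_row_cost * est_rows


def choose_backend(backends: list[Backend], est_rows: int) -> Backend:
    """Pick the available backend with the lowest estimated total cost."""
    candidates = [b for b in backends if b.available]
    if not candidates:
        raise RuntimeError("no inference backend available")
    return min(candidates, key=lambda b: estimated_cost(b, est_rows))
```

In this sketch, `est_rows` would be the estimated cardinality reaching the ML operator, so pushing selective relational predicates below the prediction lowers the estimate and can change which backend wins: small inputs favour the backend with no startup cost, large ones favour the batch-oriented backends.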
Summary
ShannonBase AutoML elevates machine learning to a native database capability, eliminating the traditional boundary between data storage and intelligent computation. By embedding model training and inference directly into the SQL engine, it enables: (1) zero-copy ML workflows; (2) simplified production deployment; and (3) adaptive, high-performance analytics at scale.
In essence, AutoML turns ShannonBase into a unified system where data, queries, and models coexist and evolve together, laying the foundation for a truly AI-native database platform.