Overview

ShannonBase AutoML is a built-in, database-native automated machine learning subsystem designed to make advanced analytics and predictive modeling a first-class capability of the SQL engine. Unlike external ML pipelines, AutoML is tightly integrated with ShannonBase’s execution layer, metadata system, and storage engine, enabling models to be trained, managed, and executed directly inside the database.

The primary goals of ShannonBase AutoML are: (1) eliminate data movement between database and ML systems; (2) lower the barrier to applying machine learning using SQL; (3) automatically select efficient training and execution paths based on workload and data characteristics; and (4) provide predictable, production-ready ML inference with low operational overhead.

Conceptually, AutoML acts as an intelligent layer that bridges relational data processing and model-driven computation, transforming ShannonBase into an AI-native SQL engine rather than a traditional database with bolt-on ML features.

Core Components

AutoML Manager

The AutoML Manager is the central coordinator responsible for orchestrating the full lifecycle of machine learning models inside ShannonBase. It operates as a long-lived service within the database engine and integrates deeply with the SQL parser, optimizer, and execution framework. Its responsibilities include:

Managing model metadata and lifecycle (train, load, unload, drop)
Scheduling and executing training jobs inside the database
Selecting an appropriate execution backend (CPU, vectorized engine, or IMCS, the in-memory column store)
Coordinating inference during SQL query execution
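The lifecycle operations listed above (train, load, unload, drop) imply an ordering: a model must be trained before it can be loaded, and must be loaded before it can serve inference. The sketch below models that ordering as a small state machine. The class names, the set of states, and the exact transitions are hypothetical illustrations, not ShannonBase's internal implementation.

```python
from enum import Enum, auto


class ModelState(Enum):
    TRAINED = auto()   # artifact persisted, not resident in memory
    LOADED = auto()    # resident in memory and usable for inference
    DROPPED = auto()   # metadata and artifact removed


# Hypothetical transition table: operation -> (allowed source states, target state).
TRANSITIONS = {
    "load": ({ModelState.TRAINED}, ModelState.LOADED),
    "unload": ({ModelState.LOADED}, ModelState.TRAINED),
    "drop": ({ModelState.TRAINED, ModelState.LOADED}, ModelState.DROPPED),
}


class ModelLifecycle:
    def __init__(self):
        self.states: dict[str, ModelState] = {}

    def train(self, handle: str) -> None:
        # Training registers a model in TRAINED state; a dropped handle may be reused.
        current = self.states.get(handle)
        if current is not None and current is not ModelState.DROPPED:
            raise ValueError(f"model {handle!r} already exists")
        self.states[handle] = ModelState.TRAINED

    def apply(self, op: str, handle: str) -> ModelState:
        # Reject operations that are not valid from the model's current state.
        allowed, target = TRANSITIONS[op]
        current = self.states.get(handle)
        if current not in allowed:
            raise ValueError(f"cannot {op} model {handle!r} in state {current}")
        self.states[handle] = target
        return target

    def can_predict(self, handle: str) -> bool:
        # Only loaded models may be used by a prediction operator.
        return self.states.get(handle) is ModelState.LOADED
```

Keeping the transitions in one table makes illegal sequences (for example, predicting with an unloaded model or unloading twice) fail early with a clear error instead of surfacing mid-query.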

Key global state maintained by the Manager:

Model registry and versioned metadata
Feature schema and column mappings
Training and inference execution statistics
Loaded model cache and memory usage tracking
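One plausible way to combine a loaded-model cache with memory usage tracking is a least-recently-used (LRU) cache bounded by a byte budget. The following is a minimal sketch under that assumption; the eviction policy, the `ModelCache` and `LoadedModel` names, and the size estimates are hypothetical and not taken from ShannonBase.

```python
from collections import OrderedDict
from dataclasses import dataclass


@dataclass
class LoadedModel:
    handle: str
    version: int
    size_bytes: int  # estimated in-memory footprint of the model


class ModelCache:
    """LRU cache of loaded models bounded by a memory budget (hypothetical policy)."""

    def __init__(self, budget_bytes: int):
        self.budget_bytes = budget_bytes
        self.used_bytes = 0
        self._entries: "OrderedDict[str, LoadedModel]" = OrderedDict()

    def get(self, handle: str):
        model = self._entries.get(handle)
        if model is not None:
            self._entries.move_to_end(handle)  # mark as most recently used
        return model

    def put(self, model: LoadedModel) -> list[str]:
        """Insert or replace a model, evicting LRU entries; returns evicted handles."""
        if model.size_bytes > self.budget_bytes:
            raise MemoryError(f"model {model.handle!r} exceeds cache budget")
        if model.handle in self._entries:
            # Replacing a version: release the old entry's memory first.
            self.used_bytes -= self._entries.pop(model.handle).size_bytes
        evicted = []
        while self.used_bytes + model.size_bytes > self.budget_bytes:
            old_handle, old = self._entries.popitem(last=False)  # oldest first
            self.used_bytes -= old.size_bytes
            evicted.append(old_handle)
        self._entries[model.handle] = model
        self.used_bytes += model.size_bytes
        return evicted
```

Returning the evicted handles lets the caller update lifecycle state (for example, marking those models as no longer loaded) in the same step.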

Model Metadata and Feature Mapping

Each AutoML model maintains a structured metadata description that binds relational columns to model features. This allows ShannonBase to validate input schemas automatically, handle column evolution, and keep inference correct as tables change over time. The metadata records:

Feature list and data types
Label definition and task type (classification / regression)
Training parameters and hyperparameters
Model format (e.g. LightGBM, ONNX)
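The schema check described above can be sketched as a comparison between the stored feature list and the current table columns. This is an illustrative assumption about how such validation might work: the `ModelMetadata` fields and the rule that extra columns are ignored are hypothetical, chosen to show how added columns need not break an existing model while dropped or retyped feature columns do.

```python
from dataclasses import dataclass, field


@dataclass
class ModelMetadata:
    features: dict[str, str]  # feature column name -> SQL type
    label: str
    task: str                 # "classification" or "regression"
    model_format: str = "LightGBM"
    hyperparams: dict = field(default_factory=dict)


def validate_schema(meta: ModelMetadata, table_columns: dict[str, str]) -> list[str]:
    """Return a list of problems; an empty list means inference can proceed."""
    problems = []
    for name, expected in meta.features.items():
        actual = table_columns.get(name)
        if actual is None:
            problems.append(f"missing feature column {name!r}")
        elif actual.upper() != expected.upper():
            problems.append(f"column {name!r} is {actual}, model expects {expected}")
    # Columns the model never saw (e.g. added after training) are ignored.
    return problems
```

For example, a table that gained a `city` column after training still validates cleanly, whereas a table that dropped `income` reports it as missing.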

SQL-Native Workflow

ShannonBase AutoML exposes its functionality entirely through SQL extensions, allowing users to train and apply models using familiar database semantics.

ML_TRAIN: trains a model directly from SQL query results
ML_LOAD: loads a trained model into the execution engine
ML_PREDICT: performs inference as part of a SQL query
...

During query execution, prediction functions are treated as first-class operators and can participate in query planning, vectorized execution, and pushdown into the Rapid Engine when applicable.
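Treating prediction as an operator means it can sit in a pipeline between a scan and a filter and process whole column batches rather than single rows. The sketch below shows that shape with Python generators. It is a conceptual model only: ShannonBase's real operators live in its C++ executor, and the `scan`, `predict`, and `filter_rows` functions and the toy scoring model are hypothetical.

```python
from typing import Callable, Iterable, Iterator

Batch = dict[str, list]  # column name -> values for one batch of rows


def scan(rows: list[dict], batch_size: int) -> Iterator[Batch]:
    """Emit the table as column-oriented batches."""
    for start in range(0, len(rows), batch_size):
        chunk = rows[start:start + batch_size]
        yield {col: [r[col] for r in chunk] for col in chunk[0]}


def predict(batches: Iterable[Batch], model: Callable[[Batch], list],
            out_col: str) -> Iterator[Batch]:
    """ML operator: scores a whole batch in one call and adds a prediction column."""
    for batch in batches:
        batch[out_col] = model(batch)  # mutates the batch in place
        yield batch


def filter_rows(batches: Iterable[Batch], col: str,
                keep_if: Callable[[object], bool]) -> Iterator[Batch]:
    """Relational filter that can consume the prediction column like any other."""
    for batch in batches:
        keep = [i for i, v in enumerate(batch[col]) if keep_if(v)]
        if keep:
            yield {c: [vals[i] for i in keep] for c, vals in batch.items()}
```

Usage: `filter_rows(predict(scan(rows, 1024), model, "score"), "score", lambda s: s > 0.5)` composes a scan, a batch prediction, and a filter on the predicted value, mirroring a `WHERE` clause over a prediction.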

Execution and Optimization

AutoML models are executed using a tightly optimized inference path. Depending on the model type and deployment configuration, inference may run: (1) inside the MySQL execution engine; (2) within Rapid’s columnar execution layer; or (3) using vectorized or SIMD-accelerated operators.
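The choice among these three paths depends on the model type and deployment configuration. A rule-based selector is one simple way to express such a decision; the sketch below is purely illustrative. The `PlanContext` fields, the rule order, and the `batch_threshold` value are all hypothetical assumptions, not ShannonBase's actual selection logic.

```python
from dataclasses import dataclass
from enum import Enum


class Backend(Enum):
    MYSQL_ENGINE = "mysql"      # row-at-a-time in the MySQL executor
    RAPID_COLUMNAR = "rapid"    # inside Rapid's columnar execution layer
    VECTORIZED = "vectorized"   # batch / SIMD-style operator


@dataclass
class PlanContext:
    table_in_rapid: bool        # is the input table loaded into the column store?
    model_supports_batch: bool  # can the model score many rows per call?
    estimated_rows: int


def choose_backend(ctx: PlanContext, batch_threshold: int = 10_000) -> Backend:
    # Hypothetical rules: batch-capable models prefer columnar or vectorized paths.
    if not ctx.model_supports_batch:
        return Backend.MYSQL_ENGINE
    if ctx.table_in_rapid:
        return Backend.RAPID_COLUMNAR
    if ctx.estimated_rows >= batch_threshold:
        return Backend.VECTORIZED
    return Backend.MYSQL_ENGINE
```

A production optimizer would more likely feed these inputs into its cost model than hard-code thresholds; the fixed rules here only make the dependencies visible.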

The optimizer can reason about ML operators similarly to relational operators, enabling:

Predicate pushdown with ML predictions
Batch inference over columnar data
Cost-aware model execution planning
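Because model inference is usually far more expensive per row than an ordinary comparison, the order in which conjunctive predicates run matters. The sketch below applies the classic rank ordering for expensive predicates: with independent selectivities, sorting by rank = (selectivity - 1) / cost in ascending order minimizes expected per-row cost. Whether ShannonBase's optimizer uses this exact rule is not stated in the source; the `Predicate` type and the cost figures are hypothetical.

```python
from dataclasses import dataclass
from itertools import permutations


@dataclass(frozen=True)
class Predicate:
    name: str
    cost: float         # estimated cost per input row (must be > 0)
    selectivity: float  # fraction of rows that pass, in [0, 1]


def expected_cost(order) -> float:
    """Expected per-row cost when predicates are evaluated in the given order."""
    total, surviving = 0.0, 1.0
    for p in order:
        total += surviving * p.cost   # only surviving rows reach this predicate
        surviving *= p.selectivity
    return total


def order_by_rank(preds):
    """Sort by rank = (selectivity - 1) / cost, ascending (lowest rank runs first)."""
    return sorted(preds, key=lambda p: (p.selectivity - 1.0) / p.cost)


def brute_force_best(preds) -> float:
    """Minimum expected cost over all orders, for checking small cases."""
    return min(expected_cost(perm) for perm in permutations(preds))
```

With a cheap, selective `region` filter, a cheap `status` filter, and a costly ML prediction predicate, rank ordering runs both relational filters first so that the model scores only the rows that survive them.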

Summary

ShannonBase AutoML elevates machine learning to a native database capability, eliminating the traditional boundary between data storage and intelligent computation. By embedding model training and inference directly into the SQL engine, it enables: (1) zero-copy ML workflows; (2) simplified production deployment; and (3) adaptive, high-performance analytics at scale.

In essence, AutoML turns ShannonBase into a unified system where data, queries, and models coexist and evolve together, laying the foundation for a truly AI-native database platform.