Overview

Vectorized Table Scan is a core execution feature in ShannonBase Rapid Engine that replaces traditional row-by-row table scanning with a batch-oriented, column-aware execution model.

The VectorizedTableScanIterator plugs into MySQL’s iterator framework while transparently reading from Rapid’s columnar storage format. It processes data in batches rather than single rows, which improves CPU efficiency, cache locality, and memory bandwidth utilization.

This iterator forms the foundation for high-throughput analytical scans in ShannonBase, enabling OLAP-style execution performance without breaking MySQL compatibility.

Design Principles

  • Batch-based execution instead of row-at-a-time processing
  • Columnar data access with minimal materialization
  • Cache-aware batch sizing tuned to hardware characteristics
  • Adaptive runtime behavior based on observed performance
  • Full compatibility with MySQL Field and TABLE abstractions

By aligning execution granularity with modern CPU cache hierarchies and SIMD-friendly data layouts, Vectorized Table Scan amortizes per-row overhead across each batch.

Core Components

VectorizedTableScanIterator

The iterator extends TableRowIterator and acts as a bridge between MySQL’s execution engine and Rapid’s columnar storage. It orchestrates batch reads, column materialization, and row reconstruction, and is built from the following pieces:

  • RapidCursor-backed batch fetching from IMCS
  • ColumnChunk buffers for columnar batch storage
  • Active field caching to avoid repeated metadata lookups
  • Row reconstruction into MySQL Field structures

Each iterator instance maintains its own execution state, ensuring correctness and isolation across concurrent queries.

Batch Size Optimization

Vectorized Table Scan sizes its batches dynamically rather than using a fixed constant. The initial batch size is computed from:

  • Estimated row size based on active fields
  • Expected number of rows from the optimizer
  • L3 cache size and cache-line alignment
  • Minimum SIMD-friendly vector width
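
The inputs above can be combined in more than one way. The following is a minimal sketch of one plausible combination, not ShannonBase's actual formula. The struct BatchSizeInputs, the function initial_batch_size, and the max_batch_rows cap are hypothetical names introduced for illustration, and the cache budget is assumed to be whatever share of L3 the caller decides to reserve for a single batch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical tuning inputs; none of these names come from ShannonBase.
struct BatchSizeInputs {
  std::size_t est_row_bytes;     // estimated bytes per row over the active fields
  std::size_t est_rows;          // optimizer's expected row count (0 = unknown)
  std::size_t cache_bytes;       // cache budget for one batch, e.g. a share of L3
  std::size_t cache_line_bytes;  // cache-line size, typically 64
  std::size_t min_vector_rows;   // smallest SIMD-friendly batch, in rows
  std::size_t max_batch_rows;    // assumed hard upper bound on batch size
};

// Round n down to a multiple of m, but never return less than m.
inline std::size_t round_down_to_multiple(std::size_t n, std::size_t m) {
  assert(m > 0);
  return std::max(m, (n / m) * m);
}

inline std::size_t initial_batch_size(const BatchSizeInputs &in) {
  assert(in.est_row_bytes > 0 && in.cache_line_bytes > 0);
  assert(in.min_vector_rows > 0 && in.min_vector_rows <= in.max_batch_rows);

  // Use a whole number of cache lines, then see how many rows fit.
  const std::size_t budget =
      round_down_to_multiple(in.cache_bytes, in.cache_line_bytes);
  std::size_t rows = budget / in.est_row_bytes;

  // Do not buffer more rows than the optimizer expects the scan to return.
  if (in.est_rows > 0) rows = std::min(rows, in.est_rows);

  // Clamp into [min_vector_rows, max_batch_rows], then align to the vector width.
  rows = std::max(in.min_vector_rows, std::min(rows, in.max_batch_rows));
  return round_down_to_multiple(rows, in.min_vector_rows);
}
```

Under these assumptions, a 1 MiB budget with 64-byte rows and a vector width of 8 rows gives 16384 rows per batch, while a scan expected to return only 5 rows falls back to the 8-row minimum.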

During execution, the iterator continuously monitors batch execution latency. After every fixed number of batches, it adjusts the batch size:

  • Batch size is reduced when latency grows too high
  • Batch size is increased when execution remains consistently fast

This feedback-driven approach ensures stable performance across diverse workloads and hardware environments.
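
The feedback loop described above can be sketched as a small controller. This is an illustration under stated assumptions rather than the engine's real policy: the class AdaptiveBatchSizer, the halve/double steps, the latency thresholds, and the use of a windowed average as a proxy for "consistently fast" are all hypothetical choices.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical feedback controller; thresholds and step factors are placeholders.
class AdaptiveBatchSizer {
 public:
  AdaptiveBatchSizer(std::size_t initial, std::size_t min_rows,
                     std::size_t max_rows, std::uint64_t slow_ns,
                     std::uint64_t fast_ns, std::size_t window)
      : size_(initial), min_(min_rows), max_(max_rows),
        slow_ns_(slow_ns), fast_ns_(fast_ns), window_(window) {
    assert(min_rows > 0 && min_rows <= initial && initial <= max_rows);
    assert(fast_ns < slow_ns && window > 0);
  }

  // Record one batch's latency; the size is re-evaluated once per window.
  void record_batch(std::uint64_t latency_ns) {
    total_ns_ += latency_ns;
    if (++batches_ < window_) return;

    const std::uint64_t avg_ns = total_ns_ / batches_;
    if (avg_ns > slow_ns_) {
      size_ = std::max(min_, size_ / 2);  // latency too high: shrink
    } else if (avg_ns < fast_ns_) {
      size_ = std::min(max_, size_ * 2);  // fast over the whole window: grow
    }
    // Start a fresh observation window.
    total_ns_ = 0;
    batches_ = 0;
  }

  std::size_t batch_size() const { return size_; }

 private:
  std::size_t size_;
  std::size_t min_;
  std::size_t max_;
  std::uint64_t slow_ns_;
  std::uint64_t fast_ns_;
  std::size_t window_;
  std::uint64_t total_ns_ = 0;
  std::size_t batches_ = 0;
};
```

Leaving a dead band between fast_ns and slow_ns keeps the size stable when latency is acceptable, which avoids oscillating between two sizes on every window.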

Columnar Processing and Field Handling

Data is fetched from Rapid in columnar form and stored in per-column ColumnChunk buffers. For each batch, only the fields required by the query are populated.

Field materialization is type-aware:

  • Numeric fields use direct memory packing for minimal overhead
  • String and ENUM fields leverage dictionary encoding, translating compact identifiers into actual string values only when needed
  • NULL handling is performed per-column using null bitmaps

This design minimizes memory copying and avoids unnecessary decoding work, preserving the benefits of columnar storage throughout execution.
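
To make the three cases above concrete, here is a simplified sketch of a null bitmap, lazy dictionary decoding, and a packed numeric read. The type DictColumnChunk and the function read_int32 are hypothetical stand-ins and do not reflect the real ColumnChunk layout; in particular, the bit order of the null bitmap (bit i set means row i is NULL) is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical stand-in for a dictionary-encoded string column chunk.
struct DictColumnChunk {
  std::vector<std::uint32_t> ids;        // one dictionary id per row
  std::vector<std::uint8_t> null_bits;   // assumed layout: bit i set => row i is NULL
  const std::vector<std::string> *dict;  // shared id -> string table

  bool is_null(std::size_t row) const {
    assert(row < ids.size() && row / 8 < null_bits.size());
    return ((null_bits[row / 8] >> (row % 8)) & 1u) != 0;
  }

  // Decode on demand: returns nullptr for NULL, otherwise a pointer into the
  // dictionary, so no string is copied until the caller actually needs one.
  const std::string *value(std::size_t row) const {
    if (is_null(row)) return nullptr;
    return &dict->at(ids[row]);
  }
};

// Fixed-width numeric values are read straight out of the packed buffer.
// memcpy avoids alignment and strict-aliasing problems on the raw bytes.
inline std::int32_t read_int32(const std::vector<unsigned char> &packed,
                               std::size_t row) {
  assert((row + 1) * sizeof(std::int32_t) <= packed.size());
  std::int32_t v;
  std::memcpy(&v, packed.data() + row * sizeof(v), sizeof(v));
  return v;
}
```

Returning a pointer into the shared dictionary instead of a fresh string is one way to keep decoding lazy; the caller copies into a MySQL Field only for rows it actually emits.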

Execution Flow

  • Iterator initialization and Rapid cursor setup
  • Active field discovery and column buffer preallocation
  • Batch fetch from Rapid via next_batch()
  • Per-row reconstruction from column chunks
  • Adaptive batch tuning based on runtime metrics
  • Graceful handling of EOF and deleted rows

To upper layers the iterator appears as a standard row iterator, so it fits into MySQL’s execution pipeline without special handling by its callers.
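
The control flow listed above can be sketched as a row-at-a-time facade over batch fetches. Everything here is a simplified assumption: Batch, MockCursor, and SketchScanIterator are hypothetical types, the next_batch() signature is invented for the example, and a single int column stands in for real field materialization. The return codes follow the MySQL RowIterator convention for Read(), where 0 means a row was produced and -1 means end of data.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical batch: one int column plus a per-row deleted flag.
struct Batch {
  std::vector<int> values;
  std::vector<bool> deleted;
};

// Hypothetical cursor that serves pre-built batches; stands in for RapidCursor.
class MockCursor {
 public:
  explicit MockCursor(std::vector<Batch> batches)
      : batches_(std::move(batches)) {}

  // Fills `out` and returns true, or returns false at end of data.
  bool next_batch(Batch &out) {
    if (pos_ >= batches_.size()) return false;
    out = batches_[pos_++];
    return true;
  }

 private:
  std::vector<Batch> batches_;
  std::size_t pos_ = 0;
};

class SketchScanIterator {
 public:
  explicit SketchScanIterator(MockCursor cursor) : cursor_(std::move(cursor)) {}

  // Returns 0 and sets `row_out` when a row is produced, or -1 at EOF.
  int Read(int &row_out) {
    for (;;) {
      if (row_ >= batch_.values.size()) {
        // Current batch is exhausted: fetch the next one or report EOF.
        if (!cursor_.next_batch(batch_)) return -1;
        assert(batch_.values.size() == batch_.deleted.size());
        row_ = 0;
        continue;  // re-check, since the new batch may be empty
      }
      const std::size_t r = row_++;
      if (batch_.deleted[r]) continue;  // skip deleted rows
      row_out = batch_.values[r];       // "reconstruct" the row
      return 0;
    }
  }

 private:
  MockCursor cursor_;
  Batch batch_;
  std::size_t row_ = 0;
};
```

A caller drives the sketch the same way it would drive any row iterator: call Read() in a loop until it returns -1. Batch boundaries, empty batches, and deleted rows stay hidden inside the iterator.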

Performance Metrics and Adaptivity

Each iterator maintains detailed performance metrics, including:

  • Total rows and batches processed
  • Average batch execution time
  • Total read time
  • Error counts and retry behavior

These metrics drive the adaptive batch sizing and provide a basis for future observability and execution introspection.
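
As a rough illustration of how such counters might be kept per iterator, consider the sketch below. The struct ScanMetrics and its member names are hypothetical, times are assumed to be tracked in nanoseconds, and retry tracking is omitted for brevity.

```cpp
#include <cstdint>

// Hypothetical per-iterator counters; field names are illustrative only.
struct ScanMetrics {
  std::uint64_t total_rows = 0;
  std::uint64_t total_batches = 0;
  std::uint64_t total_read_ns = 0;
  std::uint64_t error_count = 0;

  // Account for one completed batch.
  void record_batch(std::uint64_t rows, std::uint64_t elapsed_ns) {
    total_rows += rows;
    total_batches += 1;
    total_read_ns += elapsed_ns;
  }

  void record_error() { ++error_count; }

  // Average batch time is derived rather than stored, so it cannot drift
  // out of sync with the totals.
  double avg_batch_ns() const {
    return total_batches == 0
               ? 0.0
               : static_cast<double>(total_read_ns) /
                     static_cast<double>(total_batches);
  }
};
```

Because each iterator instance owns its own state, plain counters like these need no synchronization as long as they are only updated by the thread driving that iterator.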

Summary

Vectorized Table Scan is a cornerstone of ShannonBase’s analytical execution engine. It transforms traditional MySQL table scans into a modern, batch-oriented, cache-efficient execution pipeline.

  • Delivers order-of-magnitude throughput improvements for scans
  • Preserves MySQL execution semantics and compatibility
  • Adapts dynamically to workload and hardware characteristics

In essence, this feature brings vectorized, columnar execution, long proven in analytical systems, directly into a MySQL-compatible engine, forming a critical building block for ShannonBase’s hybrid OLTP + OLAP vision.