Module catalog


Parquet data catalog for efficient storage and retrieval of financial market data.

This module provides a data catalog that stores financial market data as Apache Parquet files on object store backends. The catalog supports quotes, trades, bars, order book data, and other market events.

§Key Features

  • Object Store Integration: Works with local filesystems, S3, and other object stores
  • Data Type Support: Handles all major financial data types (quotes, trades, bars, etc.)
  • Time-based Organization: Organizes data by timestamp ranges for efficient querying
  • Consolidation: Merges multiple files to optimize storage and query performance
  • Validation: Ensures data integrity with timestamp ordering and interval validation
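The validation feature above can be illustrated with a minimal sketch. It assumes timestamps are `u64` nanoseconds and that each file covers a closed interval `[start, end]`. The helper names `is_monotonic` and `are_intervals_disjoint` are hypothetical and are not part of this module's API.

```rust
/// Returns true when timestamps are in non-decreasing order.
/// (Hypothetical helper, not part of the catalog API.)
fn is_monotonic(timestamps: &[u64]) -> bool {
    timestamps.windows(2).all(|w| w[0] <= w[1])
}

/// Returns true when every interval is well formed (start <= end)
/// and no two closed intervals [start, end] share a timestamp.
/// (Hypothetical helper, not part of the catalog API.)
fn are_intervals_disjoint(intervals: &[(u64, u64)]) -> bool {
    // Reject malformed intervals first.
    if intervals.iter().any(|&(start, end)| start > end) {
        return false;
    }
    // Sort a copy by start time so only neighbours need comparing.
    let mut sorted = intervals.to_vec();
    sorted.sort_by_key(|&(start, _)| start);
    // Each interval must end strictly before the next one starts.
    sorted.windows(2).all(|w| w[0].1 < w[1].0)
}
```

Under these assumptions, a consolidation step could run the interval check before and after merging files to confirm that no time range is covered twice.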

§Architecture

The catalog organizes data in a hierarchical structure:

data/
├── quotes/
│   └── INSTRUMENT_ID/
│       └── start_ts-end_ts.parquet
├── trades/
│   └── INSTRUMENT_ID/
│       └── start_ts-end_ts.parquet
└── bars/
    └── INSTRUMENT_ID/
        └── start_ts-end_ts.parquet
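The layout above can be sketched as a pair of helpers that build a file path and recover the interval from a file name. This is only an illustration: it assumes `start_ts` and `end_ts` are plain integer timestamps joined by a single hyphen, which may differ from the catalog's real file naming. The functions `data_file_path` and `parse_interval` are hypothetical names, not part of this module.

```rust
use std::path::{Path, PathBuf};

/// Builds `<base>/data/<data_type>/<instrument_id>/<start>-<end>.parquet`.
/// (Hypothetical helper mirroring the directory tree shown above.)
fn data_file_path(
    base: &Path,
    data_type: &str,
    instrument_id: &str,
    start_ts: u64,
    end_ts: u64,
) -> PathBuf {
    base.join("data")
        .join(data_type)
        .join(instrument_id)
        .join(format!("{start_ts}-{end_ts}.parquet"))
}

/// Parses `<start>-<end>.parquet` back into `(start, end)`.
/// Returns `None` if the extension or either number is invalid.
fn parse_interval(file_name: &str) -> Option<(u64, u64)> {
    let stem = file_name.strip_suffix(".parquet")?;
    let (start, end) = stem.split_once('-')?;
    Some((start.parse().ok()?, end.parse().ok()?))
}
```

Encoding the covered range in the file name means a time-range query can discard files by name alone, without opening any Parquet footer.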

§Usage

use std::path::PathBuf;
use nautilus_persistence::backend::catalog::ParquetDataCatalog;

// Create a new catalog
let catalog = ParquetDataCatalog::new(
    PathBuf::from("/path/to/data"),
    None,        // storage_options
    Some(5000),  // batch_size
    None,        // compression (defaults to SNAPPY)
    None,        // max_row_group_size (defaults to 5000)
);

// Write data to the catalog
// catalog.write_to_parquet(data, None, None)?;

Structs§

ParquetDataCatalog
A high-performance data catalog for storing and retrieving financial market data using Apache Parquet format.

Traits§

CatalogPathPrefix
Trait for providing catalog path prefixes for different data types.
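As a rough illustration of the idea behind a path-prefix trait, the sketch below maps each data type to its top-level directory at compile time. The trait `PathPrefix`, the marker types `Quote` and `Trade`, and the function `type_directory` are hypothetical stand-ins. The actual signature of `CatalogPathPrefix` is not shown on this page and may differ.

```rust
/// Hypothetical stand-in for a path-prefix trait: each data type
/// names the directory its files are stored under.
trait PathPrefix {
    fn path_prefix() -> &'static str;
}

// Hypothetical marker types standing in for real market data types.
struct Quote;
struct Trade;

impl PathPrefix for Quote {
    fn path_prefix() -> &'static str {
        "quotes"
    }
}

impl PathPrefix for Trade {
    fn path_prefix() -> &'static str {
        "trades"
    }
}

/// Resolves the directory for a data type and instrument,
/// relative to the catalog base path.
fn type_directory<T: PathPrefix>(instrument_id: &str) -> String {
    format!("data/{}/{}", T::path_prefix(), instrument_id)
}
```

With this shape, generic read and write code can locate a type's directory from the type parameter alone, for example `type_directory::<Quote>("EURUSD.SIM")`.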