Parquet data catalog for efficient storage and retrieval of financial market data.
This module implements a data catalog that stores financial market data in Apache Parquet format on object store backends. The catalog supports quotes, trades, bars, order book data, and other market events.
§Key Features
- Object Store Integration: Works with local filesystems, S3, and other object stores
- Data Type Support: Handles all major financial data types (quotes, trades, bars, etc.)
- Time-based Organization: Organizes data by timestamp ranges for efficient querying
- Consolidation: Merges multiple files to optimize storage and query performance
- Validation: Ensures data integrity with timestamp ordering and interval validation
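The validation feature above can be illustrated with a minimal sketch. The helper names `is_non_decreasing` and `are_intervals_disjoint` are hypothetical and are not part of this module's API. The sketch assumes timestamps are `u64` nanoseconds and that each file covers a closed interval `[start, end]`; the catalog's actual checks may differ.

```rust
/// A closed time interval `[start, end]`.
/// Assumption: timestamps are u64 nanoseconds since the Unix epoch.
type Interval = (u64, u64);

/// Hypothetical helper: returns true when timestamps never decrease
/// from one record to the next (equal neighbours are allowed).
fn is_non_decreasing(timestamps: &[u64]) -> bool {
    timestamps.windows(2).all(|pair| pair[0] <= pair[1])
}

/// Hypothetical helper: returns true when no two intervals overlap.
/// The input does not need to be sorted.
fn are_intervals_disjoint(intervals: &[Interval]) -> bool {
    let mut sorted = intervals.to_vec();
    sorted.sort_unstable();
    // After sorting by start, each interval must end strictly before
    // the next one begins, because the intervals are closed.
    sorted.windows(2).all(|pair| pair[0].1 < pair[1].0)
}
```

Sorting a copy keeps the caller's slice untouched, at the cost of one allocation per check. Two files that share a boundary timestamp, such as `(0, 10)` and `(10, 20)`, count as overlapping under the closed-interval assumption.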
§Architecture
The catalog organizes data in a hierarchical structure:
data/
├── quotes/
│   └── INSTRUMENT_ID/
│       └── start_ts-end_ts.parquet
├── trades/
│   └── INSTRUMENT_ID/
│       └── start_ts-end_ts.parquet
└── bars/
    └── INSTRUMENT_ID/
        └── start_ts-end_ts.parquet
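As a rough sketch of how this layout maps to paths, the following uses only the standard library. The functions `make_file_path` and `parse_interval` are hypothetical illustrations, not catalog methods, and the sketch assumes that `start_ts` and `end_ts` appear in file names as plain integers. The catalog itself may encode timestamps differently.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper: builds
/// `<base>/data/<data_type>/<instrument_id>/<start_ts>-<end_ts>.parquet`.
fn make_file_path(
    base: &Path,
    data_type: &str,
    instrument_id: &str,
    start_ts: u64,
    end_ts: u64,
) -> PathBuf {
    base.join("data")
        .join(data_type)
        .join(instrument_id)
        .join(format!("{start_ts}-{end_ts}.parquet"))
}

/// Hypothetical helper: recovers `(start_ts, end_ts)` from a file name.
/// Returns `None` for a non-parquet extension or a malformed name.
fn parse_interval(path: &Path) -> Option<(u64, u64)> {
    if path.extension().and_then(|ext| ext.to_str()) != Some("parquet") {
        return None;
    }
    let stem = path.file_stem()?.to_str()?;
    let (start, end) = stem.split_once('-')?;
    Some((start.parse().ok()?, end.parse().ok()?))
}
```

Encoding the interval in the file name means a time-range query can pick candidate files from a directory listing alone, without opening any Parquet footers.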
§Usage
use std::path::PathBuf;
use nautilus_persistence::backend::catalog::ParquetDataCatalog;

// Create a new catalog
let catalog = ParquetDataCatalog::new(
    PathBuf::from("/path/to/data"),
    None,       // storage_options
    Some(5000), // batch_size
    None,       // compression (defaults to SNAPPY)
    None,       // max_row_group_size (defaults to 5000)
);

// Write data to the catalog
// catalog.write_to_parquet(data, None, None)?;
Structs§
- ParquetDataCatalog - A high-performance data catalog for storing and retrieving financial market data using Apache Parquet format.
Traits§
- CatalogPathPrefix - Trait for providing catalog path prefixes for different data types.