Version: latest

Backtest (high-level API)

Tutorial for PoseiTrader, a high-performance algorithmic trading platform and event-driven backtester.

Overview

This tutorial walks through how to use a BacktestNode to backtest a simple EMA cross strategy on a simulated FX ECN venue using historical quote tick data.

The following points will be covered:

How to load raw data (external to Posei) into the data catalog
How to set up configuration objects for a BacktestNode
How to run backtests with a BacktestNode

Prerequisites

Python 3.11+ installed
JupyterLab or similar installed (pip install -U jupyterlab)
PoseiTrader latest release installed (pip install -U posei_trader)

Imports

We'll start with all of our imports for the remainder of this tutorial.

import shutil
from decimal import Decimal
from pathlib import Path

import pandas as pd

from posei_trader.backtest.node import BacktestDataConfig
from posei_trader.backtest.node import BacktestEngineConfig
from posei_trader.backtest.node import BacktestNode
from posei_trader.backtest.node import BacktestRunConfig
from posei_trader.backtest.node import BacktestVenueConfig
from posei_trader.config import ImportableStrategyConfig
from posei_trader.core.datetime import dt_to_unix_nanos
from posei_trader.model import QuoteTick
from posei_trader.persistence.catalog import ParquetDataCatalog
from posei_trader.persistence.wranglers import QuoteTickDataWrangler
from posei_trader.test_kit.providers import CSVTickDataLoader
from posei_trader.test_kit.providers import TestInstrumentProvider

                              

As a one-off step before starting the notebook, we need to download some sample data for backtesting.

For this example we will use FX data from histdata.com. Simply go to https://www.histdata.com/download-free-forex-historical-data/?/ascii/tick-data-quotes/ and select an FX pair, then select one or more months of data to download.

Example of downloaded files:

DAT_ASCII_EURUSD_T_202410.csv (EUR/USD data for month 2024-10)
DAT_ASCII_EURUSD_T_202411.csv (EUR/USD data for month 2024-11)

Once you have downloaded the data:

Copy files like those above into one folder, for example ~/Downloads/Data/ (by default, the tutorial uses the user's Downloads/Data/ directory).
Set the variable DATA_DIR below to the directory containing the data.

DATA_DIR = "~/Downloads/Data/"

                              

path = Path(DATA_DIR).expanduser()
raw_files = list(path.iterdir())
assert raw_files, f"Unable to find any histdata files in directory {path}"
raw_files
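
Because the directory listing above picks up every entry in the folder, it can help to check which entries are actually histdata tick files before loading one. The sketch below is an illustration only: the helper name parse_histdata_name, the regular expression, and the notes.txt entry are assumptions based on the two example file names shown earlier, not part of Posei.

```python
import re

# Assumed naming pattern, inferred from the example files above:
# DAT_ASCII_<PAIR>_T_<YYYYMM>.csv
HISTDATA_NAME = re.compile(r"^DAT_ASCII_([A-Z]{6})_T_(\d{4})(\d{2})\.csv$")


def parse_histdata_name(file_name):
    """Return (pair, year, month) for a histdata tick file name, else None."""
    match = HISTDATA_NAME.match(file_name)
    if match is None:
        return None
    symbol, year, month = match.groups()
    pair = f"{symbol[:3]}/{symbol[3:]}"  # "EURUSD" -> "EUR/USD"
    return pair, int(year), int(month)


# Hypothetical directory contents: two tick files and one unrelated file
names = [
    "DAT_ASCII_EURUSD_T_202410.csv",
    "notes.txt",
    "DAT_ASCII_EURUSD_T_202411.csv",
]
tick_files = [name for name in names if parse_histdata_name(name) is not None]
```

With real files, the same check could be applied to p.name for each p in raw_files.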

                              

Loading data into the Parquet data catalog

The FX data from histdata is stored in CSV/text format, with the fields timestamp, bid_price, ask_price and volume (the volume column is not needed here). First, we need to load this raw data into a pandas.DataFrame which has a compatible schema for Posei quotes.

Then we can create Posei QuoteTick objects by processing the DataFrame with a QuoteTickDataWrangler.

# Here we just take the first data file found and load it into a pandas DataFrame
df = CSVTickDataLoader.load(
    file_path=raw_files[0],                                   # Input 1st CSV file
    index_col=0,                                              # Use 1st column in data as index for dataframe
    header=None,                                              # There are no column names in CSV files
    names=["timestamp", "bid_price", "ask_price", "volume"],  # Specify names to individual columns
    usecols=["timestamp", "bid_price", "ask_price"],          # Read only these columns from CSV file into dataframe
    parse_dates=["timestamp"],                                # Specify columns containing date/time
    date_format="%Y%m%d %H%M%S%f",                            # Format for parsing datetime
)

# Let's make sure data are sorted by timestamp
df = df.sort_index()

# Preview of loaded dataframe
df.head(2)
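
To make the date_format string above concrete, here is a small standard-library sketch that parses one line in the same column layout. The sample line is made up for illustration and parse_quote_line is a hypothetical helper; the tutorial itself relies on CSVTickDataLoader.

```python
import csv
from datetime import datetime
from decimal import Decimal
from io import StringIO

DATE_FORMAT = "%Y%m%d %H%M%S%f"  # same format string as in the loader call

# Made-up sample row: timestamp, bid_price, ask_price, volume
SAMPLE = "20241001 000000123,1.11332,1.11335,0\n"


def parse_quote_line(row):
    """Convert one CSV row into (timestamp, bid, ask); the volume column is ignored."""
    timestamp = datetime.strptime(row[0], DATE_FORMAT)
    return timestamp, Decimal(row[1]), Decimal(row[2])


rows = [parse_quote_line(row) for row in csv.reader(StringIO(SAMPLE))]
timestamp, bid, ask = rows[0]
```

In Python's strptime, %f reads the trailing digits as a fraction of a second, so the final 123 in the sample is interpreted as 123 milliseconds.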

                              

# Process quotes using a wrangler
EURUSD = TestInstrumentProvider.default_fx_ccy("EUR/USD")
wrangler = QuoteTickDataWrangler(EURUSD)

ticks = wrangler.process(df)

# Preview: see first 2 ticks
ticks[0:2]

                              

See the Loading data guide for further details.

Next, we instantiate a ParquetDataCatalog, passing in the directory where the data should be stored (here, a catalog folder under the current working directory). We can then write the instrument and tick data to the catalog. Loading should only take a couple of minutes, depending on how many months of data are written.

CATALOG_PATH = Path.cwd() / "catalog"

# Clear if it already exists, then create fresh
if CATALOG_PATH.exists():
    shutil.rmtree(CATALOG_PATH)
CATALOG_PATH.mkdir(parents=True)

# Create a catalog instance
catalog = ParquetDataCatalog(CATALOG_PATH)

# Write instrument to the catalog
catalog.write_data([EURUSD])

# Write ticks to catalog
catalog.write_data(ticks)

                              

Using the Data Catalog

Once data has been loaded into the catalog, the catalog instance can be used for loading data for backtests, or simply for research purposes. It contains various methods to pull data from the catalog, such as .instruments(...) and .quote_ticks(...) (shown below).

# Get list of all instruments in catalog
catalog.instruments()

# See 1st instrument from catalog
instrument = catalog.instruments()[0]
instrument

                              

# Query quote ticks from catalog
start = dt_to_unix_nanos(pd.Timestamp("2024-10-01", tz="UTC"))
end = dt_to_unix_nanos(pd.Timestamp("2024-10-15", tz="UTC"))
selected_quote_ticks = catalog.quote_ticks(instrument_ids=[EURUSD.id.value], start=start, end=end)

# Preview the first 2 ticks
selected_quote_ticks[:2]
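
The start and end values passed to the query are integer Unix timestamps in nanoseconds. As a rough illustration of that conversion using only the standard library, the sketch below defines a hypothetical to_unix_nanos helper; it is not the implementation of dt_to_unix_nanos.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_unix_nanos(dt):
    """Nanoseconds since the Unix epoch for a timezone-aware datetime."""
    if dt.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    delta = dt - EPOCH
    # Integer arithmetic avoids float rounding at nanosecond scale
    return delta // timedelta(microseconds=1) * 1_000


start_ns = to_unix_nanos(datetime(2024, 10, 1, tzinfo=timezone.utc))
end_ns = to_unix_nanos(datetime(2024, 10, 15, tzinfo=timezone.utc))
```

The two values differ by exactly 14 days expressed in nanoseconds, which matches the two-week window selected in the query.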

                              

Add venues

venue_configs = [
    BacktestVenueConfig(
        name="SIM",
        oms_type="HEDGING",
        account_type="MARGIN",
        base_currency="USD",
        starting_balances=["1_000_000 USD"],
    ),
]

                              

Add data

str(CATALOG_PATH)

                              

data_configs = [
    BacktestDataConfig(
        catalog_path=str(CATALOG_PATH),
        data_cls=QuoteTick,
        instrument_id=instrument.id,
        start_time=start,
        end_time=end,
    ),
]

                              

Add strategies

strategies = [
    ImportableStrategyConfig(
        strategy_path="posei_trader.examples.strategies.ema_cross:EMACross",
        config_path="posei_trader.examples.strategies.ema_cross:EMACrossConfig",
        config={
            "instrument_id": instrument.id,
            "bar_type": "EUR/USD.SIM-15-MINUTE-BID-INTERNAL",
            "fast_ema_period": 10,
            "slow_ema_period": 20,
            "trade_size": Decimal(1_000_000),
        },
    ),
]
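
For readers unfamiliar with the idea behind an EMA cross, the following plain-Python sketch shows the general technique: a fast and a slow exponential moving average are compared, and a signal is emitted when the fast one crosses the slow one. This is a simplified illustration with hypothetical function names, a made-up price series, and the common smoothing factor 2 / (period + 1); it is not the EMACross strategy referenced in the config above, which may differ in detail.

```python
def ema(values, period):
    """Exponential moving average, seeded with the first value."""
    alpha = 2.0 / (period + 1)
    result = []
    previous = None
    for value in values:
        previous = value if previous is None else alpha * value + (1 - alpha) * previous
        result.append(previous)
    return result


def cross_signals(prices, fast_period, slow_period):
    """Return (index, side) pairs where the fast EMA crosses the slow EMA."""
    fast = ema(prices, fast_period)
    slow = ema(prices, slow_period)
    signals = []
    previous_diff = 0.0
    for index, (f, s) in enumerate(zip(fast, slow)):
        diff = f - s
        if previous_diff <= 0 < diff:
            signals.append((index, "BUY"))   # fast moved above slow
        elif previous_diff >= 0 > diff:
            signals.append((index, "SELL"))  # fast moved below slow
        previous_diff = diff
    return signals


# Made-up series: a steady rise followed by a steady fall
prices = [float(p) for p in list(range(1, 31)) + list(range(29, 0, -1))]
signals = cross_signals(prices, fast_period=10, slow_period=20)
```

The periods 10 and 20 mirror fast_ema_period and slow_ema_period in the strategy config; order sizing and execution are deliberately left out of the sketch.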

                              

Configure backtest

Posei uses a BacktestRunConfig object, which gathers the backtest configuration in one place. It is a Partialable object, meaning it can be configured in stages. This reduces boilerplate when creating multiple backtest runs, for example when running a grid search over parameters.

config = BacktestRunConfig(
    engine=BacktestEngineConfig(strategies=strategies),
    data=data_configs,
    venues=venue_configs,
)
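
As a sketch of the grid-search use case mentioned above, the strategy parameters can be generated with itertools.product. The example below only builds plain dictionaries in the shape of the config mapping used for the strategy; the helper name build_param_grid and the parameter values are assumptions for illustration, and turning each dictionary into its own ImportableStrategyConfig and BacktestRunConfig is left out.

```python
from itertools import product

FAST_PERIODS = [5, 10, 20]  # arbitrary example values
SLOW_PERIODS = [20, 50]     # arbitrary example values


def build_param_grid(fast_periods, slow_periods):
    """Return one parameter dict per (fast, slow) pair where fast < slow."""
    return [
        {"fast_ema_period": fast, "slow_ema_period": slow}
        for fast, slow in product(fast_periods, slow_periods)
        if fast < slow
    ]


param_grid = build_param_grid(FAST_PERIODS, SLOW_PERIODS)
```

Pairs where the fast period is not shorter than the slow period are skipped, since they would not describe a meaningful fast/slow crossover.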

                              

Run backtest

Now we can run the backtest node, which will simulate trading across the entire data stream.

node = BacktestNode(configs=[config])

results = node.run()
results
