Saeed Amen / Founder of Cuemacro
https://www.cuemacro.com / saeed@cuemacro.com / @saeedamenfx / All material is copyright Cuemacro / 2020
tcapy is a transaction cost analysis library in Python and one of the first open source TCA libraries. Typically, internal solutions can cost hundreds of thousands of dollars to develop and maintain. Using external services for TCA also often carries a licence cost.
This notebook assumes that you have already set up tcapy and its dependencies.
In this notebook, we show how we can do more involved TCA calculations with tcapy. We will show users how to construct Python based queries to:
We shall also show you how to use some of the lower level parts of tcapy.
For reference, the benchmark times quoted in the text are for the following hardware setup, which is designed to replicate a typical environment where the databases run on a different system. In our case, we are running Windows and WSL/Ubuntu on our primary machine. This setup gives us the best of both worlds, allowing us to use features like Celery and the Dash web GUI under Linux, while still using Windows so we can interact easily with applications like Excel. Note that the times we quote are from a single run (in practice, we should take an average over many runs).
tcapy is running on
Databases are running on (Arctic/MongoDB 3.6.5 for market tick data and Microsoft SQL Server 14.0.2027)
Our first step is to add the tcapy library to our PYTHONPATH. You may need to change these lines. Note that in our setup, we are running the Jupyter notebook on Windows and the backend Celery workers on Linux. If you are running everything in the same environment, the paths are likely to be the same.
import sys
import os
windows_tcapy_path = 'e:/cuemacro/tcapy' # Windows platform
linux_tcapy_path = '/home/tcapyuser/cuemacro/tcapy' # Linux platform
local_test_data_path = '../test/resources/' # Windows platform
remote_test_data_path = '../test/resources/' # WSL drive
# Assuming the front end is on Windows
sys.path.insert(0, windows_tcapy_path)
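Since the same notebook may be opened on either platform, one option is to pick the path at runtime rather than hardcoding a single line. A minimal sketch (the paths are the illustrative ones from above; adjust them to your own installation):

```python
import platform
import sys

# Illustrative paths from the setup above; change to match your installation
windows_tcapy_path = 'e:/cuemacro/tcapy'
linux_tcapy_path = '/home/tcapyuser/cuemacro/tcapy'

# Choose the tcapy path matching the OS this notebook is running on
tcapy_path = windows_tcapy_path if platform.system() == 'Windows' else linux_tcapy_path
sys.path.insert(0, tcapy_path)
```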
If the Python path now includes tcapy, we can do all the imports from it that we'll need later. We'll also import all the other Python libraries we'll use.
from collections import OrderedDict
# This is the entry point for our TCA analysis
from tcapy.analysis.tcaengine import TCAEngineImpl
# To construct the parameters for our TCA calculation
from tcapy.analysis.tcarequest import TCARequest
# To access trade and market data
from tcapy.data.databasesource import DatabaseSourceCSVBinary, \
DatabaseSourceDataFrame, DatabaseSourceDukascopy
# Import all the metrics and benchmarks we'll use
from tcapy.analysis.algos.benchmark import BenchmarkArrival, BenchmarkMarketSpreadToMid
from tcapy.analysis.algos.metric import MetricSlippage, MetricTransientMarketImpact
# To aggregate the TCA results
from tcapy.analysis.algos.resultsform import TimelineResultsForm, DistResultsForm, \
BarResultsForm, TableResultsForm, JoinTables
# To help display the output of a TCA calculation
from tcapy.vis.tcaresults import TCAResults
from tcapy.vis.report.tcareport import TCAReport
from tcapy.vis.displaylisteners import PlotRender
# General classes
from tcapy.conf.constants import Constants
from tcapy.util.mediator import Mediator
# Prevent certain libraries from displaying debug messages
import logging
logging.getLogger("findatapy").setLevel(logging.WARNING)
logging.getLogger("requests").setLevel(logging.WARNING)
logging.getLogger("urllib3").setLevel(logging.WARNING)
# For plotting later
import plotly.io as pio
# For interactive plots (but these disappear when we reopen Jupyter)
pio.renderers.default = "notebook"
# Using https://github.com/plotly/Kaleido to generate static plot images
# pio.renderers.default = "svg"
pio.renderers.default = "png"
import plotly
from chartpy import Chart, Style
# default size for Plotly charts we use
chart_width = 800
chart_height = 500
chart = Chart()
constants = Constants()
style = Style(width=chart_width, height=chart_height, scale_factor=-1, silent_display=True)
Let's specify the paths of the trade/order data CSVs in the test folder of tcapy, which are included in the package.
csv_trade_order_mapping = {'trade_df' : os.path.join(local_test_data_path, 'small_test_trade_df.csv'),
'order_df' : os.path.join(local_test_data_path, 'small_test_order_df.csv')}
# The test trade/order data is populated between 25 Apr 2017-05 Jun 2017
# with trades/orders for 'EURUSD', 'USDJPY' and 'EURJPY'
local_csv_trade_order_mapping = OrderedDict([('trade_df', os.path.join(local_test_data_path, 'small_test_trade_df.csv')),
('order_df', os.path.join(local_test_data_path, 'small_test_order_df.csv'))])
remote_csv_trade_order_mapping = OrderedDict([('trade_df', os.path.join(remote_test_data_path, 'small_test_trade_df.csv')),
('order_df', os.path.join(remote_test_data_path, 'small_test_order_df.csv'))])
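The mapping is simply a dictionary from tcapy's internal table name (trade_df or order_df) to a CSV path on disk. As a quick sanity check before kicking off a calculation, we could verify each mapped file actually exists; a self-contained sketch using a stand-in CSV written to a temporary directory:

```python
import os
import tempfile
from collections import OrderedDict

# Create a tiny stand-in trade CSV, purely for illustration
tmp_dir = tempfile.mkdtemp()
trade_csv = os.path.join(tmp_dir, 'small_test_trade_df.csv')

with open(trade_csv, 'w') as f:
    f.write('date,ticker,side\n2017-05-05 10:00:00,EURUSD,1\n')

csv_trade_order_mapping = OrderedDict([('trade_df', trade_csv)])

# Collect any mapped files which are missing on disk
missing = [name for name, path in csv_trade_order_mapping.items()
           if not os.path.exists(path)]

print(missing)
```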
We need to create the TCAEngineImpl
object, which is the entry point for dealing with TCA calculations.
tca_version = constants.tcapy_version
tca_engine = TCAEngineImpl(version=tca_version)
volatile_cache = Mediator.get_volatile_cache()
2020-07-27 15:16:53,331; INFO:tcapy.analysis.tcaengine: Init TCAEngine version: pro - Env: desktop_laptop_linux (tcaengine.py:53)
In this section, we show how to do TCA calculations with tcapy to
We'll also aggregate some of these results into a single DataFrame.
As a first step, let's specify our MetricSlippage object. We also create a BenchmarkMarketSpreadToMid object, which will take the market data and then calculate the spread from the bid/ask to the mid. For those market data sources without a bid/ask, only a mid, we specify 1 bp as our assumed bid/ask to mid spread (if bid/ask exist, tcapy will ignore this, given we've specified overwrite_bid_ask=False).
metric_slippage = MetricSlippage()
benchmark_spread_to_mid = BenchmarkMarketSpreadToMid(overwrite_bid_ask=False, bid_mid_bp=1, ask_mid_bp=1)
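To make the benchmark concrete: given a bid and an ask, the mid is their average, and the spread to mid (in basis points) is the distance from the mid scaled by 10,000. A hand calculation with made-up quotes, not tcapy's internal code:

```python
bid, ask = 1.09990, 1.10010

# Mid price is the average of bid and ask
mid = (bid + ask) / 2.0

# Spread from each side to the mid, expressed in basis points (1 bp = 0.01%)
ask_mid_bp = (ask - mid) / mid * 10000.0
bid_mid_bp = (mid - bid) / mid * 10000.0

print(round(ask_mid_bp, 2), round(bid_mid_bp, 2))
```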
Let's create a TableResultsForm, which will look at the slippage of all the trades and pick out only those which are anomalous. The resulting DataFrame will be ordered by the worst slippage.
# Display a table of all the anomalous trades by slippage (ie. outside bid/ask)
table_results_form = TableResultsForm(market_trade_order_list=['trade_df'],
metric_name='slippage',
# Order by the worst slippage
filter_by='worst_all',
# Only flag trades outside bid/ask
tag_value_combinations={'slippage_anomalous': 1.0},
# Only display a small number of fields
keep_fields=['executed_notional_in_reporting_currency', 'side'],
# Don't round any of the figures
round_figures_by=None)
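Conceptually, this form is similar to filtering a DataFrame for flagged rows and sorting by the metric. A rough equivalent in plain pandas (the column names follow the tcapy trade fields above, but the data is made up):

```python
import pandas as pd

# Made-up trades with a slippage metric and an anomaly flag
trade_df = pd.DataFrame({
    'slippage': [-0.000021, 0.000005, -0.000013, 0.000002],
    'slippage_anomalous': [1.0, 0.0, 1.0, 0.0],
    'side': [1, 1, -1, -1]})

# Keep only anomalous trades, ordered by the worst (most negative) slippage
worst = (trade_df[trade_df['slippage_anomalous'] == 1.0]
         .sort_values('slippage'))

print(worst['slippage'].tolist())
```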
Here we create a BarResultsForm
object to aggregate notional executed per broker.
# Get the total notional executed by broker (in reporting currency)
bar_results_form_notional = BarResultsForm(market_trade_order_list=['trade_df'], # Trade
# Aggregate by broker name
aggregate_by_field='broker_id',
# Analyse notional
metric_name='executed_notional_in_reporting_currency',
# Sum the notional
aggregation_metric='sum',
# No need for a multiplier
scalar=1,
# Round to nearest unit
round_figures_by=0)
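In plain pandas terms, this aggregation is a groupby-sum on the broker field; a minimal sketch with made-up numbers:

```python
import pandas as pd

# Made-up trades across two brokers
trade_df = pd.DataFrame({
    'broker_id': ['broker2', 'broker3', 'broker2'],
    'executed_notional_in_reporting_currency': [1000000.0, 2500000.0, 500000.0]})

# Total notional per broker, rounded to the nearest unit
notional_by_broker = (trade_df
                      .groupby('broker_id')['executed_notional_in_reporting_currency']
                      .sum().round(0))

print(notional_by_broker.to_dict())
```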
Now let's create a BarResultsForm
to calculate the average slippage per broker (weighted by trade size).
# Get average slippage per broker (weighted by notional)
bar_results_form_slippage = BarResultsForm(market_trade_order_list=['trade_df'],
# Aggregate the trades by broker_id
aggregate_by_field='broker_id',
# Do calculations on the slippage field
metric_name='slippage',
aggregation_metric='mean',
# Weight the mean by the amount executed in reporting currency (usually USD)
weighting_field='executed_notional_in_reporting_currency',
# Multiply the slippage by 10000 (converting to bp)
scalar=10000.0,
# Round all output to 2 decimal places
round_figures_by=2)
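A notional-weighted mean is not a built-in pandas aggregation, but we can sketch the idea with groupby and apply (illustrative data; slippage is in raw price terms and the 10000 scalar converts it to bp):

```python
import pandas as pd

# Made-up trades: slippage in raw price terms, plus executed notional
trade_df = pd.DataFrame({
    'broker_id': ['broker2', 'broker2', 'broker3'],
    'slippage': [-0.000020, -0.000010, -0.000015],
    'executed_notional_in_reporting_currency': [1000000.0, 2000000.0, 2000000.0]})

def weighted_mean_bp(group):
    w = group['executed_notional_in_reporting_currency']
    # Weight slippage by notional, then convert to basis points
    return (group['slippage'] * w).sum() / w.sum() * 10000.0

slippage_by_broker = (trade_df
                      .groupby('broker_id')[['slippage',
                                             'executed_notional_in_reporting_currency']]
                      .apply(weighted_mean_bp).round(2))

print(slippage_by_broker.to_dict())
```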
Lastly, create a JoinTables
object to combine the total notional per broker and the associated slippage for each broker.
join_tables_notional_slippage = JoinTables(tables_dict={'table_name': 'jointables_broker_id',
# Fetch the following calculated tables
'table_list': ['bar_trade_df_executed_notional_in_reporting_currency_by/sum/broker_id',
'bar_trade_df_slippage_by/mean/broker_id'],
# Append to the columns of each table
'column_list': ['notional (rep cur)', 'slippage (bp)']})
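Joining these aggregated tables is akin to a pandas concat along the shared broker index; a sketch using illustrative per-broker aggregates:

```python
import pandas as pd

# Illustrative per-broker aggregates, indexed by broker
notional = pd.Series({'broker2': 19743083.0, 'broker3': 21398742.0},
                     name='notional (rep cur)')
slippage = pd.Series({'broker2': -0.14, 'broker3': -0.11},
                     name='slippage (bp)')

# Join the two aggregated tables side by side on the broker index
joined = pd.concat([notional, slippage], axis=1)

print(joined.columns.tolist())
```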
Here we create a detailed TCARequest, loading our trade data from CSV files for a single ticker (in this case EURUSD). Given that we are downloading directly from Dukascopy, we shall set use_multithreading=False, as the current downloader is not thread safe. If we want to use multithreading/Celery, we can use other data sources, such as a database (e.g. Arctic/MongoDB). In general, it is better to use a market data database which has been prepopulated from an external source, rather than repeatedly calling an external source, which is very slow (in particular when accessing a long history).
Note, we are using the various Metric, Benchmark, ResultsForm and JoinTables objects which we defined earlier.
# We are conducting a TCA computation for trade/order data between 05 May-10 May 2017 for EURUSD
# The trade dataset is from CSV and the market data is downloaded from Dukascopy on the fly
tca_request = TCARequest(start_date='05 May 2017', finish_date='10 May 2017', ticker=['EURUSD'],
tca_type='detailed',
trade_data_store='csv', market_data_store='dukascopy',
trade_order_mapping=local_csv_trade_order_mapping,
metric_calcs=[metric_slippage],
results_form=[table_results_form,
bar_results_form_notional,
bar_results_form_slippage],
benchmark_calcs=[benchmark_spread_to_mid],
join_tables=[join_tables_notional_slippage],
use_multithreading=False)
Let's run the TCA calculation from our TCARequest
. The output is a dictionary of DataFrames (and Figures).
Mediator.get_volatile_cache().clear_cache()
# Dictionary of (mostly) dataframes as output from TCA calculation
dict_of_df = tca_engine.calculate_tca(tca_request)
2020-07-27 15:16:53,411; DEBUG:tcapy.analysis.tcamarkettradeloader: Start loading trade/data/computation (tcamarkettradeloader.py:241) 2020-07-27 15:16:53,414; DEBUG:tcapy.analysis.tcatickerloaderimpl: Get market and trade/order data for EURUSD from 2017-05-05 00:00:00+00:00 - 2017-05-10 00:00:00+00:00 (tcatickerloaderimpl.py:80) 2020-07-27 15:16:53,416; DEBUG:tcapy.data.volatilecache: Attempting to get list from cache: ['dukascopy_EURUSD_2017-05-05 00:00:00+00:00_2017-05-10 00:00:00+00:00_market_df_None_comp'] (volatilecache.py:540) 2020-07-27 15:16:53,419; DEBUG:tcapy.data.volatilecache: Attempting to get list from cache: ['dukascopy_EURUSD_2017-05-05 00:00:00+00:00_2017-05-10 00:00:00+00:00_market_df_None_comp'] (volatilecache.py:540) 2020-07-27 15:16:53,597; DEBUG:tcapy.data.databasesource: Downloading 2017-05-05 00:00:00 - 2017-05-10 00:00:00 for EURUSD (databasesource.py:3819) 2020-07-27 15:17:01,925; WARNING:tcapy.analysis.algos.benchmark: mid not in market data (benchmark.py:90) 2020-07-27 15:17:01,926; DEBUG:tcapy.data.volatilecache: Attempting to push ['dukascopy_EURUSD_2017-05-05 00:00:00+00:00_2017-05-10 00:00:00+00:00_market_df_None_comp'] to cache (volatilecache.py:426) 2020-07-27 15:17:02,035; DEBUG:tcapy.util.deltaizeserialize: Pandas dataframe of size: ----------- 17.08 MB ----------- in 1 chunk(s) (deltaizeserialize.py:219) 2020-07-27 15:17:02,088; DEBUG:tcapy.data.volatilecache: Now pushing ['dukascopy_EURUSD_2017-05-05 00:00:00+00:00_2017-05-10 00:00:00+00:00_market_df_None_comp'] to cache (volatilecache.py:443) 2020-07-27 15:17:02,147; DEBUG:tcapy.data.volatilecache: Pushed ['dukascopy_EURUSD_2017-05-05 00:00:00+00:00_2017-05-10 00:00:00+00:00_market_df_None_comp'] to cache (volatilecache.py:461) 2020-07-27 15:17:02,149; DEBUG:tcapy.data.volatilecache: Attempting to push ['dukascopy_EURUSD_2017-05-05 00:00:00+00:00_2017-05-10 00:00:00+00:00_market_df_None_comp'] to cache (volatilecache.py:426) 2020-07-27 15:17:02,256; 
DEBUG:tcapy.util.deltaizeserialize: Pandas dataframe of size: ----------- 17.08 MB ----------- in 1 chunk(s) (deltaizeserialize.py:219) 2020-07-27 15:17:02,301; DEBUG:tcapy.data.volatilecache: Now pushing ['dukascopy_EURUSD_2017-05-05 00:00:00+00:00_2017-05-10 00:00:00+00:00_market_df_None_comp'] to cache (volatilecache.py:443) 2020-07-27 15:17:02,367; DEBUG:tcapy.data.volatilecache: Pushed ['dukascopy_EURUSD_2017-05-05 00:00:00+00:00_2017-05-10 00:00:00+00:00_market_df_None_comp'] to cache (volatilecache.py:461) 2020-07-27 15:17:02,369; DEBUG:tcapy.analysis.tcatickerloader: Get trade order holder for EURUSD from 2017-05-05 00:00:00+00:00 - 2017-05-10 00:00:00+00:00 (tcatickerloader.py:386) 2020-07-27 15:17:02,415; WARNING:tcapy.data.datafactory: Dataframe empty for ticker USDEUR (datafactory.py:97) 2020-07-27 15:17:02,417; DEBUG:tcapy.data.volatilecache: Attempting to get list from cache: ['dukascopy_EURUSD_2017-05-05 00:00:00+00:00_2017-05-10 00:00:00+00:00_market_df_None_comp'] (volatilecache.py:540) 2020-07-27 15:17:02,828; WARNING:tcapy.data.datafactory: Dataframe empty for ticker USDEUR (datafactory.py:97) 2020-07-27 15:17:02,829; DEBUG:tcapy.data.volatilecache: Attempting to get list from cache: ['dukascopy_EURUSD_2017-05-05 00:00:00+00:00_2017-05-10 00:00:00+00:00_market_df_None_comp'] (volatilecache.py:540) 2020-07-27 15:17:02,897; DEBUG:tcapy.analysis.tcatickerloader: Filter the market date by start/finish date (tcatickerloader.py:761) 2020-07-27 15:17:02,904; DEBUG:tcapy.analysis.tcatickerloader: Combine trade/order data (tcatickerloader.py:782) 2020-07-27 15:17:02,915; DEBUG:tcapy.analysis.tcatickerloader: Calculating BenchmarkMarketSpreadToMid for market data (tcatickerloader.py:600) 2020-07-27 15:17:02,943; DEBUG:tcapy.analysis.tcatickerloader: Filter trades by venue (tcatickerloader.py:458) 2020-07-27 15:17:02,945; DEBUG:tcapy.analysis.tcatickerloader: Calculating derived fields and benchmarks (tcatickerloader.py:508) 2020-07-27 15:17:02,946; 
DEBUG:tcapy.analysis.tcatickerloader: Calculating execution fields (tcatickerloader.py:510) 2020-07-27 15:17:02,957; DEBUG:tcapy.analysis.tcatickerloader: Calculating benchmarks (tcatickerloader.py:536) 2020-07-27 15:17:02,958; DEBUG:tcapy.analysis.tcatickerloader: Calculating metrics (tcatickerloader.py:554) 2020-07-27 15:17:02,959; DEBUG:tcapy.analysis.tcatickerloader: Calculating MetricSlippage for trade_df (tcatickerloader.py:560) 2020-07-27 15:17:02,978; DEBUG:tcapy.analysis.tcatickerloader: Calculating MetricSlippage for order_df (tcatickerloader.py:560) 2020-07-27 15:17:02,998; DEBUG:tcapy.analysis.tcatickerloader: Completed derived field calculations for EURUSD (tcatickerloader.py:568) 2020-07-27 15:17:02,999; DEBUG:tcapy.analysis.tcatickerloaderimpl: Generating downsampled market data for potentional display (tcatickerloaderimpl.py:275) 2020-07-27 15:17:03,025; DEBUG:tcapy.analysis.tcatickerloaderimpl: About to join (tcatickerloaderimpl.py:340) 2020-07-27 15:17:03,029; DEBUG:tcapy.analysis.tcatickerloaderimpl: Finished joining (tcatickerloaderimpl.py:356) 2020-07-27 15:17:03,032; DEBUG:tcapy.analysis.tcatickerloaderimpl: About to join (tcatickerloaderimpl.py:340) 2020-07-27 15:17:03,035; DEBUG:tcapy.analysis.tcatickerloaderimpl: Finished joining (tcatickerloaderimpl.py:356) 2020-07-27 15:17:03,172; DEBUG:tcapy.analysis.tcamarkettradeloader: Finished loading data and calculating metrics on individual tickers (tcamarkettradeloader.py:248) 2020-07-27 15:17:03,174; DEBUG:tcapy.analysis.tcamarkettradeloaderimpl: Constructing results form to summarize analysis... (tcamarkettradeloaderimpl.py:74) 2020-07-27 15:17:03,214; DEBUG:tcapy.analysis.tcamarkettradeloaderimpl: Now join table results... (tcamarkettradeloaderimpl.py:122) 2020-07-27 15:17:03,220; DEBUG:tcapy.analysis.tcamarkettradeloaderimpl: Finished calculating results form and join table results! (tcamarkettradeloaderimpl.py:135)
Let's have a look inside the output of our TCA calculation.
print(dict_of_df.keys())
odict_keys(['trade_df', 'order_df', 'sparse_market_trade_df', 'sparse_market_order_df', 'market_df_downsampled', 'candlestick_fig', 'table_trade_df_slippage_by_worst_all', 'bar_trade_df_executed_notional_in_reporting_currency_by/sum/broker_id', 'bar_trade_df_slippage_by/mean/broker_id', 'jointables_broker_id', 'market_df'])
Below we explain what each of the keys in the dictionary output represents.

- trade_df - trade data, with additional calculated fields
- order_df - order data, with additional calculated fields
- sparse_market_trade_df - combined downsampled market data with trade data
- sparse_market_order_df - combined downsampled market data with order data
- market_df_downsampled - downsampled market data
- candlestick_fig - candlesticks of resampled market data as a Plotly Figure
- market_df - this is empty, given the default dummy_market=True setting

We also have various keys associated with the ResultsForm and JoinTables objects we created.

- table_trade_df_slippage_by_worst_all - table with trades which have outlying slippage
- bar_trade_df_executed_notional_in_reporting_currency_by/sum/broker_id - total notional by broker
- bar_trade_df_slippage_by/mean/broker_id - average slippage by broker
- jointables_broker_id - combining the notional and slippage tables

Let's use the TCAResults object to simplify the output into an easier form and create Plotly Figure charts/HTML styled tables.
tca_results = TCAResults(dict_of_df, tca_request)
tca_results.render_computation_charts()
2020-07-27 15:17:03,248; DEBUG:tcapy.vis.displaylisteners: Plotting main timeline 2214 (displaylisteners.py:988) 2020-07-27 15:17:03,492; DEBUG:tcapy.vis.displaylisteners: Rendered plot 2214 and (displaylisteners.py:998) 2020-07-27 15:17:03,495; DEBUG:tcapy.vis.displaylisteners: Plotting main timeline 2208 (displaylisteners.py:988) 2020-07-27 15:17:03,550; DEBUG:tcapy.vis.displaylisteners: Rendered plot 2208 and (displaylisteners.py:998)
We can now access the DataFrames via properties like bar, table and join_tables.
print(tca_results.bar.keys())
print(tca_results.table.keys())
print(tca_results.join_tables.keys())
dict_keys(['trade_df_executed_notional_in_reporting_currency_by/sum/broker_id', 'trade_df_slippage_by/mean/broker_id']) dict_keys(['trade_df_slippage_by_worst_all']) dict_keys(['broker_id'])
Let's view the data related to the BarResultsForm for the executed notional of each broker, firstly as a DataFrame, and secondly as a bar chart by accessing the bar_charts property of the TCAResults object.
tca_results.bar['trade_df_executed_notional_in_reporting_currency_by/sum/broker_id']
broker_id_index | broker_id
---|---
broker2 | 19743083.0
broker3 | 21398742.0
broker4 | 3586887.0
broker5 | 8986275.0
broker6 | 27237125.0
tca_results.bar_charts['trade_df_executed_notional_in_reporting_currency_by/sum/broker_id']
We can also show the DataFrame with the outlier trades, which are outside the bid/ask from Dukascopy data.
tca_results.table['trade_df_slippage_by_worst_all']
 | executed_notional_in_reporting_currency | side | slippage_benchmark | slippage | slippage_anomalous
---|---|---|---|---|---
2017-05-08 10:31:33.011000+00:00 | 3.586887e+06 | 1 | 1.094680 | -0.000021 | 1.0
2017-05-05 14:56:49.373000+00:00 | 6.811655e+06 | 1 | 1.098875 | -0.000016 | 1.0
2017-05-08 14:37:25.382000+00:00 | 1.551467e+07 | 1 | 1.093750 | -0.000015 | 1.0
2017-05-05 06:27:26.514000+00:00 | 1.976319e+07 | -1 | 1.097445 | -0.000013 | 1.0
2017-05-09 11:18:39.206000+00:00 | 1.293143e+07 | -1 | 1.089700 | -0.000013 | 1.0
Avg | 1.463584e+07 | 0 | 1.094755 | -0.000014 | 1.0
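Note that the slippage column here is in raw price terms; multiplying by 10,000 converts it to basis points, consistent with the per-broker slippage table later. For instance:

```python
# Slippage is stored as a fraction of price; scale by 10,000 to get basis points
slippage = -0.000021
slippage_bp = round(slippage * 10000.0, 2)

print(slippage_bp)
```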
We can view the DataFrame for the combination of total amount executed per broker and the slippage (weighted by notional).
tca_results.join_tables['broker_id']
 | broker_id notional (rep cur) | broker_id slippage (bp)
---|---|---
broker2 | 19743083.0 | -0.14
broker3 | 21398742.0 | -0.11
broker4 | 3586887.0 | -0.21
broker5 | 8986275.0 | -0.10
broker6 | 27237125.0 | -0.12
We can see that the styled table is a string of HTML code within an HTML IFrame. We could write this to disk as an HTML file or display it to a user.
tca_results.styled_join_tables['broker_id']
Iframe(height='300px', sandbox='', srcDoc='<style></style>\n<table border="0">\n <thead>\n <tr style="text-align: right;">\n <th></th>\n <th>broker_id notional (rep cur)</th>\n <th>broker_id slippage (bp)</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>broker2</th>\n <td>19743083.0</td>\n <td>-0.14</td>\n </tr>\n <tr>\n <th>broker3</th>\n <td>21398742.0</td>\n <td>-0.11</td>\n </tr>\n <tr>\n <th>broker4</th>\n <td>3586887.0</td>\n <td>-0.21</td>\n </tr>\n <tr>\n <th>broker5</th>\n <td>8986275.0</td>\n <td>-0.10</td>\n </tr>\n <tr>\n <th>broker6</th>\n <td>27237125.0</td>\n <td>-0.12</td>\n </tr>\n </tbody>\n</table>', width='975px', style={'border': 'thin lightgrey solid', 'padding': '0', 'margin': '0'})
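To persist such a styled table, we could extract the HTML string and write it to disk. A minimal self-contained sketch, using a stand-in HTML snippet rather than the actual srcDoc attribute of the IFrame:

```python
import os
import tempfile

# Stand-in for the HTML string held inside the IFrame's srcDoc
html_table = ('<table border="0"><thead><tr><th>broker</th>'
              '<th>slippage (bp)</th></tr></thead>'
              '<tbody><tr><th>broker2</th><td>-0.14</td></tr></tbody></table>')

# Write the HTML to a temporary directory, ready to open in a browser
out_path = os.path.join(tempfile.mkdtemp(), 'broker_slippage.html')

with open(out_path, 'w') as f:
    f.write(html_table)

print(os.path.exists(out_path))
```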
We have seen how to call tcapy programmatically for more involved usage, in particular how to use it for compliance purposes.
If you are interested in supporting this project, such as by sponsoring new features, or in obtaining commercial support, please contact saeed@cuemacro.com.