Harmonised data to enhance alpha generation

We contributed this article to the latest HedgeWeek report Data Science In Focus 2020, published in September 2020.

Using order book data to drive investment decisions was historically the domain of high frequency trading hedge funds. However, as these datasets are being aggregated, harmonised and made searchable, a whole host of hedge funds and investment managers can seek to benefit from insights drawn from order book data to improve their back-testing techniques and enhance their alpha generation capabilities.

“In the past, certain types of high frequency trading firms would have gathered order book data and may have been able to use that in some ways. Now, however, there are systems which allow for analytics to be drawn out of this data. Different hedge funds and investment firms can use these analytics to generate insight in a way that they hadn’t been able to before,” comments Elliot Banks, Chief Product Officer at BMLL.

BMLL is a firm which provides such systems. The data and analytics company takes publicly available pricing data at the most granular level from 45 of the world’s largest exchanges and trading venues. The historical data is collected overnight and parsed into a harmonised format so that it is consistent across all the different venues. This process enables BMLL and its customers to perform analytics on that data and gain insights they would otherwise not have had access to.
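To illustrate what harmonising means in practice, the sketch below maps the same order, as reported by two venues with different field names, price conventions and timestamp units, onto one common record. It is a minimal sketch under stated assumptions: the venue formats, field names and the `OrderEvent` schema are hypothetical and are not BMLL’s actual data model.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class OrderEvent:
    """Hypothetical common schema shared by every venue after harmonisation."""
    venue: str
    symbol: str
    order_id: str
    side: str          # "B" for buy, "S" for sell
    price: Decimal     # price in currency units
    size: int          # number of shares
    timestamp_ns: int  # nanoseconds since the Unix epoch


def from_venue_a(msg: dict) -> OrderEvent:
    # Assumed venue A format: integer price ticks of 1/10,000 and nanosecond timestamps.
    return OrderEvent(
        venue="A",
        symbol=msg["sym"],
        order_id=str(msg["oid"]),
        side="B" if msg["side"] == "BUY" else "S",
        price=Decimal(msg["px_ticks"]) / Decimal(10_000),
        size=int(msg["qty"]),
        timestamp_ns=int(msg["ts_ns"]),
    )


def from_venue_b(msg: dict) -> OrderEvent:
    # Assumed venue B format: decimal-string prices and microsecond timestamps.
    return OrderEvent(
        venue="B",
        symbol=msg["ticker"],
        order_id=str(msg["order_ref"]),
        side=msg["direction"],  # already "B" or "S" on this venue
        price=Decimal(msg["price"]),
        size=int(msg["shares"]),
        timestamp_ns=int(msg["time_us"]) * 1_000,
    )


NORMALISERS = {"A": from_venue_a, "B": from_venue_b}


def harmonise(venue: str, msg: dict) -> OrderEvent:
    """Route a raw venue message to the parser that maps it onto OrderEvent."""
    return NORMALISERS[venue](msg)


# The same (made-up) order expressed in each venue's native format.
a = harmonise("A", {"sym": "XYZ", "oid": 42, "side": "BUY", "px_ticks": 1012500,
                    "qty": 100, "ts_ns": 1_600_000_000_000_000_000})
b = harmonise("B", {"ticker": "XYZ", "order_ref": "42", "direction": "B", "price": "101.25",
                    "shares": "100", "time_us": 1_600_000_000_000_000})
```

Once both messages share one schema, the records differ only in their venue, so the same analytics code can run across every venue without per-venue special cases.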


Many investment firms already capture this data in real time. However, that information risks remaining unused unless it is arranged in a way which enables it to be analysed. Banks says: “Many firms we speak to have captured their live feed in a format that cannot be rebuilt or optimised into what we call a level 3 order book, that is, an order book which can be analysed, without an enormous amount of effort.”
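A level 3 order book keeps every individual order rather than only the total size at each price, so the aggregated (level 2) view can be derived from it but not the other way round. The sketch below rebuilds such a book by replaying order events. It assumes a simplified, hypothetical message set (`add`, `execute`, `cancel`) and a hypothetical `Level3Book` class; real venue feeds carry further message types, such as order modifications, that this sketch does not handle.

```python
from collections import defaultdict


class Level3Book:
    """Order-by-order (level 3) book rebuilt by replaying individual order events."""

    def __init__(self):
        # order_id -> [side, price, remaining size]; side is "B" (bid) or "S" (ask)
        self.orders = {}

    def add(self, order_id, side, price, size):
        self.orders[order_id] = [side, price, size]

    def execute(self, order_id, size):
        order = self.orders[order_id]
        order[2] -= size
        if order[2] <= 0:
            del self.orders[order_id]  # a fully filled order leaves the book

    def cancel(self, order_id):
        self.orders.pop(order_id, None)

    def depth(self, side):
        """Aggregate resting size per price (the level 2 view), best price first."""
        levels = defaultdict(int)
        for order_side, price, size in self.orders.values():
            if order_side == side:
                levels[price] += size
        # Highest price is best for bids, lowest is best for asks.
        return sorted(levels.items(), reverse=(side == "B"))

    def best_bid_ask(self):
        bids, asks = self.depth("B"), self.depth("S")
        return (bids[0][0] if bids else None, asks[0][0] if asks else None)


# Replay a short, made-up event sequence.
book = Level3Book()
book.add("1", "B", 100.0, 200)
book.add("2", "B", 100.5, 100)
book.add("3", "S", 101.0, 300)
book.add("4", "S", 101.0, 50)
book.execute("2", 100)  # order 2 fully filled, so 100.5 disappears from the bids
book.execute("3", 100)  # order 3 partially filled, 200 shares remain
```

Because each order is tracked individually, order-level metrics such as queue position or how long an order rested can be computed later, which is not possible from price-level totals alone.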

Large collections of data are useless unless a portfolio manager can generate insight from them. The way BMLL helps solve this is by harmonising the data it gathers and providing analytics on that data. “We don’t remove any of the information, but we make it easy to act on. Managers can quickly get the order book data they need and build it into metrics which are useful and subsequently into insight,” Banks explains.

According to Banks, optimising the search function is a vital component of this process. He says: “We make sure the data is easy to search. You can have perfectly clean data but if you don’t have a way of quickly finding the specific order book or the underlying security to understand how they link together in a meaningful way, then it’s impossible to actually utilise that data.”

The insights that can be gleaned from analysis of order book data can give managers information on how other market participants are behaving. For example, the average resting time of orders (how long they sit on the book before being executed or cancelled) can give hedge fund managers information about how aggressive the market is in terms of trading. This helps them understand what other participants are doing, and they can build that knowledge into their investment strategies.
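As a rough illustration of how such a metric could be computed, the sketch below measures the time between each order’s entry and its removal from the book. The event schema and the function name `average_resting_time_ms` are hypothetical, and the definition used (entry to full fill or cancel, ignoring orders still resting at the end of the data) is one plausible choice rather than BMLL’s published methodology.

```python
from statistics import mean


def average_resting_time_ms(events):
    """Mean time in milliseconds between an order's entry and its removal.

    `events` is an iterable of (timestamp_ns, event_type, order_id) tuples in time
    order, where event_type is "add", "fill" (a full execution) or "cancel". This
    simplified schema is an assumption. Orders still resting at the end are ignored.
    """
    entry_time = {}
    resting_ns = []
    for ts, kind, order_id in events:
        if kind == "add":
            entry_time[order_id] = ts
        elif kind in ("fill", "cancel") and order_id in entry_time:
            resting_ns.append(ts - entry_time.pop(order_id))
    return mean(resting_ns) / 1e6 if resting_ns else None


# Made-up events: "a" rests for 5 ms, "b" for 20 ms, "c" is still on the book.
events = [
    (0, "add", "a"),
    (1_000_000, "add", "b"),
    (5_000_000, "cancel", "a"),
    (21_000_000, "fill", "b"),
    (30_000_000, "add", "c"),
]
```

Here `average_resting_time_ms(events)` averages 5 ms and 20 ms to give 12.5 ms; comparing such figures across securities or over time is one way a manager might gauge how quickly liquidity is being taken or withdrawn.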

There are a variety of ways managers can use order book data and analytics. Banks outlines: “Having access to granular order book data can allow clients to fine-tune their back-testing processes. They can take data from us and start to really infer what market impact might affect them and their strategies. This way they can make sure their strategy is as true as it can be and their back-testing is as accurate as possible.”
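One simple way granular book data can feed into back-testing is a walk-the-book estimate: what a market order would have cost against the liquidity actually displayed at that moment, rather than at a single quoted price. The function below is a hypothetical sketch (`estimated_slippage_bps` is not a BMLL or library function). It measures slippage against the best ask and ignores hidden orders and how other participants would react to the trade, both of which a fuller market impact model would need to address.

```python
def estimated_slippage_bps(asks, order_size):
    """Estimate the cost of a market buy of `order_size` shares by walking the ask side.

    `asks` is a list of (price, size) levels, best price first, from an order book
    snapshot. Returns the distance of the volume-weighted fill price from the best
    ask, in basis points.
    """
    remaining, cost = order_size, 0.0
    for price, size in asks:
        take = min(remaining, size)  # consume as much of this level as needed
        cost += take * price
        remaining -= take
        if remaining == 0:
            break
    if remaining > 0:
        raise ValueError("order is larger than the displayed liquidity")
    avg_price = cost / order_size
    best_ask = asks[0][0]
    return (avg_price - best_ask) / best_ask * 10_000


# Made-up ask side of a book snapshot: (price, size) levels.
asks = [(100.0, 100), (100.1, 200), (100.2, 500)]
```

With this snapshot, a 50-share order fills entirely at the best ask (zero slippage), while a 300-share order has to reach the second level and costs roughly 6.7 basis points more than the best ask. Replaying such estimates against historical books is what lets a back-test reflect liquidity the strategy would actually have faced.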

BMLL offers clients access to that granular data, together with proprietary and open source analytics libraries, through three key products. The first is a data lab which gives access to the data via a Jupyter notebook interface. The second is a platform which offers clients the ability to generate analytics. Finally, clients can choose simply to receive a data feed delivered via an API or FTP.

Accessing order book data through a third party gives managers immediate access to analytical and insight capabilities that can help them improve their investment decisions. This is particularly relevant to funds which are only just warming up to the idea of including data science in their investment process.


Build vs buy

“If you don’t have a data science function in-house, the barriers to entry to build a sophisticated, productised data science platform are very high. We offer nascent funds a managed service, which means a much lower point of entry into the use of data science, especially compared to the cost and resources needed to build complex infrastructure and hire data science professionals and lawyers to source the data,” notes Banks. This means firms which are not yet confident that data science will add value to their investment process can test the waters and move quickly into the analytics and back-testing that order book data can offer.

Some investment firms have chosen to build this capability in-house rather than appoint a third party. Banks talks of the challenges this can present: “Building is complicated. You need to have data engineering pipelines, teams able to turn disparate datasets from different exchanges into a meaningful consistent harmonised dataset. Then you need teams to map out reference data to clean identifiers to make sure you can identify the different securities. Then on top of that you would need to build an analytics platform.”
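The reference data step Banks describes can be illustrated with a small index that links each venue-specific listing to a single clean identifier, so that one security can be found across every venue it trades on. This is a minimal sketch under assumed conditions: the field names, venue labels and ISIN-style codes below are placeholders invented for illustration, and production reference data also has to track identifier changes over time.

```python
from collections import defaultdict


def build_listing_index(reference_rows):
    """Map each clean identifier to every (venue, local symbol) listing it trades under."""
    index = defaultdict(list)
    for row in reference_rows:
        # Normalise case and stray whitespace so formatting differences between
        # sources do not split one security into two entries.
        key = row["isin"].strip().upper()
        index[key].append((row["venue"], row["local_symbol"]))
    return dict(index)


# Placeholder reference data (venues, symbols and ISIN-style codes are invented).
rows = [
    {"venue": "VENUE_A", "local_symbol": "ABC", "isin": "XX0000000001"},
    {"venue": "VENUE_B", "local_symbol": "ABC.B", "isin": " xx0000000001"},
    {"venue": "VENUE_A", "local_symbol": "DEF", "isin": "XX0000000002"},
]
listings = build_listing_index(rows)
```

Looking up `listings["XX0000000001"]` returns both venue listings of the same security, even though one source recorded its identifier with different case and a leading space.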

He says BMLL overcame several hurdles when building its own system, and investment firms planning to do the same can expect to meet similar pitfalls. In Banks’ experience, one of the biggest challenges with order book data is its sheer scale: “A single order book on a given day can have millions of data points. For example, in the US there are more than 10 equity venues and each of those has thousands of securities listed. This is just one aspect in one geography and that alone is a huge amount of data to process.”

Other potential difficulties include scalability and making sure the system is reliable and robust: “You need to ensure you have APIs and tools which make it as easy as possible for your quants and data scientists to actually get their insight without having to worry about data engineering and other complexities,” Banks advises.

Because of the time and effort involved in building such a system, outsourcing this work has been gaining ground. After all, as Banks remarks: “For fund managers, the intellectual property is not in the building of the platform, it’s about taking these analytics and turning them into alpha and investment strategies. This is where their expertise lies.”

Dr Elliot Banks
Chief Product Officer, BMLL


Elliot Banks is the Chief Product Officer at BMLL. Elliot is responsible for data science, product development and product delivery, working closely with both clients and development teams to deliver BMLL’s analytics and product suite to clients. Prior to joining BMLL, Elliot held a mixture of commercial and technical roles, including roles within the infrastructure private equity arm of Macquarie and as a Faculty AI Data Science Fellow.

Banks holds an MMath from the University of Cambridge, and a PhD in theoretical physics from Imperial College London.



The article is also available online at HedgeWeek.com