In recent years, technological improvements have enabled widespread adoption of big data as a core input into investment decisions. The ability to collect, process and deliver large quantities of data to clients has driven the rise of the alternative data phenomenon, with nearly 500 firms providing terabytes of data every year to investors seeking an edge.

The term ‘alternative data’ now means any data set that isn't available from existing pricing and fundamental data feeds, such as satellite images of car parks or credit card data. Rarely does pricing data cross over into the world of alternative data - until now. BMLL has built a capability to extract actionable insight from full-depth order book data, which we believe qualifies as alternative data because it holds predictive power that is more statistically relevant than that of traditional alternative data sources.

Alternative pricing data

Using order book data to drive investment decisions was historically the domain of high frequency trading funds. Today, however, a whole host of hedge funds and investment managers want access to this capability. Yet it has been difficult for them to secure cost-effective access to either the data or the analytics platform needed to derive their own alternative pricing data.

Whether consumed for systematic or quantamental strategies, alternative pricing data, when combined with associated analytics, can provide a level of insight that helps drive decisions in a way that was previously not possible. Analysis of order book data, for example, can give managers information on how other market participants are behaving. An order’s average resting time can tell hedge fund managers how aggressively the market is trading. When this is combined with the number of orders at each price level, a clear picture of participants' trading appetite can be derived. This helps managers understand how other participants are positioning themselves in the market and provides them with additional input into their own trading strategies.
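The two metrics mentioned above can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions: the `Order` record, its field names and the sample values are hypothetical and do not represent BMLL's data format or API.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Order:
    """Hypothetical, simplified order lifecycle record (not a real feed format)."""
    order_id: str
    price_ticks: int           # price expressed in integer ticks
    added_ns: int              # time the order entered the book
    removed_ns: Optional[int]  # time it was filled or cancelled; None if still resting


def average_resting_time_ns(orders):
    """Mean time completed orders spent in the book, or None if none completed."""
    durations = [o.removed_ns - o.added_ns for o in orders if o.removed_ns is not None]
    if not durations:
        return None
    return sum(durations) / len(durations)


def resting_orders_per_level(orders, at_ns):
    """Count the orders resting at each price level at time `at_ns`."""
    counts = Counter()
    for o in orders:
        still_resting = o.removed_ns is None or o.removed_ns > at_ns
        if o.added_ns <= at_ns and still_resting:
            counts[o.price_ticks] += 1
    return dict(counts)


# Made-up sample data, purely for illustration.
orders = [
    Order("a", 10000, 0, 500),
    Order("b", 10000, 100, 400),
    Order("c", 10050, 200, None),
    Order("d", 9950, 300, 1300),
]
avg_resting = average_resting_time_ns(orders)
depth = resting_orders_per_level(orders, at_ns=350)
```

A short average resting time alongside thin price levels would suggest a more aggressive market, while long resting times and deep levels would suggest more patient participants.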

Obtaining predictability

The raw pursuit of data is useless, however, unless it can be turned into insight. And alternative data is only useful if it is predictive. One of the biggest challenges in analysing order book data is the sheer scale and size of it; a single order book on a given day can have a hundred million data points. Other potential difficulties centre on making sure the system is reliable, robust and scalable; quants and data scientists need to easily access the data via APIs and tools in order to gain insight without having to worry about data engineering and other complexities. In short, predictability is obtained by having good clean input data that is well stored and accessible.

For firms looking to build data parsing and storage capabilities, there are a number of trade-offs, the outcome of which will affect a firm's ability to leverage its existing infrastructure to perform the required analysis. Relying on real-time data feeds as the single source of the level 3 data needed to derive alternative data requires vast computational resources to store and process the data, which is not cost effective. Storage choices will also affect the usefulness of the data: deep, long-term storage is more cost effective as a back-up, although it is very costly to regularly move vast quantities of data into these facilities. Shorter-term memory banks are more expensive, but operationally more efficient.

Gleaning insights from granular order book data

New systems have now been developed which allow hedge funds and investment firms access to analytics extracted from this alternative data. We at BMLL, the award-winning data and analytics company, provide such systems by taking publicly available pricing data at the most granular level from 45 of the world’s largest exchanges and trading venues. The data is collected overnight and parsed into a proprietary, information-rich, harmonised format that is common across all the different venues. This process enables BMLL and its customers to perform analytics on that data across multiple trading venues with the same code, gaining insights they would otherwise not have had access to.
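The idea of a harmonised format can be sketched in a few lines of Python. This is an assumption-laden illustration, not BMLL's proprietary format: the `BookEvent` schema, the two venue layouts and every field name below are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BookEvent:
    """Hypothetical common schema shared by all venues."""
    venue: str
    symbol: str
    timestamp_ns: int
    side: str    # "B" for buy, "S" for sell
    price: float
    size: int


def parse_venue_a(raw: dict) -> BookEvent:
    # Hypothetical venue A: microsecond timestamps and "buy"/"sell" strings.
    side = "B" if raw["side"] == "buy" else "S"
    return BookEvent("A", raw["sym"], raw["ts_us"] * 1_000, side,
                     float(raw["px"]), int(raw["qty"]))


def parse_venue_b(raw: dict) -> BookEvent:
    # Hypothetical venue B: nanosecond timestamps, prices in cents, side coded 1/2.
    side = "B" if raw["side"] == 1 else "S"
    return BookEvent("B", raw["ticker"], raw["time_ns"], side,
                     raw["price_cents"] / 100, raw["volume"])


def total_size_by_side(events):
    """One analytic, written once, that works on events from any venue."""
    totals = {"B": 0, "S": 0}
    for event in events:
        totals[event.side] += event.size
    return totals


# Made-up raw records, purely for illustration.
events = [
    parse_venue_a({"sym": "XYZ", "ts_us": 5, "side": "buy", "px": "100.5", "qty": "200"}),
    parse_venue_b({"ticker": "XYZ", "time_ns": 7_000, "side": 2,
                   "price_cents": 10050, "volume": 150}),
]
totals = total_size_by_side(events)
```

Once each venue's quirks are absorbed by its parser, downstream analytics such as `total_size_by_side` need no venue-specific branches.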

BMLL data is sourced directly from the exchanges - establishing a primary source of truth. The complex parsing and data harmonisation process ensures it is clean and consistent. Of course, multiple sources of data need to align in order to be useful in large-scale analysis, yet there are instances when reference identifier codes do not agree, which can break a model. For example, a subtle change to a company structure can change the identifier code, meaning the exchange data feeds show two different sources of truth - an extremely complicated scenario to untangle. When trying to perform scalable analysis over years of data, it is vital that the source of truth is made discoverable, so that it is possible to understand which identifiers agreed at any point in time. Otherwise, huge and costly errors in an analysis could occur.
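One common way to make identifier changes discoverable is a point-in-time lookup. The Python sketch below is a hedged illustration of that general idea, not a description of how BMLL resolves identifiers. The `IdentifierHistory` class, the entity name and the identifier codes are all hypothetical.

```python
from bisect import bisect_right
from datetime import date


class IdentifierHistory:
    """Hypothetical point-in-time map from an entity to its identifier code."""

    def __init__(self):
        # entity -> list of (effective_from, identifier), kept sorted by date
        self._history = {}

    def record(self, entity, effective_from, identifier):
        entries = self._history.setdefault(entity, [])
        entries.append((effective_from, identifier))
        entries.sort()

    def identifier_on(self, entity, day):
        """Return the identifier in force on `day`, or None if none was yet."""
        entries = self._history.get(entity, [])
        effective_dates = [start for start, _ in entries]
        idx = bisect_right(effective_dates, day)
        if idx == 0:
            return None
        return entries[idx - 1][1]


# Made-up history: a corporate restructuring changes the code in June 2019.
history = IdentifierHistory()
history.record("ExampleCo", date(2015, 1, 1), "ID-OLD")
history.record("ExampleCo", date(2019, 6, 3), "ID-NEW")

before_change = history.identifier_on("ExampleCo", date(2018, 12, 31))
on_change_day = history.identifier_on("ExampleCo", date(2019, 6, 3))
```

An analysis spanning several years can then ask which identifier applied on each date, rather than silently treating the old and new codes as two different instruments.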

BMLL stores the raw, parsed and alternative data sets in the cloud, thereby safeguarding scalability. Using both manual and automated systematic approaches to assure clean data, as well as regular data quality control tests, BMLL makes sure that the data is ultimately delivered to the end user in an easy-to-use format and as transparently as possible. Users can then access that data warehouse and examine order book history at a nanosecond level of granularity, or access analytical data generated by BMLL itself to realise additional insight. BMLL offers clients access to that granular data, and sets up proprietary and open source analytics libraries, through three key products: Data Lab, which gives access to the data and compute capabilities via a Jupyter notebook interface; DataViz, an application which offers clients the ability to view and interact with analytics via customisable dashboards; and DataFeed, created specifically for the hedge fund community, through which clients can choose to have computed analytics bundles delivered via an API or FTP.

Data-driven decision making

Due to the time and effort involved in building such a mechanism, the trend to outsource the acquisition and analysis of alternative data has been gaining ground. The hedge fund community's expertise lies not in building the platform, but in taking insights from alternative data and turning them into alpha and investment strategies.

We are currently seeing a big shift towards data-driven decision making across all aspects of financial markets. There is now increased accessibility to powerful tools that can analyse data and support these data-driven decisions. For those market participants keen to capitalise on the opportunities that alternative data has to offer, access to good quality data that actually provides insight to improve decision making is crucial - a factor that is only going to become more critical within the next five years.

Dr Elliot Banks

Chief Product Officer, BMLL

Elliot Banks is responsible for data science, product development and delivery, working closely with both clients and development teams to successfully deliver BMLL’s analytics and product suite. Prior to joining BMLL, Elliot served in commercial and technical roles, including within the private equity arm of Macquarie, and as a Faculty AI Data Science Fellow. Banks holds an MMath from the University of Cambridge, and a PhD in theoretical physics from Imperial College London.

We contributed this article to the “HFM EU Tech Winners Report” in September 2020. The full report can be downloaded here.