The amount of data generated has proliferated in recent years, from large regulatory datasets available to supervisors to text created on social media. Many of these newer sources of data have properties, such as high dimensionality, that call for analytical methods beyond the standard econometrics toolkit. Fortunately, new techniques have emerged alongside these new data sources, for example machine learning and natural language processing.
In 2014, the Bank of England established the Advanced Analytics (AA) division to tap into these novel data sources using state-of-the-art techniques. Here we preview a few cases in which we have applied new methods and advanced analytics, as a taster of our panel session at the ESRC Research Methods Festival1.
Machine learning (ML) is a set of approaches for modelling complex relationships within data, and these approaches often perform better as the quantity of available data increases. They also tend to require fewer assumptions about the data than traditional techniques do, for instance about distributional properties such as normality. In a recent paper2 we review the most common models and demonstrate how central banks can apply them. In one of our case studies, we assess the relative performance of different predictive models in forecasting consumer price inflation3, a central task given the Bank of England's objective of maintaining price stability. We used a simple lead-lag approach, predicting changes in inflation from earlier values of a set of macroeconomic explanatory variables such as the unemployment rate, Bank rate and changes in monetary aggregates, among others. One of the best-performing models was the support vector machine (SVM). The idea behind SVMs is to find a subset of observations, the support vectors, that suffices to describe the target variable, in this case inflation. A mathematical device known as the kernel trick is often used to identify the support vectors in a transformed feature space without computing that transformation explicitly. This makes the model highly flexible while keeping it sparse, which helps explain its good performance on this relatively small dataset.
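To make the lead-lag set-up and the kernel trick more concrete, here is a minimal pure-Python sketch. It is not the code referred to in footnote 3: the helper names `lead_lag`, `phi` and `poly2_kernel`, the one-period horizon and the toy numbers are all hypothetical. `lead_lag` pairs the explanatory variables observed in period t with the target in period t + h. `poly2_kernel` illustrates the trick itself: a degree-2 polynomial kernel returns the same inner product as first mapping both inputs into a three-dimensional feature space with `phi`, without ever constructing that space.

```python
import math


def lead_lag(features, target, horizon=1):
    """Pair explanatory variables at time t with the target at t + horizon.

    Hypothetical helper: features[i] is used to predict target[i + horizon].
    """
    if horizon < 1:
        raise ValueError("horizon must be at least 1")
    if len(features) != len(target):
        raise ValueError("features and target must have the same length")
    return features[:-horizon], target[horizon:]


def phi(x):
    """Explicit degree-2 feature map for a two-dimensional input (x1, x2)."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)


def poly2_kernel(x, z):
    """Degree-2 polynomial kernel (x . z)^2, computed in the original 2-d space."""
    return (x[0] * z[0] + x[1] * z[1]) ** 2


# Toy data (illustrative only): (unemployment rate %, Bank rate %) per quarter,
# and the change in inflation (percentage points) in the same quarter.
features = [(5.0, 0.50), (4.9, 0.50), (4.8, 0.75), (4.6, 0.75)]
inflation_change = [0.1, -0.2, 0.3, 0.0]

X, y = lead_lag(features, inflation_change, horizon=1)
print(list(zip(X, y)))  # each quarter's variables paired with next quarter's change

# The kernel reproduces the inner product in the explicit transformed space.
a, b = (1.0, 2.0), (3.0, -1.0)
explicit = sum(p * q for p, q in zip(phi(a), phi(b)))
print(math.isclose(explicit, poly2_kernel(a, b)))  # True
```

In practice, `X` and `y` would be split into training and test periods and passed to an SVM regressor such as scikit-learn's `SVR`, whose default RBF kernel applies the same idea to an infinite-dimensional feature space.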
However, ML is no panacea: some of its limitations apply to modelling in general, others specifically to ML models. The forecasting performance of the SVM, like that of all the models, dropped significantly after the global financial crisis of 2008-09 (GFC). One explanation is that the crisis produced patterns in the data that the models had not seen before and therefore could not have learned. A related issue is the black-box nature of ML models, which makes it harder than with, for example, standard linear models to understand how the inputs relate to the output.
In another paper4, we used natural language processing (NLP) techniques to analyse letters sent by the Bank of England's Prudential Regulation Authority (PRA) to the banks and building societies it supervises. Our aim was to understand how the PRA varies its writing style depending on who the letters are sent to. We identified the distinguishing textual indicators using an ML algorithm called random forests, which handles high-dimensional data effectively. Our results indicate that riskier firms typically receive letters that are linguistically more complex and more negative in sentiment overall.
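The minimal Python sketch below illustrates the kind of textual indicators involved: average sentence length and average word length as rough proxies for linguistic complexity, and a net sentiment score. The word lists, the helper `letter_features` and the two example sentences are hypothetical toy assumptions, not the dictionaries, features or letters used in the paper. In a study like this, vectors of many such indicators would become the inputs to a random forest classifier.

```python
import re

# Hypothetical toy lexicons; a real study would use an established sentiment dictionary.
NEGATIVE = {"concern", "concerns", "weakness", "weaknesses", "failure",
            "breach", "deficient", "inadequate"}
POSITIVE = {"strong", "robust", "adequate", "improved", "effective"}


def tokenize(text):
    """Lower-case the text and extract alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def split_sentences(text):
    """Naive sentence split on terminal punctuation, dropping empty pieces."""
    return [s for s in re.split(r"[.!?]+", text) if s.strip()]


def letter_features(text):
    """Simple complexity and sentiment indicators for one letter."""
    words = tokenize(text)
    sentences = split_sentences(text)
    n = len(words)
    if n == 0:
        return {"words_per_sentence": 0.0, "chars_per_word": 0.0, "net_sentiment": 0.0}
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    return {
        "words_per_sentence": n / len(sentences),         # longer sentences: more complex
        "chars_per_word": sum(len(w) for w in words) / n,  # longer words: more complex
        "net_sentiment": (pos - neg) / n,                  # below zero: net negative tone
    }


feats_a = letter_features("Your capital position is strong. Controls have improved.")
feats_b = letter_features(
    "We have serious concerns about weaknesses in your governance, which remain "
    "inadequate despite previous supervisory findings."
)
print(feats_a)
print(feats_b)
```

Given letters labelled by some measure of the recipient firm's riskiness, rows like these could be passed to a classifier such as scikit-learn's `RandomForestClassifier`, whose `feature_importances_` attribute offers one way to rank which indicators best separate the groups.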
ML is not the only class of methods that is well suited to, but relatively new in, central bank policy analysis. In the wake of the GFC, large regulatory datasets became available, particularly from previously opaque financial markets. A pilot project5 investigated the bilateral network structure of a subset of the foreign exchange derivatives markets, some of the largest markets ever created as measured by the nominal value of transactions. We find that these markets have a highly concentrated, multi-layered network structure. Investigating an external shock in the euro/Swiss franc market, the Swiss franc de-pegging, we examine its impact on the granular structure and overall connectivity of the market.
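As a rough illustration of what "highly concentrated" and "connectivity" can mean for a bilateral trade network, the minimal Python sketch below computes each counterparty's gross notional, the share held by the k largest counterparties, a Herfindahl-Hirschman index (HHI) and the network's density. The trades are invented, and these metrics are common choices assumed here for illustration; they are not necessarily the measures used in the pilot project.

```python
from collections import defaultdict


def gross_notional(trades):
    """Sum each counterparty's notional across all trades it is party to."""
    totals = defaultdict(float)
    for a, b, notional in trades:
        totals[a] += notional
        totals[b] += notional
    return dict(totals)


def top_k_share(totals, k):
    """Share of total gross notional held by the k largest counterparties."""
    values = sorted(totals.values(), reverse=True)
    return sum(values[:k]) / sum(values)


def hhi(totals):
    """Herfindahl-Hirschman index of gross notional: 1/n (equal shares) up to 1."""
    total = sum(totals.values())
    return sum((v / total) ** 2 for v in totals.values())


def density(trades):
    """Fraction of possible counterparty pairs that trade with each other."""
    nodes, edges = set(), set()
    for a, b, _ in trades:
        nodes.update((a, b))
        edges.add(frozenset((a, b)))
    n = len(nodes)
    return len(edges) / (n * (n - 1) / 2) if n > 1 else 0.0


# Invented trades: (counterparty, counterparty, notional in millions).
# D1 and D2 play the role of large dealers, C1 to C3 of smaller clients.
trades = [
    ("D1", "D2", 100.0),
    ("D1", "C1", 20.0),
    ("D1", "C2", 15.0),
    ("D2", "C3", 10.0),
    ("D2", "C1", 5.0),
]

totals = gross_notional(trades)
print(round(top_k_share(totals, 2), 3))  # 0.833: two dealers account for most activity
print(round(hhi(totals), 2))             # 0.36
print(density(trades))                   # 0.5: half of all possible pairs trade
```

Recomputing such metrics on trades before and after an event is one simple way to gauge how a shock changes a market's structure; in practice several measures would be compared, since any single one can move in unintuitive ways when counterparties enter or leave.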
Another domain of growing importance in economics is computational analysis. In particular, computational agent-based models (ABMs) are becoming increasingly popular. ABMs are usually built on simple behavioural rules that guide the interactions of individual agents, and they typically require fewer assumptions about aggregation or reversion to equilibrium than traditional macroeconomic models. They often show that simple micro-level behaviour can lead to unexpectedly complex macro-level outcomes. In another recent paper6, we built a heterogeneous agent-based model of the corporate bond market, calibrated against US data. This allowed us to gauge the impact of different bond trading strategies on liquidity and yields, and to assess the conditions under which large yield dislocations are relatively likely.
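To show how simple agent-level rules can generate aggregate price and yield dynamics, the Python sketch below simulates a toy market for a single hypothetical perpetual bond (coupon 5, fair price 100). It is not the model from the paper: the two trader types, their rules, the price-impact parameter and the exogenous order flow are all illustrative assumptions. Fundamentalists, each with its own noisy estimate of fair value, buy when the bond looks cheap; momentum traders, each with its own sensitivity, chase the last price move. The price moves in proportion to net demand, and the yield is the coupon divided by the price.

```python
import random


class Fundamentalist:
    """Buys when the price is below its own estimate of fair value, sells above."""

    def __init__(self, rng, fair_price):
        self.value = fair_price + rng.gauss(0.0, 1.0)  # heterogeneous valuations

    def demand(self, price, last_change):
        return 0.1 * (self.value - price)


class MomentumTrader:
    """Buys after price rises and sells after falls, with its own sensitivity."""

    def __init__(self, rng):
        self.sensitivity = rng.uniform(0.0, 0.5)

    def demand(self, price, last_change):
        return self.sensitivity * last_change


def simulate(n_fund, n_mom, steps, fair_price=100.0, coupon=5.0,
             impact=0.05, seed=0):
    """Return price and yield (in %) paths for a toy single-bond market."""
    rng = random.Random(seed)
    agents = [Fundamentalist(rng, fair_price) for _ in range(n_fund)]
    agents += [MomentumTrader(rng) for _ in range(n_mom)]
    prices = [fair_price]
    last_change = 0.0
    for _ in range(steps):
        price = prices[-1]
        net_demand = sum(a.demand(price, last_change) for a in agents)
        net_demand += rng.gauss(0.0, 1.0)        # exogenous order flow
        new_price = price + impact * net_demand  # linear price impact
        last_change = new_price - price
        prices.append(new_price)
    yields = [100.0 * coupon / p for p in prices]  # current yield of a perpetual, in %
    return prices, yields


def max_dislocation(yields, coupon=5.0, fair_price=100.0):
    """Largest absolute gap (percentage points) from the fair-value yield."""
    fair_yield = 100.0 * coupon / fair_price
    return max(abs(y - fair_yield) for y in yields)


# Compare a fundamentalist-heavy market with one containing more momentum traders.
for n_mom in (5, 20):
    prices, yields = simulate(n_fund=20, n_mom=n_mom, steps=200, seed=1)
    print(n_mom, round(max_dislocation(yields), 3))
```

Running many seeds for each mix of agents and recording how often dislocations exceed some threshold is, in spirit, how such a model can be used to ask when large yield moves become more likely. In this toy set-up, once `impact` times the momentum traders' combined sensitivity reaches roughly 1, the price path stops settling down and oscillates with growing amplitude.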
All in all, we hope to have given a flavour of how modern data analytics can help a policy institution such as the Bank of England to gauge the economy better and to take appropriate policy decisions.
3 The data and code are available on GitHub.
4 Bholat, D., Brookes, J., Cai, C., Grundy, K. and Lund, J. (2017), Sending firm messages: text mining letters from PRA supervisors to banks and building societies they regulate, Staff Working Paper No. 688, Bank of England.
5 Cielinska, O., Joseph, A., Shreyas, U., Tanner, J. and Vasios, M. (2017), Gauging market dynamics using trade repository data: the case of the Swiss franc de-pegging, Financial Stability Paper No. 41, Bank of England.
Submitted by David Bholat, James Brookes, Chiranjit Chakraborty, Andreas Joseph, Alice Owen, Eryk Walczak, Bank of England on Friday, 6th April 2018