Social media and prediction: Crime sensing, data integration and statistical modelling


Principal Investigator: Matthew Williams, Cardiff University

Co-Investigators: William Housley, Adam Edwards, Luke Sloan, Peter Burnap and Omer Rana (Cardiff University), Rob Procter (University of Warwick), Alex Voss (University of St Andrews)

Project duration: 1 April 2013 - 30 September 2014


The digital revolution is generating high-volume data through multiple forms of online behaviour. The global adoption of social media over the past five years has expanded ‘digital publics’ to an unprecedented level. Estimates put social media membership at approximately 2.5 billion non-unique users, with Facebook, Google+ and Twitter accounting for over half of these. These online populations produce hundreds of petabytes of information (one petabyte is one billion megabytes), with Facebook users alone uploading 500 terabytes (five hundred million megabytes) of data daily. We propose to harvest, store, analyse and interpret a portion of this data to test for a statistical link between social media updates that relate to crime and disorder (in this case tweets) and official rates of crime as recorded by the police in six London boroughs. The potential added value of social media data is that it is user-generated in real time and in large volumes, and so can provide insight into the behaviour of populations on the move: the ‘pulse of the city’. This is in contrast to the necessarily retrospective snapshots of social trends and populations provided by conventional methods such as household surveys and officially recorded data.

To build social media predictive models, the project investigators will adapt existing methods used by other researchers. For example, the investigators will adapt elements of the methodology used by Tumasjan et al. (2010), who measured Twitter sentiment in relation to candidates in the German general election and concluded that this source of data was as accurate at predicting voting patterns as polls. The project investigators will also draw on the work of Asur and Huberman (2010), who correlated the frequency and sentiment of tweets about movies with their revenue, claiming that this method of prediction was more accurate than the Hollywood Stock Exchange. Finally, the investigators will follow some of the techniques used by Sakaki et al. (2010), who found that the analysis of Twitter data produced estimates of the centres of earthquakes more accurately than conventional methods. These studies illustrate how social media generates naturally occurring, socially relevant data that can be used to complement and augment conventional curated data to predict offline phenomena. In our project, we hypothesise that crime and disorder related tweets will be associated with actual crime rates. If this hypothesis is supported, our statistical models based on naturally occurring social media data will provide an alternative to official constructions of the crime problem that are derived from curated and administrative data sources.
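As a rough illustration of what testing this hypothesis could involve, the sketch below computes a Pearson correlation between monthly counts of crime-related tweets and police-recorded crimes for a single borough. It is not the project's model: the choice of Pearson correlation, the function name `pearson_r` and the counts themselves are assumptions made purely for illustration, and the numbers are invented rather than drawn from any real data.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


# INVENTED monthly counts for one hypothetical borough (illustration only).
tweet_counts = [120, 135, 150, 160, 155, 170]
recorded_crimes = [300, 310, 340, 355, 350, 372]

r = pearson_r(tweet_counts, recorded_crimes)
print(f"Pearson r between tweet counts and recorded crime: {r:.3f}")
```

A coefficient near 1 or -1 would indicate a strong linear association in the sample, but it would not by itself show that tweets predict crime; a fuller analysis would also need to account for factors such as population size, time lags and differences between boroughs.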

Through a series of work packages, this project will build on the investigators' previous research. The interdisciplinary team of social and computer scientists will bring together a range of expertise to develop new algorithms that enhance existing social media analysis tools, and to develop new tools for making sense of crime and disorder-related Twitter content. A key part of the project is to innovate with social media data while also rigorously evaluating tools and methods. These evaluations will be communicated to academic and non-academic audiences with the aim of building the UK’s capacity to marshal big data to help address key social and economic problems.

Listen to Matthew Williams and Peter Burnap talk about their work in the NCRM podcast series.


Crime sensing with big data: the affordances and limitations of using open-source communications to estimate crime patterns, British Journal of Criminology Advance Access, March 2016