Putting data science in the service of social science

NCRM news
Carl Miller, Centre for the Analysis of Social Media, Demos

The rise of social media has been important; that is no great revelation. It has wrought profound social change, buffeted our institutions and altered, for many of us, our way of life. New identities, dialects, cultures, affiliations and movements have all bloomed and spread across the digital world, and spilled out of it into mainstream public life.

Back in 2012, we at Demos could see that social media was changing research too. The transfer of social activity onto digital spaces was ‘datafying’ social life. Huge new datasets were being routinely created that we saw as treasure troves of behavioural evidence: often very large, in real time, rich, linked and unmediated. It was a massive new opportunity to learn about how people and society worked.

Unlocking these datasets presented an enormous challenge. The sheer scale of social media data meant that conventional social research methods couldn’t cope. Powerful new analytical techniques - modelling, entity extraction, machine learning, algorithmic clustering - were needed to make sense of what was happening. However, the true challenge wasn’t a technological one alone. It was how to deploy the new tools of data science in the service of social science. Getting better at counting people is not the same as getting better at understanding them.

We established the Centre for the Analysis of Social Media, which brought together social and policy researchers at Demos and technologists from the University of Sussex, with the explicit aim of confronting this challenge [1]. The first layer of the challenge has been the technology itself. The tools of big data analysis needed to be put into the hands of non-technical researchers: the subject matter experts who have long understood social science, and now needed to be able to do it in a new way. We built a technology platform, Method52, which allows non-technical users to conduct big data analytics flexibly through a graphical user interface and drag-and-drop components, rather than be faced with a screen full of code [2]. It was especially important to make accessible a vital technique called natural language processing [3]. Coupled with machine learning, it is one of the crucial ways of understanding bodies of primarily text-based data (like Tweets or Facebook posts) that are too large to read manually.

However, any technology - even one that learns - is just a tool, and the second layer has been to learn how to slot all the technology into a broader social scientific methodology. We’ve just concluded a major study with the pollsters Ipsos MORI on how to use tools like natural language processing within a broader framework that stands up to social scientific scrutiny [4]. Much of this has been to develop a process of big data analysis that cares about the same things that social science cares about: the introduction of possible biases in how the data is sampled and collected; the non-representative skews in who uses social media; the danger of analyst preconceptions and bias in how the data is measured and handled; the difficulty of measuring at great scale the textured, complex utterances of people in specific social contexts; and the importance of interpreting the results in the light of the norms, cultures, languages and practices of social media itself [5].

But even beyond this, the third layer has been to get social science to govern the whole endeavour: the questions that are asked, the implications that are drawn, how the research is used, and, of course, the ethical frameworks that control its use.

The big data revolution will not slow down; it will only gather pace. The scales of data will only increase, and the technologies and techniques to harness data are becoming more capable and powerful at a bewildering rate. To my mind, this means that social science - qualitative as well as quantitative - has never been more important. It has never been more crucial to point out the inherent difficulties in studying people in all their messy and chaotic complexity, the pitfalls of reducing human behaviour to something that can be counted and aggregated, and the fact that understanding society doesn’t stop with a series of raw metrics, however large they are.

1 More information on its work is available at: http://www.demos.co.uk/research-area/centre-for-analysis-of-social-media/
2 For more information on Method52, see Bartlett, J., Miller, C., Reffin, J., Weir, D., Wibberley, S., ‘Vox Digitas’ (Demos: 2014): http://www.demos.co.uk/files/Vox_Digitas_-_web.pdf?1408832211
3 For a further description of natural language processing, see Reffin, J., ‘Why Natural Language Processing is the most important technology you’ve never heard of’, Demos Quarterly 8, Spring 2016, http://quarterly.demos.co.uk/article/issue-8/natural-language-processing-the-most-important-technology-youve-never-heard-of/
4 See ‘The wisdom of the crowd’, Ipsos MORI, https://www.ipsos-mori.com/ourexpertise/digitalresearch/sociallistening/wisdomofthecrowd.aspx
5 For more information on this work, see http://www.demos.co.uk/files/Road_to_representivity_final.pdf?1441811336

Further Reading
On the current work of the Centre for the Analysis of Social Media at Demos, http://www.demos.co.uk/research-area/centre-for-analysis-of-social-media/
A technology edition of Demos Quarterly, Issue 8, Spring 2016, http://quarterly.demos.co.uk