Bite-sized

Day 1: Thursday, 12 September

How to automate meaning extraction from large text data?: Opportunities and insights from corpus linguistics

Session convener: Justyna Robinson, University of Sussex

Join us for a session in which we showcase a suite of corpus linguistic techniques for the analysis of large text data. Corpus linguistics is the computational analysis of large collections of text. It assumes that meaning lives in the habitual patterns that words form with other words. For example, consider the noun date. When it co-occurs with words such as today or 12th September 2024, it most likely has a calendar meaning. When it co-occurs with restaurant, go, or cinema, it most likely refers to a social practice. Corpus linguistics software allows us to extract those typical co-occurrences, measure relevant statistics, and provide empirical evidence of the key meaning patterns in the data. While we don't necessarily need corpus linguistics to tell us that the noun date has distinct meanings, the method it exemplifies allows for a nuanced extraction of the stances that a particular text or speaker takes towards concepts of interest, such as (wo)man, immigrant, or climate change. The session will be followed by a hands-on tutorial on corpus linguistics. We will conclude with a Q&A segment in which participants will be encouraged to reflect on how they could integrate such methods into their own research.
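As a rough illustration of the co-occurrence idea described above, the following Python sketch counts the words that appear within a few tokens of the noun date in two tiny example texts. It is not the software used in the session; the function name `collocates`, the window size of four tokens, the simple tokeniser, and both example texts are assumptions made purely for illustration.

```python
import re
from collections import Counter


def collocates(text, node, window=4):
    """Count words occurring within `window` tokens of each `node` occurrence."""
    # Very simple tokeniser (an assumption): lowercase runs of letters and digits.
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        start = max(0, i - window)
        end = min(len(tokens), i + window + 1)
        for j in range(start, end):
            if j != i:  # skip the node word itself
                counts[tokens[j]] += 1
    return counts


# Two made-up miniature "corpora", one per sense of "date".
calendar_text = "Today the date is 12th September 2024. Check the date today."
social_text = "We went on a date to the cinema. Their date was at a restaurant."

calendar_counts = collocates(calendar_text, "date")
social_counts = collocates(social_text, "date")

print(calendar_counts.most_common(5))
print(social_counts.most_common(5))
```

In a real study the raw counts would be replaced by an association statistic and computed over a much larger corpus, but even this toy version shows today clustering with the calendar sense and cinema and restaurant with the social sense.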