Text Learning Workshop
London School of Economics and Political Science
Professor Kenneth Benoit
24/04/2017 - 25/04/2017
London School of Economics, Houghton Street, London WC2A 2AE
Esti Sidley, 0207 955 6947, firstname.lastname@example.org
Text Analysis Using R
Who May Participate
* PhD students
Applicants should have some prior experience of programming in R and in text analysis, although the first day is pitched at an introductory level.
Once your application has been approved, we will send you a link to register. We will only book travel and accommodation for applicants once they have registered for this workshop. The application form for this workshop can be found at https://docs.google.com/forms/d/1-QlxJAPBkFJbVJji5w8lBC1oHPy7tNEGJtQptVvE7XA/edit
The closing date for applications is Wednesday 22nd March and registrations will close on Friday 31st March.
The workshop is free to attend, and we will also cover the cost of travel and accommodation up to £300. If you provide us with details of your requirements, we will book flights and accommodation directly. Lunch and refreshments will be provided on both days, and there will be a reception on the evening of 24th April. Breakfast will be provided on the morning of 25th April for those who stay overnight on the 24th. We will only cover accommodation for the night of 24th April. If you require additional nights, we can book them for you, but you will be responsible for those costs.
We will cover how to format and input source texts, how to structure their metadata, and how to prepare them for analysis. This includes common tasks such as tokenisation (including constructing ngrams and "skip-grams"), removing stopwords, stemming words, and other forms of feature selection. We show how to get summary statistics from text, search for and analyse keywords and phrases, analyse text for lexical diversity and readability, detect collocations, apply dictionaries, and measure term and document associations using distance measures. Our analysis covers basic text-related data processing in the R base language, but relies mostly on the quanteda package (https://github.com/kbenoit/quanteda) for the quantitative analysis of textual data.
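To give a flavour of the preparation steps listed above, here is a minimal sketch in base R only. It is not workshop material and does not use the quanteda API. The two toy texts, the three-word stopword list, and the helper names (tokenise, remove_stopwords, ngrams, skip_bigrams, ttr) are all assumptions made up for illustration.

```r
# Toy documents, invented for this illustration
docs <- c(d1 = "The cat sat on the mat.",
          d2 = "The dog sat on the log and the cat ran.")

# Tokenise: lower-case, replace punctuation with spaces, split on whitespace
tokenise <- function(x) {
  x <- tolower(x)
  x <- gsub("[[:punct:]]+", " ", x)
  strsplit(trimws(x), "[[:space:]]+")
}
toks <- setNames(tokenise(docs), names(docs))

# Remove stopwords using a tiny hypothetical stopword list
stopwords_small <- c("the", "on", "and")
remove_stopwords <- function(tokens, stops) tokens[!tokens %in% stops]

# Contiguous n-grams: join each run of n adjacent tokens with "_"
ngrams <- function(tokens, n = 2) {
  if (length(tokens) < n) return(character(0))
  starts <- seq_len(length(tokens) - n + 1)
  vapply(starts,
         function(i) paste(tokens[i:(i + n - 1)], collapse = "_"),
         character(1))
}

# Skip-grams (bigram case): pair each token with later tokens,
# skipping between 0 and max_skip intervening tokens
skip_bigrams <- function(tokens, max_skip = 1) {
  out <- character(0)
  n <- length(tokens)
  for (i in seq_len(n)) {
    for (gap in 0:max_skip) {
      j <- i + gap + 1
      if (j <= n) out <- c(out, paste(tokens[i], tokens[j], sep = "_"))
    }
  }
  out
}

# Type-token ratio, one simple measure of lexical diversity
ttr <- function(tokens) length(unique(tokens)) / length(tokens)

content_toks <- lapply(toks, remove_stopwords, stops = stopwords_small)
bigrams_d1   <- ngrams(toks$d1, n = 2)
skips_d1     <- skip_bigrams(content_toks$d1, max_skip = 1)
diversity    <- vapply(toks, ttr, numeric(1))
```

Dedicated packages handle the same tasks with far more care over Unicode, punctuation and language-specific stopword lists; the sketch is only meant to make the terminology concrete.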
Day 2: Advanced Text Analysis Using R (25 April): 9am - 5pm (with coffee and refreshments at the start)
This day will cover more advanced methods of text analysis using R, including how to pass the structured objects from quanteda into other text analytic packages for topic modelling, latent semantic analysis, regression models, and other forms of machine learning.
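As a rough illustration of one of the methods named above, latent semantic analysis can be sketched in base R as a truncated singular value decomposition of a document-term matrix. This is an assumption-laden toy, not the workshop's own code: the four tokenised documents are invented, the matrix is built by hand rather than taken from quanteda, and k = 2 dimensions is an arbitrary choice.

```r
# Toy tokenised documents, invented for this illustration
docs_tokens <- list(
  d1 = c("tax", "budget", "spending"),
  d2 = c("budget", "tax", "deficit"),
  d3 = c("football", "match", "goal", "league"),
  d4 = c("goal", "match", "league")
)

# Build a document-term matrix of raw counts (documents in rows)
vocab <- sort(unique(unlist(docs_tokens)))
dtm <- t(vapply(docs_tokens,
                function(tk) as.numeric(table(factor(tk, levels = vocab))),
                numeric(length(vocab))))
colnames(dtm) <- vocab

# Truncated SVD: keep the k largest singular values
k <- 2
s <- svd(dtm)
doc_coords <- s$u[, 1:k, drop = FALSE] %*% diag(s$d[1:k], nrow = k)
rownames(doc_coords) <- rownames(dtm)

# Cosine similarity between documents in the reduced space
norms <- sqrt(rowSums(doc_coords^2))
sim <- tcrossprod(doc_coords / norms)
```

In this toy, documents that share vocabulary (d1 and d2, or d3 and d4) end up close together in the reduced space, while documents with no words in common stay far apart.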
Materials from an illustrative workshop given previously can be viewed at https://github.com/kbenoit/ITAUR.
This workshop is supported by European Research Council grant ERC-2011-StG 283794-QUANTESS and the Social and Economic Data Science Unit at the LSE.
Intermediate (some prior knowledge)
Textual Analysis, R, tokenisation, feature selection, constructing ngrams, removing stopwords, topic modelling, latent semantic analysis, regression models, machine learning