Introduction to Text Processing and Natural Language Processing for Social Scientists

Date:

27/10/2017

Organised by:

NCRM, University of Southampton

Presenter:

Dr Juan Grigera, UCL Institute of the Americas

Level:

Entry (no or almost no prior knowledge)

Contact:

Dr Juan Grigera
j.grigera@ucl.ac.uk

Map:

WC1H 0PN

Venue:

UCL Institute of the Americas, 51 Gordon Square, London

Description:

This one-day, entry-level workshop is for academics, particularly those in the humanities and social sciences.

This course is an introduction to basic Text Processing and Natural Language Processing (NLP) techniques, aimed at anyone beginning to work on the topic, particularly those coming from the humanities and the social sciences.

A quick survey of Text Processing will present different techniques for dealing with digital text and provide tools and concepts for building corpora. This will include web scraping, OCR and regular expressions.
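
As a rough illustration of the kind of regular-expression clean-up involved in building a corpus from scraped pages, the sketch below strips HTML-like tags and collapses whitespace. The function name `clean_scraped_text` and the sample string are hypothetical and not taken from the course materials. The tag pattern is deliberately naive; a real project would more likely use a proper HTML parser.

```python
import re


def clean_scraped_text(raw: str) -> str:
    """Strip HTML-like tags and collapse whitespace (naive, illustration only)."""
    # Replace anything that looks like <...> with a space.
    no_tags = re.sub(r"<[^>]+>", " ", raw)
    # Collapse runs of spaces, tabs and newlines into a single space.
    return re.sub(r"\s+", " ", no_tags).strip()


# Hypothetical scraped fragment.
sample = "<p>Natural   language\n<b>processing</b> for social scientists.</p>"
print(clean_scraped_text(sample))
# prints: Natural language processing for social scientists.
```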

NLP is a general term for computational methods of processing human language (i.e. natural language, as opposed to ‘artificial’ programming languages, which have a strict syntax and semantics). The course will include a conceptual presentation of the tools and aims to showcase both the theoretical issues and the practical possibilities of NLP. It will focus mainly on the parsing and understanding of natural languages and will survey the available tools (ready-made ones and those available for use with R, Python and Java).

The course covers:

  • Basic text processing techniques: web scraping, OCR, regular expressions
  • NLP: A brief history of the field and of the basic achievements and techniques of the structuralist phase (including concordance, dispersion plots, bigrams, collocations, frequency distributions, etc.)
  • Text Analysis: Segmentation and tokenization, regular expressions, chunking, part-of-speech tagging, lemmatization, folding and stemming
  • Conceptual problems (word sense disambiguation, pronoun resolution and coreferencing, textual entailment)
  • Named Entity Recognition
  • Topic Models
  • Autoclassifying
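
To give a flavour of the simplest items on this list, here is a minimal sketch of tokenization, a frequency distribution and bigram counts using only the Python standard library. The helper names `tokenize` and `bigrams` and the example sentence are assumptions made for illustration; the course itself may well use dedicated NLP toolkits instead.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text (case folding) and extract runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def bigrams(tokens):
    """Return adjacent token pairs; sentence boundaries are ignored in this sketch."""
    return list(zip(tokens, tokens[1:]))


# Hypothetical example text.
text = "The cat sat on the mat. The cat slept."
tokens = tokenize(text)

freq = Counter(tokens)                # frequency distribution of single tokens
pair_freq = Counter(bigrams(tokens))  # frequency distribution of bigrams

print(freq.most_common(2))       # [('the', 3), ('cat', 2)]
print(pair_freq.most_common(1))  # [(('the', 'cat'), 2)]
```

Because the tokenizer ignores punctuation, pairs such as ('mat', 'the') span a sentence boundary; proper sentence segmentation would avoid counting these.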

By the end of the course participants will have learned about:

  • Basic Text Processing techniques
  • Different approaches to NLP
  • A sample of the techniques available
  • The possible uses of NLP for different kinds of Big Data and text analysis

Start: 10:00   End: 15:00

Cost:

Attendance is free of charge, but registration is strictly required. For questions on eligibility or suitability, please refer to the entry on Participants above or contact Dr Juan Grigera at j.grigera@ucl.ac.uk.

Website and registration:

Region:

Greater London

Keywords:

ICT and Software, Natural Language Processing, Web Scraping, Digital Text

Related publications and presentations:

ICT and Software
