{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# NCRM Text Data Workshop - part 3: Basic text data analysis\n", "#### Lewys Brace - l.brace@exeter.ac.uk\n", "\n", "### 1. What is NLP\n", "Natural Language Processing (NLP) is a sub-field combining linguistics and artificial intelligence. In a technical sense, it is the technology used to help computers understand our natural language. Its ultimate aim is to read, decipher, understand, and make sense of human languages in a way that is valuable.\n", "\n", "#### NLP applications:\n", "- Personal voice assistants; e.g. Alexa, Google Assistant, etc.\n", "- Language translation apps; e.g. Google Translate.\n", "- Spell and grammar check in Microsoft Office, etc.\n", "- Call centre interactive voice response systems.\n", "- Text mining: it has become imperative for organisations to have a structure in place to mine actionable insights from the text being generated, from social media analytics, risk management, and cybercrime protection to automating everyday \"boring tasks\".\n", "\n", "#### The difficulties of NLP\n", "- NLP is not simple, but a lot of progress has been made in recent years.\n", "- It's difficult because the linguistic rules for passing information are not easy for computers to understand. For example:\n", " > The high-level abstract rules of sarcasm.\n", " > Low-level rules, such as using \"s\" to denote plurals.\n", "- Thus, a comprehensive understanding of human language requires understanding both the words and how the concepts are connected to deliver the intended message.\n", "\n", "#### Mistakes happen\n", "- Mistakes are natural and to be expected.\n", "- For example, the biblical phrase below was once translated from English to Russian:\n", "
\n",
"| | position | sentiment |\n",
"|---|---|---|\n",
"| 0 | professor (anthropology sociology)science tech... | -0.033333 |\n",
"| 1 | senior lecturer (sociology), director educatio... | 0.033333 |\n",
"| 2 | senior lecturer (sociology)political religious... | 0.000000 |\n",
"| 3 | lecturer data analysis | 0.000000 |\n",
"| 4 | senior lecturer science technology studiesscie... | 0.000000 |\n",
"| ... | ... | ... |\n",
"| 113 | honorary research fellow | 0.000000 |\n",
"| 114 | professor | 0.000000 |\n",
"| 115 | associate member | 0.000000 |\n",
"| 116 | graduate research assistantanthropology sociol... | 0.000000 |\n",
"| 117 | graduate research assistant | 0.000000 |\n",
"\n",
"118 rows × 2 columns\n",
"