Introduction to Data Science and Big Data Analytics

Date:

17/08/2015 - 28/08/2015

Organised by:

London School of Economics

Presenter:

Professor Kenneth Benoit, Dr Slava Mikhaylov

Level:

Entry (no or almost no prior knowledge)

Contact:

Tyrone Curtis, Programme Coordinator
+44 (0)20 7955 6422
summer.methods@lse.ac.uk

Venue:

Houghton Street
London
WC2A 2AE
Description:

Data Science and Big Data Analytics are exciting new areas that combine scientific inquiry, statistical knowledge, substantive expertise, and computer programming. One of the main challenges for businesses and policy makers when using big data is finding people with the appropriate skills. Good data science requires experts who combine substantive knowledge with data analytical skills, which makes it a prime area for social scientists with an interest in quantitative methods. This course integrates prior training in quantitative methods (statistics) and coding with substantive expertise, and introduces the fundamental concepts and techniques of Data Science and Big Data Analytics.

Who is this course aimed at?
Typical students will be Master's and PhD students from any field who need the fundamentals of data science or who work with large datasets and databases. Practitioners from industry, government, or research organisations with some basic training in quantitative analysis or computer programming are also welcome. Because this course surveys diverse techniques and methods, it makes an ideal foundation for more advanced or more specific training. Our applications are drawn from social, political, economic, legal, business, and marketing fields, rather than engineering or other sciences.

Course benefits
This course provides participants with:

  • an understanding of the structure of datasets and databases, including "big data"
  • the ability to work with datasets and databases
  • an introduction to programming languages and basic skills in the R statistical programming environment
  • the ability to analyse data using statistical and machine learning methods.

Prerequisites
An introduction to quantitative methods at any level would serve as a very useful foundation for this course, although there are no formal prerequisites. Familiarity with computer programming or database structures is helpful but not required.

Course outline
This course aims to provide an introduction to the data science approach to the quantitative analysis of data using the methods of statistical learning, an approach blending classical statistical methods with recent advances in computational and machine learning. We will cover the main analytical methods from this field with hands-on applications using example datasets, so that students gain experience with and confidence in using the methods we cover. We also cover data preparation and processing, including working with structured databases, key-value formatted data (JSON), and unstructured textual data. At the end of this course students will have a sound understanding of the field of data science, the ability to analyse data using some of its main methods, and a solid foundation for more advanced or more specialised study.

The course will be delivered as a series of morning lectures, followed by afternoon lab sessions in which students apply the lessons through instructor-guided exercises using data provided for the purpose.

The course will cover the following topics:

  • an overview of data science and the challenge of working with big data using statistical methods
  • how to integrate the insights from data analytics into knowledge generation and decision-making
  • how to acquire data, both structured and unstructured, and to process it, store it, and convert it into a format suitable for analysis
  • the basics of statistical inference, including probability and probability distributions, modelling, and experimental design
  • an overview of classification methods and related methods for assessing model fit and cross-validating predictive models
  • supervised learning approaches, including linear and logistic regression, decision trees, and naïve Bayes
  • unsupervised learning approaches, including clustering, association rules, and principal components analysis
  • quantitative methods of text analysis, including mining social media and other online resources
  • social network analysis, covering the basics of social graph data and analysing social networks
  • data visualisation through a variety of graphs.

This course is offered as part of the LSE Methods Summer Programme, a summer school of intensive short courses in social science research methods for students, researchers and professionals. A number of social events will be held throughout the programme. Participants will be provided with a transcript and certificate upon completion of the course. For more information on the Methods Summer Programme, please visit our website at lse.ac.uk/methods. 

Cost:

Students: £1435
Academic/charity staff: £1930
Professionals: £2425

Region:

Greater London

Keywords:

Big data analytics, Big data, Social media data, Statistical Theory and Methods of Inference, Regression Methods, Linear regression, Logistic regression, Confirmatory factor analysis, Cluster analysis, Data Mining, Machine learning, Python, R, Data Visualisation
