Practical approaches for web scraping for research – using Airbnb as an example data provider

Date:

22/06/2021

Organised by:

Urban Big Data Centre

Presenter:

Dr Andrew McHugh, Senior Data Science Manager, Urban Big Data Centre

Level:

Intermediate (some prior knowledge)

Contact:

Rhiannon Law, Business and Communications Officer, rhiannon.law@glasgow.ac.uk

Venue: Online

Description:

It is no exaggeration to say that the web is a fertile source of data - offering deep insights into people’s beliefs, opinions, transactions, movements and many other aspects of their lives.

For social science academics and data scientists, the UK’s legal environment appears (although not definitively) to provide opportunities to capture these data at scale in service of research goals.

Referencing UBDC’s project and open-source software platform to scrape short-term-let data from Airbnb, this webinar provides practical guidance on how researchers, technologists and data scientists can approach web scraping, from the selection of online sources to the planning, conceptualisation, governance, risk management and implementation of technical approaches.

Participants will receive practical training and code examples, developing an understanding not only of how scraping works but also how to systematise approaches to scale up data collection while avoiding common pitfalls.

Throughout the session, a series of practical examples will cover scraping data from Airbnb’s online platform using UBDC’s established method.

Session format

The webinar will include a series of short talks. These will punctuate the main content of the webinar: a sequence of practical demonstrations in which UBDC’s web scraping platform is introduced, installed, configured and deployed.

What you will learn

Following completion of this webinar, attendees will be able to:

  • Describe, and subsequently recreate within their own environment, the installation, configuration and deployment of UBDC’s open-source web scraping platform
  • Explain what web scraping means and describe the variety of approaches available
  • Outline the legal, ethical and data governance issues that must be considered when designing a web scraping project (including coverage of relevant intellectual property, contract and privacy law)
  • Summarise limitations on data use and sharing
  • Identify and select appropriate datasets for web scraping
  • Explain what an API (Application Programming Interface) is and query APIs to retrieve data
  • Explain how to negotiate technical barriers to scraping (including managing call limits and avoiding blacklisting)
  • Systematise, scale and optimise approaches, including planning for wide geographical or temporal data coverage

Who should attend

Academic researchers or technical support staff who are interested in learning how to capture online data systematically and at scale using web scraping techniques. The session’s core content is principally aimed at technologists/implementers charged with building web scraping systems.

Prior knowledge requirements

Attendees wishing to subsequently recreate the software deployment covered within the session should have some experience of the Python programming language or similar languages and be comfortable running code within their computing environment. No specific technical proficiencies are required to attend and engage with the webinar content itself.

Participants may also find it beneficial to attend the related webinar, "Using daily Airbnb web scraped data to provide spatial and temporal understanding of short-term lets activity", on 24 June (10:00 - 11:00 BST).

Data and software requirements

Although the webinar itself has no hands-on technical component, code examples, documentation and practical exercises will be available to attendees from UBDC’s GitHub repository. Instructions for installing core libraries and software will also be made available to attendees during and after the session.

Registration

Registration for this online event is free and available via Eventbrite. Full details and instructions for joining will be circulated post-registration.

Cost:

Free

Website and registration:

Region:

Scotland

Keywords:

Data Collection, Online Data Collection, Big data
