CAR18 Chicago

Introduction to Web Scraping

Adapted from Alex Richards' (@alexrichards) excellent IRE17 class.

He'll also be teaching a repeat web scraping session Sunday!

This session will cover:

  • How web scraping will make your life easier
  • How to do so responsibly
  • Using third-party Python packages
  • Fetching web pages with Python
  • Navigating the HTML in those pages to get data
  • Structuring scraped data and writing it to a CSV
  • And a couple of tips on shortcuts with HTML tables!
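To give a feel for the navigating, structuring and CSV-writing steps listed above, here is a minimal sketch. It is not the class code: it uses only Python's standard library so it runs without installing anything, and it parses a made-up HTML snippet (`SAMPLE_HTML`) rather than fetching a live page. The names `TableParser`, `scrape_rows` and `rows_to_csv` are hypothetical helpers invented for this illustration.

```python
import csv
import io
from html.parser import HTMLParser

# Made-up HTML standing in for a fetched page (an assumption for this sketch).
SAMPLE_HTML = """
<table>
  <tr><th>Name</th><th>City</th></tr>
  <tr><td>Ada</td><td>Chicago</td></tr>
  <tr><td>Grace</td><td>Evanston</td></tr>
</table>
"""


class TableParser(HTMLParser):
    """Collect the text of every <th>/<td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []        # finished rows, each a list of cell strings
        self._row = None      # the row currently being built, if any
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_data(self, data):
        # Only keep text that appears inside a table cell.
        if self._in_cell:
            self._row[-1] += data.strip()

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def scrape_rows(html):
    """Navigate the HTML and return the table as a list of rows."""
    parser = TableParser()
    parser.feed(html)
    return parser.rows


def rows_to_csv(rows):
    """Write the rows as CSV text (in memory here; a real script would open a file)."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    print(rows_to_csv(scrape_rows(SAMPLE_HTML)))
```

In a real scraper the HTML would come from a web request instead of a string, and third-party packages usually make the navigation step much shorter than this hand-written parser.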

Software requirements:

You should have Python on your machine. To check that you have the correct version for the class, type the following in Bash (on macOS, you can reach it through an application called Terminal):

which python3

which should return something like

/Library/Frameworks/Python.framework/Versions/3.5/bin/python3

If not, and you're in the CAR18 class, you should flag down the instructor or a TA. If you're not in the class, download Python 3.
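The same check can also be made from inside Python. This is only a sketch, assuming that any Python 3 release is acceptable for the class; `describe_python` is a hypothetical helper name, not part of this repository.

```python
import sys


def describe_python(version_info=sys.version_info):
    """Return a short message saying whether this interpreter is Python 3."""
    major, minor = version_info[0], version_info[1]
    # Assumption: any 3.x interpreter is fine for the class.
    status = "OK for this class" if major >= 3 else "too old, install Python 3"
    return "Python {}.{}: {}".format(major, minor, status)


if __name__ == "__main__":
    print(describe_python())
```

Save it to a file and run it with `python3`; if the command itself is not found, Python 3 is not installed or not on your path.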

If you already have Python 3, you should be able to run the command pip install -r requirements.txt after downloading this repository to get the packages listed below:

Have questions?

You can always:

Struggling with installation? Try this updated guide for Windows and OS X.

Resources:

Python

  • PyCAR for in-depth Python learning
  • Codecademy for Python syntax
  • Think Python, a popular introductory book whose digital edition is available free online

Scraping

The Internet