Neal Caren

Department of Sociology
University of North Carolina, Chapel Hill
neal.caren@gmail.com
@haphazardsoc

Code available from
http://nealcaren.github.io/workshop_2014/

Please make sure you have a copy of IPython up and running, ideally from Anaconda.

If you are bored, download the zip file, unzip it, and open up an IPython Notebook.

If you are still bored, sign up at the New York Times to be a developer.

Two overlapping projects

  1. Collecting text and non-text data from the web.
    1. Sometimes they want to give it to you (APIs)
    2. Sometimes they don't (HTML scraping)

Yelp can tell us how SES structures food options.

Or map who supports secession

Two overlapping projects

  1. Collecting text and non-text data from the web.
    1. Sometimes they want to give it to you (APIs)
    2. Sometimes they don't (HTML scraping)
  2. Analyzing text data that may or may not have been downloaded.
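
The two collection routes above can be sketched with only the standard library. This is a minimal illustration written for Python 3, not code from the workshop notebooks: the JSON payload, the HTML snippet, the `biz-name` class, and the `BizNameParser` helper are all made up for the example, and a real project would fetch both over the network.

```python
import json
from html.parser import HTMLParser

# Hypothetical API response: structured JSON, ready to parse.
api_response = (
    '{"results": [{"name": "Taco Stand", "rating": 4.5}, '
    '{"name": "Noodle Bar", "rating": 4.0}]}'
)
api_names = [item["name"] for item in json.loads(api_response)["results"]]

# Hypothetical web page: the same information buried in HTML markup.
html_page = """
<html><body>
  <h2 class="biz-name">Taco Stand</h2>
  <p>Great tacos.</p>
  <h2 class="biz-name">Noodle Bar</h2>
</body></html>
"""


class BizNameParser(HTMLParser):
    """Collect the text inside every <h2 class="biz-name"> tag."""

    def __init__(self):
        super().__init__()
        self.in_name = False
        self.names = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (attribute, value) pairs
        if tag == "h2" and ("class", "biz-name") in attrs:
            self.in_name = True

    def handle_endtag(self, tag):
        if tag == "h2":
            self.in_name = False

    def handle_data(self, data):
        if self.in_name:
            self.names.append(data.strip())


parser = BizNameParser()
parser.feed(html_page)
scraped_names = parser.names

print(api_names)
print(scraped_names)
```

The API route is one line of parsing because the provider chose the structure; the scraping route needs a parser that knows where the page designer happened to put the data.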

Trends in Tea Party Coverage

What were Occupy supporters doing on Facebook?

How the New York Times writes about men and women

Goals for today

Understand text collection and analysis in Python

(well enough so that you can Google your problems, find the answer, and implement it.)

Get you to start thinking about which theoretical puzzles these methods can help you answer

The fear:

Computational Social Scientist:
- Someone who knows less theory than a sociologist and less programming than a computer scientist.

More specifically

  1. Up and running with IPython (1_Into)
  2. Understand a basic workflow (2_Twitter)
  3. Basics of Python and sentiment analysis (3_Sentiment)
  4. Collecting data from a website (4_Upworthy)
  5. Collecting data from an API (5_Times)
  6. Basic text ML with scikit-learn (6_classification)
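
As a rough preview of item 3, a dictionary-based sentiment score can be computed by counting positive and negative words. The word lists, the example tweets, and the `sentiment_score` function below are tiny made-up stand-ins, not the lexicon or code used in the workshop notebooks.

```python
import re

# Toy lexicons, assumed for illustration only.
positive_words = {"good", "great", "happy", "love"}
negative_words = {"bad", "awful", "sad", "hate"}


def sentiment_score(text):
    """Return the number of positive words minus the number of negative words."""
    words = re.findall(r"[a-z']+", text.lower())
    positive = sum(word in positive_words for word in words)
    negative = sum(word in negative_words for word in words)
    return positive - negative


# Made-up example texts.
tweets = [
    "I love this great workshop",
    "Traffic was awful and I hate it",
    "Just landed in Chapel Hill",
]
scores = [sentiment_score(tweet) for tweet in tweets]
print(scores)
```

A score above zero marks a text as net positive, below zero as net negative, and zero as neutral or simply not covered by the word lists.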

Why Python

Setup

On your computer

On the internet

Slides and Code for Today

Links to get them on the internet

Getting yourself set up

On a Mac

On Windows

IPython Notebooks are what all the cool kids use.

A new notebook

Cells can be Markdown (like this one) or code

To start off with

Make sure you hit Shift-Enter or Ctrl-Enter when you are done.

In [3]:
2 + 2
Out[3]:
4

Frequently used Python Packages

Data acquisition

Analysis