Acknowledgment: This project was originally created by Andrea Arpaci-Dusseau at the University of Wisconsin-Madison, adapted by Charlie Peck of Earlham College, and further adapted by David Hovemeyer.

Deliverables / due dates

The deliverable is a single Google document containing (a) your answers to the four questions about the TED talk (linked below) and (b) your essay about your Ngram experiment. The document should be titled Project 2 Document and should be in a folder called Project 2 within your shared Google Drive folder.

A draft is due Friday, Nov 13th. The final version is due Tuesday, Nov 24th.

Learning goals

What to do

Google Labs' Ngram Viewer is a tool that lets you search for words in a database of 5 million books and other printed material from across centuries. To begin, watch the TED talk by Erez Lieberman Aiden and Jean-Baptiste Michel to see how it works and some of the surprising facts they have learned. From this talk, answer the following questions:

  1. What are ngrams? Why did Google release ngrams instead of the full text of the books?
  2. In what year did “thrived” become more popular than “throve” (at least in texts)?
  3. What is culturomics?
  4. Why was the word “beft” popular in texts before 1800?

Next, experiment with the Google Ngram Viewer itself. Pick a subject that you are interested in and see how the popularity of some related words has varied over time. You can pick anything you like, but if you pick a subject that you have some additional knowledge of, you will likely find it easier to interpret the resulting data. Please don’t use any of the search terms used in the TED talk. You can find detailed information about the ngram tool at https://books.google.com/ngrams/info.

You should compare between two and four ngrams that are related to your topic. Write a short essay describing what you found. Be sure to address the following:

The exact query you gave to retrieve your results (e.g., the ngram phrases, the range of years, and the corpus language).

How your ngrams are related (i.e., the overall subject you are investigating).

The graph that was produced (that is, include the image(s) in your document).

An objective description of the popularity of each ngram relative to one another and over time.

A subjective discussion about the relative popularity of the terms and their popularity over time.

Your essay should be at least two pages in length (double spaced).
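
One way to keep track of the exact query for your essay is to write its parts down in one place and build the graph link from them. The sketch below is only an illustration: the helper name `ngram_query_url`, the example phrases, and the corpus value `"en"` are hypothetical, and the query parameter names (`content`, `year_start`, `year_end`, `corpus`) are an assumption based on what the viewer's address bar typically shows. Check the result against the URL in your own browser before relying on it.

```python
from urllib.parse import urlencode

# Base address of the Ngram Viewer graph page.
NGRAM_GRAPH_URL = "https://books.google.com/ngrams/graph"


def ngram_query_url(phrases, year_start, year_end, corpus):
    """Build a graph link from the parts of a query (hypothetical helper).

    The parameter names below are assumed, not taken from official
    documentation; compare with the URL your browser shows.
    """
    params = {
        "content": ",".join(phrases),  # comma-separated ngram phrases
        "year_start": year_start,      # first year of the range
        "year_end": year_end,          # last year of the range
        "corpus": corpus,              # corpus identifier (assumed value)
    }
    return NGRAM_GRAPH_URL + "?" + urlencode(params)


# Example phrases chosen only for illustration.
url = ngram_query_url(["telegraph", "telephone"], 1800, 2000, "en")
print(url)
```

Recording the phrases, year range, and corpus together this way makes it easy to quote the query exactly in your essay and to regenerate the same graph later.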