Acknowledgment: This project was originally created by Andrea Arpaci-Dusseau at the University of Wisconsin-Madison, adapted by Charlie Peck of Earlham College, and further adapted by David Hovemeyer.
Deliverables / due dates
The deliverable is a single Google document containing (a) your answers to the four questions about the TED talk (linked below) and (b) your essay about your Ngram experiment. The document should be titled Project 2 Document and should be in a folder called Project 2 within your shared Google Drive folder.
A draft is due Friday, Nov 13th. The final version is due Tuesday, Nov 24th.
Learning goals
- Learn about interfaces for analyzing and visualizing data sets
- Learn about Google’s Ngram Viewer
- Form a hypothesis about words/phrases and see if you can test it using Google’s Ngram tool
- Document an experimental process and results for a hypothesis
What to do
Google Labs' Ngram Viewer is a tool that lets you search for words in a database of 5 million books and other printed material from across centuries. To begin, watch the TED talk by Erez Lieberman Aiden and Jean-Baptiste Michel to see how it works and some of the surprising facts they have learned. From this talk, answer the following questions:
- What are ngrams? Why did Google release ngrams instead of the full text of the books?
- In what year did “thrived” become more popular than “throve” (at least in texts)?
- What is culturomics?
- Why was the word “beft” popular in texts before 1800?
Next, experiment with the Google Ngram Viewer itself. Pick a subject that you are interested in and see how the popularity of some related words has varied over time. You can pick anything you like, but if you pick a subject that you have some additional knowledge of, you will likely find it easier to interpret the resulting data. Please don’t use any of the search terms used in the TED talk. You can find detailed information about the ngram tool at https://books.google.com/ngrams/info.
You should compare between two and four ngrams that are related to your topic. Write a short essay describing what you found. Be sure to address the following:
Specify the exact query you gave to retrieve your results (e.g., the ngram phrases, the range of years, and the corpus language).
Describe how your ngrams are related (i.e., the overall subject you are investigating).
Include the graph that was produced (that is, put the image(s) in your document).
An objective description of the popularity of each ngram relative to one another and over time.
- Is the popularity of these search terms increasing or decreasing over time?
- Has the relative popularity of the terms changed at all over time?
- Are there distinct moments in time when the popularity of the search term has abruptly increased or decreased?
A subjective discussion about the relative popularity of the terms and their popularity over time.
- Why do you think some of the ngrams are more popular than others?
- How does the popularity of each ngram correlate with what was going on in the real world?
- What information from other sources do you have to back up your speculations?
Your essay should be at least two pages in length (double spaced).
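One possible way to record the exact query asked for above is to write it down as a URL. The sketch below is only an illustration: the function name ngram_query_url is made up, and it assumes the Ngram Viewer's address-bar URL uses the parameters content, year_start, year_end, corpus, and smoothing. The phrases and the corpus value shown are placeholders, so check them against the URL in your own browser after running your search.

```python
from urllib.parse import urlencode

# Hypothetical helper (not part of any Google tool): builds a URL that
# records the phrases, year range, and corpus of an Ngram Viewer query.
# Assumption: the viewer reads these parameter names from the URL.
def ngram_query_url(phrases, year_start, year_end, corpus, smoothing=3):
    params = {
        "content": ",".join(phrases),   # comma-separated ngram phrases
        "year_start": year_start,       # first year of the range
        "year_end": year_end,           # last year of the range
        "corpus": corpus,               # corpus identifier (placeholder value below)
        "smoothing": smoothing,         # smoothing setting shown in the viewer
    }
    return "https://books.google.com/ngrams/graph?" + urlencode(params)

# Placeholder phrases and corpus value; substitute your own.
url = ngram_query_url(["first phrase", "second phrase"], 1800, 2000, "en-2019")
print(url)
```

Pasting a URL like this into your essay, alongside a plain-language statement of the phrases, years, and corpus, lets a reader reproduce your graph.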