Book Review: ‘Uncharted’: Data becomes mirror of culture


  • ‘UNCHARTED: BIG DATA AS A LENS ON HUMAN CULTURE’ by Erez Aiden and Jean-Baptiste Michel; Riverhead Books ($27.95)

Why do English speakers say “drove” rather than “drived”?

As graduate students at the Harvard Program for Evolutionary Dynamics about eight years ago, Erez Aiden and Jean-Baptiste Michel pondered the matter and decided that something like natural selection might be at work. In English, the “-ed” past-tense ending of Proto-Germanic, like a superior life form, drove out the Proto-Indo-European system of indicating tenses by vowel changes. Only the small class of verbs we know as irregular managed to resist.

To test this evolutionary premise, Aiden and Michel wound up inventing something they call culturomics, the use of huge amounts of digital information to track changes in language, culture and history. Their quest is the subject of “Uncharted: Big Data as a Lens on Human Culture,” an entertaining tour of the authors’ big-data adventure, whose implications they wildly oversell.

To tackle the drived/drove question, Aiden and Michel assigned two undergraduates to read every textbook on historical English grammar, compile a list of irregular verbs and follow their fortunes through the centuries. The students turned up 177 irregular verbs in Old English, a number that declined to 145 in Middle English (the language of Chaucer) and to 98 in modern English. Of the original Old English irregulars, the 12 most frequently used verbs stayed irregular, while 11 out of the 12 least frequently used verbs made the changeover. Only “slink” held the line.

Invigorated by the great verb chase, Aiden and Michel went hunting for bigger game. Given a large enough storehouse of words and a fine filter, would it be possible to see cultural change at the micro level, to follow minute fluctuations in human thought processes and activities? Tiny factoids, multiplied endlessly, might assume imposing dimensions.

By chance, Google Books, the megaproject to digitize every page of every book ever printed, was starting to roll just as the authors were looking for their next target of inquiry.

In 2010, working with Google, they perfected the Ngram Viewer, which takes its name from the computer-science term for a word or phrase. This “robot historian,” as they call it, can search the 30 million volumes already digitized by Google Books and instantly generate a usage-frequency timeline for any word, phrase, date or name, a sort of stock-market graph illustrating the ups and downs of cultural shares over time.
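For readers curious what such a timeline involves mechanically, here is a minimal sketch in Python. It is not the authors' or Google's implementation. The function name `ngram_timeline`, the toy corpus and the whitespace tokenization are all assumptions made for illustration. The idea is simply to count how often a phrase appears in each year's texts and divide by the number of same-length phrases published that year.

```python
def ngram_timeline(corpus, phrase):
    """Return {year: relative frequency of phrase} for a toy corpus.

    corpus is a hypothetical mapping from a year to a list of texts
    published that year; tokenization is naive whitespace splitting.
    """
    target = tuple(phrase.lower().split())
    n = len(target)
    timeline = {}
    for year, texts in sorted(corpus.items()):
        hits = total = 0
        for text in texts:
            words = text.lower().split()
            # Slide a window of n words across the text to form n-grams.
            grams = list(zip(*(words[i:] for i in range(n))))
            total += len(grams)
            hits += sum(1 for g in grams if g == target)
        # Normalize so that years with more text are comparable.
        timeline[year] = hits / total if total else 0.0
    return timeline


# Invented two-year corpus, purely for illustration.
toy_corpus = {
    1900: ["he drived the cart and drove the horse"],
    2000: ["she drove home and drove back"],
}

for word in ("drove", "drived"):
    print(word, ngram_timeline(toy_corpus, word))
```

Dividing by the yearly total matters because far more books are printed in some years than in others. Raw counts alone would mostly chart the growth of publishing rather than the fortunes of a word.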

The Ngram Viewer delivers the what and the when but not the why. Take the case of specific years. All years get attention as they approach, peak when they arrive, then taper off as succeeding years occupy the attention of the public. Mentions of the year 1872 had declined by half in 1896, a slow fade that took 23 years. The year 1973 completed the same trajectory in less than half the time.

This may be potato chips for intellectuals, but it is irresistible. You cannot eat just one ngram.
