Cite as 804 F.3d 87 (2d Cir. 2015)
website can enter search words or terms of their own choice, receiving in response a list of all books in the database in which those terms appear, as well as the number of times the term appears in each book. A brief description of each book, entitled “About the Book,” gives some rudimentary additional information, including a list of the words and terms that appear with most frequency in the book. It sometimes provides links to buy the book online and identifies libraries where the book can be found.[1] The search tool permits a researcher to identify those books, out of millions, that do, as well as those that do not, use the terms selected by the researcher. Google notes that this identifying information instantaneously supplied would otherwise not be obtainable in lifetimes of searching.
No advertising is displayed to a user of the search function. Nor does Google receive payment by reason of the searcher’s use of Google’s link to purchase the book.
The search engine also makes possible new forms of research, known as “text mining” and “data mining.” Google’s “ngrams” research tool draws on the Google Library Project corpus to furnish statistical information to Internet users about the frequency of word and phrase usage over centuries.[2] This tool permits users to discern fluctuations of interest in a particular subject over time and space by showing increases and decreases in the frequency of reference and usage in different periods and different linguistic regions. It also allows researchers to comb over the tens of millions of books Google has scanned in order to examine “word frequencies, syntactic patterns, and thematic markers” and to derive information on how nomenclature, linguistic usage, and literary style have changed over time. Authors Guild, Inc., 954 F.Supp.2d at 287. The district court gave as an example “track[ing] the frequency of references to the United States as a single entity (‘the United States is’) versus references to the United States in the plural (‘the United States are’) and how that usage has changed over time.” Id.[3]
The Google Books search function also allows the user a limited viewing of text. In addition to telling the number of times the word or term selected by the searcher appears in the book, the search function will display a maximum of three “snippets” containing it. A snippet is a horizontal segment comprising ordinarily an eighth of a page. Each page of a conventionally formatted book[4] in the Google Books data-
[1] Appendix A exhibits, as an example, a web page that would be revealed to a searcher who entered the phrase “fair use,” showing snippets from Alan Latman, Robert A. Gorman, & Jane C. Ginsburg, Copyright for the Eighties (1985).
[2] Appendix B exhibits the ngram for the phrase “fair use.”
[3] For discussions and examples of scholarship and journalism powered by searchable digital text repositories, see, e.g., David Bamman & David Smith, Extracting Two Thousand Years of Latin from a Million Book Library, J. Computing & Cultural Heritage 5 (2012), 1–13; Jean-Baptiste Michel et al., Quantitative Analysis of Culture Using Millions of Digitized Books, Science 331 (Jan. 14, 2011), 176–182; Marc Egnal, Evolution of the Novel in the United States: The Statistical Evidence, 37 Soc. Sci. Hist. 231 (2013); Catherine Rampell, The ‘New Normal’ Is Actually Pretty Old, N.Y. Times Economix Blog (Jan. 11, 2011), http://economix.blogs.nytimes.com/2011/01/11/the-new-normal-is-actually-pretty-old/?_r=0; and Christopher Forstall et al., Modeling the Scholars: Detecting Intertextuality through Enhanced Word-Level N-Gram Matching, Digital Scholarship in the Humanities (May 15, 2014), http://dx.doi.org/10.1093/llc/fqu014.
[4] For unconventionally formatted books, the number of snippets per page may vary so as to approximate the same effect. The pages of