United States v. Google/Findings of Fact/Section 2G

G. The Importance of Scale

86. Early on, Google understood that the information gleaned from user queries and click activity was a strong proxy for users’ intent and that such information could be used to deliver superior results. See UPX251 at 870 (“[M]ost of the knowledge that powers Google, that makes it magical, ORIGINATES in the minds of Google users.”); id. at 871 (“As people interact with the search results page, their actions teach us about the world.”); UPX203 at 906 (“If a document gets a positive reaction, we figure it is good. If the reaction is negative, it is probably bad. Grossly simplified, this is the source of Google’s magic.”).

87. Greater query volume means more user data, or “scale.” As the most widely used GSE in the United States, Google receives nine times more queries each day than all of its rivals combined across all devices. The disparity is even more pronounced on mobile. There, Google receives nineteen times more queries than all of its rivals put together. See Tr. at 4761:6-24, 4762:19–4763:2 (Whinston) (discussing UPXD102 at 47, 49).

88. There are different types of user data. Click data, for example, includes the search results on which a user clicks; whether the user returns to the SERP and how quickly; how long a user hovers over SERP results; and the user’s scrolling patterns on the SERP. See UPX4 at .004. From such data, a GSE learns not only about the user’s interests but also the relevance of the search results and quality of the webpages that the user visits. Tr. at 1767:21–1771:22 (Lehman) (discussing UPX4 at .004).
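
For illustration only, the click signals described in paragraph 88 might be represented as a record like the following sketch. The field names, the ClickEvent structure, and the dwell-time heuristic are invented for this illustration and are not drawn from the record.

```python
# Hypothetical sketch of the click-data fields described in paragraph 88.
# Field names and thresholds are illustrative assumptions, not Google's.
from dataclasses import dataclass

@dataclass
class ClickEvent:
    query: str                 # the query that produced the SERP
    clicked_url: str           # the result the user clicked
    dwell_seconds: float       # time before the user returned to the SERP
    returned_to_serp: bool     # whether the user came back at all
    hover_seconds: float       # how long the user hovered over the result
    scroll_depth: float        # fraction of the SERP the user scrolled

def crude_relevance_label(event: ClickEvent) -> str:
    """A toy heuristic: a quick return to the SERP suggests the page
    did not satisfy the user; a long stay suggests it did."""
    if not event.returned_to_serp:
        return "satisfied"        # user never came back: likely a good result
    if event.dwell_seconds < 10:
        return "unsatisfied"      # bounced back quickly: likely a bad result
    return "partially satisfied"
```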

89. Another type of user information is query data. GSEs accumulate query data to, among other things, learn what information users are looking for. Google’s scale means that it not only sees more queries than its rivals, but also more unique queries, known as “long-tail queries.” To illustrate the point, Dr. Whinston analyzed 3.7 million unique query phrases on Google and Bing, showing that 93% of unique phrases were seen only by Google, versus 4.8% seen only by Bing. On mobile, where Google has more scale, the disparity was even greater. See id. at 5785:12–5788:1 (Whinston) (98.4% of unique phrases seen only by Google, 1% by Bing; 99.8% of tail queries on Google not seen at all by Bing) (discussing UPXD104 at 44).
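
The comparison Dr. Whinston describes is, at bottom, a set-overlap calculation. The following sketch reproduces only the arithmetic, using invented toy query sets rather than the 3.7 million phrases analyzed in the record.

```python
# Toy illustration of the long-tail overlap comparison in paragraph 89.
# The query sets here are invented; only the arithmetic mirrors the analysis.
google_queries = {"a", "b", "c", "d", "e", "f", "g", "h", "i", "j"}
bing_queries   = {"a", "b"}

all_unique = google_queries | bing_queries
google_only = google_queries - bing_queries
bing_only = bing_queries - google_queries

print(f"seen only by Google: {len(google_only) / len(all_unique):.1%}")  # 80.0%
print(f"seen only by Bing:   {len(bing_only) / len(all_unique):.1%}")    # 0.0%
```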

90. Google has used its scale advantage to improve the quality of its search product. At every stage of the search process, user data is a critical input that directly improves quality.

91. Crawling. GSEs must determine the order in which they crawl the web. User data helps GSEs determine which sites to crawl, because it allows general search providers to understand the relative popularity of various sites. Id. at 2207:7-9 (Giannandrea). User data also helps GSEs determine the frequency with which to crawl websites. Id. at 10274:16–10275:1 (Oard). “Freshness,” or the recency of information, is an important factor in search quality. GSEs “need to know how to recrawl [sites] to make sure that [they] do at all times have a reasonably fresh copy of the web that you are looking at.” Id. at 6310:2-5 (Nayak); see UPX870 at .013 (“If we build too infrequently, our users could miss out on important news or get stale results[.]”). Popular sites, like the New York Times, are worth crawling more often than less visited sites. Tr. at 2207:3-6 (Giannandrea).
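
As a rough illustration of the scheduling tradeoff the witnesses describe, a crawler might weigh a site's popularity against the staleness of its cached copy. The priority formula, site names, and numbers below are invented assumptions, not Google's crawler.

```python
# Hypothetical recrawl scheduler in the spirit of paragraph 91. A site's
# priority rises with its popularity (a proxy drawn from user data) and
# with how stale its cached copy has become. The weighting is invented.
import heapq
import time

def recrawl_priority(popularity: float, last_crawled: float) -> float:
    staleness_hours = (time.time() - last_crawled) / 3600
    return popularity * staleness_hours  # popular and stale sites crawl first

sites = [
    ("nytimes.com", 0.95, time.time() - 2 * 3600),             # popular, 2h stale
    ("rarely-visited.example", 0.01, time.time() - 48 * 3600),  # obscure, 2d stale
]
queue = [(-recrawl_priority(pop, ts), url) for url, pop, ts in sites]
heapq.heapify(queue)
_, next_site = heapq.heappop(queue)
print(next_site)  # nytimes.com: high popularity outweighs lower staleness
```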

92. Indexing. While click data is “not particularly important for indexing,” query data is: GSEs need to ensure that their index covers queries that are frequently entered. Id. at 2211:13-17 (Giannandrea). But see id. at 10274:16-21 (Oard) (opining that click data helps Google “decide whether to keep those pages . . . [or] future pages in the index or not”). User data also helps determine where a webpage resides within the larger index. Id. at 10274:22–10275:1 (Oard). Google divides its index into tiers. Id. Each page is assigned to a tier based on how fresh it needs to be, and the fresher tiers are rebuilt more frequently. Id.
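
The tiering Dr. Oard describes can be sketched as a simple assignment rule: each page lands in the least frequently rebuilt tier that still satisfies its freshness requirement. The tier names and rebuild intervals below are invented for illustration.

```python
# Hypothetical sketch of the tiered index in paragraph 92: each page is
# assigned to a tier by how fresh it must be, and fresher tiers are
# rebuilt more often. Tier names and rebuild intervals are invented.
REBUILD_INTERVAL_HOURS = {"fresh": 1, "daily": 24, "archival": 24 * 30}

def assign_tier(required_freshness_hours: float) -> str:
    """Place a page in the cheapest tier that still meets its freshness need."""
    for tier, interval in sorted(REBUILD_INTERVAL_HOURS.items(),
                                 key=lambda kv: kv[1], reverse=True):
        if interval <= required_freshness_hours:
            return tier
    return "fresh"  # fallback: the most frequently rebuilt tier

print(assign_tier(2))        # fresh    (breaking news must be near-live)
print(assign_tier(24 * 90))  # archival (a static reference page)
```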

93. Retrieval and Ranking. Because humans are imperfect, so too are their queries. Google relies on user data to decipher what a user means when a query is typed imprecisely. For example, user data allows Google to identify misspellings and reformulate queries using synonyms to produce better results. Id. at 8088:15-24 (Gomes) (spelling, synonyms, and autocomplete use query data to improve); id. at 2273:3-15 (Giannandrea) (“reformulation,” which is when a user misspells a query and then re-enters it with another spelling, is important to improve spell check); UPX224 at 914 (Google built its spelling technology by “look[ing] at all the ways in which people mis-spell words in queries and text all over the web, and us[ing] that to predict what you actually mean”).
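
The reformulation signal can be illustrated with a toy counter: when many users retype a misspelled query as the same corrected form, that pair becomes a candidate correction. The log entries, threshold, and function below are invented for illustration.

```python
# Toy sketch of learning spell corrections from query reformulations
# (paragraph 93). Real systems are far more sophisticated; the log
# entries and support threshold here are invented.
from collections import Counter

# (misspelled query, what the same user typed next) observed in logs
reformulations = [
    ("restarant near me", "restaurant near me"),
    ("restarant near me", "restaurant near me"),
    ("restarant near me", "restaurant near me"),
    ("wether tomorrow", "weather tomorrow"),
]

pair_counts = Counter(reformulations)

def suggest_correction(query: str, min_support: int = 2) -> str | None:
    candidates = {fixed: n for (q, fixed), n in pair_counts.items() if q == query}
    if not candidates:
        return None
    best, support = max(candidates.items(), key=lambda kv: kv[1])
    return best if support >= min_support else None

print(suggest_correction("restarant near me"))  # restaurant near me
print(suggest_correction("wether tomorrow"))    # None (too little evidence)
```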

94. Google scores potentially relevant results to determine the order in which they are placed, or ranked, on the SERP. Scoring is done using a number of signals and ranking systems, which are technologies that attempt to discern the user’s intent and thus identify the most relevant results for a particular query. See UPX204 at 243; Tr. at 1764:1-25 (Lehman). Many of these signals, discussed below, rely on user data.
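
One simple way to picture scoring across multiple signals is a weighted sum. The signal names and weights below are invented; nothing in the record describes Google's actual combination function.

```python
# Illustrative weighted combination of ranking signals (paragraph 94).
# Signal names and weights are invented assumptions.
def score(document_signals: dict[str, float],
          weights: dict[str, float]) -> float:
    return sum(weights[name] * value
               for name, value in document_signals.items()
               if name in weights)

weights = {"text_match": 0.4, "click_signal": 0.4, "freshness": 0.2}
doc_a = {"text_match": 0.9, "click_signal": 0.2, "freshness": 0.5}
doc_b = {"text_match": 0.6, "click_signal": 0.8, "freshness": 0.5}

ranked = sorted([("doc_a", doc_a), ("doc_b", doc_b)],
                key=lambda d: score(d[1], weights), reverse=True)
print([name for name, _ in ranked])  # ['doc_b', 'doc_a']
```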

95. Query-based Salient Terms, or QBST, is a Google signal that helps respond to queries by identifying words and pairs of words that “should appear prominently on web pages that are relevant to that query.” Tr. at 1807:25–1808:10 (Lehman) (e.g., “1600 Pennsylvania Avenue” and “White House”). QBST is a “memorization system[]” that helps the GSE “understand facts about the world[.]” Id. at 1838:11-25 (Lehman). It is trained on about 13 months of user data. Id. at 1808:14-20 (Lehman); UPX1007 at 371.
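
In the spirit of QBST's memorization, a toy version might map a query to memorized salient terms and score a page by how many of those terms it contains. The table and scoring rule below are invented for illustration.

```python
# Toy sketch of a memorized query -> salient-terms mapping in the spirit
# of QBST (paragraph 95). The table and scoring rule are invented.
SALIENT_TERMS = {
    "1600 pennsylvania avenue": {"white house", "washington", "president"},
}

def salience_score(query: str, page_terms: set[str]) -> float:
    """Fraction of the query's memorized salient terms the page contains."""
    expected = SALIENT_TERMS.get(query.lower(), set())
    if not expected:
        return 0.0
    return len(expected & page_terms) / len(expected)

page = {"white house", "tour", "washington"}
print(salience_score("1600 Pennsylvania Avenue", page))  # ~0.67 (two of three)
```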

96. Navboost is another signal that pairs queries and documents through memorizing user click data. Tr. at 1838:11-25 (Lehman). It allows Google to remember which documents users clicked after entering a query and to identify when a single document is clicked in response to multiple queries. See UPX196 at 175; Tr. at 1806:2-15 (Lehman) (describing functions of Glue, a “relative” signal to Navboost); id. at 2215:3-4 (Giannandrea) (Navboost “was considered very important”). Prior to 2017, Google trained Navboost on 18 months of user data. Tr. at 6405:15-25 (Nayak). Since then, it has trained Navboost on 13 months of user data. Id. Thirteen months of user data acquired by Google is equivalent to over 17 years of data on Bing. See id. at 5793:14-23 (Whinston); id. at 10350:8–10351:8 (Oard) (same) (discussing UPXD105 at 50).
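
A toy version of click-pair memorization, together with the arithmetic implied by the record's 13-months-to-17-years comparison, might look like the following sketch; the data structure and example clicks are invented.

```python
# Toy sketch of click-pair memorization in the spirit of Navboost
# (paragraph 96): remember which documents were clicked for which
# queries. The data structure and example clicks are invented.
from collections import defaultdict

click_counts: dict[str, dict[str, int]] = defaultdict(lambda: defaultdict(int))

def record_click(query: str, url: str) -> None:
    click_counts[query][url] += 1

def boost(query: str, url: str) -> int:
    return click_counts[query][url]

record_click("white house tours", "whitehouse.gov/visit")
record_click("white house tours", "whitehouse.gov/visit")
record_click("white house tours", "example-blog.com/dc-trip")
print(boost("white house tours", "whitehouse.gov/visit"))  # 2

# The record's scale comparison, as arithmetic: if 13 months of Google
# data equal over 17 years of Bing data, the implied query-volume ratio
# is about (17 * 12) / 13, i.e. roughly 16x.
print(17 * 12 / 13)  # ~15.7
```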

97. More recent ranking signals developed by Google rely less on user data. Those include RankBrain, DeepRank, RankEmbed, RankBERT, and MUM. See UPX255 at .010; UPX2034. Known as “generalization” systems, these signals “may not be so good at memorizing facts, but they’re really good at understanding language.” Tr. at 1846:18-22 (Lehman). Such systems are “designed to fill holes in [click] data”; they allow Google to generalize from situations where it has data to situations where it does not. Id. at 1896:2-19 (Lehman).
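
The memorization-versus-generalization distinction can be sketched as follows: a memorization system can only look up queries it has already seen, while a generalization system can transfer evidence to similar unseen queries. The embeddings, similarity threshold, and lookup logic below are invented for illustration.

```python
# Toy contrast between memorization and generalization (paragraph 97).
# The embeddings, threshold, and data are invented illustrations.
import math

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

embeddings = {"cheap flights": [0.9, 0.1], "low cost airfare": [0.85, 0.2]}
memorized_clicks = {"cheap flights": "travel.example/deals"}

def generalized_lookup(query: str) -> str | None:
    if query in memorized_clicks:              # memorization path
        return memorized_clicks[query]
    qv = embeddings.get(query)
    if qv is None:
        return None
    # generalization path: borrow evidence from the most similar seen query
    best = max(memorized_clicks, key=lambda q: cosine(embeddings[q], qv))
    return memorized_clicks[best] if cosine(embeddings[best], qv) > 0.9 else None

print(generalized_lookup("low cost airfare"))  # travel.example/deals
```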

98. Although these newer systems are less dependent on user data, they were designed with user data and continue to be trained on it, albeit using less volume. See id. at 1845:12-21 (Lehman) (discussing UPX255 at .010–.011) (older signals use up to 1 trillion examples, whereas newer algorithms require only 1 billion); UPX226 at 483 (“Learning from this user feedback is perhaps the central way that web ranking has improved for 15 years.”) (discussing BERT and RankBrain); see also Tr. at 2652:11-14 (Parakhin) (“The more data of this nature we have, the more we can train algorithms to be better in predicting what is good and what is bad.”).

99. MUM is a large language model (LLM), or “a computational system that tries to, in some way, capture patterns in language.” Tr. at 1912:22-23 (Lehman). Whereas RankBERT “exhibited fairly weak performance” on newer scoring metrics, MUM “achieved essentially human-level performance.” Id. at 1915:10-20 (Lehman). MUM is trained on a subset of the web corpus, as well as some click training data, to allow it to “understand the structure of language and acquire some kind of reasoning abilities.” Id. at 1919:8-14 (Lehman); id. at 6358:8-20 (Nayak).

100. Google has also developed three newer LLMs: LaMDA, PaLM, and PaLM2. LaMDA was released in 2021 and is focused on conversation; PaLM and PaLM2 expanded on LaMDA and have more capabilities. Id. at 6363:22–6364:3 (Nayak). These systems were not built with user data. Id. at 6364:13-22 (Nayak).

101. Google has also developed a Search Generative Experience, which leverages artificial intelligence (AI) in search. Id. at 6364:4-12 (Nayak). This experimental product “add[s] generative AI into the search results to enhance them[.]” Id. at 8217:3-5 (Reid); see infra Section II.H.

102. The more recent LLM signals did not replace Navboost and QBST in ranking. Tr. at 1931:21-24 (Lehman); UPX190 at 740 (“Navboost remains one of the most power[ful] ranking components historically[.]”). Nor did they render the generalization systems obsolete. See Tr. at 6366:21–6367:10 (Nayak); see also FOF ¶¶ 114–115. LLMs are used as “additional signals that get balanced both against each other as well as against other signals[.]” Tr. at 6367:5-7 (Nayak).

103. Traditional systems like Navboost can also beat out LLMs (and even generalization systems) in certain aspects of SERP production, like freshness. UPX214 at 696; UPX256 at 185.

104. To be sure, there are diminishing returns to user data, but that inflection point is far from established. And, in any event, user data does not become worthless even after the point of diminishing returns. See Tr. at 10078:7-9 (Murphy) (“[T]here’s pretty much always diminishing returns, but that doesn’t mean they’re not valuable even after some diminishing returns have set in.”); id. at 6337:8-18 (Nayak) (“[T]he value you get from every additional piece of data starts falling,” but the overall value “continues to increase a little bit.”).
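
The economists' point in paragraphs 104 and 106 is the familiar shape of a concave value curve: total value keeps rising as data grows, while each increment adds less, and the steepest gains come at low volumes. The logarithmic value function below is an invented stand-in, not anything in the record.

```python
# Toy illustration of diminishing returns (paragraphs 104 and 106).
# The logarithmic value function is an invented stand-in: total value
# keeps rising with more data, while each increment adds less.
import math

def value(data_units: float) -> float:
    return math.log1p(data_units)  # concave: rising, flattening

for n in [1, 10, 100, 1000]:
    marginal = value(n + 1) - value(n)
    print(f"data={n:>5}  total value={value(n):.2f}  marginal={marginal:.4f}")
# Total value increases at every row; the marginal column shrinks toward
# zero but never reaches it, consistent with the testimony that data
# remains valuable after diminishing returns set in, and that small-scale
# entrants improve fastest at low volumes.
```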

105. Google continues to maintain significant volumes of data—despite the expense of storing it—because its value outweighs that cost. See id. at 6337:17-25 (Nayak) (“[A]s you get more data, it’s more expensive to process.”); id. at 10349:24–10350:7 (Oard) (“[T]he cost of keeping and using this data goes up with the amount of data that we keep. The value goes up as well. And at some point, if the value were to decline to the point where it wasn’t worth the cost, people would stop doing it[.] . . . [T]here’s a sweet spot where you would stop doing it, and Google hasn’t stopped doing it yet.”); id. at 10079:9-10 (Murphy) (“I would presume if they maintain it and it’s costly to maintain it, there’s a reason they maintain it.”).
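
Dr. Oard's “sweet spot” can be sketched as the point where a concave value curve crosses a rising cost curve; both functions below are invented for illustration.

```python
# Toy sketch of the "sweet spot" in paragraph 105: retain data as long
# as its (concave) value exceeds its (roughly linear) storage and
# processing cost. Both functions are invented assumptions.
import math

def value(data_units: float) -> float:
    return 10 * math.log1p(data_units)   # concave value, as in paragraph 104

def cost(data_units: float) -> float:
    return 0.05 * data_units             # cost grows with data retained

def worth_keeping(data_units: float) -> bool:
    return value(data_units) > cost(data_units)

print(worth_keeping(100))    # True:  value ~46.2 vs cost 5.0
print(worth_keeping(10000))  # False: past the sweet spot, cost wins
```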

106. For GSEs with little scale, even a small amount of data can result in meaningful improvements. Id. at 10347:7-10 (Oard) (“And when you have very little, then not only do you get better, but you keep getting better at a faster and faster rate up to some point where the rate at which you’re getting better starts to slow down.”); id. at 2047:21–2048:3 (Weinberg) (“[W]e lack the scale to do as much experimentation as we want[.]”).