Page:United States v Google 20240805.pdf/39

Case 1:20-cv-03010-APM
Document 1033
Filed 08/05/24
Page 39 of 286

89. Another type of user information is query data. GSEs accumulate query data to, among other things, learn what information users are looking for. Google’s scale means that it not only sees more queries than its rivals, but also more unique queries, known as “long-tail queries.” To illustrate the point, Dr. Whinston analyzed 3.7 million unique query phrases on Google and Bing, showing that 93% of unique phrases were only seen by Google versus 4.8% seen only by Bing. On mobile, where Google has more scale, the disparity was even higher. See id. at 5785:12–5788:1 (Whinston) (98.4% of unique phrases seen only by Google, 1% by Bing; 99.8% of tail queries on Google not seen at all by Bing) (discussing UPXD104 at 44).

90. Google has used its scale advantage to improve the quality of its search product. At every stage of the search process, user data is a critical input that directly improves quality.

91. Crawling. GSEs must determine the order in which they crawl the web. User data helps GSEs determine which sites to crawl, because it allows general search providers to understand the relative popularity of various sites. Id. at 2207:7-9 (Giannandrea). User data also helps GSEs determine the frequency with which to crawl websites. Id. at 10274:16–10275:1 (Oard). “Freshness,” or the recency, of information is an important factor in search quality. GSEs “need to know how to recrawl [sites] to make sure that [they] do at all times have a reasonably fresh copy of the web that you are looking at.” Id. at 6310:2-5 (Nayak); see UPX870 at .013 (“If we build too infrequently, our users could miss out on important news or get stale results[.]”). Popular sites, like the New York Times, are worth crawling more often than less visited sites. Tr. at 2207:3-6 (Giannandrea).

92. Indexing. While click data is “not particularly important for indexing,” query data is: GSEs need to ensure that their index covers queries that are frequently entered. Id. at 2211:13-17 (Giannandrea). But see id. at 10274:16-21 (Oard) (opining that click data helps Google “decide
