Wikidata: The Making Of
Denny Vrandečić, Wikimedia Foundation
Lydia Pintscher, Wikimedia Deutschland
Markus Krötzsch, TU Dresden
ABSTRACT
Wikidata, now a decade old, is the largest public knowledge graph, with data on more than 100 million concepts contributed by over 560,000 editors. It is widely used in applications and research. At its launch in late 2012, however, it was little more than a hopeful new Wikimedia project, with no content, almost no community, and a severely restricted platform. Seven years earlier still, in 2005, it was merely a rough idea of a few PhD students, a conceptual nucleus that had yet to pick up many important influences from others to turn into what is now called Wikidata. In this paper, we try to recount this remarkable journey, and we review what has been accomplished, what has been given up on, and what is yet left to do for the future.
CCS CONCEPTS
• Human-centered computing → Wikis; • Social and professional topics → Socio-technical systems; History of software; • Information systems → Wikis.
KEYWORDS
Wikidata, knowledge graph, Wikibase, MediaWiki
ACM Reference Format:
Denny Vrandečić, Lydia Pintscher, and Markus Krötzsch. 2023. Wikidata: The Making Of. In Companion Proceedings of the ACM Web Conference 2023 (WWW ’23 Companion), April 30–May 04, 2023, Austin, TX, USA. ACM, New York, NY, USA, 10 pages. https://doi.org/10.1145/3543873.3585579
This work is licensed under a Creative Commons Attribution-Share Alike International 4.0 License.
WWW ’23 Companion, April 30–May 04, 2023, Austin, TX, USA
© 2023 Copyright held by the owner/author(s).
ACM ISBN 978-1-4503-9419-2/23/04.
https://doi.org/10.1145/3543873.3585579
1 INTRODUCTION
For many practitioners and researchers, Wikidata [68] simply is the largest freely available knowledge graph today. Indeed, with more than 1.4 billion statements about over 100 million concepts across all domains of human knowledge,[1] it is a valuable resource in many applications. Wikidata content is behind the answers of smart assistants such as Alexa or Siri, is used in software and mobile apps (see Fig. 1), and enables research, e.g., in life sciences [38, 73], humanities and social sciences [33, 66, 76], artificial intelligence [1, 10, 49, 53, 57], and beyond [3, 46, 51].
However, Wikidata is much more than a data resource. It is, first and foremost, an international community of volunteers who subscribe to the goal of making free knowledge available to the world. It shares this and other goals with the wider Wikimedia Movement,[2] to which Wikidata belongs. Indeed, Wikidata is also a project (and website) of the Wikimedia Foundation, along with sister projects such as Wikipedia and Wikimedia Commons, backed by dedicated staff who create and maintain the infrastructure that enables the work of the community.
Figure 1: Apps using Wikidata (from upper left): Wikipedia iOS app, mobile search on e/OS/, in-flight app by Eurowings/Lufthansa Systems, Siri (historical glitch exposing Wikidata IDs), and WikiShootMe tool for Wikipedia editors
The complexity and scale of the endeavor may suggest that Wikidata was the result of a long and carefully prepared strategic plan of the Wikimedia Foundation, possibly in response to demands from the Wikipedia community. There is certainly some truth to that. However, the real history of how Wikidata was conceived and how it eventually developed into its present form is not that straightforward: it involves a group of PhD students (naïve but optimistic[3]), a free software project that brought structured data
[1] All statistics reported are current at the time of this writing. Up-to-date numbers are found at https://www.wikidata.org/wiki/Wikidata:Statistics.
[2] https://meta.wikimedia.org/wiki/Wikimedia_movement
[3] We maintain that these are different qualities.