Unpacking Wikidata’s possibilities with Lydia Pintscher
Episode 31 of Whose Voices? podcast
Reviewed by Soizic Pénicaud
Introduction: Welcome feminist troublemakers, friends and allies to the Whose Voices podcast, the audio explorations of Whose Knowledge?’s tech and knowledge justice work. Are you looking for global majority feminist perspectives on Internet infrastructures, anti-caste and anti-capitalist approaches to tech, and what a truly multilingual, indigenous, and queer internet could look like? Join us as we guide you through an audio journey of deconstructing power structures that underpin how we exist on and offline. So tune in, turn up the volume, and let's ignite the flames of change together! Because when we ask: “whose voices?”, the answer is clear: us. In this season, on Whose Voices, we're in conversation with incredible activists, community builders, and changemakers, providing a space to discuss how we can reimagine and redesign the Internet together. This year's season is focused on decolonizing structured data with interviews carried out during Wikimania Singapore 2023 and a pre-convening we held to dive deeper into these systems. Structured data is at the core of how the internet as we currently know it works. These are pieces of information organized in such a way that they can be easily read, understood, and processed by machines. Through these systems, massive amounts of data get sorted out, organized and classified in relation to other pieces of data. Whose voices inform these specific regulations, traditions, and epistemologies?
Maari Maitreyi: Hello everyone, and welcome to another episode of the Whose Voices podcast. I'm Maari Maitreyi, knowledge justice researcher at Whose Knowledge?, and I'm here with Lydia Pintscher. Let's say hi to Lydia!
Lydia Pintscher: Hi, nice to be here!
Maari Maitreyi: Nice to meet you, Lydia. Do you think you could introduce yourself a little bit and tell us a little bit about the day-to-day work that you do?
Lydia Pintscher: Of course. I work as the portfolio lead for Wikidata at Wikimedia Deutschland. I'm working with the development team, the community, and other partners on moving Wikidata forward. And Wikidata is Wikipedia's sister project and it's a knowledge graph.
Maari Maitreyi: Wonderful, thank you for being here. Now, Lydia, for those of our listeners who may not be familiar with it: can you tell us, in your own words, what structured data is?
Lydia Pintscher: Of course. So, for me, structured data means it is data that is structured in a way that both humans and machines can read it and can work with it.
Maari Maitreyi: And so we have been thinking a lot over the last few days about decolonizing structured data. How do you envision decolonizing structured data?
Lydia Pintscher: Right, that's a very complex question (laughs)! But I'm coming at this from the perspective of one of the people building a knowledge graph that underlies a lot of the day-to-day technology you use. That could, for example, be you asking the digital personal assistant on your phone a random knowledge question. What I want to ensure is that people see their perspective reflected in that knowledge graph and see themselves represented in the data we have. In some areas, Wikidata is already very good at this, and in other areas, we have a lot more work to do.
Maari Maitreyi: Would you also maybe explain a little bit about what Wikidata is?
Lydia Pintscher: Wikidata is what we call a knowledge graph, and you can imagine it as a huge graph of data. Take, for example, Berlin, the city, and Germany, the country: you can draw a connection from Berlin to Germany, because one is the capital of the other. You can also draw a connection from Germany to our chancellor, Olaf Scholz. Through this, you can build up a huge knowledge graph that holds data about many, many, many things that you might be interested in and might have questions about. In Wikidata, all of this is available in a structured form, structured data as we were talking about. That means you are also able to query all that data and ask interesting questions. For example, one question you could ask Wikidata is: “How many countries have a female head of government?” These are questions you might not immediately know the answer to, and where a quick Google Search might not give you the answer yet.
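To make the idea of a graph you can query a little more concrete, here is a minimal sketch in Python, assuming a toy in-memory store of (subject, property, value) triples. The plain-text labels, the invented "Country A" and "Person A" entities, and the helper functions are all hypothetical. Real Wikidata identifies items and properties with Q- and P-numbers and is usually queried with SPARQL through the Wikidata Query Service, which this sketch does not use.

```python
from collections import defaultdict

# Hypothetical toy knowledge graph: every fact is a (subject, property, value) triple.
# Plain-text labels are used for readability; Wikidata itself uses Q/P identifiers.
TRIPLES = [
    ("Berlin", "instance of", "city"),
    ("Germany", "instance of", "country"),
    ("Berlin", "capital of", "Germany"),
    ("Germany", "head of government", "Olaf Scholz"),
    ("Olaf Scholz", "sex or gender", "male"),
    # Invented entities, included only so the example query has a match.
    ("Country A", "instance of", "country"),
    ("Country A", "head of government", "Person A"),
    ("Person A", "sex or gender", "female"),
]


def build_index(triples):
    """Index triples as subject -> property -> set of values."""
    index = defaultdict(lambda: defaultdict(set))
    for subject, prop, value in triples:
        index[subject][prop].add(value)
    return index


def values(index, subject, prop):
    """Look up values without creating empty entries in the index."""
    return index.get(subject, {}).get(prop, set())


def countries_with_female_head(index):
    """Two hops through the graph: country -> head of government -> sex or gender."""
    matches = []
    for subject in index:
        if "country" not in values(index, subject, "instance of"):
            continue
        heads = values(index, subject, "head of government")
        if any("female" in values(index, head, "sex or gender") for head in heads):
            matches.append(subject)
    return sorted(matches)


graph = build_index(TRIPLES)
print(countries_with_female_head(graph))  # ['Country A']
```

The two-hop lookup is the kind of question that is hard to answer from free text but straightforward once facts are stored as connected, structured statements.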
Maari Maitreyi: How are you seeing Wikimedians make use of Wikidata?
Lydia Pintscher: So originally, when we started Wikidata, one of the primary goals, and it is still a very important goal of Wikidata, was to power the infoboxes that you see in many Wikipedia articles. These are the little boxes at the top of an article that hold most of the important data. For a country, for example, they would have things like the capital and the GDP. Before Wikidata existed, all of that was duplicated across all the many languages Wikipedia supports. There were about 300 Wikipedias, and all of them had to be updated whenever, for example, a famous person died. That's a huge amount of work, and it also means that a lot of the small and medium-sized Wikipedias got left behind, because they might not have the people needed to keep their articles updated or even to start them. What Wikidata did was provide one central place where this basic data about concepts could be stored and shared across all the Wikipedias and the other Wikimedia projects. That means a lot less work: people update this data once, instead of having to do it across 300-plus languages and the many projects that we have. Now, of course, not all languages benefit equally from Wikidata. Take English Wikipedia, our biggest Wikipedia: a lot of people contribute to it, and it benefits much less from this centralization. But a small Wikipedia that has, let's say, five editors can benefit massively from one central place where they can get this data and start their articles, and in that way provide more content to their readers and give them more access to information in their language.
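A minimal sketch of why one central store saves work, assuming a hypothetical dictionary standing in for Wikidata and hypothetical per-language field labels standing in for different Wikipedias' infobox templates. None of these names come from MediaWiki or Wikidata; the point is only that one edit reaches every language edition.

```python
# Hypothetical central store standing in for Wikidata: one record per concept.
CENTRAL_DATA = {
    "Germany": {"capital": "Berlin", "head of government": "Olaf Scholz"},
}

# Hypothetical infobox field labels for two language editions.
FIELD_LABELS = {
    "en": {"capital": "Capital", "head of government": "Head of government"},
    "de": {"capital": "Hauptstadt", "head of government": "Regierungschef"},
}


def render_infobox(item, lang):
    """Build (label, value) rows for one language edition from the shared record."""
    labels = FIELD_LABELS[lang]
    return [(labels[field], value) for field, value in CENTRAL_DATA[item].items()]


def update(item, field, value):
    """One edit in the central store; every edition picks it up on its next render."""
    CENTRAL_DATA[item][field] = value


update("Germany", "head of government", "Person B")  # hypothetical new value
for lang in FIELD_LABELS:
    print(lang, render_infobox("Germany", lang))
```

In this sketch only the field names are translated; on Wikidata itself, items also carry labels in many languages, which is part of what lets a small Wikipedia reuse the data directly.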
Maari Maitreyi: And in your opinion, how successful are some of these cross-Wikipedia efforts mediated through Wikidata?
Lydia Pintscher: We have some very successful examples. One concrete example is when someone uploads a picture of a person or location to Commons and adds it to the Wikidata item. Immediately, a lot of articles that make use of Wikidata have an image they didn't have before, and people get a visual representation of the article they're reading and thereby a better understanding of the topic they care about.
Maari Maitreyi: In talking about decolonizing Wikidata, people from the Global South generally talk a lot about how their knowledge frameworks are not easily adaptable to Wikidata. There are issues around ontologies and taxonomies. In your opinion, what does that look like, explained for someone who does not understand the technicalities of these taxonomies?
Lydia Pintscher: Yeah. So, this is all about how we model the world, right? The world is very complex, and there are many different ways you can model this reality. All of them are abstractions: they miss things, have holes, and represent particular points of view on the world. What is happening on Wikidata is that a lot of people are coming together and trying to hash out how we model the world, or even how we model particular small pieces of that puzzle. That is not easy. The important thing, I think, is that Wikidata is an open project where we can have these discussions. Many other technology areas, and the knowledge graphs that underlie other technology, don't even allow you to have these conversations. Another big step forward, though also incomplete, is that Wikidata has aimed to represent the world in a more truthful, more complex way than typical knowledge graphs. You can, for example, model conflicting points of view: you can say who said one thing and who supports another view on a topic. That is not a typical thing for a knowledge graph to do. Now, is this enough? Probably not. But it gets us a step closer to a more representative view of the world. On the other hand, there are so many applications built on that data, and the more complex you make the model of the world, the harder those applications are to build. So there's always a balance we need to strike, and that is very hard for an open community to figure out: how do we represent the world in a good way while still allowing people to build meaningful applications on top of that data? So why does a complex model make it harder to build an application on top of that data? One of the things that Wikidata allows is conflicting statements.
So, for example, for a contested political or geographic area that two different countries claim, you can say it belongs to one country according to this government body and to the other country according to that government body. Now, let's say you're building a digital personal assistant and you want to answer people's questions. One of your users asks you: “Which country does this area belong to?” Most technology builders decide to give one answer, and that answer might depend on where you are, taking into account local laws, for example. Very rarely do they tell you: “Well, this is complicated. Depending on who you ask, there are different opinions on this; this is a complex, contested zone.” Imprecise data is similarly hard. If you know, for example, that in very ancient times someone was born a hundred years before another person, but you don't know when either of those people was born, then you have some information: you know that one was born a hundred years before the other. But you cannot put this in relation to anyone else, and you cannot, for example, build a timeline on top of that.
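A minimal sketch of both modelling problems, assuming hypothetical `Claim` records in which each value carries who asserts it (a simplified stand-in for Wikidata's sourced statements, not its actual data model) and a hypothetical table of relative birth dates. "Area X", the countries, and the people are invented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Claim:
    """A value plus who asserts it (a simplified stand-in for Wikidata references)."""
    value: str
    according_to: str


# Hypothetical contested area with two conflicting, sourced claims.
CLAIMS = {
    ("Area X", "belongs to"): [
        Claim("Country A", "Government of Country A"),
        Claim("Country B", "Government of Country B"),
    ],
}


def answer_single(subject, prop, preferred_source=None):
    """Collapse the claims into one answer, as many assistants do."""
    claims = CLAIMS.get((subject, prop), [])
    for claim in claims:
        if claim.according_to == preferred_source:
            return claim.value
    return claims[0].value if claims else None


def answer_all(subject, prop):
    """Expose every claim together with its source instead of hiding the conflict."""
    return [f"{c.value} (according to {c.according_to})"
            for c in CLAIMS.get((subject, prop), [])]


# Relative dating: we only know that Person 2 was born 100 years after Person 1.
# Keys are (later person, earlier person); values are the gap in years.
RELATIVE_BIRTHS = {("Person 2", "Person 1"): 100}


def birth_year(person, known_years):
    """Return an absolute year only if following gaps back to earlier people
    reaches someone with a known year (assumes the table has no cycles)."""
    if person in known_years:
        return known_years[person]
    for (later, earlier), gap in RELATIVE_BIRTHS.items():
        if later == person:
            base = birth_year(earlier, known_years)
            if base is not None:
                return base + gap
    return None  # relative information alone cannot be placed on a timeline


print(answer_single("Area X", "belongs to"))
print(answer_all("Area X", "belongs to"))
print(birth_year("Person 2", {}))                  # None
print(birth_year("Person 2", {"Person 1": 1000}))  # 1100
```

`answer_single` mirrors what most assistants are described as doing, while `answer_all` is more faithful but harder to present. `birth_year` stays `None` until some absolute date anchors the chain, which is why relative facts alone cannot feed a timeline.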
Maari Maitreyi: Those are great examples, actually. They help to visualize what kind of issues can come up. Lydia, there are a lot of us from different parts of the Global Majority putting a lot of labor and work into volunteering on Wikidata and several of the other Wiki platforms, essentially populating this database. The outputs then get used by profit-making companies in ways that are not transparent, that don't compensate the people who've worked on it, or that don't even credit them. What does decolonizing Wikidata mean in this situation? How can we both protect ourselves and also have better ownership of our labor?
Lydia Pintscher: That's a very hard question, right? And that is something that is true for anyone who contributes to Wikidata. I personally believe that we make knowledge available to everyone and, whether we like it or not, that means everyone. My personal hope and aspiration, and what I work towards, is that we empower a lot more small and medium-sized companies, organizations, grassroots projects and so on to build on top of that data. Because the situation we had before Wikidata was that large organizations and large companies had the resources to build their own knowledge graph, and anyone who wanted to build a small useful application had to do all of that themselves. That is a lot of work and it takes a lot of expertise. My goal, and I think we have achieved that in part, is that we lower that barrier, so that people now have a data source available that they can build alternatives on top of. You can now build a personal digital assistant on top of the data that Wikidata has, which was nearly impossible before if you didn't have a lot of resources to put into it.
Maari Maitreyi: Have you seen any interesting uses, projects, that are being built on top of Wikidata? Have you heard it from anyone at Wikimania, or seen it? Would love to know.
Lydia Pintscher: There are so many great projects built on top of Wikidata! The one that we all know is Wikipedia. A lot of Wikipedia articles use data from Wikidata to provide knowledge to their readers. Govdirectory is a website that uses Wikidata to understand government institutions and how to contact them. So anyone who goes to Govdirectory can find ways to contact their government and make their voice heard. If you have a particular political issue that you want to raise with someone in your government, that's the way to find them. Another one goes in a similar direction: the last time we had a national election in Germany, someone built an app where you could scan election posters with your mobile phone, and it would show you a record of the person advertised on the poster, the one you're being asked to vote for: how they voted in the past, how they're representing you, and which party they belong to.
Maari Maitreyi: So, Lydia, what do you see as the future of Wikidata? Which directions are we heading in?
Lydia Pintscher: I want us to continue building out the really amazing knowledge graph about general knowledge, about stuff everyone wants to know about. And then I want us to build a lot more smaller, specialized knowledge graphs around specific topics. There's one, for example, from a project called Enslaved, which is collecting data about the transatlantic slave trade and is a really important resource for scholars to study it and understand it better. And I want a lot of these to come and be connected to Wikidata, so that more people have access to that data, can make better decisions based on it, and can build great applications on top of it.
Maari Maitreyi: Is there anything I haven't asked you that maybe you would like to say a little bit more about?
Lydia Pintscher: I would love for many more people to come and contribute to Wikidata, and on both sides, right? Both on putting data into Wikidata, but also building new, exciting applications on top of it. There are so many uses you can make of that data that I, my team, or maybe no one else has yet even thought about, but it's in your hands now, and I would love to see more of that.
Maari Maitreyi: That was really great talking to you, Lydia, and we hope to see you around Wikimania and also beyond.
Lydia Pintscher: Thank you, everyone.
Outro: And that's a wrap. Before we say goodbye, remember to keep the fire burning by staying connected with us on social media. Follow us on X and Instagram at @whoseknowledge for when the latest episode drops, and Whose Knowledge? on LinkedIn for opportunities to collaborate and to keep updated on our projects. And don't forget to join our Mastodon instance where communities come together to play, organize, and amplify. And for those who just can't get enough of Whose Voices and Whose Knowledge?, we welcome you to visit our website whoseknowledge.org, where you'll find resources, recommended readings, and our thriving blog reflecting on digital justice issues of the day. Until next time, keep resisting, keep dreaming, and keep asking the most radical question of all: whose voices? ours!
This work is released under the Creative Commons Attribution-ShareAlike 3.0 Unported license, which allows free use, distribution, and creation of derivatives, so long as the license is unchanged and clearly noted, and the original author is attributed.