User:Billinghurst/doc

From Wikisource

DNB and Persondata

You might want to amplify what the project pages say about Persondata. So far this seemingly hasn't entered the consciousness of participants in a serious way. It just occurred to me as I was working over what was posted in the project's early days. Charles Matthews (talk) 08:50, 14 November 2009 (UTC)

I cannot say that I fully understand what is happening with metadata. I will see what I can find. That really might be a good question for Magnus when you meet him. billinghurst (talk) 09:59, 14 November 2009 (UTC)
Spoke to Pathoschild and he said Persondata is ugly, and we should be able to do it better. I will let him add his own comments. He said that Magnus probably would have some good input into this. billinghurst (talk) 10:20, 14 November 2009 (UTC)
Metadata was one of the main motivations for creating standard {{header}} and {{author}} fields, with explicit text-only fields. Implementing metadata sitewide for works and authors should be fairly easy once we decide on a good format.
The approach taken by w:Template:Persondata is to add an HTML table with cells identified by classes. This will work now without any software changes, but it's not an elegant solution; it depends on CSS to hide it from users, confuses screen-readers, presents data in a one-dimensional format that works well for indexes but little else, is difficult to extend, and cannot contain metadata about metadata. An example of this format on Wikisource, with some simplifications for machine-only parsing, would be:
<table class="metadata author-metadata">
    <tr class="author-metadata-first-name"> 
        <td>First name</td>
        <td>Abraham</td>
    </tr>
    <tr class="author-metadata-last-name">
        <td>Last name</td>
        <td>Lincoln</td>
    </tr>
    <tr class="author-metadata-birth-year">
        <td>Birth year</td>
        <td>1809</td>
    </tr>
    <tr class="author-metadata-death-year">
        <td>Death year</td>
        <td>1865</td>
    </tr>
    <tr class="author-metadata-description">
        <td>Description</td>
        <td>16th President of the United States (1861 – 1865), with Hannibal Hamlin (1861 - 1865) and <a href="/wiki/Author:Andrew_Johnson" title="Author:Andrew Johnson">Andrew Johnson</a>, succeeding <a href="/wiki/Author:James_Buchanan" title="Author:James Buchanan">James Buchanan</a>; succeeded by Johnson. <a href="http://en.wikipedia.org/wiki/Whig_Party_(United_States)" class="extiw" title="w:Whig Party (United States)">Whig</a> House Representative from Illinois (1847 - 1849). Illinois militia (<a href="http://en.wikipedia.org/wiki/Black_Hawk_War" class="extiw" title="w:Black Hawk War">1832</a>)<br /> <i>The icon&#160;<a href="/wiki/File:Speaker_Icon.svg" class="image"><img alt="Speaker Icon.svg" src="http://upload.wikimedia.org/wikipedia/commons/thumb/2/21/Speaker_Icon.svg/20px-Speaker_Icon.svg.png" width="20" height="20" /></a> identifies that the work includes a spoken word version.</i></td>
    </tr>
    <tr class="author-metadata-image">
        <td>Image</td>
        <td>http://en.wikisource.org/wiki/File:Abraham_Lincoln_head_on_shoulders_photo_portrait.jpg</td>
    </tr>
    <tr class="author-metadata-link-wikipedia">
        <td>Wikipedia link</td>
        <td>http://en.wikipedia.org/wiki/Abraham_Lincoln</td>
    </tr>
    <tr class="author-metadata-link-wikiquote">
        <td>Wikiquote link</td>
        <td>http://en.wikiquote.org/wiki/Abraham_Lincoln</td>
    </tr>
    <tr class="author-metadata-link-commons">
        <td>Commons link</td>
        <td>http://commons.wikimedia.org/wiki/Abraham_Lincoln</td>
    </tr>
</table>
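To illustrate what machine-parsing this table format would involve, here is a minimal Python sketch using only the standard library. It assumes the row classes keep the `author-metadata-` prefix shown above and that the value is always the second cell of a row; the class name `AuthorTableParser` and the function `parse_author_table` are hypothetical names made up for this example, not part of any existing tool.

```python
from html.parser import HTMLParser

# Assumption: every data row carries a class starting with this prefix.
PREFIX = "author-metadata-"


class AuthorTableParser(HTMLParser):
    """Collect the text of the second <td> of each classed <tr>."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.fields = {}
        self._key = None        # field name of the row being read
        self._cell = 0          # number of <td> cells opened in this row
        self._in_value = False  # True while inside the value cell

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            classes = (dict(attrs).get("class") or "").split()
            keys = [c[len(PREFIX):] for c in classes if c.startswith(PREFIX)]
            self._key = keys[0] if keys else None
            self._cell = 0
        elif tag == "td" and self._key:
            self._cell += 1
            self._in_value = self._cell == 2
            if self._in_value:
                self.fields[self._key] = ""

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_value = False
        elif tag == "tr":
            self._key = None

    def handle_data(self, data):
        # Nested markup (links, italics) is flattened to plain text.
        if self._in_value:
            self.fields[self._key] += data


def parse_author_table(html):
    parser = AuthorTableParser()
    parser.feed(html)
    parser.close()
    return {key: value.strip() for key, value in parser.fields.items()}


SAMPLE = """
<table class="metadata author-metadata">
    <tr class="author-metadata-first-name">
        <td>First name</td>
        <td>Abraham</td>
    </tr>
    <tr class="author-metadata-birth-year">
        <td>Birth year</td>
        <td>1809</td>
    </tr>
    <tr class="author-metadata-description">
        <td>Description</td>
        <td>succeeding <a href="/wiki/Author:James_Buchanan">James Buchanan</a></td>
    </tr>
</table>
"""

print(parse_author_table(SAMPLE))
```

The sketch also shows the flat shape of the result: one string per field, with no room for data about the fields themselves.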
An idea I discussed with Billinghurst is to have XML data tucked into a CDATA comment. This is ignored by browsers and screen-readers, is very easy to machine-parse, can contain multidimensional data, and can contain any data (even images, if we really wanted to). The example below presents the same data (with added metadata), but is 17% shorter. MediaWiki strips comments before outputting to HTML, but a very simple extension could add a <comment> or <metadata> tag (and there would be no obstacle to implementing it, since there should be no performance issues).
<div class="metadata">&lt;!--<![CDATA[
<metadata topic="author">
    <names>
        <name type="first" label="First name">Abraham</name>
        <name type="last" label="Last name">Lincoln</name>
    </names>
    <dates>
        <date type="birth" label="Birth year">1809</date>
        <date type="death" label="Death year">1865</date>
    </dates>
    <profile>
        <description>16th President of the United States (1861 – 1865), with Hannibal Hamlin (1861 - 1865) and <a href="/wiki/Author:Andrew_Johnson" title="Author:Andrew Johnson">Andrew Johnson</a>, succeeding <a href="/wiki/Author:James_Buchanan" title="Author:James Buchanan">James Buchanan</a>; succeeded by Johnson. <a href="http://en.wikipedia.org/wiki/Whig_Party_(United_States)" class="extiw" title="w:Whig Party (United States)">Whig</a> House Representative from Illinois (1847 - 1849). Illinois militia (<a href="http://en.wikipedia.org/wiki/Black_Hawk_War" class="extiw" title="w:Black Hawk War">1832</a>)<br /> <i>The icon&#160;<a href="/wiki/File:Speaker_Icon.svg" class="image"><img alt="Speaker Icon.svg" src="http://upload.wikimedia.org/wikipedia/commons/thumb/2/21/Speaker_Icon.svg/20px-Speaker_Icon.svg.png" width="20" height="20" /></a> identifies that the work includes a spoken word version.</i></description>
        <image>http://en.wikisource.org/wiki/File:Abraham_Lincoln_head_on_shoulders_photo_portrait.jpg</image>
    </profile>
    <links>
        <link target="Wikipedia" label="Biography on Wikipedia">http://en.wikipedia.org/wiki/Abraham_Lincoln</link>
        <link target="Wikiquote" label="Quotes on Wikiquote">http://en.wikiquote.org/wiki/Abraham_Lincoln</link>
        <link target="Commons" label="Media on Commons">http://commons.wikimedia.org/wiki/Abraham_Lincoln</link>
    </links>
</metadata>
]]>--></div>
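As a rough sketch of how a consumer might read this format, the Python below pulls the commented CDATA block out of a page and parses it with the standard library. It assumes the comment survives into the served HTML (which, as noted above, would need an extension) and that the wrapper looks like the example; the regular expression accepts either `]]>-->` or `<!]]>-->` as the closing marker. The function name `extract_metadata` and the shape of the returned dictionary are illustrative assumptions only.

```python
import re
import xml.etree.ElementTree as ET

# Matches the XML payload between the comment/CDATA wrappers.
COMMENT_RE = re.compile(r"<!--<!\[CDATA\[(.*?)(?:<!)?\]\]>-->", re.DOTALL)


def extract_metadata(page_html):
    """Return the metadata as nested Python data, or None if absent."""
    match = COMMENT_RE.search(page_html)
    if match is None:
        return None
    root = ET.fromstring(match.group(1).strip())
    record = {"topic": root.get("topic")}
    for group in root:
        # Each group (<names>, <dates>, ...) becomes a list of entries
        # that keep their own attributes, i.e. metadata about metadata.
        record[group.tag] = [
            {**child.attrib, "value": "".join(child.itertext()).strip()}
            for child in group
        ]
    return record


SAMPLE_PAGE = """
<div class="metadata"><!--<![CDATA[
<metadata topic="author">
    <names>
        <name type="first" label="First name">Abraham</name>
        <name type="last" label="Last name">Lincoln</name>
    </names>
    <dates>
        <date type="birth" label="Birth year">1809</date>
        <date type="death" label="Death year">1865</date>
    </dates>
</metadata>
]]>--></div>
"""

print(extract_metadata(SAMPLE_PAGE))
```

Unlike the table version, each value here carries its `type` and `label` alongside it, which is the multidimensional aspect described above.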
Either format could easily be output by {{header}} and {{author}}, although the Persondata-style table would appear at the top of the page and confuse screen-readers even more than it does on Wikipedia (that is why {{persondata}} is placed at the bottom of the article there). Whichever format we choose, we can set up an API that extracts the data from the page and displays it in any of various formats. —Pathoschild 11:49:21, 14 November 2009 (UTC)
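To make the API idea a little more concrete, here is a hedged Python sketch of the "display it in any of various formats" step. It assumes the metadata has already been extracted into a flat dictionary; the function name `render_metadata`, the format names `json` and `xml`, and the `<field name="...">` layout are all hypothetical choices for illustration, not an existing MediaWiki API.

```python
import json
import xml.etree.ElementTree as ET


def render_metadata(record, fmt="json"):
    """Serialise an already-extracted metadata record in the given format."""
    if fmt == "json":
        return json.dumps(record, sort_keys=True)
    if fmt == "xml":
        root = ET.Element("metadata")
        for key, value in record.items():
            field = ET.SubElement(root, "field", name=key)
            field.text = str(value)
        return ET.tostring(root, encoding="unicode")
    raise ValueError(f"unsupported format: {fmt}")


record = {"first-name": "Abraham", "last-name": "Lincoln", "birth-year": "1809"}
print(render_metadata(record, "json"))
print(render_metadata(record, "xml"))
```

The point of the sketch is that the storage format on the page and the formats offered to consumers can be decided independently.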
For my sake, a link to w:CDATA billinghurst (talk)
I spoke to Duesentrieb, who says he's working on getting HTML5 microdata or RDFa into MediaWiki before the next release. Both would allow us to easily mark up the header, author, and license templates in a standard, machine-readable way. HTML5 microdata in particular looks suited to our use. Either of these would be a better solution than a custom metadata format, if they're really coming. —Pathoschild 23:55:01, 15 November 2009 (UTC)
Excellent news P/child. Anything that a nonce like me should be reading at this point? Or just leave it until we have a better idea of the direction? billinghurst (talk) 02:08, 16 November 2009 (UTC)