User:SnowyCinema/The Next Stage of Wikisource Automation

After the success of my test with JSON data at Module:JSON pull test, I have confirmed that it is indeed possible to use a JSON file, stored as a subpage of an Index, to automate many of the repetitive pains of our transcription process. These include manual or semi-manual entry of chapter headers, page headers and footers (which I have already expressed my distaste for), table of contents representations, and maybe even transclusion itself.

The way this would work is that data on the work, its chapters, sections, and many other things would be kept in a /data.json subpage of the work's index. Example: Index:Sandbox.djvu/data.json. Automation templates such as {{header auto}} or {{footer auto}} (to be created) would then take data from the JSON file and the Index page's pagelist to automatically generate a header or footer, depending only on the page number and little else.
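To make the idea concrete, here is a minimal sketch in Python (on-wiki this would presumably be a Lua module along the lines of Module:JSON pull test). The JSON layout and the field names "title", "chapters", "from" and "to" are assumptions made up for illustration, not a settled schema, and plain page positions stand in for whatever the pagelist would resolve.

```python
import json

# Hypothetical contents of Index:Sandbox.djvu/data.json.
# Every field name here is an assumption, not an agreed format.
DATA_JSON = """
{
  "title": "An Example Work",
  "chapters": [
    {"title": "The Beginning", "from": 13, "to": 18},
    {"title": "The Middle", "from": 19, "to": 27}
  ]
}
"""


def chapter_for_page(data, page):
    """Return the chapter whose page range contains `page`, or None."""
    for chapter in data["chapters"]:
        if chapter["from"] <= page <= chapter["to"]:
            return chapter
    return None


def header_auto(data, page):
    """Header text for a page: the chapter title if one matches, else the work title."""
    chapter = chapter_for_page(data, page)
    return chapter["title"] if chapter else data["title"]


data = json.loads(DATA_JSON)
print(header_auto(data, 14))  # The Beginning
print(header_auto(data, 5))   # An Example Work
```

The point of the sketch is that the page number is the only input that changes from page to page; everything else comes from the one JSON file.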

Page headers and footers

From an automation standpoint, page headers and footers are among the biggest pains to deal with on the whole site, as I have explained in another essay. But now, all we'll ever have to do (for most kinds of works) is put {{header auto}} and {{footer auto}} in the Header and Footer sections of each page in the Page: namespace. The headers and footers will then fill themselves in automatically. They can be styled as needed within Index:{filename}/styles.css, which was already possible anyway.

Exceptions

But I know what you might be thinking. The chapter or book titles are often shortened to fit in the context of a page header. Example: The actual chapter title is "What a Long Chapter Title This Turned Out to Be". The title showing in the header is "What a Long Title".

Not to fear. In each chapter's data in /data.json, there will be an optional "headerTitle" field, which allows each of these exceptions to be specified on a per-chapter basis.
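The fallback could be as simple as the following Python sketch. Only "headerTitle" comes from the plan above; the "title" field name and the helper function are assumptions for illustration.

```python
def header_title(chapter):
    """Prefer the optional shortened "headerTitle"; otherwise use the full title."""
    return chapter.get("headerTitle") or chapter["title"]


# A chapter with a shortened header form, and one without.
long_chapter = {
    "title": "What a Long Chapter Title This Turned Out to Be",
    "headerTitle": "What a Long Title",
}
plain_chapter = {"title": "A Short One"}  # hypothetical chapter

print(header_title(long_chapter))   # What a Long Title
print(header_title(plain_chapter))  # A Short One
```

Chapters that need no shortening simply omit the field, so the exception costs nothing in the common case.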

Transclusion

I also want to make the need for specific page tags in transclusions a thing of the past, for the most part (with the exception of more complex transclusion situations that will inherently involve some manual planning).

Example: Instead of having to specify Chapter 1's pages at Cheery and the Chum/Chapter 1 as such:

<pages index="Cheery and the chum (IA cheerychum00yate).pdf" from=13 to=18 />

You might just be able to say:

{{transclude auto|Cheery and the chum (IA cheerychum00yate).pdf}}

This idea has not yet been tested, and I can't guarantee it will work as I hope, but I think it will.
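As a rough sketch of what such a template might do internally, here it is in Python. It assumes, hypothetically, that each chapter entry in /data.json records the name of its mainspace subpage ("subpage") along with its page range ("from", "to"), and that the template can read the title of the page it is placed on. None of this is tested on-wiki.

```python
def transclude_auto(index, page_title, data):
    """Build the <pages> tag for the chapter matching the current subpage name."""
    # "Cheery and the Chum/Chapter 1" -> "Chapter 1"
    subpage = page_title.split("/", 1)[1] if "/" in page_title else ""
    for chapter in data["chapters"]:
        if chapter["subpage"] == subpage:
            return '<pages index="{}" from={} to={} />'.format(
                index, chapter["from"], chapter["to"]
            )
    raise LookupError("no chapter data for subpage: " + repr(subpage))


# Hypothetical data.json contents, using the page range from the example above.
data = {"chapters": [{"subpage": "Chapter 1", "from": 13, "to": 18}]}

print(transclude_auto(
    "Cheery and the chum (IA cheerychum00yate).pdf",
    "Cheery and the Chum/Chapter 1",
    data,
))
# <pages index="Cheery and the chum (IA cheerychum00yate).pdf" from=13 to=18 />
```

Because the subpage name comes from the page itself, the only thing the editor would have to supply is the index filename.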

Potential issues

Broader accessibility

While this method would make the wikitext of works a lot DRYer, and make some things easier during transcription, the JSON element of it would be too difficult for many editors, especially new or less-technically-oriented ones.

I have my own ways to quickly automate the population of the /data.json file for every transcription, but that's high-end stuff. Maybe we need some kind of on-site wizard, sort of like the UploadWizard at Wikimedia Commons, that would help users fill it in, or import a lot of that data from an already-existing table of contents, or something along those lines.
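To give a feel for the table-of-contents import, here is a small Python sketch. It is not my actual tooling; the function name, the field names, and the sample table of contents are all hypothetical. It assumes every chapter starts on a fresh page, so each chapter ends on the page just before the next one begins.

```python
import json


def toc_to_data(work_title, toc, last_page):
    """Turn (chapter title, first page) pairs into a data.json-style dict."""
    chapters = []
    for i, (title, start) in enumerate(toc):
        # A chapter ends where the next begins; the final one ends at last_page.
        end = toc[i + 1][1] - 1 if i + 1 < len(toc) else last_page
        chapters.append({"title": title, "from": start, "to": end})
    return {"title": work_title, "chapters": chapters}


# Hypothetical table of contents: (chapter title, first page).
toc = [("Chapter 1", 13), ("Chapter 2", 19), ("Chapter 3", 30)]
data = toc_to_data("An Example Work", toc, last_page=41)
print(json.dumps(data, indent=2))
```

A wizard would still need to let the user correct the result, since a chapter that begins partway down a page shares that page with the previous chapter.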

So, the technical implementation of the automation itself is one thing, but making that automation more accessible is an entirely separate project.

Changing content models

Someone (presumably an admin) will have to change the content model of every /data.json subpage to JSON for every transcription. This can be done to any page on the site through the form at Special:ChangeContentModel.

This creates a problem: admins would have to come in and review every single /data.json subpage that is ever created. Unfortunately, it's not like a .css subpage, where the software automatically assumes CSS to be the content model. Maybe we can convince the WMF staff to make .json subpages default to the JSON content model, which would fix this issue with the workflow?

Additional acknowledgments

I have to extend credit to Inductiveload (now inexplicably absent from the project), who had apparently already thought of this very thing. Maybe I'm just picking up where he left off.

I, the copyright holder of this work, hereby release it into the public domain. This applies worldwide.

In case this is not legally possible:

I grant anyone the right to use this work for any purpose, without any conditions, unless such conditions are required by law.
