User:Inductiveload/Requests/Batch uploads
I can upload batches of files from the IA or HathiTrust. However, I will require the metadata to do so. I will not do uploads if you don't give me the data (unless I really, really want to anyway).
I can also create files from batches of images. In this case, you will need to provide details of where I can get the images from. I can help you with batch downloading images if you need. If you already have the images, probably the easiest way to share them with me is to upload to the Internet Archive as an "image ZIP" following these instructions.
Data file format
I will need a spreadsheet (XLSX, CSV or ODS) with the following columns (the names are important, don't change them).
Column | Required? | Purpose | Example |
---|---|---|---|
title | Required | The title of the work. For a batch, this is often the same for every row. | The Atlantic |
subtitle | Optional | The work's subtitle (give it if there is one) | A magazine of Literature, Science, Art and Politics |
author | Optional | Author(s), slash-separated | "Oscar Wilde" or "Q30875" |
editor | Optional | Editor(s), slash-separated | |
illustrator | Optional | Illustrator(s), slash-separated | |
translator | Optional | Translator(s), slash-separated | |
year | Required | The publication year | 1868 |
volume | Optional | The volume number | 22 |
subpage | Optional | The volume subpage at Wikisource (if it's not just "Volume XX"). Not required if the work doesn't have a subpage (e.g. a simple single-volume book), or if it does and it's "Volume XX" (in that case, it is inferred from the presence of volume). | |
vol_detail | Optional | Detail string for the volume, used in the book template and the index page | July–December 1868 |
vol_disp | Optional | The volume display string for the Commons book template. Will not be used in a page title. If not given, it defaults to "Volume XX" followed by the vol_detail, if any, in brackets. | Volume 22 (July–December 1868) |
filename | Optional | The target filename (no extension). If not given, a default will be attempted with a format like Brave New World - Huxley - 1932 or The Atlantic Monthly - Volume 22. | The Atlantic Monthly - Volume 22 |
id | Required | The external source ID. The URL for "url" sources, blank if you provide a file archive to me somehow. Required otherwise. | atlantic22bostuoft |
source | Required | The source: either "ia", "ht" or "url" | ia |
file | Optional | If uploading files from some file archive you give me (rather than directly from the IA, or a URL etc), the filename in that collection | File 1.pdf |
oclc | Optional | The OCLC number | 297234877 |
lccn | Optional | The LCCN number | |
city | Optional | The city of publication | Boston |
publisher | Optional | The publisher | Fields, Osgood, & Co. |
printer | Optional | The printer | |
license | Required | The license (so it can be inserted into {{pd-scan}}) | PD-US-expired |
pagelist | Optional | Manual pagelist tag. If you don't provide this, one will be generated from the IA or HT metadata, if possible. This is usually incomplete, but it's generally a good start. | |
img_pg | Optional | The image page (used as the title page). Usually the source provides this information via the page list metadata. | |
language | Required | The work's language code | en |
commonscats | Required | Categories for the work at Commons, slash-separated | The Atlantic Monthly, 1868 |
vollist | Optional (required for multi-vol works) | The volume list template (or wikitext) | {{Atlantic Monthly volumes}} |
only_pages | Optional | If only some pages should be included from the source, then one or more numbers or ranges, comma-separated. | 1-100,103,105-199 |
rm_pages | Optional | If some pages should be excluded, then one or more numbers or ranges, comma-separated. Note: applies after the included pages. | 1,5-8,1234 |
to_ws | Optional | If the file should be uploaded to Wikisource, rather than Commons, then y | |
ws_lang | Optional | The target Wikisource: where the index pages will be made, and where the files will be uploaded if to_ws is set. Default is en. Use mul for Multilingual Wikisource. | |
access | Optional | Set to us if the work is not accessible outside the US (usually for Hathi) | |
no_commons_until | Optional | The date after which the file can be moved to Commons (used for the until parameter of {{Do not move to commons}}). Mandatory if to_ws is set | 2035 |
no_commons_reason | Optional | The reason the file shouldn't be moved to Commons (used for the why parameter of {{Do not move to commons}}). Mandatory if to_ws is set | Multi-author work published in UK |
user | Optional | Requesting user name. If given, it will be used in the index page creation summary (which will both ping that user and make it clear who found that file) | Inductiveload |
- All data, like printer, that is available should be provided. It's a lot easier to put it in now than patch it in later.
- You can add as many other columns as you like for your own purposes, such as building up strings. They will be ignored.
There are some examples here: https://drive.google.com/drive/folders/1fW5ozskDJiyVoQycUoGEB7d-L_Uh6N7b
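To make the only_pages and rm_pages columns concrete, here is a rough Python sketch of how such page specifications could be expanded. It is an illustration only, not the actual upload tooling: the function names are hypothetical, and it assumes 1-based page numbers, inclusive ranges, and that rm_pages refers to the source's page numbers and is applied after only_pages.

```python
def parse_pages(spec: str) -> list[int]:
    """Expand a string like "1-100,103,105-199" into a sorted list of page numbers."""
    pages: set[int] = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            # Assumption: ranges are inclusive at both ends.
            start, end = (int(p) for p in part.split("-", 1))
            pages.update(range(start, end + 1))
        else:
            pages.add(int(part))
    return sorted(pages)


def select_pages(total: int, only_pages: str = "", rm_pages: str = "") -> list[int]:
    """Pages kept from a source of `total` pages: include first, then exclude."""
    if only_pages.strip():
        kept = parse_pages(only_pages)
    else:
        kept = list(range(1, total + 1))
    removed = set(parse_pages(rm_pages))
    return [p for p in kept if 1 <= p <= total and p not in removed]
```

For example, `select_pages(10, "1-5", "2,4-5")` keeps pages 1 and 3 under these assumptions. Checking a specification this way before sending it can catch typos such as a reversed range, which expands to nothing.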
Authors, etc
If you provide strings like Oscar Wilde, they will be used as-is. If you provide a Wikidata ID like Q30875, then it will be used in the creator template at Commons and the linked Wikisource author page (in this case, Author:Oscar Wilde) will be used for the index page.
Separate multiple authors with slashes, e.g. Oscar Wilde/Albert Einstein.
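The distinction between plain names and Wikidata IDs can be sketched as follows. This is a minimal illustration, assuming that a Wikidata ID is the letter Q followed by digits; the function name and the returned dictionary shape are hypothetical and are not how the upload tooling necessarily represents people.

```python
import re

# Assumption: a Wikidata item ID is "Q" followed by a positive integer.
QID = re.compile(r"Q[1-9]\d*")


def parse_people(cell: str) -> list[dict]:
    """Split a slash-separated cell and mark each entry as a Wikidata ID or plain text."""
    people = []
    for name in cell.split("/"):
        name = name.strip()
        if not name:
            continue  # ignore empty entries such as a trailing slash
        kind = "wikidata" if QID.fullmatch(name) else "text"
        people.append({"kind": kind, "value": name})
    return people
```

One consequence of slash separation is that a name which itself contains a slash cannot be given as a plain string; a Wikidata ID avoids that problem.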
Sources
I can download from the following sources using the relevant ID of the work at that source:
- ia: The Internet Archive
- ht: HathiTrust
I can also use direct URLs to any other online resource at a publicly-accessible location. Set source to url in this case, and put the URL in id.
I may be able to add other sources if generally useful - just ask. This can include something like a Dropbox or other (decent: no dodgy hosts, please) web drive, as long as the images are in a unique folder per work and are in order.
Licenses and copyright
I can upload files locally to Wikisources if needed, if they are not suitable for Commons for copyright reasons.
If the file is not a US work (e.g. a non-US author), you must not specify PD-US as the copyright if the file is going to go to Commons. You should specify a suitable template. Usually, this is PD-old-auto-expired: in that case you must also give deathyear to show why the work is PD in the country of origin.
If the file is coming to Wikisource (usually because it's still in copyright in the country of origin, but not in the US), you should set to_ws to yes, set ws_lang if not en, and you must provide no_commons_until and no_commons_reason.
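The dependency between to_ws and the two no_commons columns is easy to overlook, so here is a small sketch of a per-row check. It is only an illustration: the function name is hypothetical, and accepting both y and yes as "set" is an assumption, since both spellings appear on this page.

```python
def check_local_upload(row: dict) -> list[str]:
    """Return a list of problems for a row that requests a local Wikisource upload."""
    problems = []
    to_ws = (row.get("to_ws") or "").strip().lower()
    # Assumption: "y" and "yes" both count as set; anything else means Commons.
    if to_ws in ("y", "yes"):
        for col in ("no_commons_until", "no_commons_reason"):
            if not (row.get(col) or "").strip():
                problems.append(f"{col} is required when to_ws is set")
    return problems
```

A row such as `{"to_ws": "y", "no_commons_until": "2035"}` would be reported as missing no_commons_reason. Whether a license template is actually appropriate for a non-US work still needs human judgement and cannot be checked this way.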
Spreadsheet automation
You can often use the volume number to build the other cells with spreadsheet formulae. For example, if the volume number is in column G and the title is in column C, then the filename for row 2 might be =C2 & " - Volume " & G2.
Likewise, you can increment numbers. If row 2's volume is 1, then you can make row 3's volume 2 using =G2 + 1.
You can zero-pad numbers with, e.g., =TEXT(G2, "00").
In this way, you can save a lot of tedious typing. However, do make sure that the data stays accurate. Very often things like publisher, printer or even the date ranges of volumes can change halfway through a series.
If you use formulae, I'd prefer to receive an XLSX file rather than a CSV file, since I can adjust the formulae if needed.
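If you would rather generate the repetitive cells with a script than with formulae, the same idea can be sketched in Python. This is an illustration under assumptions: the function names are hypothetical, only three of the columns are filled in, and the filename pattern simply mirrors the formula example above.

```python
import csv
import io


def volume_rows(title: str, first: int, last: int, pad: int = 0) -> list[dict]:
    """Build one row per volume, with a filename like "<title> - Volume <n>"."""
    rows = []
    for vol in range(first, last + 1):
        label = str(vol).zfill(pad)  # pad=2 turns 1 into "01", like TEXT(G2, "00")
        rows.append({
            "title": title,
            "volume": str(vol),
            "filename": f"{title} - Volume {label}",
        })
    return rows


def to_csv(rows: list[dict]) -> str:
    """Serialise the rows as CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["title", "volume", "filename"],
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```

The remaining columns (year, id, source, license and so on) still have to be filled in by hand or from another data source, and the same caution applies: generated values are only as accurate as the pattern they assume.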
Authority control
The OCLC number is optional, but highly recommended, because the OCLC ID is a very good way to link the files and indexes with structured data, as it is (or should be) a unique key.
Sending the file
You can send me the file by creating a task on my Workboard at Phabricator and attaching your spreadsheet, or commenting on my talk page and providing a link to some other file host (e.g. Google Drive, Dropbox, etc).
If you use formulae in your spreadsheet, I'd rather have the original spreadsheet (XLSX/ODS) than an exported CSV file, because if I need to make changes to anything, it's easier if the formulae still work.
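Before sending, it may be worth running a quick check over a CSV export of the spreadsheet. The sketch below is an illustration only and not part of the upload tooling: the function name is hypothetical, the list of required columns is taken from the table on this page, and the rule that a row needs either an id or a file is an interpretation of the id and file descriptions.

```python
import csv
import io

# Columns marked "Required" on this page, apart from id, which may be blank
# when a file archive is supplied instead.
REQUIRED = ("title", "year", "source", "license", "language", "commonscats")
SOURCES = {"ia", "ht", "url"}


def check_csv(text: str) -> list[str]:
    """Return human-readable problems found in CSV text; an empty list means none found."""
    reader = csv.DictReader(io.StringIO(text))
    headers = reader.fieldnames or []
    missing = [c for c in REQUIRED + ("id",) if c not in headers]
    if missing:
        return [f"missing column: {c}" for c in missing]

    problems = []
    # Row 1 is the header, so the first data row is row 2, as in a spreadsheet.
    for rownum, row in enumerate(reader, start=2):
        for col in REQUIRED:
            if not (row.get(col) or "").strip():
                problems.append(f"row {rownum}: {col} is empty")
        source = (row.get("source") or "").strip()
        if source and source not in SOURCES:
            problems.append(f"row {rownum}: source must be ia, ht or url (got {source!r})")
        if not (row.get("id") or "").strip() and not (row.get("file") or "").strip():
            problems.append(f"row {rownum}: give an id, or a file from a supplied archive")
    return problems
```

This only reads CSV text; an XLSX or ODS file would need to be exported first, and the original spreadsheet is still the preferred thing to send.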
Known issues
- Pagelists are generated from the source's upstream data. The quality of this ranges from near-perfect to complete junk. It will be your responsibility to deal with these. All indexes are created with "to be checked" statuses for this reason.
  - If you provide a pagelist field, then it will be set to "to be proofread".
Your tasks
You have some work to do even once the batch upload is complete:
- If the works are part of a series, any index volume list templates (e.g. {{American Printer volumes}}) in the vollist column should be created also
- All the Commons categories you specify should exist and be categorised
- Finishing the pagelists on the index pages (the upload will include an automatically-generated pagelist from the IA or HathiTrust metadata, but this is usually incomplete)
- Adding {{small scan link}} templates to Author and Portal pages as appropriate
- Generally tidying up if there are other rough edges.
By making a batch upload request, you agree to undertake these tasks.