
Help:Index pages

From Wikisource
Index pages

Advanced instructions for using Index pages. See also Help:Beginner's guide to Index: files for a quick overview of Index pages.

Index pages and the workspace


An "index page" (or "index file") is a page within the Index namespace. The Index namespace is the focus of the "workspace" in which proofreading and transcription take place. Each index page represents one work to be transcribed. Index pages have page lists, giving a numbered link for each page in the work. Each link points to a page in the Page namespace (the other part of the workspace). The index page and its Page namespace pages share the same base title; each Page namespace title adds a slash and the position number.

For example, if the title of the index page is "Index:My book.djvu", the pages will link to:

  1. Page:My book.djvu/1
  2. Page:My book.djvu/2
  3. Page:My book.djvu/3
    and so forth...

Usually an index page is based on a file, either DjVu or PDF, but it can also be created manually from image files such as JPEG, PNG etc. When based on a single file, the page title of the index page must match the title of the file.

For example, if the file is "File:My book.djvu", the index page will be "Index:My book.djvu".

In addition to the page list, index pages also hold metadata for the work, such as title, author, year of publication etc. This information is useful for reference and it can be used by the final work in the main namespace.

Creating Index pages

Index page with text fields.

Prior to creating an index page, you must upload a scan. This scan can be in DjVu or PDF format. In most situations, the scan should be uploaded to Wikimedia Commons rather than Wikisource. However, Wikimedia Commons' policy has extra requirements beyond the purely legal; it may not always accept scans for some works, even though they are legally free (either public domain or licensed) and acceptable for Wikisource. In these situations only, the scan can be uploaded directly to Wikisource.

For creating an index page using individual images, please see below.

A new index page must be created for every new transcription. Index pages are created in the same way as any other page. Some ways to create an index page are:

  1. From the file page (making sure that you are on Wikisource and not Wikimedia Commons), change the "File:" prefix in the URL to "Index:", go to that page and select "Create".
  2. Enter the name of the index page in the search box then, on the search results, click the red link with this title.

When creating an index page from a scan file, the name of the page must exactly match the name of the file except for the namespace prefix. For example, if the name of a scan, following upload, is "File:My book.djvu" then the index page must be "Index:My book.djvu". Note that only the prefix has changed (from "File" to "Index"). Any other changes will prevent the index page from working correctly.

The new index page will not look like a normal wiki page. It will have a series of text fields instead of a single edit box (see image). Each text field is a parameter, described below; most will be blank but a few are pre-filled automatically. Some of these parameters alter or support the process of transcription and proofreading; other parameters contain metadata and navigation links.

This form is based on the index template. If you create or edit the page with JavaScript disabled, you will see a normal wiki edit box containing the index template. This can be completed and used in the same way as any other template.

The index page can be saved at this point. Any and all fields can be filled before saving and at any time in the future.

Parameters


Index pages have specific, preset parameters as part of the index template. The following instructions explain how to use each parameter.

Type
The type of the original work. This will default to Book, which is the most common type of work on Wikisource.
  • Collection is used for a collection of media that are related. This is discouraged on the English Wikisource and the individual items should be loaded separately.
  • Journal or magazine is used for complete numbers or issues of a journal. Often these are made up of multiple articles or papers that will be transcluded separately. Uploading single articles from a journal is discouraged. However, do be aware that some journals and magazines will have different copyright terms on the various articles within them.
  • Thesis, report is generally used for works that have not been formally published. Care needs to be taken with these to ensure that they meet the Wikisource inclusion criteria. See What Wikisource includes for more details.
  • Dictionary is used for books that will be transcluded in very small sections.
Title
The title of this work. If there is a subtitle it should be included here as well. The title should be wikilinked to the main namespace. The subtitle should not be part of the link. If there is more than one work with the same name then you will need to disambiguate.
Language
The primary language of the work using the standard codes with two or three letters. Here on the English Wikisource, this will be en, enm (Middle English), ang (Old English) or sco (Scots).
Volume
If this Index file is a part of a multi-volume work, enter the volume number into this field. If different volumes use different subpages it may help to wikilink this parameter to the volume's subpage in the Main namespace. For example, [[My book/Volume 01|vol. 1]], if it is the first volume of the work "My book".
Author
The name of the author should go here. It is common practice to wikilink this name to the author's page in the Author namespace.
Translator
If the work was originally published in another language, enter the name of the translator(s). This should be wikilinked to the Author: namespace.
Editor
If this is a multi-author work, such as an encyclopedia or a journal, enter the name of the editor(s). This should be wikilinked to the Author: namespace.
Illustrator
Enter the name(s) of any illustrators that are credited in the work. This should be wikilinked to the Author: namespace.
School
This applies mainly to Theses and Reports. The institution under whose auspices the work was produced is entered here. This should be linked to the Portal: namespace. An alternate use would be if the work was a collaborative production from among a group of unnamed followers of a particular author. This would mostly apply to artworks and is hence rarely used here.
Publisher
The name of the publisher should go here. If a portal exists for this publisher, the publisher's name may optionally be wikilinked to the appropriate page in the Portal namespace.
Location
The location of the publisher should go here.
Year of publication
The year the work was published should go here.
Sort key
If the name of the Index page begins with an article (the, a, an) or a similar word, this parameter can be used to sort it correctly in Category pages. It acts as the {{DEFAULTSORT}} for the Index page. See w:wp:Categorization#Sort keys for details.
International Standard Book Number, OCLC, LCCN, ARK from BNF, National Archives Number
A unique identifier for the work assigned by one of these organisations. Only one of these is needed and for some early works there will be no identifier available.
Scans
Select the appropriate file type from the drop-down list. It will create a wikilink to the scan's page in the File namespace. If this field is left as "other" the wikilink will not be created. Based on currently supported file types this will be djvu or pdf. This can be overridden if, for example, an index page is being created manually from individual page images (see below).
Cover image
The title page of the work. The number in the text field represents the page of the scan to be shown, the default is the first page. This can be overridden.
Progress
This controls the Index page's categorization. It will normally appear as a drop-down menu, but text can be entered instead. In most cases the initial status of a newly added work will be "To be proofread". The template behind the index page records the setting with a short alphabetic code.
Available options for the Progress parameter
Each entry gives the menu option, its code, the status it represents, a description, and the tracking category.
  • Done (code: T)
    Status: Done. All pages of the work proper are validated.
    Description: All pages in the file that relate to the work have been validated or are set to "without text". There are no problematic pages. Completion of any advertisement pages is optional for this setting.
    Tracking category: Index Validated
  • To be validated (code: V)
    Status: Proofread. All pages of the work proper are proofread, but not all are validated.
    Description: All pages in the file that relate to the work have been proofread at least once and the work is ready for validation. This includes tables of contents and pages in the index. Images included as part of the work should be present. Completion of any advertisement pages is optional for this setting.
    Tracking category: Index Proofread
  • To be proofread (code: C)
    Status: To be proofread.
    Description: There are text pages waiting to be proofread. At this point problematic pages may be present.
    Tracking category: Index Not-Proofread
  • Ready for Match & Split (code: MS)
    Status: Ready for Match and Split.
    Description: There is already a main namespace text that is the same edition as this Index and is not proofread from a scan. See Help:Match and split for more details. If the requirements for "match and split" are met, setting this status may help to prevent other users from undertaking unnecessary proofreading.
    Tracking category: Index - Ready for Match and Split
  • Needs an OCR text layer (code: OCR)
    Status: Needs an OCR text layer.
    Description: The file for this Index doesn't have an embedded text-layer and needs the attention of an experienced editor.
    Tracking category: Index - Text Layer Requested
  • Source file is incorrect (missing pages, unordered pages, etc) (code: L)
    Status: Source file must be fixed before proofreading.
    Description: The source file this Index is built from has one or more structural issues such as omitted pages, duplicate pages, pages out of published order or similar flaws. When this has been set, all proofreading work under the Index should cease until an experienced editor has been able to investigate the problem and sort it out. Use either the Index talk page or the Volumes field to note what the problems are. This will help the investigating editor in determining how to resolve the issue(s) and reduce the amount of time and effort needed to implement the solution(s).
    Tracking category: Index - File to fix
  • Source file must be checked (for missing pages, unordered pages, etc) before proofreading (code: X)
    Status: Source file must be checked before proofreading commences.
    Description: The source file has been uploaded but it has not yet been checked for faults such as missing or mis-ordered pages.
    Tracking category: Index - File to check
  • (no status)
    Description: A small number of index pages do not have a status; this will generally only occur when the index page is malformed.
    Tracking category: Index - Unknown progress
Pages
See Page numbers in the Index namespace

This field holds the page list: a visual listing of every position (scanned page) in the uploaded DjVu or PDF source file, each mapped to a manually assigned page number so that the listing follows the numbering printed in the work even where that differs from the order of positions in the file. Each entry links to its corresponding page in the Page: namespace, where any text layer extracted from the source file is displayed side by side with a thumbnail image of the scanned page at that position. This facilitates the transcription and proofreading process described earlier.

The page list is generated by the <pagelist /> tag and its pre-defined commands. The Pages field is populated with the <pagelist /> tag by default; if left untouched, it generates a basic 1-to-1 mapping of position number to page number (no offsets or customizations) from the first position through to the end of the uploaded source file.

The Pagelist tag

The <pagelist /> tag is a simple yet powerful way to describe how the sequence of positions in a file maps to the page numbering of the work, and it handles the many different structures one might encounter. You can use it to mark positions that shouldn't be numbered; for instance, <pagelist 1to2=- 3=1 /> will cause positions 1 and 2 to be represented as unnumbered pages (-), and page numbering will start by setting the third position of this document as page 1.

Since early 2024, pagelists for recently created files sometimes do not work and instead display "Invalid interval". The solution is to use the "purge file" button at the top right of the Index: page.

You can also use text to label positions. For example, <pagelist 1=Cover 2to6=- 7=Title 8=2 20="Plate 1" />. Note that quotation marks (") are required when there are spaces in the text label.

If a sequence of positions was numbered with lower case Roman numerals in the original printed work, use <pagelist 5to10=roman 5=1 11=1 /> to indicate this. This will set position 5 to i, position 6 to ii, and so on. Note the 11=1. This is used to start the Arabic numeral count following the end of the Roman numeral assignments. The equivalent value for upper case Roman numerals is "highroman".
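
As a sketch of how these commands combine, the following pagelist describes a hypothetical 200-position scan; all position numbers are invented for illustration. It assumes that page numbering keeps counting by position across labelled entries, so the number has to be set again after an unnumbered plate.

```wikitext
<pagelist
  1to4=-
  5to10=roman
  5=1
  11=1
  60="Plate 1"
  61=-
  62=50
/>
```

Read line by line: positions 1 to 4 are unnumbered; positions 5 to 10 are shown as i to vi; position 11 becomes page 1, so position 59 is page 49; positions 60 and 61 hold a plate and its blank back; and 62=50 resumes the Arabic count where it left off.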

The <pagelist /> tag can be used multiple times, which is useful in dictionaries (see Index:A Dictionary of Music and Musicians vol 4.djvu) or when the work is made up of several smaller works each with their own range of positions-to-pages (see Index:Tracts for the Times Vol 1.djvu). When using multiple pagelists, the following syntax is used: <pagelist from=147 to=185 />. This code will show only positions 147–185, for example.
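
For illustration, a hypothetical volume containing two separately paginated works might use two tags, one per range. The headings and position numbers below are invented, and the sketch assumes that each tag needs its own numbering commands because each is evaluated on its own.

```wikitext
First work
<pagelist from=1 to=146 1to6=- 7=1 />

Second work
<pagelist from=147 to=185 147=1 />
```

Here the second tag lists only positions 147 to 185 and restarts the numbering at page 1 for the second work.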

General recommendations for labeling pages
Label the front and back covers, if they contain significant content of themselves, as "Cover" ("Cvr").

Label the frontispiece as "Frontispiece" ("Fpiece").

Preliminary sections of a work that are not part of the sequences or ranges of numbering as depicted in the original printed work should be named: "Half-title", "Title", "Contents", etc. Any of these pages that do fall in a numbered range should be numbered according to the sequence or flow of numbering ranges in the work. If there are more than a few of these, such as a run of title, copyright, dedication, table of contents, etc., it is recommended that the front matter be numbered using Roman numerals for simplicity (even if this numbering doesn't appear on the pages of the work concerned). Generally, a half-title (if present) or title will be the nominal page i of such a run (excepting image plates).

Full page images that are not part of the contiguous flow of file-positions to page-numbering should be labeled as "Image" ("Img"). Alternatively, they could be labeled with their plate or figure numbers. E.g. "Plate_V" or "Fig_72". Whenever in doubt, unique labels are always preferred over the re-use of previously assigned labels.

Full page images that are a part of the contiguous flow of file-positions to page-numbering, however, should never be labeled with anything other than the expected page-number (or logical text-name). Any deviation from this practice must be clearly documented, and must be justified if it is ever questioned.

Label any position(s) containing advertisements as "Advert" ("Adv") page(s).

Label positions that are void of any published content (i.e. blank pages) that are not a part of the contiguous flow of file-positions to page-numbering with "-" ("–", "—"), either may be used provided they are applied consistently within a work. "–" (En-dash) is preferable to ("-") (hyphen) for readability with smaller size typefaces. These positions mostly occur in the end-matter of a book, but may also appear on either side of full-page images. For an example of both of these scenarios as applied, see the file Index:Mexico as it was and as it is.djvu. (NB. Some contributors have made a distinction between end-matter pages ("–") and backs of image plates ("—").)

Positions that represent such 'blanks' but are a part of the contiguous flow of file-positions to page-numbering should not be labeled other than with their expected page-number (or logical text-name). The fact there is "nothing" on the page at that file-position will be indicated by the "without text" Proof-Reading status.
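
A minimal sketch of these recommendations applied to a hypothetical 120-position scan (the layout and position numbers are invented for illustration). It assumes a text label can be given to a range of positions in the same way as the "-" label shown earlier.

```wikitext
<pagelist
  1=Cover
  2to3=–
  4=Frontispiece
  5=–
  6=Title
  7=Contents
  8=1
  110to114=Advert
  115to119=–
  120=Cover
/>
```

Positions 8 to 109 carry Arabic page numbers starting at 1, including any blank pages inside that run; only the blanks outside the numbered flow receive the en-dash label, which is used consistently throughout.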

Rules when labeling pages
All existing positions, from the absolute first to the very last, whether transcribed in full or just a blank, and regardless of whether they are ever transcluded, must be accounted for by being assigned a page label.

Skipping ranges in the sequence of a source file's positions, or omitting labels for positions that clearly exist in the source file (even if never actually created as a page in the Page: namespace), are both unacceptable practices under standing Wikisource policy. Any such customization should instead be made in the final transclusion of the finished product to the main namespace, or made directly to the structure of the source file itself prior to uploading.

Positions left unassigned or omitted from the <pagelist /> tag leave their corresponding pages in the Page: namespace unaccounted for, which only serves to grow the Orphaned Pages List while accomplishing little else.

Any application of such methods as described in this sub-section, done in order to circumvent or mask any structural issues eventually discovered within an uploaded source file, is never an acceptable practice.

Volumes
If this is a multi-volume work, put links to the Index pages for the other volumes here. See Index:History of England (Froude) Vol 2.djvu for a simple example of this and Index:Popular Science Monthly Volume 5.djvu for a more complex example.
Table of Contents
A table of contents (ToC) for the text. Usually this will provide links to the chapters in the Main: namespace.
The table of contents can be typed in here directly, either using plain wikitext (lists, tables, or simple links), or via the {{Auxiliary Table of Contents}} template. However, if the text includes its own table of contents, this can be shown instead by "transcluding" the pages from the text (use the name of each page, wrapped in two curly brackets—or braces—at either side). See Index:Air Service Boys over the Rhine.djvu for an example of this. Hint: don't leave spaces or returns between the separate pages when using this. This will ensure vertical alignment between the separate pages.
If a page contains more than just the table of contents, you can transclude just the table of contents by using Labelled Section Transclusion, and adding it to the index with {{#lst:Page:index name.djvu/page number|section name}} – see Index:Amazing Stories Volume 16 Number 06.djvu for an example.
If the table of contents is long, a scrolling window can be used, by placing <div style="width: 95%; height: 700px; overflow: auto; border:thin grey solid; padding: 0px 5px 0px 20px;"> before the table of contents (and a </div> after it).
Sometimes the table of contents in the text is complex or contains a lot of detail, resulting in a very long Index page. In these cases a simplified table of contents with links to the chapters may be created. For example, compare Index:History of england froude.djvu with Index:History of England (Froude) Vol 2.djvu.
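
A sketch of the two transclusion approaches described above, using hypothetical page names and a hypothetical section name "toc". The section markers are Labelled Section Transclusion syntax and go on the page in the Page namespace; the other lines go in the Table of Contents field.

```wikitext
<!-- Whole pages: no spaces or returns between the page calls -->
{{Page:My book.djvu/7}}{{Page:My book.djvu/8}}

<!-- Part of a page: first mark the section on Page:My book.djvu/9 -->
<section begin="toc" />
(table of contents text)
<section end="toc" />

<!-- Then call only that section from the Index page -->
{{#lst:Page:My book.djvu/9|toc}}
```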
Scan resolution in edit mode
This overrides the default calculated resolution for the thumbnail image displayed in edit mode for any Page: namespace page. For example a value of 1000 in this field will produce a thumbnail based at 1000 pixels.
Currently, some browsers will experience a phenomenon called 'black-nail' (an all-black thumbnail displayed in error in edit mode). Experimenting with this value should provide a solution by forcing a lower resolution than the automatically calculated default; a value anywhere in the range of 300 to 1600 typically works here.
Header
This parameter controls the header on each page in the Page namespace associated with the Index page. The header of each new page in the Page namespace will be pre-filled with whatever text has been entered in this parameter of the index page.
The header is mostly used for titles (of the book, chapter, article etc.) and page numbers—anything at the top of the page that should not be transcluded to the main namespace. If a common format is repeated throughout the work, it saves time to include all or part of the formatting and text in this parameter.
A commonly used formatting template for this parameter is {{RunningHeader}}. This template has three parameters of its own, which create left-, centre-, and right-justified text. The {{{pagenum}}} magic word is also useful. This will copy the number or text used for a page link in the pagelist parameter. This allows for automatically generated page numbers, assuming the pagelist parameter is correct. If the book contains sidenotes, you can include the {{sidenotes begin}} template in this field.
The header can be edited on individual pages in the Page namespace. Doing so will not affect the header parameter in the Index page nor any other page in the Page namespace. The header will never affect the work in the main namespace.
The drawback of using this field is that it doesn't take into account left and right headers and formats them all the same way. One way to cope with this is to put the page number at both sides of the running header template and delete the appropriate one when proofreading the page. An example of this technique can be found at Index:The Rover Boys on the Great Lakes.djvu: {{rh|{{{pagenum}}}|ROVER BOYS ON THE GREAT LAKES.|{{{pagenum}}}}} Alternatively, the {{rvh}} template may be used, which automatically detects whether the page number is odd or even.
Footer
Like the header, this parameter controls the default text in the footer of each page in the Page namespace associated with the Index page.
It is common for page numbers to be shown in the footers of pages. The wikitext {{{pagenum}}} can be used to simplify this process, along with the {{RunningHeader}} template. Moreover, the <references /> tag—optionally the {{reflist}} or {{smallrefs}} template—is used to display footnotes. If the book contains sidenotes, you can include the {{sidenotes end}} template in this field.
The footer can be edited on individual pages in the Page namespace. Doing so will not affect the footer parameter in the Index page nor any other page in the Page namespace. The footer will never affect the work in the main namespace.
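
As an illustration, the Header and Footer fields for a hypothetical work titled "My Book" that has footnotes could be filled in as follows. Only templates named above are used; the title text is invented.

```wikitext
<!-- Header field: page number on both sides; delete the wrong one when proofreading each page -->
{{rh|{{{pagenum}}}|MY BOOK.|{{{pagenum}}}}}

<!-- Footer field: displays any footnotes on the page -->
{{smallrefs}}
```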
Categories
This parameter is used for project management categories (e.g. Category:WikiProject NLS). Categories under Category:Works (e.g. those in Works by type, Works by genre, Works by subject, Works by country, etc.) or Category:Authors (e.g. those in Authors by type, Authors by nationality, Authors by occupation, etc.) should not be used on Index pages. They belong on mainspace transclusion pages and Author pages, respectively, and should be moved there when seen.
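
For orientation, this is roughly what a partly filled index template might look like in a plain edit box (for example with JavaScript disabled). It is only a sketch: the template call and the parameter names are assumptions that may not match the names the actual template uses, and the work and author are hypothetical.

```wikitext
<!-- Parameter names below are assumed, not taken from this help page -->
{{:MediaWiki:Proofreadpage_index_template
|Type=book
|Title=[[My book]]
|Language=en
|Volume=
|Author=[[Author:Jane Doe|Jane Doe]]
|Publisher=
|Address=
|Year=1900
|Key=
|Source=djvu
|Image=1
|Progress=C
|Pages=<pagelist />
|Volumes=
|Remarks=
|Width=
|Header=
|Footer=
}}
```

The Progress value C is the code for "To be proofread" listed above, and the Pages value is the default tag that produces a 1-to-1 page list.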

Using individual image files


An Index page can be made from JPEGs, PNGs and other image files as well as from container formats of scans like PDF and DjVu. This would cover, for example, individual photographs of pages or non-print works such as inscriptions or plaques. Due to the extra complexity and other drawbacks of this process, this is not recommended for anything other than very short works, such as single pages or works of just 2-3 pages in length.

The process is similar to the normal Index page process, with the following exceptions:

  1. Creating the page. Create a new page in the Index namespace as you would in any other namespace. If this page involves only one image, it is a good idea to use the filename for the pagename. For example: File:Inscription.jpg leads to Index:Inscription.jpg. If this page involves multiple files, use a pagename that makes sense. If the filenames of the page images have a common element, it may make sense to use that; using the filetype is optional. For example: Index:1900 Conservative political pamphlet.
  2. Parameters. Some of the parameters will need to be entered manually.
    • Scans: This parameter is a drop down list of file types. Choose the type of file you are using. If this is not available in the list, choose "other".
    • Cover image: No automatic cover image will be generated. Instead of a page number, the image needs to be entered manually with the complete image code. For example: [[File:Inscription.jpg|200px]]. If using multiple pages, use either the first source image or the one that best corresponds to a "cover image" for the work.
    • Pages: No automatic pagelist will be generated. Instead of the <pagelist /> tag, each page needs to be added manually. Each page should be a wikilink to a specific page in the Page namespace, using the name of the source files in the File namespace (replacing the "File:" prefix with a "Page:" prefix). For example:
      1. One image: If using Index:Inscription.jpg (based on File:Inscription.jpg), the wikilink should be [[Page:Inscription.jpg|1]].
      2. Multiple images: These should be added in sequence. If using, for example, Index:1900 Conservative political pamphlet (based on different images), the wikilinks should be along the lines of: [[Page:1900 Conservative political pamphlet page 1.jpg|1]] [[Page:1900 Conservative political pamphlet page 2.jpg|2]] etc.
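
Put together, the manually entered fields for the pamphlet example might look like the sketch below. The file names follow the pattern used above; the third page is hypothetical and is added only to show the sequence continuing.

```wikitext
<!-- Cover image field: full image code instead of a page number -->
[[File:1900 Conservative political pamphlet page 1.jpg|200px]]

<!-- Pages field: one link per image, in reading order -->
[[Page:1900 Conservative political pamphlet page 1.jpg|1]]
[[Page:1900 Conservative political pamphlet page 2.jpg|2]]
[[Page:1900 Conservative political pamphlet page 3.jpg|3]]
```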

Please note that individual image files do not contain OCR text layers like PDF and DjVu files (although TIFF files can contain text, they are not usable in this process). The OCR tool may be used to request ad hoc OCR of individual page images. Otherwise, it will be necessary to transcribe the entire text from the image.

Examples


When creating an index page in this way, it can help to have other examples for reference. Therefore, the following may be useful.

Single pages:

Multiple pages:

Index talk pages


Because transcribing a work is a team effort, we encourage contributors to record on the Index: talk page any formatting conventions adopted from the style guide, the templates in use, and any other information that the original contributor wishes to convey to assisting transcribers. To let transcribers know that such information is available, the Index: page will display the text (with the link pointing to the talk page):


Proofreading and transclusion from the Index page


Index pages are the focus of proofreading. Each page in the pagelist should be proofread and the progress parameter amended accordingly.

For more information, see:

Index page template


The default layout of an index page is controlled by the Proofreadpage index template. JavaScript must be enabled in the user's browser for the template to function. If JavaScript is disabled or not available, the user will just see the template itself in a normal edit window. It may be useful occasionally for a user to deliberately disable JavaScript in order to edit the template directly but this should be rare.

Tools


On the index pages, the following tools can be utilised:

  • Book to scroll enables the file to be viewed in a scrollable format, rather than the typical page at a time
  • BookReader provides a 1-up or 2-up scan viewer
  • Purge file tool enables the djvu or pdf layers of the file to be refreshed at Commons.