
Wiktenauer:Tutorial/Transcribing

Revision as of 19:57, 21 April 2015 by Michael Chidester (talk | contribs) (→‎Guidelines)
Introduction   Uploading   Indexing   Transcribing   Proofreading   Validating   Translating   Publishing   Final Advice    

Transcribing treatises

Transcription is the foundation of Wiktenauer's library, transforming raw scans of our texts into electronic documents that can be searched by computers and read more easily by humans. Once an Index page is created, the software will automatically fill in most of the relevant data for each Page, and the pages will be ready for transcription.

For the sake of clarity, we'll be breaking down the process of readying a text for publication into several steps. If you wish, you can perform all of the tasks listed under the Transcribing, Proofreading, and Validating tabs in a single edit (though a second user will still be required to officially validate the page).

How to transcribe a page

Note: To see how this process works, it is a good idea to browse the Index page of an ongoing project.

  1. If you click on any of the page or folio numbers at the bottom of an Index page, you will see an image of that page side-by-side with a text field. The text field is usually blank, unless the image comes from a PDF containing a text layer.
    • If it is blank: write the text you see in the image into the text field. For printed matter, you can use an OCR program to speed up the process.
    • If it is not blank: Correct any errors you notice in the text in the text field so that it matches the text in the image.
  2. Preview your work, set the status at the bottom to "Not proofread" (which is red), then save.
    • If you are unable to transcribe the entire page for any reason, set the status to "Problematic" (which is purple) so another user can find and finish it at a later time.
    • If you think you have created a detailed, polished transcription in one go, set the status to "Proofread" (which is yellow) for another user to validate. For details of what this entails, see the proofreading tutorial tab.
    • If the image has no discernible text, set the status to "Without text" (which is grey) and save.
  3. The page is now transcribed.
    • The message at the top of the page will be "This page needs to be proofread." This message will be highlighted in red.
    • For blank pages, the message at the top will be "This page does not need to be proofread." This message will be highlighted in grey.
    • When you look at the Index page, the page number will also be highlighted in red or grey.
  4. Repeat the process for every page in the scan.
Fig 1: Side-by-side page layout for transcription in the Page namespace

The side by side layout

When you view a page in the Page namespace, the screen will be split into two sections (fig 1). This is the side-by-side layout that allows users to transcribe and proofread the text on Wiktenauer (left section) by comparing it with the scanned text (right section). When you edit an existing page in the Page namespace, the current formatted text will appear above the editing area, and can be edited beneath using the same interface.

Once you've finished transcribing a scan, you can navigate to the next one or back to the Index using the arrow tabs at the top of the page.

Transcribing

To transcribe a page, fill in the text in the left section so that it matches the scan in the right section as closely as possible. You do not have to make an identical, photographic copy of the scan; just get as close as you reasonably can. Wiktenauer is a website, not a book, and the text is more important than the calligraphy or typography. Some things work in books but do not work on Wiktenauer, and some manuscript conventions cannot be replicated easily in print at all. For example, columns of text are not necessary and do not work well on Wiktenauer; ignore them during transcription by placing the second column beneath the first, and so on. Remember that several pages will typically be compiled together when the transcription is finally published in the mainspace, where page-level layout such as columns would not be readable.
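
As a minimal sketch of the column rule, assume a hypothetical scan that shows two columns of two lines each. The wording below is placeholder text, not taken from any treatise. The left column is entered first and the right column follows directly beneath it, with no table or column markup:

```wikitext
<!-- Placeholder text for illustration only -->
First line of the left column
second line of the left column
first line of the right column
second line of the right column
```

Entered this way, the text reads as one continuous passage, which is what matters once several pages are compiled together in the mainspace.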

Page status

Screenshot from the Page namespace, showing the page status radio buttons along with surrounding features such as the summary field, the save button and the preview button.
Fig 2: Page status buttons

When you save the page, you should also set the page status. You should see a row of color-coded radio buttons just above the save button (fig 2). If you have just started a transcription but are unable to complete it, then select the purple button (for "Problematic"); you can finish it later, or a more experienced transcriber will see it and resolve the difficult part. If you have completely transcribed the page, then select the red button to indicate that it's ready to be proofread. The page status will be further updated by additional users during the proofreading process.

By convention, manuscripts for which we only have microfilm scans should not be promoted to "Proofread" or "Validated" until digital (color) scans become available, though exceptions may be made for unusually large and clear scans. Such pages should still be proofread and transcluded to the mainspace as laid out in subsequent sections, but their status should remain red ("Not proofread"), and they should be proofread again when color scans come online.

Blank pages

Blank pages can be left blank and set to the "Without text" (grey) page status. These pages will be ignored when pages are added to the mainspace. This also applies to book covers and illustrations (illustrations to be included in the final text will have their status changed during proofreading). If the illustration is unavailable at present, see Problematic pages.

Problematic pages

If you have a problem while transcribing a page and cannot finish it, you can set the page status to "Problematic" (purple). This will alert other transcribers that a problem exists, which they may be able to solve. Common problems include pages with illustrations (if no image file is available), pages with equations, pages with difficult characters (especially text that does not use the Roman alphabet), and pages with very sloppy handwriting. In some of these cases, special templates exist to identify the problem. These are useful to anyone else looking at the page, and they can attract the attention of people able to fix the problem.

Match and split

"Match & Split" is the term used by Wiktenauer to refer to the task of processing a completed transcription (or translation) that was published online or in print and later donated to the wiki for indexing. The process is much faster and easier than authoring an original transcription. Whenever possible, refer back to the original document during match & split rather than any existing version on the wiki (to avoid any errors that may have crept in).

  1. Match: Ensure that the transcription was created from the exact manuscript or print edition that you are currently indexing. If not, create a new index page for it (though it might still be useful as a reference for the current index).
  2. Split: Select the text from the transcription one page at a time and copy it into the relevant Page. Make sure you capture all special formatting included in the transcription as well as any footnotes. See the proofreading tutorial for instructions on footnotes.
  3. When saving the page, use your discretion about which page status to apply: a rough transcription (one that doesn't meet the standards on this page) should be submitted for proofreading (red), while a polished one may be ready for validation (yellow). If you're unsure, stick with red for the first pass.
    • All transcriptions extracted from 19th and 20th century works in the public domain should be marked red, because standards of transcription have changed over time and the work will probably need to be updated.

Make sure that the source of the transcription is properly documented in the sourcebox on the discussion page, since the original author's name won't appear in the page history.

Guidelines

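Deletions and insertions

Code                                          Text (as displayed)
text<sup> insertion </sup>text                text insertion text ("insertion" in superscript)
text<sub> insertion </sub>text                text insertion text ("insertion" in subscript)
{{dec|u|underline}}                           underline (underlined)
{{dec|o|overline}}                            overline (overlined)
{{dec|s|deletion}}                            deletion (marked as deleted)
{{dec|s|deletion}}<sup>replacement</sup>      deletion (marked as deleted) followed by "replacement" in superscript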

Do include

  • Text formatting, such as bold or italics (use '''bold''' or ''italics''; see the Cheatsheet for more information)
  • Rubrics—usually red text or blue text used for emphasis (use {{red|red text}} or {{blue|blue text}})
  • Under- and over-line (see the table)
  • Deletions and insertions in manuscripts, sub- and superscript in books (see the table)
  • Footnotes (see the cheatsheet)
  • Special characters, such as:
    • Dropped or raised initials
    • Capitalization; if the capital letters are the same size as the normal text, use {{smallcaps}}
    • Horizontal lines (use ---- for grey/black or <hr style="background: [color];"/>)
    • Section breaks (such as rows of asterisks: * * * * * )
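
To show how these elements combine on a single page, here is a short sketch that uses only the markup named in the list and table above. The wording is placeholder text rather than a real transcription, and the way the pieces are arranged is an assumption made for illustration:

```wikitext
<!-- Placeholder text for illustration only; not from a real treatise -->
{{red|A rubric written in red ink}}

'''Bold text''' followed by ''italic text''.

A line with a {{dec|s|struck}}<sup>replacement</sup> word.

A line with an {{dec|u|underlined}} word and an {{dec|o|overlined}} word.

A line with<sup> an insertion above </sup>the line.

----
```

Use the Preview button before saving to check that each template renders as expected against the scan.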

Do not include

  • Different text sizes
  • Columns. Transcribe each column directly beneath the previous one as continuous text
  • Corrected spellings. Many of the texts on Wiktenauer predate the concept of "correct" spelling; even if you're looking at a work written in a modern language, use the template {{SIC}} if you want to note an error
  • The <poem></poem> tag

Optional

  • Line breaks. Webpages will normally ignore single line-breaks, so text broken into different lines (common with scanned text and a convention for professional transcriptions) will be seen normally by a reader. Line breaks can sometimes cause problems, but using them is a matter for the individual transcriber. Do not use hard line breaks (such as with the <br/> tag) to delineate separate lines in the source unless you wrap them in <noinclude></noinclude> tags.
  • Numbered lines or verses. This academic practice can be a useful reference, but it will cause problems when publishing the transcription in the mainspace. If you're going to use them, wrap each number in <noinclude></noinclude> tags to avoid pulling them across. The line numbers must be recreated in the mainspace separately.
  • Advanced typography. Creating a page that looks like the original is nice, but the text itself is more important; some typography can be difficult to produce, and some can cause problems with the website.
  • Pages that are not part of the work itself, such as adverts, do not need to be transcribed or included in the main version. On the other hand, if a user wants to transcribe and include these pages, that is allowed.
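
For the optional hard line breaks and line numbers, the following sketch shows one way the <noinclude></noinclude> wrapping described above might look. The verse text and the numbering are hypothetical placeholders, and placing the number at the start of each line is an assumption rather than a stated convention:

```wikitext
<!-- Placeholder verse; numbers and breaks are hypothetical -->
<noinclude>1 </noinclude>First line of the verse<noinclude><br/></noinclude>
<noinclude>2 </noinclude>second line of the verse<noinclude><br/></noinclude>
<noinclude>3 </noinclude>third line of the verse
```

On the Page itself, the numbers and hard breaks are displayed. Anything inside <noinclude></noinclude> is left out when the page is transcluded, so only the verse text should be carried into the mainspace.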

Other common things to correct

  • Paragraph breaks. Leave a blank line between paragraphs, as is standard for electronic and internet formatting.
  • Spaces before punctuation. These may be removed or left in, depending on the transcriber's preference, but this decision should be consistent throughout the transcription.


Continue the tutorial with proofreading