# Text structure

While the <teiHeader> element holds the meta-information about the document, the <text> element contains the document itself. The <text> element is mandatory and can contain the following three elements:

  • the front matter <front> contains any prefatory matter (headers, abstracts, title page, prefaces, dedications, etc.) found at the beginning of the document, before the main body.
  • the text body <body> contains the whole body of a single unitary text, excluding any front or back matter.
  • the back matter <back> contains any appendixes, etc. following the main part of a text.

TIP

For front matter and back matter I would recommend consulting the TEI Guidelines; for mediaeval manuscripts the <body> element is often the only one needed.

<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <!-- Tei Header, see TEI Header chapter -->
  <text>
    <front>
      <!-- contains any prefatory matter -->
    </front>
    <body>
      <!-- contains the textual body of the document -->
    </body>
    <back>
      <!-- contains any appendixes -->
    </back>
  </text>
</TEI>

# Structural elements

There are two primary elements that should be used to structure and divide text.

# Divs

<div xml:id="div1" xml:lang="lat" hand="Hand_Unknown-1"></div>

Divs should be used to divide the text into larger chunks such as chapters, or whatever works for your text. They can also be freely nested. The attributes xml:id and hand are optional: xml:id can later be used to link a text division with its translation, and hand can refer to the scribe defined in the <handDesc> section of the <teiHeader> element. The xml:lang attribute should always be present and must contain a valid language code, see the table.

# Headings

<head xml:lang="lat" hand="Hand_Unknown-1"></head>

The head element should be used for any type of heading, for example the title of a section or the heading of a list, glossary, manuscript description, etc. The xml:lang and hand attributes can be used to further describe the heading. The language should be a valid ISO 639 language code (see this list), and hand should point to an existing hand ID in the <handDesc> section of the <teiHeader>.

# Paragraphs

<p xml:id="p13" xml:lang="lat" hand="Hand_Unknown-1"></p>

The paragraph element p should be used to divide text into paragraphs. Unlike with divs, the xml:id attribute is mandatory for paragraphs if you plan to provide a translation of the text, because xml:id is used to identify paragraphs across the different language versions. The xml:lang attribute should contain a valid language code, see the table. The hand attribute is again optional and can be used to point to a scribe defined in the <teiHeader> section.
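The three structural elements described above can be combined as in the following minimal sketch. The IDs (div1, div1.1, p1, p2) are hypothetical and the comments stand in for real text; only the nesting and the attributes are the point here.

```xml
<!-- Hypothetical IDs; replace the comments with your own text -->
<div xml:id="div1" xml:lang="lat">
  <head xml:lang="lat"><!-- heading of the chapter --></head>
  <p xml:id="p1" xml:lang="lat"><!-- first paragraph of the chapter --></p>
  <!-- Divs can be nested; paragraphs of the outer div come before the nested div -->
  <div xml:id="div1.1" xml:lang="lat">
    <head xml:lang="lat"><!-- heading of the nested section --></head>
    <p xml:id="p2" xml:lang="lat"><!-- paragraph inside the nested section --></p>
  </div>
</div>
```

A translation could then reuse the same paragraph IDs to line up with p1 and p2.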

# Footnotes

<p>Paragraph tag should be used to divide text into paragraphs.<note type="footnote">For further information see TEIP5 Guidelines</note>Unlike the divs the attribute... </p>

For footnotes the note element should be used. The type attribute should be set to footnote and the content of the note element should be the footnote text.

# Page beginnings

<pb source="#E1S" n="f23v" facs="image.jpg" />

A pb element should appear at the start of the page which it identifies. The global n attribute indicates the number or other value associated with this page. This will normally be the page number or signature printed on it, since the physical sequence number is implicit in the presence of the pb element itself. The source attribute should contain the ID of the edition or source to which the page beginning refers. The facs attribute can point to the image file containing the folio.

# Line beginnings

<lb source="#E1S" n="1" break="no" />

By convention, lb elements should appear at the point in the text where a new line starts. The n attribute, if used, indicates the number or other value associated with the text between this point and the next lb element, typically the sequence number of the line within the page, or another appropriate unit. This element is intended to be used for marking actual line breaks on a manuscript or printed page, at the point where they occur; it should not be used to tag structural units such as lines of verse (for which the l element is available) except in circumstances where structural units cannot otherwise be marked. The break attribute indicates whether or not the element concerned is considered to mark the end of an orthographic token in the same way as whitespace.

| break | Description |
| --- | --- |
| yes | the element bearing this attribute is considered to mark the end of any adjacent orthographic token irrespective of the presence of any adjacent whitespace |
| no | the element bearing this attribute is considered not to mark the end of any adjacent orthographic token irrespective of the presence of any adjacent whitespace |
| maybe | the encoding does not take any position on this issue. |
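A short sketch of how pb, lb and the break attribute might work together is given below. The paragraph ID p14 and the position of the line break inside "significat" are invented for illustration; the attribute values are taken from the examples above.

```xml
<p xml:id="p14" xml:lang="lat">
  <!-- The page starts here -->
  <pb source="#E1S" n="f23v" facs="image.jpg" />
  <!-- Line 1 starts here; line 2 begins in the middle of a word, hence break="no" -->
  <lb source="#E1S" n="1" />Equo baio sedere: expeditionem signi<lb source="#E1S" n="2" break="no" />ficat.
</p>
```

Because of break="no", a processor can treat "signi" and "ficat" as a single word, even though the manuscript line ends between them.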

# Names

Almost every text contains names, whether they are the names of people, places or organizations. These names should be encoded and for that purpose a variety of elements can be used.

# Universal name element

<name type="place" ref="#Place_Prague">Prague</name>

Contains a proper noun or noun phrase.

| Type | Description |
| --- | --- |
| person | The name is a person's name. |
| place | The name is the name of a place. |
| org | The name is the name of an organisation. |

Names can also refer to predefined people or places with the ref attribute.

# Person name

<persName ref="#Person_A2E">Aragorn</persName>

This element should be used only for a proper noun or proper-noun phrase referring to a person, possibly including one or more of the person's forenames, surnames, honorifics, added names, etc. persName is equivalent to <name type="person"></name>. A further level of description can be achieved with forename, surname and other elements; see marking up people, as the same tags are used there. The ref attribute should be used to link the person to the listPerson in the teiHeader.
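As a hedged illustration of this finer-grained markup, the sketch below splits a name into its parts. The ID #Person_Bern is hypothetical and would have to exist in listPerson; forename and addName are standard TEI elements, but which ones you need depends on what the original text actually says.

```xml
<!-- "#Person_Bern" is a hypothetical ID that must be defined in listPerson -->
<persName ref="#Person_Bern">
  <forename>Bernardus</forename>
  <addName type="epithet">Claraevallensis</addName>
</persName>
```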

# Place name

<placeName ref="#UniFr">Université de Fribourg</placeName>

It should contain an absolute or relative place name. The placeName element is equivalent to <name type="place"></name>. A further level of description can be achieved in a similar fashion as with persons, see describing places. The ref attribute should be used to link the place to the listPlace in the teiHeader.

# Organization name

<orgName ref="#UniFr">Université de Fribourg</orgName>

It should contain an organization name. The orgName element is equivalent to <name type="org"></name>. A further level of description can be achieved in a similar fashion as with persons and places, see list of organizations. The ref attribute should be used to link the organization to the listOrg in the teiHeader.

Do not duplicate!

As all three aforementioned elements can be described with additional elements in a similar fashion as the items in the lists (listPerson, listPlace, listOrg), do not be tempted to duplicate the information. These additional elements should only be used to achieve a higher level of detail in the markup if the information is present in the original text.

# Dates and measures

# Dates

<date when="1230-12-31">31st December 1230</date>

Dates in the text should be marked with the date element, with the when attribute giving the date in a standard format, e.g. yyyy-mm-dd. When the exact date is unknown and only the year, or the year and month, is known, the unknown part can be omitted. Other attributes can also be used to describe the date:

| Attribute | Description |
| --- | --- |
| when | supplies the value of the date or time in a standard form, e.g. yyyy-mm-dd. |
| notBefore | specifies the earliest possible date for the event in standard form, e.g. yyyy-mm-dd. |
| notAfter | specifies the latest possible date for the event in standard form, e.g. yyyy-mm-dd. |
| from | indicates the starting point of the period in standard form, e.g. yyyy-mm-dd. |
| to | indicates the ending point of the period in standard form, e.g. yyyy-mm-dd. |

TIP

When from and to are used, when can be omitted; the same applies to notBefore and notAfter. It is also expected that from is paired with to, and notBefore with notAfter.
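The attribute combinations described above might look as follows. The dates themselves are invented for illustration; only the attribute usage matters.

```xml
<!-- Only year and month are known: the day is simply omitted -->
<date when="1230-12">December 1230</date>

<!-- An event that took place at some point between two dates -->
<date notBefore="1230-01-01" notAfter="1235-12-31">between 1230 and 1235</date>

<!-- A period with a known start and end -->
<date from="1230-12-24" to="1230-12-31">from 24th to 31st December 1230</date>
```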

# Measures

<measure type="currency" unit="gold">12 gold</measure>

Measures should be used whenever a word or phrase referring to some quantity of an object or commodity, usually comprising a number, a unit, and a commodity name, is found in the text. The type attribute should be used to further describe the type of the measure. The unit can also be given in the unit attribute.

| Type | Description |
| --- | --- |
| currency | indicates that the measure is a valid currency |
| volume | indicates that the measure is a volume of some object |
| height | the measure describes the height of something |
| width | the measure describes the width of something |
| weight | the measure describes the weight of something or someone |
| depth | specifies the depth of something |
| area | describes the size of an area |
| time | the measure is a time |
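The sketch below shows how the number, the unit and the commodity could each be recorded explicitly. It assumes the attribute names of the TEI P5 att.measurement class (quantity, unit, commodity); the second measure and its wording are invented for illustration.

```xml
<!-- quantity holds the number, unit the unit of measurement -->
<measure type="currency" quantity="12" unit="gold">12 gold</measure>

<!-- commodity names what is being measured (hypothetical example) -->
<measure type="weight" quantity="3" unit="lb" commodity="wax">three pounds of wax</measure>
```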

# Citation

During the encoding process you will find passages of text that are taken from other sources. It is important to identify them and categorize them.

# Cited quotation

<cit type="ascribed literal">
  <bibl>
    <!-- Author's name -->
  </bibl>
  <quote>
    <!-- the actual citation -->
  </quote>
  <ref cRef="Gn 1:1">
    <!-- link to the Biblical source and optional description -->
  </ref>
</cit>

Contains a quotation from some other document, together with a bibliographic reference to its source. In a dictionary it may contain an example text with at least one occurrence of the word form, used in the sense being described, or a translation of the headword, or an example.

| Type | Description |
| --- | --- |
| ascribed | If the citation is ascribed to someone |
| bible | Should be used only for biblical citations |
| example | If the citation serves as an example |
| literal | Citation is literal |
| paraphrase | The text was paraphrased |

TIP

Types can be combined, as can be seen in the example above. The individual types must be separated by a space.

# Quote

<quote>In principio creavit Deus celum et terram.</quote>

The only mandatory element in the citation part. It must contain a phrase or passage attributed by the narrator or author to some agency external to the text.

Important

It may happen that a source is referred to without actually being quoted. Since this is not an actual quotation, the quote element should be avoided, but the ref and bibl elements can still be used.

See the example:

  Augustine says he stole pears in a garden when he was a young man
  <ref target="#AugConf">
    <bibl type="source">Aug., <title>Confessiones</title>, II, 4</bibl>
  </ref>

Note that quote and cit elements were omitted and ref was kept.

# Reference

<ref target="#BernEpist"></ref>
<!-- Or when referring to the Bible -->
<ref cRef="Gn 1:1"></ref>

Defines a reference to another location, possibly modified by additional text or comment. It can be used as a self-closing element <ref /> if no further description is provided. As can be seen in the example above, there are two different uses of the tag. The cRef attribute is mutually exclusive with target and should be used only when referring to the Bible. In all other cases the target attribute should be used.

# Biblical citations

Unde in Genesim: <cit type="bible">
<quote>In principio creavit Deus celum et terram.</quote>
<ref cRef="Gn 1:1" decls="#biblicalCitations">This is a literal quotation of the first verse of the Bible</ref>
</cit>

As can be seen, the whole citation is inside the cit element with the type attribute set to bible. The quotation itself is inside the quote element, followed by a reference linking to Genesis with a further description.

TIP

If biblical citations with descriptions are used, the following snippet should also be included in <encodingDesc> inside the <teiHeader> section, see tei header.

<refsDecl xml:id="biblicalCitations">
  <cRefPattern matchPattern="(.+) (.+):(.+)" replacementPattern="http://vulsearch.sourceforge.net/html/$1.html#x$2_$3">
    <p>This pointer pattern extracts and references the
    <q>book,</q> <q>chapter,</q> and <q>verse</q> parts of a biblical
    reference pointing to a single verse, like “Gn 1:1”, and
    reconstructs a link to an online version of the biblical
    text.</p>
  </cRefPattern>
</refsDecl>

# Other sources

Bernardus: <cit type="ascribed">
<quote>Auferatur malus ne generet malos. Non potest arbor mala
fructus nisi malos facere.</quote>
<ref target="#bernEpist"><bibl><author>Bern.</author>,
<title>Epist.</title>, 102 (VIII, 257-8)</bibl></ref>
</cit>

For other sources the example above should be used. Note that the ref element contains further, more specific elements such as bibl, author and title. These elements are also used for the bibliography, see bibliography. The target attribute refers to the unique ID used in the bibliographic section of our document.

# Bibliographic citation

<bibl></bibl>

As shown in the example above, this element contains a loosely structured bibliographic citation with sub-components such as author, title and so on; see bibliography for more information.

# Glosses and segments

# Glosses

<gloss xml:lang="czo">vyek</gloss>

Glosses should be marked with the gloss element, with the language given in the xml:lang attribute if it differs from the language of the original text.

Glosses and additions

Consider wrapping glosses in an add element if they are not part of the main text, because only the add element can describe the placement of the gloss.

Omnis etas <add place="above"><gloss xml:lang="czo">vyek</gloss></add>

# Segments

<seg xml:id="seg1">mensuras</seg>
<!-- later when described -->
<p corresp="#seg1">
  Attende hic discretissimum et notabile...
</p>

The seg element may be used at the encoder's discretion to mark any segments of the text of interest for processing. One use of the element is to mark text features for which no appropriate markup is otherwise defined. Another use is to provide an identifier for some segment which is to be pointed at by some other element, i.e. to provide a target, or a part of a target, for a ptr or other similar element.

The seg element should always carry an xml:id attribute, which identifies the segment and can later be used to link the segment with its description. The description itself should be placed in a different div element, and each description should have its own paragraph p. These paragraphs are linked with the segments via the corresp attribute, which contains the ID of the segment.

# Different readings

Many digital editions are based on a number of sources of the same text. It is therefore often necessary to differentiate between the various readings and keep track of them. The elements that can be used for that purpose are described below.

# Apparatus entry

<app>
  <lem><!-- lemma --></lem>
  <rdg><!-- reading --></rdg>
</app>

Contains one entry in a critical apparatus, with an optional lemma and usually one or more readings or notes on the relevant passage.

# Lemma

<lem wit="#A17">
  Equo baio sedere: expeditionem significat.
</lem>

Contains the lemma, or base text, of a textual variation. This element is optional, as the apparatus entry can contain only readings without a suggested lemma. The wit attribute should be used to identify the source in the list of witnesses listWit, see witness description.

# Readings

<rdg wit="#C72 #D109">
  Equos rufos vel baios habere: bonum nuntium significat.
</rdg>

Contains a single reading within a textual variation. Like the lem element, the rdg element should carry the wit attribute pointing to the list of witnesses.

TIP

Note that the IDs of different sources with the same reading are separated with a space.

Readings can contain additions, corrections and deletions; all these topics are further described on the Text corrections page.
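Putting the pieces together, a complete apparatus entry and the witness list it points to might look like the sketch below. The lemma and the reading are taken from the examples above; the listWit part is an assumption about how the witnesses A17, C72 and D109 would be declared in the header, with comments in place of real descriptions.

```xml
<!-- In the teiHeader: each witness gets an xml:id that wit can point to -->
<listWit>
  <witness xml:id="A17"><!-- description of witness A17 --></witness>
  <witness xml:id="C72"><!-- description of witness C72 --></witness>
  <witness xml:id="D109"><!-- description of witness D109 --></witness>
</listWit>

<!-- In the text: the lemma from A17 and the variant shared by C72 and D109 -->
<app>
  <lem wit="#A17">Equo baio sedere: expeditionem significat.</lem>
  <rdg wit="#C72 #D109">Equos rufos vel baios habere: bonum nuntium significat.</rdg>
</app>
```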

# Lacuna

<app>
  <lem wit="#El #Hg">Auctoritee</lem>
  <rdg wit="#La #Ra2 #X">
    <lacunaEnd wit="#X"/>auctorite
  </rdg>
</app>

If a witness is incomplete (whether a single fragment, a series of fragments, or a relatively complete text with one or more lacunae), it is usually desirable to record explicitly where its preserved portions begin and end. The following empty tags, which may occur within any lem or rdg element, indicate the beginning or end of a fragmentary witness or of a lacuna within a witness. In the example above the text of the exemplar X starts with auctorite, thus the lacuna ends there.

# Lacuna start

<lacunaStart wit="#X" />

Indicates the beginning of a lacuna in the text of a mostly complete textual witness.

# Lacuna end

<lacunaEnd wit="#X" />

Indicates the end of a lacuna in a mostly complete textual witness.
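For symmetry with the lacunaEnd example above, the following sketch shows lacunaStart in context. The scenario is invented: it assumes that witness X breaks off immediately after the word auctorite.

```xml
<app>
  <lem wit="#El #Hg">Auctoritee</lem>
  <rdg wit="#La #Ra2 #X">
    <!-- Hypothetical: the text of witness X is lost from this point on -->
    auctorite<lacunaStart wit="#X" />
  </rdg>
</app>
```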