# Text structure
While the `<teiHeader>` element contains the metadata about the document, the `<text>` element contains the document itself.
The `<text>` element is mandatory and can contain the following three elements:
- the front matter `<front>` contains any prefatory matter (headers, abstracts, title page, prefaces, dedications, etc.) found at the beginning of the document, before the main body.
- the text body `<body>` contains the whole body of a single unitary text, excluding any front or back matter.
- the back matter `<back>` contains any appendixes, etc. following the main part of a text.
TIP
For front matter and back matter, I recommend consulting the TEI Guidelines; for medieval manuscripts, the `<body>` element is often the only element needed.
<TEI xmlns="http://www.tei-c.org/ns/1.0">
<!-- Tei Header, see TEI Header chapter -->
<text>
<front>
<!-- contains any prefatory matter -->
</front>
<body>
<!-- contains the textual body of the document -->
</body>
<back>
<!-- contains any appendixes -->
</back>
</text>
</TEI>
# Structural elements
There are two primary elements that should be used to structure and divide text.
# Divs
<div xml:id="div1" xml:lang="lat" hand="Hand_Unknown-1"></div>
`div` elements should be used to divide the text into larger chunks, such as chapters, or whatever units suit your text. They can also be freely nested.
The `xml:id` and `hand` attributes are optional. `xml:id` can later be used to link a text division with its translation; `hand` can be used to refer to a scribe defined in the `<handDesc>` section of the `<teiHeader>` element. The `xml:lang` attribute should always be present and must contain a valid language code, see the table.
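Nesting can be sketched as follows. This is a minimal illustration, assuming a text organized as one book with two chapters; the `type` and `n` values and the `xml:id` names are hypothetical and not prescribed by this guide.

```xml
<div xml:id="book1" type="book" n="1" xml:lang="lat">
  <head xml:lang="lat">Liber primus</head>
  <!-- each chapter is a nested div with its own xml:id -->
  <div xml:id="book1-ch1" type="chapter" n="1" xml:lang="lat">
    <p xml:id="p1" xml:lang="lat"><!-- text of the first chapter --></p>
  </div>
  <div xml:id="book1-ch2" type="chapter" n="2" xml:lang="lat">
    <p xml:id="p2" xml:lang="lat"><!-- text of the second chapter --></p>
  </div>
</div>
```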
# Headings
<head xml:lang="lat" hand="Hand_Unknown-1"></head>
The `head` element should be used for any type of heading, for example the title of a section, or the heading of a list, glossary, manuscript description, etc. The `xml:lang` and `hand` attributes can be used to further describe the properties of the heading. The language should be a valid ISO 639 language code, see this list, and `hand` should point to an existing hand ID in the `<handDesc>` in the `<teiHeader>`.
# Paragraphs
<p xml:id="p13" xml:lang="lat" hand="Hand_Unknown-1"></p>
The paragraph tag should be used to divide the text into paragraphs. Unlike for divs, the `xml:id` attribute is mandatory for paragraphs if you intend to provide a translation of the text. The `xml:id` attribute is used to identify paragraphs across different language variants. The `xml:lang` attribute should contain a valid language code, see the table. The `hand` attribute is again optional and can be used to point to a scribe defined in the `<teiHeader>` section.
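One way a translation could point back at its source paragraph is sketched below. The use of `corresp` for this link is an assumption made for illustration (this guide only states that the `xml:id` identifies the paragraph across language variants), and the English text and the `eng` code are examples only.

```xml
<!-- original paragraph in the edited text -->
<p xml:id="p13" xml:lang="lat">In principio creavit Deus celum et terram.</p>

<!-- translated paragraph, stored with the translation;
     corresp is an assumed linking mechanism, not mandated by this guide -->
<p xml:lang="eng" corresp="#p13">In the beginning God created heaven and earth.</p>
```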
# Footnotes
<p>Paragraph tag should be used to divide text into paragraphs.<note type="footnote">For further information see TEIP5 Guidelines</note>Unlike the divs the attribute... </p>
For footnotes, the `note` element should be used. The `type` attribute should be set to `footnote`, and the content of the `note` element should be the footnote text.
# Page beginnings
<pb source="#E1S" n="f23v" facs="image.jpg" />
A `pb` element should appear at the start of the page which it identifies. The global `n` attribute indicates the number or other value associated with this page. This will normally be the page number or signature printed on it, since the physical sequence number is implicit in the presence of the `pb` element itself. The `source` attribute should contain the ID of the edition or source to which the page beginning refers. The `facs` attribute can refer to the linked image file containing the folio.
# Line beginnings
<lb source="#E1S" n="1" break="no" />
By convention, `lb` elements should appear at the point in the text where a new line starts. The `n` attribute, if used, indicates the number or other value associated with the text between this point and the next `lb` element, typically the sequence number of the line within the page, or another appropriate unit. This element is intended to be used for marking actual line breaks on a manuscript or printed page, at the point where they occur; it should not be used to tag structural units such as lines of verse (for which the `l` element is available) except in circumstances where structural units cannot otherwise be marked. The `break` attribute indicates whether or not the element concerned is considered to mark the end of an orthographic token in the same way as whitespace.
break | description |
---|---|
yes | the element bearing this attribute is considered to mark the end of any adjacent orthographic token irrespective of the presence of any adjacent whitespace |
no | the element bearing this attribute is considered not to mark the end of any adjacent orthographic token irrespective of the presence of any adjacent whitespace |
maybe | the encoding does not take any position on this issue. |
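A short sketch of how page and line beginnings might combine in running text, assuming a hypothetical folio on which the word celum is split across two lines. Because the second `lb` falls inside a word, it carries `break="no"`.

```xml
<p xml:id="p1" xml:lang="lat">
  <!-- the page starts here, followed by its first line -->
  <pb source="#E1S" n="f23v" facs="image.jpg"/>
  <lb source="#E1S" n="1"/>In principio creavit Deus ce<lb source="#E1S" n="2" break="no"/>lum et terram.
</p>
```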
# Names
Almost every text contains names, whether they are the names of people, places or organizations. These names should be encoded and for that purpose a variety of elements can be used.
# Universal name element
<name type="place" ref="#Place_Prague">Prague</name>
Contains a proper noun or noun phrase.
Types | Description |
---|---|
person | The name is a person's name. |
place | The name is the name of a place. |
org | The name is the name of an organisation. |
Names can also refer to predefined people or places with the `ref` attribute.
# Person name
<persName ref="#Person_A2E">Aragorn</persName>
This element should be used only for a proper noun or proper-noun phrase referring to a person, possibly including one or more of the person's forenames, surnames, honorifics, added names, etc. `persName` is equivalent to `<name type="person"></name>`. An additional level of description can be achieved by using `forename`, `surname` and other elements, see marking up people, as these tags are the same. The `ref` attribute should be used to link the person to the `listPerson` in the `teiHeader`.
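The link between a name in the text and its entry in the header can be sketched as follows. The added name and the contents of the `person` entry are hypothetical; only the matching of `ref` to `xml:id` is the point of the example.

```xml
<!-- in the text: the name, marked up in more detail -->
<persName ref="#Person_A2E">
  <forename>Aragorn</forename> <addName>Elessar</addName>
</persName>

<!-- in the teiHeader: the entry that ref points to -->
<listPerson>
  <person xml:id="Person_A2E">
    <persName>Aragorn</persName>
  </person>
</listPerson>
```

`placeName` with `listPlace` and `orgName` with `listOrg` follow the same pattern.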
# Place name
<placeName ref="#UniFr">Université de Fribourg</placeName>
It should contain an absolute or relative place name. The `placeName` element is equivalent to `<name type="place"></name>`. An additional level of description can be achieved in a similar fashion as with the person, see describing places. The `ref` attribute should be used to link the place to the `listPlace` in the `teiHeader`.
# Organization name
<orgName ref="#UniFr">Université de Fribourg</orgName>
It should contain an organization name. The `orgName` element is equivalent to `<name type="org"></name>`. An additional level of description can be achieved in a similar fashion as with the person and place, see list of organizations. The `ref` attribute should be used to link the organization to the `listOrg` in the `teiHeader`.
Do not duplicate!
As all three aforementioned elements can be described with additional elements in a similar fashion to the items in the lists (`listPerson`, `listPlace`, `listOrg`), do not be tempted to duplicate the information. These additional elements should only be used to achieve a higher level of detail in the markup when the information is present in the original text.
# Dates and measures
# Dates
<date when="1230-12-31">31st December 1230</date>
Dates in the text should be marked with the `date` element, with the `when` attribute giving the date in a standard format, e.g. yyyy-mm-dd. When the exact date is unknown and only the year, or the year and month, is known, the unknown part can be omitted. Other attributes can also be used to describe the date:
Attribute | Description |
---|---|
when | supplies the value of the date or time in a standard form, e.g. yyyy-mm-dd. |
notBefore | specifies the earliest possible date for the event in standard form, e.g. yyyy-mm-dd. |
notAfter | specifies the latest possible date for the event in standard form, e.g. yyyy-mm-dd. |
from | indicates the starting point of the period in standard form, e.g. yyyy-mm-dd. |
to | indicates the ending point of the period in standard form, e.g. yyyy-mm-dd. |
TIP
When `from` and `to` are used, `when` can be omitted; the same applies to `notBefore` and `notAfter`. It is also expected that when `notBefore` or `from` is used, the corresponding `notAfter` or `to` will be used as well.
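The attribute combinations described above might look like this in practice. The dates and wording are invented for illustration.

```xml
<!-- only the year and month are known: the day is omitted -->
<date when="1230-12">December 1230</date>

<!-- an uncertain date bounded by the earliest and latest possible years -->
<date notBefore="1230" notAfter="1235">in the early 1230s</date>

<!-- a period with a known start and end -->
<date from="1230-01-01" to="1230-12-31">throughout the year 1230</date>
```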
# Measures
<measure type="currency" unit="gold">12 gold</measure>
Measures should be used whenever the text contains a word or phrase referring to some quantity of an object or commodity, usually comprising a number, a unit, and a commodity name. The `type` attribute should be used to further describe the type of the measure. The unit can also be given in the `unit` attribute.
Type | Description |
---|---|
currency | the measure is an amount of currency |
volume | the measure is the volume of something |
height | the measure describes the height of something |
width | the measure describes the width of something |
weight | the measure describes the weight of something or someone |
depth | the measure describes the depth of something |
area | the measure describes the size of an area |
time | the measure is a length of time |
# Citation
During the encoding process you will find passages of text that are taken from other sources. It is important to identify them and categorize them.
# Cited quotation
<cit type="ascribed literal">
<bibl>
<!-- Author's name -->
</bibl>
<quote>
<!-- the actual citation -->
</quote>
<ref cRef="Gn 1:1">
<!-- link to the Biblical source and optional description -->
</ref>
</cit>
Contains a quotation from some other document, together with a bibliographic reference to its source. In a dictionary it may contain an example text with at least one occurrence of the word form, used in the sense being described, or a translation of the headword, or an example.
Type | Description |
---|---|
ascribed | If the citation is ascribed to someone |
bible | Should be used only for biblical citations |
example | If the citation serves as an example |
literal | Citation is literal |
paraphrase | The text was paraphrased |
TIP
Types can be combined, as can be seen in the example above. The types must be separated by a space.
# Quote
<quote>In principio creavit Deus celum et terram.</quote>
The only mandatory element in the citation part. It must contain a phrase or passage attributed by the narrator or author to some agency external to the text.
Important
A reference may be made to a source without actually quoting it. Since this is not an actual quotation, the `quote` element should be avoided, but the `ref` and `bibl` elements can still be used.
See the example
Augustine says he stole pears in a garden when he was a young man
<ref target="#AugConf">
<bibl type="source">Aug., <title>Confessiones</title>, II, 4</bibl>
</ref>
Note that the `quote` and `cit` elements were omitted and `ref` was kept.
# Reference
<ref target="#BernEpist"></ref>
<!-- Or when referring to the Bible -->
<ref cRef="Gn 1:1"></ref>
Defines a reference to another location, possibly modified by additional text or comment. It can be used as a self-closing element `<ref />` if no further description is provided. As can be seen in the example above, there are two different uses of the tag. The `cRef` attribute is mutually exclusive with `target` and should be used only when referring to the Bible. In all other cases the `target` attribute should be used.
# Biblical citations
Unde in Genesim: <cit type="bible">
<quote>In principio creavit Deus celum et terram.</quote>
<ref cRef="Gn 1:1" decls="#biblicalCitations">This is a literal quotation of the first verse of the Bible</ref>
</cit>
As can be seen, the whole citation is inside the `cit` element with the `type` attribute set to `bible`. The quotation itself is inside the `quote` element and is followed by a reference linking to Genesis with a further description.
TIP
If biblical citations with descriptions are used, the following snippet should also be included in `<encodingDesc>` inside the `<teiHeader>` section, see TEI header.
<refsDecl xml:id="biblicalCitations">
<cRefPattern matchPattern="(.+) (.+):(.+)" replacementPattern="http://vulsearch.sourceforge.net/html/$1.html#x$2_$3">
<p>This pointer pattern extracts and references the
<q>book,</q> <q>chapter,</q> and <q>verse</q> parts of a biblical
reference pointing to a single verse, like “Gn 1:1”, and
reconstructs a link to an online version of the biblical
text.</p>
</cRefPattern>
</refsDecl>
# Other sources
Bernardus: <cit type="ascribed">
<quote>Auferatur malus ne generet malos. Non potest arbor mala
fructus nisi malos facere.</quote>
<ref target="#bernEpist"><bibl><author>Bern.</author>,
<title>Epist.</title>, 102 (VIII, 257-8)</bibl></ref>
</cit>
For other sources, the example above should be used. Note that the `ref` element contains other, more specific elements such as `title`, `author` and `bibl`. These elements are also used for the bibliography, see the bibliography section. The `target` attribute refers to the unique ID used in the bibliographic section of the document.
# Bibliographic citation
<bibl></bibl>
As described in the example above, this element is used to contain a loosely-structured bibliographic citation with sub-components like `author`, `title` and so on, see bibliography for more information.
# Glosses and segments
# Glosses
<gloss xml:lang="czo">vyek</gloss>
Glosses should be marked with the `gloss` element, with the language given in the `xml:lang` attribute if it differs from that of the original text.
Glosses and additions
Consider wrapping glosses within an addition if they are not part of the main text, because only the `add` element can describe the placement of the gloss.
Omnis etas <add place="above"><gloss xml:lang="czo">vyek</gloss></add>
# Segments
<seg xml:id="seg1">mensuras</seg>
<!-- later when described -->
<p corresp="#seg1">
Attende hic discretissimum et notabile...
</p>
The `seg` element may be used at the encoder's discretion to mark any segments of the text of interest for processing. One use of the element is to mark text features for which no appropriate markup is otherwise defined. Another use is to provide an identifier for some segment which is to be pointed at by some other element, i.e. to provide a target, or a part of a target, for a `ptr` or other similar element. The `seg` element should always contain an `xml:id` attribute, which identifies the segment and can later be used to link the segment with its description. The description itself should be in a different `div` element, and each description should have its own paragraph `p`; these paragraphs are linked with segments via the `corresp` attribute, which contains the ID of the segment.
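The separation between the marked segment and its description can be sketched as follows. The `type="commentary"` value and the `xml:id` names on the divs and the first paragraph are hypothetical; the guide only requires that the descriptions live in a different `div`.

```xml
<body>
  <!-- the edited text, with the segment of interest marked -->
  <div xml:id="div1" xml:lang="lat">
    <p xml:id="p1" xml:lang="lat"><!-- preceding text --> <seg xml:id="seg1">mensuras</seg> <!-- following text --></p>
  </div>
  <!-- a separate div holding one paragraph per described segment -->
  <div xml:id="div2" type="commentary" xml:lang="lat">
    <p corresp="#seg1">Attende hic discretissimum et notabile...</p>
  </div>
</body>
```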
# Different readings
Many digital editions are based on a number of sources of the same text. It is therefore often necessary to differentiate between the various readings and keep track of them. The following elements can be used for that purpose:
# Apparatus entry
<app >
<lem><!-- lemma --></lem>
<rdg><!-- reading --></rdg>
</app>
Contains one entry in a critical apparatus, with an optional lemma and usually one or more readings or notes on the relevant passage.
# Lemma
<lem wit="#A17">
Equo baio sedere: expeditionem significat.
</lem>
Contains the lemma, or base text, of a textual variation. This element is optional, as the apparatus entry can contain only readings without a suggested lemma.
The `wit` attribute should be used to identify the source in the list of witnesses `listWit`, see witness description.
# Readings
<rdg wit="#C72 #D109">
Equos rufos vel baios habere: bonum nuntium significat.
</rdg>
Contains a single reading within a textual variation. Like the `lem` element, the `rdg` element should contain a `wit` attribute that points to the list of witnesses.
TIP
Note that the IDs of different sources with the same reading are separated with a space.
Readings can contain additions, corrections and deletions, all these topics are further described on the Text corrections page.
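Putting the pieces together, a complete apparatus entry inside a paragraph might look like the sketch below. It reuses the lemma and reading from the examples above; the paragraph `xml:id` is hypothetical.

```xml
<p xml:id="p21" xml:lang="lat">
  <app>
    <!-- base text, attested by witness A17 -->
    <lem wit="#A17">Equo baio sedere: expeditionem significat.</lem>
    <!-- variant shared by two witnesses, IDs separated by a space -->
    <rdg wit="#C72 #D109">Equos rufos vel baios habere: bonum nuntium significat.</rdg>
  </app>
</p>
```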
# Lacuna
<app>
<lem wit="#El #Hg">Auctoritee</lem>
<rdg wit="#La #Ra2 #X">
<lacunaEnd wit="#X"/>auctorite
</rdg>
</app>
If a witness is incomplete (whether a single fragment, a series of fragments, or a relatively complete text with one or more lacunae), it is usually desirable to record explicitly where its preserved portions begin and end. The following empty tags, which may occur within any lem or rdg element, indicate the beginning or end of a fragmentary witness or of a lacuna within a witness. In the example above the text of the exemplar X starts with auctorite, thus the lacuna ends there.
# Lacuna start
<lacunaStart wit="#X" />
Indicates the beginning of a lacuna in the text of a mostly complete textual witness.
# Lacuna end
<lacunaEnd wit="#X" />
Indicates the end of a lacuna in a mostly complete textual witness.
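For contrast with the `lacunaEnd` example above, here is a sketch of the opposite case. It assumes a hypothetical state of witness X in which the text is preserved up to and including this word and breaks off immediately afterwards.

```xml
<app>
  <lem wit="#El #Hg">Auctoritee</lem>
  <!-- X still has this word; its lacuna begins right after it -->
  <rdg wit="#X">auctorite<lacunaStart wit="#X"/></rdg>
</app>
```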