TEI Guidelines P5: 9 Dictionaries

9 Dictionaries

This chapter defines a module for encoding human-oriented monolingual and multilingual dictionaries which may also be useful for computational lexica intended for use by language-processing software. Dictionaries are most familiar in their printed form; however, increasing numbers of dictionaries also exist in electronic forms which are independent of any particular printed form, but from which various displays can be produced.

Both typographically and structurally, dictionaries are extremely complex. In addition, dictionaries are of interest to many communities with different and sometimes conflicting goals. As a result, many general problems of text encoding are particularly pronounced here, and more compromises and alternatives within the encoding scheme may be required in future.²⁹ Two problems are particularly prominent.

First, because the structure of dictionary entries varies widely both among and within dictionaries, the simplest way for an encoding scheme to accommodate the entire range of structures actually encountered is to allow virtually any element to appear virtually anywhere in a dictionary entry. It is clear, however, that strong and consistent structural principles do govern the vast majority of conventional dictionaries, as well as many or most entries even in more ‘exotic’ dictionaries; encoding guidelines should include these structural principles. We therefore define two distinct elements for dictionary entries, one (entry) which captures the regularities of many conventional dictionary entries, and a second (entryFree) which uses the same elements, but allows them to combine much more freely. It is however recommended that entry be used in preference to entryFree wherever possible. These elements and their contents are described in sections 9.2 The Structure of Dictionary Entries, 9.6 Unstructured Entries, and 9.4 Headword and Pronunciation References.

Second, since so much of the information in printed dictionaries is implicit or highly compressed, their encoding requires clear thought about whether it is to capture the precise typographic form of the source text or the underlying structure of the information it presents. Since both of these views of the dictionary may be of interest, it proves necessary to develop methods of recording both, and of recording the interrelationship between them as well. Users interested mainly in the printed format of the dictionary will require an encoding to be faithful to an original printed version. However, other users will be interested primarily in capturing the lexical information in a dictionary in a form suitable for further processing, which may demand the expansion or rearrangement of the information contained in the printed form. Further, some users wish to encode both of these views of the data, and retain the links between related elements of the two encodings. Problems of recording these two different views of dictionary data are discussed in section 9.5 Typographic and Lexical Information in Dictionary Data, together with mechanisms for retaining both views when this is desired.

To deal with this complexity, and in particular to account for the wide variety of linguistic contexts within which a dictionary may be designed, it may be necessary to customize the schema, either by imposing further restrictions or by providing alternative content models for the elements defined in this chapter. Section 9.3.2 Grammatical Information illustrates this with the provision of a closed set of values for grammatical descriptors.

This chapter contains a large number of examples taken from existing dictionaries; in each case, the original source is identified. In presenting such examples, we have tried to retain the original typographic appearance of the example as well as to present a suggested encoding for it. Where this has not been possible (for example in the display of pronunciation) we have adopted the transliteration found in the electronic edition of the Oxford Advanced Learner's Dictionary. Also, the middle dot in quoted entries is rendered with a full stop, while within the sample transcriptions hyphenation and syllabification points are indicated by a vertical bar |, regardless of their appearance in the source text.

9.1 Dictionary Body and Overall Structure

Overall, dictionaries have the same structure of front matter, body, and back matter familiar from other texts. In addition, this module defines entry, entryFree, and superEntry as component-level elements which can occur directly within a text division or the text body.

The following tags can therefore be used to mark the gross structure of a printed dictionary; the dictionary-specific tags are discussed further in the following section.

text contains a single text of any kind, whether unitary or composite, for example a poem or drama, a collection of essays, a novel, a dictionary, or a corpus sample.
front (front matter) contains any prefatory matter (headers, title page, prefaces, dedications, etc.) found at the start of a document, before the main body.
body (text body) contains the whole body of a single unitary text, excluding any front or back matter.
back (back matter) contains any appendixes, etc. following the main part of a text.
div (text division) contains a subdivision of the front, body, or back of a text.
entry contains a reasonably well-structured dictionary entry.
entryFree (unstructured entry) contains a dictionary entry which does not necessarily conform to the constraints imposed by the entry element.
superEntry groups successive entries for a set of homographs.

As members of the class att.entryLike, entry and entryFree share the following attributes:

att.entryLike groups the different styles of dictionary entries.
type indicates type of entry, in dictionaries with multiple types.
sortKey contains a (sortable) character sequence reflecting the entry's alphabetical position in the printed dictionary.

The front and back matter of a dictionary may well contain specialized material such as lists of common and proper nouns, grammatical tables, gazetteers, a ‘guide to the use of the dictionary’, etc. These should be tagged using elements defined elsewhere in these Guidelines, chiefly in the core module (chapter 3), together with the specialized dictionary elements defined in this chapter.

The body element consists of a set of entries, optionally grouped into one or several div elements. These text divisions might correspond, for example, to sections for different letters of the alphabet, or to sections for different languages in bilingual dictionaries. In print dictionaries, entries are typically typographically distinct entities, each headed by some morphological form of the lexical item described (the headword), and sorted in alphabetical order or (especially for non-alphabetic scripts) in some other conventional sequence. Dictionary entries should be encoded as distinct successive items, each marked as an entry or entryFree element. The type attribute may be used to distinguish different types of entries, for example main entries, related entries, run-on entries, or entries for cross-references.

Some dictionaries provide distinct entries for homographs, on the basis of etymology, part-of-speech, or both, and typically provide a numeric superscript on the headword identifying the homograph number. In these cases each homograph should be encoded as a separate entry; the superEntry element may optionally be used to group such successive homograph entries. In addition to a series of entry elements, the superEntry may contain a preliminary form group (see section 9.3.1 Information on Written and Spoken Forms) when information about hyphenation, pronunciation, etc., is given only once for two or more homograph entries. If the homograph number is to be recorded, the global attribute n may be used for this purpose. In some dictionaries, homographs are treated in distinct parts of the same entry; in these cases, they may be separated by use of the hom element, for which see section 9.2.1 Hierarchical Levels.

A sort key, given in the sortKey attribute, is often required for superentries and entries, especially in cases where the order of entries does not follow the local character-set collating sequence (as, for example, when an entry for ‘3D’ appears at the place where ‘three-D’ would appear).
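The sorting behaviour described here can be sketched in Python with the standard library's xml.etree.ElementTree. This is a minimal illustration under stated assumptions, not part of the Guidelines: the fragment is invented, the helper names are hypothetical, and entries without a sortKey (the att.entryLike attribute listed above) are assumed to sort by their first orth.

```python
import xml.etree.ElementTree as ET

# Hypothetical body: the "3D" entry carries sortKey="three-D" so that it files
# where "three-D" would; entries without sortKey fall back to their first orth.
BODY = """<body>
  <entry sortKey="three-D"><form><orth>3D</orth></form></entry>
  <entry><form><orth>thread</orth></form></entry>
  <entry><form><orth>threat</orth></form></entry>
  <entry><form><orth>thrift</orth></form></entry>
</body>"""


def sort_value(entry: ET.Element) -> str:
    """Return the sortKey if present, else the first orth; lower-cased for comparison."""
    key = entry.get("sortKey")
    if key is None:
        key = entry.findtext("form/orth", default="")
    return key.lower()


def headwords_in_order(xml_text: str) -> list[str]:
    """Parse a <body> and list its entries' headwords in sort-key order."""
    body = ET.fromstring(xml_text)
    entries = sorted(body.findall("entry"), key=sort_value)
    return [entry.findtext("form/orth") for entry in entries]


print(headwords_in_order(BODY))  # ['thread', 'threat', '3D', 'thrift']
```

Python compares strings by code point rather than by a locale-aware collation, so a real dictionary project would probably need a proper collation routine for the fallback; the point here is only that an explicit sort key overrides the headword's own spelling.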

The body of a bilingual dictionary with two parts will thus have an overall structure resembling the following:
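A minimal sketch of such a two-part body, built with Python's xml.etree.ElementTree, is shown below. The div labels, the type="part" value, and the function name are hypothetical choices for illustration; the Guidelines do not prescribe them.

```python
import xml.etree.ElementTree as ET


def bilingual_body(parts: dict[str, list[str]]) -> ET.Element:
    """Build a <body> with one <div> per part, each holding skeletal <entry> elements.

    The div labels and the type="part" value are illustrative, not prescribed.
    """
    body = ET.Element("body")
    for label, headwords in parts.items():
        div = ET.SubElement(body, "div", {"type": "part", "n": label})
        for headword in headwords:
            entry = ET.SubElement(div, "entry")
            form = ET.SubElement(entry, "form")
            ET.SubElement(form, "orth").text = headword
    return body


body = bilingual_body({
    "French-English": ["rémoulade", "soucoupe"],
    "English-French": ["dresser"],
})
print(ET.tostring(body, encoding="unicode"))
```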

A dictionary with no internal divisions might have a structure like the following; a superEntry is shown grouping two homograph entries.
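The following Python sketch (standard-library ElementTree only) parses a hypothetical skeleton of this kind and lists its headwords in document order, using the n attribute to label the homographs. The headword ‘bank’ and the helper name are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical skeleton: two homographs of "bank" grouped in a superEntry,
# followed by an ordinary entry. The n attribute records the homograph number.
BODY = """<body>
  <superEntry>
    <entry n="1"><form><orth>bank</orth></form></entry>
    <entry n="2"><form><orth>bank</orth></form></entry>
  </superEntry>
  <entry><form><orth>banker</orth></form></entry>
</body>"""


def headword_labels(body: ET.Element) -> list[str]:
    """List headwords in document order, appending the homograph number if any."""
    labels = []
    for child in body:
        # A superEntry contributes each of its entries; a plain entry contributes itself.
        entries = child.findall("entry") if child.tag == "superEntry" else [child]
        for entry in entries:
            label = entry.findtext("form/orth", default="")
            number = entry.get("n")
            if number:
                label += number
            labels.append(label)
    return labels


print(headword_labels(ET.fromstring(BODY)))  # ['bank1', 'bank2', 'banker']
```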

9.2 The Structure of Dictionary Entries

A simple dictionary entry may contain information about the form of the word treated, its grammatical characterization, its definition, synonyms, or translation equivalents, its etymology, cross-references to other entries, usage information, and examples. These we refer to as the constituent parts or constituents of the entry; some dictionary constituents possess no internal structure, while others are most naturally viewed as groups of smaller elements, which may be marked in their own right. In some styles of markup, tags will be applied only to the low-level items, leaving the constituent groups which contain them untagged. We distinguish the class of top-level constituents of dictionary entries, which can occur directly within entries, from the class of phrase-level constituents, which can normally occur only within top-level constituents. The top-level constituents of dictionary entries are described in section 9.2.2 Groups and Constituents, and documented more fully, together with their phrase-level sub-constituents, in section 9.3 Top-level Constituents of Entries.

In addition, however, dictionary entries often have a complex hierarchical structure. For example, an entry may consist of two or more sub-parts, each corresponding to information for a different part-of-speech homograph of the headword. The entry (or part-of-speech homographs, if the entry is split this way) may also consist of senses, each of which may in turn be composed of two or more sub-senses, etc. Each sub-part, homograph entry, sense, or sub-sense we call a level; at any level in an entry, any or all of the constituent parts of dictionary entries may appear. The hierarchical levels of dictionary entries are documented in section 9.2.1 Hierarchical Levels.

9.2.1 Hierarchical Levels

The outermost structural level of an entry is marked with the elements entry or entryFree. The hom element marks the subdivision of entries into homographs differing in their part-of-speech. The sense element marks the subdivision of entries and part-of-speech homographs into senses; this element nests recursively in order to provide for a hierarchy of sub-senses of any depth. All of these levels may each contain any of the constituent parts of an entry. A special case of hierarchical structure is represented by the re (related entry) element, which is discussed in section 9.3.6 Related Entries. Finally, the element dictScrap may be used at any point in the hierarchy to delimit parts of the dictionary entry which are structurally anomalous, as further discussed in section 9.6 Unstructured Entries.

entry contains a reasonably well-structured dictionary entry.
entryFree (unstructured entry) contains a dictionary entry which does not necessarily conform to the constraints imposed by the entry element.
hom (homograph) groups information relating to one homograph within an entry.
sense groups together all information relating to one word sense in a dictionary entry, for example definitions, examples, and translation equivalents.
level gives the nesting depth of this sense.
dictScrap (dictionary scrap) encloses a part of a dictionary entry in which other phrase-level dictionary elements are freely combined.

For example, an entry with two senses will have the following structure:

An entry with two homographs, the first with two senses and the second with three (one of which has two sub-senses), may have a structure like this:
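As a rough illustration, the Python sketch below writes out one possible skeleton for this structure (with a placeholder headword) and walks it, reporting each sense's path of n values and how deeply it is nested among senses, the kind of figure the level attribute is described as giving. The encoding and the helper name are assumptions, not a reproduction of the Guidelines' own example.

```python
import xml.etree.ElementTree as ET

# Skeleton of the entry just described; "example" is a placeholder headword.
ENTRY = """<entry>
  <form><orth>example</orth></form>
  <hom n="1">
    <sense n="1"/>
    <sense n="2"/>
  </hom>
  <hom n="2">
    <sense n="1"/>
    <sense n="2">
      <sense n="a"/>
      <sense n="b"/>
    </sense>
    <sense n="3"/>
  </hom>
</entry>"""


def sense_depths(elem: ET.Element, depth: int = 0, path: tuple = ()) -> list[tuple[str, int]]:
    """Return (path of n values, sense nesting depth) for every sense below elem."""
    found = []
    for child in elem:
        if child.tag not in ("hom", "sense"):
            continue  # form, gramGrp, def, etc. are constituents, not levels
        child_path = path + (child.get("n", "?"),)
        child_depth = depth + 1 if child.tag == "sense" else depth
        if child.tag == "sense":
            found.append(("/".join(child_path), child_depth))
        found.extend(sense_depths(child, child_depth, child_path))
    return found


for label, depth in sense_depths(ET.fromstring(ENTRY)):
    print(label, depth)
```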

In some dictionaries, homographs have separate entries; in such a case, as noted in section 9.1 Dictionary Body and Overall Structure, the two homographs may be treated as entries, optionally grouped in a superEntry:

The hierarchical levels of dictionary entries are declared as shown in the following schema fragment. As may be seen, the content model for entry specifies that entries do not nest, that homographs nest within entries, and that senses nest within entries, homographs, or senses, and may be nested to any depth to reflect the embedding of sub-senses. Any of the top-level constituents (def, usg, form, etc.) can appear at any level (i.e., within entries, homographs, or senses).
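The nesting constraints just described can be approximated by a small checker. The Python sketch below is a simplified restatement, not a substitute for validating against the TEI schema: it only flags an entry nested anywhere inside the entry being checked, hom outside entry, and sense outside entry, hom, or sense. The table and function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# A simplified restatement of the nesting rules described above (not the TEI
# schema itself): which parents each hierarchical level may have.
ALLOWED_PARENTS = {
    "hom": {"entry"},
    "sense": {"entry", "hom", "sense"},
}


def nesting_errors(entry: ET.Element) -> list[str]:
    """Report entry-inside-entry nesting and hom/sense elements under the wrong parent."""
    errors = []

    def walk(parent: ET.Element) -> None:
        for child in parent:
            if child.tag == "entry":
                errors.append(f"entry nested inside {parent.tag}")
            allowed = ALLOWED_PARENTS.get(child.tag)
            if allowed is not None and parent.tag not in allowed:
                errors.append(f"{child.tag} not allowed inside {parent.tag}")
            walk(child)

    walk(entry)
    return errors


good = ET.fromstring("<entry><hom><sense><sense><def/></sense></sense></hom></entry>")
bad = ET.fromstring("<entry><sense><hom/></sense><entry/></entry>")
print(nesting_errors(good))  # []
print(nesting_errors(bad))   # ['hom not allowed inside sense', 'entry nested inside entry']
```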

9.2.2 Groups and Constituents

As noted above, dictionary entries, and subordinate levels within dictionary entries, may comprise several constituent parts, each providing a different type of information about the word treated. The top-level constituents of dictionary entries are:

information about the form of the word treated (orthography, pronunciation, hyphenation, etc.)
grammatical information (part of speech, grammatical sub-categorization, etc.)
definitions or translations into another language
etymology
examples
usage information
cross-references to other entries
notes
entries (often of reduced form) for related words, typically called related entries

Any of the hierarchical levels (entry, entryFree, hom, and sense) may contain any of these top-level constituents, since information about word form, particular grammatical information, special pronunciation, usage information, etc., may apply to an entire entry, or to only one homograph, or only to a particular sense. The examples below illustrate this point.

The following elements are used to encode these top-level constituents:

form (form information group) groups all the information on the written and spoken forms of one headword.
gramGrp (grammatical information group) groups morpho-syntactic information about a lexical item, e.g. pos, gen, number, case, or iType (inflectional class).
def (definition) contains definition text in a dictionary entry.
cit (cited quotation) contains a quotation from some other document, together with a bibliographic reference to its source. In a dictionary it may contain an example text with at least one occurrence of the word form, used in the sense being described, or a translation of the headword, or an example.
usg (usage) contains usage information in a dictionary entry.
xr (cross-reference phrase) contains a phrase, sentence, or icon referring the reader to some other location in this or another text.
etym (etymology) encloses the etymological information in a dictionary entry.
re (related entry) contains a dictionary entry for a lexical item related to the headword, such as a compound phrase or derived form, embedded inside a larger entry.
note contains a note or annotation.

In a simple entry with no internal hierarchy, all top-level constituents appear at the entry level.

com.peti.tor /k@m"petit@(r)/ n person who competes. OALD

<entry>
<form>
  <orth>competitor</orth>
  <hyph>com|peti|tor</hyph>
  <pron>k@m"petit@(r)</pron>
</form>
<gramGrp>
  <pos>n</pos>
</gramGrp>
<def>person who competes.</def>
</entry>
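For processing purposes, a flat entry like this maps naturally onto a record. The Python sketch below (standard-library ElementTree; the function and key names are hypothetical) pulls each top-level constituent out of the competitor entry and splits the hyphenated form at the | marks used in this chapter's transcriptions.

```python
import xml.etree.ElementTree as ET

COMPETITOR = """<entry>
 <form>
  <orth>competitor</orth>
  <hyph>com|peti|tor</hyph>
  <pron>k@m"petit@(r)</pron>
 </form>
 <gramGrp>
  <pos>n</pos>
 </gramGrp>
 <def>person who competes.</def>
</entry>"""


def summarize(entry: ET.Element) -> dict:
    """Flatten a simple entry whose constituents all sit at the entry level."""
    return {
        "orth": entry.findtext("form/orth"),
        # '|' is this chapter's transcription convention for hyphenation points.
        "hyph_parts": entry.findtext("form/hyph", default="").split("|"),
        "pron": entry.findtext("form/pron"),
        "pos": entry.findtext("gramGrp/pos"),
        "def": entry.findtext("def"),
    }


print(summarize(ET.fromstring(COMPETITOR)))
```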

For the elements which appear within the form and gramGrp elements of this example, see below, section 9.3.1 Information on Written and Spoken Forms, and section 9.3.2 Grammatical Information.

Any top-level constituent can appear at any level when the hierarchical structure of the entry is more complex. The most obvious examples are def and cit, which appear at the sense level when several senses or translations exist:

disproof (dIs"pru:f) n. 1. facts that disprove something. 2. the act of disproving. CED

<entry>
<form>
  <orth>disproof</orth>
  <pron>dIs"pru:f</pron>
</form>
<gramGrp>
  <pos>n</pos>
</gramGrp>
<sense n="1">
  <def>facts that disprove something.</def>
</sense>
<sense n="2">
  <def>the act of disproving.</def>
</sense>
</entry>

In the following example, gramGrp is used to distinguish two homographs:

bray /breI/ n cry of an ass; sound of a trumpet. ∙ vt [VP2A] make a cry or sound of this kind. OALD

<entry>
<form>
  <orth>bray</orth>
  <pron>breI</pron>
</form>
<hom>
  <gramGrp>
   <pos>n</pos>
  </gramGrp>
  <def>cry of an ass; sound of a trumpet.</def>
</hom>
<hom>
  <gramGrp>
   <pos>vt</pos>
   <subc>VP2A</subc>
  </gramGrp>
  <def>make a cry or sound of this kind.</def>
</hom>
</entry>

Information of the same kind can appear at different levels within the same entry; here, grammatical information occurs at both the entry and the sense level.

ca.reen /k@"ri:n/ vt,vi 1 [VP6A] turn (a ship) on one side for cleaning, repairing, etc. 2 [VP6A, 2A] (cause to) tilt, lean over to one side. OALD

<entry>
<form>
  <orth>careen</orth>
  <hyph>ca|reen</hyph>
  <pron>k@"ri:n</pron>
</form>
<gramGrp>
  <pos>vt</pos>
  <pos>vi</pos>
</gramGrp>
<sense n="1">
  <gramGrp>
   <subc>VP6A</subc>
  </gramGrp>
  <def>turn (a ship) on one side for cleaning, repairing, etc.</def>
</sense>
<sense n="2">
  <gramGrp>
   <subc>VP6A</subc>
   <subc>VP2A</subc>
  </gramGrp>
  <def>(cause to) tilt, lean over to one side.</def>
</sense>
</entry>
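One way a processor might interpret this layering is to let entry-level grammatical information apply to every sense and then add each sense's own information. The Python sketch below assumes exactly that; it is an illustrative reading of the careen encoding rather than a processing rule stated by the Guidelines, the helper names are hypothetical, and it ignores senses nested inside hom or other sense elements.

```python
import xml.etree.ElementTree as ET

CAREEN = """<entry>
 <form><orth>careen</orth></form>
 <gramGrp><pos>vt</pos><pos>vi</pos></gramGrp>
 <sense n="1">
  <gramGrp><subc>VP6A</subc></gramGrp>
  <def>turn (a ship) on one side for cleaning, repairing, etc.</def>
 </sense>
 <sense n="2">
  <gramGrp><subc>VP6A</subc><subc>VP2A</subc></gramGrp>
  <def>(cause to) tilt, lean over to one side.</def>
 </sense>
</entry>"""


def gram_items(level: ET.Element) -> list[tuple[str, str]]:
    """(element name, text) pairs from gramGrp children sitting directly at this level."""
    return [(item.tag, item.text) for grp in level.findall("gramGrp") for item in grp]


def effective_grammar(entry: ET.Element) -> dict[str, list[tuple[str, str]]]:
    """Combine entry-level grammar with each sense's own grammar.

    Assumption: entry-level information applies to every sense; only senses
    that are direct children of the entry are handled.
    """
    inherited = gram_items(entry)
    return {sense.get("n"): inherited + gram_items(sense) for sense in entry.findall("sense")}


for n, items in effective_grammar(ET.fromstring(CAREEN)).items():
    print(n, items)
```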

Alone among the constituent groups, form can appear at the superEntry level as well as at the entry, hom, and sense levels:

a.ban.don 1 /@"band@n/ v [T1] 1 to leave completely and for ever; desert: The sailors abandoned the burning ship. 2 … abandon 2 n [U] the state when one's feelings and actions are uncontrolled; freedom from control... LDOCE

<superEntry>
<form>
  <orth>abandon</orth>
  <hyph>a|ban|don</hyph>
  <pron>@"band@n</pron>
</form>
<entry n="1">
  <gramGrp>
   <pos>v</pos>
   <subc>T1</subc>
  </gramGrp>
  <sense n="1">
   <def>to leave completely and for ever … </def>
  </sense>
  <sense n="2"/>
</entry>
<entry n="2">
  <gramGrp>
   <pos>n</pos>
   <subc>U</subc>
  </gramGrp>
  <def>the state when one's feelings and actions are uncontrolled; freedom
     from control…</def>
</entry>
</superEntry>
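A consumer of this encoding has to fall back on the superEntry's form when an entry has none of its own. A minimal Python sketch of that lookup follows; the helper name is hypothetical and the encoding is a condensed copy of the abandon example.

```python
import xml.etree.ElementTree as ET

ABANDON = """<superEntry>
 <form>
  <orth>abandon</orth>
  <hyph>a|ban|don</hyph>
  <pron>@"band@n</pron>
 </form>
 <entry n="1"><gramGrp><pos>v</pos><subc>T1</subc></gramGrp></entry>
 <entry n="2"><gramGrp><pos>n</pos><subc>U</subc></gramGrp></entry>
</superEntry>"""


def expand_homographs(super_entry: ET.Element) -> list[tuple]:
    """One row per homograph, using the superEntry's form when an entry has none."""
    shared = super_entry.find("form")
    rows = []
    for entry in super_entry.findall("entry"):
        form = entry.find("form")
        if form is None:  # explicit test: an Element's truth value depends on its children
            form = shared
        rows.append((form.findtext("orth"), entry.get("n"),
                     entry.findtext("gramGrp/pos"), form.findtext("pron")))
    return rows


for row in expand_homographs(ET.fromstring(ABANDON)):
    print(row)
```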

9.3 Top-level Constituents of Entries

This section describes the top-level constituents of dictionary entries, together with the phrase-level constituents peculiar to each.

the form element, which groups orthographic information and pronunciations, is described in section 9.3.1 Information on Written and Spoken Forms
the gramGrp element, which groups elements for the grammatical characterization of the headword, is described in section 9.3.2 Grammatical Information
the def element, which describes the meaning of the headword, is described in section 9.3.3 Sense Information
the etym element and its special phrase-level elements are documented in section 9.3.4 Etymological Information
the cit element and its specific applications are described in section 9.3.3 Sense Information and section 9.3.5 Other Information
the usg, lbl, xr, and note elements are described in section 9.3.5 Other Information
the re element, which marks nested entries for related words, is described in section 9.3.6 Related Entries

9.3.1 Information on Written and Spoken Forms

Dictionary entries most often begin with information about the form of the word to which the entry applies. Typically, the orthographic form of the word, sometimes marked for syllabification or hyphenation, is the first item in an entry. Other information about the word, including variant or alternate forms, inflected forms, pronunciation, etc., is also often given.

The following elements should be used to encode this information: the form element groups one or more occurrences of any of them; it can also be recursively nested to reflect more complex sub-grouping of information about word form(s), as shown in the examples.

form (form information group) groups all the information on the written and spoken forms of one headword.
type classifies form as simple, compound, etc.
orth (orthographic form) gives the orthographic form of a dictionary headword.
type gives the type of spelling.
extent gives the extent of the orthographic information provided.
pron (pronunciation) contains the pronunciation(s) of the word.
extent indicates whether the pronunciation is for whole word or part.
hyph (hyphenation) contains a hyphenated form of a dictionary headword, or hyphenation information in some other form.
syll (syllabification) contains the syllabification of the headword.
stress contains the stress pattern for a dictionary headword, if given separately.
lbl (label) contains a label for a form, example, translation, or other piece of information, e.g. abbreviation for, contraction of, literally, approximately, synonyms:, etc.

In addition to those listed above, the following elements, which encode morphological details of the form, may also occur within form elements:

gram (grammatical information) within an entry in a dictionary or a terminological data file, contains grammatical information relating to a term, word, or form.

type classifies the grammatical information given according to some convenient typology; in the case of terminological information, preferably the dictionary of data element types specified in ISO WD 12 620.

gen (gender) identifies the morphological gender of a lexical item, as given in the dictionary.
number indicates grammatical number associated with a form, as given in a dictionary.
case contains grammatical case information given by a dictionary for a given form.
per (person) contains an indication of the grammatical person (1st, 2nd, 3rd, etc.) associated with a given inflected form in a dictionary.
tns (tense) indicates the grammatical tense associated with a given inflected form in a dictionary.
mood contains information about the grammatical mood of verbs (e.g. indicative, subjunctive, imperative).

iType (inflectional class) indicates the inflectional class associated with a lexical item.

type indicates the type of indicator used to specify the inflection class, when it is necessary to distinguish between the usual abbreviated indications (e.g. inv) and other kinds of indicators, such as special codes referring to conjugation patterns, etc.

Of these, the gram element is most general, and all of the others are synonymous with a gram element with appropriate values (gen, number, case, etc.) for the type attribute.

Different dictionaries use different means to mark hyphenation, syllabification, and stress, and they often use some unusual glyphs (e.g., the ‘middle dot’ for hyphenation). All of these glyphs are in the Unicode character set, as discussed in Character References. When transcribing representations of pronunciation, the International Phonetic Alphabet should be used. It may be convenient (as has been done in the text of this chapter) to use a simple transliteration scheme for this; such a scheme should, however, be properly documented in the header.

In the simplest case, nothing is given but the orthography:

<form>
<orth>doom-laden</orth>
</form>

Often, however, pronunciation is given.

soucoupe [sukup] … DNT

<form>
<orth>soucoupe</orth>
<pron>sukup</pron>
</form>

For a variety of reasons, including ease of processing, it may be desirable to split into separate elements information which is collapsed into a single element in the source text; orthography and hyphenation may for example be transcribed as separate elements, although given together in the source text. For a discussion of the issues involved, and of methods for retaining both the presentation form and the interpreted form, see section 9.5 Typographic and Lexical Information in Dictionary Data.

This example splits orthography and hyphenation, and adds syllabification because it differs from hyphenation:

ar.ea … W7

Multiple orthographic forms may be given, e.g. to illustrate a word's inflectional pattern:

brag … vb. brags, bragging, bragged … CED

<form>
<orth>brag</orth>
</form>
<gramGrp>
<pos>vb</pos>
</gramGrp>
<form type="infl">
<orth>brags</orth>
<orth>bragging</orth>
<orth>bragged</orth>
</form>

Or the inflectional pattern may be indicated by reference to a table of paradigms, as here:

horrifier [ORifje] (7) vt … [C/R]

<form>
<orth>horrifier</orth>
<pron>ORifje</pron>
<iType type="vbtable">7</iType>
</form>

Explanatory labels may be attached to alternate forms:

MTBF abbrev. for mean time between failures. CED

<entry>
<form type="abbrev">
  <orth>MTBF</orth>
</form>
<form type="full">
  <lbl>abbrev. for</lbl>
  <orth>mean time between failures</orth>
</form>
</entry>

When multiple orthographic forms are given, a pronunciation may be associated with all of them, as here:

biryani or biriani (%bIrI"A:nI) … CED

<form>
<orth>biryani</orth>
<orth>biriani</orth>
<pron>%bIrI"A:nI</pron>
</form>

In other cases, different pronunciations are provided for different orthographic forms; here, the form element is repeated to associate the first orthographic form explicitly with the first pronunciation, and the second orthographic form with the second pronunciation:

mackle ("mak^@l) or macule ("makju:l) … CED

<form>
<orth>mackle</orth>
<pron>"makəl</pron>
</form>
<form>
<orth>macule</orth>
<pron>"makju:l</pron>
</form>

Recursive nesting of the form element can preserve relations among elements that are implicit in the text. For example, in the CED entry for ‘hospitaller’, it is clear that ‘U.S.’ is associated only with ‘hospitaler’, but that the pronunciation applies to both forms. The following encoding preserves these relations:

hospitaller or U.S. hospitaler ("hQspIt@l@) … CED

<form>
<orth>hospitaller</orth>
<form>
<usg type="geo">U.S.</usg>
<orth>hospitaler</orth>
</form>
<pron>"hQspIt@l@</pron>
</form>
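The relations preserved by these three encodings can be made explicit by a small resolver. The Python sketch below is one possible reading, offered as an assumption rather than a rule from the Guidelines: a form's own pron elements override any inherited from an enclosing form, while usage labels accumulate downward, so ‘U.S.’ stays attached to hospitaler alone. The helper names are hypothetical, and each example is wrapped in an entry so that sibling forms can be parsed together.

```python
import xml.etree.ElementTree as ET

# The biryani, mackle and hospitaller forms above, each wrapped in an entry.
BIRYANI = """<entry><form><orth>biryani</orth><orth>biriani</orth>
<pron>%bIrI"A:nI</pron></form></entry>"""
MACKLE = """<entry><form><orth>mackle</orth><pron>"makəl</pron></form>
<form><orth>macule</orth><pron>"makju:l</pron></form></entry>"""
HOSPITALLER = """<entry><form><orth>hospitaller</orth>
<form><usg type="geo">U.S.</usg><orth>hospitaler</orth></form>
<pron>"hQspIt@l@</pron></form></entry>"""


def readings(form: ET.Element, prons: tuple = (), usgs: tuple = ()) -> list[tuple]:
    """(orth, pronunciations, usage labels) for every orth in form and its sub-forms.

    A form's own pron elements override inherited ones; usage labels accumulate,
    so an outer label applies to inner forms but an inner label stays local.
    """
    own_prons = tuple(p.text for p in form.findall("pron")) or prons
    own_usgs = usgs + tuple(u.text for u in form.findall("usg"))
    rows = [(o.text, own_prons, own_usgs) for o in form.findall("orth")]
    for sub in form.findall("form"):
        rows.extend(readings(sub, own_prons, own_usgs))
    return rows


def entry_readings(xml_text: str) -> list[tuple]:
    """Resolve every top-level form of a wrapped example."""
    entry = ET.fromstring(xml_text)
    return [row for form in entry.findall("form") for row in readings(form)]


for text in (BIRYANI, MACKLE, HOSPITALLER):
    print(entry_readings(text))
```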

The formal declarations for the elements of the form group are these:

The classes of morphological elements, and of elements allowed within the form group, are declared thus:

9.3.2 Grammatical Information

The gramGrp element groups grammatical information, such as part of speech, subcategorization information (e.g., syntactic patterns for verbs, count/mass distinctions for nouns), etc. It can contain any of the following elements:

pos (part of speech) indicates the part of speech assigned to a dictionary headword such as noun, verb, or adjective.
subc (subcategorization) contains subcategorization information (transitive/intransitive, countable/non-countable, etc.)
colloc (collocate) contains a collocate of the headword.

In addition, gramGrp can contain any of the morphological elements defined in section 9.3.1 Information on Written and Spoken Forms for form. Elements conveying morphological information bear different interpretations within gramGrp and form groups, the difference being that in the form group, the morphological information specified pertains to the specific alternate form in question, while within gramGrp it applies to the headword form. For example, in the entry ‘pinna ('pIn@) n., pl. -nae (-ni:) or -nas’ CED, the word defined can be either singular or plural; the ‘pl.’ specification applies only to the inflected forms provided. Compare this with ‘pants (paents) pl. n.’, where ‘pl.’ applies to the headword itself.

As noted above in section 9.3.1 Information on Written and Spoken Forms, the elements for morphological information are simply shorthand for the general-purpose gram element. Consider this entry for the French word médire:

médire v.t. ind. (de) … PLC

This entry can be tagged using specialized grammatical elements:

<form>
<orth>médire</orth>
</form>
<gramGrp>
<pos>v</pos>
<subc>t ind</subc>
<colloc type="prep">de</colloc>
</gramGrp>

Or using the gram element:

<form>
<orth>médire</orth>
</form>
<gramGrp>
<gram type="pos">v</gram>
<gram type="subc">t ind</gram>
<gram type="colloc">de</gram>
</gramGrp>
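Because the two encodings carry the same information, they can be normalized to a common representation. The Python sketch below maps each child of gramGrp to a (kind, value) pair, using the element name for specialized elements and the type attribute for gram. It assumes the collocate is typed colloc in the generic encoding, discards any extra attributes such as type="prep", and uses a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

SPECIALIZED = """<gramGrp>
 <pos>v</pos>
 <subc>t ind</subc>
 <colloc type="prep">de</colloc>
</gramGrp>"""

# Assumption: the collocate is typed as "colloc" in the generic encoding.
GENERIC = """<gramGrp>
 <gram type="pos">v</gram>
 <gram type="subc">t ind</gram>
 <gram type="colloc">de</gram>
</gramGrp>"""


def as_gram_pairs(gram_grp: ET.Element) -> list[tuple[str, str]]:
    """Map each child to (kind, value): gram uses its type, others their element name.

    Any further attributes (such as type="prep" on colloc) are ignored here.
    """
    pairs = []
    for child in gram_grp:
        kind = child.get("type") if child.tag == "gram" else child.tag
        pairs.append((kind, child.text))
    return pairs


specialized = as_gram_pairs(ET.fromstring(SPECIALIZED))
generic = as_gram_pairs(ET.fromstring(GENERIC))
print(specialized == generic)  # True
```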

Like form, gramGrp can be repeated, recursively nested, or used at the sense level to show relations among elements.

isotope adj. et n. m. … DNT

<form>
<orth>isotope</orth>
</form>
<gramGrp>
<pos>adj</pos>
</gramGrp>
<gramGrp>
<pos>n</pos>
<gen>m</gen>
</gramGrp>

wits (wIts) pl. n. 1. (sometimes sing.) the ability to reason and act, esp. quickly … CED

<entry>
<form>
  <orth>wits</orth>
  <pron>wIts</pron>
</form>
<gramGrp>
  <number>pl</number>
  <pos>n</pos>
</gramGrp>
<sense n="1">
  <gramGrp>
   <number>sometimes sing.</number>
  </gramGrp>
  <def>the ability to reason and act, esp. quickly …</def>
</sense>
</entry>

9.3.3 Sense Information

Dictionaries may describe the meanings of words in a wide variety of different ways: by means of synonyms, paraphrases, translations into other languages, formal definitions in various highly stylized forms, etc. No attempt is made here to distinguish all the different forms which sense information may take; all of them may be tagged using the def element described in section 9.3.3.1 Definitions.

As a special case, it is frequently desirable to distinguish the provision of translation equivalents in other languages from other forms of sense information; the use of cit type="translation" (which groups a translation equivalent with related information such as its grammatical description) for this purpose is described in section 9.3.3.2 Translation Equivalents.

9.3.3.1 Definitions

Dictionary definitions are those pieces of prose in a dictionary entry that describe the meaning of some lexical item. Most often, definitions describe the headword of the entry; in some cases, they describe translated texts, examples, etc.; see cit type="translation", section 9.3.3.2 Translation Equivalents, and cit type="example", section 9.3.5.1 Examples. The def element directly contains the text of the definition; unlike form and gramGrp, it does not serve solely to group a set of smaller elements. The close analysis of definition text, such as the tagging of hypernyms, typical objects, etc., is not covered by these Guidelines.

Definitions may occur directly within an entry; when multiple definitions are given, they are typically identified as belonging to distinct senses, as here:

demigod (…) n. 1.a. a being who is part mortal, part god. b. a lesser deity. 2. a godlike person. CP

<entry>
<form>
  <orth>demigod</orth>
  <pron> … </pron>
</form>
<gramGrp>
  <pos>n</pos>
</gramGrp>
<sense n="1">
  <sense n="a">
   <def>a being who is part mortal, part god.</def>
  </sense>
  <sense n="b">
   <def>a lesser deity.</def>
  </sense>
</sense>
<sense n="2">
  <def>a godlike person.</def>
</sense>
</entry>
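Printed sense numbering such as ‘1.a.’ can be reconstructed from the n attributes along each path. The Python sketch below (helper name hypothetical) lists each innermost sense of the demigod entry with a compound label. In this simple version, a sense carrying its own def as well as sub-senses would lose that def.

```python
import xml.etree.ElementTree as ET

DEMIGOD = """<entry>
 <form><orth>demigod</orth></form>
 <gramGrp><pos>n</pos></gramGrp>
 <sense n="1">
  <sense n="a"><def>a being who is part mortal, part god.</def></sense>
  <sense n="b"><def>a lesser deity.</def></sense>
 </sense>
 <sense n="2"><def>a godlike person.</def></sense>
</entry>"""


def leaf_senses(parent: ET.Element, prefix: str = "") -> list[tuple[str, str]]:
    """Return (compound label, definition) for each innermost sense, e.g. ("1a", ...).

    A sense with sub-senses contributes only its sub-senses, not its own def.
    """
    leaves = []
    for sense in parent.findall("sense"):
        label = prefix + sense.get("n", "")
        nested = leaf_senses(sense, label)
        if nested:
            leaves.extend(nested)
        else:
            leaves.append((label, sense.findtext("def")))
    return leaves


for label, definition in leaf_senses(ET.fromstring(DEMIGOD)):
    print(label, definition)
```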

In multilingual dictionaries, it is sometimes possible to distinguish translation equivalents from definitions proper; here a def element is distinguished from the translation information within which it appears.

rémoulade [Remulad] nf remoulade, rémoulade (dressing containing mustard and herbs). CR

<entry>
<form>
  <orth>rémoulade</orth>
  <pron>Remulad</pron>
</form>
<gramGrp>
  <pos>n</pos>
  <gen>f</gen>
</gramGrp>
<cit type="translation" xml:lang="en">
  <quote>remoulade</quote>
  <quote>rémoulade</quote>
  <def>dressing containing mustard and herbs</def>
</cit>
</entry>

9.3.3.2 Translation Equivalents

Multilingual dictionaries contain information about translations of a given word in some source language for one or more target languages. Minimally, the dictionary provides the corresponding translation in the target language; other material, such as morphological information (gender, case), various kinds of usage restrictions, etc., may also be given. If translation equivalents are to be distinguished from other kinds of sense information, they may be encoded using cit type="translation". The global xml:lang attribute should be used to specify the target language.

As in monolingual dictionaries, the sense element is used in multilingual dictionaries to group information (forms, grammatical information, usage, translation(s), etc.) about a given sense of a word where necessary. Information about the individual translation equivalents within a sense is grouped using cit type="translation". This information may include the translation text (tagged q or quote), morphological information (gen, case, etc.), usage notes (usg), translation labels (lbl), and definitions (def). When bibliographic data is provided, the quote element should be used.

cit (cited quotation) contains a quotation from some other document, together with a bibliographic reference to its source. In a dictionary it may contain an example text with at least one occurrence of the word form, used in the sense being described, or a translation of the headword, or an example.
lbl (label) contains a label for a form, example, translation, or other piece of information, e.g. abbreviation for, contraction of, literally, approximately, synonyms:, etc.
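To make this grouping concrete, the following Python sketch uses the standard library's xml.etree.ElementTree to pull each translation equivalent, its target language, and any gender out of an entry. It assumes un-namespaced snippets like the examples in this chapter (real TEI documents place elements in the TEI namespace), and the function name translation_equivalents is purely illustrative.

```python
import xml.etree.ElementTree as ET

# ElementTree reports xml:lang under the XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def translation_equivalents(entry):
    """Return (language, text, gender) for each equivalent in cit type="translation".

    Translations nested inside examples are included too; gender is None
    when the cit carries no <gen> child.
    """
    results = []
    for cit in entry.iter("cit"):
        if cit.get("type") != "translation":
            continue
        gen = cit.findtext("gen")
        # One cit may list several equivalents (e.g. remoulade, rémoulade).
        for quote in cit.findall("quote"):
            results.append((cit.get(XML_LANG), "".join(quote.itertext()), gen))
    return results


entry = ET.fromstring(
    '<entry><cit type="translation" xml:lang="fr"><quote>habilleur</quote>'
    '<gen>m</gen></cit><cit type="translation" xml:lang="fr">'
    '<quote>-euse</quote><gen>f</gen></cit></entry>'
)
print(translation_equivalents(entry))
# [('fr', 'habilleur', 'm'), ('fr', '-euse', 'f')]
```

Because gen is read only from the cit itself, the source-language gender recorded in gramGrp (as in the rémoulade entry) is not mistaken for the gender of the translation.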

Note how, in the following example, different translation equivalents are grouped into the same or different senses, following the punctuation of the source and the usage labels:

dresser … (a) (Theat) habilleur m, -euse f; (Comm: window ~) étalagiste mf. she's a stylish ~ elle s'habille avec chic; V hair. (b) (tool) (for wood) raboteuse f; (for stone) rabotin m. CR

<entry n="1">
<form>
  <orth>dresser</orth>
</form>
<sense n="a">
  <sense>
   <usg type="dom">Theat</usg>
   <cit type="translation" xml:lang="fr">
    <quote>habilleur</quote>
    <gen>m</gen>
   </cit>
   <cit type="translation" xml:lang="fr">
    <quote>-euse</quote>
    <gen>f</gen>
   </cit>
  </sense>
  <sense>
   <usg type="dom">Comm</usg>
   <form type="compound">
    <orth>window <oRef/>
    </orth>
   </form>
   <cit type="translation" xml:lang="fr">
    <quote>étalagiste</quote>
    <gen>mf</gen>
   </cit>
  </sense>
  <cit type="example">
   <quote>she's a stylish <oRef/>
   </quote>
   <cit type="translation" xml:lang="fr">
    <quote>elle s'habille avec chic</quote>
   </cit>
  </cit>
  <xr type="see">V. <ref target="#hair">hair</ref>
  </xr>
</sense>
<sense n="b">
  <usg type="category">tool</usg>
  <sense>
   <usg type="hint">for wood</usg>
   <cit type="translation" xml:lang="fr">
    <quote>raboteuse</quote>
    <gen>f</gen>
   </cit>
  </sense>
  <sense>
   <usg type="hint">for stone</usg>
   <cit type="translation" xml:lang="fr">
    <quote>rabotin</quote>
    <gen>m</gen>
   </cit>
  </sense>
</sense>
</entry>

<entry xml:id="hair">

</entry>

In the following example, a distinction is made between the translation equivalent (‘OAS’) and a descriptive phrase providing further information for the user of the dictionary.

O.A.S. ... nf (abrév de Organisation de l'Armée secrète) OAS (illegal military organization supporting French rule of Algeria). CR

<entry>
<cit type="translation" xml:lang="en">
  <quote>OAS</quote>
  <def>illegal military organization supporting French rule of
     Algeria</def>
</cit>
</entry>

Note that cit type="translation" may also be used in monolingual dictionaries when a translation is given for a foreign word:

havdalah or havdoloh Hebrew. (Hebrew hAvdA"lA; Yiddish hAv"dOl@) n. Judaism. the ceremony marking the end of the sabbath or of a festival, including the blessings over wine, candles and spices. [literally: separation] CED

<entry type="foreign">
<form>
  <orth>havdalah</orth>
  <orth>havdoloh</orth>
</form>
<usg type="dom">Judaism</usg>
<def>the ceremony marking the end of the sabbath or of a festival,
   including the blessings over wine, candles and spices.</def>
<cit type="translation" xml:lang="en">
  <lbl>literally</lbl>
  <quote>separation</quote>
</cit>
</entry>

9.3.4 Etymological Information

The element etym marks a block of etymological information. Etymologies may contain highly structured lists of words in an order indicating their descent from each other, but often also include related words and forms outside the direct line of descent, for comparison. Not infrequently, etymologies include commentary of various sorts, and can grow into short (or long!) essays with prose-like structure. This variation in structure makes it impracticable to define tags which capture the entire intellectual structure of the etymology or record the precise interrelation of all the words mentioned. It is, however, feasible to mark some of the more obvious phrase-level elements frequently found in etymologies, using tags defined in the core module or elsewhere in this chapter. Of particular relevance for the markup of etymologies are:

etym (etymology) encloses the etymological information in a dictionary entry.
lang (language name) name of a language mentioned in etymological or other linguistic discussion.
date contains a date in any format.
mentioned marks words or phrases mentioned, not used.
gloss identifies a phrase or word used to provide a gloss or definition for some other word or phrase.
pron (pronunciation) contains the pronunciation(s) of the word.
usg (usage) contains usage information in a dictionary entry.
lbl (label) contains a label for a form, example, translation, or other piece of information, e.g. abbreviation for, contraction of, literally, approximately, synonyms:, etc.

As in other prose, individual word forms mentioned in an etymological description are tagged with mentioned elements. Pronunciations, usage labels, and glosses can be tagged using the pron, usg, and gloss elements defined elsewhere in these Guidelines. In addition, the lang element may be used to identify a particular language name where it appears, alongside the xml:lang attribute of the mentioned element.
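As a rough illustration of how these phrase-level tags can still be exploited, the Python sketch below pairs each mentioned form with a language: its own xml:lang if present, otherwise the language named most recently before it. That pairing rule is an assumption that fits simple chains like the neume etymology below, but it will not hold for etymologies written as discursive prose; the function name is hypothetical and un-namespaced markup is assumed.

```python
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def mentioned_forms(etym):
    """Pair each <mentioned> form with a language name (heuristic).

    An explicit xml:lang on <mentioned> wins; otherwise the most recent
    <lang> element before it in document order is used.
    """
    pairs = []
    current_lang = None
    for el in etym.iter():  # depth-first, i.e. document order
        if el.tag == "lang":
            current_lang = "".join(el.itertext())
        elif el.tag == "mentioned":
            pairs.append((el.get(XML_LANG, current_lang), "".join(el.itertext())))
    return pairs


etym = ET.fromstring(
    "<etym><lang>F</lang> fr. <lang>ML</lang> <mentioned>pneuma</mentioned>"
    " <mentioned>neuma</mentioned> fr. <lang>Gk</lang>"
    " <mentioned>pneuma</mentioned> <gloss>breath</gloss></etym>"
)
print(mentioned_forms(etym))
# [('ML', 'pneuma'), ('ML', 'neuma'), ('Gk', 'pneuma')]
```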

Examples:

abismo m. (del gr. a priv. y byssos, fondo). Sima, gran profundidad. …

<entry>
<form>
<orth>abismo</orth>
</form>
<etym>del <lang>gr.</lang>
<mentioned>a</mentioned> priv. y <mentioned>byssos</mentioned>,
<gloss>fondo</gloss>
</etym>
</entry>

neume \'n(y)üm\ n [F, fr. ML pneuma, neuma, fr. Gk pneuma breath — more at pneumatic]: any of various symbols used in the notation of Gregorian chant … [WNC]

<entry>
<etym>
  <lang>F</lang> fr. <lang>ML</lang>
  <mentioned>pneuma</mentioned>
  <mentioned>neuma</mentioned> fr. <lang>Gk</lang>
  <mentioned>pneuma</mentioned>
  <gloss>breath</gloss>
  <xr type="etym">more at <ptr target="#pneumatic"/>
  </xr>
</etym>
<def>any of various symbols … </def>
</entry>

<entry xml:id="pneumatic">

</entry>

9.3.5 Other Information

9.3.5.1 Examples

Dictionaries typically include examples of word use, usually accompanying definitions or translations. In some cases, the examples are quotations from another source, and are occasionally followed by a citation to the author.

The cit type="example" element contains usage examples and associated information; the example text itself should be enclosed in a q or quote element. The cit element associates a quotation with a bibliographic reference to its source.

q (separated from the surrounding text with quotation marks) contains material which is marked as (ostensibly) being somehow different than the surrounding text, for any one of a variety of reasons including, but not limited to: direct speech or thought, technical terms or jargon, authorial distance, quotations from elsewhere, and passages that are mentioned not used.
quote (quotation) contains a phrase or passage attributed by the narrator or author to some agency external to the text.
cit (cited quotation) contains a quotation from some other document, together with a bibliographic reference to its source. In a dictionary it may contain an example text with at least one occurrence of the word form, used in the sense being described, or a translation of the headword, or an example.

Examples frequently abbreviate the headword, and so their transcription will often make use of the oRef or oVar elements described below in section 9.4 Headword and Pronunciation References.

For example:

multiplex /…/ adj tech having many parts: the multiplex eye of the fly. LDOCE

<quote>the multiplex eye of the fly.</quote>

Or, when a more comprehensive representation of examples is wanted:

<cit type="example">
<quote>the multiplex eye of the fly.</quote>
</cit>

As the following example shows, cit can also contain elements such as pron, def, etc.

some … 4. (S~ and any are used with more): Give me ~ more /s@'mO:(r)/ OALD

In multilingual dictionaries, examples may also be accompanied by translations:

horrifier … vt to horrify. elle était horrifiée par la dépense she was horrified at the expense. CR

<entry>
<cit type="translation" xml:lang="en">
  <quote>to horrify</quote>
</cit>
<cit type="example">
  <quote>elle était horrifiée par la dépense</quote>
  <cit type="translation" xml:lang="en">
   <quote>she was horrified at the expense.</quote>
  </cit>
</cit>
</entry>

When a source is indicated, the example should be marked with a cit element:

valeur … n. f. … 2. Vx. Vaillance, bravoure (spécial., au combat). ‘La valeur n'attend pas le nombre des années’ (Corneille). … DNT

<sense n="2">
<usg type="time">Vx.</usg>
<def>Vaillance, bravoure (spécial., au combat)</def>
<cit type="example">
  <quote>La valeur n'attend pas le nombre des années</quote>
  <bibl>
   <author>Corneille</author>
  </bibl>
</cit>
</sense>
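Because the example text and its source are kept in separate children of cit, they can be read back independently. A minimal Python sketch, assuming un-namespaced markup as in the examples above (the function name is illustrative):

```python
import xml.etree.ElementTree as ET


def examples_with_sources(elem):
    """List (example text, author or None) for every cit type="example".

    Only the cit's own <quote> and <bibl>/<author> children are read, so a
    translation nested inside the example is not taken for the example.
    """
    found = []
    for cit in elem.iter("cit"):
        if cit.get("type") != "example":
            continue
        quote = cit.find("quote")
        text = "".join(quote.itertext()).strip() if quote is not None else ""
        found.append((text, cit.findtext("bibl/author")))
    return found


sense = ET.fromstring(
    '<sense n="2"><def>Vaillance, bravoure</def><cit type="example">'
    "<quote>La valeur n'attend pas le nombre des années</quote>"
    "<bibl><author>Corneille</author></bibl></cit></sense>"
)
print(examples_with_sources(sense))
```

Examples without a source, such as the horrifier entry, come back with None in place of the author.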


9.3.5.2 Usage Information and Other Labels

Most dictionaries provide restrictive labels and phrases indicating the usage of given words or particular senses. Other labels, not necessarily related to usage, may be attached to forms, translations, cross references, and examples. Usage and other labels should be marked with the following elements:

usg (usage) contains usage information in a dictionary entry.
lbl (label) contains a label for a form, example, translation, or other piece of information, e.g. abbreviation for, contraction of, literally, approximately, synonyms:, etc.

Typical usage labels mark

temporal use (archaic, obsolete, etc.)
register (slang, formal, taboo, ironic, facetious, etc.)
style (literal, figurative, etc.)
connotative effect (e.g. derogatory, offensive)
subject field (Astronomy, Philosophy, etc.)
national or regional use (Australian, U.S., Midland dialect, etc.)

Many dictionaries provide an explanation and/or a list of such usage labels in a preface or appendix. The type of the usage information may be indicated in the type attribute on the usg element. Some typical values are:

geo: geographic area
time: temporal, historical era (‘archaic’, ‘old’, etc.)
dom: domain
reg: register
style: style (figurative, literal, etc.)
plev: preference level (‘chiefly’, ‘usually’, etc.)
acc: acceptability
lang: language for foreign words, spellings, pronunciations, etc.
gram: grammatical usage

In addition to this kind of information, multilingual dictionaries often provide ‘semantic cues’ to help the user determine the right sense of a word in the source language (and hence the correct translation). These include synonyms, concept subdivisions, typical subjects and objects, typical verb complements, etc. These labels are also marked with the usg element; sample values for the type attribute in these cases include:

syn: synonym given to show use
hyper: hypernym given to show usage
colloc: collocation given to show usage
comp: typical complement
obj: typical object
subj: typical subject
verb: typical verb
hint: unclassifiable piece of information to guide sense choice
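Since the type values above are only suggestions, a project may want to check which values its encoders actually used. The Python sketch below groups usg labels by type and reports any type outside the two lists above; for instance, the type="category" used in the dresser example would be reported. The constant SUGGESTED_USG_TYPES and the function name are illustrative, a real project would take its list from its own documentation, and un-namespaced markup is assumed.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# The sample values listed above; a project would normally substitute the
# list it documents for itself.
SUGGESTED_USG_TYPES = {
    "geo", "time", "dom", "reg", "style", "plev", "acc", "lang", "gram",
    "syn", "hyper", "colloc", "comp", "obj", "subj", "verb", "hint",
}


def usage_labels(elem):
    """Group usg labels by type; also return the types not in the list."""
    by_type = defaultdict(list)
    unexpected = set()
    for usg in elem.iter("usg"):
        kind = usg.get("type")  # None if the encoder gave no type
        by_type[kind].append("".join(usg.itertext()).strip())
        if kind not in SUGGESTED_USG_TYPES:
            unexpected.add(kind)
    return dict(by_type), unexpected


sense = ET.fromstring(
    '<sense n="b"><usg type="category">tool</usg>'
    '<sense><usg type="hint">for wood</usg></sense></sense>'
)
labels, odd = usage_labels(sense)
print(labels)  # {'category': ['tool'], 'hint': ['for wood']}
print(odd)     # {'category'}
```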

In this entry, one spelling is marked as geographically restricted:

colour or U.S. color … CED

<form>
<orth>colour</orth>
<form>
<usg type="geo">U.S.</usg>
<orth>color</orth>
</form>
</form>

In the next example, usage labels are used to indicate domains, style, and synonyms associated with different senses:

palette [palEt] nf (a) (Peinture: lit, fig) palette. (b) (Boucherie) shoulder. (c) (aube de roue) paddle; (battoir à linge) beetle; (Manutention, Constr) pallet. CR

<sense n="a">
<usg type="dom">Peinture</usg>
<usg type="style">lit</usg>
<usg type="style">fig</usg>
<cit type="translation" xml:lang="en">
  <quote>palette</quote>
</cit>
</sense>
<sense n="b">
<usg type="dom">Boucherie</usg>
<cit type="translation" xml:lang="en">
  <quote>shoulder</quote>
</cit>
</sense>
<sense n="c">
<sense>
  <usg type="syn">aube de roue</usg>
  <cit type="translation" xml:lang="en">
   <quote>paddle</quote>
  </cit>
</sense>
<sense>
  <usg type="syn">battoir à linge</usg>
  <cit type="translation" xml:lang="en">
   <quote>beetle</quote>
  </cit>
</sense>
<sense>
  <usg type="dom">Manutention</usg>
  <usg type="dom">Constr</usg>
  <cit type="translation" xml:lang="en">
   <quote>pallet</quote>
  </cit>
</sense>
</sense>

When the usage label is hard to classify, it may be described as a ‘hint’:

rempaillage […] nm reseating, rebottoming (with straw). CR

<entry>
<cit type="translation" xml:lang="en">
  <quote>reseating</quote>
  <quote>rebottoming</quote>
  <usg type="hint">with straw</usg>
</cit>
</entry>

9.3.5.3 Cross References to Other Entries

Dictionary entries frequently refer to information in other entries, often using extremely dense notations to convey the headword of the entry to be sought, the particular part of the entry being referred to, and the nature of the information to be sought there (synonyms, antonyms, usage notes, etymology, an illustration, etc.).

Cross references may be tagged in dictionaries using the ref and ptr elements defined in the core module (section 3.6 Simple Links and Cross-References). In addition, the xr element may be used to group all the information relating to a cross reference.

xr (cross-reference phrase) contains a phrase, sentence, or icon referring the reader to some other location in this or another text.
ref (reference) defines a reference to another location, possibly modified by additional text or comment.
ptr/ (pointer) defines a pointer to another location.
lbl (label) contains a label for a form, example, translation, or other piece of information, e.g. abbreviation for, contraction of, literally, approximately, synonyms:, etc.

As in other types of text, the actual pointing element (e.g. ref or ptr) is used to tag the cross-reference target proper (in dictionaries, usually the headword, possibly accompanied by a homograph number, a sense number, or other further restriction specifying what portion of the target entry is being referred to). The xr element is used to group the target with any accompanying phrases or symbols used to label the cross reference; the cross-reference label itself may be tagged as a lbl or may remain untagged. Both of the following are thus legitimate:

glee … Compare madrigal (sense 1) CED

<entry>
<form>
<orth>glee</orth>
</form>
<xr>Compare <ptr target="#madrigal.1"/>
</xr>
</entry>
<entry xml:id="madrigal.1">

</entry>

hostellerie Syn. de hôtellerie (sens 1). DNT

<xr type="syn">
<lbl>Syn. de</lbl>
<ref>hôtellerie (sens 1)</ref>.
</xr>

In addition to using, or not using, lbl to mark the cross-reference label, the two examples differ in another way. The former assumes that the first sense of madrigal has the identifier madrigal.1, and that the specific form of the reference in the source volume can be reconstructed, if needed, from that information. The latter does not require the first sense of ‘hôtellerie’ to have an identifier, and retains the print form of the cross reference; by omitting the target attribute of the ref element, however, the second example does assume implicitly either that some software could usefully parse the phrase tagged as a ref and find the location referred to, or else that such processing will not be necessary.
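The practical difference between the two styles can be checked mechanically. This Python sketch, assuming un-namespaced markup and same-document pointers of the form "#id", sorts every ref and ptr into resolved, dangling (a target is given but no element has that xml:id), and untargeted (like the hôtellerie example, which software would have to parse). The function name is hypothetical.

```python
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def check_cross_references(root):
    """Sort every ref/ptr into resolved, dangling and untargeted pointers.

    target may hold several space-separated pointers (e.g.
    "#blanc-2.1 #blanc-2.3"); only "#id" pointers into the same document
    count as resolved here.
    """
    ids = {el.get(XML_ID) for el in root.iter() if el.get(XML_ID)}
    report = {"resolved": [], "dangling": [], "untargeted": []}
    for el in root.iter():
        if el.tag not in ("ref", "ptr"):
            continue
        target = el.get("target")
        if target is None:
            # Only the printed text identifies the destination.
            report["untargeted"].append("".join(el.itertext()).strip())
            continue
        for pointer in target.split():
            ok = pointer.startswith("#") and pointer[1:] in ids
            report["resolved" if ok else "dangling"].append(pointer)
    return report


doc = ET.fromstring(
    '<body><entry><xr>Compare <ptr target="#madrigal.1"/></xr></entry>'
    '<entry xml:id="madrigal.1"/>'
    '<entry><xr type="syn"><lbl>Syn. de</lbl>'
    "<ref>hôtellerie (sens 1)</ref>.</xr></entry></body>"
)
print(check_cross_references(doc))
```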

The type attribute on the pointing element or on the xr element may be used to indicate what kind of cross reference is being made, using any convenient typology. Since different dictionaries may label the same kind of cross reference in different ways, it may be useful to give normalized indications in the type attribute, enabling the encoder to distinguish irregular forms of cross reference more reliably:

rose2 … vb. the past tense of rise. CED

<entry n="2">
<form>
  <orth>rose</orth>
</form>
<xr type="inflectedForm">
  <lbl>the past tense of</lbl>
  <ref target="#rise">rise</ref>
</xr>
</entry>

from cross-references for synonyms and the like:

antagonist … syn see adverse W7

<xr type="synonym">
<lbl>syn see</lbl>
<ref target="#adverse">adverse</ref>
</xr>

Strictly speaking, the reference above is not to the entry for adverse, but to the list of synonyms found at that entry. Slightly more complicated is the following reference to an illustration accompanying another entry:

ax, axe … → see the illus at tool OALD

This entry refers to the illustration at the entry for tool, not the entry itself. The target attribute might give the identifier of the illustration itself, or of the enclosing entry (in which case the type attribute might be used to infer that the reference is actually to the illustration, not the entry as a whole).

<xr type="illustration">
<lbl>see the illus at</lbl>
<ptr target="#tool.illus"/>
</xr>

In some cases, the cross reference is to a particular subset of the meanings of the entry in question:

globe … V. armillaire (sphère) PR

<xr>V. <ref target="#armillaire">armillaire</ref>
<lbl type="sense-restriction">sphère</lbl>
</xr>

Cross-references occasionally occur in definition texts, example texts, etc., or may be free-standing within an entry. These may typically be encoded using ref or ptr, without an enclosing xr. For example:

entacher … Acte entaché de nullité, contenant un vice de forme ou passé par un incapable*. DNT

The asterisk signals a reference to the entry for incapable.

<def>contenant un vice de forme ou passé par un <ptr target="#incapable"/>.</def>

In some cases, the form in the definition is inflected; since ptr is an empty element and cannot carry the inflected form as its content, ref must be used, as here:

justifier … 4. IMPRIM Donner a (une ligne) une longeur convenable au moyen de blancs (2, sens 1, 3). DNT

<sense n="4">
<usg type="dom">imprim</usg>
<def>Donner a (une ligne) une longeur convenable au moyen de
<ref target="#blanc-2.1 #blanc-2.3">blancs (2, sens 1, 3)</ref>
</def>
</sense>

9.3.5.4 Notes within Entries

Dictionaries may include extensive explanatory notes about usage, grammar, context, etc. within entries. Very often, such notes appear as a separate section at the end of an entry. The standard note element should be used for such material.

note contains a note or annotation.

For example:

ain't (eInt) Not standard. contraction of am not, is not, are not, have not or has not: I ain't seen it. … Usage. Although the interrogative form ain't I? would be a natural contraction of am I not?, it is generally avoided in spoken English and never used in formal English. CED

<entry>
<form type="contr">
  <orth>ain't</orth>
  <pron>eInt</pron>
</form>
<usg type="reg">Not standard</usg>
<form type="full">
  <lbl>contraction of</lbl>
  <orth>am not</orth>
  <orth>is not</orth>
  <orth>are not</orth>
  <orth>have not</orth>
  <orth>has not</orth>
</form>
<cit type="example">
  <quote>I ain't seen it.</quote>
</cit>
<note type="usage">Although the interrogative form <mentioned>ain't
     I?</mentioned> would be a natural contraction of <mentioned>am I
     not?</mentioned>, it is generally avoided in spoken English and
   never used in formal English.</note>
</entry>

The formal declaration for note is given in section 3.8 Notes, Annotation, and Indexing.


9.3.6 Related Entries

The re element encloses a degenerate entry which appears in the body of another entry for some purpose. Many dictionaries include related entries for direct derivatives or inflected forms of the entry word, or for compound words, phrases, collocations, and idioms containing the entry word.

Related entries can be complex, and may in fact include any of the information to be found in a regular entry. Therefore, the re element is defined to contain the same elements as an entry element, with the exception that it may not contain any nested re elements.

For example:

bevvy ("bEvI) Dialect. ~ n., pl. -vies. 1. a drink, esp. an alcoholic one: we had a few bevvies last night. 2. a night of drinking. ~ vb. -vies, -vying, -vied (intr.) 3. to drink alcohol [probably from Old French bevee, buvee, drinking] —'bevvied adj. CED

<entry>
<form>
  <orth>bevvy</orth>
  <pron>"bEvI</pron>
</form>
<usg type="reg">Dialect</usg>
<hom>
  <gramGrp>
   <pos>n</pos>
  </gramGrp>
  <sense n="1">
   <def>a drink, esp. an alcoholic one</def>
   <cit type="example">
    <quote>we had a few bevvies last night.</quote>
   </cit>
  </sense>
  <sense n="2">
   <def>a night of drinking.</def>
  </sense>
</hom>

<hom>
  <gramGrp>
   <pos>vb</pos>
  </gramGrp>
  <sense n="3">
   <def>to drink alcohol</def>
  </sense>
</hom>
<etym>probably from <lang>Old French</lang>
  <mentioned>bevee</mentioned>, <mentioned>buvee</mentioned>
  <gloss>drinking</gloss>
</etym>
<re type="derived">
  <form>
   <orth>bevvied</orth>
  </form>
  <gramGrp>
   <pos>adj</pos>
  </gramGrp>
</re>
</entry>
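Validation against a schema generated from these Guidelines should normally catch the restriction on nested re elements stated above; for a quick check outside schema validation, a Python sketch such as the following lists the offending related entries. The function name is hypothetical, un-namespaced markup is assumed, and the sample input is deliberately invalid.

```python
import xml.etree.ElementTree as ET


def nested_related_entries(root):
    """Return the orth of every <re> that contains another <re>."""
    offenders = []
    for rel in root.iter("re"):
        # iter() yields rel itself first, so look only at its descendants
        if any(inner is not rel for inner in rel.iter("re")):
            offenders.append(rel.findtext("form/orth"))
    return offenders


entry = ET.fromstring(
    "<entry><re><form><orth>bevvied</orth></form>"
    "<re><form><orth>bevvying</orth></form></re></re></entry>"
)
print(nested_related_entries(entry))  # ['bevvied']
```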

9.4 Headword and Pronunciation References

Examples, definitions, etymologies, and occasionally other elements such as cross references, orthographic forms, etc., often contain a shortened or iconic reference to the headword, rather than repeating the headword itself. The references may be to the orthographic form or to the pronunciation, to the form given or to a variant of that form. The following elements are used to encode such iconic references to a headword:

oRef/ (orthographic-form reference) in a dictionary example, indicates a reference to the orthographic form(s) of the headword.
type indicates the kind of typographic modification made to the headword in the reference.
pRef/ (pronunciation reference) in a dictionary example, indicates a reference to the pronunciation(s) of the headword.
oVar (orthographic-variant reference) in a dictionary example, indicates a reference to variant orthographic form(s) of the headword.
type indicates the kind of variant involved.
pVar (pronunciation-variant reference) in a dictionary example, indicates a reference to variant pronunciation(s) of the headword.

As members of the class att.ptrLike.form, all these elements share a target attribute, which may optionally be used to resolve any ambiguity about the headword form being referred to.

att.ptrLike.form (form pointers) common attributes for elements in the dictionary base which point at orthographic or pronunciation forms of the headword.
target identifies the orthographic form or pronunciation referred to.

Headword references come in a variety of formats:

~: indicates a reference to the full form of the headword
pref~: gives a prefix to be affixed to the headword
~suf: gives a suffix to be affixed to the headword
A~: gives the first letter in upper case, indicating that the headword is capitalized
pref~suf: gives a prefix and a suffix to be affixed to the headword
a.: gives the initial of the word followed by a full stop, to indicate reference to the full form of the headword
A.: refers to a capitalized form of the headword
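When converting a printed dictionary, these conventions often have to be expanded back into full word forms. The Python sketch below handles the formats just listed for a single headword. The function name and the rule for telling a capitalization marker (A~) from a one-letter prefix are assumptions, and cases such as the colour/color example, where the tilde may stand for more than one form, are not covered.

```python
import re


def expand_headword_reference(token, headword):
    """Expand a printed headword reference such as "~s" or "A~" (sketch).

    Handles ~, pref~, ~suf, pref~suf, A~, a. and A. for a single headword.
    An upper-case letter right before the tilde that matches the headword's
    initial is read as a capitalization marker.
    """
    initial = headword[:1]
    capitalized = initial.upper() + headword[1:]
    # a. / A.: the initial plus a full stop stands for the whole headword
    if token == initial.lower() + ".":
        return headword
    if token == initial.upper() + ".":
        return capitalized
    # A~ (possibly followed by a suffix): the capitalized headword
    match = re.fullmatch(r"([A-Z])~(.*)", token)
    if match and match.group(1) == initial.upper():
        return capitalized + match.group(2)
    # ~, pref~, ~suf, pref~suf: plain substitution for the tilde
    if "~" in token:
        prefix, _, suffix = token.partition("~")
        return prefix + headword + suffix
    return token  # not a headword reference


print(expand_headword_reference("lieutenant-~", "colonel"))  # lieutenant-colonel
print(expand_headword_reference("A~", "academy"))            # Academy
print(expand_headword_reference("~s", "colour"))             # colours
```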

The oRef element should be used for iconic or shortened references to the orthographic form(s) of the headword itself. It is an empty element and replaces, rather than enclosing, the reference. Note that the reference to a headword is not necessarily a simple string replacement. In the example ‘colour1, (US = color) … ~ films; ~ TV; Red, blue and yellow are ~s.’ OALD, the tilde stands for either headword form (colour, color).

Examples:

colonel … army officer above a lieutenant-~. OALD

<def>army officer above a lieutenant-<oRef/>
</def>

academy … The Royal A~ of Arts OALD

<q>The Royal <oRef type="cap"/> of Arts</q>

The following example demonstrates the use of the target attribute to refer to a specific form of the headword:

vag- or vago- comb form … : vagus nerve < vagal > < vagotomy > W7

<entry>
<form>
  <orth xml:id="di-o1">vag-</orth>
  <orth xml:id="di-o2">vago-</orth>
</form>
<def>vagus nerve</def>
<cit type="example">
  <quote>
   <oRef target="#di-o1" type="nohyph"/>al</quote>
  <quote>
   <oRef target="#di-o2" type="nohyph"/>tomy</quote>
</cit>
</entry>
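Once oRef elements are in place, the full text of an example can be regenerated. This Python sketch replaces each oRef with the orth its target points to, falling back to the first orth of the entry. It assumes un-namespaced markup, and its reading of type="nohyph" (drop the trailing hyphen) and type="cap" (capitalize) is inferred from the examples in this section rather than taken from a fixed list of values; the function name is illustrative.

```python
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def render_with_headword(entry, elem):
    """Return the text of elem with every <oRef/> replaced by a headword form."""
    orths = {o.get(XML_ID): "".join(o.itertext()) for o in entry.iter("orth")}
    default = entry.findtext(".//orth")  # first orth of the entry

    def resolve(oref):
        target = oref.get("target", "").lstrip("#")
        form = orths.get(target, default) if target else default
        kind = oref.get("type")
        if kind == "nohyph":      # assumed: drop the combining-form hyphen
            form = form.rstrip("-")
        elif kind == "cap":       # assumed: capitalize the first letter
            form = form[:1].upper() + form[1:]
        return form

    def render(el):
        parts = [el.text or ""]
        for child in el:
            parts.append(resolve(child) if child.tag == "oRef" else render(child))
            parts.append(child.tail or "")
        return "".join(parts)

    return render(elem)


entry = ET.fromstring(
    '<entry><form><orth xml:id="di-o1">vag-</orth>'
    '<orth xml:id="di-o2">vago-</orth></form><cit type="example">'
    '<quote><oRef target="#di-o1" type="nohyph"/>al</quote>'
    '<quote><oRef target="#di-o2" type="nohyph"/>tomy</quote></cit></entry>'
)
print([render_with_headword(entry, q) for q in entry.iter("quote")])
# ['vagal', 'vagotomy']
```

Because render recurses into other elements, an oRef nested in an oVar (as in the take example below) is expanded as well.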

In many cases the reference is not to the orthographic form of the headword, but rather to another form of the headword, usually an inflected form. In these cases, the element oVar should be used; this element takes as its content the string as it appears in the text.

take … < Mr Burton took us for French > NPEG

<cit type="example">
<quote>Mr Burton <oVar type="pt">took</oVar> us for French</quote>
</cit>

take … < was quite ~n with him> NPEG

<cit type="example">
<quote>was quite <oVar type="pp">
<oRef/>n</oVar> with him</quote>
</cit>

The next example shows a discontinuous reference, using the attributes next and prev, which are defined in the additional module for linking, segmentation, and alignment (see chapter 16 Linking, Segmentation, and Alignment) and therefore require that that module be selected in addition to that for dictionaries.

mix up… < it's easy to mix her up with her sister > NPEG

<cit type="example">
<quote>it's easy to <oVar next="#ov2" xml:id="ov1">mix</oVar>
her <oVar prev="#ov1" xml:id="ov2">up</oVar> with her sister</quote>
</cit>
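Software that needs the discontinuous form as a whole has to follow the next pointers. A minimal Python sketch, assuming un-namespaced markup and same-document "#id" pointers; joining the parts with single spaces is an assumption that suits this example, and the function name is hypothetical:

```python
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def join_discontinuous(root):
    """Reassemble parts linked with next/prev into single strings."""
    by_id = {el.get(XML_ID): el for el in root.iter() if el.get(XML_ID)}
    chains = []
    for start in root.iter():
        # a chain starts at an element that points forward but not back
        if not start.get("next") or start.get("prev"):
            continue
        parts, current, seen = [], start, set()
        while current is not None and id(current) not in seen:
            seen.add(id(current))  # stop on a cyclic chain
            parts.append("".join(current.itertext()))
            pointer = current.get("next")
            current = by_id.get(pointer.lstrip("#")) if pointer else None
        chains.append(" ".join(parts))
    return chains


quote = ET.fromstring(
    '<quote>it\'s easy to <oVar next="#ov2" xml:id="ov1">mix</oVar> her '
    '<oVar prev="#ov1" xml:id="ov2">up</oVar> with her sister</quote>'
)
print(join_discontinuous(quote))  # ['mix up']
```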

In addition, some dictionaries make reference to the pronunciation of the headword in the pronunciation of related entries, variants, or examples. The pRef and pVar elements should be used for such references.

hors d'oeuvre /,aw'duhv (Fr O:r dœvr)/ n, pl hors d'oeuvres also hors d'oeuvre /'duhv(z) (Fr ~)/ NPEG

<form>
<orth>hors d'oeuvre</orth>
<pron>%aU"dUv</pron>
<form>
  <usg type="lang">Fr</usg>
  <pron xml:id="di-p2">OR d0vR</pron>
</form>
</form>
<form type="infl">
<number>pl</number>
<orth>hors d'oeuvres</orth>
<orth>hors d'oeuvre</orth>
<pron extent="part">"dUv(z)</pron>
<form>
  <usg type="lang">Fr</usg>
  <pron>
   <pRef target="#di-p2"/>
  </pron>
</form>
</form>

Because headword and pronunciation references can occur virtually anywhere in an entry, the oRef, oVar, pRef, and pVar elements can appear within any other element defined for dictionary entries.

Since existing printed dictionaries use different conventions for headword references (swung dash, first letter abbreviated form, capitalization, or italicization of the word, etc.), the exact method used should be documented in the header.

9.5 Typographic and Lexical Information in Dictionary Data

Among the many possible views of dictionaries, it is useful to distinguish at least the following three, which help to clarify some issues raised with particular urgency by dictionaries, on account of the complexity of both their typography and their information structure.

(a) the typographic view: the two-dimensional printed page, including information about line and page breaks and other features of layout
(b) the editorial view: the one-dimensional sequence of tokens which can be seen as the input to the typesetting process; the wording and punctuation of the text and the sequencing of items are visible in this view, but specifics of the typographic realization are not
(c) the lexical view: this view includes the underlying information represented in a dictionary, without concern for its exact textual form

For example, a domain indication in a dictionary entry might be broken over a line and therefore hyphenated (‘naut-’ ‘ical’); the typographic view of the dictionary preserves this information. In a purely editorial view, the particular form in which the domain name is given in the particular dictionary (as ‘nautical’, rather than ‘naut.’, ‘Naut.’, etc.) would be preserved, but the fact of the line break would not. Font shifts might plausibly be included in either a strictly typographic or an editorial view. In the lexical view, the only information preserved concerning domain would be some standard symbol or string representing the nautical domain (e.g. ‘naut.’) regardless of the form in which it appears in the printed dictionary.
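Moving from the typographic or editorial view to the lexical view thus amounts to normalization. The Python sketch below illustrates the idea for the domain example only; the mapping table DOMAIN_SYMBOLS is hypothetical, and representing a printed line break as a hyphen followed by a newline is an assumption, not a general solution.

```python
# Hypothetical table: every printed form of a domain label maps to one
# standard symbol. A real project would build it from the dictionary's
# own list of abbreviations.
DOMAIN_SYMBOLS = {
    "naut": "naut.",
    "nautical": "naut.",
}


def lexical_domain(printed):
    """Map a printed domain label to its lexical-view symbol, or None.

    Assumes a line break in the typographic view is recorded as a hyphen
    followed by a newline; capitalization and a final full stop, both
    editorial choices, are discarded before the lookup.
    """
    key = printed.replace("-\n", "").strip().lower().rstrip(".")
    return DOMAIN_SYMBOLS.get(key)


for form in ["naut-\nical", "nautical", "Naut.", "naut."]:
    print(repr(form), "->", lexical_domain(form))
```

All four printed variants map to the same symbol, which is exactly the information the lexical view keeps; which variant appeared, and where the line broke, is lost.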

In practice, publishers begin with the lexical view (i.e., lexical data as it might appear in a database) and generate first the editorial view, which reflects editorial choices for a particular dictionary (such as the use of the abbreviation ‘Naut.’ for ‘nautical’, the fonts in which different types of information are to be rendered, etc.), and then the typographic view, which is tied to a specific printed rendering. Computational linguists and philologists often begin with the typographic view and analyse it to obtain the editorial and/or lexical views. Some users may ultimately be concerned with retaining only the lexical view, or they may wish to preserve the typographic or editorial views as a reference text, perhaps as a guard against the loss or misinterpretation of information in the translation process. Some researchers may wish to retain all three views, and study their interrelations, since research questions may well span all three views.

In general, an electronic encoding of a text will allow the recovery of at least one view of that text (the one which guided the encoding); if editorial and typographic practices are consistently applied in the production of a printed dictionary, or if exceptions to the rules are consistently recorded in the electronic encoding, then it is in principle possible to recover the editorial view from an encoding of the lexical view, and the typographic view from an encoding of the editorial view. In practice, of course, the severe compression of information in dictionaries, the variety of methods by which this compression is achieved, the complexity of formulating completely explicit rules for editorial and typographic practice, and the relative rarity of complete consistency in the application of such rules, all make the mechanical transformation of information from one view into another something of a vexed question.

This section describes some principles which may be useful in capturing one or the other of these views as consistently and completely as possible, and describes some methods of attempting to capture more than one view in a single encoding. Only the editorial and lexical views are explicitly treated here; for methods of recording the physical or typographic details of a text, see chapter 11 Representation of Primary Sources. Other approaches to these problems, such as the use of repetitive encoding and links to show their correspondences, or the use of feature structures to capture the information structure, and of the ana and inst attributes to link feature structures to a transcription of the editorial view of a dictionary, are not discussed here (for feature structures, see chapter 18 Feature Structures; for linkage of textual form and underlying information, see chapter 17 Simple Analytic Mechanisms).


9.5.1 Editorial View

Common practice in encoding texts of all sorts relies on principles such as the following, which can be used successfully to capture the editorial view when encoding a dictionary:

All characters of the source text should be retained, with the possible exception of rendition text (for which see further below).
Characters appearing in the source text should typically be given as character data content in the document, rather than as the value of an attribute; again, rendition text may optionally be excepted from this rule.
Apart from the characters or graphics in the source text, nothing else should appear as content in the document, although it may be given in attribute values.
The material in the source text should appear in the encoding in the same order. Complications of the character sequence by footnotes, marginal notes, etc., text wrapping around illustrations, etc., may be dealt with by the usual means (for notes, see section 3.8 Notes, Annotation, and Indexing).³¹

In a very conservative transcription of the editorial view of a text, rendition characters (e.g. the commas, parentheses, etc., used in dictionary entries to signal boundaries among parts of the entry) and rendition text (for example, conjunctions joining alternate headwords, etc.) are typically retained. Removing the tags from such a transcription will leave all and only the characters of the source text, in their original sequence.³²
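This property can be tested directly: concatenating the character data of a conservative transcription should give back the printed text. In this Python sketch (illustrative function name, un-namespaced markup assumed), runs of whitespace are collapsed so that indentation added to the XML for readability does not count as source text; that normalization is an assumption and would also hide genuine spacing differences.

```python
import xml.etree.ElementTree as ET


def source_characters(elem):
    """Return the character data of elem with all tags removed.

    Runs of whitespace are collapsed to one space, so indentation added to
    the XML for readability is ignored.
    """
    return " ".join("".join(elem.itertext()).split())


form = ET.fromstring(
    '<form type="infl"><number>pl.</number>\n  <form>'
    '<orth extent="part">-nae</orth> <pron extent="part">(-ni:)</pron>'
    '</form> or <orth extent="part">-nas</orth></form>'
)
print(source_characters(form))  # pl. -nae (-ni:) or -nas
```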

Consider, for example, the following entry:

pinna ('pIn@) n., pl. -nae (-ni:) or -nas. 1. any leaflet of a pinnate compound leaf. 2. Zoology. a feather, wing, fin, or similarly shaped part. 3. another name for auricle (sense 2). [C18: via New Latin from Latin: wing, feather, fin] CED

A conservative encoding of the editorial view of this entry, which retains all rendition text, might resemble the following:

<entry>
<form>
  <orth>pinna</orth>
  <pron>("pIn@)</pron>
</form>
<gramGrp>
  <pos>n.</pos>, </gramGrp>
<form type="infl">
  <number>pl.</number>
  <form>
   <orth type="lat" extent="part">-nae</orth>
   <pron extent="part">(-ni:)</pron>
  </form> or <orth type="std" extent="part">-nas</orth>.
</form>
<sense n="1">1. <def>any leaflet of a pinnate compound leaf.</def>
</sense>
<sense n="2">2. <usg type="dom">Zoology</usg>.
  <def>a feather, wing, fin, or similarly shaped part.</def>
</sense>
<sense n="3">3. <xr type="syn">
   <lbl>another name for</lbl>
   <ref target="#auricle.2">auricle (sense 2).</ref>
  </xr>
</sense>
<etym>[<date>C18</date>: via <lang>New Latin</lang> from <lang>Latin</lang>:
<gloss>wing</gloss>, <gloss>feather</gloss>,
<gloss>fin</gloss>]</etym>
</entry>
<entry xml:id="auricle.2">
<!-- ... -->
</entry>
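The defining property of a conservative transcription, that removing the tags leaves all and only the source characters, can be checked mechanically. A minimal sketch in Python using the standard library's xml.etree.ElementTree follows; the function name strip_tags is hypothetical, and collapsing runs of whitespace is an assumption, since the line breaks and indentation of the XML file are not part of the source text. It is applied here to the etymology of the entry above.

```python
import re
import xml.etree.ElementTree as ET


def strip_tags(xml_fragment: str) -> str:
    """Return the character content of an XML fragment with every tag removed.

    Runs of whitespace (including the line breaks and indentation used to lay
    out the XML file) are collapsed to a single space.
    """
    root = ET.fromstring(xml_fragment)
    text = "".join(root.itertext())  # element text and tails, in document order
    return re.sub(r"\s+", " ", text).strip()


# The etymology from the conservative encoding above, rendition text included.
etym = """<etym>[<date>C18</date>: via <lang>New Latin</lang> from <lang>Latin</lang>:
<gloss>wing</gloss>, <gloss>feather</gloss>,
<gloss>fin</gloss>]</etym>"""

print(strip_tags(etym))  # [C18: via New Latin from Latin: wing, feather, fin]
```

Applied to a whole entry and compared with a plain transcription of the printed text, the same function gives a quick check that no rendition character was dropped during encoding.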

A somewhat simplified encoding of the editorial view of this entry might exploit the fact that rendition text is often systematically recoverable. For example, parentheses consistently appear around pronunciations in this dictionary, and thus are effectively implied by the start- and end-tags for pron.³³ In such an encoding, removing the tags should exactly reproduce the sequence of characters in the source, minus rendition text. The original character sequence can be recovered fully by replacing tags with any rendition text they imply.

Encoding in this way, the example given above might resemble the following. The tagUsage element in the header would be used to record the following patterns of rendition text:

parentheses appear around pron elements
commas appear before inflected forms
the word ‘or’ appears before alternate forms
brackets appear around the etymology
full stops appear after pos, inflection information, and sense numbers
senses are numbered in sequence unless otherwise specified using the global n attribute

<entry>
<form>
  <orth>pinna</orth>
  <pron>"pIn@</pron>
</form>
<gramGrp>
  <pos>n</pos>
</gramGrp>
<form type="infl">
  <number>pl</number>
  <form>
   <orth type="lat" extent="part">-nae</orth>
   <pron extent="part">-ni:</pron>
  </form>
  <orth type="std" extent="part">-nas</orth>
</form>
<sense n="1">
  <def>any leaflet of a pinnate compound leaf.</def>
</sense>
<sense n="2">
  <usg type="dom">Zoology</usg>
  <def>a feather, wing, fin, or similarly shaped part.</def>
</sense>
<sense n="3">
  <xr type="syn">
   <lbl>another name for</lbl>
   <ref>auricle (sense 2).</ref>
  </xr>
</sense>
<etym>
  <date>C18</date>: via <lang>New Latin</lang> from <lang>Latin</lang>:
<gloss>wing</gloss>, <gloss>feather</gloss>, <gloss>fin</gloss>
</etym>
</entry>
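The rules recorded in tagUsage are prose, but a processor that follows them can restore the omitted rendition text. The sketch below, in Python with xml.etree.ElementTree, illustrates the idea under stated assumptions rather than offering a general implementation: the rule table RENDITION and the functions render and regenerate are hypothetical names, and only some of the patterns listed above are covered (parentheses around pron, brackets around etym, full stops after pos, and sense numbering). The comma before inflected forms and the ‘or’ before alternate forms are left out.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical rule table standing in for what tagUsage documents in prose:
# element name -> (rendition text before, rendition text after).
RENDITION = {
    "pron": ("(", ")"),  # parentheses appear around pron elements
    "etym": ("[", "]"),  # brackets appear around the etymology
    "pos": ("", "."),    # full stops appear after pos
}


def render(elem: ET.Element, sense_count: list) -> str:
    """Serialize elem, re-inserting the rendition text its tags imply."""
    before, after = RENDITION.get(elem.tag, ("", ""))
    if elem.tag == "sense":
        # Senses are numbered in sequence unless @n says otherwise.
        sense_count[0] += 1
        before = f"{elem.get('n', sense_count[0])}. "
    parts = [before, elem.text or ""]
    for child in elem:
        parts.append(render(child, sense_count))
        parts.append(child.tail or "")
    parts.append(after)
    return "".join(parts)


def regenerate(xml_fragment: str) -> str:
    text = render(ET.fromstring(xml_fragment), [0])
    return re.sub(r"\s+", " ", text).strip()  # drop XML layout whitespace


entry = """<entry>
<form><orth>pinna</orth> <pron>"pIn@</pron></form>
<gramGrp><pos>n</pos></gramGrp>
<sense n="1"><def>any leaflet of a pinnate compound leaf.</def></sense>
<etym><date>C18</date>: via <lang>New Latin</lang> from <lang>Latin</lang>:
<gloss>wing</gloss>, <gloss>feather</gloss>, <gloss>fin</gloss></etym>
</entry>"""

# Prints: pinna ("pIn@) n. 1. any leaflet of a pinnate compound leaf. [C18: via New Latin from Latin: wing, feather, fin]
print(regenerate(entry))
```

Covering the remaining patterns would need either more table entries or, for context-dependent text such as ‘or’, logic that inspects an element's neighbours.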

When rendition text is omitted, it is recommended that the means to regenerate it be fully documented, using the tagUsage element of the TEI header.

If rendition text is used systematically in a dictionary, with only a few mistakes or exceptions, the global attribute rend may be used on any tag to flag exceptions to the normal treatment. The values of the rend attribute are not prescribed, but it can be used with values such as no-comma, no-left-paren, etc. Specific values can be documented using the rendition element in the TEI header.

In the following (imaginary) example, no left parenthesis precedes the pronunciation:

biryani or biriani %bIrI"A:nI) any of a variety of Indian dishes … [from Urdu]

This irregularity can be recorded thus:

<entry>
<form>
  <orth>biryani</orth>
  <orth>biriani</orth>
  <pron rend="noleftparen">%bIrI"A:nI</pron>
</form>
<def>any of a variety of Indian dishes … </def>
<etym>from <lang>Urdu</lang>
</etym>
</entry>
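A processor that regenerates rendition text must consult rend before supplying the default punctuation. In this Python sketch, rend is assumed to hold whitespace-separated flags (the Guidelines leave its values open); noleftparen is the value used in the example above, while norightparen and the function name render_pron are hypothetical.

```python
import xml.etree.ElementTree as ET


def render_pron(pron: ET.Element) -> str:
    """Wrap a pron element in its implied parentheses, honoring rend flags.

    rend is treated as a whitespace-separated list of exception flags;
    "noleftparen" is the value used in the example above, and
    "norightparen" is a hypothetical counterpart.
    """
    flags = set((pron.get("rend") or "").split())
    left = "" if "noleftparen" in flags else "("
    right = "" if "norightparen" in flags else ")"
    return left + "".join(pron.itertext()) + right


regular = ET.fromstring('<pron>"pIn@</pron>')
irregular = ET.fromstring('<pron rend="noleftparen">%bIrI"A:nI</pron>')
print(render_pron(regular))    # ("pIn@)
print(render_pron(irregular))  # %bIrI"A:nI)
```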

9.5.2 Lexical View

If the text to be interchanged retains only the lexical view of the text, there may be no concern for the recoverability of the editorial (not to speak of the typographic) view of the text. However, it is strongly recommended that the TEI header be used to document fully the nature of all alterations to the original data, such as normalization of domain labels, expansion of inflected forms, etc.

In an encoding of the lexical view of a text, there are degrees of departure from the original data: normalizing inconsistent forms like ‘nautical’, ‘naut.’, ‘Naut.’, etc., to ‘nautical’ is a relatively slight alteration; expansion of ‘delay -ed -ing’ to ‘delay, delayed, delaying’ is a more substantial departure. Still more severe is rearranging the order of information in entries; for example:

reorganizing the order of elements in an entry to show their relationship,as in
clem (klEm) or clam vb. clems, clemming, clemmed or clams, clamming, clammed CED
where in a strictly lexical view one might wish to group ‘clem’ and ‘clam’ with their respective inflected forms.
splitting an entry into two separate entries, as in
celi.bacy /"selIb@sI/ n [U] state of living unmarried, esp as a religious obligation. celi.bate /"selIb@t/ n [C] unmarried person (esp a priest who has taken a vow not to marry). OALD
For some purposes, this entry might usefully be split into an entry for ‘celibacy’ and a separate entry for ‘celibate’.
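The two lighter kinds of departure described above, normalizing label variants and expanding suffix abbreviations, are mechanical enough to sketch; rearranging or splitting entries, as in the clem and celibacy examples, generally needs editorial judgment and is not attempted here. In this Python sketch the table DOMAIN_LABELS and both function names are hypothetical.

```python
# Hypothetical normalization table for one dictionary's domain labels:
# variants are reduced to a lookup key (case and final full stop dropped).
DOMAIN_LABELS = {"naut": "nautical", "nautical": "nautical"}


def normalize_label(label: str) -> str:
    key = label.strip().rstrip(".").lower()
    return DOMAIN_LABELS.get(key, label)  # unknown labels are left unchanged


def expand_suffixes(headword: str, suffixes: list) -> list:
    """Expand abbreviations such as '-ed' by appending them to the headword.

    Plain concatenation only works when the stem is unchanged (delay ->
    delayed); spelling changes such as consonant doubling need extra rules.
    """
    forms = [headword]
    for suffix in suffixes:
        if suffix.startswith("-"):
            forms.append(headword + suffix[1:])
        else:
            forms.append(suffix)  # already a full form
    return forms


print([normalize_label(label) for label in ["nautical", "naut", "naut.", "Naut."]])
print(expand_suffixes("delay", ["-ed", "-ing"]))  # ['delay', 'delayed', 'delaying']
```

Whatever rules are applied in this way are exactly the alterations the TEI header should document.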

An encoding which captures the lexical view of the example given in the previous section might look something like the following. In this encoding:

abbreviated forms have been silently expanded
some forms have been moved to allow related forms to be grouped together
the part of speech information has been moved to allow all forms to be given together
the cross reference to ‘auricle’ has been simplified

<entry>
<form>
  <orth>pinna</orth>
  <pron>"pIn@</pron>
  <form type="infl">
   <number>pl</number>
   <form>
    <orth type="lat">pinnae</orth>
    <pron>'pIni:</pron>
   </form>
   <orth type="std">pinnas</orth>
  </form>
</form>
<gramGrp>
  <pos>n</pos>
</gramGrp>
<sense n="1">
  <def>any leaflet of a pinnate compound leaf.</def>
</sense>
<sense n="2">
  <usg type="dom">Zoology</usg>
  <def>a feather, wing, fin, or similarly shaped part.</def>
</sense>
<sense n="3">
  <xr type="syn">
   <ptr target="#auricle.2"/>
  </xr>
</sense>
<etym>
  <date>C18</date>: via <lang>New Latin</lang> from <lang>Latin</lang>:
<gloss>wing</gloss>, <gloss>feather</gloss>, <gloss>fin</gloss>
</etym>
</entry>


9.5.3 Retaining Both Views

It is sometimes desirable to retain both the lexical and the editorial view, in which case a potential conflict exists between the two. When there is a conflict between the encodings for the lexical and editorial views, the principles described in the following sections may be applied.

9.5.3.1 Using Attribute Values to Capture Alternate Views

If the order of the data is the same in both views, then both views may be captured by encoding one ‘dominant’ view in the character data content of the document, and encoding the other using attribute values on the appropriate elements. If all tags were removed, the remaining characters would be those of the dominant view of the text.

The attribute class att.lexicographic is used to provide attributes for use in encoding multiple views of the same dictionary entry. These attributes are available for use on all elements defined in this chapter when the base module for dictionaries is selected.

When the editorial view is dominant, the following attributes may be used to capture the lexical view:

att.lexicographic defines a set of global attributes available on elements in the base tag set for dictionaries.
norm (normalized) gives a normalized form of information given by the source text in a non-normalized form
split gives the list of split values for a merged form

When the lexical view is dominant, the following attributes may be used to record the editorial view:

att.lexicographic defines a set of global attributes available on elements in the base tag set for dictionaries.

orig (original) gives the original string or is the empty string when the element does not appear in the source text.
mergedIn gives a reference to another element, where the original appears as a merged form.

One attribute is useful in either view:

att.lexicographic defines a set of global attributes available on elements in the base tag set for dictionaries.
opt (optional) indicates whether the element is optional or not

For example, if the source text had the domain label ‘naut.’, it might be encoded as follows. With the editorial view dominant:

<usg norm="nautical" type="dom">naut.</usg>

The lexical view of the same label would transcribe the normalized form as content of the usg element, and the typographic form as an attribute value:

<usg orig="naut." type="dom">nautical</usg>

If the source text gives inflectional information for the verb delay as ‘delay, -ed, -ing’, it might usefully be expanded to ‘delay, delayed, delaying’. An encoding of the editorial view might take this form:

<form>
<orth>delay</orth>
<form type="infl">
  <orth norm="delayed" extent="part">-ed</orth>
  <tns norm="pst,pstp"/>
</form>
<form type="infl">
  <orth norm="delaying" extent="part">-ing</orth>
  <tns norm="prsp"/>
</form>
</form>

Note the use of the tns element with empty content, which enables the representation of implicit information even though it has no realization in print.
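With the editorial view dominant, the lexical view is recovered by preferring each element's norm value to its content. A minimal Python sketch applied to the delay example above; the function names lexical_value and inflections are hypothetical, and reading the tns value ‘pst,pstp’ as a comma-separated list is an assumption about that encoding.

```python
import xml.etree.ElementTree as ET


def lexical_value(elem: ET.Element) -> str:
    """Prefer the normalized value in @norm; fall back to the transcribed text."""
    norm = elem.get("norm")
    return norm if norm is not None else "".join(elem.itertext())


def inflections(form_xml: str) -> list:
    """Return (full form, tense codes) pairs from an editorial-view encoding.

    Splitting tns/@norm on commas follows the "pst,pstp" value above; it is
    an assumption about how that value is meant to be read.
    """
    form = ET.fromstring(form_xml)
    pairs = []
    for infl in form.findall("form[@type='infl']"):
        orth = lexical_value(infl.find("orth"))
        tenses = [code for tns in infl.findall("tns")
                  for code in lexical_value(tns).split(",")]
        pairs.append((orth, tenses))
    return pairs


editorial = """<form>
 <orth>delay</orth>
 <form type="infl">
  <orth norm="delayed" extent="part">-ed</orth>
  <tns norm="pst,pstp"/>
 </form>
 <form type="infl">
  <orth norm="delaying" extent="part">-ing</orth>
  <tns norm="prsp"/>
 </form>
</form>"""

print(inflections(editorial))  # [('delayed', ['pst', 'pstp']), ('delaying', ['prsp'])]
```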

The lexical view might be encoded thus:

<form>
<orth>delay</orth>
<form type="infl">
  <orth orig="-ed">delayed</orth>
  <tns orig="">pst</tns>
  <tns orig="">pstp</tns>
</form>
<form type="infl">
  <orth orig="-ing">delaying</orth>
  <tns orig="">prsp</tns>
</form>
</form>
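Conversely, with the lexical view dominant, the editorial strings come from orig where it is present, and orig="" marks material that has no counterpart in the source. A minimal Python sketch applied to the encoding above; the function name editorial_strings is hypothetical, and it assumes that only leaf elements (no mixed content) carry the data.

```python
import xml.etree.ElementTree as ET


def editorial_strings(form_xml: str) -> list:
    """Recover the source strings from a lexical-view encoding.

    Each leaf element contributes @orig when present, otherwise its text;
    orig="" marks material with no realization in the source, so it is skipped.
    """
    form = ET.fromstring(form_xml)
    strings = []
    for elem in form.iter():
        if len(elem):  # grouping element: its leaves are visited separately
            continue
        orig = elem.get("orig")
        text = orig if orig is not None else (elem.text or "")
        if text:
            strings.append(text)
    return strings


lexical = """<form>
 <orth>delay</orth>
 <form type="infl">
  <orth orig="-ed">delayed</orth>
  <tns orig="">pst</tns>
  <tns orig="">pstp</tns>
 </form>
 <form type="infl">
  <orth orig="-ing">delaying</orth>
  <tns orig="">prsp</tns>
 </form>
</form>"""

print(editorial_strings(lexical))  # ['delay', '-ed', '-ing']
```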

A particular problem may be posed by the common practice of presenting two alternate forms of a word in a single string, by marking some parts of the word as optional in some forms. The following entry is for a word which can be spelled either ‘thyrostimuline’ or ‘thyréostimuline’:

thyr(é)ostimuline [tiR(e)ostimylin] …

With the editorial view dominant, this entry might begin thus:

<form>
<orth split="thyrostimuline, thyréostimuline">thyr(é)ostimuline</orth>
<pron split="tiRostimylin, tiReostimylin">tiR(e)ostimylin</pron>
</form>
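The split values above can be derived mechanically from the compressed string when parentheses mark optional letters. A Python sketch, assuming parentheses are used only for that purpose in orth and pron content (in a conservative transcription they may also wrap whole pronunciations, which would break this assumption); expand_optional is a hypothetical name.

```python
import itertools
import re


def expand_optional(s: str) -> list:
    """Expand each parenthesized segment into its absent and present readings."""
    # With a capturing group, re.split alternates fixed and optional text:
    # [fixed, optional, fixed, ..., fixed]
    pieces = re.split(r"\(([^()]*)\)", s)
    choices = [[p] if i % 2 == 0 else ["", p] for i, p in enumerate(pieces)]
    return ["".join(combo) for combo in itertools.product(*choices)]


print(", ".join(expand_optional("thyr(é)ostimuline")))  # thyrostimuline, thyréostimuline
print(", ".join(expand_optional("tiR(e)ostimylin")))    # tiRostimylin, tiReostimylin
```

Joining the readings with ", " reproduces the split values used in the encoding above.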

With the lexical view dominant, however, two orth and two pron elements would be encoded, in order to disentangle the two forms; the orig attribute would be used to record the typographic presentation of the information in the source.

<form>
<orth xml:id="dic-o1" orig="thyr(é)ostimuline">thyrostimuline</orth>
<pron xml:id="dic-p1" orig="tiR(e)ostimylin">tiRostimylin</pron>
</form>
<form>
<orth mergedIn="#dic-o1">thyréostimuline</orth>
<pron mergedIn="#dic-p1">tiReostimylin</pron>
</form>
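With the lexical view dominant, recovering the source string for the second form means following its mergedIn pointer to the element whose orig holds the merged string. A minimal Python sketch; original_string is a hypothetical name, and pointers are assumed to be local references of the form #id.

```python
import xml.etree.ElementTree as ET

# ElementTree stores xml:id under the XML namespace URI.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def original_string(root: ET.Element, elem: ET.Element) -> str:
    """Return the source string behind elem, following mergedIn if present."""
    ids = {e.get(XML_ID): e for e in root.iter() if e.get(XML_ID) is not None}
    target = elem.get("mergedIn")
    if target is not None:
        elem = ids[target.lstrip("#")]  # the element whose orig holds the merged form
    orig = elem.get("orig")
    return orig if orig is not None else "".join(elem.itertext())


entry = ET.fromstring("""<entry>
<form>
 <orth xml:id="dic-o1" orig="thyr(é)ostimuline">thyrostimuline</orth>
 <pron xml:id="dic-p1" orig="tiR(e)ostimylin">tiRostimylin</pron>
</form>
<form>
 <orth mergedIn="#dic-o1">thyréostimuline</orth>
 <pron mergedIn="#dic-p1">tiReostimylin</pron>
</form>
</entry>""")

for orth in entry.iter("orth"):
    print(orth.text, "->", original_string(entry, orth))
```

Both orth elements map back to the single printed string thyr(é)ostimuline.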

This example might also be encoded using the opt attribute combined with the attributes next and prev defined in chapter 16 Linking, Segmentation, and Alignment.

<form>
<orth next="#dict-o2" xml:id="dict-o1">thyr</orth>
<orth
   next="#dict-o3"
   prev="#dict-o1"
   xml:id="dict-o2"
   opt="true">é</orth>
<orth prev="#dict-o2" xml:id="dict-o3">ostimuline</orth>
<pron next="#dict-p2" xml:id="dict-p1">tiR</pron>
<pron
   next="#dict-p3"
   prev="#dict-p1"
   xml:id="dict-p2"
   opt="true">e</pron>
<pron prev="#dict-p2" xml:id="dict-p3">ostimylin</pron>
</form>

Note that this transcription preserves both the lexical and editorial views in a single encoding. However, it has the disadvantage that the strings corresponding to entire words do not appear in the encoding uninterrupted, and therefore complex processing is required to retrieve them from the encoded text. The use of the opt attribute is nevertheless recommended when long spans of text are involved, or when the optional part contains embedded tags.

For example, the following gives two definitions in one text: ‘picture drawn with coloured chalk made into crayons’, and ‘coloured chalk made into crayons’:

pas.tel /"pastl US: pa"stel/ n 1 (picture drawn with) coloured chalk made into crayons. 2 … OALD

A simple encoding solution would be to leave the definition text unanalysed, but this might be felt inadequate since it does not show that there are two definitions. A possible alternative encoding would be:

<sense n="1">
<def>coloured chalk made into crayons</def>
<def>picture drawn with coloured chalk made into crayons</def>
</sense>

This transcribes some characters of the source text twice, however, which deviates from the usual practice. The following encoding records both the editorial and lexical views:

<sense n="1">
<def next="#d2" xml:id="d1" opt="true">picture drawn with</def>
<def prev="#d1" xml:id="d2">coloured chalk made into crayons</def>
</sense>
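As noted above, the next/prev encoding leaves no whole word or definition uninterrupted, so a processor must follow the chains to reassemble them, producing one reading with each opt="true" part and one without. A Python sketch of that processing, applied to the orth chain of the thyr(é)ostimuline example and to the pastel definitions; the function names chains and readings are hypothetical, and the sketch assumes that each chain's members are siblings and that pointers are local #id references.

```python
import itertools
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def chains(parent: ET.Element, tag: str):
    """Yield each next/prev chain among parent's <tag> children, in order."""
    children = parent.findall(tag)
    by_id = {e.get(XML_ID): e for e in children}
    for start in children:
        if start.get("prev") is None:  # nothing points back: a chain starts here
            chain = [start]
            while chain[-1].get("next") is not None:
                chain.append(by_id[chain[-1].get("next").lstrip("#")])
            yield chain


def readings(chain: list, sep: str = "") -> list:
    """Every string the chain encodes, with each opt="true" part dropped or kept."""
    choices = [["", "".join(e.itertext())] if e.get("opt") == "true"
               else ["".join(e.itertext())] for e in chain]
    return [sep.join(p for p in combo if p) for combo in itertools.product(*choices)]


form = ET.fromstring("""<form>
 <orth next="#dict-o2" xml:id="dict-o1">thyr</orth>
 <orth next="#dict-o3" prev="#dict-o1" xml:id="dict-o2" opt="true">é</orth>
 <orth prev="#dict-o2" xml:id="dict-o3">ostimuline</orth>
</form>""")

sense = ET.fromstring("""<sense n="1">
 <def next="#d2" xml:id="d1" opt="true">picture drawn with</def>
 <def prev="#d1" xml:id="d2">coloured chalk made into crayons</def>
</sense>""")

for chain in chains(form, "orth"):
    print(readings(chain))           # ['thyrostimuline', 'thyréostimuline']
for chain in chains(sense, "def"):
    print(readings(chain, sep=" "))  # the short and the long definition
```

Word-internal pieces are joined with no separator, while phrase-level pieces such as definitions are joined with a space; a processor would need to know which applies to each element type.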

9.5.3.2 Recording Original Locations of Transposed Elements

The attributes described in the previous section are useful only when the order of material is the same in both the editorial and the lexical view. When the two views impose different orders on the data, the standard linking mechanisms may be used to show the original location of material transposed in an encoding of the lexical view.

If the original is only slightly modified, the anchor element may be used to mark the original location of the material, and the location attribute may be used on the lexical encoding of that material to indicate its original location(s). Like those in the preceding section, this attribute is defined for the attribute class att.lexicographic:

att.lexicographic defines a set of global attributes available on elements in the base tag set for dictionaries.
location indicates an anchor element typically elsewhere in the document, but possibly in another document, which is the original location of this component.

For example:

pinna ("pIn@) n., pl. -nae (-ni:) or -nas. CED

<form>
<orth>pinna</orth>
<pron>'pIn@</pron>
<anchor xml:id="p01"/>
<form type="infl">
  <number>pl</number>
  <form>
   <orth extent="part">-nae</orth>
   <pron extent="part">-ni:</pron>
  </form>
  <orth extent="part">-nas</orth>
</form>
</form>
<gramGrp>
<pos location="#p01">n</pos>
</gramGrp>
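To rebuild the editorial order from an encoding like this, a processor emits each element carrying location at the anchor it points to, and skips it where it stands in the lexical encoding. A Python sketch; editorial_order is a hypothetical name, only the text of leaf elements is collected, and the two top-level elements of the example are wrapped in an entry element so that they can be parsed together.

```python
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def editorial_order(root: ET.Element) -> list:
    """List leaf-element texts in source order, honoring @location.

    An element carrying location="#x" is emitted where <anchor xml:id="x"/>
    stands, and skipped at its position in the lexical encoding.
    """
    moved = {}
    for elem in root.iter():
        loc = elem.get("location")
        if loc is not None:
            moved.setdefault(loc.lstrip("#"), []).append(elem)
    skipped = {id(e) for group in moved.values() for e in group}
    out = []

    def walk(elem):
        if id(elem) in skipped:
            return
        if elem.tag == "anchor":
            for m in moved.get(elem.get(XML_ID), []):
                out.extend(t.strip() for t in m.itertext() if t.strip())
            return
        if len(elem) == 0 and elem.text and elem.text.strip():
            out.append(elem.text.strip())
        for child in elem:
            walk(child)

    walk(root)
    return out


entry = ET.fromstring("""<entry>
<form>
 <orth>pinna</orth>
 <pron>'pIn@</pron>
 <anchor xml:id="p01"/>
 <form type="infl">
  <number>pl</number>
  <form>
   <orth extent="part">-nae</orth>
   <pron extent="part">-ni:</pron>
  </form>
  <orth extent="part">-nas</orth>
 </form>
</form>
<gramGrp>
 <pos location="#p01">n</pos>
</gramGrp>
</entry>""")

print(editorial_order(entry))  # ['pinna', "'pIn@", 'n', 'pl', '-nae', '-ni:', '-nas']
```

The part of speech reappears between the pronunciation and the plural, as in the printed entry.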

9.6 Unstructured Entries

The content model for the entry element provides an entry structure suitable for many average dictionaries, as well as many regular entries in more exotic dictionaries. However, the structure of some dictionaries does not allow the restrictions imposed by the content model for entry. To handle these cases, the entryFree and dictScrap elements are provided to support much wider variation in entry structure. The dictScrap element offers less freedom, in that it can only contain phrase-level elements, but it can itself appear at any point within a dictionary entry where any of the structural components of a dictionary entry are permitted. As such, it acts as a container for otherwise anomalous parts of an entry.

The entryFree element places no constraints at all upon the entry: any element defined in this chapter, as well as all the normal phrase-level and inter-level elements, can appear anywhere within it. With the entryFree element, the encoder is free to use any element anywhere, as well as to use or omit grouping elements such as form, gramGrp, etc.

The entryFree element allows the encoding of entries which violate the structure specified for the entry element. For example, in the following entry from a dictionary already in electronic form, it is necessary to include a pron element within a def. This is not permitted in the content model for entry, but it poses no problem in the entryFree element.

<ent h="demigod">
 <hwd>demi|god</hwd>
 <pr>
  <ph>"demIgQd</ph>
 </pr>
 <hps ps="n">
  <hsn>
   <def>one who is partly divine and partly human</def>
   <def>(in Gk myth, etc) the son of a god and a mortal woman, eg <cf>Hercules</cf>
    <pr>
     <ph>"h3:kjUli:z</ph>
    </pr>
   </def>
  </hsn>
 </hps>
</ent>
OALD

<entryFree>
<form>
  <orth>demigod</orth>
  <hyph>demi|god</hyph>
  <pron>"demIgQd</pron>
</form>
<gramGrp>
  <pos>n</pos>
</gramGrp>
<def>one who is partly divine and partly human</def>
<def>(in Gk myth, etc) the son of a god and a mortal woman, eg
<mentioned>Hercules</mentioned>
  <pron>"h3:kjUli:z</pron>
</def>
</entryFree>

The entryFree element also makes it possible to transcribe a dictionary using only phrase-level (‘atomic’) elements, that is, using no grouping elements at all. This can be desirable if the encoder wants a completely ‘flat’ view, with no indication of or commitment to the association of one element with another. The following encoding uses no grouping elements, and keeps all rendition text:

biryani or biriani (%bIrI"A:nI) any of a variety of Indian dishes … [from Urdu] CED

<entryFree>
<orth>biryani</orth> or <orth>biriani</orth>
<pron>(%bIrI"A:nI)</pron>
<def>any of a variety of Indian dishes …</def>
<etym>[from <lang>Urdu</lang>]</etym>
</entryFree>

Here is an alternative way of representing the same structure, this time using dictScrap:

<entry>
<dictScrap>
  <orth>biryani</orth> or <orth>biriani</orth>
  <pron>(%bIrI"A:nI)</pron>
  <def>any of a variety of Indian dishes …</def>
  <etym>[from <lang>Urdu</lang>]</etym>
</dictScrap>
</entry>


9.7 Module for Dictionaries

The module defined in this chapter makes available the following components:

Module dictionaries: Printed dictionaries

Elements defined: case colloc def dictScrap entry entryFree etym form gen gram gramGrp hom hyph iType lang lbl mood number oRef oVar orth pRef pVar per pos pron re sense stress subc superEntry syll tns usg xr
Classes defined: att.entryLike att.lexicographic att.ptrLike.form model.entryLike model.formPart model.gramPart model.morphLike model.ptrLike.form

The selection and combination of modules to form a TEI schema is described in 1.2 Defining a TEI Schema.


Notes

29.

We refer the reader to previous and current discussions of a common format for encoding dictionaries. For example, Amsler and Tompa (1988); Calzolari et al. (1990); Fought and Van Ess-Dykema; Ide and Veronis (1995); Ide et al. (1993); Ide et al. (1992); DANLEX Group (1987); Tutin and Veronis (1998); and Ide et al. (2000).


30.

Tana de Gámez, ed., Simon and Schuster's International Dictionary (New York: Simon and Schuster, 1973).


31.

Complications of sequence caused by marginal or interlinear insertions and deletions, which are frequent in manuscripts, or by unconventional page layouts, as in concrete poetry, magazines with imaginative graphic designers, and texts about the nature of typography as a medium, typically do not occur in dictionaries, and so are not discussed here.


32.

This is a slight oversimplification. Even in conservative transcriptions, it is common to omit page numbers, signatures of gatherings, running titles and the like. The simple description above also elides, for the sake of simplicity, the difficulties of assigning a meaning to the phrase ‘original sequence’ when it is applied to the printed characters of a source text; the ‘original sequence’ retained or recovered from a conservative transcription of the editorial view is, of course, the one established during the transcription by the encoder.


33.

The omission of rendition text is particularly common in systems for document production; it is considered good practice there, since automatic generation of rendition text is more reliable and more consistent than attempting to maintain it manually in the electronic text.
