9 Dictionaries
This chapter defines a module for encoding human-oriented monolingual and multilingual dictionaries, which may also be useful for computational lexica intended for use by language-processing software. Dictionaries are most familiar in their printed form; however, increasing numbers of dictionaries also exist in electronic forms which are independent of any particular printed form, but from which various displays can be produced.
Both typographically and structurally, dictionaries are extremely complex. In addition, dictionaries are of interest to many communities with different and sometimes conflicting goals. As a result, many general problems of text encoding are particularly pronounced here, and more compromises and alternatives within the encoding scheme may be required in future. Two problems are particularly prominent.
First, because the structure of dictionary entries varies widely both among and within dictionaries, the simplest way for an encoding scheme to accommodate the entire range of structures actually encountered is to allow virtually any element to appear virtually anywhere in a dictionary entry. It is clear, however, that strong and consistent structural principles do govern the vast majority of conventional dictionaries, as well as many or most entries even in more ‘exotic’ dictionaries; encoding guidelines should include these structural principles. We therefore define two distinct elements for dictionary entries, one (entry) which captures the regularities of many conventional dictionary entries, and a second (entryFree) which uses the same elements, but allows them to combine much more freely. It is however recommended that entry be used in preference to entryFree wherever possible. These elements and their contents are described in sections 9.2 The Structure of Dictionary Entries, 9.6 Unstructured Entries, and 9.4 Headword and Pronunciation References.
Second, since so much of the information in printed dictionaries is implicit or highly compressed, their encoding requires clear thought about whether it is to capture the precise typographic form of the source text or the underlying structure of the information it presents. Since both of these views of the dictionary may be of interest, it proves necessary to develop methods of recording both, and of recording the interrelationship between them as well. Users interested mainly in the printed format of the dictionary will require an encoding to be faithful to an original printed version. However, other users will be interested primarily in capturing the lexical information in a dictionary in a form suitable for further processing, which may demand the expansion or rearrangement of the information contained in the printed form. Further, some users wish to encode both of these views of the data, and retain the links between related elements of the two encodings. Problems of recording these two different views of dictionary data are discussed in section 9.5 Typographic and Lexical Information in Dictionary Data, together with mechanisms for retaining both views when this is desired.
To deal with this complexity, and in particular to account for the wide variety of linguistic contexts within which a dictionary may be designed, it can be necessary to customize or change the schema by providing more restrictions or possibly alternate content models for the elements defined in this chapter. Section 9.3.2 Grammatical Information illustrates this with the provision of a closed set of values for grammatical descriptors.
This chapter contains a large number of examples taken from existing dictionaries; in each case, the original source is identified. In presenting such examples, we have tried to retain the original typographic appearance of the example as well as presenting a suggested encoding for it. Where this has not been possible (for example in the display of pronunciation) we have adopted the transliteration found in the electronic edition of the Oxford Advanced Learner's Dictionary. Also, the middle dot in quoted entries is rendered with a full stop, while within the sample transcriptions hyphenation and syllabification points are indicated by a vertical bar |, regardless of their appearance in the source text.
9.1 Dictionary Body and Overall Structure
Overall, dictionaries have the same structure of front matter, body, and back matter familiar from other texts. In addition, this module defines entry, entryFree, and superEntry as component-level elements which can occur directly within a text division or the text body.
- text contains a single text of any kind, whether unitary or composite, for example a poem or drama, a collection of essays, a novel, a dictionary, or a corpus sample.
- front (front matter) contains any prefatory matter (headers, title page, prefaces, dedications, etc.) found at the start of a document, before the main body.
- body (text body) contains the whole body of a single unitary text, excluding any front or back matter.
- back (back matter) contains any appendixes, etc. following the main part of a text.
- div (text division) contains a subdivision of the front, body, or back of a text.
- entry contains a reasonably well-structured dictionary entry.
- entryFree (unstructured entry) contains a dictionary entry which does not necessarily conform to the constraints imposed by the entry element.
- superEntry groups successive entries for a set of homographs.
- att.entryLike groups the different styles of dictionary entries.
  - type indicates type of entry, in dictionaries with multiple types.
  - sortKey contains a (sortable) character sequence reflecting the entry's alphabetical position in the printed dictionary.
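To make the difference between entry and entryFree in the list above more concrete, the following sketch encodes one short entry in both styles. The content is borrowed from the OALD entry for competitor quoted in section 9.2.2; the enclosing div, its type value, and the choice to drop the grouping elements in the second version are assumptions made purely for illustration, not a recommended practice.

```xml
<div type="illustration">
  <!-- Structured style: written forms and grammatical information
       are wrapped in their grouping elements -->
  <entry>
    <form>
      <orth>competitor</orth>
      <pron>k@m"petit@(r)</pron>
    </form>
    <gramGrp>
      <pos>n</pos>
    </gramGrp>
    <def>person who competes.</def>
  </entry>
  <!-- Free style (assumed to be permitted by entryFree): the same
       phrase-level elements appear directly, without grouping -->
  <entryFree>
    <orth>competitor</orth>
    <pron>k@m"petit@(r)</pron>
    <pos>n</pos>
    <def>person who competes.</def>
  </entryFree>
</div>
```

The second form is easier to produce from loosely structured sources, but it gives processing software fewer guarantees about where each piece of information will be found.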
The front and back matter of a dictionary may well contain specialized material such as lists of common and proper nouns, grammatical tables, gazetteers, a ‘guide to the use of the dictionary’, etc. These should be tagged using elements defined elsewhere in these Guidelines, chiefly in the core module (chapter 3 Elements Available in All TEI Documents) together with the specialized dictionary elements defined in this chapter.
The body element consists of a set of entries, optionally grouped into one or several div elements. These text divisions might correspond, for example, to sections for different letters of the alphabet, or to sections for different languages in bilingual dictionaries, etc. In print dictionaries, entries are typically typographically distinct entities, each headed by some morphological form of the lexical item described (the headword), and sorted in alphabetical order or (especially for non-alphabetic scripts) in some other conventional sequence. Dictionary entries should be encoded as distinct successive items, each marked as an entry or entryFree element. The type attribute may be used to distinguish different types of entries, for example main entries, related entries, run-on entries, or entries for cross-references, etc.
Some dictionaries provide distinct entries for homographs, on the basis of etymology, part-of-speech, or both, and typically provide a numeric superscript on the headword identifying the homograph number. In these cases each homograph should be encoded as a separate entry; the superEntry element may optionally be used to group such successive homograph entries. In addition to a series of entry elements, the superEntry may contain a preliminary form group (see section 9.3.1 Information on Written and Spoken Forms) when information about hyphenation, pronunciation, etc., is given only once for two or more homograph entries. If the homograph number is to be recorded, the global attribute n may be used for this purpose. In some dictionaries, homographs are treated in distinct parts of the same entry; in these cases, they may be separated by use of the hom element, for which see section 9.2.1 Hierarchical Levels.
A sort key, given in the sortKey attribute, is often required for superentries and entries, especially in cases where the order of entries does not follow the local character-set collating sequence (as, for example, when an entry for ‘3D’ appears at the place where ‘three-D’ would appear).
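As a minimal sketch of the two attributes listed for att.entryLike, the fragment below shows the ‘3D’ case just described, with the sort key recording where the printed dictionary files the entry. The type values main and xref are illustrative assumptions; a project would document its own set of values.

```xml
<body>
  <!-- sortKey: the entry is filed where "three-D" would appear -->
  <entry type="main" sortKey="three-D">
    <form>
      <orth>3D</orth>
    </form>
    <!-- ... -->
  </entry>
  <!-- an entry that only points elsewhere, distinguished by type -->
  <entry type="xref">
    <!-- ... -->
  </entry>
</body>
```

Software that needs to reproduce the printed order can then sort on sortKey where it is present and fall back to the headword otherwise.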
<body>
<div type="dictionary">
<entry>
<!-- ... -->
</entry>
<entry>
<!-- ... -->
</entry>
<entry>
<!-- ... -->
</entry>
</div>
<div type="dictionary">
<entry>
<!-- ... -->
</entry>
<entry>
<!-- ... -->
</entry>
<entry>
<!-- ... -->
</entry>
</div>
</body>
<body>
<entry>
<!-- ... -->
</entry>
<entry>
<!-- ... -->
</entry>
<superEntry>
<entry type="hom" n="1"/>
<entry type="hom" n="2"/>
</superEntry>
</body>
9.2 The Structure of Dictionary Entries
A simple dictionary entry may contain information about the form of the word treated, its grammatical characterization, its definition, synonyms, or translation equivalents, its etymology, cross-references to other entries, usage information, and examples. These we refer to as the constituent parts or constituents of the entry; some dictionary constituents possess no internal structure, while others are most naturally viewed as groups of smaller elements, which may be marked in their own right. In some styles of markup, tags will be applied only to the low-level items, leaving the constituent groups which contain them untagged. We distinguish the class of top-level constituents of dictionary entries, which can occur directly within entries, from the class of phrase-level constituents, which can normally occur only within top-level constituents. The top-level constituents of dictionary entries are described in section 9.2.2 Groups and Constituents, and documented more fully, together with their phrase-level sub-constituents, in section 9.3 Top-level Constituents of Entries.
In addition, however, dictionary entries often have a complex hierarchical structure. For example, an entry may consist of two or more sub-parts, each corresponding to information for a different part-of-speech homograph of the headword. The entry (or part-of-speech homographs, if the entry is split this way) may also consist of senses, each of which may in turn be composed of two or more sub-senses, etc. Each sub-part, homograph entry, sense, or sub-sense we call a level; at any level in an entry, any or all of the constituent parts of dictionary entries may appear. The hierarchical levels of dictionary entries are documented in section 9.2.1 Hierarchical Levels.
9.2.1 Hierarchical Levels
- entry contains a reasonably well-structured dictionary entry.
- entryFree (unstructured entry) contains a dictionary entry which does not necessarily conform to the constraints imposed by the entry element.
- hom (homograph) groups information relating to one homograph within an entry.
- sense groups together all information relating to one word sense in a dictionary entry, for example definitions, examples, and translation equivalents.
  - level gives the nesting depth of this sense.
- dictScrap (dictionary scrap) encloses a part of a dictionary entry in which other phrase-level dictionary elements are freely combined.
<entry>
<sense n="1"/>
<sense n="2"/>
</entry>
<entry>
<hom n="1">
<sense n="1">
<!-- ... -->
</sense>
<sense n="2">
<!-- ... -->
</sense>
</hom>
<hom n="2">
<sense n="1">
<sense n="a">
<!-- ... -->
</sense>
<sense n="b">
<!-- ... -->
</sense>
</sense>
<sense n="2">
<!-- ... -->
</sense>
<sense n="3">
<!-- ... -->
</sense>
</hom>
</entry>
<superEntry>
<entry n="1" type="hom">
<sense n="1">
<!-- ... -->
</sense>
<sense n="2">
<!-- ... -->
</sense>
</entry>
<entry n="2" type="hom">
<sense n="1">
<sense n="a">
<!-- ... -->
</sense>
<sense n="b">
<!-- ... -->
</sense>
</sense>
<sense n="2">
<!-- ... -->
</sense>
<sense n="3">
<!-- ... -->
</sense>
</entry>
</superEntry>
9.2.2 Groups and Constituents
- information about the form of the word treated (orthography, pronunciation, hyphenation, etc.)
- grammatical information (part of speech, grammatical sub-categorization, etc.)
- definitions or translations into another language
- etymology
- examples
- usage information
- cross-references to other entries
- notes
- entries (often of reduced form) for related words, typically called related entries
- form (form information group) groups all the information on the written and spoken forms of one headword.
- gramGrp (grammatical information group) groups morpho-syntactic information about a lexical item, e.g. pos, gen, number, case, or iType (inflectional class).
- def (definition) contains definition text in a dictionary entry.
- cit (cited quotation) contains a quotation from some other document, together with a bibliographic reference to its source. In a dictionary it may contain an example text with at least one occurrence of the word form, used in the sense being described, or a translation of the headword, or an example.
- usg (usage) contains usage information in a dictionary entry.
- xr (cross-reference phrase) contains a phrase, sentence, or icon referring the reader to some other location in this or another text.
- etym (etymology) encloses the etymological information in a dictionary entry.
- re (related entry) contains a dictionary entry for a lexical item related to the headword, such as a compound phrase or derived form, embedded inside a larger entry.
- note contains a note or annotation.
com.peti.tor /k@m"petit@(r)/ n person who competes. OALD
<entry>
<form>
<orth>competitor</orth>
<hyph>com|peti|tor</hyph>
<pron>k@m"petit@(r)</pron>
</form>
<gramGrp>
<pos>n</pos>
</gramGrp>
<def>person who competes.</def>
</entry>
disproof (dIs"pru:f) n. 1. facts that disprove something. 2. the act of disproving. CED
<entry>
<form>
<orth>disproof</orth>
<pron>dIs"pru:f</pron>
</form>
<gramGrp>
<pos>n</pos>
</gramGrp>
<sense n="1">
<def>facts that disprove something.</def>
</sense>
<sense n="2">
<def>the act of disproving.</def>
</sense>
</entry>
bray /breI/ n cry of an ass; sound of a trumpet. ∙ vt [VP2A] make a cry or sound of this kind. OALD
<entry>
<form>
<orth>bray</orth>
<pron>breI</pron>
</form>
<hom>
<gramGrp>
<pos>n</pos>
</gramGrp>
<def>cry of an ass; sound of a trumpet.</def>
</hom>
<hom>
<gramGrp>
<pos>vt</pos>
<subc>VP2A</subc>
</gramGrp>
<def>make a cry or sound of this kind.</def>
</hom>
</entry>
ca.reen /k@"ri:n/ vt,vi 1 [VP6A] turn (a ship) on one side for cleaning, repairing, etc. 2 [VP6A, 2A] (cause to) tilt, lean over to one side. OALD
<entry>
<form>
<orth>careen</orth>
<hyph>ca|reen</hyph>
<pron>k@"ri:n</pron>
</form>
<gramGrp>
<pos>vt</pos>
<pos>vi</pos>
</gramGrp>
<sense n="1">
<gramGrp>
<subc>VP6A</subc>
</gramGrp>
<def>turn (a ship) on one side for cleaning, repairing, etc.</def>
</sense>
<sense n="2">
<gramGrp>
<subc>VP6A</subc>
<subc>VP2A</subc>
</gramGrp>
<def>(cause to) tilt, lean over to one side.</def>
</sense>
</entry>
a.ban.don 1 /@"band@n/ v [T1] 1 to leave completely and for ever; desert: The sailors abandoned the burning ship. 2 … abandon 2 n [U] the state when one's feelings and actions are uncontrolled; freedom from control... LDOCE
<superEntry>
<form>
<orth>abandon</orth>
<hyph>a|ban|don</hyph>
<pron>@"band@n</pron>
</form>
<entry n="1">
<gramGrp>
<pos>v</pos>
<subc>T1</subc>
</gramGrp>
<sense n="1">
<def>to leave completely and for ever … </def>
</sense>
<sense n="2"/>
</entry>
<entry n="2">
<gramGrp>
<pos>n</pos>
<subc>U</subc>
</gramGrp>
<def>the state when one's feelings and actions are uncontrolled; freedom
from control…</def>
</entry>
</superEntry>
9.3 Top-level Constituents of Entries
- the form element, which groups orthographic information and pronunciations, is described in section 9.3.1 Information on Written and Spoken Forms
- the gramGrp element, which groups elements for the grammatical characterization of the headword, is described in section 9.3.2 Grammatical Information
- the def element, which describes the meaning of the headword, is described in section 9.3.3 Sense Information
- the etym element and its special phrase-level elements are documented in section 9.3.4 Etymological Information
- the cit element and its specific applications are described in section 9.3.3 Sense Information and section 9.3.5 Other Information
- the usg, lbl, xr, and note elements are described in section 9.3.5 Other Information
- the re element, which marks nested entries for related words, is described in section 9.3.6 Related Entries
9.3.1 Information on Written and Spoken Forms
Dictionary entries most often begin with information about the form of the word to which the entry applies. Typically, the orthographic form of the word, sometimes marked for syllabification or hyphenation, is the first item in an entry. Other information about the word, including variant or alternate forms, inflected forms, pronunciation, etc., is also often given.
- form (form information group) groups all the information on the written and spoken forms of one headword.
  - type classifies form as simple, compound, etc.
- orth (orthographic form) gives the orthographic form of a dictionary headword.
  - type gives the type of spelling.
  - extent gives the extent of the orthographic information provided.
- pron (pronunciation) contains the pronunciation(s) of the word.
  - extent indicates whether the pronunciation is for whole word or part.
- hyph (hyphenation) contains a hyphenated form of a dictionary headword, or hyphenation information in some other form.
- syll (syllabification) contains the syllabification of the headword.
- stress contains the stress pattern for a dictionary headword, if given separately.
- lbl (label) contains a label for a form, example, translation, or other piece of information, e.g. abbreviation for, contraction of, literally, approximately, synonyms:, etc.
- gram (grammatical information) within an entry in a dictionary or a terminological data file, contains grammatical information relating to a term, word, or form.
  - type classifies the grammatical information given according to some convenient typology; in the case of terminological information, preferably the dictionary of data element types specified in ISO WD 12 620.
- gen (gender) identifies the morphological gender of a lexical item, as given in the dictionary.
- number indicates grammatical number associated with a form, as given in a dictionary.
- case contains grammatical case information given by a dictionary for a given form.
- per (person) contains an indication of the grammatical person (1st, 2nd, 3rd, etc.) associated with a given inflected form in a dictionary.
- tns (tense) indicates the grammatical tense associated with a given inflected form in a dictionary.
- mood contains information about the grammatical mood of verbs (e.g. indicative, subjunctive, imperative).
- iType (inflectional class) indicates the inflectional class associated with a lexical item.
  - type indicates the type of indicator used to specify the inflection class, when it is necessary to distinguish between the usual abbreviated indications (e.g. inv) and other kinds of indicators, such as special codes referring to conjugation patterns, etc.
Different dictionaries use different means to mark hyphenation, syllabification, and stress, and they often use some unusual glyphs (e.g., the ‘middle dot’ for hyphenation). All of these glyphs are in the Unicode character set, as discussed in Character References. When transcribing representations of pronunciation, the International Phonetic Alphabet should be used. It may be convenient (as has been done in the text of this chapter) to use a simple transliteration scheme for this; such a scheme should however be properly documented in the header.
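One possible way of documenting such a transliteration scheme is a prose statement in the editorial declaration of the header, sketched below. The wording of the declaration and the three character mappings are assumptions inferred from the sample transcriptions in this chapter; they are not a documented scheme and would need to be replaced by the project's actual conventions.

```xml
<encodingDesc>
  <editorialDecl>
    <!-- Hypothetical declaration; the mappings below are assumed
         readings of this chapter's examples, not an official table -->
    <p>Pronunciations are transcribed in a plain-character
      transliteration rather than in the International Phonetic
      Alphabet. In particular:
      <list type="gloss">
        <label>@</label>
        <item>stands for the schwa</item>
        <label>"</label>
        <item>precedes a syllable bearing primary stress</item>
        <label>%</label>
        <item>precedes a syllable bearing secondary stress</item>
      </list>
    </p>
  </editorialDecl>
</encodingDesc>
```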
<form>
<orth>doom-laden</orth>
</form>
soucoupe [sukup] … DNT
<form>
<orth>soucoupe</orth>
<pron>sukup</pron>
</form>
For a variety of reasons including ease of processing, it may be desired to split into separate elements information which is collapsed into a single element in the source text; orthography and hyphenation may for example be transcribed as separate elements, although given together in the source text. For a discussion of the issues involved, and of methods for retaining both the presentation form and the interpreted form, see section 9.5 Typographic and Lexical Information in Dictionary Data.
ar.ea … W7
<form>
<orth>area</orth>
<hyph>ar|ea</hyph>
<syll>ar|e|a</syll>
</form>
brag … vb. brags, bragging, bragged … CED
<form>
<orth>brag</orth>
</form>
<gramGrp>
<pos>vb</pos>
</gramGrp>
<form type="infl">
<orth>brags</orth>
<orth>bragging</orth>
<orth>bragged</orth>
</form>
horrifier [ORifje] (7) vt … [C/R]
<form>
<orth>horrifier</orth>
<pron>ORifje</pron>
<iType type="vbtable">7</iType>
</form>
MTBF abbrev. for mean time between failures. CED
<entry>
<form type="abbrev">
<orth>MTBF</orth>
</form>
<form type="full">
<lbl>abbrev. for</lbl>
<orth>mean time between failures</orth>
</form>
</entry>
biryani or biriani (%bIrI"A:nI) … CED
<form>
<orth>biryani</orth>
<orth>biriani</orth>
<pron>%bIrI"A:nI</pron>
</form>
mackle ("mak^@l) or macule ("makju:l) … CED
<form>
<orth>mackle</orth>
<pron>"mak^@l</pron>
</form>
<form>
<orth>macule</orth>
<pron>"makju:l</pron>
</form>
hospitaller or U.S. hospitaler ("hQspIt@l@) … CED
<form>
<orth>hospitaller</orth>
<form>
<usg type="geo">U.S.</usg>
<orth>hospitaler</orth>
</form>
<pron>"hQspIt@l@</pron>
</form>
9.3.2 Grammatical Information
In addition, gramGrp can contain any of the morphological elements defined in section 9.3.1 Information on Written and Spoken Forms for form. Elements conveying morphological information bear different interpretations within gramGrp and form groups, the difference being that in the form group, the morphological information specified pertains to the specific alternate form in question, while within gramGrp it applies to the headword form. For example, in the entry ‘pinna ('pIn@) n., pl. -nae (-ni:) or -nas’ CED, the word defined can be either singular or plural; the ‘pl.’ specification applies only to the inflected forms provided. Compare this with ‘pants (paents) pl.n.’, where ‘pl.’ applies to the headword itself.
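The contrast between the two entries just quoted might be encoded along the following lines. This is only a sketch: the type value infl and the use of extent="part" for the truncated plural forms follow conventions seen in other examples in this chapter, and the exact grouping of the inflected forms is an assumption.

```xml
<div type="illustration">
  <!-- pinna: "pl." sits inside the form group for the inflected
       forms, so it describes those forms, not the headword -->
  <entry>
    <form>
      <orth>pinna</orth>
      <pron>'pIn@</pron>
    </form>
    <gramGrp>
      <pos>n</pos>
    </gramGrp>
    <form type="infl">
      <number>pl</number>
      <orth extent="part">-nae</orth>
      <pron extent="part">-ni:</pron>
      <orth extent="part">-nas</orth>
    </form>
  </entry>
  <!-- pants: "pl." sits inside gramGrp, so it describes the
       headword itself -->
  <entry>
    <form>
      <orth>pants</orth>
      <pron>paents</pron>
    </form>
    <gramGrp>
      <number>pl</number>
      <pos>n</pos>
    </gramGrp>
  </entry>
</div>
```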
The following entry can be tagged using specialized grammatical elements:
médire v.t. ind. (de) … PLC
<form>
<orth>médire</orth>
</form>
<gramGrp>
<pos>v</pos>
<subc>t ind</subc>
<colloc type="prep">de</colloc>
</gramGrp>
Alternatively, the general-purpose gram element may be used:
<form>
<orth>médire</orth>
</form>
<gramGrp>
<gram type="pos">v</gram>
<gram type="subc">t ind</gram>
<gram>de</gram>
</gramGrp>
isotope adj. et n. m. … DNT
<form>
<orth>isotope</orth>
</form>
<gramGrp>
<pos>adj</pos>
</gramGrp>
<gramGrp>
<pos>n</pos>
<gen>m</gen>
</gramGrp>
wits (wIts) pl. n. 1. (sometimes sing.) the ability to reason and act, esp. quickly … CED
<entry>
<form>
<orth>wits</orth>
<pron>wIts</pron>
</form>
<gramGrp>
<number>pl</number>
<pos>n</pos>
</gramGrp>
<sense n="1">
<gramGrp>
<number>sometimes sing.</number>
</gramGrp>
<def>the ability to reason and act, esp. quickly …</def>
</sense>
</entry>
9.3.3 Sense Information
Dictionaries may describe the meanings of words in a wide variety of different ways: by means of synonyms, paraphrases, translations into other languages, formal definitions in various highly stylized forms, etc. No attempt is made here to distinguish all the different forms which sense information may take; all of them may be tagged using the def element described in section 9.3.3.1 Definitions.
As a special case it is frequently desirable to distinguish the provision of translation equivalents in other languages from other forms of sense information; the use of cit type="translation" (which groups a translation equivalent with related information such as its grammatical description) for this purpose is described in section 9.3.3.2 Translation Equivalents.
9.3.3.1 Definitions
Dictionary definitions are those pieces of prose in a dictionary entry that describe the meaning of some lexical item. Most often, definitions describe the headword of the entry; in some cases, they describe translated texts, examples, etc.; see cit type="translation", section 9.3.3.2 Translation Equivalents, and cit type="example", section 9.3.5.1 Examples. The def element directly contains the text of the definition; unlike form and gramGrp, it does not serve solely to group a set of smaller elements. The close analysis of definition text, such as the tagging of hypernyms, typical objects, etc., is not covered by these Guidelines.
demigod (…) n. 1.a. a being who is part mortal, part god. b. a lesser deity. 2. a godlike person. CP
<entry>
<form>
<orth>demigod</orth>
<pron> … </pron>
</form>
<gramGrp>
<pos>n</pos>
</gramGrp>
<sense n="1">
<sense n="a">
<def>a being who is part mortal, part god.</def>
</sense>
<sense n="b">
<def>a lesser deity.</def>
</sense>
</sense>
<sense n="2">
<def>a godlike person.</def>
</sense>
</entry>
rémoulade [Remulad] nf remoulade, rémoulade (dressing containing mustard and herbs). CR
<entry>
<form>
<orth>rémoulade</orth>
<pron>Remulad</pron>
</form>
<gramGrp>
<pos>n</pos>
<gen>f</gen>
</gramGrp>
<cit type="translation" xml:lang="en">
<quote>remoulade</quote>
<quote>rémoulade</quote>
<def>dressing containing mustard and herbs</def>
</cit>
</entry>
9.3.3.2 Translation Equivalents
Multilingual dictionaries contain information about translations of a given word in some source language for one or more target languages. Minimally, the dictionary provides the corresponding translation in the target language; other material, such as morphological information (gender, case), various kinds of usage restrictions, etc., may also be given. If translation equivalents are to be distinguished from other kinds of sense information, they may be encoded using cit type="translation". The global xml:lang attribute should be used to specify the target language.
- cit (cited quotation) contains a quotation from some other document, together with a bibliographic reference to its source. In a dictionary it may contain an example text with at least one occurrence of the word form, used in the sense being described, or a translation of the headword, or an example.
- lbl (label) contains a label for a form, example, translation, or other piece of information, e.g. abbreviation for, contraction of, literally, approximately, synonyms:, etc.
dresser … (a) (Theat) habilleur m, -euse f; (Comm: window ~) étalagiste mf. she's a stylish ~ elle s'habille avec chic; V hair. (b) (tool) (for wood) raboteuse f; (for stone) rabotin m. CR
<entry>
<form>
<orth>dresser</orth>
</form>
<sense n="a">
<sense>
<usg type="dom">Theat</usg>
<cit type="translation" xml:lang="fr">
<quote>habilleur</quote>
<gen>m</gen>
</cit>
<cit type="translation" xml:lang="fr">
<quote>-euse</quote>
<gen>f</gen>
</cit>
</sense>
<sense>
<usg type="dom">Comm</usg>
<form type="compound">
<orth>window <oRef/>
</orth>
</form>
<cit type="translation" xml:lang="fr">
<quote>étalagiste</quote>
<gen>mf</gen>
</cit>
</sense>
<cit type="example">
<quote>she's a stylish <oRef/>
</quote>
<cit type="translation" xml:lang="fr">
<quote>elle s'habille avec chic</quote>
</cit>
</cit>
<xr type="see">V. <ref target="#hair">hair</ref>
</xr>
</sense>
<sense n="b">
<usg type="category">tool</usg>
<sense>
<usg type="hint">for wood</usg>
<cit type="translation" xml:lang="fr">
<quote>raboteuse</quote>
<gen>f</gen>
</cit>
</sense>
<sense>
<usg type="hint">for stone</usg>
<cit type="translation" xml:lang="fr">
<quote>rabotin</quote>
<gen>m</gen>
</cit>
</sense>
</sense>
</entry>
<!-- ... -->
<entry xml:id="hair">
<!-- ... -->
</entry>
O.A.S. ... nf (abrév de Organisation de l'Armée secrète) OAS (illegal military organization supporting French rule of Algeria). CR
<entry>
<cit type="translation" xml:lang="en">
<quote>OAS</quote>
<def>illegal military organization supporting French rule of
Algeria</def>
</cit>
</entry>
havdalah or havdoloh Hebrew. (Hebrew hAvdA"lA; Yiddish hAv"dOl@) n. Judaism. the ceremony marking the end of the sabbath or of a festival, including the blessings over wine, candles and spices. [literally: separation] CED
<entry>
<form>
<orth>havdalah</orth>
<orth>havdoloh</orth>
</form>
<usg type="dom">Judaism</usg>
<def>the ceremony marking the end of the sabbath or of a festival,
including the blessings over wine, candles and spices.</def>
<cit type="translation" xml:lang="en">
<note>literally</note>
<quote>separation</quote>
</cit>
</entry>
9.3.4 Etymological Information
- etym (etymology) encloses the etymological information in a dictionary entry.
- lang (language name) name of a language mentioned in etymological or other linguistic discussion.
- date contains a date in any format.
- mentioned marks words or phrases mentioned, not used.
- gloss identifies a phrase or word used to provide a gloss or definition for some other word or phrase.
- pron (pronunciation) contains the pronunciation(s) of the word.
- usg (usage) contains usage information in a dictionary entry.
- lbl (label) contains a label for a form, example, translation, or other piece of information, e.g. abbreviation for, contraction of, literally, approximately, synonyms:, etc.
As in other prose, individual word forms mentioned in an etymological description are tagged with mentioned elements. Pronunciations, usage labels, and glosses can be tagged using the pron, usg, and gloss elements defined elsewhere in these Guidelines. The lang element may also be used to identify a particular language name where it appears, alongside the xml:lang attribute of the mentioned element.
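The combination of lang and xml:lang described in the preceding paragraph could look like the following fragment, based on the Greek portion of the neume etymology quoted in this section. The language tag grc-Latn (Ancient Greek in Latin transliteration) is an assumption chosen for this sketch; a project would select and document its own language tags.

```xml
<etym>fr.
  <!-- lang marks the language name as printed in the source -->
  <lang>Gk</lang>
  <!-- xml:lang identifies the language of the mentioned form itself;
       the tag value is an assumption for this illustration -->
  <mentioned xml:lang="grc-Latn">pneuma</mentioned>
  <gloss>breath</gloss>
</etym>
```

Here lang preserves what the dictionary prints, while xml:lang gives processing software a normalized identifier it can rely on.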
abismo m. (del gr. a priv. y byssos, fondo). Sima, gran profundidad. …
<entry>
<form>
<orth>abismo</orth>
</form>
<etym>del <lang>gr.</lang>
<mentioned>a</mentioned> priv. y <mentioned>byssos</mentioned>,
<gloss>fondo</gloss>
</etym>
</entry>
neume \'n(y)üm\ n [F, fr. ML pneuma, neuma, fr. Gk pneuma breath — more at pneumatic]: any of various symbols used in the notation of Gregorian chant … [WNC]
<entry>
<etym>
<lang>F</lang> fr. <lang>ML</lang>
<mentioned>pneuma</mentioned>
<mentioned>neuma</mentioned> fr. <lang>Gk</lang>
<mentioned>pneuma</mentioned>
<gloss>breath</gloss>
<xr type="etym">more at <ptr target="#pneumatic"/>
</xr>
</etym>
<def>any of various symbols … </def>
</entry>
<!-- ... -->
<entry xml:id="pneumatic">
<!-- ... -->
</entry>
9.3.5 Other Information
9.3.5.1 Examples
Dictionaries typically include examples of word use, usually accompanying definitions or translations. In some cases, the examples are quotations from another source, and are occasionally followed by a citation to the author.
- q (separated from the surrounding text with quotation marks) contains material which is marked as (ostensibly) being somehow different than the surrounding text, for any one of a variety of reasons including, but not limited to: direct speech or thought, technical terms or jargon, authorial distance, quotations from elsewhere, and passages that are mentioned not used.
- quote (quotation) contains a phrase or passage attributed by the narrator or author to some agency external to the text.
- cit (cited quotation) contains a quotation from some other document, together with a bibliographic reference to its source. In a dictionary it may contain an example text with at least one occurrence of the word form, used in the sense being described, or a translation of the headword, or an example.
Examples frequently abbreviate the headword, and so their transcription will frequently make use of the oRef or oVar elements described below in section 9.4 Headword and Pronunciation References.
multiplex /…/ adj tech having many parts: the multiplex eye of the fly. LDOCE
<cit type="example">
<quote>the multiplex eye of the fly.</quote>
</cit>
some … 4. (S~ and any are used with more): Give me ~ more /s@'mO:(r)/ OALD
<sense n="4">
<usg type="colloc">
<oRef type="cap"/> and <mentioned>any</mentioned> are used with
<mentioned>more</mentioned>
</usg>
<cit type="example">
<quote>Give me <oRef/> more</quote>
<pron extent="part">s@'mO:(r)</pron>
</cit>
</sense>
horrifier … vt to horrify. elle était horrifiée par la dépense she was horrified at the expense. CR
<entry>
<cit type="translation" xml:lang="en">
<quote>to horrify</quote>
</cit>
<cit type="example">
<quote>elle était horrifiée par la dépense</quote>
<cit type="translation" xml:lang="en">
<quote>she was horrified at the expense.</quote>
</cit>
</cit>
</entry>
valeur … n. f. … 2. Vx. Vaillance, bravoure (spécial., au combat). ‘La valeur n'attend pas le nombre des années’ (Corneille). … DNT
<sense n="2">
<usg type="time">Vx.</usg>
<def>Vaillance, bravoure (spécial., au combat)</def>
<cit type="example">
<quote>La valeur n'attend pas le nombre des années</quote>
<bibl>
<author>Corneille</author>
</bibl>
</cit>
</sense>
9.3.5.2 Usage Information and Other Labels
- usg (usage) contains usage information in a dictionary entry.
- lbl (label) contains a label for a form, example, translation, or other piece of information, e.g. abbreviation for, contraction of, literally, approximately, synonyms:, etc.
Usage labels may indicate, among other things:
- temporal use (archaic, obsolete, etc.)
- register (slang, formal, taboo, ironic, facetious, etc.)
- style (literal, figurative, etc.)
- connotative effect (e.g. derogatory, offensive)
- subject field (Astronomy, Philosophy, etc.)
- national or regional use (Australian, U.S., Midland dialect, etc.)
Sample values for the type attribute on usg include:
- geo: geographic area
- time: temporal, historical era (‘archaic’, ‘old’, etc.)
- dom: domain
- reg: register
- style: style (figurative, literal, etc.)
- plev: preference level (‘chiefly’, ‘usually’, etc.)
- acc: acceptability
- lang: language for foreign words, spellings, pronunciations, etc.
- gram: grammatical usage
- syn: synonym given to show use
- hyper: hypernym given to show usage
- colloc: collocation given to show usage
- comp: typical complement
- obj: typical object
- subj: typical subject
- verb: typical verb
- hint: unclassifiable piece of information to guide sense choice
colour or U.S. color … CED
<form>
<orth>colour</orth>
<form>
<usg type="geo">U.S.</usg>
<orth>color</orth>
</form>
</form>
palette [palEt] nf (a) (Peinture: lit, fig) palette. (b) (Boucherie) shoulder. (c) (aube de roue) paddle; (battoir à linge) beetle; (Manutention, Constr) pallet. CR
<sense n="a">
<usg type="dom">Peinture</usg>
<usg type="style">lit</usg>
<usg type="style">fig</usg>
<cit type="translation" xml:lang="en">
<quote>palette</quote>
</cit>
</sense>
<sense n="b">
<usg type="dom">Boucherie</usg>
<cit type="translation" xml:lang="en">
<quote>shoulder</quote>
</cit>
</sense>
<sense n="c">
<sense>
<usg type="syn">aube de roue</usg>
<cit type="translation" xml:lang="en">
<quote>paddle</quote>
</cit>
</sense>
<sense>
<usg type="syn">battoir à linge</usg>
<cit type="translation" xml:lang="en">
<quote>beetle</quote>
</cit>
</sense>
<sense>
<usg type="dom">Manutention</usg>
<usg type="dom">Constr</usg>
<cit type="translation" xml:lang="en">
<quote>pallet</quote>
</cit>
</sense>
</sense>
rempaillage […] nm reseating, rebottoming (with straw). CR
<entry>
<cit type="translation" xml:lang="en">
<quote>reseating</quote>
<quote>rebottoming</quote>
<usg type="hint">with straw</usg>
</cit>
</entry>
9.3.5.3 Cross References to Other Entries
Dictionary entries frequently refer to information in other entries, often using extremely dense notations to convey the headword of the entry to be sought, the particular part of the entry being referred to, and the nature of the information to be sought there (synonyms, antonyms, usage notes, etymology, an illustration, etc.).
- xr (cross-reference phrase) contains a phrase, sentence, or icon referring the reader to some other location in this or another text.
- ref (reference) defines a reference to another location, possibly modified by additional text or comment.
- ptr/ (pointer) defines a pointer to another location.
- lbl (label) contains a label for a form, example, translation, or other piece of information, e.g. abbreviation for, contraction of, literally, approximately, synonyms:, etc.
glee … Compare madrigal (sense 1) CED
<entry>
<form>
<orth>glee</orth>
</form>
<xr>Compare <ptr target="#madrigal.1"/>
</xr>
</entry>
<entry xml:id="madrigal.1">
<!-- ... -->
</entry>
hostellerie Syn. de hôtellerie (sens 1). DNT
<xr>
<lbl>Syn. de</lbl>
<ref>hôtellerie (sens 1)</ref>.
</xr>
rose2 … vb. the past tense of rise. CED
<entry>
<form>
<orth>rose</orth>
</form>
<xr type="inflectedForm">
<lbl>the past tense of</lbl>
<ref target="#rise">rise</ref>
</xr>
</entry>
antagonist … syn see adverse W7
<xr>
<lbl>syn see</lbl>
<ref target="#adverse">adverse</ref>
</xr>
The following entry refers to the illustration at the entry for tool, not to the entry itself. The target attribute might give the identifier of the illustration itself, or of the enclosing entry (in which case the type attribute might be used to infer that the reference is actually to the illustration, not the entry as a whole).
ax, axe … → see the illus at tool OALD
<xr>
<lbl>see the illus at</lbl>
<ptr target="#tool.illus"/>
</xr>
globe … V. armillaire (sphère) PR
<xr>V. <ref>armillaire</ref>
<lbl type="sense-restriction">sphère</lbl>
</xr>
In the following entry, the asterisk signals a reference to the entry for incapable.
entacher … Acte entaché de nullité, contenant un vice de forme ou passé par un incapable*. DNT
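No encoding accompanies this entry here. A minimal sketch of how the definition text might be tagged follows; it assumes that the asterisk is kept as character data and that the target entry carries the hypothetical identifier incapable, neither of which is prescribed by the source.

```xml
<!-- Sketch only: the identifier "incapable" is assumed for illustration -->
<def>contenant un vice de forme ou passé par un
  <ref target="#incapable">incapable</ref>*.</def>
```

Placing ref directly inside def follows the same pattern as the justifier example in this section.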
justifier … 4. IMPRIM Donner a (une ligne) une longeur convenable au moyen de blancs (2, sens 1, 3). DNT
<sense n="4">
<usg type="dom">imprim</usg>
<def>Donner a (une ligne) une longeur convenable au moyen de
<ref target="#blanc-2.1 #blanc-2.3">blancs (2, sens 1, 3)</ref>
</def>
</sense>
9.3.5.4 Notes within Entries
- note contains a note or annotation.
ain't (eInt) Not standard. contraction of am not, is not, are not, have not or has not: I ain't seen it. … Usage. Although the interrogative form ain't I? would be a natural contraction of am I not?, it is generally avoided in spoken English and never used in formal English. CED
<entry>
<form type="contr">
<orth>ain't</orth>
<pron>eInt</pron>
</form>
<usg type="reg">Not standard</usg>
<form type="full">
<lbl>contraction of</lbl>
<orth>am not</orth>
<orth>is not</orth>
<orth>are not</orth>
<orth>have not</orth>
<orth>has not</orth>
</form>
<cit type="example">
<quote>I ain't seen it.</quote>
</cit>
<note type="usage">Although the interrogative form <mentioned>ain't
I?</mentioned> would be a natural contraction of <mentioned>am I
not?</mentioned>, it is generally avoided in spoken English and
never used in formal English.</note>
</entry>
The formal declaration for note is given in section 3.8 Notes, Annotation, and Indexing.
9.3.6 Related Entries
The re element encloses a degenerate entry which appears in the body of another entry for some purpose. Many dictionaries include related entries for direct derivatives or inflected forms of the entry word, or for compound words, phrases, collocations, and idioms containing the entry word.
Related entries can be complex, and may in fact include any of the information to be found in a regular entry. Therefore, the re element is defined to contain the same elements as an entry element, with the exception that it may not contain any nested re elements.
bevvy ("bEvI) Dialect. ~ n., pl. -vies. 1. a drink, esp. an alcoholic one: we had a few bevvies last night. 2. a night of drinking. ~ vb. -vies, -vying, -vied (intr.) 3. to drink alcohol [probably from Old French bevee, buvee, drinking] —'bevvied adj. CED
<entry>
<form>
<orth>bevvy</orth>
<pron>"bEvI</pron>
</form>
<usg type="reg">Dialect</usg>
<hom>
<gramGrp>
<pos>n</pos>
</gramGrp>
<sense n="1">
<def>a drink, esp. an alcoholic one: we had a few bevvies last night.</def>
</sense>
</hom>
<!-- ... sense 2 ... -->
<hom>
<gramGrp>
<pos>vb</pos>
</gramGrp>
<sense n="3">
<def>to drink alcohol</def>
</sense>
</hom>
<etym>probably from <lang>Old French</lang>
<mentioned>bevee</mentioned>, <mentioned>buvee</mentioned>
<gloss>drinking</gloss>
</etym>
<re type="derived">
<form>
<orth>bevvied</orth>
</form>
<gramGrp>
<pos>adj</pos>
</gramGrp>
</re>
</entry>
9.4 Headword and Pronunciation References
- oRef/ (orthographic-form reference) in a dictionary example, indicates a reference to the orthographic form(s) of the headword.
  type indicates the kind of typographic modification made to the headword in the reference.
- pRef/ (pronunciation reference) in a dictionary example, indicates a reference to the pronunciation(s) of the headword.
- oVar (orthographic-variant reference) in a dictionary example, indicates a reference to variant orthographic form(s) of the headword.
  type indicates the kind of variant involved.
- pVar (pronunciation-variant reference) in a dictionary example, indicates a reference to variant pronunciation(s) of the headword.
- att.ptrLike.form (form pointers) common attributes for elements in the dictionary base which point at orthographic or pronunciation forms of the headword.
  target identifies the orthographic form or pronunciation referred to.
Typical forms of headword reference in printed dictionaries include:
- ~: indicates a reference to the full form of the headword
- pref~: gives a prefix to be affixed to the headword
- ~suf: gives a suffix to be affixed to the headword
- A~: gives the first letter in upper case, indicating that the headword is capitalized
- pref~suf: gives a prefix and a suffix to be affixed to the headword
- a.: gives the initial of the word followed by a full stop, to indicate reference to the full form of the headword
- A.: refers to a capitalized form of the headword
The oRef element should be used for iconic or shortened references to the orthographic form(s) of the headword itself. It is an empty element and replaces, rather than encloses, the reference. Note that the reference to a headword is not necessarily a simple string replacement. In the example ‘colour1, (US = color) … ~ films; ~ TV; Red, blue and yellow are ~s.’ OALD, the tilde stands for either headword form (colour, color).
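As a hedged illustration of the colour example just quoted, the three examples might be transcribed as follows. The use of cit with type="example" mirrors the other examples in this chapter; how a processor decides which headword form (colour or color) each oRef stands for is left open here.

```xml
<!-- Sketch: each swung dash in the source is replaced by an empty oRef -->
<cit type="example">
  <quote><oRef/> films</quote>
</cit>
<cit type="example">
  <quote><oRef/> TV</quote>
</cit>
<cit type="example">
  <quote>Red, blue and yellow are <oRef/>s.</quote>
</cit>
```

Because oRef is empty, the plural ending "s" stays in the character data next to it, so the printed string can be rebuilt by substituting the swung dash for each oRef.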
colonel … army officer above a lieutenant-~. OALD
<def>army officer above a lieutenant-<oRef/>.</def>
academy … The Royal A~ of Arts OALD
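No encoding is shown for this entry. One plausible transcription, offered as a sketch, uses the type value cap that appears on oRef in the some example of section 9.3.5.1 to record that the reference is capitalized:

```xml
<!-- Sketch: "A~" is a capitalized reference to the headword "academy" -->
<cit type="example">
  <quote>The Royal <oRef type="cap"/> of Arts</quote>
</cit>
```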
vag- or vago- comb form … : vagus nerve < vagal > < vagotomy > W7
<entry>
<form>
<orth xml:id="di-o1">vag-</orth>
<orth xml:id="di-o2">vago-</orth>
</form>
<def>vagus nerve</def>
<cit type="example">
<quote>
<oRef target="#di-o1" type="nohyph"/>al</quote>
<quote>
<oRef target="#di-o2" type="nohyph"/>tomy</quote>
</cit>
</entry>
take … < Mr Burton took us for French > NPEG
<cit type="example">
<quote>Mr Burton <oVar type="pt">took</oVar> us for French</quote>
</cit>
take … < was quite ~n with him > NPEG
<cit type="example">
<quote>was quite <oVar type="pp">
<oRef/>n</oVar> with him</quote>
</cit>
mix up … < it's easy to mix her up with her sister > NPEG
<cit type="example">
<quote>it's easy to <oVar next="#ov2" xml:id="ov1">mix</oVar>
her <oVar prev="#ov1" xml:id="ov2">up</oVar> with her sister</quote>
</cit>
hors d'oeuvre /,aw'duhv (Fr O:r dœvr)/ n, pl hors d'oeuvres also hors d'oeuvre /'duhv(z) (Fr ~)/ NPEG
<form>
<orth>hors d'oeuvre</orth>
<pron>%aU"dUv</pron>
<form>
<usg type="lang">Fr</usg>
<pron xml:id="di-p2">OR d0vR</pron>
</form>
</form>
<form type="infl">
<number>pl</number>
<orth>hors d'oeuvres</orth>
<orth>hors d'oeuvre</orth>
<pron extent="part">"dUv(z)</pron>
<form>
<usg type="lang">Fr</usg>
<pron>
<pRef target="#di-p2"/>
</pron>
</form>
</form>
Because headword and pronunciation references can occur virtually anywhere in an entry, the oRef, oVar, pRef, and pVar elements can appear within any other element defined for dictionary entries.
Since existing printed dictionaries use different conventions for headword references (swung dash, first-letter abbreviated form, capitalization, or italicization of the word, etc.), the exact method used should be documented in the header.
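One way such documentation might look is sketched below. The choice of editorialDecl as the container and the wording of the paragraph are assumptions made for illustration; the conventions described are invented and would need to match the dictionary actually being encoded.

```xml
<encodingDesc>
  <editorialDecl>
    <!-- Illustrative wording only; adapt to the source dictionary -->
    <p>In the printed source, references to the headword are given as a
      swung dash (~). Each swung dash is encoded as an empty oRef element.
      Where the source prints the capitalized initial followed by a swung
      dash (A~), the reference is encoded as oRef with type="cap".</p>
  </editorialDecl>
</encodingDesc>
```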
9.5 Typographic and Lexical Information in Dictionary Data
Three views of a dictionary may be distinguished:
- (a) the typographic view: the two-dimensional printed page, including information about line and page breaks and other features of layout
- (b) the editorial view: the one-dimensional sequence of tokens which can be seen as the input to the typesetting process; the wording and punctuation of the text and the sequencing of items are visible in this view, but specifics of the typographic realization are not
- (c) the lexical view: the underlying information represented in a dictionary, without concern for its exact textual form
For example, a domain indication in a dictionary entry might be broken over a line and therefore hyphenated (‘naut-’ ‘ical’); the typographic view of the dictionary preserves this information. In a purely editorial view, the particular form in which the domain name is given in the particular dictionary (as ‘nautical’, rather than ‘naut.’, ‘Naut.’, etc.) would be preserved, but the fact of the line break would not. Font shifts might plausibly be included in either a strictly typographic or an editorial view. In the lexical view, the only information preserved concerning domain would be some standard symbol or string representing the nautical domain (e.g. ‘naut.’) regardless of the form in which it appears in the printed dictionary.
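The three treatments of this domain label can be sketched side by side. The sketch assumes the label is tagged as usg with type="dom", as elsewhere in this chapter, and uses the general-purpose lb element to mark the line break; the exact placement of the hyphen is an editorial choice not settled here.

```xml
<!-- typographic view: hyphen and line break are kept -->
<usg type="dom">naut-<lb/>ical</usg>

<!-- editorial view: the wording of the source, without layout -->
<usg type="dom">nautical</usg>

<!-- lexical view: a standard symbol for the domain -->
<usg type="dom">naut.</usg>
```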
In practice, publishers begin with the lexical view (i.e., lexical data as it might appear in a database) and generate first the editorial view, which reflects editorial choices for a particular dictionary (such as the use of the abbreviation ‘Naut.’ for ‘nautical’, the fonts in which different types of information are to be rendered, etc.), and then the typographic view, which is tied to a specific printed rendering. Computational linguists and philologists often begin with the typographic view and analyse it to obtain the editorial and/or lexical views. Some users may ultimately be concerned with retaining only the lexical view, or they may wish to preserve the typographic or editorial views as a reference text, perhaps as a guard against the loss or misinterpretation of information in the translation process. Some researchers may wish to retain all three views, and study their interrelations, since research questions may well span all three views.
In general, an electronic encoding of a text will allow the recovery of at least one view of that text (the one which guided the encoding); if editorial and typographic practices are consistently applied in the production of a printed dictionary, or if exceptions to the rules are consistently recorded in the electronic encoding, then it is in principle possible to recover the editorial view from an encoding of the lexical view, and the typographic view from an encoding of the editorial view. In practice, of course, the severe compression of information in dictionaries, the variety of methods by which this compression is achieved, the complexity of formulating completely explicit rules for editorial and typographic practice, and the relative rarity of complete consistency in the application of such rules, all make the mechanical transformation of information from one view into another something of a vexed question.
This section describes some principles which may be useful in capturing one or the other of these views as consistently and completely as possible, and describes some methods of attempting to capture more than one view in a single encoding. Only the editorial and lexical views are explicitly treated here; for methods of recording the physical or typographic details of a text, see chapter 11 Representation of Primary Sources. Other approaches to these problems, such as the use of repetitive encoding and links to show their correspondences, or the use of feature structures to capture the information structure, and of the ana and inst attributes to link feature structures to a transcription of the editorial view of a dictionary, are not discussed here (for feature structures, see chapter 18 Feature Structures; for linkage of textual form and underlying information, see chapter 17 Simple Analytic Mechanisms).
9.5.1 Editorial View
An encoding of the editorial view should observe the following principles:
- All characters of the source text should be retained, with the possible exception of rendition text (for which see further below).
- Characters appearing in the source text should typically be given as character data content in the document, rather than as the value of an attribute; again, rendition text may optionally be excepted from this rule.
- Apart from the characters or graphics in the source text, nothing else should appear as content in the document, although it may be given in attribute values.
- The material in the source text should appear in the encoding in the same order. Complications of the character sequence by footnotes, marginal notes, etc., text wrapping around illustrations, etc., may be dealt with by the usual means (for notes, see section 3.8 Notes, Annotation, and Indexing).31
In a very conservative transcription of the editorial view of a text, rendition characters (e.g. the commas, parentheses, etc., used in dictionary entries to signal boundaries among parts of the entry) and rendition text (for example, conjunctions joining alternate headwords, etc.) are typically retained. Removing the tags from such a transcription will leave all and only the characters of the source text, in their original sequence.32
pinna ('pIn@) n., pl. -nae (-ni:) or -nas. 1. any leaflet of a pinnate compound leaf. 2. Zoology. a feather, wing, fin, or similarly shaped part. 3. another name for auricle (sense 2). [C18: via New Latin from Latin: wing, feather, fin] CED
A conservative encoding of the editorial view of this entry, which retains all rendition text, might resemble the following:
<entry>
<form>
<orth>pinna</orth>
<pron>("pIn@)</pron>
</form>
<gramGrp>
<pos>n.</pos>, </gramGrp>
<form type="infl">
<number>pl.</number>
<form>
<orth type="lat" extent="part">-nae</orth>
<pron extent="part">(-ni:)</pron>
</form> or <orth type="std" extent="part">-nas</orth>
</form>
<sense n="1">1. <def>any leaflet of a pinnate compound leaf.</def>
</sense>
<sense n="2">2. <usg type="dom">Zoology</usg>
<def>a feather, wing, fin, or similarly shaped part.</def>
</sense>
<sense n="3">3. <xr type="syn">
<lbl>another name for</lbl>
<ref target="#auricle.2">auricle (sense 2).</ref>
</xr>
</sense>
<etym>[<date>C18</date>: via <lang>New Latin</lang> from <lang>Latin</lang>:
<gloss>wing</gloss>, <gloss>feather</gloss>,
<gloss>fin</gloss>]</etym>
</entry>
<entry xml:id="auricle.2">
<!-- .... -->
</entry>
A somewhat simplified encoding of the editorial view of this entry might exploit the fact that rendition text is often systematically recoverable. For example, parentheses consistently appear around pronunciation in this dictionary, and thus are effectively implied by the start- and end-tags for pron.33 In such an encoding, removing the tags should exactly reproduce the sequence of characters in the source, minus rendition text. The original character sequence can be recovered fully by replacing tags with any rendition text they imply.
The encoding below assumes the following conventions:
- parentheses appear around pron elements
- commas appear before inflected forms
- the word ‘or’ appears before alternate forms
- brackets appear around the etymology
- full stops appear after pos, inflection information, and sense numbers
- senses are numbered in sequence unless otherwise specified using the global n attribute
<entry>
<form>
<orth>pinna</orth>
<pron>"pIn@</pron>
</form>
<gramGrp>
<pos>n</pos>
</gramGrp>
<form type="infl">
<number>pl</number>
<form>
<orth type="lat" extent="part">-nae</orth>
<pron extent="part">-ni:</pron>
</form>
<orth type="std" extent="part">-nas</orth>
</form>
<sense n="1">
<def>any leaflet of a pinnate compound leaf.</def>
</sense>
<sense n="2">
<usg type="dom">Zoology</usg>
<def>a feather, wing, fin, or similarly shaped part.</def>
</sense>
<sense n="3">
<xr type="syn">
<lbl>another name for</lbl>
<ref>auricle (sense 2).</ref>
</xr>
</sense>
<etym>
<date>C18</date>: via <lang>New Latin</lang> from <lang>Latin</lang>:
<gloss>wing</gloss>, <gloss>feather</gloss>, <gloss>fin</gloss>
</etym>
</entry>
When rendition text is omitted, it is recommended that the means to regenerate it be fully documented, using the tagUsage element of the TEI header.
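A sketch of such documentation for two of the conventions listed above is given here. The prose inside each tagUsage is illustrative, and wrapping the declarations in a namespace element is an assumption about the version of the header in use.

```xml
<tagsDecl>
  <namespace name="http://www.tei-c.org/ns/1.0">
    <!-- Illustrative wording; one tagUsage per element concerned -->
    <tagUsage gi="pron">Printed between parentheses in the source.
      The parentheses are not transcribed and are to be regenerated
      around the content of each pron element.</tagUsage>
    <tagUsage gi="etym">Printed between square brackets in the source.
      The brackets are not transcribed and are to be regenerated
      around the content of each etym element.</tagUsage>
  </namespace>
</tagsDecl>
```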
If rendition text is used systematically in a dictionary, with only a few mistakes or exceptions, the global attribute rend may be used on any tag to flag exceptions to the normal treatment. The values of the rend attribute are not prescribed, but it can be used with values such as no-comma, no-left-paren, etc. Specific values can be documented using the rendition element in the TEI header.
biryani or biriani %bIrI"A:nI) any of a variety of Indian dishes … [from Urdu]
Here the left parenthesis before the pronunciation is missing in the source. This irregularity can be recorded thus:
<entry>
<form>
<orth>biryani</orth>
<orth>biriani</orth>
<pron rend="noleftparen">%bIrI"A:nI</pron>
</form>
<def>any of a variety of Indian dishes … </def>
<etym>from <lang>Urdu</lang>
</etym>
</entry>
9.5.2 Lexical View
If the text to be interchanged retains only the lexical view of the text, there may be no concern for the recoverability of the editorial (not to speak of the typographic) view of the text. However, it is strongly recommended that the TEI header be used to document fully the nature of all alterations to the original data, such as normalization of domain names, expansion of inflected forms, etc.
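A minimal sketch of such a header statement follows. The use of normalization inside editorialDecl is one reasonable place for it, and the wording of the paragraph is invented for illustration.

```xml
<editorialDecl>
  <normalization>
    <!-- Illustrative wording; list every alteration actually made -->
    <p>Domain labels have been replaced by a fixed set of abbreviations.
      Inflected forms printed in abbreviated form in the source have been
      expanded to full forms. Rendition text such as parentheses and
      separating commas has not been transcribed.</p>
  </normalization>
</editorialDecl>
```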
Such alterations may also include:
- reorganizing the order of elements in an entry to show their relationship, as in
  clem (klEm) or clam vb. clems, clemming, clemmed or clams, clamming, clammed CED
  where in a strictly lexical view one might wish to group ‘clem’ and ‘clam’ with their respective inflected forms.
- splitting an entry into two separate entries, as in
  celi.bacy /"selIb@sI/ n [U] state of living unmarried, esp as a religious obligation. celi.bate /"selIb@t/ n [C] unmarried person (esp a priest who has taken a vow not to marry). OALD
  For some purposes, this entry might usefully be split into an entry for ‘celibacy’ and a separate entry for ‘celibate’.
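The regrouping suggested for the clem entry might look like the following sketch, which nests each headword form with its own inflected forms. Attaching the pronunciation only to clem is an assumption, since the printed entry does not say whether it also applies to clam, and the tense of each inflected form is left unmarked.

```xml
<form>
  <form>
    <orth>clem</orth>
    <!-- assumed to belong to "clem" only -->
    <pron>klEm</pron>
    <form type="infl"><orth>clems</orth></form>
    <form type="infl"><orth>clemming</orth></form>
    <form type="infl"><orth>clemmed</orth></form>
  </form>
  <form>
    <orth>clam</orth>
    <form type="infl"><orth>clams</orth></form>
    <form type="infl"><orth>clamming</orth></form>
    <form type="infl"><orth>clammed</orth></form>
  </form>
</form>
<gramGrp>
  <pos>vb</pos>
</gramGrp>
```

This encoding no longer follows the order of the printed text, so it captures the lexical view only.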
In the following encoding of the lexical view of the pinna entry from the previous section:
- abbreviated forms have been silently expanded
- some forms have been moved to allow related forms to be grouped together
- the part of speech information has been moved to allow all forms to be given together
- the cross reference to ‘auricle’ has been simplified
<entry>
<form>
<orth>pinna</orth>
<pron>"pIn@</pron>
<form type="infl">
<number>pl</number>
<form>
<orth type="lat">pinnae</orth>
<pron>'pIni:</pron>
</form>
<orth type="std">pinnas</orth>
</form>
</form>
<gramGrp>
<pos>n</pos>
</gramGrp>
<sense n="1">
<def>any leaflet of a pinnate compound leaf.</def>
</sense>
<sense n="2">
<usg type="dom">Zoology</usg>
<def>a feather, wing, fin, or similarly shaped part.</def>
</sense>
<sense n="3">
<xr type="syn">
<ptr target="#auricle.2"/>
</xr>
</sense>
<etym>
<date>C18</date>: via <lang>New Latin</lang> from <lang>Latin</lang>:
<gloss>wing</gloss>, <gloss>feather</gloss>, <gloss>fin</gloss>
</etym>
</entry>
9.5.3 Retaining Both Views
It is sometimes desirable to retain both the lexical and the editorial view, in which case a potential conflict exists between the two. When there is a conflict between the encodings for the lexical and editorial views, the principles described in the following sections may be applied.
9.5.3.1 Using Attribute Values to Capture Alternate Views
If the order of the data is the same in both views, then both views may be captured by encoding one ‘dominant’ view in the character data content of the document, and encoding the other using attribute values on the appropriate elements. If all tags were to be removed, the remaining characters would be those of the dominant view of the text.
The attribute class att.lexicographic is used to provide attributes for use in encoding multiple views of the same dictionary entry. These attributes are available for use on all elements defined in this chapter when the base module for dictionaries is selected.
- att.lexicographic defines a set of global attributes available on elements in the base tag set for dictionaries.
  norm (normalized) gives a normalized form of information given by the source text in a non-normalized form.
  split gives the list of split values for a merged form.
  orig (original) gives the original string or is the empty string when the element does not appear in the source text.
  mergedIn gives a reference to another element, where the original appears as a merged form.
  opt (optional) indicates whether the element is optional or not.
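With the editorial view dominant, partial forms are kept as content and their normalized values are given in attributes:
<form>
<orth>delay</orth>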
<form type="infl">
<orth norm="delayed" extent="part">-ed</orth>
<tns norm="pst,pstp"/>
</form>
<form type="infl">
<orth norm="delaying" extent="part">-ing</orth>
<tns norm="prsp"/>
</form>
</form>
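With the lexical view dominant, the full forms are given as content and the original strings are recorded using orig:
<form>
<orth>delay</orth>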
<form type="infl">
<orth orig="-ed">delayed</orth>
<tns orig="">pst</tns>
<tns orig="">pstp</tns>
</form>
<form type="infl">
<orth orig="-ing">delaying</orth>
<tns orig="">prsp</tns>
</form>
</form>
thyr(é)ostimuline [tiR(e)ostimylin] …
With the editorial view dominant, this entry might begin thus:
<form>
<orth split="thyrostimuline, thyréostimuline">thyr(é)ostimuline</orth>
<pron split="tiRostimylin, tiReostimylin">tiR(e)ostimylin</pron>
</form>
With the lexical view dominant, the same entry might begin thus:
<form>
<orth xml:id="dic-o1" orig="thyr(é)ostimuline">thyrostimuline</orth>
<pron xml:id="dic-p1" orig="tiR(e)ostimylin">tiRostimylin</pron>
</form>
<form>
<orth mergedIn="#dic-o1">thyréostimuline</orth>
<pron mergedIn="#dic-p1">tiReostimylin</pron>
</form>
Alternatively, the optional part of each form may be marked using the opt attribute:
<form>
<orth next="#dict-o2" xml:id="dict-o1">thyr</orth>
<orth
next="#dict-o3"
prev="#dict-o1"
xml:id="dict-o2"
opt="true">é</orth>
<orth prev="#dict-o2" xml:id="dict-o3">ostimuline</orth>
<pron next="#dict-p2" xml:id="dict-p1">tiR</pron>
<pron
next="#dict-p3"
prev="#dict-p1"
xml:id="dict-p2"
opt="true">e</pron>
<pron prev="#dict-p2" xml:id="dict-p3">ostimylin</pron>
</form>
Note that this transcription preserves both the lexical and editorial views in a single encoding. However, it has the disadvantage that the strings corresponding to entire words do not appear in the encoding uninterrupted, and therefore complex processing is required to retrieve them from the encoded text. The use of the opt attribute is recommended, however, when long spans of text are involved, or when the optional part contains embedded tags.
pas.tel /"pastl US: pa"stel/ n 1 (picture drawn with) coloured chalk made into crayons. 2 … OALD
<sense n="1">
<def>coloured chalk made into crayons</def>
<def>picture drawn with coloured chalk made into crayons</def>
</sense>
<sense n="1">
<def next="#d2" xml:id="d1" opt="true">picture drawn with</def>
<def prev="#d1" xml:id="d2">coloured chalk made into crayons</def>
</sense>
9.5.3.2 Recording Original Locations of Transposed Elements
The attributes described in the previous section are useful only when the order of material is the same in both the editorial and the lexical view. When the two views impose different orders on the data, the standard linking mechanisms may be used to show the original location of material transposed in an encoding of the lexical view.
- att.lexicographic defines a set of global attributes available on elements in the base tag set for dictionaries.
  location indicates an anchor element, typically elsewhere in the document, which marks the original location of this component.
pinna ("pIn@) n., pl. -nae (-ni:) or -nas. CED
<form>
<orth>pinna</orth>
<pron>'pIn@</pron>
<anchor xml:id="p01"/>
<form type="infl">
<number>pl</number>
<form>
<orth extent="part">-nae</orth>
<pron extent="part">-ni:</pron>
</form>
<orth extent="part">-nas</orth>
</form>
</form>
<gramGrp>
<pos location="#p01">n</pos>
</gramGrp>
9.6 Unstructured Entries
The content model for the entry element provides an entry structure suitable for many average dictionaries, as well as many regular entries in more exotic dictionaries. However, the structure of some dictionaries does not allow the restrictions imposed by the content model for entry. To handle these cases, the entryFree and dictScrap elements are provided to support much wider variation in entry structure. The dictScrap element offers less freedom, in that it can only contain phrase-level elements, but it can itself appear at any point within a dictionary entry where any of the structural components of a dictionary entry are permitted. As such, it acts as a container for otherwise anomalous parts of an entry.
The entryFree element places no constraints at all upon the entry: any element defined in this chapter, as well as all the normal phrase-level and inter-level elements, can appear anywhere within it. With the entryFree element, the encoder is free to use any element anywhere, as well as to use or omit grouping elements such as form, gramGrp, etc.
<ent h="demigod"> <hwd>demi|god</hwd> <pr> <ph>"demIgQd</ph> </pr> <hps
ps="n"> <hsn> <def>one who is partly divine and partly human</def>
<def>(in Gk myth, etc) the son of a god and a mortal woman,
eg<cf>Hercules</cf> <pr> <ph>"h3:kjUli:z</ph> </pr> </def> </hsn>
</hps> </ent> OALD
<entryFree>
<form>
<orth>demigod</orth>
<hyph>demi|god</hyph>
<pron>"demIgQd</pron>
</form>
<gramGrp>
<pos>n</pos>
</gramGrp>
<def>one who is partly divine and partly human</def>
<def>(in Gk myth, etc) the son of a god and a mortal woman, eg
<mentioned>Hercules</mentioned>
</def>
<pron>"h3:kjUli:z</pron>
</entryFree>
biryani or biriani (%bIrI"A:nI) any of a variety of Indian dishes … [from Urdu] CED
<entryFree>
<orth>biryani</orth> or <orth>biriani</orth>
<pron>(%bIrI"A:nI)</pron>
<def>any of a variety of Indian dishes …</def>
<etym>[from <lang>Urdu</lang>]</etym>
</entryFree>
<entry>
<dictScrap>
<orth>biryani</orth> or <orth>biriani</orth>
<pron>(%bIrI"A:nI)</pron>
<def>any of a variety of Indian dishes …</def>
<etym>[from <lang>Urdu</lang>]</etym>
</dictScrap>
</entry>
9.7 The Dictionary Module
- Elements defined: case colloc def dictScrap entry entryFree etym form gen gram gramGrp hom hyph iType lang lbl mood number oRef oVar orth pRef pVar per pos pron re sense stress subc superEntry syll tns usg xr
- Classes defined: att.entryLike att.lexicographic att.ptrLike.form model.entryLike model.formPart model.gramPart model.morphLike model.ptrLike.form