8 Transcriptions of Speech


The module described in this chapter is intended for use with a wide variety of transcribed spoken material. It should be stressed, however, that the present proposals are not intended to support unmodified every variety of research undertaken upon spoken material now or in the future; some discourse analysts, some phonologists, and doubtless others may wish to extend the scheme presented here to express more precisely the set of distinctions they wish to draw in their transcriptions. Speech regarded as a purely acoustic phenomenon may well require different methods from those outlined here, as may speech regarded solely as a process of social interaction.

This chapter begins with a discussion of some of the problems commonly encountered in transcribing spoken language (section 8.1 General Considerations and Overview). Section 8.2 Documenting the Source of Transcribed Speech documents some additional TEI Header elements which may be used to document the recording or other source from which transcribed text is taken. Section 8.3 Elements Unique to Spoken Texts describes the basic structural elements provided by this module. Finally, section 8.4 Elements Defined Elsewhere of this chapter reviews further problems specific to the encoding of spoken language, demonstrating how mechanisms and elements discussed elsewhere in these Guidelines may be applied to them.

8.1 General Considerations and Overview

There is great variation in the ways different researchers have chosen to represent speech using the written medium.[25] This reflects the special difficulties which apply to the encoding or transcription of speech. Speech varies according to a large number of dimensions, many of which have no counterpart in writing (for example, tempo, loudness, pitch, etc.). The audibility of speech recorded in natural communication situations is often less than perfect, affecting the accuracy of the transcription. Spoken material may be transcribed in the course of linguistic, acoustic, anthropological, psychological, ethnographic, journalistic, or many other types of research. Even in the same field, the interests and theoretical perspectives of different transcribers may lead them to prefer different levels of detail in the transcript and different styles of visual display. The production and comprehension of speech are intimately bound up with the situation in which speech occurs, far more so than is the case for written texts. A speech transcript must therefore include some contextual features; determining which are relevant is not always simple. Moreover, the ethical problems in recording and making public what was produced in a private setting and intended for a limited audience are more frequently encountered in dealing with spoken texts than with written ones.

Speech also poses difficult structural problems. Unlike a written text, a speech event takes place in time. Its beginning and end may be hard to determine and its internal composition difficult to define. Most researchers agree that the utterances or turns of individual speakers form an important structural component in most kinds of speech, but these are rarely as well-behaved (in the structural sense) as paragraphs or other analogous units in written texts: speakers frequently interrupt each other, use gestures as well as words, leave remarks unfinished and so on. Speech itself, though it may be represented as words, frequently contains items such as vocalized pauses which, although only semi-lexical, have immense importance in the analysis of spoken text. Even non-vocal elements such as gestures may be regarded as forming a component of spoken text for some analytic purposes. Below the level of the individual utterance, speech may be segmented into units defined by phonological, prosodic, or syntactic phenomena; no clear agreement exists, however, even as to appropriate names for such segments.

Spoken texts transcribed according to the guidelines presented here are organized as follows. The overall structure of a TEI spoken text is identical to that of any other TEI text: the TEI element for a spoken text contains a teiHeader element, followed by a text element. Even texts primarily composed of transcribed speech may also include conventional front and back matter, and may even be organized into divisions like printed texts.

We may say, therefore, that these Guidelines regard transcribed speech as being composed of arbitrary high-level units called texts. A spoken text might typically be a conversation between a small number of people, a lecture, a broadcast TV item, or a similar event. Each such unit has associated with it a teiHeader providing detailed contextual information such as the source of the transcript, the identity of the participants, whether the speech is scripted or spontaneous, the physical and social setting in which the discourse takes place and a range of other aspects. Details of the header in general are provided in chapter 2 The TEI Header; the particular elements it provides for use with spoken texts are described below (8.2 Documenting the Source of Transcribed Speech). Details concerning additional elements which may be used for the documentation of participant and contextual information are given in 15.2 Contextual Information.

Defining the bounds of a spoken text is frequently a matter of arbitrary convention or convenience. In public or semi-public contexts, a text may be regarded as synonymous with, for example, a lecture, a broadcast item, a meeting, etc. In informal or private contexts, a text may be simply a conversation involving a specific group of participants. Alternatively, researchers may elect to define spoken texts solely in terms of their duration in time or length in words. By default, these Guidelines assume of a text only that:
  • it is internally cohesive,
  • it is describable by a single header, and
  • it represents a single stretch of time with no significant discontinuities.
Deviation from these assumptions may be specified (for example, the org attribute on the text element may take the value composite to specify that the components of the text are discrete) but is not recommended.

Within a text it may be necessary to identify subdivisions of various kinds, if only for convenience of handling. The neutral div element discussed in section 4.1 Divisions of the Body is recommended for this purpose. It may be found useful also for representing subdivisions relating to discourse structure, speech act theory, transactional analysis, etc., provided only that these divisions are hierarchically well-behaved. Where they are not, as is often the case, the mechanisms discussed in chapters 16 Linking, Segmentation, and Alignment and 20 Non-hierarchical Structures may be used.

A spoken text may contain any of the following components:
  • utterances
  • pauses
  • vocalized but non-lexical phenomena such as coughs
  • kinesic (non-verbal, non-lexical) phenomena such as gestures
  • entirely non-linguistic incidents occurring during and possibly influencing the course of speech
  • writing, regarded as a special class of incident in that it can be transcribed, for example captions or overheads displayed during a lecture
  • shifts or changes in vocal quality

Elements to represent all of these features of spoken language are discussed in section 8.3 Elements Unique to Spoken Texts below.

An utterance (tagged u) may contain lexical items interspersed with pauses and non-lexical vocal sounds; during an utterance, non-linguistic incidents may occur and written materials may be presented. The u element can thus contain any of the other elements listed, interspersed with a transcription of the lexical items of the utterance; the other elements may all appear between utterances or next to each other, but except for writing they do not contain any other elements nor any data.

A spoken text itself may be without substructure, that is, it may consist simply of units such as utterances or pauses, not grouped together in any way, or it may be subdivided. If the notion of what constitutes a ‘text’ in spoken discourse is inevitably rather an arbitrary one, the notion of formal subdivisions within such a ‘text’ may appear even more debatable. Nevertheless, such divisions may be useful for such types of discourse as debates, broadcasts, etc., where structural subdivisions can easily be identified, or more generally wherever it is desired to aggregate utterances or other parts of a transcript into units smaller than a complete ‘text’. Examples might include ‘conversations’ or ‘discourse fragments’, or more narrowly, ‘that part of the conversation where topic x was discussed’, provided only that the set of all such divisions is coextensive with the text.

Each such division of a spoken text should be represented by the numbered or un-numbered div elements defined in chapter 4 Default Text Structure. For some detailed kinds of analysis a hierarchy of such divisions may be found useful; nested div elements may be used for this purpose, as in the following example showing how a collection made up of transcribed ‘sound bites’ taken from speeches given by a politician on different occasions might be encoded. Each extract is regarded as a distinct div, nested within a single composite div as follows:
<div type="soundbitessubtype="conservativeorg="composite">
 <div sample="medial"/>
 <div sample="medial"/>
 <div sample="initial"/>
</div>

As a member of the class att.declaring, the div element may also carry a decls attribute, for use where the divisions of a text do not all share the same set of contextual declarations specified in the TEI header. (See further section 15.3 Associating Contextual Information with a Text.)
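For example, if each major division of a transcript derives from a different recording, each division might be linked to the relevant recording by its identifier. The following is a minimal sketch: the identifiers recA and recB are hypothetical, and are assumed to be declared on recording elements within the recordingStmt of the header.
<div type="session" decls="#recA">
 <u who="#a">good morning</u>
</div>
<div type="session" decls="#recB">
 <u who="#a">welcome back</u>
</div>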

8.2 Documenting the Source of Transcribed Speech

Where a computer file is derived from a spoken text rather than a written one, it will usually be desirable to record additional information about the recording or broadcast which constitutes its source. Several additional elements are provided for this purpose within the source description component of the TEI Header:
  • scriptStmt (script statement) contains a citation giving details of the script used for a spoken text.
  • recordingStmt (recording statement) describes a set of recordings used as the basis for transcription of a spoken text.
  • recording (recording event) details of an audio or video recording event used as the source of a spoken text, either directly or from a public broadcast.
    type: the kind of recording.
As a member of the att.duration class, the recording element inherits the following attribute:
  • att.duration.w3c attributes for recording normalized temporal durations.
    dur (duration): indicates the length of this element in time.

Note that detailed information about the participants or setting of an interview or other transcript of spoken language should be recorded in the appropriate division of the profile description, discussed in chapter 15 Language Corpora, rather than as part of the source description. The source description is used to hold information only about the source from which the transcribed speech was taken, for example, any script being read and any technical details of how the recording was produced. If the source was a previously-created transcript, it should be treated in the same way as any other source text.

The scriptStmt element should be used where it is known that one or more of the participants in a spoken text is speaking from a previously prepared script. The script itself should be documented in the same way as any other written text, using one of the three citation tags mentioned above. Utterances or groups of utterances may be linked to the script concerned by means of the decls attribute, described in section 15.3 Associating Contextual Information with a Text.
<sourceDesc>
 <scriptStmt xml:id="CNN12">
  <bibl>
   <author>CNN Network News</author>
   <title>News headlines</title>
   <date when="1991-06-12">12 Jun 91</date>
  </bibl>
 </scriptStmt>
</sourceDesc>
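Utterances spoken from this script may then be associated with it by means of the decls attribute, as in the following sketch (the speaker identifier #anchor is hypothetical):
<u decls="#CNN12" who="#anchor">here are tonight's headlines</u>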

The recordingStmt is used to group together information relating to the recordings from which the spoken text was transcribed. The element may contain either a prose description or, more helpfully, one or more recording elements, each corresponding with a particular recording. The linkage between utterances or groups of utterances and the relevant recording statement is made by means of the decls attribute, described in section 15.3 Associating Contextual Information with a Text.

The recording element should be used to provide a description of how and by whom a recording was made. This information may be provided in the form of a prose description, within which such items as statements of responsibility, names, places, and dates may be identified using the appropriate phrase-level tags. Alternatively, a selection of elements from the model.recordingPart class may be provided. This element class makes available the following elements:
  • date contains a date in any format.
  • time contains a phrase defining a time of day in any format.
  • respStmt (statement of responsibility) supplies a statement of responsibility for the intellectual content of a text, edition, recording, or series, where the specialized elements for authors, editors, etc. do not suffice or do not apply.
  • equipment provides technical details of the equipment and media used for an audio or video recording used as the source for a spoken text.
  • broadcast describes a broadcast used as the source of a spoken text.
Specialized collections may wish to add further sub-elements to these major components. These elements should be used only for information relating to the recording process itself; information about the setting or participants (for example) is recorded elsewhere: see sections 15.2.3 The Setting Description and 15.2.2 The Participant Description below.
<recording type="video">
 <p>U-matic recording made by college audio-visual department staff,
   available as PAL-standard VHS transfer or sound-only cassette</p>
</recording>
<recording type="audiodur="P30M">
 <respStmt>
  <resp>Location recording by</resp>
  <name>Sound Services Ltd.</name>
 </respStmt>
 <equipment>
  <p>Multiple close microphones mixed down to stereo Digital
     Audio Tape, standard play, 44.1 KHz sampling frequency</p>
 </equipment>
 <date>12 Jan 1987</date>
</recording>
When a recording has been made from a public broadcast, details of the broadcast itself should be supplied within the recording element, as a nested broadcast element. A broadcast is closely analogous to a publication and the broadcast element should therefore contain one or the other of the bibliographic citation elements bibl, biblStruct, or biblFull. The broadcasting agency responsible for a broadcast is regarded as its author, while other participants (for example interviewers, interviewees, directors, producers, etc.) should be specified using the respStmt or editor element with an appropriate resp (see further section 3.11 Bibliographic Citations and References).
<recording type="audiodur="P10M">
 <equipment>
  <p>Recorded from FM Radio to digital tape</p>
 </equipment>
 <broadcast>
  <bibl>
   <title>Interview on foreign policy</title>
   <author>BBC Radio 5</author>
   <respStmt>
    <resp>interviewer</resp>
    <name>Robin Day</name>
   </respStmt>
   <respStmt>
    <resp>interviewee</resp>
    <name>Margaret Thatcher</name>
   </respStmt>
   <series>
    <title>The World Tonight</title>
   </series>
   <note>First broadcast on <date when="1989-11-27">27 Nov 1989</date>
   </note>
  </bibl>
 </broadcast>
</recording>
When a broadcast contains several distinct recordings (for example a compilation), additional recording elements may be further nested within the broadcast element.
<recording dur="P100M">
 <broadcast>
  <recording/>
 </broadcast>
</recording>

8.3 Elements Unique to Spoken Texts

The following elements characterize spoken texts, transcribed according to these Guidelines:
  • u (utterance) a stretch of speech usually preceded and followed by silence or by a change of speaker.
  • pause/ a pause either between or within utterances.
  • vocal any vocalized but not necessarily lexical phenomenon, for example voiced pauses, non-lexical backchannels, etc.
  • kinesic any communicative phenomenon, not necessarily vocalized, for example a gesture, frown, etc.
  • incident any phenomenon or occurrence, not necessarily vocalized or communicative, for example incidental noises or other events affecting communication.
  • writing a passage of written text revealed to participants in the course of a spoken text.
  • shift/ marks the point at which some paralinguistic feature of a series of utterances by any one speaker changes.

The u element may appear directly within a spoken text, and may contain any of the others; the others may also appear directly (for example, a vocal may appear between two utterances) but cannot contain a u element. In terms of the basic TEI model, therefore, we regard the u element as analogous to a paragraph, and the others as analogous to ‘phrase’ elements. The class model.divPart.spoken provides the u element; the class model.phrase.spoken provides the six other elements listed above.

As members of the att.ascribed class, all of these elements share the following attribute:
  • att.ascribed provides attributes for elements representing speech or action that can be ascribed to a specific individual.
    who: indicates the person, or group of people, to whom the element content is ascribed.
As members of the att.typed, att.timed, and att.duration classes, all of these elements except shift share the following attributes:
  • att.typed provides attributes which can be used to classify or subclassify elements in any way.
    type: characterizes the element in some sense, using any convenient classification scheme or typology.
    subtype: provides a sub-categorization of the element, if needed.
  • att.timed provides attributes common to those elements which have a duration in time, expressed either absolutely or by reference to an alignment map.
    start: indicates the location within a temporal alignment at which this element begins.
    end: indicates the location within a temporal alignment at which this element ends.
  • att.duration.w3c attributes for recording normalized temporal durations.
    dur (duration): indicates the length of this element in time.

Each of these elements is further discussed and specified below in sections 8.3.1 Utterances to 8.3.4 Writing.

We can show the relationship between four of these constituents of speech using the features eventive, communicative, anthropophonic (for sounds produced by the human vocal apparatus), and lexical:
            eventive   communicative   anthropophonic   lexical
incident       +             -               -             -
kinesic        +             +               -             -
vocal          +             +               +             -
utterance      +             +               +             +
The differences are not always clear-cut. Among incidents might be included actions like slamming the door, which can certainly be communicative. Vocals include coughing and sneezing, which are usually involuntary noises. Equally, the distinction between utterances and vocals is not always clear, although for many analytic purposes it will be convenient to regard them as distinct. Individual scholars may differ in the way borderlines are drawn and should declare their definitions in the editorialDecl element of the header (see 2.3.3 The Editorial Practices Declaration).
The following short extract exemplifies several of these elements. It is recoded from a text originally transcribed in the CHILDES format.[26] Each utterance is encoded using a u element (see section 8.3.1 Utterances). The speakers are defined using the listPerson element discussed in 13.3.2 The Person Element and each is given a unique identifier also used to identify their speech. Pauses marked by the transcriber are indicated using the pause element (see section 8.3.2 Pausing). Non-verbal vocal effects such as the child's meowing are indicated either with orthographic transcriptions or with the vocal element, and entirely non-linguistic but significant incidents such as the sound of the toy cat are represented by the incident element (see section 8.3.3 Vocal, Kinesic, Incident).
<u who="#mar">you
never <pause/> take this cat for show and tell
<pause/> meow meow</u>
<u who="#ros">yeah well I dont want to</u>
<incident>
 <desc>toy cat has bell in tail which continues to make a tinkling sound</desc>
</incident>
<vocal who="#mar">
 <desc>meows</desc>
</vocal>
<u who="#ros">because it is so old</u>
<u who="#mar">how <choice>
  <orig>bout</orig>
  <reg>about</reg>
 </choice>
 <emph>your</emph> cat <pause/>yours is <emph>new</emph>
 <kinesic>
  <desc>shows Father the cat</desc>
 </kinesic>
</u>
<u trans="pausewho="#fat">thats <pause/> darling</u>
<u who="#mar">no <emph>mine</emph> isnt old
mine is just um a little dirty</u>
<!-- ... -->
<listPerson>
 <person xml:id="mar">
<!-- ... -->
 </person>
 <person xml:id="ros">
<!-- ... -->
 </person>
 <person xml:id="fat">
<!-- ... -->
 </person>
</listPerson>

This example also uses some elements common to all TEI texts, notably the reg tag for editorial regularization. Unusually stressed syllables have been encoded with the emph element. The seg element has also been used to segment the last utterance. Further discussion of all such options is provided in section 8.4 Elements Defined Elsewhere.

Contextual information is of particular importance in spoken texts, and should be provided by the TEI header of a text. In general, all of the information in a header is understood to be relevant to the whole of the associated text. The element u, as a member of the att.declaring class, may however specify a different context by means of the decls attribute (see further section 15.3 Associating Contextual Information with a Text).

8.3.1 Utterances

Each distinct utterance in a spoken text is represented by a u element, described as follows:
  • u (utterance) a stretch of speech usually preceded and followed by silence or by a change of speaker.
    trans (transition): indicates the nature of the transition between this utterance and the previous one.

Use of the who attribute to associate the utterance with a particular speaker is recommended but not required. Its use implies as a further requirement that all speakers be identified by a person or personGrp element in the TEI header (see section 15.2.2 The Participant Description). Where utterances cannot be attributed with confidence to any particular participant or group of participants, the encoder may choose to define ‘participants’ such as all or various.
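One way of handling such cases is sketched below: a personGrp element bearing the hypothetical identifier various is declared among the participants, and unattributable utterances are ascribed to it:
<listPerson>
 <person xml:id="a">
<!-- ... -->
 </person>
 <personGrp xml:id="various"/>
</listPerson>
<!-- ... -->
<u who="#various">mind the step</u>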

The trans attribute is provided as a means of characterizing the transition from one utterance to the next at a simpler level of detail than that provided by the temporal alignment mechanism discussed in section 16.5 Synchronization. The value specified applies to the transition from the preceding utterance into the utterance bearing the attribute. For example:[27]
<u xml:id="ts_a1who="#a">Have you heard the</u>
<u xml:id="ts_b1trans="latchingwho="#b">the election results? yes</u>
<u xml:id="ts_a2trans="pausewho="#a">it's a disaster</u>
<u xml:id="ts_b2trans="overlapwho="#b">it's a miracle</u>
In this example, utterance B1 latches on to utterance A1, while there is a marked pause between B1 and A2. B2 and A2 overlap, but by an unspecified amount. For ways of providing a more precise indication of the degree of overlap, see section 8.4.2 Synchronization and Overlap.

An utterance may contain either running text, or text within which other basic structural elements are nested. Where such nesting occurs, the who attribute is considered to be inherited for the elements pause, vocal, shift and kinesic; that is, a pause or shift (etc.) within an utterance is regarded as being produced by that speaker only, while a pause between utterances applies to all speakers.

Occasionally, an utterance may seem to contain other utterances, for example where a speaker interrupts themselves, or where another speaker produces a ‘back channel’ while they are still speaking. The present version of these Guidelines does not support nesting of one u element within another. The transcriber must therefore decide whether such interruptions constitute a change of utterance, or whether other elements may be used. In the case of self-interruption, the shift element may be used to show that the speaker has changed the quality of their speech:
<u who="#a">Listen to this <shift new="reading"/>The government is
confident, he said, that the current economic problems will be
completely overcome by June<shift/> what nonsense</u>
Alternatively, the incident element described in section 8.3.3 Vocal, Kinesic, Incident might be used, without transcribing the read material:
<u who="#a">Listen to this
<incident>
  <desc>reads aloud from newspaper</desc>
 </incident> what
nonsense</u>
Often, back channelling is only semi-lexicalized and may therefore be represented using the vocal element:
<u who="#a">So what could I have done <vocal who="#b">
  <desc>tut-tutting</desc>
 </vocal> about it anyway?</u>
Where this is not possible, it is simplest to regard the back channel as a distinct utterance.
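In that case, a fully lexicalized back channel might be encoded as an overlapping utterance in its own right, as in the following sketch using the trans attribute described above:
<u who="#a">So what could I have done about it anyway?</u>
<u trans="overlap" who="#b">nothing at all</u>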

8.3.2 Pausing

Speakers differ very much in their rhythm and in particular in the amount of time they leave between words. The following element is provided to mark occasions where the transcriber judges that speech has been paused, irrespective of the actual amount of silence:
  • pause/ a pause either between or within utterances.
A pause contained by an utterance applies to the speaker of that utterance. A pause between utterances applies to all speakers. The type attribute may be used to categorize the pause, for example as short, medium, or long; alternatively the attribute dur may be used to indicate its length more exactly, as in the following example:
<u>Okay <pause dur="PT2M"/>U-m<pause dur="PT75S"/>the scene opens up
<pause dur="PT50S"/> with <pause dur="PT20S"/> um <pause dur="PT145S"/> you see
a tree okay?</u>
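An encoder preferring broad categories to exact timings might instead mark the same passage as in the following sketch; the values short, medium, and long are illustrative only, and should be documented in the header:
<u>Okay <pause type="long"/>U-m<pause type="long"/>the scene opens up
<pause type="medium"/> with <pause type="short"/> um <pause type="long"/> you see
a tree okay?</u>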
If detailed synchronization of pausing with other vocal phenomena is required, the alignment mechanism defined at section 16.5 Synchronization and discussed informally below should be used. Note that the trans attribute mentioned in the previous section may also be used to characterize the degree of pausing between (but not within) utterances.

8.3.3 Vocal, Kinesic, Incident

These three elements are used to indicate the presence of non-transcribed semi-lexical or non-lexical phenomena either between or within utterances.
  • vocal any vocalized but not necessarily lexical phenomenon, for example voiced pauses, non-lexical backchannels, etc.
  • kinesic any communicative phenomenon, not necessarily vocalized, for example a gesture, frown, etc.
  • incident any phenomenon or occurrence, not necessarily vocalized or communicative, for example incidental noises or other events affecting communication.

The who attribute should be used to specify the person or group responsible for a vocal, kinesic, or incident which is contained within an utterance, if this differs from that of the enclosing utterance. The attribute must be supplied for a vocal, kinesic, or incident which is not contained within an utterance.

The iterated attribute may be used to indicate that the vocal, kinesic, or incident is repeated, for example laughter as opposed to laugh. These should both be distinguished from laughing, where what is being encoded is a shift in voice quality. For this last case, the shift element discussed in section 8.3.6 Shifts should be used.
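The three cases might be distinguished as in the following sketch:
<!-- repeated laughter -->
<vocal iterated="true">
 <desc>laughter</desc>
</vocal>
<!-- a single laugh -->
<vocal>
 <desc>laugh</desc>
</vocal>
<!-- following speech is produced while laughing -->
<shift feature="voice" new="laugh"/>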

The desc element may be used to supply a conventional representation for the phenomenon, for example:
  • non-lexical: burp, click, cough, exhale, giggle, gulp, inhale, laugh, sneeze, sniff, snort, sob, swallow, throat, yawn
  • semi-lexical: ah, aha, aw, eh, ehm, er, erm, hmm, huh, mm, mmhm, oh, ooh, oops, phew, tsk, uh, uh-huh, uh-uh, um, urgh, yup
Researchers may prefer to regard some semi-lexical phenomena as ‘words’ within the bounds of the u element. See further the discussion at section 8.4.3 Regularization of Word Forms below. As for all basic categories, the definition should be made clear in the encodingDesc element of the TEI header.
Some typical examples follow:
<u who="#jan">This is just delicious</u>
<incident>
 <desc>telephone rings</desc>
</incident>
<u who="#ann">I'll get it</u>
<u who="#tom">I used to <vocal>
  <desc>cough</desc>
 </vocal> smoke a lot</u>
<u who="#bob">
 <vocal>
  <desc>sniffs</desc>
 </vocal>He thinks he's tough
</u>
<vocal who="#ann">
 <desc>snorts</desc>
</vocal>
<!-- ... -->
<listPerson>
 <person xml:id="ann">
<!-- ... -->
 </person>
 <person xml:id="bob">
<!-- ... -->
 </person>
 <person xml:id="jan">
<!-- ... -->
 </person>
 <person xml:id="kim">
<!-- ... -->
 </person>
 <person xml:id="tom">
<!-- ... -->
 </person>
</listPerson>
Note that Ann's snorting could equally well be encoded as follows:
<u who="#ann">
 <vocal>
  <desc>snorts</desc>
 </vocal>
</u>

The extent to which encoding of incidents or kinesics is included in a transcription will depend entirely on the purpose for which the transcription was made. As elsewhere, this is determined by the particular research agenda and the extent to which such phenomena are felt to be significant for the interpretation of spoken interactions.

8.3.4 Writing

Written text may also be encountered when speech is transcribed, for example in a television broadcast or cinema performance, or where one participant shows written text to another. The writing element may be used to distinguish such written elements from the spoken text in which they are embedded.
  • writing a passage of written text revealed to participants in the course of a spoken text.
    gradual: indicates whether the writing is revealed all at once or gradually.
    source: points to a bibliographic citation in the header giving a full description of the source or script of the writing.
For example, if speaker A in the breakfast table conversation in section 8.3.1 Utterances above had simply shown the newspaper passage to her interlocutor instead of reading it, the interaction might have been encoded as follows:
<u who="#a">look at this</u>
<writing who="#atype="newspapergradual="false">Government claims economic problems
<soCalled>over by June</soCalled>
</writing>
<u who="#a">what nonsense!</u>
If the source of the writing being displayed is known, bibliographic information about it may be stored in a listBibl within the sourceDesc element of the TEI Header, and then pointed to using the source attribute. For example, in the following encoding a lecturer displays two different versions of the same passage of text:
<sourceDesc>
<!-- ...-->
 <bibl xml:id="FOL1">Shakespeare First Folio text</bibl>
 <bibl xml:id="FOL2">Shakespeare Second Folio text</bibl>
<!-- ...-->
</sourceDesc>
<!-- ...-->
<u>.... now compare the punctuation of lines 12 and 14 in these two
versions of page 42...
<writing source="#FOL1">....</writing>
 <writing source="#FOL2">....</writing>
</u>

8.3.5 Temporal Information

As noted above, utterances, vocals, pauses, kinesics, incidents, and writing elements all inherit attributes providing information about their position in time from the classes att.timed and att.duration. These attributes can be used to link parts of the transcription very exactly with points on a timeline, or simply to indicate their duration. Note that if start and end point to when elements whose temporal distance from each other is specified in a timeline, then dur is ignored.
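For example, an utterance might be anchored to points on a timeline and additionally be given an explicit duration, as in this sketch (the identifiers w1 and w2 are hypothetical when elements; if the timeline itself fixed the interval between them, the dur value would be ignored):
<u who="#a" start="#w1" end="#w2" dur="PT6S">so that was that</u>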

The anchor element (see 16.4 Correspondence and Alignment) may be used as an alternative means of aligning the start and end of timed elements, and is required when the temporal alignment involves points within an element.

For further discussion of temporal alignment and synchronization see 8.4.2 Synchronization and Overlap below.

8.3.6 Shifts

A common requirement in transcribing spoken language is to mark positions at which a variety of prosodic features change. Many paralinguistic features (pitch, prominence, loudness, etc.) characterize stretches of speech which are not co-extensive with utterances or any of the other units discussed so far. One simple method of encoding such units is simply to mark their boundaries. An empty element called shift is provided for this purpose.
  • shift/ marks the point at which some paralinguistic feature of a series of utterances by any one speaker changes.
    feature: a paralinguistic feature.
    new: specifies the new state of the paralinguistic feature specified.
A shift element may appear within an utterance or a segment to mark a significant change in the particular feature defined by its attributes, which is then understood to apply to all subsequent utterances for the same speaker, unless changed by a new shift for the same feature in the same speaker. Intervening utterances by other speakers do not normally carry the same feature. For example:
<u>
 <shift feature="loudnew="f"/>Elizabeth
</u>
<u>Yes</u>
<u>
 <shift feature="loudnew="normal"/>Come and try this <pause/>
 <shift feature="loudnew="ff"/>come on
</u>
In this example, the word Elizabeth is spoken loudly, the words Yes and Come and try this with normal volume, and the words come on very loudly.

The values proposed here for the feature attribute are based on those used by the Survey of English Usage (see further Boase 1990); this list may be revised or supplemented using the methods outlined in section 23.2 Personalization and Customization.

The new attribute specifies the new state of the feature following the shift. If no value is specified, it is implied that the feature concerned ceases to be remarkable at this point: the special value normal may be specified to have the same effect.
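Thus the two encodings in the following sketch are equivalent; in each, come on is spoken loudly and we're late at normal volume:
<u>
 <shift feature="loud" new="f"/>come on <shift feature="loud"/>we're late
</u>
is equivalent to:
<u>
 <shift feature="loud" new="f"/>come on <shift feature="loud" new="normal"/>we're late
</u>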

A list of suggested values for each of the features proposed follows:
  • tempo
    a
    allegro (fast)
    aa
    very fast
    acc
    accelerando (getting faster)
    l
    lento (slow)
    ll
    very slow
    rall
    rallentando (getting slower)
  • loud (for loudness):
    f
    forte (loud)
    ff
    very loud
    cresc
    crescendo (getting louder)
    p
    piano (soft)
    pp
    very soft
    dimin
    diminuendo (getting softer)
  • pitch (for pitch range):
    high
    high pitch-range
    low
    low pitch-range
    wide
    wide pitch-range
    narrow
    narrow pitch-range
    asc
    ascending
    desc
    descending
    monot
    monotonous
    scand
    scandent, each succeeding syllable higher than the last, generally ending in a falling tone
  • tension:
    sl
    slurred
    lax
    lax, a little slurred
    ten
    tense
    pr
    very precise
    st
    staccato, every stressed syllable being doubly stressed
    leg
    legato, every syllable receiving more or less equal stress
  • rhythm:
    rh
    beatable rhythm
    arrh
    arrhythmic, particularly halting
    spr
    spiky rising, with markedly higher unstressed syllables
    spf
    spiky falling, with markedly lower unstressed syllables
    glr
    glissando rising, like spiky rising but the unstressed syllables, usually several, also rise in pitch relative to each other
    glf
    glissando falling, like spiky falling but with the unstressed syllables also falling in pitch relative to each other
  • voice (for voice quality):
    whisp
    whisper
    breath
    breathy
    husk
    husky
    creak
    creaky
    fals
    falsetto
    reson
    resonant
    giggle
    unvoiced laugh or giggle
    laugh
    voiced laugh
    trem
    tremulous
    sob
    sobbing
    yawn
    yawning
    sigh
    sighing

A full definition of the sense of the values provided for each feature should be provided in the encoding description section of the text header (see section 2.3 The Encoding Description).
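Such documentation might take a form like the following sketch, placed within the encodingDesc of the header:
<encodingDesc>
 <editorialDecl>
  <p>Shifts in voice quality are marked with the shift element,
     using the Survey of English Usage values: for the loud
     feature, f marks loud speech, ff very loud speech, and so on.</p>
 </editorialDecl>
</encodingDesc>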

8.4 Elements Defined Elsewhere

This section describes the following features characteristic of spoken texts for which elements are defined elsewhere in these Guidelines:
  • segmentation below the utterance level
  • synchronization and overlap
  • regularization of orthography
The elements discussed here are not provided by the module for spoken texts. Some of them are included in the core module and others are contained in the modules for linking and for analysis respectively. The selection of modules and their combination to define a TEI schema is discussed in section 1.2 Defining a TEI Schema.

8.4.1 Segmentation

For some analytic purposes it may be desirable to subdivide the divisions of a spoken text into units smaller than the individual utterance or turn. Segmentation may be performed for a number of different purposes and in terms of a variety of speech phenomena. Common examples include units defined both prosodically (by intonation, pausing, etc.) and syntactically (clauses, phrases, etc.). The term macrosyntagm has been used by a number of researchers to define units peculiar to speech transcripts.[28]

These Guidelines propose that such analyses be performed in terms of neutrally-named segments, represented by the seg element, which is discussed more fully in section 16.3 Blocks, Segments, and Anchors. This element may take a type attribute to specify the kind of segmentation applicable to a particular segment, if more than one is possible in a text. A full definition of the segmentation scheme or schemes used should be provided in the segmentation element of the editorialDecl element in the TEI header (see 2.3.3 The Editorial Practices Declaration).

In the first example below, an utterance has been segmented according to a notion of syntactic completeness not necessarily marked by the speech, although in this case a pause has been recorded between the two sentence-like units. In the second, the segments are defined prosodically (an acute accent has been used to mark the position immediately following the syllable bearing the primary accent or stress), and may be thought of as ‘tone units’.
<u>
 <seg>we went to the pub yesterday</seg>
 <pause/>
 <seg>there was no one there</seg>
</u>
<u>
 <seg>although its an old ide´a</seg>
 <seg>it hasnt been on the mar´ket very long</seg>
</u>
In either case, the segmentation element in the header of the text should specify the principles adopted to define the segments marked in this way.

When utterances are segmented end-to-end in the same way as the s-units in written texts, the s element discussed in chapter 17 Simple Analytic Mechanisms may be used, either as an alternative or in addition to the more general purpose seg element. The s element is available without formality in all texts, but does not allow segments to nest within each other.
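Using s elements, the first example above might be encoded as in the following sketch:
<u>
 <s>we went to the pub yesterday</s>
 <pause/>
 <s>there was no one there</s>
</u>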

Where segments of different kinds are to be distinguished within the same stretch of speech, the type attribute may be used, as in the following example:
<u who="#T1">
 <seg type="C">I think </seg>
 <seg type="C">this chap was writing </seg>
 <seg type="C">and he <del type="repeated">said hello</del> said </seg>
 <seg type="M">hello </seg>
 <seg type="C">and he said </seg>
 <seg type="C">I'm going to a
  
   at twenty past seven </seg>
 <seg type="C">he said </seg>
 <seg type="M">ok </seg>
 <seg type="M">right away </seg>
 <seg type="C">and so <gap extent="1"/> on they went </seg>
 <seg type="C">and they were <gap extent="3"/>
   writing there </seg>
</u>
In this example, recoded from a corpus of language-impaired speech prepared by Fletcher and Garman, the speaker's utterance has been fully segmented into clausal (type="C") or minor (type="M") units. An additional element paraphasia has been used to define a particular characteristic of this corpus for which no element exists in the TEI scheme. See further chapter 23.2 Personalization and Customization for a discussion of the way in which this kind of user-defined extension of the TEI scheme may be performed and chapter 1 The TEI Infrastructure for the mechanisms on which it depends.

This example also uses the core elements gap and del to mark editorial decisions concerning matter completely omitted from the transcript (because of inaudibility), and words which have been transcribed but which the transcriber wishes to exclude from the segment because they are repeated, respectively. See section 3.4 Simple Editorial Changes for a discussion of these and related elements.

It is often the case that the desired segmentation does not respect utterance boundaries; for example, syntactic units may cross utterance boundaries. For a detailed discussion of this problem, and the various methods proposed by these Guidelines for handling it, see chapter 20 Non-hierarchical Structures. Methods discussed there include these:
  • ‘milestone’ tags may be used; the special-purpose shift tag discussed in section 8.3.6 Shifts is an extension of this method
  • where several discontinuous segments are to be grouped together to form a syntactic unit (e.g. a phrasal verb with interposed complement), the join element may be used

8.4.2 Synchronization and Overlap

A major difference between spoken and written texts is the importance of the temporal dimension to the former. As a very simple example, consider the following, first as it might be represented in a playscript:
Jane: Have you read Vanity Fair?
Stig: Yes
Lou: (nods vigorously)
To encode this, we first define the participants:
<listPerson>
 <person xml:id="stig">
<!-- ... -->
 </person>
 <person xml:id="lou">
<!-- ... -->
 </person>
 <person xml:id="jane">
<!-- ... -->
 </person>
</listPerson>
Let us assume that Stig and Lou respond to Jane's question before she has finished asking it — a fairly normal situation in spontaneous speech. The simplest way of representing this overlap would be to use the trans attribute previously discussed:
<u who="#jane">have you read Vanity Fair</u>
<u trans="overlapwho="#stig">yes</u>
However, this does not allow us to indicate the extent to which Stig's utterance is overlapped, nor does it show that there are in fact three things which are synchronous: the end of Jane's utterance, Stig's whole utterance, and Lou's kinesic. To overcome these problems, more sophisticated techniques, employing the mechanisms for pointing and alignment discussed in detail in section 16.5 Synchronization, are needed. If the module for linking has been enabled (as described in section 8.4.1 Segmentation above), one way to represent the simple example above would be as follows:
<u xml:id="utt1who="#jane">have you read Vanity <anchor synch="#utt2 #k1xml:id="a1"/> Fair</u>
<u xml:id="utt2who="#stig">yes</u>
<kinesic xml:id="k1who="#louiterated="true">
 <desc>nods head vertically</desc>
</kinesic>

For a full discussion of this and related mechanisms, section 16.5.2 Placing Synchronous Events in Time should be consulted. The rest of the present section, which should be read in conjunction with that more detailed discussion, presents a number of ways in which these mechanisms may be applied to the specific problem of representing temporal alignment, synchrony, or overlap in transcribing spoken texts.

In the simple example above, the first utterance (that with identifier utt1) contains an anchor element, the function of which is simply to mark a point within it. The synch attribute associated with this anchor point specifies the identifiers of the other two elements which are to be synchronized with it: specifically, the second utterance (utt2) and the kinesic (k1). Note that one of these elements has content and the other is empty.

This example demonstrates only a way of indicating a point within one utterance at which it can be synchronized with another utterance and a kinesic. For more complex kinds of alignment, involving possibly multiple synchronization points, an additional element is provided, known as a timeline. This consists of a series of when elements, each representing a point in time, and bearing attributes which indicate its exact temporal position relative to other elements in the same timeline, in addition to the sequencing implied by its position within it.

For example:
<timeline unit="sorigin="#TS-P1">
 <when xml:id="TS-P1absolute="12:20:01"/>
 <when xml:id="TS-P2interval="4.5since="#TS-P1"/>
 <when xml:id="TS-P6"/>
 <when xml:id="TS-P3interval="1.5since="#TS-P6"/>
</timeline>
This timeline represents four points in time, named TS-P1, TS-P2, TS-P6, and TS-P3 (as with all attributes named xml:id in the TEI scheme, the names must be unique within the document but have no other significance). TS-P1 is located absolutely, at 12:20:01 BST. TS-P2 is 4.5 seconds later than TS-P1 (i.e. at 12:20:05.5). TS-P6 is at some unspecified time later than TS-P2 and previous to TS-P3 (this is implied by its position within the timeline, as no attribute values have been specified for it). The fourth point, TS-P3, is 1.5 seconds later than TS-P6.

One or more such timelines may be specified within a spoken text, to suit the encoder's convenience. If more than one is supplied, the origin attribute may be used on each to specify which other timeline element it follows. The unit attribute indicates the units used for timings given on when elements contained by the alignment map. Alternatively, to avoid the need to specify times explicitly, the interval attribute may be used to indicate that all the when elements in a timeline are a fixed distance apart.
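The following sketch shows a timeline whose points are declared to be a fixed five seconds apart (the identifiers are illustrative only):
<timeline unit="s" interval="5" origin="#tl-p1">
 <when xml:id="tl-p1" absolute="12:20:01"/>
 <when xml:id="tl-p2"/>
 <when xml:id="tl-p3"/>
</timeline>
Here tl-p2 is understood to fall five seconds after tl-p1, and tl-p3 five seconds after tl-p2.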

Three methods are available for aligning points or elements within a spoken text with the points in time defined by the timeline:
  • The elements to be synchronized may specify the identifier of a when element as the value of one of the start, end, or synch attributes
  • The when element may specify the identifiers of all the elements to be synchronized with it using the synch attribute
  • A free-standing link element may be used to associate the when element and the elements synchronized with it by specifying their identifiers as values for its targets attribute.
For example, using the timeline given above:
<u xml:id="TS-U1start="#TS-P2end="#TS-P3">This is my <anchor synch="#TS-P6xml:id="TS-P6A"/> turn</u>
The start of utterance TS-U1 is aligned with TS-P2 and its end with TS-P3. The transition between the words my and turn occurs at point TS-P6A, which is synchronous with point TS-P6 on the timeline.
The synchronization represented by the preceding examples could equally well be represented as follows:
<timeline origin="#ts-p1unit="s">
 <when xml:id="ts-p1absolute="12:20:01"/>
 <when
   synch="#ts-u1"
   xml:id="ts-p2"
   interval="4.5"
   since="#ts-p1"/>

 <when synch="#ts-x1xml:id="ts-p6"/>
 <when
   synch="#ts-u1"
   xml:id="ts-p3"
   interval="1.5"
   since="#ts-p6"/>

</timeline>
<u xml:id="ts-u1">This is my <anchor xml:id="ts-x1"/> turn</u>
Here, the whole of the object with identifier ts-u1 (the utterance) has been aligned with two different points, ts-p2 and ts-p3. This is interpreted to mean that the utterance spans at least those two points.
Finally, a linkGrp may be used as an alternative to thesynch attribute:
<timeline origin="#TS-p1unit="s">
 <when xml:id="TS-p1absolute="12:20:01"/>
 <when xml:id="TS-p2interval="4.5since="#TS-p1"/>
 <when xml:id="TS-p6"/>
 <when xml:id="TS-p3interval="1.5since="#TS-p6"/>
</timeline>
<u xml:id="TS-u1">
 <anchor xml:id="TS-u1start"/>
This is my <anchor xml:id="TS-x1"/> turn
<anchor xml:id="TS-u1end"/>
</u>
<linkGrp type="synchronous">
 <link targets="#TS-u1start #TS-p1"/>
 <link targets="#TS-u1end #TS-p2"/>
 <link targets="#TS-x1 #TS-p6"/>
</linkGrp>
As a further example of the three possibilities, consider the following dialogue, represented first as it might appear in a conventional playscript:
Tom: I used to smoke - -
Bob: (interrupting) You used to smoke?
Tom: (at the same time) a lot more than this. But I never
inhaled the smoke
A commonly used convention might be to transcribe such a passage as follows:
(1) I used to smoke [ a lot more than this ]
(2) [ you used to smoke ]
(1) but I never inhaled the smoke
Such conventions have the drawback that they are hard to generalize or to extend beyond the very simple case presented here. Their reliance on the accidentals of physical layout may also make them difficult to transport and to process computationally. These Guidelines recommend the following mechanisms instead.
Where the whole of one or another utterance is to be synchronized, the start and end attributes may be used:
<u who="#tom">I used to smoke <anchor xml:id="TS-p10"/> a lot more than this
<anchor xml:id="TS-p20"/>but I never inhaled the smoke</u>
<u start="#TS-p10end="#TS-p20who="#bob">You used to smoke</u>
Note that the second utterance above could equally well be encoded as follows with exactly the same effect:
<u who="#bob">
 <anchor synch="#TS-p10"/>You used to smoke<anchor synch="#TS-p20"/>
</u>
If synchronization with specific timing information is required, a timeline must be included:
<timeline origin="#TS-t01">
 <when xml:id="TS-t01"/>
 <when xml:id="TS-t02"/>
</timeline>
<u who="#tom">I used to smoke
<anchor synch="#TS-t01"/>a lot more than this
<anchor synch="#TS-t02"/>but I never inhaled the smoke</u>
<u who="#bob">
 <anchor synch="#TS-t01"/>You used to smoke<anchor synch="#TS-t02"/>
</u>
As above, since the whole of Bob's utterance is to be aligned, the start and end attributes may be used as an alternative to the second pair of anchor elements:
<u start="#TS-t01end="#TS-t02who="#bob">You used to smoke</u>
An alternative approach is to mark the synchronization by pointing from the timeline to the text:
<timeline origin="#TS-T01">
 <when synch="#TS-nm1 #bob-u2xml:id="TS-T01"/>
 <when synch="#TS-nm2 #bob-u2xml:id="TS-T02"/>
</timeline>
<u who="#tom">I used to smoke
<anchor xml:id="TS-nm1"/>a lot more than this
<anchor xml:id="TS-nm2"/>but I never inhaled the smoke</u>
<u xml:id="bob-u2who="#bob">You used to smoke</u>
To avoid deciding whether to point from the timeline to the text or vice versa, a linkGrp may be used:
<body>
 <timeline origin="#T001">
  <when xml:id="T001"/>
  <when xml:id="T002"/>
 </timeline>
 <u who="#tom">I used to smoke
 <anchor xml:id="NM01"/>a lot more than this
 <anchor xml:id="NM02"/>but I never inhaled the smoke</u>
 <u xml:id="bob-U2who="#bob">You used to smoke</u>
 <linkGrp type="synchronize">
  <link targets="#T001 #NM01 #bob-U2"/>
  <link targets="#T002 #NM02 #bob-U2"/>
 </linkGrp>
</body>

Note that in each case, although Bob's utterance follows Tom's sequentially in the text, it is aligned temporally with its middle, without any need to disrupt the normal syntax of the text.

As a final example, consider the following exchange, first as it might be represented using a musical-score-like notation, in which points of synchronization are represented by vertical alignment of the text:
A : This is |my |turn
B : |Balderdash
C : |No, |it's mine
All three speakers are simultaneous at the words my, Balderdash, and No; speakers A and C are simultaneous at the words turn and it's. This could be encoded as follows, using pointers from the alignment map into the text:
<timeline origin="#TSp1">
 <when synch="#TSa1 #TSb1 #TSc1xml:id="TSp1"/>
 <when synch="#TSa2 #TSc2xml:id="TSp2"/>
</timeline>
<u who="#A">this is <anchor xml:id="TSa1"/> my <anchor xml:id="TSa2"/> turn</u>
<u who="#B">balderdash</u>
<u who="#C"> no <anchor xml:id="TSc2"/> it's mine</u>

8.4.3 Regularization of Word Forms

When speech is transcribed using ordinary orthographic notation, as is customary, some compromise must be made between the sounds produced and conventional orthography. Particularly when dealing with informal, dialectal, or other varieties of language, the transcriber will frequently have to decide whether a particular sound is to be treated as a distinct vocabulary item or not. For example, while in a given project kinda may not be worth distinguishing as a vocabulary item from kind of, isn't may clearly be worth distinguishing from is not; for some purposes, the regional variant isnae might also be worth distinguishing in the same way.

One rule of thumb might be to allow such variation only where a generally accepted orthographic form exists, for example, in published dictionaries of the language register being encoded; this has the disadvantage that such dictionaries may not exist. Another is to maintain a controlled (but extensible) set of normalized forms for all such words; this has the advantage of enforcing some degree of consistency among different transcribers. Occasionally, as for example when transcribing abbreviations or acronyms, it may be felt necessary to depart from conventional spelling to distinguish between cases where the abbreviation is spelled out letter by letter (e.g. B B C or V A T) and where it is pronounced as a single word (VAT or RADA). Similar considerations might apply to pronunciation of foreign words (e.g. Monsewer vs. Monsieur).

In general, use of punctuation, capitalization, etc., in spoken transcripts should be carefully controlled. It is important to distinguish the transcriber's intuition as to what the punctuation should be from the marking of prosodic features such as pausing, intonation, etc.

Whatever practice is adopted, it is essential that it be clearly and fully documented in the editorial declarations section of the header. It may also be found helpful to include normalized forms of non-conventional spellings within the text, using the elements for simple editorial changes described in section 3.4 Simple Editorial Changes (see further section 8.4.5 Speech Management).
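For example, the regional form mentioned above might be retained alongside a normalized equivalent, as in this sketch using the choice, orig, and reg elements:
<u who="#a">she <choice>
  <orig>isnae</orig>
  <reg>is not</reg>
 </choice> coming</u>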

8.4.4 Prosody

In the absence of conventional punctuation, the marking of prosodic features assumes paramount importance, since these structure and organize the spoken message. Indeed, such prosodic features as points of primary or secondary stress may be represented by specialized punctuation marks. Pauses have already been dealt with in section 8.3.2 Pausing; while tone units (or intonational phrases) can be indicated by the segmentation tag discussed in section 8.4.1 Segmentation. The shift element discussed in section 8.3.6 Shifts may also be used to encode some prosodic features, for example where all that is required is the ability to record shifts in voice quality.

In a more detailed phonological transcript, it is common practice to include a number of conventional signs to mark prosodic features of the surrounding or (more usually) preceding speech. Such signs may be used to record, for example, particular intonation patterns, truncation, vowel quality (long or short), etc. These signs may be preserved in a transcript either by using conventional punctuation or by marking their presence by g elements. Where a transcript includes many phonetic or phonemic aspects, it will generally be convenient to use a specialized writing system in this way (see further chapters vi Languages and Character Sets and 5 Representation of Non-standard Characters and Glyphs). For representation of phonemic information, the use of the International Phonetic Alphabet, which can be represented in Unicode characters, is recommended.

In the following example, special characters have been defined as follows within the encodingDesc of the TEI header:
<charDecl>
 <char xml:id="lf">
  <desc>low fall intonation</desc>
 </char>
 <char xml:id="lr">
  <desc>low rise intonation</desc>
 </char>
 <char xml:id="fr">
  <desc>fall rise intonation</desc>
 </char>
 <char xml:id="rf">
  <desc>rise fall intonation</desc>
 </char>
 <char xml:id="long">
  <desc>lengthened syllable</desc>
 </char>
 <char xml:id="short">
  <desc>shortened syllable</desc>
 </char>
</charDecl>
These declarations might additionally provide information about how the characters concerned should be rendered, their equivalent IPA form, etc. In the transcript itself, references to them can then be included as follows:
<div n="Lod E-03type="exchange">
 <note>C is with a friend</note>
 <u who="#cwn">
  <unclear>Excuse me<g ref="#lf"/>
  </unclear>
  <pause/> You dont have some
   aesthetic<g ref="#short"/>
  <pause/>
  <unclear>specially on early</unclear>
   aesthetics terminology <g ref="#lr"/>
 </u>
 <u who="#aj"> No<g ref="#lf"/>
  <pause/>No<g ref="#lf"/>
  <gap extent="2"/> I'm
   afraid<g ref="#lf"/>
 </u>
 <u trans="latchingwho="#cwn"> No<g ref="#lr"/>
  <unclear>Well</unclear> thanks<g ref="#lr"/>
  <pause/> Oh<g ref="#short"/>
  <unclear>you couldnt<g ref="#short"/> can we</unclear> kind of<g ref="#long"/>
  <pause/>I mean ask you to order it for us<g ref="#long"/>
  <g ref="#fr"/>
 </u>
 <u trans="latchingwho="#aj"> Yes<g ref="#fr"/> if you know the title<g ref="#lf"/> Yeah<g ref="#lf"/>
 </u>
 <u who="#cwn">
  <gap extent="3"/>
  <gap extent="4"/>
 </u>
 <u who="#aj"> Yes thats fine. <unclear>just as soon as it comes in we'll send
     you a postcard<g ref="#lf"/>
  </unclear>
 </u>
 <listPerson>
  <person xml:id="cwn">
   <p>Customer WN</p>
  </person>
  <person xml:id="aj">
   <p>Assistant K</p>
  </person>
 </listPerson>
</div>

This example, which is taken from a corpus of bookshop service encounters, also demonstrates the use of the unclear and gap elements discussed in section 3.4 Simple Editorial Changes. Where words are so unclear that only their extent can be recorded, the empty gap element may be used; where the encoder can identify the words but wishes to record a degree of uncertainty about their accuracy, the unclear element may be used. More flexible and detailed methods of indicating uncertainty are discussed in chapter 21 Certainty and Responsibility.

For more detailed work, involving a detailed phonological transcript including representation of stress and pitch patterns, it is probably best to maintain the prosodic description in parallel with the conventional written transcript, rather than attempt to embed detailed prosodic information within it. The two parallel streams may be aligned with each other and with other streams, for example an acoustic encoding, using the general alignment mechanisms discussed in section 8.4.2 Synchronization and Overlap.

8.4.5 Speech Management

Phenomena of speech management include disfluencies such as filled and unfilled pauses, interrupted or repeated words, corrections, and reformulations as well as interactional devices asking for or providing feedback. Depending on the importance attached to such features, transcribers may choose to adopt conventionalized representations for them (as discussed in section 8.4.3 Regularization of Word Forms above), or to transcribe them using IPA or some other transcription system. To simplify analysis of the lexical features of a speech transcript, it may be felt useful to ‘tidy away’ many of these disfluencies. Where this policy has been adopted, these Guidelines recommend the use of the tags for simple editorial intervention discussed in section 3.4 Simple Editorial Changes, to make explicit the extent of regularization or normalization performed by the transcriber.

For example, false starts, repetition, and truncated words might all be included within a transcript, but marked as editorially deleted, in the following way:
<u>
 <del type="truncation">s</del>see
<del type="repetition">you you</del> you know
<del type="falseStart">it's</del> he's crazy
</u>
As previously noted, the gap element may be used to mark points within a transcript where words have been omitted, for example because they are inaudible:
<gap reason="passing truckextent="10unit="syllables"/>
The unclear element may be used to mark words which have been included although the transcriber is unsure of their accuracy:
<u>...and then <unclear reason="passing truck">marbled queen</unclear>
</u>
Where a transcriber is believed to have incorrectly identified a word, the elements corr or sic embedded within a choice element may be used to indicate both the original and a corrected form of it:
<choice>
 <corr>SCSI</corr>
 <sic>skuzzy</sic>
</choice>
These elements are further discussed in section 3.4.1 Apparent Errors.
Finally, phenomena such as code switching, where a speaker switches from one language to another, may easily be represented in a transcript by using the foreign element provided by the core tagset:
<u who="#P1">I proposed that <foreign xml:lang="de"> wir können
 <pause dur="PT1S"/> vielleicht </foreign> go to warsaw
and <emph>vienna</emph>
</u>

8.4.6 Analytic Coding

The recommendations made here only concern the establishment of a basic text. Where a more sophisticated analysis is needed, more sophisticated methods of markup will also be appropriate, for example, using stand-off markup to indicate multiple segmentation of the stream of discourse, or complex alignment of several segments within it. Where additional annotations (sometimes called ‘codes’ or ‘tags’) are used to represent such features as linguistic word class (noun, verb, etc.), type of speech act (imperative, concessive, etc.), or information status (theme/rheme, given/new, active/semi-active/new), etc., a selection from the general purpose analytic tools discussed in chapters 16 Linking, Segmentation, and Alignment, 17 Simple Analytic Mechanisms, and 18 Feature Structures may be used to advantage.
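As a simple illustration, the following sketch uses the ana attribute and interp elements discussed in chapter 17 Simple Analytic Mechanisms to associate utterances with a purely illustrative typology of speech acts:
<interpGrp type="speechAct">
 <interp xml:id="sa-dir">directive</interp>
 <interp xml:id="sa-inf">informative</interp>
</interpGrp>
<u who="#a" ana="#sa-dir">shut the door</u>
<u who="#b" ana="#sa-inf">it is shut</u>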

8.5 Module for Transcribed Speech

The module described in this chapter makes available the elements and attribute classes documented above. The selection and combination of modules to form a TEI schema is described in 1.2 Defining a TEI Schema.


Notes
25.
For a discussion of several of these see Edwards and Lampert (eds.) (1993); Johansson (1994); and Johansson et al. (1991).
26.
The original is a conversation between two children and their parents, recorded in 1987, and discussed in MacWhinney (1988).
27.
For the most part, the examples in this chapter use no sentence punctuation except to mark the rising intonation often found in interrogative statements; for further discussion, see section 8.4.3 Regularization of Word Forms.
28.
The term was apparently first proposed by Loman and Jørgensen (1971), where it is defined as follows: ‘A text can be analysed as a sequence of segments which are internally connected by a network of syntactic relations and externally delimited by the absence of such relations with respect to neighbouring segments. Such a segment is a syntactic unit called a macrosyntagm’ (trans. S. Johansson).


Copyright TEI Consortium 2007. Licensed under the GPL. Copying and redistribution is permitted and encouraged.
Version 1.0.