8 Transcriptions of Speech
The module described in this chapter is intended for use with a wide variety of transcribed spoken material. It should be stressed, however, that the present proposals are not intended to support unmodified every variety of research undertaken upon spoken material now or in the future; some discourse analysts, some phonologists, and doubtless others may wish to extend the scheme presented here to express more precisely the set of distinctions they wish to draw in their transcriptions. Speech regarded as a purely acoustic phenomenon may well require different methods from those outlined here, as may speech regarded solely as a process of social interaction.
This chapter begins with a discussion of some of the problems commonly encountered in transcribing spoken language (section 8.1 General Considerations and Overview). Section 8.2 Documenting the Source of Transcribed Speech documents some additional TEI header elements which may be used to document the recording or other source from which transcribed text is taken. Section 8.3 Elements Unique to Spoken Texts describes the basic structural elements provided by this module. Finally, section 8.4 Elements Defined Elsewhere of this chapter reviews further problems specific to the encoding of spoken language, demonstrating how mechanisms and elements discussed elsewhere in these Guidelines may be applied to them.
8.1 General Considerations and Overview
There is great variation in the ways different researchers have chosen to represent speech using the written medium. This reflects the special difficulties which apply to the encoding or transcription of speech. Speech varies according to a large number of dimensions, many of which have no counterpart in writing (for example, tempo, loudness, pitch, etc.). The audibility of speech recorded in natural communication situations is often less than perfect, affecting the accuracy of the transcription. Spoken material may be transcribed in the course of linguistic, acoustic, anthropological, psychological, ethnographic, journalistic, or many other types of research. Even in the same field, the interests and theoretical perspectives of different transcribers may lead them to prefer different levels of detail in the transcript and different styles of visual display. The production and comprehension of speech are intimately bound up with the situation in which speech occurs, far more so than is the case for written texts. A speech transcript must therefore include some contextual features; determining which are relevant is not always simple. Moreover, the ethical problems in recording and making public what was produced in a private setting and intended for a limited audience are more frequently encountered in dealing with spoken texts than with written ones.
Speech also poses difficult structural problems. Unlike a written text, a speech event takes place in time. Its beginning and end may be hard to determine and its internal composition difficult to define. Most researchers agree that the utterances or turns of individual speakers form an important structural component in most kinds of speech, but these are rarely as well-behaved (in the structural sense) as paragraphs or other analogous units in written texts: speakers frequently interrupt each other, use gestures as well as words, leave remarks unfinished, and so on. Speech itself, though it may be represented as words, frequently contains items such as vocalized pauses which, although only semi-lexical, have immense importance in the analysis of spoken text. Even non-vocal elements such as gestures may be regarded as forming a component of spoken text for some analytic purposes. Below the level of the individual utterance, speech may be segmented into units defined by phonological, prosodic, or syntactic phenomena; no clear agreement exists, however, even as to appropriate names for such segments.
Spoken texts transcribed according to the guidelines presented here are organized as follows. The overall structure of a TEI spoken text is identical to that of any other TEI text: the TEI element for a spoken text contains a teiHeader element, followed by a text element. Even texts primarily composed of transcribed speech may also include conventional front and back matter, and may even be organized into divisions like printed texts.
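By way of orientation, the following minimal sketch (not drawn from the examples elsewhere in this chapter; the identifiers and placeholder content are invented) shows the overall shape of such a document:
<TEI xmlns="http://www.tei-c.org/ns/1.0">
 <teiHeader>
  <fileDesc><!-- bibliographic description and source of the transcription --></fileDesc>
  <profileDesc><!-- participant and setting descriptions --></profileDesc>
 </teiHeader>
 <text>
  <body>
   <u who="#spk1"><!-- transcription of the first speaker's utterance --></u>
   <pause/>
   <u who="#spk2"><!-- transcription of the second speaker's utterance --></u>
   <!-- further utterances, vocals, kinesics, incidents, etc. -->
  </body>
 </text>
</TEI>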
We may say, therefore, that these Guidelines regard transcribed speech as being composed of arbitrary high-level units called texts. A spoken text might typically be a conversation between a small number of people, a lecture, a broadcast TV item, or a similar event. Each such unit has associated with it a teiHeader providing detailed contextual information such as the source of the transcript, the identity of the participants, whether the speech is scripted or spontaneous, the physical and social setting in which the discourse takes place, and a range of other aspects. Details of the header in general are provided in chapter 2 The TEI Header; the particular elements it provides for use with spoken texts are described below (8.2 Documenting the Source of Transcribed Speech). Details concerning additional elements which may be used for the documentation of participant and contextual information are given in 15.2 Contextual Information. In general, a stretch of transcribed speech is treated as a single text to the extent that:
- it is internally cohesive,
- it is describable by a single header, and
- it represents a single stretch of time with no significant discontinuities.
Within a text it may be necessary to identify subdivisions of various kinds, if only for convenience of handling. The neutral div element discussed in section 4.1 Divisions of the Body is recommended for this purpose. It may be found useful also for representing subdivisions relating to discourse structure, speech act theory, transactional analysis, etc., provided only that these divisions are hierarchically well-behaved. Where they are not, as is often the case, the mechanisms discussed in chapters 16 Linking, Segmentation, and Alignment and 20 Non-hierarchical Structures may be used. This module distinguishes the following basic components of spoken texts:
- utterances
- pauses
- vocalized but non-lexical phenomena such as coughs
- kinesic (non-verbal, non-lexical) phenomena such as gestures
- entirely non-linguistic incidents occurring during and possibly influencing the course of speech
- writing, regarded as a special class of incident in that it can be transcribed, for example captions or overheads displayed during a lecture
- shifts or changes in vocal quality
Elements to represent all of these features of spoken language are discussed in section 8.3 Elements Unique to Spoken Texts below.
An utterance (tagged u) may contain lexical items interspersed with pauses and non-lexical vocal sounds; during an utterance, non-linguistic incidents may occur and written materials may be presented. The u element can thus contain any of the other elements listed, interspersed with a transcription of the lexical items of the utterance; the other elements may all appear between utterances or next to each other, but except for writing they do not contain any other elements nor any data.
A spoken text itself may be without substructure, that is, it may consist simply of units such as utterances or pauses, not grouped together in any way, or it may be subdivided. If the notion of what constitutes a ‘text’ in spoken discourse is inevitably rather an arbitrary one, the notion of formal subdivisions within such a ‘text’ may appear even more debatable. Nevertheless, such divisions may be useful for such types of discourse as debates, broadcasts, etc., where structural subdivisions can easily be identified, or more generally wherever it is desired to aggregate utterances or other parts of a transcript into units smaller than a complete ‘text’. Examples might include ‘conversations’ or ‘discourse fragments’, or more narrowly, ‘that part of the conversation where topic x was discussed’, provided only that the set of all such divisions is coextensive with the text. Individual div elements may carry a sample attribute to indicate whether they represent only the initial, medial, or final part of the unit sampled, as in the following fragment:
<div>
 <div sample="medial"/>
 <div sample="medial"/>
 <div sample="initial"/>
</div>
As a member of the class att.declaring, the div element may also carry a decls attribute, for use where the divisions of a text do not all share the same set of the contextual declarations specified in the TEI header. (See further section 15.3 Associating Contextual Information with a Text.)
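For example, a transcript drawn from two separate recordings might associate each division with the relevant recording by means of decls. The following is a sketch only; all identifiers are invented:
<recordingStmt>
 <recording xml:id="rec1" type="audio"/>
 <recording xml:id="rec2" type="audio"/>
</recordingStmt>
<!-- ... later, within the text ... -->
<div decls="#rec1">
 <u who="#a"><!-- utterances transcribed from the first recording --></u>
</div>
<div decls="#rec2">
 <u who="#b"><!-- utterances transcribed from the second recording --></u>
</div>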
8.2 Documenting the Source of Transcribed Speech
- scriptStmt (script statement) contains a citation giving details of the script used for a spoken text.
- recordingStmt (recording statement) describes a set of recordings used as the basis for transcription of a spoken text.
- recording (recording event) details of an audio or video recording event used as the source of a spoken text, either directly or from a public broadcast.
type the kind of recording.
- att.duration.w3c attributes for recording normalized temporal durations.
dur (duration) indicates the length of this element in time.
Note that detailed information about the participants or setting of an interview or other transcript of spoken language should be recorded in the appropriate division of the profile description, discussed in chapter 15 Language Corpora, rather than as part of the source description. The source description is used to hold information only about the source from which the transcribed speech was taken, for example, any script being read and any technical details of how the recording was produced. If the source was a previously-created transcript, it should be treated in the same way as any other source text.
<sourceDesc>
<scriptStmt xml:id="CNN12">
<bibl>
<author>CNN Network News</author>
<title>News headlines</title>
<date when="1991-06-12">12 Jun 91</date>
</bibl>
</scriptStmt>
</sourceDesc>
The recordingStmt is used to group together information relating to the recordings from which the spoken text was transcribed. The element may contain either a prose description or, more helpfully, one or more recording elements, each corresponding with a particular recording. The linkage between utterances or groups of utterances and the relevant recording statement is made by means of the decls attribute, described in section 15.3 Associating Contextual Information with a Text.
- date contains a date in any format.
- time contains a phrase defining a time of day in any format.
- respStmt (statement of responsibility) supplies a statement of responsibility for the intellectual content of a text, edition, recording, or series, where the specialized elements for authors, editors, etc. do not suffice or do not apply.
- equipment provides technical details of the equipment and media used for an audio or video recording used as the source for a spoken text.
- broadcast describes a broadcast used as the source of a spoken text.
<recording type="video">
<p>U-matic recording made by college audio-visual department staff,
available as PAL-standard VHS transfer or sound-only cassette</p>
</recording>
<recording type="audio">
<respStmt>
<resp>Location recording by</resp>
<name>Sound Services Ltd.</name>
</respStmt>
<equipment>
<p>Multiple close microphones mixed down to stereo Digital
Audio Tape, standard play, 44.1 KHz sampling frequency</p>
</equipment>
<date>12 Jan 1987</date>
</recording>
<recording type="audio">
<equipment>
<p>Recorded from FM Radio to digital tape</p>
</equipment>
<broadcast>
<bibl>
<title>Interview on foreign policy</title>
<author>BBC Radio 5</author>
<respStmt>
<resp>interviewer</resp>
<name>Robin Day</name>
</respStmt>
<respStmt>
<resp>interviewee</resp>
<name>Margaret Thatcher</name>
</respStmt>
<series>
<title>The World Tonight</title>
</series>
<note>First broadcast on <date when="1989-11-27">27 Nov 1989</date>
</note>
</bibl>
</broadcast>
</recording>
8.3 Elements Unique to Spoken Texts
- u (utterance) a stretch of speech usually preceded and followed by silence or by a change of speaker.
- pause/ a pause either between or within utterances.
- vocal any vocalized but not necessarily lexical phenomenon, for example voiced pauses, non-lexical backchannels, etc.
- kinesic any communicative phenomenon, not necessarily vocalized, for example a gesture, frown, etc.
- incident any phenomenon or occurrence, not necessarily vocalized or communicative, for example incidental noises or other events affecting communication.
- writing a passage of written text revealed to participants in the course of a spoken text.
- shift/ marks the point at which some paralinguistic feature of a series of utterances by any one speaker changes.
The u element may appear directly within a spoken text, and may contain any of the others; the others may also appear directly (for example, a vocal may appear between two utterances) but cannot contain a u element. In terms of the basic TEI model, therefore, we regard the u element as analogous to a paragraph, and the others as analogous to ‘phrase’ elements. The class model.divPart.spoken provides the u element; the class model.phrase.spoken provides the six other elements listed above.
- att.ascribed provides attributes for elements representing speech or action that can be ascribed to a specific individual.
who indicates the person, or group of people, to whom the element content is ascribed.
- att.typed provides attributes which can be used to classify or subclassify elements in any way.
type characterizes the element in some sense, using any convenient classification scheme or typology.
subtype provides a sub-categorization of the element, if needed.
- att.timed provides attributes common to those elements which have a duration in time, expressed either absolutely or by reference to an alignment map.
start indicates the location within a temporal alignment at which this element begins.
end indicates the location within a temporal alignment at which this element ends.
- att.duration.w3c attributes for recording normalized temporal durations.
dur (duration) indicates the length of this element in time.
Each of these elements is further discussed and specified below, in sections 8.3.1 Utterances to 8.3.6 Shifts.
          | eventive | communicative | anthropophonic | lexical
incident  |    +     |       -       |       -        |    -
kinesic   |    +     |       +       |       -        |    -
vocal     |    +     |       +       |       +        |    -
utterance |    +     |       +       |       +        |    +
<u who="#mar">never <pause/> take this cat for show and tell
<pause/> meow meow</u>
<u who="#ros">yeah well I dont want to</u>
<incident>
<desc>toy cat has bell in tail which continues to make a tinkling sound</desc>
</incident>
<vocal who="#mar">
<desc>meows</desc>
</vocal>
<u who="#ros">because it is so old</u>
<u who="#mar">how <choice>
<orig>bout</orig>
<reg>about</reg>
</choice>
<emph>your</emph> cat <pause/>yours is <emph>new</emph>
<kinesic>
<desc>shows Father the cat</desc>
</kinesic>
</u>
<u trans="pause" who="#fat">thats <pause/> darling</u>
<u who="#mar">no <emph>mine</emph> isnt old
mine is just um a little dirty</u>
<!-- ... -->
<listPerson>
<person xml:id="mar">
<!-- ... -->
</person>
<person xml:id="ros">
<!-- ... -->
</person>
<person xml:id="fat">
<!-- ... -->
</person>
</listPerson>
This example also uses some elements common to all TEI texts, notably the reg tag for editorial regularization. Unusually stressed syllables have been encoded with the emph element. The seg element has also been used to segment the last utterance. Further discussion of all such options is provided in section 8.4 Elements Defined Elsewhere.
Contextual information is of particular importance in spoken texts, and should be provided by the TEI header of a text. In general, all of the information in a header is understood to be relevant to the whole of the associated text. The u element, as a member of the att.declaring class, may however specify a different context by means of the decls attribute (see further section 15.3 Associating Contextual Information with a Text).
8.3.1 Utterances
- u (utterance) a stretch of speech usually preceded and followed by silence or by a change of speaker.
trans (transition) indicates the nature of the transition between this utterance and the previous one.
Use of the who attribute to associate the utterance with a particular speaker is recommended but not required. Its use implies a further requirement: that all speakers be identified by a person or personGrp element in the TEI header (see section 15.2.2 The Participant Description). Where utterances cannot be attributed with confidence to any particular participant or group of participants, the encoder may choose to define ‘participants’ such as all or various.
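A sketch of this approach (all identifiers invented): the participant description in the header defines a catch-all personGrp, to which otherwise unattributable utterances may then be ascribed:
<particDesc>
 <listPerson>
  <person xml:id="spkA"/>
  <person xml:id="spkB"/>
  <personGrp xml:id="various"/>
 </listPerson>
</particDesc>
<!-- ... in the transcription ... -->
<u who="#various">mhm</u>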
<u xml:id="ts_b1" trans="latching" who="#b">the election results? yes</u>
<u xml:id="ts_a2" trans="pause" who="#a">it's a disaster</u>
<u xml:id="ts_b2" trans="overlap" who="#b">it's a miracle</u>
An utterance may contain either running text, or text within which other basic structural elements are nested. Where such nesting occurs, the who attribute is considered to be inherited for the elements pause, vocal, shift, and kinesic; that is, a pause or shift (etc.) within an utterance is regarded as being produced by that speaker only, while a pause between utterances applies to all speakers.
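The following sketch (speaker identifiers and wording invented) illustrates the distinction: the first pause is attributed to speaker a alone, while the pause between the two utterances applies to both speakers:
<u who="#a">well <pause dur="PT2S"/> I suppose so</u>
<pause dur="PT5S"/>
<u who="#b">right then</u>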
<u><!-- ... --> confident, he said, that the current economic problems will be
completely overcome by June<shift/> what nonsense</u>
<u><!-- ... -->
 <incident>
  <desc>reads aloud from newspaper</desc>
 </incident> what
nonsense</u>
<u><!-- ... -->
 <vocal>
  <desc>tut-tutting</desc>
 </vocal> about it anyway?</u>
8.3.2 Pausing
- pause/ a pause either between or within utterances.
<u><!-- ... -->
 <pause dur="PT50S"/> with <pause dur="PT20S"/> um <pause dur="PT145S"/> you see
a tree okay?</u>
8.3.3 Vocal, Kinesic, Incident
- vocal any vocalized but not necessarily lexical phenomenon, for example voiced pauses, non-lexical backchannels, etc.
- kinesic any communicative phenomenon, not necessarily vocalized, for example a gesture, frown, etc.
- incident any phenomenon or occurrence, not necessarily vocalized or communicative, for example incidental noises or other events affecting communication.
The who attribute should be used to specify the person or group responsible for a vocal, kinesic, or incident which is contained within an utterance, if this differs from that of the enclosing utterance. The attribute must be supplied for a vocal, kinesic, or incident which is not contained within an utterance.
The iterated attribute may be used to indicate that the vocal, kinesic, or incident is repeated, for example laughter as opposed to laugh. These should both be distinguished from laughing, where what is being encoded is a shift in voice quality. For this last case, the shift element discussed in section 8.3.6 Shifts should be used.
- non-lexical
- burp, click, cough, exhale, giggle, gulp, inhale, laugh, sneeze, sniff, snort, sob, swallow, throat, yawn
- semi-lexical
- ah, aha, aw, eh, ehm, er, erm, hmm, huh, mm, mmhm, oh, ooh, oops, phew, tsk, uh, uh-huh, uh-uh, um, urgh, yup
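The contrast described above might be encoded as in the following sketch (speaker identifier and wording invented): repeated laughter is recorded as an iterated vocal, whereas speech produced while laughing is marked as a change of voice quality using shift:
<vocal who="#p1" iterated="true">
 <desc>laughter</desc>
</vocal>
<u who="#p1"><shift feature="voice" new="laugh"/>you cannot be serious<shift feature="voice" new="normal"/></u>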
<incident>
<desc>telephone rings</desc>
</incident>
<u who="#ann">I'll get it</u>
<u who="#tom">I used to <vocal>
<desc>cough</desc>
</vocal> smoke a lot</u>
<u who="#bob">
<vocal>
<desc>sniffs</desc>
</vocal>He thinks he's tough
</u>
<vocal who="#ann">
<desc>snorts</desc>
</vocal>
<!-- ... -->
<listPerson>
<person xml:id="ann">
<!-- ... -->
</person>
<person xml:id="bob">
<!-- ... -->
</person>
<person xml:id="jan">
<!-- ... -->
</person>
<person xml:id="kim">
<!-- ... -->
</person>
<person xml:id="tom">
<!-- ... -->
</person>
</listPerson>
<u><!-- ... -->
 <vocal>
  <desc>snorts</desc>
 </vocal>
</u>
The extent to which incidents or kinesics are encoded in a transcription will depend entirely on the purpose for which the transcription was made: as elsewhere, on the particular research agenda and on the extent to which their presence is felt to be significant for the interpretation of spoken interactions.
8.3.4 Writing
- writing a passage of written text revealed to participants in the course of a spoken text.
gradual indicates whether the writing is revealed all at once or gradually.
source points to a bibliographic citation in the header giving a full description of the source or script of the writing.
<writing who="#a" type="newspaper" gradual="false">Government claims economic problems
<soCalled>over by June</soCalled>
</writing>
<u who="#a">what nonsense!</u>
<sourceDesc>
<!-- ...-->
<bibl xml:id="FOL1">Shakespeare First Folio text</bibl>
<bibl xml:id="FOL2">Shakespeare Second Folio text</bibl>
<!-- ...-->
</sourceDesc>
<!-- ...-->
<u>.... now compare the punctuation of lines 12 and 14 in these two
versions of page 42...
<writing source="#FOL1">....</writing>
<writing source="#FOL2">....</writing>
</u>
8.3.5 Temporal Information
As noted above, utterances, vocals, pauses, kinesics, incidents, and writing elements all inherit attributes providing information about their position in time from the classes att.timed and att.duration. These attributes can be used to link parts of the transcription very exactly with points on a timeline, or simply to indicate their duration. Note that if start and end point to when elements whose temporal distance from each other is specified in a timeline, then dur is ignored.
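A sketch of the two possibilities (all identifiers invented): the vocal is located by pointing its start and end at when elements on a timeline, while the pause simply records its duration with dur:
<timeline unit="s" xml:id="tl-ex">
 <when xml:id="w1" absolute="12:00:00"/>
 <when xml:id="w2" interval="2.5" since="#w1"/>
</timeline>
<vocal who="#p1" start="#w1" end="#w2">
 <desc>laughs</desc>
</vocal>
<pause dur="PT2S"/>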
The anchor element (see 16.4 Correspondence and Alignment) may be used as an alternative means of aligning the start and end of timed elements, and is required when the temporal alignment involves points within an element.
For further discussion of temporal alignment and synchronizationsee 8.4.2 Synchronization and Overlap below.
8.3.6 Shifts
- shift/ marks the point at which some paralinguistic feature of a series of utterances by any one speaker changes.
feature a paralinguistic feature.
new specifies the new state of the paralinguistic feature specified.
<u>
<shift feature="loud" new="f"/>Elizabeth
</u>
<u>Yes</u>
<u>
<shift feature="loud" new="normal"/>Come and try this <pause/>
<shift feature="loud" new="ff"/>come on
</u>
The values proposed here for the feature attribute are based on those used by the Survey of English Usage (see further Boase 1990); this list may be revised or supplemented using the methods outlined in section 23.2 Personalization and Customization.
The new attribute specifies the new state of the feature following the shift. If no value is specified, it is implied that the feature concerned ceases to be remarkable at this point: the special value normal may be specified to have the same effect.
- tempo:
- a
- allegro (fast)
- aa
- very fast
- acc
- accelerando (getting faster)
- l
- lento (slow)
- ll
- very slow
- rall
- rallentando (getting slower)
- loud (for loudness):
- f
- forte (loud)
- ff
- very loud
- cresc
- crescendo (getting louder)
- p
- piano (soft)
- pp
- very soft
- dimin
- diminuendo (getting softer)
- pitch (for pitch range):
- high
- high pitch-range
- low
- low pitch-range
- wide
- wide pitch-range
- narrow
- narrow pitch-range
- asc
- ascending
- desc
- descending
- monot
- monotonous
- scand
- scandent, each succeeding syllable higher than the last, generally ending in a falling tone
- tension:
- sl
- slurred
- lax
- lax, a little slurred
- ten
- tense
- pr
- very precise
- st
- staccato, every stressed syllable being doubly stressed
- leg
- legato, every syllable receiving more or less equal stress
- rhythm:
- rh
- beatable rhythm
- arrh
- arrhythmic, particularly halting
- spr
- spiky rising, with markedly higher unstressed syllables
- spf
- spiky falling, with markedly lower unstressed syllables
- glr
- glissando rising, like spiky rising but the unstressed syllables, usually several, also rise in pitch relative to each other
- glf
- glissando falling, like spiky falling but with the unstressed syllables also falling in pitch relative to each other
- voice (for voice quality):
- whisp
- whisper
- breath
- breathy
- husk
- husky
- creak
- creaky
- fals
- falsetto
- reson
- resonant
- giggle
- unvoiced laugh or giggle
- laugh
- voiced laugh
- trem
- tremulous
- sob
- sobbing
- yawn
- yawning
- sigh
- sighing
A full definition of the sense of the values provided for each feature should be provided in the encoding description section of the text header (see section 2.3 The Encoding Description).
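One possible way of doing so is sketched below; the wording, and the choice of a simple prose paragraph within encodingDesc, are illustrative assumptions only:
<encodingDesc>
 <p>Shifts in loudness are marked with the shift element, using the feature value
 loud and the new values f (forte), ff (very loud), p (piano), pp (very soft),
 cresc, dimin, and normal.</p>
</encodingDesc>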
8.4 Elements Defined Elsewhere
This section discusses the application to spoken texts of mechanisms and elements defined elsewhere in these Guidelines, in particular:
- segmentation below the utterance level
- synchronization and overlap
- regularization of orthography
8.4.1 Segmentation
For some analytic purposes it may be desirable to subdivide the divisions of a spoken text into units smaller than the individual utterance or turn. Segmentation may be performed for a number of different purposes and in terms of a variety of speech phenomena. Common examples include units defined both prosodically (by intonation, pausing, etc.) and syntactically (clauses, phrases, etc.). The term macrosyntagm has been used by a number of researchers to define units peculiar to speech transcripts.
These Guidelines propose that such analyses be performed in terms of neutrally-named segments, represented by the seg element, which is discussed more fully in section 16.3 Blocks, Segments, and Anchors. This element may take a type attribute to specify the kind of segmentation applicable to a particular segment, if more than one is possible in a text. A full definition of the segmentation scheme or schemes used should be provided in the segmentation element of the editorialDecl element in the TEI header (see 2.3.3 The Editorial Practices Declaration).
<u>
<seg>we went to the pub yesterday</seg>
<pause/>
<seg>there was no one there</seg>
</u>
<u>
<seg>although its an old ide´a</seg>
<seg>it hasnt been on the mar´ket very long</seg>
</u>
When utterances are segmented end-to-end in the same way as the s-units in written texts, the s element discussed in chapter 17 Simple Analytic Mechanisms may be used, either as an alternative or in addition to the more general purpose seg element. The s element is available without formality in all texts, but does not allow segments to nest within each other.
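For example, the first utterance of the preceding example might equally be segmented with s elements, as in this sketch:
<u>
 <s>we went to the pub yesterday</s>
 <pause/>
 <s>there was no one there</s>
</u>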
<u>
<seg type="C">I think </seg>
<seg type="C">this chap was writing </seg>
<seg type="C">and he <del type="repeated">said hello</del> said </seg>
<seg type="M">hello </seg>
<seg type="C">and he said </seg>
<seg type="C">I'm going to a
at twenty past seven </seg>
<seg type="C">he said </seg>
<seg type="M">ok </seg>
<seg type="M">right away </seg>
<seg type="C">and so <gap extent="1"/> on they went </seg>
<seg type="C">and they were <gap extent="3"/>
writing there </seg>
</u>
This example also uses the core elements gap and del to mark editorial decisions concerning matter completely omitted from the transcript (because of inaudibility), and words which have been transcribed but which the transcriber wishes to exclude from the segment because they are repeated, respectively. See section 3.4 Simple Editorial Changes for a discussion of these and related elements.
Where the desired segmentation does not fit within the hierarchy of utterances and segments, two further mechanisms are available:
- ‘milestone’ tags may be used; the special-purpose shift tag discussed in section 8.3.6 Shifts is an extension of this method
- where several discontinuous segments are to be grouped together to form a syntactic unit (e.g. a phrasal verb with interposed complement), the join element may be used, as in the sketch below
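A sketch of the second mechanism (identifiers and wording invented; the pointing attribute is written targets, matching the form used by the link examples later in this chapter): the two parts of the phrasal verb are grouped into a single virtual segment:
<u who="#a">she <seg xml:id="pv1">rang</seg> her mother <seg xml:id="pv2">up</seg> yesterday</u>
<join targets="#pv1 #pv2" result="seg"/>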
8.4.2 Synchronization and Overlap
Stig: Yes
Lou: (nods vigorously)
<listPerson>
<person xml:id="stig">
<!-- ... -->
</person>
<person xml:id="lou">
<!-- ... -->
</person>
<person xml:id="jane">
<!-- ... -->
</person>
</listPerson>
<u trans="overlap" who="#stig">yes</u>
<u xml:id="utt2" who="#stig">yes</u>
<kinesic xml:id="k1" who="#lou" iterated="true">
<desc>nods head vertically</desc>
</kinesic>
For a full discussion of this and related mechanisms, section 16.5.2 Placing Synchronous Events in Time should be consulted. The rest of the present section, which should be read in conjunction with that more detailed discussion, presents a number of ways in which these mechanisms may be applied to the specific problem of representing temporal alignment, synchrony, or overlap in transcribing spoken texts.
In the simple example above, the first utterance (that with identifier u1) contains an anchor element, the function of which is simply to mark a point within it. The synch attribute associated with this anchor point specifies the identifiers of the other two elements which are to be synchronized with it: specifically, the second utterance (u2) and the kinesic (k1). Note that one of these elements has content and the other is empty.
This example demonstrates only a way of indicating a point within one utterance at which it can be synchronized with another utterance and a kinesic. For more complex kinds of alignment, involving possibly multiple synchronization points, an additional element is provided, known as a timeline. This consists of a series of when elements, each representing a point in time, and bearing attributes which indicate its exact temporal position relative to other elements in the same timeline, in addition to the sequencing implied by its position within it.
<timeline unit="s">
<when xml:id="TS-P1" absolute="12:20:01"/>
<when xml:id="TS-P2" interval="4.5" since="#TS-P1"/>
<when xml:id="TS-P6"/>
<when xml:id="TS-P3" interval="1.5" since="#TS-P6"/>
</timeline>
One or more such timelines may be specified within a spoken text, to suit the encoder's convenience. If more than one is supplied, the origin attribute may be used on each to specify which other timeline element it follows. The unit attribute indicates the units used for timings given on when elements contained by the alignment map. Alternatively, to avoid the need to specify times explicitly, the interval attribute may be used to indicate that all the when elements in a timeline are a fixed distance apart.
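For instance, a second timeline whose points are a fixed half-second apart, and which follows on from a point in the timeline shown above, might be sketched as follows (identifiers other than TS-P3 are invented):
<timeline xml:id="tl2" origin="#TS-P3" unit="s" interval="0.5">
 <when xml:id="TS-P7"/>
 <when xml:id="TS-P8"/>
</timeline>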
A when element may be associated with the elements to be synchronized with it in any of the following ways:
- The elements to be synchronized may specify the identifier of a when element as the value of one of the start, end, or synch attributes
- The when element may specify the identifiers of all the elements to be synchronized with it using the synch attribute
- A free-standing link element may be used to associate the when element and the elements synchronized with it by specifying their identifiers as values for its targets attribute.
<timeline unit="s">
<when xml:id="ts-p1" absolute="12:20:01"/>
<when
synch="#ts-u1"
xml:id="ts-p2"
interval="4.5"
since="#ts-p1"/>
<when synch="#ts-x1" xml:id="ts-p6"/>
<when
synch="#ts-u1"
xml:id="ts-p3"
interval="1.5"
since="#ts-p6"/>
</timeline>
<u xml:id="ts-u1">This is my <anchor xml:id="ts-x1"/> turn</u>
<timeline unit="s">
<when xml:id="TS-p1" absolute="12:20:01"/>
<when xml:id="TS-p2" interval="4.5" since="#TS-p1"/>
<when xml:id="TS-p6"/>
<when xml:id="TS-p3" interval="1.5" since="#TS-p6"/>
</timeline>
<u xml:id="TS-u1">
<anchor xml:id="TS-u1start"/>
This is my <anchor xml:id="TS-x1"/> turn
<anchor xml:id="TS-u1end"/>
</u>
<linkGrp type="synchronous">
<link targets="#TS-u1start #TS-p1"/>
<link targets="#TS-u1end #TS-p2"/>
<link targets="#TS-x1 #TS-p6"/>
</linkGrp>
Bob: (interrupting) You used to smoke?
Tom: (at the same time) a lot more than this. But I never
inhaled the smoke
(2) [ you used to smoke ]
(1) but I never inhaled the smoke
<u who="#tom">I used to smoke
<anchor xml:id="TS-p10"/>a lot more than this
<anchor xml:id="TS-p20"/>but I never inhaled the smoke</u>
<u start="#TS-p10" end="#TS-p20" who="#bob">You used to smoke</u>
Alternatively, anchor elements within Bob's utterance may be synchronized with the corresponding points in Tom's utterance by means of their synch attributes:
<u who="#bob">
<anchor synch="#TS-p10"/>You used to smoke<anchor synch="#TS-p20"/>
</u>
<timeline>
<when xml:id="TS-t01"/>
<when xml:id="TS-t02"/>
</timeline>
<u who="#tom">I used to smoke
<anchor synch="#TS-t01"/>a lot more than this
<anchor synch="#TS-t02"/>but I never inhaled the smoke</u>
<u who="#bob">
<anchor synch="#TS-t01"/>You used to smoke<anchor synch="#TS-t02"/>
</u>
<timeline>
<when synch="#TS-nm1 #bob-u2" xml:id="TS-T01"/>
<when synch="#TS-nm2 #bob-u2" xml:id="TS-T02"/>
</timeline>
<u who="#tom">I used to smoke
<anchor xml:id="TS-nm1"/>a lot more than this
<anchor xml:id="TS-nm2"/>but I never inhaled the smoke</u>
<u xml:id="bob-u2" who="#bob">You used to smoke</u>
<body>
<timeline origin="#T001">
<when xml:id="T001"/>
<when xml:id="T002"/>
</timeline>
<u who="#tom">I used to smoke
<anchor xml:id="NM01"/>a lot more than this
<anchor xml:id="NM02"/>but I never inhaled the smoke</u>
<u xml:id="bob-U2" who="#bob">You used to smoke</u>
<linkGrp type="synchronize">
<link targets="#T001 #NM01 #bob-U2"/>
<link targets="#T002 #NM02 #bob-U2"/>
</linkGrp>
</body>
Note that in each case, although Bob's utterance follows Tom's sequentially in the text, it is aligned temporally with its middle, without any need to disrupt the normal syntax of the text.
B : |Balderdash
C : |No, |it's mine
<timeline>
 <when synch="#TSa1 #TSb1 #TSc1" xml:id="TSp1"/>
 <when synch="#TSa2 #TSc2" xml:id="TSp2"/>
</timeline>
<u who="#A">this is <anchor xml:id="TSa1"/> my <anchor xml:id="TSa2"/> turn</u>
<u xml:id="TSb1" who="#B">balderdash</u>
<u xml:id="TSc1" who="#C"> no <anchor xml:id="TSc2"/> it's mine</u>
8.4.3 Regularization of Word Forms
When speech is transcribed using ordinary orthographic notation, as is customary, some compromise must be made between the sounds produced and conventional orthography. Particularly when dealing with informal, dialectal, or other varieties of language, the transcriber will frequently have to decide whether a particular sound is to be treated as a distinct vocabulary item or not. For example, while in a given project kinda may not be worth distinguishing as a vocabulary item from kind of, isn't may clearly be worth distinguishing from is not; for some purposes, the regional variant isnae might also be worth distinguishing in the same way.
One rule of thumb might be to allow such variation only where a generally accepted orthographic form exists, for example, in published dictionaries of the language register being encoded; this has the disadvantage that such dictionaries may not exist. Another is to maintain a controlled (but extensible) set of normalized forms for all such words; this has the advantage of enforcing some degree of consistency among different transcribers. Occasionally, as for example when transcribing abbreviations or acronyms, it may be felt necessary to depart from conventional spelling to distinguish between cases where the abbreviation is spelled out letter by letter (e.g. B B C or V A T) and where it is pronounced as a single word (VAT or RADA). Similar considerations might apply to pronunciation of foreign words (e.g. Monsewer vs. Monsieur).
In general, use of punctuation, capitalization, etc., in spoken transcripts should be carefully controlled. It is important to distinguish the transcriber's intuition as to what the punctuation should be from the marking of prosodic features such as pausing, intonation, etc.
Whatever practice is adopted, it is essential that it be clearly and fully documented in the editorial declarations section of the header. It may also be found helpful to include normalized forms of non-conventional spellings within the text, using the elements for simple editorial changes described in section 3.4 Simple Editorial Changes (see further section 8.4.5 Speech Management).
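One possible approach, sketched here with an invented speaker identifier and using the forms mentioned above together with the choice mechanism seen earlier in this chapter, is to record both the form heard and its normalized equivalent:
<u who="#a">it's <choice><orig>kinda</orig><reg>kind of</reg></choice> late
<choice><orig>isnae</orig><reg>isn't</reg></choice> it</u>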
8.4.4 Prosody
In the absence of conventional punctuation, the marking of prosodic features assumes paramount importance, since these structure and organize the spoken message. Indeed, such prosodic features as points of primary or secondary stress may be represented by specialized punctuation marks. Pauses have already been dealt with in section 8.3.2 Pausing, while tone units (or intonational phrases) can be indicated by the segmentation tag discussed in section 8.4.1 Segmentation. The shift element discussed in section 8.3.6 Shifts may also be used to encode some prosodic features, for example where all that is required is the ability to record shifts in voice quality.
In a more detailed phonological transcript, it is common practice to include a number of conventional signs to mark prosodic features of the surrounding or (more usually) preceding speech. Such signs may be used to record, for example, particular intonation patterns, truncation, vowel quality (long or short), etc. These signs may be preserved in a transcript either by using conventional punctuation or by marking their presence by g elements. Where a transcript includes many phonetic or phonemic aspects, it will generally be convenient to use a specialized writing system in this way (see further chapters vi Languages and Character Sets and 5 Representation of Non-standard Characters and Glyphs). For representation of phonemic information, the use of the International Phonetic Alphabet, which can be represented in Unicode characters, is recommended.
<charDecl>
<char xml:id="lf">
<desc>low fall intonation</desc>
</char>
<char xml:id="lr">
<desc>low rise intonation</desc>
</char>
<char xml:id="fr">
<desc>fall rise intonation</desc>
</char>
<char xml:id="rf">
<desc>rise fall intonation</desc>
</char>
<char xml:id="long">
<desc>lengthened syllable</desc>
</char>
<char xml:id="short">
<desc>shortened syllable</desc>
</char>
</charDecl>
<div>
<note>C is with a friend</note>
<u who="#cwn">
<unclear>Excuse me<g ref="#lf"/>
</unclear>
<pause/> You dont have some
aesthetic<g ref="#short"/>
<pause/>
<unclear>specially on early</unclear>
aesthetics terminology <g ref="#lr"/>
</u>
<u who="#aj"> No<g ref="#lf"/>
<pause/>No<g ref="#lf"/>
<gap extent="2"/> I'm
afraid<g ref="#lf"/>
</u>
<u trans="latching" who="#cwn"> No<g ref="#lr"/>
<unclear>Well</unclear> thanks<g ref="#lr"/>
<pause/> Oh<g ref="#short"/>
<unclear>you couldnt<g ref="#short"/> can we</unclear> kind of<g ref="#long"/>
<pause/>I mean ask you to order it for us<g ref="#long"/>
<g ref="#fr"/>
</u>
<u trans="latching" who="#aj"> Yes<g ref="#fr"/> if you know the title<g ref="#lf"/> Yeah<g ref="#lf"/>
</u>
<u who="#cwn">
<gap extent="3"/>
<gap extent="4"/>
</u>
<u who="#aj"> Yes thats fine. <unclear>just as soon as it comes in we'll send
you a postcard<g ref="#lf"/>
</unclear>
</u>
<listPerson>
<person xml:id="cwn">
<p>Customer WN</p>
</person>
<person xml:id="aj">
<p>Assistant K</p>
</person>
</listPerson>
</div>
This example, which is taken from a corpus of bookshop service encounters, also demonstrates the use of the unclear and gap elements discussed in section 3.4 Simple Editorial Changes. Where words are so unclear that only their extent can be recorded, the empty gap element may be used; where the encoder can identify the words but wishes to record a degree of uncertainty about their accuracy, the unclear element may be used. More flexible and detailed methods of indicating uncertainty are discussed in chapter 21 Certainty and Responsibility.
For more detailed work, involving a detailed phonological transcript including representation of stress and pitch patterns, it is probably best to maintain the prosodic description in parallel with the conventional written transcript, rather than attempt to embed detailed prosodic information within it. The two parallel streams may be aligned with each other and with other streams, for example an acoustic encoding, using the general alignment mechanisms discussed in section 8.4.2 Synchronization and Overlap above.
8.4.5 Speech Management
Phenomena of speech management include disfluencies such as filled and unfilled pauses, interrupted or repeated words, corrections, and reformulations, as well as interactional devices asking for or providing feedback. Depending on the importance attached to such features, transcribers may choose to adopt conventionalized representations for them (as discussed in section 8.4.3 Regularization of Word Forms above), or to transcribe them using IPA or some other transcription system. To simplify analysis of the lexical features of a speech transcript, it may be felt useful to ‘tidy away’ many of these disfluencies. Where this policy has been adopted, these Guidelines recommend the use of the tags for simple editorial intervention discussed in section 3.4 Simple Editorial Changes, to make explicit the extent of regularization or normalization performed by the transcriber.
<u>
<del type="truncation">s</del>see
<del type="repetition">you you</del> you know
<del type="falseStart">it's</del> he's crazy
</u>
<u><!-- ... -->
 <choice>
  <corr>SCSI</corr>
  <sic>skuzzy</sic>
 </choice>
</u>
<u><!-- ... -->
 <foreign xml:lang="de"><!-- ... -->
  <pause dur="PT1S"/> vielleicht</foreign> go to warsaw
 and <emph>vienna</emph>
</u>
8.4.6 Analytic Coding
The recommendations made here only concern the establishment of a basic text. Where a more sophisticated analysis is needed, more sophisticated methods of markup will also be appropriate, for example, using stand-off markup to indicate multiple segmentation of the stream of discourse, or complex alignment of several segments within it. Where additional annotations (sometimes called ‘codes’ or ‘tags’) are used to represent such features as linguistic word class (noun, verb, etc.), type of speech act (imperative, concessive, etc.), or information status (theme/rheme, given/new, active/semi-active/new), etc., a selection from the general purpose analytic tools discussed in chapters 16 Linking, Segmentation, and Alignment, 17 Simple Analytic Mechanisms, and 18 Feature Structures may be used to advantage.
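As a minimal sketch (identifiers and categories invented, and assuming the analysis module, which supplies interp and the global ana attribute, is included in the schema), a simple speech-act annotation might be attached to utterances as follows:
<interpGrp type="speechAct">
 <interp xml:id="sa-directive">directive</interp>
 <interp xml:id="sa-response">response</interp>
</interpGrp>
<u who="#a" ana="#sa-directive">come and try this</u>
<u who="#b" ana="#sa-response">ok</u>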
8.5 Module for Transcribed Speech
- Elements defined: broadcast equipment incident kinesic pause recording recordingStmt scriptStmt shift u vocal writing
- Classes defined: att.duration model.divPart.spoken model.global.spoken model.recordingPart