The independence of dimensions in multidimensional dialogue act annotation Volha Petukhova and Harry Bunt Tilburg Center for Creative Computing Tilburg University, The Netherlands, {v.petukhova,h.bunt}@uvt.nl Abstract This paper presents empirical evidence for the orthogonality of the DIT++ multidimensional dialogue act annotation scheme, showing that the ten dimensions of communication which underlie this scheme are addressed independently in natural dialogue. 1 Introduction Studies of human dialogue behaviour indicate that natural dialogue utterances are very often multifunctional. This observation has inspired the development of multidimensional approaches to dialogue analysis and annotation, e.g. (Allen & Core, 1997) , (Larsson, 1998), (Popescu-Belis, 2005), (Bunt, 2006). The most frequently used annotation scheme that implements this approach is DAMSL (Allen and Core, 1997), which allows multiple labels to be assigned to utterances in four layers: Communicative Status, Information Level, Forward-Looking Function (FLF) and Backward-Looking Function (BLF). The FLF layer is subdivided into five classes, including (roughly) the classes of commissive and directive functions, well known from speech act theory. The BLF layer has four classes: Agreement, Understanding, Answer, and Information Relation. These nine classes, also referred to as `dimensions', form mutually exclusive sets of tags; no further motivation is given for the particular choice of classes. Popescu-Belis (2005) argues that dialogue act tagsets should seek a multidimensional theoretical grounding and defines the following aspects of utterance function that could be relevant for choosing 197 The independence of dimensions, required by this definition, has the effect that an utterance may have a function in one dimension independent of the functions that it may have in other dimensions, and helps to explain why utterances may have multiple functions. Moreover, it leads to more manageable and dimensions (1) the traditional clustering of illocutionary forces in speech act theory into five classes: Representatives, Commissives, Directives, Expressives and Declarations; (2) turn management; (3) adjacency pairs; (4) topical organization in dialogue; (5) politeness functions; and (6) rhetorical roles. Structuring an annotation scheme by grouping related communicative functions into clusters makes the structure of the schema more transparent. Such clusters or `dimensions' are usually defined as a set of functions related to the same type of information, such as Acknowledging, Signalling Understanding and Signalling Non-understanding, or Dialogue Opening and Dialogue Closing. Bunt (2006) shows that this does not always lead to a notion of dimension that has any conceptual and theoretical significance, and argues that some of the function classes of DAMSL do not constitute proper dimensions. In particular, a theoretically grounded multidimensional schema should provide an account of the possible multifunctionality of dialogue utterances. In (Bunt, 2006); (Bunt and Girard, 2005) a dimension in dialogue act analysis is defined as an aspect of participating in dialogue which can be addressed: · by dialogue acts which have a function specifically for dealing with this aspect; · independently of the other dimensions. Proceedings of NAACL HLT 2009: Short Papers, pages 197­200, Boulder, Colorado, June 2009. c 2009 Association for Computational Linguistics more adaptable annotation schemas (compared to, for instance, DAMSL and its derivatives), since it allows annotators to leave out certain dimensions that they are not interested in, or to extend the schema with additional dimensions; and it allows restricting or modifying the set of tags in a particular dimension without affecting the rest of the schema. Based on the above definition and extensive theoretical and empirical studies, 10 dimensions are defined in the DIT++ dialogue act annotation scheme1 : the domain or task/activity (Task); feedback on the processing of previous utterances by the speaker (Auto-feedback) or by other interlocutors (Allofeedback); managing difficulties in the speaker's utterance production (Own-Communication Management, OCM) or that of other interlocutors (Partner Communication Management, PCM); the speaker's need for time to continue the dialogue (Time Management); establishing and maintaining contact (Contact Management); the allocation of the next turn (Turn Management); the way the speaker is planning to structure the dialogue (Dialogue Structuring); and attention for social aspects of the interaction (Social Obligations Management, SOM). This paper investigates the independence of these ten dimensions. In Section 2 we discuss the notion of independence of dimensions and how it can be tested. Section 3 reports test results and Section 4 draws conclusions. relatedness. The phi measure is related to the chisquare statistic, used to test the independence of categorical variables, and is similar to the correlation coefficient in its interpretation. If a dimension is not independent from other dimensions, then there would be no utterances in the data which address only that dimension. We therefore also investigate to which extent it happens that an utterance addresses only one dimension. We also investigate whether a dimension is addressed only in reaction to a certain other dimension. For example, the answer dimension as defined in DAMSL cannot be seen as independent, because answers need questions in order to exist. The test here is to examine the relative frequencies of pairs . To sum up, we performed four tests, examining: 1. the relative frequency of communicative function co-occurrences across dimensions; 2. the extent of relatedness between dimensions measure with the phi coefficient; 3. for all dimensions whether there are utterances addressing only that dimension; 4. the relative frequency of pairs of dimension and previous dimension. 3 Test results 2 Independence of dimensions We define two dimensions D1 and D2 in an annotation scheme to be independent iff (1) an utterance may be assigned a value in D1 regardless of whether it is assigned a value in D2; and (2) it is not the case that whenever an utterance has a value in D1, this determines its value in D2.2 Dependences between dimensions can be determined empirically by analyzing annotated dialogue data. Dimension tags which always co-occur are nearly certainly dependent; zero co-occurrence scores also suggest possible dependences. Besides co-occurrence scores, we also provide a statistical analysis using the phi coefficient as a measure of For more information about the scheme and its dimensions please visit http://dit.uvt.nl/ 2 See Petukhova and Bunt (2009) for a more extensive discussion. 1 Since different types of dialogue may have different tag distributions, three different dialogue corpora have been examined: · The DIAMOND corpus3 of two-party instructional human-human Dutch dialogues (1,408 utterances); · The AMI corpus4 of task-oriented humanhuman multi-party English dialogues (3,897 utterances); · The OVIS corpus5 of information-seeking human-computer Dutch dialogues (3,942 utterances). All three corpora were manually segmented and tagged according to the DIT++ annotation scheme. For more information see Geertzen, J., Girard, Y., and Morante R. 2004. The DIAMOND project. Poster at CATALOG 2004. 4 Augmented Multi-party Interaction (http: //www.amiproject.org/) 5 Openbaar Vervoer Informatie System (Public Transport Information System) http://www.let.rug.nl/~ annoord/Ovis/ v 3 198 Table 1: Co-occurrences of communicative functions across dimensions in AMI corpus expressed in relative frequency in % implicated and entailed functions excluded and included (in brackets). The test results presented in this section are similar for all three corpora. The co-occurrence results in Table 1 show no dependences between dimensions, although some combinations of dimensions occur frequently, e.g. time and turn management acts often co-occur. A speaker who wants to win some time to gather his thoughts and uses Stalling acts mostly wants to continue in the sender role, and his stalling behaviour may be intended to signal that as well (i.e., to be interpreted as a Turn Keeping act). But stalling behaviour does not always have that function; especially an extensive amount of stallings accompanied by relatively long pauses may be intended to elicit support for completing an utterance. It is also interesting to have a look at cooccurrences of communicative functions taking implicated and entailed functions into account (the corpora were reannotated for this purpose). An implicated function is for instance the positive feedback (on understanding and evaluating the preceding utterance(s) of the addressee) that is implied by an expression of thanks; examples of entailed functions are the positive feedback on the preceding utterance that is implied by answering a question, by accepting an invitation, or by rejecting an offer. Co-occurrence scores are higher when entailed and implicated functions are taken into account (the scores given in brackets in Table 1). For example, questions, which mostly belong to the Task dimension, much of the time have an accompanying Turn Management function, either releasing the turn or assigning it to another dialogue participant, allowing the question to be answered. Similarly, when accepting a request the speaker needs to have the turn, so communicative functions like Accept Re199 quest will often be accompanied by functions like Turn Take or Turn Accept. Such cases contribute to the co-occurrence score between the Task and Turn Management dimensions. Table 1 shows that some dimensions do not occur in combination. We do not find combinations of Contact and Time Management, Contact and Partner Communication Management, or Partner Communication Management and Discourse Structuring, for example. Close inspection of the definitions of the tags in these pairs of dimensions does not reveal combination restrictions that would make one of these dimensions depend on the others. Table 2 presents the extent to which dimensions are related when the corpus data are annotated with or without taking implicated and entailed functions into account, according to the calculated phi coefficient. No strong positive (phi values from .7 to 1.0) or negative (-.7 to -1.0) relations are observed. There is a weak positive association (.6) between Turn and Time Management (see co-occurrence analysis above) and between OCM and Turn Management (.4). Weak negative associations are observed between Task and Auto-feedback (-.5) when entailed and implicated functions are not considered; between Task and Contact Management (-.6); and between Auto- and Allo-feedback (-.6) when entailed and implicated functions are included in the analysis. The weak negative association means that an utterance does not often have communicative functions in these two dimensions simultaneously. Some negative associations become positive if we take entailed and implicated functions into account, because, as already noted, dialogue acts like answers, accepts and rejects, imply positive feedback. Table 2: Extent of relation between dimensions for AMI corpus expressed in the Phi coefficient (implicated and entailed functions excluded (white cells) and included (grey cells)). The third independence test, mentioned above, shows that each dimension may be addressed by an utterance which does not address any other dimension. The Task dimension is independently addressed in 28.8% of the utterances; 14.2% of the utterances have a function in the Auto-Feedback dimension only; for the other dimensions these figures are 0.7% - Allo-Feedback; 7.4% - Turn Management; 0.3% - Time Management; 0.1% - Contact Management; 1.9% - Discourse Structuring; 0.5% OCM; 0.2% - PCM; and 0.3% - SOM. annotation scheme, using co-occurrences matrices and the phi coefficient for measuring relatedness between dimensions. The results show that, although some dimensions are more related and co-occur more frequently than others, on the whole the ten DIT++ dimensions may be considered to be independent aspects of communication. Acknowledgments This research was conducted as part of ISO project 24617-2: Semantic annotation framework, Part 2: Dialogue acts, and sponsored by Tilburg University. References James F. Allen and Mark G. Core. 1997. Draft of DAMSL: Dialog Act Markup in Several Layers. Jens Allwood. 2000. An activity-based approach to pragmatics. In Bunt, H., and Black, W. (eds.) Abduction, Belief and Context in Dialogue; Studies in Computational Pragmatics, pp. 47­80. Benjamins, Amsterdam. Harry Bunt and Yann Girard. 2005. Designing an open, multidimensional dialogue act taxonomy. In Gardent, C., and Gaiffe, B. (eds). Proc. 9th Workshop on the Semantics and Pragmatics of Dialogue, pp. 37­44. Harry Bunt. 2006. Dimensions in dialogue annotation. In Proceedings of LREC 2006. Mark G. Core and James F. Allen. 1997. Coding dialogues with the DAMSL annotation scheme. In Working Notes: AAAI Fall Symposium on Communicative Action in Humans and Machines, pp. 28­35. Staffan Larsson. 1998. Coding Schemas for Dialogue Moves. Technical report from the S-DIME project. Volha Petukhova and Harry Bunt. 2009. Dimensions in communication. TiCC Technical Report 2009-002, Tilburg University. Andrei Popescu-Belis. 2005. Dialogue Acts: One or More Dimensions? ISSCO Working Paper 62, ISSCO. Table 3: Overview of relative frequency (in%) of pairs of dimension and previous dimensions by previous utterances observed in AMI data, per dimension, drawn from the set of 5 pairs from the dialogue history. We finally investigated the occurrences of tags given the tags of the previous utterances, taking five previous utterances into account. Table 3 shows no evidence of dependences across the dialogue history. There are some frequent patterns, for example, retractions and self-corrections often follow hesitations because the speaker, while monitoring his own speech and noticing that part of it needs revision, needs time to construct the corrected part. 4 Conclusions In this paper we investigated the independence of the dimensions defined in the DIT++ dialogue act 200