


MULTIMODAL DISCOURSE MODELLING IN A MULTI-USER MULTI-DOMAIN ENVIRONMENT¹

Stephanie Seneff, Dave Goddeau, Christine Pao, and Joe Polifroni
Spoken Language Systems Group
Laboratory for Computer Science
Massachusetts Institute of Technology
Cambridge, MA 02139 USA
http://www.sls.lcs.mit.edu

ABSTRACT
This paper describes the discourse component of GALAXY, a multi-domain, multimodal conversational system. In designing this module, we are attempting to develop domain-independent mechanisms, controlled via declarative tables, to promote convenient instantiation of a discourse component for each new domain. Direct anaphoric reference as well as elliptical reference are dealt with appropriately. Users can also refer verbally to items selected via mouse clicks. Cross-domain references are particularly challenging, as is the ambiguity problem arising from different case roles for different subdomains. Users often utter fragments, sometimes in response to server-initiated dialogue exchanges, so an extensive fragment interpretation mechanism is supported.

1. INTRODUCTION
GALAXY is a multi-domain, multi-user, multi-modal conversational system that has been under development in the Spoken Language Systems Group at MIT-LCS for the last three years [1]. GALAXY focuses on information of interest to a traveller, including worldwide weather and air travel information, and tourist assistance for the city of Boston. In addition to text and speech input, GALAXY understands integrated speech and mouse-click references to items in a list or on a map. This paper mainly concerns GALAXY’s discourse module. While we have applied this module thus far only in the context of travel-related domains, we believe it is capable of supporting more generic discourse solutions.

The main role of the discourse module is to interpret sentences in context. Users can refer back to previous information either directly through anaphoric reference (e.g., “this one”) or indirectly by not repeating prior constraints that are implied. Users may also utter queries that cannot be evaluated because critical information is missing. Part of the discourse module’s role is to identify such problems and initiate a subdialogue to fill in the missing elements. Users often utter fragments, particularly in response to such explicit requests for information, and these are usually interpreted by incorporating them into preceding queries. Finally, it is the discourse module’s responsibility to determine the appropriate domain server for the query.

GALAXY is implemented in a client-server framework, with the client interfacing with the user and consulting several distinct knowledge servers to answer a query. Generic inheritance mechanisms are applied in the client’s multi-domain discourse component, and servers can augment or supersede the discourse actions with additional inheritance requirements that depend on more specific domain knowledge than is available to the client. In practice, such augmentations are usually associated with situations where the server has momentarily taken control of the dialogue. For example, the AirTravel server might ask the user for a return date.

¹ This research was supported by DARPA under contract N66001-94-C6040, monitored through Naval Command, Control and Ocean Surveillance Center.

2. GENERAL ISSUES
In our prior experience with many components of our systems, we have come to appreciate the advantages of maintaining the specifications of properties particular to a domain or language in external declarative tables. In designing the discourse component, we tried to adhere to this same philosophy as much as was feasible. Ideally, this would mean that an effective discourse mechanism could be put in place for a new domain by simply filling out a table specifying the new domain’s inheritance requirements.

A challenging problem that has emerged as a consequence of multiple domains is that some words and phrases are ambiguous as to domain, or even as to case role. The system supports the possibility of underspecifying the case role and/or domain association, leaving multiple options open for possible resolution at a later time. Such ambiguity is resolved through conjunction with the domain specified by other constituents, either in the current frame or in the history record.

The city of Boston is probably the best example of this problem. The CityGuide domain understands TOWN to mean a delimited geographical area defining spatial limits for a search, as in “the bookstores in Boston.” The AirTravel domain, on the other hand, understands the concept CITY to be a point location, as in “flights from Denver to Boston.” A query such as “What about Boston?” cannot be properly interpreted until context is considered. Other cases include, for example, restaurants in Boston named “Hong Kong” and “India,” which obviously have other interpretations as well.
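The domain and case-role ambiguity described for “Boston” in the General Issues section can be illustrated with a small sketch. It is not the GALAXY implementation: the `Reading` type, the `resolve` function, and the encoding of each reading as a category plus a set of domains are assumptions made here for illustration. The idea shown is only the intersection ("conjunction") of an ambiguous word's candidate domains with the domains implied by context.

```python
from dataclasses import dataclass

# Hypothetical encoding: each reading of an ambiguous word pairs a semantic
# category with the domains that accept that category.
@dataclass(frozen=True)
class Reading:
    category: str        # e.g. "CITY" or "TOWN"
    domains: frozenset   # domains in which this reading is valid

# "Boston" is a TOWN (search area) for CityGuide and a CITY (point location)
# for AirTravel and Weather.
BOSTON = [
    Reading("TOWN", frozenset({"CityGuide"})),
    Reading("CITY", frozenset({"AirTravel", "Weather"})),
]

def resolve(readings, context_domains):
    """Keep only the readings compatible with the domains implied by context.

    context_domains stands for the domains licensed by other constituents in
    the current frame or, failing that, by the history record. An empty
    context leaves every reading open for later resolution.
    """
    if not context_domains:
        return list(readings)
    kept = []
    for reading in readings:
        shared = reading.domains & context_domains
        if shared:
            kept.append(Reading(reading.category, frozenset(shared)))
    # If nothing survives, stay underspecified rather than guess.
    return kept or list(readings)
```

With a preceding weather query as context, `resolve(BOSTON, {"Weather"})` keeps only the CITY reading, while an empty context returns both readings unchanged.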

tic class, and the other specifying which predicates, if present in the current frame, mask inheritance of particular other predicates. We adopted this same strategy for GALAXY.

After dealing with explicit pronominal reference and implicit predicate inheritance, the system then makes a branch point decision based on whether or not the current query is a fragment. Fragments require special treatment, as they are typically incorporated into the preceding clause either by insertion or replacement. In our system a fragment can only contain a topic or one or more predicates. To interpret the fragment, the system must “find a home” for the fragment’s topic or predicate(s), splicing it into the clause in history. A fragment could also be a response to a specific question in a server-initiated dialogue exchange. For such cases, the discourse table contains a list of semantic categories that would be appropriate responses for each such server-initiated exchange. If the fragment matches the conditions, the discourse component bypasses inheritance, deferring to the server to deal with a subdialogue without further complications.

Following fragment analysis, the final discourse step is to fill in any obligatory case roles. For example, directions and flights require a source and destination, “nearest” requires an argument for comparison, and a “property” (such as phone number) requires a possessor or “of” predicate. In the event that no suitable filler for the role can be found, the entry in the history is marked as “missing,” and an appropriate response string, such as “Where are you?” is generated. Since such an interchange is likely to provoke a fragment response by the user, the fragment analyzer gives priority to filling any missing slots.

Once the inheritance for the current frame is completed, the history table is updated. This includes replacing the source and destination (or marking them as “old” if the utterance contains no source or destination), and updating the focus and the slots for inheritable predicates, such as in or date. Any topic whose semantic class is specified in the history table as a potential coreference class is also stored in the history keyed under its class. The entire frame is entered into the history as the most recent clause, which would be recalled for any subsequent fragment analysis. If the system displayed a list of items, the topic of the current clause is entered as the frame associated with both the most recently displayed list and the most recent list for this particular semantic class. These two entries are needed to resolve requests such as “Go back to the list,” or “Show me the restaurants again,” respectively.

Figure 3 shows several examples of entries in the discourse control file for GALAXY. We have adopted a standardized format for entering knowledge under a diverse set of headings, to facilitate development of a new domain. The symbol “&” is a generic join that may in practice mean “AND,” “OR,” or some other relationship, depending upon the table heading. The heading “TOPIC DOMAINS” is used to determine the appropriate domain server, and also to check for consistency within a single utterance. The entry under “DOMAIN DEFAULTS” indicates, for example, that any references to “weather there” should be interpreted as “weather in Boston,” in the context of any CityGuide question. The first entry under “PREDICATE INHERITANCE” states that any NPs in the semantic classes EVENT or WEATHER should inherit a prior date.

TOPIC DOMAINS
    CityGuide&AirTravel&Weather: CITY & TOWN
    AirTravel&Weather: CITY MONTH DATE
DOMAIN DEFAULTS
    CityGuide: IN CITY Boston
SEMANTIC CLASS
    LOCATION &AirTravel: CITY AIRPORT
    EVENT: FLIGHT FARE
    PROPERTY : PHONE HOURS MENU
    ...
PREDICATE INHERITANCE
    MONTH DATE: EVENT WEATHER
    IN & STATE: CITY
PRONOUN REFERENCE
    LOCATION : there to
    FIRST PERSON : I me from here

Figure 3: Representative entries from the discourse table for GALAXY
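To make the table-driven idea concrete, the following is a minimal sketch of how entries like those in Figure 3 might drive predicate inheritance. The file layout (a heading line followed by indented `keys: values` entries), the flat dictionary used as a frame, and the function names are all assumptions for illustration, not the actual GALAXY control-file syntax. The sketch does not handle the “&” join or the masking table.

```python
# Hypothetical excerpt of a discourse table, using two entries from Figure 3.
TABLE = """\
SEMANTIC CLASS
  EVENT: FLIGHT FARE
PREDICATE INHERITANCE
  MONTH DATE: EVENT WEATHER
"""

def parse_table(text):
    """Parse heading lines followed by indented 'keys: values' entries."""
    table, heading = {}, None
    for line in text.splitlines():
        if not line.strip():
            continue
        if not line.startswith(" "):
            heading = line.strip()          # a new section heading
            table[heading] = []
        else:
            keys, _, values = line.partition(":")
            table[heading].append((keys.split(), values.split()))
    return table

def semantic_classes(table, category):
    """Return the category itself plus every class that lists it as a member."""
    classes = {category}
    for names, members in table.get("SEMANTIC CLASS", []):
        if category in members:
            classes.update(names)
    return classes

def inherit_predicates(table, frame, history):
    """Copy inheritable predicates from history into a frame that lacks them."""
    classes = semantic_classes(table, frame["topic"])
    result = dict(frame)                    # leave the caller's frame untouched
    for predicates, targets in table.get("PREDICATE INHERITANCE", []):
        if classes & set(targets):
            for predicate in predicates:
                if predicate not in result and predicate in history:
                    result[predicate] = history[predicate]
    return result
```

Under these assumptions, a frame whose topic is FLIGHT belongs to the class EVENT and therefore picks up a DATE stored in the history, while a frame that already carries its own DATE keeps it.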

5. SERVER DISCOURSE ACTIVITIES
The server response can affect discourse context in a number of ways. First, the server returns information at the user’s request, and the user may refer to that information in later interactions. As mentioned previously, the servers provide information in list form, accessible by clicking or numerical reference. The server can also ask the user for clarification, and the user is likely to respond with fragments that the client may not be able to interpret on its own. The server may also interpret parts of the user frame more fully, returning a replacement for discourse update. This is especially crucial for dates that are expressed relative to other dates, as in “three days later.” If not replaced, the date would keep incrementing by three with each subsequent AirTravel query! Finally, the server may take initiative in helping the user toward a common goal.

The AirTravel server provides a good example of server discourse activities, since it tends to take the initiative during flight reservation dialogues. For instance, it may ask questions not directly related to the user’s immediate request. A frequent server response to a booking request, “Please book this flight,” is “Will this be one way or round trip?” The referent in this case is not the flight (all flights are one way!), but the entire itinerary. The AirTravel server maintains a distinction between browsing mode, when the user takes most of the initiative, and booking mode, when the system takes some initiative.

The discourse component must keep track of both sides of the user-system dialogue. During booking mode, the server may display information that the user did not specifically request, for instance by showing fares after both legs of a round trip flight have been booked. When the system is taking initiative, a semantic frame created by the server is incorporated directly into the history. The server may also set context for non-speech interaction, allowing mouse clicks to be interpreted in a domain-specific context, for example clicking to get more information on a flight or to book a fare.
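The relative-date problem mentioned above ("three days later") can be reproduced in a few lines. The sketch below is a simulation under stated assumptions, not GALAXY code: the tuple form `("later", 3)`, the function names, and the `replace_in_history` flag are hypothetical. It shows why the server's resolved date has to overwrite the relative expression in the history.

```python
from datetime import date, timedelta

def resolve_date(expr, anchor):
    """Turn a (hypothetical) relative date expression into an absolute date."""
    if isinstance(expr, date):
        return expr                        # already absolute, nothing to do
    kind, days = expr                      # e.g. ("later", 3) for "three days later"
    if kind != "later":
        raise ValueError(f"unsupported date expression: {expr!r}")
    return anchor + timedelta(days=days)

def run_queries(history_date, first_expr, n_queries, replace_in_history):
    """Evaluate n queries that all inherit the same date predicate."""
    stored, anchor, results = first_expr, history_date, []
    for _ in range(n_queries):
        resolved = resolve_date(stored, anchor)
        results.append(resolved)
        anchor = resolved                  # the newest date becomes the reference point
        if replace_in_history:
            stored = resolved              # the server's replacement overwrites the relative form
    return results
```

With `replace_in_history=True` every follow-up query sees the same absolute date. With `False`, the inherited relative expression is re-applied on each query and the date advances by three days every time.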

Utt1:    WHAT IS THE FORECAST FOR DALLAS TOMORROW
Action1: show forecast for Dallas tomorrow
Utt2:    HOW ABOUT BOSTON
Action2: show forecast for Boston tomorrow
Utt3:    ARE THERE ANY FLIGHTS THERE FROM DALLAS
Action3: show flights from Dallas to Boston on May 3rd
Utt4:    WHAT IS THE CLOSEST BANK TO HERE
Action4: request missing source
Utt5:    AT MIT
Action5: show the closest bank to MIT
Utt6:    HOW DO I GET THERE
Action6: give directions from MIT to the closest bank to MIT
Utt7:    HOW FAR IS THE ROYAL EAST FROM THIS BANK
Action7: give distance between the Royal East and the closest bank to MIT
Utt8:    HOW ABOUT LAGROCERIA
Action8: give distance between Lagroceria and the closest bank to MIT

Category                      Count    Percent
No Parse                      142      24%
Discourse Used Correctly      154      26%
Discourse Used Incorrectly    33       5%
Discourse Not Used            271      45%

Table 1: Breakdown of discourse performance on wizard-collected data.

7. ASSESSMENT
We have been collecting data for GALAXY in a wizard mode over the past several months. Subjects were informed that the system was able to understand some utterances in context, and we were hoping this would encourage them to use discourse capabilities. Table 1 summarizes the system’s performance on a designated training set. We were encouraged to see how often the discourse module was needed, although we clearly still have some problems that need to be addressed. As data were collected, we slowly augmented the system to accommodate newly identified discourse phenomena.

We have observed that subjects tend to try out discourse, and, if it works correctly, they continue to make use of it. If they encounter a discourse problem, they tend to revert to speaking fully specified utterances, for fear that discourse will not work correctly. By dividing our wizard data into an earlier half and a later half, we observed that there was a 50% increase in the use of discourse during the later time period. We suspect this increased usage reflects the improved behavior of the discourse model over time.

Discourse processing is particularly vulnerable to logical programming defects, since errors can propagate across both utterances and domains. Therefore, it is important to be able to confirm that the system is still healthy after changes have been made. To this end, we have established a procedure to evaluate the system on a series of sentences specially designed to exercise most of the discourse capabilities. For each sentence, the output of the current system is compared to a verified reference. This has been extremely valuable for detecting inadvertently introduced errors during active system development.
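The regression procedure described in the Assessment section, in which each test sentence's output is compared to a verified reference, could be organized roughly as follows. This is a sketch only: `process` is a stand-in for the full system, and the case format (dialogues as ordered lists of utterance and expected-output pairs) is an assumption. Utterances within a dialogue share history, which is what lets an early discourse error show up in later turns.

```python
def run_regression(process, cases):
    """Replay scripted dialogues and collect mismatches against the reference.

    process(utterance, history) -> (output, new_history) is a placeholder for
    the system under test. cases is a list of dialogues; each dialogue is a
    list of (utterance, verified_output) pairs evaluated in order.
    """
    failures = []
    for d, dialogue in enumerate(cases):
        history = None                     # each dialogue starts with no context
        for i, (utterance, expected) in enumerate(dialogue):
            actual, history = process(utterance, history)
            if actual != expected:
                # Record where the divergence happened and what was produced.
                failures.append((d, i, utterance, expected, actual))
    return failures
```

An empty result means every output still matches its reference. Because history is carried forward even after a mismatch, one failure early in a dialogue may be followed by several more, which mirrors how discourse errors propagate across utterances.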

Figure 4: An example dialogue between a user and our GALAXY system, illustrating domain switching.

Clause: [what about
    Topic: [CITY & TOWN name: Boston]
    Domain: AirTravel&Weather&CityGuide ]

Figure 5: The semantic frame for the fragment, “What about Boston?”

6. AN EXAMPLE
Figure 4 gives an example dialogue, particularly exercising cross-domain discourse reference. Utt1 is the context-setting query for Utt2, a straightforward “what about” question. Utt2 results in the semantic frame shown in Figure 5. “Boston” is ambiguous as to both category and domain, and these ambiguities are resolved based on the fact that the previous query was a weather query. The system substitutes “Boston” for “Dallas,” returning the reconstructed history frame to the client.

The user switches domains in Utt3. Nonetheless, two items are inherited from the history: “Boston,” through direct anaphoric reference, and “tomorrow,” elliptically. The AirTravel server converts “tomorrow” into the appropriate date and sends the reconstructed date back to the client.

When the user abruptly switches to CityGuide in Utt4, the discourse process tries to find a referent for “here,” but rejects the source “Dallas” because CITY is not a point location in the CityGuide domain. The system responds appropriately with the query, “Where are you?” Utt5 is then an example of a fragment in the context of a missing element, so “MIT” is entered into the history as a source.

Utt6 has a pronominal reference “there” to tag the destination, along with an elliptical source. The system knows that source and destination are obligatory predicates for “directions” clauses. It correctly picks up “the closest bank to MIT” as the destination, by retrieving it from the “focus” slot, and then finds “MIT” itself in the “source” slot, introduced during Utt5. Utt7 has a reference to “this bank,” which is easily resolved via an unambiguous match on semantic class. Utt8 is analogous to Utt2: both “Lagroceria” and the “Royal East” are restaurants, so the system infers that the former should substitute for the latter in the preceding clause.
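The two fragment cases in this example, Utt5 filling a missing slot and Utt8 replacing an item of the same semantic class, can be sketched as a single splice routine. The data shapes here are hypothetical (a fragment is a dictionary with `class` and `value`, and a clause maps role names to such items or to a `MISSING` marker), and the real system's search for a "home" is presumably richer than this two-step lookup.

```python
MISSING = "missing"   # marker for an obligatory role that could not be filled

def interpret_fragment(fragment, previous_clause):
    """Splice a one-item fragment into a copy of the preceding clause.

    Returns the updated clause, or None when no home is found for the fragment.
    """
    clause = dict(previous_clause)         # never mutate the history entry itself
    # 1. A slot marked as missing takes priority (the "AT MIT" case).
    for role, filler in clause.items():
        if filler == MISSING:
            clause[role] = fragment
            return clause
    # 2. Otherwise replace the first filler with the same semantic class
    #    (the "HOW ABOUT LAGROCERIA" case).
    for role, filler in clause.items():
        if filler["class"] == fragment["class"]:
            clause[role] = fragment
            return clause
    # 3. No home found: leave the decision to the caller.
    return None
```

For instance, given a preceding clause whose topic is the restaurant “Royal East,” a fragment of class RESTAURANT with value “Lagroceria” replaces the topic and leaves the other roles untouched.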

8. REFERENCES

1. D. Goddeau, E. Brill, J. Glass, C. Pao, M. Phillips, J. Polifroni, S. Seneff, and V. Zue, “GALAXY: A Human-language Interface to On-line Travel Information,” Proceedings, ICSLP-94, pp. 707-710, Yokohama, Japan, Sept. 1994.
2. M. Phillips and D. Goddeau, “Fast Match for Segment-based Large Vocabulary Continuous Speech Recognition,” Proceedings, ICSLP-94, pp. 1359-1362, Yokohama, Japan, Sept. 1994.
3. S. Seneff, “TINA: A Natural Language System for Spoken Language Applications,” Computational Linguistics, Vol. 18, No. 1, pp. 61-86, 1992.
4. J. Glass, D. Goddeau, L. Hetherington, M. McCandless, C. Pao, M. Phillips, J. Polifroni, S. Seneff, and V. Zue, “The MIT ATIS System: December 1994 Progress Report,” Proc. ARPA Spoken Language Technology Workshop, pp. 252-256, Austin, TX, January 1995.
5. V. Zue, S. Seneff, J. Glass, D. Goddeau, D. Goodine, C. Pao, M. Phillips, and J. Polifroni, “PEGASUS: A Spoken Dialogue Interface for On-Line Air Travel Planning,” Speech Communication, Vol. 15, pp. 331-340, 1994.


