<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta name="generator" content=
"HTML Tidy for Linux/x86 (vers 1 September 2005), see www.w3.org" />
<meta http-equiv="Content-Type" content= "text/html; charset=utf-8" />
<title>EMMA: Extensible MultiModal Annotation markup
language</title>
<style type="text/css">
/**/
</style>
<link rel="stylesheet" type="text/css" href="https://www.w3.org/StyleSheets/TR/W3C-REC.css" />
</head>
<body>
<div class="head">
<div class="banner"><img alt="W3C" src="https://www.w3.org/Icons/w3c_home" width="72" height=
"48" /></div>
<h1 class="notoc" id="s0">EMMA: Extensible MultiModal Annotation
markup language</h1>
<h2>W3C Recommendation
10 February 2009</h2>
<dl>
<dt>This version:</dt>
<dd>http://www.w3.org/TR/2009/REC-emma-20090210/</dd>
<dt>Latest version:</dt>
<dd>http://www.w3.org/TR/emma/</dd>
<dt>Previous version:</dt>
<dd>http://www.w3.org/TR/2008/PR-emma-20081215/</dd>
</dl>
<dl>
<dt>Editor:</dt>
<dd>Michael Johnston, AT&T</dd>
<dt>Authors:</dt>
<dd>Paolo Baggia, Loquendo</dd>
<dd>Daniel C. Burnett, Voxeo (formerly of Vocalocity and Nuance)</dd>
<dd>Jerry Carter, Nuance</dd>
<dd>Deborah A. Dahl, Invited Expert</dd>
<dd>Gerry McCobb, Openstream</dd>
<dd>Dave Raggett (until 2007, while at W3C/Volantis and W3C/Canon)</dd>
</dl>
<p>Please refer to the
errata
for this document, which may include some normative
corrections.</p>
<p>See also
translations.</p>
<p class="copyright">Copyright © 2009 W3C<sup>®</sup> (MIT, ERCIM, Keio), All Rights Reserved. W3C liability, trademark and document use rules apply.</p>
<hr title="Separator for header" /></div>
<h2 class="notoc" id="abstract">Abstract</h2>
<p>The W3C Multimodal Interaction Working Group aims to develop
specifications to enable access to the Web using multimodal
interaction. This document is part of a set of specifications for
multimodal systems, and provides details of an XML markup language
for containing and annotating the interpretation of user input.
Examples of interpretation of user input are a transcription into
words of a raw signal (for instance derived from speech, pen or
keystroke input); a set of attribute/value pairs describing their
meaning; or a set of attribute/value pairs describing a gesture.
The interpretation of the user's input is expected to be generated
by signal interpretation processes, such as speech and ink
recognition, semantic interpreters, and other types of processors
for use by components that act on the user's inputs such as
interaction managers.</p>
<h2 id="status">Status of this Document</h2>
<p>This section describes the status of this document at the
time of its publication. Other documents may supersede this
document. A list of current W3C publications and the latest
revision of this technical report can be found in the W3C technical reports index at
http://www.w3.org/TR/.</p>
<p>This is the
Recommendation
of "EMMA: Extensible MultiModal Annotation markup language".
It has been produced by the
Multimodal Interaction Working Group,
which is part of the
Multimodal Interaction Activity.
</p>
<p>Comments are welcome on www-multimodal@w3.org
(archive).
See W3C mailing list and archive
usage guidelines.</p>
<p>The design of EMMA has been widely reviewed
(see the
disposition of comments)
and satisfies the Working Group's technical requirements.
A list of implementations is included in the
EMMA Implementation Report.
The Working Group made a few editorial changes to the
15 December 2008 Proposed Recommendation.
Changes from the Proposed Recommendation can be found in
Appendix F.
</p>
<p>This document has been reviewed by W3C Members, by software
developers, and by other W3C groups and interested parties, and is
endorsed by the Director as a W3C Recommendation. It is a stable
document and may be used as reference material or cited from another
document. W3C's role in making the Recommendation is to draw
attention to the specification and to promote its widespread
deployment. This enhances the functionality and interoperability of
the Web.</p>
<p>This specification describes markup for representing
interpretations of user input (speech, keystrokes, pen input etc.)
together with annotations for confidence scores, timestamps, input
medium etc., and forms part of the proposals for the W3C Multimodal Interaction
Framework.</p>
<p>This document was produced by a group operating under the
5
February 2004 W3C Patent Policy. W3C maintains a public list of any
patent disclosures made in connection with the deliverables of
the group; that page also includes instructions for disclosing a
patent. An individual who has actual knowledge of a patent which
the individual believes contains
Essential Claim(s) must disclose the information in accordance
with
section 6 of the W3C Patent Policy.</p>
<p>The sections in the main body of this document are normative unless
otherwise specified. The appendices in this document are informative
unless otherwise indicated explicitly.</p>
<h2 class="notoc" id="conv">Conventions of this Document</h2>
<p>All sections in this specification are normative, unless
otherwise indicated. The informative parts of this specification
are identified by "Informative" labels within sections.</p>
<p>The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL
NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL"
in this document are to be interpreted as described in [RFC2119].</p>
<h2 class="notoc" id="toc">Table of Contents</h2>
- 1. Introduction
- 2. Structure of EMMA documents
- 3. EMMA structural elements
- 4. EMMA annotations
- 4.1 EMMA annotation elements
- 4.2 EMMA annotation attributes
- 4.2.1 Tokens of input:
emma:tokens attribute
- 4.2.2 Reference to processing:
emma:process attribute
- 4.2.3 Lack of input:
emma:no-input attribute
- 4.2.4 Uninterpreted input:
emma:uninterpreted attribute
- 4.2.5 Human language of input:
emma:lang attribute
- 4.2.6 Reference to signal:
emma:signal and
emma:signal-size attributes
- 4.2.7 Media type:
emma:media-type attribute
- 4.2.8 Confidence scores:
emma:confidence attribute
- 4.2.9 Input source:
emma:source
attribute
- 4.2.10 Timestamps
- 4.2.11 Medium, mode, and function of user
inputs:
emma:medium, emma:mode,
emma:function, emma:verbal
attributes
- 4.2.12 Composite multimodality:
emma:hook attribute
- 4.2.13 Cost:
emma:cost
attribute
- 4.2.14 Endpoint properties:
emma:endpoint-role,
emma:endpoint-address, emma:port-type,
emma:port-num, emma:message-id,
emma:service-name, emma:endpoint-pair-ref,
emma:endpoint-info-ref
attributes
- 4.2.15 Reference to
emma:grammar element: emma:grammar-ref
attribute
- 4.2.16 Reference to
emma:model
element: emma:model-ref attribute
- 4.2.17 Dialog turns:
emma:dialog-turn attribute
- 4.3 Scope of EMMA annotations
- 5. Conformance
- Appendices
<h2 id="s1">1. Introduction</h2>
<p>This section is Informative.</p>
<p>This document presents an XML specification for EMMA, an
Extensible MultiModal Annotation markup language, responding to the
requirements documented in Requirements for EMMA
[EMMA Requirements]. This
markup language is intended for use by systems that provide
semantic interpretations for a variety of inputs, including but not
necessarily limited to, speech, natural language text, GUI and ink
input.</p>
<p>It is expected that this markup will be used primarily as a
standard data interchange format between the components of a
multimodal system; in particular, it will normally be automatically
generated by interpretation components to represent the semantics
of users' inputs, not directly authored by developers.</p>
<p>The language is focused on annotating single inputs from users,
which may be either from a single mode or a composite input
combining information from multiple modes, as opposed to
information that might have been collected over multiple turns of a
dialog. The language provides a set of elements and attributes that
are focused on enabling annotations on user inputs and
interpretations of those inputs.</p>
<p>An EMMA document can be considered to hold three types of
data:</p>
- instance data: Application-specific markup corresponding to input
information which is meaningful to the consumer of an EMMA document.
Instances are application-specific and built by input processors at
runtime. Given that utterances may be ambiguous with respect to input
values, an EMMA document may hold more than one instance.
- data model: Constraints on structure and content of an instance.
The data model is typically pre-established by an application, and
may be implicit, that is, unspecified.
- metadata: Annotations associated with the data contained in the
instance. Annotation values are added by input processors at
runtime.
<p>Given the assumptions above about the nature of data represented
in an EMMA document, the following general principles apply to the
design of EMMA:</p>
- The main prescriptive content of the EMMA specification will
consist of metadata: EMMA will provide a means to express the
metadata annotations which require standardization. (Notice,
however, that such annotations may express the relationship among
all the types of data within an EMMA document.)
- The instance and its data model are assumed to be specified in
XML, but EMMA will remain agnostic to the XML format used to
express these. (The instance XML is assumed to be sufficiently
structured to enable the association of annotative data.)
- The extensibility of EMMA lies in the ability for additional
kinds of metadata to be included in application specific
vocabularies. EMMA itself can be extended with application and
vendor specific annotations contained within the
emma:info element (Section 4.1.4).
<p>The annotations of EMMA should be considered 'normative' in the
sense that if an EMMA component produces annotations as described
in
Section 3 and Section 4, these annotations must be represented using the EMMA
syntax. The Multimodal Interaction Working Group may address in
later drafts the issues of modularization and profiling; that is,
which sets of annotations are to be supported by which classes of
EMMA component.</p>
<h3 id="s1.1">1.1 Uses of EMMA</h3>
<p>The general purpose of EMMA is to represent information
automatically extracted from a user's input by an interpretation
component, where input is to be taken in the general sense of a
meaningful user input in any modality supported by the platform.
The reader should refer to the sample architecture in W3C
Multimodal Interaction Framework [MMI Framework], which shows EMMA
conveying content between user input modality components and an
interaction manager.</p>
<p>Components that generate EMMA markup include:</p>
- Speech recognizers
- Handwriting recognizers
- Natural language understanding engines
- Other input media interpreters (e.g. DTMF, pointing,
keyboard)
- Multimodal integration component
<p>Components that use EMMA include:</p>
- Interaction manager
- Multimodal integration component
<p>Although not a primary goal of EMMA, a platform may also choose
to use this general format as the basis of a general semantic
result that is carried along and filled out during each stage of
processing. In addition, future systems may also potentially make
use of this markup to convey abstract semantic content to be
rendered into natural language by a natural language generation
component.</p>
<h3 id="s1.2">1.2 Terminology</h3>
<dl>
<dt id="anchor-point">anchor point</dt>
<dd>When referencing an input interval with
emma:time-ref-uri,
emma:time-ref-anchor-point allows you to specify
whether the referenced anchor is the start or end of the
interval.</dd>
<dt id="annotation">annotation</dt>
<dd>Information about the interpreted input, for example,
timestamps, confidence scores, links to raw input, etc.</dd>
<dt id="composite-input">composite input</dt>
<dd>An input formed from several pieces, often in different modes,
for example, a combination of speech and pen gesture, such as
saying "zoom in here" and circling a region on a map.</dd>
<dt id="confidence">confidence</dt>
<dd>A numerical score describing the degree of certainty in a
particular interpretation of user input.</dd>
<dt id="data-model">data model</dt>
<dd>For EMMA, a data model defines a set of constraints on possible
interpretations of user input.</dd>
<dt id="derivation">derivation</dt>
<dd>Interpretations of user input are said to be derived from that
input, and higher level interpretations may be derived from lower
level ones. EMMA allows you to reference the user input or
interpretation a given interpretation was derived from, see
semantic
interpretation.</dd>
<dt id="dialog">dialog</dt>
<dd>For EMMA, dialog can be considered as a sequence of
interactions between
a user and the application.</dd>
<dt id="endpoint">endpoint</dt>
<dd>In EMMA, this refers to a network location which is the source
or recipient of an EMMA document. It should be noted that the usage
of the term "endpoint" in this context is different from the way
that the term is used in speech processing, where it refers to the
end of a speech input.</dd>
<dt id="gestures">gestures</dt>
<dd>In multimodal applications gestures are communicative acts made
by the user or application. An example is circling an area on a map
to indicate a region of interest. Users may be able to gesture with
a pen, keystrokes, hand movements, head
movements, or sound. Gestures often form part of
composite input. Application
gestures are typically animations and/or sound effects.</dd>
<dt id="grammar">grammar</dt>
<dd>A set of rules that describe a sequence of tokens expected in a
given input. These can be used by speech and handwriting
recognizers to increase recognition accuracy.</dd>
<dt id="handwriting-recognition">handwriting recognition</dt>
<dd>The process of converting pen strokes into text.</dd>
<dt id="ink-recognition">ink recognition</dt>
<dd>This includes the recognition of handwriting and pen
gestures.</dd>
<dt id="input-cost">input cost</dt>
<dd>In EMMA, this refers to a numerical measure indicating the
weight or processing cost associated with a user's input or part of
their input.</dd>
<dt id="input-device">input device</dt>
<dd>The device providing a particular input, for example, a
microphone, a pen, a mouse, a camera, or a keyboard.</dd>
<dt id="input-function">input function</dt>
<dd>In EMMA, this refers to the use a particular input
is serving, for example, as part of a recording or transcription,
as part of a dialog, or as a means to verify the user's
identity.</dd>
<dt id="input-medium">input medium</dt>
<dd>Whether the input is acoustic, visual, or tactile; for
instance, a spoken utterance is an acoustic input, a hand gesture as
seen by a camera is a visual input, and pointing with a mouse or pen
is a tactile input.</dd>
<dt id="input-mode">input mode</dt>
<dd>This distinguishes a particular means of providing an input
within a general input medium, for example, speech, DTMF, ink, key
strokes, video, photograph, etc.</dd>
<dt id="input-source">input source</dt>
<dd>This is the device that provided the input, for example a
particular microphone or camera. EMMA allows you to identify these
with a URI.</dd>
<dt id="input-tokens">input tokens</dt>
<dd>In EMMA, this refers to a sequence of characters, words or
other discrete units of input.</dd>
<dt id="instance-data">instance data</dt>
<dd>A representation in XML of an interpretation of user
input.</dd>
<dt id="interaction-manager">interaction manager</dt>
<dd>A processor that determines how an application interacts with a
user. This can be at multiple levels of abstraction, for example,
at a detailed level, determining what prompts to present to the
user and what actions to take in response to user input, versus a
higher level treatment in terms of goals and tasks for achieving
those goals. Interaction managers are frequently event driven.</dd>
<dt id="interpretation">interpretation</dt>
<dd>In EMMA, an interpretation of user input refers to information
derived from the user input that is meaningful to the
application.</dd>
<dt id="keystroke-input">keystroke input</dt>
<dd>Input provided by the user pressing on a sequence of keys
(buttons), such as a computer keyboard or keypad.</dd>
<dt id="lattice">lattice</dt>
<dd>A set of nodes interconnected with directed arcs such that by
following an arc, you can never find yourself back at a node you
have already visited (i.e. a directed acyclic graph). Lattices
provide a flexible means to represent the results of speech and
handwriting recognition, in terms of arcs representing words or
character sequences. Different arcs from the same node represent
different local hypotheses as to what the user said or wrote.</dd>
<dt id="metadata">metadata</dt>
<dd>Information describing another set of data, for instance, a
library catalog card with information on the author, title and
location of a book. EMMA is designed to support input processors in
providing metadata for interpretations of user input.</dd>
<dt id="multimodal-integration">multimodal integration</dt>
<dd>The process of combining inputs from different modes to create
an interpretation of composite input. This is also sometimes
referred to as
multimodal fusion.</dd>
<dt id="multimodal-interaction">multimodal interaction</dt>
<dd>The means for a user to interact with an application using more
than one mode of interaction, for instance, offering the user the
choice of speaking or typing, or in some cases, allowing the user
to provide a composite input involving multiple modes.</dd>
<dt id="natural-language-understanding">natural language
understanding</dt>
<dd>The process of interpreting text in terms that are useful for
an application.</dd>
<dt id="N-best-list">N-best list</dt>
<dd>An N-best list is a list of the most likely hypotheses for what
the user actually said or wrote, where N stands for an integral
number such as 5 for the 5 most likely hypotheses.</dd>
<dt id="raw-signal">raw signal</dt>
<dd>An uninterpreted input, such as an audio waveform captured from
a microphone.</dd>
<dt id="semantic-interpretation">semantic interpretation</dt>
<dd>A normalized representation of the meaning of a user input, for
instance, mapping the speech for "San Francisco" into the airport
code "SFO".</dd>
<dt id="semantic-processor">semantic processor</dt>
<dd>In EMMA, this refers to systems that can derive interpretations
of user input, for instance, mapping the speech for "San Francisco"
into the airport code "SFO".</dd>
<dt id="signal-interpretation">signal interpretation</dt>
<dd>The process of mapping a discrete or continuous signal into a
symbolic representation that can be used by an application, for
instance, transforming the audio waveform corresponding to someone
saying "2005" into the number 2005.</dd>
<dt id="speech-recognition">speech recognition</dt>
<dd>The process of determining the textual transcription of a piece
of speech.</dd>
<dt id="speech-synthesis">speech synthesis</dt>
<dd>The process of rendering a piece of text into the corresponding
speech, i.e. synthesizing speech from text.</dd>
<dt id="text-to-speech">text to speech</dt>
<dd>The process of rendering a piece of text into the corresponding
speech.</dd>
<dt id="time-stamp">time stamp</dt>
<dd>The time that a particular input or part of an input began or
ended.</dd>
<dt id="term-uri">URI: Uniform Resource Identifier</dt>
<dd>A URI is a unifying syntax for the expression of names and
addresses of objects on the network as used in the World Wide Web.
Within this specification, the term URI refers to a Uniform
Resource Identifier as defined in [RFC3986] and extended in
[RFC3987] with the new name IRI. The term URI has been retained in
preference to IRI to avoid introducing new names for concepts such
as "Base URI" that are defined or referenced across the whole family
of XML specifications. A URI is defined as any legal
anyURI primitive as defined in XML Schema Part 2:
Datatypes Second Edition Section 3.2.17 [SCHEMA2].</dd>
<dt id="user-input">user input</dt>
<dd>An input provided by a user as opposed to something generated
automatically.</dd>
</dl>
<h2 id="s2">2. Structure of EMMA documents</h2>
<p>This section is Informative.</p>
<p>As noted above, the main components of an interpreted user input
in EMMA are the instance data, an optional data model, and the
metadata annotations that may be applied to that input. The
realization of these components in EMMA is as follows:</p>
- instance data is contained within an EMMA
interpretation
- the data model is optionally specified as an annotation
of that instance
- EMMA annotations may be applied at different levels of
an EMMA document.
<p>An EMMA
interpretation is the primary unit for holding
user input as interpreted by an EMMA processor. As will be seen
below, multiple interpretations of a single input are possible.</p>
<p>EMMA provides a simple structural syntax for the organization of
interpretations and instances, and an annotative syntax to apply
the annotation to the input data at different levels.</p>
<p>An outline of the structural syntax and annotations found in
EMMA documents is as follows. A fuller definition may be found in
the description of individual elements and attributes in
Section 3 and
Section 4.</p>
- EMMA structural elements (Section 3)
- Root element: The root node of an
EMMA document, the
emma:emma element, holds EMMA
version and namespace information, and provides a container for one
or more of the following interpretation and container elements
(Section 3.1)
- Interpretation element: The
emma:interpretation element contains a given
interpretation of the input and holds application specific markup
(Section 3.2)
- Container elements:
emma:one-of is a container for one or more
interpretation elements or container elements and denotes that
these are mutually exclusive interpretations (Section 3.3.1)
emma:group is a general container for one or more
interpretation elements or container elements. It can be associated
with arbitrary grouping criteria (Section
3.3.2).
emma:sequence is a container for one or more
interpretation elements or container elements and denotes that
these are sequential in time (Section
3.3.3).
- Lattice element: The
emma:lattice element is used to contain a series of
emma:arc and emma:node elements that
define a lattice of words, gestures, meanings or other symbols. The
emma:lattice element appears within the
emma:interpretation element (Section
3.4)
- Literal element: The
emma:literal element is used as a wrapper when the
application semantics is a string literal. (Section
3.5)
- EMMA annotations (Section 4)
- EMMA annotation elements: These are
EMMA annotations such as
emma:derived-from,
emma:endpoint-info, and emma:info which
are represented as elements so that they can occur more than once
within an element and can contain internal structure. (Section 4.1)
- EMMA annotation attributes: These
are EMMA annotations such as
emma:start,
emma:end, emma:confidence, and
emma:tokens which are represented as attributes. They
can appear on emma:interpretation elements. Some can appear on
container elements, lattice elements, and elements in the
application-specific markup. (Section 4.2)
<p>From the defined root node
emma:emma the structure
of an EMMA document consists of a tree of EMMA container elements
(
emma:one-of,
emma:sequence,
emma:group) terminating in a number of interpretation
elements (
emma:interpretation). The
emma:interpretation elements serve as wrappers for
either application namespace markup describing the interpretation
of the user's input, an
emma:lattice element, or an
emma:literal element. A single
emma:interpretation may also appear directly under the
root node.</p>
<p>
The EMMA elements
emma:emma,
emma:interpretation,
emma:one-of,
and
emma:literal
and the EMMA attributes
emma:no-input,
emma:uninterpreted,
emma:medium,
and
emma:mode
are required of all
implementations. The remaining elements and attributes are optional
and may be used in some implementations and not others, depending on
the specific modalities and processing being represented.
</p>
<p>To illustrate this, here is an example <span class="new">of
an</span> EMMA document <span class="new">representing</span> input
to a flight reservation application. In this example there are two
speech recognition results and associated semantic representations
of the input. The system is uncertain whether the user meant
"flights from Boston to Denver" or "flights from Austin to Denver".
The annotations to be captured are timestamps and confidence scores
for the two inputs.</p>
<p>Example:</p>
<pre class="example">
<emma:emma version="1.0"
xmlns:emma="http://www.w3.org/2003/04/emma"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.w3.org/2003/04/emma
http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
xmlns="http://www.example.com/example">
<emma:one-of id="r1" emma:start="1087995961542" emma:end="1087995963542"
<span> emma:medium="acoustic" emma:mode="voice"</span>>
<emma:interpretation id="int1" emma:confidence="0.75"
emma:tokens="flights from boston to denver">
<origin>Boston</origin>
<destination>Denver</destination>
</emma:interpretation>
<emma:interpretation id="int2" emma:confidence="0.68"
emma:tokens="flights from austin to denver">
<origin>Austin</origin>
<destination>Denver</destination>
</emma:interpretation>
</emma:one-of>
</emma:emma>
</pre>
<p>Attributes on the root
emma:emma element indicate
the version and namespace. The
emma:emma element
contains an
emma:one-of element which contains a
disjunctive list of possible interpretations of the input. The
actual semantic representation of each interpretation is within the
application namespace. In the example here the application specific
semantics involves elements
origin and
destination indicating the origin and destination
cities for looking up a flight. The timestamp is the same for both
interpretations and it is annotated using values in milliseconds in
the
emma:start and
emma:end attributes on
the
emma:one-of. The confidence scores and tokens
associated with each of the inputs are annotated using the EMMA
annotation attributes
emma:confidence and
emma:tokens on each of the
emma:interpretation elements.</p>
<h3 id="s2.1">2.<span>1</span> Data model</h3>
<p>An EMMA data model expresses the constraints on the structure
and content of instance data, for the purposes of validation. As
such, the data model may be considered as a particular kind of
annotation (although, unlike other EMMA annotations, it is not a
feature pertaining to a specific user input at a specific moment in
time, it is rather a static and, by its very definition,
application-specific structure). The specification of a data model
in EMMA is optional.</p>
<p>Since Web applications today use different formats to specify
data models, e.g. XML Schema Part 1: Structures Second Edition
[XML Schema Structures], XForms 1.0 (Second Edition) [XFORMS],
RELAX NG Specification [RELAX-NG], etc., EMMA itself is agnostic to
the format of the data model used.</p>
<p>Data model definition and reference is defined in
Section 4.1.1.</p>
<h3 id="s2.2">2.<span>2</span> EMMA namespace prefixes</h3>
<p>An EMMA attribute is qualified with the EMMA namespace prefix if
the attribute can also be used as an in-line annotation on elements
in the application's namespace. Most of the EMMA annotation
attributes in
Section 4.2 are in this category.
An EMMA attribute is not qualified with the EMMA namespace prefix
if the attribute only appears on an EMMA element. This rule ensures
consistent usage of the attributes across all examples.</p>
<p>Attributes from other namespaces are permissible on all EMMA
elements. As an example,
xml:lang may be used to
annotate the human language of character data content.</p>
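<p>For illustration, here is a minimal sketch (reusing the example
application namespace from elsewhere in this document, with
illustrative values): emma:confidence carries the EMMA prefix
because it may also appear in-line on elements in the application's
namespace, id appears only on EMMA elements and is therefore
unprefixed, and xml:lang from the XML namespace annotates the human
language of the character data.</p>
<pre class="example">
<emma:emma version="1.0"
xmlns:emma="http://www.w3.org/2003/04/emma"
xmlns="http://www.example.com/example">
<emma:interpretation id="int1" xml:lang="en-US"
emma:medium="acoustic" emma:mode="voice">
<origin emma:confidence="0.9">Boston</origin>
</emma:interpretation>
</emma:emma>
</pre>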
<h2 id="s3">3. EMMA structural elements</h2>
<p>This section defines elements in the EMMA namespace which
provide the structural syntax of EMMA documents.</p>
<h3 id="s3.1">3.1 Root element:
emma:emma</h3>
<table class="defn" summary="property definition" width="98%"
cellpadding="5" cellspacing="0">
<tbody>
<tr>
<th>Annotation</th>
<th>emma:emma</th>
</tr>
<tr>
<th>Definition</th>
<td>The root element of an EMMA document.</td>
</tr>
<tr>
<th>Children</th>
<td>The
emma:emma element MUST immediately contain a
single
emma:interpretation element or EMMA container
element:
emma:one-of,
emma:group,
emma:sequence. It MAY also contain an optional single
emma:derivation element and an optional single
emma:info annotation element. It MAY also contain
multiple optional
emma:grammar annotation elements,
emma:model annotation elements, and
emma:endpoint-info annotation elements.</td>
</tr>
<tr>
<th>Attributes</th>
<td>
- Required:
  - version: the version of EMMA used for the
    interpretation(s). Interpretations expressed using this
    specification MUST use 1.0 for the value.
  - Namespace declaration for EMMA, see below.
- Optional:
  - any other namespace declarations for application specific
    namespaces.
</td>
</tr>
<tr>
<th>Applies to</th>
<td>None</td>
</tr>
</tbody>
</table>
<p>The root element of an EMMA document is named
emma:emma. It holds a single
emma:interpretation or EMMA container element
(
emma:one-of,
emma:sequence,
emma:group). It MAY also contain a single
emma:derivation element containing earlier stages of
the processing of the input (See
Section
4.1.2). It MAY also contain an optional single annotation
element:
emma:info and multiple optional
emma:grammar,
emma:model, and
emma:endpoint-info elements.</p>
<p>It MAY hold attributes for information pertaining to EMMA
itself, along with any namespaces which are declared for the entire
document, and any other EMMA annotative data. The
emma:emma element and other elements and attributes
defined in this specification belong to the XML namespace
identified by the URI "http://www.w3.org/2003/04/emma". In the
examples, the EMMA namespace is generally declared using the
attribute
xmlns:emma on the root
emma:emma element. EMMA processors MUST support the
full range of ways of declaring XML namespaces as defined by the
Namespaces in XML 1.1 (Second Edition)
[XMLNS]. Application markup MAY be declared in an
explicit application namespace, or an undefined namespace
(equivalent to setting xmlns="").</p>
<p>For example:</p>
<pre class="example">
<emma:emma version="1.0" xmlns:emma="http://www.w3.org/2003/04/emma">
....
</emma:emma>
</pre>
<p>or</p>
<pre class="example">
<emma version="1.0" xmlns="http://www.w3.org/2003/04/emma">
....
</emma>
</pre>
<h3 id="s3.2">3.2 Interpretation element:
emma:interpretation</h3>
<table class="defn" summary="property definition" width="98%"
cellpadding="5" cellspacing="0">
<tbody>
<tr>
<th>Annotation</th>
<th>emma:interpretation</th>
</tr>
<tr>
<th>Definition</th>
<td>The
emma:interpretation element acts as a wrapper
for application instance data or lattices.</td>
</tr>
<tr>
<th>Children</th>
<td>The
emma:interpretation element MUST immediately
contain either application instance data, or a single
emma:lattice element, or a single
emma:literal element, or in the case of uninterpreted
input or no input
emma:interpretation
MUST be empty. It MAY also contain multiple optional
emma:derived-from
elements and an optional single
emma:info element.</td>
</tr>
<tr>
<th>Attributes</th>
<td>
- Required: Attribute
id of type
xsd:ID that uniquely identifies the interpretation
within the EMMA document.
- Optional: The annotation attributes:
emma:tokens, emma:process,
emma:no-input, emma:uninterpreted,
emma:lang, emma:signal,
emma:signal-size,
emma:media-type, emma:confidence,
emma:source, emma:start,
emma:end, emma:time-ref-uri,
emma:time-ref-anchor-point,
emma:offset-to-start, emma:duration,
emma:medium, emma:mode,
emma:function, emma:verbal,
emma:cost, emma:grammar-ref,
emma:endpoint-info-ref, emma:model-ref,
emma:dialog-turn.
</td>
</tr>
<tr>
<th>Applies to</th>
<td>The
emma:interpretation element is legal only as a
child of
emma:emma,
emma:group,
emma:one-of,
emma:sequence, or
emma:derivation.</td>
</tr>
</tbody>
</table>
<p>The
emma:interpretation element holds a single
interpretation represented in application specific markup, or a
single
emma:lattice element, or a single
emma:literal element.</p>
<p>The
emma:interpretation element MUST be empty if it
is marked with
emma:no-input="true" (Section 4.2.3). The
emma:interpretation element MUST be empty
if it has been annotated with
emma:uninterpreted="true" (Section 4.2.4) or
emma:function="recording" (Section 4.2.11).</p>
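<p>For example, a no-input result is represented as an empty
emma:interpretation element (a minimal sketch; the medium and mode
values are illustrative):</p>
<pre class="example">
<emma:emma version="1.0" xmlns:emma="http://www.w3.org/2003/04/emma">
<emma:interpretation id="int1" emma:no-input="true"
emma:medium="acoustic" emma:mode="voice"/>
</emma:emma>
</pre>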
<p>Attributes:</p>
- id: a REQUIRED
xsd:ID value that uniquely
identifies the interpretation within the EMMA document.
<pre class="example">
<emma:emma version="1.0" xmlns:emma="http://www.w3.org/2003/04/emma"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.w3.org/2003/04/emma
http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
xmlns="http://www.example.com/example">
<emma:interpretation id="r1" emma:medium="acoustic" emma:mode="voice">
...
</emma:interpretation>
</emma:emma>
</pre>
<p>While
emma:medium and
emma:mode are
optional on
emma:interpretation, note that all EMMA
interpretations must be annotated for
emma:medium and
emma:mode: these attributes must appear
directly on
emma:interpretation, on an ancestor
emma:one-of node, or on an
earlier stage of the derivation listed in
emma:derivation.</p>
<h3 id="s3.3">3.3 Container elements</h3>
<h3 id="s3.3.1">3.3.1
emma:one-of element</h3>
<table class="defn" summary="property definition" width="98%"
cellpadding="5" cellspacing="0">
<tbody>
<tr>
<th>Annotation</th>
<th>emma:one-of</th>
</tr>
<tr>
<th>Definition</th>
<td>A container element indicating a disjunction among a collection
of mutually exclusive interpretations of the input.</td>
</tr>
<tr>
<th>Children</th>
<td>The
emma:one-of element MUST immediately contain a
collection of one or more
emma:interpretation elements
or container elements:
emma:one-of,
emma:group,
emma:sequence. It MAY also
contain multiple optional
emma:derived-from elements and an
optional single
emma:info element.</td>
</tr>
<tr>
<th>Attributes</th>
<td>
- Required:
- Attribute
id of type xsd:ID
- The attribute
disjunction-type MUST be present if
emma:one-of is embedded within
emma:one-of. The possible values of
disjunction-type are {recognition,
understanding, multi-device, and
multi-process}.
- Optional:
- On a single non-embedded
emma:one-of the attribute
disjunction-type is optional.
- The following annotation attributes are optional:
emma:tokens, emma:process,
emma:lang, emma:signal,
emma:signal-size,
emma:media-type, emma:confidence,
emma:source, emma:start,
emma:end, emma:time-ref-uri,
emma:time-ref-anchor-point,
emma:offset-to-start, emma:duration,
emma:medium, emma:mode,
emma:function, emma:verbal,
emma:cost, emma:grammar-ref,
emma:endpoint-info-ref, emma:model-ref,
emma:dialog-turn.
</td>
</tr>
<tr>
<th>Applies to</th>
<td>The
emma:one-of element MAY only appear as a child
of
emma:emma,
emma:one-of,
emma:group,
emma:sequence, or
emma:derivation.</td>
</tr>
</tbody>
</table>
<p>The
emma:one-of element acts as a container for a
collection of one or more interpretation
(
emma:interpretation) or container elements
(
emma:one-of,
emma:group,
emma:sequence), and denotes that these are mutually
exclusive interpretations.</p>
<p>An N-best list of choices in EMMA MUST be represented as a set
of
emma:interpretation elements contained within an
emma:one-of element. For instance, a series of
different recognition results in speech recognition might be
represented in this way.</p>
<pre class="example">
<emma:emma version="1.0"
xmlns:emma="http://www.w3.org/2003/04/emma"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.w3.org/2003/04/emma
http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
xmlns="http://www.example.com/example">
<emma:one-of id="r1" <span>emma:medium="acoustic" emma:mode="voice"</span>>
<emma:interpretation id="int1">
<origin>Boston</origin>
<destination>Denver</destination>
<date>03112003</date>
</emma:interpretation>
<emma:interpretation id="int2">
<origin>Austin</origin>
<destination>Denver</destination>
<date>03112003</date>
</emma:interpretation>
</emma:one-of>
</emma:emma>
</pre>
<p>The function of the
emma:one-of element is to
represent a disjunctive list of possible interpretations of a user
input. A disjunction of possible interpretations of an input can be
the result of different kinds of processing or ambiguity. One
source is multiple results from a recognition technology such as
speech or handwriting recognition. Multiple results can also occur
from parsing or understanding natural language. Another possible
source of ambiguity is from the application of multiple different
kinds of recognition or understanding components to the same input
signal. For example, a single ink input signal might be processed
by both handwriting recognition and gesture recognition. Another is
the use of more than one recording device for the same input
(multiple microphones).</p>
<p>In order to make explicit these different kinds of multiple
interpretations and allow for concise statement of the annotations
associated with each, the
emma:one-of element MAY
appear within another
emma:one-of element. If
emma:one-of elements are nested then they MUST
indicate the kind of disjunction using the attribute
disjunction-type. The values of
disjunction-type are
{recognition,
understanding, multi-device, and multi-process}. For the
most common use case, where there are multiple recognition results
and some of them have multiple interpretations, the top-level
emma:one-of is
disjunction-type="recognition" and the embedded
emma:one-of has the attribute
disjunction-type="understanding".</p>
<p>As an example, if in an interactive flight reservation
application recognition yielded 'Boston' or 'Austin', and each had a
semantic interpretation as either the assertion of a city name or
the specification of a flight query with the city as the
destination, this would be represented as follows in EMMA:</p>
<pre class="example">
<emma:emma version="1.0"
xmlns:emma="http://www.w3.org/2003/04/emma"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.w3.org/2003/04/emma
http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
xmlns="http://www.example.com/example">
<emma:one-of id="r1" disjunction-type="recognition"
emma:start="12457990" emma:end="12457995"
emma:medium="acoustic" emma:mode="voice">
<emma:one-of id="r1_1" disjunction-type="understanding"
emma:tokens="boston">
<emma:interpretation id="int1">
<assert><city>boston</city></assert>
</emma:interpretation>
<emma:interpretation id="int2">
<flight><dest><city>boston</city></dest></flight>
</emma:interpretation>
</emma:one-of>
<emma:one-of id="r1_2" disjunction-type="understanding"
emma:tokens="austin">
<emma:interpretation id="int3">
<assert><city>austin</city></assert>
</emma:interpretation>
<emma:interpretation id="int4">
<flight><dest><city>austin</city></dest></flight>
</emma:interpretation>
</emma:one-of>
</emma:one-of>
</emma:emma>
</pre>
<p>EMMA MAY explicitly represent ambiguity resulting from different
processes, devices, or sources using embedded
emma:one-of and the
disjunction-type
attribute. Multiple different interpretations resulting from
different factors MAY also be listed within a single unstructured
emma:one-of, though in this case it is more difficult or
impossible to recover the sources of the ambiguity if required by
later stages of processing. If there is no embedding in
emma:one-of, then the
disjunction-type
attribute is not required. If the
disjunction-type
attribute is missing then by default the source of disjunction is
unspecified.</p>
<p>The example case above could also be represented as:</p>
<pre class="example">
<emma:emma version="1.0"
xmlns:emma="http://www.w3.org/2003/04/emma"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.w3.org/2003/04/emma
http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
xmlns="http://www.example.com/example">
<emma:one-of id="r1" emma:start="12457990" emma:end="12457995"
emma:medium="acoustic" emma:mode="voice">
<emma:interpretation id="int1" emma:tokens="boston">
<assert><city>boston</city></assert>
</emma:interpretation>
<emma:interpretation id="int2" emma:tokens="boston">
<flight><dest><city>boston</city></dest></flight>
</emma:interpretation>
<emma:interpretation id="int3" emma:tokens="austin">
<assert><city>austin</city></assert>
</emma:interpretation>
<emma:interpretation id="int4" emma:tokens="austin">
<flight><dest><city>austin</city></dest></flight>
</emma:interpretation>
</emma:one-of>
</emma:emma>
</pre>
<p>But in this case information about which interpretations
resulted from speech recognition and which resulted from language
understanding is lost.</p>
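<p>As a sketch of the multi-process case mentioned above (the
application markup and the emma:process URIs are purely
illustrative), a single ink input processed by both handwriting
recognition and gesture recognition might be represented as:</p>
<pre class="example">
<emma:emma version="1.0"
xmlns:emma="http://www.w3.org/2003/04/emma"
xmlns="http://www.example.com/example">
<emma:one-of id="r1" disjunction-type="multi-process"
emma:medium="tactile" emma:mode="ink">
<emma:interpretation id="int1"
emma:process="http://example.com/handwriting-recognizer">
<text>Boston</text>
</emma:interpretation>
<emma:interpretation id="int2"
emma:process="http://example.com/gesture-recognizer">
<gesture>circle</gesture>
</emma:interpretation>
</emma:one-of>
</emma:emma>
</pre>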
<p>A list of
emma:interpretation elements within an
emma:one-of MUST be sorted best-first by some measure
of quality. The quality measure is
emma:confidence if
present; otherwise, the quality metric is platform-specific.</p>
<p>With embedded
emma:one-of structures there is no
requirement for the confidence scores within different
emma:one-of to be on the same scale. For example, the
scores assigned by handwriting recognition might not be comparable
to those assigned by gesture recognition. Similarly, if multiple
recognizers are used there is no guarantee that their confidence
scores will be comparable. For this reason the ordering requirement
on
emma:interpretation within
emma:one-of
only applies locally to sister
emma:interpretation
elements within each
emma:one-of. There is no
requirement on the ordering of embedded
emma:one-of
elements within a higher
emma:one-of element.</p>
<p>While
emma:medium and
emma:mode are
optional on
emma:one-of, note that all EMMA
interpretations must be annotated for
emma:medium and
emma:mode: these annotations must appear
directly on all of the contained
emma:interpretation
elements within the
emma:one-of, on the
emma:one-of element itself, on an ancestor
emma:one-of element, or
on an earlier stage of the derivation listed in
emma:derivation.</p>
<h3 id="s3.3.2">3.3.2
emma:group element</h3>
<table class="defn" summary="property definition" width="98%"
cellpadding="5" cellspacing="0">
<tbody>
<tr>
<th>Annotation</th>
<th>emma:group</th>
</tr>
<tr>
<th>Definition</th>
<td>A container element indicating that a number of interpretations
of distinct user inputs are grouped according to some
criteria.</td>
</tr>
<tr>
<th>Children</th>
<td>The
emma:group element MUST immediately contain a
collection of one or more
emma:interpretation elements
or container elements:
emma:one-of,
emma:group,
emma:sequence. It MAY also
contain an optional single
emma:group-info element. It MAY also contain
multiple optional
emma:derived-from
elements and an optional single
emma:info element.</td>
</tr>
<tr>
<th>Attributes</th>
<td>
- Required: Attribute
id of type
xsd:ID
- Optional: The annotation attributes:
emma:tokens, emma:process,
emma:lang, emma:signal,
emma:signal-size,
emma:media-type, emma:confidence,
emma:source, emma:start,
emma:end, emma:time-ref-uri,
emma:time-ref-anchor-point,
emma:offset-to-start, emma:duration,
emma:medium, emma:mode,
emma:function, emma:verbal,
emma:cost, emma:grammar-ref,
emma:endpoint-info-ref, emma:model-ref,
emma:dialog-turn.
</td>
</tr>
<tr>
<th>Applies to</th>
<td>The
emma:group element is legal only as a child of
emma:emma,
emma:one-of,
emma:group,
emma:sequence, or
emma:derivation.</td>
</tr>
</tbody>
</table>
<p>The
emma:group element is used to indicate that the
contained interpretations are from distinct user inputs that are
related in some manner.
emma:group MUST NOT be used
for containing the multiple stages of processing of a single user
input. Those MUST be contained in the
emma:derivation
element instead (Section 4.1.2).
For groups of inputs in temporal order the more specialized
container
emma:sequence MUST be used (Section 3.3.3). The following example shows
three interpretations derived from the speech input "Move this
ambulance here" and the tactile input related to two consecutive
points on a map.</p>
<pre class="example">
<emma:emma version="1.0"
xmlns:emma="http://www.w3.org/2003/04/emma"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.w3.org/2003/04/emma
http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
xmlns="http://www.example.com/example">
<emma:group id="grp"
emma:start="1087995961542"
emma:end="1087995964542">
<emma:interpretation id="int1"
<span>emma:medium="acoustic" emma:mode="voice"</span>>
<action>move</action>
<object>ambulance</object>
<destination>here</destination>
</emma:interpretation>
<emma:interpretation id="int2"
<span>emma:medium="tactile" emma:mode="ink"</span>>
<x>0.253</x>
<y>0.124</y>
</emma:interpretation>
<emma:interpretation id="int3"
<span>emma:medium="tactile" emma:mode="ink"</span>>
<x>0.866</x>
<y>0.724</y>
</emma:interpretation>
</emma:group>
</emma:emma>
</pre>
<p>The
emma:one-of and
emma:group
containers MAY be nested arbitrarily.</p>
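<p>For instance, the following sketch (reusing the flight
application markup from earlier examples, with illustrative values)
groups an ambiguous spoken input, represented as an
emma:one-of, with an unambiguous pen input:</p>
<pre class="example">
<emma:emma version="1.0"
xmlns:emma="http://www.w3.org/2003/04/emma"
xmlns="http://www.example.com/example">
<emma:group id="grp">
<emma:one-of id="nbest" emma:medium="acoustic" emma:mode="voice">
<emma:interpretation id="int1">
<destination>Boston</destination>
</emma:interpretation>
<emma:interpretation id="int2">
<destination>Austin</destination>
</emma:interpretation>
</emma:one-of>
<emma:interpretation id="int3" emma:medium="tactile" emma:mode="ink">
<x>0.253</x>
<y>0.124</y>
</emma:interpretation>
</emma:group>
</emma:emma>
</pre>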
<h4 id="s3.3.2.1">3.3.2.1 Indirect grouping criteria:
emma:group-info element</h4>
<table class="defn" summary="property definition" width="98%"
cellpadding="5" cellspacing="0">
<tbody>
<tr>
<th>Annotation</th>
<th>emma:group-info</th>
</tr>
<tr>
<th>Definition</th>
<td>The
emma:group-info element contains or references
criteria used in establishing the grouping of interpretations in an
emma:group element.</td>
</tr>
<tr>
<th>Children</th>
<td>The
emma:group-info element MUST either
immediately contain inline instance data specifying grouping
criteria or have the attribute
ref referencing the
criteria.</td>
</tr>
<tr>
<th>Attributes</th>
<td>
- Optional:
ref of type
xsd:anyURI referencing the grouping criteria;
alternatively the criteria MAY be provided inline as the content of
the emma:group-info element.
</td>
</tr>
<tr>
<th>Applies to</th>
<td>The
emma:group-info element is legal only as a
child of
emma:group.</td>
</tr>
</tbody>
</table>
<p>Sometimes it may be convenient to indirectly associate a given
group with information, such as grouping criteria. The
emma:group-info element might be used to make explicit
the criteria by which members of a group are associated. In the
following example, a group of two points is associated with a
description of grouping criteria based upon a sliding temporal
window of two seconds duration.</p>
<pre class="example">
<emma:emma version="1.0"
xmlns:emma="http://www.w3.org/2003/04/emma"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.w3.org/2003/04/emma
http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
xmlns="http://www.example.com/example"
xmlns:ex="http://www.example.com/ns/group">
<emma:group id="grp">
<emma:group-info>
<ex:mode>temporal</ex:mode>
<ex:duration>2s</ex:duration>
</emma:group-info>
<emma:interpretation id="int1"
<span> emma:medium="tactile" emma:mode="ink"</span>>
<x>0.253</x>
<y>0.124</y>
</emma:interpretation>
<emma:interpretation id="int2"
<span>emma:medium="tactile" emma:mode="ink"</span>>
<x>0.866</x>
<y>0.724</y>
</emma:interpretation>
</emma:group>
</emma:emma>
</pre>
<p>You might also use
emma:group-info to refer to a
named grouping criterion using external reference, for
instance:</p>
<pre class="example">
<emma:emma version="1.0"
xmlns:emma="http://www.w3.org/2003/04/emma"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.w3.org/2003/04/emma
http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
xmlns="http://www.example.com/example"
xmlns:ex="http://www.example.com/ns/group">
<emma:group id="grp">
<emma:group-info ref="http://www.example.com/criterion42"/>
<emma:interpretation id="int1"
<span>emma:medium="tactile" emma:mode="ink"</span>>
<x>0.253</x>
<y>0.124</y>
</emma:interpretation>
<emma:interpretation id="int2"
<span>emma:medium="tactile" emma:mode="ink"</span>>
<x>0.866</x>
<y>0.724</y>
</emma:interpretation>
</emma:group>
</emma:emma>
</pre>
<h3 id="s3.3.3">3.3.3
emma:sequence element</h3>
<table class="defn" summary="property definition" width="98%"
cellpadding="5" cellspacing="0">
<tbody>
<tr>
<th>Annotation</th>
<th>emma:sequence</th>
</tr>
<tr>
<th>Definition</th>
<td>A container element indicating that a number of interpretations
of distinct user inputs are in temporal sequence.</td>
</tr>
<tr>
<th>Children</th>
<td>The
emma:sequence element MUST immediately contain
a collection of one or more
emma:interpretation
elements or container elements:
emma:one-of,
emma:group,
emma:sequence. It MAY also
contain multiple optional
emma:derived-from elements and an
optional single
emma:info element.</td>
</tr>
<tr>
<th>Attributes</th>
<td>
- Required: Attribute
id of type
xsd:ID
- Optional: The annotation attributes:
emma:tokens, emma:process,
emma:lang, emma:signal,
emma:signal-size,
emma:media-type, emma:confidence,
emma:source, emma:start,
emma:end, emma:time-ref-uri,
emma:time-ref-anchor-point,
emma:offset-to-start, emma:duration,
emma:medium, emma:mode,
emma:function, emma:verbal,
emma:cost, emma:grammar-ref,
emma:endpoint-info-ref, emma:model-ref,
emma:dialog-turn.
</td>
</tr>
<tr>
<th>Applies to</th>
<td>The
emma:sequence element is legal only as a child
of
emma:emma,
emma:one-of,
emma:group,
emma:sequence, or
emma:derivation.</td>
</tr>
</tbody>
</table>
<p>The
emma:sequence element is used to indicate that
the contained interpretations are sequential in time, as in the
following example, which indicates that two points made with a pen
are in temporal order.</p>
<pre class="example">
<emma:emma version="1.0"
xmlns:emma="http://www.w3.org/2003/04/emma"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.w3.org/2003/04/emma
http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
xmlns="http://www.example.com/example">
<emma:sequence id="seq1">
<emma:interpretation id="int1"
<span>emma:medium="tactile"</span> emma:mode="ink">
<x>0.253</x>
<y>0.124</y>
</emma:interpretation>
<emma:interpretation id="int2"
<span>emma:medium="tactile"</span> emma:mode="ink">
<x>0.866</x>
<y>0.724</y>
</emma:interpretation>
</emma:sequence>
</emma:emma>
</pre>
<p>The
emma:sequence container MAY be combined with
emma:one-of and
emma:group in arbitrary
nesting structures. The order of children in the content of the
emma:sequence element corresponds to a sequence of
interpretations. This ordering does not imply any particular
definition of sequentiality. EMMA processors are expected therefore
to use the
emma:sequence element to hold
interpretations which are either strictly sequential in nature
(e.g. the end-time of an interpretation precedes the start-time of
its follower), or which overlap in some manner (e.g. the start-time
of a follower interpretation precedes the end-time of its
precedent). It is possible to use timestamps to provide fine
grained annotation for the sequence of interpretations that are
sequential in time (see Section 4.2.10).</p>
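<p>For example, in the following sketch (the timestamp values are
illustrative) each member of the sequence carries its own
emma:start and emma:end annotations, making the strict
temporal ordering explicit:</p>
<pre class="example">
<emma:emma version="1.0"
xmlns:emma="http://www.w3.org/2003/04/emma"
xmlns="http://www.example.com/example">
<emma:sequence id="seq1">
<emma:interpretation id="int1" emma:medium="tactile" emma:mode="ink"
emma:start="1087995961542" emma:end="1087995961800">
<x>0.253</x>
<y>0.124</y>
</emma:interpretation>
<emma:interpretation id="int2" emma:medium="tactile" emma:mode="ink"
emma:start="1087995962100" emma:end="1087995962400">
<x>0.866</x>
<y>0.724</y>
</emma:interpretation>
</emma:sequence>
</emma:emma>
</pre>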
<p>In the following more complex example, a sequence of two pen
gestures in
emma:sequence and a speech input in
emma:interpretation are contained in an
emma:group.</p>
<pre class="example">
<emma:emma version="1.0"
xmlns:emma="http://www.w3.org/2003/04/emma"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.w3.org/2003/04/emma
http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
xmlns="http://www.example.com/example">
<emma:group id="grp">
<emma:interpretation id="int1" emma:medium="acoustic"
emma:mode="voice">
<action>move</action>
<object>this-battleship</object>
<destination>here</destination>
</emma:interpretation>
<emma:sequence id="seq1">
<emma:interpretation id="int2" emma:medium="tactile"
emma:mode="ink">
<x>0.253</x>
<y>0.124</y>
</emma:interpretation>
<emma:interpretation id="int3" emma:medium="tactile"
emma:mode="ink">
<x>0.866</x>
<y>0.724</y>
</emma:interpretation>
</emma:sequence>
</emma:group>
</emma:emma>
</pre>
<h3 id="s3.4">3.4 Lattice element</h3>
<p>In addition to providing the ability to represent N-best lists
of interpretations using
emma:one-of, EMMA also
provides the capability to represent lattices of words or other
symbols using the
emma:lattice element. Lattices
provide a compact representation of large lists of possible
recognition results or interpretations for speech, pen, or
multimodal inputs.</p>
<p>In addition to providing a representation for lattice output
from speech recognition, another important use case for lattices is
for representation of the results of gesture and handwriting
recognition from a pen modality component. Lattices can also be
used to compactly represent multiple possible meaning
representations. Another use case for the lattice representation is
for associating confidence scores and other annotations with
individual words within a speech recognition result string.</p>
<p>Lattices are compactly described by a list of transitions
between nodes. For each transition the start and end nodes MUST be
defined, along with the label for the transition. Initial and final
nodes MUST also be indicated. The following figure provides a
graphical representation of a speech recognition lattice which
compactly represents eight different sequences of words.</p>
<p><img alt="speech lattice" src="lattice.png" /></p>
<p>which expands to:</p>
<pre>
a. flights to boston from portland today please
b. flights to austin from portland today please
c. flights to boston from oakland today please
d. flights to austin from oakland today please
e. flights to boston from portland tomorrow
f. flights to austin from portland tomorrow
g. flights to boston from oakland tomorrow
h. flights to austin from oakland tomorrow
</pre>
<h4 id="s3.4.1">3.4.1 Lattice markup:
emma:lattice,
emma:arc,
emma:node elements</h4>
<table class="defn" summary="property definition" width="98%"
cellpadding="5" cellspacing="0">
<tbody>
<tr>
<th>Annotation</th>
<th>emma:lattice</th>
</tr>
<tr>
<th>Definition</th>
<td>An element which encodes a lattice representation of user
input.</td>
</tr>
<tr>
<th>Children</th>
<td>The
emma:lattice element MUST immediately contain
one or more
emma:arc elements and zero or more
emma:node elements.</td>
</tr>
<tr>
<th>Attributes</th>
<td>
- Required:
  - initial of type
    xsd:nonNegativeInteger indicating the number of
    the initial node of the lattice.
  - final: a space-separated list of
    xsd:nonNegativeInteger values indicating the numbers of the
    final nodes in the lattice.
- Optional:
  emma:time-ref-uri,
  emma:time-ref-anchor-point.
</td>
</tr>
<tr>
<th>Applies to</th>
<td>The
emma:lattice element is legal only as a child
of the
emma:interpretation element.</td>
</tr>
<tr>
<th>Annotation</th>
<th>emma:arc</th>
</tr>
<tr>
<th>Definition</th>
<td>An element which encodes a transition between two nodes in a
lattice. The label associated with the arc in the lattice is
represented in the content of
emma:arc.</td>
</tr>
<tr>
<th>Children</th>
<td>The
emma:arc element MUST immediately contain
either character data or a single application namespace element or
be empty, in the case of epsilon transitions. It MAY contain an
emma:info element containing application or vendor
specific annotations.</td>
</tr>
<tr>
<th>Attributes</th>
<td>
- Required:
  - from of type
    xsd:nonNegativeInteger indicating the number of
    the starting node for the arc.
  - to of type
    xsd:nonNegativeInteger indicating the number of
    the ending node for the arc.
- Optional:
emma:start,
emma:end, emma:offset-to-start,
emma:duration, emma:confidence,
emma:cost, emma:lang,
emma:medium, emma:mode,
emma:source.
</td>
</tr>
<tr>
<th>Applies to</th>
<td>The
emma:arc element is legal only as a child of
the
emma:lattice element.</td>
</tr>
<tr>
<th>Annotation</th>
<th>emma:node</th>
</tr>
<tr>
<th>Definition</th>
<td>An element which represents a node in the lattice. The
emma:node elements are not required to describe a
lattice but might be added to provide a location for annotations on
nodes in a lattice. There MUST be at most one
emma:node specification for each numbered node in the
lattice.</td>
</tr>
<tr>
<th>Children</th>
<td>An OPTIONAL
emma:info element for application or
vendor specific annotations on the node.</td>
</tr>
<tr>
<th>Attributes</th>
<td>
- Required:
node-number <span>of type
xsd:nonNegativeInteger</span> indicating the
<span>node number</span> in the lattice.
- Optional:
emma:confidence,
emma:cost.
</td>
</tr>
<tr>
<th>Applies to</th>
<td>The
emma:node element is legal only as a child of
the
emma:lattice element.</td>
</tr>
</tbody>
</table>
<p>In EMMA, a lattice is represented using an element
emma:lattice, which has attributes
initial and
final for indicating the
initial and final nodes of the lattice. For the lattice
<span>below</span>, this will be:
<emma:lattice
initial="1" final="8"/>. The nodes are numbered with
integers. If there is more than one distinct final node in the
lattice, the nodes MUST be represented as a space-separated list in
the value of the
final attribute e.g.
<emma:lattice initial="1" final="9 10 23"/>.
There MUST only be one initial node in an EMMA lattice. Each
transition in the lattice is represented as an element
emma:arc with attributes
from and
to which indicate the nodes where the transition
starts and ends. The arc's label is represented as the content of
the
emma:arc element and MUST be well-formed
character or XML content. In the example here the contents are
words. Empty (epsilon) transitions in a lattice MUST be represented
in the
emma:lattice representation as <span>empty</span>
emma:arc elements, e.g.
<emma:arc from="1" to="8"/>.</p>
<p>The example speech lattice above would be represented in EMMA
markup as follows:</p>
<pre class="example">
<emma:emma version="1.0"
xmlns:emma="http://www.w3.org/2003/04/emma"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.w3.org/2003/04/emma
http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
xmlns="http://www.example.com/example">
<emma:interpretation id="interp1"
<span>emma:medium="acoustic" emma:mode="voice"</span>>
<emma:lattice initial="1" final="8">
<emma:arc from="1" to="2">flights</emma:arc>
<emma:arc from="2" to="3">to</emma:arc>
<emma:arc from="3" to="4">boston</emma:arc>
<emma:arc from="3" to="4">austin</emma:arc>
<emma:arc from="4" to="5">from</emma:arc>
<emma:arc from="5" to="6">portland</emma:arc>
<emma:arc from="5" to="6">oakland</emma:arc>
<emma:arc from="6" to="7">today</emma:arc>
<emma:arc from="7" to="8">please</emma:arc>
<emma:arc from="6" to="8">tomorrow</emma:arc>
</emma:lattice>
</emma:interpretation>
</emma:emma>
</pre>
<p>Alternatively, if we wish to represent the same information as
an N-best list using
emma:one-of, we would have the
more verbose representation:</p>
<pre class="example">
<emma:emma version="1.0"
xmlns:emma="http://www.w3.org/2003/04/emma"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.w3.org/2003/04/emma
http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
xmlns="http://www.example.com/example">
<emma:one-of id="nbest1" <span>emma:medium="acoustic" emma:mode="voice"</span>>
<emma:interpretation id="interp1">
<text>flights to boston from portland today please</text>
</emma:interpretation>
<emma:interpretation id="interp2">
<text>flights to boston from portland tomorrow</text>
</emma:interpretation>
<emma:interpretation id="interp3">
<text>flights to austin from portland today please</text>
</emma:interpretation>
<emma:interpretation id="interp4">
<text>flights to austin from portland tomorrow</text>
</emma:interpretation>
<emma:interpretation id="interp5">
<text>flights to boston from oakland today please</text>
</emma:interpretation>
<emma:interpretation id="interp6">
<text>flights to boston from oakland tomorrow</text>
</emma:interpretation>
<emma:interpretation id="interp7">
<text>flights to austin from oakland today please</text>
</emma:interpretation>
<emma:interpretation id="interp8">
<text>flights to austin from oakland tomorrow</text>
</emma:interpretation>
</emma:one-of>
</emma:emma>
</pre>
<p>The lattice representation avoids the need to enumerate all of
the possible word sequences. Also, as detailed below, the
emma:lattice representation enables placement of
annotations on individual words in the input.</p>
<p>For use cases involving the representation of gesture/ink
lattices and use cases involving lattices of semantic
interpretations, EMMA allows for application namespace elements to
appear within
emma:arc.</p>
<p>For example a sequence of two gestures, each of which is
recognized as either a line or a circle<span>,</span> might be
represented as follows:</p>
<pre class="example">
<emma:emma version="1.0"
xmlns:emma="http://www.w3.org/2003/04/emma"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.w3.org/2003/04/emma
http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
xmlns="http://www.example.com/example">
<emma:interpretation id="interp1"
<span>emma:medium="acoustic" emma:mode="voice"</span>>
<emma:lattice initial="1" final="3">
<emma:arc from="1" to="2">
<circle radius="100"/>
</emma:arc>
<emma:arc from="2" to="3">
<line length="628"/>
</emma:arc>
<emma:arc from="1" to="2">
<circle radius="200"/>
</emma:arc>
<emma:arc from="2" to="3">
<line length="1256"/>
</emma:arc>
</emma:lattice>
</emma:interpretation>
</emma:emma>
</pre>
<p>As an example of a lattice of semantic interpretations, in a
travel application where the source is either "Boston" or
"Austin"and the destination is either "Newark" or "New York", the
possibilities might be represented in a lattice as follows:</p>
<pre class="example">
<emma:emma version="1.0"
xmlns:emma="http://www.w3.org/2003/04/emma"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.w3.org/2003/04/emma
http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
xmlns="http://www.example.com/example">
<emma:interpretation id="interp1"
<span>emma:medium="acoustic" emma:mode="voice"</span>>
<emma:lattice initial="1" final="3">
<emma:arc from="1" to="2">
<source city="boston"/>
</emma:arc>
<emma:arc from="2" to="3">
<destination city="newark"/>
</emma:arc>
<emma:arc from="1" to="2">
<source city="austin"/>
</emma:arc>
<emma:arc from="2" to="3">
<destination city="new york"/>
</emma:arc>
</emma:lattice>
</emma:interpretation>
</emma:emma>
</pre>
<p>The
emma:arc element MAY contain either an
application namespace element or character data. It MUST NOT
contain combinations of application namespace elements and
character data. However, an
emma:info element MAY
appear within an
emma:arc element alongside character
data, in order to allow for the association of vendor or
application specific annotations on a single word or symbol in a
lattice.</p>
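<p>For instance, the following non-normative sketch (the
acousticScore element and its value are hypothetical application
namespace annotations) shows an
emma:info element associating a vendor specific score
with a single word in a lattice:</p>
<pre class="example">
<emma:emma version="1.0"
    xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.w3.org/2003/04/emma
     http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
    xmlns="http://www.example.com/example">
  <emma:interpretation id="interp1"
      emma:medium="acoustic" emma:mode="voice">
    <emma:lattice initial="1" final="2">
      <emma:arc from="1" to="2">
        boston
        <emma:info>
          <acousticScore>-1234.5</acousticScore>
        </emma:info>
      </emma:arc>
    </emma:lattice>
  </emma:interpretation>
</emma:emma>
</pre>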
<p>So, in summary, there are four groupings of content that can
appear within
emma:arc:</p>
- Character data, e.g. a recognized word in a speech lattice.
- Character data and a single
emma:info element
providing vendor or application specific annotations that apply to
the character data.
- An application namespace element, e.g. the gesture and
<span>semantic interpretation</span> lattice examples above.
- An application namespace element and a single
emma:info element providing vendor or application
specific annotations that apply to the application namespace
element.
<h4 id="s3.4.2">3.4.2 Annotations on lattices</h4>
<p>The encoding of lattice arcs as XML elements
(
emma:arc) enables arcs to be annotated with metadata
such as timestamps, costs, or confidence scores:</p>
<pre class="example">
<emma:emma version="1.0"
xmlns:emma="http://www.w3.org/2003/04/emma"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.w3.org/2003/04/emma
http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
xmlns="http://www.example.com/example">
<emma:interpretation id="interp1"
<span>emma:medium="acoustic" emma:mode="voice"</span>>
<emma:lattice initial="1" final="8">
<emma:arc
from="1"
to="2"
emma:start="1087995961542"
emma:end="1087995962042"
emma:cost="30">
flights
</emma:arc>
<emma:arc
from="2"
to="3"
emma:start="1087995962042"
emma:end="1087995962542"
emma:cost="20">
to
</emma:arc>
<emma:arc
from="3"
to="4"
emma:start="1087995962542"
emma:end="1087995963042"
emma:cost="50">
boston
</emma:arc>
<emma:arc
from="3"
to="4"
emma:start="1087995963042"
emma:end="1087995963742"
emma:cost="60">
austin
</emma:arc>
...
</emma:lattice>
</emma:interpretation>
</emma:emma>
</pre>
<p>The following EMMA attributes MAY be placed on
emma:arc elements: absolute timestamps
(
emma:start,
emma:end), relative
timestamps (
emma:offset-to-start,
emma:duration),
emma:confidence,
emma:cost, the human language of the input
(
emma:lang),
emma:medium,
emma:mode, and
emma:source. The use case
for
emma:medium,
emma:mode, and
emma:source is for lattices which contain content
from different input modes. The
emma:arc element MAY
also contain an
emma:info element for specification of
vendor and application specific annotations on the arc.</p>
<p>The timestamps that appear on
emma:arc elements do
not necessarily indicate the start and end of the arc itself. They
MAY indicate the start and end of the signal corresponding to the
label on the arc. As a result there is no requirement that the
emma:end timestamp on an arc going into a node should
be equivalent to the
emma:start of all arcs going out
of that node. Furthermore there is no guarantee that the left to
right order of arcs in a lattice will correspond to the temporal
order of the input signal. The lattice representation is an
abstraction that represents a range of possible interpretations of
a user's input and is not intended to necessarily be a
representation of temporal order.</p>
<p>Costs are typically application and device dependent. There are
a variety of ways that individual arc costs might be combined to
produce costs for specific paths through the lattice. This
specification does not standardize the way for these costs to be
combined; it is up to the applications and devices to determine how
such derived costs would be computed and used.</p>
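<p>For instance, under one possible (non-normative) policy in which
arc costs are summed along a path, the word sequence "flights to
boston" in the example above would receive a derived cost of 30 +
20 + 50 = 100.</p>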
<p>For some lattice formats, it is also desirable to annotate the
nodes in the lattice themselves with information such as costs. For
example in speech recognition, costs might be placed on nodes as a
result of word penalties or redistribution of costs. For this
purpose EMMA also provides an
emma:node element which
can host annotations such as
emma:cost. The
emma:node element MUST have an attribute
node-number which indicates the number of the node.
There MUST be at most one
emma:node specification for
a given numbered node in the lattice. In our example, if there were
a cost of
100 on the final node, this could be represented
as follows:</p>
<pre class="example">
<emma:emma version="1.0"
xmlns:emma="http://www.w3.org/2003/04/emma"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.w3.org/2003/04/emma
http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
xmlns="http://www.example.com/example">
<emma:interpretation id="interp1"
<span>emma:medium="acoustic" emma:mode="voice"</span>>
<emma:lattice initial="1" final="8">
<emma:arc
from="1"
to="2"
emma:start="1087995961542"
emma:end="1087995962042"
emma:cost="30">
flights
</emma:arc>
<emma:arc
from="2"
to="3"
emma:start="1087995962042"
emma:end="1087995962542"
emma:cost="20">
to
</emma:arc>
<emma:arc
from="3"
to="4"
emma:start="1087995962542"
emma:end="1087995963042"
emma:cost="50">
boston
</emma:arc>
<emma:arc
from="3"
to="4"
emma:start="1087995963042"
emma:end="1087995963742"
emma:cost="60">
austin
</emma:arc>
...
<emma:node node-number="8" emma:cost="100"/>
</emma:lattice>
</emma:interpretation>
</emma:emma>
</pre>
<h4 id="s3.4.3">3.4.3 Relative timestamps on lattices</h4>
<p>The relative timestamp mechanism in EMMA is intended to provide
temporal information about arcs in a lattice in relative terms
using offsets in milliseconds. In order to do this, the absolute
time MAY be specified on
emma:interpretation; both
emma:time-ref-uri and
emma:time-ref-anchor-point apply to
emma:lattice and MAY be used there to set the anchor
point for offsets to the start of the absolute time specified on
emma:interpretation. The offset in milliseconds to the
beginning of each arc MAY then be indicated on each
emma:arc in the
emma:offset-to-start
attribute.</p>
<pre class="example">
<emma:emma version="1.0"
xmlns:emma="http://www.w3.org/2003/04/emma"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.w3.org/2003/04/emma
http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
xmlns="http://www.example.com/example">
<emma:interpretation id="interp1"
emma:start="1087995961542" emma:end="1087995963042"
<span>emma:medium="acoustic" emma:mode="voice"</span>>
<emma:lattice emma:time-ref-uri="#interp1"
emma:time-ref-anchor-point="start"
initial="1" final="4">
<emma:arc
from="1"
to="2"
emma:offset-to-start="0">
flights
</emma:arc>
<emma:arc
from="2"
to="3"
emma:offset-to-start="500">
to
</emma:arc>
<emma:arc
from="3"
to="4"
emma:offset-to-start="1000">
boston
</emma:arc>
</emma:lattice>
</emma:interpretation>
</emma:emma>
</pre>
<p>Note that the offset for the first
emma:arc MUST
always be zero since the EMMA attribute
emma:offset-to-start indicates the number of
milliseconds from the anchor point to the
start of the piece
of input associated with the
emma:arc, in this case
the word "flights".</p>
<h3 id="s3.5">3.5 Literal semantics:
emma:literal
element</h3>
<table class="defn" summary="property definition" width="98%"
cellpadding="5" cellspacing="0">
<tbody>
<tr>
<th>Annotation</th>
<th>emma:literal</th>
</tr>
<tr>
<th>Definition</th>
<td>An element that contains string literal output.</td>
</tr>
<tr>
<th>Children</th>
<td>String literal</td>
</tr>
<tr>
<th>Attributes</th>
<td>None.</td>
</tr>
<tr>
<th>Applies to</th>
<td>The
emma:literal is a child of
emma:interpretation.</td>
</tr>
</tbody>
</table>
<p>Certain EMMA processing components produce semantic results in
the form of string literals without any surrounding application
namespace markup. These MUST be placed within the EMMA element
emma:literal within
emma:interpretation.
For example, if a semantic interpreter simply returned "boston"
this could be represented in EMMA as:</p>
<pre class="example">
<emma:emma version="1.0"
xmlns:emma="http://www.w3.org/2003/04/emma"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.w3.org/2003/04/emma
http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
xmlns="http://www.example.com/example">
<emma:interpretation <span>id="r1" <br />
emma:medium="acoustic" emma:mode="voice"</span>>
<emma:literal>boston</emma:literal>
</emma:interpretation>
</emma:emma>
</pre>
<p>Note that a raw recognition result of a sequence of words from
speech recognition is also a kind of string literal and can be
contained within
emma:literal. For example,
recognition of the string "flights to san francisco" can be
represented in EMMA as follows:</p>
<pre class="example">
<emma:emma version="1.0"
xmlns:emma="http://www.w3.org/2003/04/emma"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.w3.org/2003/04/emma
http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
xmlns="http://www.example.com/example">
<emma:interpretation <span>id="r1" <br />
emma:medium="acoustic" emma:mode="voice"</span>>
<emma:literal>flights to san francisco</emma:literal>
</emma:interpretation>
</emma:emma>
</pre>
<h2 id="s4">4. EMMA annotations</h2>
<p>This section defines annotations in the EMMA namespace including
both attributes and elements. The values are specified in terms of
the data types defined by XML Schema Part 2: Datatypes <span>Second
Edition</span> [
<span>XML Schema
Datatypes</span>].</p>
<h3 id="s4.1">4.1 EMMA annotation elements</h3>
<h4 id="s4.1.1">4.1.1 Data model:
emma:model
element</h4>
<table class="defn" summary="property definition" width="98%"
cellpadding="5" cellspacing="0">
<tbody>
<tr>
<th>Annotation</th>
<th>emma:model</th>
</tr>
<tr>
<th>Definition</th>
<td>The
emma:model either references or provides
inline the data model for the instance data.</td>
</tr>
<tr>
<th>Children</th>
<td>If a
ref attribute is not specified then this
element contains the data model inline.</td>
</tr>
<tr>
<th>Attributes</th>
<td>
- Required: None.
- Optional:
ref of type xsd:anyURI that
references the data model. Note that either a ref
attribute or an in-line data model (but not both) MUST be
specified.
</td>
</tr>
<tr>
<th>Applies to</th>
<td>The
emma:model element MAY appear only as a child
of
emma:emma.</td>
</tr>
</tbody>
</table>
<p>The data model that may be used to express constraints on the
structure and content of instance data is specified as one of the
annotations of the instance. Specifying the data model is OPTIONAL,
in which case the data model can be said to be implicit. Typically
the data model is pre-established by the application.</p>
<p>The data model is specified with the
emma:model
annotation defined as an element in the EMMA namespace. If the data
model for the contents of an
emma:interpretation,
container element, or application namespace element is to be
specified in EMMA, the attribute
emma:model-ref MUST
be specified on the
emma:interpretation, container
element, or application namespace element. Note that since multiple
emma:model elements might be specified under the
emma:emma it is possible to refer to multiple data
models within a single EMMA document. For example, different
alternative interpretations under an
emma:one-of might
have different data models. In this case, an
emma:model-ref attribute would appear on each
emma:interpretation element in the N-best list with
its value being the
id of the
emma:model
element for that particular interpretation.</p>
<p>The data model is closely related to the interpretation data,
and is typically specified as the annotation related to the
emma:interpretation or
emma:one-of
elements.</p>
<pre class="example">
<emma:emma version="1.0"
xmlns:emma="http://www.w3.org/2003/04/emma"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.w3.org/2003/04/emma
http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
xmlns="http://www.example.com/example">
<emma:model id="model1" ref="http://example.com/models/city.xml"/>
<emma:interpretation id="int1" emma:model-ref="model1"
<span>emma:medium="acoustic" emma:mode="voice"</span>>
<city> London </city>
<country> UK </country>
</emma:interpretation>
</emma:emma>
</pre>
<p>The
emma:model annotation MAY reference any element
or attribute in the application instance data, as well as any EMMA
container element (
emma:one-of,
emma:group, or
emma:sequence).</p>
<p>The data model annotation MAY be used to either reference an
external data model with the
ref attribute or provide
a data model as in-line content. Either a
ref
attribute or in-line data model (but not both) MUST be
specified.</p>
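<p>As a non-normative sketch (the model URI and the in-line schema
content are hypothetical, and EMMA does not mandate a particular
data model format), the following document combines a referenced
model and an in-line model, with each interpretation in an N-best
list pointing to its own model via
emma:model-ref:</p>
<pre class="example">
<emma:emma version="1.0"
    xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.w3.org/2003/04/emma
     http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
    xmlns="http://www.example.com/example">
  <emma:model id="cityModel" ref="http://example.com/models/city.xml"/>
  <emma:model id="commandModel">
    <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
      <xsd:element name="command" type="xsd:string"/>
    </xsd:schema>
  </emma:model>
  <emma:one-of id="r1" emma:medium="acoustic" emma:mode="voice">
    <emma:interpretation id="int1" emma:model-ref="cityModel">
      <city>London</city>
    </emma:interpretation>
    <emma:interpretation id="int2" emma:model-ref="commandModel">
      <command>help</command>
    </emma:interpretation>
  </emma:one-of>
</emma:emma>
</pre>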
<h4 id="s4.1.2">4.1.2 Interpretation derivation:
emma:derived-from element and
emma:derivation element</h4>
<table class="defn" summary="property definition" width="98%"
cellpadding="5" cellspacing="0">
<tbody>
<tr>
<th>Annotation</th>
<th>emma:derived-from</th>
</tr>
<tr>
<th>Definition</th>
<td>An empty element which provides a reference to the
interpretation which the element it appears on was derived
from.</td>
</tr>
<tr>
<th>Children</th>
<td>None</td>
</tr>
<tr>
<th>Attributes</th>
<td>
- Required:
resource of type xsd:anyURI that
references the interpretation from which the current interpretation
is derived.
- Optional:
composite of type xsd:boolean that is
"true" if the derivation step combines multiple inputs
and "false" if not. If composite is not
specified the value is "false" by default.
</td>
</tr>
<tr>
<th>Applies to</th>
<td>The
emma:derived-from element is legal only as a
child of
emma:interpretation,
emma:one-of,
emma:group, or
emma:sequence.</td>
</tr>
<tr>
<th>Annotation</th>
<th>emma:derivation</th>
</tr>
<tr>
<th>Definition</th>
<td>An element which contains interpretation and container elements
representing earlier stages in the processing of the input.</td>
</tr>
<tr>
<th>Children</th>
<td>One or more
emma:interpretation,
emma:one-of,
emma:sequence, or
emma:group elements.</td>
</tr>
<tr>
<th>Attributes</th>
<td>None</td>
</tr>
<tr>
<th>Applies to</th>
<td>The
emma:derivation MAY appear only as a child of
the
emma:emma element.</td>
</tr>
</tbody>
</table>
<p>Instances of interpretations are in general derived from other
instances of interpretation in a process that goes from raw data to
increasingly refined representations of the input. The derivation
annotation is used to link any two interpretations that are related
by representing the source and the outcome of an interpretation
process. For instance, a speech recognition process can return the
following result in the form of raw text:</p>
<pre class="example">
<emma:emma version="1.0"
xmlns:emma="http://www.w3.org/2003/04/emma"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.w3.org/2003/04/emma
http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
xmlns="http://www.example.com/example">
<emma:interpretation id="raw"<br />
<span>emma:medium="acoustic" emma:mode="voice"</span>>
<answer>From Boston to Denver tomorrow</answer>
</emma:interpretation>
</emma:emma>
</pre>
<p>A first interpretation process will produce:</p>
<pre class="example">
<emma:emma version="1.0"
xmlns:emma="http://www.w3.org/2003/04/emma"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.w3.org/2003/04/emma
http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
xmlns="http://www.example.com/example">
<emma:interpretation id="better"<br />
<span>emma:medium="acoustic" emma:mode="voice"</span>>
<origin>Boston</origin>
<destination>Denver</destination>
<date>tomorrow</date>
</emma:interpretation>
</emma:emma>
</pre>
<p>A second interpretation process, aware of the current date, will
be able to produce a more refined instance, such as:</p>
<pre class="example">
<emma:emma version="1.0"
xmlns:emma="http://www.w3.org/2003/04/emma"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.w3.org/2003/04/emma
http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
xmlns="http://www.example.com/example">
<emma:interpretation id="best"
<span>emma:medium="acoustic" emma:mode="voice"</span>>
<origin>Boston</origin>
<destination>Denver</destination>
<date>20030315</date>
</emma:interpretation>
</emma:emma>
</pre>
<p>The interaction manager might need to have access to the three
levels of interpretation. The
emma:derived-from
annotation element can be used to establish a chain of derivation
relationships as in the following example:</p>
<pre class="example">
<emma:emma version="1.0"
xmlns:emma="http://www.w3.org/2003/04/emma"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.w3.org/2003/04/emma
http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
xmlns="http://www.example.com/example">
<emma:derivation>
<emma:interpretation id="raw"<br />
<span> emma:medium="acoustic" emma:mode="voice"</span>>
<answer>From Boston to Denver tomorrow</answer>
</emma:interpretation>
<emma:interpretation id="better">
<emma:derived-from resource="#raw" composite="false"/>
<origin>Boston</origin>
<destination>Denver</destination>
<date>tomorrow</date>
</emma:interpretation>
</emma:derivation>
<emma:interpretation id="best">
<emma:derived-from resource="#better" composite="false"/>
<origin>Boston</origin>
<destination>Denver</destination>
<date>20030315</date>
</emma:interpretation>
</emma:emma>
</pre>
<p>The
emma:derivation element MAY be used as a
container for representations of the earlier stages in the
interpretation of the input. The latest stage of processing MUST be
a direct child of
emma:emma.</p>
<p>The resource attribute on
emma:derived-from is a
URI which can reference IDs in the current or other EMMA
documents.</p>
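<p>For example (a sketch; the document URI is hypothetical), an
interpretation derived from an interpretation held in a separate
EMMA document could reference it in the following fragment with an
absolute URI and fragment identifier:</p>
<pre class="example">
<emma:interpretation id="refined"
    emma:medium="acoustic" emma:mode="voice">
  <emma:derived-from
      resource="http://example.com/emma/earlier-stage.xml#raw"
      composite="false"/>
  <origin>Boston</origin>
</emma:interpretation>
</pre>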
<p>In addition to representing sequential derivations, the EMMA
emma:derived-from element can also be used to capture
composite derivations. Composite derivations involve combination of
inputs from different modes.</p>
<p>In order to indicate whether an
emma:derived-from
element describes a sequential derivation step or a composite
derivation step, the
emma:derived-from element has an
attribute
composite which has a boolean value. A
composite
emma:derived-from MUST be marked as
composite="true" while a sequential
emma:derived-from element is marked as
composite="false". If this attribute is not specified
the value is
false by default.</p>
<p>In the following composite derivation example the user said
"destination" using the voice mode and circled Boston on a map
using the ink mode:</p>
<div>
<pre class="example">
<emma:emma version="1.0"
xmlns:emma="http://www.w3.org/2003/04/emma"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.w3.org/2003/04/emma
http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
xmlns="http://www.example.com/example">
<emma:derivation>
<emma:interpretation id="voice1"
emma:start="1087995961500"
emma:end="1087995962542"
emma:process="http://example.com/myasr.xml"
emma:source="http://example.com/microphone/NC-61"
emma:signal="http://example.com/signals/sg23.wav"
emma:confidence="0.6"
emma:medium="acoustic"
emma:mode="voice"
emma:function="dialog"
emma:verbal="true"
emma:lang="en-US"
emma:tokens="destination">
<rawinput>destination</rawinput>
</emma:interpretation>
<emma:interpretation id="ink1"
emma:start="1087995961600"
emma:end="1087995964000"
emma:process="http://example.com/mygesturereco.xml"
emma:source="http://example.com/pen/wacom123"
emma:signal="http://example.com/signals/ink5.inkml"
emma:confidence="0.5"
emma:medium="tactile"
emma:mode="ink"
emma:function="dialog"
emma:verbal="false">
<rawinput>Boston</rawinput>
</emma:interpretation>
</emma:derivation>
<emma:interpretation id="multimodal1"
emma:confidence="0.3"
<span>emma:start="1087995961500"</span>
<span>emma:end="1087995964000"</span>
emma:medium="<span>acoustic tactile</span>"
emma:mode="<span>voice ink</span>"
emma:function="dialog"
emma:verbal="true"
emma:lang="en-US"
emma:tokens="destination">
<emma:derived-from resource="#voice1" composite="true"
<emma:derived-from resource="#ink1" composite="true"
<destination>Boston</destination>
</emma:interpretation>
</emma:emma>
</pre></div>
<p>In this example, the annotations on the multimodal
interpretation combine information from the two inputs, and there
are two
emma:derived-from elements, one pointing to the speech
and one pointing to the pen gesture.</p>
<p>The only constraints the EMMA specification places on the
annotations that appear on a composite input are that the
emma:medium attribute MUST contain the union of the
emma:medium attributes on the combining inputs,
represented as a space delimited set of
nmtokens as
defined in
Section 4.2.11, and that the
emma:mode attribute MUST contain the union of the
emma:mode attributes on the combining inputs,
represented as a space delimited set of <span>
nmtokens
as defined in
Section 4.2.11</span>. In the
example above this means that the
emma:medium value
is
"acoustic tactile" and the
emma:mode
attribute is
"voice ink". How all other annotations
are handled is author defined. In the following paragraph,
informative examples on how specific annotations might be handled
are given.</p>
<p>With reference to the illustrative example above, this paragraph
provides informative guidance regarding the determination of
annotations (beyond
emma:medium and
emma:mode) on a composite multimodal interpretation.
Generally the timestamp on a combined input should contain the
intervals indicated by the combining inputs. For the absolute
timestamps
emma:start and
emma:end this
can be achieved by taking the earlier of the
emma:start values
(
emma:start="1087995961500" in our example) and the
later of the
emma:end values
(
emma:end="1087995964000" in the example). The
determination of relative timestamps for composite inputs is more complex;
informative guidance is given in
Section
4.2.10.4. Generally speaking the
emma:confidence
value will be some numerical combination of the confidence scores
assigned to the combining inputs. In our example, it is the result
of multiplying the voice and ink confidence scores
(
0.3). In other cases there may not be a confidence
score for one of the combining inputs and the author may choose to
copy the confidence score from the input which does have one.
Generally, for
emma:verbal, if either of the inputs
has the value
true then the multimodal interpretation
will also be
emma:verbal="true" as in the example. In
other words the annotation for the composite input is the result of
an inclusive OR of the boolean values of the annotations on the
inputs. If an annotation is only specified on one of the combining
inputs then it may in some cases be assumed to apply to the
multimodal interpretation of the composite input. In the example,
emma:lang="en-US" is only specified for the speech
input, and this annotation appears on the composite result also.
Similarly in our example, only the voice has
emma:tokens and the author has chosen to annotate the
combined input with the same
emma:tokens value. In
this example, the
emma:function is the same on both
combining inputs and the author has chosen to use the same
annotation on the composite interpretation.</p>
<p>In annotating derivations of the processing of the input, EMMA
provides the flexibility of both coarse-grained and fine-grained
annotation of relations among interpretations. For example, when
relating two N-best lists represented as
emma:one-of elements,
there can either be a single
emma:derived-from element
under
emma:one-of referring to the ID of the
emma:one-of for the earlier processing stage:</p>
<pre class="example">
<emma:emma version="1.0"
xmlns:emma="http://www.w3.org/2003/04/emma"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.w3.org/2003/04/emma
http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
xmlns="http://www.example.com/example">
<emma:derivation>
<emma:one-of id="nbest1"
<span>emma:medium="acoustic" emma:mode="voice"</span>>
<emma:interpretation id="int1">
<res>from boston to denver on march eleven two thousand three</res>
</emma:interpretation>
<emma:interpretation id="int2">
<res>from austin to denver on march eleven two thousand three</res>
</emma:interpretation>
</emma:one-of>
</emma:derivation>
<emma:one-of id="nbest2">
<emma:derived-from resource="#nbest1" composite="false"/>
<emma:interpretation id="int1b">
<origin>Boston</origin>
<destination>Denver</destination>
<date>03112003</date>
</emma:interpretation>
<emma:interpretation id="int2b">
<origin>Austin</origin>
<destination>Denver</destination>
<date>03112003</date>
</emma:interpretation>
</emma:one-of>
</emma:emma>
</pre>
<p>Or there can be a separate
emma:derived-from
element on each
emma:interpretation element referring
to the specific
emma:interpretation element it was
derived from.</p>
<pre class="example">
<emma:emma version="1.0"
xmlns:emma="http://www.w3.org/2003/04/emma"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.w3.org/2003/04/emma
http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
xmlns="http://www.example.com/example">
<emma:one-of id="nbest2">
<emma:interpretation id="int1b">
<emma:derived-from resource="#int1" composite="false"/>
<origin>Boston</origin>
<destination>Denver</destination>
<date>03112003</date>
</emma:interpretation>
<emma:interpretation id="int2b">
<emma:derived-from resource="#int2" composite="false"/>
<origin>Austin</origin>
<destination>Denver</destination>
<date>03112003</date>
</emma:interpretation>
</emma:one-of>
<emma:derivation>
<emma:one-of id="nbest1"<br />
<span>emma:medium="acoustic" emma:mode="voice"</span>>
<emma:interpretation id="int1">
<res>from boston to denver on march eleven two thousand three</res>
</emma:interpretation>
<emma:interpretation id="int2">
<res>from austin to denver on march eleven two thousand three</res>
</emma:interpretation>
</emma:one-of>
</emma:derivation>
</emma:emma>
</pre>
<p>
Section 4.3 provides further examples of the
use of
emma:derived-from to represent sequential
derivations and addresses the issue of the scope of EMMA
annotations across derivations of user input.</p>
<h4 id="s4.1.3">4.1.3 Reference to grammar used:
emma:grammar element</h4>
<table class="defn" summary="property definition" width="98%"
cellpadding="5" cellspacing="0">
<tbody>
<tr>
<th>Annotation</th>
<th>emma:grammar</th>
</tr>
<tr>
<th>Definition</th>
<td>An element used to provide a reference to the grammar used in
processing the input.</td>
</tr>
<tr>
<th>Children</th>
<td>None</td>
</tr>
<tr>
<th>Attributes</th>
<td>
- Required:
<span>ref</span> of type xsd:anyURI
that references a grammar used in processing the input.
id of type xsd:ID.
</td>
</tr>
<tr>
<th>Applies to</th>
<td>The
emma:grammar is legal only as a child of the
emma:emma element.</td>
</tr>
</tbody>
</table>
<p>The grammar that was used to derive the EMMA result MAY be
specified with the
emma:grammar annotation defined as
an element in the EMMA namespace.</p>
<p>Example:</p>
<pre class="example">
<emma:emma version="1.0"
xmlns:emma="http://www.w3.org/2003/04/emma"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.w3.org/2003/04/emma
http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
xmlns="http://www.example.com/example">
<emma:grammar id="gram1" <span>ref</span>="someURI"/>
<emma:grammar id="gram2" <span>ref</span>="anotherURI"/>
<emma:one-of id="r1"<br />
<span>emma:medium="acoustic" emma:mode="voice"</span>>
<emma:interpretation id="int1" emma:grammar-ref="gram1">
<origin>Boston</origin>
</emma:interpretation>
<emma:interpretation id="int2" emma:grammar-ref="gram1">
<origin>Austin</origin>
</emma:interpretation>
<emma:interpretation id="int3" emma:grammar-ref="gram2">
<command>help</command>
</emma:interpretation>
</emma:one-of>
</emma:emma>
</pre>
<p>The
emma:grammar annotation is a child of
emma:emma.</p>
<h3 id="s4.1.4">4.1.4 Extensibility to application/vendor specific
annotations:
emma:info element</h3>
<table class="defn" summary="property definition" width="98%"
cellpadding="5" cellspacing="0">
<tbody>
<tr>
<th>Annotation</th>
<th>emma:info</th>
</tr>
<tr>
<th>Definition</th>
<td>The
emma:info element acts as a container for
vendor and/or application specific metadata regarding a user's
input.</td>
</tr>
<tr>
<th>Children</th>
<td><span>One or more</span> elements in the application namespace
providing metadata about the input.</td>
</tr>
<tr>
<th>Attributes</th>
<td>
</td>
</tr>
<tr>
<th>Applies to</th>
<td>The
emma:info element is legal only as a child of
the EMMA elements
emma:emma,
emma:interpretation,
emma:group,
emma:one-of,
emma:sequence,
emma:arc, or
emma:node.</td>
</tr>
</tbody>
</table>
<p>In
Section 4.2, a series of attributes are
defined for representation of metadata about user inputs in a
standardized form. EMMA also provides an extensibility mechanism
for annotation of user inputs with vendor or application specific
metadata not covered by the standard set of EMMA annotations. The
element
emma:info MUST be used as a container for
these annotations, unless they are explicitly covered by
emma:endpoint-info. For example, if an input to a
dialog system needed to be annotated with the number that the call
originated from, the caller's state, some indication of the type of
customer, and the name of the service, these pieces of information
could be represented within
emma:info as in the
following example:</p>
<pre class="example">
<emma:emma version="1.0"
xmlns:emma="http://www.w3.org/2003/04/emma"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.w3.org/2003/04/emma
http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
xmlns="http://www.example.com/example">
<emma:info>
<caller_id>
<phone_number>2121234567</phone_number>
<state>NY</state>
</caller_id>
<customer_type>residential</customer_type>
<service_name>acme_travel_service</service_name>
</emma:info>
<emma:one-of id="r1" emma:start="1087995961542"
emma:end="1087995963542"
<span>emma:medium="acoustic" emma:mode="voice"</span>>
<emma:interpretation id="int1" emma:confidence="0.75">
<origin>Boston</origin>
<destination>Denver</destination>
<date>03112003</date>
</emma:interpretation>
<emma:interpretation id="int2" emma:confidence="0.68">
<origin>Austin</origin>
<destination>Denver</destination>
<date>03112003</date>
</emma:interpretation>
</emma:one-of>
</emma:emma>
</pre>
<p>It is important to have an EMMA container element for
application/vendor specific annotations since EMMA elements provide
a structure for representation of multiple possible interpretations
of the input. As a result it is cumbersome to state
application/vendor specific metadata as part of the application
data within each
emma:interpretation. An element is
used rather than an attribute so that internal structure can be
given to the annotations within
emma:info.</p>
<p>In addition to
emma:emma,
emma:info
MAY also appear as a child of other structural elements such as
emma:interpretation,
emma:one-of, and so on.
When
emma:info appears as a child of one of these
elements the application/vendor specific annotations contained
within
emma:info are assumed to apply to all of the
emma:interpretation elements within the containing
element. The semantics of conflicting annotations in
emma:info, for example when different values are found
within
emma:emma and
emma:interpretation,
are left to the developer of the vendor/application specific
annotations.</p>
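<p>For example (a non-normative sketch reusing the annotations from
the example above), an
emma:info element placed as a child of
emma:one-of supplies a service_name annotation that
applies to both interpretations in the N-best list:</p>
<pre class="example">
<emma:emma version="1.0"
    xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.w3.org/2003/04/emma
     http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
    xmlns="http://www.example.com/example">
  <emma:one-of id="r1" emma:medium="acoustic" emma:mode="voice">
    <emma:info>
      <service_name>acme_travel_service</service_name>
    </emma:info>
    <emma:interpretation id="int1">
      <origin>Boston</origin>
    </emma:interpretation>
    <emma:interpretation id="int2">
      <origin>Austin</origin>
    </emma:interpretation>
  </emma:one-of>
</emma:emma>
</pre>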
<h3 id="s4.1.5" class="notoc">4.1.5 Endpoint reference:
emma:endpoint-info element and
emma:endpoint element</h3>
<table class="defn" summary="property definition" width="98%"
cellpadding="5" cellspacing="0">
<tbody>
<tr>
<th>Annotation</th>
<th>emma:endpoint-info</th>
</tr>
<tr>
<th>Definition</th>
<td>The
emma:endpoint-info element acts as a container
for all application specific annotation regarding the communication
environment.</td>
</tr>
<tr>
<th>Children</th>
<td>One or more
emma:endpoint elements.</td>
</tr>
<tr>
<th>Attributes</th>
<td>
</td>
</tr>
<tr>
<th>Applies to</th>
<td>The
emma:endpoint-info element is legal only as a
child of
emma:emma.</td>
</tr>
<tr>
<th>Annotation</th>
<th>emma:endpoint</th>
</tr>
<tr>
<th>Definition</th>
<td>The element acts as a container for application specific
endpoint information.</td>
</tr>
<tr>
<th>Children</th>
<td>Elements in the application namespace providing metadata about
the input.</td>
</tr>
<tr>
<th>Attributes</th>
<td>
- Required:
- Optional:
emma:endpoint-role,
emma:endpoint-address, emma:message-id,
emma:port-num, emma:port-type,
emma:endpoint-pair-ref,
emma:service-name, emma:media-type,
emma:medium, emma:mode.
</td>
</tr>
<tr>
<th>Applies to</th>
<td>
emma:endpoint-info</td>
</tr>
</tbody>
</table>
<p>In order to conduct multimodal interaction, there is a need in
EMMA to specify the properties of the endpoint that receives the
input from which the EMMA annotation is derived. This allows subsequent
components to utilize the endpoint properties as well as the
annotated inputs to conduct meaningful multimodal interaction. The
EMMA element
emma:endpoint can be used for this purpose. It
can specify the endpoint properties based on a set of common
endpoint property attributes in EMMA, such as
emma:endpoint-address,
emma:port-num,
emma:port-type, etc. (
Section
4.2.14). Moreover, it provides an extensible annotation
structure that allows the inclusion of application and vendor
specific endpoint properties.</p>
<p>Note that the usage of the term "endpoint" in this context is
different from the way that the term is used in speech processing,
where it refers to the end of a speech input. As used here,
"endpoint" refers to a network location which is the source or
recipient of an EMMA document.</p>
<p>In multimodal interaction, multiple devices can be used and each
device can open multiple communication endpoints at the same time.
These endpoints are used to transmit and receive data, such as raw
input, EMMA documents, etc. The EMMA element
emma:endpoint provides a generic representation of
endpoint information which is relevant to multimodal interaction.
It allows the annotation to be interoperable, and it eliminates the
need for EMMA processors to create their own specialized
annotations for existing protocols, potential protocols or yet
undefined private protocols that they may use.</p>
<p>Moreover,
emma:endpoint-info provides a container
to hold all annotations regarding the endpoint information,
including
emma:endpoint and other application and
vendor specific annotations that are related to the communication,
allowing the same communication environment to be referenced and
used in multiple interpretations.</p>
<p>Note that EMMA provides two locations (i.e.
emma:info and
emma:endpoint-info) for
specifying vendor/application specific annotations. If the
annotation is specifically related to the description of the
endpoint, then the vendor/application specific annotation SHOULD be
placed within
emma:endpoint-info, otherwise it SHOULD
be placed within
emma:info.</p>
<p>The following example illustrates the annotation of endpoint
reference properties in EMMA.</p>
<pre class="example">
<emma:emma version="1.0"
xmlns:emma="http://www.w3.org/2003/04/emma"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.w3.org/2003/04/emma
http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
xmlns="http://www.example.com/example"
xmlns:ex="http://www.example.com/emma/port">
<emma:endpoint-info id="audio-channel-1">
<emma:endpoint id="endpoint1"
emma:endpoint-role="sink"
emma:endpoint-address="135.61.71.103"
emma:port-num="50204"
emma:port-type="rtp"
emma:endpoint-pair-ref="endpoint2"
emma:media-type="audio/dsr-202212; rate:8000; maxptime:40"
emma:service-name="travel"
emma:mode="voice">
<ex:app-protocol>SIP</ex:app-protocol>
</emma:endpoint>
<emma:endpoint id="endpoint2"
emma:endpoint-role="source"
emma:endpoint-address="136.62.72.104"
emma:port-num="50204"
emma:port-type="rtp"
emma:endpoint-pair-ref="endpoint1"
emma:media-type="audio/dsr-202212; rate:8000; maxptime:40"
emma:service-name="travel"
emma:mode="voice">
<ex:app-protocol>SIP</ex:app-protocol>
</emma:endpoint>
</emma:endpoint-info>
<emma:interpretation id="int1"
emma:start="1087995961542" emma:end="1087995963542"
emma:endpoint-info-ref="audio-channel-1"<br />
<span>emma:medium="acoustic" emma:mode="voice"</span>>
<destination>Chicago</destination>
</emma:interpretation>
</emma:emma>
</pre>
<p>The
ex:app-protocol element is provided by the application
or the vendor specification. It specifies that the application
layer protocol used to establish the speech transmission from the
"source" port to the "sink" port is Session Initiation Protocol
(SIP). This is specific to SIP based VoIP communication, in which
the actual media transmission and the call signaling that controls
the communication sessions are separated and typically based on
different protocols. In the above example, the Real-time
Transmission Protocol (RTP) is used in the media transmission
between the source port and the sink port.</p>
<h2 id="s4.2">4.2 EMMA annotation attributes</h2>
<h3 id="s4.2.1">4.2.1 Tokens of input:
emma:tokens
attribute</h3>
<table class="defn" summary="property definition" width="98%"
cellpadding="5" cellspacing="0">
<tbody>
<tr>
<th>Annotation</th>
<th>emma:tokens</th>
</tr>
<tr>
<th>Definition</th>
<td>An attribute of type
xsd:string holding a sequence
of input tokens.</td>
</tr>
<tr>
<th>Applies to</th>
<td>
emma:interpretation,
emma:group,
emma:one-of,
emma:sequence, and
application instance data.</td>
</tr>
</tbody>
</table>
<p>The
emma:tokens annotation holds a list of input
tokens. In the following description, the term
tokens is
used in the computational and syntactic sense of
units of
input, and not in the sense of
XML tokens. The value
held in
emma:tokens is the list of the tokens of input
as produced by the processor which generated the EMMA document;
there is no language associated with this value.</p>
<p>In the case where a grammar is used to constrain input, the
value will correspond to tokens as defined by the grammar. So for
an EMMA document produced by input to an SRGS grammar [
SRGS], the value of
emma:tokens will be
the list of words and/or phrases that are defined as tokens in SRGS
(<span>see</span> Section 2.1 <span>of [
SRGS]</span>). Items in the
emma:tokens
list are delimited by white space and/or quotation marks for
phrases containing white space. For example:</p>
<pre class="example">
emma:tokens="arriving at 'Liverpool Street'"
</pre>
<p>where the three tokens of input are
arriving,
at
and
Liverpool Street.</p>
<p>The
emma:tokens annotation MAY be applied not just
to the lexical words and phrases of language but to any level of
input processing. Other examples of tokenization include phonemes,
ink strokes, gestures and any other discrete units of input at any
level.</p>
<p>Examples:</p>
<pre class="example">
<emma:emma version="1.0"
xmlns:emma="http://www.w3.org/2003/04/emma"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.w3.org/2003/04/emma
http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
xmlns="http://www.example.com/example">
<emma:interpretation id="int1"
emma:tokens="From Cambridge to London tomorrow"<br />
<span>emma:medium="acoustic" emma:mode="voice"</span>>
<origin emma:tokens="From Cambridge">Cambridge</origin>
<destination emma:tokens="to London">London</destination>
<date emma:tokens="tomorrow">20030315</date>
</emma:interpretation>
</emma:emma>
</pre>
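<p>As a sketch of tokenization at a non-lexical level (the stroke
token names here are hypothetical), an ink input recognized as an
area might list its constituent strokes as tokens:</p>
<pre class="example">
<emma:emma version="1.0"
    xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.w3.org/2003/04/emma
     http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
    xmlns="http://www.example.com/example">
  <emma:interpretation id="pen1"
      emma:tokens="stroke1 stroke2"
      emma:medium="tactile"
      emma:mode="ink">
    <location>
      <type>area</type>
    </location>
  </emma:interpretation>
</emma:emma>
</pre>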
<h3 id="s4.2.2">4.2.2 Reference to processing:
emma:process attribute</h3>
<table class="defn" summary="property definition" width="98%"
cellpadding="5" cellspacing="0">
<tbody>
<tr>
<th>Annotation</th>
<th>emma:process</th>
</tr>
<tr>
<th>Definition</th>
<td>An attribute of type
xsd:anyURI referencing the
process used to generate the interpretation.</td>
</tr>
<tr>
<th>Applies to</th>
<td>
emma:interpretation,
emma:one-of,
emma:group,
emma:sequence</td>
</tr>
</tbody>
</table>
<p>A reference to the information concerning the processing that
was used for generating an interpretation MAY be made using the
emma:process attribute. For example:</p>
<pre class="example">
<emma:emma version="1.0"
xmlns:emma="http://www.w3.org/2003/04/emma"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.w3.org/2003/04/emma
http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
xmlns="http://www.example.com/example">
<emma:derivation>
<emma:interpretation id="raw"<br />
<span>emma:medium="acoustic" emma:mode="voice"</span>>
<answer>From Boston to Denver tomorrow</answer>
</emma:interpretation>
<emma:interpretation id="better"
emma:process="http://example.com/mysemproc1.xml">
<origin>Boston</origin>
<destination>Denver</destination>
<date>tomorrow</date>
<emma:derived-from resource="#raw"/>
</emma:interpretation>
</emma:derivation>
<emma:interpretation id="best"
emma:process="http://example.com/mysemproc2.xml">
<origin>Boston</origin>
<destination>Denver</destination>
<date>03152003</date>
<emma:derived-from resource="#better"/>
</emma:interpretation>
</emma:emma>
</pre>
<p>The process description document referenced by the
emma:process annotation MAY include information on the
process itself, such as grammar, type of parser, etc. EMMA is not
normative about the format of the process description document.</p>
<h3 id="s4.2.3">4.2.3 Lack of input:
emma:no-input
attribute</h3>
<table class="defn" summary="property definition" width="98%"
cellpadding="5" cellspacing="0">
<tbody>
<tr>
<th>Annotation</th>
<th>emma:no-input</th>
</tr>
<tr>
<th>Definition</th>
<td>Attribute holding
xsd:boolean value that is true
if there was no input.</td>
</tr>
<tr>
<th>Applies to</th>
<td>
emma:interpretation</td>
</tr>
</tbody>
</table>
<p>The case of lack of input MUST be annotated as follows:</p>
<pre class="example">
<emma:emma version="1.0"
xmlns:emma="http://www.w3.org/2003/04/emma"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.w3.org/2003/04/emma
http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
xmlns="http://www.example.com/example">
<emma:interpretation id="int1" emma:no-input="true"<br />
<span>emma:medium="acoustic" emma:mode="voice"</span>/>
</emma:emma>
</pre>
<p>If the
emma:interpretation is annotated with
emma:no-input="true" then the
emma:interpretation MUST be empty.</p>
<h3 id="s4.2.4">4.2.4 Uninterpreted input:
emma:uninterpreted attribute</h3>
<table class="defn" summary="property definition" width="98%"
cellpadding="5" cellspacing="0">
<tbody>
<tr>
<th>Annotation</th>
<th>emma:uninterpreted</th>
</tr>
<tr>
<th>Definition</th>
<td>Attribute holding
xsd:boolean value that is true
if <span>no interpretation was produced in response to the
input</span></td>
</tr>
<tr>
<th>Applies to</th>
<td>
emma:interpretation</td>
</tr>
</tbody>
</table>
<p>An
emma:interpretation element representing input
<span>for which no interpretation was produced</span> MUST be
annotated with
emma:uninterpreted="true". For
example:</p>
<pre class="example">
<emma:emma version="1.0"
xmlns:emma="http://www.w3.org/2003/04/emma"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.w3.org/2003/04/emma
http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
xmlns="http://www.example.com/example">
<emma:interpretation id="interp1" emma:uninterpreted="true"<br />
<span>emma:medium="acoustic" emma:mode="voice"</span>/>
</emma:emma>
</pre>
<p>The notation for uninterpreted input MAY refer to any possible
stage of interpretation processing, including raw transcriptions.
For instance, no interpretation would be produced for stages
performing pure signal capture such as audio recordings. Likewise,
if a spoken input was recognized but could not be parsed by a language
understanding component, it can be tagged as
emma:uninterpreted as in the following example:</p>
<pre class="example">
<emma:emma version="1.0"
xmlns:emma="http://www.w3.org/2003/04/emma"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.w3.org/2003/04/emma
http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
xmlns="http://www.example.com/example">
<emma:interpretation id="understanding"
emma:process="http://example.com/mynlu.xml"
emma:uninterpreted="true"
emma:tokens="From Cambridge to London tomorrow"<br />
<span>emma:medium="acoustic" emma:mode="voice"</span>/>
</emma:emma>
</pre>
<p>The
emma:interpretation MUST be empty <span class=
"add">if</span> the
emma:interpretation element is
annotated with
emma:uninterpreted="true".</p>
<h3 id="s4.2.5">4.2.5 Human language of input:
emma:lang attribute</h3>
<table class="defn" summary="property definition" width="98%"
cellpadding="5" cellspacing="0">
<tbody>
<tr>
<th>Annotation</th>
<th>emma:lang</th>
</tr>
<tr>
<th>Definition</th>
<td>An attribute of type
xsd:language indicating the
language for the input.</td>
</tr>
<tr>
<th>Applies to</th>
<td>
emma:interpretation,
emma:group,
emma:one-of,
emma:sequence, and
application instance data.</td>
</tr>
</tbody>
</table>
<p>The
emma:lang annotation is used to indicate the
human language for the input that it annotates. The values of the
emma:lang attribute are language identifiers as
defined by <span>IETF Best Current Practice 47 [
BCP47]</span>. For example,
emma:lang="fr" denotes French, and
emma:lang="en-US" denotes US English.
emma:lang MAY be applied to any
emma:interpretation element. Its annotative scope
follows the annotative scope of these elements. Unlike the
xml:lang attribute in XML,
emma:lang does
not specify the language used by element contents or attribute
values.</p>
<p>The following example shows the use of
emma:lang
for annotating an input interpretation.</p>
<pre class="example">
<emma:emma version="1.0"
xmlns:emma="http://www.w3.org/2003/04/emma"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.w3.org/2003/04/emma
http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
xmlns="http://www.example.com/example">
<emma:interpretation id="int1" emma:lang="fr"<br />
<span>emma:medium="acoustic" emma:mode="voice"</span>>
<answer>arretez</answer>
</emma:interpretation>
</emma:emma>
</pre>
<p>Many kinds of input, including some inputs made through pen,
computer vision, and other kinds of sensors, are inherently
non-linguistic. Examples include drawing areas, arrows, etc. using a
pen, and music input for tune recognition. If these non-linguistic
inputs are annotated with
emma:lang then they MUST be
annotated as
emma:lang="zxx". For example, pen input
where a user circles an area on a map display could be represented as
follows where
emma:lang="zxx" indicates that the ink
input is not in any human language.</p>
<pre class="example">
<span><emma:emma version="1.0"
xmlns:emma="http://www.w3.org/2003/04/emma"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.w3.org/2003/04/emma
http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
xmlns="http://www.example.com/example">
<emma:interpretation id="pen1"
emma:medium="tactile"
emma:mode="ink"
emma:lang="zxx">
<location>
<type>area</type>
<points>42.1345 -37.128 42.1346 -37.120 ... </points>
</location>
</emma:interpretation>
</emma:emma></span>
</pre>
<p>Inputs for which there is no information about whether the
source input is in a particular human language (and, if so, which
language), if annotated with
emma:lang, MUST
be annotated as
emma:lang="". Furthermore, in cases
where there is no explicit
emma:lang annotation, and
none is inherited from a higher element in the document, the
default value for
emma:lang is
"" meaning
that there is no information about whether the source input is in a
language and if so which language.</p>
<p>The
xml:lang and
emma:lang attributes
serve distinct and equally important purposes. The role
of the
xml:lang attribute in XML 1.0 is to indicate
the language used for character data content in an XML element or
document. In contrast, the
emma:lang attribute is used
to indicate the language employed by a user when entering an input.
Critically,
emma:lang annotates the language of the
signal originating from the user rather than the specific tokens
used at a particular stage of processing. This is most clearly
illustrated through consideration of an example involving multiple
stages of processing of a user input. Consider the following
scenario: EMMA is being used to represent three stages in the
processing of a spoken input to a system for ordering products.
The user input is in Italian, after speech recognition, the user
input is first translated into English, then a natural language
understanding system converts the English translation into a
product ID (which is not in any particular language). Since the
input signal is a user speaking Italian, the
emma:lang
will be
emma:lang="it" on all of these three stages of
processing. The
xml:lang attribute, in contrast, will
initially be
"it", after translation the
xml:lang will be
"en-US", and after
language understanding it will be
"zxx" since the
product ID is non-linguistic content. The following are examples of
EMMA documents corresponding to these three processing stages,
abbreviated to show the critical attributes for discussion here.
Note that
<transcription>,
<translation>, and
<understanding> are application namespace
elements, not part of the EMMA markup.</p>
<pre class="example">
<span><emma:emma version="1.0"
xmlns:emma="http://www.w3.org/2003/04/emma"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.w3.org/2003/04/emma
http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
xmlns="http://www.example.com/example">
<emma:interpretation emma:lang="it" emma:mode="voice" emma:medium="acoustic"><br />
<transcription xml:lang="it">condizionatore</transcription><br />
</emma:interpretation>
</emma:emma>
</span>
</pre>
<pre class="example">
<span><emma:emma version="1.0"
xmlns:emma="http://www.w3.org/2003/04/emma"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.w3.org/2003/04/emma
http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
xmlns="http://www.example.com/example">
<emma:interpretation emma:lang="it" emma:mode="voice" emma:medium="acoustic">
<translation xml:lang="en-US">air conditioner</translation><br />
</emma:interpretation>
</emma:emma></span>
</pre>
<pre class="example">
<span><emma:emma version="1.0"
xmlns:emma="http://www.w3.org/2003/04/emma"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.w3.org/2003/04/emma
http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
xmlns="http://www.example.com/example">
<emma:interpretation emma:lang="it" emma:mode="voice" emma:medium="acoustic"> <br />
<understanding xml:lang="zxx">id1456</understanding><br />
</emma:interpretation>
</emma:emma></span>
</pre>
<p>In order <span>to</span> handle inputs involving multiple
languages, such as through code switching, the
emma:lang attribute MAY contain several language identifiers
separated by spaces.</p>
<pre class="example">
<emma:emma version="1.0"
xmlns:emma="http://www.w3.org/2003/04/emma"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.w3.org/2003/04/emma
http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
xmlns="http://www.example.com/example">
<emma:interpretation id="int1"
emma:tokens="please stop arretez s'il vous plait"
emma:lang="en fr"
<span>emma:medium="acoustic" emma:mode="voice"</span>>
<command> CANCEL </command>
</emma:interpretation>
</emma:emma>
</pre>
<h3 id="s4.2.6">4.2.6 Reference to signal:
emma:signal
<span>and
emma:signal-size</span> attributes</h3>
<table class="defn" summary="property definition" width="98%"
cellpadding="5" cellspacing="0">
<tbody>
<tr>
<th>Annotation</th>
<th>emma:signal</th>
</tr>
<tr>
<th>Definition</th>
<td>An attribute of type
xsd:anyURI referencing the
input signal.</td>
</tr>
<tr>
<th>Applies to</th>
<td>
emma:interpretation,
emma:one-of,
emma:group,
emma:sequence,
<span>and</span> application instance data.</td>
</tr>
<tr>
<th>Annotation</th>
<th>emma:signal-size</th>
</tr>
<tr>
<th>Definition</th>
<td>An attribute <span>of type
xsd:nonNegativeInteger
specifying</span> the size in 8-bit octets of the referenced
source.</td>
</tr>
<tr>
<th>Applies to</th>
<td>
emma:interpretation,
emma:one-of,
emma:group,
emma:sequence,
<span>and</span> application instance data.</td>
</tr>
</tbody>
</table>
<p>A URI reference to the signal that originated the input
recognition process MAY be represented in EMMA using the
emma:signal annotation.</p>
<p>Here is an example where the reference to a speech signal is
represented using the
emma:signal annotation on the
emma:interpretation element:</p>
<pre class="example">
<emma:emma version="1.0"
xmlns:emma="http://www.w3.org/2003/04/emma"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.w3.org/2003/04/emma
http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
xmlns="http://www.example.com/example">
<emma:interpretation id="intp1"
emma:signal="http://example.com/signals/sg23.bin"<br />
<span>emma:medium="acoustic" emma:mode="voice"</span>>
<origin>Boston</origin>
<destination>Denver</destination>
<date>03152003</date>
</emma:interpretation>
</emma:emma>
</pre>
<p>The
emma:signal-size annotation can be used to
declare the exact size of the associated signal in 8-bit octets. An
example of the use of an EMMA document to represent a recording,
with
emma:signal-size indicating the size, is as
follows:</p>
<pre class="example">
<span>
<emma:emma version="1.0"
xmlns:emma="http://www.w3.org/2003/04/emma"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.w3.org/2003/04/emma
http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
xmlns="http://www.example.com/example">
<emma:interpretation id="intp1"
emma:medium="acoustic"
emma:mode="voice"
emma:function="recording"
emma:uninterpreted="true"
emma:signal="http://example.com/signals/recording.mpg"
emma:signal-size="82102"
emma:duration="10000">
</emma:interpretation>
</emma:emma>
</span>
</pre>
<h3 id="s4.2.7">4.2.7 Media type:
emma:media-type
attribute</h3>
<table class="defn" summary="property definition" width="98%"
cellpadding="5" cellspacing="0">
<tbody>
<tr>
<th>Annotation</th>
<th>emma:media-type</th>
</tr>
<tr>
<th>Definition</th>
<td>An attribute of type
xsd:string holding the MIME
type associated with the signal's data format.</td>
</tr>
<tr>
<th>Applies to</th>
<td>
emma:interpretation,
emma:one-of,
emma:group,
emma:sequence,
emma:endpoint, <span>and</span> application instance
data.</td>
</tr>
</tbody>
</table>
<p>The data format of the signal that originated the input MAY be
represented in EMMA using the
emma:media-type
annotation. An initial set of MIME media types is defined by
[
RFC2046].</p>
<p>Here is an example where the media type for the ETSI ES 202 212
audio codec for Distributed Speech Recognition (DSR) is applied to
the
emma:interpretation element. The example also
specifies an optional sampling rate of 8 kHz and maxptime of 40
milliseconds.</p>
<pre class="example">
<emma:emma version="1.0"
xmlns:emma="http://www.w3.org/2003/04/emma"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.w3.org/2003/04/emma
http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
xmlns="http://www.example.com/example">
<emma:interpretation id="intp1"<span>
emma:signal="http://example.com/signals/signal.dsr"</span>
emma:media-type="audio/dsr-<span>es</span>202212; rate:8000; maxptime:40"<br />
<span>emma:medium="acoustic" emma:mode="voice"</span>>
<origin>Boston</origin>
<destination>Denver</destination>
<date>03152003</date>
</emma:interpretation>
</emma:emma>
</pre>
<h3 id="s4.2.8">4.2.8 Confidence scores:
emma:confidence attribute</h3>
<table class="defn" summary="property definition" width="98%"
cellpadding="5" cellspacing="0">
<tbody>
<tr>
<th>Annotation</th>
<th>emma:confidence</th>
</tr>
<tr>
<th>Definition</th>
<td>An attribute of type
xsd:decimal in range 0.0 to
1.0, indicating the processor's confidence in the result.</td>
</tr>
<tr>
<th>Applies to</th>
<td>
emma:interpretation,
emma:one-of,
emma:group,
emma:sequence, and
application instance data.</td>
</tr>
</tbody>
</table>
<p>The confidence score in EMMA is used to indicate the quality of
the input, and if confidence is annotated on an input it MUST be
given as the value of
emma:confidence. The confidence
score MUST be a number in the range from 0.0 to 1.0 inclusive. A
value of 0.0 indicates minimum confidence, and a value of 1.0
indicates maximum confidence. Note that
emma:confidence does not necessarily represent the confidence of
the speech recognizer, but rather the confidence of whatever
processor was responsible for creating the EMMA result, based on
whatever evidence it has. For a natural language interpretation,
for example, this might include semantic heuristics in addition to
speech recognition scores. Moreover, the confidence score values do
not have to be interpreted as probabilities. In fact confidence
score values are platform-dependent, since their computation is
likely to differ between platforms and different EMMA processors.
Confidence scores are annotated explicitly in EMMA in order to
provide this information to the subsequent processes for multimodal
interaction. The example below illustrates how confidence scores
are annotated in EMMA.</p>
<pre class="example">
<emma:emma version="1.0"
xmlns:emma="http://www.w3.org/2003/04/emma"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.w3.org/2003/04/emma
http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
xmlns="http://www.example.com/example">
<emma:one-of id="nbest1"<br />
<span>emma:medium="acoustic" emma:mode="voice"</span>>
<emma:interpretation id="meaning1" emma:confidence="0.6">
<location>Boston</location>
</emma:interpretation>
<emma:interpretation id="meaning2" emma:confidence="0.4">
<location> Austin </location>
</emma:interpretation>
</emma:one-of>
</emma:emma>
</pre>
<p>In addition to its use as an attribute on the EMMA
interpretation and container elements, the
emma:confidence attribute MAY also be used to assign
confidences to elements in instance data in the application
namespace. This can be seen in the following example, where the
<destination> and
<origin>
elements have confidences.</p>
<pre class="example">
<emma:emma version="1.0"
xmlns:emma="http://www.w3.org/2003/04/emma"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.w3.org/2003/04/emma
http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
xmlns="http://www.example.com/example">
<emma:interpretation id="meaning1" emma:confidence="0.6"
<span>emma:medium="acoustic" emma:mode="voice"</span>>
<destination emma:confidence="0.8"> Boston</destination>
<origin emma:confidence="0.6"> Austin </origin>
</emma:interpretation>
</emma:emma>
</pre>
<p>Although in general instance data can be represented in XML
using a combination of elements and attributes in the application
namespace, EMMA does not provide a standard way to annotate
processors' confidences in attributes. Consequently, instance data
that is expected to be assigned confidences SHOULD be represented
using elements, as in the above example.</p>
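<p>For instance, an application that might otherwise encode a
flight type as an attribute in its instance data would represent
it as an element so that a confidence can be attached (a
non-normative sketch; the application elements are
illustrative):</p>
<pre class="example">
<emma:emma version="1.0"
    xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.w3.org/2003/04/emma
     http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
    xmlns="http://www.example.com/example">
  <emma:interpretation id="meaning1"
      emma:medium="acoustic" emma:mode="voice">
    <flight>
      <type emma:confidence="0.9">nonstop</type>
    </flight>
  </emma:interpretation>
</emma:emma>
</pre>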
<h3 id="s4.2.9">4.2.9 Input source:
emma:source
attribute</h3>
<table class="defn" summary="property definition" width="98%"
cellpadding="5" cellspacing="0">
<tbody>
<tr>
<th>Annotation</th>
<th>emma:source</th>
</tr>
<tr>
<th>Definition</th>
<td>An attribute of type
xsd:anyURI referencing the
source of input.</td>
</tr>
<tr>
<th>Applies to</th>
<td>
emma:interpretation,
emma:one-of,
emma:group ,
emma:sequence, and
application instance data.</td>
</tr>
</tbody>
</table>
<p>The source of an interpreted input MAY be represented in EMMA as
a URI resource using the
emma:source annotation.</p>
<p>Here is an example that shows different input sources for
different input interpretations.</p>
<pre class="example">
<emma:emma version="1.0"
xmlns:emma="http://www.w3.org/2003/04/emma"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.w3.org/2003/04/emma
http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
xmlns="http://www.example.com/example"
xmlns:myapp="http://www.example.com/myapp">
<emma:one-of id="nbest1"<br />
<span>emma:medium="acoustic" emma:mode="voice"</span>>
<emma:interpretation id="intp1"
emma:source="http://example.com/microphone/NC-61">
<myapp:destination>Boston</myapp:destination>
</emma:interpretation>
<emma:interpretation id="intp2"
emma:source="http://example.com/microphone/NC-4024">
<myapp:destination>Austin</myapp:destination>
</emma:interpretation>
</emma:one-of>
</emma:emma>
</pre>
<h3 id="s4.2.10">4.2.10 Timestamps</h3>
<p>The start and end times for input MAY be indicated using either
absolute timestamps or relative timestamps. Both are expressed in
milliseconds for ease of processing. Note that the
ECMAScript Date object's
getTime() function is a
convenient way to determine the absolute time.</p>
<h4 id="s4.2.10.1">4.2.10.1 Absolute timestamps:
emma:start,
emma:end attributes</h4>
<table class="defn" summary="property definition" width="98%"
cellpadding="5" cellspacing="0">
<tbody>
<tr>
<th>Annotation</th>
<th>emma:start, emma:end</th>
</tr>
<tr>
<th>Definition</th>
<td>Attributes <span>of type
xsd:nonNegativeInteger</span> indicating the absolute
starting and ending times of an input in terms of the number of
milliseconds since 1 January 1970 00:00:00 GMT</td>
</tr>
<tr>
<th>Applies to</th>
<td>
emma:interpretation,
emma:group,
emma:one-of,
emma:sequence,
emma:arc, <span>and</span> application instance
data</td>
</tr>
</tbody>
</table>
<p>Here is an example of a timestamp for an absolute time.</p>
<pre class="example">
<emma:emma version="1.0"
xmlns:emma="http://www.w3.org/2003/04/emma"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.w3.org/2003/04/emma
http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
xmlns="http://www.example.com/example">
<emma:interpretation id="int1"
emma:start="1087995961542"
emma:end="1087995963542"<br />
<span>emma:medium="acoustic" emma:mode="voice"</span>>
<destination>Chicago</destination>
</emma:interpretation>
</emma:emma>
</pre>
<p>The
emma:start and
emma:end
annotations on an input MAY be identical; however, the
emma:end value MUST NOT be less than the
emma:start value.</p>
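<p>For example, an effectively instantaneous input such as a
single tap on a touch screen could plausibly carry identical
start and end times (a non-normative sketch):</p>
<pre class="example">
<emma:emma version="1.0"
    xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.w3.org/2003/04/emma
     http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
    xmlns="http://www.example.com/example">
  <emma:interpretation id="tap1"
      emma:start="1087995961542"
      emma:end="1087995961542"
      emma:medium="tactile" emma:mode="gui">
    <destination>Chicago</destination>
  </emma:interpretation>
</emma:emma>
</pre>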
<h4 id="s4.2.10.2">4.2.10.2 Relative timestamps:
emma:time-ref-uri,
emma:time-ref-anchor-point,
emma:offset-to-start attributes</h4>
<table class="defn" summary="property definition" width="98%"
cellpadding="5" cellspacing="0">
<tbody>
<tr>
<th>Annotation</th>
<th>emma:time-ref-uri</th>
</tr>
<tr>
<th>Definition</th>
<td>Attribute of type
xsd:anyURI indicating the URI
used to anchor the relative timestamp.</td>
</tr>
<tr>
<th>Applies to</th>
<td>
emma:interpretation,
emma:group,
emma:one-of,
emma:sequence,
emma:lattice, <span>and</span> application instance
data</td>
</tr>
<tr>
<th>Annotation</th>
<th>emma:time-ref-anchor-point</th>
</tr>
<tr>
<th>Definition</th>
<td>Attribute with a value of
start or
end, defaulting to
start. It indicates
whether to measure the time from the start or end of the interval
designated with
emma:time-ref-uri.</td>
</tr>
<tr>
<th>Applies to</th>
<td>
emma:interpretation,
emma:group,
emma:one-of,
emma:sequence,
emma:lattice, <span>and</span> application instance
data</td>
</tr>
<tr>
<th>Annotation</th>
<th>emma:offset-to-start</th>
</tr>
<tr>
<th>Definition</th>
<td>Attribute <span>of type
xsd:integer</span>,
defaulting to zero. It specifies the offset in milliseconds for the
start of input from the anchor point designated with
<span>
emma:time-ref-uri</span> and
<span>
emma:time-ref-anchor-point</span>.</td>
</tr>
<tr>
<th>Applies to</th>
<td>
emma:interpretation,
emma:group,
emma:one-of,
emma:sequence,
emma:arc, <span>and</span> application instance
data</td>
</tr>
</tbody>
</table>
<p>Relative timestamps define the start of an input relative to the
start or end of a reference interval such as another input.</p>
<p><img alt="relative timestamps" src=
"relativetimestamps.png" /></p>
<p>The reference interval is designated with the
emma:time-ref-uri attribute. This MAY be combined with the
emma:time-ref-anchor-point attribute to specify
whether the anchor point is the start or end of this interval. The
start of an input relative to this anchor point is then specified
with the
emma:offset-to-start attribute.</p>
<p>Here is an example where the referenced input is in the same
document:</p>
<pre class="example">
<emma:emma version="1.0"
xmlns:emma="http://www.w3.org/2003/04/emma"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.w3.org/2003/04/emma
http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
xmlns="http://www.example.com/example">
<emma:sequence>
<emma:interpretation id="int1"<br />
<span>emma:medium="acoustic" emma:mode="voice"</span>>
<origin>Denver</origin>
</emma:interpretation>
<emma:interpretation id="int2"<br />
<span>emma:medium="acoustic" emma:mode="voice"</span>
emma:time-ref-uri="#int1"
emma:time-ref-anchor-point="start"
emma:offset-to-start="5000">
<destination>Chicago</destination>
</emma:interpretation>
</emma:sequence>
</emma:emma>
</pre>
<p>Note that the reference point refers to an input, but not
necessarily to a complete input. For example, if a speech
recognizer timestamps each word in an utterance, the anchor point
might refer to the timestamp for just one word.</p>
<p>The absolute and relative timestamps are not mutually exclusive;
that is, it is possible to have both relative and absolute
timestamp attributes on the same EMMA container element.</p>
<p>Timestamps of inputs collected by different devices will be
subject to variation if the times maintained by the devices are not
synchronized. This concern is outside the scope of the EMMA
specification.</p>
<h4 id="s4.2.10.3">4.2.10.3 Duration of input:
emma:duration attribute</h4>
<table class="defn" summary="property definition" width="98%"
cellpadding="5" cellspacing="0">
<tbody>
<tr>
<th>Annotation</th>
<th>emma:duration</th>
</tr>
<tr>
<th>Definition</th>
<td>Attribute <span>of type
xsd:nonNegativeInteger</span>, defaulting to zero. It
specifies the duration of the input in milliseconds.</td>
</tr>
<tr>
<th>Applies to</th>
<td>
emma:interpretation,
emma:group,
emma:one-of,
emma:sequence,
emma:arc, <span>and</span> application instance
data</td>
</tr>
</tbody>
</table>
<p>The duration of an input in milliseconds MAY be specified with
the
emma:duration attribute. The
emma:duration attribute MAY be used either in
combination with timestamps or independently, for example in the
annotation of speech corpora.</p>
<p>In the following example, the duration of the signal that gave
rise to the interpretation is indicated using
emma:duration.</p>
<pre class="example">
<emma:emma version="1.0"
xmlns:emma="http://www.w3.org/2003/04/emma"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.w3.org/2003/04/emma
http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
xmlns="http://www.example.com/example">
<emma:interpretation id="int1" emma:duration="2300"<br />
<span>emma:medium="acoustic" emma:mode="voice"</span>>
<origin>Denver</origin>
</emma:interpretation>
</emma:emma>
</pre>
<h4 id="s4.2.10.4">4.2.10.4 Composite Input and Relative
Timestamps</h4>
<p>This section is informative.</p>
<p>The following table provides guidance on how to determine the
values of relative timestamps on a composite input.</p>
<div>
<table summary="3 columns" border="1" cellpadding="3" cellspacing=
"0">
<caption>Informative Guidance on Relative Timestamps in Composite
Derivations</caption>
<tbody>
<tr>
<td>
emma:time-ref-uri</td>
<td>If the reference interval URI is the same for both inputs then
it should be the same for the composite input. If it is not the
same then relative timestamps will have to be resolved to absolute
timestamps in order to determine the combined timestamp.</td>
</tr>
<tr>
<td>
emma:time-ref-anchor-point</td>
<td>If the anchor value is the same for both inputs then it should
be the same for the composite input. If it is not the same then
relative timestamps will have to be resolved to absolute timestamps
in order to determine the combined timestamp.</td>
</tr>
<tr>
<td>
emma:offset-to-start</td>
<td>If the
emma:time-ref-uri and
emma:time-ref-anchor-point are the same for both
combining inputs, then the
emma:offset-to-start for
the combination should be the lesser of the two offsets. If they are not
the same then relative timestamps will have to be resolved to
absolute timestamps in order to determine the combined
timestamp.</td>
</tr>
<tr>
<td>
emma:duration</td>
<td>If the
emma:time-ref-uri and
emma:time-ref-anchor-point are the same for both
combining inputs, then the
emma:duration is calculated
as follows. Add together the
emma:offset-to-start and
emma:duration for each of the inputs. Take whichever
of these is greater and subtract from it the lesser of the
emma:offset-to-start values in order to determine the
combined duration. If
emma:time-ref-uri and
emma:time-ref-anchor-point are not the same then
relative timestamps will have to be resolved to absolute timestamps
in order to determine the combined timestamp.</td>
</tr>
</tbody>
</table>
</div>
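<p>As a worked illustration of the duration rule (the figures are
hypothetical): suppose two combining inputs share the same
emma:time-ref-uri and
emma:time-ref-anchor-point, with offsets of 1000 and
2500 milliseconds and durations of 2000 and 1500 milliseconds
respectively. The offset-plus-duration sums are 3000 and 4000, so
the combined
emma:offset-to-start is the lesser offset, 1000, and the combined
emma:duration is 4000 - 1000 = 3000 milliseconds. A composite
interpretation might then carry the following annotations (a
non-normative sketch; the reference URI is illustrative):</p>
<pre class="example">
<emma:interpretation id="comp1"
    emma:medium="acoustic tactile"
    emma:mode="voice ink"
    emma:time-ref-uri="#int0"
    emma:time-ref-anchor-point="start"
    emma:offset-to-start="1000"
    emma:duration="3000">
  ...
</emma:interpretation>
</pre>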
<h3 id="s4.2.11">4.2.11 Medium, mode, and function of user inputs:
emma:medium,
emma:mode,
emma:function,
emma:verbal
attributes</h3>
<table class="defn" summary="property definition" width="98%"
cellpadding="5" cellspacing="0">
<tbody>
<tr>
<th>Annotation</th>
<th>emma:medium</th>
</tr>
<tr>
<th>Definition</th>
<td>An attribute of type <span>
xsd:nmtokens</span>
<span>which contains a space delimited set of values from the
set</span> {
acoustic,
tactile,
visual}.</td>
</tr>
<tr>
<th>Applies to</th>
<td>
emma:interpretation,
emma:group,
emma:one-of,
emma:sequence,
emma:endpoint, and application instance data</td>
</tr>
<tr>
<th>Annotation</th>
<th>emma:mode</th>
</tr>
<tr>
<th>Definition</th>
<td>An attribute of type <span>
xsd:nmtokens</span>
<span>which contains a space delimited set of values from</span> an
open set of values including: {<span>
voice,
dtmf</span>,
ink,
gui,
keys,
video,
photograph,
...}.</td>
</tr>
<tr>
<th>Applies to</th>
<td>
emma:interpretation,
emma:group,
emma:one-of,
emma:sequence,
emma:endpoint, and application instance data</td>
</tr>
<tr>
<th>Annotation</th>
<th>emma:function</th>
</tr>
<tr>
<th>Definition</th>
<td>An attribute of type
xsd:string constrained to
values in the open set {
recording,
transcription,
dialog,
verification, ...}.</td>
</tr>
<tr>
<th>Applies to</th>
<td>
emma:interpretation,
emma:group,
emma:one-of,
emma:sequence, and
application instance data</td>
</tr>
<tr>
<th>Annotation</th>
<th>emma:verbal</th>
</tr>
<tr>
<th>Definition</th>
<td>An attribute of type
xsd:boolean.</td>
</tr>
<tr>
<th>Applies to</th>
<td>
emma:interpretation,
emma:group,
emma:one-of,
emma:sequence, and
application instance data</td>
</tr>
</tbody>
</table>
<p>EMMA provides two properties for the annotation of input
modality: one indicating the broader medium or channel
(
emma:medium) and another indicating the specific mode
of communication used on that channel (
emma:mode). The
input medium is defined from the user's perspective and indicates
whether they use their voice (
acoustic), touch
(
tactile), or visual appearance/motion
(
visual) as input. Tactile includes most
hand-operated input device types such as pen, mouse, keyboard, and
touch screen. Visual is used for camera input.</p>
<pre class="example">
emma:medium = <span>space delimited sequence of values from the set: </span>
[acoustic|tactile|visual]
</pre>
<p>The mode property provides the ability to distinguish between
different modes of communication that may be used within a particular
medium. For example, in the tactile medium, modes include
electronic ink (
ink), and pointing and clicking on a
graphical user interface (
gui).</p>
<pre class="example">
emma:mode = <span>space delimited sequence of values from the set: </span>
[<span>voice|dtmf</span>|ink|gui|keys|video|photograph| ... ]
</pre>
<p>The
emma:medium classification is based on the
boundary between the user and the device that they use. For
emma:medium="tactile" the user physically touches the
device in order to provide input. For
emma:medium="visual" the user's movement is captured
by sensors (cameras, infrared) resulting in an input to the system.
In the case where
emma:medium="acoustic" the user
provides input to the system by producing an acoustic signal. Note,
then, that DTMF input is classified as
emma:medium="tactile", since in order to provide DTMF
input the user physically presses keys on a keypad.</p>
<p>While
emma:medium and
emma:mode are
optional on specific elements such as
emma:interpretation and
emma:one-of, note
that all EMMA interpretations must be annotated for
emma:medium and
emma:mode: these attributes must appear directly on
emma:interpretation, on an ancestor
emma:one-of element, or on an earlier
stage of the derivation listed in
emma:derivation.</p>
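<p>For example, the following document satisfies this requirement
by annotating the containing
emma:one-of, whose
emma:medium and
emma:mode apply to both interpretations (a
non-normative sketch adapted from the examples above):</p>
<pre class="example">
<emma:emma version="1.0"
    xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.w3.org/2003/04/emma
     http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
    xmlns="http://www.example.com/example">
  <emma:one-of id="nbest1"
      emma:medium="acoustic" emma:mode="voice">
    <emma:interpretation id="interp1">
      <location>Boston</location>
    </emma:interpretation>
    <emma:interpretation id="interp2">
      <location>Austin</location>
    </emma:interpretation>
  </emma:one-of>
</emma:emma>
</pre>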
<p>Orthogonal to the mode, user inputs can also be classified with
respect to their communicative function. Factoring function out in
this way keeps the mode classification itself simpler.</p>
<pre class="example">
emma:function = [recording|transcription|dialog|verification| ... ]
</pre>
<p>For example, speech can be used for recording (e.g. voicemail),
transcription (e.g. dictation), dialog (e.g. interactive spoken
dialog systems), and verification (e.g. identifying users through
their voiceprints).</p>
<p>EMMA also supports an additional property
emma:verbal, which distinguishes verbal use of an input
mode from non-verbal use. This MAY be used to distinguish the use of
electronic ink to convey handwritten commands from the use of
electronic ink for symbolic gestures such as circles and arrows.
Handwritten commands, such as writing
downtown in order to
change a map display to show the downtown area, are classified as verbal
(
emma:function="dialog" emma:verbal="true"). Pen
gestures (arrows, lines, circles, etc), such as circling a
building, are classified as non-verbal dialog
(
emma:function="dialog" emma:verbal="false"). The use
of handwritten words to transcribe an email message is classified
as transcription (
emma:function="transcription"
emma:verbal="true").</p>
<pre class="example">
emma:verbal = [true|false]
</pre>
<p>Handwritten words and ink gestures are typically recognized
using different kinds of recognition components (handwriting
recognizer vs. gesture recognizer) and the verbal annotation will
be added by the recognition component which classifies the input.
The original input source, a pen in this case, will not be aware of
this difference. The input source identifier will tell you that the
input was from a pen of some kind, but will not tell you whether the mode
of input was handwriting (e.g. writing
show downtown) or gesture (e.g.
circling an object or area).</p>
<p>Here is an example of the EMMA annotation for a pen input where
the user's ink is recognized as either a word ("Boston") or as an
arrow:</p>
<pre class="example">
<emma:emma version="1.0"
xmlns:emma="http://www.w3.org/2003/04/emma"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.w3.org/2003/04/emma
http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
xmlns="http://www.example.com/example">
<emma:one-of id="nbest1">
<emma:interpretation id="interp1"
emma:confidence="0.6"
emma:medium="tactile"
emma:mode="ink"
emma:function="dialog"
emma:verbal="true">
<location>Boston</location>
</emma:interpretation>
<emma:interpretation id="interp2"
emma:confidence="0.4"
emma:medium="tactile"
emma:mode="ink"
emma:function="dialog"
emma:verbal="false">
<direction>45</direction>
</emma:interpretation>
</emma:one-of>
</emma:emma>
</pre>
<p>Here is an example of the EMMA annotation for a spoken command
which is recognized as either "Boston" or "Austin":</p>
<pre class="example">
<emma:emma version="1.0"
xmlns:emma="http://www.w3.org/2003/04/emma"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.w3.org/2003/04/emma
http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
xmlns="http://www.example.com/example">
<emma:one-of>
<emma:interpretation id="interp1"
emma:confidence="0.6"
emma:medium="acoustic"
emma:mode="voice"
emma:function="dialog"
emma:verbal="true">
<location>Boston</location>
</emma:interpretation>
<emma:interpretation id="interp2"
emma:confidence="0.4"
emma:medium="acoustic"
emma:mode="voice"
emma:function="dialog"
emma:verbal="true">
<location>Austin</location>
</emma:interpretation>
</emma:one-of>
</emma:emma>
</pre>
<p>The following table shows the relationship between the medium,
mode, and function properties and serves as an aid for classifying
inputs. For the dialog function it also shows some examples of the
classification of inputs as verbal vs. non-verbal.</p>
<table class="modes" summary="7 columns" border="1" cellpadding="3"
cellspacing="0">
<tbody>
<tr>
<th rowspan="2">Medium</th>
<th rowspan="2">Device</th>
<th rowspan="2">Mode</th>
<th colspan="4">Function</th>
</tr>
<tr>
<th>recording</th>
<th>dialog</th>
<th>transcription</th>
<th>verification</th>
</tr>
<tr>
<td rowspan="2">acoustic</td>
<td rowspan="2">microphone</td>
<td rowspan="2">voice</td>
<td rowspan="2">audiofile (e.g. voicemail)</td>
<td>spoken command / query / response (verbal = true)</td>
<td rowspan="2">dictation</td>
<td rowspan="2">speaker recognition</td>
</tr>
<tr>
<td>singing a note (verbal = false)</td>
</tr>
<tr>
<td rowspan="14">tactile</td>
<td rowspan="2">keypad</td>
<td rowspan="2">dtmf</td>
<td rowspan="2">audiofile / character stream</td>
<td>typed command / query / response (verbal = true)</td>
<td rowspan="2">text entry (T9-tegic, word completion, or word
grammar)</td>
<td rowspan="2">password / pin entry</td>
</tr>
<tr>
<td>command key "Press 9 for sales" (verbal = false)</td>
</tr>
<tr>
<td rowspan="2">keyboard</td>
<td rowspan="2">dtmf</td>
<td rowspan="2">character / key-code stream</td>
<td>typed command / query / response (verbal = true)</td>
<td rowspan="2">typing</td>
<td rowspan="2">password / pin entry</td>
</tr>
<tr>
<td>command key "Press S for sales" (verbal = false)</td>
</tr>
<tr>
<td rowspan="4">pen</td>
<td rowspan="2">ink</td>
<td rowspan="2">trace, sketch</td>
<td>handwritten command / query / response (verbal = true)</td>
<td rowspan="2">handwritten text entry</td>
<td rowspan="2">signature, handwriting recognition</td>
</tr>
<tr>
<td>gesture (e.g. circling building) (verbal = false)</td>
</tr>
<tr>
<td rowspan="2">gui</td>
<td rowspan="2">N/A</td>
<td>tapping on named button (verbal = true)</td>
<td rowspan="2">soft keyboard</td>
<td rowspan="2">password / pin entry</td>
</tr>
<tr>
<td>drag and drop, tapping on map (verbal = false)</td>
</tr>
<tr>
<td rowspan="4">mouse</td>
<td rowspan="2">ink</td>
<td rowspan="2">trace, sketch</td>
<td>handwritten command / query / response (verbal = true)</td>
<td rowspan="2">handwritten text entry</td>
<td rowspan="2">N/A</td>
</tr>
<tr>
<td>gesture (e.g. circling building) (verbal = false)</td>
</tr>
<tr>
<td rowspan="2">gui</td>
<td rowspan="2">N/A</td>
<td>clicking named button (verbal = true)</td>
<td rowspan="2">soft keyboard</td>
<td rowspan="2">password / pin entry</td>
</tr>
<tr>
<td>drag and drop, clicking on map (verbal = false)</td>
</tr>
<tr>
<td rowspan="2">joystick</td>
<td>ink</td>
<td>trace,sketch</td>
<td>gesture (e.g. circling building) (verbal = false)</td>
<td>N/A</td>
<td>N/A</td>
</tr>
<tr>
<td>gui</td>
<td>N/A</td>
<td>pointing, clicking button / menu (verbal = false)</td>
<td>soft keyboard</td>
<td>password / pin entry</td>
</tr>
<tr>
<td rowspan="5">visual</td>
<td rowspan="2">page scanner</td>
<td rowspan="2">photograph</td>
<td rowspan="2">image</td>
<td>handwritten command / query / response (verbal = true)</td>
<td rowspan="2">optical character recognition, object/scene
recognition (markup, e.g. SVG)</td>
<td rowspan="2">N/A</td>
</tr>
<tr>
<td>drawings and images (verbal = false)</td>
</tr>
<tr>
<td>still camera</td>
<td>photograph</td>
<td>image</td>
<td>objects (verbal = false)</td>
<td>visual object/scene recognition</td>
<td>face id, retinal scan</td>
</tr>
<tr>
<td rowspan="2">video camera</td>
<td rowspan="2">video</td>
<td rowspan="2">movie</td>
<td>sign language (verbal = true)</td>
<td rowspan="2">audio/visual recognition</td>
<td rowspan="2">face id, gait id, retinal scan</td>
</tr>
<tr>
<td>face / hand / arm / body gesture (e.g. pointing, facing)
(verbal = false)</td>
</tr>
</tbody>
</table>
<h3 id="s4.2.12">4.2.12 Composite multimodality:
emma:hook attribute</h3>
<table class="defn" summary="property definition" width="98%"
cellpadding="5" cellspacing="0">
<tbody>
<tr>
<th>Annotation</th>
<th>emma:hook</th>
</tr>
<tr>
<th>Definition</th>
<td>An attribute of type
xsd:string constrained to
values in the open set {
voice,
dtmf,
ink,
gui,
keys,
video,
photograph, ...} or the wildcard
any</td>
</tr>
<tr>
<th>Applies to</th>
<td>Application instance data</td>
</tr>
</tbody>
</table>
<p>The attribute
emma:hook MAY be used to mark the
elements in the application semantics within an
emma:interpretation which are expected to be
integrated with content from input in another mode to yield a
complete interpretation. The
emma:mode to be
integrated at that point in the application semantics is indicated
as the value of the
emma:hook attribute. The possible
values of
emma:hook are the list of input modes that
can be values of
emma:mode <span>(see
Section 4.2.11)</span>. In addition to these, the
value of
emma:hook can also be the wildcard
any indicating that the other content can come from
any source. The annotation
emma:hook differs in
semantics from
emma:mode as follows. Annotating an
element in the application semantics with
emma:mode="ink" indicates that that part of the
semantics came from the
ink mode. Annotating an
element in the application semantics with
emma:hook="ink" indicates that part of the semantics
needs to be integrated with content from the
ink
mode.</p>
<p>To illustrate the use of
emma:hook consider an
example composite input in which the user says "zoom in here" in
the speech input mode while drawing an area on a graphical display
in the ink input mode. <span>The fact that the
location element needs to come from the
ink mode is indicated by annotating this application
namespace element using
emma:hook</span>.</p>
<pre class="example">
<emma:emma version="1.0"
xmlns:emma="http://www.w3.org/2003/04/emma"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.w3.org/2003/04/emma
http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
xmlns="http://www.example.com/example">
<emma:interpretation <span>emma:medium="acoustic"</span> emma:mode="voice">
<command>
<action>zoom</action>
<location emma:hook="ink">
<type>area</type>
</location>
</command>
</emma:interpretation>
</emma:emma>
</pre>
<p>For more detailed explanation of this example see
Appendix C.</p>
<h3 id="s4.2.13">4.2.13 Cost:
emma:cost attribute</h3>
<table class="defn" summary="property definition" width="98%"
cellpadding="5" cellspacing="0">
<tbody>
<tr>
<th>Annotation</th>
<th>emma:cost</th>
</tr>
<tr>
<th>Definition</th>
<td>An attribute of type
xsd:decimal in range 0.0 to
10000000, indicating the processor's cost or weight associated with
an input or part of an input.</td>
</tr>
<tr>
<th>Applies to</th>
<td>
emma:interpretation,
emma:group,
emma:one-of,
emma:sequence,
emma:arc,
emma:node, and application
instance data.</td>
</tr>
</tbody>
</table>
<p>The cost annotation in EMMA indicates the weight or cost
associated with a user's input or part of their input. The most
common use of
emma:cost is for representing the costs
encoded on a lattice output from speech recognition or other
recognition or understanding processes.
emma:cost MAY
also be used to indicate the total cost associated with particular
recognition results or semantic interpretations.</p>
<pre class="example">
<emma:emma version="1.0"
xmlns:emma="http://www.w3.org/2003/04/emma"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.w3.org/2003/04/emma
http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
xmlns="http://www.example.com/example">
<emma:one-of <span>emma:medium="acoustic" emma:mode="voice"</span>>
<emma:interpretation id="meaning1" emma:cost="1600">
<location>Boston</location>
</emma:interpretation>
<emma:interpretation id="meaning2" emma:cost="400">
<location> Austin </location>
</emma:interpretation>
</emma:one-of>
</emma:emma>
</pre>
<h3 id="s4.2.14">4.2.14 Endpoint properties:
emma:endpoint-role,
emma:endpoint-address,
emma:port-type,
emma:port-num,
emma:message-id,
emma:service-name,
emma:endpoint-pair-ref,
emma:endpoint-info-ref
attributes</h3>
<table class="defn" summary="property definition" width="98%"
cellpadding="5" cellspacing="0">
<tbody>
<tr>
<th>Annotation</th>
<th>emma:endpoint-role</th>
</tr>
<tr>
<th>Definition</th>
<td>An attribute of type
xsd:string constrained to
values in the set {
source,
sink,
reply-to,
router}.</td>
</tr>
<tr>
<th>Applies to</th>
<td>
emma:endpoint</td>
</tr>
<tr>
<th>Annotation</th>
<th>emma:endpoint-address</th>
</tr>
<tr>
<th>Definition</th>
<td>An attribute of type
xsd:anyURI that uniquely
specifies the network address of the
emma:endpoint.</td>
</tr>
<tr>
<th>Applies to</th>
<td>
emma:endpoint</td>
</tr>
<tr>
<th>Annotation</th>
<th>emma:port-type</th>
</tr>
<tr>
<th>Definition</th>
<td>An attribute of type
xsd:QName that specifies the
type of the port.</td>
</tr>
<tr>
<th>Applies to</th>
<td>
emma:endpoint</td>
</tr>
<tr>
<th>Annotation</th>
<th>emma:port-num</th>
</tr>
<tr>
<th>Definition</th>
<td>An attribute of type
xsd:nonNegativeInteger that
specifies the port number.</td>
</tr>
<tr>
<th>Applies to</th>
<td>
emma:endpoint</td>
</tr>
<tr>
<th>Annotation</th>
<th>emma:message-id</th>
</tr>
<tr>
<th>Definition</th>
<td>An attribute of type
xsd:anyURI that specifies the
message ID associated with the data.</td>
</tr>
<tr>
<th>Applies to</th>
<td>
emma:endpoint</td>
</tr>
<tr>
<th>Annotation</th>
<th>emma:service-name</th>
</tr>
<tr>
<th>Definition</th>
<td>An attribute of type
xsd:string that specifies the
name of the service.</td>
</tr>
<tr>
<th>Applies to</th>
<td>
emma:endpoint</td>
</tr>
<tr>
<th>Annotation</th>
<th>emma:endpoint-pair-ref</th>
</tr>
<tr>
<th>Definition</th>
<td>An attribute of type
xsd:anyURI that specifies the
pairing between sink and source endpoints.</td>
</tr>
<tr>
<th>Applies to</th>
<td>
emma:endpoint</td>
</tr>
<tr>
<th>Annotation</th>
<th>emma:endpoint-info-ref</th>
</tr>
<tr>
<th>Definition</th>
<td>An attribute of type
xsd:IDREF referring to the
id attribute of an
emma:endpoint-info
element.</td>
</tr>
<tr>
<th>Applies to</th>
<td>
emma:interpretation,
emma:group,
emma:one-of,
emma:sequence, and
application instance data.</td>
</tr>
</tbody>
</table>
<p>The
emma:endpoint-role attribute specifies the role
that the particular
emma:endpoint performs in
multimodal interaction. The role value
sink indicates
that the particular endpoint is the receiver of the input data. The
role value
source indicates that the particular
endpoint is the sender of the input data. The role value
reply-to indicates that the particular
emma:endpoint is the intended endpoint for the reply.
The same
emma:endpoint-address MAY appear in multiple
emma:endpoint elements, provided that the same
endpoint address is used to serve multiple roles, e.g. sink,
source, reply-to, router, etc., or is associated with multiple
interpretations.</p>
<p>The
emma:endpoint-address specifies the network
address of the
emma:endpoint, and
emma:port-type specifies the port type of the
emma:endpoint. The
emma:port-num
annotates the port number of the endpoint (e.g. the typical port
number for an http endpoint is 80). The
emma:message-id annotates the message ID information
associated with the annotated input. This meta information is used
to establish and maintain the communication context for both
inbound processing and outbound operation. The service
specification of the
emma:endpoint is annotated by
emma:service-name which contains the definition of the
service that the
emma:endpoint performs. The matching
of the
sink endpoint and its pairing
source endpoint is annotated by the
emma:endpoint-pair-ref attribute. One sink endpoint
MAY link to multiple source endpoints through
emma:endpoint-pair-ref. Further grouping of
emma:endpoint elements is possible by using the annotation of
emma:group (see
Section
3.3.2).</p>
<p>The
emma:endpoint-info-ref attribute associates the
EMMA result in the container element with an
emma:endpoint-info element.</p>
<p>The following example illustrates the use of these attributes in
multimodal interactions where multiple modalities are used.</p>
<pre class="example">
<emma:emma version="1.0"
xmlns:emma="http://www.w3.org/2003/04/emma"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.w3.org/2003/04/emma
http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
xmlns="http://www.example.com/example"
xmlns:ex="http://www.example.com/emma/port">
<emma:endpoint-info id="audio-channel-1" >
<emma:endpoint id="endpoint1"
emma:endpoint-role="sink"
emma:endpoint-address="135.61.71.103"
emma:port-num="50204"
emma:port-type="rtp"
emma:endpoint-pair-ref="endpoint2"
emma:media-type="audio/dsr-202212; rate:8000; maxptime:40"
emma:service-name="travel"
emma:mode="voice">
<ex:app-protocol>SIP</ex:app-protocol>
</emma:endpoint>
<emma:endpoint id="endpoint2" emma:endpoint-role="source"
emma:endpoint-address="136.62.72.104"
emma:port-num="50204"
emma:port-type="rtp"
emma:endpoint-pair-ref="endpoint1"
emma:media-type="audio/dsr-202212; rate:8000; maxptime:40"
emma:service-name="travel"
emma:mode="voice">
<ex:app-protocol>SIP</ex:app-protocol>
</emma:endpoint>
</emma:endpoint-info>
<emma:endpoint-info id="ink-channel-1">
<emma:endpoint id="endpoint3" emma:endpoint-role="sink"
emma:endpoint-address="http://emma.example/sink"
emma:endpoint-pair-ref="endpoint4"
emma:port-num="80" emma:port-type="http"
emma:message-id="uuid:2e5678"
emma:service-name="travel"
emma:mode="ink"/>
<emma:endpoint id="endpoint4"
emma:endpoint-role="source"
emma:endpoint-address="http://emma.example/source"
emma:endpoint-pair-ref="endpoint3"
emma:port-num="80"
emma:port-type="http"
emma:message-id="uuid:2e5678"
emma:service-name="travel"
emma:mode="ink"/>
</emma:endpoint-info>
<emma:group>
<emma:interpretation id="int1" emma:start="1087995961542"
emma:end="1087995963542"
emma:endpoint-info-ref="audio-channel-1"<br />
emma:medium="acoustic" emma:mode="voice">
<destination>Chicago</destination>
</emma:interpretation>
<emma:interpretation id="int2" emma:start="1087995961542"
emma:end="1087995963542"
emma:endpoint-info-ref="ink-channel-1"<br />
emma:medium="acoustic" emma:mode="voice">
<location>
<type>area</type>
<points>34.13 -37.12 42.13 -37.12 ... </points>
</location>
</emma:interpretation>
</emma:group>
</emma:emma>
</pre>
<h3 id="s4.2.15">4.2.15 Reference to
emma:grammar
element:
emma:grammar-ref attribute</h3>
<table class="defn" summary="property definition" width="98%"
cellpadding="5" cellspacing="0">
<tbody>
<tr>
<th>Annotation</th>
<th>emma:grammar-ref</th>
</tr>
<tr>
<th>Definition</th>
<td>An attribute of type
xsd:IDREF referring to the
id attribute of an
emma:grammar
element<span>.</span></td>
</tr>
<tr>
<th>Applies to</th>
<td>
emma:interpretation,
emma:group,
emma:one-of,
emma:sequence.</td>
</tr>
</tbody>
</table>
<p>The
emma:grammar-ref annotation associates the EMMA
result in the container element with an
emma:grammar
element.</p>
<p>Example:</p>
<pre class="example">
<emma:emma version="1.0"
xmlns:emma="http://www.w3.org/2003/04/emma"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.w3.org/2003/04/emma
http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
xmlns="http://www.example.com/example">
<emma:grammar id="gram1" <span>ref</span>="someURI"/>
<emma:grammar id="gram2" <span>ref</span>="anotherURI"/>
<emma:one-of id="r1"<br />
<span>emma:medium="acoustic" emma:mode="voice"</span>>
<emma:interpretation id="int1" emma:grammar-ref="gram1">
<origin>Boston</origin>
</emma:interpretation>
<emma:interpretation id="int2" emma:grammar-ref="gram1">
<origin>Austin</origin>
</emma:interpretation>
<emma:interpretation id="int3" emma:grammar-ref="gram2">
<command>help</command>
</emma:interpretation>
</emma:one-of>
</emma:emma>
</pre>
<h3 id="s4.2.16">4.2.16 Reference to
emma:model
element:
emma:model-ref attribute</h3>
<table class="defn" summary="property definition" width="98%"
cellpadding="5" cellspacing="0">
<tbody>
<tr>
<th>Annotation</th>
<th>emma:model-ref</th>
</tr>
<tr>
<th>Definition</th>
<td>An attribute of type
xsd:IDREF referring to the
id attribute of an
emma:model
element<span>.</span></td>
</tr>
<tr>
<th>Applies to</th>
<td>
emma:interpretation,
emma:group,
emma:one-of,
emma:sequence, and
application instance data.</td>
</tr>
</tbody>
</table>
<p>The
emma:model-ref annotation associates the EMMA
result in the container element with an
emma:model
element.</p>
<p>Example:</p>
<pre class="example">
<emma:emma version="1.0"
xmlns:emma="http://www.w3.org/2003/04/emma"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.w3.org/2003/04/emma
http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
xmlns="http://www.example.com/example">
<emma:model id="model1" ref="someURI"/>
<emma:model id="model2" ref="anotherURI"/>
<emma:one-of id="r1"<br />
<span>emma:medium="acoustic" emma:mode="voice"</span>>
<emma:interpretation id="int1" emma:model-ref="model1">
<origin>Boston</origin>
</emma:interpretation>
<emma:interpretation id="int2" emma:model-ref="model1">
<origin>Austin</origin>
</emma:interpretation>
<emma:interpretation id="int3" emma:model-ref="model2">
<command>help</command>
</emma:interpretation>
</emma:one-of>
</emma:emma>
</pre>
<h3 id="s4.2.17">4.2.17 Dialog turns:
emma:dialog-turn
attribute</h3>
<table class="defn" summary="property definition" width="98%"
cellpadding="5" cellspacing="0">
<tbody>
<tr>
<th>Annotation</th>
<th>emma:dialog-turn</th>
</tr>
<tr>
<th>Definition</th>
<td>An attribute of type
xsd:string referring to the
dialog turn associated with a given container element.</td>
</tr>
<tr>
<th>Applies to</th>
<td>
emma:interpretation,
emma:group,
emma:one-of, and
emma:sequence.</td>
</tr>
</tbody>
</table>
<p>The
emma:dialog-turn annotation associates the EMMA
result in the container element with a dialog turn. The syntax and
semantics of dialog turns are left open to suit the needs of
individual applications. For example, some applications might use
an integer value, where successive turns are represented by
successive integers. Other applications might combine a name of a
dialog participant with an integer value representing the turn
number for that participant. Ordering semantics for comparison of
emma:dialog-turn is deliberately unspecified and left
for applications to define.</p>
<p>Example:</p>
<pre class="example">
<span>
<emma:emma version="1.0"
emma="http://www.w3.org/2003/04/emma"
xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.w3.org/2003/04/emma
http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
xmlns="http://www.example.com/example">
<emma:interpretation id="int1" emma:dialog-turn="u8"<br />
<span>emma:medium="acoustic" emma:mode="voice"</span>>
<quantity>3</quantity>
</emma:interpretation>
</emma:emma></span>
</pre>
<h2 class="notoc" id="s4.3">4.3 Scope of EMMA annotations</h2>
<p>The
emma:derived-from element (
Section 4.1.2) can be used to capture both sequential
and composite derivations. This section concerns the scope of EMMA
annotations across <span>sequential</span> derivations of user
input connected using the
emma:derived-from element
(
Section 4.1.2). Sequential derivations
involve processing steps that do not involve multimodal
integration, such as applying natural language understanding and
then reference resolution to a speech transcription. EMMA
derivations describe only single turns of user input and are not
intended to describe a sequence of dialog turns.</p>
<p>For example, an EMMA document could contain
emma:interpretation elements for the transcription,
interpretation, and reference resolution of a speech input,
utilizing the
id values:
raw,
better, and
best respectively:</p>
<pre class="example">
<emma:emma version="1.0"
xmlns:emma="http://www.w3.org/2003/04/emma"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.w3.org/2003/04/emma
http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
xmlns="http://www.example.com/example">
<emma:derivation>
<emma:interpretation id="raw"
emma:process="http://example.com/myasr1.xml"
<span>emma:medium="acoustic" emma:mode="voice"</span>>
<answer>From Boston to Denver tomorrow</answer>
</emma:interpretation>
<emma:interpretation id="better"
emma:process="http://example.com/mynlu1.xml">
<emma:derived-from resource="#raw" composite="false"/>
<origin>Boston</origin>
<destination>Denver</destination>
<date>tomorrow</date>
</emma:interpretation>
</emma:derivation>
<emma:interpretation id="best"
emma:process="http://example.com/myrefresolution1.xml">
<emma:derived-from resource="#better" composite="false"/>
<origin>Boston</origin>
<destination>Denver</destination>
<date>03152003</date>
</emma:interpretation>
</emma:emma>
</pre>
<p>Each member of the derivation chain is linked to the previous
one by an
emma:derived-from element (
Section 4.1.2), which has an attribute
resource that provides a pointer to the
emma:interpretation from which it is derived. The
emma:process annotation (
Section
4.2.2) provides a pointer to the process used for each stage of
the derivation.</p>
<p>The following EMMA example represents the same derivation as
above but with a more fully specified set of annotations:</p>
<pre class="example">
<emma:emma version="1.0"
xmlns:emma="http://www.w3.org/2003/04/emma"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.w3.org/2003/04/emma
http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
xmlns="http://www.example.com/example">
<emma:derivation>
<emma:interpretation id="raw"
emma:process="http://example.com/myasr1.xml"
emma:source="http://example.com/microphone/NC-61"
emma:signal="http://example.com/signals/sg23.wav"
emma:confidence="0.6"
emma:medium="acoustic"
emma:mode="voice"
emma:function="dialog"
emma:verbal="true"
emma:tokens="from boston to denver tomorrow"
emma:lang="en-US">
<answer>From Boston to Denver tomorrow</answer>
</emma:interpretation>
<emma:interpretation id="better"
emma:process="http://example.com/mynlu1.xml"
emma:source="http://example.com/microphone/NC-61"
emma:signal="http://example.com/signals/sg23.wav"
emma:confidence="0.8"
emma:medium="acoustic"
emma:mode="voice"
emma:function="dialog"
emma:verbal="true"
emma:tokens="from boston to denver tomorrow"
emma:lang="en-US">
<emma:derived-from resource="#raw" composite="false"/>
<origin>Boston</origin>
<destination>Denver</destination>
<date>tomorrow</date>
</emma:interpretation>
</emma:derivation>
<emma:interpretation id="best"
emma:process="http://example.com/myrefresolution1.xml"
emma:source="http://example.com/microphone/NC-61"
emma:signal="http://example.com/signals/sg23.wav"
emma:confidence="0.8"
emma:medium="acoustic"
emma:mode="voice"
emma:function="dialog"
emma:verbal="true"
emma:tokens="from boston to denver tomorrow"
emma:lang="en-US">
<emma:derived-from resource="#better" composite="false"/>
<origin>Boston</origin>
<destination>Denver</destination>
<date>03152003</date>
</emma:interpretation>
</emma:emma>
</pre>
<p>EMMA annotations on earlier stages of the derivation often
remain accurate at later stages of the derivation. Although this
can be captured in EMMA by repeating the annotations on each
emma:interpretation within the derivation, as in the
example above, there are two disadvantages of this approach to
annotation. First, the repetition of annotations makes the
resulting EMMA documents significantly more verbose. Second, EMMA
processors used for intermediate tasks such as natural language
understanding and reference resolution will need to read in all of
the annotations and write them all out again.</p>
<p>EMMA overcomes these problems by assuming that annotations on
earlier stages of a derivation automatically apply to later stages
of the derivation unless a new value is specified. Later stages of
the derivation essentially inherit annotations from earlier stages
in the derivation. For example, if there was an
emma:source annotation on the transcription
(
raw) it would also apply to the later stages of the
derivation such as the result of natural language understanding
(
better) or reference resolution
(
best).</p>
<p>Because of the assumption in EMMA that annotations have scope
over later stages of a sequential derivation, the example EMMA
document above can be equivalently represented as follows:</p>
<pre class="example">
<emma:emma version="1.0"
xmlns:emma="http://www.w3.org/2003/04/emma"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.w3.org/2003/04/emma
http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
xmlns="http://www.example.com/example">
<emma:derivation>
<emma:interpretation id="raw"
emma:process="http://example.com/myasr1.xml"
emma:source="http://example.com/microphone/NC-61"
emma:signal="http://example.com/signals/sg23.wav"
emma:confidence="0.6"
emma:medium="acoustic"
emma:mode="voice"
emma:function="dialog"
emma:verbal="true"
emma:tokens="from boston to denver tomorrow"
emma:lang="en-US">
<answer>From Boston to Denver tomorrow</answer>
</emma:interpretation>
<emma:interpretation id="better"
emma:process="http://example.com/mynlu1.xml"
emma:confidence="0.8">
<emma:derived-from resource="#raw" composite="false"/>
<origin>Boston</origin>
<destination>Denver</destination>
<date>tomorrow</date>
</emma:interpretation>
</emma:derivation>
<emma:interpretation id="best"
emma:process="http://example.com/myrefresolution1.xml">
<emma:derived-from resource="#better" composite="false"/>
<origin>Boston</origin>
<destination>Denver</destination>
<date>03152003</date>
</emma:interpretation>
</emma:emma>
</pre>
<p>The fully specified derivation illustrated above is equivalent
to the reduced form derivation following it where only annotations
with new values are specified at each stage. These two EMMA
documents MUST yield the same result when processed by an EMMA
processor.</p>
<p>The
emma:confidence annotation is respecified on
the
better interpretation. This indicates the
confidence score for natural language understanding, whereas
emma:confidence on the
raw interpretation
indicates the speech recognition confidence score.</p>
<p>In order to determine the full set of annotations that apply to
an
emma:interpretation element, an EMMA processor or
script needs to access the annotations directly on that element and,
for any that are not specified, follow the reference in the
resource attribute of the
emma:derived-from element to add in annotations from
earlier stages of the derivation.</p>
<p>The EMMA annotations break down into three groups with respect
to their scope in sequential derivations. One group of annotations
always hold<span>s</span> true for all members of a sequential
derivation. A second group <span>is</span> always respecified on
each stage of the derivation. A third group may or may not be
respecified.</p>
<table summary="7 columns" border="1" cellpadding="3" cellspacing=
"0">
<caption>Scope of Annotations in Sequential Derivations</caption>
<tbody>
<tr>
<th>Classification</th>
<th>Annotation</th>
</tr>
<tr>
<td rowspan="16">Applies to whole derivation</td>
<td>
emma:signal</td>
</tr>
<tr>
<td>
<span>emma:signal-size</span></td>
</tr>
<tr>
<td>
<span>emma:dialog-turn</span></td>
</tr>
<tr>
<td>
emma:source</td>
</tr>
<tr>
<td>
emma:medium</td>
</tr>
<tr>
<td>
emma:mode</td>
</tr>
<tr>
<td>
emma:function</td>
</tr>
<tr>
<td>
emma:verbal</td>
</tr>
<tr>
<td>
emma:lang</td>
</tr>
<tr>
<td>
emma:tokens</td>
</tr>
<tr>
<td>
emma:start</td>
</tr>
<tr>
<td>
emma:end</td>
</tr>
<tr>
<td>
emma:time-ref-uri</td>
</tr>
<tr>
<td>
emma:time-ref-anchor-point</td>
</tr>
<tr>
<td>
emma:offset-to-start</td>
</tr>
<tr>
<td>
emma:duration</td>
</tr>
<tr>
<td rowspan="2">Specified at each stage of derivation</td>
<td>
emma:derived-from</td>
</tr>
<tr>
<td>
emma:process</td>
</tr>
<tr>
<td rowspan="6">May be respecified</td>
<td>
emma:confidence</td>
</tr>
<tr>
<td>
emma:cost</td>
</tr>
<tr>
<td>
emma:grammar-ref</td>
</tr>
<tr>
<td>
emma:model-ref</td>
</tr>
<tr>
<td>
emma:no-input</td>
</tr>
<tr>
<td>
emma:uninterpreted</td>
</tr>
</tbody>
</table>
<p>One potential problem with this annotation scoping mechanism is
that earlier annotations could be lost if earlier stages of a
derivation were dropped in order to reduce message size. This
problem can be overcome by considering annotation scope at the
point where earlier derivation stages are discarded and populating
the final interpretation in the derivation with all of the
annotations which it could inherit. For example, if the
raw and
better stages were dropped the
resulting EMMA document would be:</p>
<pre class="example">
<emma:emma version="1.0"
xmlns:emma="http://www.w3.org/2003/04/emma"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.w3.org/2003/04/emma
http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
xmlns="http://www.example.com/example">
<emma:interpretation id="best"
emma:start="1087995961542"
emma:end="1087995963542"
emma:process="http://example.com/myrefresolution1.xml"
emma:source="http://example.com/microphone/NC-61"
emma:signal="http://example.com/signals/sg23.wav"
emma:confidence="0.8"
emma:medium="acoustic"
emma:mode="voice"
emma:function="dialog"
emma:verbal="true"
emma:tokens="from boston to denver tomorrow"
emma:lang="en-US">
<emma:derived-from resource="#better" composite="false"/>
<origin>Boston</origin>
<destination>Denver</destination>
<date>03152003</date>
</emma:interpretation>
</emma:emma>
</pre>
<p>Annotations on an
emma:one-of element are assumed
to apply to all of the container elements within the
emma:one-of.</p>
<p>If
emma:one-of appears within another
emma:one-of, then annotations on the parent
emma:one-of are assumed to apply to the children of
the child
emma:one-of.</p>
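<p>For instance, in the following non-normative sketch the
emma:medium and
emma:mode annotations on the outer
emma:one-of apply to the interpretations inside the inner
emma:one-of as well:</p>
<pre class="example">
<emma:emma version="1.0"
    xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.w3.org/2003/04/emma
     http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
    xmlns="http://www.example.com/example">
  <emma:one-of id="outer"
      emma:medium="acoustic" emma:mode="voice">
    <emma:one-of id="inner">
      <emma:interpretation id="int1">
        <origin>Boston</origin>
      </emma:interpretation>
      <emma:interpretation id="int2">
        <origin>Austin</origin>
      </emma:interpretation>
    </emma:one-of>
    <emma:interpretation id="int3">
      <command>help</command>
    </emma:interpretation>
  </emma:one-of>
</emma:emma>
</pre>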
<p>Annotations on
emma:group or
emma:sequence do not apply to their child
elements.</p>
<h2 id="s5">5. Conformance</h2>
<p>The contents of this section are normative.</p>
<h3 id="s5.1">5.1 Conforming EMMA Documents</h3>
<p>A document is a Conforming EMMA Document if it meets both the
following conditions:</p>
<ul>
<li>It is a well-formed XML document [XML]
conforming to Namespaces in XML [XMLNS].</li>
<li>It adheres to the specification described in this document
(EMMA Specification), including the constraints expressed in the
Schema (see Appendix A), and has an XML
Prolog and root element as specified in Section
3.1.</li>
</ul>
<p>The EMMA specification and these conformance criteria provide no
designated size limits on any aspect of EMMA documents. There are
no maximum values on the number of elements, the amount of
character data, or the number of characters in attribute
values.</p>
<p><span>Within this specification, the term URI refers to a
Uniform Resource Identifier as defined in [
RFC3986] and extended in [
RFC3987] with the new name IRI. The term URI has
been retained in preference to IRI to avoid introducing new names
for concepts such as "Base URI" that are defined or referenced
across the whole family of XML specifications</span>.</p>
<h3 id="s5.2">5.2 Using EMMA with other Namespaces</h3>
<p>The EMMA namespace is intended to be used with other XML
namespaces as per the Namespaces in XML Recommendation [
XMLNS]. Future work by W3C is expected to address ways
to specify conformance for documents involving multiple
namespaces.</p>
<h3 id="s5.3">5.3 Conforming EMMA Processors</h3>
<p>An EMMA processor is a program that can process and/or generate
Conforming EMMA documents.</p>
<p>In a Conforming EMMA Processor, the XML parser MUST be able to
parse and process all XML constructs defined by XML 1.1 [
XML] and Namespaces in XML [
XMLNS].
It is not required that a Conforming EMMA Processor use a
validating XML parser.</p>
<p>A Conforming EMMA Processor MUST correctly understand and apply
the semantics of each markup element or attribute as described by
this document.</p>
<p>There is, however, no conformance requirement with respect to
performance characteristics of the EMMA Processor. For instance, no
statement is required regarding the accuracy, speed or other
characteristics of output produced by the processor. No statement
is made regarding the size of input that an EMMA Processor is
required to support.</p>
<h2 id="appendices">Appendices</h2>
<h3 id="appA">Appendix A. XML and <span>RELAX NG</span>
schemata</h3>
<p>This section is Normative.</p>
<p>This section defines the formal syntax for EMMA documents in
terms of a normative XML Schema.</p>
<p>There are both an XML Schema and a <span>RELAX NG</span> Schema
for the EMMA markup. The latest version of the XML Schema for EMMA
is available at
http://www.w3.org/TR/emma/emma.xsd
and the RELAX NG Schema can be found at
http://www.w3.org/TR/emma/emma.rng.</p>
<p>For stability it is RECOMMENDED that you use the dated URIs
available at
http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd
and
http://www.w3.org/TR/2009/REC-emma-20090210/emma.rng.</p>
<h2 id="appB">Appendix B. MIME type</h2>
<p>This section is <span>N</span>ormative.</p>
<p>This appendix registers a new MIME media type,
"
application/emma+xml".</p>
<p>The "
application/emma+xml" media type is
registered with IANA at
http://www.iana.org/assignments/media-types/application/.
</p>
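<p>For instance (an illustrative sketch, not part of the
registration itself), a server delivering an EMMA document over
HTTP might label it as follows:</p>
<pre class="example">
HTTP/1.1 200 OK
Content-Type: application/emma+xml; charset=utf-8

<?xml version="1.0" encoding="UTF-8"?>
<emma:emma version="1.0" ...>
  ...
</emma:emma>
</pre>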
<div>
<h3 id="media-type-registration">B.1 Registration of MIME media
type application/emma+xml</h3>
<dl>
<dt>MIME media type name:</dt>
<dd>
<p>
application</p>
</dd>
<dt>MIME subtype name:</dt>
<dd>
<p>
emma+xml</p>
</dd>
<dt>Required parameters:</dt>
<dd>
<p>None.</p>
</dd>
<dt>Optional parameters:</dt>
<dd>
<dl>
<dt>
charset</dt>
<dd>
<p>This parameter has identical semantics to the
charset parameter of the
application/xml
media type as specified in [
RFC3023] or its
successor.</p>
</dd>
</dl>
</dd>
<dt>Encoding considerations:</dt>
<dd>
<p>By virtue of EMMA content being XML, it has the same
considerations when sent as
"application/emma+xml" as
does XML. See RFC 3023 (or its successor), section 3.2.</p>
</dd>
<dt>Security considerations:</dt>
<dd>
<p>Several features of EMMA require dereferencing arbitrary URIs.
Implementers are advised to heed the security issues of [
RFC3986] section 7.</p>
<p>In addition, because of the extensibility features for EMMA, it
is possible that "
application/emma+xml" will describe
content that has security implications beyond those described here.
However, if the processor follows only the normative semantics of
this specification, this content will be ignored. Only in the case
where the processor recognizes and processes the additional
content, or where further processing of that content is dispatched
to other processors, would security issues potentially arise. And
in that case, they would fall outside the domain of this
registration document.</p>
</dd>
<dt>Interoperability considerations:</dt>
<dd>
<p>This specification describes processing semantics that dictate
the required behavior for dealing with, among other things,
unrecognized elements.</p>
<p>Because EMMA is extensible, conformant
"
application/emma+xml" processors MAY expect that
content received is well-formed XML, but processors SHOULD NOT
assume that the content is valid EMMA or expect to recognize all of
the elements and attributes in the document.</p>
</dd>
<dt>Published specification:</dt>
<dd>
<p>
This media type registration is extracted from Appendix B of the
"
EMMA: Extensible MultiModal Annotation markup language"
specification.
</p>
</dd>
<dt>Additional information:</dt>
<dd>
<dl>
<dt>Magic number(s):</dt>
<dd>
<p>There is no single initial octet sequence that is always present
in EMMA documents.</p>
</dd>
<dt>File extension(s):</dt>
<dd>
<p>EMMA documents are most often identified with the extension
".emma".</p>
</dd>
<dt>Macintosh File Type Code(s):</dt>
<dd>
<p>TEXT</p>
</dd>
</dl>
</dd>
<dt>Person & email address to contact for further
information:</dt>
<dd>
<p>Kazuyuki Ashimura, <
[email protected]>.</p>
</dd>
<dt>Intended usage:</dt>
<dd>
<p>COMMON</p>
</dd>
<dt>Author/Change controller:</dt>
<dd>
<p>The EMMA specification is a work product of the World Wide Web
Consortium's Multimodal Interaction Working Group. The W3C has
change control over these specifications.</p>
</dd>
</dl>
</div>
<h2 id="appC">Appendix C.
emma:hook and SRGS</h2>
<p>This section is <span>I</span>nformative.</p>
<div>
<p>One of the most powerful aspects of multimodal interfaces is
their ability to provide support for user inputs which are
distributed over the available input modes. These
composite
inputs are contributions made by the user within a single turn
which have component parts in different modes. For example, the
user might say "zoom in here" in the speech mode while drawing an
area on a graphical display in the ink mode. One of the central
motivating factors for this kind of input is that different kinds
of communicative content are best suited to different input modes.
In the example of a user drawing an area on a map and saying "zoom
in here", the zoom command is easiest to provide in speech but the
spatial information, the specific area, is easier to provide in
ink.</p>
<p>Enabling composite multimodality is critical in ensuring that
multimodal systems support more natural and effective interaction
for users. In order to support composite inputs, a multimodal
architecture must provide some kind of multimodal integration
mechanism. In the W3C Multimodal Interaction Framework
<span>[
MMI Framework]</span>, multimodal
integration can be handled by an integration component which
follows the application of speech understanding and other kinds of
interpretation procedures for individual modes.</p>
<p>Given the broad range of different techniques being employed for
multimodal integration and the extent to which this is an ongoing
research problem, standardization of the specific method or
algorithm used for multimodal integration is not appropriate at
this time. In order to facilitate the development and
interoperation of different multimodal integration mechanisms, EMMA
provides markup enabling application-independent
specification of the elements in the application markup where content
from another mode needs to be integrated. These representation
'hooks' can then be used by different kinds of multimodal
integration components and algorithms to drive the process of
multimodal integration. In the processing of a composite multimodal
input, the result of applying a mode-specific interpretation
component to each of the individual modes will be EMMA markup
describing the possible interpretation of that input.</p>
</div>
<p>One way to build an EMMA representation of a spoken input such
as "zoom in here" is to use grammar rules in the W3C Speech
Recognition Grammar Specification [
SRGS] using
the Semantic Interpretation <span>[
SISR]</span>
tags to build the application semantics, including the
emma:hook attribute. In this approach, <span>[
ECMAScript]</span> is used within the tags to build
up an object representing the semantics. The resulting ECMAScript
object is then translated to XML.</p>
<p>For our example case of "zoom in here". The following SRGS rule
could be used. The <span>Semantic Interpretation for Speech
Recognition</span> specification <span>[
SISR]</span> provides a reserved property
_nsprefix for indicating the namespace to be used with an
attribute.</p>
<pre class="example">
<rule id="zoom">
zoom in here
<tag>
$.command = new Object();
$.command.action = "zoom";
$.command.location = new Object();
$.command.location._attributes = new Object();
$.command.location._attributes.hook = new Object();
$.command.location._attributes.hook._nsprefix = "emma";
$.command.location._attributes.hook._value = "ink";
$.command.location.type = "area";
</tag>
</rule>
</pre>
<p>Application of this rule will result in the following ECMAScript
object being built.</p>
<pre class="example">
command: {
action: "zoom"
location: {
_attributes: {
hook: {
_nsprefix: "emma"
_value: "ink"
}
}
type: "area"
}
}
</pre>
<p>
SISR processing in an XML environment would
generate the following document:</p>
<pre class="example">
<command>
<action>zoom</action>
<location emma:hook="ink">
<type>area</type>
</location>
</command>
</pre>
<p>This XML fragment might then appear within an EMMA document as
follows:</p>
<pre class="example">
<emma:emma version="1.0"
xmlns:emma="http://www.w3.org/2003/04/emma"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.w3.org/2003/04/emma
http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
xmlns="http://www.example.com/example">
<emma:interpretation id="voice1"
emma:medium="acoustic"
emma:mode="voice">
<command>
<action>zoom</action>
<location emma:hook="ink">
<type>area</type>
</location>
</command>
</emma:interpretation>
</emma:emma>
</pre>
<p>The
emma:hook annotation indicates that this speech
input needs to be combined with ink input such as the
following:</p>
<pre class="example">
<emma:emma version="1.0"
xmlns:emma="http://www.w3.org/2003/04/emma"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.w3.org/2003/04/emma
http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
xmlns="http://www.example.com/example">
<emma:interpretation id="pen1"
emma:medium="tactile"
emma:mode="ink">
<location>
<type>area</type>
<points>42.1345 -37.128 42.1346 -37.120 ... </points>
</location>
</emma:interpretation>
</emma:emma>
</pre>
<p>This representation could be generated by a pen modality
component performing gesture recognition and interpretation. The
input to the component would be an <span>Ink Markup Language</span>
specification <span>[
INKML]</span> of the ink
trace and the output would be the EMMA document above.</p>
<p>The combination will result in the following EMMA document for
the combined speech and pen multimodal input.</p>
<pre class="example">
<emma:emma version="1.0"
xmlns:emma="http://www.w3.org/2003/04/emma"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.w3.org/2003/04/emma
http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
xmlns="http://www.example.com/example">
<emma:interpretation
emma:medium="acoustic tactile"
emma:mode="<span>voice ink</span>"
emma:process="http://example.com/myintegrator.xml">
<emma:derived-from resource="<span>http://example.com/voice1.emma/</span>#voice1" composite="true"/>
<emma:derived-from resource="<span>http://example.com/pen1.emma/</span>#pen1" composite="true"/>
<command>
<action>zoom</action>
<location>
<type>area</type>
<points>42.1345 -37.128 42.1346 -37.120 ... </points>
</location>
</command>
</emma:interpretation>
</emma:emma>
</pre>
<div>
<p>There are two components to the process of integrating these two
pieces of semantic markup. The first is to ensure that the two are
compatible; that is, that no semantic constraints are violated. The
second is to fuse the content from the two sources. In our example,
the
<type>area</type> element is intended
to indicate that this speech command requires integration with an
area gesture rather than, for example, a line gesture, which would
have the subelement
<type>line</type>.
This constraint needs to be enforced by whatever mechanism is
responsible for multimodal integration.</p>
<p>Many different techniques could be used for achieving this
integration of the semantic interpretation of the pen input, a
<location> element, with the corresponding
<location> element in the speech. The
<span>
emma:hook</span> annotation simply serves to indicate the
existence of this relationship.</p>
<p>One way to achieve both the compatibility checking and fusion of
content from the two modes is to use a well-defined general purpose
matching mechanism such as unification. <span>Graph unification
[</span>
Graph
unification<span>]</span> is a mathematical operation defined
over directed acyclic graphs which captures both of the components
of integration in a single operation: the application of the
semantic constraints and the fusing of content. One possible
semantics for the
emma:hook markup indicates that
content from the required mode needs to be unified with that
position in the application semantics. In order to unify, two
elements must not have any conflicting values for subelements or
attributes. This procedure can be defined recursively so that
elements within the subelements must also not clash and so on. The
result of unification is the union of all of the elements and
attributes of the two elements that are being unified.</p>
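<p>As a concrete, purely illustrative sketch of this operation, the
following ECMAScript unifies two semantic objects encoded in the
plain-object style of the SISR example above, returning the union of
their properties or null when subelements clash; this is not part of
the specification:</p>
<pre class="example">
// Illustrative recursive unification over plain ECMAScript objects.
// Atomic values must match exactly; otherwise the result is the
// union of the properties of the two inputs.
function unify(a, b) {
  if (a === undefined) return b;
  if (b === undefined) return a;
  if (a === null || b === null ||
      typeof a !== "object" || typeof b !== "object") {
    return a === b ? a : null;   // clash unless identical atoms
  }
  var result = {}, keys = {}, k;
  for (k in a) keys[k] = true;
  for (k in b) keys[k] = true;
  for (k in keys) {
    var u = unify(a[k], b[k]);
    if (u === null) return null; // conflicting subelements: failure
    result[k] = u;
  }
  return result;
}

// E.g. fusing the speech and ink <location> contents from above
// (the speech side is the emma:hook site):
var speech = { type: "area" };
var ink    = { type: "area", points: "42.1345 -37.128 ..." };
var fused  = unify(speech, ink);
// fused is { type: "area", points: "42.1345 -37.128 ..." }
</pre>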
<p>In addition to the unification operation, in the resulting
emma:interpretation the
emma:hook
attribute needs to be removed and the
emma:mode
attribute changed to <span>the list of the modes of the individual
inputs</span> <span>, e.g.
"voice ink"</span>.</p>
<p>Instead of the unification operation, for a specific application
semantics, integration could be achieved using some other algorithm
or script. The benefit of using the unification semantics for
emma:hook is that it provides a general purpose
mechanism for checking the compatibility of elements and fusing
them, whatever the specific elements are in the application
specific semantic representation.</p>
<p>The benefit of using the
emma:hook annotation for
authors is that it provides an application independent method for
indicating where integration with content from another mode is
required. If a general purpose integration mechanism is used, such
as the unification approach described above, authors should be able
to use the same integration mechanism for a range of different
applications without having to change the integration rules or
logic. For each application the speech grammar rules [
SRGS] need to assign
emma:hook to the
appropriate elements in the semantic representation of the speech.
The general purpose multimodal integration mechanism will use the
emma:hook annotations in order to determine where to
add in content from other modes. Another benefit of the
emma:hook mechanism is that it facilitates
interoperability among different multimodal integration components,
so long as they are all general purpose and utilize
emma:hook in order to determine where to integrate
content.</p>
<p>The following provides a more detailed example of the use of the
emma:hook annotation. In this example, spoken input is
combined with two <span>ink</span> gestures. The semantic
representation assigned to the spoken input "send this file to
this" indicates two locations where content is required from ink
input using
emma:hook="ink":</p>
<pre class="example">
<emma:emma version="1.0"
xmlns:emma="http://www.w3.org/2003/04/emma"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.w3.org/2003/04/emma
http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
xmlns="http://www.example.com/example">
<emma:interpretation<span> id="voice2"
emma:medium="acoustic"
emma:mode="voice"
emma:tokens="send this file to this"
emma:start="1087995961500"
emma:end="1087995963542"</span>>
<command>
<action>send</action>
<arg1>
<object emma:hook="ink">
<type>file</type>
<number>1</number>
</object>
</arg1>
<arg2>
<object emma:hook="ink">
<number>1</number>
</object>
</arg2>
</command>
</emma:interpretation>
</emma:emma>
</pre>
<p>The user's gestures on the two locations on the display can be
represented using
emma:sequence:</p>
<pre class="example">
<emma:emma version="1.0"
xmlns:emma="http://www.w3.org/2003/04/emma"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.w3.org/2003/04/emma
http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
xmlns="http://www.example.com/example">
<emma:sequence<span> id="ink2"</span>>
<emma:interpretation <span>emma:start="1087995960500"
emma:end="1087995960900"<br />
emma:medium="tactile"
emma:mode="ink"</span>>
<object>
<type>file</type>
<number>1</number>
<id>test.pdf</id>
</object>
</emma:interpretation>
<emma:interpretation <span>emma:start="1087995961000"
emma:end="1087995961100"<br />
emma:medium="tactile"
emma:mode="ink"</span>>
<object>
<type>printer</type>
<number>1</number>
<id>lpt1</id>
</object>
</emma:interpretation>
</emma:sequence>
</emma:emma>
</pre>
<p>A general purpose unification-based multimodal integration
algorithm could use the
emma:hook annotation as
follows. It identifies the elements marked with
emma:hook in document order. For each of those in
turn, it attempts to unify the element with the corresponding
element, in order, in the
emma:sequence (a sketch of such a pairing loop follows the
composite result below). Since none of
the subelements conflict, each unification succeeds, and the
result is the following EMMA document for the composite input:</p>
<pre class="example">
<emma:emma version="1.0"
xmlns:emma="http://www.w3.org/2003/04/emma"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.w3.org/2003/04/emma
http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
xmlns="http://www.example.com/example">
<emma:interpretation<span> id="multimodal2"
emma:medium="acoustic tactile"
emma:mode="voice ink"
emma:tokens="send this file to this"
emma:process="http://example.com/myintegration.xml"
emma:start="1087995960500"
emma:end="1087995963542"</span>>
<emma:derived-from resource="<span>http://example.com/voice2.emma/</span>#voice2" composite="true"/>
<emma:derived-from resource="<span>http://example.com/ink2.emma/</span>#ink2" composite="true"/>
<command>
<action>send</action>
<arg1>
<object>
<type>file</type>
<number>1</number>
<id>test.pdf</id>
</object>
</arg1>
<arg2>
<object>
<type>printer</type>
<number>1</number>
<id>lpt1</id>
</object>
</arg2>
</command>
</emma:interpretation>
</emma:emma>
</pre></div>
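<p>The pairing loop itself might be sketched as follows, again in
purely illustrative ECMAScript reusing the unify function from the
sketch above; the plain-object encoding of the elements is an
assumption of the sketch, not of EMMA:</p>
<pre class="example">
// Illustrative driver: pair each emma:hook-marked element, in
// document order, with the interpretations of the emma:sequence,
// in order, and unify each pair.
function integrate(hooks, gestures) {
  if (hooks.length !== gestures.length) return null; // arity mismatch
  var fused = [];
  for (var i = 0; i < hooks.length; i++) {
    var u = unify(hooks[i], gestures[i]);
    if (u === null) return null;  // a semantic constraint was violated
    fused.push(u);                // union of spoken and gestured content
  }
  return fused;
}

// The two <object> contents from the speech and the two ink
// interpretations above:
var hooks = [ { type: "file", number: "1" },
              { number: "1" } ];
var gestures = [ { type: "file",    number: "1", id: "test.pdf" },
                 { type: "printer", number: "1", id: "lpt1" } ];
var result = integrate(hooks, gestures);
// result[0] is { type: "file", number: "1", id: "test.pdf" }
// result[1] is { type: "printer", number: "1", id: "lpt1" }
</pre>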
<h2 id="appD">Appendix D. EMMA event interface</h2>
<p>This section is <span>I</span>nformative.</p>
<p>The W3C Document Object Model [
DOM] defines
platform- and language-neutral interfaces that give programs and
scripts the means to dynamically access and update the content,
structure and style of documents. DOM Events define a generic event
system which allows registration of event handlers, describes event
flow through a tree structure, and provides basic contextual
information for each event.</p>
<p>This section of the EMMA specification extends the DOM Event
interface for use with events that describe interpreted user input
in terms of a DOM Node for an EMMA document.</p>
<pre class="example">
// File: emma.idl
#ifndef _EMMA_IDL_
#define _EMMA_IDL_
#include "dom.idl"#include "views.idl"#include "events.idl"
#pragma prefix "dom.w3c.org"module emma
{
typedef dom::DOMString DOMString;
typedef dom::Node Node;
interface EMMAEvent : events::UIEvent {
readonly attribute dom::Node node;
void initEMMAEvent(in DOMString typeArg,
in boolean canBubbleArg,
in boolean cancelableArg,
in Node node);
};
};
#endif // _EMMA_IDL_
</pre>
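<p>For illustration only, an ECMAScript consumer of this interface
might look like the following. The "emmainput" event type name and
the someInputComponent target are assumptions of the sketch, since
the specification defines only the shape of the event object:</p>
<pre class="example">
// Hypothetical listener for an event implementing EMMAEvent; the
// "emmainput" type name and the event target are assumptions, not
// part of this specification.
function onEMMAInput(evt) {
  // evt.node is a DOM Node for the EMMA document; here it is assumed
  // to be the emma:emma element itself.
  var interps = evt.node.getElementsByTagNameNS(
      "http://www.w3.org/2003/04/emma", "interpretation");
  for (var i = 0; i < interps.length; i++) {
    var conf = interps[i].getAttributeNS(
        "http://www.w3.org/2003/04/emma", "confidence");
    // ... forward the interpretation to the interaction manager ...
  }
}
someInputComponent.addEventListener("emmainput", onEMMAInput, false);
</pre>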
<h2 id="appE">Appendix E. References</h2>
<h3 id="appE1">E.1 Normative references</h3>
<dl>
<dt id="BCP47">BCP47</dt>
<dd>A. Phillips and M. Davis, editors.
Tags for the
Identification of Languages, IETF, September 2006.</dd>
<dt id="RFC3023">RFC3023</dt>
<dd>M. Murata et al.<span>,</span> editors.
XML Media Types. IETF RFC
3023<span>, January 2001</span>.</dd>
<dt id="RFC2046">RFC2046</dt>
<dd>N. Freed and N. Borenstein<span>,</span> editors.
Multipurpose Internet Mail
Extensions (MIME) Part Two: Media Types. IETF RFC 2046<span>,
November 1996</span>.</dd>
<dt id="RFC2119">RFC2119</dt>
<dd>S. Bradner, <span>e</span>ditor.
Key words for use in RFCs to
Indicate Requirement Levels, IETF <span>RFC 2119</span>, March
1997.</dd>
<dt id="RFC3986">RFC3986</dt>
<dd>T. Berners-Lee et al.<span>,</span> editors.
Uniform Resource Identifier
(URI): Generic Syntax. IETF RFC 3986<span>, January
2005</span>.</dd>
<dt id="RFC3987">RFC3987</dt>
<dd>M. Duerst and M. Suignard<span>,</span> editors.
Internationalized Resource
Identifiers (IRIs). IETF RFC 3987<span>, January
2005</span>.</dd>
<dt id="XML">XML</dt>
<dd>Tim Bray <span>et al.,</span> editors.
Extensible Markup
Language (XML) 1.1. World Wide Web Consortium, <span>W3C
Recommendation,</span> 2004.</dd>
<dt id="XMLNS">XMLNS</dt>
<dd>Tim Bray <span>et al.</span>, editors<span>.</span>
Namespaces in XML 1.1,
World Wide Web Consortium, <span>W3C Recommendation,</span>
200<span>6</span>.</dd>
<dt id="XSD1">XML Schema Structures</dt>
<dd>Henry S. Thompson <span>et al.</span>, editors.
XML Schema Part 1: Structures
Second Edition, World Wide Web Consortium<span>, W3C
Recommendation</span>, 2004.</dd>
<dt id="XSD2">XML Schema Datatypes</dt>
<dd>Paul V. Biron <span>and</span> Ashok Malhotra, editors.
XML Schema Part 2:
Datatypes Second Edition, World Wide Web Consortium, <span>W3C
Recommendation,</span> 2004.</dd>
</dl>
<h3 id="appE2">E.2 Informative references</h3>
<dl>
<dt id="DOM">DOM</dt>
<dd>
Document Object Model,
World Wide Web Consortium, 2005.</dd>
<dt id="ECMASCRIPT">ECMAScript</dt>
<dd>
ECMAScript Language Specification, Standard ECMA-262,
ECMA International.</dd>
<dt id="InkML">INKML</dt>
<dd>Yi-Min Chee, Max Froumentin, Stephen M. Watt, editors.
Ink Markup Language (InkML),
World Wide Web Consortium, W3C Working Draft, 2006.</dd>
<dt id="SI">SI<span>SR</span></dt>
<dd>Luc Van Tichelen <span>and Dave Burke</span>,
editor<span>s</span>.
Semantic
Interpretation for Speech Recognition, World Wide Web
Consortium, <span>W3C Proposed Recommendation, 2007</span>.</dd>
<dt id="SRGS">SRGS</dt>
<dd>Andrew Hunt, Scott McGlashan, editors.
Speech Recognition Grammar
Specification Version 1.0, World Wide Web Consortium<span>, W3C
Recommendation,</span> 2004.</dd>
<dt id="XFORMS">XFORMS</dt>
<dd><span>John M. Boyer et al., editors.</span>
XForms <span>1.0
(Second Edition)</span>, World Wide Web Consortium, <span>W3C
Recommendation,</span> 2006.</dd>
<dt id="RELAXNG">RELAX-NG</dt>
<dd><span>James Clark and Makoto Murata, editors.</span>
<span>
RELAX NG Specification</span><span>, OASIS, Committee
Specification, 2001.</span></dd>
<dt id="EMMAreqs">EMMA Requirements</dt>
<dd>Stephane H. Maes and Stephen Potter, editors.
Requirements for EMMA, World
Wide Web Consortium, <span>W3C Note,</span> 2003<span>.</span></dd>
<dt id="graphunification">Graph Unification</dt>
<dd>Bob Carpenter. <cite>The Logic of Typed Feature
Structures</cite>, Cambridge Tracts in Theoretical Computer Science
32, Cambridge University Press, 1992.</dd>
<dd>Kevin Knight. <cite>Unification: A Multidisciplinary
Survey</cite>, ACM Computing Surveys, 21(1), 1989.</dd>
<dd>Michael Johnston. <cite>Unification-based Multimodal
Parsing</cite>, Proceedings of Association for Computational
Linguistics, pp. 624-630, 1998.</dd>
<dt id="MMIF">MMI Framework</dt>
<dd>James A. Larson, T.V. Raman and Dave Raggett, editors.
W3C Multimodal Interaction
Framework, World Wide Web Consortium<span>, W3C Note</span>,
2003<span>.</span></dd>
<dt id="MMIreqs">MMI Requirements</dt>
<dd>Stephane H. Maes and Vijay Saraswat, editors.
Multimodal Interaction
Requirements, World Wide Web Consortium<span>, W3C Note</span>,
2003<span>.</span></dd>
</dl>
<h2 id="appF">Appendix F. Changes since last draft</h2>
<p>This section is <span>I</span>nformative.</p>
<p>
Since the publication of the Proposed Recommendation of the EMMA
specification, the following minor editorial changes have been
made to the draft.
</p>
<ul>
<li>
Fixed wrong style of text.
(1.2 Terminology)</li>
<li>
Changed the schemaLocation URI in the examples
from
"http://www.w3.org/TR/2008/PR-emma-20081215/"
to
"http://www.w3.org/TR/2009/REC-emma-20090210/".
(2. Structure of EMMA documents,
3. EMMA structural elements
and
4. EMMA annotations)</li>
<li>
Changed the note on the status of the MIME type registration from
"being submitted to the IESG for review, approval, and registration
with IANA" to "registered with IANA at
http://www.iana.org/assignments/media-types/application/" because
the EMMA MIME type is now registered with IANA.
(Appendix B)</li>
</ul>
<h2 id="appG">Appendix G. Acknowledgements</h2>
<p>This section is <span>I</span>nformative.</p>
<p>The editors would like to recognize the contributions of the
current and former members of the W3C Multimodal Interaction
Working Group (listed in alphabetical order):</p>
<dl>
<dd>Kazuyuki Ashimura, W3C</dd>
<dd>Patrizio Bergallo, (until 2008, while at Loquendo)</dd>
<dd>Wu Chou, Avaya</dd>
<dd>Max Froumentin, (until 2006, while at W3C)</dd>
<dd>Katriina Halonen, Nokia</dd>
<dd>Jin Liu, T-Systems</dd>
<dd>Roberto Pieraccini, Speechcycle</dd>
<dd>Stephen Potter, Microsoft</dd>
<dd>Massimo Romanelli, DFKI</dd>
<dd>Yuan Shao, Canon</dd>
</dl>
<script type="application/javascript" src="https://www.w3.org/scripts/TR/fixup.js"></script></body>
</html>