absurd!
My name is James Salsman; for several years I've been a member of
Computer Professionals for Social Responsibility (where I founded
the Education Working Group, and am Education Webmaster) and
the Green Party, and a vocal and financial supporter of
the Taxpayer Assets Project. I also
volunteer as a mentor at Smart Valley, Inc.'s SmartSchools
project [www.svi.org].
The company for which I've worked for the last few months,
WebTV Networks, was recently purchased by Microsoft, so I'm
qualified to make some comments. To wit, this whole
protest is absurd and has shaken my trust in Ralph Nader
and Jamie Love. I think NT is a piece of crap (this is a
personal view and doesn't reflect the views of my employer,
of course; my prediction is that Microsoft will eventually
re-sell a version of Linux under its own brand), but it's not
so bad that the market is rejecting it. I also think Java
is mostly useless and the Java debate irrelevant. Seriously,
the ratio of Java users to Perl users is like one to 100, so
the whole controversy simply doesn't matter. And as for
unfair trade practices, if WebTV weren't allowed to bundle
the web browser with the operating system, our customers would
have to plug in PROM chips instead of downloading upgrades
from our servers, so put that in your attaché cases and
litigate it!!!
As for non-participation in the standards process, that is
even more absurd. I know one of the Microsoft representatives
on the W3C HTML Working Group from my days at Carnegie Mellon,
and he has been working very hard to port IE 4 to the Mac.
Furthermore, attached is a draft RFC which I submitted just
yesterday to solve a problem which was completely ignored when
I proposed it while working at Netscape.
Sincere regards,
--
:James Salsman, WebTV Networks, 650-614-8465
INTERNET-DRAFT                                          J. Salsman
Suggested filename: <draft-www-device-upload-00.txt>    WebTV Networks
Expiration date: 15 May 1998                            12 November 1997
Form-based Device Input and Upload in HTML
Status of this Memo
This draft extends an experimental protocol for the Internet
community. This draft does not specify an Internet standard of any
kind. Discussion and suggestions for improvement are requested.
Distribution of this memo will be unlimited when the W3C approves the
HTML 4.0 standard. Until then, please do not distribute this draft
beyond your department.
1. Abstract and introduction
Currently, HTML forms allow the producer of the form to request
information, including files of data, from the operator reading
the form. However, this capability is limited because HTML forms
provide no way to ask the operator to submit input from arbitrary
sources such as audio devices like microphones. Since input and
upload from various devices would benefit many applications, this
draft proposes an extension to the HTML INPUT TYPE=FILE form element
specified in RFC 1867 that allows information providers to request
uploads from audio and other devices uniformly, and discusses the
MIME audio data types that make audio upload responses useful. This
draft also discusses security, audio usability, and audio quality,
describes a backward compatibility strategy that allows new user
agents to use HTML written with earlier proposals for audio input in
mind, and concludes with motivations.
This document is an Internet-Draft. Internet-Drafts are working
documents of the Internet Engineering Task Force (IETF), its
areas, and its working groups. Note that other groups may also
distribute working documents as Internet-Drafts.
Internet-Drafts are draft documents valid for a maximum of six
months and may be updated, replaced, or obsoleted by other
documents at any time. It is inappropriate to use Internet-
Drafts as reference material or to cite them other than as
"work in progress."
To view the entire list of current Internet-Drafts, please check
the "1id-abstracts.txt" listing contained in the Internet-Drafts
Shadow Directories on ftp.is.co.za (Africa), ftp.nordu.net
(Europe), munnari.oz.au (Pacific Rim), ds.internic.net (US East
Coast), or ftp.isi.edu (US West Coast).
2. HTML forms with device input file upload submission
Section 3.1 of RFC 1867 provides for the presentation of an
arbitrary "widget" to specify input for file uploads. When an
INPUT tag of type FILE is encountered with a DEVICE attribute, the
associated value (such as MICROPHONE or MIC) might select a widget
capable of buffering and editing real-time input (such as speech)
instead of entering a file selection mode.
If an ACCEPT attribute is present in a device file input element,
the browser might constrain the MIME type of the uploaded data to
one of the types listed. If the value of the DEVICE attribute is
FILESYSTEM or FILES, then the INPUT element might be treated as
usual according to RFC 1867, except that the files presented to the
operator to choose from may be constrained by the specified list of
MIME types instead of by a pattern of file names or extensions.
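A minimal Python sketch of the FILES case, assuming the browser already knows each candidate file's MIME type by some means other than its name (for example, content sniffing, which is not shown). The names `media_type_matches` and `filter_by_accept` are hypothetical; the sketch accepts either commas or spaces between ACCEPT entries and ignores MIME parameters when matching.

```python
def media_type_matches(candidate: str, pattern: str) -> bool:
    """True if a concrete type such as 'text/plain' matches one ACCEPT
    entry such as 'text/*', 'text/plain', or '*/*'. Any ';' parameters
    are ignored on both sides in this sketch."""
    cand = candidate.split(";", 1)[0].strip().lower()
    pat = pattern.split(";", 1)[0].strip().lower()
    if pat == "*/*":
        return True
    cand_top, _, cand_sub = cand.partition("/")
    pat_top, _, pat_sub = pat.partition("/")
    return bool(pat_top) and cand_top == pat_top and pat_sub in ("*", cand_sub)


def filter_by_accept(files, accept: str):
    """Keep the (name, mime_type) pairs whose type matches some ACCEPT entry.
    Entries may be separated by commas or whitespace; tokens that start with
    ';' are parameters of the preceding entry and are skipped here."""
    patterns = [tok for tok in accept.replace(",", " ").split()
                if not tok.startswith(";")]
    return [(name, mtype) for name, mtype in files
            if any(media_type_matches(mtype, p) for p in patterns)]


# Hypothetical listing: types are assumed to come from content sniffing,
# so "README" is offered even though it has no extension.
listing = [("notes.txt", "text/plain"), ("README", "text/plain"),
           ("photo.jpg", "image/jpeg"), ("page", "text/html")]
print(filter_by_accept(listing, "text/*"))
```

Matching on types rather than names is what lets an extensionless text file such as README be offered while photo.jpg is hidden.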
Since device input has no original filename to supply, as section
3.3 of RFC 1867 specifies, for the 'content-disposition: form-data'
and 'content-disposition: file' headers, those headers might instead
be provided with a 'type' parameter giving the MIME type of the
encoded data, if known, and a 'device' parameter with the same value
as the DEVICE attribute of the associated form input element. If the
device or the requested MIME type(s) are unsupported, the value of
the 'device' parameter might be 'unsupported'; if the device is
unavailable, the value might be 'unavailable'. If the requested MIME
types are unsupported, an additional 'alternates' parameter might
give a space-separated list of MIME types of the same content-type
(for example, other audio types) that the specified device may
support instead.
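The part headers described above might be assembled as follows. This Python sketch covers only the 'content-disposition: form-data' case and is illustrative: the helper name `device_part_headers` and its `status` argument are hypothetical. Every parameter value is quoted because MIME types such as `audio/l16;rate=11025` contain `;` and `=`, and a Content-Type line is added whenever data of a known type is present, as RFC 1867 does for ordinary file parts.

```python
def device_part_headers(name, device, mime_type=None, status="ok", alternates=()):
    """Return the CRLF-terminated header block for one multipart/form-data
    body part carrying device input rather than a named file.

    status is a hypothetical argument for this sketch:
      "ok"          -> device="<value of the DEVICE attribute>"
      "unsupported" -> device="unsupported"
      "unavailable" -> device="unavailable"
    """
    device_value = device if status == "ok" else status
    params = [f'name="{name}"', f'device="{device_value}"']
    if mime_type is not None:
        params.append(f'type="{mime_type}"')
    if alternates:
        # Space-separated list of types the device could produce instead.
        params.append('alternates="' + " ".join(alternates) + '"')
    lines = ["Content-Disposition: form-data; " + "; ".join(params)]
    if mime_type is not None:
        lines.append(f"Content-Type: {mime_type}")
    # A blank line separates the part headers from the part body.
    return "".join(line + "\r\n" for line in lines) + "\r\n"


ok = device_part_headers("SPEECH2", "MICROPHONE",
                         mime_type="audio/l16;rate=11025;channels=1")
refused = device_part_headers("SPEECH2", "MICROPHONE", status="unsupported",
                              alternates=("audio/basic", "audio/l16;rate=8000"))
print(ok + refused)
```

The second call shows a browser refusing the requested encoding while telling the server which audio types it could send instead.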
There may be significant limitations on the client browser's
ability to buffer input for upload. Browsers might provide an
estimate of the default MAXLENGTH available for device input and
upload through the HTTP header 'Pragma: DEVICE-MAXLENGTH=BYTES',
where BYTES is the content-length, in bytes, available to the
browser for buffering (see section 14.32 of RFC 2068).
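A browser might emit, and a server might read, the proposed directive as sketched below in Python. The sketch assumes the Pragma value is a comma-separated list of directives, as RFC 2068 defines it, and that BYTES is a plain decimal integer; `device_maxlength` and `device_maxlength_header` are hypothetical names.

```python
from typing import Optional


def device_maxlength(pragma_value: str) -> Optional[int]:
    """Return BYTES from a Pragma value such as
    'no-cache, DEVICE-MAXLENGTH=1048576', or None if the directive is
    absent or its value is not a decimal integer."""
    for directive in pragma_value.split(","):
        key, sep, value = directive.strip().partition("=")
        if sep and key.strip().upper() == "DEVICE-MAXLENGTH":
            value = value.strip().strip('"')  # tolerate a quoted-string value
            if value.isascii() and value.isdigit():
                return int(value)
    return None


def device_maxlength_header(n_bytes: int) -> str:
    """Format the header line a browser might send under this proposal."""
    return f"Pragma: DEVICE-MAXLENGTH={n_bytes}"


print(device_maxlength_header(1048576))
print(device_maxlength("no-cache, DEVICE-MAXLENGTH=1048576"))
```

Returning None rather than raising lets a server simply fall back to its own default when the directive is missing or malformed.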
Furthermore, when several similar devices are present, the VALUE
attribute may be used to specify which of them is meant.
If real-time events, such as those described and proposed by
Gregory S. Aist in "A General Architecture for a Real-Time
Discourse Agent and a Case Study in Computerized Oral Reading
Tutoring" (Carnegie Mellon University Computational Linguistics
Program, 6 December 1996), are required, then the Real-time
Transport Protocol (RTP, currently RFC 1889) should be used
instead. Because of the security concerns discussed in section 3
below, HTML scripts might not be able to invoke the submission of a
form that involves any kind of file upload without explicit
instructions from the session operator to the contrary.
2.1. Examples
<FORM ENCTYPE="multipart/form-data" METHOD=POST ACTION="_URL_">
Say something: <INPUT NAME=SPEECH1 TYPE=FILE DEVICE=MIC>
<INPUT TYPE=SUBMIT VALUE="Send Speech">
</FORM>
In this simple form, the HTML author has requested the upload of
sampled microphone input from the operator upon form submission.
<INPUT NAME=SPEECH2 TYPE=FILE DEVICE=MICROPHONE
ACCEPT="audio/l16 ;rate=11025 ;channels=1 audio/x-cepstral-voc">
Here the full device name MICROPHONE is used instead of the
abbreviation MIC. The author of the HTML has requested that the data
input from the microphone be encoded either as the MIME type
audio/L16 (sixteen-bit signed linear audio samples, most-significant
byte first) as specified in RFC 1890 section 4.4.8, with a single
(monaural) channel and a sample rate of 11,025 samples per second,
or as an unspecified extended MIME audio type named
'x-cepstral-voc'.
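The ACCEPT value in this example attaches MIME parameters to the preceding type with ';' and separates types with spaces. A browser might parse it as in the Python sketch below; `parse_accept` is a hypothetical name, and the sketch assumes no whitespace appears between a ';' and the parameter name after it.

```python
def parse_accept(accept: str):
    """Parse an ACCEPT value such as
    'audio/l16 ;rate=11025 ;channels=1 audio/x-cepstral-voc'
    into a list of (media_type, {parameter: value}) pairs.

    Whitespace separates entries; a token beginning with ';' is a
    parameter of the entry before it."""
    entries = []
    # Put a space before every ';' so 'audio/l16;rate=11025' splits the
    # same way as 'audio/l16 ;rate=11025'.
    for token in accept.replace(";", " ;").split():
        if token.startswith(";"):
            if not entries:
                raise ValueError(f"parameter {token!r} has no media type")
            key, _, value = token[1:].partition("=")
            entries[-1][1][key.strip().lower()] = value.strip()
        else:
            entries.append((token.lower(), {}))
    return entries


for media_type, params in parse_accept(
        "audio/l16 ;rate=11025 ;channels=1 audio/x-cepstral-voc"):
    print(media_type, params)
```

Keeping the parameters as a dictionary per type lets the recorder configure its sample rate and channel count directly from the request.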
<INPUT NAME=FILE1 TYPE=FILE DEVICE=FILES ACCEPT="text/*">
Here the form element may be used to upload a file as usual, except
that the files to select from might be constrained to text files,
without regard to their filenames or extensions.
<INPUT NAME=PICTURE1 TYPE=FILE DEVICE=CAMERA VALUE=2>
The final example shows how these extensions may be used to request
input from other kinds of devices, such as the second of two or
more cameras connected to the system running the browser.
3. Security considerations
Browser operators may not want to send their files, recordings,
pictures, video, or other device input to arbitrary sites without
their explicit permission and direction. Therefore, browser
authors are encouraged to disallow the submission of forms that
include any kind of file upload by any means other than the
standard operator-controlled HTML buttons for form submission,
unless the session operator has explicitly instructed otherwise.
Likewise, the SIZE attribute, style sheets, and document layers
might be prevented from obscuring any kind of file upload widget,
especially one capable of accepting a default filename. Finally,
just as the operator may take direct action to initiate, terminate,
review, and edit a recording as described in the next section,
browser authors are encouraged to prevent HTML scripts from taking
those and similar actions, unless, for example, the operator has
specifically enabled such script actions with a security option.
Even then, the operator might be able to specify that such
preferences reset after an interval or at the end of the session.
Furthermore, explicit notice might be provided to ensure that the
operator is informed when files are being uploaded.
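One way a browser might enforce the submission policy above is sketched below in Python. Everything here is an assumption for illustration: the `UploadPolicy` class, its field names, and the representation of the operator's security option as an expiry time are not part of this draft.

```python
import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class UploadPolicy:
    # Hypothetical operator preference: scripts may submit forms that
    # contain file or device uploads until this time (seconds since the
    # epoch). None means scripts may never do so.
    script_submit_allowed_until: Optional[float] = None

    def may_submit(self, form_has_upload: bool, operator_clicked_submit: bool,
                   now: Optional[float] = None) -> bool:
        """Decide whether a form submission should proceed."""
        if not form_has_upload or operator_clicked_submit:
            # Ordinary forms, and uploads started by the operator's own
            # click on a submit button, are always allowed.
            return True
        now = time.time() if now is None else now
        # Script-initiated upload: allowed only while the operator's
        # explicit permission is still in force.
        return (self.script_submit_allowed_until is not None
                and now < self.script_submit_allowed_until)

    def end_session(self) -> None:
        """Reset the preference, e.g. when the browsing session ends."""
        self.script_submit_allowed_until = None


policy = UploadPolicy()
print(policy.may_submit(form_has_upload=True, operator_clicked_submit=False))
```

Storing the permission as an expiry time covers both reset options the text mentions: it lapses after an interval on its own, and `end_session` clears it at the end of the session.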
4. User interface usability and quality concerns for audio
An audio sample is customarily recorded on computer equipment with
a dialog routine that allows the user to record, pause, play back,
erase, or otherwise edit the recording. Browsers might provide the
operator with the same kind of dialog routine for audio device
input. If a MAXLENGTH has been specified, or is in force because of
limited buffer size, the buffer space used and remaining might be
shown as a dynamic bar graph (or as a percentage if graphics are
unavailable). The time in seconds used and remaining in the buffer
may also be displayed.
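The arithmetic behind such a display is simple once the byte rate of the recording format is known. The Python sketch below assumes a fixed-rate encoding; `buffer_status` and its text bar are hypothetical. For audio/L16 at 11,025 samples per second on one channel, each second takes 11,025 two-byte samples, or 22,050 bytes.

```python
def buffer_status(used_bytes: int, max_bytes: int, bytes_per_second: int,
                  width: int = 20) -> dict:
    """Summarize recording-buffer usage for a level or progress display."""
    used = min(used_bytes, max_bytes)
    remaining = max_bytes - used
    filled = width * used // max_bytes
    return {
        # Whole-number percentage, the fallback when graphics are unavailable.
        "percent_used": 100 * used // max_bytes,
        "seconds_used": used / bytes_per_second,
        "seconds_remaining": remaining / bytes_per_second,
        # Text rendering of the dynamic bar graph.
        "bar": "[" + "#" * filled + "-" * (width - filled) + "]",
    }


# A 10-second buffer of mono audio/L16 at 11,025 samples per second.
rate = 11025 * 2          # bytes per second: two bytes per sample
status = buffer_status(used_bytes=55125, max_bytes=10 * rate,
                       bytes_per_second=rate)
print(status)
```

With a quarter of the buffer filled, the display would show 25 percent, 2.5 seconds used, and 7.5 seconds remaining.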
Most MIME types defined for audio do not provide high-quality audio
encodings. The 'audio/basic' type and other types that use a sample
rate of 8,000 samples per second truncate the audio spectrum at
4,000 Hz according to the Nyquist theorem, discarding information
important for discerning consonants. Also, audio/basic and other
MIME audio types use a sample size of eight bits, which does not
usually provide enough dynamic range for accurate automatic speech
recognition unless published automatic gain control algorithms are
reliably used. If sixteen-bit signed linear encodings are used
according to section 4.4.8 of RFC 1890, the sample rate (specified
as the 'rate' parameter of the MIME type 'audio/l16') might be
11,025 or 16,000 samples per second or higher to provide sufficient
information for automatic speech recognition. Alternatively, the
feature extraction encoding used by the speech recognition
algorithm might be uploaded instead, providing a more compact
representation that shortens the upload.
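The trade-off above can be made concrete with a little arithmetic. The Python sketch below computes the Nyquist limit and the uncompressed data rate for the encodings mentioned, and packs samples in the big-endian signed layout that audio/L16 uses; the function names are hypothetical.

```python
import struct


def nyquist_limit_hz(sample_rate: int) -> float:
    """Upper bound on the frequencies a given sample rate can represent."""
    return sample_rate / 2


def bytes_per_second(sample_rate: int, bits_per_sample: int,
                     channels: int = 1) -> int:
    """Uncompressed data rate of a fixed-size sample encoding."""
    return sample_rate * channels * bits_per_sample // 8


def encode_l16(samples) -> bytes:
    """Pack signed 16-bit samples most-significant byte first, the
    sample layout audio/L16 uses (RFC 1890 section 4.4.8)."""
    return struct.pack(f">{len(samples)}h", *samples)


for name, rate, bits in [("audio/basic", 8000, 8),
                         ("audio/l16;rate=11025", 11025, 16),
                         ("audio/l16;rate=16000", 16000, 16)]:
    print(f"{name}: {nyquist_limit_hz(rate):g} Hz bandwidth, "
          f"{bytes_per_second(rate, bits)} bytes/s")
```

At 16,000 samples per second, mono L16 needs 32,000 bytes per second, four times the 8,000 bytes per second of audio/basic, which is why a compact feature encoding can shorten uploads considerably.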
5. Compatibility with earlier forms of audio input
Audio device input has been proposed before and implemented from a
microphone at least as early as 1994 in experimental versions of
common Web browsers. To accommodate the syntax of these earlier
extensions, a browser might interpret an element such as
<INPUT TYPE=AUDIO ...>
as the device input form
<INPUT TYPE=FILE DEVICE=MICROPHONE ...>
with all other attribute/value pairs of the original INPUT element
kept the same as specified. This would retain compatibility for
all implementations of which the author of this draft is aware.
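A browser built on an HTML parser might apply this rewriting to each INPUT element's attribute list. The Python sketch below uses the standard library's `html.parser.HTMLParser`; `normalize_audio_input` and `InputCollector` are hypothetical names, and not adding a second DEVICE attribute when one is already present is an assumption beyond this draft.

```python
from html.parser import HTMLParser


def normalize_audio_input(attrs):
    """Map the attributes of a legacy <INPUT TYPE=AUDIO ...> element to
    those of <INPUT TYPE=FILE DEVICE=MICROPHONE ...>, keeping every other
    attribute/value pair unchanged. Non-audio inputs are returned as-is."""
    if not any(k == "type" and (v or "").lower() == "audio" for k, v in attrs):
        return list(attrs)
    has_device = any(k == "device" for k, _ in attrs)
    out = []
    for k, v in attrs:
        if k == "type":
            out.append(("type", "FILE"))
            if not has_device:
                out.append(("device", "MICROPHONE"))
        else:
            out.append((k, v))
    return out


class InputCollector(HTMLParser):
    """Collect the (normalized) attributes of every INPUT element.
    HTMLParser reports tag and attribute names in lower case."""

    def __init__(self):
        super().__init__()
        self.inputs = []

    def handle_starttag(self, tag, attrs):
        if tag == "input":
            self.inputs.append(normalize_audio_input(attrs))


collector = InputCollector()
collector.feed('<FORM><INPUT TYPE=AUDIO NAME=SPEECH1 MAXLENGTH=65536>'
               '<INPUT TYPE=TEXT NAME=Q></FORM>')
collector.close()
print(collector.inputs)
```

Rewriting at the attribute level means the rest of the browser only ever sees the TYPE=FILE DEVICE=MICROPHONE form.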
6. HTML Document Type Definition changes
Along with the extension to the HTML InputType entity described in
the previous section, this proposal adds an #IMPLIED attribute
DEVICE of type CDATA to the ATTLIST of the INPUT element in the
HTML DTD.
7. Motivations and conclusion
The primary motivation for these extensions is to add the
capability of speech input to Web-based educational systems. For
example, the "Test of English as a Foreign Language," or TOEFL,
assessment consists of multiple choice questions based on media
consisting of text and audio recordings, so it would be possible to
represent the TOEFL with current HTML multimedia content and forms.
However, the TOEFL makes no provision whatsoever for assessing the
accuracy of the test takers' pronunciation, except as reflected in
their ability to identify the terms in the text of the assessment
accurately. So while it scores the important ability to listen, the
TOEFL does not assess the important ability to speak with correct
pronunciation. But with form-based audio input and upload, and
speech recognition servers capable of aligning and scoring the
pronunciation of words and phonemes, such a Web-based TOEFL could
be extended to reduce the number of inscrutable graduate teaching
assistants, for example.
Of course the possibilities for language instruction enabled by
these extensions are not limited to the graduate level or English.
Other motivations include the development of "dictation servers"
capable of transforming spoken audio uploaded through an HTTP
session into the corresponding text, suitable for sending in email
or including in another document, for example. Natural language
continuous speech recognition software conforming to standard APIs
for automatic dictation is, as of this writing, available from
retail outlets for less than US$90, so there is ample reason to
believe that dictation servers could soon become commonplace on the
Web with these extensions.
Finally, these extensions could be a great help to hearing-impaired
people who want to use a "phonology server" (similar to the server
described in the Web-TOEFL example above) to practice improving
their pronunciation without depending on a human speech coach.
The change to the HTML DTD is very simple, but very powerful. It
enables a much greater variety of services to be implemented via
the World-Wide Web than is currently possible due to the lack of a
peripheral input upload submission facility. This would be a very
valuable addition to the capabilities of the World-Wide Web.
8. Author's address and acknowledgments
James Salsman
Bovik Research (nonprofit research institute)
courtesy WebTV Networks, Microsoft Corporation
and MindSource Software Engineers
575 S. Rengstorff Avenue
Mountain View, CA 94040-1982
Email: jps@bovik.org, jsalsman@corp.webtv.net
Phone: (650) 938-1440
"TOEFL" and "Test Of English as a Foreign Language" are
registered trademarks of Educational Testing Service.
References
[RFC 1867] Form-based File Upload in HTML. E. Nebel & L. Masinter,
November 1995. ftp://ds.internic.net/rfc/rfc1867.txt
[RFC 1889] RTP: A Transport Protocol for Real-Time Applications.
H. Schulzrinne, S. Casner, R. Frederick, & V. Jacobson,
January 1996. ftp://ds.internic.net/rfc/rfc1889.txt
[RFC 1890] RTP Profile for Audio and Video Conferences with Minimal
Control. H. Schulzrinne, January 1996.
ftp://ds.internic.net/rfc/rfc1890.txt
[RFC 2068] Hypertext Transfer Protocol -- HTTP/1.1. R. Fielding,
J. Gettys, J. Mogul, H. Frystyk, & T. Berners-Lee,
January 1997. ftp://ds.internic.net/rfc/rfc2068.txt
END OF INTERNET-DRAFT
:jps