Clarify getUserMedia({audio:{deviceId:{exact:<audiooutput_device>}}}) in this specification mandates capability to capture of audio output device - not exclusively microphone input device #650
Comments
My opinion: the same thing should happen as if you call getUserMedia({audio: {deviceId: {exact: videodevice}}}), that is, no device should be found. If the system has the capability of capturing outgoing audio, that should be represented as a separate input device. The only place in the specification set where capturing system audio output is referenced is in getDisplayMedia. |
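The position in the comment above can be modeled without a browser. The following is a simplified sketch, not spec text or any browser's implementation: the helper name `matchAudioSources` and the sample device list are assumptions made up for illustration. It treats an exact `deviceId` in the `audio` constraints as matching only entries of kind `audioinput`, so an ID that belongs to an `audiooutput` or `videoinput` entry matches nothing.

```javascript
// Simplified model of how an exact deviceId in the audio constraints is matched.
// Hypothetical helper; not taken from the specification or any browser source.
function matchAudioSources(devices, exactDeviceId) {
  // Only audio input devices are candidate sources for an audio track.
  return devices
    .filter((d) => d.kind === "audioinput")
    .filter((d) => d.deviceId === exactDeviceId);
}

// Sample data shaped like enumerateDevices() results (all values are made up).
const sampleDevices = [
  { kind: "audioinput", deviceId: "mic-1", label: "Microphone", groupId: "g1" },
  { kind: "audiooutput", deviceId: "spk-1", label: "Speakers", groupId: "g1" },
  { kind: "videoinput", deviceId: "cam-1", label: "Webcam", groupId: "g2" },
];

console.log(matchAudioSources(sampleDevices, "mic-1").length); // 1
console.log(matchAudioSources(sampleDevices, "spk-1").length); // 0
console.log(matchAudioSources(sampleDevices, "cam-1").length); // 0
```

Under this model, an empty candidate set for an exact constraint would surface in a browser as a rejected getUserMedia() promise rather than as a stream.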
Are you actually saying that
does not mean capturing audio output on its face? How else is Are you conveying that, in your opinion, |
Correct. It does not mean capturing audio output. It means playing audio, not capturing it.
See above.
No. It means audio output, not input. Note also that input devices (audioinput and videoinput) returned by enumerateDevices are of type InputDeviceInfo, whereas output devices (audiooutput) are not. I propose we close this issue since getUserMedia() is not intended to capture audio from devices marked as audiooutput by enumerateDevices() and I think there is no doubt about it. |
Reads that way from perspective here. Performs that way at Firefox, Nightly.
There is far more than doubt. It is already possible to capture audio output exclusively at Firefox and Nightly. You, or someone else, need to include that exact language in the specification, and remove At which specification should the PR be filed to specifically add capture of audio output, after you include language which prohibits capturing audio output - as that is very much unclear right now? |
Firefox does not even expose entries of kind audiooutput in enumerateDevices (tested with Firefox Nightly 73.0a1, 2019-12-09).
I mean there is no doubt among browser implementers since no browser allows capturing from entries marked as audiooutput by enumerateDevices(). All browsers allow capturing audio from devices exposed as audioinput. The getUserMedia() definition says
There is no need to exclude audiooutput from the spec and there is nothing absurd about it. They represent audio output devices, which are intended for audio playback, not capture. They can be used with setSinkId to select the device to be used by a media element to output audio, or, indirectly to select an associated input device in getUserMedia() via the groupId field. |
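The indirect selection via groupId mentioned above might look roughly like the sketch below. The helper names `inputsInSameGroup` and `constraintsForOutput` are hypothetical, and the device objects are only assumed to be shaped like the entries returned by enumerateDevices(). The browser calls are left as comments because they need a page context and user permission.

```javascript
// Find audio input devices that share a groupId with a given output device.
// Hypothetical helper; device objects are shaped like MediaDeviceInfo.
function inputsInSameGroup(devices, outputDeviceId) {
  const output = devices.find(
    (d) => d.kind === "audiooutput" && d.deviceId === outputDeviceId
  );
  if (!output) return [];
  return devices.filter(
    (d) => d.kind === "audioinput" && d.groupId === output.groupId
  );
}

// Build getUserMedia() constraints for the first associated input, if any.
function constraintsForOutput(devices, outputDeviceId) {
  const [input] = inputsInSameGroup(devices, outputDeviceId);
  return input ? { audio: { deviceId: { exact: input.deviceId } } } : null;
}

// Browser-only usage (not executed here; someOutputId is a placeholder):
// const devices = await navigator.mediaDevices.enumerateDevices();
// const constraints = constraintsForOutput(devices, someOutputId);
// if (constraints) {
//   const stream = await navigator.mediaDevices.getUserMedia(constraints);
// }
// await mediaElement.setSinkId(someOutputId); // route playback to that output
```

Note that this captures from the input device in the same group (for example, a headset's microphone), not the audio being played through the output device.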
Do not care if
though that language is actually intended to apply to The use case is very simple: capture audio output of Or, make it unequivocally clear that neither this parent specification nor any derivative specification is intended to capture audio output, so that alternative, non-standardized approaches, can be implemented. |
@guidou We must have posted at the same time.
Again, do not care what technical jargon is used re input or output. The use case is capturing only audio output from the system, not capturing microphone input where audio might be playing in the background. Firefox does provide a means to capture |
@guidou Kindly run the code at #650 (comment) at Firefox, Nightly and Chromium. Select Is there any compelling reason to not expose What solution do you suggest to achieve the requirement of the use case if neither |
The script there is broken for Chromium because it assumes "audiooutput" devices can be captured by getUserMedia(), which is not the case in Chromium or any other browser.
I wouldn't be able to state any reason why Chromium does not support any particular feature it does not currently support. Feel free to file a feature request for it at crbug.com, although I'm not aware of any plans to support this use case in Chromium.
At the moment, you cannot accomplish that task in Chromium directly. Perhaps you can find some tool that allows you to expose the audio of output devices as if they were microphones, similar to how virtual webcams work. Note also that your use case is not mandated by this spec. Exposing the audio coming from an output device as an input device that can be captured via getUserMedia is a valid implementation choice (i.e., Firefox has implemented it on Linux), but it is not a requirement of this spec. What you have here is a feature request for Chromium to expose "Monitor of ..." devices the way Firefox does so that they can be used by getUserMedia(). I think that's a valid feature request for Chromium, since it does not support it, but it does not need any adjustment to the spec. Also, I don't think the spec should be changed to mandate that audio output devices must be exposed as if they were audio input devices. |
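Where an implementation does expose monitor devices as audio inputs, selecting one might be sketched as follows. This is an illustration under stated assumptions: the helper names are hypothetical, and the "Monitor of" label prefix is taken from the labels mentioned in this thread. Labels are implementation- and platform-specific, so matching on them is fragile.

```javascript
// Pick audio inputs whose label suggests a monitor of an output device.
// The "Monitor of" prefix is an assumption based on labels mentioned in this
// thread; labels vary by implementation and platform, and may be empty
// strings until the user has granted permission.
function findMonitorInputs(devices, prefix = "Monitor of") {
  return devices.filter(
    (d) => d.kind === "audioinput" && d.label.startsWith(prefix)
  );
}

// Build getUserMedia() constraints for the first monitor input, if any.
function monitorConstraints(devices) {
  const [monitor] = findMonitorInputs(devices);
  return monitor ? { audio: { deviceId: { exact: monitor.deviceId } } } : null;
}

// Browser-only usage (not executed here):
// const devices = await navigator.mediaDevices.enumerateDevices();
// const constraints = monitorConstraints(devices);
// if (constraints) {
//   const stream = await navigator.mediaDevices.getUserMedia(constraints);
// }
```

When no such device is listed, `monitorConstraints` returns null and the caller has to fall back to another approach, such as a virtual input device provided by the operating system.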
Your acknowledgment that the current language is capable of more than one interpretation re
At *nix the code works as expected. Have not used *indows in many years and have not used Mac at all. Therefore, it is reasonable to conclude that *indows and Mac also provide such functionality. Evidently not. They should, per this specification, is the perspective here.
Already did. You closed the issue https://bugs.chromium.org/p/chromium/issues/detail?id=1013881 as
Kindly re-open the above linked Chromium bug and answer this question: Why should that functionality not be available to users? Disregard the specification or not, the functionality is what matters, implementers do whatever they want anyway, irrespective of any specification, whether by omission, deliberate indifference to any spec, or by way of their arbitrary, undocumented "experiments". |
I have no idea how you can conclude that the current language allows interpreting "audiooutput" devices as audio input devices. No implementer does and this is the first time that I see this interpretation.
Where is it said that capture of audio output devices is specified?
I don't think any extra language is needed. I'm just saying that an extra clarification saying that audio input refers to devices marked as "audioinput" and video input refers to devices marked as "videoinput" would not be out of place, but it's pretty obvious that "audioinput" refers to audio input and "audiooutput" refers to audio output.
No, it doesn't since it expects getUserMedia to capture from devices marked as "audiooutput".
That was filed as a bug, which it is not. Therefore it cannot be "fixed".
My experience has been that implementers try to implement the spec and I would say they have succeeded for the most part since, although not perfect, there is a large degree of interoperability across browsers. Things are sure to fail when you expect things that are not in the spec to work, such as having getUserMedia() capture from devices marked as "audiooutput" or having implementation specific details not covered by the spec to be the same in all implementations. |
Why is
in the specification? Why would a reader of the specification not reach the conclusion that it is possible to capture audio output per this specification where the plain language states that an
Stating that in the comment before you closed the issue https://bugs.chromium.org/p/chromium/issues/detail?id=1013881#c8
do you not have the ability to change the "Type" from "Bug" to "Feature request"? If not, how to make it clear that the issue is a feature request, not a bug? Yes, implementers do try to meet the spec. They might also do whatever they want, irrespective of any specification, without providing any documentation why https://bugs.chromium.org/p/chromium/issues/detail?id=1018580#c67
Have no issue filing the feature request, again, if you cannot change the "Type" to feature request on the issue you closed. Do not gather we will agree on interpretation of the specification re the meaning of "audiooutput" and your interpretation of "headphones" (output) meaning "Microphone" (input). You can resolve that inconsistency by updating the specification to make it clear that capturing an "audiooutput" device really means capturing an input device, not audio output from the system, in spite of what actually occurs at Firefox. |
Because knowing the audio output devices is useful for some use cases.
The actual question I have is why would a reader conclude that it would be possible for getUserMedia() to capture from output devices when its definition mentions only input devices.
Yes, it is possible to change the type from Bug to Feature request, but the description of crbug.com/1013881 is very different from what you want. That bug contains repro steps for a bug consisting in the permission prompt being broken because "Monitor of (potentially multiple devices) should be listed at the prompt". My recommendation is that you file a new issue where you say that you would like Chromium to:
It would be a mistake to document internal implementation details that are subject to change at any time. Of course, since Chromium's source code is available, you are free to inspect it to learn about such details if you are interested.
I already explained why it would be better to file a new one.
I don't think anyone has interpreted "headphones" (output) to mean "Microphone" (input). It is pretty clear that headphones are a good example of output devices and as such they would be listed by enumerateDevices() as kind "audiooutput". getUserMedia() cannot capture from them since it captures from input devices, but you can use their groupId to select an input device to be used by getUserMedia(). I see no inconsistency about this in the spec. Finally, I don't think there is anything else to discuss in this issue since it is clear that what you want is Chromium to allow capturing audio output using getUserMedia() by exposing special "Monitor of..." input devices the way Firefox for Linux does. This does not require any change to the spec. |
Because
at least amends, if not repeals and substitutes for
by implication. |
There is no amendment or substitution at all, or anything that implies it. The MediaDeviceKind part that you mention is in Section 9, which defines enumerateDevices() (not getUserMedia). The definition of enumerateDevices() explicitly states that it allows querying input and output devices. getUserMedia() is defined in a different section that says it deals with input devices (not output). Results are returned as MediaStream/MediaStreamTrack. The only supported sources for those MediaStreamTracks are microphones or webcams (i.e., input devices only) and those are the only sources supported in this spec. Note that enumerateDevices does not deal at all with MediaStreamTracks or sources, while getUserMedia does not deal at all with MediaDeviceKind. There is no way to interpret from the spec text that one substitutes the other. In short, output devices are mentioned only for enumerateDevices() and are not mentioned anywhere as possible sources for MediaStreamTracks/getUserMedia. |
@guidou Did not write the code that implemented the methods. Assigned self the requirement to capture and record the output of How do you explain the output at Firefox for the code, unless it is possible to isolate capture of "speakers or headphones" per this specification? If isolated selection and capture of "speakers and headphones" is not intended whatsoever, why are those terms in the specification? And why is that specific output observable? Am merely asking to take account of what is possible in the field, clarify that output is possible, whether the term of art used is "input" or "output", officially clarify that some combination of the methods defined in this specification does provide a means to capture exclusively "speakers or headphones", and provide the canonical procedure to do just that. Or, remove all references to "speakers or headphones" and "audiooutput | Represents an audio output device; for example a pair of headphones." from the specification, making it clear that this specification does not support that output. Why would that be the case? |
@guidou At a relatively recent version of Chromium was able to achieve the same output as Firefox, that is not recording microphone, tested by playing sounds into the microphone during the procedure described at guest271314/SpeechSynthesisRecorder#14 (comment). However, it should be possible to achieve that directly at the browser, which was the restriction for the requirement assigned to self: use only API's and methods shipped with the ostensibly FOSS browser, which given the state of the art, should be specified and unequivocally possible, without ambiguity, by default. |
As explained in #651 (comment) as well as in comments here, the model in this spec is source->sink, and Firefox's "Monitor of" device is an In contrast,
If the desire is to get at the output of speechSynthesis, please take that up with the working group responsible for speechSynthesis directly. Solving that here would be a hack in my view. |
The use case is not limited to capturing Currently the Web Speech API does not have any algorithm language. A socket connection is made by the client browser to
The change being asked for is to merely make it clear that device listed as monitor of default audio device MAY be exposed by implementations, to at least recognize that option is available, even if implementers decide to not expose the monitor device. Since there is no hope for language to be specified that we can capture monitor of input audio device source directly at constraints passed to
and
to a
then fetching the file. Will dive into https://github.com/pettarin/espeakng.js-cdn/blob/master/js/demo.js to substitute |
Happy New Year! Created several workarounds, or what you might refer to as "a hack". It turns out that at Ubuntu is shipped
where with the appropriate flags set or including CORS header For a more elaborate solution that wound up being a proof-of-concept for whatwg/html#3443 and WICG/file-system-access#97 created a pattern that provides a means to execute local arbitrary shell scripts and set the While testing the code it became obvious that there is no way to determine precisely when audio output of speech synthesis actually ends when the output mechanism is a
where if we test for silence https://stackoverflow.com/a/46781986 in order to determine when the expected audio output is complete we could prematurely call It also turns out that Chrome OS is already using In any event, your closure of this issue/feature request ironically led to revisiting prior interest in executing arbitrary shell commands using the browser as a medium https://gist.github.com/guest271314/59406ad47a622d19b26f8a8c1e1bdfd5. |
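The premature-stop problem with silence testing described above can be illustrated with a small sketch. This is not the code from the linked answer or from the author's workaround. The names `rms` and `createSilenceGate`, the threshold, and the hold length are all assumptions chosen for illustration. The idea is to report the end of output only after several consecutive silent frames, so that a short pause in speech is not mistaken for the end.

```javascript
// Root-mean-square level of one frame of samples in the range [-1, 1].
function rms(samples) {
  if (samples.length === 0) return 0;
  let sum = 0;
  for (const s of samples) sum += s * s;
  return Math.sqrt(sum / samples.length);
}

// Reports "ended" only after `holdFrames` consecutive silent frames, so a
// short pause between words does not count as the end of output.
// Threshold and hold length are arbitrary illustrative defaults.
function createSilenceGate(threshold = 0.01, holdFrames = 3) {
  let silentRun = 0;
  return function push(samples) {
    silentRun = rms(samples) < threshold ? silentRun + 1 : 0;
    return silentRun >= holdFrames;
  };
}

// Browser-only usage (not executed here): feed frames read from an
// AnalyserNode via getFloatTimeDomainData() and stop recording once
// the gate returns true.
```

A longer hold reduces the chance of stopping early at the cost of recording trailing silence, and it still cannot tell a long pause from the true end of output, which is the limitation the comment points to.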
@jan-ivar FWIW Initial implementation of proof-of-concept https://github.com/guest271314/native-messaging-espeak-ng. |
#211 added output device capability to enumerateDevices(), while concerns were raised about the definition of output device, or the omission thereof, in the specification, e.g., #211 (comment)
Currently the term "audiooutput" occurs twice in the specification, where the language appears to be a brief description of the term, not explicitly a definition of the term.
A pair of headphones could not reasonably be construed as a microphone.
However, in spite of "audiooutput" and the brief description appearing in the text of the specification, at least one implementer has interpreted the language to mean that capture of audio output is not explicitly mandated by the current specification: https://bugs.chromium.org/p/chromium/issues/detail?id=1013881#c9
As at least one concrete use case where the definition of "audiooutput", the devices list from enumerateDevices(), and whether or not the specification mandates capture of audio output devices all matter, and where clarity or the lack thereof is observable, consider the code which is intended to select only "audiooutput", not "Microphone".
Firefox 70 and Nightly 73 output the expected result, that is, capturing and recording only audio output, not input from the microphone. That means only audio output is captured, not microphone input together with audio output.
Chromium 80 does not output the expected result. Even where "audiooutput" is selected, the microphone is captured and recorded, not "audiooutput". That is a Chromium bug that is marked WontFix (https://bugs.chromium.org/p/chromium/issues/detail?id=1013881), apparently due to lack of clarity in this specification relevant to the capability to select audio output, not only microphone.
Contrary to the suggestion at #629 (comment), getDisplayMedia(), after testing various approaches, does not provide any means to capture audio output from the system.
Kindly make it clear in this specification that 1) capture of audio output is under the umbrella of this specification, and provide an example of the canonical code pattern to achieve that use case per this specification; 2) the user can select "Monitor of <audio_device>" at the UI prompt and directly in code by use of applyConstraints() and directly at getUserMedia(<constraints>); or 3) this specification is not intended to be construed to capture only audio output.