Bringing the Learnosity Audio Question To Devices
As is the case with anything built with Flash, though, its lack of an open standard has implications for its adoption on newer and more mobile platforms, most of which have seen a demand for – and a subsequent push towards – open web standards.
Rather than forever maintaining specialisation DIY-style through plugins and special configurations, this push has seen the arrival of eminently practical audiovisual APIs for the mobile web: namely, the WebRTC family of APIs and the Web Audio API.
The question: With these emerging technologies for mobile, can we bring the Learnosity Audio Question to devices?
A Short History of Exploration
At Learnosity, we like to keep up with emerging technology and adapt accordingly. As such, investigations into making the audio question more portable started as early as 2013, when the WebRTC and Web Audio APIs became available in Chrome for Android. One of our hack day teams tinkered with the technologies as they emerged and, while noticeable teething problems put the proverbial pin in things, the positive undertone was that there was definite potential.
It wasn’t until early 2014 that stable, user-friendly support for the Web Audio API came to Chrome for Android. MediaStream API support for mobile WebRTC had hit the ground running not long after our initial experiments, and now the inclusion of the Web Audio API’s AudioContext was the next runner in the relay.
What this meant for a more portable Audio Question was:
- The browser itself had access to the audio stream coming from an end user’s recording hardware, thanks to the MediaStream API.
- We could read that audio stream into an accessible audio context.
- Most importantly, we could access buffered chunks of that stream for the sake of persistence.
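Those three capabilities can be sketched in a few lines of simplified, unprefixed code. This is an illustrative sketch only – the function and variable names (`startRecording`, `mergeChunks`, `recordedChunks`) are ours, not Learnosity’s, and a real implementation of the era would need vendor prefixes and fuller error handling:

```javascript
var recordedChunks = [];

// Wire microphone -> audio context -> processor, capturing buffered chunks
// of the live stream as they arrive. Browser-only; sketch, not production code.
function startRecording() {
  var AudioCtx = window.AudioContext || window.webkitAudioContext;
  var context = new AudioCtx();
  navigator.getUserMedia({ audio: true }, function (stream) {
    var source = context.createMediaStreamSource(stream);
    // A ScriptProcessorNode hands us chunks of the stream for persistence.
    var processor = context.createScriptProcessor(4096, 1, 1);
    processor.onaudioprocess = function (event) {
      // Copy each chunk: the underlying buffer is reused between callbacks.
      recordedChunks.push(new Float32Array(event.inputBuffer.getChannelData(0)));
    };
    source.connect(processor);
    processor.connect(context.destination);
  }, function (err) {
    console.error('Microphone access was denied:', err);
  });
}

// Merge the captured chunks into one contiguous buffer, ready to persist.
function mergeChunks(chunks) {
  var total = chunks.reduce(function (sum, c) { return sum + c.length; }, 0);
  var merged = new Float32Array(total);
  var offset = 0;
  chunks.forEach(function (chunk) {
    merged.set(chunk, offset);
    offset += chunk.length;
  });
  return merged;
}
```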
Recording – A Workflow
Working-draft specifications and naming polyfills aside, the recording workflow itself is rather straightforward – the Web Audio API comes equipped with more tools than are needed to get the job done. That said, the bulk of the work lay in dealing with the ‘newness’ of having these tools available in mobile browsers, with buffer sizes and memory management being of prime concern when trying to stay as lightweight as possible.
The flow itself works as follows:
From the MediaStream API, we have access to a communications stream via the MediaStreamAudioSourceNode (depending on the end user’s setup, this is typically a microphone).
This is an AudioNode that acts as an audio source for the Web Audio API to work on.
We connect our audio source to an AnalyserNode. This gives us real-time frequency and time-domain analysis for the sake of levels monitoring.
Finally, the whole processing chain is connected to an AudioDestinationNode, which is effectively the end user’s speakers.
(A pre-recording tone is supplied by an OscillatorNode, which outputs a computer-generated sine wave, and we control the output drop-off with a GainNode – this prevents the speakers from giving a hardware crackle due to a lone burst of sound.)
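The chain described above can be sketched as follows. This is a hedged illustration rather than the question’s actual source: the tone frequency, ramp duration, and helper names (`playStartTone`, `wireLevelMeter`, `rmsLevel`) are assumptions of our own:

```javascript
// Pre-recording tone: OscillatorNode -> GainNode -> destination.
// The gain ramps to zero so the tone doesn't end in a speaker crackle.
function playStartTone(context) {
  var oscillator = context.createOscillator();
  var gain = context.createGain();
  oscillator.type = 'sine';          // computer-generated sine wave
  oscillator.frequency.value = 440;  // illustrative pitch, not the real value
  oscillator.connect(gain);
  gain.connect(context.destination);
  gain.gain.setValueAtTime(1, context.currentTime);
  gain.gain.linearRampToValueAtTime(0, context.currentTime + 0.5);
  oscillator.start(context.currentTime);
  oscillator.stop(context.currentTime + 0.5);
}

// Microphone -> AnalyserNode, reporting a simple level reading per frame.
function wireLevelMeter(context, stream, onLevel) {
  var source = context.createMediaStreamSource(stream);
  var analyser = context.createAnalyser();
  source.connect(analyser);
  var samples = new Float32Array(analyser.fftSize);
  (function poll() {
    analyser.getFloatTimeDomainData(samples);
    onLevel(rmsLevel(samples));
    requestAnimationFrame(poll);
  })();
}

// Root-mean-square of a block of time-domain samples: a crude level meter.
function rmsLevel(samples) {
  var sum = 0;
  for (var i = 0; i < samples.length; i++) {
    sum += samples[i] * samples[i];
  }
  return Math.sqrt(sum / samples.length);
}
```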
Playback – A Workflow
The playback workflow provided more than its fair share of “what?” moments while putting it together. It needs to be understood that the Web Audio API wasn’t intended to be an out-of-the-box media player – there are other tools that fill that niche already, though those tools didn’t anticipate playing back raw audio fresh off the stream.
The Web Audio API was designed with video game audio engines and audio production applications in mind, and as such a lot of its tooling revolves around “one-shot” playback – you don’t scrub or seek on an audio blip that lasts less than a second. Similarly, its role in the WebRTC specification sees it hooked up to a live stream that plays until it stops – not altogether different.
In our playback workflow, our AudioBufferSourceNode is created from the stream we’ve been capturing via our recording workflow. In essence, this is raw audio data that has turned up to the party wearing a “Hi, my name is WAV” name tag, and manages to mingle as such.
Through our familiar chain of a GainNode (for volume) and an AnalyserNode (for levels), we again reach the AudioDestinationNode (hopefully speakers).
However, due to the one-shot nature of the AudioBufferSourceNode, any pause or seek operation on the audio sees its destruction, with a new node taking its place as if nothing had happened. Hilariously, the original has no idea at what point it stopped – it just knows that it did – so playback timing needs to be tracked externally.
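One way to keep that timing external is a small playhead object driven by the audio context’s clock. A minimal sketch, with names of our own invention (`ExternalPlayhead`, `playFrom`) rather than the actual implementation:

```javascript
// Tracks the playback position independently of any one-shot source node.
// `now` is a function returning the current clock time in seconds
// (e.g. function () { return context.currentTime; }).
function ExternalPlayhead(now) {
  this.now = now;
  this.offset = 0;        // position within the recording when last paused
  this.startedAt = null;  // clock time at which playback last started
}
ExternalPlayhead.prototype.start = function () {
  this.startedAt = this.now();
};
ExternalPlayhead.prototype.pause = function () {
  this.offset += this.now() - this.startedAt;
  this.startedAt = null;
};
ExternalPlayhead.prototype.position = function () {
  return this.startedAt === null
    ? this.offset
    : this.offset + (this.now() - this.startedAt);
};

// Resuming always means a brand-new one-shot node; the old one is gone.
function playFrom(context, audioBuffer, playhead) {
  var sourceNode = context.createBufferSource();
  sourceNode.buffer = audioBuffer;
  sourceNode.connect(context.destination);
  sourceNode.start(0, playhead.position()); // start(when, offset)
  playhead.start();
  return sourceNode;
}
```

On pause, the caller stops the current node, calls `playhead.pause()`, and later hands a fresh node the tracked offset – the node itself never knows where it stopped.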
Conclusion – A Solution?
The current incarnation of our efforts is the WebRTC-audio question, currently in beta and functioning admirably on the latest versions of Chrome and Firefox for Android.
As the MediaStream API specification is still a working draft, and the Web Audio API specification is still subject to change (for the better, no doubt), this beta flag is unlikely to be lifted in the near future.
Thankfully, the ever-evolving API specifications have seen to it that we’ll be getting Audio Workers at some point along our journey – let’s hope it’s not too far off.