The crew at Microsoft is forging ahead with their “CU-RTC-Web” specification as a counterproposal to the new WebRTC / RTCWEB proposed standard in the W3C and IETF. My colleague Robin Raymond and I certainly align with Microsoft on some issues, specifically around SDP, but it would have been good if this work had taken place inside the IETF.
I really can’t see Microsoft changing their tune anytime soon, which means that enterprise web application developers will likely need to support both WebRTC and CU-RTC-Web if they are to offer a plugin-less solution enabling RTC across all browsers. Not ideal.
I would like to believe that I’m not hopelessly confused and outdated with regard to what is going on with RTCWEB. Last I checked, my head is not stuck in the sand, nor have I been buried under a rock for the last several years. I recently watched the February 7th, 2013 netcast about the data channel and the questions about how it relates to SDP and the SDP ‘application m-line’.
For the love of all that is human, why is SDP part of RTCWEB efforts at all?
To be clear, I’m talking about a few specific aspects of SDP: the format, the exchange of SDP between browsers and media negotiations via the offer/answer model (and all that it implies regarding the negotiation of media streams). Come to think of it, all that makes SDP, well… SDP. I know what some will say: We need to exchange some kind of blob-like information between browsers so they can talk, that’s why SDP is used. And I would respond “of course”! Beyond arguing how arcane SDP is as a format, RTCWEB was specifically designed not to do signaling stuff at all. That part was purposefully (and wisely IMHO) left out of the specification so that the future was wide open for whatever it might hold in creative solutions.
What we really need in order to do future stuff in the browsers (yet remain compatible with the past) is a good API for a lower level media engine to create, destroy, control and manage media streams. That’s it. Write an engine that doesn’t take SDP, but manages much lower level streams and allow the programmer to dictate how they are plumbed together, which are active and inactive, and give events for the streams as they progress.
That’s the API I want. There’s no SDP offer/answer needed. There’s no shortage of really smart people out there who would know how to produce a great API proposal.
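To make the idea concrete, here is a minimal sketch of what the surface of such an engine might look like. Every name in it (MediaEngine, createStream, connect, setActive and so on) is invented for illustration and is not taken from any specification or browser; it only models the bookkeeping in memory and moves no real media.

```typescript
// Hypothetical low-level media engine. All names are illustrative assumptions.
type StreamKind = "audio" | "video";
type EngineEvent = "created" | "connected" | "activated" | "deactivated" | "destroyed";

interface MediaStreamHandle {
  readonly id: number;
  readonly kind: StreamKind;
  active: boolean;
  sink: string | null; // name of the output this stream is plumbed to
}

type EngineListener = (event: EngineEvent, stream: MediaStreamHandle) => void;

class MediaEngine {
  private streams: { [id: number]: MediaStreamHandle } = {};
  private listeners: EngineListener[] = [];
  private nextId = 1;

  // Register a callback that is told about every stream state change.
  onEvent(listener: EngineListener): void {
    this.listeners.push(listener);
  }

  // Create an inactive, unconnected stream. No offer/answer is involved.
  createStream(kind: StreamKind): MediaStreamHandle {
    const stream: MediaStreamHandle = { id: this.nextId++, kind, active: false, sink: null };
    this.streams[stream.id] = stream;
    this.emit("created", stream);
    return stream;
  }

  // The programmer dictates how streams are plumbed together.
  connect(id: number, sink: string): void {
    const stream = this.lookup(id);
    stream.sink = sink;
    this.emit("connected", stream);
  }

  // Streams are switched on and off individually.
  setActive(id: number, active: boolean): void {
    const stream = this.lookup(id);
    stream.active = active;
    this.emit(active ? "activated" : "deactivated", stream);
  }

  destroyStream(id: number): void {
    const stream = this.lookup(id);
    delete this.streams[id];
    stream.active = false;
    this.emit("destroyed", stream);
  }

  private lookup(id: number): MediaStreamHandle {
    const stream = this.streams[id];
    if (stream === undefined) throw new Error("unknown stream " + id);
    return stream;
  }

  private emit(event: EngineEvent, stream: MediaStreamHandle): void {
    for (const listener of this.listeners) listener(event, stream);
  }
}

// Usage: create, plumb, activate and tear down one audio stream.
const engine = new MediaEngine();
const log: string[] = [];
engine.onEvent((event, stream) => { log.push(event + ":" + stream.kind); });
const mic = engine.createStream("audio");
engine.connect(mic.id, "remote-peer");
engine.setActive(mic.id, true);
engine.destroyStream(mic.id);
```

How the application tells the far side about these streams is left entirely to the application, which is the point: signaling stays out of the engine.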
Such an API would lower the bar for browsers to interoperate at the media level. It removes the concerns about SDP compatibility, including the untold extensions that will be needed to handle more powerful features and the complex behaviours associated with SDP offer/answer, such as rollback and ‘m=’ stability. If the browsers support RTP, ICE and codecs, and can stream, then they are pretty much compatible even if their individual API sets aren’t on par with those of their counterparts.
This also solves an issue regarding the data channel. There is no need for the data channel to be tied to an offer/answer exchange in the media at all. They are separate things entirely (as well they should be). For example, in Open Peer’s case the data channel gets formed in advance to maintain our document subscription model between peers, and media channels are opened and closed as required.
Those who still want to do full on SDP can do SDP. Those who want stateless SDP-like exchanges can do exactly that. Those who want to negotiate media once and leave the streams alone can do so.
As an example, let’s examine Open Peer’s use case. Open Peer does not have, nor does it need or want, a stateful offer/answer model. It also doesn’t support or require media renegotiation. Open Peer offers the ports and codecs (including offering the same port sets to multiple parties) and establishes the desired media. Call and media state are completely separated. From then on, if alternative media is needed, a new media dialog is created to replace the existing one, a ‘quick swap’ happens, and the media streams are rewired to the correct inputs and outputs without renegotiation, or at least without a renegotiation in the offer/answer sense. Further, Open Peer allows either side to change its media without waiting for the other party’s answer.
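The ‘quick swap’ can be pictured with a small sketch. To be clear, the types and method names below (MediaDialog, Call, quickSwap) are made up for this post and are not Open Peer’s actual API; the sketch only shows the shape of the idea, which is that call state outlives any one media dialog and a replacement dialog is swapped in whole rather than renegotiated.

```typescript
// Illustrative only: these types are NOT Open Peer's real API.
interface MediaDialog {
  id: string;
  codecs: string[];
  ports: number[];
}

class Call {
  private activeDialog: MediaDialog;
  readonly dialogHistory: string[] = [];

  constructor(initial: MediaDialog) {
    this.activeDialog = initial;
    this.dialogHistory.push(initial.id);
  }

  get active(): MediaDialog {
    return this.activeDialog;
  }

  // Replace the whole media dialog in one step. The call itself is untouched;
  // the retired dialog is returned so the caller can release its ports.
  quickSwap(replacement: MediaDialog): MediaDialog {
    const retired = this.activeDialog;
    this.activeDialog = replacement;
    this.dialogHistory.push(replacement.id);
    return retired;
  }
}

// Usage: start audio-only, then swap to a dialog that also carries video.
// The codec names and port numbers are placeholder values.
const call = new Call({ id: "dialog-1", codecs: ["opus"], ports: [5004] });
const retired = call.quickSwap({ id: "dialog-2", codecs: ["opus", "vp8"], ports: [5006, 5008] });
```

Either side can build and swap in its own replacement dialog this way, which is why no answer from the other party needs to be awaited.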
Media is complicated for good reason as there are many use cases. The entire IETF/W3C discussion around video constraints illustrates some of the complexities and competing desires for just one single media type. If we tie ourselves to SDP we are limiting ourselves big time, and some of the cool future stuff will be horribly hampered by it.
Let’s face it, browsers are moving toward becoming sandboxed operating systems. So why not give them the low-level API they deserve, one that allows for flexible, futuristic application writing? Complicated and powerful HTML 5 APIs are being well received, so why can’t the same be true for lower level RTCWEB APIs?
I know Microsoft has argued the API is too high level, and they’ve even gone to the trouble of submitting their own specification with CU-RTC-Web, fragmenting efforts in the process. I don’t presume to represent their stance regarding SDP, nor will I go into the merits of their offering, but I think they are right in principle. And for saying so, I’ve got my rotten tomato and egg shield in position.
From an interoperability perspective, even if WebRTC and CU-RTC-Web end up competing, there will be JS libraries out there that support both. So it seems SDP is not the big issue here, but there is an elephant in the room: the media stack.
Differing media stacks (specifically codecs) could cause big problems. As an example, IE and MS endpoints may support various Microsoft codecs, whereas WebRTC-compliant endpoints (Chrome, Firefox, Opera, mobile apps etc.) would presumably support the RTCWEB MTI (mandatory to implement) video and audio codecs.
That is, if WebRTC has such codecs. We still don’t have an MTI video codec! This has been one of the most contentious issues in the IETF RTCWEB working group to date.
If we fail to deliver an MTI video codec in WebRTC, what’s the likelihood of opposing browser vendors (implementing opposing standards) supporting the same codecs? Not very good odds, I would expect. In that case cross-browser communication (media: audio, video) would fail.
Although, we might get lucky and have all the browser vendors select at least one common audio codec and one common video codec of their own accord. Ya, right.
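The failure mode here boils down to a set intersection: two endpoints can exchange a given kind of media only if their codec lists overlap for that kind. Here is a rough sketch of that check. The codec lists are invented placeholders chosen to show the problem, not a statement about what any particular browser ships, and the function names are my own.

```typescript
type CodecKind = "audio" | "video";
type CodecSupport = Record<CodecKind, string[]>;

// Codecs that both endpoints list for each kind of media.
function commonCodecs(a: CodecSupport, b: CodecSupport): CodecSupport {
  const intersect = (x: string[], y: string[]): string[] =>
    x.filter((codec) => y.indexOf(codec) !== -1);
  return { audio: intersect(a.audio, b.audio), video: intersect(a.video, b.video) };
}

// Media of a given kind can flow only if at least one codec is shared.
function canInteroperate(a: CodecSupport, b: CodecSupport, kind: CodecKind): boolean {
  return commonCodecs(a, b)[kind].length > 0;
}

// Hypothetical endpoints: they share an audio codec but no video codec.
const endpointA: CodecSupport = { audio: ["opus", "pcmu"], video: ["vp8"] };
const endpointB: CodecSupport = { audio: ["pcmu"], video: ["h264"] };

const audioWorks = canInteroperate(endpointA, endpointB, "audio");
const videoWorks = canInteroperate(endpointA, endpointB, "video");
```

With these made-up lists the audio check passes and the video check fails. A mandatory-to-implement codec guarantees that the intersection is never empty.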
If you don’t want to leave it entirely up to chance, get involved! Joining the IETF is free, and open standards need your support if they are to succeed. The next IETF meeting could be very telling with regard to an MTI video codec: http://www.ietf.org/meeting/86/index.html