
In a couple of hours there will be a VUC session on this topic, so I thought it would be useful to record some of my observations and outstanding questions.

  1. A user or administrator of the local network must have a way to designate STUN and TURN servers that override the ones specified by the application. A STUN server is analogous to a DNS server: just as we are at liberty to specify our DNS servers, we must be able to specify the STUN server. Depending on security considerations, a network may be obligated to record all conversations. To facilitate that, the network may deploy a TURN server and require all RTC traffic to flow through it. This could be done simply if the browser were to tacitly utilize the network's TURN server and assign the highest priority to the corresponding ICE candidate. This is analogous to using a SOCKS proxy for HTTP flows.
  2. Both the users and application providers should recognize that external STUN and TURN providers have access to session metadata.
  3. TURN adds overhead, and this overhead increases further when ReTURNs are used. TURN needs the additional overhead to multiplex multiple streams between a TURN client and server, but most WebRTC use cases will involve a single stream. I think it is a good tradeoff to consume an occasional additional port at the server rather than additional bandwidth on every flow. So it might be worthwhile to use a simple relay server rather than a full-fledged TURN server.
  4. Some have expressed concern about sharing local addresses with other clients. Given that Trickle ICE is part of WebRTC, a modification to how ICE candidates are listed should be considered. Browsers should not include local addresses in the initial candidate set. Instead, they should be added if and only if the peer's server-reflexive or peer-reflexive address matches the browser's own and the connectivity test passes. Of course, we have to recognize that the call setup time may increase slightly.
  5. TURN is required only when both end-points are behind symmetric NATs. If it is known a priori that this will not be the case (as when the session is always to the app's own device/server), then we can dispense with relay addresses as ICE candidates. If, further, we know that the app's own device/server has a public Internet presence, then even STUN can be eliminated, since that device/server can use the peer-reflexive address it learns as part of Trickle ICE.
  6. As part of the connectivity test, the two end-points must authenticate each other before meaningful information is exchanged.
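Point 4 above can be sketched in code. The helper below is hypothetical (it is not part of any WebRTC API): given candidate strings that have already been gathered, it drops "host" candidates, which expose local addresses, from the initial set; under Trickle ICE, a browser would add them later, only after the peer's reflexive address is seen to match its own and the connectivity test passes.

```javascript
// Hypothetical helper: filter "typ host" candidates out of the initial
// candidate set so that local addresses are not shared up front.
function initialCandidateSet(candidates) {
  return candidates.filter((c) => !/\btyp host\b/.test(c));
}

// Example candidate strings (addresses are illustrative):
const gathered = [
  "candidate:1 1 udp 2122260223 192.168.1.10 54321 typ host",
  "candidate:2 1 udp 1686052607 203.0.113.7 54321 typ srflx raddr 192.168.1.10 rport 54321",
];
console.log(initialCandidateSet(gathered)); // keeps only the srflx candidate
```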

In a post that prompted me to write this, Tsahi discusses the alternative signaling protocols one can use in a WebRTC-enabled app. In this post, I approach the issue from a different angle; I hope it sheds additional light and helps you reach the choice appropriate for your situation.

Before we dig deep, we have to recognize that there are two independent matters to decide: 1) how the signaling messages will be carried and 2) what the signaling protocol will be. Many variables affect the optimal answer for a given scenario, so it is best to discuss them in general and let you decide on a case-by-case basis.

First let us consider the transport mechanism.

  1. Pure HTTP: Since the app will be accessed from a browser, an easy choice is to use HTTP as the transport. It works great when the browser initiates a signaling procedure and the server responds.
  2. HTTP w Long Polling/Comet: But there are times when the server needs to initiate a procedure asynchronously, for example to notify one user of another's action (like placing the mic or speaker on mute), or to notify of an incoming call request. Since the server cannot autonomously initiate an HTTP session, an alternative is to use long polling or Comet. This may increase the load on the server due to excessive polling, or may introduce latency with its undesirable effect on UX.
  3. HTTP w Push Notification: Alternatively, the server can use the Push Notification service offered by both Chrome and Firefox, and upon receiving such a notification, the browser can initiate an HTTP session to continue the procedure. This addresses the server load, but does not address the latency issue, especially for “in-session” procedures. Worse, the latency now depends on a third-party service.
  4. Websocket: This is where Websocket has its advantages. A Websocket starts as an HTTP session which is then converted to a persistent TCP session. Almost all browsers (in their most recent versions) support Websocket, and there are server implementations that are very efficient. So it addresses both issues.
  5. Websocket w Push Notification: If maintaining a Websocket connection during idle periods (solely to be informed of an incoming session request) is a concern, then one can use Push Notification during idle periods and use Websocket only during active sessions.
  6. Data Channel w X: The final choice is for the server not to be involved during an active session at all, and instead let the browsers handle the signaling procedures directly between themselves via a WebRTC Data Channel. But this approach does not address how to handle notifications during idle periods.
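The hybrid of options 4 and 5 can be sketched as a simple policy function. The names below are illustrative, not from any library: the point is only that the transport decision can be driven by session state.

```javascript
// Pick a signaling transport from the session state:
// Push Notification while idle (no connection held open),
// Websocket while a session is active (low-latency, bidirectional).
function pickTransport(sessionState) {
  switch (sessionState) {
    case "idle":
      return "push-notification"; // server wakes the browser on demand
    case "active":
      return "websocket"; // persistent channel for in-session procedures
    default:
      throw new Error(`unknown session state: ${sessionState}`);
  }
}

console.log(pickTransport("idle"));   // → push-notification
console.log(pickTransport("active")); // → websocket
```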

As you can see, there are many choices, each with its own trade-offs. But knowing the trade-offs, you can decide the appropriate transport for your use case.

Deciding which protocol to use is either a “no-brainer” or “not so fast”. If the paramount objective is to work with an already deployed system, and the WebRTC app is just another access mechanism, then there is nothing more to consider: it is optimal to use the signaling procedure of the deployed system, and that is that. Otherwise, it is better to start from scratch and ask the questions differently. From the time of Q.931 in ISDN Basic Access up to and including SIP, the standards bodies have focused on defining the protocol so as to ensure interoperability between two autonomous systems. Since the end-points will have different capabilities and present different user experiences, the best a standard can do is to define a protocol that drives a basic user interface. Thus, for example, when the far-end places a call on hold, the near-end is not notified; it is not clear how to abstract the notification so that all variations in the UI can be handled.

Next, let me quickly dismiss a faux use case, but one that is widely considered. It is know as “trapezoidal connection”. In this connection, the two end points are each connected to its own WebRTC app and the two apps are federating between themselves. The fact that the two end-points are using WebRTC as access is incidental; the real crux is that the two apps are federating and they have agreed on a protocol for this. So what the apps will select for protocol belongs to the “no-brainer” category. The apps will select a protocol that is optimal for the agreed upon federation protocol.

So the really interesting use case is where the end-points are directly connected to the app server, the so-called “triangular connection”. Since both end-points are directly connected to the app server, and the server can dynamically download the signaling procedures via Javascript, it is in a position to offer a rich user experience by dynamically driving UI elements. The app designer can freely devise the needed signaling procedures – conforming to a standard is not critical. A good analogy is to compare painting by numbers with free-form painting. At first glance, painting by numbers looks straightforward; but in fact it is tedious, leaves no room for error, and is not very expressive. Free-form painting, if you are good at it, is fluid, very expressive, and gives you lots of freedom. If my only choice were free-form painting, I would have only a blank canvas; with painting by numbers, there is hope that I will end up with something that looks like a painting. So I say: to each his own.

Recently, Carl Ford was musing about potential ideas for a WebRTC Hackathon. One idea he had was exploring different UI designs associated with “Video on Hold”. This post is a summary of our design thoughts and the decisions we made for a WebRTC application that is part of EnThinnai.

He felt that the design used in phone systems doesn't work well for smartphones, so we should probably take a different approach for video calls. As an example, he was wondering how the user should be notified that the other party is retrieving the held video call when she has moved to a different browser tab.

In a followup post he elaborates his point. He suggests that we may imitate the idea used in 1A2 Key System phones. To see how far we can carry its design, we need to go into a bit more detail.

These phones had several white buttons, each controlling one of the lines the phone had access to, with at most one button engaged at a time. There was also a red button that placed the currently engaged line on hold. All of these buttons could light up or flash to signify the status of a line: a quickly flashing light signified an incoming call, a slowly flashing light signified that the line had been put on hold, and a steady light indicated an active call. Subsequently, Avaya carried this design idea over to their digital sets as well. This concept of “call appearances” and an “active call appearance” is natural and very familiar in windowed computer systems. It is easy to observe that call appearances are nothing more than open windows and the active call appearance is the active window. When the user selects a window to be active, the OS tacitly places the other windows on hold.

But the analogy goes only so far. In a computer, even if a window is not active, activities can go on in it; for example, the user may be playing a YouTube video in an inactive window. Also, we should note that a 1A2 Key System phone indicates whether the local user has placed the call on hold; it does not know whether the far-end user has placed the call on hold or is retrieving it, which is the use case Carl wants to explore.

There is one other fundamental difference between the 1A2 Key System and the environment a WebRTC app will find itself in. The phone can safely decide that when a call appearance becomes active, the call that was active must be placed on hold. But that may not be appropriate in the case of WebRTC. For example, the user may want to continue the call while viewing and interacting with the contents of another window. Or the user may have multiple WebRTC sessions going at the same time in an attempt to emulate a bridged call. So the only safe approach is to let the user explicitly select whether a video call must be placed on hold.

If we dig a bit deeper, we will question the basic need to place a call on hold in the first place. In PSTN systems, a call must be placed on hold if the user wants to attend to another call, because the access line can carry only one call at a time. But that is not the case with WebRTC. The user can equivalently decide to turn off the camera, the display, or both, instead of placing the whole call on hold.
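One way to model this, as a hedged sketch (the names are ours, not from the WebRTC API): expose the camera and the display as independent toggles, and treat “hold” as merely the state where both are off. A real app would map these flags onto MediaStreamTrack.enabled.

```javascript
// Compute which media flows remain enabled from two independent toggles,
// instead of a single all-or-nothing "hold" switch.
function mediaState({ cameraOn, displayOn }) {
  return {
    sendVideo: cameraOn, // our camera feed to the far end
    renderRemoteVideo: displayOn, // far-end video on our display
    fullHold: !cameraOn && !displayOn, // both off approximates a classic hold
  };
}

console.log(mediaState({ cameraOn: false, displayOn: true }));
// camera off, but the user still watches the far end
```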

Recently, call centers have responded to callers' frustration with excessive hold times by introducing a feature called “callbacks” or “virtual queuing”. A WebRTC app can offer a similar feature in an elegant manner by making the app multi-modal: a text chat session periodically updates the status and serves as the link that provides audio and video cues when an agent becomes available.

These thoughts are captured in the current user interface design used in EnThinnai:


1. Screenshot of the chat window, just before start of a video call


2. Screenshot during an active video call



3. Screenshot of a “held” or “virtually queued” call



4. Far-end sends a text message and creates audio sound by pressing the “b” button

Inasmuch as the main utility of a formal living room is to entertain visiting guests, a WebRTC app allows guests to initiate a communication session with its subscriber.

Many go to enormous lengths to furnish and decorate a living space normally called the Living Room. Notwithstanding the expense involved and the name, it is used mostly when guests are visiting. When we entertain guests and they use the amenities in that room, there is no question of whether the guests have a similar room and similar amenities in their own houses. The only requirements are that they visit you and that you are ready to host them.

So it is with WebRTC apps. The main reason for the app, and for you to sign up for one, is so that people can initiate communication sessions with you. The only requirements are that your guests have a compatible browser and that you are willing to communicate with them.

Just because you have a lavish formal Living Room does not mean that when you visit one of your friends you will experience similar luxury. Similarly, subscribing to a WebRTC app may not imply that you can initiate a communication session with one of your friends. In this respect, WebRTC apps are for receiving only. This is critical; anyone suggesting differently is misleading you.

The image is courtesy of AvaLiving.com

How do I get an OpenID?

It’s easy to get an OpenID; in fact, you probably already have one. If you have a Google account, you can use your profile id number as your login (it can be found in your profiles.google.com URL). Similarly, if you have a Yahoo account, you can use your username as your OpenID login.

Other sources for OpenIDs include 3rd-party providers like Verisign Labs. If you use WordPress to host a blog, you can also install a plug-in to be your own OpenID provider.

If you have an account with one of the above providers, then you can derive your OpenID using the following rules:

Google:

profiles.google.com/<your profile id number>

Yahoo:

me.yahoo.com/<your username> or a customized string

WordPress:

URL of the home page of your blog at wordpress.com
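These rules can be captured in a small helper. This is an illustrative sketch of the patterns listed above, not an official API; the WordPress case assumes a blog hosted at wordpress.com, and the account ids are placeholders.

```javascript
// Derive an OpenID identifier from a provider name and an account id,
// following the per-provider rules described in this post.
function deriveOpenID(provider, id) {
  switch (provider) {
    case "google":
      return `profiles.google.com/${id}`; // id is the numeric profile id
    case "yahoo":
      return `me.yahoo.com/${id}`; // id is the Yahoo username
    case "wordpress":
      return `${id}.wordpress.com`; // home page of the blog at wordpress.com
    default:
      throw new Error(`no rule for provider: ${provider}`);
  }
}

console.log(deriveOpenID("google", "1234567890")); // → profiles.google.com/1234567890
```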

Introducing ffonio.in

ffonio.in is a web application that people can use to have IM, voice and video chats with their friends and family. Users can run this app on their own devices such as their WiFi router, Raspberry Pi or a cloud instance like Digital Ocean Droplet. As long as friends and family have an OpenID and use a browser that supports WebRTC, they do not have to host this application themselves.

The following are highlighted features of ffonio.in:

  1. Use of OpenID for authentication. (Registered users can assign an unverified, if insecure, “OpenID” to unregistered users in an ad-hoc fashion.)
  2. “Availability status”, in lieu of Presence. Users can present different status to different persons.
  3. Only users who have been previously authorized can initiate an IM, voice or video chat. The authorization can be changed at any time.
  4. Seamlessly move to a voice or video chat from an IM chat session.
  5. Ability for either user to mute sound or turn off camera.
  6. Ability to buzz the other user to catch their attention.
  7. Once the IM chat session has ended, the transcript is made available to both the users. (We plan to also make recordings of voice and video chats available in the near future.)
  8. The app has a built-in simple relay server (a two-sided NAT) to assist in NAT Traversal, replacing the functions of a TURN server.
  9. Generate a custom reach-URL (which users can share in their email or business card) or an embed code (which users can add to their websites).

Although our primary objective was to help individuals run their own IM, voice and video chats, this system can also be used on a much larger scale, such as within an enterprise. Companies can use this product for both internal communication between employees and external communication with outside partners and customers. We plan on pursuing this direction in the near future by integrating this application with CRM systems like Salesforce and Sugar CRM.

As early as 2008, EnThinnai supported the ability to conduct IM and voice chats. At that time a Java applet was dynamically downloaded to the browser. The browser then maintained a two-way signaling channel with the server that allowed asynchronous notification from the server to the browser – a proto-Websocket, you may say. The applet also contained the Speex codec, which was used to provide real-time speech capability – fully anticipating WebRTC.

But we were in a bind to extend this feature further. There was no freely available video codec with which to support video communication. Leading mobile devices did not support Java. Users were disabling Java due to security concerns. For us, it is a defining use case that an unregistered user can initiate a communication session with a registered user (guest access). This means the capability afforded by the Java applet must be universally available – which is precisely the objective of WebRTC.

Now that WebRTC has reached a stable stage, we have replaced the Java applet with WebRTC. So users can use any WebRTC-enabled browser to communicate under EnThinnai.

Skype is celebrating its 10th anniversary. On this occasion, I thought it would be interesting to revisit my early comments. They were published as a guest blog post in Gigaom on March 27, 2004, but due to a server malfunction they were lost, so I am republishing that post here. I am proud to say that many of my opinions have stood the test of time, including the claim that Skype would be forced to bring all the Supernodes in house.

It is very likely that you have heard about Skype; it is even probable that you are using it. (Fair disclosure: I am not a subscriber of Skype.) Michael Powell, the FCC Chairman, suggests that the telephony marketplace has changed dramatically since the arrival of Skype. Is Skype really so special compared to other VoIP service providers? Of course Skype thinks so. They say that unlike other VoIP service providers, Skype has a very intuitive user interface that does not require technical skills and is easy to configure. They also suggest that unlike other VoIP service providers, they solve the NAT Traversal problem without the use of proxies, with resulting better voice quality. Of course the clincher is that Skype is P2P and so is infinitely scalable and resilient.

Before I analyze these points, let me describe the workings of Skype based on my understanding and publicly available information.

  1. There is a Global Index Server where all clients login, authenticate themselves and exchange security key information.

  2. Based on this exchange, the client is assigned a Supernode, which maintains the presence information; Supernodes also communicate with other Supernodes to locate other end-points.

  3. The clients and Supernodes use the well documented UDP Hole Punching algorithm to solve the NAT Traversal problem.

Upon a little reflection, we can see that functionally this architecture is equivalent to other VoIP architectures like SIP: the Global Index Server is equivalent to the Registrar, the function described in item 2 is equivalent to the Location Server, and the function described in item 3 is the Session Border Controller. What is more, many SBC vendors solve NAT Traversal problems using similar optimization techniques with the same rate of success. Consequently, the clients in other environments also do not require complicated configuration.

Skype users have commented positively on its voice quality. Global IP Sound indicates that Skype uses its codecs, in particular iLBC. GIPS also supplies their codecs to other VoIP clients; X-ten, for example, also uses the iLBC codec. So one can get Skype-like quality in other systems as well.

The Global Index Server is a single point of failure. If it fails, clients cannot login; I suppose new Supernodes cannot be drafted either. In my opinion, this is not a serious failure mode, because the existing system can continue to function and a replacement GIS can easily be brought online.

But my concern regarding Supernodes is more substantial. It is suggested that since the Supernodes are nothing more than other Skype clients, Skype is infinitely scalable. I submit that this may not be the case. To begin with, a client is eligible to be a Supernode only if it has enough processing power and bandwidth to perform the functions of a Supernode. Additionally, it must be present on the public Internet or behind a “transparent” NAT and a “permissive” firewall. I am betting that such clients will be scarce in relation to the total number of clients (a single Supernode serves around 100 clients).

If Supernodes need to have special capabilities, then it is likely that they will demand some form of compensation. It is not clear whether Skype is set up for this. Additionally, it is not clear how the individual clients are protected from a misbehaving Supernode. It is true that the media is encrypted, but the Supernode is involved in the signaling phase. Since the Supernode has network connectivity to the client, it is tempting to use it for extra and unwanted commercial activity. So Skype may deploy their own Supernodes, eliminating one more difference between it and other VoIP providers.

Some have expressed reservations because Skype is proprietary. There have been previous instances where proprietary consumer items found wide adoption without incurring a huge collective cost; the VCR is one example that comes to mind. But in this case there are some differences:

  • Alternatives based on standards are available

  • Skype uses mostly well-known and open technologies; only the protocol semantics are proprietary

  • Even though Skype (for that matter, VoIP) is naturally a “product” and not a “service”, Skype views it as a service. For example, they do not allow an enterprise to use its own GIS instead of the global one, even if communication will be restricted to internal use alone.

  • As I am told, there is no way to directly address another client, even if its IP address is known. Windows Messenger from Microsoft has the same limitation, whereas NetMeeting allowed direct communication.

In this respect also, they are just like other VoIP providers. It is disheartening to see that even those whose middle name should be P2P think like this. I am reminded of an ad that appeared in a New York based Indian newspaper in 1982. The ad was taken out by an Indian restaurant that offered two free alcoholic drinks in exchange for a ticket stub from the movie Gandhi. In summary, Skype shares the same functional architecture as other VoIP providers. It shares the same business plan and outlook. But they have artificially cloaked it in a proprietary system. I guess this is their “economic moat”, to use a Buffett term. From a consumer point of view, the beauty of VoIP is that there is no moat, and current technology is sufficient to realize direct IP Communications that does not require any intermediation.

Aswath Rao has 20 years of experience in the telecommunications field, having worked for leading R&D firms. He has worked on ISDN, Frame Relay, BISDN, wireless and satellite communications. For the past 5 years he has been working on VoIP related issues. Long before intelligence at the end became acceptable, he advocated “functional terminals” in ISDN. His proposal for Inter Connect Function has been incorporated in the TIPHON architecture and currently it is known as Session Border Controller. He has developed ways to offer PSTN subscribers many of the features available to VoIP subscribers. He maintains a blog. He can be reached at aswath@whencevoip.com

It has been reported that Telefonica will shut down Tu Me and redirect its resources to shoring up another service, Tu Go. People have theorized that, compared to competing services from OTTs, Tu Me had anemic traction with uncertain revenue potential. On the other hand, the reasoning continues, Tu Go has solid revenue opportunities, since it accrues billable minutes/SMS from the existing customer base. Two years back, I gave a talk at the Telecom 2018 Workshop in which I argued that telcos will have a difficult time directly competing with OTTs, and suggested an alternate approach. In this post, I revisit those points in the context of Telefonica's decision.

We have to recognize that telcos and OTTs are fundamentally different. OTTs are funded by risk-loving VCs. They are designed to take big risks, with a quick entry and just as quick an exit. They go for world domination and design their services for viral adoption.

Telcos are a study in contrast. They are established enterprises beholden to shareholders who value steady returns and are averse to big risks. Furthermore, they need to worry about the cross-elasticity of new services with old ones. They also have a strong presence in geographically restricted areas and usually federate with other telcos out-of-region. But such federation is not easy to come by, since potential partners may have different priorities in introducing new and speculative services. So on day one, a new service will have a low network effect.

It is clear that Tu Me experienced exactly these issues, and predictably it had low traction. Though I do not have verified data, it is a safe bet that it was more successful in Telefonica's local regions than out-of-region. Since they are marketing Tu Go to their existing customers, they will have better luck with that service. It allows their subscribers to access the services using multiple means of access. In this way, Telefonica has become an “OTT” for its own subscribers. But it is only half the solution.

If we take the perspective of the friends of Telefonica's subscribers, we will notice the missing piece. They too use multiple technologies to access the network, but in the current scheme it all has to come via the PSTN, with an attendant restrictive set of features due to federation agreements with their carriers. This need not be the case anymore. Suppose Telefonica allows non-subscribers to reach its network using WebRTC technology; then its customers can use new services and features with no loss of network effect.

This is the fundamental benefit of WebRTC from the perspective of the carriers: it frees them to introduce new services and features for their subscribers without loss of network effect, and without relying on federation and coordination with other carriers.

In a recent post, Chris Kranky writes on the need “to move on” and the need for expediency in wrapping up the first iteration of the WebRTC API. Personally, I would have benefited if the first iteration had been a low-level spec, for I could have easily ported a custom Java applet. But given the passage of time, it is more important that there be an agreed standard. This point, however, is not the objective of this post. Instead I would like to focus on another of his points:

[WebRTC] wasn't designed to be federated (namely that 2 WebRTC applications aren't in fact supposed to talk to each other).

He makes this observation to explain the motivation for seeking low-level control. My quibble is not with this explanation; rather, I want to take this sentence in isolation, interpret it literally, and discuss it. (This is not fair to Chris, but I am just using his sentence as a prop, so I hope it is OK with him.)

In my interpretation, if WebRTC is not designed to be federated, then there is some deficiency that needs to be addressed – if not immediately, then at some future time. But with the WebRTC construct there is no need for federation. Let me explain.

The following are the four main reasons why we need federation, and how WebRTC handles each of them without requiring federation:

  1. Reachability information is not widely held, except by some selected nodes in both systems.
    • Since a WebRTC address is an HTTP URI, the originator’s service provider or system is not needed; the originator can directly access the destination’s system. Indeed, it is not required that the originator be part of any service provider or system at all.
  2. Communication between the systems may need mediation to handle incompatibilities.
    • Since the app server dynamically downloads the signaling procedures, there are no incompatibility issues on the signaling plane. I further assume that MTI codecs remove incompatibility between the browsers. In any event, any such incompatibility can be solved without the two systems federating.
  3. Identification and authentication of external nodes need to be mediated.
    • Since the whole construct is built on HTTP, any of the third-party verification systems can be used to identify and authenticate the end-points. In this respect there is a need for a form of federation, but it is a much less stringent requirement and can easily be waived by the end-points depending on the use case.
  4. Since the external systems may not be trustworthy, the local system needs to protect its users.
    • WebRTC has built-in security mechanisms to protect the end nodes from malware apps. Specifically, the browser ensures that a rogue app cannot assume control of the end node.

In my opinion, the fact that WebRTC does away with federation is one of its important benefits, and is why it is going to disrupt the communications industry.
