RFC 3263, "Locating SIP Servers", specifies how SIP clients use DNS procedures to resolve a SIP URI into the address, port and transport protocol of the next server to contact. These DNS procedures can also return multiple addresses, which clients can try sequentially if earlier addresses fail.
The SIP Servlet RA now supports RFC 3263 to provide failover and load balancing of outgoing requests. Previously the SIP Servlet RA used RFC 3263 only to determine the first address to contact; it did not automatically try backup addresses if the first address failed. The SIP Servlet RA now fails over to backup addresses automatically, with no intervention required from the application.
Introduction to the RFC 3263 process
When a SIP client sends a request, it must select either the first Route header’s URI, if present, or the Request-URI.
This URI determines where the request is sent to (the next-hop address).
This URI might only contain a domain name, such as sip:bob@example.com.
RFC 3263 DNS procedures are required to convert the URI into the address, port and transport protocol of an actual SIP server (or servers).
At a high level there are three parts to the RFC 3263 DNS process. Some of these may be skipped depending on what information is already given in the URI, such as transport, IP address or port numbers. Here is a simplified description of the process that applies to any SIP client using RFC 3263.
- Determine the transport protocol. If not already specified in the URI, the client will do a DNS NAPTR lookup on the domain. This may return some NAPTR records which specify, in order of preference, the transport protocols that shall be used. The NAPTR records will contain SRV addresses for each supported transport protocol.
- Next, determine the port. This can be found by looking up the SRV address in the NAPTR record. If there were no NAPTR records, the SIP client will try the default SRV addresses for its preferred transport protocols, e.g. "_sip._tcp.example.com". The SRV query may return one or more SRV records. Each record contains the hostname and port of a SIP server. Multiple SRV records are sorted by priority, and records with the same priority are ordered by weighted random selection, as per RFC 2782. This means that the results may be ordered differently each time, providing a form of load balancing.
- Finally, the hostnames provided in SRV records must be looked up to obtain their IP addresses. Sometimes this information is already included in the SRV response. If there were no SRV records, the SIP client will default to looking up the IP address of the hostname in the URI.
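The SRV ordering step is the part that provides load balancing, so it is worth a small illustration. The following is a minimal sketch of the RFC 2782 selection procedure, not the RA's actual implementation. The `SrvRecord` type, the `order_srv_records` function and the hostnames in the usage note are hypothetical names chosen for this example.

```python
import random
from collections import namedtuple

# Hypothetical record type for this sketch: the four SRV fields a client needs.
SrvRecord = namedtuple("SrvRecord", "priority weight port target")


def order_srv_records(records, rng=random):
    """Return SRV records in the order a client should try them (RFC 2782).

    Lower priority values are tried first. Within one priority, records are
    drawn one at a time with probability proportional to their weight.
    """
    ordered = []
    for priority in sorted({r.priority for r in records}):
        # RFC 2782 places weight-0 records at the start of the pool, which
        # gives them only a small chance of being selected early.
        pool = sorted((r for r in records if r.priority == priority),
                      key=lambda r: r.weight != 0)
        while pool:
            total = sum(r.weight for r in pool)
            pick = rng.randint(0, total)  # inclusive on both ends
            running = 0
            for index, record in enumerate(pool):
                running += record.weight
                if running >= pick:
                    ordered.append(pool.pop(index))
                    break
    return ordered
```

For example, given two records at priority 10 (weights 60 and 40) and one at priority 20, the priority-20 record is always last, while the order of the first two varies between calls, with the weight-60 record coming first more often.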
At the end of this process the SIP client has a list of (address, port, transport) tuples to try. If the list is empty then the request cannot be routed and will fail. Otherwise, the client picks the first address in the list and starts a client transaction. The next backup address will be tried if the transaction fails for any of these reasons:
- the server responds with 503 (Service Unavailable)
- the transaction times out
- the transaction fails with a transport error
When such a failure occurs, the client selects the next address and tries again with a new client transaction. If all addresses failed then the client must fail the request.
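The retry behaviour described above amounts to a simple loop over the resolved list. The sketch below illustrates that loop under stated assumptions: `Outcome`, `send_with_failover` and the caller-supplied `send` function (standing in for one client transaction) are hypothetical names, not part of any SIP stack's API.

```python
from enum import Enum


class Outcome(Enum):
    """Possible results of one client transaction (names are illustrative)."""
    FINAL_RESPONSE = "final response other than 503"
    SERVICE_UNAVAILABLE = "503 Service Unavailable"
    TIMEOUT = "transaction timeout"
    TRANSPORT_ERROR = "transport error"


# The three failure conditions that cause the next address to be tried.
FAILOVER_OUTCOMES = {
    Outcome.SERVICE_UNAVAILABLE,
    Outcome.TIMEOUT,
    Outcome.TRANSPORT_ERROR,
}


def send_with_failover(targets, send):
    """Try each (address, port, transport) tuple in order.

    `send` is called once per target and must return an Outcome; each call
    represents a new client transaction. Returns (target, outcome) for the
    first target that does not trigger failover. If every target fails,
    returns (None, last_outcome); for an empty list, returns (None, None).
    """
    last_outcome = None
    for target in targets:
        last_outcome = send(target)
        if last_outcome not in FAILOVER_OUTCOMES:
            return target, last_outcome
    return None, last_outcome
```

A result of `(None, ...)` corresponds to the case where the client must fail the request, either because the list was empty or because every address failed.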
SIP Servlet RA usage of RFC 3263
The SIP Servlet RA closely follows the RFC 3263 procedures but with some adaptations described below.
Use of client transactions
RFC 3263 states that a new client transaction is used for each backup address.
The SIP Servlet RA does this, but the client transactions are hidden from the application.
The application just sends a single SipServletRequest on a SipSession, as usual.
If a failure occurs and there are backup addresses available, the RA automatically creates a new client transaction and sends the request to the backup address.
The RA takes care of routing the successful responses up to the application’s SipSession.
Each new client transaction created for contacting backup servers will have a new Via branch ID (as per the RFC), but these are derived from the original branch ID with a different suffix for each new transaction. For example, if the original branch was "z9hG4bK776asdhds", any subsequent transactions will have branches "z9hG4bK776asdhds%1", "z9hG4bK776asdhds%2" and so on. In this way the branches for each transaction are still unique in the RA and the network, but can be correlated when looking at logs or packet captures. The application will only see the original branch ID in responses.
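When reading logs or packet captures, it can help to map a failover branch back to the original one. The sketch below assumes only the suffix format shown in the example above ("%" followed by a counter); how the RA derives branches internally is not shown here, and `derive_branch` and `base_branch` are hypothetical helper names.

```python
def derive_branch(original, attempt):
    """Return the Via branch for a given attempt.

    Attempt 0 is the original transaction; attempts 1, 2, ... are the
    transactions created for backup addresses.
    """
    if attempt < 0:
        raise ValueError("attempt must be non-negative")
    if attempt == 0:
        return original
    return f"{original}%{attempt}"


def base_branch(branch):
    """Strip a trailing "%<number>" suffix so related branches can be grouped."""
    base, separator, suffix = branch.rpartition("%")
    if separator and suffix.isdigit():
        return base
    return branch
```

Grouping captured messages by `base_branch(branch)` then collects every attempt for the same application request under one key.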
The first final response to be received, other than a 503 that triggers failover to a backup address, will end the transaction and no more backup addresses will be tried. If all of the available addresses fail, the application will see the last error response received from a server (a 503 or 408 response), or the RA will generate a 503 response and pass it up to the application.
Failover timer
Failover to a backup server is triggered by transaction timeouts, transport errors or 503 responses. If a server has failed completely, the resulting transport error (such as ICMP Port Unreachable) may not be directly visible to the RA, especially when using UDP. In the event of a network partition there may be no ICMP rejects at all. This means failover will not occur until the transaction timeout (Timer B or Timer F in RFC 3261) expires, which is 32 seconds with the default timer values (64 × T1, where T1 is 500 ms).
For many applications it is desirable to detect failures faster than this. The SIP Servlet RA defines an additional failover timer, which can expire before the default transaction timeout to trigger a failover sooner. The default value of this timer is 10 seconds.
The failover timer is only started when there are multiple addresses to try. If there is just a single address, the normal transaction timeout behaviour applies. The timer is stopped as soon as any response (including 100 Trying) is received, as this indicates that the next-hop server is functioning. If this timer expires and no responses were received, the RA moves on to the next address.
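To make the effect of the two timers concrete, the sketch below computes how long a completely silent server (one that sends no responses at all) delays the request before a failure is detected. It is an illustration of the rules in this section, not RA code: the function name and parameters are hypothetical, and `address_count` is assumed to mean the number of addresses available to try when the request is sent.

```python
# Defaults taken from the text above: a 10 second failover timer, and a
# 32 second transaction timeout (Timer B or F with default RFC 3261 values).
DEFAULT_FAILOVER_TIMER = 10.0
DEFAULT_TRANSACTION_TIMEOUT = 32.0


def seconds_until_failure_detected(address_count,
                                   failover_timer=DEFAULT_FAILOVER_TIMER,
                                   transaction_timeout=DEFAULT_TRANSACTION_TIMEOUT):
    """Time before a silent server is treated as failed.

    The failover timer only runs when there is more than one address to try;
    with a single address the normal transaction timeout applies.
    """
    if address_count > 1:
        return min(failover_timer, transaction_timeout)
    return transaction_timeout
```

With the defaults, a silent first server out of three is abandoned after 10 seconds rather than 32, while a request with only one resolved address still waits for the full transaction timeout.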
Server blacklisting and recovery
By itself, DNS does not tell you if a server is available. When using the DNS process above, the same set of servers will always be returned, regardless of whether they are currently available or not. To avoid contacting servers that are likely to be down, the SIP Servlet RA maintains a "blacklist".
Whenever a server failure is detected, either by timeout, transport error or 503 response, the RA adds the failed address to a blacklist so that the address is not tried again for some period of time.
This avoids the RA needlessly contacting servers that are most likely going to be down. By default, servers are blacklisted for 5 minutes. After that time the RA will try using the address again.
Servers that respond with a 503 response and a Retry-After header will be blacklisted for the duration specified in the header.
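The blacklist behaviour can be pictured as a map from address to expiry time. The following is a minimal sketch under that assumption and is not the RA's implementation: the `Blacklist` class and its method names are hypothetical, and the 300 second default mirrors the 5 minute default described above. The text does not say what the RA does when every address is blacklisted, so the sketch simply returns an empty list in that case.

```python
import time


class Blacklist:
    """Tracks failed addresses and when each may be tried again."""

    def __init__(self, default_duration=300.0, clock=time.monotonic):
        self._default_duration = default_duration
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._expiry = {}

    def add(self, address, retry_after=None):
        """Blacklist an address.

        `retry_after` is the Retry-After value in seconds from a 503
        response, if one was present; otherwise the default duration is used.
        """
        duration = self._default_duration if retry_after is None else retry_after
        self._expiry[address] = self._clock() + duration

    def is_blacklisted(self, address):
        expiry = self._expiry.get(address)
        if expiry is None:
            return False
        if self._clock() >= expiry:
            # The blacklist period has passed, so the address is usable again.
            del self._expiry[address]
            return False
        return True

    def usable(self, targets):
        """Return the targets that are not currently blacklisted, in order."""
        return [t for t in targets if not self.is_blacklisted(t)]
```

A client would call `add` whenever a timeout, transport error or 503 is observed, and pass the DNS results through `usable` before choosing the first address to contact.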
Configuring the SIP Servlet RA for RFC 3263
See the RFC3263 configuration properties in Configuration Properties.