Wednesday 20 July 2016

Singleton HttpClient? Beware of this serious behaviour and how to fix it

If you are consuming a Web API in your server-side code (or .NET client-side app), you are very likely to be using an HttpClient.

HttpClient is a very nice and clean implementation that came as part of Web API and replaced its clunky predecessor WebClient (although only in its HTTP functionality, WebClient can do more than just HTTP).

HttpClient is usually meant to be used for more than just a single request. It conveniently allows default headers to be set once and applied to all requests, and you can plug in a CookieContainer so that cookies are shared across all the calls made with that client.
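As a quick illustration, here is a minimal sketch of that setup (the header name and base address are made up for the example): the default headers and the CookieContainer are configured once and then apply to every request the client makes.
var cookies = new CookieContainer();
var handler = new HttpClientHandler { CookieContainer = cookies };
var client = new HttpClient(handler) { BaseAddress = new Uri("http://foo.bar/") };
client.DefaultRequestHeaders.Accept.Add(
    new MediaTypeWithQualityHeaderValue("application/json"));      // sent with every request
client.DefaultRequestHeaders.Add("x-correlation-id", Guid.NewGuid().ToString());
// any GetAsync/PostAsync call on this client now shares the headers and the cookie jar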

Now, ironically it also implements IDisposable, suggesting a short lifetime and that you should dispose of it as soon as you are done with it. This led to several discussions in the community (here from Microsoft Patterns and Practices, Darrel Miller here and a few references on StackOverflow here) on whether it can be used with a longer lifetime and, more importantly, whether it needs disposal at all.

Singleton HttpClient matters, especially when it comes to performance [Dragan Brankovich - Flickr]

HttpClient implements IDisposable only indirectly through HttpMessageHandler, and only as a just-in-case measure rather than an immediate need - I am not aware of an implementation of HttpMessageHandler that holds unmanaged resources (the very reason for implementing IDisposable).

In short, the community agreed that it was 100% safe not only to skip disposing the HttpClient, but also to use it as a Singleton. The main concern was thread safety when making concurrent HTTP calls - and even the official documentation said there is no risk in doing that.

But it turns out there is a serious issue: DNS changes are NOT honoured, and HttpClient (through HttpClientHandler) hogs the connections until the socket is closed. Indefinitely. So when does a DNS change occur? Every time you do a blue-green deployment (in Azure Cloud Services, when you deploy to the staging slot and then swap the production/staging slots). Every time you change settings in your Azure Traffic Manager. Failover scenarios. Internally in a myriad of PaaS offerings.

And this has been going on for more than 2 years without being reported... makes me wonder what kind of applications we build with .NET?

Now if the reason for the DNS change is failover, your connection would have been faulted anyway, so this time the connection would open against the new server. But if it was a blue-green deployment, you swap staging and production and your calls still go to the staging environment - a behaviour we had seen but had fixed by bouncing the dependent servers, thinking it was possibly an Azure oddity. What a fool I was - it was there in the code! Whose code? Well, debatable...

Analysis

All of this goes back to the implementation in HttpClientHandler, which uses HttpWebRequest to make connections - none of which is open source. But using JetBrains' dotPeek we can look at the decompiled code and see that HttpClientHandler creates a connection group (named with its hashcode) and does not close the connections in that group until it gets disposed. This basically means the DNS check never happens as long as a connection is open. This is really terrifying...
protected override void Dispose(bool disposing)
{
    if (disposing && !this.disposed)
    {
        this.disposed = true;
        ServicePointManager.CloseConnectionGroups(this.connectionGroupName);
    }
    base.Dispose(disposing);
}
As you can see, the ServicePoint class plays an important role here: it controls the number of concurrent connections to a 'service point' (endpoint) as well as keep-alive behaviour.
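To make that concrete, here is a rough sketch (the URL is hypothetical) of the per-endpoint knobs ServicePoint exposes - note the ConnectionLeaseTimeout default, which becomes important below:
var sp = ServicePointManager.FindServicePoint(new Uri("http://foo.bar/"));
Console.WriteLine(sp.ConnectionLimit);         // max concurrent connections to this endpoint
Console.WriteLine(sp.MaxIdleTime);             // ms an idle connection is kept before closing
Console.WriteLine(sp.ConnectionLeaseTimeout);  // ms before an active connection is recycled (-1 = never)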

Solution

A naive solution would be to dispose the HttpClient (and hence the HttpClientHandler) every time you use it. As explained, this is not how HttpClient is intended to be used.
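For reference, the naive pattern is a sketch along these lines (the URL is hypothetical); it does honour DNS changes, but it pays the connection setup cost on every single call:
using (var client = new HttpClient())  // inside an async method
{
    // a fresh client, handler and socket for every request - wasteful
    var response = await client.GetAsync("http://foo.bar/baz/123");
    // ... use response
}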

Another solution is to set ConnectionClose property of DefaultRequestHeaders on your HttpClient:
var client = new HttpClient();
client.DefaultRequestHeaders.ConnectionClose = true;
This sends the HTTP Connection: close header (i.e. disables keep-alive) so the socket is closed after a single request. It turns out this can add roughly an extra 35ms (with long tails, i.e. amplifying outliers) to each of your HTTP calls, preventing you from taking advantage of re-using a socket. So what is the solution then?

Well, courtesy of my good friend Andy Jutton of Amido, the solution lies in an obscure feature of the ServicePoint class. Basically, as we said, ServicePoint controls many aspects of TCP connections, and one of its properties is ConnectionLeaseTimeout, which controls how many milliseconds a TCP socket should be kept open. Its default value is -1, which means connections stay open indefinitely... well, in real terms, until the server closes the connection, there is a network disruption - or the HttpClientHandler gets disposed, as discussed.

So the root cause is basically that default value of -1, which is, IMHO, a wrong and potentially dangerous setting.

Now to fix it, all we need to do is get hold of the ServicePoint object for the endpoint by passing the URL to it and set the ConnectionLeaseTimeout:
var sp = ServicePointManager.FindServicePoint(new Uri("http://foo.bar/baz/123?a=ab"));
sp.ConnectionLeaseTimeout = 60*1000; // 1 minute
So this is something you would want to do only once, at the startup of your application, for all the endpoints your application is going to hit (if the endpoints are decided at runtime, you would set it at the time of discovery). Bear in mind, the path and query string are ignored and only the scheme, host and port matter. Depending on your scenario, values of 1-5 minutes probably make sense.
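A minimal startup sketch could look like this (the endpoint list is hypothetical - in practice it would come from your configuration or service discovery):
// run once at application startup, for every endpoint the app will call
var endpoints = new[] { "http://foo.bar", "https://qux.example.com" };
foreach (var endpoint in endpoints)
{
    var sp = ServicePointManager.FindServicePoint(new Uri(endpoint));
    sp.ConnectionLeaseTimeout = (int)TimeSpan.FromMinutes(1).TotalMilliseconds; // 1 minute
}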

Conclusion

Using a Singleton HttpClient results in your instance not honouring DNS changes, which can have serious implications. The solution is to set the ConnectionLeaseTimeout of the ServicePoint object for each endpoint.