It's rare to see a web app that doesn't use XMLHttpRequest (or fetch, the new API with comparable capability). XMLHttpRequest (which we can call XHR if you're into the whole brevity thing) is as handy as a shirt pocket, but it doesn't do much to encourage robust and resilient programming practices. Any app that runs in the real world will sometimes encounter transient network interruptions and server outages. We should gracefully recover from both by automatically retrying our requests. But, we shouldn't turn a brief server hiccup into a full-on fireworks display by retrying too fast or by having every client retry at the same time.
This good advice gives us three concrete goals: automatically retry requests that fail, pause between retries so we don't hammer a struggling server, and vary the timing so that every client doesn't retry at the same moment.
If you're using a framework that “takes care of all that for you,” you may wish to check in on what it's taking care of and how. At best, you'll learn something about your tools and maybe find some settings you'd like to adjust. At worst, you may find the framework isn't taking care of you as well as you were led to believe.
We might find out there was some trouble with our XHR through its onerror or ontimeout handler, or we might notice in the onload event handler that the status indicates an error.
Listing 1 shows an example of how we might automatically retry when we detect an error.
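Although the listing isn't reproduced here, a naive retry-on-error might look something like this sketch (send, url, and handleResponse are placeholder names of my own, not necessarily what the listing uses):

```javascript
// Naive strategy: any time something goes wrong, fire the whole
// request again immediately, with no limit and no pause.
function send(url, handleResponse) {
  var xhr = new XMLHttpRequest();

  function retryNow() {
    send(url, handleResponse); // retry immediately
  }

  xhr.open("GET", url);
  xhr.onerror = retryNow;   // network trouble
  xhr.ontimeout = retryNow; // request timed out
  xhr.onload = function () {
    if (xhr.status >= 500) {
      retryNow();           // server reported an error
    } else {
      handleResponse(xhr);
    }
  };
  xhr.send();
}
```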
With this naive strategy, we may be solving a small problem by creating a bigger one. The deluge of immediate retries may generate more traffic than the server would be able to handle even if it was in good shape. It's not very respectful of the client's resources either. We can do better.
Listing 2 shows a first step toward making our automatic retry safer for our server and easier on our clients' resources. The initial request goes out right away, but we pause one second before each retry. One second is a compromise between getting the first couple of retries out quickly (to recover fast and give a good experience) but not retrying too frequently (to go easy on client and server resources). If we knew how long the outage would be, we could tune this delay. You may have observed, as I have, that short outages tend to clear up quickly while long outages tend to last a while. In other words: the more times we have to retry, the more likely it is that we're dealing with a long outage.
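As a rough sketch of that fixed-pause approach (again with names of my own choosing rather than the listing's), only the retry path changes: instead of resending at once, we schedule the resend with setTimeout.

```javascript
var RETRY_DELAY_MS = 1000; // fixed one-second pause before every retry

function send(url, handleResponse) {
  var xhr = new XMLHttpRequest();

  // Wait one second, then try the whole request again.
  function retryLater() {
    setTimeout(function () { send(url, handleResponse); }, RETRY_DELAY_MS);
  }

  xhr.open("GET", url);
  xhr.onerror = retryLater;   // network trouble
  xhr.ontimeout = retryLater; // request timed out
  xhr.onload = function () {
    if (xhr.status >= 500) {
      retryLater();           // server reported an error
    } else {
      handleResponse(xhr);
    }
  };
  xhr.send();
}
```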
Following this line of thought brings us to “exponential backoff,” a strategy of increasing the pause before each retry exponentially as we make more attempts. We could use any base for our exponent, but let's use two because we're programmers and programmers like binary. With binary exponential backoff, if our pause before the first retry is one second, then our pause before the second retry will be two seconds, then four seconds before the third retry, and so on. We should also set an upper limit on how long the pause can be. If we select the duration of the first pause and the upper limit well, we should get pretty quick recovery from short outages without unduly stressing our clients or our server during longer outages.
Listing 3 shows an implementation of this truncated binary exponential backoff strategy. What we've built so far is starting to look pretty solid, but we might still get in trouble if many clients encounter an error at the same time. Because the length of the pause is deterministic, any clients that encounter an error at the same time will make each of their retries at the same time as well. Our server might gracefully recover, only to be knocked down hard by a deluge of retries. This could turn one little blip into a whole bouquet of whoopsie-daisies.
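The pause calculation at the heart of this strategy can be sketched as a small pure function. The function name, parameter names, and the example 30-second cap below are my own assumptions, not taken from the listing:

```javascript
// Truncated binary exponential backoff: one second before the first
// retry, two before the second, four before the third, and so on,
// never exceeding maxDelayMs.
function backoffDelay(attempt, initialDelayMs, maxDelayMs) {
  var uncapped = initialDelayMs * Math.pow(2, attempt - 1);
  return Math.min(uncapped, maxDelayMs);
}

// With a one-second initial pause and a 30-second cap:
// backoffDelay(1, 1000, 30000)  → 1000
// backoffDelay(3, 1000, 30000)  → 4000
// backoffDelay(10, 1000, 30000) → 30000 (truncated)
```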
Listing 4 adds an element of randomness to the duration of the pause before retrying. The pause will now have a random duration no longer than one second for the first retry, no longer than two seconds for the second retry, and so on. The average pause duration is now half of what it was, so we might want to adjust the initial pause duration and limit.
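The randomized pause could be computed along these lines (a sketch with invented names; the listing itself may differ):

```javascript
// Pick a pause uniformly at random between zero and the truncated
// exponential-backoff ceiling, so that clients which failed at the
// same moment spread their retries out over time.
function jitteredDelay(attempt, initialDelayMs, maxDelayMs) {
  var ceiling = Math.min(initialDelayMs * Math.pow(2, attempt - 1), maxDelayMs);
  return Math.random() * ceiling;
}
```

On average this pause is half the deterministic one, which is why we might want to revisit the initial delay and the cap after adding jitter.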
Listing 4 achieves the goals we set out, but it's a little verbose to type up for each request we wish to send. Let's see if we can bundle it up into a handy function.
Listing 5 brings together the ideas we've developed into one handy function that wraps XMLHttpRequest. Our function doesn't expose the complete capabilities of XMLHttpRequest, just what I normally use. If you need support for more — setting request headers, for example — it should be pretty easy to add what you need within the existing structure. You may also like to change the default options to suit your environment.
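To give a feel for the shape such a wrapper might take, here is a sketch of my own; the names, defaults, and option set are assumptions and not the API of the actual listing or of the xhr.js project mentioned below:

```javascript
// Send a GET request, retrying with truncated binary exponential
// backoff plus jitter, up to a maximum number of attempts.
function retryingXhr(options) {
  var url = options.url;
  var initialDelayMs = options.initialDelayMs || 1000;
  var maxDelayMs = options.maxDelayMs || 30000;
  var maxAttempts = options.maxAttempts || 10;
  var attempt = 0;

  function go() {
    attempt += 1;
    var xhr = new XMLHttpRequest();

    function retryLater() {
      if (attempt >= maxAttempts) {
        options.onFailure(xhr); // give up
        return;
      }
      var ceiling = Math.min(initialDelayMs * Math.pow(2, attempt - 1), maxDelayMs);
      setTimeout(go, Math.random() * ceiling); // backoff with jitter
    }

    xhr.open("GET", url);
    xhr.onerror = retryLater;
    xhr.ontimeout = retryLater;
    xhr.onload = function () {
      if (xhr.status >= 500) {
        retryLater();
      } else {
        options.onSuccess(xhr);
      }
    };
    xhr.send();
  }

  go();
}
```

A caller would supply the url along with onSuccess and onFailure callbacks, overriding the delay and attempt limits only when the defaults don't suit.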
This code could benefit from documentation and a permissive license. You'll find both in xhr.js. I'm pretty convinced this project is feature-complete, but I'm happy to address any bug reports — just drop me an email.
I hope this article has got you thinking about the value of having an XHR error recovery strategy. With some luck, it may also have given you some tools to use in implementing such a strategy.
If you have any questions, comments, or corrections, please don't hesitate to drop me a line.

Aaron D. Parks