Understanding browser caching

In a nutshell, browser caching means storing website files (images, stylesheets, etc.) locally on the user's computer, so that when a webpage loads, those files come from the cache rather than being downloaded from the web server again.

Why is this helpful? Mainly because it speeds up page loading for the user, who no longer has to wait for those files to download. It also reduces the load on the web server, since it has fewer files to return, and when you are talking about hundreds or thousands of webpages being requested, this quickly adds up.

How does it work?

When you (well, your browser) request a webpage, the web server returns the files and 'tells' the browser whether they should be cached. The browser looks out for this information and, if the files should be cached, stores them for as long as the web server dictates.

Once this has happened, any further requests for this file (i.e. any requests for this exact URL) will be served from the browser cache rather than requested from the web server, until that cached copy expires.

For example, say the webpage you're reading now used the image www.webigence.com/images/smiley-face.jpg, and our web server is set to tell the browser that all images should be cached for 10 years. If you came back to this webpage later today, tomorrow or next week, the browser would already have the image stored in the cache on your computer and would load it from there on those future visits.
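The 'telling' happens through HTTP response headers. As a minimal sketch, assuming a server that chooses the standard Cache-Control header by file extension, this shows what a 10-year rule for images works out to. The helper name 'cache_control_for', the extension list and the policy for other files are hypothetical; the post doesn't say how our server is actually configured.

```python
from pathlib import PurePosixPath
from typing import Optional

# Hypothetical policy: cache images for ten years (leap days ignored).
TEN_YEARS_SECONDS = 10 * 365 * 24 * 60 * 60
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif", ".webp", ".svg"}


def cache_control_for(url_path: str) -> Optional[str]:
    """Return a Cache-Control value for the path, or None to send no header."""
    suffix = PurePosixPath(url_path).suffix.lower()
    if suffix in IMAGE_EXTENSIONS:
        # max-age is a number of seconds; 'public' also lets shared
        # caches (not just the browser) store the response.
        return f"public, max-age={TEN_YEARS_SECONDS}"
    return None  # other files: left to the server's defaults in this sketch


if __name__ == "__main__":
    print(cache_control_for("/images/smiley-face.jpg"))  # public, max-age=315360000
    print(cache_control_for("/index.html"))              # None
```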

If I saved a copy of this image as www.webigence.com/images/smiley-face-copy.jpg and changed the code for this webpage to use the copy instead of the original, the browser would see that it doesn't have this URL stored in its cache. It would therefore download the image and store it in the cache as well, ready for later visits to this page or for any other page on the Webigence website that uses the same image.
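To make the 'exact URL' point concrete, here is a toy model of a browser cache, assuming nothing more than entries keyed by the full URL string with a lifetime in seconds. 'ToyBrowserCache' and 'load' are hypothetical names, and real browsers are far more involved (revalidation, eviction and so on); this only illustrates the behaviour described above.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional, Tuple


@dataclass
class ToyBrowserCache:
    """A toy cache keyed by the exact URL string, with a freshness lifetime."""
    clock: Callable[[], float] = time.time  # injectable so tests need not wait
    entries: Dict[str, Tuple[bytes, float]] = field(default_factory=dict)

    def store(self, url: str, body: bytes, max_age: int) -> None:
        # Keep the body along with the moment it stops being fresh.
        self.entries[url] = (body, self.clock() + max_age)

    def lookup(self, url: str) -> Optional[bytes]:
        entry = self.entries.get(url)
        if entry is None:
            return None  # this exact URL has never been cached
        body, expires_at = entry
        if self.clock() >= expires_at:
            del self.entries[url]  # expired, so treat it as a miss
            return None
        return body


def load(cache: ToyBrowserCache, url: str,
         download: Callable[[str], bytes], max_age: int) -> Tuple[bytes, str]:
    """Return (body, source), where source is 'cache' or 'server'."""
    cached = cache.lookup(url)
    if cached is not None:
        return cached, "cache"
    body = download(url)  # stand-in for the real HTTP request
    cache.store(url, body, max_age)
    return body, "server"


if __name__ == "__main__":
    def fake_download(url: str) -> bytes:
        return b"...image bytes..."

    cache = ToyBrowserCache()
    ten_years = 10 * 365 * 24 * 60 * 60
    original = "www.webigence.com/images/smiley-face.jpg"
    copy = "www.webigence.com/images/smiley-face-copy.jpg"
    print(load(cache, original, fake_download, ten_years)[1])  # server
    print(load(cache, original, fake_download, ten_years)[1])  # cache
    print(load(cache, copy, fake_download, ten_years)[1])      # server
```

The second request for the original URL is a cache hit, while the copy, despite holding identical bytes, is a miss because its URL differs.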

When this can be a problem...

This works very well when you do want users to see the same images, stylesheets and other files on future visits to your website. For the most part these files stay exactly the same, so you absolutely want them cached: the webpages load faster and the web server has to work less hard.

However, what do you do when you actually want to change one of these images, stylesheets or other files, and you want users to get the new version rather than the old one stored in the cache on their computer?

Once a file is cached, there is, for the most part, no way to tell the browser that it should now go and get an updated version from the web server (that's not the full story, but the rest is a bit too complicated for this post). So, for the purposes of this post, there is simply no way to get the same file at the same URL updated on every user's computer.

The solution(s)

There are several ways a browser can end up fetching a file from the web server even though it has previously saved it to the cache, for instance:

The user can force the browser to refresh all the content for a particular webpage with a 'hard refresh' (on Windows and Linux, typically the 'Control' and 'F5' keys, or 'Control', 'Shift' and 'R'). This tells the browser that, in this particular case, it should ignore the cache and request every file used on the current webpage from the web server. The freshly downloaded files are then stored in the cache, replacing the previously cached copies from exactly the same URLs. (A plain 'F5' or 'Control' and 'R' reload is gentler and may still reuse some cached files.)

The user could also delve into their browser settings and find the option to 'clear cache'. The next time the webpage is requested, the browser would check the cache for the files, find none, and therefore download them all, saving them to the cache for subsequent visits. Some browsers can also be set to clear the cache automatically (for example, whenever the browser is closed), so the cache keeps being emptied on a schedule the user chooses.

Both of these options rely on the user, so they aren't much help to a designer/developer who wants to make sure every user sees the latest version of a file they have uploaded to the web server.

The 'not so elegant' solution is for the designer/developer to add a randomly generated querystring value to the end of the file's URL every time the page is served. Going back to our smiley face example, a URL such as www.webigence.com/images/smiley-face.jpg?lkjsd20 is one the browser knows it hasn't requested before, so it will download the file.

The reason this isn't elegant is that a new copy of the file gets stored in the cache every time the user visits the webpage, yet that copy is never actually retrieved from the cache: the next request uses a different random string, so the file is downloaded every time. In effect, caching is switched off for that file.
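A minimal sketch of why the random querystring defeats caching, assuming a cache keyed by the exact URL (modelled here as a plain set). The helper names and the 16-character token length are illustrative choices, not a standard.

```python
import secrets


def cache_busted_url(url: str) -> str:
    """Append a new random querystring value every time it is called."""
    separator = "&" if "?" in url else "?"
    return f"{url}{separator}{secrets.token_hex(8)}"  # 16 random hex characters


def count_downloads(url: str, visits: int, bust: bool) -> int:
    """Simulate repeat visits against a cache keyed by exact URL."""
    cached_urls = set()
    downloads = 0
    for _ in range(visits):
        requested = cache_busted_url(url) if bust else url
        if requested not in cached_urls:
            downloads += 1              # cache miss: fetched from the server
            cached_urls.add(requested)  # with bust=True, never requested again
    return downloads


if __name__ == "__main__":
    image = "www.webigence.com/images/smiley-face.jpg"
    print(count_downloads(image, visits=5, bust=False))  # 1
    print(count_downloads(image, visits=5, bust=True))   # 5, barring a 1-in-2**64 collision
```

With the plain URL only the first visit downloads the image; with a fresh random value on every visit, every visit is a download and the cache just fills up with copies nobody asks for again.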

A far better solution is to change the file's name on the web server whenever the file changes, and to update the reference to it in the HTML at the same time. That way the file can still be cached, and every subsequent request will use the locally stored copy until the file is renamed, at which point the browser sees a new URL and downloads it.
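As a rough sketch of 'rename the file, then update the reference', assuming a simple static site where the HTML lives in files on disk next to the assets, the two steps could look like this. The function name 'rename_and_update' and the paths used are hypothetical; a CMS or build tool would handle this its own way.

```python
from pathlib import Path
from typing import Iterable


def rename_and_update(asset: Path, new_name: str, html_files: Iterable[Path]) -> Path:
    """Rename `asset` within its folder and point the HTML files at the new name."""
    old_name = asset.name
    new_asset = asset.with_name(new_name)
    asset.rename(new_asset)  # rename on disk, so the file is served at a new URL
    for html in html_files:
        text = html.read_text(encoding="utf-8")
        # Plain text replacement keeps the sketch short, but it would also
        # change any other text that happens to contain the old file name.
        html.write_text(text.replace(old_name, new_name), encoding="utf-8")
    return new_asset
```

For example, rename_and_update(Path("site/css/style-1.0.0.css"), "style-1.0.1.css", [Path("site/index.html")]) would rename the stylesheet and rewrite index.html to reference the new name, assuming those (hypothetical) files exist.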

One of the most common uses for this technique is cascading stylesheets (CSS), where the file may well be large and is loaded on every single page of the website. A stylesheet may stay the same for days, weeks or months, but it will inevitably change at some point, sometimes frequently, sometimes infrequently.

In this case we tend to name our stylesheets something like style-1.0.0.css, and whenever we make any amendment, however large or small, we increment the version number, so the next version would be called style-1.0.1.css. We then update the reference in the HTML to use the latest version of the stylesheet. The same approach works well for JavaScript files (jQuery uses exactly this version-numbering convention) and can also be used for images or really any other type of file you might want to cache.
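The version bump itself is easy to automate. This is a minimal sketch, assuming file names of the form name-MAJOR.MINOR.PATCH followed by an extension such as .css or .min.js; 'bump_patch' is a hypothetical helper, and it only increments the last number, as in the style-1.0.0.css to style-1.0.1.css example.

```python
import re

# name-MAJOR.MINOR.PATCH followed by one or more extensions (.css, .min.js, ...)
VERSIONED_NAME = re.compile(
    r"^(?P<stem>.+)-(?P<major>\d+)\.(?P<minor>\d+)\.(?P<patch>\d+)(?P<ext>(?:\.\w+)+)$"
)


def bump_patch(filename: str) -> str:
    """Increment the last version number, e.g. style-1.0.0.css -> style-1.0.1.css."""
    m = VERSIONED_NAME.match(filename)
    if m is None:
        raise ValueError(f"not a name-X.Y.Z.ext file name: {filename!r}")
    patch = int(m["patch"]) + 1
    return f"{m['stem']}-{m['major']}.{m['minor']}.{patch}{m['ext']}"


if __name__ == "__main__":
    print(bump_patch("style-1.0.0.css"))  # style-1.0.1.css
    print(bump_patch("style-1.0.9.css"))  # style-1.0.10.css
```

Treating the numbers as integers rather than characters means 1.0.9 becomes 1.0.10 instead of something like 1.0.:. The new name can then be fed into whatever renames the file and updates the HTML.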

In short: when a file changes, change the name of the file on the server.

Blog written by Simon Wilkinson