Even though the Internet has become six orders of magnitude faster in the last twenty years, there’s never been a time without congestion slowing traffic. Now more than ever we run into the fundamental limit of the speed of light: there’s no way to move information around the world in much less than about ⅕ of a second.
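Where does that ⅕ of a second come from? Light in fiber travels at roughly two thirds of its speed in a vacuum, and a trip around the world is a long way. A quick back-of-the-envelope check (the numbers are rounded, but the conclusion holds):

```typescript
// Back-of-the-envelope: a round trip to the far side of the world over optical fiber.
const SPEED_OF_LIGHT_KM_S = 300_000;                        // in a vacuum
const SPEED_IN_FIBER_KM_S = SPEED_OF_LIGHT_KM_S * (2 / 3);  // roughly 200,000 km/s
const HALF_WAY_AROUND_KM = 20_000;                          // half of Earth's ~40,000 km circumference

const roundTripSeconds = (2 * HALF_WAY_AROUND_KM) / SPEED_IN_FIBER_KM_S;
console.log(roundTripSeconds); // 0.2 -- about a fifth of a second, before any congestion at all
```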
The Internet is conceptually a series of tubes, information passed through connections from point to point. Today, data is usually carried by fiber optic cables, with capacities measured in terabits per second. The 90s analogy to a superhighway is apt: point-to-point connections, prone to congestion during peak hours, yet still able to carry a huge amount of traffic. And just as roads are constrained by the finite resources for building and maintaining them, and by human reaction time limiting safe speeds, information has its own fundamental limits: the speed of light, and the bandwidth of each point-to-point connection.
Before internetworking, information was relatively expensive to copy: copying machines are complex, expensive, and time-consuming to operate, and printing books requires an industrial organization to pass along information that can now be transmitted to a mobile phone in less than a second. The few forms of information that were easily dispersed had no memory, no later retrieval. TV shows were broadcast at certain times, and while video recorders existed, they were a late innovation and unwieldy to use. We live in a world where none of those limits still exist.
We can now access many television shows on demand, stored on a computer somewhere, ready to send to you at a moment’s notice. Web pages full of text are literally copied as they are sent to you; there is no way to move an object over the Internet except by copying it and discarding the original.
This means that no matter how much we work to make things faster, there will always be a human-perceptible delay if we have to ask a server somewhere else on the Internet. We might get lucky (or ambitious) and put servers all over the globe like Netflix does, or pay a CDN like Akamai, but there will always be people on slow networks (think your local coffeeshop, mobile service on a subway train or while traveling abroad, a hilly suburban neighborhood, rural Africa, or a small village in India or Italy), and people who are simply far away from the nearest server.
The web is held together with caching.
When we access this blog, we access dozens of caches, in computers all over the world.
First our computer looks to see if it’s got the page already. If so? It quickly tries to confirm it’s still good, then shows it.
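That quick “is it still good?” check is a conditional request: the browser sends back the validators it stored (an ETag or a Last-Modified date), and if nothing has changed the server can answer with a tiny 304 Not Modified instead of the whole page. A minimal sketch of that exchange, using the standard fetch API (the header names are real HTTP; the cache structure is just an illustration):

```typescript
// A minimal sketch of conditional revalidation with fetch().
// The cached entry and its validators are illustrative; the headers are standard HTTP.

interface CachedPage {
  body: string;
  etag?: string;          // value of the ETag response header, if any
  lastModified?: string;  // value of the Last-Modified response header, if any
}

async function revalidate(url: string, cached: CachedPage): Promise<string> {
  const headers: Record<string, string> = {};
  if (cached.etag) headers["If-None-Match"] = cached.etag;
  if (cached.lastModified) headers["If-Modified-Since"] = cached.lastModified;

  const response = await fetch(url, { headers });

  if (response.status === 304) {
    // Nothing changed: the server sent no body, so reuse the cached copy.
    return cached.body;
  }

  // The page changed: replace the cached copy and its validators.
  cached.body = await response.text();
  cached.etag = response.headers.get("ETag") ?? undefined;
  cached.lastModified = response.headers.get("Last-Modified") ?? undefined;
  return cached.body;
}
```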
If the page isn’t already cached locally, it then tries to connect to (using this blog as an example) cloudcity.io. But it isn’t even sure where that is on the Internet, so it looks it up in DNS to get an IP address. So it checks … the DNS cache in our computer. DNS records come with an expiration date attached, usually a few hours from when they were fetched from a server that is an authority for that domain name. If the IP address is in there and the cache entry hasn’t expired, the browser connects to that IP address, fetches the page, and caches it for the next time we go there, even if that’s just hitting the back button after we follow a link from it.
If not, it asks the router if it knows the IP. That router has a cache of recently looked-up domains. If it doesn’t know, it asks our ISP’s DNS server (or Google’s, or whatever it’s configured to use), which in turn asks the authority servers, which don’t cache at all because they hold the authoritative records themselves.
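Every one of those DNS caches works the same way: keep the answer, keep its expiration, and only ask the next resolver up the chain when the entry is missing or stale. A toy version, where lookupUpstream stands in for whatever actually performs the upstream query:

```typescript
// A toy DNS cache keyed by hostname, honouring each record's TTL.
// lookupUpstream() is a stand-in for asking the next resolver in the chain.

interface DnsEntry {
  address: string;
  expiresAt: number; // milliseconds since epoch
}

const dnsCache = new Map<string, DnsEntry>();

async function resolve(
  hostname: string,
  lookupUpstream: (name: string) => Promise<{ address: string; ttlSeconds: number }>
): Promise<string> {
  const cached = dnsCache.get(hostname);
  if (cached && cached.expiresAt > Date.now()) {
    return cached.address; // fresh entry: no network round trip at all
  }

  // Missing or expired: ask upstream and remember the answer until its TTL runs out.
  const { address, ttlSeconds } = await lookupUpstream(hostname);
  dnsCache.set(hostname, { address, expiresAt: Date.now() + ttlSeconds * 1000 });
  return address;
}
```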
Once the request finally reaches the web server, the page has to be fetched from disk and sent to us. Unless that file is already in memory somewhere in a cache, ready to dump onto the network. It’s caches all the way down.
This is all well and good, and part of how the Internet can be so fast. If a cache is nearby, or quicker to answer than the authority, it speeds things up.
If we don’t know all of what we’re going to fetch right away, things get slow because that means fetching something to find out, then fetching the next thing, then the next. A web page loads a stylesheet and some scripts and images. The stylesheet loads another stylesheet. That stylesheet loads a font. Pretty soon we have several seconds of delay trying to display something as simple as text.
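Each of those dependencies is only discovered after its parent arrives, so the round trips add up instead of overlapping. A back-of-the-envelope sketch (the 100ms per round trip is an invented but plausible figure):

```typescript
// Dependent resources are discovered one at a time, so their round trips
// stack up. 100ms per round trip is invented purely for illustration.
const ROUND_TRIP_MS = 100;
const chain = ["page.html", "app.css", "theme.css", "font.woff2"];

chain.forEach((resource, depth) => {
  console.log(`${resource} arrives around ${(depth + 1) * ROUND_TRIP_MS}ms`);
});
// font.woff2 arrives around 400ms -- and on a slow mobile link, multiply accordingly
```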
If you have a website made out of plain HTML and CSS files with some images, browsers are pretty clever about caching them. The server tells the browser when they were last changed, and browsers assume that things that have changed recently are likely to change soon again, and things made long ago are likely to stay that way.
It’s the “predict yesterday’s weather and you’ll mostly be right” caching strategy.
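Concretely, when a response carries a Last-Modified date but no explicit lifetime, the HTTP caching spec suggests caching it for some fraction of the time since it last changed, typically around 10%. A sketch of that rule of thumb:

```typescript
// Heuristic freshness: with no explicit Cache-Control or Expires, cache a
// response for a fraction (commonly 10%) of how long it has gone unchanged.
const HEURISTIC_FRACTION = 0.1;

function heuristicFreshnessMs(lastModified: Date, now: Date = new Date()): number {
  const ageSinceChangeMs = now.getTime() - lastModified.getTime();
  return Math.max(0, ageSinceChangeMs * HEURISTIC_FRACTION);
}

// A stylesheet untouched for ten days stays cached for about a day;
// one edited an hour ago stays cached for about six minutes.
console.log(heuristicFreshnessMs(new Date(Date.now() - 10 * 24 * 3600 * 1000)));
console.log(heuristicFreshnessMs(new Date(Date.now() - 3600 * 1000)));
```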
Browsers also do some interesting things when they request pages. They send headers that specify the user’s preference for language (Accept-Language), the kinds of things the browser can display (Accept), and the kinds of compression it understands (Accept-Encoding). When a server responds, it should tell the browser which of those request headers it actually used to decide what to send, so the browser (and any caches in between, like CDNs) can store a copy specific to that combination of values, and can confidently reuse a cached copy for headers the response doesn’t vary on at all. The server does this with a header called Vary, which lists which parts of the request, in addition to the URL, determined what was sent.
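Here’s a sketch of what that looks like on the server side, using Node’s built-in http and zlib modules: the handler compresses the page only when the browser says it understands gzip, and announces that dependency with Vary so no cache ever hands a gzipped body to a client that can’t decompress it.

```typescript
// A sketch of content negotiation plus Vary, using Node's built-in modules.
// The effective cache key becomes the URL plus the headers listed in Vary.

import { createServer } from "node:http";
import { gzipSync } from "node:zlib";

const page = "<html><body>Hello, caches everywhere.</body></html>";

createServer((req, res) => {
  const acceptsGzip = (req.headers["accept-encoding"] ?? "").includes("gzip");

  res.setHeader("Content-Type", "text/html; charset=utf-8");
  // Tell every cache downstream that this response depends on Accept-Encoding,
  // so a compressed copy is never served to a client that didn't ask for one.
  res.setHeader("Vary", "Accept-Encoding");

  if (acceptsGzip) {
    res.setHeader("Content-Encoding", "gzip");
    res.end(gzipSync(page));
  } else {
    res.end(page);
  }
}).listen(8080);
```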
The caching of web applications is more complex. Web applications don’t have a ‘last modified’ date unless they send one with each page fetched, and even if they did, the guessing game would be right less often. A change can affect every page of an application, resources are often updated in bulk, and the CSS, the Javascript, and the HTML the application generates are tightly coupled to one another.
The simplest strategy is to set a far-future Expires header and a maximum cache time (Cache-Control: max-age=…) for a given URL, and if the thing you serve changes, use a new URL. The easiest way to do that is to have your application load its CSS and Javascript with a suffix on the URL like ?rev=version, where version is some counter or version number that changes with each release. New URL? New cache entry. The old one fades away with disuse, the user’s computer and any CDN you use can clean up using last access times, and everything gets happy. This technique is often referred to as ‘cache busting’.
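A sketch of both halves of that strategy, with an invented ASSET_VERSION standing in for whatever changes per release (a counter, a build number, a git hash): the application appends it to asset URLs, and the server marks those versioned assets as cacheable essentially forever.

```typescript
// Cache busting in two parts: versioned URLs in the generated HTML,
// far-future caching headers on the server. ASSET_VERSION is whatever
// changes with each release -- a counter, a build number, a git hash.
const ASSET_VERSION = "42";

// Every asset URL the application writes into its pages carries the version.
function assetUrl(path: string): string {
  return `${path}?rev=${ASSET_VERSION}`;
}

console.log(assetUrl("/css/app.css")); // /css/app.css?rev=42
console.log(assetUrl("/js/app.js"));   // /js/app.js?rev=42

// Versioned assets can then be served with a far-future lifetime: a new
// release produces new URLs, so stale copies simply stop being requested.
const versionedAssetHeaders = {
  "Cache-Control": "public, max-age=31536000, immutable", // one year
  "Expires": new Date(Date.now() + 365 * 24 * 3600 * 1000).toUTCString(),
};
console.log(versionedAssetHeaders);
```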
Now for the interesting tension: the optimum caching for a repeated-use application is at odds with the optimum technique for “time to first paint,” the time it takes to have enough information on hand for the computer to actually draw the screen the user sees for the first time.
The more you move out of the plain HTML into external, well-cached resources, the less has to be loaded on a repeat visit to the application: only the initial startup page. However, each of those things moved out of the page into an external, read-once-and-cache-forever resource is another HTTP round trip for the initial page load. This means the access pattern matters. A font shared across domains (like Google Fonts) is more likely to already be in cache. A script loaded from a CDN in a common location is too. However, if it’s not, that’s another HTTP request to fetch it, and for things that block the render, that can be a deal-breaker. The more that’s bundled together, the more has to be downloaded when anything changes or a new version ships. The more things are separated into separate requests, the slower the initial page load, thanks to our friend, the speed of light.
This means marketing sites and anything focused on first contact with customers likely need to be optimized one way, while repeated-use applications need to be treated another way. This even trickles into design, where separating those use cases onto separate pages can make each optimization case that much clearer.