Four Oh Four: Performance Not Found

As users of the web, we all know what a 404 error means: “File Not Found.”  There are memes and jokes made about things that are not found and the number 404.

 

(https://http.cat is a fun site)

But when you cannot find things, everything takes longer.  Just think about how much time you spent this week looking for your keys, your wallet, or your child’s favourite stuffed creature.  We spend a lot of extra time looking for stuff before we can continue on with our day.

Lost On the Web: Making the Best of a Bad Situation

When we land on a page that is a 404 error, we know that something is broken.  Did we follow a bad link?  Is the page down – never to come back?  Do we care enough to search on this website to see if the information is still there?

There are a number of great posts on how to build a great 404 page.  But what can we do to improve the performance of 404s?  It turns out there is a lot we can do.

Page Loads, But Not Completely

The focus of the articles above is on what happens when a page is not found and you are redirected to a full page explaining that the file was not found.  But I think that is a small subset of 404 errors.  What about pages that load, except the CSS file is missing, so the styling is a mess?  What if the hero image appears as a broken file?  When a page is completely broken, we know what to do, but what happens when only some of the page is broken?

Screen Shot 2019-01-28 at 10.04.37 PM.png

On this site, the logo is a 404 and shows up as a broken image.  Worse, the spinner never goes away, as the JS is trying to download “undefined.jpg.”  The page technically loads, but the site appears to be nonfunctional.  What are these files that are throwing 404s, and what can we do to prevent issues?

404: File Types

Looking at the December 15, 2018 run of the Mobile HTTP Archive, of 118.5M requests, 604,000 are 404 errors (that’s 0.5% of all requests).

screen shot 2019-01-27 at 9.27.20 pm

This makes sense, since the HTTP Archive is loading the main landing page of top domains – we expect that these pages *should* work (as we will see, this is not always the case). Breaking down these files by extension, we find that ico, jpg, png and js have the most 404 errors.  However, when we compare the 404 counts to the total number of requests for each type, we find that 13% of all ico requests result in 404s. For two font types (woff and ttf) and for JSON responses, over 1% of requests are 404 errors.
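To make the difference between "most 404s" and "highest 404 rate" concrete, here is a minimal sketch of that breakdown. It assumes you have a list of (URL, status code) pairs; the sample rows and the helper names (extension_of, not_found_rates) are made up for illustration and are not HTTP Archive data or the query I actually ran.

```python
from collections import Counter
from os.path import splitext
from urllib.parse import urlsplit


def extension_of(url: str) -> str:
    """Return the lowercased file extension of a URL path, or '' if none."""
    path = urlsplit(url).path  # drops the query string and fragment
    return splitext(path)[1].lstrip(".").lower()


def not_found_rates(requests):
    """Map each extension to (404 count, total count, 404 share in percent)."""
    totals, misses = Counter(), Counter()
    for url, status in requests:
        ext = extension_of(url)
        totals[ext] += 1
        if status == 404:
            misses[ext] += 1
    return {
        ext: (misses[ext], totals[ext], 100 * misses[ext] / totals[ext])
        for ext in totals
    }


# Hypothetical sample rows, not HTTP Archive data.
sample = [
    ("https://site.example/favicon.ico", 404),
    ("https://site.example/favicon.ico?v=2", 200),
    ("https://site.example/img/hero.jpg", 200),
    ("https://site.example/img/logo.jpg", 404),
    ("https://site.example/img/team.jpg", 200),
    ("https://site.example/img/map.jpg", 200),
]

rates = not_found_rates(sample)
for ext, (missing, total, share) in sorted(rates.items()):
    print(f"{ext}: {missing}/{total} requests returned 404 ({share:.0f}%)")
```

In this toy sample both extensions have one 404 each, but the ico rate is twice the jpg rate, which is the same effect that pushes ico to the top of the real data.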

Screen Shot 2019-01-28 at 10.14.16 PM.png

json Not Found

Of the 6495 JSON 404 errors, ~2900 have the name manifest.json (that’s 45% of the failures). manifest.json files are used to identify web apps, but it appears that 3.8% of implementations will not work due to the missing files (2900/73,500 total files in the dataset).

 

Broken Favicons Can Hurt Performance?

Favicons are generally one of the last things to load on a page, so they are unlikely to slow the rendering of a page.  The data shows that favicons are the most likely file type to throw a 404 error. But what happens with a 404 for a .ico file?  To the customer, it generally appears as a box in the tab – rather than the cute icon:

Screen Shot 2019-01-28 at 10.30.47 PM.png

Ok, sure – this does not break the page.  But it does give us a chance to see what your server does when a 404 occurs.   What should happen is a short 404 message:

Screen Shot 2019-01-29 at 9.19.28 AM.png

This response, including headers, is 2 KB.  If anyone actually looks for the favicon on your site, they’ll get a nice message, and when it fails on any other page load, the cost is only 2 KB.

But what if you have one of those fancy 404 pages (like in the articles above)?  If all 404 errors redirect to that page, the HTML of the fancy 404 page will be downloaded instead of the favicon. This data is never used by the page, and is simply extra data transfer – incurring server costs and hitting your customer’s data plan.

In the HTTP Archive, there are ~5,000 sites downloading over 10 KB of data when the favicon fails to load, and 300 using over 100 KB.  Not found errors should be segregated: deliver the fancy 404 page for pages that are not found, and only a very simple error for web page components.  The CBC saved 80% on each component 404 in this way.

There are 64,500 404 responses over 5 KB, and only 500 of these are HTML.  Cutting the other 64,000 responses to just 1 KB would make a large improvement in download time and reduce server usage.
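A quick back-of-the-envelope version of that saving, using only the counts quoted here. Since all I know is that each response is over 5 KB, this gives a lower bound; the variable names are mine, and it assumes the 500 HTML responses are the page-level 404s you would leave alone.

```python
# Figures quoted above from the HTTP Archive run.
large_404s = 64_500      # 404 responses larger than 5 KB
large_html_404s = 500    # of those, the ones that are HTML
floor_kb = 5             # every counted response is over 5 KB
target_kb = 1            # size of a minimal 404 response

# Assumption: the HTML responses are page 404s and keep their full size.
component_404s = large_404s - large_html_404s
min_saved_kb = component_404s * (floor_kb - target_kb)

print(f"{component_404s:,} component 404s could shrink")
print(f"at least {min_saved_kb:,} KB saved across the crawl")
```

The real saving is larger, since some of these responses run well past 5 KB.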

Best Practice:  Redirect only page 404 errors to your failure page. For files like images or CSS, send a (very) small 404 error to reduce the KB impact of the error.
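Here is one way this split could look, as a minimal sketch using Python's built-in http.server rather than any particular production setup. It assumes page navigations send an Accept header containing text/html while image, CSS, and script requests do not. The FANCY_404_PAGE content, the helper names, and the port are all placeholders I made up.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical stand-in for a styled "page not found" document.
FANCY_404_PAGE = (
    b"<!doctype html><html><body><h1>Page not found</h1>"
    b"<p>Try the search box or the home page.</p></body></html>"
)
TINY_404_BODY = b"Not Found"


def wants_html(accept_header: str) -> bool:
    """True when the request looks like a page navigation.

    Assumption: navigations list text/html in Accept, while image,
    script, style, and font requests do not.
    """
    return "text/html" in (accept_header or "").lower()


def not_found_response(accept_header: str):
    """Return (content_type, body) for a 404, sized to the kind of request."""
    if wants_html(accept_header):
        return "text/html; charset=utf-8", FANCY_404_PAGE
    return "text/plain; charset=utf-8", TINY_404_BODY


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        # This sketch serves no real content, so every path is "not found".
        content_type, body = not_found_response(self.headers.get("Accept", ""))
        self.send_response(404)
        self.send_header("Content-Type", content_type)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8000) -> None:
    """Run the sketch locally; call serve() to try it out."""
    HTTPServer(("127.0.0.1", port), Handler).serve_forever()
```

Calling serve() and requesting a missing page in a browser should return the HTML page, while the browser's own favicon request should get the nine-byte body. On a real site you would more likely express the same rule in your web server or CDN configuration, and you might key off the file extension rather than the Accept header.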

 

Fun with 404 Errors

Looking through 404 responses, I found some interesting trends that I thought would be a fun addendum to this article:

There are over 900 sites that attempt to use Tumblr’s Share API (using v1, which is deprecated): “platform.tumblr.com/v1/share.js”, and this results in a 404 error. There are a lot of websites that are no longer getting updated, and as tools are removed, the sites lose functionality.

In the weird use of the web category: I found 7 Silicon Valley newspapers that were using a menu.png file from a Midwest bank. (Apparently the bank found out and removed the file, hence the 404 response.)

screenshot2019-01-31at6.27.52pm

Also in the weird category: There are 1450 requests in the HTTP Archive where sample code was taken a bit too literally, and the call was made to example.com:

screenshot2019-01-31at9.07.57pm

Conclusion

The big takeaway here is to serve as small a 404 response as possible for files.  It will lower your server costs, and speed up the delivery of your sites – while you work to fix the 404s that are live.

 

 

 

 
