Web caching stores frequently used objects closer to the client through browser, proxy, or server caches. By storing "fresh" objects closer to your users, you avoid round trips to the origin server, reducing bandwidth consumption, server load, and most importantly, latency. This article shows how to configure your Apache server for more efficient caching to save bandwidth and improve performance.
Caching is not just for static sites; even dynamic sites can benefit from it. Graphics and multimedia typically don't change as frequently as (X)HTML files. Graphics that seldom change, like logos, headers, and navigation, can be given longer expiration times, while resources that change more frequently, like XHTML and XML files, can be given shorter ones. By designing your site with caching in mind, you can target different classes of resources and give them different expiration times with only a few lines of code.
Three Ways to Cache In
There are three ways to set cache control rules for your web site.
- Via meta tags (meta http-equiv="Expires")
- Programmatically by setting HTTP headers (CGI scripts etc.)
- Through web server configuration files (httpd.conf)
This article addresses the third method: cache control through server configuration files. The first method works with browsers, but most intermediate proxy servers don't parse HTML files; they look for HTTP headers to set caching policy. The second method, programmatically setting cache control headers (Expires and Cache-Control, for example), is useful for CGI scripts that output dynamic data.
Cache Freshness Guaranteed
In order to cache web objects, browsers and proxy servers upstream from origin servers must be able to calculate "freshness lifetimes," or how long after an object was last accessed or modified it is still OK to serve it from the cache. HTTP does this digital melon squeezing primarily through brief HTTP header conversations between client, proxy, and origin servers to determine whether it is OK to reuse a cached object, or whether the resource must be reloaded to get a fresh one. Here's an example request/response sequence for our logo image, l.gif.
Host: www.websiteoptimization.com
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; rv:1.7.3) Gecko/20041001 Firefox/0.10.1
Accept: image/png,*/*;q=0.5
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Connection: keep-alive
Referer: http://www.websiteoptimization.com/
Our server responds as follows:
HTTP/1.1 200 OK
Date: Mon, 25 Oct 2004 11:55:45 GMT
Server: Apache/1.3.31
Cache-Control: max-age=2592000
Expires: Wed, 24 Nov 2004 11:55:45 GMT
Last-Modified: Sat, 19 Jun 2004 15:25:10 GMT
ETag: "7b80d9-891-40d45ad6"
Accept-Ranges: bytes
Content-Length: 2193
Keep-Alive: timeout=15, max=99
Connection: Keep-Alive
Content-Type: image/gif
This image was last modified on June 19 and is fresh for 30 days from the time of access: max-age=2592000 is 30 days expressed in seconds (30 x 86400), and the Expires date falls exactly 30 days after the Date header. It is clear from these response headers that this object does not change frequently and can be safely cached for up to a month. When a client requests an object, a browser or proxy cache that holds a copy still within its freshness lifetime returns that copy directly. If the copy has expired, the cache goes back to the origin server to revalidate it or fetch a fresh one.
Cache Control with mod_expires and mod_headers
For Apache, mod_expires and mod_headers handle cache control through HTTP headers sent from the server. Since they are not installed by default, have your server administrator install them for you. For Apache/1.3x, enable the expires and headers modules by adding the following lines to your httpd.conf configuration file.
LoadModule expires_module libexec/mod_expires.so
LoadModule headers_module libexec/mod_headers.so
AddModule mod_expires.c
AddModule mod_headers.c
...
AddModule mod_gzip.c
Note that the load order is important in Apache/1.3x: mod_gzip must load last, after all other modules.
For Apache/2.0, enable the modules in your httpd.conf file like this.
LoadModule expires_module modules/mod_expires.so
LoadModule headers_module modules/mod_headers.so
LoadModule deflate_module modules/mod_deflate.so
mod_deflate is the native compression module in Apache/2.0 (although mod_gzip does a better job of handling wayward browsers). In this case, the load order does not matter, as Apache/2.0 handles this for you.
Target Files by Extension for Caching
One quick way to enable cache control headers for existing sites is to target files by extension. Although this method has some disadvantages (notably the requirement of file extensions), it has the virtue of simplicity. To turn on mod_expires, set ExpiresActive to On.
ExpiresActive On
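If the same configuration file may end up on a server where mod_expires is not loaded, one defensive option is to wrap the directive in an IfModule section. This is only a sketch of that idea, assuming the module identifier mod_expires.c, which matches the AddModule line shown earlier:

```apache
# Skipped entirely when mod_expires is not loaded, so Apache does not
# stop with an "Invalid command" error at startup.
<IfModule mod_expires.c>
    ExpiresActive On
</IfModule>
```

The trade-off is that a missing module then fails silently: the server starts, but no expiry headers are sent.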
Next, target your website's root HTML directory to enable caching for your site in one fell swoop.
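# "/path/to/htdocs" is a placeholder; substitute your own document root.
<Directory "/path/to/htdocs">
    Options FollowSymLinks MultiViews
    AllowOverride All
    Order allow,deny
    Allow from all
    ExpiresDefault A300
    <FilesMatch "\.html$">
        ExpiresDefault A86400
    </FilesMatch>
    <FilesMatch "\.(gif|jpg|png|js|css)$">
        ExpiresDefault A2592000
    </FilesMatch>
</Directory>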
ExpiresDefault A300 sets the default expiry time to 300 seconds after access (A). Using M300 would set the expiry time to 300 seconds after file modification. The first FilesMatch section sets the expiry time for all .html files to 86400 seconds (1 day). The second FilesMatch section sets the expiry time for all images, external JavaScript files, and CSS files to 2592000 seconds (30 days). In each case mod_expires sends an Expires header together with a matching Cache-Control: max-age header.
Note that you can target your files with more granularity using multiple directory sections, like this:
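As a minimal sketch of that idea, assume two hypothetical directories, /path/to/htdocs/images for rarely changing graphics and /path/to/htdocs/news for frequently updated pages. Both the paths and the lifetimes are illustrative, not taken from any real site:

```apache
# Hypothetical paths and lifetimes; adjust both to your own site layout.
ExpiresActive On

# Rarely changing graphics: expire 30 days after access.
<Directory "/path/to/htdocs/images">
    ExpiresDefault A2592000
</Directory>

# Frequently updated pages: expire 5 minutes after access.
<Directory "/path/to/htdocs/news">
    ExpiresDefault A300
</Directory>
```

Apache applies Directory sections from the shortest path to the longest, so a rule set on a subdirectory overrides the one inherited from its parent.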
For truly dynamic content you can force resources to not be cached by setting an age of zero seconds and to not store the resource anywhere.
Header Set Cache-Control "max-age=0, no-store"
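Placed at the top level of httpd.conf, that directive would apply to every response, so in practice you would usually scope it to the dynamic part of the site. A sketch, assuming the dynamic scripts live in a hypothetical /path/to/cgi-bin directory and that mod_headers is loaded:

```apache
# Hypothetical location of dynamic scripts; requires mod_headers.
<Directory "/path/to/cgi-bin">
    # Keep mod_expires from adding its own expiry headers here.
    ExpiresActive Off
    # Tell browsers and proxies not to store responses from this directory.
    Header set Cache-Control "max-age=0, no-store"
</Directory>
```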
Target Files by MIME Type
The disadvantage of the above method is its reliance on file extensions. In some cases webmasters elect to use extensionless URLs for portability and performance (see Rewrite URLs with Content Negotiation). A better method is to use the ExpiresByType directive of the mod_expires module. As the name implies, ExpiresByType targets resources for caching by MIME type, like this.
ExpiresActive On
ExpiresDefault "access plus 300 seconds"
# "/path/to/htdocs" is a placeholder; substitute your own document root.
<Directory "/path/to/htdocs">
    Options FollowSymLinks MultiViews
    AllowOverride All
    Order allow,deny
    Allow from all
    ExpiresByType text/html "access plus 1 day"
    ExpiresByType text/css "access plus 1 day"
    ExpiresByType text/javascript "access plus 1 day"
    ExpiresByType image/gif "access plus 1 month"
    ExpiresByType image/jpeg "access plus 1 month"
    ExpiresByType image/png "access plus 1 month"
    ExpiresByType application/x-shockwave-flash "access plus 1 day"
</Directory>
This httpd.conf code sets the same parameters, only in a more flexible and readable way. For expiry rules you can use access or modification as the base, depending on whether you want to start counting from the time the client requested the file, or from the last time the file was modified. In our case for WebSiteOptimization.com, I chose to use short access offsets for text files likely to change, and longer access offsets for infrequently changing images.
Note the AllowOverride All directive. This allows webmasters to override these settings with .htaccess files, for example for directory-based authentication and redirection. However, allowing overrides carries a performance hit, because Apache must check every directory along the path to a requested file for an .htaccess file.
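With overrides enabled, a single directory can also adjust its own expiry rules. The following is a sketch of a hypothetical .htaccess file for a directory whose images are regenerated often; the one-hour lifetime is illustrative. mod_expires directives in .htaccess files need the Indexes override, which AllowOverride All includes:

```apache
# Hypothetical .htaccess file; values are illustrative only.
# Shortens the image lifetime for this directory and everything below it.
# ExpiresActive On is inherited from httpd.conf.
ExpiresByType image/gif "access plus 1 hour"
ExpiresByType image/png "access plus 1 hour"
```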
HTTP Header Results
For our Apache/1.3x server, the httpd.conf file comes with cache-control disabled. Let's look at the headers for the WebSiteOptimization.com home page and embedded logo (l.gif) before we update the httpd.conf configuration file.
HTTP/1.1 200 OK
Date: Sat, 23 Oct 2004 23:15:38 GMT
Server: Apache/1.3.31 (Unix) mod_gzip/1.3.26.1a mod_auth_passthrough/1.8 mod_log_bytes/1.2 mod_bwlimited/1.4 PHP/4.3.9 FrontPage/5.0.2.2634a mod_ssl/2.8.20 OpenSSL/0.9.7a
Connection: close
Content-Type: text/html
Content-Encoding: gzip
Content-Length: 4326
HTTP/1.1 200 OK
Date: Sat, 23 Oct 2004 23:14:13 GMT
Server: Apache/1.3.31 (Unix) mod_gzip/1.3.26.1a mod_auth_passthrough/1.8 mod_log_bytes/1.2 mod_bwlimited/1.4 PHP/4.3.9 FrontPage/5.0.2.2634a mod_ssl/2.8.20 OpenSSL/0.9.7a
Last-Modified: Sat, 19 Jun 2004 15:25:21 GMT
ETag: "7b80da-4f2-40d45ae1"
Accept-Ranges: bytes
Content-Length: 1266
Connection: close
Content-Type: image/gif
After updating the httpd.conf file with the above MIME-based code, we restart the HTTP daemon using this command:
service httpd restart
The headers for our home page and logo now look like this.
HTTP/1.1 200 OK
Date: Sat, 23 Oct 2004 23:17:52 GMT
Server: Apache/1.3.31
Cache-Control: max-age=86400
Expires: Sun, 24 Oct 2004 23:17:52 GMT
Connection: close
Content-Type: text/html
Content-Encoding: gzip
Content-Length: 4326
HTTP/1.1 200 OK
Date: Sat, 23 Oct 2004 23:18:54 GMT
Server: Apache/1.3.31
Cache-Control: max-age=2592000
Expires: Mon, 22 Nov 2004 23:18:54 GMT
Last-Modified: Sat, 19 Jun 2004 15:25:21 GMT
ETag: "7b80da-4f2-40d45ae1"
Accept-Ranges: bytes
Content-Length: 1266
Connection: close
Content-Type: image/gif
Both resources now have cache-control headers. Note that the Server field has also been stripped down. This is done with the ServerTokens directive:
ServerTokens Min
This minimizes the response header from:
Server: Apache/1.3.31 (Unix) mod_gzip/1.3.26.1a mod_auth_passthrough/1.8 mod_log_bytes/1.2 mod_bwlimited/1.4 PHP/4.3.8 FrontPage/5.0.2.2634a mod_ssl/2.8.19 OpenSSL/0.9.7a
to
Server: Apache/1.3.31
Our images are now cacheable for 30 days. However, the HTML file does not have a Last-Modified header. This is because we use conditional server-side includes to merge in different CSS for different browsers to save an HTTP request. We'll address the cacheability of SSI pages in a future tweak.
Warning: Pragma no-cache Deprecated
According to Stephen Pierzchala of Gomez, you should avoid using the deprecated Pragma no-cache header. The following is an INVALID server response:
Header Set Pragma "no-cache"
"I see this a lot in server responses. In the HTTP specs, the Pragma header is a deprecated, client-side, HTTP/1.0 request header."
Conclusion
Server cache control can improve your site's performance while reducing bandwidth bills. By caching objects that change infrequently for longer periods, and caching frequently updated content for shorter periods (or not at all) you can speed up perceived load times while maintaining fresh content.
About the Author
Andy King is the founder of five developer-related sites, and the author of Speed Up Your Site: Web Site Optimization (http://www.speedupyoursite.com) from New Riders Publishing. He publishes the monthly Bandwidth Report, the weekly Optimization Week, and the weekly Speed Tweak of the Week.
Source: http://www.websiteoptimization.com/speed/tweak/cache/