HOWTO Cache S3 Objects with Varnish

Varnish is an amazing tool. I refer to it as the Internet band-aid: if you are having problems with your web site, put some Varnish machines in front of it and voilà, problem solved. In fact, I feel it’s so crucial that at Phyber we install it by default on every datacenter build we do.

Which brings me to the point of this post. I hate NFS. I hate it passionately, mostly because I hate the way people use it. I keep having conversations like, “I have 10 dual quad-core app servers with 48 GB of RAM each, and all of our images are mounted from a P4 2.4 with 4 GB of RAM and some IDE hard drives. Why is our site slow?” My solution: host your user-uploaded content on an object store. It doesn’t matter which, just pick one: MogileFS, Amazon S3, Swift, etc. They all work great, with no single point of failure and no single hard drive as a choke point. As an added bonus, you can scale and scale and scale.

So what do you do if you want to control your S3 serving costs? Cache your objects with Varnish, of course. I found this little VCL tidbit today, and it’s worth saving and sharing for the future.

backend s3 {
    .host = "s3.amazonaws.com";
    .port = "80";
}

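sub vcl_recv {
    # Static assets only; the dot is escaped so it matches a literal "."
    if (req.url ~ "\.(css|gif|ico|jpg|jpeg|js|png|swf|txt)$") {
        # Strip client headers so every visitor shares one cached copy
        unset req.http.cookie;
        unset req.http.cache-control;
        unset req.http.pragma;
        unset req.http.expires;
        unset req.http.etag;
        unset req.http.X-Forwarded-For;

        set req.backend = s3;
        # S3 selects the bucket from the Host header; my_bucket is a placeholder
        set req.http.host = "my_bucket.s3.amazonaws.com";
        # Varnish 2.0 keyword; 2.1 through 3.x spell this return (lookup)
        lookup;
    }
}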

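sub vcl_fetch {
    # Varnish 2.0 syntax; from 2.1 on the fetched response is beresp.*, not obj.*
    # Drop S3 bookkeeping headers that clients do not need
    unset obj.http.X-Amz-Id-2;
    unset obj.http.X-Amz-Meta-Group;
    unset obj.http.X-Amz-Meta-Owner;
    unset obj.http.X-Amz-Meta-Permissions;
    unset obj.http.X-Amz-Request-Id;

    # Cache each object for a week, with a 30 second grace period
    set obj.ttl = 1w;
    set obj.grace = 30s;
}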
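
The listing above uses the VCL dialect of Varnish 2.0 (bare lookup, obj.* inside vcl_fetch), which newer releases reject. As a rough sketch, assuming Varnish 4.0 or later, the same idea would look something like the following. The bucket name my_bucket is still a placeholder, and the dot in the regex is escaped so that it matches only a literal ".". Treat it as a starting point under those assumptions rather than a tested production config.

```vcl
vcl 4.0;

backend s3 {
    .host = "s3.amazonaws.com";
    .port = "80";
}

sub vcl_recv {
    if (req.url ~ "\.(css|gif|ico|jpg|jpeg|js|png|swf|txt)$") {
        # Strip client headers so every visitor shares one cached copy
        unset req.http.Cookie;
        unset req.http.Cache-Control;
        unset req.http.Pragma;
        unset req.http.Expires;
        unset req.http.ETag;
        unset req.http.X-Forwarded-For;

        # req.backend became req.backend_hint in Varnish 4
        set req.backend_hint = s3;
        # Placeholder bucket name; S3 routes on the Host header
        set req.http.Host = "my_bucket.s3.amazonaws.com";
        # return (hash) replaces the old lookup keyword
        return (hash);
    }
}

# vcl_fetch was renamed vcl_backend_response, and obj.* became beresp.*
sub vcl_backend_response {
    unset beresp.http.X-Amz-Id-2;
    unset beresp.http.X-Amz-Meta-Group;
    unset beresp.http.X-Amz-Meta-Owner;
    unset beresp.http.X-Amz-Meta-Permissions;
    unset beresp.http.X-Amz-Request-Id;

    set beresp.ttl = 1w;
    set beresp.grace = 30s;
}
```

As in the original, the TTL and header cleanup apply to every response fetched from the backend, not just the static file types matched in vcl_recv.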