Pivotal Labs

Rails, Slashdotted: no problem

Posted by Steve Conover on Wednesday June 27, 2007 at 03:29AM

By Steve Conover and Brian Takita

Peer-to-Patent, one of Pivotal Labs' clients, got Slashdotted last week, and we had no trouble handling the load. The site was just as responsive as it always is, and we didn't come close to having a scale problem.

Moral of the story: the technology for serving static web pages is old, boring, and extremely scalable. If you have the type of site that can be page-cached, do so aggressively, starting with the front page and any pages likely to be linked to. We got a huge payoff for the engineering time that we invested in our page-caching strategy.

Highlights:

  • We moved away from Rails page-caching and developed our own "holeless cache", which uses a symlink trick (see below) to instantly and "holelessly" switch to a new version of a cached page. (The cache "hole" is the time between the expiration or purge of a cached page and the time when it's regenerated. The danger is that in that time your Mongrels can be saturated with requests - something we proved to ourselves could easily happen.)
  • Here's our symlink trick, using the front page as an example:

    1. Have index.html point to index.html.current
    2. If index.html.current is 20 or more minutes old:
      1. Copy index.html.current to index.html.old
      2. Point index.html to index.html.old
      3. Rewrite index.html.current by asking Rails for the page (using the process method)
      4. Repoint index.html back at index.html.current
    3. Repeat step 2 every minute using a cron job.
  • For model-based cache expiration, we call our holeless cache routine from the model observer class instead of using Rails cache sweepers. So, instead of just deleting the cached page, we regenerate it in place.

  • It was important to write tests that proved that the HTML we generated for cached pages looked exactly the same in different "modes" (user logged in vs not, for example). This forced us to push mode-dependent decision logic out of Markaby templates and into JavaScript, meaning that view-oriented RSpec tests asserting differences between modes became useless. We rewrote them as Selenium tests.

  • Performance/load testing: we tried several tools and approaches and found that a simple Ruby script that launches wget requests (that write to /dev/null) in many separate threads worked best for us.

  • We send down exactly one .js and one .css file. If you are sending more than one of each to the browser, you have a performance problem. Fix it with the asset_packager plugin.
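
The symlink trick from the highlights above can be sketched in plain Ruby. This is a minimal sketch under stated assumptions, not our production code: the names refresh_holeless and repoint are hypothetical, the block stands in for "asking Rails for the page", and swapping the symlink via a temporary link plus rename is one possible way to repoint without a gap.

```ruby
require "fileutils"

# Hypothetical helper: swap a symlink by creating a temporary link and
# renaming it over the old one, so the public name never disappears.
def repoint(link, target)
  tmp = "#{link}.tmp.#{Process.pid}"
  FileUtils.rm_f(tmp)            # clear any leftover from a crashed run
  File.symlink(target, tmp)      # target is relative to the link's directory
  File.rename(tmp, link)         # rename replaces the existing link in one step
end

# Hypothetical helper: regenerate a cached page without a cache "hole".
# dir, name, max_age and the rendering block are all assumptions.
# Returns true if the page was regenerated, false if it was still fresh.
def refresh_holeless(dir, name: "index.html", max_age: 20 * 60)
  link    = File.join(dir, name)
  current = "#{link}.current"
  old     = "#{link}.old"

  # Step 2: only act when the current copy is missing or too old.
  return false if File.exist?(current) && (Time.now - File.mtime(current)) < max_age

  if File.exist?(current)
    FileUtils.cp(current, old)           # 2.1 keep a servable copy
    repoint(link, File.basename(old))    # 2.2 serve the old copy meanwhile
  end

  File.write(current, yield)             # 2.3 the block returns the fresh HTML
  repoint(link, File.basename(current))  # 2.4 switch back to the new copy
  true
end
```

A script run from cron every minute (step 3) would call something like refresh_holeless("public") { render_front_page }, where render_front_page is whatever produces the page HTML in your app.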

Update: one clarification about the cron job: we deploy it automatically using Capistrano.
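
For the load testing mentioned in the highlights, a threaded wget launcher can be very small. The sketch below is an illustration rather than the script we ran: hammer, wget_command, the thread and request counts, and the injectable runner are all hypothetical, and it assumes wget is on the PATH.

```ruby
# Build the wget invocation: quiet mode, response body discarded.
def wget_command(url)
  ["wget", "--quiet", "--output-document=/dev/null", url]
end

# Hypothetical load generator. Each thread issues its requests one after
# another; the runner is injectable so the loop can be exercised without
# touching the network. Returns a summary hash.
def hammer(url, threads: 20, requests_per_thread: 50,
           runner: ->(cmd) { system(*cmd) })
  started = Time.now
  workers = Array.new(threads) do
    Thread.new do
      # Count the requests for which wget exited successfully.
      Array.new(requests_per_thread) { runner.call(wget_command(url)) ? 1 : 0 }.sum
    end
  end
  ok = workers.sum(&:value)   # Thread#value waits for the thread to finish
  { ok: ok, total: threads * requests_per_thread, seconds: Time.now - started }
end
```

Calling hammer("http://localhost:3000/") and printing the result gives a rough requests-per-second figure; the URL here is only a placeholder for whatever page you want to stress.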

Comments

  1. watt on June 27, 2007 at 09:29AM

    the symlink trick is not very efficient - since "move" operations on Linux are atomic, you could simply "move" the new page over the "old" page.

    1) have index.html point to index.html.current
    2) generate the cached page as index.html.new
    3) have the cron script check if index.html.new exists, and if yes, "mv index.html.new index.html.current"

  2. Steve on June 27, 2007 at 09:18PM

    watt - thanks for your suggestion. This is a fine solution to the problem; anyone thinking of doing something similar to what we did should consider it.

  3. Claus on July 04, 2007 at 02:28PM

    So "Rails, Slashdotted - no problem" means "Avoid Rails if there is any load" - the strategy you talk about has little to do with Rails.

  4. Dav on July 04, 2007 at 04:38PM

    Claus, actually it means you can code your application in Ruby on Rails, taking advantage of the short time to market (I think Peer to Patent took under two months), vastly improved maintainability (over PHP, for instance) and enjoy the pleasure of coding in Ruby, yet still put the application into production in a manner that survives a slashdotting. That's pretty nice.

  5. Juan Lupión on July 04, 2007 at 04:45PM

    "So "Rails, Slashdotted - no problem" means "Avoid Rails if there is any load" - the strategy you talk about has little to do with Rails."

    Yes, it has. Caching is the life of any web app, and Rails excels at enabling you to cache exactly the content you need.

  6. Morten Frederiksen on July 04, 2007 at 06:21PM

    Quite a useful caching trick, and one that works with all application languages too.

    Nothing Rails specific here, if not for the "moved away" part...

  7. Claus on July 05, 2007 at 09:20AM

    The development speed benefits of Rails are largely anecdotal and incidental to the caching technique. As for Juan's assertion that "Rails excels" here, it is funny, considering that the technique largely involves not using Rails to solve the problem.

    I'm totally OK with Rails and the people loving Rails - I use it too on some projects - but it is quite simply not true that Rails development is orders of magnitude faster/better than the other dynamic environments. It is exactly as ridiculous as the "Enterprise == Java" claims the Rails community is so fond of laughing at.

  8. Paul M. Watson on July 31, 2007 at 01:34PM

    Thanks for the caching trick.

    On the "one .js and .css" comment, though, I am pretty sure you aren't advocating it for everyone in every context. With YUI on a CDN you don't want to be sending down your own copy of YUI inside your one .js file. Let Yahoo's CDN do it for you, and it may already be cached on the visitor's machine. And with some web apps now having many hundreds of KB of JS, you need to find a balance between HTTP requests and file size.

  9. Javascript required? on September 30, 2007 at 05:14AM

    Hi guys,

    Interesting solution...thanks for posting it. A question not as much about this but about your javascript login/logged in code, I'd love to hear more about how you are doing that, (so that you can do full page caching), but keep the "Hello Cameron" on the screen - where you got that idea, and especially, does it fail gracefully, if a user has javascript disabled?

    Thanks!

    Cameron

  10. Mike Bailey on October 17, 2007 at 05:58AM

    Do you have any nice tricks for managing cron via Capistrano? Do you maintain entire cron files under version control and push them out with cap or have you written tasks to add/remove cron entries individually?

  11. Jim Meyer on October 26, 2007 at 11:17PM

    @Mike: Just surmising, but you could keep your crons somewhere in your app's dir structure, then make a cap recipe to call "crontab [filename]" when appropriate. Problematic if you have multiple apps with crons all to manage under a single username, though.

  12. Dan Kubb on November 14, 2007 at 08:15AM

    Another approach that you may want to think about is using the Varnish HTTP Accelerator. With a bit of configuration all you'd have to do is set your Expires and Cache-Control headers to keep the pages fresh for 20 minutes, and it'll automatically keep refreshing the content every 20 minutes without any file copying.

    Plus Varnish is fast, faster even than Nginx. I read that the Joyent guys say it's about 10x the speed of Nginx, but I've only gotten it to run about 40% faster -- still not too shabby considering Nginx is one of the fastest web servers around. Keep in mind that Varnish isn't a full web server, it's just a cache, but it's probably worth testing in front of Nginx.

  13. Brian Takita on November 16, 2007 at 05:35PM

    @Cameron - We store the user's basic information in the Cookie header. The javascript code then reads/alters the cookies.

    Cookies work out well because they can be read/altered by the server on POSTs and passed to the next request on cache hits.

    Obviously, we need to consider privacy, so sensitive information cannot be stored this way.

    Since the vast majority of our target users have javascript enabled we did not implement a non-javascript way to interact with the site. We also heavily use AJAX techniques for our site's interaction, which, afaik, makes it expensive to have a parallel non-javascript solution.

  14. Brian Takita on November 16, 2007 at 05:38PM

    @mike + @jim - We have a cron installer that overwrites the crontab on deploy. Since P2p is the only app running on our slice, it's not a problem.

    If you have a shared server, you could use a modified technique, like search/replace using comments and your favorite text munging tool.
