Cowie from Etsy – Cooking with Chef
Jon Cowie from Etsy explained how Etsy uses Chef. This is a turbo-overview of his talk and my impressions.
First, a note: Etsy prefers running on bare metal rather than in the cloud. There are cases where the cloud is the better fit, but not EVERY case, as some techno-pundits seem to preach. Not that any managers I've dealt with have this idea, but just know that the ENTIRE WORLD IS NOT MOVING TO AMAZON. Etsy runs around 800 servers, and they're real servers: not VMs and not AMIs.
Some of their rules of thumb for working with Chef:
- Never test Chef in production
- Keep things as simple as possible
- For metrics, they use a Chef handler to send run data to Graphite (github.com/etsy/chef-handlers.git)
- They push their Chef failures to IRC.
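Etsy's real handlers live in the chef-handlers repo above. What follows is not that code. It's a minimal sketch of the idea, assuming Graphite's plaintext protocol (one `path value timestamp` line per metric, TCP port 2003 by default). The `RunStats` struct is a hypothetical stand-in for Chef's `run_status`. The `chef.<node>` metric names and the hostnames are made up for illustration. A real handler would subclass `Chef::Handler` and do this work in its `report` method.

```ruby
require "socket"

# Hypothetical stand-in for Chef's run_status; a real handler subclasses
# Chef::Handler and reads these values in its #report method.
RunStats = Struct.new(:node_name, :elapsed_time, :updated_count, :success)

# Format metrics as Graphite plaintext lines: "<path> <value> <timestamp>\n".
# The "chef.<node>.<metric>" naming scheme is an assumption.
def graphite_lines(stats, timestamp, prefix: "chef")
  # Dots separate Graphite path segments, so flatten them in the node name.
  base = "#{prefix}.#{stats.node_name.tr('.', '_')}"
  metrics = {
    "elapsed_time" => stats.elapsed_time,
    "updated_resources" => stats.updated_count,
    "success" => stats.success ? 1 : 0
  }
  metrics.map { |name, value| "#{base}.#{name} #{value} #{timestamp}\n" }
end

# Write the lines to a carbon listener (2003 is the default plaintext port).
def send_to_graphite(lines, host, port = 2003)
  TCPSocket.open(host, port) { |sock| lines.each { |line| sock.write(line) } }
end

stats = RunStats.new("web01.example.com", 12.5, 3, true)
puts graphite_lines(stats, Time.now.to_i)
# send_to_graphite(graphite_lines(stats, Time.now.to_i), "graphite.example.com")
```

The formatting is kept apart from the socket code, so you can check the metric lines without a Graphite server running.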
Handling failures:
- Use “knife node lastrun <<hostname>>” to see what happened on a node's last run
- To get this data, gem install knife-lastrun, then enable its handler in client.rb on each node.
- Keep conditionals simple: avoid huge regexes, so the code is still readable when you're debugging at 3am.
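To make the 3am point concrete, here's a hypothetical before/after in plain Ruby. The `node` hash, hostnames, and role names are invented for illustration. In an actual recipe you'd read node attributes or call `node.role?` rather than pass a hash around.

```ruby
# Hypothetical node data; a real recipe would read node attributes instead.
node = { "fqdn" => "web03.ny5.example.com", "roles" => %w[base web] }

# Hard to read at 3am: one regex encoding tier, host number, and datacenter.
HOST_RE = /\A(web|api)(0[1-9]|[1-9][0-9])\.(ny|sf)[0-9]\.example\.com\z/

def web_tier_by_regex?(node)
  HOST_RE.match?(node["fqdn"])
end

# Easier: ask one plain question about data you control (here, roles).
WEB_ROLES = %w[web api].freeze

def web_tier_by_role?(node)
  node["roles"].any? { |role| WEB_ROLES.include?(role) }
end

puts web_tier_by_regex?(node) # => true
puts web_tier_by_role?(node)  # => true
```

Both checks agree here. The role check, though, still says what it means when someone renames a host or opens a new datacenter.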
Standards in Chef:
- Foodcritic: a tool for enforcing rules & standards on Chef cookbooks, and it sounds FULLY RAD. http://acrmp.github.com/foodcritic/
  - Foodcritic integrates with Jenkins (!!)
  - Supports custom rules
- Etsy standards (guidelines):
  - Never have Chef auto-upgrade packages
  - If you want to send a :restart action to a service, send a RELOAD (:reload) instead
  - (Foodcritic can enforce these)
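Foodcritic rules work on a recipe's parsed syntax tree, so the following is NOT a Foodcritic rule. It's a deliberately simplified line-by-line regex scan in plain Ruby. It shows the sort of thing the two Etsy guidelines above would flag. The check names and patterns are hypothetical. A regex scan also misses cases a real AST-based rule would catch, such as `action [:install, :upgrade]`.

```ruby
# Two hypothetical checks mirroring Etsy's guidelines. Real Foodcritic
# rules inspect the recipe's syntax tree; this sketch only scans lines.
CHECKS = {
  "no-auto-upgrade" => /action\s+:upgrade\b/,
  "reload-not-restart" => /notifies\s+:restart\b/
}.freeze

# Return [line_number, check_name] pairs for each offending line.
def lint(recipe_source)
  recipe_source.each_line.with_index(1).flat_map do |line, lineno|
    CHECKS.select { |_, pattern| pattern.match?(line) }
          .map { |name, _| [lineno, name] }
  end
end

recipe = <<~RUBY
  package "nginx" do
    action :upgrade
  end

  template "/etc/nginx/nginx.conf" do
    source "nginx.conf.erb"
    notifies :restart, "service[nginx]"
  end
RUBY

lint(recipe).each { |lineno, name| puts "line #{lineno}: #{name}" }
```

A check like this could run as a Jenkins step. Foodcritic's own rule format is the better home for it, because it understands Chef's resource syntax.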
Some more rules of thumb:
- Don’t take Opscode’s word for it: if it doesn’t work for you, change it.
- 41 people have Chef access, and most have keys to push to prod.
- There is an unconstrained test environment. NEVER test in prod.
- They tweaked the “environments” workflow with some tooling:
  - SPORK! (knife-spork)
  - knife-spork is a wrapper around the environments workflow
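knife-spork's subcommands include `knife spork bump`, `knife spork upload`, and `knife spork promote`. The sketch below is not knife-spork's code. It's a rough illustration, under assumptions, of the bookkeeping those steps automate: bumping the version in a cookbook's metadata.rb, then pinning an environment to that exact version (`= x.y.z` is Chef's syntax for an exact cookbook constraint). The function names and the sample cookbook are hypothetical.

```ruby
# Hypothetical sketch of a "bump, then promote" workflow; not knife-spork's code.
LEVELS = %w[major minor patch].freeze

# Increment one component of an "x.y.z" version and zero the ones after it.
def bump(version, level = "patch")
  index = LEVELS.index(level) or raise ArgumentError, "unknown level: #{level}"
  parts = version.split(".").map { |p| Integer(p) }
  raise ArgumentError, "expected x.y.z, got #{version}" unless parts.size == 3
  parts[index] += 1
  parts.fill(0, index + 1) # reset lower components, e.g. 1.2.3 -> 1.3.0
  parts.join(".")
end

# Rewrite the version line in a cookbook's metadata.rb text.
def bump_metadata(metadata, level = "patch")
  metadata.sub(/^version\s+["'](\d+\.\d+\.\d+)["']/) do
    %(version "#{bump(Regexp.last_match(1), level)}")
  end
end

# Pin an environment's cookbook constraints to an exact version.
def promote(env_versions, cookbook, version)
  env_versions.merge(cookbook => "= #{version}")
end

metadata = %(name "nginx"\nversion "0.4.9"\n)
puts bump_metadata(metadata)
p promote({ "base" => "= 1.0.0" }, "nginx", "0.4.10")
```

Exact pins make each environment's cookbook versions explicit, so nothing reaches prod until someone deliberately promotes it.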