Project: Sadie

Sadie, A Dynamic Interspection Engine

Besides being yet another open source project with a nifty GNU-style recursive moniker, Sadie is yet another key/value store, like Redis or memcached. To make it handy for common data manipulation and management tasks, it adds a few features that distinguish it from the crowd:

  • primers - these can be fired periodically (every second, hour, day, week, etc.) or just-in-time to ensure that a value is present when you request it by its key
  • selectable storage mechanisms - the storage mechanism is selected on set and is transparent on get. If you're storing 3 GB of data in a single value and you know that you don't need to serve it to 1000 clients per second, you can choose to store it on disk instead of in memory.
  • clusterable - multiple Sadie instances can share Redis stores and NFS volumes as common storage resources, allowing for multiple request handlers and multiple background primer processing units (see illustration below)
  • failover (future) - when a key is not found locally, the Sadie instance can pass the request along to one or more "failover" instances.
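
To make the features above concrete, here is a minimal sketch in plain Ruby. It is not Sadie's actual API: the class name SketchStore, its methods (prime, set, get), and the storage: and failover: options are all hypothetical names chosen for illustration. It shows storage selected on set and transparent on get, a primer run just-in-time on the first request for a key, and a miss passed along to a failover store. Periodic primers and shared Redis/NFS storage are left out.

```ruby
require "tmpdir"
require "digest"

# Hypothetical illustration only; none of these names come from Sadie itself.
class SketchStore
  def initialize(disk_dir: Dir.mktmpdir("sketch-store"), failover: nil)
    @memory   = {}        # key => value held in RAM
    @disk_dir = disk_dir  # directory holding values stored on disk
    @location = {}        # key => :memory or :disk, recorded at set time
    @primers  = {}        # key => [storage, block that computes the value]
    @failover = failover  # optional second store consulted on a miss
  end

  # Register a block that can produce the value for `key` on demand.
  def prime(key, storage: :memory, &block)
    @primers[key] = [storage, block]
  end

  # The caller selects the storage mechanism when setting a value.
  def set(key, value, storage: :memory)
    case storage
    when :memory
      @memory[key] = value
    when :disk
      @memory.delete(key) # drop any stale in-memory copy
      File.binwrite(path_for(key), Marshal.dump(value))
    else
      raise ArgumentError, "unknown storage: #{storage.inspect}"
    end
    @location[key] = storage
    value
  end

  # Reads are transparent: the caller never says where the value lives.
  def get(key)
    run_primer(key) unless @location.key?(key)
    case @location[key]
    when :memory then @memory[key]
    when :disk   then Marshal.load(File.binread(path_for(key)))
    else @failover&.get(key) # not found locally: pass the request along
    end
  end

  private

  # Just-in-time priming: compute and store the value if a primer exists.
  def run_primer(key)
    storage, block = @primers[key]
    set(key, block.call, storage: storage) if block
  end

  # Hash the key so any key string maps to a safe file name.
  def path_for(key)
    File.join(@disk_dir, Digest::SHA256.hexdigest(key.to_s))
  end
end

store = SketchStore.new
store.set("big.dataset", "x" * 1024, storage: :disk) # kept on disk, not in RAM
store.prime("report.total") { 40 + 2 }               # runs on first get only
store.get("big.dataset").size
store.get("report.total")
```

In this sketch a primer runs at most once per key, because its result is stored with set and later reads find it locally. A periodic primer could be approximated by calling set again on a timer, but that is an assumption about one possible design and not a description of how Sadie schedules primers.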

Sadie was originally designed to organize data gathering and processing code into a maintainable framework encompassing every part of a data pipeline. It was used to:

  • crawl websites for real estate data
  • digest large datasets into a time series of relevant statistics
  • assemble raw data into charts
  • assemble the processed statistics, charts, and graphs into polished, typeset reports

It has since undergone various architectural improvements:

  • more idiomatic Ruby syntax
  • more scalable architecture
  • RESTful interface

Recent improvements for scalable, distributed architectures, illustrated:

Sadie Aspirational Architecture Diagram

Sadie is written in Ruby, and the gem can be downloaded from its RubyGems page. The source can be forked or downloaded on GitHub.