Dealing with Heroku's Random Router

The Problem

A little while back there was a now-famous post on rapgenius.com that let the Rails world in on how we're all getting screwed by Heroku. This post, however, is not about whether that is right or wrong (or evil). It is about a way to work around the problem of requests being stuck in a long queue on one dyno while another dyno sits around watching reruns of "Friends."

The Solution

Simply put, time-intensive work must be moved from an app's dynos to its background workers.

Using delayed_job (or Resque), some sort of pub/sub JavaScript service (like Pusher or Faye) and a sprinkling of AJAX, we'll allow the request to complete quickly and free up the dyno for its next request. A background job hands the work previously performed by a dyno to a worker, and the pub/sub service tells the view when that worker's job is complete so the result can be displayed.

We've created a contrived example (available on GitHub), which simply fetches an item and performs some imaginary processing before returning that item. The action will be kicked off with a button click and the returned data will be displayed in the view.

For this example, we decided to use Pusher, a pub/sub service that handles all the messiness of running your own WebSocket server. Using Pusher, we subscribe to a channel, trigger an event on that channel, and bind to that event in order to handle the data we receive. Since there could be many concurrent clients on this page, we need our event name to be unique so responses don't get sent to unintended clients. The code should clear this up.

Starting with the Item model and the imaginary processing:

class Item < ActiveRecord::Base
  attr_accessible :description, :name, :position

  def self.slow_query(channel, event)
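    sleep(1) # imaginary processing
    # Fetch one random record (RANDOM() is PostgreSQL/SQLite syntax)
    item = self.order("RANDOM()").first
    # Notify the browser that bound to this event on this channel
    Pusher[channel].trigger(event, item.to_json)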
  end
end

We'll soon see why the Pusher event is triggered in this method.

From there, we added a simple button and empty div on the view for our forthcoming results:

<button id="background-request">Background Job</button>
<p>Output:</p>
<div id="background-output"></div>

When the button is clicked, we subscribe to the Pusher channel and pass the unique event name to the controller:

$ ->
  $("#background-request").on "click", =>
    eventName = bind("heroku_slow_push")

    # Event name is sent to the server so the response can be sent back to
    # the proper listener
    $.ajax "/items/background.json", data: { event: eventName }

The bind() function subscribes to the channel, generates a unique event name, and binds a callback to that event name on the channel:

bind = (channelName) ->
  pusher = new Pusher("abc123") # your Pusher key here
  pusherChannel = pusher.subscribe(channelName)

  # If the event name is static, it's possible that multiple browsers
  # could receive the same event. This may be desired behavior in some cases,
  # but not here; to prevent this from happening, a GUID is used for
  # the event name
  eventName = guid() # http://stackoverflow.com/a/105074/573465
  pusherChannel.bind eventName, (data) ->
    $("#background-output").html( JSON.stringify(data) )

  return eventName
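
The reason for the GUID can be seen in isolation with a small in-memory stand-in for a pub/sub channel. This is only a sketch in plain Ruby: FakeChannel and the inbox variables are hypothetical names, not part of the Pusher API, and SecureRandom.uuid stands in for the guid() helper used above.

```ruby
require "securerandom"

# Hypothetical in-memory stand-in for one pub/sub channel (not the Pusher API).
class FakeChannel
  def initialize
    # event name => list of callbacks bound to that event
    @handlers = Hash.new { |hash, key| hash[key] = [] }
  end

  # Register a callback for an event name.
  def bind(event, &callback)
    @handlers[event] << callback
  end

  # Deliver data to every callback bound to the event name.
  def trigger(event, data)
    @handlers[event].each { |callback| callback.call(data) }
  end
end

channel = FakeChannel.new
alice_inbox = []
bob_inbox = []

# Each "browser" binds to its own randomly generated event name.
alice_event = SecureRandom.uuid
bob_event = SecureRandom.uuid
channel.bind(alice_event) { |data| alice_inbox << data }
channel.bind(bob_event) { |data| bob_inbox << data }

# The server answers Alice's request only, so only Alice's callback runs.
channel.trigger(alice_event, "item for alice")

# With a static event name, every bound listener receives the message.
shared_inbox = []
2.times { channel.bind("static_event") { |data| shared_inbox << data } }
channel.trigger("static_event", "item for someone")
```

After this runs, alice_inbox holds one message, bob_inbox is empty, and shared_inbox holds two copies of the same message, which is the cross-talk the unique event name avoids.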

Back to our AJAX call: on the server, the controller action queues the background job with the channel and event names in tow so the event can be triggered when the processing is complete:

def background
  # event name is received from the browser and passed to the query
  # method so it can be triggered when background job finishes
  Item.delay.slow_query("heroku_slow_push", params[:event])
  render json: {}
end

The background job is queued, and the dyno is released in a few milliseconds rather than waiting for any processing. The query method triggers Pusher so we can handle the event's completion in our view.
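
The shape of this hand-off can be sketched without Rails at all. The following is a rough illustration using only Ruby's standard library; JOBS, PUBLISHED, background_action and the shortened 0.2 second sleep are assumptions made for the example, and a thread stands in for the separate worker process that delayed_job would run.

```ruby
require "json"

# Hypothetical stand-ins: JOBS plays the role of the job queue and PUBLISHED
# the role of the pub/sub service. Neither is the delayed_job or Pusher API.
JOBS = Queue.new
PUBLISHED = Queue.new

# Stand-in for Item.slow_query: do the slow work, then publish the result.
def slow_query(channel, event)
  sleep(0.2) # imaginary processing
  data = JSON.generate({ "name" => "example" })
  PUBLISHED << { channel: channel, event: event, data: data }
end

# Stand-in for the controller action: enqueue the job and return right away.
def background_action(event)
  JOBS << -> { slow_query("heroku_slow_push", event) }
  JSON.generate({})
end

# Stand-in for the worker: runs queued jobs one at a time until it pops nil.
worker = Thread.new do
  while (job = JOBS.pop)
    job.call
  end
end

started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
response = background_action("event-123")
request_seconds = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started

message = PUBLISHED.pop # blocks until the worker has finished the job
JOBS << nil             # nil ends the worker loop
worker.join
```

Here request_seconds measures only the time to enqueue, not the 0.2 seconds of processing, which is the same trade the controller action above makes: the response is empty, and the real data arrives later through the published event.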

Beyond Heroku

Moving time-intensive tasks into the background in a Rails app has benefits beyond working around Heroku's router problem - it can improve overall app responsiveness as well. Imagine two requests coming into an app, and both get routed to the same server (because all others are busy). The first request is time-intensive, and the second will return quickly. The second request must wait while the first completes, and in this time the second request could have completed many times over on an unclogged server.
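
To put rough numbers on that scenario, here is a small worked example in Ruby. The service times (1 second for the slow request, 10 ms for the fast one, 5 ms to enqueue a background job) are assumed for illustration, not measured, and completion_times is a hypothetical helper that models a single server handling requests strictly in order.

```ruby
# Assumed service times in seconds; illustrative numbers, not measurements.
SLOW = 1.0      # time-intensive request handled inline
FAST = 0.01     # quick request
ENQUEUE = 0.005 # slow request that only queues a background job

# Completion time of each request when one server handles them in order.
def completion_times(service_times)
  clock = 0.0
  service_times.map { |seconds| clock += seconds }
end

blocked = completion_times([SLOW, FAST]).last      # fast request stuck behind slow one
unblocked = completion_times([FAST]).last          # fast request on an idle server
offloaded = completion_times([ENQUEUE, FAST]).last # slow work moved to a worker

slowdown = (blocked / unblocked).round
```

Under these assumptions the fast request finishes after about 1.01 seconds when it queues behind the slow one, roughly 100 times longer than on an idle server, and after about 0.015 seconds once the slow work is moved to the background.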

Scenarios like this are why time-intensive operations in Rails introduce random spikes in an app's response time as its servers become saturated. These spikes don't significantly change the average response time, but they can be very noticeable on the client side when important requests are randomly slow.

Conclusion

As a Heroku app scales, dyno costs should increase in a linear manner. The change Heroku made to their router means that is not the case; an exponential increase in dynos is needed to maintain the same response times for an app as it scales. Finding the time-intensive operations in an app and moving them into the background makes this increase linear again - an app will require more workers as it scales, but the number of dynos that would be required otherwise is far greater.
