Comparing Background Processing Libraries: Sidekiq
This is the final part in the background processing series. You can check out the previous posts from the links above. I’ll summarize what we have covered so far.
- The principles behind background processing (queues, processes etc.).
- Using delayed_job, which is amazing for getting background processing rolled into your app really quickly. However, it uses ActiveRecord, so it is kind of slow, and you end up mixing background and foreground code throughout your app.
- Using Resque, which hopes to solve the problems with delayed_job by using Redis and separate worker classes.
However, we can still improve on the performance of Resque. This is where Sidekiq comes in.
Sidekiq
If you read through the theory section in the previous articles (which you really should), you might have noticed that I talked about processes as the only concurrency option. It turns out that’s not really true and Sidekiq takes full advantage of something else: Threads.
Threads are a much more lightweight unit of concurrency than processes, so a single process can work on many jobs at once. The headline on the Sidekiq page puts it squarely:
“What if one Sidekiq process could do the work of 20 Resque or DelayedJob processes?”
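To make the idea concrete, here is a minimal sketch of several threads inside one process pulling jobs off a shared queue. This is not how Sidekiq is actually implemented; the method name `process_with_threads` and the pool size are made up for illustration, and the only thing it relies on is Ruby's built-in `Queue` and `Thread` classes.

```ruby
# A toy thread pool, NOT Sidekiq's real implementation.
# Runs every job on one of `worker_count` threads inside a single process.
def process_with_threads(jobs, worker_count, &block)
  queue = Queue.new
  jobs.each { |job| queue << job }
  results = Queue.new

  workers = Array.new(worker_count) do
    Thread.new do
      loop do
        begin
          job = queue.pop(true) # non-blocking pop; raises ThreadError once the queue is empty
        rescue ThreadError
          break
        end
        results << block.call(job)
      end
    end
  end
  workers.each(&:join)

  # Drain the results; their order depends on thread scheduling.
  Array.new(results.size) { results.pop }
end

doubled = process_with_threads((1..10).to_a, 4) { |n| n * 2 }
puts doubled.sort.inspect
```

All four workers live in the same process and share its memory, which is where both the savings and the thread-safety worries come from.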
Let’s check out how to use Sidekiq. If you read through the delayed_job and Resque introductions, you’ll find that Sidekiq is something of a combination of the two. Sidekiq does use Redis, so you will need to have that installed and running (check out the Resque section in the previous articles to see how to do this).
To get started, add the following line to your Gemfile:
gem 'sidekiq'
Install ‘er up:
bundle install
Fortunately, that’s about it for the setup we need. Sidekiq doesn’t use the rake system directly like Resque, so we don’t have to fiddle around with “lib/tasks”. Let’s get on to writing our print worker in “app/workers/print_worker.rb”. Here it is:
class PrintWorker
  include Sidekiq::Worker

  def perform(str)
    puts str
  end
end
Once again, we have to queue up a job somewhere (we’ll do it in the index controller):
class IndexController < ApplicationController
  def index
    PrintWorker.perform_async(params[:to_print])
  end
end
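One thing worth keeping in mind when queueing jobs: as far as I know, Sidekiq stores job arguments in Redis as JSON, so they should be simple types (strings, numbers, booleans, nil, arrays and hashes). The sketch below only imitates that round trip with Ruby's standard `json` library; `round_trip` is a made-up helper, not part of Sidekiq.

```ruby
require "json"

# Imitates what happens to job arguments between perform_async and perform:
# they are dumped to JSON on the way in and parsed again on the way out.
# (Hypothetical helper for illustration; Sidekiq does this internally.)
def round_trip(args)
  JSON.parse(JSON.generate(args))
end

puts round_trip(["sidekiqisgreat", 42]).inspect # simple values survive unchanged
puts round_trip([{ user_id: 1 }]).inspect       # symbol keys come back as strings
puts round_trip([:high]).inspect                # symbols come back as strings
```

In our controller we pass a plain string from `params`, so there is nothing to worry about, but passing something like a full ActiveRecord object is best avoided; pass its id and look it up inside `perform` instead.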
Finally, we have to get the Sidekiq process(es) fired up. Type this into your shell in the root directory of your Rails app:
bundle exec sidekiq
This starts a process that is waiting for jobs. Let’s put one in the queue.
If you go to “/index/index?to_print=sidekiqisgreat”, you should get “sidekiqisgreat” somewhere in the output of your Sidekiq process. (The Sidekiq runner includes some other information that you can safely ignore for the sake of the example.)
Sidekiq is easy enough if you learned Resque, but it has a pretty big “problem” that we haven’t discussed yet: thread safety. Since Sidekiq uses threads, you can only use libraries in Sidekiq if they are thread safe; anything else will likely cause a ton of problems that are difficult to track down. This really limits what you can do with Sidekiq.
Secondly, you must ensure that your own code is thread safe (e.g. global variables are a no-go). Ruby (rather, the “default” Ruby interpreter, which is MRI) also has something called a “global interpreter lock”, which means that only one thread can execute Ruby code at a time. Threads still help when jobs spend most of their time waiting on I/O, but for CPU-heavy work Sidekiq will work much better with alternative implementations of Ruby, such as Rubinius and JRuby.
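Here is a small sketch of what “thread safe” means in practice. Shared state that several threads modify needs to be guarded, for example with a `Mutex`. The `SafeCounter` class and `count_with_threads` helper are invented for this example; they are not part of Sidekiq.

```ruby
# Hypothetical shared counter; every read and write goes through a Mutex
# so that concurrent increments cannot trample each other.
class SafeCounter
  def initialize
    @count = 0
    @lock = Mutex.new
  end

  def increment
    @lock.synchronize { @count += 1 }
  end

  def value
    @lock.synchronize { @count }
  end
end

# Spins up `thread_count` threads that each bump the same counter.
def count_with_threads(thread_count, increments_per_thread)
  counter = SafeCounter.new
  threads = Array.new(thread_count) do
    Thread.new { increments_per_thread.times { counter.increment } }
  end
  threads.each(&:join)
  counter.value
end

puts count_with_threads(8, 1_000)
```

Without the lock, `@count += 1` is a read followed by a write, and two threads can interleave between those steps and lose an update. How often that actually bites depends on the Ruby implementation, which is exactly why these bugs are so hard to track down.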
What’s the point of going through all that hassle? Why not just use Resque? The biggest reason is performance; the difference is pretty big if you’re processing lots of jobs that would benefit from concurrency.
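To get a rough feel for which jobs benefit, here is a sketch that compares running a few fake I/O-bound jobs one after another with running them in threads. `fake_io_job` just sleeps as a stand-in for a network call or slow query (sleeping, like waiting on I/O, lets other threads run even on MRI); the job count and duration are arbitrary, and the numbers will vary from machine to machine.

```ruby
require "benchmark"

# Stand-in for an I/O-bound job such as an HTTP request (hypothetical).
def fake_io_job(seconds)
  sleep(seconds)
end

# Runs the jobs one after another and returns the elapsed time in seconds.
def run_serially(job_count, seconds)
  Benchmark.realtime { job_count.times { fake_io_job(seconds) } }
end

# Runs each job in its own thread and returns the elapsed time in seconds.
def run_threaded(job_count, seconds)
  Benchmark.realtime do
    Array.new(job_count) { Thread.new { fake_io_job(seconds) } }.each(&:join)
  end
end

serial = run_serially(4, 0.1)
threaded = run_threaded(4, 0.1)
puts format("serial: %.2fs, threaded: %.2fs", serial, threaded)
```

The threaded run should finish in roughly the time of a single job, because the waits overlap. Jobs that are pure number crunching will not see this kind of gain on MRI.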
The Final Comparison
Over this three-part series, we’ve covered the theory behind background processing and three background processing frameworks: delayed_job, Resque and Sidekiq. Each has its ups and downs.
With delayed_job:
* Pros
– incredibly quick and easy to get rolling
– no addition to your “stack”; it can run just fine with ActiveRecord
– a fantastic choice for beginners or migrating code from the foreground to the background
* Cons
– Runs on ActiveRecord, so it will probably run slower than something that runs on Redis
– Makes it very easy to mix async and sync code in the same file, which, in my opinion, is a bad thing
– The “.delay” calls scattered across your codebase will make it difficult to reason about six months later
With Resque:
* Pros
– It runs on Redis, which is fast
– Great separation of background code with worker classes
– It has a fantastic web dashboard
– It makes it fairly easy to do background processing
– It is my favorite!
* Cons
– More difficult to get running than delayed_job
– Still isn’t the fastest!
– Doing prioritized, time-based jobs is not as easy as it is with delayed_job
With Sidekiq:
* Pros
– Pretty darn fast and workers are lightweight
– You can port over code really easily from Resque
– Great separation of code
* Cons
– More difficult than delayed_job
– (this is the biggie) You must use thread-safe libraries and write thread-safe code
As I’ve mentioned, I like Resque the most out of the three. My primary reason to avoid delayed_job is the mixing of sync and async code. When considering Sidekiq, it is the requirement of thread-safe libraries that scares me off.
If your requirements are different, your choice could be different. Just take the time to understand the consequences of your choice! Make sure you don’t try to optimize too early, or you’ll spend time on something that has no relevance to the real bottleneck in your app!
Example app
I’ve built a small example application using Sidekiq. It does the same things as the delayed_job and Resque examples: it saves and displays page counts for an uploaded PDF, both synchronously and asynchronously. The most important thing to note is that I only had to change about three lines of code in order to switch from Resque to Sidekiq, because the library I was using to count pages in PDFs happened to be thread safe. However, if it hadn’t been, writing a Sidekiq port would have been much more difficult, since I would have had to either roll my own library or use a C extension.
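To give a sense of how small such a port is, here is the print worker from earlier written in both styles. This is a sketch, not the code from the example app: the class names and the `:print` queue name are made up, and the `if defined?` guard is only there so the snippet runs without the sidekiq gem installed (in a real app you would simply write `include Sidekiq::Worker`).

```ruby
# Resque style: the queue name lives in @queue and perform is a class method.
class ResquePrintWorker
  @queue = :print # hypothetical queue name

  def self.perform(str)
    puts str
  end
end
# Enqueued with: Resque.enqueue(ResquePrintWorker, "hello")

# Sidekiq style: include the worker module and make perform an instance method.
class SidekiqPrintWorker
  # Guard used only so this sketch runs without the gem loaded.
  include Sidekiq::Worker if defined?(Sidekiq::Worker)

  def perform(str)
    puts str
  end
end
# Enqueued with: SidekiqPrintWorker.perform_async("hello")

# Both workers are plain Ruby, so the job logic can be run directly:
ResquePrintWorker.perform("from resque style")
SidekiqPrintWorker.new.perform("from sidekiq style")
```

The body of `perform` does not change at all; the differences are the module include, class method versus instance method, and the call used to enqueue the job.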
The code for the example app can be found here.
Wrapping it up
As you can tell, the Ruby community has presented three unique solutions to a single problem. Each has its own benefits and drawbacks. Pick the one that you like the most and use it in your next amazing app!
If you have any questions or suggestions, do drop them in the comments below!