
Paperclip reprocess attachments: Too many open files

by: GIANNA FUSARO

November 06, 2016

Need to reprocess a bunch of paperclip attachments, but running into the 'Too Many Open Files' exception? Here's a workaround using find_in_batches to quickly reprocess each group in a background job.

Recently I ran into a problem while reprocessing thousands of records with attachments using rake paperclip:refresh CLASS=User (Thumbnail Generation). After exactly 995 files had been reprocessed, the exception "Too many open files" was thrown.

 

related to:

https://github.com/thoughtbot/paperclip/issues/1326

https://github.com/thoughtbot/paperclip/issues/1759#issuecomment-72812870

https://github.com/thoughtbot/paperclip/issues/1980

and many more.

 

I didn't want to go down the path of raising the ulimit (the limit on how many files a process may have open), so I decided to create a workaround rake task.

 

Before coming to a final solution, I tried using find_in_batches (documentation) on the models I wanted to reprocess, even manually triggering GC in an attempt to clean up the tempfiles. It turned out that find_in_batches with a sufficiently small batch size avoided the "Too many open files" exception, but it took almost 2 hours to complete for 2,000 records. I needed to reprocess ~80,000 attachments, so this task needed to be faster.
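A minimal sketch of that first attempt, assuming an ActiveRecord model (or relation) and a Paperclip attachment. The method name reprocess_in_batches, the attachment name and the default batch size are hypothetical choices for illustration, not taken from the original task.

```ruby
# Reprocess every record in `scope`, loading only `batch_size` records at a time.
# `scope` is assumed to respond to ActiveRecord's find_in_batches (e.g. User).
# `attachment` is the (hypothetical) Paperclip attachment name, e.g. :avatar.
def reprocess_in_batches(scope, attachment:, batch_size: 100)
  processed = 0

  scope.find_in_batches(batch_size: batch_size) do |batch|
    batch.each do |record|
      # Paperclip's reprocess! regenerates the styles (thumbnails) for one attachment.
      record.public_send(attachment).reprocess!
      processed += 1
    end

    # Manually trigger garbage collection between batches, in the hope of
    # releasing tempfiles that are no longer referenced.
    GC.start
  end

  processed
end
```

In a Rails console this would be called as something like reprocess_in_batches(User, attachment: :avatar, batch_size: 50). Everything still runs sequentially in one process, which is why this approach stays slow.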

 

To speed up reprocessing thousands of images, I decided to throw each batch into a background job. Instead of almost 2 hours, it took under 10 minutes.

 

lib/tasks/reprocess_paperclip_attachments.rake
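A minimal sketch of what such a rake task could look like, under stated assumptions: the module name PaperclipReprocess, the task name paperclip:reprocess_in_jobs and the ATTACHMENT and BATCH_SIZE environment variables are hypothetical, and PaperclipReprocessWorker is assumed to be a Sidekiq-style worker exposing perform_async. Only ids are passed to the job, so the arguments stay small and serializable.

```ruby
require "rake" # already loaded when Rails runs the task; needed only standalone

module PaperclipReprocess
  # Enqueue one background job per batch of record ids and return the job count.
  # `klass` is assumed to be an ActiveRecord model; `worker` is assumed to
  # respond to perform_async (as Sidekiq workers do).
  def self.enqueue(klass, attachment_name, worker:, batch_size: 100)
    jobs = 0
    klass.find_in_batches(batch_size: batch_size) do |batch|
      worker.perform_async(klass.name, attachment_name.to_s, batch.map(&:id))
      jobs += 1
    end
    jobs
  end
end

namespace :paperclip do
  desc "Reprocess attachments in background jobs (CLASS=User ATTACHMENT=avatar)"
  task reprocess_in_jobs: :environment do
    klass      = Object.const_get(ENV.fetch("CLASS"))
    attachment = ENV.fetch("ATTACHMENT")
    batch_size = Integer(ENV.fetch("BATCH_SIZE", "100"))

    # PaperclipReprocessWorker is a hypothetical worker class defined elsewhere.
    jobs = PaperclipReprocess.enqueue(klass, attachment,
                                      worker: PaperclipReprocessWorker,
                                      batch_size: batch_size)
    puts "Enqueued #{jobs} jobs for #{klass.name}##{attachment}"
  end
end
```

With these assumptions, the task would be run as rake paperclip:reprocess_in_jobs CLASS=User ATTACHMENT=avatar BATCH_SIZE=50.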

 

workers/paperclip_reprocess_worker.rb
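A minimal sketch of what such a worker could look like, assuming Sidekiq as the job backend (the post does not name one) and assuming the job receives a class name, an attachment name and a list of ids. The argument order and the defined? guard are choices made for this sketch; in a real Sidekiq app the include would be unconditional.

```ruby
class PaperclipReprocessWorker
  # Assumption: Sidekiq is the job backend. The guard only lets this sketch
  # load without the gem installed.
  include Sidekiq::Worker if defined?(Sidekiq::Worker)

  # class_name      - name of the ActiveRecord model, e.g. "User"
  # attachment_name - name of the Paperclip attachment, e.g. "avatar"
  # ids             - primary keys of the records in this batch
  def perform(class_name, attachment_name, ids)
    klass = Object.const_get(class_name)

    # Load only this batch's records and regenerate their attachment styles.
    klass.where(id: ids).find_each do |record|
      record.public_send(attachment_name).reprocess!
    end
  end
end
```

Each job handles a single small batch, so the batches can be worked through concurrently by however many worker threads or processes are running.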