Some people, when confronted with a problem, think “I know, I’ll use ImageMagick.” Now they have two problems.*
For one of the sites I’m maintaining, a lot of content is generated directly from (more or less print-ready) PDFs. The only free tool I’ve been able to find that can convert PDFs to decent-quality JPGs or PNGs is ImageMagick.
But even when you’ve got ImageMagick’s mogrify commands installed, conversion of PDFs still requires some careful tuning, that is: careful selection of arguments to convert. Also: a sacrificial chicken and lots of patience. Anyway, here’s what I ended up with. Most of this is also available in my clj-imajine Clojure library.
Many web browsers do not support any color space other than RGB/sRGB. If your PDFs are in the CMYK color space (usual for print) or any other color space, the resulting JPGs will look “weird” in many applications and web browsers; some viewers just show a blank image and others completely mess up the colors. To make sure the end result is in sRGB, use the option “-colorspace sRGB”.
For much the same reasons, you want to enforce that the output color depth is 8 bits for JPGs. To do that, use the option “-depth 8”.
PDFs are pretty complex documents and one potential pitfall is that there are at least 3 different indicators of the “boundaries” of the PDF. I’ve run into a few where the “right” boundaries were provided by the “cropbox” instead of the “media box”. This post by Joseph Scott provided the solution: use “-define pdf:use-cropbox=true”.
The final line becomes:
convert -define pdf:use-cropbox=true -colorspace sRGB -depth 8 pages.pdf pages.jpg
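If you want to double-check that the result really came out as 8-bit sRGB, identify can report both values. What follows is only a rough sketch: the function names is_web_safe and check_jpg are made up for illustration, and it assumes ImageMagick’s identify is on the PATH and that the %[colorspace] and %z format escapes report colorspace and depth as documented.

```shell
#!/bin/sh
# Hypothetical helper: succeed only for the colorspace/depth pair we want.
is_web_safe() {
    [ "$1" = "sRGB" ] && [ "$2" = "8" ]
}

# Hypothetical wrapper: ask identify for the colorspace and depth of a JPG.
# Assumes ImageMagick's identify command is installed.
check_jpg() {
    # %[colorspace] is the image colorspace, %z is the bit depth
    set -- $(identify -format '%[colorspace] %z' "$1")
    is_web_safe "$1" "$2"
}
```

Usage would be something like check_jpg pages.jpg && echo "fine for the web". A file that still reports CMYK or a depth of 16 makes the check fail.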
Note that if your PDF contains more than one page, the convert command above will generate a JPG for each one, named pages-0.jpg, pages-1.jpg, etc. To select a single page you can use
convert -define pdf:use-cropbox=true -colorspace sRGB -depth 8 pages.pdf[X] pages.jpg
where X is the page number minus 1. You can find the page numbers in a PDF using ImageMagick’s identify command like this:
identify -density 2 -format "%p," pages.pdf
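Since the [X] selector is zero-based, it is easy to be off by one. A small wrapper can do the subtraction for you. This is a minimal sketch, not what clj-imajine does: page_selector and pdf_page_to_jpg are hypothetical names, and the convert options are simply the ones from this post.

```shell
#!/bin/sh
# Hypothetical helper: map a 1-based page number to ImageMagick's
# zero-based "file.pdf[index]" selector.
page_selector() {
    printf '%s[%d]\n' "$1" "$(($2 - 1))"
}

# Hypothetical wrapper around the convert command from this post.
# Usage: pdf_page_to_jpg pages.pdf 3 page3.jpg
pdf_page_to_jpg() {
    convert -define pdf:use-cropbox=true -colorspace sRGB -depth 8 \
        "$(page_selector "$1" "$2")" "$3"
}

page_selector pages.pdf 3   # prints pages.pdf[2]
```

The selector is passed in double quotes, which also keeps shells that treat square brackets as glob patterns from mangling it.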
*) paraphrased from Jamie Zawinski’s remark on regular expressions.