ColdFusion SOLR error: org/apache/pdfbox/pdmodel/PDDocument null

I got the following pretty obscure error the other day from a cfscheduler job that runs nightly to index documents uploaded to our site:

org/apache/pdfbox/pdmodel/PDDocument null

It turns out that the error is caused by a file having the extension .PDF instead of .pdf. No, really. Luckily I only had one offending file, but what if I had many? Also, what if users uploaded more after I renamed the problematic one? There are two parts to “future proofing” my situation. The first part is to address the .PDF extensions in the uploads. The second part, and what I’m going to pass on to you, is a custom tag that will look in a directory you specify and rename all .PDF extensions to .pdf.
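For the first part, here is a minimal sketch of how an upload handler might lowercase the extension right after the file lands on disk. It is not code from my site: the form field name (document) and the uploadDir variable are made up for illustration, and it does not handle the case where a lowercase twin of the file already exists.

```cfml
<!--- Hypothetical names: form field "document", target folder uploadDir --->
<cfset uploadDir = "C:\mysuperdocs">

<cffile action="upload"
        fileField="document"
        destination="#uploadDir#"
        nameConflict="makeunique"
        result="up">

<!--- compareNoCase() ignores case and compare() does not, so together they
      match "PDF", "Pdf", etc. but leave an already lowercase "pdf" alone --->
<cfif compareNoCase(up.serverFileExt, "pdf") eq 0 and compare(up.serverFileExt, "pdf") neq 0>
    <cfset src = up.serverDirectory & "/" & up.serverFile>
    <cfset tmp = up.serverDirectory & "/" & createUUID() & ".tmp">
    <cfset dst = up.serverDirectory & "/" & up.serverFileName & ".pdf">

    <!--- Rename in two steps through a temporary name, as a precaution in case
          a case-only rename is treated as a conflict on a case-insensitive
          file system (an assumption, not something I have verified) --->
    <cffile action="rename" source="#src#" destination="#tmp#">
    <cffile action="rename" source="#tmp#" destination="#dst#">
</cfif>
```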

To implement:

Download the pdf_cleanup custom tag
Unzip it to whatever directory you keep your custom tags in
Call it using the following syntax just before you run your <cfindex> operation(s):
<cf_pdf_cleanup dirToClean="C:\mysuperdocs">

Be forewarned I take no responsibility for your use of the tag 😉
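If you would rather see the idea than download a zip, the following is a rough sketch of what a tag like this could look like. It is a hypothetical stand-in written for illustration, not the contents of the downloadable pdf_cleanup tag; only the dirToClean attribute name comes from the call syntax above, and it only looks at the top level of the directory.

```cfml
<!--- pdf_cleanup_sketch.cfm: hypothetical stand-in, NOT the downloadable tag --->
<cfif thisTag.executionMode eq "start">
    <!--- No default, so the tag throws if dirToClean is not supplied --->
    <cfparam name="attributes.dirToClean" type="string">

    <!--- List every file and filter in code, so matching does not depend on
          how the file system treats case in a wildcard filter --->
    <cfdirectory action="list" directory="#attributes.dirToClean#" name="qFiles" type="file">

    <cfloop query="qFiles">
        <cfset ext = listLast(qFiles.name, ".")>

        <!--- Has an extension, the extension is "pdf" in some casing,
              and it is not already all lowercase --->
        <cfif listLen(qFiles.name, ".") gt 1
              and compareNoCase(ext, "pdf") eq 0
              and compare(ext, "pdf") neq 0>

            <!--- baseName keeps the trailing dot, e.g. "report." --->
            <cfset baseName = left(qFiles.name, len(qFiles.name) - len(ext))>
            <cfset src = qFiles.directory & "/" & qFiles.name>
            <cfset tmp = qFiles.directory & "/" & createUUID() & ".tmp">
            <cfset dst = qFiles.directory & "/" & baseName & "pdf">

            <!--- Move the file aside first, so that any file still found at
                  dst afterwards must be a different file --->
            <cffile action="rename" source="#src#" destination="#tmp#">

            <cfif fileExists(dst)>
                <!--- A lowercase twin already exists: put the original back --->
                <cffile action="rename" source="#tmp#" destination="#src#">
            <cfelse>
                <cffile action="rename" source="#tmp#" destination="#dst#">
            </cfif>
        </cfif>
    </cfloop>
</cfif>
```

Saved in your custom tags directory under that file name, it would be called as <cf_pdf_cleanup_sketch dirToClean="C:\mysuperdocs">. Try it on a copy of your documents first.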

4 thoughts on “ColdFusion SOLR error: org/apache/pdfbox/pdmodel/PDDocument null”

Raymond Camden

February 4, 2011 at 2:48 pm

Did you try this in 901+CHF? Solr fixes were added that may got this. If so, please be sure to file a bug report. Adobe does NOT search out blog posts like this so it’s up to us guys to use the public bug tracker.
Chris Simmons

February 7, 2011 at 7:49 am

Unfortunately I did have 901+CHF. I have filed a bug as you suggested. Thanks!
Nery

June 3, 2011 at 11:03 am

Hi, this happened to me but in my particular case, there was a PDF file without extension. So instead of File.pdf it was only File. Thanks for this info.
Bobbytuck

June 21, 2011 at 4:26 pm

Wow! Thanks so much for this. I was going crazy trying to find the single PDF in my collection that was causing SOLR to crash with a 500 error. I narrowed it down to one PDF (after putting one in a folder, re-indexing, putting a second pdf file, re-indexing, etc. etc — until I narrowed it down to a single file that would always bomb the indexing.) Anyway, I didn’t even notice the upper-case PDF extension.

Wow — this is a *BIG* bug in SOLR. Crazy, crazy. Thank you so much!
