How New EC2 Instances Led Us to Rewrite PDF Tools
by Sebastien Mirolo on Mon, 11 May 2015
Amazon announced new T2 instances in July 2014. A little later it became clear that the AWS free tier for startups was only for T2 instances. The free tier offer did not extend to the previous T1 generation (we found out on the first bill).
New and free, who could resist? An executive decision ensued to move all of DjaoDjin's infrastructure to second generation instances. Search and replace T1 with T2. How hard could that be, really?
It didn't take long before we stumbled on the hard truth. While the previous (T1) instances relied on PV (paravirtual) virtualization, the newer T2 instances rely on HVM (hardware virtual machine) virtualization. Unless you are a virtualization buff, that means nothing. Well, not quite: the switch forces an OS kernel update.
Experience shows that kernel updates ring with unexpected changes all over your software stack. Has anyone seen those beautiful lasagna stacks, separation of concerns, et al.? Yeah right.
Reality looks more like dependency hell.
So now we are trying to figure out which HVM-compatible AMI to use. In November 2014, Fedora 21 starts to ship release candidates as HVM AMIs, and a few vendors have pushed out experimental CentOS images. At that point there is no stable, official HVM AMI of a RedHat-compatible distribution.
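For readers facing the same search today, here is a minimal sketch of how candidate HVM AMIs could be listed from Python. It is not what we ran at the time. It assumes the boto3 library and configured AWS credentials; the function names, the default region, and the name pattern in the usage note are hypothetical, chosen only for illustration.

```python
def hvm_image_filters(name_pattern, architecture="x86_64"):
    """Build EC2 DescribeImages filters matching available HVM AMIs."""
    return [
        {"Name": "virtualization-type", "Values": ["hvm"]},
        {"Name": "architecture", "Values": [architecture]},
        {"Name": "state", "Values": ["available"]},
        {"Name": "name", "Values": [name_pattern]},
    ]


def find_hvm_images(name_pattern, region_name="us-west-2"):
    """Return (creation date, image id, name) tuples, newest first."""
    # Imported lazily so the filter helper works without boto3 installed.
    # Assumption: boto3 is available and AWS credentials are configured.
    import boto3

    ec2 = boto3.client("ec2", region_name=region_name)
    response = ec2.describe_images(Filters=hvm_image_filters(name_pattern))
    images = [
        (image["CreationDate"], image["ImageId"], image.get("Name", ""))
        for image in response["Images"]
    ]
    return sorted(images, reverse=True)
```

Calling `find_hvm_images("Fedora-Cloud-Base-21*")` (the pattern is a guess, adjust it to the images you are after) would list matching HVM images, most recent first.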
We start deploying our web stack on a freshly provisioned Fedora 21 t2.micro.
$ yum install pdftk
Warning: No matches found for: pdftk
No matches found
Not found? Misspelled? Typing the command again. Same result. Time for a little Googling. It turns out PDFtk was removed from Fedora 21 because it has a hard dependency on libgcj, a Java runtime library implementation, which itself was removed due to the lack of a maintainer. There might also be some licensing issues related to iText.
Either way, spending time to figure out why a package that worked perfectly before is no longer available is best left to historians. Our main problem is: how do we fill a PDF template now?
There are not a lot of real answers. PDFtk seemed like the only viable option and it is gone now. Because our team is resourceful, we plunged into finding a good C++ PDF library and writing our own fillform tool. We settled on PoDoFo after gaining confidence reading the code in FormTest.
The code for the resulting PDF template filler is available on GitHub: podofo-flatform.cc.
After development on a local OSX laptop, it remained to compile and run podofo-flatform on the production systems. We hit another hiccup then. Though PoDoFo 0.9.3 was released in July 2014 and is available through MacPorts, it is version 0.9.1 that is currently packaged on Fedora 21. Linking with PoDoFo 0.9.1 shows no error but the resulting podofo-flatform generates blank PDFs: fields are not populated. Fortunately, compiling PoDoFo 0.9.3 from source did not pose extraordinary challenges.
$ tar zxvf ~/podofo-0.9.3.tar.gz
$ mkdir podofo-build
$ cd podofo-build
$ yum install cmake openssl-devel libidn-devel libjpeg-turbo-devel \
    libtiff-devel libpng-devel lua-devel freetype-devel fontconfig-devel \
    cppunit-devel
$ cmake -G "Unix Makefiles" \
    -DCMAKE_INSTALL_PREFIX="/usr/local" \
    -DPODOFO_BUILD_STATIC:BOOL=TRUE -DPODOFO_BUILD_SHARED=TRUE \
    ../podofo-0.9.3
$ make
$ make install
For reference, on OSX:
$ tar zxvf ~/Downloads/podofo-0.9.3.tar.gz
$ mkdir podofo-build
$ cd podofo-build
$ sudo port install fontconfig freetype jpeg tiff lua
$ cmake -G "Unix Makefiles" \
    -DWANT_FONTCONFIG:BOOL=TRUE \
    -DCMAKE_PREFIX_PATH=/opt/local \
    -DCMAKE_INCLUDE_PATH=/opt/local/include \
    -DCMAKE_LIBRARY_PATH=/opt/local/lib \
    -DCMAKE_FRAMEWORK_PATH=/opt/local/Library/Frameworks \
    -DCMAKE_FIND_FRAMEWORK=NEVER \
    ../podofo-0.9.3
Finally, the last hurdle was to open the PDF templates we had with Preview.app and save them. This last step somehow created a cleaner PDF file, one our podofo-flatform was able to generate flat PDFs from. It might just be that somewhere buried in the PDF Reference Manual there is some syntax we did not account for.
$ podofo-flatform --fill "City=San Francisco" --fill "Last Name=Smith" \
    --fill "First Name=Joe" template-form.pdf -
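From Python, the same invocation could be wrapped roughly as follows. This is a sketch, not the code in djaodjin-extended-templates: the helper names `build_fill_command` and `fill_pdf` are hypothetical, and it assumes that the trailing `-` makes podofo-flatform write the filled PDF to standard output and that the tool is on the PATH.

```python
import subprocess


def build_fill_command(template_path, fields, tool="podofo-flatform"):
    """Return the argument list that fills `template_path` with `fields`.

    `fields` maps PDF form field names to values. Each pair becomes one
    `--fill "Name=Value"` option, as in the shell example above.
    """
    cmd = [tool]
    for name, value in fields.items():
        cmd += ["--fill", "%s=%s" % (name, value)]
    # The trailing "-" mirrors the shell example (assumed: output on stdout).
    cmd += [template_path, "-"]
    return cmd


def fill_pdf(template_path, fields):
    """Run the tool and return the generated PDF as bytes."""
    # Passing a list (no shell) keeps spaces in field names such as
    # "Last Name" intact without any extra quoting.
    return subprocess.check_output(build_fill_command(template_path, fields))
```

For example, `fill_pdf("template-form.pdf", {"City": "San Francisco", "Last Name": "Smith"})` would run the tool with two `--fill` options and hand back the bytes it prints.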
The podofo-flatform tool is available and used in djaodjin-extended-templates, a Django App we use in production to generate PDF invoices and HTML emails. By the time we had a working PDF fillform utility on a T2 instance, Django version 1.8 had been released. After a long time in the works, the Django project decided to break API compatibility for the template engine in version 1.8. So yes, djaodjin-extended-templates only works with Django 1.7. Upgrading to Django 1.8 is bound to be another story...
More to read
If you are fascinated by the subtle interaction between business and technical decisions and the disproportionate outcomes it often leads to, you might be interested in How we setup pylint on a git pre-receive hook or Software-as-a-Service lightning talk.
More technical posts are also available on the DjaoDjin blog, as well as business lessons we learned running a SaaS application hosting platform.