Code Beats Bureaucracy: Tax Form Automation With Ruby and FDF

The City of Kettering told me they wanted my Schedule E’s from 2007 to 2012, plus a completed income tax return for each of those years. We have a rental house there and had no idea we needed to file a local tax return. I hate manual data entry, so I wanted to fill out the forms using Ruby and pdftk. Yes, this is Rube Goldberg at its finest, but I work with PDFs a lot and wanted to learn how to do this quickly. I’ve decided that managing PDFs programmatically is one of those modern skills, like typing, that I need to master, and I’ve already made an investment in Ruby. (Just learning to use the Python script PDFconcat is a great lesson in how a little learning can save a lot of time.)

I started with (random) data in this form, where each row is a tax year and the loss on my rental house for that year. I was able to pull up my Schedule E’s since we have been paperless since 2002, and I use Yep to tag all my files so I can find them quickly. The data below is made up, but it is in the same format as the real data.

2007|10
2008|12
2009|22
2010|20
2011|107
2012|388

And I need to get that data into [this form](http://dev.ci.kettering.oh.us/wp-content/uploads/2013/06/TAX_2013-Kettering-Individual-Return-No-Dates.pdf):

wget http://dev.ci.kettering.oh.us/wp-content/uploads/2013/06/TAX_2013-Kettering-Individual-Return-No-Dates.pdf

Here is a log of my attempt, kept to stay focused and get this done as fast as possible.

Start: 14:44 on Sunday afternoon

Several Google queries later, I settled on pdftk and nguyen, a very lightweight library that fills PDF forms using XFDF/FDF with pdftk.
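
For context, XFDF is just a small XML file that maps form field names to values, which pdftk then merges into the PDF. Here is a minimal sketch of building one by hand in plain Ruby. This is not nguyen's implementation, and the `to_xfdf` helper name is made up for illustration.

```ruby
require 'cgi'

# Hypothetical helper (not part of nguyen): build an XFDF document
# from a hash of field name => value.
def to_xfdf(fields)
  lines = ['<?xml version="1.0" encoding="UTF-8"?>',
           '<xfdf xmlns="http://ns.adobe.com/xfdf/" xml:space="preserve">',
           '  <fields>']
  fields.each do |name, value|
    # Escape names and values so &, < and > stay valid XML.
    lines << %(    <field name="#{CGI.escapeHTML(name.to_s)}"><value>#{CGI.escapeHTML(value.to_s)}</value></field>)
  end
  lines << '  </fields>' << '</xfdf>'
  lines.join("\n") + "\n"
end

puts to_xfdf('NAME' => 'Kettering Rental House', 'l200' => 107)
```

Writing that string to a file gives pdftk something it can consume with its `fill_form` operation.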

I had to install an older version of Ruby (1.9.3-p448) and then clone the repo:

rvm install ruby-1.9.3-p448
git clone git@github.com:joneslee85/nguyen.git

14:54

Wow, the form’s fields are named pretty crappily:

irb(main):002:0> require '../../lib/nguyen'
=> true
irb(main):003:0> p = Nguyen::PdftkWrapper.new 'pdftk'
=> #<Nguyen::PdftkWrapper:0x007fa72d88def8 @pdftk="pdftk", @options={}>
irb(main):005:0> d = Nguyen::Pdf.new('tax.pdf', p)
=> #<Nguyen::Pdf:0x007fa72b126928 @path="tax.pdf", @pdftk=#<Nguyen::PdftkWrapper:0x007fa72d88def8 @pdftk="pdftk", @options={}>>
irb(main):006:0> d.fields
=> ["Occupation", "Occupation_2", "undefined", "undefined_2", "undefined_3", "undefined_4", "undefined_6", "undefined_7", "undefined_8", "undefined_9", "undefined_10", "undefined_11", "undefined_12", "undefined_14", "undefined_15", "undefined_16", "undefined_17", "undefined_18", "undefined_19", "Date", "Date_2", "Date_3", "undefined_21", "undefined_22", "undefined_23", "NAME_2", "ADDRESS", "ADDRESS_2", "undefined_24", "AMOUNTA", "AMOUNTB", "undefined_25", "undefined_26", "undefined_27", "undefined_28", "undefined_29", "undefined_30", "undefined_31", "undefined_32", "undefined_33", "Address", "l100", "l101", "l102", "l103", "l105", "l106", "undefined_5", "t101", "t102", "t103", "t104", "NAME", "t105", "t106", "t107", "t108", "t109", "t110", "t111", "t112", "l200", "l201", "l202", "l203", "t113", "t114", "cb1", "cb2", "cb3", "cb4", "t1", "undefined_13", "l1", "l104", "b1", "b2"]
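
The same list can be pulled straight from pdftk with `pdftk tax.pdf dump_data_fields`, which prints one small record per field. A rough sketch of parsing that output follows. The sample text is illustrative rather than copied from the real form, and `parse_fields` is a made-up helper name. As far as I know, the `FieldStateOption` lines on a checkbox list the values it accepts, which is worth checking before trying to tick one.

```ruby
# Sample text shaped like `pdftk tax.pdf dump_data_fields` output.
# These two records are illustrative, not taken from the real form.
sample_dump = <<EOS
---
FieldType: Text
FieldName: Occupation
FieldFlags: 0
FieldJustification: Left
---
FieldType: Button
FieldName: cb2
FieldFlags: 0
FieldStateOption: Off
FieldStateOption: Yes
EOS

# Hypothetical helper: return [name, type] pairs from the dump text.
def parse_fields(dump)
  dump.split(/^---$/).map do |record|
    name = record[/^FieldName: (.*)$/, 1]
    type = record[/^FieldType: (.*)$/, 1]
    # Skip chunks with no FieldName (e.g. the text before the first ---).
    name && [name.strip, type.to_s.strip]
  end.compact
end

parse_fields(sample_dump).each { |name, type| puts "#{type}\t#{name}" }
```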

15:02

Boom! You can look up the form field names in Acrobat through Forms -> Edit. Looking at this, I now feel good about writing a script because there is so much duplication. Here is the list of fields I need to fill (dummy data below):

  • “TAX YEAR” -> current_year
  • cb2 -> true
  • t1 -> “Not aware”
  • cb3 -> true
  • Address -> “123 Main Street, Alexandria, VA 22304”
  • l100 -> “123-45-1111”
  • Occupation -> “USAF”
  • “City of Income” -> “Alexandria, VA”
  • l101 -> “245-28-2822”
  • Occupation_2 -> “Physical Therapist”
  • “City of Income_2” -> “Alexandria, VA”
  • “Phone Number” -> “571-281-2822”
  • “Email Address” -> “foo@bar.com”
  • “Old Address” -> old_address
  • undefined_4 -> amount_of_loss
  • undefined_5 -> amount_of_loss
  • l102 -> 0
  • undefined_10 -> 0
  • undefined_11 -> 0
  • l103 -> 0
  • l106 -> 0
  • Date -> Date.today
  • Date_2 -> Date.today
  • NAME -> “Kettering Rental House”
  • t105 -> old_address
  • t106 -> “Kettering, OH 45202”
  • l200 -> amount_of_loss
  • undefined_24 -> amount_of_loss
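
The mapping above translates fairly directly into a Ruby hash built once per tax year. This is a sketch under a few assumptions: `fields_for` is a made-up helper name, the checkboxes are assumed to accept "Yes" as their on-state, and the date format is a guess at what the form expects.

```ruby
require 'date'

# Hypothetical helper: build the field => value hash for one tax year,
# using the field names and dummy data from the list above.
def fields_for(year, loss, old_address, today = Date.today)
  date = today.strftime('%m/%d/%Y') # assumption: form wants MM/DD/YYYY
  fields = {
    'TAX YEAR' => year.to_s,
    'cb2' => 'Yes', # assumption: "Yes" is the checkbox's on-state
    'cb3' => 'Yes',
    't1' => 'Not aware',
    'Address' => '123 Main Street, Alexandria, VA 22304',
    'l100' => '123-45-1111',
    'Occupation' => 'USAF',
    'City of Income' => 'Alexandria, VA',
    'l101' => '245-28-2822',
    'Occupation_2' => 'Physical Therapist',
    'City of Income_2' => 'Alexandria, VA',
    'Phone Number' => '571-281-2822',
    'Email Address' => 'foo@bar.com',
    'Old Address' => old_address,
    'NAME' => 'Kettering Rental House',
    't105' => old_address,
    't106' => 'Kettering, OH 45202',
    'Date' => date,
    'Date_2' => date
  }
  # The yearly loss is repeated in four places on the form.
  %w[undefined_4 undefined_5 l200 undefined_24].each { |f| fields[f] = loss.to_s }
  # Everything else on those lines is zero.
  %w[l102 undefined_10 undefined_11 l103 l106].each { |f| fields[f] = '0' }
  fields
end

puts fields_for(2011, 107, '1 Example Ave')['l200']
```

Keeping the repeated loss and zero fields in two short lists is what makes the duplication on the form cheap to handle.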

15:16 starting to write test code

15:20 this code works, starting on real code

15:48 20 minute break for lunch and play with kids

16:20 frustrated: can’t get Ruby’s heredoc syntax to work

This was just silly. I should have known how to load an array of text . . .
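
For the record, a minimal sketch of loading pipe-delimited text from a heredoc into an array of pairs. It is not necessarily the code I ended up with, and `parse_losses` is a name made up for this example.

```ruby
# Made-up losses in the same "year|amount" format as the data above.
data = <<EOS
2007|10
2008|12
2009|22
2010|20
2011|107
2012|388
EOS

# Hypothetical helper: turn the text into [year, loss] integer pairs,
# ignoring blank lines.
def parse_losses(text)
  text.each_line.map(&:strip).reject(&:empty?).map do |line|
    year, loss = line.split('|')
    [Integer(year), Integer(loss)]
  end
end

parse_losses(data).each { |year, loss| puts "#{year}: #{loss}" }
```

Using `Integer()` instead of `to_i` makes a malformed row raise an error rather than silently becoming 0.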

16:30 all working — printing forms with this code
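
A minimal sketch of that last step, not the original script: it calls pdftk directly and assumes pdftk is on the PATH and that one XFDF data file per year has already been written. The file names and the `fill_command` helper are placeholders.

```ruby
# Hypothetical helper: build the pdftk command that merges one data
# file (FDF or XFDF) into the blank form.
def fill_command(form, data_file, output)
  ['pdftk', form, 'fill_form', data_file, 'output', output]
end

(2007..2012).each do |year|
  cmd = fill_command('tax.pdf', "kettering_#{year}.xfdf", "kettering_#{year}.pdf")
  puts cmd.join(' ')
  # Uncomment to actually run pdftk. Passing the command as separate
  # arguments avoids shell quoting problems with odd file names.
  # system(*cmd) or abort("pdftk failed for #{year}")
end
```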

Pretty cool.