PDF/A Conference (Amsterdam) highlights

April 18, 2008

Hi – I’m Greg Manuel, VP Marketing and Strategic Planning at Datalogics. I haven’t seen any online comments from last week’s (April 10-11) PDF/A Conference in Amsterdam, so I thought I’d share a few thoughts.

Quick disclaimer: these are my thoughts and recollections from the conference; this info may not be 100% accurate.

This inaugural conference, sponsored by the PDF/A Competence Center, was held at the Hilton Hotel in Amsterdam (the same hotel where John and Yoko held their famous “bed-in” in 1969) on Thursday and Friday, April 10th and 11th. PDF/A is an ISO Standard for using PDF for long-term archival of electronic documents (see the PDF/A Competence Center for more background on PDF/A).

I was there manning the Datalogics booth in the exhibition area, but also had time to slip in to see several sessions. Rather than attempting to review the entire conference, I’ll opt instead to share a few tidbits of info which I found interesting, in no particular order:

PDF/A-2, the next “Part” of PDF/A

As PDF/A-1 is already an ISO Standard (ISO 19005-1), work is already well underway on the next “Part”, PDF/A-2 (which is expected to become ISO 19005-2). PDF/A-2 will not replace nor make obsolete PDF/A-1. This message was stressed more than once: if PDF/A-1 is working fine for you, there’s no need to migrate to PDF/A-2. Instead, PDF/A-2 conforming documents will support features not supported in PDF/A-1, including support for:

  • JPEG2000 compression
  • transparency
  • optional content groups (OCGs), or layers
  • PDF packages

PDF/A-2 will also introduce another level of conformance. Recall that PDF/A-1 defines two levels of conformance: “A” level indicates ‘All’, or complete, compliance; and “B” level indicates ‘Basic’ compliance. PDF/A-2 will introduce “U” level, or ‘Unicode’ compliance; namely, that “…any text contained in the document can be reliably extracted as a series of Unicode codepoints.”

In addition, PDF/A-2 will introduce the concept of a “PDF/A conforming reader”, a PDF viewing application that will follow certain rules/behaviors when rendering PDF/A files.

PDF/A-2 won’t be based on a PDF Specification from Adobe per se; rather, it’ll be based on ISO 32000, which will give it a little more “ISO-ness” (ISO standards bodies like to base ISO standards off of other ISO standards when possible).

Notably absent from PDF/A-2 is support for 3D formats such as U3D, PRC, etc. Many attendees were disappointed, and will need to wait until “Part 3” for formal support of this type of engineering information. PDF/A-2 is targeted for 2009 approval and release.

PDF/A and digital signatures

I went to a few sessions on PDF/A and digital signatures; since I am no digital signature expert, I found these sessions a useful backgrounder. Probably the most interesting thing I discovered was that, if I understand correctly, there is a discrepancy in the ‘timescale’ of PDF/A vs. digital signatures. By this I mean that while PDF/A documents should still be supportable in 50 to 100 years, a digital signature stays valid for only something like 5 to 7 years. Digital signatures rely on certificates issued by Certificate Authorities; these certificates last 5-7 years at most before expiring. This limit is intentional – the theory is that technology advances so quickly that the methods used today to authenticate certificates might be too easily compromised several years from now.

It’s an interesting situation: you apply a digital signature to a PDF/A to indicate “I vouch for the contents of this document”. Then in 7 years’ time the certificate expires, so the signature cannot be validated. You then have to apply a new digital signature, but this ‘changes’ the original document. So what do you do? Well, perhaps you should still archive the original, even though the digital signature cannot be validated, along with the updated PDF with the new certificate. But now the digital signature you just applied no longer means “the contents of this document are valid”; instead, it means something like “I vouch for the validity of this document at the time I applied my signature”. This is slightly different, and may have different legal implications…

In addition, you’ve just doubled your space requirements for archiving; and over 100 years you may be managing 15-20 PDFs for each original that you had stored…? I’m guessing that something will be done, either on the PDF/A side or the digital signature side, or on the implementation/best practices side, to change this model.

PDF/A Test Suite

Finally, I wanted to mention that the PDF/A Competence Center announced that they are working on a PDF/A Test Suite, a set of a few hundred PDF documents which can be used to test applications/environments for PDF/A conformance. The Test Suite is expected to be available as a public download (with no fees) from the PDF/A Competence Center website later this summer.

I think I’ll end it here – all in all, an interesting, informative conference. I’m sure we’ll be back next year.


Exploring DLE using Python

February 15, 2008

I’ve always had an appreciation for the higher level languages, the ones that make life easier, that let you code rather than worry about the housekeeping.

Jack has mentioned the many advantages of DLE, using C#. C# is an improvement over coding in C or C++, since it relieves you of many of the burdens of tracking pointers and object ownership. You still have to compile the program before you can run it.

Scripting languages like Python give the best of both worlds. Programs don’t require compilation before being run, and in fact, you can type commands to an interactive console, just like in the old days of BASIC.

I’ve been something of a Pythonista for a long time now, and I’ve always wanted to access the PDF Library from Python. Now that we have DLE we can.

Before you go digging in the distribution to find the secret Python bindings, I’ll tell you there aren’t any. We’re going to use a little trick. There are versions of Python that run on some of the major VMs out there. One of them is IronPython, which runs on .NET, and the other is Jython, which runs on the JVM.

Both mix the ease of use of Python with direct access to the features of the underlying VM. Generally, Python and Java or .NET objects can be freely mixed, and you don’t have to really know in which language classes and objects are declared, especially from the point of view of the Python code.

For this article, I’m going to focus on Jython.


What About Those Robots?

February 14, 2008

Remember years ago (OK, many years ago, like the 1970’s, I’m old!) when computers first started to surface as potential products for the home?

People scratched their heads.  “Why would I need a computer at home?!,” they asked.  “I mean, c’mon, will we all really need to calculate moon rocket trajectories or 30,000 places of Pi in our living rooms?”  They didn’t see the need, the potential, the everyday practical uses of such a thing.  It wasn’t obvious.  It really wasn’t.

The first home computers were built for the home hobbyist, the nerd, with all sorts of switches and dials, costing tons of dough, serious lapses in personal hygiene, and countless love lives.  Home computers slowly evolved into something for the semi-nerd, then just the nerdish, then many normal people (ohhh…the Macintosh) and finally, now, everyone has one (at least one).  Anyway, we all know how this story unfolded, so no need to retrace the entire history of the personal computer in this article.

Don’t you think robotics will have a similar evolution, eventually being ubiquitous devices found in every home?  I do.

But I don’t understand why it’s taking so long!

Let’s review where we are now with these mechanical marvels.  In industry, robotics have reached true ubiquity.  Robots build our cars, they move things around in warehouses.  They sand, dip, spray, dry, and lubricate parts.  Factories are full of them.  Gleaming gadgets of endless programmed servitude, spinning figures of predetermined poetic mechanical motion, tireless whirling dervishes that flip things, weld things, paint things, dust things, rotate stuff around, and plop perfectly milled and polished Spider Valve subassemblies in a 10 micrometer tolerance location for yet another robot to pick up a fraction of a second later in a kind of robotic ballet of Rotovector Metagear assembly.  They are truly beautiful, in a graceful, mathematical, precise moving parts kind of way.

But for the home, what do we have?  We have Frisbees that vacuum floors.  And for those undaunted by the prospect of lost limb lawsuits (LLL’s), there’s a robot that mows your lawn and hopefully not your neighbor’s pet rabbit.  But that’s…about it, for the home market right now.  Not exactly The Jetsons.  (Jane, don’t worry, there’s no Crazy Thing you have to stop at the moment.)  Not to say the thinking vacuums aren’t cool.  They are.  And they’re a great start.  But what’s next for the home?  Where’s the spit and polish?

Maybe we’re still kinda sorta in the head-scratching, “I don’t get it!” stage of Robot Evolution/Revolution.

One question that comes to mind for me is: will robots for the home be more of the general-purpose genre, or will they be specialty robots?  And if they’re more general-purpose in nature, will they tend to be humanoid/humanoid-ish in shape?  The supporting theory of this configuration preponderance is that, if human-like-shaped, a general-purpose robot will be able to do more of the things a human does, using things humans already use (and already own), like brooms, hammers, oil pans, and snow shovels.  And their being general-purpose will automatically imply one more bonus feature for us mere mortal humans: you’ll only have to buy one robot for all your robotting needs.

I wonder how far we are from seriously decent balancing, hearing, and vision technology.  Maybe that’s what’s holding this back.  You don’t want to come home to your new RoboMaid 3000 squirming around like a turtle on its back on broken coffee table glass with a vacuum cleaner in her claw futilely aimed in the air, as the pot roast she put in the oven 3-1/2 freakin’ hours ago continues hardening to one tenth its original size in the Viking blast furnace as she tells you over and over in calm soothing robot tones (that you paid extra for on robocalm.com) that “something is burning”.  Thanks, RM3K, and so much for putting two and two together, Ms. quad math co-processor with NeuroNet™ upgrade Model MC-squared.

All the usual dark robots-taking-over cautionary tale stuff notwithstanding, I still want a robot, probably a general-purpose humanoid-ish kind (somewhere between C3PO and R2D2), one that plays chess (and lets me win, but is convincing that she is not letting me win).  Job One: clean my biohazard room, a.k.a. bathroom.  I mean the whole thing, starting with the toilet sparkling like the Hope Diamond, and finishing with all my cologne bottles being clean and in a row, and my toothpaste tube optimally squeezed from the bottom up and stashed away in my medicine cabinet.  Why not?  Robots, I’m told, don’t (correction: won’t) mind doing this stuff!

But what’s gonna light the fire here?  When are we gonna see this Robot Revolution we’ve been promised?  I say it’s time!

We all want our beds made (don’t we?) with mints on our pillows and the sheet corners turned down like we’re staying at the Beverly Hills Hotel.  We all want the dishes put away, our sinks cleaned, our houses painted, our rooms painted – no, wallpapered (though for some robot models, this particular activity will certainly void their warranty), our dogs groomed, our fish fed, carpets vacuumed, floors swept, oil changed, shoes put away, undies folded, plants watered, pianos tuned, cars driven sober, rooms added on, and heliports installed on our roofs.

Frankly, I’m sick of doing this stuff, and I’m really sick of the fact that I don’t own a robot that can.  This is 2008!  Do you have any idea how futuristic “2008” sounded in the 1960’s when the talking (and thinking!) robot from Lost In Space was rolling around in the 1990’s saving the Robinsons from low-budget intergalactic animatronic seaweed monsters while calculating to like six decimal places the odds of exactly which way Dr. Smith would do something to ruin their chances of getting back to Earth this time?

Give the Robinson Cantankerous Clod (real name: Robot) longer arms and a toilet brush adapter and I’ll buy it.  Besides, retro is still in, isn’t it?  And those high voltage claws cook a perfect hot dog, I hear.  Don’t taze me, Ro’!

Seriously, I’d pay big bucks for a good, hard-working (tireless!) robot.  If this general-purpose robot does all these things – imagine the third party software market with programs that make your robot do things like: clean your pool while a filet of Canadian elk and Beemster Gouda lasagna he prepared (at 3 am, silently, while you slept; they roll 24/7!) bakes, timed to be ready when you get home to a newly robot-carpeted living room with a freshly tapped mini-keg of Heineken on ice at the exact location your feet soon will be – it would be worth thousands of dollars.  Maybe like as much as a frickin’ car.  Think about it.  If people plop down 6 G’s for a plasma TV to watch…TV, in the endless American pursuit of perpetually Sitting On A**, they’ll buy this robot.

I’d personally trade in the BMW for the CTA if I didn’t have to clean…anything.  A robot with a compelling enough feature set would sell like hotcakes.  There’s money to be made.  I just know it.

A robot in every bathroom.  That’s my vision.

So what’s stopping it?  I can’t be alone in this grand dream of the toilet brush-liberated apartment resident.  Is it the lack of maturity of technology, and it’s just a matter of time before it is ready for prime time?  Or is it the lack of pioneering spirit in technology companies, venture capitalists, and Wall Street pundits?  Is it that robots would cost too much to build and too much to buy for the benefits currently feasible?  Am I alone in volunteering I’d dish out thousands for a super-robot when no one else shares this sentiment?  Or is it something else?

Don’t tell me it’s The Twilight Zone vision of machines gone mad, bad, and rad, taking over their human creators, evoking the “I’m Perfect, Are You?” ending of ELP’s Karn Evil 9 (I’m old).  C’mon!  These robots are automatons that obey commands.  Like computers, they just do what you tell them, and I think Rod Serling (I’m old) is a bit over-excited.  When’s the last time your computer, when you weren’t home, used Quicken to open an offshore account, transferred all your money from your account into its account, then made all the proper phone calls and Internet travel purchases to have itself moved from your house to a house full of other renegade computers (a “flip flophouse”, ba-doom, ching / I’m old) in Switzerland, leaving behind a sinister recording on your voicemail about that being the last time you kick it when you lose at Minesweeper?  (I’m old.)  These are machines.  They have no soul, no self-determination.

Computers and robots don’t do evil things.  People do.  (Maybe using computers and robots, but that’s another story.  Countless stories, to be more precise.)

Anyway, it’s probably a combination of all these things that’s gumming up the works in the realization of the Robotic Dream.  When development costs, the price people will pay, technology needed to move your robot around your house without inadvertently remodeling it, and venture risk calculations and considerations all align, we’ll see our first generation of general-purpose human-like robots with throngs of people lining up to buy them.  (And, soon after, the second generation ones with throngs of robots lining up to buy them.  So sad and eerie, isn’t it?)

We’ll probably (hopefully) see some more special-purpose ones along the way.  But what’s next after automagic vacuum cleaners?  Some gizmo that picks clothes off the floor and dumps them in a hamper?  Some cyber-servant that can negotiate the pizza boxes in your fridge, pull out a beer, pop it open, and bring it to you in a “Cy-Beer” foam coozie?  Something that folds your socks and skivvies?  But how much would anyone pay for any of these?  It’s got to do some serious amount of stuff I don’t like to do before I spend the big bucks.  (Hmmm…I do like the beerbot idea.)

Maybe this is part of the problem: there are too few logical Baby Steps from the present-day auto-vac to the super-duper android-on-steroids.  Perhaps it’s one of those Critical Mass things that doesn’t scale well from here to there, necessitating that things will have to emerge in plateaus, spurts, rather than a smooth continuum of amazing Robolutionary product releases.  So maybe we’ll just have to wait for the whole enchilada at this point.  (Though I’m sure there are some serious imaginations pondering great home robot ideas right now!)

I guess only time will tell where (and when – and if) this all goes.  Maybe it will be 20 years from now that a robotically-guided brush meets lavatory porcelain for the first time.  (I hope sooner – my bathroom can’t wait another 20 years.)

But for now, I’ll have to settle for my carpets automatically vacuumed.  And my lawn mechanically cut (if I buy a house again – after I get my robot) and possibly covered with rabbit…uhhhh, fur.

And while I wait, I guess I can invest in companies that are engaging heavily in home robotics R&D.  Excuse me as I look those companies up on the Internet.

But hey, where’d my computer go?!

“You have one message waiting.”  BEEP.


Using DLE In An Application

February 14, 2008

I am fortunate to have the unique experience and opportunity to be the first person at Datalogics, using the DLE layer, to get to write a full-blown C# .Net application for a customer, from absolute scratch (think: Visual Studio, File → New → Project…) so I thought I’d write to tell you all about it.

I like it!  It’s great!  And this is coming from someone who was tentative about the use of .Net, and someone who’s never written a .Net application from scratch, and has hardly used C#.  (Though I’m well-versed in C and C++ and have done some VB/ASP .Net code maintenance work.)  I can tell you, using DLE has been a very agreeable experience.

For the uninitiated, DLE is the .Net layer class library that sits on top of the PDF Library, providing an abstraction layer between a .Net application and the PDF Library.  Through the magic of object-oriented wizardry, it offers a simpler use of the PDF Library, with fewer headaches, less “plumbing” code, less worry, and a lot less work.

The Customer’s Application

Let me give you a brief explanation of what this customer’s C# / .Net application does.  It’s basically a PDF viewer with some nice extra bells and whistles.  With it, you can open a PDF document and view it in the normal way as you would expect, with page navigation, zooming, printing and all the usual stuff.  But what makes this application different is that it:

  • allows the user to view a document not only in the traditional Single Page mode (one page at a time) but also in Continuous mode, where all the pages appear “stitched together”, seamlessly, as though it’s one long scrollable document
  • allows the user to append any number of other PDF documents to the one they’re currently viewing
  • allows the user to print the document on a roller printer (plotter) as one long document of unlimited length and up to 3 feet wide
  • allows users to view bookmarks in a node-expandable/collapsible tree view and navigate to the pages to which they point
  • allows users to turn PDF layers on and off for both viewing and printing (independently)
  • allows the user to add annotations (I’m not there yet…but I’ll be sure to cover this in a future article)

As you can see, one central theme of the application is handling PDF documents that are very long.  (For this particular customer, this is very important, as they have a series of PDF documents that visually describe something physically long, that need to be viewed and printed together, contiguously, as a whole.  If I told you more, I’d have to kill you, as they say.)

In this article, I’ll just begin to extol the virtues of DLE with some of the basic parts of the application.  In future articles, I’ll get into more how DLE expedites the development of the more exotic aspects of it.  (By the way, I’m still writing the application as I write this article.)

The Benefits of DLE

So why is DLE so great, you might ask?  Here are the main reasons that I found so far:

  • Many of the single DLE functions each perform the work of many PDF Library functions; these are all encapsulated and optimized in DLE’s class library
  • Because C# and .Net support automatic “Garbage Collection” (releasing programmatically-allocated memory), the burden on the programmer to write tight code to avoid memory leaks is greatly mitigated
  • It commingles quite well with Visual Studio and the existing .Net framework and standard .Net class library
  • It’s intuitive to use

You probably want to hear about examples of all of these.  No problem!

DLE In Action

Here are the most basic things my application needs to do: 

  1. Start the PDF Library
  2. Open a PDF document
  3. Show a page from the PDF document on a Visual Studio control (a standard PictureBox to be precise)
  4. Terminate the PDF Library on application termination

Let’s begin with starting the PDF Library.  How does one start the PDF Library?  Jeez, what if you don’t know anything about DLE?

One way to find out about things is to look at the Object Browser in Visual Studio.  If you open any of the included samples (another great way to find out about things) you can see what’s inside DLE in the Object Browser in Visual Studio.  Expanding the Datalogics.PDFL assembly node and then the Datalogics.PDFL namespace node, you’ll see the entire class library before your eyes.

Guess what?  There’s a Library object.  Inside any of the samples you’ll see how this is used:

using (Library lib = new Library())
{
    // run the application, utilizing the PDF Library via the DLE class library
}

This “using” statement does a lot for you under the covers:

  • It starts the PDF Library for you
  • It terminates the PDF Library automatically after the last statement in the code block is executed
  • Upon terminating the Library, it releases memory allocated by the Library object and any objects it may have created

This is a great example of:

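  • Automatic cleanup at work; you don’t have to worry about manually deallocating memory you previously allocated.  The “using” statement calls Dispose() on the Library object when the block exits, and the Garbage Collector (GC) takes care of the managed memory.  With C++, you would have to make sure you “delete” any dynamically-allocated memory later in your code to avoid memory leaks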
  • How a lot of work is done for you behind the scenes, as initializing and terminating the PDF Library can be a somewhat cumbersome process in the “old school” (pre-DLE) approach

After creating a new project and adding some UI and menu elements, you just need to add the statement above to the code.  Step (1) – and step (4): DONE!

Next: So how do you open a PDF document?  Check this code snippet out; it’s basically what you need:

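Document myDoc = null;
OpenFileDialog dialog = new OpenFileDialog();
// Show the dialog; FileName is only filled in once the user picks a file and clicks OK
if (dialog.ShowDialog() == DialogResult.OK)
{
    String FileName = dialog.FileName;
    myDoc = new Document(FileName);
}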

In DLE, there’s an object called “Document”.  In the above code, we declare the object, then use an OpenFileDialog object (a standard object included in .Net) to explore the file system which returns a FileName.  Then we construct a new Document using that FileName, and lo and behold, we have a PDF Document from the file system, ready to play with.  The constructor did all the work, creating an instance of the object, and associating the object with a document on disk.  Cool, huh?

This is a great example of:

  • Encapsulated functionality.  Using a single “new” statement, I get a constructed Document object – initialized with a PDF document from disk! 
  • Intuitive use (could it be easier than myDoc = new Document(FileName)?  What, in ZERO lines of code?) 
  • Fewer memory allocation worries.  As soon as myDoc goes out of scope or is set to null, the memory allocated for this object becomes eligible to be freed by the GC.  C#.Net doesn’t even have a “delete” keyword as in C++.  You can, though, call myDoc.Dispose() to release the object’s resources on the spot if you don’t want to wait for – or don’t trust! – the GC.

So, next we want to display the open PDF on the myPictureBox PictureBox object.  There are a few steps to do this.  Basically, in a nutshell, you need to:

  • Grab a single page from the PDF document by calling the myDoc.GetPage() method and storing the result in a myPage instance of the Page object
  • Create a myBitmap Bitmap object sized to the height and width of myPage’s content (i.e., its MediaBox, obtained via myPage.MediaBox)
  • Set the myPictureBox.Image property to the myBitmap object
  • Create a myGraphics Graphics object (a standard .Net object) from myBitmap using Graphics.FromImage(myBitmap)
  • Create a myMatrix Matrix object for scaling, rotation, and translation, and a myDrawParams DrawParams object to specify various drawing parameters
  • Use the myPage.DrawContents(myGraphics, myDrawParams) method to draw the graphics contents to myBitmap via myGraphics and thus to myPictureBox

Voila!  The page’s content is drawn to the standard .Net control.

In the steps above, you can see examples of all of the benefits I listed earlier.  All the objects, like myPage, myBitmap, myGraphics, myMatrix, myDrawParams, and myPictureBox, don’t require any explicit memory deallocation when you’re done with them.  Getting a page from a PDF document is as simple as it could possibly be: a single code line using the GetPage() method and storing the results in yet another DLE object, Page.  DLE interoperates with standard .Net classes, like the Graphics class, quite well.

Digging Deeper: Transformations Transformed

The matrix operations (in the steps above) in particular are greatly simplified for you when you use DLE vs. the Old School (pre-DLE) approach.  With pre-DLE, you would need to construct various matrices to perform scaling, translation, and rotation operations.  With DLE, you just create a myMatrix object, using the Scale(), Translate(), and Rotate() methods, and DLE constructs the matrices and performs the matrix mathematics for you, behind the scenes.

With DLE, you express scaling as simply and intuitively as one possibly could, in terms of two numbers: the X scaling and the Y scaling, instead of manipulating the [sX 0 0 sY 0 0] scaling matrix directly, as you would pre-DLE.  Same with translation.  You supply X and Y translation numbers instead of using the [1 0 0 1 tX tY] translation matrix.  Rotation is greatly simplified: you supply a single piece of information, the number of degrees of counter-clockwise rotation, a single parameter, instead of using the [cos T sin T -sin T cos T 0 0] rotation matrix!

The Matrix class still consists of the [A B C D H V] matrix, represented internally as six properties of the object.  You rarely need to manipulate these properties directly in your code, as the transformation methods do the heavy lifting for you, but you still can: they’re public, gettable, and settable.

To illustrate the differences between the pre-DLE and DLE methods, first, creating a rotation matrix the pre-DLE way, using C:

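#include <math.h>   /* cos(), sin() */

double pi = 3.14159;
ASFixedMatrix myMatrix;
double rotationDegrees = 45;
/* The C math functions take radians, so convert from degrees first */
double rotationRadians = rotationDegrees * pi / 180.0;
/* Fill in the [cos T sin T -sin T cos T 0 0] rotation matrix by hand.
   Note: ASFixedMatrix members are fixed-point (ASFixed) values, so real
   code would convert these doubles to ASFixed before storing them. */
myMatrix.a = cos(rotationRadians);
myMatrix.b = sin(rotationRadians);
myMatrix.c = -sin(rotationRadians);
myMatrix.d = cos(rotationRadians);
myMatrix.h = 0.0;
myMatrix.v = 0.0;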

and creating a rotation matrix the DLE way, using C#:

Matrix myMatrix = new Matrix().Rotate(45);

You get the idea.  I know which way I’d rather do it. 

Furthermore, each of the DLE transformation methods performs work on a matrix and returns a matrix type, so you can perform any number of these methods “dotted” together, like this, for example:

Matrix myMatrix = new Matrix().Scale(3, 4).Rotate(45).Translate(0, 1);

In a single line, we have created a matrix that, when applied in our case to a bitmap in myPage.DrawContents(), will scale, rotate, and translate it (using standard SRT transformation order, as transformation operations are not commutative).
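The remark that transformation operations are not commutative is easy to check numerically. The sketch below is plain C with a made-up Mat type and hypothetical helpers (mat_concat, mat_apply, scale_then_translate, translate_then_scale); it does not use DLE or the PDF Library, and it says nothing about how DLE orders its own chained calls. It only shows that scaling and then translating lands a point somewhere different from translating and then scaling.

```c
/* Illustrative six-number matrix [a b c d h v]; not a PDF Library type.
   A point (x, y) maps to (a*x + c*y + h, b*x + d*y + v). */
typedef struct { double a, b, c, d, h, v; } Mat;

/* Return the single matrix that applies `first`, then `second`. */
Mat mat_concat(Mat first, Mat second) {
    Mat r;
    r.a = first.a * second.a + first.b * second.c;
    r.b = first.a * second.b + first.b * second.d;
    r.c = first.c * second.a + first.d * second.c;
    r.d = first.c * second.b + first.d * second.d;
    r.h = first.h * second.a + first.v * second.c + second.h;
    r.v = first.h * second.b + first.v * second.d + second.v;
    return r;
}

/* Map the point (*x, *y) through m, in place. */
void mat_apply(Mat m, double *x, double *y) {
    double nx = m.a * (*x) + m.c * (*y) + m.h;
    double ny = m.b * (*x) + m.d * (*y) + m.v;
    *x = nx;
    *y = ny;
}

/* Scale by (2, 2), then translate by (10, 0). */
void scale_then_translate(double *x, double *y) {
    Mat s = { 2.0, 0.0, 0.0, 2.0, 0.0, 0.0 };
    Mat t = { 1.0, 0.0, 0.0, 1.0, 10.0, 0.0 };
    mat_apply(mat_concat(s, t), x, y);
}

/* Translate by (10, 0), then scale by (2, 2). */
void translate_then_scale(double *x, double *y) {
    Mat s = { 2.0, 0.0, 0.0, 2.0, 0.0, 0.0 };
    Mat t = { 1.0, 0.0, 0.0, 1.0, 10.0, 0.0 };
    mat_apply(mat_concat(t, s), x, y);
}
```

Starting from the point (1, 1), scale_then_translate gives (12, 2) while translate_then_scale gives (22, 2): in the second case the translation itself gets scaled.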

(For those not familiar with the PDF Library transformation matrices, don’t worry too much about it.  These matrices are a way to mathematically represent transformation operations, and matrix mathematics is a way to implement them.  With DLE, these are abstracted away from you with these new transformation methods built into the class.)

Coming Next… 

In follow-up articles I’ll write more about the special requirements of this application, like appending one PDF document to another, stitching PDF documents together in one giant, continuous viewing page, getting and displaying bookmarks from the PDF, manipulating PDF layers (a.k.a. Optional Content Groups), and printing to a specialty printer such as a roller printer or plotter with seamless, continuous printing, and how I implemented them all using DLE.


Getting Your First Programming Job for the Non-Traditional Seeker: Part 1

February 11, 2008

I recently met a gentleman who asked me for a bit of advice: he has a passion for programming and, after being in the workforce for several years as a technical writer, wants to make the move into software development. He’s done some programming on a very part-time basis at work, but most of what he’s done has been for classes in his Masters program. The MS program he’s attending is an evening program and lies in the middle ground between a computer science curriculum and a software development program.

He’d like to know how to break into the field. It’s a good question – there are many people out there who’d like to get into the field who’d be able to hack it, but didn’t get an undergraduate degree in computer science (perhaps didn’t get a degree at all), don’t understand much about professional software development and don’t have any experience. While pondering his question and coming up with some answers, I’ve read questions from others asking about how to break into the field in similar but not identical situations.

Alas, I’m not the best person to be answering this person; I took what seems a traditional route these days – got a computer science degree, put out some resumes and applications and found a programming job as I graduated.

Even though my advice may not be accurate, I present it in two parts: part one discusses (decries, perhaps, since I have little practical advice) two particular obstacles I’ve seen in the industry. Part two will feature specific advice.

The Obstacles:

Obstacle the First: “We Hire By Keyword”

Allow me to paint with a broad brush: there are two major schools of thought on hiring for programming teams. One holds that it’s best to look for smart people who’ve a broad background and who can learn various tools and techniques. The other school holds that programmers are the sum of the languages and technologies listed on their resumes, and nothing more. I’ll not discuss the first school here; others have done a much better job than I’m able to.

We Hire By Keyword is typical of consultancies and of companies where programming falls under the IT department and is secondary to the business at hand – in other words, any company that directly sells the time of its employees or that doesn’t “sell software” or do something closely related. Hiring in these companies is usually centralized through an HR department that’s responsible for hiring everybody in every department – and it’s impossible to know the requirements and attributes that signal brilliance for each and every single role in these companies. A master salesperson is probably a very different person than a master accountant, but they all go through the same application filter.

How does this relate to you as a programmer? The way HR tries to sidestep this issue is to get a list of requirements from hiring managers in the form of various technologies, then run every programmer’s resume through a keyword filter and grade them by the number of matching keywords. Some hiring managers do the best they can to provide an open list – maybe just asking for a C# person – but then get rebuked for not being detailed enough. After all, just asking for a C# programmer will net a lot of resumes, and those resumes then have to be further processed.

It’s much easier all around to look for a programmer with 12 years of C# 3.0 experience on Windows Server 2000 and who has 3 – 6 years of Microsoft SQL Server 6.5 experience in a clustered transactional database environment. Makes for vastly fewer people to follow up with and interview.
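To make the mechanism concrete, here is a minimal sketch in Python of the kind of keyword grading described above. It’s my own illustration, not any real HR system: the function names, the keyword list and the two sample resumes are all made up.

```python
def keyword_score(resume_text, keywords):
    """Count how many of the requested keywords appear in the resume text."""
    text = resume_text.lower()
    return sum(1 for keyword in keywords if keyword.lower() in text)


def rank_resumes(resumes, keywords):
    """Return (name, score) pairs, highest keyword count first."""
    scored = [(name, keyword_score(text, keywords)) for name, text in resumes.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical requirements list and resumes, for illustration only.
keywords = ["C#", "SQL Server", "Windows Server"]
resumes = {
    "generalist": "Ten years building systems in Java, Python and C#; learns new tools quickly.",
    "keyword_stuffer": "C#, SQL Server, Windows Server, C#, SQL Server.",
}

ranking = rank_resumes(resumes, keywords)
for name, score in ranking:
    print(name, score)
```

Nothing in the score looks at what a person has actually built; a resume that simply lists the right terms outranks one describing broader experience.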

Why’s this a bad thing?

  1. Those who make it through the filter are just as likely to be lucky or to have embellished their resumes as to be smart.
  2. These filters reward those who’ve had the same year of experience 10 times much more than those who’ve had 10 years of different experiences.
  3. The illusion of a skills shortage in IT is, IMHO, a manifestation of the over-filtering problem. That is, when overly specific job ads aren’t a cover for hiring a specific H-1B holder and indenturing him or her to the employer. But that’s a separate discussion.

If I hire a contractor to remodel my kitchen, I don’t interview contractors and ask if they’ve used a #2 Phillips screwdriver but then send them on their way if they’ve not used a #3 Phillips screwdriver on maple cabinet doors. No – one looks for good references and experience, talks about previous projects and ultimately goes on some faith. Interviewing programmers is analogous – one should look for previous experience, talk about previous projects (professional, personal and school projects) and look for good references, among other things. Again, others have written about this at much length; the important thing is to, as Joel Spolsky says in the Joel On Software blog, look for someone who’s “smart and gets things done”.

It’s much easier for a smart person to learn a tool than it is for a tool to learn how to be smart.

Obstacle the Second: “Only People Who Are All-Consumed by Programming Are Worth Hiring”

There’s a… a belief that some hold out there. Or maybe it’s a grudge. Or maybe it’s an attitude based on misapplication of experience. I don’t know. But there are those who hold that only programmers who think about computers all the time, who work all day programming and then turn around and program as a hobby, who are easily excitable by the technology du jour and who spend all day decrying the faults of wetware – only those who’ve given their lives to the computer can be good programmers. All else need not apply, for none else are worthy.

I can understand some of the reasoning behind this, just as with hiring by keyword. Most all of us have worked with at least one person who makes no effort to stay current in any aspect of the programming craft or to learn anything new. Or worse, those who expect “the company” to tell them how to keep current and will learn what’s paid for and on company time. I’ve nothing against these attitudes – and I believe strongly that company-paid training is important (unless you’re an independent contractor, in which case it’s all your responsibility). Still, these people become the co-workers who are able to contribute less and less as time goes on. Then these people become ex-teammates. Then ex-colleagues who can’t find jobs and can’t get interviews. Trust me, I get the impact these people have on those who work with them. And we all know that one year of experience repeated twenty times does not make for twenty years of strong experience in the field. Keeping current in the field should be a joint effort between you and your employer but in the end is your responsibility. Right. Got it.

There’s a significant minority out there that takes this attitude out to absurdity, spending all their hours tooling around on the computer and basing their lives and outlook entirely in the world of the screen. While in the end this is one’s right and personal choice, I reject the attitude that people who get out into the big blue room are somehow not dedicated enough to the “programming cause” and make for bad coworkers. This is the attitude that shows up for work at 11:00 in the morning and whiles away the hours IM’ing friends from work, surfing the web, &c., and staying until some off hour in the evening – and the attitude that derides those who show up earlier, do work at work and leave at a reasonable hour as “not dedicated” to the cause.

Nobody likes working with someone who doesn’t pull their weight. However, someone who’s at work 70 hours a week every week usually isn’t working very hard very many of those hours. Usually they’d do well to come in a little earlier, leave a little more promptly and focus during the time they do spend at work. Some sunlight, hobbies that don’t involve a screen and a little exercise make for, in my experience, much better developers than those who while away all their hours in front of the screen.

In short: depth and passion are great but are complements to, not substitutes for, breadth and versatility – the broader your mind, the more inspiration you have for the next great discovery. Work to live, don’t live to work – you get more work done that way.


Today’s thoughts about the mail

January 16, 2008

In the not-too-distant past I ordered tickets for RENT, the musical from the mid 1990s that’s on semi-perpetual tour and making its way back to Chicago again. Of course, all the large theaters in Chicago use Ticketmaster to handle ticketing – so I was charged for a variety of extra and excess fees, as I had expected:

  • $9.65 – “convenience charge”
  • $2.50 – “facility charge”
  • $4.15 – “order processing charge”

and, of course, the

  • $2.50 – “email you tickets instead of printing and mailing them ourselves, so you don’t cost us as much to service” charge

that I took a pass on. Why spend money just so that Ticketmaster can save some?

But I digress. I ordered these tickets, waited for a couple weeks and then the tickets showed up a few days ago. Well, not quite. An envelope with my receipt, accompanied by someone else’s tickets, arrived in my mailbox. Looks like a good ol’ fashioned printing facility error. Thankfully, nothing a little too much time on the phone couldn’t clear up. That is, after the 15 minutes it took to get the person on the other end to understand that I really did receive someone else’s tickets.

But this got me to thinking about transactional printing and especially about variable data printing. Variable data printing refers, in brief, to printing documents that are a combination of static, pre-built content and content that varies from document to document. Most of us will have seen examples in our mailboxes: those offers you and I receive which are tailored just to you and just to me, or at least just to people in your demographic and my demographic. Materials run the gamut from the barely personalized (credit card offers, for example) to the highly personalized; the latter are usually saved for high-value purchases like weddings, to pitch former customers new cars after a few years, &c.
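As a rough sketch of the idea (not how any real print shop’s software works), the merge of static and variable content can be pictured as a template plus one record per recipient. Everything here is made up for illustration: the template text, the field names and the sample records.

```python
from string import Template

# Static, pre-built content shared by every piece in the mailing.
OFFER = Template(
    "Dear $name,\n"
    "Here is an offer picked for readers in $city: $offer.\n"
)

# Variable content: one record per recipient (hypothetical sample data).
recipients = [
    {"name": "A. Reader", "city": "Chicago", "offer": "theater tickets"},
    {"name": "B. Writer", "city": "Springfield", "offer": "a new car"},
]


def render_mailing(template, records):
    """Merge each recipient record into the shared template."""
    return [template.substitute(record) for record in records]


letters = render_mailing(OFFER, recipients)
for letter in letters:
    print(letter)
```

Each letter carries its own recipient’s details, so every printed piece has to end up in the one envelope addressed to that same record.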

Most of the time, most of you and I just call it junk mail. The more personalized these mailings are, however, the more likely you and I are to respond to them. Some campaigns are even starting to tie in with web advertising, magazine ads and other media. At this point in time, personalization and cross-media personalized campaigns still haven’t really taken off in a big, big way – but I foresee more and more of it in the next few years.

So, let’s say it’s a few years from now and I receive a personalized financial offer – for example, NewBroke Internet Brokerage may kindly send me an offer letting me know that I could move the $10,000 (for example) that’s in my brokerage account at JohnDoe, LLC to a new broker; because I average more than 10 trades a month, NewBroke will even give me $500 after 60 days with them. Now, let’s say that NewBroke sent out a number of these offers, all different: and let’s say that, like my RENT tickets above, I end up getting Johnny Unsuspecting’s offer. Thus I now know something rather curious about Johnny Unsuspecting: I have his offer saying he could transfer the $2,000,000 he has at OtherTrade Corp. to NewBroke. I now have some rather personal, and rather private, financial information about Johnny Unsuspecting.

Multiply this by the hundreds or thousands of mis-mailings that might occur before someone detects this. A mere mail mishap? Or a huge privacy violation and PR concern for NewBroke Internet Brokerage?

How about if this were a personalized offer to switch physicians instead? Or a personal mailing stating that I can get my chemotherapy 30% cheaper if I switch to a different lab? Now this isn’t just a privacy issue but (at least in the US) a HIPAA violation – and much closer to a criminal mistake.

Will personalized mailings become personalized enough that, if there’s a mailing mishap, they could release sensitive information? Enough to send a printing plant employee or supervisor to prison? Worse yet, enough to sink the company that contracted out their advertising campaign in bad press and litigation?