Wednesday, October 06, 2010

GruntWorx leverages Tesseract OCR

I thought this was pretty cool. Remember, in 2007 Google announced it was launching an open source OCR project based on the Tesseract code, which was developed by HP in the late 1980s and early 1990s. At AIIM that year, we interviewed document capture/OCR expert Chris Riley about the effects he thought this initiative would have on the OCR industry.

In our April 20, 2007 issue, Riley commented, "The real threat to the commercial OCR market could come from independent developers who decide to take the engine and run with it. The technology’s true power could be unleashed when it is set into motion for a niche type of processing, and fine-tuned to do it well."

For more than three years, we didn't hear a whole lot about people leveraging open source OCR. However, we are currently working on a story about a company called Copanion that has leveraged the Tesseract OCR technology to create a niche SaaS application for capturing data from tax forms. Based on the number of forms they processed, we're estimating their run rate for the 2010 tax season was around $3 million, and they expect to surpass $10 million for the 2011 tax season.

Granted, they use a lot of their own proprietary algorithms on top of the Tesseract OCR, but it's kind of cool what they are accomplishing. For more, check out this week's premium issue of DIR.


Nightingalemd EMR said...
Anonymous said...
Anonymous said...
DIReditor said...

Copanion CEO Ed Jennings did not indicate that the most popular product was the Organizer, or $3, product. He said something along the lines of "most commonly, customers will purchase all three of our services as a package." Because this package nets them $30 per return, I multiplied that by 100,000 to get my $3 million revenue figure. But that may be a bit optimistic.

For reference, who are the competitors you're talking about?

I will send your comments over to Jennings to see if he wants to comment on the privacy aspects. I assume Copanion is just sending snippets of data, so I'm not sure what sort of restrictions apply.