
Showing posts with label OCR/ICR.

Friday, February 28, 2014

I.R.I.S. Partners with Scytl to Create Document Imaging Systems for Elections

I.R.I.S. recently made an interesting announcement about a partnership with Scytl, which develops election management and voting systems. I.R.I.S., which is now owned by Canon Europe, is a developer of document imaging and automatic recognition/data extraction software. The companies recently got together and successfully completed election projects in Ecuador and Honduras.

From the press release, "Scytl looked for a company that could prove efficient extraction technology to complement their offering for the Ecuadorian elections 2013. The request encompassed supporting the election specific process where: the voting slips were gathered in the polling stations and grouped into reports. These were then scanned and processed in a decentralized scenario in 105 scanning centers. With I.R.I.S.’ advanced extraction technologies, Scytl was able to capture the election results from the reports automatically."

This partnership interests me so much because of what I, and several other people, consider to be security concerns associated with the electronic voting systems installed in many U.S. states that don't produce any paper records. Given my experience with document imaging, I don't understand why we don't utilize scanners, as Scytl is apparently doing with the help of I.R.I.S.' technology. A couple of years ago, OMR technology was tried in the NYC area, but several glitches occurred. Perhaps Scytl, which seems to have successfully pulled off two Latin American elections with I.R.I.S.' help, can bring its technology north.

Friday, January 03, 2014

Second Top Story of 2013: ABBYY Beats Nuance in Court

In a case that went on for more than five years, ABBYY finally prevailed in a lawsuit related to OCR patent infringement. The case was heard over two weeks in August in a federal court in San Francisco. ABBYY was awarded a "clean sweep," in the words of its general counsel. "The jury found no infringement on any of Nuance's patent or trade dress claims."

Nuance had been seeking $107 million from ABBYY and its customer Lexmark. Nuance's claims were related to OCR patents that were filed for in the late 1980s and early 1990s and granted in the 1990s and early 2000s. The case started with six patents in question, but was narrowed down to three by the time it came to trial - two of which Nuance had picked up in its 2000 acquisition of Caere.

ABBYY's defense was "non-infringement." “The patents had been through a reexamination, so arguing that they were invalid was not an option,” said LeighAnn Weiland, VP and general counsel for ABBYY USA. “If you look at ABBYY’s methods as compared to the very simplified processes in the patents that Nuance is alleging we violated, it’s very clear that ABBYY is not infringing. What Nuance has patented is analogous to building a bicycle, while we are building jet airplanes.”

There are not that many developers of OCR technology left on the market, but ABBYY's win was certainly a victory for those that are left, as well as end users - including those who utilize Google's Open Source OCR. Had Nuance won, we're assuming it would have gone after patent licensing agreements with everyone else in the market. And while Nuance still could go after patent infringement by other vendors (whose development methods presumably differ from ABBYY's), ABBYY is their biggest and most direct competitor, and this loss will certainly take the wind out of Nuance's sails - in addition to money out of its pockets that could be used for further litigation.

It's probably worth noting that Nuance's legal team for the case included outside counsel from Morrison & Foerster LLP, the same firm that represented Apple in its patent suit against Samsung. “We were up against the best of Silicon Valley,” said an elated Weiland. “It’s quite gratifying that our team could work like a little machine to convince a jury of what we believe are the actual facts of the case."

The verdict brought to an end an important case that we had been covering since 2008.

Sale of Kodak DI - Top Story of 2013

I apologize for being a few days late with this, but I was busy enjoying the holidays with my family, as well as dealing with living in the snowiest city in the U.S. so far this year. And yes, it is certainly coming down as I write the first draft of this post on Thursday evening (we're at 70 inches and counting - more than twice as much snowfall as Anchorage, Alaska!). Anyhow, in between holiday cheer and bouts of flu and cold, I've been going over my annual article index for 2013, which will get published along with my 2014 predictions next week.

Going over the index is a great way to review the top stories and events of the previous year, which inevitably leads to a top 10 list or something like that. For 2013, two stories stood out above all others, with maybe eight more that I thought were fairly significant. Today I'll share with you my top story in the document imaging industry from 2013 and follow up shortly with number 2, and the rest thereafter.

Without further ado, here is a summary of the top story we covered in DIR in 2013:

1. Kodak Document Imaging is acquired by the Kodak U.K. Pension Plan (KPP).

In a deal that was announced in May 2013, one of the leading players in our market was sold to an organization that is roughly the equivalent of an equity investor, but with a longer-term vision. Not coincidentally, KPP, which operates independently of Eastman Kodak, also happened to be Eastman Kodak's largest unsecured creditor. KPP agreed to pay Eastman Kodak $650 million in cash and non-cash considerations for DI and Kodak Personalized Imaging (PI), which combined generated $1.46 billion worth of profitable revenue in 2012. As part of the deal, Eastman Kodak was also relieved of $2.8 billion in claims that KPP had made against the bankrupt organization. So, in all Eastman Kodak received a potential net $3.45 billion for the two businesses, which KPP renamed Kodak Alaris, when the sale was completed in early September.

This brought to an end a saga which began in early 2012, when after months of rumors, Eastman Kodak filed for bankruptcy. Originally, DI was positioned as a "core business" that Eastman Kodak would hang on to in order to help fund its emerging "growth businesses." That changed in August when Eastman Kodak realized it needed to sell more assets to pay off its creditors and get them to agree to the terms of its bankruptcy. Per bankruptcy laws, a formal process was put in place for selling DI that included accepting a "stalking horse" bid that would serve as a starting point in an auction.

The stalking horse bid came in April from Japanese manufacturer Brother, which offered $210 million, plus the assumption of $67 million worth of unfulfilled service contracts for DI. However, that bid was trumped less than two weeks later by KPP's much higher bid for both DI and PI.

Just a few weeks after the sale to KPP was closed, the recently renamed Kodak Alaris DI put on its second annual Global Directions educational conference, where the keynote speaker was Ray Kurzweil, the noted author and futurist and current Google Director of Engineering. The event focused on a more software-centric future for Kodak DI. “In five years, we’d like to have at least one third of our revenue coming from software,” said Tony Barbeau, DI VP, products and services. “It could be higher depending on how much investment the organization makes. We could possibly choose to complement our organic growth through acquisition.”

Barbeau and most of the Kodak DI management team, including President Dolores Kruchten, stayed on through the acquisition, so we don't expect any major shake-ups in the way DI will be doing business going forward. That said, everyone in the organization seemed relieved and somewhat elated that the sale to KPP was completed. They are looking forward to the opportunity to run free of the burden of their failing parent and the increased nimbleness and aggressiveness their new position should bring. We expect more exciting news from Kodak Alaris DI in 2014, but we're not sure if it can top the exciting events of 2013.

Thursday, November 07, 2013

Partnerships Take Technology into New Geographies

This week both NovoDynamics and KnowledgeLake announced interesting partnerships that will help them expand into new geographical markets. In conjunction with the recent GITEX show, held in Dubai, Novo, a recognition technology specialist, announced that ForeFront Technologies, a VAD that focuses on the Middle East and Africa, will be carrying its OCR software. KnowledgeLake, which develops software for document image-enabling Microsoft SharePoint, announced that PFU will be introducing its technology into PFU's ECM practice in Japan.

Novo, which first came onto our radar screen because of its Arabic OCR technology (it currently supports Chinese, Korean, Russian, Spanish, and English languages as well), exhibited at GITEX. "This show covers all areas of IT and expects over 140,000 visitors before the week is over," reported Art Nichols, Novo's VP of Global Sales, who attended the event. "Forefront is a large Fujitsu and Kodak distributor that also sells Kofax and now NovoDynamics NovoVerus."

Georges Mehchi, CFO and Managing Partner for ForeFront, sounded pretty excited about the partnership. As quoted in a press release, “The intelligence that NovoDynamics has built into NovoVerus’ software truly raises the bar for language detection, recognition and data extraction, taking Arabic and multilingual OCR to an unparalleled level! Introducing this technology into Middle Eastern and African markets will be life changing, not only for Arab nations, but globally.”

The KnowledgeLake-PFU partnership was a natural, seeing how the ISV is now a wholly owned subsidiary of PFU. Said Ron Cameron, president of KnowledgeLake, in a press release, “This natural progression of our partnership with PFU will extend their already successful ECM practice to include SharePoint ECM. As SharePoint continues to gain momentum in the Japanese marketplace, we hope this partnership promotes the profile and perception of Microsoft’s platform by providing value around its robust ECM capabilities. We are grateful for this opportunity and I couldn’t think of a more suiting partner in this effort than our parent company, PFU.”

I don't think there is any question that we are truly working in a global economy today. Yes, there are certainly hurdles to be cleared to be successful doing business in multiple countries, but working with strong partners, like the ones that NovoDynamics and KnowledgeLake have chosen, represents a great way to clear these hurdles.

Tuesday, August 27, 2013

Jury Rules in Favor of ABBYY, Lexmark, in OCR Patent Trial

(Some updates since first post)
The long-running OCR patent lawsuit filed by Nuance against ABBYY and Lexmark is finally over. Yesterday, a jury in U.S. District Court in San Francisco ruled unanimously in favor of ABBYY and its partner Lexmark. It ruled that neither company owes Nuance anything in damages related to patent or trade dress infringement.

The way I understand it, Lexmark, which manufactures printers and MFPs, was a partner of Nuance, but at some point prior to 2008, when Nuance filed the suit (I guess the suit was originally filed in Wisconsin in 2002, but moved to California in 2008), Lexmark switched out its bundled Nuance OCR technology in favor of ABBYY's. Nuance accused both Lexmark and ABBYY of attempting to create packaging that resembled Nuance's, and also accused ABBYY of violating six patents, including ones that Nuance picked up in its 2000 acquisition of Caere. ABBYY promptly filed a countersuit, accusing Nuance of violating two of its patents, as well as violating antitrust law. The whole thing was combined in one trial in the court of Judge Jeffrey S. White.

In 2009, eCopy and its OCR partner I.R.I.S. were dragged into the suit, but that was apparently resolved when Nuance acquired eCopy later that year and replaced the I.R.I.S. technology with its own.

Apparently before the case went before a jury, in a trial that started earlier this month, it was narrowed down to three patents.

I've read Nuance's OCR patents and they are pretty broad based - meaning that if ABBYY were found in violation of them, it could have affected everyone else developing (and licensing non-Nuance) OCR technology. So, this decision should have many people in the document imaging market breathing a collective sigh of relief. 

No word yet on whether Nuance plans to appeal (if it can) and/or whether it will go after anyone else for patent infringement related to OCR. We expect to talk with ABBYY reps later today and I know Nuance is planning on issuing a statement. We'll keep you posted as more news on this develops.

Tuesday, June 11, 2013

eDiscovery Dispute Highlights ABBYY-Nuance OCR Patent Suit

This is kind of ironic (and thanks to our friends at Harvey Spencer Associates for pointing us to this article): it seems ABBYY has been ordered to pay Nuance $135,000 in a dispute over document discovery. Yes, it seems the two OCR ISVs are arguing over the exchange of documents.

U.S. District Judge Jeffrey White, whose office is in San Francisco where Nuance's lawsuit against ABBYY and Lexmark (Lexmark licenses ABBYY technology) over OCR patent infringement is being heard, ordered "Abbyy to pay Nuance $135,000 in sanctions for taking so much time handing over requested documents that the court reopened discovery and Nuance retook depositions that had already been completed."

Also from the article "ABBYY contended that it disclosed the information late because it was tied up responding to 'Nuance's multiple other discovery requests seeking massive amounts of irrelevant information,' but Judge White didn't buy that excuse. 'The court does not find the delay in production justified considering the scope of this case and the sheer amount of lawyering and the parties' investment of time and effort,' he said."

Alright, so it appears the game is on. This case has been in court since 2008, but it seems some headway is finally being made. According to the article, "All the parties were ordered to attend a settlement conference to be held no later than July 5."

If you remember, one of the predictions we made in DIR to start the year was "Some major market developments driven by ongoing patent lawsuits." Stay tuned.

Monday, May 06, 2013

Is ICR Technology Underutilized?

Don Dew of advanced document recognition ISV Parascript recently authored an article on the benefits of intelligent character recognition (ICR) technology that he has asked us to share with you. It discusses how ICR technology is underutilized on documents that include handprint and even cursive writing.

Here's an excerpt from Dew's article, "From name, address, social security number, phone number, or any other unconstrained or cursive information entered on a form, advanced ICR solutions can capture this data with a high degree of accuracy and make it available for use within the organization. Based on research performed by AIIM and Parascript last year, only 6% of organizations are automating this level of recognition. At the same time, survey participants estimated that they would achieve a considerable level of productivity savings if they were able to automate the recognition of hand-written text."

Here's the link to Dew's complete article on our DIR Web site.

Just for some background, Parascript develops advanced recognition technology that does both OCR and ICR. Historically, it has been best known for its cursive recognition, which is utilized by the USPS in their envelope and parcel sorting operations. Parascript also offers a document recognition-centric toolkit that includes both OCR and ICR. Parascript recently released a new version of this SDK, as well as a new version of its FormXtra Capture application, which it is moving through a recently revamped reseller program.

Monday, March 04, 2013

OCR/ICR Survey Highlights Infographic

We originally highlighted the results of this AIIM survey (sponsored by Parascript) when it first came out last summer. (Download the entire report here).

Some interesting takeaways, some of which have been highlighted below in an outstanding infographic:
  • 55% of respondents who were scanning documents were key entering data
  • Only 32% were using OCR, with ICR and cursive recognition utilization significantly lower
Check out the rest of this info:

Thursday, January 10, 2013

Zagami Contracts with BeyondRecognition

Back in September, we did a story on an innovative classification and full-text indexing operation out of Tennessee called BeyondRecognition. Basically, its claim to fame was having successfully indexed 2.3 billion images that were given to it in boxes full of CDs and DVDs with little-to-no indexing information attached to them. BR used some glyph scraping and matching and threw some other semantic type understanding into the mix and successfully completed the project.

BR has productized its technology and is marketing it to the legal space, for help with discovery, as well as anyone else that requires classification and grouping of large volumes of documents. BR also can incorporate innovative data extraction techniques.

BeyondRecognition recently announced that it has signed on former AIIM and TAWPI Chair Bob Zagami as a member of its Advisory Board. Zagami is a veteran of the document conversion services market, most recently serving as an executive with DataBank IMX. Zagami will also act as an authorized sales agent for BR, with the intent to focus on large-scale document management processes for Fortune 500 companies. Read the complete press release.

Wednesday, September 19, 2012

Canon to Acquire I.R.I.S.

Canon, working through its subsidiary Canon Europe, has made a bid to acquire Belgian capture ISV and systems integrator I.R.I.S. The two companies have been partners since Feb. 2009, when Canon Europe became a reseller of I.R.I.S. products. A few months later, Canon followed up by buying a 17% stake in I.R.I.S.

I.R.I.S. is probably best known in North America for its OCR/ICR software. Several big-name companies like Adobe, HP, and Evernote license I.R.I.S.'s technology in this area.

I.R.I.S. also has a batch capture product - which has its roots in software it formerly licensed to Kodak through an OEM agreement (Kodak Capture, the predecessor to Kodak's current Capture Pro software). In 2008, I.R.I.S. acquired German IDR ISV Docutec, and it markets a document classification and extraction product, IRISXtract, based on the Docutec technology. I.R.I.S. recently ramped up its North American efforts around Xtract, which include licensing Xtract to Salumatics, a Canadian outsourcing firm that is using the technology to capture healthcare patient records.

I.R.I.S. has several other software products and some hardware, like mobile scanners and a pen scanner, as well. I.R.I.S. also has an ECM systems integration/professional services business that mainly operates in the Benelux region. This integration business has historically accounted for more than half the company's revenue.

For 2011, I.R.I.S. reported revenue roughly the equivalent of $158 million, but it also went through a reorganization last year. For the first half of 2012, I.R.I.S. revenue was down 33% to around $58 million, but its EBITDA (a rough proxy for cash flow from operations) was actually improved over 2011.

Commented Denis Hermesse, CFO of I.R.I.S. Group, “We have seen a shift in our revenue mix with an increase in revenue from license, maintenance and services (including system performance and remote monitoring) and less hardware sales with low margin.”

The deal
The offer Canon has made is for EUR 44.50 per share, or the equivalent of $92 million for the remaining 83% of I.R.I.S. This represents a 50% premium over what I.R.I.S. shares were trading for, before trading was suspended as the deal works itself through. It values I.R.I.S. at around $111 million, which is considerably less than the $184 million valuation related to Canon's $31 million investment in 2009.
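
For anyone who wants to check the math, the implied valuations come from dividing what was paid by the fraction of the company it bought. Here is a minimal Python sketch of that arithmetic using the rounded dollar figures quoted above; the function name `implied_valuation` is purely illustrative.

```python
def implied_valuation(price_paid_musd: float, stake_fraction: float) -> float:
    """Scale the price paid for a partial stake up to 100% of the company.

    price_paid_musd is in millions of U.S. dollars; stake_fraction is the
    share of the company acquired, expressed between 0 and 1.
    """
    if not 0 < stake_fraction <= 1:
        raise ValueError("stake_fraction must be in (0, 1]")
    return price_paid_musd / stake_fraction


# 2012 bid: roughly $92 million for the remaining 83% (rounded figures)
bid_2012 = implied_valuation(92, 0.83)
# 2009 investment: roughly $31 million for a 17% stake (rounded figures)
stake_2009 = implied_valuation(31, 0.17)

print(f"2012 bid implies about ${bid_2012:.0f} million")
print(f"2009 stake implied about ${stake_2009:.0f} million")
```

With these rounded inputs the 2012 bid works out to about $111 million and the 2009 stake to about $182 million. The small gap from the $184 million cited above presumably comes from rounding in the stake size, the price, or the exchange rate.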

In 2009, I.R.I.S. was coming off a 2008 in which it reported an EBITDA of Euro 9 million on revenue of Euro 108 million. Based on the first half, 2012 EBITDA projects to Euro 7 million on revenue of Euro 85 million.

Commented I.R.I.S. CEO Pierre de Muelenaere in a press release, “We are very pleased to have reached this important milestone for I.R.I.S. Group, and proud that Canon intends to bring our company within the Canon group. The entire board of I.R.I.S. Group fully supports this bid and we are committed to making this transition a success, which we believe will be to the benefit of our customers and all our stakeholders.”

For Canon, the move represents part of the overall trend of MFP manufacturers moving more toward software and solutions. Commented Rokus van Iperen, President & CEO, Canon EMEA, "Canon has identified business solutions and professional services as important focus areas for future growth and we believe this investment will bring long term opportunities to build on our success in the solutions and consultancy businesses to date. We will be working closely with I.R.I.S. Group, as a stand-alone company, to deliver more advanced solutions and services and greater customer value.”

For the record, "More acquisitions of Capture/DM/BPM ISVs by Hardware Vendors" was one of the six predictions for 2012-2013 I made at the Harvey Spencer Associates Capture Conference two weeks ago.

Friday, August 31, 2012

A Data Capture Systems Book Review

Yes, Dr. K. Bradley Paxton of ADI (for Advanced Document Imaging) has written a fairly comprehensive book on implementing and maintaining forms processing systems. Paxton spent 32 years at Kodak and is probably best known in our industry for his work encouraging the U.S. Census Bureau to adopt digital imaging technology. He has plenty of experience in our market and it is certainly leveraged in this comprehensive book.

Here's my complete review of his book on the Amazon page. The title is Handprint Data Capture in Forms Processing: A Systems Approach, but it's really about automating any type of document capture, from OMR to OCR to handprint. Paxton offers plenty of sound advice on how to really make your system hum - and then how to make adjustments to help it keep humming going forward.

As I say in the review, some of the statistical formulas went right over my head, but there is plenty of valuable stuff in there. It's probably the most comprehensive, neutral (meaning non-vendor) piece I've ever read on implementing automated data capture for documents. Can't see how this could not provide an ROI for anyone doing any volume of capture.

Wednesday, August 15, 2012

Thoughts on Parascript/AIIM Forms Processing Study & White Paper

There is a lot of interesting information in a recent study conducted by AIIM and sponsored by Parascript.

Parascript develops a slew of recognition technology including handprint and cursive recognition. Not surprisingly, a follow-up article written by Parascript's Don Dew highlights some of the shortcomings in adoption of handprint/cursive recognition. According to Dew, "In most organizations, hand-written fields are prevalent on a significant number of forms. 42% of respondents indicated they have hand-written data fields on half or more of their forms. In addition to being prevalent, these hand-written forms are also important to the efficiency of the business process. 40% of respondents say they are quite important; 20% say they play a key role."

"However, many organizations are not taking advantage of this information. 88% of respondents say they scan forms, but only 32% say they perform text recognition to automatically make that data readily available for use in their organizations. The majority of respondents (55%) report they scan images and manually re-key the data as part of their workflow."

Of course, this is where Parascript's technology could come in, or the crowdsourcing data-entry solutions from companies like virtualsolutions and Captricity, which were featured in our last premium issue. Some combination of the two may actually form the most efficient solution.

Another interesting point made in the study is that users cited a multitude of forms as the number one reason that they are not using forms processing technology - in other words, they feel the templates are too hard to set up. This should be interesting news to companies like ITyX, a German artificial intelligence vendor that DIR was recently introduced to.

Anyhow, there is a lot of interesting stuff in this white paper about forms processing adoption, what end users are implementing it for, and why they are not in certain areas.  The bottom line to me seems to be that parochial/departmental management of many forms capture operations prevents users from looking at the top tier capture automation solutions out there. They just don't have the bandwidth to consider the cutting edge technology that is most often included in enterprise capture applications. SaaS/Cloud services may prove to be the way around this.

Tuesday, December 27, 2011

Scannx Licenses ABBYY SDK

Scannx has signed an agreement to license ABBYY's FineReader SDK. Scannx is an ISV focused on providing cloud services for scanning and capture. The ABBYY license is for traditional OCR/ICR software that will initially be incorporated in the book scanning software that is going to be bundled with the new Xerox book scanners that are being marketed by Scannx.

From the press release: "We are integrating ABBYY’s technology into our software and bundling it into a self-service book scanning center for library patrons and staff”, said John C. Dexter, president and COO of Scannx. “The book scanning center includes a 15-inch touchscreen computer preloaded with the Scannx and ABBYY software, and a patented book-edge scanner to protect the spine of the book. The scanner’s beveled edge enables students to scan to the edge of the book spine, producing clear and legible text in the center of the book. When the clear image is converted into searchable or editable text by ABBYY’S FineReader technology, the result is unmatched accuracy. Moreover, our implementation for OCR conversion is more than twice as fast as competitive systems.”

Tuesday, October 11, 2011

PFU Invests in ABBYY

PFU Limited has reportedly invested about $100 million in ABBYY. PFU is a Fujitsu subsidiary which manufactures the scanners sold by Fujitsu subsidiaries worldwide, including FCPA. ABBYY is a developer of recognition technology including OCR/ICR and IDR used in document imaging applications.

This continues a trend of document imaging hardware vendors investing in software ISVs over the past few years. Of course, HP's acquisition of Autonomy is the most recent example, with Lexmark acquiring Perceptive in 2010. PFU has also invested heavily in KnowledgeLake.

The investment reportedly values ABBYY's business at around $2 billion, as it is supposed to be for about a 5% stake. Not sure what ABBYY's annual revenue is, but as Harvey Spencer values the worldwide document capture market at somewhere south of $2.5 billion, with ABBYY holding less than a 10% share of that...well, PFU seems to have paid a pretty good premium for its investment, which is certainly in line with what we've been seeing from hardware vendors looking to diversify into software.

Friday, September 30, 2011

ABBYY Signs Reseller Agreement with Xerox

Xerox will now be selling ABBYY's FlexiCapture document and data capture and Recognition Server OCR and PDF software, in connection with its DocuShare Web-based document management platform. DocuShare was introduced at least 10 years ago, primarily as an electronic document repository, and Xerox has added records management and workflow features in recent years. With the ABBYY software, Xerox is adding intelligent data capture and server-based OCR to the mix.

Xerox represents ABBYY's largest reseller partner. More on this in an upcoming premium issue of DIR.

Wednesday, April 13, 2011

Kofax Releases New Version of Express

Kofax has announced a new version of its SMB/departmental batch capture software. Kofax Express 2.5 includes automated data capture improvements, as well as certification with SharePoint 2010. On the data capture front, users can now do zonal OCR as well as capture fields by rubber-banding text on images.
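
For readers unfamiliar with the term, zonal OCR means running recognition only on predefined rectangular regions of a page rather than on the whole image. Rubber-banding is essentially the interactive version of the same thing, with the operator drawing the rectangle on screen. The Python sketch below illustrates the idea only and is not how Kofax Express is implemented: the page is a toy grid of characters standing in for pixels, and `Zone`, `crop`, `read_zones`, and the stub recognizer are all hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Zone:
    """A named rectangle on the page (right and bottom are exclusive)."""
    name: str
    left: int
    top: int
    right: int
    bottom: int


def crop(page: List[str], zone: Zone) -> List[str]:
    """Cut the zone's rectangle out of the page grid."""
    return [row[zone.left:zone.right] for row in page[zone.top:zone.bottom]]


def read_zones(
    page: List[str],
    zones: List[Zone],
    recognize: Callable[[List[str]], str],
) -> Dict[str, str]:
    """Run the recognizer on each zone only, returning field name -> text."""
    return {zone.name: recognize(crop(page, zone)) for zone in zones}


def stub_recognizer(region: List[str]) -> str:
    """Stand-in for a real OCR engine: just tidies the toy character grid."""
    return " ".join(row.strip() for row in region if row.strip())


# Toy "scanned" form: labels padded so every value starts at column 12.
page = [
    "INVOICE NO:".ljust(12) + "10482",
    "DATE:".ljust(12) + "2011-04-13",
    "TOTAL:".ljust(12) + "$250.00",
]

zones = [
    Zone("invoice_no", left=12, top=0, right=24, bottom=1),
    Zone("total", left=12, top=2, right=24, bottom=3),
]

fields = read_zones(page, zones, stub_recognizer)
print(fields)
```

In a real product the recognizer would be an OCR engine and the zone coordinates would come from a form template or from the rectangle the operator drags out on the image.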

Monday, January 31, 2011

OCR Accuracy Benefits

One of the often under-rated benefits of automated data capture technology is the accuracy improvement it can bring to a process. I was just typing a colleague's business card info into Outlook (yes, I still sometimes use Outlook), and I accidentally typed the fax number into the field for cell phone. Luckily I caught it and saved myself the pain of hearing a loud screeching sound when I thought I was dialing a mobile phone number. As I typed in the correct number, I realized that would not have happened had I scanned the card and used my OCR-enabled business card software, which I typically use for batch capture only.

So, yes, while we usually like to stress the productivity benefits - less manual keying - of OCR applications, there are certainly accuracy benefits as well, that can potentially further contribute to the ROI - how much depends on the value of your downstream data. Which brings me to a second instance of inaccurate data entry that I ran into this weekend: My wife had recently purchased a memorial service (I don't know if purchased is the right word, but...) for her father-in-law in our family's name. Well, wasn't it quite a surprise, when I received the church bulletin this week and saw the service was being held in memory of me! Whoops. The flowers have certainly brightened up the house though.

Monday, November 15, 2010

Latest on ABBYY-Nuance Lawsuit

Apparently, ABBYY's parent company is back in the lawsuit being brought by Nuance over OCR patent infringement. According to this blog post, both ABBYY's Cyprus and Russian locations, which had previously been dismissed from the suit, have been ruled back in play. The case, which was originally filed in 2008, is currently being played out in United States District Court for the Northern District of California (case no. 08-CV-2912) under Judge Jeffrey S. White.

Wednesday, November 10, 2010

CVision Releases PDF Compressor 5.0 with SuperFast OCR

CVision has leveraged its pattern recognition expertise to develop what it terms as "Super Fast" OCR in the latest version of its PDF Compressor software. The software is named "PDF Compressor" because of its ability to create very small PDF files from scanned images. This is achieved through the application of segmenting and JBIG compression.

In PDF Compressor 5.0 CVision, an ISV based in Queens, NY, utilizes inter- and intra-page font learning to accelerate the OCR process. According to CVision founder and CEO Ari Gross, "We can apply OCR 10 times faster than [a leading OCR vendor], with the same accuracy."

PDF Compressor 5.0 also features a "Super Accurate" mode that leverages font learning. In this mode, CVision boasts a 5-10% accuracy increase over leading OCR engines. This is based on word accuracy.
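
CVision doesn't spell out how it measures word accuracy, so the following is only one common way to define it, sketched in Python: the share of words in a ground-truth transcript that also show up, in order, in the OCR output. The function name and sample strings are made up for illustration.

```python
from difflib import SequenceMatcher


def word_accuracy(reference: str, recognized: str) -> float:
    """Fraction of reference words that the OCR output reproduced in order."""
    ref_words = reference.split()
    rec_words = recognized.split()
    if not ref_words:
        # Nothing to recognize: perfect only if the engine also returned nothing.
        return 1.0 if not rec_words else 0.0
    matcher = SequenceMatcher(a=ref_words, b=rec_words, autojunk=False)
    # Sum the lengths of all runs of words that match between the two lists.
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / len(ref_words)


truth = "the quick brown fox jumps"
ocr_output = "the qu1ck brown fox jumps"  # one misread word out of five
print(f"word accuracy: {word_accuracy(truth, ocr_output):.0%}")
```

Note that a "5-10% increase" could mean percentage points or a relative improvement; the claim as quoted doesn't say which, and the two readings can differ quite a bit when the baseline is already high.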

It's probably important to note that CVision does not develop its own OCR technology. Rather, for PDF Compressor, it licenses Nuance's OCR toolkit and improves upon it with its proprietary technology.

PDF Compressor is a mature product with an install base in the thousands. There are versions available that can be integrated into the workflows of leading capture platforms from Kofax, EMC Captiva, and Oracle.

Wednesday, October 06, 2010

GruntWorx Leverages Tesseract OCR

I thought this was pretty cool. Remember, in 2007 Google announced it was launching an open source OCR project based on the Tesseract Code, which was developed by HP in the late 1980s and early 1990s. At AIIM that year, we interviewed document capture/OCR expert Chris Riley on what he thought would be the effects of this initiative on the OCR industry.

In our April 20, 2007 issue, Riley commented, “The real threat to the commercial OCR market could come from independent developers who decide to take the engine and run with it. The technology’s true power could be unleashed when it is set into motion for a niche type of processing, and fine-tuned to do it well.”

For more than three years, we didn't hear a whole lot about people leveraging open source OCR. However, currently we are working on a story on a company called Copanion that has leveraged the Tesseract OCR technology to create a niche SaaS application for capturing data from tax forms. Based on the number of forms they processed, we're estimating their run rate for the 2010 tax season was around $3 million and they are expecting to surpass $10 million for the 2011 tax season.

Granted, they use a lot of their own proprietary algorithms on top of the Tesseract OCR, but it's kind of cool what they are accomplishing. For more, check out this week's premium issue of DIR.