Thursday, October 27, 2005

More on Yahoo! book scanning

Here's an interesting report on the Internet Archive-hosted book scanning party held this week in San Francisco. Apparently, Microsoft has gotten involved -- anything to counteract Google. Also, if you just want to cut to the chase, here's an excerpt on how the book scanning is apparently being handled:

"While Google has released few details of its scanning project (the search company has nondisclosure agreements with its library partners), the Internet Archive had a display of its technology at the Tuesday night event.

The Internet Archive has built a specialized scanning machine and written open-source software called Scribe for the specific purpose of digitizing books. The 'machine' is an assembly of a standard PC with the Scribe software installed, two Canon EOS cameras, a pedal-operated glass and metal stand to hold and secure books at an angle, along with a table and chair. The machine looks much like a photo or voting booth, with black cloth covering a box frame and shielding the books and computer gear from ambient light.

The chair seats one person, who operates the computer program and turns book pages by hand. During the scanning process, the book sits at a 90-degree angle under glass, which protects it from the camera light and causes the least amount of damage to its pages, according to the Internet Archive. The operator pushes a pedal under the table to release the book from under the glass, and turns the page before it's ready to take another picture.

Once a picture is taken, both pages of the book appear on a computer screen in their original form. The Scribe software then finds the center of the page and makes adjustments of the picture's angle or ensures that it's cropped properly. It will also clean up any poor coloring and make it uniform.

The operator enters some metadata about the book--its author, title and publication date. And once the book is scanned, it's then saved to the system and catalogued. Scribe takes the metadata from the book and matches it with data from existing card catalogs in order to prevent duplication. The work is then added to the digital record.

It takes roughly one hour to scan two 300-page books. And it costs an estimated 10 cents a page, split among data storage, labor and equipment and administration fees, according to Brewster Kahle, the project's leader. The cost does not take into account libraries' fees for getting the book to the scanners.

Daniel Greenstein of the University of California's archive project said that his group has donated $500,000 to assess the ultimate costs of scanning from the libraries' perspective.

The Internet Archive currently has 10 scanning machines, but it is ramping up to build 10 more in the next year."
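
For what it's worth, here is a quick back-of-the-envelope sketch of what the figures in that excerpt imply. It assumes the quoted numbers hold exactly and that every machine runs at the same rate with no downtime, which is surely optimistic. The function names are just ours for illustration, and the cost leaves out the libraries' own fees, as the excerpt notes.

```python
# Back-of-the-envelope numbers taken from the excerpt above.
PAGES_PER_BOOK = 300
BOOKS_PER_HOUR = 2          # "roughly one hour to scan two 300-page books"
COST_PER_PAGE_CENTS = 10    # "an estimated 10 cents a page"


def pages_per_hour(machines: int = 1) -> int:
    """Pages scanned per hour, assuming every machine keeps the quoted pace."""
    return machines * BOOKS_PER_HOUR * PAGES_PER_BOOK


def seconds_per_page() -> float:
    """Average operator time per page on a single machine."""
    return 3600 / pages_per_hour()


def cost_per_book_dollars(pages: int = PAGES_PER_BOOK) -> float:
    """Estimated scanning cost for one book, excluding library handling fees."""
    return pages * COST_PER_PAGE_CENTS / 100


if __name__ == "__main__":
    print("pages/hour, 1 machine:  ", pages_per_hour())
    print("pages/hour, 10 machines:", pages_per_hour(10))
    print("pages/hour, 20 machines:", pages_per_hour(20))
    print("seconds per page:       ", seconds_per_page())
    print("cost per 300-page book: ", cost_per_book_dollars())
```

So one machine works out to about 600 pages an hour, or a page every six seconds or so, and a 300-page book to about $30 at the quoted rate.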

I guess our question is, with all the cool book scanning technology out there that we saw at AIIM, why? I guess we'll need to call Brewster Kahle to find out.


Friday, October 21, 2005

New Yorker project

If you're interested in magazine scanning, you might want to give this a listen. It's an NPR interview (5 minutes) with the project leader for the New Yorker's eight-DVD DjVu set.


Other imaging stocks

Oh yeah, related to the EMC deal, I love this post on Captiva's Yahoo! message board. This sector is heating up. I just talked to a financial analyst who was all bummed because now that Captiva is off the market he needs to find someone else to cover. The problem with ScanSoft, of course, which is now known as Nuance, is that the majority of its revenue is from voice recognition. So, unless that poster is talking about the company spinning off and selling its imaging business to raise money for the speech business... I guess it could happen, but I don't think it would cause a huge jump in the stock.

Anyhow, also got a couple of odd press releases - or maybe it was just the timing that was odd. One is that eCopy will be showing at Documentum's Momentum event. Remember that article I wrote about the ideal capture company commanding both ad hoc and batch product lines - could EMC buy eCopy?

Also, Datacap just announced it was integrating with the latest version of AIX - the old OTG document imaging application. Announced it today of all days. What's up?

Finally, looks like imaging-related stocks like FileNET, Dicom, and Plasmon are up today, and even Stellent and Xerox are up a little. ACS is way up too, but for different reasons.

Reynolds role at EMC

Of course the big news in the industry is that EMC has announced its intention to acquire Captiva for $275 million cash. Interestingly, a day following that announcement, Joe Tucci, EMC's president and CEO, was appointed Chairman of EMC's board. I had understood that under SOX, that sort of single-person triumvirate was frowned upon. Not that EMC has traditionally cared what anybody has thought...

But here's the thing - even though Reynolds got cash and would seem to have every reason to step out - wouldn't he be the perfect guy to help turn EMC around? Look what he did at Captiva. We're talking less than a dollar per share just a few years ago - now he sells it for $22.25 per share. Great work. Reynolds made Captiva into a money-making machine. Could he do the same thing at EMC? It's obviously a much larger business... Anyway, I'm all in favor of it - after all, we did award him our DIR Man of the Year back at AIIM.

Carry on,


Tuesday, October 11, 2005

Kodak Digital Assets

This is an interesting story on the state of things at Eastman Kodak. It discusses the struggles the company has had in making its transition to digital. Of course, the document imaging part of the business, which we cover, was years ahead of the rest of the company in going digital with its scanners, but still has struggled to wean itself off microfilm revenue. And there has been some disastrous software activity as well.

Anyhow, the most interesting part of this story appears at the end when they discuss the four distinct reporting units for Kodak starting next year. That will give us a clear picture of who is making money and who is not at Kodak - setting the table for potential spin-offs and acquisitions. Kodak has assembled quite an interesting portfolio of printing businesses to surround its document scanning technology. It will be great to see how the business model pans out. It's my guess that it will be the most successful of Kodak's four business units. Then, what do you do with it?

Stay tuned.

Wednesday, October 05, 2005

Microsoft PDF

It seems like after all the huffing and puffing about Metro, aka .XPS, Microsoft has come out in support of PDF. Speculation that we've seen is that the move is being driven by the State of Massachusetts' recent announcement that it will go to non-proprietary document formats in all its state offices by the year 2007. According to eWeek, "As part of this new policy, the state will support the newly ratified Open Document Format for Office Applications, or OpenDocument, and PDFs (portable document format) as the standards for its office documents." That would seem like a bit of a rash and quick decision to us - but perhaps Microsoft is scared of getting dumped by other state offices in favor of Open Source and PDF.

One thing is for sure, we were a bit surprised when Microsoft announced Metro because we thought they would just leverage ScanSoft's rapidly improving PDF tools to create PDF in Office and really put some crunch on Acrobat. Don't know if they've leveraged ScanSoft, but they've definitely put PDF creation in Office.