Google Operating System Unofficial news and tips about Google

Wednesday, 29 August 2007

The Quality of Google Book Search

Posted on 14:29 by Unknown

Paul Duguid wrote an interesting article about Google Book Search in which he analyzed the quality of the indexed editions and of the search results by searching for Laurence Sterne's "Tristram Shandy", an 18th-century novel. Mr. Duguid noticed that the Harvard edition of the book had many quality problems and that some of the text wasn't scanned properly. Google Book Search doesn't distinguish between the volumes of a book, so it's difficult to realize that the Stanford edition is actually the second volume of the book.
Google may or may not be sucking the air out of other digitization projects, but like Project Gutenberg before, it is certainly sucking better–forgotten versions of classic texts from justified oblivion and presenting them as the first choice to readers. (...) The Google Books Project is no doubt an important, in many ways invaluable, project. It is also, on the brief evidence given here, a highly problematic one. Relying on the power of its search tools, Google has ignored elemental metadata, such as volume numbers. The quality of its scanning (and so we may presume its searching) is at times completely inadequate. The editions offered (by search or by sale) are, at best, regrettable. Curiously, this suggests to me that it may be Google's technicians, and not librarians, who are the great romanticisers of the book. Google Books takes books as a storehouse of wisdom to be opened up with new tools. They fail to see what librarians know: books can be obtuse, obdurate, even obnoxious things. As a group, they don't submit equally to a standard shelf, a standard scanner, or a standard ontology.

Patrick Leary, the author of the article Googling the Victorians (PDF), has a pragmatic response, as seen on O'Reilly Radar:
Mass digitization is all about trade-offs. All mass digitizing programs compromise textual accuracy and bibliographical meta-data so that they can afford to include many more texts at a reasonable cost in money and time. All texts in mass digitization collections are corrupt to some degree. Everything else being equal, the more limited the number of texts included in a digital collection, the more care can be lavished on each text. Assessing the balance of value involved in this trade-off, I think, is one of the main places where we part company. You conclude, on the basis of your inspection of these two volumes, that the corruption of texts like Tristram Shandy makes Google Books a "highly problematic" way of getting at the meanings of the books it includes. By contrast, while acknowledging how unfortunate are some of the problems you mention, I believe that the sheer scale of the project and the power of its search function together far outweigh these "problematic" elements.

When scanning and indexing millions of books, it's difficult to assess the quality of each edition. Google Book Search's main goal is to let you discover books you can borrow or buy later on. But Google could add an option to rate the quality of each digitized book, or build algorithms that detect flaws or differences between editions. That way, the next time you search for Tristram Shandy, all the editions would be clustered and the best one would come up first.
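To make the clustering idea a bit more concrete, here is a minimal sketch in Python. It is not how Google Book Search works; it only illustrates one possible approach. Everything in it is an assumption made for illustration: the Edition record, the normalize_title and ocr_quality helpers, the "share of clean-looking words" heuristic used as a stand-in for scan quality, and the sample editions, which are invented.

```python
import re
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Edition:
    """Hypothetical record for one scanned copy of a book."""
    title: str        # title as recorded in the catalog
    library: str      # where the scanned copy came from
    text_sample: str  # a sample of the OCR text


def normalize_title(title: str) -> str:
    """Build a grouping key: lowercase, no volume markers, no punctuation."""
    title = title.lower()
    title = re.sub(r"\b(vol|volume)\.?\s*\d+\b", " ", title)
    title = re.sub(r"[^a-z0-9 ]", " ", title)
    return " ".join(title.split())


def ocr_quality(text: str) -> float:
    """Crude quality proxy (an assumption): the share of tokens that look
    like plain words, optionally followed by one punctuation mark."""
    tokens = text.split()
    if not tokens:
        return 0.0
    clean = sum(1 for t in tokens if re.fullmatch(r"[A-Za-z']+[.,;:!?]?", t))
    return clean / len(tokens)


def cluster_and_rank(editions):
    """Group editions by normalized title, best-looking scan first."""
    clusters = defaultdict(list)
    for edition in editions:
        clusters[normalize_title(edition.title)].append(edition)
    return {
        key: sorted(group, key=lambda e: ocr_quality(e.text_sample), reverse=True)
        for key, group in clusters.items()
    }


# Invented sample data, only to show the shape of the result.
SAMPLE = [
    Edition("Tristram Shandy", "Library A", "I w1sh e|ther my f@ther or my mother"),
    Edition("Tristram Shandy, Vol. 2", "Library B", "I wish either my father or my mother"),
]

for work, ranked in cluster_and_rank(SAMPLE).items():
    print(work, [e.library for e in ranked])
```

Note that this sketch drops the volume number when building the grouping key. A real system would have to keep it as separate metadata, since missing volume numbers are exactly what Mr. Duguid complains about. User ratings could also be mixed into the ranking alongside an automatic score.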
Posted in Book Search | No comments