Google Operating System: Unofficial news and tips about Google


Sunday, 16 December 2007

Google Is All About Large Amounts of Data

Posted on 14:34 by Unknown

In a very interesting interview from October, Google's VP Marissa Mayer confessed that having access to large amounts of data is in many instances more important than creating great algorithms.
"Right now Google is really good with keywords, and that's a limitation we think the search engine should be able to overcome with time. People should be able to ask questions, and we should understand their meaning, or they should be able to talk about things at a conceptual level. We see a lot of concept-based questions -- not about what words will appear on the page but more like 'what is this about?' A lot of people will turn to things like the semantic Web as a possible answer to that. But what we're seeing actually is that with a lot of data, you ultimately see things that seem intelligent even though they're done through brute force.

"When you type in 'GM' into Google, we know it's 'General Motors.' If you type in 'GM foods' we answer with 'genetically modified foods.' Because we're processing so much data, we have a lot of context around things like acronyms. Suddenly, the search engine seems smart like it achieved that semantic understanding, but it hasn't really. It has to do with brute force. That said, I think the best algorithm for search is a mix of both brute-force computation and sheer comprehensiveness and also the qualitative human component."
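The acronym example can be illustrated with a small counting sketch. This is not Google's method, only a toy model of the "brute force" idea: the observations, the `train` and `expand` helpers, and the data layout are all invented for illustration. It picks the expansion that was seen most often next to the other words in the query, and falls back to the most common expansion overall when there is no context.

```python
from collections import Counter, defaultdict

# Hypothetical toy observations: (acronym, nearby words, expansion seen with them).
OBSERVATIONS = [
    ("gm", ["cars", "detroit"], "General Motors"),
    ("gm", ["trucks", "dealer"], "General Motors"),
    ("gm", ["stock", "cars"], "General Motors"),
    ("gm", ["foods", "crops"], "genetically modified"),
    ("gm", ["foods", "labeling"], "genetically modified"),
]


def train(observations):
    """Count how often each expansion appears overall and next to each context word."""
    prior = defaultdict(Counter)       # acronym -> expansion counts
    by_context = defaultdict(Counter)  # (acronym, context word) -> expansion counts
    for acronym, context, expansion in observations:
        prior[acronym][expansion] += 1
        for word in context:
            by_context[(acronym, word)][expansion] += 1
    return prior, by_context


def expand(query, prior, by_context):
    """Return the most frequently observed expansion for the first word of the query."""
    words = query.lower().split()
    if not words:
        return None
    acronym, context = words[0], words[1:]
    votes = Counter()
    for word in context:
        votes.update(by_context.get((acronym, word), Counter()))
    if not votes:
        # No usable context: fall back to the overall counts for this acronym.
        votes = prior.get(acronym, Counter())
    if not votes:
        return None
    return votes.most_common(1)[0][0]


prior, by_context = train(OBSERVATIONS)
print(expand("GM", prior, by_context))        # General Motors
print(expand("GM foods", prior, by_context))  # genetically modified
```

Nothing in the sketch understands what "GM" means. The answers come only from counts, so more observations give better coverage of rare contexts.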

Marissa Mayer admitted that the main reason Google launched its free 411 service was to collect the large amounts of data needed to train speech recognition algorithms.
"You may have heard about our [directory assistance] 1-800-GOOG-411 service. Whether or not free-411 is a profitable business unto itself is yet to be seen. I myself am somewhat skeptical. The reason we really did it is because we need to build a great speech-to-text model ... that we can use for all kinds of different things, including video search.

"The speech recognition experts that we have say: If you want us to build a really robust speech model, we need a lot of phonemes, which is a syllable as spoken by a particular voice with a particular intonation. So we need a lot of people talking, saying things so that we can ultimately train off of that. ... So 1-800-GOOG-411 is about that: Getting a bunch of different speech samples so that when you call up or we're trying to get the voice out of video, we can do it with high accuracy."

Peter Norvig, director of research at Google, seems to agree. "I have always believed (well, at least for the past 15 years) that the way to get better understanding of text is through statistics rather than through hand-crafted grammars and lexicons. The statistical approach is cheaper, faster, more robust, easier to internationalize, and so far more effective." Google uses statistics for machine translation, question answering, spell checking and more, as you can see in this video. The same video explains that the more data you have, the better your AI algorithm will perform, even if the algorithm itself isn't the best one available.
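To make the statistical approach concrete, here is a minimal sketch of frequency-based spell checking, one of the uses listed above. It is an illustration under stated assumptions, not Google's implementation: the `edits1` and `correct` helpers and the tiny word-count table are hypothetical. The idea is to generate every string one edit away from the typed word and keep the candidate that occurs most often in the text the counts were built from.

```python
import string
from collections import Counter


def edits1(word):
    """All strings one deletion, transposition, replacement or insertion away from word."""
    letters = string.ascii_lowercase
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [left + right[1:] for left, right in splits if right]
    transposes = [left + right[1] + right[0] + right[2:]
                  for left, right in splits if len(right) > 1]
    replaces = [left + c + right[1:] for left, right in splits if right for c in letters]
    inserts = [left + c + right for left, right in splits for c in letters]
    return set(deletes + transposes + replaces + inserts)


def correct(word, counts):
    """Return the most frequent known word within one edit, or the input if none is known."""
    word = word.lower()
    if word in counts:
        return word
    candidates = [w for w in edits1(word) if w in counts]
    if not candidates:
        return word
    return max(candidates, key=lambda w: counts[w])


# Hypothetical stand-in for word counts gathered from a very large corpus.
text = "the search engine uses statistics and the statistics improve the search"
counts = Counter(text.split())

print(correct("serch", counts))  # search
print(correct("teh", counts))    # the
```

There is no dictionary of rules here, only counts. A larger corpus means more known words and more reliable frequencies, which is the sense in which more data improves the result without changing the algorithm.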

Peter Norvig says that Google developed its own speech recognition technology. "We wanted speech technology that could serve as an interface for phones and also index audio text. After looking at the existing technology, we decided to build our own. We thought that, having the data and computational resources that we do, we could help advance the field. Currently, we are up to state-of-the-art with what we built on our own, and we have the computational infrastructure to improve further. As we get more data from more interaction with users and from uploaded videos, our systems will improve because the data trains the algorithms over time."

Google is in a privileged position: it can gather large amounts of data that could be used to improve its other services.
Posted in Voice Search | No comments



