Google Operating System Unofficial news and tips about Google


Monday, 22 October 2007

Remove Spam from Google Blog Search

Posted on 04:02 by Unknown
Google Blog Search doesn't have many interesting features, but I still use it more often than Technorati: it's faster, it isn't down for hours at a time, it's much more comprehensive, and it has features not available in any other major blog search engine. I still use Technorati for finding backlinks, because Google does a poor job in this area (compare Technorati with Google Blog Search). Unfortunately, Google Blog Search indexes a lot of spam posts that steal content and use it to make money.

Google has two features that reduce the number of splogs (spam blogs) in search results. As in web search, there's a duplicate filter that removes some of the posts that are almost identical. But it doesn't exclude all of them, and it doesn't catch posts that duplicate articles from news sites like Business Week.


The second feature is the option to sort results by relevancy, which is enabled by default. It may seem counterintuitive to sort blog search results by relevancy rather than chronologically, but that's a great way to filter out splogs or at least move them to the bottom of Google's search results. Google uses a lot of signals to rank blog posts, including PageRank, the number of feed subscriptions and the amount of duplicate content. But if you sort the results by relevancy, you'll get a mix of recent and old posts, and that's not always what you want. A better way is to restrict the results to a recent period of time in the sidebar (the last day or the last hour, depending on the volume of posts).


If you see a "References" link after the snippet, that's an indication that Google found (a significant number of) backlinks, so the result should be a little more reliable.

Many blogs use Google Alerts to pollute the web and make money, so you could also add [-"google alert"] to your query (a search for "google alert" returns more than 200,000 results). A lot of spam blogs are hosted by Google's Blog*Spot, so removing the posts from blogspot.com could increase the quality of your results, but it would also remove non-spammy blogs like this one or Google's official blogs. I also noticed that many spam blogs use the .info TLD. A recent study showed that, when searching for commercial keywords, 75% of the results from blogspot.com and 68% of the results from .info sites are spam.
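If you collect result URLs yourself, the same two host patterns can be checked locally instead of in the query. This is only a minimal sketch: the function name, the allowlist idea and the sample URLs are my own assumptions, not something Google Blog Search offers. The allowlist is there because, as noted above, plenty of legitimate blogs live on blogspot.com.

```python
from urllib.parse import urlparse

# Host patterns mentioned in the post. Matching them is a heuristic,
# not proof that a blog is spam.
SUSPECT_SUFFIXES = (".blogspot.com", ".info")


def is_suspect(url, allowlist=frozenset()):
    """Return True if the URL's host matches a spam-prone pattern
    and is not explicitly allowlisted (hypothetical helper)."""
    host = (urlparse(url).hostname or "").lower()
    if host in allowlist:
        return False
    return host.endswith(SUSPECT_SUFFIXES)


# Example: keep a trusted Blog*Spot blog, drop other matching hosts.
trusted = {"googlesystem.blogspot.com"}
urls = [
    "http://googlesystem.blogspot.com/2007/10/some-post.html",
    "http://random-splog.blogspot.com/post",
    "http://example.info/page",
    "http://www.businessweek.com/article",
]
kept = [u for u in urls if not is_suspect(u, trusted)]
```

Filtering this way throws away good results along with the bad ones, so it's worth reviewing what gets dropped before trusting it.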

It's also a great idea to restrict the results to English (or another language) in "Advanced blog search".

So here's a summary:

1. sort the results by relevancy
2. restrict the results to a recent period (last day)
3. restrict the results to English (or another language)
4. if you really have to sort the results by date, remove the posts that follow a spammy pattern (for example, add -"google alert" -site:blogspot.com -site:.info to your query), but make sure you don't remove important results
5. check the posts that contain "References"
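The exclusions from step 4 are easy to append automatically. Here's a small sketch that builds the query string; the function name and its default arguments are assumptions made for illustration, and only the operators themselves (-"phrase" and -site:) come from the tip above.

```python
def build_query(terms,
                excluded_phrases=("google alert",),
                excluded_sites=("blogspot.com", ".info")):
    """Append -"phrase" and -site: exclusions to a blog search query.

    The defaults mirror the spammy patterns listed in step 4; adjust
    them so that you don't remove important results.
    """
    parts = [terms]
    parts += [f'-"{phrase}"' for phrase in excluded_phrases]
    parts += [f"-site:{site}" for site in excluded_sites]
    return " ".join(parts)


query = build_query("gmail imap")
# query is: gmail imap -"google alert" -site:blogspot.com -site:.info
```

Passing an empty tuple for excluded_sites keeps Blog*Spot and .info results while still dropping the Google Alerts posts.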

Google should do a better job of detecting spam in Blog Search results and of identifying results from sites that happen to have feeds but aren't blogs. It should also make it more difficult for spammers to use sites like Blogger or Google Alerts to pollute the search results.
Posted in Blog Search, Spam