    Posts filed under 'Search Engines'

    Google Sitemaps and URLs Restricted by Robots.txt

    August 22nd, 2006 by Philip Nicosia

    URLs Restricted by Robots.txt

    When you log into your Google Sitemaps account, check the summary for the error “URLs restricted by robots.txt”. If the number next to it is greater than zero, it is definitely worth investigating: it means your robots.txt file is stopping Google from crawling some pages on your site. It is easy to make mistakes in a robots.txt file, so if you are unsure about a change you can test it with Google’s robots.txt checker inside your Sitemaps account.

    I have received countless emails from people saying that our sitemap generator at XML-Sitemaps.com doesn’t pick up all their pages, only to find that they have inadvertently blocked robots with their own robots.txt file.

    These errors show up all the time on one of my sites, but in my case that is because I have chosen to block certain areas and pages of the website. These pages were designed specifically for individual users of the site and serve no purpose for other visitors. They aren’t linked from anywhere on the site and have only been picked up from external links pointing to the pages generated by the users.

    There are lots of reasons why you might block robots from areas of your site, but it is equally important to make sure you don’t also block areas you want indexed.
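
    One way to catch an accidental block before a search engine does is to run the URLs you care about through a robots.txt parser. The sketch below uses Python’s standard urllib.robotparser; the function name find_blocked, the example.com URLs and the sample rules are illustrative assumptions, not taken from any real site, and a given search engine may interpret edge cases differently from this parser.

```python
from urllib.robotparser import RobotFileParser


def find_blocked(robots_txt, urls, agent="Googlebot"):
    """Return the URLs that the given robots.txt text disallows for `agent`.

    `find_blocked` is a hypothetical helper written for this example.
    """
    parser = RobotFileParser()
    # parse() accepts the file contents as a list of lines, so no network
    # request is needed to test a draft robots.txt.
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(agent, url)]


# Assumed sample rules: block only the per-user area of the site.
SAMPLE_RULES = "User-agent: *\nDisallow: /members/\n"

# Assumed sample URLs that the site owner wants to audit.
SAMPLE_URLS = [
    "https://www.example.com/",
    "https://www.example.com/members/profile-1.html",
    "https://www.example.com/articles/robots.html",
]

if __name__ == "__main__":
    for url in find_blocked(SAMPLE_RULES, SAMPLE_URLS):
        print("blocked:", url)
```

    Any URL in the result that you actually want indexed points to a rule that is too broad.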

    What is a Robots.txt file and What Does It Do

    August 17th, 2006 by Philip Nicosia

    The robots.txt protocol, also called the “robots exclusion standard”, is designed to keep web spiders out of part of a website. It is a privacy measure rather than real security, the equivalent of hanging a “Keep Out” sign on your door.

    This protocol is used by website administrators when there are sections or files that they would rather not have accessed by the rest of the world. These could include employee lists or files that are circulating internally. For example, the White House website uses robots.txt to keep spiders away from speeches by the Vice President, a photo essay of the First Lady, and profiles of the 9/11 victims.

    How does the protocol work? The administrator lists the files and directories that shouldn’t be scanned in a plain text file named robots.txt and places it in the top-level directory of the website. The robots.txt protocol was created by consensus in June 1994 by members of the robots mailing list (robots-request@nexor.co.uk). At the time of writing there is no official standards body or RFC for the protocol, so it’s difficult to legislate or mandate that the protocol be followed. In fact, the file is treated as strictly advisory, and there is no absolute guarantee that the listed contents won’t be read.
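
    Because the file always sits in the top-level directory, a cooperating spider can work out where to look from any page address on the site. A minimal sketch of that lookup in Python follows; the function name robots_txt_url and the example.com address are assumptions made for illustration.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url):
    """Build the robots.txt address for the site that serves `page_url`.

    `robots_txt_url` is a hypothetical helper written for this example.
    """
    parts = urlsplit(page_url)
    # Keep the scheme and host, replace the path with /robots.txt and
    # drop any query string or fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


if __name__ == "__main__":
    print(robots_txt_url("http://www.example.com/blog/2006/08/post.html?page=2"))
```

    Nothing forces a spider to fetch that address or to obey what it finds there, which is why the file is only advisory.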

    In effect, robots.txt requires cooperation from the web spider and even the reader, since anything that is uploaded to the internet becomes publicly available. You aren’t locking them out of those pages; you are just asking them to stay out, and it takes very little for them to ignore these instructions. Anyone, including a malicious visitor, can read the robots.txt file itself and request the listed files directly. So the rule of thumb is: if it’s that sensitive, it shouldn’t be on your website to begin with, or it should be in a password-protected folder.

    Care should also be taken to ensure that robots.txt doesn’t block search engine robots from areas of the website you do want found. Doing so will dramatically affect your search engine ranking, as the search engines rely on their robots to find and register the pages.

    One misplaced character can have catastrophic effects. For example, robots.txt patterns are matched by simple prefix comparison against the URL path, so make sure that patterns meant to match directories end with the final ‘/’ character; otherwise every file whose name starts with that string will match, rather than just those in the intended directory.
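
    The effect of the missing ‘/’ is easy to reproduce with Python’s standard urllib.robotparser. In this sketch the two rule sets, the /private paths and the helper name is_allowed are assumptions chosen for illustration; other parsers may differ in edge cases.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt, path, agent="*"):
    """Report whether `agent` may fetch `path` under the given rules.

    `is_allowed` is a hypothetical helper written for this example.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


# Intended to block a directory, but the trailing slash is missing,
# so every path that merely starts with "/private" is caught as well.
WITHOUT_SLASH = "User-agent: *\nDisallow: /private\n"

# With the trailing slash, only paths inside the directory are blocked.
WITH_SLASH = "User-agent: *\nDisallow: /private/\n"

if __name__ == "__main__":
    for path in ("/private/notes.html", "/private-events.html"):
        print(path,
              "| without slash:", is_allowed(WITHOUT_SLASH, path),
              "| with slash:", is_allowed(WITH_SLASH, path))
```

    With the slash omitted, the public page /private-events.html is blocked along with the directory; with the slash in place only the files under /private/ are affected.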

    To avoid these problems, consider checking your pages with a robots.txt analyzer. Google has a free one within their Google Webmaster Tools.

    101 Ways to Build Link Popularity in 2006

    August 16th, 2006 by Philip Nicosia

    There is no doubt that link building is still an important area for any website that wants to do well. The ultimate aim of any link building campaign is to increase the traffic to your site. This is achieved in two main ways: click-through traffic from the links themselves, and link popularity that helps your search engine rankings.

    Aaron Wall from SEO Book has written an article together with Andy Hagans explaining in great detail what you should and shouldn’t be doing. It lists 71 good ways and 30 bad ways to build links, and is a must-read for anyone with a website.

    To view the article, go to 101 Ways to Build Link Popularity in 2006.

    Submitting a Reinclusion request to Google

    August 12th, 2006 by Philip Nicosia

    In a previous post I mentioned that I couldn’t find a way to submit a reinclusion request to Google. To submit one you need a Google Sitemaps account, and I couldn’t find the link inside mine.

    Well, today I logged into my Google Sitemaps account and found where the link is. In the top right-hand corner there is a drop-down menu labelled “+ Tools”, and clicking it reveals a link to submit a reinclusion request. I don’t know whether I missed it before or whether Google had dropped it and has now added it back.

    The good news, of course, is that you can now submit a reinclusion request without any problems, although the link is a bit hidden. At least it is there.

    Google not accepting reinclusion requests?

    August 9th, 2006 by Philip Nicosia

    Maybe it is just a glitch, or possibly it is intentional because of demand they can’t cope with, but today I noticed you can’t file a reinclusion request with Google. Google’s webmaster support page says:

    “If your site has violated our webmaster guidelines, and you’ve made changes to it so that it meets our guidelines, you can request reinclusion and we’ll evaluate your site. To request reinclusion, log in to Google Sitemaps, choose the “Request reinclusion” link, and follow the steps outlined there.”

    Now here is where the problem is: I can’t find a “Request reinclusion” link. Does that mean I don’t need to submit one, or has the link been left out by mistake? It used to be there, as I have seen it in the past, but when I looked this morning it was gone.

    It’s also interesting to see that you now need to have a Google webmasters account in order to submit a reinclusion request. Is this another way of pushing webmasters to use sitemaps?

    Of course, you don’t have to submit sitemaps just because you have a Google Sitemaps account, and there are other benefits too, such as the robots.txt checker, reporting of errors on your site, and choosing your preferred domain with or without www, so it is still worthwhile.

    It’s just a shame you can’t submit a reinclusion request to Google at the moment.
