Posts filed under 'Search Engines'
October 13th, 2006 by Philip Nicosia
I’m puzzled by Google at times, as I’m sure a lot of webmasters are. I’ve just noticed that some of the pages on this site have been labelled as “Supplemental Result”.
Anyone who has had a website for long enough will likely see this happen, but I am puzzled as to how Google decides which pages to place in the supplemental index.
One of the pages that has gone supplemental is www.philipnicosia.com/page/3/
This page updates virtually every day and has done so since it went online. Any search results pointing to this page are going to be months out of date and pretty pointless for the user. I could understand it if this page had stayed the same for months on end, but why has this page in particular been singled out?
I don’t have the same problem with much older pages such as page/4/, page/5/ and so on, right up to page/20/, so what has Google got against page/3/?
October 2nd, 2006 by Philip Nicosia
Just curious as to why Google search results are different to Google Toolbar search results.
If I enter www.philipnicosia.com into the Google Toolbar, it takes me directly to this site. If I enter the same search term on Google.co.uk, it lists search results instead.
The funny thing is that I’m not number 1 for that search term: Google doesn’t think that www.philipnicosia.com is the most relevant result for www.philipnicosia.com.
September 26th, 2006 by Philip Nicosia
Logging into my Google Sitemaps account, I noticed the following:
http://www.philipnicosia.com/gallery/index.php URL restricted by robots.txt Sep 8, 2006
But a site: search shows that Google has indexed and cached this page despite its being restricted by robots.txt.
This is Google’s cache of http://www.philipnicosia.com/gallery/index.php as retrieved on 18 Sep 2006 06:18:24 GMT.
Google’s cache is the snapshot that we took of the page as we crawled the web.
So I checked with their robots.txt checker to see if there was a problem, and it did indeed say the page is allowed, despite the robots.txt saying:
User-agent: *
Disallow: /gallery/
By adding the following to my robots.txt:
User-agent: Googlebot
Disallow: /gallery/
Google now recognizes that the directory is blocked.
So somewhere between 8 September and 18 September, Google decided that, unlike other search engines, it treats any page on your site as fair game unless you specifically tell Googlebot not to go there.
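For comparison, here is how a standards-following parser reads the original robots.txt. This is a minimal sketch using Python’s standard-library urllib.robotparser, not Google’s own robots.txt checker (whose matching rules are more elaborate); the helper name `allowed` is hypothetical. It shows that a "User-agent: *" group should already apply to Googlebot when no group names Googlebot explicitly.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt from the post: a single wildcard group blocking /gallery/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /gallery/
"""


def allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if `user_agent` may fetch `url` under `robots_txt` (hypothetical helper)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    gallery = "http://www.philipnicosia.com/gallery/index.php"
    page = "http://www.philipnicosia.com/page/3/"
    # No group names Googlebot, so the "*" group is applied to it.
    print(allowed(ROBOTS_TXT, "Googlebot", gallery))  # False
    print(allowed(ROBOTS_TXT, "Googlebot", page))     # True
```

Under this reading, the extra "User-agent: Googlebot" group should be redundant; needing it suggests the checker, not the file, was at fault.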
September 10th, 2006 by Philip Nicosia
Search engine optimization is a very complex science, but at its heart is a simple rule: format your website so that spiders can immediately recognize and index its content. If they can’t “see” you, you might as well not exist, and if they can’t understand your code, no amount of keywords will get you into the Golden Top 20.
The problem many website developers used to encounter was that search engines worked differently, so you could end up with a high ranking in Lycos but languish at the bottom of Google. How exactly should you optimize your site so that it performs well in all search engines?
Enter ROR (short for Resources of a Resource), an independent XML format that describes your content in a way that all search engines can understand.
Think of it as a web spider’s Cliff’s Notes: it describes all the objects, services, discounts, images, podcasts, etc. on your site. If it’s on the site, it’s in the ROR feed, but in a format that’s easy to process and that guards against links being skipped or ignored.
ROR calls its “magic file” structured feeds, which guide search engines as they scan the text. Unlike Google Sitemaps, it’s universally understood—and very easy to process. It’s also more detailed. It doesn’t just give a map or “table of contents”, it actually summarizes what’s inside. It’s also been in existence far longer than Google, so its reliability has been proven by time.
Though it’s been around for a long time, ROR is by no means outdated. Most file formats are already supported in ROR, and the format is still being updated to keep up with the growing number of website innovations. To avoid becoming unwieldy, ROR tries to re-use existing data structures. It is designed to be streamlined, which makes it one of the more efficient ways of indexing a site.
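To make the idea of a feed that summarizes what’s on a site concrete, here is a minimal sketch that builds an ROR-style feed with Python’s xml.etree.ElementTree. It assumes ROR feeds are RSS 2.0 documents extended with a "ror" namespace; the namespace URI, the `ror:type` element, the function name `build_ror_feed` and the example resource types are all assumptions for illustration, so check them against the published ROR specification or the output of the ROR sitemap generator.

```python
import xml.etree.ElementTree as ET

# ASSUMPTION: namespace URI and ror:* element names are illustrative placeholders.
ROR_NS = "http://rorweb.com/0.1/"
ET.register_namespace("ror", ROR_NS)


def build_ror_feed(site_url, resources):
    """Return an ROR-style RSS 2.0 feed as a string (hypothetical helper).

    `resources` is a list of (url, title, resource_type) tuples.
    """
    rss = ET.Element("rss", {"version": "2.0"})
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = f"ROR feed for {site_url}"
    ET.SubElement(channel, "link").text = site_url
    for url, title, resource_type in resources:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "link").text = url
        ET.SubElement(item, "title").text = title
        # Hypothetical element saying what kind of object this item describes.
        ET.SubElement(item, f"{{{ROR_NS}}}type").text = resource_type
    return ET.tostring(rss, encoding="unicode")


if __name__ == "__main__":
    print(build_ror_feed("http://www.example.com/", [
        ("http://www.example.com/gallery/", "Photo gallery", "Image"),
        ("http://www.example.com/show.mp3", "Weekly podcast", "Podcast"),
    ]))
```

Each item carries both a link and a short typed description, which is the "summary, not just a table of contents" idea the post contrasts with Google Sitemaps.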
Usually the ROR feed is located in the root directory and is named ror.xml by default. It is possible to rename the file and the search engines will still find it; the only requirement is a tag in your main page (between the <head> and </head> tags) pointing to it. Another alternative is to create a smaller ror.xml file that directs the search engines to the ROR feed. You can create this file with the ROR sitemap generator.
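The discovery step described above can be sketched with Python’s standard html.parser: look for a pointer tag inside <head>, and fall back to /ror.xml in the site root if there is none. This is a sketch under an assumption: it treats the pointer as a `<link rel="alternate" ... title="ROR" href="...">` tag, a common convention that should be confirmed against the ROR documentation; the class and function names are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class RorLinkFinder(HTMLParser):
    """Collect href values of <link> tags in <head> that point to an ROR feed."""

    def __init__(self):
        super().__init__()
        self.in_head = False
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "head":
            self.in_head = True
        elif tag == "link" and self.in_head:
            a = dict(attrs)
            # ASSUMPTION: the pointer is a rel="alternate" link titled "ROR".
            if ((a.get("rel") or "").lower() == "alternate"
                    and (a.get("title") or "").upper() == "ROR"
                    and a.get("href")):
                self.hrefs.append(a["href"])

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False


def find_ror_feed(page_url, html):
    """Return the absolute URL of the ROR feed for a page (hypothetical helper)."""
    finder = RorLinkFinder()
    finder.feed(html)
    finder.close()
    if finder.hrefs:
        return urljoin(page_url, finder.hrefs[0])
    # No pointer tag: fall back to the default name in the site root.
    return urljoin(page_url, "/ror.xml")
```

This is why renaming the file is safe: a crawler following the tag never needs to guess the file name.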
August 27th, 2006 by Philip Nicosia
Whenever you upload your Google sitemap to your Google Sitemaps account, Google downloads and validates it for you, which can take a couple of hours depending on how busy they are at the time.
If you can’t wait for Google and want to make sure everything is okay straight away, we have launched a Google Sitemap validator on my site, XML-Sitemaps.com.
The validator can also optionally ping Google with the location of your sitemap if you haven’t already done so, although we would recommend informing Google of your sitemaps via Google Webmaster Tools.
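For a rough idea of what such a check involves, here is a minimal sketch in Python. It is not the XML-Sitemaps.com validator and does no schema validation; it only checks well-formedness, the root element and namespace (accepting both the Google 0.84 namespace and the later sitemaps.org 0.9 one), absolute `<loc>` URLs, `<changefreq>` values, `<priority>` range and the 50,000-URL limit. The function name `validate_sitemap` is hypothetical.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

SITEMAP_NAMESPACES = {
    "http://www.google.com/schemas/sitemap/0.84",  # Google Sitemaps
    "http://www.sitemaps.org/schemas/sitemap/0.9",  # sitemaps.org protocol
}
CHANGEFREQ = {"always", "hourly", "daily", "weekly", "monthly", "yearly", "never"}
MAX_URLS = 50_000


def validate_sitemap(xml_text):
    """Return a list of problems; an empty list means none were found."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        return [f"not well-formed XML: {exc}"]
    if not root.tag.startswith("{"):
        return ["root element has no sitemap namespace"]
    ns, local = root.tag[1:].split("}", 1)
    if ns not in SITEMAP_NAMESPACES or local != "urlset":
        return [f"unexpected root element {root.tag}"]

    problems = []
    urls = root.findall(f"{{{ns}}}url")
    if len(urls) > MAX_URLS:
        problems.append(f"{len(urls)} URLs exceeds the {MAX_URLS} limit")
    for i, url in enumerate(urls, start=1):
        # <loc> must be an absolute http(s) URL.
        loc = url.findtext(f"{{{ns}}}loc", default="").strip()
        parts = urlparse(loc)
        if parts.scheme not in ("http", "https") or not parts.netloc:
            problems.append(f"url #{i}: missing or non-absolute <loc>")
        freq = url.findtext(f"{{{ns}}}changefreq")
        if freq is not None and freq.strip() not in CHANGEFREQ:
            problems.append(f"url #{i}: invalid <changefreq> {freq!r}")
        prio = url.findtext(f"{{{ns}}}priority")
        if prio is not None:
            try:
                ok = 0.0 <= float(prio) <= 1.0
            except ValueError:
                ok = False
            if not ok:
                problems.append(f"url #{i}: <priority> must be between 0.0 and 1.0")
    return problems
```

Running a check like this locally catches the common mistakes (relative URLs, typos in changefreq) before waiting hours for Google’s own pass.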