How to Block Spiders From Visiting and Indexing Your Site: Robots.txt File Tutorial

There are reasons you might not want your Web site to be indexed by search engines. More likely, there are simply certain pages that you don't want indexed by the major search engines.

For instance, maybe you constructed an elaborate direct marketing site that requires the visitor to enter through your main page and then proceed through a highly structured series of links that lead them to a buying decision. The internal pages would only confuse visitors who entered through those pages and they would be less likely to buy a product or service.

Whatever your reason, there is a standard that you can implement that will keep most of the major search engine spiders from indexing your Web site.

Here's how to block the spiders. Create a file called "robots.txt" that includes the following code:

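User-agent: *
Disallow: /

The first line specifies which agents (spiders and other robots) should read this file and adhere to the instructions in the following lines of code; the asterisk means "every agent." The second line stipulates which files or directories the spider should not read or index. The example above uses "/", which means the agent should not read or index anything: rules are matched against the start of each URL path, and every path on your site begins with a slash. Some spiders, Google's among them, also accept the wildcard form "Disallow: /*", but the original robots.txt standard does not define wildcards in Disallow lines, so the plain slash is the safer choice.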
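
If you want to see how a well-behaved spider would read the block-everything file, here is a minimal sketch using Python's standard urllib.robotparser module. The agent name "ExampleBot" and the helper name is_allowed are made up for illustration, and real spiders may interpret the file with their own parsers.

```python
from urllib.robotparser import RobotFileParser

# The block-everything file: every agent, every path.
BLOCK_ALL = """\
User-agent: *
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if user_agent may fetch url under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # "ExampleBot" is a hypothetical spider name.
    for url in ("http://www.yourcompany.com/",
                "http://www.yourcompany.com/products/index.htm"):
        print(url, "allowed" if is_allowed(BLOCK_ALL, "ExampleBot", url) else "blocked")
```

Both URLs are reported as blocked, because every path on the site starts with the "/" named in the Disallow line.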

The robots.txt file must be placed in the root directory of your Web site. This means that if you host your Web site with one of the free services and your address looks something like this:

http://members.aol.com/Joesmith/home.htm

you cannot use a robots.txt file to keep out the spiders, since you don't have your own primary domain name. The primary domain name is aol.com, and America Online will probably not allow you to block all the search engine spiders from indexing its site and the Web sites of its 11 million other subscribers.
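
To see why the root directory matters, this small Python sketch works out where a spider would look for robots.txt given any page address. The function name robots_txt_url is an illustrative choice, not part of any standard; the only assumption is the usual convention that the file lives at /robots.txt on the host that serves the page.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url: str) -> str:
    """Return the robots.txt address a spider would check for page_url."""
    parts = urlsplit(page_url)
    # Keep the scheme and host, and replace the whole path with /robots.txt.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


if __name__ == "__main__":
    print(robots_txt_url("http://members.aol.com/Joesmith/home.htm"))
    print(robots_txt_url("http://www.yourcompany.com/products/index.htm"))
```

For the America Online address, the result sits at the top of the host and not inside the Joesmith directory, which is why a subscriber cannot supply the file.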

The robots.txt file could look like this if there are specific directories and files that you don't want the search engines to index:

User-agent: *
Disallow: /clients/
Disallow: /products/
Disallow: /pressrelations/
Disallow: /surveys/survey.htm

In the above example, the robots.txt file asks the search engine spiders to omit all pages within the following directories:

http://www.yourcompany.com/clients/
http://www.yourcompany.com/products/
http://www.yourcompany.com/pressrelations/

And the following specific page:

http://www.yourcompany.com/surveys/survey.htm
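
Before uploading a selective robots.txt file, you may want to check which pages it actually blocks. The sketch below feeds the rules to Python's standard urllib.robotparser module, with each Disallow line written as a plain path prefix. The agent name "ExampleBot", the helper name blocked, and the sample pages other than survey.htm are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# The selective rules from the example, one directive per line.
RULES = """\
User-agent: *
Disallow: /clients/
Disallow: /products/
Disallow: /pressrelations/
Disallow: /surveys/survey.htm
"""


def blocked(url: str, user_agent: str = "ExampleBot") -> bool:
    """Return True if the rules above tell user_agent to stay away from url."""
    parser = RobotFileParser()
    parser.parse(RULES.splitlines())
    return not parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    site = "http://www.yourcompany.com"
    # Hypothetical sample pages, chosen to exercise each kind of rule.
    for path in ("/index.htm", "/clients/list.htm",
                 "/surveys/survey.htm", "/surveys/results.htm"):
        print(path, "blocked" if blocked(site + path) else "allowed")
```

Note that only the single survey page is blocked in the surveys directory; any other page there stays open to the spiders.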
If you are one of the millions of people hosting a Web site on America Online's servers or on one of the other free or subdirectory Web site services, and you can't place a robots.txt file in the root directory, you can use a META tag that talks to some of the spiders:

<META NAME="ROBOTS" CONTENT="NOINDEX">

You will need this META tag on every page in your Web site that you don't want indexed. If your Web site has 30 or 40 pages (or more), this will take a lot of time. Here's another reason to buy a good HTML editor like Luckman's WebEdit or Allaire's HomeSite. These programs allow you to do a global search and replace and add an HTML tag to every Web page that you open in the program. As with all META tags, this META tag goes at the top of your HTML document, between the <HEAD> and </HEAD> tags.
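
If you don't have an editor with global search and replace, a short script can do the same job. This is only a sketch under a few assumptions: your pages are .htm or .html files in one local folder, each page has an opening <HEAD> tag, and you keep a backup before running it. The names add_noindex_tag and tag_site are made up for this example.

```python
import re
from pathlib import Path

NOINDEX_TAG = '<META NAME="ROBOTS" CONTENT="NOINDEX">'
# Opening <HEAD> tag in any letter case, with or without attributes.
HEAD_OPEN = re.compile(r"<head\b[^>]*>", re.IGNORECASE)
# Any existing robots META tag, so a page is never tagged twice.
ROBOTS_META = re.compile(r"<meta\s+name\s*=\s*[\"']?robots\b", re.IGNORECASE)


def add_noindex_tag(html: str) -> str:
    """Insert the NOINDEX tag right after the opening <HEAD> tag, once."""
    if ROBOTS_META.search(html):
        return html
    return HEAD_OPEN.sub(lambda m: m.group(0) + "\n" + NOINDEX_TAG, html, count=1)


def tag_site(root: str) -> int:
    """Tag every .htm/.html file under root; return how many files changed."""
    changed = 0
    for path in Path(root).rglob("*.htm*"):
        if not path.is_file():
            continue
        # latin-1 maps every byte to one character, so the rest of the
        # file is written back byte for byte.
        original = path.read_text(encoding="latin-1")
        updated = add_noindex_tag(original)
        if updated != original:
            path.write_text(updated, encoding="latin-1")
            changed += 1
    return changed
```

Calling tag_site on the folder that holds your pages tags each page once; running it a second time changes nothing, because pages that already carry a robots META tag are skipped.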



 ©2004-2005. SEOSwift.com