Here is a small practical hint on how to control the way web crawlers (robots, spiders) see your web site. The tags below go in the head section of your HTML.

The content="robots-terms" attribute of the Robots META tag is a comma-separated list that may contain one or more of the following keywords, without regard to case: noindex, nofollow, all, index and follow.

noindex
Page may not be indexed by a search service.

<meta name="robots" content="noindex">

nofollow
Robots are not to follow links from this page.

<meta name="robots" content="nofollow">

Admin Note: The robots directives index, follow and all are not required, since they describe the default behavior of indexing spiders.

<meta name="robots" content="index, follow">
<meta name="robots" content="all">

index
Robots are welcome to include this page in search services.

<meta name="robots" content="index">

follow
Robots are welcome to follow links from this page to find other pages.

<meta name="robots" content="follow">
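
To check which terms a page actually declares, the tag can be read back with a few lines of Python. This is only a minimal sketch using the standard library html.parser module; the names RobotsMetaParser and robots_terms are made up for this example, and the code only collects the keywords without interpreting them.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the keywords of every <meta name="robots"> tag in a page."""

    def __init__(self):
        super().__init__()
        self.terms = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() != "robots":
            return
        content = attr.get("content") or ""
        # The content is a comma-separated, case-insensitive list.
        self.terms.extend(
            term.strip().lower() for term in content.split(",") if term.strip()
        )


def robots_terms(html):
    """Return the robots keywords found in an HTML string (hypothetical helper)."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return parser.terms


page = '<html><head><META NAME="Robots" CONTENT="NoIndex, NOFOLLOW"></head></html>'
print(robots_terms(page))
```

A page without the tag simply yields an empty list, which a caller would treat as the default behavior.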

If this meta tag is missing, or if there is no content, or the robot terms are not specified, then the robot terms will be assumed to be "index, follow" (i.e. "all"). If the keyword all is found in the robots terms list, it overrides all other values. That is, a robots terms list of "nofollow, all, noindex, nofollow" would effectively be "all".

If the robots terms contain contradictory information (e.g. "follow, nofollow, follow"), then the robot is free to do whatever it wishes with regard to the behavior being addressed (in this case the follow behavior).
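
The rules above can be sketched in Python as follows. This is an illustration of the rules as described here, not the code of any real crawler; the function name resolve_robots_terms is made up, and returning None for a contradictory setting is just one possible way to say "the robot may decide".

```python
def resolve_robots_terms(content):
    """Return (index, follow) for a robots META content string.

    Each value is True (allowed), False (forbidden) or None
    (contradictory terms, so the robot is free to choose).
    """
    # Missing tag or missing content: assume "index, follow".
    if content is None:
        return True, True

    terms = {t.strip().lower() for t in content.split(",") if t.strip()}

    # Empty list, or "all" anywhere in the list, means "index, follow".
    if not terms or "all" in terms:
        return True, True

    def decide(positive, negative):
        if positive in terms and negative in terms:
            return None  # contradictory, behavior is up to the robot
        if negative in terms:
            return False
        return True  # default when nothing (or only the positive term) is given

    return decide("index", "noindex"), decide("follow", "nofollow")


print(resolve_robots_terms("nofollow, all, noindex, nofollow"))
print(resolve_robots_terms("follow, nofollow, follow"))
```

With these rules the first call yields (True, True), because all wins, and the second yields (True, None), because only the follow behavior is contradictory.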

robots.txt

There is also another way to control the robots/crawlers: a robots.txt file in your webroot. Robots request the file as www.yoursite.com/robots.txt, so it has to sit in the root of your website.

The content of a robots.txt file might look like this:

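# robots.txt generated at http://www.mcanerin.com
# Keep Googlebot and its image crawler out of the whole site
User-agent: Googlebot
Disallow: /

User-agent: googlebot-image
Disallow: /

# All other robots may crawl everything except /cgi-bin/
User-agent: *
Disallow: /cgi-bin/

Sitemap: http://www.yoursite.com/sitemap.gz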
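
Before uploading such a file, it can be worth checking how a parser reads it. The following is a rough sketch using Python's standard urllib.robotparser module, with rules modeled on the example above. The helper name build_parser and the user agent "SomeOtherBot" are made up for this example, and real crawlers may interpret edge cases differently from Python's parser.

```python
from urllib.robotparser import RobotFileParser

# Rules modeled on the example file; www.yoursite.com is a placeholder.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /

User-agent: googlebot-image
Disallow: /

User-agent: *
Disallow: /cgi-bin/
"""


def build_parser(text):
    """Parse robots.txt text without fetching anything (hypothetical helper)."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


rules = build_parser(ROBOTS_TXT)

# Googlebot is shut out of the whole site.
print(rules.can_fetch("Googlebot", "http://www.yoursite.com/index.html"))
# Any other robot may fetch normal pages but not /cgi-bin/.
print(rules.can_fetch("SomeOtherBot", "http://www.yoursite.com/index.html"))
print(rules.can_fetch("SomeOtherBot", "http://www.yoursite.com/cgi-bin/script.pl"))
```

To test a live site instead, RobotFileParser also offers set_url() and read(), which download the file before the can_fetch() checks.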

You might also use a good generator like this one:

http://www.mcanerin.com/EN/search-engine/robots-txt.asp

Everything you need to know about robots can be found at http://www.robotstxt.org