There's only two more years before it actually is 2001. And what Dave Bowman learned from HAL is just as true for us today as it was then: You can't hide from the machine. Web spiders crawl everywhere, reading every page they can get their electronic mandibles wrapped around. Don't kid yourself into thinking you're too small to be noticed. You're much better off assuming that everything online - hidden, unlinked, or sitting behind a password scheme - will eventually be cataloged by one search engine or another.

Web spiders are programs that run through the Web and retrieve pages by following links. In the early years, they were the cause of some notable, wide-scale Web blackouts. Spiders can get trapped in poorly named directories and bring servers down in minutes. As a result, groups formed to address the problem.

Driven by the need to prevent runaway spiders and crippled servers (versus the need for privacy), the major search engines now all support the following techniques for blocking the indexing of specific pages.
If you happen to be running your own Web server, you can create a robots.txt file that tells search engine spiders and robots which pages to ignore. This simple process is described at the Web Server Administrator's Robots Exclusion Protocol Guide. This protocol is not an official standard, but it is supported by many of the bigger search engines. You can usually find out whether a search engine will support the Robots Exclusion protocol by reading its privacy policies. Links to this information tend to get high priority and aren't nearly as tricky to find as links to help files, so don't hesitate to look for them.
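To give you an idea of what's involved: a robots.txt file sits at the top level of your server and lists the areas you'd rather not have crawled. The directory names below are just placeholders - substitute whatever you'd like kept out of the indexes:

User-agent: *
Disallow: /private/
Disallow: /drafts/

The asterisk means the rule applies to any robot that bothers to ask, and each Disallow line marks off a path that well-behaved spiders will skip.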
The only other option you have to keep spiders from crawling into your space is to use the robots meta tag. For the best protection ASCII can buy, simply cut and paste the following into your not-so-worldwide HTML documents:
<meta name="robots" content="none">
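The tag belongs in the head of each page you want ignored, so a protected page might start out something like this (the title is just a stand-in):

<html>
<head>
<title>Nothing to See Here</title>
<meta name="robots" content="none">
</head>

Setting content to "none" tells robots both not to index the page and not to follow any of its links.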
For more about this meta tag (but not much more - it is pretty simple), check out the Robots Exclusion page on Webcrawler. Again, these are de facto standards. There is no governing body capable of enforcing them. Furthermore, these methods, when they are effective, only shield your pages from machines, not people.