If Skynet became self-aware and took on the mission of destroying the world, what would people who aren't Sarah Connor do? Turns out a simple robots.txt file could save Google founders Larry Page and Sergey Brin.

Hundreds of users from around the world have pointed to Google's killer-robots.txt file that instructs the robot assassins T-1000 and T-800 to "disallow" the destruction of the founders of the world's biggest search engine.

Specifically, here's what the file says:

User-Agent: T-1000
User-Agent: T-800
Disallow: /+LarryPage
Disallow: /+SergeyBrin

Of course, all of this is simply Google's way of commemorating the 20th anniversary of the Robots Exclusion Protocol, better known as the robots.txt file, a simple tool webmasters have used since the 1990s to tell search engine robots not to crawl certain pages on their websites.
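In everyday use, the same directives apply to ordinary crawlers rather than killer robots. A webmaster who wanted to keep every bot out of, say, a hypothetical /private/ directory could publish a robots.txt file along these lines, with the asterisk standing in for any crawler:

User-Agent: *
Disallow: /private/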

Dutch software engineer Martijn Koster developed robots.txt in 1994 while he was working at Nexor. At that time, Koster, like many webmasters, was having trouble with search engine bots consuming so much of a server's limited bandwidth that they could bring entire websites down. This led to the creation of a standard that any website owner can use to stop automated robots from visiting certain pages on the site.

Lycos, AltaVista and WebCrawler, the three major search engines before the rise of Google exterminated them all, promptly adopted robots.txt, while prominent websites of the era, including WhiteHouse.gov, the California Department of Motor Vehicles, Metallica, Nissan and the Library of Congress, uploaded the text file to their servers to tell crawlers which pages they weren't allowed to visit. Simple as it may be, robots.txt reduced downtime caused by bandwidth-hogging robots and let human visitors browse those websites without lag.

The file, however, isn't a surefire way to keep a page out of search results. Google's crawlers, for example, pick up references to a disallowed page from other pages that robots.txt doesn't block, compile the text gathered from those allowed URLs and associate it with the disallowed URL. That means Google can still return a disallowed URL in search results even when a webmaster's robots.txt file explicitly blocks the page.
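Well-behaved crawlers read the file and check it before requesting a page. As a rough sketch, assuming a hypothetical bot name and website, Python's standard urllib.robotparser module can run that check:

from urllib import robotparser

# Load and parse the site's robots.txt (hypothetical URL for illustration).
parser = robotparser.RobotFileParser()
parser.set_url("https://www.example.com/robots.txt")
parser.read()

# A compliant bot asks before fetching; a rogue one simply ignores the answer.
if parser.can_fetch("T-800", "https://www.example.com/+LarryPage"):
    print("Crawling allowed")
else:
    print("Crawling disallowed")

The check is purely advisory: can_fetch() only reports what the published rules say, and nothing in the protocol forces a crawler to obey them.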

Still, robots.txt is one of the easiest, simplest and most effective ways to keep robots from crawling a web page. In real life, search engine bots named T-1000 and T-800 would most likely comply with Google's request not to crawl its founders' Google+ pages. But if Skynet arises, there's really no guarantee that cyborgs and war machines programmed to annihilate the human race will follow Google's instructions at all.
