
Webmasters

What is NanoBot?

NanoBot is an experimental web-crawling robot developed by Machine Phase Systems Limited. It collects selected documents from the web to build a new type of knowledge discovery system. A visit from NanoBot is likely to help people find your site and may bring it more visitors. This page answers the most commonly asked questions about how NanoBot works.

How do I prevent NanoBot from crawling some or all of my website?

NanoBot obeys the file 'robots.txt', which tells web crawlers to ignore some or all of a web site. The format of this file is specified in The Robot Exclusion Standard (http://www.robotstxt.org/wc/exclusion.html#robotstxt). NanoBot reads and obeys every record whose User-Agent is either "NanoBot" or "*", so it crawls only the pages that your robots.txt permits.
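The rule described above, where NanoBot obeys every record addressed to "NanoBot" or to "*", can be sketched in Python as follows. This is a hypothetical illustration, not NanoBot's actual code: the function names and the sample file are invented, and the parser handles only the User-agent and Disallow lines of the original exclusion standard.

```python
from urllib.parse import urlsplit


def disallowed_prefixes(robots_txt: str, agent: str = "NanoBot") -> list[str]:
    """Collect the Disallow prefixes from every record addressed to `agent` or to "*"."""
    prefixes: list[str] = []
    applies = False     # does the record being read apply to `agent`?
    seen_rule = True    # True when the next User-agent line starts a new record
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop comments and surrounding whitespace
        if ":" not in line:
            continue                          # blank or malformed line
        field, value = line.split(":", 1)
        field, value = field.strip().lower(), value.strip()
        if field == "user-agent":
            if seen_rule:                     # User-agent after rules: a new record begins
                applies, seen_rule = False, False
            if value == "*" or value.lower() == agent.lower():
                applies = True
        elif field == "disallow":
            seen_rule = True
            if applies and value:             # an empty Disallow value excludes nothing
                prefixes.append(value)
    return prefixes


def is_allowed(url: str, prefixes: list[str]) -> bool:
    """A URL may be crawled unless its path starts with one of the disallowed prefixes."""
    path = urlsplit(url).path or "/"
    return not any(path.startswith(prefix) for prefix in prefixes)


# Hypothetical robots.txt with a NanoBot record, a catch-all record and another robot's record.
SAMPLE = """\
User-agent: NanoBot
Disallow: /private/

User-agent: *
Disallow: /tmp/

User-agent: OtherBot
Disallow: /
"""
```

With SAMPLE, this sketch keeps NanoBot out of /private/ and /tmp/ (both the NanoBot record and the "*" record apply) while ignoring the record written for OtherBot.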

Why is NanoBot trying to access a robots.txt file that is not on my server?

A web site administrator can indicate which parts of the site robots should not visit by placing a specially formatted file, robots.txt, on the site. NanoBot therefore requests this file from every site it visits; if the file does not exist, your server records a "File not found" error. For information on how to create a robots.txt file, see The Robot Exclusion Standard (http://www.robotstxt.org/wc/exclusion.html#robotstxt). To stop these errors from appearing in your server log, create an empty file named 'robots.txt' in the root directory of your web server.
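As a sketch, assuming your web server serves files from a local directory (the document root path, for example /var/www/html, is site-specific and hypothetical here), the empty file can be created like this. The function name is invented for illustration.

```python
from pathlib import Path


def ensure_empty_robots_txt(document_root: str) -> Path:
    """Create an empty robots.txt in the web server's document root if none exists.

    An empty file contains no Disallow records, so it excludes nothing; it only
    stops requests for /robots.txt from failing with "File not found".
    """
    path = Path(document_root) / "robots.txt"
    path.touch(exist_ok=True)   # creates the file if missing; an existing file keeps its contents
    return path
```

Using touch() rather than opening the file for writing means an existing robots.txt is never truncated.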

Why is NanoBot trying to download incorrect links from my server, or from a server that doesn't exist?

NanoBot crawls the links it extracts from documents on the web. If any pages contain links that point to your server incorrectly, or that were not updated after changes to your server, NanoBot will follow those links and request pages that do not exist on your site. To find out where an incorrect link comes from, check the referrer field for that request in your server logs.
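If your server writes its access log in the Apache/NCSA "combined" format (an assumption; other formats need a different pattern), a short script can list the failed requests and their referrers. The user-agent string "NanoBot/1.0" in the usage note is hypothetical, and the function name is invented for illustration.

```python
import re

# Apache/NCSA "combined" log format:
# host ident user [time] "request" status bytes "referrer" "user-agent"
COMBINED_LOG = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) \S+ '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def broken_link_sources(log_lines, agent="NanoBot"):
    """Return (requested path, referrer) for each 404 served to a matching user agent."""
    results = []
    for line in log_lines:
        m = COMBINED_LOG.match(line)
        if not m or m["status"] != "404" or agent.lower() not in m["agent"].lower():
            continue
        parts = m["request"].split()          # e.g. "GET /old/page.html HTTP/1.0"
        path = parts[1] if len(parts) > 1 else m["request"]
        results.append((path, m["referrer"]))  # "-" means no referrer was sent
    return results
```

For example, a log line ending in `"GET /old/page.html HTTP/1.0" 404 209 "http://other.example.org/links.html" "NanoBot/1.0"` yields the pair ("/old/page.html", "http://other.example.org/links.html"), pointing at the page that holds the bad link.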

Why is NanoBot downloading information from my "private" web server?

NanoBot follows hyperlinks. If any page links to your "private" web server, NanoBot will follow that link to the server. Also, if your "private" web server links to another server, the URL of your private server will appear in the referrer field of the other server's log whenever that link is followed. A robots.txt file only asks robots to stay away and cannot enforce it, so if you have a confidential web server, protect it with a password in addition to a robots.txt file as defined by The Robot Exclusion Standard.
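Password protection is normally configured in the web server itself. Purely to show why it stops any robot, the hypothetical check below validates an HTTP Basic "Authorization" header; a server would answer 401 with a `WWW-Authenticate: Basic` header whenever it returns False. Basic credentials are only base64-encoded, so they should travel over HTTPS. The function name and the single user/password pair are assumptions for illustration.

```python
import base64
import hmac


def basic_auth_ok(authorization_header, user, password):
    """Check an HTTP Basic "Authorization" header against one expected user name and password."""
    if not authorization_header or not authorization_header.startswith("Basic "):
        return False
    try:
        decoded = base64.b64decode(authorization_header[len("Basic "):], validate=True)
    except ValueError:          # binascii.Error (malformed base64) is a subclass of ValueError
        return False
    supplied_user, sep, supplied_password = decoded.partition(b":")
    if not sep:
        return False
    # compare_digest avoids leaking, through timing, how much of each value matched
    user_ok = hmac.compare_digest(supplied_user, user.encode("utf-8"))
    password_ok = hmac.compare_digest(supplied_password, password.encode("utf-8"))
    return user_ok and password_ok
```

Unlike a robots.txt record, this check does not depend on the visitor choosing to cooperate.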

Why isn't NanoBot following the directions of my robots.txt file?

NanoBot downloads the robots.txt file once a day, so it may take up to a day for NanoBot to learn about changes you make. Ensure that the syntax of your robots.txt file is correct according to the standard at http://www.robotstxt.org/wc/exclusion.html#robotstxt. Also, make sure the robots.txt file is in your server's root directory; a robots.txt file in a subdirectory has no effect on how NanoBot crawls your web site. For more information, see the Robots FAQ (http://www.robotstxt.org/wc/faq.html).
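Both points, the daily refresh and the root-directory rule, can be illustrated with a small sketch. It is not NanoBot's implementation: the class and function names are hypothetical, and the download function is passed in so the example runs without network access.

```python
import time
from urllib.parse import urlsplit, urlunsplit

ONE_DAY = 24 * 60 * 60  # seconds; this page says robots.txt is re-read once a day


def robots_url(page_url):
    """Only /robots.txt at the host root is consulted; copies in subdirectories are never requested."""
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


class RobotsCache:
    """Hypothetical per-host cache that re-fetches robots.txt at most once per max_age seconds."""

    def __init__(self, fetch, max_age=ONE_DAY, clock=time.monotonic):
        self.fetch = fetch        # callable: robots.txt URL -> file contents
        self.max_age = max_age
        self.clock = clock
        self._entries = {}        # robots.txt URL -> (fetched_at, contents)

    def get(self, page_url):
        url = robots_url(page_url)
        now = self.clock()
        cached = self._entries.get(url)
        if cached is None or now - cached[0] >= self.max_age:
            cached = (now, self.fetch(url))   # stale or missing: download again
            self._entries[url] = cached
        return cached[1]
```

Because the cached copy is reused until it is a day old, an edit made just after a fetch is not seen until the next one.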

How often will NanoBot access a web page from my web server?

NanoBot should not try to access your site more often than once every few seconds. If NanoBot determines that your site has a slow connection, it accesses it less frequently. If you find that NanoBot places too high a load on your web site, please send e-mail to info@machinephasesystems.com.
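A per-host politeness delay of the kind described above might look like the sketch below. The 5-second minimum and the factor of 10 applied to slow response times are invented values, not NanoBot's real settings, and the class name is hypothetical.

```python
import time


class HostThrottle:
    """Hypothetical per-host politeness delay.

    Requests to one host are spaced at least `min_delay` seconds apart, and a slow
    response stretches the gap to `slow_factor` times the response time.
    """

    def __init__(self, min_delay=5.0, slow_factor=10.0, clock=time.monotonic):
        self.min_delay = min_delay
        self.slow_factor = slow_factor
        self.clock = clock
        self._next_allowed = {}   # host -> earliest clock time for its next request

    def wait_time(self, host):
        """Seconds to wait before the next request to `host` may be sent."""
        return max(0.0, self._next_allowed.get(host, 0.0) - self.clock())

    def record(self, host, response_seconds):
        """Call after each request; slow responses push the next request further out."""
        delay = max(self.min_delay, self.slow_factor * response_seconds)
        self._next_allowed[host] = self.clock() + delay
```

Tying the delay to measured response time means a site on a slow connection is visited less often without any per-site configuration.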

How do I keep NanoBot from following links from a particular web page?

NanoBot obeys the noindex, nofollow and noarchive robots meta tags. Placing these tags in the head section of an HTML document controls whether NanoBot indexes that document, follows its links, or keeps a copy of it. The tags and their effects are:

<META NAME="robots" CONTENT="noindex"> NanoBot will retrieve the document but will not index it.
<META NAME="robots" CONTENT="nofollow"> NanoBot will not follow any links on the page to other documents.
<META NAME="robots" CONTENT="noarchive"> NanoBot will index the document but will not keep a copy, cache, or archive of it.

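The robots meta tag is obeyed by many different web robots. To apply a restriction to NanoBot only, use NanoBot in place of robots in the NAME attribute. You can also combine several directives in a single meta tag, separated by commas. For example, the first tag below applies to every robot that obeys the robots meta tag, and the second applies only to NanoBot: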

<META NAME="robots" CONTENT="noindex,nofollow">

<META NAME="NanoBot" CONTENT="noindex,nofollow">
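How a crawler might read these tags can be sketched with Python's standard html.parser module. This is an illustration under assumptions, not NanoBot's code: the class and function names are hypothetical, and the sketch simply merges the directives from the generic robots tag and the NanoBot-specific tag.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots"> and agent-specific <meta> tags."""

    def __init__(self, agent="NanoBot"):
        super().__init__()
        self.names = {"robots", agent.lower()}   # meta names that apply to this robot
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names, but not attribute values
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() not in self.names:
            return
        for token in (attributes.get("content") or "").split(","):
            token = token.strip().lower()
            if token:
                self.directives.add(token)


def robots_directives(html, agent="NanoBot"):
    parser = RobotsMetaParser(agent)
    parser.feed(html)
    parser.close()
    return parser.directives


def crawl_policy(directives):
    """Translate directives into the three behaviours described on this page."""
    return {
        "index": "noindex" not in directives,
        "follow_links": "nofollow" not in directives,
        "keep_copy": "noarchive" not in directives,
    }
```

Given a page whose head holds `<META NAME="robots" CONTENT="noindex">` and `<META NAME="NanoBot" CONTENT="nofollow, noarchive">`, NanoBot would see all three directives, while any other robot would see only noindex.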

Why is NanoBot downloading the same page on my site multiple times?

In general, NanoBot should only download one copy of each file from your site during a given crawl. Occasionally the crawler is stopped and restarted, and it may recrawl pages that it has recently retrieved. These recrawls should happen infrequently.
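One plausible way to get this behaviour, shown only as a hypothetical sketch and not as a description of NanoBot's internals, is a per-crawl set of already-fetched URLs held in memory. Such a set prevents duplicates within a crawl but is empty again after a restart, which would explain the occasional recrawl.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize(url):
    """Treat URLs differing only in scheme/host letter case or in #fragment as one page."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), parts.path or "/", parts.query, ""))


class CrawlFrontier:
    """Hypothetical per-crawl record of pages already fetched."""

    def __init__(self):
        self.seen = set()   # held in memory only, so a restart begins with an empty set

    def should_fetch(self, url):
        key = normalize(url)
        if key in self.seen:
            return False
        self.seen.add(key)
        return True
```

Persisting the set to disk would avoid recrawls after a restart, at the cost of extra storage and bookkeeping.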

How do I report problems?

If you have questions about our NanoBot technology, please e-mail them to info@machinephasesystems.com. In your e-mail message, include as much detail as possible about the problem you are experiencing. Thank you!
