Before a site appears in search results, a search engine must index it. An
indexed site has been visited and analyzed by a search robot, with the relevant
information saved in the search engine's database. If a page is present in the
search engine index, it can be displayed in search results; otherwise, the
search engine knows nothing about it and cannot display information from the
page.
Most average-sized sites (with dozens to hundreds of pages) are usually
indexed correctly by search engines. However, keep the following points in
mind when constructing your site. There are two ways to allow a search engine
to learn about a new site:
- Submit the address of the site manually using a form associated with the
search engine, if available. In this case, you are the one who informs the
search engine about the new site and its address goes into the queue for
indexing. Only the main page of the site needs to be submitted; the search
robot will find the rest of the pages by following links.
- Let the search robot find the site on its own. If there is at least one
inbound link to your resource from other indexed resources, the search robot
will soon visit and index your site. In most cases, this method is recommended.
Get some inbound links to your site and just wait until the robot visits it.
This may actually be quicker than manually adding it to the submission queue.
Indexing a site typically takes from a few days to two weeks depending on the
search engine. The Google search engine is the quickest of the bunch.
Try to make your site friendly to search robots by following these rules:
- Try to make any page of your site reachable from the main page in not more
than three mouse clicks. If the structure of the site does not allow you to do
this, create a so-called site map that will allow this rule to be
observed.
- Do not make common mistakes. Session identifiers make indexing more
difficult. If you use script navigation, make sure you duplicate these links
with regular ones because search engines cannot read scripts (see more details
about these and other mistakes in section 2.3).
- Remember that search engines index no more than the first 100-200 KB of
text on a page. Hence, the following rule: do not use pages with more than
100 KB of text if you want them to be indexed completely.
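The rule about script navigation above can be sketched in markup. The URL and
the function name here are hypothetical, chosen only for illustration:

```html
<!-- A purely script-driven link: search robots cannot follow it. -->
<a href="javascript:void(0)" onclick="openSection('catalog')">Catalog</a>

<!-- The same link duplicated as a regular one. A robot follows the href,
     while a browser with scripting enabled can still run the script. -->
<a href="/catalog.html" onclick="openSection('catalog'); return false;">Catalog</a>
```

The second form degrades gracefully: every visitor, human or robot, reaches
the catalog page one way or the other.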
You can manage the behavior of search robots using the file robots.txt. This
file allows you to explicitly permit or forbid them to index particular pages on
your site.
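A minimal robots.txt might look like the following sketch. The directory names
are hypothetical; substitute the sections of your own site that you do not
want indexed:

```
# Rules applying to all search robots
User-agent: *
# Forbid indexing of these (hypothetical) sections
Disallow: /admin/
Disallow: /cgi-bin/
```

The file must be placed at the root of the site (e.g. /robots.txt), where
search robots look for it before crawling.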
The databases of search engines are constantly being updated; records in them
may change, disappear and reappear. That is why the number of indexed pages on
your site may sometimes vary. One of the most common reasons for a page to
disappear from indexes is server unavailability. This means that the search
robot could not access it at the time it was attempting to index the site. After
the server is restarted, the site should eventually reappear in the index.
You should note that the more inbound links your site has, the more quickly
it gets re-indexed. You can track the process of indexing your site by analyzing
server log files, where all visits by search robots are recorded. We will give
details of SEO software that allows you to track such visits in a later section.
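As a sketch of what such tracking involves, the script below scans access-log
lines in the common combined format and counts requests whose User-Agent field
matches a known search robot. The sample log lines and the list of bot
signatures are illustrative assumptions, not an exhaustive set:

```python
import re

# Hypothetical sample of combined-format access log lines; in practice you
# would read these from your web server's log file.
SAMPLE_LOG = """\
66.249.66.1 - - [10/Oct/2023:13:55:36 +0000] "GET / HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
203.0.113.7 - - [10/Oct/2023:13:56:02 +0000] "GET /about.html HTTP/1.1" 200 2048 "-" "Mozilla/5.0 (Windows NT 10.0) Firefox/118.0"
40.77.167.3 - - [10/Oct/2023:13:57:11 +0000] "GET /products.html HTTP/1.1" 200 4096 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
"""

# Substrings that identify some major search robots in the User-Agent field.
BOT_SIGNATURES = ("Googlebot", "bingbot", "YandexBot", "Slurp")

def count_robot_visits(log_text):
    """Return a dict mapping robot name -> number of logged requests."""
    counts = {}
    for line in log_text.splitlines():
        # The user agent is the last quoted field of a combined-format line.
        match = re.search(r'"([^"]*)"\s*$', line)
        if not match:
            continue
        user_agent = match.group(1)
        for bot in BOT_SIGNATURES:
            if bot in user_agent:
                counts[bot] = counts.get(bot, 0) + 1
    return counts

print(count_robot_visits(SAMPLE_LOG))
```

Running it over the sample above reports one visit each from Googlebot and
bingbot; the ordinary Firefox request is ignored.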