How to Spider Documents
- 1). Make a list of all the documents and folders that you want spiders to index and those that you want them to skip.
- 2). Open Notepad (or any other plain-text editor) and enter the lines below:
User-agent: *
Disallow: /images/

User-agent: Googlebot-Image
Disallow: /images/
The first section tells all spiders that honor robots.txt to stay out of the "images" folder, so its contents are not crawled for indexing.
The second section gives the same instruction to the "Googlebot-Image" spider by name. A spider that finds a section addressed to it follows only that section and ignores the "*" section.
- 3). Add one "Disallow" line for each additional folder that you want skipped during indexing. Refer to the list you created earlier to ensure no folder is missed.
- 4). List any individual files that you want skipped during indexing, as shown below:
User-agent: *
Disallow: /documents/ehow.txt
The above statements tell all spiders to skip the "ehow.txt" file located inside the "documents" folder. Paths are case-sensitive, so the folder and file names must match those on the server exactly.
Repeat the "Disallow" line for any other document that needs to be skipped during indexing.
- 5). Save the file as "robots.txt" and upload it to the root directory of the website, so that it is reachable at /robots.txt.
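Before uploading, the rules can be checked locally. The following is a minimal sketch, assuming Python 3 and its standard urllib.robotparser module. The spider name "ExampleBot", the helper names, and the paths "/index.html" and "/images/logo.png" are hypothetical and used only for illustration.

```python
from urllib.robotparser import RobotFileParser

# Rules combining the examples from the steps above (folder and file rules).
ROBOTS_TXT = """\
User-agent: *
Disallow: /images/
Disallow: /documents/ehow.txt

User-agent: Googlebot-Image
Disallow: /images/
"""


def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt content held in a string (no network access)."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


def is_allowed(parser: RobotFileParser, agent: str, path: str) -> bool:
    """Return True if the given spider may fetch the given path."""
    return parser.can_fetch(agent, path)


parser = build_parser(ROBOTS_TXT)

# "ExampleBot" is a hypothetical spider that has no section of its own,
# so it falls back to the "*" section.
checks = [
    ("ExampleBot", "/index.html"),
    ("ExampleBot", "/images/logo.png"),
    ("ExampleBot", "/documents/ehow.txt"),
    ("Googlebot-Image", "/images/logo.png"),
    ("Googlebot-Image", "/documents/ehow.txt"),
]
for agent, path in checks:
    verdict = "allowed" if is_allowed(parser, agent, path) else "blocked"
    print(f"{agent:16} {path:22} {verdict}")
```

In this sketch the last check comes back as allowed: "Googlebot-Image" has its own section, so the "ehow.txt" rule listed under "*" does not apply to it. If a file should be hidden from a spider that has its own section, repeat the "Disallow" line inside that section.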