ACC web search | Alkaline search engine
ACC Web pages search
In the spring of 2000, ACC set up a search engine for the ACC site. The ACC website has a mixture of material of interest to the general public and material mainly useful to ACC staff. When archival information, such as prior years' catalog entries, turns up in a search just as easily as current information, the search facility is less useful than it might be. It is useful for ACC web authors to understand how the search works so that they can make appropriate information come up early in the search lists and other information come up late or not at all.
| Hints to make a page come up early | Hints to "hide" a page | Other ways to "hide" pages | How the search works
I have read the background information from Alkaline and looked through the results of the search process fairly extensively. I have talked with Glenda Keyworth some about the properties of the search and the search configuration file. At her suggestion, I have tried to experiment with the ACC search.
Some hints to make a page come up early in the search list:
Some hints to make a page come up late in the search list or not at all:
Make sure that you don't link to the page from your home page (or other pages that might be linked to from elsewhere). This is a bit tricky, since you can't control the links others make. However, you can post a page for your committee, tell them all about it, and tell them not to make links to it. This works pretty well and, if no links to it are made, it will definitely protect the page from searches outside ACC (like from Yahoo, etc.) since they can only follow links and don't have access to the entire file structure within the directories.
When a web author decides to make this (or any typical) search available, first they decide on a search configuration file and then run it to create the search database. Usually a "default" search configuration file is provided. When you search on a specific word or set of words, the program is merely looking through the search database that has been created. It is not going back to the original pages. Thus, if a page has been deleted (or moved) since the search database was last updated, it will still be listed, but when you click on the URL as it came up in the search, you won't find a page. Moreover, if a page has been added since the search database was last updated, it will not come up in the search.
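The effect of searching a saved database rather than the live pages can be sketched in a few lines of Python. This is only an illustration, not how Alkaline itself is implemented: the page paths, page text, and the function names build_index and search are all made up for the example.

```python
import re
from collections import defaultdict


def build_index(pages):
    """Build a word -> set-of-URLs index from a snapshot of the site."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(url)
    return dict(index)


def search(index, word):
    """Look a word up in the saved index; the live pages are never consulted."""
    return sorted(index.get(word.lower(), set()))


# Hypothetical site contents at the time the search database is created.
site = {
    "/catalog/1999.html": "course catalog for 1999",
    "/math/index.html": "math department home page",
}
index = build_index(site)

# The site changes afterwards, but the index is not rebuilt.
del site["/catalog/1999.html"]                           # page deleted
site["/math/tutoring.html"] = "math tutoring schedule"   # page added

stale_hits = search(index, "catalog")      # still lists the deleted page
missing_hits = search(index, "tutoring")   # the new page is not found
```

Until build_index is run again, the deleted page keeps showing up and the new page stays invisible, which matches the behavior described above.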
In response to various questions of mine, Glenda said that it was set to index all the various ACC web servers: www2, www3, www, accweb, lrs, opc, etc. and to follow links from their root directories. She tells me that our search is set to follow links rather than just find all files that are posted. (In general, this is true of search engines.) In confirmation of that, I notice that a number of pages I know about which aren't linked do not appear to be found by the search. However, I have seen some pages that I believe have no links to them that have appeared in the search database, so I would not completely rely on this method. Whether they appear because someone else made a link to the page or because the search engine is really searching the entire directory structure of our web servers, I can't be sure.
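To make the difference between following links and listing every posted file concrete, here is a small Python sketch. It assumes a made-up link graph (the paths and the crawl function are hypothetical, not taken from the ACC servers or from Alkaline) and simply walks outward from a root page.

```python
from collections import deque


def crawl(links, start):
    """Return every page reachable from `start` by following links."""
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


# Hypothetical link graph: each page maps to the pages it links to.
links = {
    "/": ["/math/", "/business/"],
    "/math/": ["/math/syllabus.html"],
    "/business/": [],
    "/math/syllabus.html": [],
    "/math/committee-draft.html": [],  # posted, but nothing links to it
}

found = crawl(links, "/")
unlinked = set(links) - found  # pages a link-following search never sees
```

In this sketch the committee draft stays out of the results only as long as nobody links to it. If some other page adds a link, the same crawl picks it up, which is one possible explanation for supposedly unlinked pages turning up in the search database.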
Glenda also said that the "site depth" was set at 2 and that it was set in a mode that means it should continually scan the web servers and add pages to its database as they are posted. She ran it in March, I think, and said that she had to cut it off from creating the search database after about 10 hours. So it is possible that some pages were missed when she stopped it.
I posted some pages appropriately to test this (in my root directory, linked to from my main page) and they have not appeared in the search database after three weeks. So I am skeptical that it is continually adding to the database. That means I can't see the results of my experiment until the search database is created again. Glenda doesn't anticipate doing that very soon.
About the "site depth" of the search at 2 -- it appears to me that should mean pages more than 2 subdirectories deep would not be indexed. Whether that is two subdirectories from the ACC root directory or from your personal web directory isn't clear. I have seen one page in our search database with a URL longer than that. However, I haven't seen any with URLs this long: http://www.austincc.edu/business/second/third/fourth/file.html. I asked her whether my interpretation was correct -- that such pages wouldn't be indexed. She agreed that the technical description of the search engine says that they wouldn't, but cautioned that you can't ever really tell what will happen except by experimentation because those who write the descriptions aren't the same people as those writing the code that makes the program. She did say that she has no intention of ever making that "site depth" parameter any higher than 2.
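The subdirectory reading of "site depth" described above can be written out as a short Python sketch. It follows only that interpretation; the search engine may define the parameter differently (for example, by counting link hops), and the function names and the base_path parameter are assumptions made for this illustration.

```python
from urllib.parse import urlparse


def directory_depth(url, base_path="/"):
    """Count the directories between base_path and the file named in url."""
    path = urlparse(url).path
    if not path.startswith(base_path):
        raise ValueError("url is not under base_path")
    remainder = path[len(base_path):]
    parts = [p for p in remainder.split("/") if p]
    # The last part is the file name unless the path ends with a slash.
    if parts and not path.endswith("/"):
        parts = parts[:-1]
    return len(parts)


def would_be_indexed(url, site_depth=2, base_path="/"):
    """Assumed rule: index only pages at most site_depth directories deep."""
    return directory_depth(url, base_path) <= site_depth


url = "http://www.austincc.edu/business/second/third/fourth/file.html"
depth_from_root = directory_depth(url)                     # counted from the server root
depth_from_business = directory_depth(url, "/business/")   # counted from a personal directory
```

For the example URL the count is 4 from the server root and 3 from /business/, so under this reading the page would be skipped at a site depth of 2 whichever starting point is used. Only experimentation against the real search database can confirm whether the engine behaves this way.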
Last updated May 21, 2000. Comments or questions?