Penn State Search Engine FAQ




Section 1: General Information


Section 2: Site Indexing


Section 3: Common Problems


Section 4: Policies


Section 1: General Information

1.1 What is an indexed site?
An indexed site is a server that has been added to the search engine index. The search engine administrator adds sites to the index. An example of an indexed site is http://www.psu.edu/.

When a site (server) is added to the list of indexed sites, the search engine spider looks for a particular file in the root directory of that site. The file name depends on the type of server being indexed; for example, www.psu.edu uses the default index.html. The spider indexes that file and follows all of the links in it. Each referenced file is indexed in turn, provided that it belongs to an indexed site. Not every document on an indexed site is necessarily indexed: only files that are referenced by indexed files are picked up. If the spider does not find the server-specific default file (index.html or default.html in most cases), it visits all of the subdirectories and looks for the default file in each. Indexing then continues in the same manner.

NOTE: The search engine will not follow links from your pages to URLs on sites that are not indexed by the search engine. For example, suppose your site, http://www.aaa.psu.edu/, includes a link to http://www.some.org/. The site www.some.org will not be indexed by the search engine.
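
As a rough illustration of the link-following behavior described above, here is a hypothetical default file for www.aaa.psu.edu. The file names about.html and courses/index.html and the page text are invented for this sketch; only the host names come from the text.

```html
<!-- Hypothetical http://www.aaa.psu.edu/index.html -->
<html>
<head>
<title>AAA Department</title>
</head>
<body>
<!-- Same site: the spider should follow these links and index the targets -->
<a href="about.html">About us</a>
<a href="courses/index.html">Courses</a>

<!-- Outside the indexed sites: the link stays on the page,
     but www.some.org itself is not indexed -->
<a href="http://www.some.org/">A partner organization</a>
</body>
</html>
```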

1.2 What search engine does Penn State use?
The Penn State Search Engine uses Inktomi's Search/Site (formerly UltraSEEK Server) search engine, version 3.1.10 on Solaris 2.5.1. Our license agreement with Inktomi currently allows for 1,000,000 indexed documents. Per this license agreement, we can only index .psu.edu sites. Note that we currently use the UltraSEEK logo/image on our pages. We are permitted to use this logo per our license agreement; however, subsequent versions of the search engine will likely display the new Inktomi logo.



Section 2: Site Indexing

2.1 What is the procedure for getting a site indexed by the Penn State search engine?
To have your site indexed by the Penn State search engine, send an e-mail request to webmaster@psu.edu and include the URL for your site. Once your site is added to the index, the site will be indexed overnight. You will receive an e-mail notification from webmaster@psu.edu that your site has been indexed.

Special note for people with pages in departmental space (www.psu.edu/dept/): If you have departmental Web space and would like your documents to be searched, make sure that every file on your site is referenced from either the server-specific default file (index.html or default.html) or another referenced file. Consider the case where the main index.html file of the server, which is indexed, references the index.html file in your directory. Suppose that you have the files a.html, b.html, and c.html in your directory. If a.html and b.html are the only files referenced from your index.html file, then c.html will not be indexed, because it is not referenced by any file that the search engine spider can find. To have c.html indexed, reference it from your index.html file or from a.html or b.html.
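
The a.html, b.html, and c.html case above can be sketched as follows. This is a hypothetical index.html for a departmental directory; the title and link text are invented. The third link is what gives the spider a path to c.html.

```html
<!-- Hypothetical index.html in a departmental directory -->
<html>
<head>
<title>Our Department</title>
</head>
<body>
<ul>
  <li><a href="a.html">Page A</a></li>
  <li><a href="b.html">Page B</a></li>
  <!-- Without this link (or one from a.html or b.html),
       the spider has no path to c.html -->
  <li><a href="c.html">Page C</a></li>
</ul>
</body>
</html>
```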

Special note for people with pages on the personal server:
Please do not send e-mail to webmaster@psu.edu requesting that we index your personal pages. As of June 1999, faculty, staff, and student personal Web pages on the personal server (www.personal.psu.edu) are no longer indexed by the Penn State search engine.

2.2 How long does it take the search engine to recognize changes to indexed documents?
It takes about 10 days after files are changed for the search engine to recognize the changes and purge the outdated files from the index. The same is true for files that have been removed.

2.3 I am responsible for a site that resides on a server that is not indexed by the search engine. Can I request that the site be indexed?
We can index the site for you but we prefer to index the entire server and not just one site on the server. We prefer that you first ask the server administrator's permission before sending a request to have the site indexed. The server administrator can also send this request.

2.4 The server for which I am responsible is indexed by the search engine. I have materials on this server that I do not want the search engine to find. Can the search engine disallow certain sites/pages on a particular server from being indexed?
Yes. For every indexed server, filters can be established to allow and disallow the indexing of certain sites on a particular server.

For example, suppose you are responsible for the sites that reside on the server www.someserver.psu.edu, which is indexed by the search engine. You just created a site for a particular department; the URL is http://www.someserver.psu.edu/yoursite/. However, you have a directory, "budget", that you do not want the search engine to index. We can set up a filter to disallow the indexing of http://www.someserver.psu.edu/yoursite/budget/. The site www.someserver.psu.edu/yoursite/ will be indexed, but no files in the "budget" directory will be indexed. When submitting a request to have your site indexed, make sure to mention any disallow filters that you need.

You can also use the robots.txt file to restrict access to parts of your Web site. Please see the Web Robots Pages for instructions on how to do this.

2.5 How do I prevent the search engine from indexing part or all of a document?
In your HTML document(s) surround the information that you do not want to index with the following tags: <!--stopindex-->   <!--startindex-->

All information before the <!--stopindex--> tag and after the <!--startindex--> tag will still be indexed.

If you wish to prevent an entire document from being indexed by the search engine, then insert the tag <!--stopindex--> within the <HEAD> </HEAD> tags and omit the <!--startindex--> tag.
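
A minimal sketch of both cases, using invented page content. In the first page, only the middle paragraph is hidden from the search engine:

```html
<html>
<head>
<title>Staff Directory</title>
</head>
<body>
<p>This paragraph is indexed.</p>
<!--stopindex-->
<p>This paragraph is skipped by the search engine.</p>
<!--startindex-->
<p>This paragraph is indexed again.</p>
</body>
</html>
```

In the second page, <!--stopindex--> sits inside the head and is never followed by <!--startindex-->, so per the rule above the entire document should be left out of the index:

```html
<html>
<head>
<!--stopindex-->
<title>Internal Notes</title>
</head>
<body>
<p>Nothing in this document should be indexed.</p>
</body>
</html>
```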

NOTE: It might take up to 10 days for our search engine to recognize the update to your document(s) and remove all or part of the document(s) from the index.

By default, when a server is indexed (for example, www.someserver.psu.edu), all of its documents are indexed unless you request otherwise, provided that the files/directories are referenced from the server-specific default file or from another referenced file in the www.someserver.psu.edu collection.

2.6 How do I invoke the search engine from my Web page?
Penn State colleges, departments, and other official Penn State units can invoke the search engine from their Web pages by creating a form that either passes the requested parameters to the search engine or directly starts the search process. To create a form of the first kind, add the following lines to your HTML document:

<form name=seek method=GET action="http://search.psu.edu/index.html">
<input type=text name=qt size=40 value="" maxlength=2047>
<input type=submit value=" Search ">
</form>

To start the search process directly from a Web page, change the form's action attribute so that it points to http://search.psu.edu/query.html. In addition, a collection name must be specified. To do this, add the following line inside the form:

<input type=hidden name=col value=psu>

This example uses the psu collection name because it searches all of the indexed Web servers listed in the Penn State domain. If you need to restrict the search to a particular collection, replace psu with a collection name such as polreg or uinfonet. The Penn State search engine provides three collection names:

  • psu: all indexed Web servers listed in the Penn State domain
  • polreg: all indexed policy servers listed in the Penn State domain
  • uinfonet: all indexed undergraduate information servers listed in the Penn State domain

To restrict the search process to specific URLs, use the query prefix (qp) search parameter as shown below:

<input type=hidden name=qp value="url:www.aaa.psu.edu">

This will restrict the search to URLs that contain www.aaa.psu.edu. Users who visit your page will be able to search your site and will see only results whose URLs contain www.aaa.psu.edu. Do not forget to include the hidden collection name (col) field, as in the previous example.
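
Putting the pieces above together, a complete site-restricted search form might look like the sketch below. It assumes the query.html action, the psu collection, and the www.aaa.psu.edu restriction from the examples above; substitute your own host name.

```html
<!-- Search form that queries the engine directly and limits
     results to URLs containing www.aaa.psu.edu -->
<form name="seek" method="GET" action="http://search.psu.edu/query.html">
  <!-- qt is the text field the visitor types into -->
  <input type="text" name="qt" size="40" value="" maxlength="2047">
  <!-- col selects the collection to search -->
  <input type="hidden" name="col" value="psu">
  <!-- qp restricts results to URLs containing this host name -->
  <input type="hidden" name="qp" value="url:www.aaa.psu.edu">
  <input type="submit" value=" Search ">
</form>
```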

Additional search parameters can be found via the search parameters section of the UltraSEEK FAQ at http://www.inktomi.com/products/search/support/docs/faqs/faq007.htm.

2.7 What are META tags? How can I use META tags?
META tags serve a variety of different functions, depending on how you use them in your document. You can redirect or reload the page after a specified amount of time and you can use META tags to provide visitors with information about your Web pages/site. In particular, META tags can provide keywords, controlling HOW your page is indexed by the search engine.

Your META tag information should be added after the </TITLE> tag and before the </HEAD> tag.

First, you might want to describe your document, so that the search engine displays the description META tag along with the title of your document in the results. The description META tag looks like this:
<META NAME="description" CONTENT="your description">

Keywords help the search engine to categorize your site. Choose keywords that best describe your content. Choose carefully: priority is given to the first few keywords found. The keyword META tag looks like this:
<META NAME="keywords" CONTENT="keyword1, keyword2, keyword3">

The refresh META tag is a way to direct visitors of your Web site to another site after a specified amount of time. This tag is especially helpful if your Web site has moved to a new location. The refresh tag looks like this:
<meta http-equiv="refresh" CONTENT="10; url=http://www.yournewwebsite.psu.edu">
The number in the CONTENT section of the tag is the number of seconds after which visitors will be automatically directed to the new site.

Currently, we do not prevent refresh META tag use; however, you might consider sending us the URL of your new site. We can then index the new site and remove the old site from the index.

In addition to META tags, the title of your document should reflect the content of your document. The title of your document should be placed between the <TITLE> and </TITLE> tags.
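
As an illustration, the head of a document that follows the advice above might look like this. The title, description, and keywords are invented placeholders.

```html
<HTML>
<HEAD>
<TITLE>Department of Examples: Course Schedule</TITLE>
<!-- META tags go after </TITLE> and before </HEAD> -->
<META NAME="description" CONTENT="Course schedule for the Department of Examples.">
<!-- Put the most important keywords first -->
<META NAME="keywords" CONTENT="course schedule, examples, registration">
</HEAD>
<BODY>
<P>Page content goes here.</P>
</BODY>
</HTML>
```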

The following is a list of META tag resources:

For more information about how the Penn State search engine responds to META Tags, please see http://search.psu.edu/help/meta.html.



Section 3: Common Problems

3.1 I just created a new site which is indexed by the search engine; however, files from the old site are still being indexed. What can I do?
The old site will need to be removed. If the old site was hosted via Penn State departmental Web space, then the site's administrator/supervisor will need to request removal via the CAC Computer Accounts Office. After the Accounts Office verifies the request, the site can be removed. If the old site was hosted on an independent server (for example, www.aaa.psu.edu), then the site administrator for the independent server will need to be contacted. It then takes about 10 days after the site has been removed for the search engine to purge the files from the index. If there are only a few URLs in the index, then we can delete the URLs manually. If there are many URLs, then you will need to wait until the server purges the URLs from the index.

3.2 My site exists on a server that is indexed by the search engine. Why doesn't my site show in the index when I search on my URL or keywords?
Most likely, the indexed server on which your site resides does not provide links to your site from any of its pages/sites. Contact the server administrator or Webmaster for the server to request that a link to your site be added. If a link can be added, then it might take up to 10 days for the search engine to recognize the change and find your page. If for some reason you cannot be listed, contact webmaster@psu.edu again and we can add your site as a separate URL for the search engine to index.

3.3 I can't find a particular site. Why is this?
Not all servers/sites are indexed by the search engine. We have not been proactive in contacting every Webmaster/server administrator for every server on campus. If a request is sent to us, then we index the site. If a site is not indexed, it is likely that the Webmaster/server administrator for the site has not requested that the site be indexed with the Penn State search engine.



Section 4: Policies

4.1 Can a site from outside the psu.edu domain be indexed?
No, we can only add .psu.edu sites to our search engine per the contract agreement with the search engine company, Inktomi.

4.2 Can server administrators/Webmasters register Penn State pages with the various search engines, such as Yahoo, Excite, Lycos, etc?
Yes, Penn State pages/sites can be registered with other search engines. We currently do not exclude robots from searching our site.

4.3 Are there any regulations or requirements concerning which search engine Penn State departments, colleges, and units can use?
No, each Penn State college, department, or unit can use the search engine of its choice. Departments, colleges, and units are welcome to index sites with our search engine and then query our search engine to find documents on the departmental, college, or unit server (see Question 2.6, "How do I invoke the search engine from my Web page?").

