Remote Access to Web Databases/Files that Require Access Authorization, TCU URL Re-writer Solution

Kerry Bouchard: K.Bouchard@tcu.edu

The program discussed at the conference is fully documented, along with downloadable script files, at: http://lib.tcu.edu/www/staff/bouchard/cgi_logon/cgi_logon.htm. This version of the program uses the LYNX WWW browser in conjunction with DCL CGI scripts and the VMS implementation of the CERN web server, so it is highly "non-portable". I hope to have a new, Java-based version of the program ready to announce in May 1999. The new version is being tested on an NT box running IIS, but it should be compatible with other operating systems and web servers that can be made to support Java servlets. The new version should also be able to handle authorization for sites that require the exchange of cookies to maintain state. Links to the new program will be posted on the Web page above, and I will send announcements to Web4Lib and ATLAS-L.

Validation Scenarios Handled/Not Handled by Lynx URL Re-writer Program

The program in use at TCU currently handles the following validation scenarios:

The Lynx URL re-writer program currently does not handle the following scenarios:

How a URL Rewriter Differs From a Proxy

From a user’s standpoint, the difference between a URL re-writer and a proxy is that users do not need to change their browser settings to start going through a proxy; they simply click on a URL that points to the URL re-writer program. From a systems standpoint, handling the form-based (and possibly even the HTTP-based Username/Password) scenarios above may not be possible with a commercial proxy server, since access authorization is not what proxies were invented for.

What it Does

The re-writer script fetches HTML pages and other data on behalf of the user and sends the data on to the user’s browser. When fetching HTML data, the program re-writes links on the fly. For example, the link:

<a href="/dir1/dir2/afile.html">

is converted from "relative" to "absolute" form:

<a href="http://www.vendor.com/dir1/dir2/afile.html">

and then the URL of the re-writer program is prepended to the URL for the remote resource:

<a href="http://lib.tcu.edu/htbin/Proxy.pp?http://www.vendor.com/dir1/dir2/afile.html">
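The two-step conversion above can be sketched in a few lines. This is only an illustration in Python, not the DCL/Lynx scripts used at TCU; the function name rewrite_links, the regular expression, and the choice to skip "#" and "mailto:" links are assumptions made for the example, and a real implementation would need to handle other attributes (such as src and form actions) and unquoted values.

```python
import re
from urllib.parse import urljoin

# Address of the re-writer program, taken from the example above.
REWRITER_URL = "http://lib.tcu.edu/htbin/Proxy.pp"

# Matches double-quoted href attributes only (an assumption of this sketch).
HREF_RE = re.compile(r'(href\s*=\s*")([^"]*)(")', re.IGNORECASE)


def rewrite_links(html, page_url, rewriter_url=REWRITER_URL):
    """Make each href absolute, then prepend the re-writer URL to it."""

    def _rewrite(match):
        target = match.group(2)
        # Leave in-page anchors and mail links untouched.
        if target.startswith("#") or target.lower().startswith("mailto:"):
            return match.group(0)
        # Step 1: relative -> absolute, resolved against the fetched page.
        absolute = urljoin(page_url, target)
        # Step 2: route the link back through the re-writer program.
        return f"{match.group(1)}{rewriter_url}?{absolute}{match.group(3)}"

    return HREF_RE.sub(_rewrite, html)
```

Calling rewrite_links('<a href="/dir1/dir2/afile.html">', "http://www.vendor.com/index.html") yields the final link shown in the example, so every link the user follows comes back through the re-writer.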

If the vendor site requires a Username/Password (HTTP-based), then the re-writer program sends these along with the request for each page. If the vendor site uses forms to start a session, then the script for that site contains extra statements to fill in the form with the correct parameters and pass on to the user the page that the vendor returns after the form has been successfully processed.
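For the HTTP-based case, sending the Username/Password with each request amounts to attaching an Authorization header to every fetch. The following is a minimal sketch in Python, assuming the vendor uses HTTP Basic authentication; the function name build_authorized_request and the sample credentials are hypothetical, and this is not how the Lynx-based TCU scripts are written.

```python
import base64
import urllib.request


def build_authorized_request(url, username, password):
    """Build a request that carries HTTP Basic credentials for the vendor site."""
    # Basic authentication sends "username:password", base64-encoded.
    credentials = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    request = urllib.request.Request(url)
    request.add_header("Authorization", f"Basic {token}")
    return request


# Hypothetical usage: the re-writer would build a request like this for each
# page it fetches, then pass it to urllib.request.urlopen().
example = build_authorized_request(
    "http://www.vendor.com/dir1/dir2/afile.html", "user", "pass"
)
```

Because the credentials are added on the server side, the user never sees or types the vendor Username/Password.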

Advantages/Disadvantages of This Approach (or of TCU’s Implementation)

Advantages:

Disadvantages:

-###-