Remote Access Authorization for Commercial Databases Requiring IP validation and Username/Password Validation -- Lynx URL Rewriter in use at TCU
Kerry Bouchard

3/25/1996, last revised 4/20/2000

NOTE: The Lynx-based solution described here is no longer in production use at TCU. We are now using EZproxy. For a discussion of the TCU EZproxy implementation and other authentication solutions, see the Remote Authorization and Authentication Presentation given at the March 2000 DRA Users' Conference.

Explanation
Requirements
Setting Up Access Authentication on Systems Running CERN/VMS

CGI Scripting System for Transparent Proxy Access to Remote Databases

PowerPoint Presentation for DRA Users Group Conference, March 1999
Handout for DRA Users Group Conference, March 1999

Explanation:
Libraries have encouraged commercial database providers to write license agreements that allow library patrons to access databases regardless of the patron's physical location (access is based on "who you are" rather than "where you are"). When these services were remotely mounted and accessed through the telnet protocol, scripting languages (such as DCL) used in combination with C-Kermit provided a simple way to write logon scripts that validated users as patrons of the library (based on patron id number or some other unique identifier), and then logged the user on to the remote database without ever revealing the library's institutional username and password.

Increasingly, remote access to these same databases has shifted from telnet to the World Wide Web. HTTP provides a mechanism to accomplish the first part of the above sequence (validating users by patron id), but not the second part (getting the patrons "logged on" to a remote server without knowing the institutional username and password). This problem is due to the "stateless" nature of HTTP and the way it handles access authorization. (For details, see the draft HTTP 1.1 specification, Chapter 11 -- "Access Authentication".) For servers that validate by IP address rather than by name and password, there is still a problem, since many library patrons may be accessing the library menus from a third-party Internet Service Provider. The remote server will see the third-party ISP's IP address, not the library's.

The collection of CGI scripts and the executable file below provide a mechanism to work around these problems on library systems running the CERN WWW server under OpenVMS. I have also included some VMS and DRA-specific tools for building password files to accomplish the first task (validating by patron id before users can access the CGI scripts that in turn access the remote database server).

Requirements:

Setting Up Access Authentication on Systems Running CERN/VMS:

If you have not done so already, you will need to set up Access Authorization password files on your own system to restrict access to the CGI scripts documented below. Otherwise you will be granting the whole world access to the remote databases and violating your license agreements. The example below documents the approach currently used on the Mary Couts Burnett Library system at Texas Christian University.

The CERN server follows UNIX rules for password files: every Username must be unique, but all users can have the same password. The password fields in the file must be encrypted. I have not found any way to force a Web browser to prompt only for a password; browsers always prompt for both Username and Password when a server sends an authentication challenge.

Therefore I have used a somewhat awkward mechanism in which the user must answer the "Username:" prompt with their ID number (which is unique), and then use a text string common to all patrons as their "Password." (This is explained to the user with a disclaimer screen that comes up before the username and password validation prompt.) The reason for using a generic string as the password (rather than, say, individual last names) is that the HTADM.EXE program that CERN supplies for maintaining server password files is only intended to be run interactively; it is far too resource-intensive to run from a batch job that builds a file containing thousands of ids (assuming you want to update the file on a nightly or weekly basis). By using a generic text string ("tcu") as the "password" for all library patrons, I was able to run HTADM.EXE once to see how it encrypts "tcu", and then include this encrypted string in a GEMbase (Report Writer) job that builds an updated set of password files every night. In these files, every user has a unique "username" (their id number) and "tcu" as their password.
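The nightly build step can be sketched as follows. This is only an illustration in Python, not the GEMbase job itself: the colon-separated "username:encrypted-password" line layout is an assumption about the password file format, the patron ids are made up, and the encrypted string is a placeholder for whatever HTADM.EXE produced for "tcu".

```python
def build_password_file(patron_ids, encrypted_password):
    """Return password-file text with one line per unique patron id.

    Assumption: each line has the form "username:encrypted-password".
    """
    seen = set()
    lines = []
    for patron_id in patron_ids:
        patron_id = patron_id.strip()
        if not patron_id or patron_id in seen:
            continue  # usernames must be unique; skip blanks and repeats
        seen.add(patron_id)
        lines.append(f"{patron_id}:{encrypted_password}")
    return "".join(line + "\n" for line in lines)


# Placeholder only: the real value would be the encrypted form of "tcu"
# copied once from a password file written by HTADM.EXE.
ENCRYPTED_TCU = "ENCRYPTED-STRING-FROM-HTADM"

# Hypothetical patron ids; the repeated id is written only once.
print(build_password_file(["100234", "100235", "100234"], ENCRYPTED_TCU), end="")
```

Because the encrypted string is computed once and reused, the job never has to call the encryption program per patron, which is the point of the shared password.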

The following sample programs are available for download. Except for STREAM_LF.FDL, all contain site-specific code that must be edited:

 

CGI Scripting System for Transparent Proxy Access to Remote Databases

Many thanks to Joachim Martin of Harvard University Library for responding to my January 1996 posting to Web4Lib and suggesting the use of Lynx with the -post_data parameter as a way of automating First Search logons.

The basic technique takes advantage of the Lynx "-auth", "-source", and "-post_data" command line parameters. In the case of servers that validate by Username and Password, the "-auth" parameter tells Lynx to pass the username and password to the remote server. Since Lynx is invoked from within a CGI script, these parameters are hidden from the user. In the case of servers that validate by IP address, the fact that Lynx is running on the library's server means that the request comes from the IP address that the remote server expects to see (so the "-auth" parameter is not needed). The "-source" parameter tells Lynx to dump the HTML (or other data) it finds at the specified URL to the SYS$OUTPUT device, which can be redirected to a disk file with DCL commands. For servers, such as OCLC's, that ask the user to input a Username and Password into a form, the "-post_data" parameter tells Lynx to "fill in the form" with the username and password automatically; i.e., it is an alternative to the "-auth" parameter.
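As a rough illustration of how these three parameters combine, the sketch below builds the Lynx command line in Python. It is not one of the DCL scripts: the function names, the "libuser"/"secret" credentials, the logon URL, and the form field names are all hypothetical. It relies on Lynx accepting "-auth=id:pw" and on "-post_data" reading form data from standard input, terminated by a line starting with "---".

```python
import subprocess
from urllib.parse import urlencode


def lynx_command(url, auth=None, post_data=False, lynx="lynx"):
    """Build the argument list for a Lynx call that dumps a page's source."""
    args = [lynx, "-source"]
    if auth is not None:
        username, password = auth
        args.append(f"-auth={username}:{password}")  # HTTP username/password
    if post_data:
        args.append("-post_data")  # form data will be supplied on stdin
    args.append(url)
    return args


def fetch_source(url, auth=None, form_fields=None, lynx="lynx"):
    """Run Lynx and return the page source (requires Lynx to be installed)."""
    args = lynx_command(url, auth=auth, post_data=form_fields is not None, lynx=lynx)
    stdin_text = None
    if form_fields is not None:
        # -post_data input ends with a line that starts with "---"
        stdin_text = urlencode(form_fields) + "\n---\n"
    result = subprocess.run(args, input=stdin_text, capture_output=True,
                            text=True, check=True)
    return result.stdout


# Server that validates by username and password (hypothetical credentials):
print(lynx_command("http://www.remote.com/protected.htm", auth=("libuser", "secret")))
# Server that validates by IP address: no -auth needed.
print(lynx_command("http://www.remote.com/protected.htm"))
# Server with a logon form (hypothetical URL): credentials go in via -post_data.
print(lynx_command("http://www.remote.com/logon", post_data=True))
```

A form-based logon would then be fetched with something like fetch_source(url, form_fields={"user": "libuser", "pass": "secret"}), where the field names depend on the remote server's form.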

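Using Lynx with these parameters, it is possible to write a CGI script that invokes Lynx to access a remote URL, dump the output to a file, and then send this file to the user's browser. However, simply sending the raw file to the user's browser is not enough, for two reasons: 1) relative URLs in the file would be resolved by the user's browser against the library's server rather than the remote server, and 2) links to files that require Access Authentication would take the user's browser directly to the remote server, bypassing the CGI script.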


The "LYNX_PROXY.EXE" program (available below) addresses these two problems by taking the Lynx "source" file as input and sending output to the user's browser in which: 1) all relative URLs are converted to absolute URLs, and 2) all URLs pointing to files that require Access Authentication have the library server's CGI script prepended to the URL, e.g. "http://www.remote.com/protected.htm" becomes "http://www.library.edu/htbin/remote.pp?http://www.remote.com/protected.htm". In this way, access to protected files continues to be redirected through the CGI script.
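The two rewriting steps can be sketched in a few lines of Python. This is not the source of LYNX_PROXY.EXE, only a minimal illustration: it handles just quoted HREF and SRC attributes, and it assumes that "requires Access Authentication" can be decided by a simple URL prefix (the `protected_prefix` parameter is hypothetical). The proxy prefix is the one from the example above.

```python
import re
from urllib.parse import urljoin

# Library CGI script that fronts the remote server (from the example above).
PROXY_PREFIX = "http://www.library.edu/htbin/remote.pp?"

# Matches href="..." or src='...' with either quote style.
ATTR = re.compile(r'''(\b(?:href|src)\s*=\s*)(["'])(.*?)\2''', re.IGNORECASE)


def rewrite_urls(html, base_url, protected_prefix, proxy_prefix=PROXY_PREFIX):
    """Make URLs absolute and route protected ones back through the proxy."""
    def fix(match):
        attr, quote, url = match.groups()
        absolute = urljoin(base_url, url)          # step 1: relative -> absolute
        if absolute.startswith(protected_prefix):  # step 2: protected -> proxied
            absolute = proxy_prefix + absolute
        return f"{attr}{quote}{absolute}{quote}"
    return ATTR.sub(fix, html)


page = '<a href="protected.htm">Search</a> <img src=\'/logo.gif\'>'
print(rewrite_urls(page,
                   base_url="http://www.remote.com/index.htm",
                   protected_prefix="http://www.remote.com/protected"))
```

Here the link to protected.htm comes back pointing at the library's CGI script, while the unprotected image is simply made absolute so the browser fetches it from the remote server directly.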

This is not a very good way to solve the problem - the good way would be to extend the HTTP Protocol "Access Authentication" methods to include commands that would allow a CGI script on a library server to tell a user's browser what authentication data to use for a remote database server. Until/unless the standards group that works on revisions to HTTP implements such extensions, the programs documented below may fill a need.

The proxy scripts are nearly identical, with the exception of several symbol definitions at the beginning of each script, some of which contain site-specific passwords that must be edited. (In some scripts, such as EI_VILLAGE.PP, there are other modifications to accommodate the quirks of a particular server.) The "TCU$UTIL" logical in each script must be replaced by a logical pointing to the directory where Lynx is installed on your system. You will also need to either define a system-level "WWW$SCRATCH" logical pointing to a directory that can be used for temp files, or replace this logical in the scripts. (All of the passwords contained in the files are dummy passwords.)

3/5/98: Posted new version of INFOTRAC.PP script to handle IAC's new web servers.
6/10/98: Posted new version of LYNX_PROXY.EXE that parses SRC= qualifiers inside FRAME tags.
6/17/98: Posted updated versions of all proxy scripts (.PP files below)
6/30/98: New version of LYNX_PROXY.EXE that correctly parses JavaScript HREF parameters where strings are enclosed in single quotes instead of double quotes; also posted script for Annual Reviews in Microbiology e-journals
2/25/99: Posted script for The Lancet, plus PowerPoint slides and handout presented at March 1999 DRA Users Group conference.

LYNX_PROXY.EXE (AXP executable) or LYNX_PROXY.EXE (VAX executable) -- this program is called by all of the CGI scripts listed below. The current version of this program was last modified on 6/10/98.
VALIDATE.PP -- this CGI script provides a front-end to all the scripts below. Since the majority of databases validate by IP address only, VALIDATE.PP can check for a local IP address and send the user directly to the remote database if they are coming from a local IP address.
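
The local-address check that VALIDATE.PP performs can be pictured with the Python sketch below. It is not the DCL script: the network range is a placeholder from the documentation address space, the function name is hypothetical, and the way the proxied URL is formed simply follows the "script?url" pattern shown earlier.

```python
import ipaddress

# Placeholder campus range; a real site would list its own networks here.
LOCAL_NETWORKS = [ipaddress.ip_network("192.0.2.0/24")]


def choose_target(client_ip, remote_url, proxy_script_url):
    """Send local users straight to the database, everyone else via the proxy."""
    address = ipaddress.ip_address(client_ip)
    if any(address in network for network in LOCAL_NETWORKS):
        return remote_url  # the remote server will accept this IP itself
    return proxy_script_url + "?" + remote_url


# In a CGI script the client address would come from the REMOTE_ADDR variable.
print(choose_target("192.0.2.17", "http://www.remote.com/",
                    "http://www.library.edu/htbin/remote.pp"))
print(choose_target("198.51.100.5", "http://www.remote.com/",
                    "http://www.library.edu/htbin/remote.pp"))
```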

1STS_CORE.PP CGI logon script for Core First Search. (Extended First Search script is the same but with different Username and Password)
AIP_PROXY.PP proxy script for off-campus access to American Institute of Physics journals
ANNREV_PROXY.PP proxy script for off-campus access to Annual Reviews in Biomedical Sciences electronic journals
ARCHUSA_PROXY.PP proxy script for off-campus access to Archives USA (Tex-Share subscription)
BRITANNICA.PP proxy script for off-campus access to Encyclopedia Britannica
CHADWYCK.PP proxy script for off-campus access to Chadwyck Healey Music Index
CSA.PP logon script for Cambridge Scientific Abstracts (similar to First Search, proxy is required for initial access from on or off-campus)
ECO.PP logon script for OCLC ECO journals
EI_VILLAGE.PP proxy script for Engineering Information Village, required for both on and off-campus access
HOOVERS_PROXY.PP proxy script for access to Hoovers Company Information, required for on and off-campus access
IDEAL.PP proxy script for off-campus access to Academic Press (IDEAL) online journals
INFOTRAC.PP proxy script for off-campus access to Infotrac
IOP_PROXY.PP proxy script for off-campus access to Institute of Physics online journals
JSTOR_PROXY.PP proxy script for off-campus access to JSTOR online journals
LANCET_PROXY.PP off-campus access to The Lancet, uses HTTP Username/Password validation
MUSE.PP proxy script for off-campus access to Project Muse online journals
SPRINGER_PROXY.PP proxy script for off-campus access to Springer-Verlag online journals
TEXSHARE_PROXY.PP logon script for access to ABI Inform and Periodicals abstracts through Tex-Share. Grabs a session id for on-campus users; proxies all access for off-campus users.
