Hlins: Hyper-Link Insertions in HTML documents
Version 0.32

Ralf Treinen

December 16, 1999






1   An Introductory Example

Hlins inserts hyperlinks into an HTML document: for certain names (normally the names of people) it inserts the URL (uniform resource locator) that a data base associates with the name.

First, create a data base file that associates URLs with names; let's call it addresses:
Donald Knuth        =  http://www-cs-staff.stanford.edu/...
Leslie Lamport      =  http://www.research.digital.com/...
Suppose that you have an HTML document mytext.html that contains text such as
A milestone in the development of digital typesetting was the TeX
system developed by Stanford computer science professor Donald
Knuth, which was used by L. Lamport as a base to build the more user-friendly
(but less powerful) LaTeX system.
Calling hlins -db addresses -o newtext.html mytext.html generates a file newtext.html that now contains the text
A milestone in the development of digital typesetting was the TeX
system developed by Stanford computer science professor <a
href="http://www-cs-staff.stanford.edu/...">Donald Knuth</a>, which
was used by <a
href="http://www.research.digital.com/...">L. Lamport</a> as a base to
build the more user-friendly (but less powerful) LaTeX system.
which a browser will render as something like

A milestone in the development of digital typesetting was the TeX system developed by Stanford computer science professor Donald Knuth, which was used by L. Lamport as a base to build the more user-friendly (but less powerful) LaTeX system.
Note that the url insertion knows about abbreviating first names (as for Leslie Lamport) and works over line breaks (as for Donald Knuth).
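
The effect on a single name can be sketched in a few lines of Python. This is only an illustration of the behaviour described above, not the way Hlins is implemented; the function name insert_link and the example.org address are made up for the illustration.

```python
import re

def insert_link(text, name, url):
    """Wrap every occurrence of `name` in `text` in an <a href=...> element.

    Hypothetical helper, for illustration only.
    """
    # The words of the name may be separated by any run of white space,
    # including a line break.
    pattern = re.compile(r"\s+".join(re.escape(w) for w in name.split()))

    def link(match):
        # Collapse the matched white space to single blanks.
        found = " ".join(match.group(0).split())
        return '<a href="{}">{}</a>'.format(url, found)

    return pattern.sub(link, text)

text = "computer science professor Donald\nKnuth, which was used"
print(insert_link(text, "Donald Knuth", "http://example.org/knuth"))
```

The sketch handles neither abbreviated first names nor several names at once.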

2   Usage

hlins [-quiet] [-db databases] [-o outputfile] [inputfile]
Options can be given in any order. If inputfile is missing then STDIN is read, and if outputfile is missing then STDOUT is written. If both are specified then inputfile and outputfile must be different file names. Hence, Hlins can be used as a filter, for instance in a pipe with HeVeA (as done to produce the HTML version of this document, see the Makefile).

The string databases is a blank-separated list of data base files, which means that you have to protect the blanks from your shell when using several data base files. Examples of command lines in the csh shell are
hlins -db myaddresses -o new.tex original.tex
hlins -db "friends groupmembers" -o final.tex
The flag -quiet suppresses diagnostic messages.
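
When Hlins is started from a program rather than from a shell, the list of data base files still has to arrive as one single argument. The following Python sketch builds such an argument list; the function hlins_command is hypothetical and uses only the options listed above.

```python
import shlex

def hlins_command(databases, inputfile=None, outputfile=None, quiet=False):
    """Build an argument list for hlins (hypothetical helper)."""
    if inputfile is not None and inputfile == outputfile:
        raise ValueError("inputfile and outputfile must be different")
    argv = ["hlins"]
    if quiet:
        argv.append("-quiet")
    if databases:
        # All data base files go into ONE blank-separated argument.
        argv += ["-db", " ".join(databases)]
    if outputfile is not None:
        argv += ["-o", outputfile]
    if inputfile is not None:
        argv.append(inputfile)
    return argv

argv = hlins_command(["friends", "groupmembers"], outputfile="final.tex")
# shlex.join shows how a POSIX shell would need the blanks protected.
print(shlex.join(argv))
```

The resulting list can be handed to subprocess.run without any further quoting, since no shell is involved in that case.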

3   Secondary Effects on the HTML Text

Hlins replaces HTML character entities (such as &eacute; or &#233;) by the corresponding ISO-8859-1 character, which in this case is é. Hence, you can use Hlins without any data base argument to replace HTML character entities in an HTML document.
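
A replacement of this kind might look as follows in Python. The sketch rests on an assumption that the text above does not settle: only entities for the character codes 160 to 255 are replaced, so that &lt;, &gt; and &amp; keep their markup meaning. The function name replace_entities is made up.

```python
import re
from html.entities import name2codepoint

# Named (&eacute;) and decimal (&#233;) character entities.
ENTITY = re.compile(r"&(#[0-9]+|[A-Za-z][A-Za-z0-9]*);")

def replace_entities(text):
    """Replace entities by the corresponding ISO-8859-1 character."""
    def replace(match):
        body = match.group(1)
        if body.startswith("#"):
            code = int(body[1:])
        elif body in name2codepoint:
            code = name2codepoint[body]
        else:
            return match.group(0)      # unknown entity: keep as is
        # Assumption: only the upper half of ISO-8859-1 is replaced.
        if 160 <= code <= 255:
            return chr(code)
        return match.group(0)
    return ENTITY.sub(replace, text)

print(replace_entities("Andr&eacute; and Andr&#233; write &lt;b&gt;"))
```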

In some cases, non-empty sequences of white space characters may be replaced by one space. However, this happens only when the white space is part of a prefix of some name in the data base. In any case, this replacement is irrelevant for the rendering of HTML documents.

4   Address Data Bases

Every line of a data base file must be either a comment line or an address specification. A comment line is a line that either consists only of white space or starts with the comment symbol # (possibly preceded by white space).

An address specification consists of a name and a url that are separated by the character = . Leading white space of the line is ignored. In the name, the character = must be written as ==.

Special characters in the name can be written either as HTML entities or as 8-bit characters. The number of spaces separating the words of a name is not relevant.

The syntax of the url is not checked.
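
The file format can be read with a few lines of Python. The sketch below follows the rules of this section; stripping white space around the url and raising an error on a line without a separator are assumptions, and the function name and the example.org addresses are made up.

```python
def parse_database(lines):
    """Return a list of (name, url) pairs from the lines of a data base."""
    entries = []
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue                     # comment line
        name_chars, url, i = [], None, 0
        while i < len(line):
            if line[i] == "=":
                if line[i + 1:i + 2] == "=":
                    name_chars.append("=")   # == stands for = in a name
                    i += 2
                    continue
                url = line[i + 1:].strip()   # the rest of the line is the url
                break
            name_chars.append(line[i])
            i += 1
        if url is None:
            raise ValueError("no = separator in line: " + raw)
        # The number of spaces between the words of a name is irrelevant.
        entries.append((" ".join("".join(name_chars).split()), url))
    return entries

db = ["# my addresses", "",
      "  Donald   Knuth  =  http://example.org/knuth",
      "A==B Corp = http://example.org/ab?x=1"]
print(parse_database(db))
```

The url is taken verbatim, in line with the statement that its syntax is not checked.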

5   Variations of Names

Several variations of the names in the data base are recognized as well:
  1. If the last word of the name contains the symbol - then the name without this - and everything after it is also recognized. Hence, if you have an entry for Egon Müller-Meier then Egon Müller is also recognized.
  2. The first word of a name may be abbreviated. The abbreviation of a first name is its first letter followed by a dot, except for a word starting with St, which is abbreviated as St followed by a dot. Composite first names are abbreviated in both components, hence Marc-Stephane becomes M.-St.

    Abbreviation of a first name can be suppressed by prefixing it in the data base with !. If a first name starts with ! then you have to write it as !!. If a first name starting with ! is to be protected from abbreviation then you have to write it as !!!. This mechanism is used in the data base for this document, in order to match Objective Caml but to avoid matching O. Caml.
Names that consist of a single word (that is, names not containing a blank) are never abbreviated, and the symbol ! does not have any special meaning for them.
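
The two rules can be turned into a small Python function that lists the recognized forms of a data base name. It is a sketch under assumptions the text leaves open: both rules may be combined, the name is cut at the first - of its last word, and rule 1 also applies to single-word names. The names abbreviate and variants and the surname Dupont are made up.

```python
def abbreviate(first):
    """Abbreviate a first name, component by component."""
    parts = []
    for part in first.split("-"):
        if part.startswith("St"):
            parts.append("St.")
        elif part:
            parts.append(part[0] + ".")
        else:
            parts.append(part)
    return "-".join(parts)

def variants(entry_name):
    """List the forms under which a data base name is recognized."""
    words = entry_name.split()
    if not words:
        return []
    # Rule 1: the last word may lose its - and everything after it.
    last = words[-1]
    lasts = [last]
    stem = last.split("-")[0]
    if "-" in last and stem:
        lasts.append(stem)
    if len(words) == 1:
        return lasts          # single word: no abbreviation, ! is ordinary
    # Rule 2: the first word may be abbreviated unless protected by !.
    first, protected = words[0], False
    if first.startswith("!!!"):
        first, protected = first[2:], True    # literal !, protected
    elif first.startswith("!!"):
        first = first[1:]                     # literal !, not protected
    elif first.startswith("!"):
        first, protected = first[1:], True
    firsts = [first] if protected else [first, abbreviate(first)]
    middle = words[1:-1]
    return [" ".join([f] + middle + [l]) for f in firsts for l in lasts]

print(variants("Egon Müller-Meier"))
print(variants("Marc-Stephane Dupont"))
print(variants("!Objective Caml"))
```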

6   The Exact Rules of Searching Names

Names are searched starting from the beginning of the text. If there are overlapping matches then the match starting at the earlier position wins. For example, if the data base contains entries for Egon Meier and for Hans Egon Meier-Müller then the second one matches on input Hans Egon Meier-Müller.

A match is extended to longer matches if possible. That is, if the data base contains entries for Hans Egon and for Hans Egon Meier then the second one matches on input Hans Egon Meier.
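
Both rules together amount to a leftmost, longest search. A naive Python sketch of such a search is given below; it tries every name at every position and is meant to show the rules, not an efficient algorithm. The function name find_names is made up, and name variations are left out.

```python
import re

def find_names(text, names):
    """Return (start, end, matched text) for each name found in `text`."""
    patterns = [re.compile(r"\s+".join(re.escape(w) for w in n.split()))
                for n in names]
    matches = []
    pos = 0
    while pos < len(text):
        best = None
        for pattern in patterns:
            m = pattern.match(text, pos)
            # Among the names matching at this position, keep the longest.
            if m and (best is None or m.end() > best.end()):
                best = m
        if best is not None and best.end() > pos:
            matches.append((best.start(), best.end(), best.group(0)))
            pos = best.end()      # continue after the match
        else:
            pos += 1              # no name starts here
    return matches

print(find_names("Hans Egon Meier", ["Hans Egon", "Hans Egon Meier"]))
```

Since the text is scanned from left to right and the scan resumes after each match, a match starting earlier always wins over an overlapping later one.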

7   The Exact Rules of URL Insertion

If the occurrence of the name is immediately followed by </a> or </A> then no insertion takes place.
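
In terms of regular expressions this rule is a negative lookahead. The following Python sketch shows it for a single name; the function name and the example.org addresses are made up, and the sketch is not taken from the Hlins sources.

```python
import re

def insert_link_unless_closed(text, name, url):
    """Link occurrences of `name` that are not directly followed by </a>."""
    words = r"\s+".join(re.escape(w) for w in name.split())
    # (?!</[aA]>) rejects a match immediately followed by </a> or </A>.
    pattern = re.compile(words + r"(?!</[aA]>)")
    return pattern.sub(
        lambda m: '<a href="{}">{}</a>'.format(url, m.group(0)), text)

text = 'see <a href="http://example.org/old">Donald Knuth</a> and Donald Knuth'
print(insert_link_unless_closed(text, "Donald Knuth", "http://example.org/knuth"))
```

Only the second occurrence is linked, so running the insertion a second time on its own output does not nest anchors.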

If there are several different URLs for a string foundname then the following rules determine which URL is inserted:
  1. An address specification "name = url" where name matches foundname exactly (modulo white space and HTML special characters) has priority over an address specification "name = url" where foundname is an abbreviation of name.
  2. In the list obtained from the above priority rule, the first match is taken.
A warning is issued in case of a conflict, unless the -quiet option has been given.

For instance, your data base might contain something like
Hans Meyer   =  http://address.for.full.name
H. Meyer     =  http://address.for.abbreviated.name
On input H. Meyer, the second address specification is selected (and a warning is issued).
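
The selection can be sketched in Python as follows. The sketch only knows abbreviation of the first name, ignores HTML special characters, and reports a conflict whenever more than one distinct url is a candidate; these simplifications and the function names are assumptions made for the illustration.

```python
def normalize(name):
    """Make the amount of white space in a name irrelevant."""
    return " ".join(name.split())

def abbreviated(name):
    """Return the name with its first word abbreviated, or None."""
    words = normalize(name).split(" ")
    if len(words) < 2:
        return None               # single-word names are never abbreviated
    parts = ["St." if p.startswith("St") else p[:1] + "."
             for p in words[0].split("-")]
    return " ".join(["-".join(parts)] + words[1:])

def choose_url(foundname, entries):
    """Return (url, conflict) for a name found in the text.

    `entries` is the list of (name, url) pairs in data base order.
    """
    found = normalize(foundname)
    exact = [url for name, url in entries if normalize(name) == found]
    abbrev = [url for name, url in entries
              if normalize(name) != found and abbreviated(name) == found]
    ranked = exact + abbrev       # rule 1: exact matches come first
    if not ranked:
        return None, False
    conflict = len(set(ranked)) > 1
    return ranked[0], conflict    # rule 2: the first match is taken

db = [("Hans Meyer", "http://address.for.full.name"),
      ("H. Meyer", "http://address.for.abbreviated.name")]
print(choose_url("H. Meyer", db))
```

For the data base of the example the sketch selects the url of the second entry and sets the conflict flag, which corresponds to the warning.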

8   Implementation

Hlins is written in Objective Caml.

9   License and Installation

Hlins is covered by the GNU General Public License. See the Hlins home page for binary and source distributions.

10   Credits

Thanks to Claude Marché and Jean-Christophe Filliâtre for their remarks and suggestions.

Ralf Treinen, December 16, 1999.


This document was translated from LaTeX by HeVeA.