Hlins: Hyper-Link Insertions in HTML documents
Version 0.32
1 An Introductory Example
Hlins inserts into an HTML document the URLs (uniform resource locators)
of certain names (normally the names of people), according to a data
base that associates URLs with names.
First, create a data base that associates URLs with names; let's call
it addresses:
Donald Knuth = http://www-cs-staff.stanford.edu/...
Leslie Lamport = http://www.research.digital.com/...
Suppose that you have an HTML document mytext.html that
contains the text
A milestone in the development of digital typesetting was the TeX
system developed by Stanford computer science professor Donald
Knuth, which was used by L. Lamport as a base to build the more user-friendly
(but less powerful) LaTeX system.
Calling hlins -db addresses -o newtext.html mytext.html
generates a file newtext.html that now contains the text
A milestone in the development of digital typesetting was the TeX
system developed by Stanford computer science professor <a
href="http://www-cs-staff.stanford.edu/...">Donald Knuth</a>, which
was used by <a
href="http://www.research.digital.com/...">L. Lamport</a> as a base to
build the more user-friendly (but less powerful) LaTeX system.
which will eventually be rendered by a browser as something like
A milestone in the development of digital typesetting was the TeX
system developed by Stanford computer science professor Donald Knuth,
which was used by L. Lamport as a base to build the more user-friendly
(but less powerful) LaTeX system.
Note that URL insertion handles abbreviated first names (as for
Leslie Lamport) and works across line breaks (as for Donald Knuth).
2 Usage
hlins [-quiet] [-db databases] [-o outputfile] [inputfile]
Options can be given in any order. If inputfile or outputfile is
missing, then STDIN or STDOUT, respectively, is used; hence Hlins can
be used as a filter, for instance in a pipe with HeVeA (as was done to
produce the HTML version of this document; see the Makefile). If both
files are specified, then inputfile and outputfile must be different
file names.
The string databases is a blank-separated list of data base files, so
you have to protect the blanks from your shell when using several data
base files. Examples of command lines in the csh shell are:
hlins -db myaddresses -o new.tex original.tex
hlins -db "friends groupmembers" -o final.tex
The flag -quiet suppresses diagnostic messages.
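The command-line conventions above can be made concrete with a small OCaml sketch built on the standard Arg module. It is a hypothetical illustration, not the actual Hlins source: the record type config and the function parse_command_line are invented names. It applies the rules stated above: a missing file means STDIN or STDOUT, -db takes a blank-separated list, and input and output files must differ.

```ocaml
(* Hypothetical sketch of the documented command line:
   hlins [-quiet] [-db databases] [-o outputfile] [inputfile] *)

type config = {
  quiet : bool;
  databases : string list;  (* split from the blank-separated -db string *)
  input : string option;    (* None: read from STDIN *)
  output : string option;   (* None: write to STDOUT *)
}

let parse_command_line (argv : string array) : config =
  let quiet = ref false in
  let dbs = ref "" in
  let input = ref None in
  let output = ref None in
  let specs = [
    ("-quiet", Arg.Set quiet, " suppress diagnostic messages");
    ("-db", Arg.Set_string dbs, "<databases> blank-separated data base files");
    ("-o", Arg.String (fun f -> output := Some f), "<outputfile> output file");
  ] in
  (* At most one anonymous argument: the input file. *)
  let anon f =
    if !input <> None then raise (Arg.Bad "at most one input file");
    input := Some f
  in
  (* ~current:(ref 0) avoids depending on the global Arg.current index. *)
  Arg.parse_argv ~current:(ref 0) argv specs anon
    "hlins [-quiet] [-db databases] [-o outputfile] [inputfile]";
  (match !input, !output with
   | Some i, Some o when i = o ->
       raise (Arg.Bad "inputfile and outputfile must be different")
   | _ -> ());
  {
    quiet = !quiet;
    databases =
      String.split_on_char ' ' !dbs |> List.filter (fun s -> s <> "");
    input = !input;
    output = !output;
  }
```

For example, parse_command_line [| "hlins"; "-db"; "friends groupmembers"; "-o"; "final.html" |] yields two data base files and leaves input as None, that is, STDIN. The shell quoting shown above is what turns "friends groupmembers" into a single argv element that this function then splits at blanks.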
3 Secondary Effects on the HTML Text
Hlins replaces HTML special characters (such as &eacute; or &#233;)
with the corresponding ISO-8859-1 character, which in this case is é.
Hence, you can use Hlins without any data base argument to replace
HTML special characters in an HTML document.
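A minimal OCaml sketch of this replacement follows. It is not Hlins's implementation; the entity table named_entities is a hypothetical excerpt, and decoding only code points 160 to 255 (the non-ASCII half of ISO-8859-1) is a choice made here so that markup characters such as &lt; stay escaped.

```ocaml
(* Hypothetical excerpt of an entity table (name, ISO-8859-1 code). *)
let named_entities =
  [ ("eacute", 233); ("egrave", 232); ("agrave", 224);
    ("auml", 228); ("ouml", 246); ("uuml", 252); ("ccedil", 231) ]

(* Code point of an entity body, i.e. the text between '&' and ';':
   "eacute", "#233" or "#xE9". *)
let code_of_entity body =
  let n = String.length body in
  if n >= 2 && body.[0] = '#' then
    if body.[1] = 'x' || body.[1] = 'X' then
      int_of_string_opt ("0x" ^ String.sub body 2 (n - 2))
    else int_of_string_opt (String.sub body 1 (n - 1))
  else List.assoc_opt body named_entities

(* Replace each reference whose code lies in 160..255 by that single byte;
   copy everything else unchanged, so references such as &lt; or &#60;
   stay escaped. *)
let decode_entities s =
  let n = String.length s in
  let buf = Buffer.create n in
  let rec go i =
    if i < n then
      if s.[i] <> '&' then (Buffer.add_char buf s.[i]; go (i + 1))
      else begin
        match String.index_from_opt s i ';' with
        | Some j when j > i + 1 ->
            (match code_of_entity (String.sub s (i + 1) (j - i - 1)) with
             | Some c when c >= 160 && c <= 255 ->
                 Buffer.add_char buf (Char.chr c);
                 go (j + 1)
             | _ -> Buffer.add_char buf '&'; go (i + 1))
        | _ -> Buffer.add_char buf '&'; go (i + 1)
      end
  in
  go 0;
  Buffer.contents buf
```

With this sketch, decode_entities "caf&eacute;" and decode_entities "caf&#233;" both return the Latin-1 string "caf\233".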
In some cases, a non-empty sequence of white space characters may be
replaced by a single space. This happens only when the white space is
part of a prefix of some name in the data base. In any case, this
replacement does not affect how HTML documents are rendered.
4 Address Data Bases
Every line of a data base file must be either a comment line or an
address specification. A comment line is a line that either consists
only of white space or starts with the comment symbol # (possibly
preceded by white space).
An address specification consists of a name and a url, separated by
the character =. Leading white space on the line is ignored. In the
name, the character = must be written as ==.
Special characters in the name can be written either as HTML character
references or as 8-bit characters. The number of spaces separating the
words of a name is not relevant.
The syntax of the url is not checked.
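The line format can be sketched as a small OCaml parser. The type db_line and the functions below are hypothetical names, not Hlins's API, and treating a line without a separator (or with an empty name or url) as Malformed is an assumption of this sketch. The separator is the first = that is not part of an escaped ==, so a url containing = (for instance in a query string) is kept whole.

```ocaml
(* Hypothetical sketch of reading one line of an address data base. *)
type db_line =
  | Comment
  | Address of string * string  (* name, url *)
  | Malformed of string

let is_space c = c = ' ' || c = '\t' || c = '\r' || c = '\n'

(* The number of spaces between the words of a name is irrelevant,
   so runs of white space are collapsed to a single blank. *)
let normalize_spaces s =
  String.map (fun c -> if is_space c then ' ' else c) s
  |> String.split_on_char ' '
  |> List.filter (fun w -> w <> "")
  |> String.concat " "

let parse_line line =
  let s = String.trim line in  (* leading white space is ignored *)
  if s = "" || s.[0] = '#' then Comment
  else begin
    let n = String.length s in
    let name_buf = Buffer.create n in
    (* Find the first '=' that is not part of an escaped "==". *)
    let rec scan i =
      if i >= n then None
      else if s.[i] <> '=' then (Buffer.add_char name_buf s.[i]; scan (i + 1))
      else if i + 1 < n && s.[i + 1] = '=' then
        (Buffer.add_char name_buf '='; scan (i + 2))
      else Some i
    in
    match scan 0 with
    | None -> Malformed line
    | Some i ->
        let name = normalize_spaces (Buffer.contents name_buf) in
        let url = String.trim (String.sub s (i + 1) (n - i - 1)) in
        if name = "" || url = "" then Malformed line else Address (name, url)
  end
```

For instance, parse_line "A==B = http://x/?q=1" gives Address ("A=B", "http://x/?q=1"): the escaped == becomes a literal = in the name, and the url is not checked further.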
5 Variations of Names
Several variations of the names in the data base are recognized as
well:
- If the last word of the name contains the symbol -, then the name
without this - and everything after it is also recognized. Hence, if
you have an entry for Egon Müller-Meier then Egon Müller is also
recognized.
- The first word of a name may be abbreviated. The abbreviation of a
first name is its first letter followed by a dot, except for a word
starting with St, whose abbreviation is St followed by a dot.
Composite first names are abbreviated in both components; hence
Marc-Stephane becomes M.-St.
Abbreviation of a first name can be suppressed by prefixing it in the
data base with !. If a first name starts with !, then you have to
write it as !!. If a first name starting with ! is to be protected
from abbreviation, then you have to write it as !!!. This mechanism is
used in the data base that produces this document, so that Objective
Caml is matched but O. Caml is not.
Names consisting of a single word (that is, not containing a blank)
are never abbreviated, and the symbol ! has no special meaning for
them.
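The two variation rules and the ! convention can be sketched in OCaml as follows. This is a hypothetical illustration, not Hlins's code. The helper names are invented, and the sketch makes three further assumptions: characters are single bytes (ISO-8859-1), a last word with several hyphens is cut at its first one, and the abbreviated first name also combines with the shortened last name.

```ocaml
(* Hypothetical sketch of the name variations; assumes single-byte
   (ISO-8859-1) characters. *)

let has_prefix prefix s =
  String.length s >= String.length prefix
  && String.sub s 0 (String.length prefix) = prefix

let drop k s = String.sub s k (String.length s - k)

(* The '!' convention on a first name, returning (name, may_abbreviate):
   "!X" -> (X, false), "!!X" -> ("!X", true), "!!!X" -> ("!X", false). *)
let decode_bang w =
  if has_prefix "!!!" w then ("!" ^ drop 3 w, false)
  else if has_prefix "!!" w then ("!" ^ drop 2 w, true)
  else if has_prefix "!" w then (drop 1 w, false)
  else (w, true)

(* First letter plus dot, or "St." for components starting with "St";
   each hyphen-separated component is abbreviated. *)
let abbreviate first =
  String.split_on_char '-' first
  |> List.map (fun part ->
         if part = "" then ""
         else if has_prefix "St" part then "St."
         else String.make 1 part.[0] ^ ".")
  |> String.concat "-"

(* Cut the last word at its first '-', if any. *)
let truncate_last words =
  match List.rev words with
  | last :: earlier ->
      (match String.index_opt last '-' with
       | Some i when i > 0 -> Some (List.rev (String.sub last 0 i :: earlier))
       | _ -> None)
  | [] -> None

let variations name =
  let words =
    String.split_on_char ' ' name |> List.filter (fun w -> w <> "") in
  let with_truncation ws =
    ws :: (match truncate_last ws with Some t -> [t] | None -> [])
  in
  match words with
  | [] -> []
  | [_] ->
      (* one-word names: never abbreviated, '!' is an ordinary character *)
      List.map (String.concat " ") (with_truncation words)
  | first :: rest ->
      let first, may_abbreviate = decode_bang first in
      let bases = with_truncation (first :: rest) in
      let abbreviated =
        if not may_abbreviate then []
        else List.map (function f :: r -> abbreviate f :: r | [] -> []) bases
      in
      List.map (String.concat " ") (bases @ abbreviated)
```

Under these assumptions, variations "Egon Mueller-Meier" yields the full name, "Egon Mueller", "E. Mueller-Meier" and "E. Mueller", while variations "!Objective Caml" yields only "Objective Caml".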
6 The Exact Rules of Searching Names
Names are searched starting from the beginning of the text. If there
are overlapping matches, then the match starting at the earlier
position wins. For example, if the data base contains entries for Egon
Meier and for Hans Egon Meier-Müller, then the second one matches on
input Hans Egon Meier-Müller.
A match is extended to a longer match if possible. That is, if the
data base contains entries for Hans Egon and for Hans Egon Meier, then
the second one matches on input Hans Egon Meier.
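The two search rules (earliest position first, then longest match) can be sketched in OCaml over whitespace-separated words. Splitting at white space is also what lets a name span a line break. This is a simplified, hypothetical illustration: find_names and its helpers are invented names, and the sketch ignores HTML markup and punctuation attached to words (so "Knuth," would not match "Knuth"), which a real implementation has to handle.

```ocaml
(* Split text into words at white space, so a name may span a line break. *)
let words s =
  String.map (fun c -> if c = '\n' || c = '\t' || c = '\r' then ' ' else c) s
  |> String.split_on_char ' '
  |> List.filter (fun w -> w <> "")

(* Does the word array [pat] occur in [text] at index [i]? *)
let matches_at text i pat =
  let n = Array.length text and m = Array.length pat in
  let rec ok k = k = m || (text.(i + k) = pat.(k) && ok (k + 1)) in
  i + m <= n && ok 0

(* Leftmost-longest search: returns (word index, name) pairs. *)
let find_names names input =
  let text = Array.of_list (words input) in
  let pats = List.map (fun nm -> (nm, Array.of_list (words nm))) names in
  (* Longest name matching at position i; ties keep the earlier entry. *)
  let longest_at i =
    List.fold_left
      (fun best (nm, pat) ->
        if Array.length pat = 0 || not (matches_at text i pat) then best
        else
          match best with
          | Some (_, p) when Array.length p >= Array.length pat -> best
          | _ -> Some (nm, pat))
      None pats
  in
  (* Scanning left to right and skipping past each match means an earlier
     match always wins over an overlapping later one. *)
  let rec scan i acc =
    if i >= Array.length text then List.rev acc
    else
      match longest_at i with
      | Some (nm, pat) -> scan (i + Array.length pat) ((i, nm) :: acc)
      | None -> scan (i + 1) acc
  in
  scan 0 []
```

For example, with entries Hans Egon and Hans Egon Meier, find_names returns a single match for Hans Egon Meier even when a line break separates Egon from Meier.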
7 The Exact Rules of URL Insertion
If the occurrence of the name is immediately followed by </a> or
</A>, then no insertion takes place.
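As a small sketch of this check, a case-insensitive comparison of the four characters right after the name covers exactly the two forms </a> and </A>. The function name and the convention that pos is the index just after the found name are assumptions of this illustration.

```ocaml
(* [pos] is the index just after the found name in the HTML text.
   Lower-casing the next four characters accepts "</a>" and "</A>". *)
let followed_by_closing_anchor html pos =
  let tag = "</a>" in
  let k = String.length tag in
  pos >= 0 && pos + k <= String.length html
  && String.lowercase_ascii (String.sub html pos k) = tag
```

Because the comparison starts exactly at pos, an intervening blank (as in "Knuth </a>") does not count as "immediately followed".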
If there are several different URLs for a string foundname, then the
following rules determine which URL is inserted:
- An address specification "name = url" where name matches foundname
exactly (modulo white space and HTML special characters) has priority
over an address specification "name = url" where foundname is an
abbreviation of name.
- Among the specifications remaining after the above priority rule,
the first match is taken.
A warning is issued in case of a conflict, unless the -quiet
option has been given.
For instance, your data base might contain something like
Hans Meyer = http://address.for.full.name
H. Meyer = http://address.for.abbreviated.name
On input H. Meyer, the second address specification is selected,
since its name matches exactly (and a warning is issued).
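The selection rules can be sketched in OCaml as follows. The record type candidate, the functions choose_url and select_url, and the warning text are all hypothetical. The sketch assumes the candidates arrive in data base order and counts a conflict whenever they carry more than one distinct URL.

```ocaml
type candidate = {
  url : string;
  exact : bool;  (* true: data base name equals the found string;
                    false: the found string only abbreviates it *)
}

(* Candidates are assumed to be in data base order. Returns the chosen URL
   and whether there was a conflict (more than one distinct URL). *)
let choose_url (cands : candidate list) : (string * bool) option =
  let exact = List.filter (fun c -> c.exact) cands in
  let pool = if exact <> [] then exact else cands in
  match pool with
  | [] -> None
  | first :: _ ->
      let conflict = List.exists (fun c -> c.url <> first.url) cands in
      Some (first.url, conflict)

(* Hypothetical driver: the warning text is made up for this sketch. *)
let select_url ~quiet found cands =
  match choose_url cands with
  | None -> None
  | Some (url, conflict) ->
      if conflict && not quiet then
        Printf.eprintf "warning: several URLs for %s; using %s\n" found url;
      Some url
```

On the H. Meyer example, the exact entry wins even though the Hans Meyer entry comes first in the data base, and the conflict flag is set because two distinct URLs were found.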
8 Implementation
Hlins is written in Objective Caml.
9 License and Installation
Hlins is covered by the GNU General Public License.
See the Hlins home page for
binary and source distributions.
10 Credits
Thanks to Claude Marché and Jean-Christophe Filliâtre for their
remarks and suggestions.
Ralf Treinen, December 16, 1999.
This document was translated from LaTeX by HeVeA.