mhtml -- a program to mirror html pages recursively
Copyright (C) 1996 Kevin M. Bealer
This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
You can send mail to the author at
Kevin M Bealer
94 Bowers Road
Mertztown, PA 19539
mhtml [options] (page)
mhtml is a self-contained C++ program with a flex lexer, except
for one script, fetchfile. mhtml is based on a previous program
which used (besides the makefile) three C++ programs (including
a flex-generated lexer) and three bash scripts. This redesign
was prompted by the unusable interface of the component
programs and a poor design which prevented the program's data
from being in the right modules at the right times. (Inasmuch
as the first program was written without a clear design, this
_is_ the original design :))
Other currently inaccessible HTML features may be exploitable as
well.
Concept:
Approximate algorithm:
Future plans:
Foreseeable future:
This will mostly depend on what seems useful. One idea would
be to make a polling algorithm for image maps that would use a
grid (and possibly a "cheap monte carlo" method) to reconstruct
an image map. This would require a little fiddling of course,
but it might even be possible (assuming documentation on image
maps can be acquired.)
Deep future:
Parts of the puzzle could be rewritten in Scheme to make it
more configurable in terms of which images are kept, which pages
are visited, etc. It would then be within reach to, say, make
it into a worm.
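
The grid polling idea for image maps mentioned above could look
roughly like the following. This is only a sketch under stated
assumptions, not code from mhtml: it assumes the image map can be
queried one click position at a time, and Probe, Box and
poll_image_map are hypothetical names made up for illustration.
The probe stands in for whatever request would really be made.

```cpp
#include <algorithm>
#include <functional>
#include <map>
#include <string>

// Bounding box of the sample points that resolved to one target.
struct Box {
    int min_x, min_y, max_x, max_y;
};

// Hypothetical probe (an assumption, not part of mhtml): given a click
// position, return the URL the image map resolves to, or "" for no link.
using Probe = std::function<std::string(int x, int y)>;

// Poll the map on a regular grid with spacing `step` and record, for
// each distinct target, the bounding box of the grid points that hit it.
std::map<std::string, Box> poll_image_map(int width, int height, int step,
                                          const Probe& probe) {
    std::map<std::string, Box> regions;
    for (int y = 0; y < height; y += step) {
        for (int x = 0; x < width; x += step) {
            const std::string url = probe(x, y);
            if (url.empty()) continue;  // no link at this point
            auto it = regions.find(url);
            if (it == regions.end()) {
                // First hit for this target: start a one-point box.
                regions.emplace(url, Box{x, y, x, y});
            } else {
                // Grow the existing box to include this point.
                Box& b = it->second;
                b.min_x = std::min(b.min_x, x);
                b.min_y = std::min(b.min_y, y);
                b.max_x = std::max(b.max_x, x);
                b.max_y = std::max(b.max_y, y);
            }
        }
    }
    return regions;
}
```

A smaller step gives tighter boxes at the cost of more probes.
Only a bounding box is kept per target, so non-rectangular
regions come out approximate. A "cheap monte carlo" variant
could replace the two grid loops with random sample points.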
Links
Local cache directory