The HTML to text converter suite consists of an HTML to text converter and a text to HTML converter. Html2txt removes all tags from HTML files, replaces HTML symbols with their ASCII equivalents, and removes extraneous blank lines. Txt2html takes an existing plain text file and produces an HTML version, suitable for publishing on the World Wide Web. The advantage of using Txt2html is that it preserves the layout of your plain text files when they are viewed with a web browser.
These converters are freely available on the Internet, and are provided free of charge. They may be freely distributed with all files intact, but I would like to be informed if they are distributed in any packages. Users must NOT be charged for these converters.
The suite is available for Linux at Sunsite (compiled using Red Hat release 5.0 (Hurricane)).
There are also versions for Windows 95/NT and other popular UNIX flavours. Download the latest version from your favourite FTP site, or check at the Stellar-X page.
Executables for various platforms are in the ./bin directory.
For Unix machines, ask the System Administrator to put the executables for the platform into a directory on users' paths, such as /usr/local/bin. The executables should be renamed to html2txt and txt2html.
For the DOS/Windows 95 console version, copy the executables to somewhere in your path, and rename them as above.
For Windows95/NT, copy h2twin32.exe and t2hwin32.exe into the Windows directory. A registry file, h2twin32.reg, is provided to integrate the Windows version with the shell. After checking that the contents are suitable, merge it with the system registry by double-clicking on it.
Usage is simple. Type
html2txt HTML file
Where the first argument, HTML file, is the file to convert. Multiple files and wildcard filenames are supported on Unix versions. A new file will be created with the same name as the original, but with a .txt extension. It contains the resultant text. Edit it to suit your needs.
Alternatively, type
html2txt -l HTML file
With the -l option, an additional file, html.out, lists all HTML tags and symbols detected in HTML file, and the line numbers where they were found. Any errors reported with the tags may also be listed in this file.
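The kind of listing that -l produces can be illustrated with a short Python sketch. This is not the converter's own code, and the actual layout of html.out is not documented here. The function name list_tokens and the pattern below are assumptions made for illustration, treating a tag as anything from < to > and a symbol as anything from & to ;.

```python
import re

# Assumption: a tag runs from "<" to the next ">", and a symbol from "&" to ";".
# This mirrors the description in this document, not the converter's real parser.
TOKEN = re.compile(r"<[^>]*>|&[#\w]+;")


def list_tokens(source: str) -> list[tuple[int, str]]:
    """Return (line number, token) pairs for every tag or symbol found."""
    found = []
    for line_number, line in enumerate(source.splitlines(), start=1):
        for match in TOKEN.finditer(line):
            found.append((line_number, match.group()))
    return found


if __name__ == "__main__":
    sample = "<h1>Hi</h1>\nA &amp; B"
    for line_number, token in list_tokens(sample):
        print(line_number, token)
```

The sketch works one line at a time, so a tag that is split across two lines would not be reported.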
Please keep in mind that all HTML tags begin with a < and end with a >. HTML symbols start with a & and end with a ;. If the HTML source file has a <, >, or & as part of its text, the output file may be missing some text. Run with the -l option and check for any problems.
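To see why a stray < or > can swallow text, here is a minimal Python sketch of the three steps html2txt is described as performing: removing tags, replacing symbols, and removing extra blank lines. It is an illustration under stated assumptions, not the converter's implementation. The name strip_html is hypothetical, and html.unescape decodes symbols to Unicode characters rather than strictly to ASCII.

```python
import html
import re

# Assumption: a tag is everything from "<" up to the next ">".
TAG = re.compile(r"<[^>]*>")


def strip_html(source: str) -> str:
    """Remove tags, decode symbols, and collapse runs of blank lines."""
    # Tags are removed first, so a decoded "&lt;" is never mistaken for a tag.
    text = TAG.sub("", source)
    text = html.unescape(text)
    # Any run of blank lines becomes a single blank line.
    text = re.sub(r"\n\s*\n", "\n\n", text)
    return text.strip()


if __name__ == "__main__":
    print(strip_html("<p>Fish &amp; chips</p>"))
    # A literal "<" and ">" in the text are treated as one long tag,
    # so the words between them are lost.
    print(strip_html("if a < b then c > d"))
```

In the second call, everything between the < and the > is discarded along with the delimiters, which is the missing-text problem described above.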
To convert from text to HTML, type
txt2html Text file
Where the argument, Text file, is the file to convert. A new file with a .htm or .html extension will contain the resultant HTML. It is ready for use or further editing. It also contains the date the file was generated, and its source file and directory, for future reference.
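One common way to preserve plain text layout in a browser is to escape the text and wrap it in a <pre> element. The Python sketch below shows that approach. Whether txt2html itself uses <pre> is an assumption, as are the function name text_to_html and the comment format used to record the date and source file.

```python
import html
from datetime import date


def text_to_html(text: str, source_name: str, generated: date) -> str:
    """Wrap plain text in a minimal HTML page that keeps its layout."""
    # Escape <, > and & so the browser shows them instead of parsing them.
    body = html.escape(text)
    title = html.escape(source_name)
    return (
        "<html>\n"
        f"<head><title>{title}</title></head>\n"
        "<body>\n"
        # Hypothetical provenance note: generation date and source file.
        f"<!-- Generated {generated.isoformat()} from {title} -->\n"
        f"<pre>\n{body}\n</pre>\n"
        "</body>\n"
        "</html>\n"
    )


if __name__ == "__main__":
    print(text_to_html("a < b\n  indented line", "notes.txt", date.today()))
```

Inside <pre>, spaces and line breaks are displayed as written, so indentation and column alignment survive the conversion.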
On Windows 95/NT, once the registry file has been merged, invoking the context menu of any .htm or .html file will reveal a Convert to text option. This will run h2twin32.exe and create a text version. Similarly, all .txt files will have a Convert to HTML option, which runs t2hwin32.exe. When not executed from a context menu, these programs will prompt the user for files to convert.
Everybody may freely use and distribute this package, as long as all files are intact. I will provide technical support, but I am not responsible for any damage that this software may cause. If you have suggestions for improvement, problems, or complaints, I invite you to contact Stellar-X at antonino@usa.net.
Download the latest versions from the Stellar-X website.
Version 1 - Released, no major bugs.
Version 1.1 - Better handling of invalid and unreal HTML tags.
Version 1.2 - Better file conversion and wildcard support. Windows 95 version. Integrated with the txt2html package. Included versions for other UNIX flavours.