HTMLArchive

General Concepts

HTMLArchive reads through a template file, looking for <ARCHIVE> tags. Once it has processed every <ARCHIVE> tag in the file, it writes the result to an output file. That output file is the finished page you wanted out of this program.

If you execute HTMLArchive -v, you'll get a message about what version you're running.

Directories

You should note where you're running HTMLArchive and how you're addressing your files. For example, if you call 'HTMLArchive test/template.html', HTMLArchive will not change directory to 'test' before handling 'template.html'. It will parse test/template.html, look for any <ARCHIVE> tags, and run as normal. If 'test/template.html' specifies an ENTRYPATH that is a relative directory, HTMLArchive treats that directory as relative to where it was executed, not relative to 'test/template.html'.
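
Here is a small Python sketch of that distinction. It is only an illustration, not HTMLArchive's own code: the function names, the '/home/me/site' directory, and the 'entries' ENTRYPATH value are all made up for the example.

```python
import posixpath

def resolve_current(entrypath, cwd):
    """Behaviour described above: relative to the directory of execution."""
    return posixpath.normpath(posixpath.join(cwd, entrypath))

def resolve_template_relative(entrypath, cwd, template):
    """The alternative: relative to the directory holding the template."""
    template_dir = posixpath.dirname(posixpath.join(cwd, template))
    return posixpath.normpath(posixpath.join(template_dir, entrypath))

cwd = "/home/me/site"              # hypothetical directory where HTMLArchive is run
template = "test/template.html"    # template path given on the command line
entrypath = "entries"              # hypothetical relative ENTRYPATH value

current = resolve_current(entrypath, cwd)
alternative = resolve_template_relative(entrypath, cwd, template)
print(current)
print(alternative)
```

With the behaviour described above, the entries are looked for under the directory you ran the program from; the template's own directory plays no part.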

This may change in a future release... I suspect it's entirely too confusing to keep track of all the relationships. I'll probably change everything to be relative to the template file, and not the directory in which HTMLArchive was executed.

The <ARCHIVE> tag has a couple of basic functions:

  1. It sets the place in your template file where the archive tag will write its results.

  2. It allows you to describe what you would like to see in its place.

  3. It can be replaced by a list of the files matching a pattern.

  4. It can be replaced by some text from the latest entry matching that pattern.

Invoking HTMLArchive

Just type:


HTMLArchive [your template file's name]

and, provided your template file's <ARCHIVE> tag doesn't have silly values, HTMLArchive will handle the rest.

To handle text from stdin, specify a filename of '-'.

To work with a file whose name starts with '-', put another '-' in front of it: for example, use '--foo.html' for '-foo.html'.
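
The rules above for the file name argument can be pictured with a short Python sketch. How HTMLArchive really parses its arguments isn't shown in this document, so treat the function below (and its name, interpret_argument) as an assumption that merely follows the rules as stated.

```python
def interpret_argument(arg):
    """Classify one command-line argument the way the text describes."""
    if arg == "-v":
        return ("version", None)    # print the version message
    if arg == "-":
        return ("stdin", None)      # read the template from standard input
    if arg.startswith("--"):
        return ("file", arg[1:])    # '--foo.html' names the file '-foo.html'
    return ("file", arg)            # an ordinary template file name

for arg in ["template.html", "-", "--foo.html", "-v"]:
    print(arg, "->", interpret_argument(arg))
```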

The template file

Your template file represents the text that will be seen by the user after your archival entries have been processed. It holds one or more <ARCHIVE> tags that will be filled in later by the program.

If you are intending to use this with an existing set of pages, you would copy your original as the template file, gut out all the links to the various entries, then drop in the <ARCHIVE> tag in their place.

Variables

Occasionally, attributes may use variables. Variables are specified by enclosing them in '%' symbols, like %filename%. To specify a '%' symbol, put two of them together, e.g. '%%'.
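
As a rough illustration of this substitution rule, here is a Python sketch. It is not HTMLArchive's implementation. The expand function is hypothetical, and two of its choices are assumptions: variable names are matched without regard to case, and unknown variables are left untouched.

```python
import re

# Either a doubled '%%' or a '%name%' variable reference.
_TOKEN = re.compile(r"%%|%([^%]+)%")

def expand(text, values):
    """Replace %name% with values[name]; turn '%%' into a literal '%'."""
    def replace(match):
        if match.group(0) == "%%":
            return "%"
        name = match.group(1).lower()          # assumption: case-insensitive names
        return values.get(name, match.group(0))  # assumption: unknown names stay as-is
    return _TOKEN.sub(replace, text)

values = {"filename": "990101.html", "title": "New Year"}
result = expand('<a href="%filename%">%title%</a> (100%% done)', values)
print(result)
```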

The <ARCHIVE> tag

The <ARCHIVE> tag provides the bulk of the parameters needed to describe how you want HTMLArchive to fill in the entries, as well as how to find the latest text.

Just like a real tag, it handles this through a series of attributes:

  • INDEX

    The INDEX attribute tells HTMLArchive to replace the tag with all the entries it finds.

  • LATEST

    The LATEST attribute lets you describe the class id of the tag whose contents should be displayed in place of the <ARCHIVE> tag. So, for example, if you set this to 'mudpuppies', HTMLArchive will look inside the latest entry it finds for a tag like <span class="mudpuppies">, and write all the text between that tag and the </span> that ends it in place of the <ARCHIVE> tag.

    NOTE: If you specify INDEX and LATEST in the same <ARCHIVE> tag, both the index and 'latest' texts will replace the <ARCHIVE> tag, the index coming first.

  • OUTBOUND

    The OUTBOUND attribute specifies the name of the file that HTMLArchive should write its output to. This attribute must be set in the first <ARCHIVE> tag appearing in the template file.

  • REGEXP

    The REGEXP attribute specifies the regular expression to use when looking for files to regard as entries. HTMLArchive will look in the current directory for files matching this regular expression. The regular expression follows the POSIX standard; it is not a simple wildcard pattern built from ? and * characters. You can learn more about what kind of regular expression this uses by typing 'man regex' at your command line (provided you have a man entry for this). You may also be able to type 'info regex' if you're using GNU's info system.

  • ENTRYPATH

    Use ENTRYPATH to specify the directory in which HTMLArchive should look for files matching the REGEXP.

  • REVERSE

    The REVERSE attribute tells HTMLArchive to reverse the SORTORDER.

  • SORTORDER

    The SORTORDER attribute lets you specify how the entries should be sorted. Currently you may sort according to FILENAME, CREATIONDATE, MODIFICATIONDATE, TITLE, or ORDERTAG. This may affect which file appears as LATEST. If you choose 'ORDERTAG', HTMLArchive will hunt for an <ARCHIVE> tag with an 'ORDER' attribute set to whatever string you like, and order the entries accordingly. TITLE sorts the entries according to the text found in the <TITLE> tag. FILENAME sorts by the file's name, CREATIONDATE sorts by the creation date of the file, and MODIFICATIONDATE sorts by the modification date of the file.

  • MAXCOL

    The MAXCOL attribute lets you specify the maximum number of entries that may appear in a grouping. HTMLArchive will not place more than this number of entries in a group. Look to the FOREGROUP and POSTGROUP attributes for more information about groupings.

  • MINCOL

    The MINCOL attribute lets you specify the minimum number of entries that may appear in a grouping. HTMLArchive will not place fewer than this number of entries in a group unless it must (perhaps because there aren't enough entries). Look to the FOREGROUP and POSTGROUP attributes for more information.

  • FORELIST

    The FORELIST attribute lets you specify some text to appear prior to displaying the index. Typically, this might be used to specify a <TABLE> tag or something.

  • POSTLIST

    The POSTLIST attribute lets you specify some text to appear after the entries in the index have been created. Typically, this might be used to close your <TABLE> tag with a </TABLE> or something.

  • FOREGROUP

    The FOREGROUP attribute lets you specify some text to appear prior to a group of entries. Typically, this allows you to establish rows with the <TABLE> tag. You might, for example, set this to "<TR>".

  • POSTGROUP

    The POSTGROUP attribute lets you specify some text to appear after a group of entries. Typically, this allows you to establish rows with the <TABLE> tag. You might, for example, set this to "</TR>".

  • FOREENTRY

    The FOREENTRY attribute lets you specify some text to appear prior to an entry.

  • POSTENTRY

    The POSTENTRY attribute lets you specify some text to appear after an entry.

  • ENTRYTEXT

    The ENTRYTEXT attribute lets you specify how you want the entry to appear. It understands two variables:

    • fileName

    • title

    If you use %fileName%, the entry's file name will replace the variable.

    If you use %title%, the text appearing in the entry's <TITLE> tag will replace this.

    Likely, more variables will be supported in the future.
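
Going back to the LATEST attribute for a moment: the lookup it describes (find the tag carrying a given class, then take everything up to its matching end tag) can be sketched in Python as follows. This is only an illustration built on Python's standard html.parser module, not how HTMLArchive itself does it; the ClassTextExtractor class and the extract_latest function are names invented for the example.

```python
from html.parser import HTMLParser

class ClassTextExtractor(HTMLParser):
    """Collect the markup between the first tag with a given class and its end tag."""

    def __init__(self, class_name):
        super().__init__(convert_charrefs=False)
        self.class_name = class_name
        self.target_tag = None    # name of the tag that carried the class
        self.depth = 0            # nesting depth of same-named tags
        self.parts = []
        self.done = False

    def _inside(self):
        return self.target_tag is not None and not self.done

    def handle_starttag(self, tag, attrs):
        if self.done:
            return
        if self.target_tag is None:
            classes = (dict(attrs).get("class") or "").split()
            if self.class_name in classes:
                self.target_tag, self.depth = tag, 1
            return
        if tag == self.target_tag:
            self.depth += 1
        self.parts.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        if self._inside():
            self.parts.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if not self._inside():
            return
        if tag == self.target_tag:
            self.depth -= 1
            if self.depth == 0:
                self.done = True      # matching end tag reached; stop collecting
                return
        self.parts.append("</%s>" % tag)

    def handle_data(self, data):
        if self._inside():
            self.parts.append(data)

    def handle_entityref(self, name):
        if self._inside():
            self.parts.append("&%s;" % name)

    def handle_charref(self, name):
        if self._inside():
            self.parts.append("&#%s;" % name)

def extract_latest(html_text, class_name):
    parser = ClassTextExtractor(class_name)
    parser.feed(html_text)
    parser.close()
    return "".join(parser.parts)

entry = ('<html><body><span class="mudpuppies">Hello <b>world</b> &amp; more</span>'
         '<p>ignored</p></body></html>')
latest_text = extract_latest(entry, "mudpuppies")
print(latest_text)
```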

Sample template file

Here's an example template file:


<html>

<head>
<title>Archived Happenings</title>
</head>

<body>
<h1>Archived Happenings</h1>
<P>Here's what's been happening in times past:</P>

<ARCHIVE latest="LATEST"
outbound="index.html"
regexp="......+\\.html$"
sortorder="filename"
>

<ARCHIVE index
regexp="......+\\.html$"
sortorder="filename"
forelist="<table>
"
postlist="</table>
"
maxcol="5"
mincol="3"
reverse
foregroup="<tr>
"
postgroup="</tr>
"
foreentry="<td> "
entrytext="<a href=\"%filename%\">%title%</a>"
postentry="</td>
">

</body>

</html>

This template will replace the first <ARCHIVE> tag with the text found inside a tag of class id 'LATEST'. That text comes from the first file it finds matching the regular expression '......+\\.html$', with the files sorted according to file name and looked for in the current directory of execution (since ENTRYPATH was not specified).
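
To see which file names that regular expression accepts, here is a quick Python check. Two assumptions are baked in: the doubled backslash in the template is taken to be attribute-string escaping for a single backslash, and Python's re module stands in for the POSIX matcher (the pattern only uses syntax the two share). The file names are invented for the example.

```python
import re

# Five "any character" dots, then one or more further characters,
# then a literal ".html" at the very end of the name.
pattern = re.compile(r"......+\.html$")

names = [
    "a.html",            # too short: fewer than six characters before ".html"
    "990101.html",       # six characters, then ".html"
    "19990101.html",     # more than six characters is fine
    "990101.htm",        # wrong extension
    "notes.txt",         # wrong extension
    "990101.html.bak",   # ".html" is not at the end
]

matches = [name for name in names if pattern.search(name)]
print(matches)
```

In other words, the pattern picks up any file ending in '.html' whose name has at least six characters in front of the extension.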

The second <ARCHIVE> tag will be replaced with a list of files matching the same regular expression as before, sorted in the same way as before. It will write the entries out in a table with anywhere from 3 to 5 columns of entries (if possible). The entries will appear as a link to the file name, displayed as the text in the TITLE tag of the file.
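
The way the FORE*/POST* attributes nest around the entries can be sketched in Python like this. It is an illustration only. In particular, this document doesn't say how HTMLArchive picks a group size between MINCOL and MAXCOL, so choose_group_size below is a guess: it takes the largest size whose last group isn't left shorter than MINCOL. Both function names are hypothetical.

```python
def choose_group_size(count, mincol, maxcol):
    """Guess: largest size in [mincol, maxcol] that leaves no short last group."""
    for size in range(maxcol, mincol - 1, -1):
        remainder = count % size
        if remainder == 0 or remainder >= mincol:
            return size
    return maxcol    # no size fits; fall back to the maximum

def render_index(entries, mincol, maxcol, forelist, postlist,
                 foregroup, postgroup, foreentry, entrytext, postentry):
    """Wrap each entry, each group of entries, and the whole list in its text."""
    size = choose_group_size(len(entries), mincol, maxcol)
    out = [forelist]
    for start in range(0, len(entries), size):
        out.append(foregroup)
        for filename, title in entries[start:start + size]:
            # Plain replacement is enough for this sketch; '%%' is not handled here.
            text = entrytext.replace("%filename%", filename).replace("%title%", title)
            out.append(foreentry + text + postentry)
        out.append(postgroup)
    out.append(postlist)
    return "".join(out)

# Seven made-up entries as (file name, title) pairs.
entries = [("99010%d.html" % n, "Entry %d" % n) for n in range(1, 8)]
table = render_index(
    entries, mincol=3, maxcol=5,
    forelist="<table>\n", postlist="</table>\n",
    foregroup="<tr>\n", postgroup="</tr>\n",
    foreentry="<td> ", entrytext='<a href="%filename%">%title%</a>',
    postentry="</td>\n",
)
print(table)
```

With seven entries and the sample's values, this guess produces one row of four and one row of three, rather than a row of five followed by a row of two.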

The results will be written to 'index.html' in the current directory, as specified by the OUTBOUND attribute in the first <ARCHIVE> tag.
