Creating Table of Contents for static web pages using sed, make, and perl

Earlier, I showed you how I created a multi-page navigation section for static web pages.

But this system has some flaws. I needed better navigation within each web page. I also needed a better way to keep track of my Google ads. And I needed better automation.

Adding a table of contents using hypertoc

I looked around for a program that would do what I wanted, and I installed hypertoc(1), which is part of the Perl HTML::GenToc package. You may have the libhtml-gentoc-perl package available on your system. If not, it’s easy to install:

Installing hypertoc

wget http://search.cpan.org/CPAN/authors/id/R/RU/RUBYKAT/HTML-GenToc-3.20.tar.gz
tar xfz HTML-GenToc-3.20.tar.gz
cd HTML-GenToc-3.20
perl Build.PL
./Build
./Build install

There are a lot of options with hypertoc(1). Here is a section of shell code I used to generate the table of contents. I used hypertoc(1) as a filter, as I don’t like in-line editing of files. I passed the input filename as an argument (the variable $IFILE), and I piped the modified file to standard output.

I used the string ‘<!--toc-->’ in my HTML page to mark where I wanted the table of contents to be inserted.

Here are the key arguments to hypertoc(1) as I used them:

ARGS="--toc_entry 'H1=1' --toc_end 'H1=/H1' --toc_entry 'H2=2' --toc_end 'H2=/H2' \
--toc_entry 'H3=3' --toc_end 'H3=/H3' --toc_entry 'H4=4' --toc_end 'H4=/H4' \
--toc_entry 'H5=5' --toc_end 'H5=/H5'"
# The string !--toc-- is used as a marker to insert the new Table of Contents 
TOC="--toc_tag '!--toc--' --toc_tag_replace"
eval hypertoc $ARGS $TOC --make_anchors --make_toc --inline --outfile - $IFILE

This will look at all of the <h1> to <h5> headings and create a list of links at the top of the page that point to the sections below. There is a problem with this, which I will address later.

Inserting Google ads into a web page automatically

So I have a section that makes it easier to navigate to other pages, and a second one that navigates to the sections on the same page. Intra-page and inter-page navigation is done. The next thing I wanted to do was to make it easier and cleaner to add Google Ads to a web page. I store my ads in the files ./Ads/GoogleAd1 and ./Ads/GoogleAd2.

So now my static pages have a structure like the sample below:

<!-- INCLUDE Navigation -->
<div id="centerDoc">
<h1>Title</h1>
<!-- Insert an ad -->
<!-- INCLUDE GoogleAd1 -->
<!-- Insert my table of contents here -->
<!--toc-->
<h2>More HTML code here</h2>
....
<!-- insert a second ad -->
<!-- INCLUDE GoogleAd2 -->
<p>My blog is <a href="http://BLOG">here</a>

The INCLUDE comments, the <!--toc--> marker, and the BLOG placeholder are special, and will be modified by my ‘include’ script below. This looks much cleaner, and it’s easier to keep track of which ad is inserted, and where, as a name is used instead of cutting and pasting a block of text.
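
To make the include mechanism concrete, here is a minimal sketch of sed's 'r' (read file) command, which is what the script further down relies on. The scratch directory, the page name, and the ad contents are all made up for the demonstration; only the marker text and the Ads/GoogleAd1 path come from the page layout above.

```shell
#!/bin/sh
# Demonstrate sed's 'r' (read file) command with throwaway files.
# All file names and contents here are invented for the demonstration.
demo=$(mktemp -d)
mkdir "$demo/Ads"
echo '<div class="ad">ad code goes here</div>' > "$demo/Ads/GoogleAd1"
cat > "$demo/page.html.in" <<'EOF'
<h1>Title</h1>
<!-- INCLUDE GoogleAd1 -->
<h2>More HTML code here</h2>
EOF

# 'r FILE' queues FILE to be printed after the matching line.
# The marker line itself stays in the output.
result=$(cd "$demo" && sed '/INCLUDE GoogleAd1/ r Ads/GoogleAd1' page.html.in)
printf '%s\n' "$result"
rm -rf "$demo"
```

Note that 'r' takes everything up to the end of the line as the file name, so a stray trailing space after the name makes sed look for a different file, and a missing file is skipped without an error.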

Adding a link back to the top of the Table Of Contents

One thing I liked about the troff2html program is that it added a link in each subsection to the top of the page where the Table of Contents is located. I wanted to add this capability.

I used a sed script that modifies the output of hypertoc(1). The key sections are below:

# Quick and dirty way to add a way to get back to the ToC from an entry
# 1) Put a marker at the beginning of the ToC (and close the anchor)
 s/<h1>Table of Contents/<h1><a name=\"TOC\">Table Of Contents<\/a>/
# 2) Add a link back to the ToC from each entry
 s:\(<h[1234]>\)<a name=:\1<a href=\"$OFILENAME#TOC\" name=:g

hypertoc outputs “Table of Contents”, so I search for this and add <a name="TOC"> to that heading. I also search for the anchor in each section heading and add an href to it, so when you click on a section name, you go back to the table of contents at the top.

Here is the improved “include” script

#!/bin/sh
# This script modifies HTML pages statically, using something similar
# to the C preprocessor's "#include" mechanism
INCLUDE=${1?'Missing include file'}
shift
IFILE=${1?'Missing input file'}

# Strip the trailing ".in" to get the output file name
OFILE=`echo "$IFILE" | sed 's/\.in$//'`
# get the name without the path
OFILENAME=`echo "$OFILE" | sed 's:.*/::'`
# Refuse to run if the input name does not end in ".in",
# otherwise the output would overwrite the input
if [ "$IFILE" = "$OFILE" ]
then
 echo "input file $IFILE same as output file $OFILE - exit" >&2
 exit 1
fi

blog=grymoire.wordpress.com
ARGS="--toc_entry 'H1=1' --toc_end 'H1=/H1' --toc_entry 'H2=2' --toc_end 'H2=/H2' \
--toc_entry 'H3=3' --toc_end 'H3=/H3' --toc_entry 'H4=4' --toc_end 'H4=/H4' \
--toc_entry 'H5=5' --toc_end 'H5=/H5'"
# The string !--toc-- is used as a marker to insert the new Table of Contents
TOC="--toc_tag '!--toc--' --toc_tag_replace"
# eval makes the shell re-parse the quotes embedded in ARGS and TOC
eval hypertoc $ARGS $TOC --make_anchors --make_toc --inline --outfile - $IFILE | \
sed "/<!-- INCLUDE [Nn]avigation/ r $INCLUDE
# Change BLOG URL
 s/BLOG/$blog/g
# Quick and dirty way to add a way to get back to the ToC from an entry
# 1) Put a marker at the beginning of the ToC (and close the anchor)
 s/<h1>Table of Contents/<h1><a name=\"TOC\">Table Of Contents<\/a>/
# 2) Add a link back to the ToC from each entry
 s:\(<h[1234]>\)<a name=:\1<a href=\"$OFILENAME#TOC\" name=:g
# Include ad named 'GoogleAd1'
# (r reads the file name up to the end of the line: no trailing spaces)
 /INCLUDE GoogleAd1/ {
 r Ads/GoogleAd1
 }
# and GoogleAd2
 /INCLUDE GoogleAd2/ {
 r Ads/GoogleAd2
 }
" > "$OFILE"

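The way the script derives the output name from the input name is easy to check in isolation. This sketch uses a hypothetical input path, Unix/Example.html.in, and shows the sed-based forms next to the equivalent POSIX parameter expansions, which are an alternative I am suggesting here, not something the script uses.

```shell
#!/bin/sh
IFILE=Unix/Example.html.in   # hypothetical input path

# The sed-based forms used in the script
OFILE=$(echo "$IFILE" | sed 's/\.in$//')
OFILENAME=$(echo "$OFILE" | sed 's:.*/::')

# Equivalent POSIX parameter expansions, with no extra processes
OFILE2=${IFILE%.in}        # drop a trailing ".in"
OFILENAME2=${OFILE2##*/}   # drop everything up to the last "/"

echo "$OFILE $OFILENAME"
echo "$OFILE2 $OFILENAME2"
```

Both lines print "Unix/Example.html Example.html". If the input name does not end in ".in", both forms leave it unchanged, which is the case the script's safety check catches.
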
Automating everything with a Makefile

As before, my web pages have names like Example.html.in, and the output of the include script is Example.html.

I created a rule that will automatically make the *.html files. Here is the Makefile I have in each of my subdirectories:

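# Build one .html page for every .html.in source in this directory
pages = $(patsubst %.html.in,%.html,$(wildcard *.html.in))
all: $(pages)
$(pages): %.html: %.html.in
	../include ../navigation.nav $<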
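
If you want to see which pages a rule like this would rebuild, the decision make takes can be approximated in plain shell. This is only a sketch of the idea (a page is stale when its .html is missing or older than its .html.in), not a replacement for make; the function name stale_pages and the page names in the demonstration are made up.

```shell
#!/bin/sh
# List the pages a rule like "%.html: %.html.in" would rebuild:
# the .html is missing, or its .html.in source is newer.
stale_pages() {
    for src in *.html.in; do
        [ -e "$src" ] || continue      # no sources: the glob stayed literal
        out=${src%.in}
        if [ ! -e "$out" ] || [ -n "$(find "$src" -newer "$out")" ]; then
            echo "$out"
        fi
    done
}

# Demonstration in a scratch directory with made-up page names
demo=$(mktemp -d)
(
    cd "$demo" || exit 1
    touch -t 202001010000 Fresh.html.in    # source older than page
    touch -t 202001020000 Fresh.html
    touch -t 202001030000 Edited.html.in   # source newer than page
    touch -t 202001020000 Edited.html
    touch New.html.in                      # page not built yet
)
stale=$(cd "$demo" && stale_pages)
echo "$stale"
rm -rf "$demo"
```

In the demonstration, Edited.html and New.html are reported and Fresh.html is not.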

 

And here is the top level Makefile:

# Build one .html page for every .html.in source in this directory
pages = $(patsubst %.html.in,%.html,$(wildcard *.html.in))
SUBDIRS = Unix Security Deception Spam EG Postscript Privacy
all: include navigation.nav $(pages) $(SUBDIRS)
# Handle directories recursively
.PHONY: all install subdirs $(SUBDIRS)
subdirs: $(SUBDIRS)
$(SUBDIRS):
	$(MAKE) -C $@
# Building a page automatically
$(pages): %.html: %.html.in
	./include navigation.nav $<
install: myCSS.css all
	cp *.html *.css /var/www/html
	cp Unix/*.html *.css /var/www/html/Unix
	cp Security/*.html *.css /var/www/html/Security
navigation.nav: navigation.txt makenav.pl
	./makenav.pl <navigation.txt > navigation.nav

You can see an example of a page generated using this code here

 
