Bourne Shell Case Study II.

Building Web Pages.

Shell scripts make it easy to extract information from files, reformat the data and build new files.

Recently I wanted to build HTML web pages from a comma-delimited file created from an Excel spreadsheet.




  1. My original input looked like:
    $ head data
    ABBOTT,109901,276
    ABERNATHY,095901,884
    ABILENE,221901,19649
    ACADEMY,014901,965
    ADRIAN,180903,102
    AGUA DULCE,178901,342
    ALAMO HEIGHTS,015901,4054
    ALBA-GOLDEN,250906,669
    ALBANY,209901,579
    ALDINE,101902,45139
    


    The fields are separated by commas. The first field is the district name, followed by the district number and the enrollment.
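
    To make the field layout concrete, here is a minimal sketch that splits one record from the sample above into its three fields. The variable names (name, number, enrollment) are illustrative assumptions, and `read -r` assumes a POSIX shell rather than the oldest Bourne shells.

```shell
#!/bin/sh
# Sketch: split one comma-delimited record into its three fields.
# The variable names below are illustrative, not from the original script.
line='AGUA DULCE,178901,342'

# IFS=, applies only to this read, so the space inside the name is preserved.
# -r keeps any backslashes in the data literal.
IFS=, read -r name number enrollment <<EOF
$line
EOF

echo "name=$name number=$number enrollment=$enrollment"
```

    Because only the comma is a separator here, a multi-word name such as AGUA DULCE stays in one field.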

    Since there are 1044 districts, I don't want all of them on the same web page. I want to break them into 26 pages based on the first letter of the district name.



  2. What I want is a web page for each letter that lists its districts in a table, with an index page that links to each letter's page.



  3. The first step is to extract the districts by first letter.
    for k in  A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
    do
      grep "^$k" data >$k.html
    done
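
    The extraction loop can be tried safely on a small sample first. The following is a sketch under stated assumptions: the scratch directory, the .txt output names, and the B-EXAMPLE row are made up for illustration (the two A rows come from the sample input above).

```shell
#!/bin/sh
# Sketch: run the per-letter extraction in a scratch directory.
# The directory name and the B-EXAMPLE row are assumptions for illustration.
workdir=${TMPDIR:-/tmp}/letters.$$
mkdir -p "$workdir" && cd "$workdir" || exit 1

cat >data <<'EOF'
ABBOTT,109901,276
ALDINE,101902,45139
B-EXAMPLE,000000,1
EOF

for k in A B C
do
  # ">" creates (or truncates) the file and sends grep's matches to it.
  # "^$k" anchors the letter to the start of the line.
  grep "^$k" data >"$k.txt"
done

# Show how many districts landed in each file.
wc -l A.txt B.txt C.txt
```

    A letter with no matching districts still gets a file; it is simply empty.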
    



  4. To convert each line to HTML table format, we can use sed with a script file.

    The sed script looks like:

    s/^/<tr>\
    <td align=left>/
    s:,:</td>\
    <td align=center>:
    s:,:</td>\
    <td align=right>:
    s:$:</td>:
    

    Processing a line of input produces the following output.
    $ head -1 data
    ABBOTT,109901,276
    
    $ head -1 data | sed -f sedscript
    <tr>
    <td align=left>ABBOTT</td>
    <td align=center>109901</td>
    <td align=right>276</td>
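
    The same row can also be built without a separate script file. This is a minimal sketch that uses awk in place of sed (a swapped-in technique, not the method used in this case study), and it assumes that no field contains a comma.

```shell
#!/bin/sh
# Sketch: produce the same table row with awk instead of the sed script.
# -F, tells awk to split each input line on commas.
row=$(printf '%s\n' 'ABBOTT,109901,276' |
  awk -F, '{
    printf "<tr>\n"
    printf "<td align=left>%s</td>\n", $1
    printf "<td align=center>%s</td>\n", $2
    printf "<td align=right>%s</td>\n", $3
  }')

printf '%s\n' "$row"
```

    The sed version has the advantage of needing no knowledge of how many fields there are beyond the two commas it replaces; the awk version names each field explicitly.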
    



  5. Now the "here" document facility can be used to build the header and trailer html information for each page.
    Header information
    cat >$k.html <<EOF
    <HTML>
    <HEAD>
    <TITLE>TEXAS SCHOOL ENROLLMENTS
    </TITLE>
    </HEAD>
    <BODY BGCOLOR="#FFFFFF">
    <Center>
    <H2>TEXAS SCHOOL ENROLLMENTS
    </H2>
    <P>
    <HR>
    <table border=1>
    <TR>
    <TH>District Name</th>
    <TH>District Number</th>
    <TH>Total Enrollment</th>
    EOF
    


    Trailer Information
    cat >>$k.html <<EOF
    </table>
    <br>
    <HR>
    <center>
    Updated $date
    </center>
    <P>
    </BODY>
    </HTML>
    EOF
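
    The header and trailer rely on the shell expanding $k and $date inside the here document. That happens only when the delimiter word is unquoted. A small sketch of the difference, using a fixed sample value for date (an assumption, so the output is repeatable):

```shell
#!/bin/sh
# Sketch: how the here-document delimiter controls variable expansion.
k=A
date='January 03, 2000'   # fixed sample value, assumed for illustration

# Unquoted delimiter: $k and $date are replaced with their values.
expanded=$(cat <<EOF
<A HREF="$k.html">$k</A> Updated $date
EOF
)

# Quoted delimiter: the text is copied literally, dollar signs and all.
literal=$(cat <<'EOF'
<A HREF="$k.html">$k</A> Updated $date
EOF
)

printf '%s\n%s\n' "$expanded" "$literal"
```

    If a page ever needs a literal dollar sign, either quote the delimiter or escape the character as \$ inside an unquoted here document.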
    



  6. At the same time I could build the index page.
    cat >index.html <<FINI
    <HTML>
    <HEAD>
    <TITLE>TEXAS SCHOOL ENROLLMENTS
    </TITLE>
    </HEAD>
    <BODY BGCOLOR="#FFFFFF">
    <Center>
    <H2>TEXAS SCHOOL ENROLLMENTS
    </H2>
    <P>
    Please select a school from the alphabetized index.
    <P>
    FINI
    
    #
    # create index entry
    #
    cat <<FINI >>index.html
    <A HREF="$k.html">$k</A> |
    FINI
    
    cat <<FINI >>index.html
    <br>
    <HR>
    <center>
    Updated $date
    </center>
    <P>
    </BODY>
    </HTML>
    FINI
    



  7. Putting the pieces together we get:
    #!/bin/sh
    #
    # Get the current date
    #
    date=`date '+%B %d, %Y'`
    #
    #
    # Create index Heading
    #
    cat >index.html <<FINI
    <HTML>
    <HEAD>
    <TITLE>TEXAS SCHOOL ENROLLMENTS
    </TITLE>
    </HEAD>
    <BODY BGCOLOR="#FFFFFF">
    <Center>
    <H2>TEXAS SCHOOL ENROLLMENTS
    </H2>
    <P>
    Please select a school from the alphabetized index.
    <P>
    FINI
    #
    # Process each Letter entry
    #
    for k in  A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
    do
    #
    #
    # For each loop create a file header, process the schools, create a trailer
    # Create heading
    #
    cat >$k.html <<EOF
    <HTML>
    <HEAD>
    <TITLE>TEXAS SCHOOL ENROLLMENTS
    </TITLE>
    </HEAD>
    <BODY BGCOLOR="#FFFFFF">
    <Center>
    <H2>TEXAS SCHOOL ENROLLMENTS
    </H2>
    <P>
    <HR>
    <table border=1>
    <TR>
    <TH>District Name</th>
    <TH>District Number</th>
    <TH>Total Enrollment</th>
    EOF
    #
    # process schools
    #
    grep "^$k" data  | sed -f sedscript  >>$k.html
    #
    # tail
    #
    cat >>$k.html <<EOF
    </table>
    <br>
    <HR>
    <center>
    Updated $date
    </center>
    <P>
    </BODY>
    </HTML>
    EOF
    #
    # create index entry
    #
    cat <<FINI >>index.html
    <A HREF="$k.html">$k</A> |
    FINI
    #
    # end of letter loop
    #
    done
    #
    # Index tail
    #
    cat <<FINI >>index.html
    <br>
    <HR>
    <center>
    Updated $date
    </center>
    <P>
    </BODY>
    </HTML>
    FINI
    #
    # that's all
    #
    



  8. Running the script
    $ ls
    data       little       sedscript
    

    $ little
    

    $ ls
    A.html      G.html      M.html      S.html      Y.html
    B.html      H.html      N.html      T.html      Z.html
    C.html      I.html      O.html      U.html      data
    D.html      J.html      P.html      V.html      index.html
    E.html      K.html      Q.html      W.html      little
    F.html      L.html      R.html      X.html      sedscript
    

    Now I can view my files with a browser.
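
    Before publishing, it may be worth checking that no district was dropped. The sketch below compares the number of lines in data with the number of table rows across the letter pages. The name check_pages is a hypothetical helper, and the check assumes the pages were built as in step 7, so each data row begins with a lowercase <tr> on its own line while the header row uses <TR>.

```shell
#!/bin/sh
# Sketch: confirm that every line of "data" ended up as a table row.
# check_pages is a hypothetical helper, not part of the original script.
check_pages() {
  dir=${1:-.}
  # Number of input records.
  expected=$(wc -l <"$dir/data" | tr -d ' ')
  # Data rows are a lowercase <tr> alone on a line; the header row is <TR>.
  actual=$(cat "$dir"/[A-Z].html | grep -c '^<tr>$')
  echo "data lines: $expected, table rows: $actual"
  # Exit status 0 only when the two counts agree.
  [ "$expected" -eq "$actual" ]
}
```

    Run it in the output directory as `check_pages . || echo "some districts are missing"`. A mismatch would suggest, for example, a district name that does not start with an uppercase letter.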

© Allan Kochis Last revision 1/3/2000