October 19, 2015

Streamlining huge sets of documentation

Documentation: everybody wants to read it, nobody likes to write it, and when you do have it, there is never any order to it. This post is all about streamlining a set of documents so that everything looks as if you had a plan to begin with.

Usually, documentation is assembled from several sources: HTML, PDF, DOC and so on. There are always plenty of requirements for the documents themselves, but hardly ever any for the file naming conventions. And how do you deal with multiple formats?
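Before deciding how to treat multiple formats, it helps to know which ones you actually have. The following is a minimal sketch; the function name count_extensions is hypothetical, and it assumes the extension is whatever follows the last dot in a file name:

```shell
#!/bin/bash
# Hypothetical helper: count the regular files of each extension in a
# directory (default: current directory), most common first.
# Assumption: the extension is the text after the last dot; it is
# lower-cased, and files without a dot are counted as "(none)".
count_extensions() {
    local dir="${1:-.}" f base ext
    for f in "$dir"/*; do
        [ -f "$f" ] || continue          # skip directories and the like
        base="${f##*/}"                  # strip the directory part
        case "$base" in
            *.*) ext="${base##*.}" ;;
            *)   ext="(none)" ;;
        esac
        printf '%s\n' "$ext" | tr '[:upper:]' '[:lower:]'
    done | sort | uniq -c | sort -rn
}

# Usage: count_extensions ~/docs
```

The output tells you which converter tools you will need later on.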

I suggest PDF as the single publishing format to aim for. PDF because it is readable basically anywhere, and because if a document is in some obscure format, you can usually find a print-to-PDF printer driver on any system and print your obscure document as PDF.

But before we start converting to the output format, let's have a look at the file naming conventions. These are very important, as they serve as the basis for searching through the documentation once it is stored on disk.
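As a sketch of why naming matters for search: if every file name starts with the author followed by " - ", listing everything by one author takes nothing more than find. The helper docs_by_author is hypothetical and assumes the author-first layout chosen in this post:

```shell
#!/bin/bash
# Hypothetical helper: list all documents by one author in a directory tree.
# Assumption: names look like "authorname authorsurname - title.extension".
docs_by_author() {
    local dir="$1" author="$2"
    # -iname matches case-insensitively; anchoring on the " - " separator
    # keeps "john doe" from also matching "john doerr".
    find "$dir" -type f -iname "$author - *" | sort
}

# Usage: docs_by_author ~/docs 'john doe'
```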

For starters, I'd like the file names to look consistent, even though the documents themselves may be utterly obscure. I'm choosing the convention:

authorname authorsurname - document title.extension
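To check whether a given name already follows this convention, a small test function can look for the " - " separator, a two-part author name, and a non-empty title and extension. This is a sketch: follows_convention is a hypothetical name, and treating a comma in the author part as "not converted yet" is an assumption based on the "surname, name" swap the rename script below performs.

```shell
#!/bin/bash
# Hypothetical checker: succeed (return 0) if a file name looks like
# "authorname authorsurname - document title.extension", fail otherwise.
follows_convention() {
    local name="$1" author rest
    # Must contain " - " followed somewhere by a dot (the extension).
    case "$name" in
        *" - "*.*) ;;
        *) return 1 ;;
    esac
    author="${name%% - *}"   # text before the first " - "
    rest="${name#* - }"      # title plus extension
    # Assumption: a comma means "surname, name" still needs swapping;
    # otherwise we need at least a name and a surname.
    case "$author" in
        *,*) return 1 ;;
        *?" "?*) ;;
        *) return 1 ;;
    esac
    # Neither the title nor the extension may be empty.
    case "$rest" in
        .*|*.) return 1 ;;
    esac
    return 0
}

# Usage: for f in *; do follows_convention "$f" || echo "needs work: $f"; done
```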

Once the formatting is consistent, we can always revisit the order of author name, surname and so on. The important thing is that all documents follow the same convention, because that is what makes it possible to process a huge set of files automatically.
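For example, switching from "name surname" to "surname, name" is purely mechanical once every file follows the same pattern. A minimal sketch, with a hypothetical helper surname_first that assumes the last word of the author part is the surname (so multi-word surnames will come out wrong):

```shell
#!/bin/bash
# Hypothetical helper: "name surname - title.ext" -> "surname, name - title.ext".
# Assumption: the last word of the author part is the surname.
# Names without " - " or without a space in the author part are printed unchanged.
surname_first() {
    local name="$1" author rest
    case "$name" in
        *" - "*) ;;
        *) printf '%s\n' "$name"; return ;;
    esac
    author="${name%% - *}"   # author part, before the first " - "
    rest="${name#* - }"      # title and extension
    case "$author" in
        *" "*) ;;
        *) printf '%s\n' "$name"; return ;;
    esac
    # ${author##* } is the last word, ${author% *} everything before it.
    printf '%s, %s - %s\n' "${author##* }" "${author% *}" "$rest"
}

# Usage: surname_first 'john doe - user manual.pdf'
```

It is the inverse of the "surname, name" swap in the rename script, so you can go back and forth as your conventions change.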

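#!/bin/bash
# Normalise file names in the current directory to
# "authorname authorsurname - document title.extension" (all lower case).
for f in *; do
    # Skip directories and anything else that is not a regular file.
    if [ ! -f "$f" ]; then
        echo "$f is not a file - skipping"
        continue
    fi

    # Lower-case the whole name.
    lc=$(printf '%s\n' "$f" | tr '[:upper:]' '[:lower:]')

    # Names without the " - " separator cannot be parsed; report them.
    case "$lc" in
        *" - "*) ;;
        *) echo "yikes -> $lc"; continue ;;
    esac

    # Split at the first " - ": author on the left, title on the right.
    author="${lc%% - *}"
    title="${lc#* - }"

    # Turn "surname, name" into "name surname".
    author=$(printf '%s\n' "$author" | sed 's/\(.*\), \(.*\)/\2 \1/')

    new="$author - $title"
    if [ "$f" != "$new" ]; then
        # -i asks before overwriting a file that already exists.
        mv -i -- "$f" "$new"
    fi
done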

The above snippet renames the files in the current directory to lower case and rewrites author names given as "surname, name" into the "name surname" order used above. Files whose names lack the " - " separator are left untouched and echoed with "yikes" to your terminal, and if a new name collides with an existing file, mv -i asks you what to do. With some luck this handles most of your files. Once you have fixed the remaining ones by hand, you can move on to converting them.
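Before letting the rename loop loose on a large directory, it can be useful to see what it would do. The following dry-run sketch computes the new name without touching the disk; proposed_name is a hypothetical helper, and splitting at the first " - " is an assumption for titles that themselves contain " - ":

```shell
#!/bin/bash
# Hypothetical dry-run helper: print the name a file would be renamed to.
# Rules: lower-case everything, split at the first " - " (assumption),
# swap "surname, name" into "name surname". Unparseable names print "yikes".
proposed_name() {
    local lc author title
    lc=$(printf '%s\n' "$1" | tr '[:upper:]' '[:lower:]')
    case "$lc" in
        *" - "*) ;;
        *) printf 'yikes -> %s\n' "$lc"; return 1 ;;
    esac
    author="${lc%% - *}"
    title="${lc#* - }"
    author=$(printf '%s\n' "$author" | sed 's/\(.*\), \(.*\)/\2 \1/')
    printf '%s - %s\n' "$author" "$title"
}

# Usage: for f in *; do [ -f "$f" ] && printf '%s => %s\n' "$f" "$(proposed_name "$f")"; done
```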

Depending on your source formats, you'll need to install the appropriate converter tools. The following snippet converts DjVu files to PDF with ddjvu (part of DjVuLibre). You can add more converters:

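#!/bin/bash
# Convert every DjVu file in the current directory to PDF.
for f in *; do
    [ -f "$f" ] || continue
    # Take the extension after the last dot, so "a.b.djvu" works too.
    extension="${f##*.}"
    n="${f%.*}"
    case "$extension" in
        djvu) ddjvu -format=pdf "$f" "$n.pdf" ;;
        # Add more converters here, one pattern per extension.
    esac
done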
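One way to add more converters is a case statement keyed on the extension. In the sketch below ddjvu comes from the snippet above, while ps2pdf (Ghostscript) and soffice (LibreOffice) are swapped-in examples, not part of the original. The function to_pdf and the DRY_RUN variable are hypothetical; setting DRY_RUN prints the commands instead of running them.

```shell
#!/bin/bash
# Hypothetical dispatcher: convert one file to PDF based on its extension.
# ps2pdf and soffice are example converters; install whichever you need.
# If DRY_RUN is set, the command is printed instead of executed.
to_pdf() {
    local f="$1" ext base run=""
    if [ -n "${DRY_RUN:-}" ]; then
        run=echo
    fi
    ext=$(printf '%s\n' "${f##*.}" | tr '[:upper:]' '[:lower:]')
    base="${f%.*}"
    case "$ext" in
        pdf)          return 0 ;;   # already in the target format
        djvu)         $run ddjvu -format=pdf "$f" "$base.pdf" ;;
        ps)           $run ps2pdf "$f" "$base.pdf" ;;
        doc|docx|odt) $run soffice --headless --convert-to pdf --outdir "$(dirname "$f")" "$f" ;;
        *)            echo "no converter for $f" >&2; return 1 ;;
    esac
}

# Usage: for f in *; do [ -f "$f" ] && DRY_RUN=1 to_pdf "$f"; done
```

Running it with DRY_RUN first shows what would happen; drop DRY_RUN to actually convert.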

That's all for the conversion process; feel free to edit the snippets to suit your needs.