Kindle highlights to wiki
-------------------------

Last edited: $Date: 2018/01/20 15:21:34 $

## Kindle e-reader

I do love my Kindle Paperwhite. I have the 2015 version, without ads
(the version with ads is not available in our country). Of course
Amazon is not a brand we should endorse, but I seriously like this
device.

All the Amazon Kindle e-readers allow you to highlight text and to
take notes. Every time you highlight a piece of text, this text is
copied into a file called "My Clippings.txt". As we can see from
the space in the filename, this was probably built by some coder who
lives in the DOS or Windows world.

When you take notes, the note is copied to the "My Clippings.txt"
file.

Every highlight and every note has a reference to the book and to
the location in that book where it was created.


## Intelligence has to come from the parser

The software on the Kindle that stores your highlights and notes
in the "My Clippings.txt" file is not very sophisticated ("cough").
All the records are stored in chronological order, with the oldest
records at the top of the file and the newest at the bottom. This
means that records belonging to different books can be interleaved.

Creating highlights is not always error-free: it can be hard to get
the boundaries right the first time. When you get a boundary wrong
(missing a word or a line, for example), you can delete the
highlight and create it again. Sometimes it takes several tries to
get the boundaries right. Because of the "sophistication" of the
software on the Kindle, every try leaves its own record in the
file, including the tries that were deleted afterwards.

So the parser has to go through the "My Clippings.txt" file and
bundle the records per book. It also has to ditch the deleted
highlights and keep only the good ones. As far as I can tell, this
has to be done by keeping, of several tries at the same highlight,
the record nearest to the bottom of the file.
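
The two parser tasks above can be sketched with plain awk. This is
only a minimal sketch under stated assumptions, not the script
linked at the end of this page. It assumes that every record ends
with a line of equals signs, that the second line of a record
contains "Location" followed by a start number, and that two tries
at the same highlight share that start location. The sample data is
made up, and real files may also carry a byte order mark and CRLF
line endings that this sketch ignores.

```shell
#!/bin/sh
# Hypothetical sample data; the record layout is an assumption.
clippings=$(mktemp)
cat > "$clippings" <<'EOF'
Book One (Author A)
- Your Highlight on Location 100-101 | Added on Monday, January 1, 2018 10:00:00 AM

the quick brown
==========
Book Two (Author B)
- Your Highlight on Location 5-6 | Added on Monday, January 1, 2018 10:05:00 AM

a highlight from another book
==========
Book One (Author A)
- Your Highlight on Location 100-102 | Added on Monday, January 1, 2018 10:06:00 AM

the quick brown fox
==========
EOF

result=$(awk '
# A line of equals signs closes the current record.
/^==========/ {
    # Same book and same start location: the later record wins.
    if (title != "") text[title SUBSEP start] = body
    title = ""; start = ""; body = ""; line = 0
    next
}
{
    line++
    if (line == 1) title = $0
    else if (line == 2) {
        # Take the first number after "Location " as the start location.
        if (match($0, /Location [0-9]+/))
            start = substr($0, RSTART + 9, RLENGTH - 9)
    }
    else if ($0 != "") body = (body == "" ? $0 : body " " $0)
}
END {
    for (key in text) {
        split(key, part, SUBSEP)
        print part[1] "|" part[2] "|" text[key]
    }
}' "$clippings" | LC_ALL=C sort -t '|' -k1,1 -k2,2n)
rm -f "$clippings"
printf '%s\n' "$result"
```

The final sort groups the lines per book and orders them by
location. A retry that starts at a different location would not be
recognised as a duplicate by this sketch, and a title containing a
"|" character would confuse the sort.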


## awkiawki as a personal wiki

Awkiawki is a wiki that uses awk as its CGI script. It is very
fast and even performs great on a Raspberry Pi.

Awkiawki is a very simple wiki that uses CamelCase to create links
from one page to another.

My awkiawki has become an awesome personal knowledge base and my
poor man's Zettelkasten implementation, which is becoming more
valuable each and every day.

Awkiawki stores its content as flat text files in the Markdown
format. Whenever a page is requested, the CGI script converts it on
the fly into HTML. During this conversion, CamelCase words are
converted to HTML links if a wiki file with a corresponding name
exists. If not, a link is created that allows the user to create a
new wiki page.

Awkiawki only accepts alphabetic characters in CamelCase
filenames, not digits. So Catch22 is not a legitimate filename.


## Script to convert "My Clippings.txt" file to wiki pages

My aim is a script that produces the following result.

* In my awkiawki I have a pointer to a file called
  "KindleHighlights". Remember that awkiawki uses CamelCase to
  generate links to other files.

* The conversion script creates this file and adds to it a link to
  the page of each book.

* The conversion script creates a separate file for every book, with
  all the highlights and notes from that book, ordered by location.

* Each highlight and note has a unique anchor, to which references
  can be made on other wiki pages.


The file "KindleHighlights" functions as an index page to the
pages per book. To link the index page to these pages, the filename
of each book page has to be in CamelCase. Unfortunately, awkiawki
only accepts alphabetic characters in CamelCase filenames, not
digits, so a name like Catch22 is not a legitimate filename.

The file "My Clippings.txt" starts each highlight and each note with
a line containing the title of the book and the name of the author.
The name of the author is between round brackets. This is an
example:

  As a Man Thinketh (James Allen)
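
Turning such a title line into an alphabetic-only page name could
look roughly like the sketch below. The function name page_name and
the rule used (drop the author, treat everything that is not a
letter as a word break, capitalise each word) are assumptions for
illustration, not necessarily what the linked script does.

```shell
#!/bin/sh
# Hypothetical helper: derive a wiki page name from a title line.
page_name() {
    printf '%s\n' "$1" | awk '
    {
        title = $0
        # Drop the trailing "(Author)" part, if present.
        sub(/ *\([^()]*\) *$/, "", title)
        # Digits and punctuation become word breaks.
        gsub(/[^A-Za-z]+/, " ", title)
        n = split(title, word, " ")
        name = ""
        for (i = 1; i <= n; i++)
            name = name toupper(substr(word[i], 1, 1)) tolower(substr(word[i], 2))
        print name
    }'
}

page_name "As a Man Thinketh (James Allen)"
page_name "Catch-22 (Joseph Heller)"
```

The first call prints AsAManThinketh. The second prints Catch,
which is a single word and therefore no longer CamelCase, so a real
script needs some fallback for such titles. Whether awkiawki accepts
every generated name as a link depends on its CamelCase pattern,
which is not verified here.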

Every book page ends with a back-reference to the
"KindleHighlights" index page, so that I can easily jump between the
individual book pages and the index of all the books with highlights
or notes.


## awk for fun and profit

Although Perl is what comes to mind when one wants to create such a
script, I decided to give awk a try, just because that seemed like
fun.

I wrote an awk script that can do the conversion. I want the index
page to be sorted by book title and the pages per book to be sorted
by location. Unfortunately, we have to use gawk for this, because
standard awk doesn't have a function for sorting arrays.
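
Where gawk is not at hand, one possible workaround is to let awk
pipe its output through sort(1) instead of sorting an array itself.
This is a different technique from the array sorting mentioned
above and is only a sketch; the input lines are made up.

```shell
#!/bin/sh
# Sketch: numeric sort of "location<TAB>text" lines via sort(1).
sorted=$(printf '%s\n' '250 third' '12 first' '101 second' | awk '
{
    # Print each record into a pipe; sort(1) emits them once the pipe closes.
    print $1 "\t" $2 | "sort -n"
}
END { close("sort -n") }')
printf '%s\n' "$sorted"
```

The lines come out ordered by their leading number: 12, 101, 250.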


## Awk and redirection

Although awk has a syntax like

  print var > (target)

and

  print var >> (target)

the redirection of the output works differently from that in
shell scripts.

When one performs three writes in a row to the same target from the
same awk session, this results in three lines in the target file,
even when > (target) is used. The syntax with the single
greater-than character means that the target file is overwritten
at the first write and that the subsequent writes are appended to
it.
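
A small experiment shows this behaviour. The variable name target
and the temporary file are only there for the demonstration.

```shell
#!/bin/sh
# Sketch: ">" inside awk truncates only when the file is first opened.
target=$(mktemp)
echo "old content" > "$target"

awk -v target="$target" 'BEGIN {
    print "one"   > target   # first write: file is opened and truncated
    print "two"   > target   # same open file: appended
    print "three" > target
    close(target)
}'

content=$(cat "$target")
printf '%s\n' "$content"
rm -f "$target"
```

The file ends up with the three lines one, two and three, and the
old content is gone. In a shell script, three separate commands
with > would each truncate the file and leave only the last line.
Within awk, a further > to the same target after close() would
truncate the file again.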


## Csh script and awk script

The shell script and the awk script that create the index
page and the wiki pages per book can be found here:

  gopher://box.matto.nl/1/scripts/


$Id: kindlehighlights2wiki.txt,v 1.4 2018/01/20 15:21:34 matto Exp $