From: [email protected]
Date: 2018-09-27
Subject: Casual Programming at Work for Fun and Profit

The  thought  of countless office workers slaving over spreadsheets
and performing mindless work makes me sad.  These machines  that  I
love  were  supposed  to free us from menial labor.  Instead, there
are people out there copying something  from  one  spreadsheet  and
pasting  it  into another, over and over, ad nauseum.  It's because
the tools are incomplete, a consequence of trying to tie every con-
ceivable operation to a button on the screen.

You  don't  need  to be a computer scientist to process data.  With
casual programming, we can accomplish a lot without the operational
complexity  of  "real" software development.  If we take some basic
precautions, we can save time, have fun, and look like a total wiz-
ard without appreciable risk.

Here are some examples of things I've done on a locked-down Windows
10 laptop:

 * Translated an Excel timeline document to a Google Gantt
   chart[1].

 * Automated the process of verifying that developers are
   correctly billing their time.

 * Automated status reports that include financial calculations.

 * Identified duplicate orders in a spreadsheet based on the
   shipping address using Perl.

 * Automatically added weekly tasks to my todo list[2] every
   Monday.

 * Condensed multiple daily journal[3] entries with a Perl
   script so that I can paste them into my timesheet[4].

 * Fetched image URLs for a list of products in a spreadsheet
   using bash and curl.

 * Added child elements to an XML file with awk.

 * Automated the download of 10,000 unique codes from an online
   GUID generator with bash and curl.

 * Used a Vim macro to restructure content that was mangled by
   another application.

                            The Tools

To do all of this, I use Git for Windows[5].  I don't actually  use
git itself, just the tools that come with Git for Windows.  Even on
locked-down systems, I haven't had trouble  installing  it.   Note:
Run  the  installer  from  the  file explorer, not through your web
browser's download manager.  My guess is that  it's  an  executable
zip  file,  and extracting an archive doesn't require admin access.
Git for Windows is based on MSYS2[6].  which includes MinGW, a col-
lection of GNU utilities that have been ported to Windows.  Git for
Windows is updated about every 12 weeks, so be sure  to  keep  your
installation up to date.

These  tools will let you do a lot of cool tricks.  The key feature
of this environment is the  *composability  of  trivial  programs*.
It's  not as easy as clicking a button, but you won't need to break
out a calculator, either.  This is a great toolset for working with
small  datasets.   I believe most Excel spreadsheets fall into that
category.

Key tools include:

 * Vim[7] and nano[8] for text editing

 * Awk[9] and Perl[10] for data processing

 * GNU coreutils[11] like curl, sed, and grep, also handy for
   data processing

 * A bash[12] shell emulator to hold everything together

Other features that make this environment easy to use:

 * It uses the Windows filesystem, so you have access to all
   your files and folders.

 * Integration with the Windows clipboard lets you can copy
   and paste bits of text to and from.

 * Very lightweight compared to other Windows applications so
   it's fast, even on older machines.

To work with Excel data in the Git  for  Windows  environment,  you
will  need to save the file as "Text (Tab Separated)".  You can al-
ways paste the resulting data back into your original file.  I rec-
ommend  tab-separated because it is very unlikely that you will en-
counter a tab character embedded in an Excel document.  However, it
is quite easy to embed a newline character in a cell using Ctrl-En-
ter and this will be hard to compensate for in a script.   If  your
Excel document has cells with unstructured data, consider excluding
those columns from the tab-separated version of the file.

I also recommend installing Pandoc[13].  I hate messing around with
styles in Word.  With Pandoc, I can compose using a plain text edi-
tor.  I use markdown[14] to indicate headers, create tables,  quote
text,  and  bold  and  italicize text.  Pandoc will then convert my
markdown into a nice-looking Word document.  I have also  converted
markdown  into  HTML, HTML into plain text, and Word into markdown.
The key advantage for me is that I get to edit documents in Vim, my
favorite  editor,  and then convert them to a document that is more
appropriate for my audience (usually Word).  It  also  brings  this
content into an environment with powerful automation capabilities.

                            Use Cases

I'll focus on Excel because I use it a lot at work and awk was made
to process structured data like you find in most  Excel  documents.
The  key  things  missing  from  Excel  are regular expressions and
scriptability.  For really trivial stuff, like summing  up  numbers
or  creating  pivot  tables, staying in Excel is probably your best
bet.  When things get complicated, using a tool like awk or sed can
save you a ton of time and soul-crushing, repetitive work.  This is
especially true of anything you will have to do more than  once  or
on a regular schedule.

Examples include:

 * Find rows that meet certain criteria and

 * change or add a column value based on a set of rules or,

 * copy the line to a new file and/or,

 * remove them from the file or

 * find matching information and another file and pull it in.

You  can also perform find-and-replace operations that are too com-
plicated or fuzzy for Excel's built-in find-and-replace function.

Basically, if you find yourself eyeballing every row in  a  spread-
sheet  and applying some kind of rule, you should strongly consider
creating a script to do it for you.  Not only will this  be  faster
in the long run[15], but less prone to human error.  It's also just
more fun and it elevates your thinking.


References:

[1]: https://developers.google.com/chart/interactive/docs/gallery/ganttchart
[2]: /work/2017/09/17/information-gathering-management.html#to-do-list
[3]: /work/2017/09/17/information-gathering-management.html#journal
[4]: /play/2017/11/10/journal-to-timesheet.html
[5]: https://gitforwindows.org/
[6]: https://www.msys2.org/
[7]: https://www.vim.org/
[8]: https://www.nano-editor.org/
[9]: https://www.gnu.org/software/gawk/manual/
[10]: https://www.perl.org/
[11]: https://www.gnu.org/software/coreutils/coreutils.html
[12]: https://www.gnu.org/software/bash/
[13]: https://pandoc.org/
[14]: https://daringfireball.net/projects/markdown/
[15]: https://xkcd.com/1205/