99 ways to program a hex, Part 23: C89, const correctness, assertive, system
calls, full buffering, lookup table

* * * * *

> From: Mark Grosberg <XXXXXXXXXXXXXXXXXXXXX>
> To: Sean Conner <[email protected]>
> Subject: Boston: Well, since you're in the land of non-portability …
> Date: Sun, 29 Jan 2012 05:55:00
>
> > static void hexout(char *dest,unsigned long value,size_t size,const int padding)
> > {
> >   assert(dest != NULL);
> >   assert(size > 0);
> >   assert((padding >= ' ') && (padding <= '~'));
> >
> >   dest[size] = padding;
> >   while(size--)
> >   {
> >     dest[size] = (char)((value & 0x0F) + '0');
> >     if (dest[size] > '9') dest[size] += 7;
> >     value >>= 4;
> >   }
> > }
> >
>
> You're also in the land of ASCII (American Standard Code for Information
> Interchange) specificness. Couldn't you make that:
>
> dest[size] = "0123456789ABCDEF"[value & 0x0f];
>
> And then not be tied to ASCII? You could also then switch out that array
> pointer if you wanted to get a mix of uppercase, lower case depending on
> what you need.
>
> -MYG
>

I initially rejected the idea. My reasoning? The code itself is already
non-portable, being restricted to a Posix [1]-like system, so what's one more
non-portable item on the list? The sequence if (dest[size] > '9')
dest[size] += 7 takes around six bytes on a lot of architectures that aren't
RISC (Reduced Instruction Set Computer) based, and up to twelve bytes on RISC
systems, and now you want to add an additional 16 bytes for a table? [He
asks, working from a system with a few gigabytes of RAM (Random Access
Memory) -Editor] [Shut up! -Sean]. Also, in my nearly 30 years of working
with computers, I've yet to come across a non-ASCII based computer system.

Yes, there are a few. Baudot code [2] is perhaps the oldest, and perhaps the
oddest. Then there are the 6-bit character encoding schemes [3] and
Radix-50 [4], which pack multiple characters per “word” of storage (where a
“word” could be 16, 18, 32, 36, 60 or 66 bits in size) and varied from
system to system. And let's not forget EBCDIC (Extended Binary Coded Decimal
Interchange Code) [5], one of about six nearly identical, but maddeningly
different, encoding schemes developed by IBM [6]. All of these date from the
1960s or earlier, but ASCII won out in the end, being the most widely used
and at the core of Unicode [7].
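
To make the ASCII dependence concrete, here is a minimal sketch of the two
conversions side by side. The names nibble_arith(), nibble_table() and
nibbles_agree() are made up for illustration and are not part of the program.
The arithmetic version only works where 'A' sits exactly 17 code points above
'0', which is true of ASCII but not of EBCDIC (where the letters sort below
the digits). The table version leans only on the fact that a string literal
is an array of characters.

```c
#include <assert.h>

/* Hypothetical helper: the arithmetic approach from the original hexout().
 * Assumes ASCII, where ':' (the character after '9') plus 7 is 'A'. */
static char nibble_arith(unsigned int value)
{
  char c = (char)((value & 0x0F) + '0');
  if (c > '9') c += 7;
  return c;
}

/* Hypothetical helper: the lookup-table approach from Mark's email.
 * Makes no assumption about how the character set is ordered. */
static char nibble_table(unsigned int value)
{
  return "0123456789ABCDEF"[value & 0x0F];
}

/* Returns 1 if both approaches agree for all 16 nibble values, which
 * is what one would expect on an ASCII system, and 0 otherwise. */
static int nibbles_agree(void)
{
  unsigned int v;

  for (v = 0 ; v < 16 ; v++)
    if (nibble_arith(v) != nibble_table(v))
      return 0;
  return 1;
}
```

A program could call nibbles_agree() once at startup and refuse to run (or
fall back to the table) if it returns 0.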

So I asked on a mailing list of classic computer enthusiasts:

> From: Sean Conner <[email protected]>
> To: Classic Computer Talk <XXXXXXXXXXXXXXXXXXXXX>
> Subject: C compilers and non-ASCII systems
> Date: Tue, 31 Jan 2012 11:21:02 -0500
>
> A friend recently raised an issue with some code I wrote (a hex dump
> routine) saying it depended upon ASCII and thus, would break on non-ASCII
> based systems (and proposed a solution, but that's beside the issue here).
> I wrote back, saying the code in question was non-portable to begin with
> (since it depended upon read() and write(); it was targeted at Posix based
> systems) and besides, I've never encountered a non-ASCII system in the
> nearly 30 years I've been using computers.
>
> So now I'm wondering—besides Baudot, 6-bit BCD (Binary Coded Decimal) and
> EBCDIC (Extended Binary Coded Decimal Interchange Code), is there any other
> encoding scheme used? And of Baudot, 6-bit BCD and EBCDIC, are there any
> systems using those encoding schemes AND have a C compiler available?
>
> -spc (Or can I safely assume ASCII and derivatives these days?)
>

I figured that if anyone knew the answer, these people would (many of them
not only use computers like the PDP-10 [8], but use them as heaters during
the winter months).

The answers were fascinating.

> From: "Shoppa, Tim" <XXXXXXXXXXXXXXXXX>
> To: Classic Computer Talk <XXXXXXXXXXXXXXXXXXXXX>
> Subject: Re: C compilers and non-ASCII systems
> Date: Tue, 31 Jan 2012 13:18:55 -0500
>
> IBM has a very handy page on C compatibility with EBCDIC system services:
>
> http://www-03.ibm.com/systems/z/os/zos/features/unix/bpxa1p03.html [9]
>

> From: "Dave" <XXXXXXXXXXXXXXXXXXXX>
> To: Classic Computer Talk <XXXXXXXXXXXXXXXXXXXXX>
> Subject: RE: C compilers and non-ASCII systems
> Date: Tue, 31 Jan 2012 19:33:06 -0000
>
> Please consider other character codes. An EBCDIC port of GCC is alive and
> well on several of the "legacy" operating systems (MVS, VM and Music) that
> run on the Hercules IBM 360/370/XA/390/z emulator. And whilst zLinux runs
> in ASCII (or whatever it uses to get more than 256 points in a code page)
> many zLinux sites also have the zVM hypervisor, which includes an optional
> EBCDIC C compiler. Having ported the BREXX interpreter to this environment
> I was stung by the fact that the original author had made assumptions about
> character ordering that are not true on an EBCDIC platform.
>

> From: Phil Budne <XXXXXXXXXXXXXXXXX>
> To: Classic Computer Talk <XXXXXXXXXXXXXXXXXXXXX>
> Subject: Re: C compilers and non-ASCII systems
> Date: Tue, 31 Jan 2012 13:00:52 -0500
>
> See “IBM libascii functions for z/OS UNIX System Services”
>
> http://www-03.ibm.com/systems/z/os/zos/features/unix/libascii.html [10]
>
> Overview
>
> The libascii functions are integrated into the base of the Language
> Environment. They help you port ASCII-based C applications to the
> EBCDIC-based z/OS UNIX environment.
>

> From: Nemo <XXXXXXXXXXXXXXXX>
> To: Classic Computer Talk <XXXXXXXXXXXXXXXXXXXXX>
> Subject: Re: C compilers and non-ASCII systems
> Date: Tue, 31 Jan 2012 13:32:06 -0500
>
> z/OS is not only POSIX, it is UNIX (see
> http://www.opengroup.org/openbrand/register/brand3470.htm [11]).
>

Oh.

Well then …

I figured I would then try Mark's suggestion (several other people on the
mailing list suggested the same thing) and at least time the change, to see
if it's worthwhile for such odd-looking, but legal, C code.
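
The construct looks odd because it indexes a string literal directly, but a
string literal is just an array of char, so "0123456789ABCDEF"[n] is simply
the n-th character of that array. Mark's other point, swapping the digit
array to pick upper or lower case, might be sketched as below. This is only
an illustration: hexout_digits() and its digits parameter are hypothetical
names, and unlike hexout() in the listing that follows, this version ends the
output with a NUL byte instead of a padding character.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical variant of hexout(): the caller supplies the 16 digit
 * characters, so upper or lower case is chosen at the call site.
 * dest must have room for size + 1 bytes. */
static void hexout_digits(char *dest,unsigned long value,size_t size,const char *digits)
{
  assert(dest != NULL);
  assert(size > 0);
  assert(digits != NULL);
  assert(strlen(digits) == 16);

  dest[size] = '\0';            /* terminator, not a padding character */
  while(size--)
  {
    dest[size] = digits[value & 0x0F];
    value >>= 4;
  }
}
```

Calling hexout_digits(buf,0xBEEFUL,4,"0123456789abcdef") would fill buf with
the lowercase digits, and passing "0123456789ABCDEF" instead gives uppercase.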

> /*************************************************************************
> *
> * Copyright 2012 by Sean Conner. All Rights Reserved.
> *
> * This program is free software; you can redistribute it and/or
> * modify it under the terms of the GNU General Public License
> * as published by the Free Software Foundation; either version 2
> * of the License, or (at your option) any later version.
> *
> * This program is distributed in the hope that it will be useful,
> * but WITHOUT ANY WARRANTY; without even the implied warranty of
> * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> * GNU General Public License for more details.
> *
> * You should have received a copy of the GNU General Public License
> * along with this program; if not, write to the Free Software
> * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
> *
> * Comments, questions and criticisms can be sent to: [email protected]
> *
> *************************************************************************/
>
> /* Style: C89, const correctness, assertive, system calls, full buffering */
> /* lookup table */
>
> #include <stdlib.h>
> #include <string.h>
> #include <errno.h>
> #include <assert.h>
>
> #include <sys/types.h>
> #include <sys/stat.h>
> #include <fcntl.h>
> #include <unistd.h>
>
> #define LINESIZE 16
>
> /********************************************************************/
>
> extern const char *sys_errlist[];
> extern int sys_nerr;
>
> static void do_dump (const int,const int);
> static size_t dump_line (char **const,unsigned char *,size_t,const unsigned long);
> static void hexout (char *const,unsigned long,size_t,const int);
> static void myperror (const char *const);
> static size_t myread (const int,char *,size_t);
> static void mywrite (const int,const char *const,const size_t);
>
> /********************************************************************/
>
> int main(const int argc,const char *const argv[])
> {
>   if (argc == 1)
>     do_dump(STDIN_FILENO,STDOUT_FILENO);
>   else
>   {
>     int i;
>
>     for (i = 1 ; i < argc ; i++)
>     {
>       int fhin;
>
>       fhin = open(argv[i],O_RDONLY);
>       if (fhin == -1)
>       {
>         myperror(argv[i]);
>         continue;
>       }
>
>       mywrite(STDOUT_FILENO,"-----",5);
>       mywrite(STDOUT_FILENO,argv[i],strlen(argv[i]));
>       mywrite(STDOUT_FILENO,"-----\n",6);
>
>       do_dump(fhin,STDOUT_FILENO);
>       if (close(fhin) < 0)
>         myperror(argv[i]);
>     }
>   }
>
>   return EXIT_SUCCESS;
> }
>
> /************************************************************************/
>
> static void do_dump(const int fhin,const int fhout)
> {
>   unsigned char buffer[4096];
>   char outbuffer[75 * 109];
>   char *pout;
>   unsigned long off;
>   size_t bytes;
>   size_t count;
>
>   assert(fhin >= 0);
>   assert(fhout >= 0);
>
>   memset(outbuffer,' ',sizeof(outbuffer));
>   off = 0;
>   count = 0;
>   pout = outbuffer;
>
>   while((bytes = myread(fhin,(char *)buffer,sizeof(buffer))) > 0)
>   {
>     unsigned char *p = buffer;
>
>     for (p = buffer ; bytes > 0 ; )
>     {
>       size_t amount;
>
>       amount = dump_line(&pout,p,bytes,off);
>       p += amount;
>       bytes -= amount;
>       off += amount;
>       count++;
>
>       if (count == 109)
>       {
>         mywrite(fhout,outbuffer,(size_t)(pout - outbuffer));
>         memset(outbuffer,' ',sizeof(outbuffer));
>         count = 0;
>         pout = outbuffer;
>       }
>     }
>   }
>
>   if ((size_t)(pout - outbuffer) > 0)
>     mywrite(fhout,outbuffer,(size_t)(pout - outbuffer));
> }
>
> /********************************************************************/
>
> static size_t dump_line(
>         char **const pline,
>         unsigned char *p,
>         size_t bytes,
>         const unsigned long off
> )
> {
>   char *line;
>   char *dh;
>   char *da;
>   size_t count;
>
>   assert(pline != NULL);
>   assert(*pline != NULL);
>   assert(p != NULL);
>   assert(bytes > 0);
>
>   line = *pline;
>
>   hexout(line,off,8,':');
>   if (bytes > LINESIZE)
>     bytes = LINESIZE;
>
>   p += bytes;
>   dh = &line[10 + bytes * 3];
>   da = &line[58 + bytes];
>
>   for (count = 0 ; count < bytes ; count++)
>   {
>     p --;
>     da --;
>     dh -= 3;
>
>     if ((*p >= ' ') && (*p <= '~'))
>       *da = *p;
>     else
>       *da = '.';
>
>     hexout(dh,(unsigned long)*p,2,' ');
>   }
>
>   line[58 + count] = '\n';
>   *pline = &line[59 + count];
>   return count;
> }
>
> /**********************************************************************/
>
> static void hexout(char *const dest,unsigned long value,size_t size,const int padding)
> {
>   assert(dest != NULL);
>   assert(size > 0);
>   assert((padding >= ' ') && (padding <= '~'));
>
>   dest[size] = padding;
>   while(size--)
>   {
>     dest[size] = "0123456789ABCDEF"[value & 0x0f];
>     value >>= 4;
>   }
> }
>
> /************************************************************************/
>
> static void myperror(const char *const s)
> {
>   int err = errno;
>
>   assert(s != NULL);
>
>   mywrite(STDERR_FILENO,s,strlen(s));
>   mywrite(STDERR_FILENO,": ",2);
>
>   /* valid indexes into sys_errlist[] run from 0 to sys_nerr - 1 */
>   if ((err < 0) || (err >= sys_nerr))
>     mywrite(STDERR_FILENO,"(unknown)",9);
>   else
>     mywrite(STDERR_FILENO,sys_errlist[err],strlen(sys_errlist[err]));
>   mywrite(STDERR_FILENO,"\n",1);
> }
>
> /************************************************************************/
>
> static size_t myread(const int fh,char *buf,size_t size)
> {
>   size_t amount = 0;
>
>   assert(fh >= 0);
>   assert(buf != NULL);
>   assert(size > 0);
>
>   while(size > 0)
>   {
>     ssize_t bytes;
>
>     bytes = read(fh,buf,size);
>     if (bytes < 0)
>     {
>       myperror("read()");
>       exit(EXIT_FAILURE);
>     }
>     if (bytes == 0)
>       break;
>
>     amount += bytes;
>     size -= bytes;
>     buf += bytes;
>   }
>
>   return amount;
> }
>
> /*********************************************************************/
>
> static void mywrite(const int fh,const char *const msg,const size_t size)
> {
>   assert(fh >= 0);
>   assert(msg != NULL);
>   assert(size > 0);
>
>   if (write(fh,msg,size) < (ssize_t)size)
>   {
>     if (fh != STDERR_FILENO)
>       myperror("output");
>
>     exit(EXIT_FAILURE);
>   }
> }
>
> /***********************************************************************/
>

It can't be that much faster, can it?

> [spc]lucy:~/projects/99/src>time ./22 ~/bin/firefox/libxul.so >/dev/null
>
> real 0m0.468s
> user 0m0.450s
> sys 0m0.018s
> [spc]lucy:~/projects/99/src>time ./23 ~/bin/firefox/libxul.so >/dev/null
>
> real 0m0.257s
> user 0m0.245s
> sys 0m0.012s
>

Almost twice as fast as what I thought was already the fastest version.

Ouch.

Several people (including Mark) mentioned that on modern CPUs, a branch
instruction is like hitting a brick wall.

Yes, it's quite apparent that that is true.

But this does give me an idea for removing one more [DELETED-brick wall-
DELETED] branch point …
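
For anyone wanting to poke at the branch-versus-table difference in
isolation, a rough sketch of a timing harness follows. Everything in it
(sum_arith(), sum_table(), time_passes()) is a hypothetical name invented for
this illustration, not code from the series. It measures CPU time with the
standard clock() call, and the numbers will vary with the compiler,
optimization level and CPU. An optimizing compiler may well turn the branch
into a conditional move, or hoist the repeated calls, so treat any result as
a hint rather than proof.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Branching conversion of each low nibble (assumes ASCII); the digits
 * are summed so the work has a visible result. */
static unsigned long sum_arith(const unsigned char *data,size_t len)
{
  unsigned long sum = 0;
  size_t i;

  for (i = 0 ; i < len ; i++)
  {
    char c = (char)((data[i] & 0x0F) + '0');
    if (c > '9') c += 7;
    sum += (unsigned long)c;
  }
  return sum;
}

/* Table-lookup conversion of each low nibble, summed the same way. */
static unsigned long sum_table(const unsigned char *data,size_t len)
{
  unsigned long sum = 0;
  size_t i;

  for (i = 0 ; i < len ; i++)
    sum += (unsigned long)"0123456789ABCDEF"[data[i] & 0x0F];
  return sum;
}

/* Runs fn over data `rounds` times, stores the combined checksum in
 * *out, and returns the CPU time used, in seconds. */
static double time_passes(
        unsigned long (*fn)(const unsigned char *,size_t),
        const unsigned char *data,
        size_t len,
        int rounds,
        unsigned long *out
)
{
  clock_t start;
  unsigned long sum = 0;
  int r;

  assert(fn != NULL);
  assert(data != NULL);
  assert(out != NULL);

  start = clock();
  for (r = 0 ; r < rounds ; r++)
    sum += fn(data,len);
  *out = sum;
  return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Feeding both functions the same large buffer (the contents of a file, say)
and comparing the two returned times gives a rough idea of the gap. The two
checksums should match on an ASCII system.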

* Part 22: C89, const correctness, assertive, system calls, full buffering
[12]
* Part 24: more lookup tables [13]

[1] http://en.wikipedia.org/wiki/POSIX
[2] http://en.wikipedia.org/wiki/Baudot_code
[3] http://en.wikipedia.org/wiki/BCD_(6-bit)
[4] http://en.wikipedia.org/wiki/DEC_Radix-50
[5] http://en.wikipedia.org/wiki/EBCDIC
[6] http://www.ibm.com/
[7] http://unicode.org/
[8] http://www.columbia.edu/cu/computinghistory/pdp10.html
[9] http://www-03.ibm.com/systems/z/os/zos/features/unix/bpxa1p03.html
[10] http://www-03.ibm.com/systems/z/os/zos/features/unix/libascii.html
[11] http://www.opengroup.org/openbrand/register/brand3470.htm
[12] gopher://gopher.conman.org/0Phlog:2012/01/30.1
[13] gopher://gopher.conman.org/0Phlog:2012/02/01.3

Email author at [email protected]