C News  Vol. 1  Issue 11                         Sept 15, 1988


       CHOOSING A MEMORY MODEL by Bill Mayne

       ABSTRACT:  The meaning of the "near", "far", and "huge"
       keywords specifying pointer types, and how these relate to
       the various memory models available to C programmers using
       the 80x86 family of processors found in IBM and compatible
       PCs and their successors, is explained.  A simple bench
       mark which illustrates the effect of memory model selection
       on code size and execution time is shown.  Coding examples
       show how to use preprocessor symbols and the #if directive
       to handle cases where source code must be modified
       according to the memory model in use.  The compilers used
       are Microsoft C (MSC) versions 4.0 and 5.0 and Turbo C
       version 1.5.

            Based on an understanding of pointer types and memory
       models, confirmed by the results of the bench mark,
       guidelines for the selection of the best memory model for
       a program are given.

       ACKNOWLEDGEMENT:  Thanks to Jerry Zeisler, who sparked
       interest in the subject of this article in a "conversation"
       on the C BBS and helped with the bench mark by compiling it
       with Turbo C.  Thanks also to Barry Lynch, editor of the C
       News and sysop of the C BBS for his  encouragement,  assistance
       with  file transfers, and running a fine BBS for the discussion
       of C related issues.

       1. INTRODUCTION

            The use of the "near", "far", and "huge" keywords when
       declaring pointers and the selection of a memory model for
       a program written in C are problems unique to the 80x86
       family of processors, because both are tied to the
       segment:offset addressing scheme used in this
       architecture.  Before
       discussing  the  advantages  and  disadvantages  of the various
       options available,  it  is  useful  to  briefly  describe  this
       scheme   for  those  not  already  familiar  with  the  machine
       language of  the  80x86  architecture.      Experienced   80x86
       programmers  may  wish  to skip section 1.1, which explains the
       various types of  pointers,  and  go  directly  to  1.2,  which
       explains memory  models.   All of the information from sections
       1.1 and 1.2 except a few historical asides and  other  comments
       is in the Microsoft C User's Guide.

       1.1 80x86 Addresses and Pointer Types

            The 80x86 family of processors used in IBM and
       compatible PCs consists of 16-bit processors descended
       from the 8080 and its spin-off, the Z80, which was used in
       earlier CP/M machines.  A 16-bit
       machine is  so  called  because  its  word  size  is  16  bits.
       Usually, but not always, the sizes of a pointer, a word,
       and an integer are the same.  The 80x86 family is one of
       the exceptions.  A 16-bit word can hold only 2**16 or 64K
       distinct
       addresses.  In 80x86 processors, as in  most  micros  and  many
       larger  processors,  the  unit  of  memory addressed is a byte.

       The address of a larger unit such as a word is given by
       the address of its first byte, which may be required to
       fall on certain boundaries such as even-numbered addresses
       or multiples of the word size.  (There are machines which
       use
       word addressing.     This   has   advantages   especially   for
       scientific/engineering "number  crunchers".   It is not so good
       for handling character data.)

            When the 8080 and Z80 first  came  out,  memory  was  much
       more  expensive and being able to address 64K was thought to be
       sufficient.  Another consideration was that limiting  addresses
       to  16  bits  made  the  construction  of  memories simpler and
       cheaper, and early microprocessors were embedded in other
       systems for control purposes and did not need so much
       memory.  The use of microprocessors for data processing
       applications in microcomputers came later.  The term
       "Personal Computer" or PC was not yet in common usage.

            As an additional historical note, mainframes of  the  time
       were  designed with much larger address spaces, but still small
       by the standards of today and the near future.  The IBM
       360 and 370, which had 32-bit processors, used only 24
       bits for addressing, limiting addressable memory to 16M
       even for these
       large machines.    Already  some PCs using extended memory have
       that much.  By contrast, IBM mainframes in use today  have  the
       option  of  "extended  architecture"  or  XA, using 31 bits for
       addresses,  and  the  next  wave  called   "Enterprise   System
       Architecture" or  ESA  adds  another 12.  The amount of storage
       which can be addressed by 43 bits is truly immense, 2**43
       or about 8.8e12 bytes, more than any main storage we are
       likely to see
       for a long time.   Even  so,  such  large  address  spaces  are
       actually  useful  since nearly all mainframes have the hardware
       and software to support virtual memory.

            When the price of memory came down  and  the  need  for  a
       larger    address    space   became   important,   but   16-bit
       microprocessors were still the norm, designers decided  to  use
       a segmented  memory  architecture.   Segments would contain 64K
       bytes each, so  the  relative  position  of  a  byte  within  a
       segment  could  still  be  represented  by  a  16 bit register.
       Extra registers were  added  to  address  the  segments.    For
       flexibility,  segments  were  allowed  to  start on any 16 byte
       "paragraph" boundary.  The 80x86 has registers  for  addressing
       4 segments.      They   are  CS  ("code  segment"),  DS  ("data
       segment"), SS ("stack  segment"),  and  ES  ("extra  segment").
       The names  reflect  the  way they are normally used.  A segment
       register  gives  the  address  of  the  first  paragraph  of  a
       segment, shifted  right 4 bits to fit within a 16 bit word.  To
       compute an actual address the segment is shifted  left  4  bits
       to  convert  it  to a byte address and then the offset is added
       to address any of the 64K bytes within the segment.  Most
       programs, whether written in assembly language or a
       compiled language, take advantage of the registers and
       make things cleaner by putting code, data, and stack into
       separate segments addressed by the registers named for
       those purposes.  (It is
       true  that  the  stack  contains data, and for that matter code
       itself is a kind of data,  but  the  conventional  distinctions
       are useful.)
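
            To make the address arithmetic concrete, it can be
       written out in C.  This is only a sketch; the function
       name is invented for the example and is not part of any
       compiler library.

       /* form a 20 bit physical address from segment:offset */
       unsigned long physical(unsigned int seg, unsigned int off)
       {
           /* shift the paragraph number left 4 bits to get a
              byte address, then add the 16 bit offset */
           return ((unsigned long)seg << 4) + (unsigned long)off;
       }

       For example, segment 0x1234 with offset 0x0056 yields
       0x12340 + 0x0056, or physical address 0x12396.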

            Normally such details of machine architecture are
       only of concern to the assembly language programmer, but
       the processor architecture does influence parts of the
       compiler design.  C programmers who wish to understand the
       reasons for such design decisions, and in particular the
       architecture-specific details of pointer types and memory
       models, need some familiarity with them.

            In machine language it  is  very  convenient  if  all  the
       memory  referenced  lies  within  a  segment  whose  address is
       already loaded in the appropriate register.  With  the  segment
       implied,  only  the  16  bits  of  the  offset must actually be
       included in the pointer.  Such a pointer  is  called  a  "near"
       pointer.   If,  on  the other hand, the code or data referenced
       does not all lie within a  64K  segment,  it  is  necessary  to
       specify  the segment as well as the offset, and a "far" pointer
       is required.  This is  significant  not  only  for  space  (far
       pointers   requiring  four  bytes  instead  of  two),  but  for
       performance.   At  the  machine  language  level  use  of   far
       pointers  requires  the  values  of  segment  registers  to  be
       swapped every time a different segment is accessed.   Not  only
       does  an actual pointer take up more space, so does the code to
       manipulate it.  The extra instructions also increase the
       execution time.  And this applies not only to explicit
       pointer arithmetic, but to array references, sometimes to
       global variable references, and to other situations
       involving implicit address calculations.
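
            In C source these pointer types are requested with
       the keywords themselves.  A minimal sketch (the variable
       names are invented for the example; "near" and "far" are
       compiler extensions, not standard C):

       char near *np;   /* offset only: sizeof(np) == 2        */
       char far  *fp;   /* segment and offset: sizeof(fp) == 4 */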

            Far pointers are used for data when a  program  references
       more  than  64K of data in total, but it is still convenient if
       each array or structure  fits  within  a  segment.    Then  the
       segment  address  used  can  be selected so that the address of
       all elements of the array or structure can be computed  without
       changing the  segment  part  of  the  address.    If  even this
       restriction must be  removed  a  "huge"  pointer  is  required.
       Huge  pointers are four bytes long, just like far pointers, but
       arithmetic  with  huge  pointers  requires  extra  steps   (and
       code).  Both huge and far pointers follow the rule, common
       in microprocessors, of storing the least significant byte
       first.
       The  first  word  of a far pointer is the offset and the second
       word is the segment.  This is important to  know  if  you  must
       construct  a  far pointer from its components, or decompose one
       into its segment and offset parts.  A macro in Listing 1  shows
       how  to do the former, and the library macros FP_SEG and FP_OFF
       do the latter.  By the way, the segment  and  offset  are  also
       each   stored   least   significant   byte   first,   but   the
       implementation of shifting and arithmetic in  C  take  care  of
       this for you and you don't need to be concerned about it.
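
            Here is a short sketch of the decomposition using
       those library macros, which both MSC and Turbo C declare
       in dos.h.  The function itself is invented for the
       example:

       #include <dos.h>
       #include <stdio.h>

       void show_parts(char far *fp)
       {
           unsigned seg = FP_SEG(fp);   /* second (high) word */
           unsigned off = FP_OFF(fp);   /* first (low) word   */
           printf("%04X:%04X\n", seg, off);
       }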

            Since  offsets  into  code  are  not used in this way, the
       "huge" keyword applies only  to  pointers  to  data,  including
       array names.

            Assembly  language  programmers must be directly concerned
       with considerations such as those above.   C  programmers  have
       it  a  little  easier, since the C compiler automatically takes
       care of generating addresses and  swapping  segment  registers.
       Still,   the   programmer   concerned  with  efficiency  should
       understand what  is  required  and  control  the  selection  of
       pointer  types  to  produce  the most efficient code compatible
       with  other   goals   such   as   ease   of   programming   and
       maintainability.

       1.2 Memory Models

            The  term  "memory model" simply refers to the combination
       of defaults for code and  data  pointers.    Though  individual
       pointers  may  be explicitly declared "near", "far", or "huge",
       the memory model used is very important to program design.   It
       partly  determines the amount of code and/or data a program can
       address.  In addition, as the bench mark  in  a  later  section
       shows,  the  selection  of  a  memory  model may have important
       implications for the  size  and  efficiency  of  the  generated
       code.   As  a  rule,  it  is better to use the smallest pointer
       which will work.  Use "near" in preference  to  "far"  and  use
       "huge" only if absolutely necessary.

            In  the  small  memory  model,  both the code and data are
       addressed by near pointers.  Small model programs are thus
       limited to a total of 64K of code and 64K of data, or 128K
       in all.  Most programs fit within this limit, and it is
       the most efficient model, so it is the default.

            Medium  model  programs use near pointers for data and far
       pointers for code.  They can therefore have only 64K  of  data,
       but  the  amount  of  code is limited only by available memory.
       The medium model is preferred by the integrated
       environment of QuickC, but is otherwise not often useful
       for hobbyist programmers.  It takes a rather large program
       to exceed 64K of code, and most that do probably also
       exceed 64K of data and thus need the large or huge model.
       However, since data references are executed much more
       frequently than code references, the medium model does
       have quite a performance advantage over large in those
       cases where it fits the requirements.

            Compact model programs use far pointers for data and  near
       pointers for  code.    This  model  is  good for programs which
       allocate a lot of data, but which have less than 64K  of  code.
       A  common example would be a simple editor which stores a whole
       file in memory as an array or linked list.

            The advantage of the compact model over  the  large  model
       is  usually  less  than the advantage of medium over large, but
       the choice is  almost  always  between  compact  and  large  or
       between  medium  and  large,  hardly  ever  between compact and
       medium.

            Large model programs use far pointers for  both  data  and
       code.   They can have any amount of code and/or data which will
       fit in memory, in any combination.   The  only  restriction  is
       that individual arrays or structures cannot exceed 64K.

            The  huge  model  uses  far  pointers  for  code  and huge
       pointers for data and is thus restricted only by the amount  of
       storage available.    It  is  also  the least efficient, and is
       rarely needed.

            The tiny memory model, which is an option with Turbo
       C but not with Microsoft, is similar to small.  Both code
       and data pointers are near pointers but, in addition, all
       segments are assumed to be the same; that is, the total
       data and code is restricted to 64K.  This might yield
       smaller
       and/or  faster  code  in  some  cases,  if  the  compiler  took
       advantage of it.  In the  simple  bench  mark  given  below  no
       significant difference was found.

            Another   important   design  consideration  is  that  the
       library routines will assume the  default  types  according  to
       the memory model in use.  Under MSC release 4.0 there is a
       set of libraries for each memory model, and the linker
       automatically selects the set matching the .OBJ files
       linked.
       MSC 5.0 may be installed with  combined  libraries,  but  there
       are  still  separate  versions  of  library  routines  for each
       installed memory model.  (Mixing memory models  and  even  more
       exotic  options  are possible, but such advanced topics are not
       covered here.)
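
            For reference, the memory model itself is selected
       with a compiler switch, which in turn determines which
       library set is searched.  Typical invocations (check your
       compiler's documentation; these are the switches for the
       versions discussed here):

       cl  /AC prog.c       (MSC: /AS, /AM, /AC, /AL, /AH)
       tcc -mc prog.c       (Turbo C: -mt, -ms, -mm, -mc, -ml, -mh)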

            For example, memcpy() will expect both  pointer  arguments
       to  be  either  near  or  far pointers, according to the memory
       model in use.  If it is necessary to use a far pointer to
       reference a block of memory to be copied in a program
       which otherwise uses near pointers, an alternative must be
       provided, either in line or by a specially written
       function with a different name.  The coding example in
       Listing 1 shows a simple but realistic case in which this
       is necessary.  The function cmdargs() needs to build a far
       pointer to the unparsed command line arguments in the
       program segment prefix and use this to copy the argument
       string to a buffer supplied by the calling program.  If
       the source code is compiled using the small or medium
       memory model, memcpy() cannot be used.  In that case in
       line code is selected.  The decision is made at compile
       time by testing the preprocessor symbols which identify
       the memory model.  Since the symbols which tell the
       preprocessor that the compact, large, or huge model is in
       use are only defined when using MSC, the version with in
       line code, which will actually work with any memory model,
       is the default (the #else case).

       2. GUIDELINES FOR MEMORY MODEL SELECTION

            Many  C  programmers  find the selection of memory model a
       confusing or even mysterious issue.  The  default  small  model
       is  sufficient  most  of  the  time,  so  beginners can put off
       having to consider memory models at all.   But  there  comes  a
       time  as  programs  and/or the quantity of data grow that other
       models are necessary.  Rather than take the  coward's  way  out
       and  simply  resort  to using large or huge all the time, which
       some have done, the wise programmer should understand  all  the
       issues and pick the best memory model for the job.

            Even in this age of cheap hardware and abundant
       resources, it may make sense to make the best choice you
       can to minimize the use of resources.  A smaller .EXE file
       will obviously load faster, and for many programs load
       time is significant, especially if you are loading from a
       floppy disk.  Also, with 360K floppies, keeping the .EXE
       file size down may make the difference between fitting the
       program and its data on one floppy and needing two.
       Looking at it yet another way, it may make the difference
       between being able to put a frequently used program on a
       RAM disk and having to load it from a hard disk or, worse
       yet, a floppy.  And needless to say, if you want to either
       make your program resident or shell to DOS from it, it is
       worthwhile to conserve both code and data space.  If
       nothing else, keeping the code size down leaves more room
       for data, and you never know when you may need it.

            Most  of  the time, the choice comes down to selecting the
       model which the program requires.  The  main  purpose  of  this
       article  is  to  help users avoid erring on the side of caution
       by automatically going to the large model as soon as  they  run
       out of space with small.

            Rarely,  performance  considerations  may  be so important
       that  an  advance  determination  of  program  design   for   a
       particular  model  is  worthwhile,  and in that case it is even
       more  important  to  have  a  good  idea  of  the  trade   offs
       involved.

       2.1 Determining the Minimum Model Required

            Assuming  you  are  not willing to design a program around
       the  choice  of  memory  model,  the  problem  comes  down   to
       selecting  a  memory  model  for  a  program  which  is already
       designed and possibly coded.  As noted in 1.2, the best  choice
       is the one which uses the smallest pointers which will do.

       2.1.1 Code Pointer Requirements

            The  size  of  code  pointer required is easy to determine
       and may constrain the choice of memory model.  If a
       program, counting all library functions, will fit in 64K
       or less of code space, use the small or compact model;
       otherwise use medium, large, or huge.

            The code part of most small programs obviously fits
       in 64K.  For extremely large programs it may obviously
       exceed that.  For anything in between the answer is less
       clear, and the code size is extremely difficult to
       estimate.  Fortunately, the question is always a clear go
       or no go, and the linker will tell you.
       Unless the program  is  very  big,  it  is  best  to  start  by
       compiling all  functions  using the small or compact model.  If
       the 64K limit is exceeded the linker will give  a  clear  error
       message.   (If  you ever exceed 64K in a single source file the
       compiler would catch that, but shame on you.  Modularize!)

            Since few if any functions need to be coded
       differently to switch to one of the larger models, the
       chances are that all you will need to do, when and if you
       find it necessary, is recompile all functions using one of
       the larger models and relink.  If you have a make file for
       the project, that should be simple indeed.

            In  those  rare  instances where it is necessary to modify
       source code according to memory model, consider coding so  that
       you can   compile  using  any  memory  model.    It  is  almost
       inconceivable that  the  size  of  the  code  pointer  will  be
       critical  in  the  source program, so there are really only two
       cases to consider, near and far data pointers.

            With MSC, coding for both possibilities is easy,
       because an automatically defined preprocessor symbol tells
       the preprocessor which model is being used, and this can
       be used with the #if directive to select between
       alternative versions of the affected parts of the source
       code.  The symbol is M_I86xM, where "x" is the one
       character identifier of the model in use: M_I86SM for
       small, M_I86MM for medium, M_I86CM for compact, M_I86LM
       for large, and M_I86HM for huge.  For all models except
       huge, the symbol for the corresponding model will be
       defined and all the others will be undefined.  Huge is a
       special case, where both M_I86HM (as expected) and M_I86LM
       are defined.  Perhaps this is because the huge model is an
       extension of the large model.

            Listing 1 shows a simple but realistic  case  where  these
       symbols  are  used  to  select  code  based  on  memory  model.
       Listing 2 is a little more contrived, selecting only  a  string
       to be displayed, but it checks all models.  Note that if
       the difference between large and huge matters at the
       source code level, you must not conclude that the large
       model is in use just because M_I86LM is defined.  It could
       be that M_I86HM is also defined, indicating huge.  That's
       why the code in Listing 2 checks M_I86HM before M_I86LM.

            The amount of code is fixed.  If you are  able  to  get  a
       clean  link  you  never  need  worry that a decision to use the
       small or compact model will come back to haunt  you,  and  your
       resulting  .EXE  file  will be smaller, sometimes much smaller.
       Jerry Zeisler, who helped in the preparation of this
       article by compiling and linking the bench mark using
       Turbo C 1.5, reported that when he was forced to go from
       the small to the large model for a program, the .EXE file
       went from 71K to
       161K.  Using either medium or compact according to the
       requirements would have made the jump less drastic, but it
       does go to show that once you cross the line from small to
       another model you do pay a price in space.

       2.1.2 Data Pointer Requirements

            Finding the size of data pointer required is not as
       clear-cut as determining whether or not a near code
       pointer will suffice.  The amount of storage a program
       will need at run time cannot be determined in advance by
       the compiler or linker in every case.  Since C is a
       semi-block-structured language, automatic variables are
       allocated on block entry, and the total required varies
       with the depth and order of block entries.  This does not
       depend only upon the static structure of your program.  It
       may also depend upon the data each time you run it.
       Sometimes you can arrive at a maximum, but for a program
       of any complexity that would be a tedious and error prone
       process requiring a lot of knowledge of your compiler
       implementation.  If the program uses recursion it may not
       even be possible.

            Even  when  there  is   no   recursion   the   uncertainty
       concerning  data  space  requirements  may  be  a  problem in a
       program which allocates heap storage using malloc() or  similar
       functions, since  this  is  even less predictable.  This puts a
       greater burden on the programmer, and I don't  offer  any  hard
       and fast rules here.

            If you can determine that 64K of data will always be
       sufficient, try the small model first, going to medium if
       necessary because of the code size.  Otherwise use compact
       if possible, going to large if the code size requires it.

            Use huge only as  a  last  resort,  as  it  is  the  least
       efficient, especially  with  MSC  4.0.    You can almost always
       determine ahead of time whether or not  any  single  data  item
       will  exceed  64K,  so  the  choice  between  large and huge is
       usually easy.
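
            As a sketch of the kind of data item that forces the
       huge model or keyword, here is a single array of 100,000
       bytes.  This assumes MSC's halloc() and hfree() library
       functions for allocating items larger than 64K; Turbo C
       users would reach for farmalloc() from alloc.h instead.

       #include <malloc.h>

       void zero_big_table()
       {
           char huge *big;
           long i;

           big = (char huge *)halloc(100000L, 1);  /* one item > 64K */
           if (big != NULL) {
               for (i = 0; i < 100000L; i++)
                   big[i] = 0;      /* huge arithmetic handles the
                                       segment crossings correctly */
               hfree(big);
           }
       }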

       3. MEMORY MODEL BENCHMARK

            The benefits of using the larger pointer types are
       obvious, and the need for them amounts to a go/no go
       decision in most cases.  For
       those cases where performance and/or  space  is  very  critical
       and  the  choice  of  memory  model  may  affect  the design in
       non-trivial ways, it is good to get an idea ahead  of  time  of
       what the  costs  are  as  well.  The simple benchmark used here
       was devised for such a project, where the design could
       take  advantage  of  as  much  storage  as  possible,  but  the
       performance of bitor()  and  similar  functions  was  critical,
       since they would be called millions of times each.

       3.1 Bench Mark Code

            The source code for the very simple bench mark
       performed is shown in Listing 2 and Listing 3.  Listing 2
       defines the main() function and Listing 3 defines an
       external function bitor(), which performs a bitwise or
       operation between two
       memory buffers.    The  bench  mark  measures the efficiency of
       calling and executing  bitor()  under  various  memory  models,
       which  was  the  problem  of  interest  for  the  project which
       motivated this whole study.

            An important reason for compiling the functions
       separately, besides the fact that bitor() was actually
       intended for use in other programs, was that it guaranteed
       that no optimizer could eliminate the repetitive calls.

            The main() function accepts parameters which
       determine how many times to call bitor() and the size of
       the buffers, up to a maximum of 256.  A two-level nested
       loop was used simply to avoid using a long integer counter
       for more than 64K repetitions.

            Both functions  were  optimized  for  speed.   This is the
       default with MSC, and was used with Turbo  C  for  consistency.
       This is  usually  a  wise choice for small programs anyway.  In
       this case,  the  bulk  of  the  code  comes  from  the  library
       routines,  and  the  bulk  of  the execution is in the compiled
       functions.  Optimizing the compiled functions for  space  would
       have saved  little space, and possibly cost a lot of time.  The
       usual rule should be to optimize anything seldom  executed  for
       space, and anything frequently executed for time.

       3.2 Execution Time Test

            When testing for execution time, I used the Rexx
       program shown in Listing 4 to set up and time the
       execution of the .EXE files prepared under each memory
       model.  In every case, the .EXE files are copied to drive
       D:, which is on the DOS path and is a RAM disk.  This
       virtually eliminates any variability caused by the
       placement of the .EXE files on a hard disk.  Two tests are
       performed.

            Table 1 shows the time in seconds when bitor() is
       executed 300,000 times specifying a length of 0, thus
       measuring mostly calling overhead.  The differences
       between memory models are thus mostly related to the type
       of code pointer.

            Table 2 shows the time to execute bitor() 2,500 times
       specifying a length of 256.  In that case the execution
       time predominantly reflects the indirect memory references
       in bitor(), which do the real work and take most of the
       time, so the primary influence is the data pointer type.

            The results are not surprising.  The small model is
       the most efficient, followed by medium or compact,
       depending upon which test you look at, then large, and
       finally huge.
       Further, in the first test, compact is nearly  equal  to  small
       and medium  nearly equal to large.  In the second this grouping
       is reversed.  Medium is close to small and compact is close  to
       large.  This  confirms  the  analysis  done  ahead of time.  It
       also goes  to  show  again  that  the  relative  importance  of
       different  factors  affecting performance depends upon not only
       the specific program, but sometimes  the  parameters  or  other
       data as well.

            One  thing  which  is surprising at first is that although
       MSC 4.0 and 5.0 are generally quite close, 4.0 shows a  (pardon
       the  pun)  huge  penalty for using the huge model in the second
       test.  This is probably  because  the  huge  model  was  a  new
       feature  with  that  release,  and by the time release 5.0 came
       out developers had had more chance to optimize it.

       3.3 Code Size Compared

            Table 3  lists  the  size  of  the  .OBJ  and  .EXE  files
       produced by  each  compiler  with each memory model.  The files
       have  been  renamed  according  to  their   respective   memory
       models.  The results are mostly self-explanatory.  The
       size of the .EXE files must be taken with half a grain of
       salt, since they consist mostly of library routines, which
       may not even have been written in C, and don't necessarily
       show the quality of the compiler.

       3.4 Conclusions

            For each compiler, the time and code space efficiency
       of the various memory models compare to one another
       exactly as our theoretical explanation predicts.  That is,
       the small model is the most efficient and should be used
       in those cases where it will serve the purpose.  These
       tests show no advantage of the tiny model over the small
       model.

            Medium and compact are both between small and large,
       but can't be strictly ordered.  The relative efficiency of
       these two depends upon the individual program and data.
       In any case, the programmer is seldom faced with a choice
       between medium and compact.

            The large model is less efficient than small,  medium,  or
       compact,  though the difference between it and either medium or
       compact may not be significant.  When far code references
       predominate, medium is close to large and compact is close
       to small.  When data references predominate, the situation
       is reversed.  The latter case is the more common in
       practice.

            The huge  model  is  the least efficient.  The penalty for
       going from large to huge is quite severe  with  MSC  4.0,  less
       with  5.0, and almost insignificant for Turbo C, a real tribute
       to the optimization of Turbo C.

            Caution is always in order when using bench marks to
       compare different vendors' products, especially compilers.
       It is often easy to devise a test to make one's choice
       come out on top.  Contradictory advertising claims suggest
       this is in fact what vendors do.  The bench mark shown
       here is highly selective, in that it aims to isolate
       certain features of interest.  It does not use any
       floating point operations, recursion, or complex
       calculations of any kind, and does not do any significant
       amount of I/O, for example.
       Still,  it does measure the things of interest here rather well
       and was not  written  with  the  purpose  of  proving  a  given
       compiler better or worse.

            It  is  therefore  worth  noting, without drawing dogmatic
       conclusions, that, contrary to the  claims  of  Microsoft  when
       pushing  upgrades to 5.0, version 4.0 sometimes produces better
       object code.  In fact, for actual applications, I  have  hardly
       ever  found  a  case  where  recompiling something with MSC 5.0
       yielded a smaller or significantly  faster  .EXE  file  than  I
       previously had gotten from 4.0.

            MSC  5.0  introduced  a  lot  of new functions, but if you
       don't need them and are not using the huge  model  you  may  do
       better to  continue  using  4.0.  I have also found version 4.0
       to be  a  much  more  reliable  product.    I  only  report  my
       experience.   Perhaps  my  applications are not representative.
       I never use  floating  point  math  but  use  recursion  fairly
       often, for example.

            So  many bugs were reported with 5.0 that Microsoft rather
       quickly announced 5.1.  I did not have 5.1 available  for  test
       because  I  had  such  a  bad experience with 5.0 that I didn't
       feel like  paying  another  upgrade  fee  to  fix  their  bugs,
       preferring  to  spend  about the same amount of money for Turbo
       C, if it came to that.  The results of this limited bench
       mark seem to strengthen that resolve.  In every case,
       Turbo C
       produced tighter,  faster  object  code,  a  rather  impressive
       achievement considering the price differential.


       =================================================================
       /* Listing 1: CMDARGS.C */
       /* get unparsed command line arguments from PSP */
       /* sets input variable to the line and returns length */
       #include <stdlib.h>
       #include <string.h>
       #include <dos.h>
       /* build a far pointer from segment and offset parts */
       #define FP_PTR(seg,off) ((((long)(seg))<<16)+(unsigned)(off))

       int cmdargs(result) char *result;
       {
           /* far pointer to the command line stored in the PSP:
              the byte at offset 0x80 is the length, and the text
              itself starts at offset 0x81 */
           unsigned char far *dta=(unsigned char far *)FP_PTR(_psp,0x80);
       /* if compact, large or huge (which also defines M_I86LM),
          data pointers are far and memcpy() may be used */
       #if defined(M_I86LM) || defined(M_I86CM)
           memcpy(result,dta+1,*dta);
           result[*dta]=0;
           return *dta;
       #else
           /* near data model: copy in line instead */
           {
               int length=*dta;
               int ret_len=length;
               while (length--)
                   *(result++)=*(++dta);   /* skip the length byte */
               *result=0;
               return ret_len;
           }
       #endif
       }

       #if defined(TEST)
       #include <stdio.h>
       main()
       {
           char args[128];

           cmdargs(args);
           putchar('"');        /* quote the result so leading */
           fputs(args,stdout);
           putchar('"');        /* and trailing blanks show    */
       }
       #endif


       =================================================================
       /* LISTING 2 - TEST.C */
       #include <stdio.h>
       #include <stdlib.h>     /* for atoi() */
       void bitor(char *, char *, int);
       /* Use preprocessor symbols to determine
       content of string model[] */
       static char model[8]=
       #if defined(M_I86SM)
       "small";
       #elif defined(M_I86MM)
       "medium";
       #elif defined(M_I86CM)
       "compact";
       #elif defined(M_I86HM)
       /* NOTE huge must be tested before large,
          because huge sets M_I86LM as well as M_I86HM */
       "huge";
       #elif defined(M_I86LM)
       "large";
       #else
       "unknown"; /* non-standard, (or Turbo C) */
       #endif
       main(argc, argv) int argc; char **argv;
       {
           char buf1[256], buf2[256];
           int i=0, j, jlim=0, len=sizeof(buf1);

           /* i=outer loop count; j=inner loop count; defaults 0 0 */
           /* each case falls through to pick up the earlier args  */
           switch (argc)
           {
           case 4:
               len=atoi(argv[3]);
               if (len>sizeof(buf1)) len=sizeof(buf1);
           case 3:
               jlim=atoi(argv[2]);
           case 2:
               i=atoi(argv[1]);
           }
           printf("model=%s i=%d j=%d len=%d\n",model,i,jlim,len);
           while (i--)
               for (j=jlim; j; j--)
                   bitor(buf1,buf2,len);
       }


       =================================================================
       /* LISTING 3 - BITOR.C */
       /* Perform bitwise or between two buffers */
       void bitor(x,y,len) char *x, *y; int len;
       {
           while (len--)
               *(x++)|=*(y++);
       }


       =================================================================
       /* Listing 4: TIMETEST.REX */
       source='MSC4 MSC5 TURBOC'
       parms.1=10 30000 0          /* test 1: 300,000 calls, length 0 */
       parms.2=1 2500 256          /* test 2: 2,500 calls, length 256 */
       models='S M C L H'          /* Tiny model tested separately    */

       do ii=1 to words(source)
          s=word(source,ii)
          'COPY \'s'\*.exe d:'     /* run from the RAM disk           */
          outfile=s'.DAT'
          do j=1 to 2
             do i=1 to words(models)
                m=word(models,i)
                /* Here is the key part: execute and record time */
                call time 'R'      /* reset the elapsed timer         */
                'TEST'm parms.j
                time.j.m=time('E') /* elapsed seconds                 */
             end
          end
          do i=1 to words(models)
             m=word(models,i)
             data=m time.1.m time.2.m
             say data
             call lineout outfile, data
          end
       end
       exit


       =================================================================
       Table 1: Speed Test - Function Calls (time in seconds)

       Model    MSC 4.0   MSC 5.0   Turbo C 1.5
       -------  -------   -------   -----------
        Tiny                         25.10
        Small    35.32     34.27     25.10
        Medium   42.13     41.58     27.30
        Compact  35.54     34.43     21.42
        Large    42.29     41.63     23.07
        Huge     43.88     41.63     25.98


       =================================================================
       Table 2: Speed Test - Indirect Byte Reference (time in seconds)

       Model    MSC 4.0   MSC 5.0   Turbo C 1.5
       -------  -------   -------   -----------
        Tiny                         18.89
        Small    32.19     32.19     18.90
        Medium   32.29     32.24     18.95
        Compact  35.92     35.86     30.92
        Large    35.98     35.93     30.98
        Huge     68.88     41.19     31.03


       =================================================================
       Table 3 - Comparing .OBJ and .EXE File Size (bytes)
                      MSC     MSC  Turbo C
       File           4.0     5.0      1.5
       ----------    ----    ----  -------
       bitort.obj       *       *   194
       bitors.obj     309     287   192
       bitorm.obj     316     294   197
       bitorc.obj     309     285   196
       bitorl.obj     316     292   201
       bitorh.obj     381     326   182

       testt.obj        *       *   473
       tests.obj      541     521   473
       testm.obj      557     537   487
       testc.obj      560     541   495
       testl.obj      576     557   509
       testh.obj      653     636   485

       testt.exe        *       *  6534
       tests.exe     6670    7383  6334
       testm.exe     6870    7531  6476
       testc.exe     8770    9501  7898
       testl.exe     8970    9649  8056
       testh.exe     9082    9729  9143

       * Tiny model not applicable to MSC.
