40Hex Number 6 Volume 2 Issue 2                                       File 00B

                       ------------------------------
                        SCAN STRINGS, HOW THEY WORK,
                            AND HOW TO AVOID THEM
                       ------------------------------
                                By Dark Angel
                       ------------------------------

 Scan strings  are the  scourge of  the virus author and the friend of anti-
 virus wanna-bes.   The  virus author  must find encryption techniques which
 can successfully  evade easy detection.  This article will show you several
 such techniques.

 A scan string, as you are well aware, is a sequence of bytes which an
 anti-viral product uses to identify a virus.  The important thing to keep
 in mind is that a scan string represents actual code and must NEVER match
 code which could occur in a "normal" program, since the scanner would then
 raise false alarms.  The trick is to use this to your advantage.

 When a  scanner checks  a file for a virus, it searches for the scan string
 which could  be located  ANYWHERE IN  THE FILE.   The  scanner doesn't care
 where it  is.   Thus, a  file which  consists solely of the scan string and
 nothing else  would be  detected as  infected by  a virus.   A  scanner  is
 basically  an   overblown  "hex  searcher"  looking  for  1000  signatures.
 Interesting, but  there's not  much you  can do  to exploit this.  The only
 thing you  can do  is to  write code so generic that it could be located in
 any program  (by chance).   Try  creating a  file with  the following debug
 script and  scanning it.   This  demonstrates the fact that the scan string
 may be located at any position in the file.

 ---------------------------------------------------------------------------

 n marauder.com
 e 0100  E8 00 00 5E 81 EE 0E 01 E8 05 00 E9

 rcx
 000C
 w
 q

 ---------------------------------------------------------------------------

 Although scanners normally search for decryption/encryption routines, in
 Marauder's case, SCAN looks for the "setup" portion of the code, i.e.
 setting up SI (to the "delta offset"), calling the decryption routine, and
 finally jumping to program code.
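
 The "hex searcher" behaviour described above can be sketched from the
 scanner's side in a few lines of Python.  This is only an illustration,
 not how SCAN is actually implemented: the names scan() and SIGNATURES are
 hypothetical, and the only real data is the 12-byte string from the debug
 script.

```python
# Signature taken from the debug script above (the Marauder "setup" bytes).
MARAUDER_SIG = bytes.fromhex("E800005E81EE0E01E80500E9")

# name -> scan string; a real scanner would hold many more entries.
SIGNATURES = {"Marauder": MARAUDER_SIG}


def scan(data, signatures=SIGNATURES):
    """Return (name, offset) for every signature found anywhere in data."""
    hits = []
    for name, sig in signatures.items():
        offset = data.find(sig)      # position is irrelevant to the match
        if offset != -1:
            hits.append((name, offset))
    return hits


# A file holding nothing but the scan string is flagged ...
bare_file = MARAUDER_SIG
# ... and so is one where the same bytes sit behind unrelated padding.
padded_file = b"\x90" * 100 + MARAUDER_SIG + b"\x00" * 20
clean_file = b"\x90" * 100

print(scan(bare_file))     # [('Marauder', 0)]
print(scan(padded_file))   # [('Marauder', 100)]
print(scan(clean_file))    # []
```

 The search never looks at where the bytes sit or what surrounds them,
 which is why the 12-byte file built by the debug script is reported as
 infected.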

 What you  CAN do  is to  either minimise  the scannable code or to have the
 code constantly  mutate into  something different.  The reasons are readily
 apparent.

 The simplest  technique is  having multiple  encryption engines.   A  virus
 utilising this  technique has  a database  of encryption/decryption engines
 and uses  a random  one each  time it infects.  For example, there could be
 various forms  of XOR  encryption or  perhaps another  form of mathematical
 encryption.   The trick  is to  simply replace  the code for the encryption
 routine each time with the new encryption routine.

 Mark Washburn used this in his V2PX series of viruses.  In it, he used six
 different encryption/decryption algorithms, and some mutations are
 impossible to detect with a mere scan string.  More on those later.

 Recently, there has been talk of the so-called MtE, or Mutation Engine,
 from Bulgaria (where else?).  It utilises the multiple encryption engine
 technique.  Pogue Mahone used the MtE and it took McAfee several days to
 find a scan string.  Vesselin Bontchev, the McAfee-wanna-be of Bulgaria,
 marvelled at the engineering of this engine.  It is distributed as an OBJ
 file designed to be linked into any virus.  Supposedly, SCANV89 will be
 able to detect any virus using the encryption engine, so it is worthless
 except for those who have an academic interest in such matters (such as
 virus authors).

 There is, however, a serious limitation to the multiple encryption
 technique, namely that scan strings may still be found.  On the other
 hand, a separate scan string must be isolated for each different
 encryption mechanism.  An additional benefit is the possibility that the
 antivirus software developers will miss some of the encryption mechanisms,
 so not all the strains of the virus will be caught by the scanner.

 Now we  get to  a much better (and sort of obvious) method: minimising scan
 code length.   There are several viable techniques which may be used, but I
 shall discuss but three of them.

 The one mentioned before, which Mark Washburn used in V2P6, was
 interesting.  He first filled the space reserved for the encryption
 mechanism with dummy one-byte op-codes such as CLC, STC, etc.  As you can
 see, the flag manipulation op-codes were exploited.  Next, he placed the
 parts of his encryption mechanism at random positions in this buffer, so
 that the gaps between the "real" instructions were left filled with dummy
 op-codes.  In this manner, no generic scan string could be located for
 this encryption mechanism of this virus.  However, the disadvantage of
 this method is the sheer size of the code necessary to perform the
 encryption.

 A second  method is  much simpler than this and possibly just as effective.
 To minimise scan code length, all you have to do is change certain bytes at
 various intervals.   The  best way  to do  this can  be explained  with the
 following code fragment:

   mov si, 1234h                     ; Starting location of encryption
   mov cx, 1234h                     ; Virus size / 2 + variable number
 loop_thing:
   xor word ptr cs:[si], 1234h       ; Decrypt the value
   add si, 2
   loop loop_thing

 In this code fragment, all the values which can be changed are set to 1234h
 for the sake of clarity.  Upon infection, all you have to do is to set
 these variable values to whatever is appropriate for the file.  For
 example, mov si, 1234h would have to be changed to have the encryption
 start at wherever the virus would be loaded into memory (huh?).  Ponder
 this for a few moments and all shall become clear.  To substitute new
 values into the code, all you have to do is something akin to:

   mov [bp+scratch+1], cx

 Where scratch labels the instruction.  The exact value to add to scratch depends
 on the  coding of  the op-code.   Some  op-codes take their argument as the
 second byte,  others take  the  third.    Regardless,  it  will  take  some
 tinkering before it is perfect.  In the above case, the "permanent" code is
 limited to  under five or six bytes.  Additionally, these five or six bytes
 could theoretically  occur in  ANY PROGRAM  WHATSOEVER, so  it would not be
 prudent for scanners to search for these strings.  However, scanners often
 use scan strings containing wild-card characters, so it is still possible
 for a scan string to be found.
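
 The wild-card caveat can be illustrated from the scanner's side.  The
 sketch below is an assumption-laden example, not any real product's
 matcher: find_wildcard() and PATTERN are hypothetical names, and the byte
 values are a hand-assembled encoding of the fragment above (assuming the
 short form of ADD SI,2), not something given in this article.

```python
WILD = None   # marks a position where any byte is accepted

# Hand-assembled encoding of the fragment above (an assumption of this
# sketch); the three 16-bit immediates are replaced by wild cards.
PATTERN = [
    0xBE, WILD, WILD,              # mov si, imm16
    0xB9, WILD, WILD,              # mov cx, imm16
    0x2E, 0x81, 0x34, WILD, WILD,  # xor word ptr cs:[si], imm16
    0x83, 0xC6, 0x02,              # add si, 2
    0xE2, 0xF6,                    # loop loop_thing
]


def find_wildcard(data, pattern):
    """Return the first offset where pattern matches data, or -1."""
    n = len(pattern)
    for start in range(len(data) - n + 1):
        window = data[start:start + n]
        if all(p is None or p == b for p, b in zip(pattern, window)):
            return start
    return -1


# Two copies of the fragment with different immediates filled in.
sample_a = bytes.fromhex("BE3412 B93412 2E81343412 83C602 E2F6")
sample_b = bytes.fromhex("BE0301 B9A000 2E8134EFBE 83C602 E2F6")

print(find_wildcard(b"\x00" * 7 + sample_a, PATTERN))   # 7
print(find_wildcard(sample_b, PATTERN))                 # 0
print(find_wildcard(b"\x90" * 64, PATTERN))             # -1
```

 Both samples match even though every variable value differs, because
 only the fixed op-code bytes are compared.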

 The important  thing to  keep in  mind when using this method is that it is
 best for  the virus  to use separate encryption and decryption engines.  In
 this manner, shorter decryption routines may be found and thus shorter scan
 strings will  be needed.   In  any  case,  using  separate  encryption  and
 decryption engines increases the size of the code by at most 50 bytes.

 The last method detailed is theft of decryption engines.  Several shareware
 products utilise  decryption engines  in their  programs to  prevent simple
 "cracks" of  their products.   This  is, of  course, not a deterrent to any
 programmer worth  his salt,  but it  is useful  for virus  authors.  If you
 combine the  method above  with  this  technique,  the  scan  string  would
 identify the  product as  being infected with the virus, which is a) bad PR
 for the company and b) unsuitable for use as a scan string.  This technique
 requires virtually  no effort,  as the decryption engine is already written
 for you by some unsuspecting PD programmer.

 All the  methods described  are viable  scan  string  avoidance  techniques
 suitable for  use in  any virus.   After  a few practice tries, scan string
 avoidance should  become  second  nature  and  will  help  tremendously  in
 prolonging the effective life of your virus in the wild.