40Hex Number 6 Volume 2 Issue 2 File 00B

40Hex Number 6 Volume 2 Issue 2 File 00B

------------------------------
SCAN STRINGS, HOW THEY WORK,
AND HOW TO AVOID THEM
------------------------------
By Dark Angel
------------------------------

Scan strings are the scourge of the virus author and the friend of anti-
virus wanna-bes. The virus author must find encryption techniques which
can successfully evade easy detection. This article will show you several
such techniques.

Scan strings, as you are well aware, are a collection of bytes which an
anti-viral product uses to identify a virus. The important thing to keep
in mind is that these scan strings represent actual code and can NEVER
contain code which could occur in a "normal" program. The trick is to use
this to your advantage.

When a scanner checks a file for a virus, it searches for the scan string
which could be located ANYWHERE IN THE FILE. The scanner doesn't care
where it is. Thus, a file which consists solely of the scan string and
nothing else would be detected as infected by a virus. A scanner is
basically an overblown "hex searcher" looking for 1000 signatures.
Interesting, but there's not much you can do to exploit this. The only
thing you can do is to write code so generic that it could be located in
any program (by chance). Try creating a file with the following debug
script and scanning it. This demonstrates the fact that the scan string
may be located at any position in the file.

---------------------------------------------------------------------------

n marauder.com
e 0100 E8 00 00 5E 81 EE 0E 01 E8 05 00 E9

rcx
000C
w
q

---------------------------------------------------------------------------

Although scanners normally search for decryption/encryption routines, in
Marauder's case, SCAN looks for the "setup" portion of the code, i.e.
setting up BP (to the "delta offset"), calling the decryption routine, and
finally jumping to program code.

What you CAN do is to either minimise the scannable code or to have the
code constantly mutate into something different. The reasons are readily
apparent.

The simplest technique is having multiple encryption engines. A virus
utilising this technique has a database of encryption/decryption engines
and uses a random one each time it infects. For example, there could be
various forms of XOR encryption or perhaps another form of mathematical
encryption. The trick is to simply replace the code for the encryption
routine each time with the new encryption routine.

Mark Washburn used this in his V2PX series of virii. In it, he used six
different encryption/decryption algorithms, and some mutations are
impossible to detect with a mere scan string. More on those later.

Recently, there has been talk of the so-called MTE, or mutating engine,
from Bulgaria (where else?). It utilises the multiple encryption engine
technique. Pogue Mahone used the MTE and it took McAfee several days to
find a scan string. Vesselin Bontchev, the McAfee-wanna-be of Bulgaria,
marvelled the engineering of this engine. It is distributed as an OBJ file
designed to be able to be linked into any virus. Supposedly, SCANV89 will
be able to detect any virus using the encryption engine, so it is worthless
except for those who have an academic interest in such matters (such as
virus authors).

However, there is a serious limitation to the multiple encryption
technique, namely that scan strings may still be found. However, scan
strings must be isolated for each different encryption mechanism. An
additional benefit is the possibility that the antivirus software
developers will miss some of the encryption mechanisms so not all the
strains of the virus will be caught by the scanner.

Now we get to a much better (and sort of obvious) method: minimising scan
code length. There are several viable techniques which may be used, but I
shall discuss but three of them.

The one mentioned before which Mark Washburn used in V2P6 was interesting.
He first filled the space to be filled in with the encryption mechanism
with dummy one byte op-codes such as CLC, STC, etc. As you can see, the
flag manipulation op-codes were exploited. Next, he randomly placed the
parts of his encryption mechanism in parts of this buffer, i.e. the gaps
between the "real" instructions were filled in with random dummy op-codes.
In this manner, no generic scan string could be located for this encryption
mechanism of this virus. However, the disadvantage of this method is the
sheer size of the code necessary to perform the encryption.

A second method is much simpler than this and possibly just as effective.
To minimise scan code length, all you have to do is change certain bytes at
various intervals. The best way to do this can be explained with the
following code fragment:

mov si, 1234h ; Starting location of encryption
mov cx, 1234h ; Virus size / 2 + variable number
loop_thing:
xor word ptr cs:[si], 1234h ; Decrypt the value
add si, 2
loop loop_thing

In this code fragment, all the values which can be changed are set to 1234h
for the sake of clarity. Upon infection, all you have to do is to set
these variable values to whatever is appropriate for the file. For
example, mov bx, 1234h would have to be changed to have the encryption
start at the wherever the virus would be loaded into memory (huh?). Ponder
this for a few moments and all shall become clear. To substitute new
values into the code, all you have to do is something akin to:

mov [bp+scratch+1], cx

Where scratch is an instruction. The exact value to add to scratch depends
on the coding of the op-code. Some op-codes take their argument as the
second byte, others take the third. Regardless, it will take some
tinkering before it is perfect. In the above case, the "permanent" code is
limited to under five or six bytes. Additionally, these five or six bytes
could theoretically occur in ANY PROGRAM WHATSOEVER, so it would not be
prudent for scanners to search for these strings. However, scanners often
use scan strings with wild-card-ish scan string characters, so it is still
possible for a scan string to be found.

The important thing to keep in mind when using this method is that it is
best for the virus to use separate encryption and decryption engines. In
this manner, shorter decryption routines may be found and thus shorter scan
strings will be needed. In any case, using separate encryption and
decryption engines increases the size of the code by at most 50 bytes.

The last method detailed is theft of decryption engines. Several shareware
products utilise decryption engines in their programs to prevent simple
"cracks" of their products. This is, of course, not a deterrent to any
programmer worth his salt, but it is useful for virus authors. If you
combine the method above with this technique, the scan string would
identify the product as being infected with the virus, which is a) bad PR
for the company and b) unsuitable for use as a scan string. This technique
requires virtually no effort, as the decryption engine is already written
for you by some unsuspecting PD programmer.

All the methods described are viable scan string avoidance techniques
suitable for use in any virus. After a few practice tries, scan string
avoidance should become second nature and will help tremendously in
prolonging the effective life of your virus in the wild.