From: jmolini@nasamail.nasa.gov (JAMES E. MOLINI)

From: [email protected] (JAMES E. MOLINI)
Date: Wed, 4 Apr 90 14:03 PDT
Message-Id: <LJJA-2875-5674@nasamail>
To: [email protected]
Cc: [email protected], [email protected]
Subject: Universal Virus Detector?
X-Lines: 272

I am working with a colleague on defining a robust virus detection
utility. The paper below discusses an approach we are investigating.
The work was undertaken as part of a research project sponsored by the
National Aeronautics and Space Administration at the Johnson Space Center.

Please look it over and tell us what you think. We would like to know what
type of virus could be written to defeat this type of detector on a large
scale. I know it is a rather long document, but you might find it
interesting. Thanks in advance.

A Universal Virus Detection Model

by Chris Ruhl and James Molini
Computer Sciences Corp.
Email: [email protected]

PREFACE

This paper attempts to define an abstract model which will support the
construction of a Universal Virus Detector. Although the restrictions
imposed upon the model for 100% accuracy may be too severe to make such
an implementation practical, it is quite feasible to achieve near
universal virus detection in a user friendly fashion.

This paper is distributed with the intent of discovering reasonable
vulnerabilities in the concepts, or implementation. Comments are
therefore encouraged. We have used an IBM PC Compatible running MS DOS
3.X as the candidate implementation platform for convenience.

The paper does not discuss virus identification, which is a separate
issue from detection. Although not "absolutely necessary," virus
identification mechanisms dramatically reduce the time required to
eradicate specific cases of virus infection.

If you have questions, or see a flaw in the process, please let us
know. We are building a virus detector, which could be placed into the
public domain, that uses the techniques below to detect virus
infections. Our initial tests have shown encouraging results.

Please send comments/suggestions to Virus-L, or the authors at the
Email address above. Please do not request code samples, or testing
opportunities until we announce availability of the utility.

Definitions

Before proceeding with our discussion, it is important to define
terms. The following definitions are taken (as faithfully as possible)
from the most recent discussions about viruses on the various Email
networks:

VIRUS - A self-replicating program that must attach itself in some way
to an existing executable on the target computer system in order to
propagate. In doing so, no overt user action is required to further
the replication process.

TROJAN HORSE - A non-replicating malicious program that misleads the
user in order to cause him/her to execute it's malicious code.
Although it is malicious code, it is often hidden inside another piece
of (apparently innocuous) code in order to escape detection. This type
of program does not modify any existing executable files on the system.

WORM - A self-replicating program that does not attach itself to other
executable code in order to propagate. It relies upon some weakness in
a multi-user system, or requires some sort of overt user action in
order to operate. The technical feasibility of worms on single user
computer systems is debatable.

INFECTION - The act of modifying existing executable code in order to
propagate a virus.

MASKING - The act of preventing discovery by intervening at some point
in the scanning process. Typically this effects an indication of a
clean system, when, in fact, the environment under review has been
modified.

Understanding the scope of the virus problem, it is possible to define
a circumstance in which a Universal Virus Detector (UVD) may be
successful. We further scope the problem by NOT requiring that the UVD
prevent an infection. Instead it can identify an infection after it
has occurred. This principle is similar to the idea that smoke
detectors are not responsible for preventing fires, although they may
periodically work toward that end. They are actually responsible only
for responding to indications that a fire may be present and warning
the user of impending danger. UVD's must be scoped to a similar
purpose for them to work.

With this in mind, let us begin by defining the physic of computer
viruses:

A Virus Propagation Model

Although a Virus is an abstraction (i.e. Program), the environment
it attacks is not (i.e. IBM PC). Regardless of how creative the author
is, he/she cannot change certain characteristics of the machine that
the Virus inhabits. In order to develop this model the following
assumptions are made:

a.) The user will begin the detection process (we have proposed a
CRC type routine) prior to infection. By doing so, the user
has provided an uninfected baseline from which to judge future
states. Although virus propagation may still be identifiable
on an infected machine, the level of detection for subsequent
states becomes indeterminate.

b.) The user will avoid the introduction of self modifying code
into the system. By doing so, he/she ensures that the target
system maintains a given state of integrity.

Given the assumptions above, we may now define the circumstances
necessary to support a virus infection. Without the adherence to the
following rules, it is impossible to define a circumstance in which a
virus can propagate.

Rule #1: A Virus infection, or propagation occurs when an
executable file is altered.
Proof:
I) An un-altered executable will not be infected since by
definition it is not altered. Here we are assuming that the
original state of the machine is uninfected.

II) An unaltered piece of code that performs malicious acts is a
Trojan horse and thus, beyond the scope of this problem.

III) A non-executable file, whether altered, or not, cannot
further infect the system since by definition it is never
executed. An altered non-executable is merely a damaged data
file.

Thus: Only altered executables can further infect the system.

Note: In certain cases, a new executable file can be added to the
system, but it still cannot infect the system, unless it is
called from a modified file in the system, (violating I
above) or unless the user intervenes, in which case the
program is not a virus, but a worm, or Trojan Horse.

Rule #2: Assuming that the detection mechanism is sufficiently
robust, the only possible way to avoid detection is to
mask the infection prior to having the detection results
presented to the user.

Proof:
I) An un-altered procedure cannot mask an infected file since
by definition its not altered to do so. (Masking requires
foreknowledge of the code to be masked. Such a masking
scenario would indicate a state of infection prior to
installation of the UVD violating a basic assumption that you
install it on a clean machine.)

II) Masking requires some type of intervention in the file
read/result presentation process. Here we assume that the
computation of the checksum is sufficiently robust that no 2
different pieces of code can generate the same result.
Therefore, since masking requires some type of modification
of data in the path from storage to user and since the only
2 feasible parts to that path are either the read, or the
delivery, any masking must be completed at one of the 2 ends
of the pipe.

Thus: Only altered procedures can mask the infection of
executables.

UVD CONSTRUCTION.

>From the above discussion, we can begin defining a UVD with some degree
of assurance that it will do the job. If a virus must modify the
original state of the system in order to be successful, we can define a
process that can detect that modification. (Insert your favorite
Checksum/CRC/Cryptographic routine here.) Any program which
circumvents the modification of existing code is not a virus.

Then, to defeat the detection process, the virus must mask the
infection in some way so that this verifiable change detection
mechanism cannot present accurate results to the user.

Any program which circumvents the detection mechanism must do so by
modifying the data delivery process into, or out of the detector. Once
again we are talking about code modification.

We have recently seen an example of the masking effect. In that case
the 4096 virus, masked all infected files prior to releasing the data
to any process attempting to read them. Another masking attack would
redirect all detector output to NULL on a PC, thereby depriving the
user of any notification that the detector may have generated.

Another option, which attempts to mask infections, is a directed attack
against the utility. In order to prevent successful directed attacks,
several methods can be used. The following methods attempt to validate
the integrity of the detection code, or stored tables:

a. Run the detector from a write-protected, bootable floppy, thus
assuring a validated run time environment and providing a constant
set of scan pointers.

b. Use software to validate the detector prior to operation and
validate the fact that the detector is operating with exclusive use
of the CPU. Antigen is one example of code which validates the
integrity of a program prior to execution.

c. Prevent modification of computer checksums by prefixing each file
with a set of detector specific state vectors. This approach
obtains a set of memory resident pointers, or values that are
specific to the region where the detector is run. These pieces of
information are then prepended to each executable checked and act
as a type of "fingerprint" for the virus detector. They will
change from machine to machine and from version to version. As a
result, no virus can intercept the data points and compute a
substitute checksum.

d. Finally, a simple way to hinder directed attacks in a general
sense is to change file extensions, or stored text strings, which
will make identification of the detector by directed viruses more
difficult. Normally, this is only feasible when dealing with non-
copyrighted programs.

So to put our theoretical UVD into practice, on, for example, an IBM
PC, we would do the following:

a. Begin by validating the integrity of the detector code. This has
been discussed above.

b. Validate the integrity of the read process by checking the
interrupt table and low memory for changes. This would stop the
4096 and viruses of its species, which place themselves in the
memory resident read procedures and mask infections.

c. Validate the correctness of the output process by checking screen
write interrupt vectors and device paths. It could be done also by
generating a direct write procedure to hardware addresses during
the installation process.

d. Validate the Boot sector of the disk and hidden R/O system files
via direct sector reads. Knowing that the read process is
unchanged, we can also be sure that the data coming into the CRC
routine is correct. This then would defeat the Pakistani Brain and
viruses of that sort. which relocate the boot sector and generate
offset addresses.

e. Once both ends of the pipe and the pipe itself are validated, we
can begin checking all executables on the system for modifications.

By doing this we have checked the entire data path and found it to be
correct. We have also checked the correctness of the change detection
procedure. This assumes that no other process has taken over the CPU
during the scan, but this is no problem as long as we mask all external
interrupts. Then only an actual hardware interrupt can cause the
program to pause. And even these interrupts can be masked to a certain
extent.

Some have said in the past that the human is also a part of this process.
We agree, but the user must be a part of any process. This utility must be
designed to present the user with a reliable estimation of the integrity of
executable files in the target machine. Running the utility in conjunction
with the software update process guarantees that the user is aware of
changes to the system configuration. Doing this in a controlled fashion
will not violate the integrity of the model.

Although the detection of authorized modifications may be a nuisance, it is
necessary if we are to allow the system owner to make all risk related
decisions on his/her system. If the user misses an infection once, it is
fairly certain that the infection will be attempted again on the same
machine (we saw over 400 infections on one machine). To this end, boot
infections and memory infections are always flagged as serious. Beyond
that, we can't effectively protect the user from himself.