==Phrack Inc.==

              Volume 0x0b, Issue 0x39, Phile #0x07 of 0x12

|=---=[ ICMP based remote OS TCP/IP stack fingerprinting techniques ]=---=|
|=-----------------------------------------------------------------------=|
|=---------------=[ Ofir Arkin & Fyodor Yarochkin ]=---------------------=|


--[ICMP based fingerprinting approach]--

   TCP based remote OS fingerprinting is quite old(*1) and well-known
   these days. Here we would like to introduce an alternative method
   of determining an OS remotely, based on the ICMP responses received
   from the host. A certain level of accuracy has been achieved across
   different platforms, and for some systems or classes of platforms
   (e.g. Win*) it is significantly more precise than what has been
   demonstrated with TCP based fingerprinting methods.

   Like the TCP based method mentioned above, ICMP fingerprinting uses
   several tests to probe a remote OS TCP/IP stack, but unlike TCP
   fingerprinting, the number of tests required to identify an OS
   varies from 1 to 4 (as of the current development stage).

   The ICMP fingerprinting method is based on differences found in the
   ICMP replies of various operating systems (mostly due to incorrect
   or inconsistent implementation), which were discovered by Ofir
   Arkin during his "ICMP Usage in Scanning" research project. Later
   these discoveries were summarised into a logical decision tree,
   which Ofir entitled "X project" and which was implemented in
   practice in the 'Xprobe' tool.

--[Information/Noise ratio with ICMP fingerprints]--

   As it's been noted, the number of datagrams we need to send and
   receive in order to remotely fingerprint a targeted machine with
   ICMP based probes is small. Very small. In fact we can send one
   datagram and receive one reply and this will help us identify up to
   eight different operating systems (or classes of operating systems).
   The maximum number of datagrams which our tool will use, at the
   current stage of development, is four. This is also the number of
   replies we will need to analyse. This makes ICMP based
   fingerprinting very time-efficient.

   ICMP based probes can be crafted to be very stealthy. At the
   moment, no malformed/broken/corrupted datagrams are used to
   identify the remote OS type, unlike the common fingerprinting
   methods. The current core analysis validates the ICMP responses
   received for valid packets, rather than crafting invalid packets
   itself. Heaps of such packets appear in an average network on a
   daily basis, and very few IDS systems are tuned to detect such
   traffic (and those which are, are presumably very noisy and badly
   configured).

--[Why does it still work?]--

   What makes our method viable is the inherited mess among the
   various TCP/IP stack implementations: ICMP handling code which
   implements different RFC standards (the original RFC 792, the
   additional RFC 1122, etc), partial or incomplete ICMP support
   (various ICMP requests are not supported everywhere), the low
   significance of ICMP Error message data (who verifies all the
   fields of the original datagram?!), and mistakes and
   misunderstandings in ICMP protocol implementation.

--[What do we fingerprint:]--

   Several OS-specific differences are utilized in ICMP based
   fingerprinting to identify the remote operating system type:

   IP fields of an 'offending' datagram to be examined:

   * IP total length field

    Some operating systems (e.g. the BSD family) will add 20 bytes
    (sizeof(ipheader)) to the original IP total length field. This
    occurs due to mistakes in the internal processing of the datagram;
    please note that when the same packet is read from SOCK_RAW the
    same behaviour is seen: the ip_len field of the returned packet is
    off by 20 bytes.

    Some other operating systems will subtract 20 bytes from the
    original IP total length field value of the offending packet.

    A third group of systems will echo this field correctly.

   * IP ID

    Some systems are seen not to echo this field correctly (the bit
    order of the field is changed).

   * 3-bit flags and fragment offset

    Some systems are seen not to echo this field correctly (the bit
    order of the field is changed).

   * IP header checksum

    Some operating systems will miscalculate this field, others just
    zero it out. A third group of systems echoes this field correctly.

   * UDP header checksum (in case of a UDP datagram)

    The same thing can happen with the UDP header checksum.
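
   The checks on the echoed IP header listed above can be sketched
   roughly as follows. This is only an illustration and not Xprobe
   code: the function names and the result labels are ours, and we
   assume that we kept a copy of the IP header we sent and that the
   echoed header was cut out of the ICMP error message beforehand.

```python
import struct


def ip_checksum(header: bytes) -> int:
    """One's-complement checksum over an IP header (checksum field zeroed)."""
    total = sum(struct.unpack("!%dH" % (len(header) // 2), header))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def parse_ip_header(raw: bytes) -> dict:
    """Extract the fields we compare from the first 12 header bytes."""
    ver_ihl, _tos, total_len, ip_id, flags_off, _ttl, _proto, checksum = \
        struct.unpack("!BBHHHBBH", raw[:12])
    return {
        "ihl": (ver_ihl & 0x0F) * 4,
        "total_len": total_len,
        "ip_id": ip_id,
        "flags_off": flags_off,
        "checksum": checksum,
    }


def compare_echo(sent: bytes, echoed: bytes) -> dict:
    """Compare the header we sent with the one quoted in the ICMP error."""
    s, e = parse_ip_header(sent), parse_ip_header(echoed)
    hdr = echoed[:e["ihl"]]
    zeroed = hdr[:10] + b"\x00\x00" + hdr[12:]
    if e["checksum"] == 0:
        checksum = "zeroed"
    elif e["checksum"] == ip_checksum(zeroed):
        checksum = "valid"
    else:
        checksum = "miscalculated"
    return {
        # +20, -20 or 0 are the three behaviours described in the text
        "total_len_delta": e["total_len"] - s["total_len"],
        "ip_id_echoed": e["ip_id"] == s["ip_id"],
        "flags_offset_echoed": e["flags_off"] == s["flags_off"],
        "checksum": checksum,
    }
```

   Note that the checksum is validated against the echoed header
   itself, since the TTL (and therefore the checksum) legitimately
   changes on the way to the target.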

   IP header fields of the ICMP response packet:

   * Precedence bits
    Each IP Datagram has an 8-bit field called the 'TOS Byte', which
    represents the IP support for prioritization and Type-of-Service
    handling.

   The 'TOS Byte' consists of three fields.

   The 'Precedence field' (RFC 791), which is 3 bits long, is intended
   to prioritize the IP Datagram. It has eight levels of prioritization.

   Higher priority traffic should be sent before lower priority traffic.

   The second field, 4 bits long, is the 'Type-of-Service' field. It is
   intended to describe how the network should make tradeoffs between
   throughput, delay, reliability, and cost in routing an IP Datagram.

   The last field, the 'MBZ' (must be zero), is unused and must be zero.
   Routers and hosts ignore this last field. This field is 1 bit long.
   The TOS Bits and MBZ fields are being replaced by the DiffServ
   mechanism for QoS.

   RFC 1812 requires the following for IP Version 4 routers:

   "4.3.2.5 TOS and Precedence

   ICMP Source Quench error messages, if sent at all, MUST have their
   IP Precedence field set to the same value as the IP Precedence field
   in the packet that provoked the sending of the ICMP Source Quench
   message. All other ICMP error messages (Destination Unreachable,
   Redirect, Time Exceeded, and Parameter Problem) SHOULD have their
   precedence value set to 6 (INTERNETWORK CONTROL) or 7 (NETWORK
   CONTROL). The IP Precedence value for these error messages MAY be
   settable".

   Linux Kernels 2.0.x, 2.2.x and 2.4.x act as routers in this respect
   and send their ICMP error messages with a TOS byte value of 0xc0,
   that is, with the Precedence bits set to 6 (INTERNETWORK CONTROL).
   Networking devices that act the same way are Cisco routers based on
   IOS 11.x-12.x and Foundry Networks switches.
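
   To make the layout of the 'TOS Byte' concrete, here is a minimal
   sketch that splits it into the three fields described above (3
   bits of Precedence, 4 bits of Type-of-Service, 1 MBZ bit). The
   function name is ours; it only illustrates the bit arithmetic.

```python
def split_tos_byte(tos_byte: int) -> dict:
    """Split the 8-bit TOS byte into Precedence, TOS and MBZ fields."""
    if not 0 <= tos_byte <= 0xFF:
        raise ValueError("TOS byte must fit in 8 bits")
    return {
        "precedence": tos_byte >> 5,       # top 3 bits
        "tos": (tos_byte >> 1) & 0x0F,     # next 4 bits
        "mbz": tos_byte & 0x01,            # last bit, must be zero
    }


print(split_tos_byte(0xC0))
# prints {'precedence': 6, 'tos': 0, 'mbz': 0}
```

   A TOS byte of 0xc0 on an ICMP error message therefore carries
   precedence 6 (INTERNETWORK CONTROL) with all TOS bits clear.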

   * DF bit echoing
   Some TCP/IP stacks will echo the DF bit with ICMP Error datagrams,
   others (like Linux) will copy the whole octet, zeroing certain
   bits, and others will ignore this field and set their own.

   * IP ID field (Linux 2.4.0 - 2.4.4 kernels)

   Linux machines based on Kernel 2.4.0-2.4.4 will set the IP
   Identification field value with their ICMP query request and reply
   messages to a value of zero.

   This was later fixed with Linux Kernels 2.4.5 and up.


   * IP ttl field (ttl distance to the target has to be precalculated to
   guarantee accuracy).


   "The sender sets the time to live field to a value that represents
   the maximum time the datagram is allowed to travel on the Internet".

   The field value is decreased at each point where the IP header is
   processed. RFC 791 states that this decrement reflects the time
   spent processing the datagram, and that the field value is measured
   in units of seconds. The RFC also states that the maximum time to
   live value can be set to 255 seconds, which equals 4.25 minutes.
   The datagram must be discarded if this field value reaches zero
   before the datagram reaches its destination.

   Relating to this field as a measure to assess time is a bit
   misleading. Some routers may process the datagram faster than a
   second, and some may process the datagram longer than a second.

   The real intention is to have an upper bound to the datagram
   lifetime, so infinite loops of undelivered datagrams will not jam the
   Internet.

   Having a bound on the datagram lifetime helps us prevent old
   duplicates from arriving after a certain time has elapsed. So when
   we retransmit a piece of information which was not previously
   delivered, we can be assured that the older duplicate has already
   been discarded and will not interfere with the process.

   The IP TTL field value with ICMP has two separate values, one for
   ICMP query messages and one for ICMP query replies.

   The IP TTL field value helps us identify certain operating systems
   and groups of operating systems. It also provides us with the
   simplest means to add another check criterion when we are querying
   other host(s) or listening to traffic (sniffing).

   TTL-based fingerprinting requires the TTL distance to the target to
   be precalculated (unless a system on the local network is being
   fingerprinted).

   The ICMP Error messages will use values used by ICMP query request
   messages.


   Good statistics on the dependency of default ttl values on OS type
   have been gathered at:
   http://www.switch.ch/docs/ttl_default.html
   (Research paper on default ttl values)
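
   The role of the precalculated distance can be shown with a small
   sketch. Given the TTL seen in a reply and the number of hops to the
   target, the initial TTL chosen by the remote stack is their sum.
   When the distance is unknown, a common fallback is to round the
   observed value up to the nearest usual default. The list of
   defaults below (32, 64, 128, 255) is our assumption of typical
   values, and both function names are ours.

```python
# Assumed set of commonly seen default TTL values (not from the text).
COMMON_INITIAL_TTLS = (32, 64, 128, 255)


def initial_ttl(observed_ttl: int, hops: int) -> int:
    """Initial TTL of the sender, given a precalculated hop distance."""
    ttl = observed_ttl + hops
    if not 0 < ttl <= 255:
        raise ValueError("observed TTL and hop count are inconsistent")
    return ttl


def guess_initial_ttl(observed_ttl: int) -> int:
    """Fallback guess: round up to the nearest assumed default."""
    for candidate in COMMON_INITIAL_TTLS:
        if observed_ttl <= candidate:
            return candidate
    raise ValueError("TTL does not fit in 8 bits")


print(initial_ttl(52, 12))     # prints 64
print(guess_initial_ttl(116))  # prints 128
```

   The guess is only as good as the assumed defaults; a measured hop
   distance should be preferred whenever it is available.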


   * TOS field

   RFC 1349 defines the usage of the Type-of-Service field with the
   ICMP messages. It distinguishes between ICMP error messages
   (Destination Unreachable, Source Quench, Redirect, Time Exceeded,
   and Parameter Problem), ICMP query messages (Echo, Router
   Solicitation, Timestamp, Information request, Address Mask request)
   and ICMP reply messages (Echo reply, Router Advertisement, Timestamp
   reply, Information reply, Address Mask reply).

   Simple rules are defined:
    * An ICMP error message is always sent with the default TOS (0000)

    * An ICMP request message may be sent with any value in the TOS
    field. "A mechanism to allow the user to specify the TOS value to
    be used would be a useful feature in many applications that
    generate ICMP request messages".

    The RFC further specifies that although ICMP request messages are
    normally sent with the default TOS, there are sometimes good
    reasons why they would be sent with some other TOS value.

    * An ICMP reply message is sent with the same value in the TOS
    field as was used in the corresponding ICMP request message.

   Some operating systems will ignore RFC 1349 when sending ICMP echo
   reply messages, and will not send the same value in the TOS field as
   was used in the corresponding ICMP request message.

   ICMP headers of the ICMP response packet:

   * ICMP Error Message Quoting Size:

   All ICMP error messages consist of an IP header, an ICMP header
   and certain amount of data of the original datagram, which triggered
   the error (aka offending datagram).

   According to RFC 792 only 64 bits (8 octets) of original datagram
   are supposed to be included in the ICMP error message. However RFC
   1122 (issued later) recommends up to 576 octets to be quoted.

   Most of the "older" TCP/IP stack implementations will include 8
   octets in the ICMP Error message. Linux/HPUX 11.x, Solaris, MacOS
   and others will include more.

   Notably, the Solaris engineers probably could not read the RFC
   properly, since instead of 64 bits Solaris 2.x includes 64 octets
   (512 bits) of the original datagram.
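
   Measuring the quoting size is simple arithmetic on the received
   message. The sketch below is ours (hypothetical function name) and
   assumes the input starts at the ICMP header of the error message,
   i.e. the outer IP header has already been stripped. It subtracts
   the 8 octet ICMP header and the quoted IP header from the total.

```python
ICMP_HEADER_LEN = 8  # type, code, checksum and 4 unused octets


def quoted_data_octets(icmp_error: bytes) -> int:
    """Octets of the offending datagram quoted beyond its IP header."""
    if len(icmp_error) < ICMP_HEADER_LEN + 20:
        raise ValueError("too short to contain a quoted IP header")
    # Header length of the quoted datagram, taken from its IHL nibble.
    ihl = (icmp_error[ICMP_HEADER_LEN] & 0x0F) * 4
    quoted = len(icmp_error) - ICMP_HEADER_LEN - ihl
    if ihl < 20 or quoted < 0:
        raise ValueError("malformed quoted IP header")
    return quoted
```

   A result of 8 corresponds to the 64 bits of RFC 792, while larger
   values point at the stacks that quote more of the datagram.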

   * ICMP error Message echoing integrity

   Another artifact which has been noticed is that some stack
   implementations, when sending back an ICMP error message, may alter
   the offending packet's IP header and the underlying protocol data,
   which is echoed back with the ICMP error message.

   Since the mistakes made by TCP/IP stack programmers are different
   and specific to an operating system, an analysis of these mistakes
   could give a potential attacker the possibility to make assumptions
   about the target operating system type.

   Additional tweaks and twists:
   * Using code field values different from zero in ICMP echo requests

   When an ICMP Echo request message (type 8) is sent with an ICMP
   code field value different from zero (0), Microsoft based operating
   systems that answer the query will send back an ICMP code field
   value of zero with their ICMP Echo Reply. Other operating systems
   (and networking devices) will echo back the ICMP code field value
   we used with the ICMP Echo Request.

   The Microsoft based operating systems act in contrast to the RFC
   792 guidelines, which instruct the answering operating system to
   only change the ICMP type to Echo reply (type 0), recalculate the
   checksums and send the ICMP Echo reply away.
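
   A minimal sketch of this test, under our own naming: we build an
   ICMP Echo request carrying a non-zero code and then look at the
   code field of the reply. Only the ICMP message is built here;
   sending it (e.g. through a raw socket) is left out, and the code
   value used in the example is arbitrary.

```python
import struct


def inet_checksum(data: bytes) -> int:
    """Standard Internet one's-complement checksum."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_echo_request(code: int, ident: int, seq: int,
                       payload: bytes = b"") -> bytes:
    """ICMP Echo request (type 8) with a caller-chosen code field."""
    header = struct.pack("!BBHHH", 8, code, 0, ident, seq)
    csum = inet_checksum(header + payload)
    return struct.pack("!BBHHH", 8, code, csum, ident, seq) + payload


def code_echoed(request: bytes, reply: bytes) -> bool:
    """True if the Echo reply (type 0) kept the code we sent."""
    if reply[0] != 0:
        raise ValueError("not an ICMP Echo reply")
    return reply[1] == request[1]
```

   According to the text above, a reply for which code_echoed()
   returns False hints at a Microsoft based stack, while True is what
   the other operating systems and networking devices produce.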

   * Using DF bit echoing with ICMP query messages

   As in the case of ICMP Error messages, some TCP/IP stacks will echo
   the DF bit when responding to these queries, while others will not.

   * Other ICMP messages:
       * ICMP timestamp request
       * ICMP Information request
       * ICMP Address mask request

   Some TCP/IP stacks support these messages and respond to some of
   these requests.
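
   For reference, the three query messages listed above can be built
   as in the following sketch. The message types (13, 15, 17) and
   their reply types (14, 16, 18) follow RFC 792 and RFC 950; the
   function names are ours, and the timestamp and mask fields are
   simply left at zero unless given.

```python
import struct

# Request type -> matching reply type.
REPLY_TYPE = {13: 14, 15: 16, 17: 18}


def inet_checksum(data: bytes) -> int:
    """Standard Internet one's-complement checksum."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_query(icmp_type: int, ident: int, seq: int,
                body: bytes = b"") -> bytes:
    """Generic ICMP query: type, code 0, checksum, id, sequence, body."""
    header = struct.pack("!BBHHH", icmp_type, 0, 0, ident, seq)
    csum = inet_checksum(header + body)
    return struct.pack("!BBHHH", icmp_type, 0, csum, ident, seq) + body


def timestamp_request(ident: int, seq: int, originate_ms: int = 0) -> bytes:
    # Originate, receive and transmit timestamps, 32 bits each.
    return build_query(13, ident, seq, struct.pack("!III", originate_ms, 0, 0))


def information_request(ident: int, seq: int) -> bytes:
    return build_query(15, ident, seq)


def address_mask_request(ident: int, seq: int) -> bytes:
    # 32-bit address mask field, zero in the request.
    return build_query(17, ident, seq, struct.pack("!I", 0))


def is_matching_reply(request: bytes, reply: bytes) -> bool:
    """Reply has the expected type and the same identifier/sequence."""
    return (reply[0] == REPLY_TYPE.get(request[0])
            and reply[4:8] == request[4:8])
```

   Whether a target answers each of these requests at all is already
   a useful piece of the fingerprint.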

--[Xprobe implementation]--

   Currently Xprobe deploys a hardcoded logic tree, developed by Ofir
   Arkin in 'Project X'. Initially a UDP datagram is sent to a closed
   port in order to trigger an ICMP Error message: ICMP
   unreachable/port unreach. (This imposes the limitation that the
   target system must have at least one unfiltered port with no
   service running. Generally speaking, other methods of triggering an
   ICMP unreach packet could be used; this will be discussed further.)
   Moreover, a few tests (icmp unreach content, DF bits, TOS ...) can
   be combined within a single query, since they do not affect each
   other's results.
   Upon receipt of the ICMP unreachable datagram, the contents of the
   received datagram are examined and a diagnostic decision is made.
   If any further tests are required according to the logic tree,
   further queries are sent.

--[Logic tree]--

   Quickly recapping the logic tree organization:

   Initially all TCP/IP stack implementations are split into 2 groups:
   those which echo precedence bits back, and those which do not. Those
   which do echo precedence bits (Linux 2.0.x, 2.2.x, 2.4.x, Cisco IOS
   11.x-12.x, Extreme Networks switches etc) are differentiated
   further based on ICMP error quoting size (Linux sticks with RFC
   1122 here and echoes up to 576 octets, while others in this subgroup
   echo only 64 bits (8 octets)). Further echo integrity checks are
   used to differentiate Cisco routers from Extreme Networks switches.

   The time-to-live and IP ID fields of the ICMP echo reply are used to
   recognize the version of the Linux kernel.

   The same approach is used to recognize other TCP/IP stacks: data
   echoing validation (number of octets of the original datagram
   echoed, checksum validation, etc). If additional information is
   needed to tell two 'similar' IP stacks apart, an additional query is
   sent. (Please refer to the diagram at
   http://www.sys-security.com/html/projects/X.html for a more detailed
   explanation/graphical representation of the logic tree.)
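
   The first two branches of the tree, as recapped above, could be
   written down roughly like this. It is a heavily simplified sketch
   and not the Xprobe source: the function name and returned labels
   are ours, and we assume the precedence test means "the TOS byte of
   the ICMP error carries precedence bits 0xc0".

```python
def first_branches(tos_byte: int, quoted_octets: int) -> str:
    """Walk the first two decisions of the (simplified) logic tree."""
    # Branch 1: precedence bits of the ICMP error message.
    if (tos_byte & 0xE0) != 0xC0:
        return "other group (further tests required)"
    # Branch 2: how much of the offending datagram was quoted.
    if quoted_octets > 8:
        return "Linux 2.0.x/2.2.x/2.4.x"
    # Telling these two apart needs the echo integrity checks.
    return "Cisco IOS 11.x-12.x or Extreme Networks switch"


print(first_branches(0xC0, 8))
# prints Cisco IOS 11.x-12.x or Extreme Networks switch
```

   Every new operating system added to such hardcoded branches means
   touching the conditions themselves.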

   One of the serious problems with the logic tree is that adding new
   operating system types to it becomes extremely painful. At times
   part of the whole logic tree has to be reworked to 'fit' a single
   description. Therefore a signature based fingerprinting method
   caught our attention.

--[Signature based approach]--

   The signature based approach is what we are currently focusing on,
   and we believe it will become a more stable, reliable and flexible
   method of remote ICMP based fingerprinting.

   The signature based method currently relies on five different
   tests, each of which can optionally be included in an operating
   system fingerprint. Initially the systems with the fewest tests are
   examined (normally starting with the ICMP unreach test).

   If no single OS stack is found that matches the received signature,
   those stacks which match part of it are grouped again, and another
   test (chosen on the same fewest-tests principle) is selected and
   executed. This verification is repeated until an OS stack
   completely matching the signature is found, or we run out of tests.
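
   The matching loop described above might look like the sketch
   below. It is our own illustration, not the Xprobe implementation.
   We assume a fingerprint is a mapping from test name to expected
   result, that run_test() is a caller-supplied function which
   performs one probe and returns its result, and that the next test
   is taken from the candidate with the fewest tests still pending.
   All of these names are hypothetical.

```python
def identify(fingerprints: dict, run_test, first_test: str = "icmp_unreach"):
    """Return the names of the stacks that match every test they define."""
    candidates = dict(fingerprints)
    observed = {}
    test = first_test
    while candidates and test is not None:
        observed[test] = run_test(test)
        # Keep only the stacks which do not contradict the new result.
        candidates = {
            name: fp for name, fp in candidates.items()
            if test not in fp or fp[test] == observed[test]
        }
        # A stack matches completely once all its tests were observed.
        complete = sorted(name for name, fp in candidates.items()
                          if set(fp) <= set(observed))
        if complete:
            return complete
        # Pick the next test from the candidate with fewest tests left.
        pending = [sorted(set(fp) - set(observed))
                   for fp in candidates.values()]
        test = min(pending, key=len)[0] if pending else None
    return sorted(candidates)
```

   With such a layout, adding a new operating system is a matter of
   adding one more entry to the fingerprints mapping instead of
   reworking the decision logic.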

   Currently the following tests are deployed:

   * ICMP unreachable test (udp closed port based, host unreachable,
   network unreachable (for systems which are believed to be gateways))
   * ICMP echo request/reply test
   * ICMP timestamp request
   * ICMP information request
   * ICMP address mask request

--[future implementations/development]--

   The following items are planned (we always welcome
   discussions/suggestions though):
   * Fingerprints database (currently being tested)
   * Dynamic, AI based logic (long-term project :))
   * Tests will depend heavily on network topology (pre-test
   network mapping will take place).
   * Path-to-target test (to calculate the hop distance to the target)
    and filtering device probes.
   * Future implementations will use packets with actual application
   data to reduce the chances of being detected.
   * Other network mapping capabilities shall be included (network
    role identification, search for a closed UDP port, reachability
    tests, etc).

--[code for kids]--

 The currently implemented code and further documentation are available
 at the following locations:

 http://www.sys-security.com/html/projects/X.html

 http://xprobe.sourceforge.net

 http://www.notlsd.net/xprobe/

Ofir Arkin <[email protected]>
Fyodor Yarochkin <[email protected]>

|=[ EOF ]=---------------------------------------------------------------=|