Alert Management Working Group
Chairperson:  Louis Steinberg/IBM





CURRENT MEETING REPORT
Reported by Lee Oattes



AGENDA


  o Introduction

  o Discussion of draft flow control document

  o Preliminary discussion of alert-generation document (note:  this was
    shelved due to a lack of time)


ATTENDEES


      1. Bierbaum, Neal/vitam6!bierbaum@vitam6

      2. Carter, Glen/[email protected]

      3. Cohn, George/[email protected]

      4. Cook, John/[email protected]

      5. Denny, Barbara/[email protected]

      6. Easterday, Tom/[email protected]

      7. Edwards, David/[email protected]

      8. Fedor, Mark/[email protected]

      9. Hunter, Steven/[email protected]

     10. Kincl, Norman/[email protected]

     11. Malkin, Gary/[email protected]

     12. Oattes, Lee/[email protected]

     13. Paw, Edison/[email protected]

     14. Replogle, Joel/[email protected]

     15. Salo, Tim/[email protected]

     16. Sheridan, Jim/[email protected]

     17. Taft, Vladimir/[email protected]



     18. Waldbusser, Steve/[email protected]

     19. Wintringham, Dan/[email protected]

     20. Steinberg, Louis/[email protected]



MINUTES


 1. The meeting of the Alert Management Working Group began with an
    introduction from the Chairman (Lou Steinberg).

 2. A discussion of several independent implementations of feedback/pin and
    polled, logged alerts led to an agreement to adopt these mechanisms in
    some form.

 3. The following questions were answered by discussion and consensus:

     (a) Can alerts_enabled be a read-only MIB object if we limit the
         transmission rate of alerts (no shutoff) and do not use
         feedback?  No.
         We need a total shutoff mechanism in case a number of alert
         generators are "screaming" all at once.  Even rate-limited, the
         total traffic might be too much for the manager, and this
         "stable" situation cannot improve (while a disabling mechanism
         would tend to be self-correcting).
         Total shutoff implies the use of a resettable, read-write MIB
         object.
         An automated, timer-based reset mechanism was discussed, but it
         was felt that such a system might tend to synchronize the resets
         of multiple generators and could still lead to an over-reporting
         condition.

     (b) Might an automated reset of alerts_enabled from the manager station
         create a "blast-off-blast-off..." alert traffic pattern?
         Yes, but such a manager would still tend to get only as much
         traffic as it could handle.  A re-enable would be sent only when
         the manager isn't swamped (i.e., is capable of sending one).
         A manager experiencing such a traffic pattern should readjust its
         window prior to setting alerts_enabled to TRUE.

     (c) When pin disables alerts due to the generation of many similar
         alerts (e.g., link flapping) might we also lose an unrelated alert
         from the same system prior to resetting alerts_enabled?
         Yes, but the rate limiting (as opposed to shutoff) technique has
         the same problem; the probability of sending a single, specific
         alert is much lower than the probability of sending any one of many
         identical alerts.
         This problem is minimized by using polled, logged alerts along with
         feedback/pin (although alerts could still be lost if the log is
         overwritten).

     (d) Should we allow the implementation to decide if alerts are totally
         disabled or limited to a max rate?  No.
         Implementations should be consistent since this affects the way we
         manage our alert generators.



   (e) Can the alert log in polled, logged alerts be overfilled?
       Yes, but the standard suggests that a manager should attempt to
       keep the log empty by removing known alerts.
       If an individual implementation has no mechanism for removing old
       alerts (no set) then the log must wrap when full, and the manager
       might lose alerts.

   (f) If using the SNMP get-next, do we want the oldest logged element
       first, or the newest first?
       The manager wants the oldest first if a full log will wrap, since
       this gives it the best chance of seeing the oldest alert (in a
       full log) before losing it.
       No real consensus here.  It seems as though this should be
       implementation specific, since it applies only to SNMP and since
       the log, actually being a table, makes this a question of "are new
       table entries added at the table top or bottom?".

   (g) Can we shrink the log size by stripping out only the "important"
       information from each alert?
       We can, but this is something we decided we shouldn't do.  It
       requires a different parser at the manager (can't run it through
       the alert parser), and we did not know how to decide what
       information might be needed (it varies with the protocol and alert
       type).

   (h) How about only logging alerts, and sending an "alert logged" alert
       for each new log entry?  The manager gets the asynchronous "alert
       logged" notice and reads the alert log to determine what happened.
       While this is an interesting concept, it was felt that it might
       tend to aggravate some of the other logging problems (e.g., if the
       log is full and not overwriting, the only chance of getting the
       alert information is from the asynchronous alert, and this scheme
       replaces that information with "see the log").

   (i) A discussion of the CPU cycles and memory needed for keeping a log
       followed.  Since the log size might be settable (to 0), it was felt
       that systems could allow managers to disable logging.  It was also
       felt that the performance and memory hits were not large, but
       numbers to confirm this were not available.
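
    The log behaviour discussed in (e), (f) and (i) can be illustrated
    with a small sketch.  This is not taken from any draft; the class
    AlertLog and its method names are hypothetical.  It assumes a
    fixed-size table that wraps when full, an oldest-first traversal
    (one of the two orderings debated in (f), on which there was no
    consensus), whole alerts stored unstripped as decided in (g), and
    a log size of 0 meaning that logging is disabled.

```python
from collections import OrderedDict


class AlertLog:
    """Hypothetical alert log: a fixed-size table that wraps when full."""

    def __init__(self, max_entries):
        self.max_entries = max_entries   # assumed settable; 0 disables logging
        self._entries = OrderedDict()    # index -> whole alert, oldest first
        self._next_index = 1             # indices only ever increase

    def append(self, alert):
        """Log an alert; return its table index, or None if logging is off."""
        if self.max_entries == 0:
            return None
        if len(self._entries) >= self.max_entries:
            # Log is full: wrap by overwriting (dropping) the oldest entry.
            self._entries.popitem(last=False)
        index = self._next_index
        self._next_index += 1
        self._entries[index] = alert
        return index

    def get_next(self, after_index=0):
        """Oldest-first walk: first entry with an index above after_index."""
        for index, alert in self._entries.items():
            if index > after_index:
                return index, alert
        return None

    def remove(self, index):
        """Manager removes a known alert to keep the log empty."""
        if index in self._entries:
            del self._entries[index]
            return True
        return False
```

    With a two-entry log, a third alert overwrites the first, so a
    manager that has not yet read the log loses the oldest alert.  A
    manager that removes alerts as it reads them leaves room for new
    ones.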

4. The following were decided by vote:

   (a) Feedback/Pin
       Mandatory MIB objects:

        alerts_enabled   read/write
        window (time)    read/optional write
        max_alerts       read/optional write

       Do not include alert counters as MIB objects for this document.
       Individual implementors will decide if they need total dropped
       and/or sent counters, but not everybody likes the idea of adding
       more counters as (even optional) MIB objects.
       Do not optionally allow a reduced-rate mode on the over-reporting
       condition; require asynchronous alerts to be shut off totally,
       for the reasons given in the earlier discussion.

   (b) Polled, Logged Alerts
       Remove the time field from the table, as most alerts are time
       stamped and the information in an alert should be defined by
       the protocol, not by us.
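
The feedback/pin behaviour voted on above can be sketched as follows.
This is only an illustration under stated assumptions, not text from
the draft.  The class AlertGenerator and its method names are
hypothetical; only alerts_enabled, window and max_alerts come from the
minutes.  The sketch assumes a sliding window measured in seconds, and
assumes that the alert which would exceed max_alerts is itself dropped
and triggers the shutoff.

```python
from collections import deque


class AlertGenerator:
    """Hypothetical alert generator applying a feedback/pin style rule."""

    def __init__(self, window, max_alerts):
        self.window = window           # seconds; the "window (time)" object
        self.max_alerts = max_alerts   # alerts tolerated within one window
        self.alerts_enabled = True     # read/write; never re-enabled by a timer
        self._sent_times = deque()     # timestamps of recently sent alerts

    def raise_alert(self, now):
        """Return True if the alert is sent, False if it is suppressed."""
        if not self.alerts_enabled:
            return False
        # Forget alerts that have aged out of the window.
        while self._sent_times and now - self._sent_times[0] >= self.window:
            self._sent_times.popleft()
        if len(self._sent_times) >= self.max_alerts:
            # Over-reporting: total shutoff, with no reduced-rate mode.
            self.alerts_enabled = False
            self._sent_times.clear()
            return False
        self._sent_times.append(now)
        return True

    def manager_set_enabled(self):
        """Stands in for a manager writing TRUE to alerts_enabled."""
        self.alerts_enabled = True
```

With window=10 and max_alerts=3, a fourth alert inside ten seconds
disables the generator, and it stays silent until the manager sets
alerts_enabled again.  No timer re-enables it, which reflects the
concern in 3(a) that timed resets could synchronize across generators.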