NMIS To Do
Last updated 25 February 2002

To Do
  • Write an installation script.
  • Make polling engine multi-threaded.
  • Enhance documentation.
  • Use NET-SNMP as the SNMP package.
  • Time and date on the Dash Board.
Health

Make health include metrics for interface utilisation. This could be done by computing utilisation summarystats for each interface, with each interface's utilisation scored out of 100 (50% utilised would equal 50, etc). The scores would be added up, divided by the number of interfaces, and the result weighted into the total health metric.
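A minimal sketch of that calculation in Perl. The subroutine names, the input format and the 0.2 weighting are assumptions made for illustration; none of them come from NMIS.

```perl
use strict;
use warnings;

# Hypothetical helper: average the per-interface utilisation scores (0..100).
# Returns undef when there are no interfaces to average.
sub average_utilisation {
    my (@scores) = @_;
    return undef unless @scores;
    my $total = 0;
    $total += $_ for @scores;
    return $total / scalar @scores;
}

# Hypothetical weighting: $weight is the fraction of the overall health
# metric that is given to interface utilisation.
sub weighted_contribution {
    my ($average, $weight) = @_;
    return $average * $weight;
}

my $avg  = average_utilisation(50, 20, 80);     # (50 + 20 + 80) / 3
my $part = weighted_contribution($avg, 0.2);
printf "average=%.1f contribution=%.1f\n", $avg, $part;
```

Whether a high utilisation score should raise or lower health is not stated above, so the sketch only computes the average and its weighted share.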

Locations

Add timezone for each location, in the form of UTC or Local name.

Complete Locations lat and lon from Encarta

Health

Every 5 minutes, calculate the health metric for each group and for the whole network and store it in an RRD. This allows graphing and drill-down of overall health, and the health graph can then be shown on the dash board.

Collection Policy

Create a CSV with interface types and node types and some flags for defining the collection policy for devices.

ifType    device_type  role  description  ifAdminStatus  ifOperStatus  collect
default   any          any   any          any            any           false
ethernet  switch       any   true         up             up            true
ethernet  switch       any   false        up             up            false
atm       any          any   any          up             up            true
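One way such a policy table could be evaluated is sketched below in Perl. The matching rule is an assumption: the notes do not say how rows are resolved, so this sketch lets the last matching row win, which makes the "default" row a fallback. The subroutine name `should_collect` is hypothetical.

```perl
use strict;
use warnings;

# Policy rows in the column order of the table above; "any" is a wildcard.
my @policy = (
    [qw(default  any    any any   any any false)],
    [qw(ethernet switch any true  up  up  true)],
    [qw(ethernet switch any false up  up  false)],
    [qw(atm      any    any any   up  up  true)],
);
my @fields = qw(ifType device_type role description ifAdminStatus ifOperStatus);

# Assumption: the last matching row wins, so later, more specific rows
# override the "default" row.
sub should_collect {
    my (%iface) = @_;
    my $collect = 'false';
    ROW: for my $row (@policy) {
        for my $i (0 .. $#fields) {
            my $want = $row->[$i];
            next if $want eq 'any';
            next if $i == 0 && $want eq 'default';
            my $have = $iface{ $fields[$i] };
            next ROW unless defined $have && $have eq $want;
        }
        $collect = $row->[-1];    # last column is the collect flag
    }
    return $collect eq 'true' ? 1 : 0;
}

print should_collect(
    ifType => 'ethernet', device_type => 'switch', role => 'core',
    description => 'true', ifAdminStatus => 'up', ifOperStatus => 'up',
), "\n";
```

Reading the rows from a CSV file instead of a literal list would not change the matching logic.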
summarystats 

I would actually like to rewrite the summary stats routine to return a "summarystats hash".

Interface Stats

For each interface which supports it collect in/out frames, broadcast, unicast.  This would allow determination of average packet size on link and on network.  It would also allow thresholds to be set for percentage of broadcast traffic on network.
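The two derived figures mentioned here can be sketched as follows, in Perl. The subroutine names are hypothetical, and the inputs are assumed to be counter deltas over one polling interval; which SNMP counters feed them is left to the caller.

```perl
use strict;
use warnings;

# Average packet size in bytes: octets divided by packets for the interval.
# Returns undef when no packets were seen.
sub average_packet_size {
    my ($octets, $packets) = @_;
    return undef unless $packets;
    return $octets / $packets;
}

# Broadcast packets as a percentage of all packets in the interval.
# Returns undef when no packets were seen.
sub broadcast_percent {
    my ($unicast, $broadcast) = @_;
    my $total = $unicast + $broadcast;
    return undef unless $total;
    return 100 * $broadcast / $total;
}

printf "avg size %.0f bytes, broadcast %.1f%%\n",
    average_packet_size(1_500_000, 3000),
    broadcast_percent(2700, 300);
```

A threshold on the broadcast percentage could then be handled like any other threshold.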

More Notes:

An excellent idea; this would be very good information. It is already
on the todo list, but there are a few "dependencies":

1. The poller should be enhanced first to be multi-threaded, which will
improve SNMP performance and increase the number of MIBs
polled per second per node. I think this is the next major improvement
required. It goes hand in hand with making NMIS a daemon.

2. I would also like to support packet stats, ie pkt in and out, as well 
as unicast and non unicast. There are different error stats per 
interface type.

3. Not all interfaces support full if-mib stats, ie frame relay 
subinterfaces on Cisco devices only support ifInOctets and ifOutOctets, 
all other stats are in the Frame Relay mibs. This is also true for 
other interface types like ATM. So the solution is to keep the extended
stats in a different RRD for each interface. I would envisage that there
would be the current interface RRD, an extra one (pkts, errors, etc) and a
type-specific one (frame-relay, ATM, etherlike, etc), so one interface may
have 1 interface RRD while another may have 3. This could be implemented in
phases too, I suppose.

4. Yes putting error rates in the node health metric would be excellent.

5. Other metrics could be calculated for nodes and the network:
* like per interface/link average packet size;
* entire network average packet size;
* error rates per interface/network (this equates to the much heard 
never seen "accuracy" metric);
* lots of others that we might think of later.

6. Thresholding on errors should be done (easy).

Create RRD

I had been thinking about making this config options and putting the formulas in the code.

NMIS Command Line

A simple script to provide the status of nodes and basic info, like summary.pl but command-line oriented.

Thresholds

Put the threshold code in a separate place, for use only in nmis.pl.

HOST MIB

Add support for HOST-MIB, then get CPU and MEM working dynamically, get DISK mapping working for disk free and usage stuff.

Add a control file like the interface.dat which maps CPU and MEM and DISK, also tracks inventory changes.

Eric,

Yep, this was used pre Event Policy table, which is when I figured the event 
policy should be used.  I think this is only used by nmiscgi.pl and looks 
redundant.  If an event exists for a node, then just get the level from the 
event state table.  Should be a small patch.


Regards


Keith
-----Original Message-----
From: imlnetnz [mailto:imlnetnz@yahoo.co.nz]
Sent: 18 December 2001 18:15
To: nmis_users@yahoogroups.com
Subject: [nmis_users] eventlevel


Keith - I am not sure why the subroutine eventlevel in NMIS.pm 
appears to overrule the event levels as defined in the policy table - 
is there a good reason for this, have I misread the code, or was 
this something to get the project up and running out of the box ?

regards

Eric
Interface Speeds and SNMP Spikes
Yes and no. It is best handled in RRD, but that is not so easy; it really needs to be handled in both NMIS and RRD. That is, if NMIS detects a reset (which it does by monitoring sysUpTime), it checks the last RRD value and sets the current interface poll to 0, which should reset the RRD and remove the spike.
 

Nasty little SNMP!  I think the problem is when the SNMP counter gets reset and RRD doesn't know how to deal with it, ie the last collect was 10000, now it is 100, and RRD shows a small spike because it thinks the counter has wrapped.

 
Limiting the CDEF to the speed of the link is a good idea, but for some interface types it is not accurate, ie for ATM or Frame Relay PVC which have a burst capability.  I will add this to the TODO list and ponder it a little more, will have to put some code in to handle interface types and speed of primary interface.
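A minimal Perl sketch of the reset check described above. The subroutine names are hypothetical. Note one substitution: instead of setting the poll value to 0 as suggested in the note, this sketch returns "U", which rrdtool update accepts as an unknown value; that choice is an assumption, not something NMIS is known to do.

```perl
use strict;
use warnings;

# sysUpTime counts hundredths of a second. If it goes backwards between two
# polls, the device has most likely restarted. Assumption: the rare case of
# sysUpTime itself wrapping (after roughly 497 days) is ignored here.
sub device_was_reset {
    my ($last_uptime, $current_uptime) = @_;
    return 0 unless defined $last_uptime;    # first poll, nothing to compare
    return $current_uptime < $last_uptime ? 1 : 0;
}

# Hypothetical wrapper: pass the counter through, or report it as unknown
# ("U") for the poll that follows a reset so no rate is computed from it.
sub value_for_update {
    my ($last_uptime, $current_uptime, $counter) = @_;
    return device_was_reset($last_uptime, $current_uptime) ? 'U' : $counter;
}

print value_for_update(500_000, 1_200, 100), "\n";    # uptime went backwards
print value_for_update(1_200, 31_200, 100), "\n";     # normal poll
```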

Greetings,

      I have experienced odd behavior in graphs if a link/router goes down
unexpectedly. I'm not sure exactly why, but when you view the Bits graph
there will be a large spike (larger than the link is capable of
providing). I don't believe it is an issue with NMIS itself, I believe
the router is reporting incorrect figures... but I digress.

      If anyone would like it, I have changed a few CDEF's in nmiscgi.pl to
not allow throughput figures greater than the link is capable
of as defined by the interfaceTable.

      In nmiscgi.pl, just comment out the current input/outputBits CDEF's under
the drawrrd sub routine and add the two conditional ones below (I'd
offer up a patch, but there are a few other changes we've already made
here):

#"CDEF:inputBits=input,8,*",
#"CDEF:outputBits=output,8,*",
"CDEF:inputBitsTmp=input,8,*",             
"CDEF:inputBits=inputBitsTmp,$NMIS::interfaceTable{$tmpifDescr}{ifSpeed},GT,UNKN,inputBitsTmp,IF",
"CDEF:outputBitsTmp=output,8,*",              
"CDEF:outputBits=outputBitsTmp,$NMIS::interfaceTable{$tmpifDescr}{ifSpeed},GT,UNKN,outputBitsTmp,IF",


Now, you will still see spikes with these changes, but they will not
exceed the speed of the link (i.e. laws of physics still prevail).

Thanks to Keith and all who have worked on this project - it is an
irreplaceable tool! If I'm the only one who has seen these spikes simply
disregard this message as the ramblings of an idiot.

regards,
david
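The pair of CDEFs in David's message can be generated for either direction with a small helper, sketched here in Perl. The subroutine name `capped_bits_cdefs` is hypothetical; the CDEF strings follow the ones quoted above.

```perl
use strict;
use warnings;

# Build the two CDEFs for one direction ("input" or "output"): a temporary
# bits-per-second value, then a version that becomes unknown (UNKN) whenever
# the temporary value is greater than the interface speed.
sub capped_bits_cdefs {
    my ($ds, $if_speed) = @_;
    my $tmp = "${ds}BitsTmp";
    return (
        "CDEF:$tmp=$ds,8,*",
        "CDEF:${ds}Bits=$tmp,$if_speed,GT,UNKN,$tmp,IF",
    );
}

print "$_\n" for capped_bits_cdefs('input', 10_000_000);
```

Because values above the link speed are turned into unknown rather than clamped, the graph shows a gap at those points instead of a spike.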

Guilherme's Patch

I had the same behavior, your fix seemed to make it work right.
BTW I made some other hacks on NMIS that I would like to share:

1-Multithread collection and update - I believe it was on the TODO list.
To use it just add the option "mthread=true" on the nmis.pl command line for
the types update and collect.
It makes nmis.pl fork one instance for each node.
Be warned that it is resource consuming: I had to increase my host
memory from 64MB to 192MB to avoid swapping. But collection from almost
a hundred routers went from 6 minutes down to 1 minute.

2-Fix for 3Com routers, which describe virtual interfaces with the same string as
the actual physical interface. Because of this, all frame relay (I think X.25,
ATM and ISDN also) interfaces were not correctly collected or displayed. Now
they are hard-coded to be interpreted as virtual interfaces.

3-Fix for group and device summary statistics when some items have a "nan"
value. The fix was also applied to some reports. I believe that almost all
cases are treated correctly.

5-Display the IP addresses together with devices in the large dash. Also, in the
large dash the telnet URL uses the IP address rather than the host name (this avoids
problems in networks that don't use DNS records for routers and switches). I am
working on using this feature on the other screens and reports.

6-The options to change files and tables are hidden in nmiscgi.pl. I didn't want
anyone messing around with them without root access to the server. I know that by
default there are no permissions for this to happen, but it made the dash
screens lighter.

Attached is a diff file. I am sorry to not have one diff for each feature.
I want to hear your impressions.

regards,
Guilherme Chehab
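The fork-per-node approach from item 1 can be sketched in Perl as below. This is not Guilherme's patch: `collect_node` and `collect_all` are hypothetical names, and the per-node work is a stub.

```perl
use strict;
use warnings;

$| = 1;    # flush output at once so children do not repeat buffered text

# Hypothetical stand-in for the per-node collect or update work.
# Returns the exit status for the child: 0 means success.
sub collect_node {
    my ($node) = @_;
    return 0;
}

# Fork one child per node, then wait for all of them.
# Returns a hash of node name => child exit status.
sub collect_all {
    my (@nodes) = @_;
    my %pid_to_node;
    for my $node (@nodes) {
        my $pid = fork();
        die "fork failed for $node: $!" unless defined $pid;
        if ( $pid == 0 ) {
            exit collect_node($node);      # child: do the work and exit
        }
        $pid_to_node{$pid} = $node;        # parent: remember the child
    }
    my %status;
    while (%pid_to_node) {
        my $pid = wait();
        last if $pid == -1;                # no children left
        my $node = delete $pid_to_node{$pid};
        $status{$node} = $? >> 8 if defined $node;
    }
    return %status;
}

my %status = collect_all(qw(router1 router2 router3));
print "$_ exited with $status{$_}\n" for sort keys %status;
```

Since every node gets its own process, memory use grows with the node count, which matches the warning above; limiting how many children run at once would be one way to bound it.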

View Sorting
--- view.pl    2001/06/23 01:35:28     1.1
+++ view.pl     2001/10/11 15:21:22     1.2
@@ -75,6 +75,31 @@
 
 exit;
 
+sub alphanumerically {
+       local($&, $`, $', $1, $2, $3, $4);
+       # Sort numbers numerically
+       return $a <=> $b if $a !~ /\D/ && $b !~ /\D/;
+       # Sort IP addresses numerically within each dotted quad
+       if ($a =~ /^(\d+)\.(\d+)\.(\d+)\.(\d+)$/) {
+               my($a1, $a2, $a3, $a4) = ($1, $2, $3, $4);
+               if ($b =~ /^(\d+)\.(\d+)\.(\d+)\.(\d+)$/) {
+                       my($b1, $b2, $b3, $b4) = ($1, $2, $3, $4);
+                       return ($a1 <=> $b1) || ($a2 <=> $b2)
+                               || ($a3 <=> $b3) || ($a4 <=> $b4);
+               }
+       }
+       # Handle things like Level1, ..., Level10
+       if ($a =~ /^(.*\D)(\d+)$/) {
+               my($a1, $a2) = ($1, $2);
+               if ($b =~ /^(.*\D)(\d+)$/) {
+                       my($b1, $b2) = ($1, $2);
+                       return $a2 <=> $b2 if $a1 eq $b1;
+               }
+       }
+       # Default is to sort alphabetically
+       return $a cmp $b;
+}
+
 sub editRow {
        my %args = @_;
        my $table = $args{table};
@@ -95,7 +120,7 @@
 
        if ( $edit ne "delete" ) {
                if ( $row eq "add" and $edit eq "true" ) {
-                       foreach $key (sort(keys %table_data)) {
+                       foreach $key (sort alphanumerically (keys %table_data)) {
                                if ( not defined $tmp ) { $tmp = $table_data{$key}; }
                        }
                }
@@ -103,7 +128,7 @@
                        $tmp = $table_data{$row};
                }
                # need to get a row to build the menu.
-               foreach $field (sort(keys %$tmp)) {
+               foreach $field (sort alphanumerically (keys %$tmp)) {
                        $table_data{$row}{$field} = $FORM{$field};
                        #print STDERR returnTime." editRow, field=$field row=$row table=$table_data{$row}{$field}\n";
                }
@@ -168,7 +193,7 @@
                }
        }
        $i = 0;
-       foreach $head (sort (keys %$tmp_key)) {
+       foreach $head (sort alphanumerically (keys %$tmp_key)) {
                $headers[$i] = $head;
                ++$i;
        }
@@ -250,7 +275,7 @@
        cssTableStart("white");
 
        #Display each data Row
-       foreach $key ( sort (keys %table_data) ) {
+       foreach $key ( sort alphanumerically (keys %table_data) ) {
                print "\n";
                ++$counter;
                ++$pass;
@@ -267,7 +292,7 @@
                        }
        
                        $i = $c;
-                       foreach $head (sort(keys %{$table_data{$key}})) {
+                       foreach $head (sort alphanumerically (keys %{$table_data{$key}})) {
                                if ( $NMIS::config{$table_key} !~ /$head/ ) {
                                        $headers[$i] = $head;
                                        ++$i;
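
A standalone Perl sketch of a comparator like the one added by this patch, for trying the ordering outside view.pl. This version compares pure numbers with `<=>` and chains the dotted-quad comparisons with `||`, so the first differing part decides the order. The sample data is made up.

```perl
use strict;
use warnings;

sub alphanumerically {
    # Pure numbers: compare numerically.
    return $a <=> $b if $a =~ /^\d+$/ && $b =~ /^\d+$/;
    # IP addresses: compare numerically within each dotted quad.
    if ( my @qa = $a =~ /^(\d+)\.(\d+)\.(\d+)\.(\d+)$/ ) {
        if ( my @qb = $b =~ /^(\d+)\.(\d+)\.(\d+)\.(\d+)$/ ) {
            return $qa[0] <=> $qb[0] || $qa[1] <=> $qb[1]
                || $qa[2] <=> $qb[2] || $qa[3] <=> $qb[3];
        }
    }
    # Names like Level1, ..., Level10: same prefix, numeric suffix.
    if ( my ($pa, $na) = $a =~ /^(.*\D)(\d+)$/ ) {
        if ( my ($pb, $nb) = $b =~ /^(.*\D)(\d+)$/ ) {
            return $na <=> $nb if $pa eq $pb;
        }
    }
    # Default: plain string comparison.
    return $a cmp $b;
}

my @levels = sort alphanumerically qw(Level10 Level2 Level1);
my @hosts  = sort alphanumerically qw(10.0.0.10 10.0.0.9 9.1.1.1);
print "@levels\n";    # Level1 Level2 Level10
print "@hosts\n";     # 9.1.1.1 10.0.0.9 10.0.0.10
```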
DONE

SENDMAIL

Perl based sendmail for better mail control.

DONE

Enhanced SNMPv2c

Add support for the HC MIBs; in fact, test and see whether the HC MIBs are supported in all SNMPv2c devices.

Use mibdump.pl to generate the new OID files (see the FAQ),
changing
            SNMP_MIB::loadmib($argue{mibdir}, "IF-MIB-V1SMI.my");
to
            SNMP_MIB::loadmib($argue{mibdir}, "IF-MIB.txt");
In subroutine runInterface,
line 548:
                  if ( $NMIS::systemTable{snmpVer} eq "SNMPv2" ) {
                        # do the SNMP stuffy to get the standard stats
                        (      $ifStats{ifDescr},
                              $ifStats{ifOperStatus},
                              $ifStats{ifAdminStatus},
                              $ifStats{ifInOctets},
                              $ifStats{ifOutOctets}
                        ) = $session->snmpget(
                              'ifDescr'.".$interfaceTable{$interface}{ifIndex}",
                              'ifOperStatus'.".$interfaceTable{$interface}{ifIndex}",
                              'ifAdminStatus'.".$interfaceTable{$interface}{ifIndex}",
                              'ifHCInOctets'.".$interfaceTable{$interface}{ifIndex}",
                              'ifHCOutOctets'.".$interfaceTable{$interface}{ifIndex}"
                        );
                  }
                  else {
                        # do the SNMP stuffy to get the standard stats
                        (      $ifStats{ifDescr},
                              $ifStats{ifOperStatus},
                              $ifStats{ifAdminStatus},
                              $ifStats{ifInOctets},
                              $ifStats{ifOutOctets}
                        ) = $session->snmpget(
                              'ifDescr'.".$interfaceTable{$interface}{ifIndex}",
                              'ifOperStatus'.".$interfaceTable{$interface}{ifIndex}",
                              'ifAdminStatus'.".$interfaceTable{$interface}{ifIndex}",
                              'ifInOctets'.".$interfaceTable{$interface}{ifIndex}",
                              'ifOutOctets'.".$interfaceTable{$interface}{ifIndex}"
                        );
                  }
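
The two branches above differ only in the octet counter names, so the request list could be built once. A minimal Perl sketch, with hypothetical subroutine names; the object names and the "SNMPv2" test are taken from the code above.

```perl
use strict;
use warnings;

# Pick the octet counter names for the SNMP version, as in the branch above.
sub octet_counters {
    my ($snmp_ver) = @_;
    return $snmp_ver eq 'SNMPv2'
        ? ( 'ifHCInOctets', 'ifHCOutOctets' )
        : ( 'ifInOctets',   'ifOutOctets' );
}

# Build the full list of names to request for one interface index.
sub interface_oids {
    my ($snmp_ver, $if_index) = @_;
    return map { $_ . ".$if_index" }
        'ifDescr', 'ifOperStatus', 'ifAdminStatus', octet_counters($snmp_ver);
}

print join( ' ', interface_oids('SNMPv2', 3) ), "\n";
```

The resulting list could be passed to a single snmpget call, with the results still stored under the ifInOctets and ifOutOctets keys as above.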
DONE 

CSS Changes

CSS is now case sensitive, ie normal is not Normal!  The CSS has to be changed to suit.  The problem shows up with IE 6.0.

DONE

Hardcoded script name

Yep, that should be $ENV{SCRIPT_NAME} instead of /cgi-nmis/etc...

 
 
Keith
-----Original Message-----
From: Kuehnle, Richard [mailto:rkuehnle@us.tiauto.com]
Sent: 03 October 2001 18:49
To: 'nmis_users@ahoogroups.com'
Subject: [nmis_users] configuration problem is nmiscgi.pl

Keith,

I am sure that you have seen this one already.  I only noticed it because I am not using the default of /cgi-nmis/* for my default cgi path, as you see, I am using /cgi-nmis-2/. 

Rich Kuehnle
Network Manager
TI Group Automotive System, LLC.
810.755.8402

-----Original Message-----
From: root [mailto:root@tiasnadtcap91.na.tiautomotive.com]
Sent: Wednesday, October 03, 2001 9:44 PM
To: rkuehnle@us.tiauto.com
Subject:

*** nmiscgi.pl      Wed Oct  3 19:07:50 2001
--- nmiscgi.pl.bad      Wed Oct  3 21:43:13 2001
***************
*** 259,265 ****
        # If the heading isn't blank then there must be a graph type for it.
        if ( $heading ne "" ) {
              #KS 11 Mar 2001 NEW Embedded graphics, none of this dump in a temp directory anymore
!             $graphLink="<img border=\"0\" src=\"\/cgi-nmis-2\/nmiscgi.pl?type=drawgraph&node=$node&graph=$graphtype&length=$graphlength&start=$start_time&end=$end_time&width=660&height=150&interface=$interface\">";
        }
        else {
              $graphLink="Other graph types not yet supported\n";
--- 259,265 ----
        # If the heading isn't blank then there must be a graph type for it.
        if ( $heading ne "" ) {
              #KS 11 Mar 2001 NEW Embedded graphics, none of this dump in a temp directory anymore
!             $graphLink="<img border=\"0\" src=\"\/cgi-nmis\/nmiscgi.pl?type=drawgraph&node=$node&graph=$graphtype&length=$graphlength&start=$start_time&end=$end_time&width=660&height=150&interface=$interface\">";
        }
        else {
              $graphLink="Other graph types not yet supported\n";
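
A minimal Perl sketch of the $ENV{SCRIPT_NAME} approach Keith describes for the graph link. The subroutine name `graph_link` and the fallback path are assumptions; values are inserted without URL escaping, as in the original line.

```perl
use strict;
use warnings;

# Build the embedded graph <img> tag relative to the running CGI script.
# Assumption: fall back to the default path when SCRIPT_NAME is not set,
# for example when the code is run from the command line.
sub graph_link {
    my (%arg) = @_;
    my $script = defined $ENV{SCRIPT_NAME}
        ? $ENV{SCRIPT_NAME}
        : '/cgi-nmis/nmiscgi.pl';
    return sprintf
        '<img border="0" src="%s?type=drawgraph&node=%s&graph=%s&length=%s'
        . '&start=%s&end=%s&width=660&height=150&interface=%s">',
        $script, @arg{qw(node graph length start end interface)};
}

print graph_link(
    node => 'router1', graph => 'bits', length => '2days',
    start => 1002000000, end => 1002172800, interface => 'Serial0',
), "\n";
```

With this, an installation under /cgi-nmis-2/ produces links to itself without editing the script.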
DONE With Model support and a new model called Catalyst5000Sup3

Catalyst Support

If you are using 6000 series switches and you click on health statistics
then try to drill into the cpu stats you will get a message:
Graph type not supported yet.

Line 1314 of nmiscgi.pl reads :

<a href="$ENV{SCRIPT_NAME}?file=$conf&type=graph&graphtype=switch&graphlength=2days&node=$node">

it should read :

<a href="$ENV{SCRIPT_NAME}?file=$conf&type=graph&graphtype=cpu&graphlength=2days&node=$node">


Also, several people have previously asked about Cat 5000 switches and what
SNMP they support. We are now using some Supervisor 3's in the Cat 5000's,
and the MIBs are identical to a Cat 6000
(cpu, memory stats, traffic and topology).

We added this to nmis.pl to support them, so NMIS treats them as a 6000.
(There has to be a better way though.)

in sub getNodeInfo line :
      # Checking on the Model Type

elsif ( ( $NMIS::systemTable{sysDescr} =~ /WS-C5/i ) &&
        ( $NMIS::systemTable{sysDescr} =~ /6\.1/i )    # literal "6.1"
) {
        $NMIS::systemTable{nodeModel} = "Catalyst6000";
        $NMIS::systemTable{nodeType} = "switch";
        $NMIS::systemTable{netType} = "lan";
        $NMIS::systemTable{supported} = "true";
}

DONE

runPing

sub runPing{
        my $node = shift;
        my $retries = 3;
        my $sleep = 15;
        my $i;
        if ($debug) { print returnTime." Starting Pinging with $retries retries.\n"; }
        $pingresult = 0;
        
        # do a ping $retries times.
        for ($i=1;$i<=$retries;++$i) {
                $pingresult = ping(node => $node, timeout => 5, debug => $debug );
                if ( $pingresult != 100 ) { 
                        # Sleep a bit and try a second time.
                        if ($debug>1) { print "Sleeping $sleep seconds\n"; }
                        sleep $sleep;
                } else {
                        $i = $retries;
                }
        }
        
        if ( $pingresult != 100 ) {
                # Device is down
                $pingresult=0; 
                if ($debug) { print returnTime." Pinging Failed $node $NMIS::systemTable{roleType} $NMIS::systemTable{nodeType}\n"; }
                notify(node => $node, role => $NMIS::systemTable{roleType}, type => $NMIS::systemTable{nodeType}, event => "Node Down");
                # Device is down only update the runReachability if its an interface.
                if ( $type eq "interface" ) { &runReachability; }
        } else {
                # Device is UP!
                checkEvent(node => $node, role => $NMIS::systemTable{roleType}, type => $NMIS::systemTable{nodeType}, event => "Node Down");
        }               
}

 

DONE

ping.pm

      my $r = 1;
      my $num_tries = 5;
      $ping = Net::Ping->new("icmp", $timeout, $packetsize);
      # Keep pinging until one attempt succeeds or the tries run out.
      while ($r <= $num_tries and $result != 100) {
            if ( $debug eq "verbose" ) { print returnTime." Pinging $node timeout $timeout\n"; }
            if ( $ping->ping($node, $timeout) ) { $result = 100; }
            else { $result = 0; }
            ++$r;
      }
      $ping->close;

 

DONE

Keith - I am thinking that in sub thresholdResponse, NMIS should keep
the level at 'normal' or 'level=1' for response times, rather than
increment for the core and distribution node types ??

Code is..

      if ( $role eq "core" ) { $level + 2; }
      elsif ( $role eq "distribution" ) { $level + 1; }

I suggest maybe it should be this, same as all the other thresholds.

      if ( $level == 1 ) { $level = 1; }
      elsif ( $role eq "core" ) { $level = $level + 2; }
      elsif ( $role eq "distribution" ) { $level = $level + 1; }

Eric
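
Eric's suggested rule, written as a small self-contained Perl subroutine. The name `escalate_level` is hypothetical; the logic follows the three lines quoted above.

```perl
use strict;
use warnings;

# Level 1 (normal) is left alone; otherwise core nodes are raised by two
# levels and distribution nodes by one. Other roles are unchanged.
sub escalate_level {
    my ($level, $role) = @_;
    return $level     if $level == 1;
    return $level + 2 if $role eq 'core';
    return $level + 1 if $role eq 'distribution';
    return $level;
}

for my $case ( [1, 'core'], [2, 'core'], [2, 'distribution'], [2, 'access'] ) {
    my ($level, $role) = @$case;
    printf "%-12s level %d -> %d\n", $role, $level, escalate_level($level, $role);
}
```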

DONE

Keith, IMHO a minor improvement would be to strip newlines from the
logmessage subroutine so that any (SNMP) error messages are formatted
appropriately in the log...

NMIS.pm

sub logMessage {
      my $string = shift;
      $string =~ s/\n+//g;      # remove all embedded newlines
      if ( open(my $fh, ">>", $NMIS::config{nmis_log}) ) {
            print {$fh} returnDateStamp().",$string\n";
            close($fh);
      }
      else {
            # Only warn; do not try to write to a handle that failed to open.
            warn returnTime." logMessage, Could not open log file $NMIS::config{nmis_log}. $!\n";
      }
} # end logMessage

DONE

I believe I've found a couple of instances of hard-coded url paths in
cgi-bin/nmiscgi.pl and cgi-bin/logs.pl.  Diff for nmiscgi.pl is:

262c262
<               $graphLink="<img border=\"0\" src=\"\/cgi-nmis\/nmiscgi.pl?type=drawgraph&node=$node&graph=$graphtype&length=$graphlength&start=$start_time&end=$end_time&width=660&height=150&interface=$interface\">";
---
>               $graphLink="<img border=\"0\" src=\"$NMIS::config{'<cgi_url_base>'}\/nmiscgi.pl?type=drawgraph&node=$node&graph=$graphtype&length=$graphlength&start=$start_time&end=$end_time&width=660&height=150&interface=$interface\">";


and for logs.pl is:

475c475
<                               "<a href=\"/cgi-nmis/nmiscgi.pl?node=$lnode\"><img alt=\"NMIS\" src=\"$NMIS::config{nmis_icon}\" border=\"0\"></a>".
---
>                               "<a href=\"$NMIS::config{'<cgi_url_base>'}/nmiscgi.pl?node=$lnode\"><img alt=\"NMIS\" src=\"$NMIS::config{nmis_icon}\" border=\"0\"></a>".
500c500
<                       "<a href=\"/cgi-nmis/nmiscgi.pl?node=$lnode\"><img alt=\"NMIS\" src=\"$NMIS::config{nmis_icon}\" border=\"0\"></a>"
---
>                       "<a href=\"$NMIS::config{'<cgi_url_base>'}/nmiscgi.pl?node=$lnode\"><img alt=\"NMIS\" src=\"$NMIS::config{nmis_icon}\" border=\"0\"></a>"
561c561
<                       "<a href=\"/cgi-nmis/nmiscgi.pl?node=$lnode\"><img alt=\"NMIS Dash\" src=\"$NMIS::config{nmis_icon}\" border=\"0\"></a> "
---
>                       "<a href=\"$NMIS::config{'<cgi_url_base>'}/nmiscgi.pl?node=$lnode\"><img alt=\"NMIS Dash\" src=\"$NMIS::config{nmis_icon}\" border=\"0\"></a> "


DONE with model.csv

getNodeInfo

elsif ( $NMIS::systemTable{sysDescr} =~ /sun/i and $NMIS::systemTable{nodeVendor} ne "Cisco Systems" ) { 
        $NMIS::systemTable{nodeModel} = "SunSolaris"; 
        $NMIS::systemTable{nodeType} = "server"; 
        $NMIS::systemTable{netType} = "lan";
        $NMIS::systemTable{supported} = "true";
}
line 1534 (make older IOS generic)
if (    ( $NMIS::systemTable{nodeVendor} eq "Cisco Systems" ) &&
        ( $NMIS::systemTable{nodeType} eq "router" ) &&
        $NMIS::systemTable{sysDescr} !~ /Version 10\.3/
) {
        $NMIS::systemTable{nodeModel} = "CiscoRouter";
        $NMIS::systemTable{nodeType} = "router";
        $NMIS::systemTable{netType} = "wan";
        $NMIS::systemTable{supported} = "true";
}