Getting started with Functions and probes

From MikroTik Wiki
Revision as of 05:21, 11 June 2010 by Lebowski (talk | contribs)

Here is a not so quick guide to probes...

The Dude can fairly be called "an SNMP poller". SNMP allows The Dude to read a value from a device, graph that value, and do other things, including calling external executables.


To build probes you first need to SNMPwalk your devices to find the OIDs and values you are interested in reporting on. SNMPwalk can be found in the tools menu. It requires a working SNMP setup; without one, SNMPwalk will fail to read any values. Right click on a device and SNMPwalk it. If no values are returned, go to the settings menu and set up SNMP.


If you know SNMP, skip to the next paragraph. If you don't, the basic idea is that you put a "string" (the community string) on the device you are trying to access. The default is "public" for read only. You should access the device and change it to something else, such as "snmpreadonly", just anything besides "public". On most devices you can also set access rules that allow reads only from a specific IP. Put the same string in the SNMP settings in The Dude. With the same string on both The Dude and the device, The Dude should be able to read the Management Information Base (MIB). Each entry in the MIB is identified by an Object Identifier (OID).


If you know the exact OID that you are interested in, you can place it in the "OID field" of SNMPwalk, then click stop and start. To understand more about OIDs, place only a portion of the OID in the OID field of SNMPwalk, e.g. 1.3.6.1.2.1.1.1, then try 1.3.6.1.2.1.1, then try 1.3.6.1.2.1. Click stop and start each time, and notice that more and more of the MIB is displayed.
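The narrowing effect can be modelled outside The Dude. The following is a minimal Python sketch, not Dude code: the `SAMPLE_MIB` table and its values are made up for illustration, and `walk` is a hypothetical helper that returns every entry under a given OID prefix, which is roughly what SNMPwalk displays.

```python
# Hypothetical miniature MIB: OID -> value. The values are invented.
SAMPLE_MIB = {
    "1.3.6.1.2.1.1.1.0": "Example router",  # sysDescr
    "1.3.6.1.2.1.1.3.0": 123456,            # sysUpTime
    "1.3.6.1.2.1.1.5.0": "core-sw-1",       # sysName
    "1.3.6.1.2.1.2.1.0": 4,                 # ifNumber
}


def walk(prefix, mib=SAMPLE_MIB):
    """Return the entries whose OID lies under the given prefix."""
    want = tuple(prefix.split("."))
    found = {}
    for oid, value in mib.items():
        parts = tuple(oid.split("."))
        # Compare whole components so "1.3.6.1.2.1.1" does not match "1.3.6.1.2.1.10".
        if parts[:len(want)] == want:
            found[oid] = value
    return found


if __name__ == "__main__":
    for prefix in ("1.3.6.1.2.1.1.1", "1.3.6.1.2.1.1", "1.3.6.1.2.1"):
        print(prefix, "->", len(walk(prefix)), "entries")
```

Each shorter prefix covers a larger subtree, which is why dropping trailing numbers in SNMPwalk shows more of the MIB.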

Note: Some counters are defined by a MIB and you might need that specific MIB to see the correct counter.

I have not found anything like a programmer's guide for Dude code, but it is simple enough. "if, and, or, not, rand, rate, round, array_size" are some of the functions provided.

Note: To understand the built in functions open the function panel and read the descriptions.

The function for "OID" has the following description "returns value of given snmp OID. Only first parameter mandatory. First parameter - oid string, second - cache time - default 5 seconds (5.0), third - negative cache time - default 5 minutes (300.0), forth - ip address (overrides context device), fifth - snmp profile (overrides context device)"

So to use "OID" in a function you only need to specify the first parameter. As an example, I want to display up-time on the label of a device.

[labels] Right click on the device and click appearance. In the label box, click the drop down arrow on the right; this will display the default label configured in the main settings under device appearance. If I put the following on the appearance of a single device, it will only appear on that one device. If I put it on the main map configuration page under settings, it will be placed on every device.

up-time: [oid("1.3.6.1.2.1.1.3.0")] <--- place that on the label of a single device.

Ok, so now you see the up-time of that device on the device label. The point of doing this is that you can verify the value returned by an OID by using it on a label before you use it in a probe.

When you read an OID from a system, The Dude sends a packet asking for that value and then displays the result. So take care with how many probes you install and how often they run. Putting too many probes on a device can run its CPU up. The amount of data you request from a device can affect performance as well. Don't try something like array_find(oid("1.3.6"),"stuff"). That will request almost the entire MIB, and if you do that every 30 seconds the system spends a lot of time reading the MIB. Create probes with a very narrow focus, on just a few OIDs or only one OID.


So now let's look at building a probe based on an OID and then graphing the values. The example is the 1 minute Cisco CPU average, [oid("1.3.6.1.4.1.9.2.1.57.0")]. If you have a Cisco device, place that on the appearance of the device. Otherwise use SNMPwalk and the Internet to find the OIDs that you are interested in. Place the OID on the label and verify that you get the value you expect.

value: [oid("1.3.6.1.2.1.1.1.0")]

As long as the value you are interested in appears on the label, proceed. If not, figure out what is wrong. Usually it is a typo, an OID that needs a .0 at the end, or a missing bracket, quote or parenthesis.

The next step is to create a probe. Open the probe panel, click +, and give it a type of function. Leave agent at default; the agent is the server that will actually make the request.

[Available line] - used to determine if the probe can be installed on the device. I believe it is best to use a function to resolve available. Using the OID directly in the available line can be problematic: it sometimes will not return false, and the probe will then install on devices that do not have those OIDs. You can use the OID directly, but you will have to test the probe to determine if "available" is resolving correctly.

In the available line place the OID, oid("1.3.6.1.4.1.9.2.1.57"). You might need to drop the last 0. You won't need brackets in a probe, only on a label. The available line reads the MIB to determine if that OID is actually on the device. I have seen cases where having the whole OID in the available line causes the probe to not graph even though there is no error with the probe. However, there are also OIDs where you have to specify the entire OID in the available line, so you will have to do some testing.

[Functions] Skip the pain of putting an OID directly in a probe and build a function. Open the function panel, click the plus sign, and put a name on your new function...

Name: Cisco_CPU_a

Function: if(array_size(oid_column("1.3.6.1.4.1.9.2.1.57", 10 ,29)), oid("1.3.6.1.4.1.9.2.1.57.0", 10, 29)+1 ,"False")

Place the name of that function in the available line of your new probe. There you can check for False: Cisco_CPU_a() <> "False"

This should cause your CPU testing probe to only be installed on Cisco devices when using auto discover... Or more specifically devices that respond with a value when SNMP queries the value.

[Error line] Disclaimer :) These error lines are kind of primitive, make sure you test them. I highly recommend reading the probe thread in the dude forum to get a feel for the code.

The error line is where all the work is done. It will be in the format if(oid(""), "", "failed"). The description of "if" (in the functions panel) is "first parameter - condition, second - returned if condition yields logical true, third - returned otherwise"

The description of the error line is "if the return string is empty the service is assumed up". This can be confusing: if you return a value other than "", the probe will be in error and you will get a notification. Remember the IF function is "if X, return Y, else return Z". When the probe is in error, Z will be sent to your notification.

So on the error line put if(oid("1.3.6.1.4.1.9.2.1.57.0"), "", "Cant read CPU"). In English that is: if the OID returns a value, return "", else return "Cant read CPU". But there is a problem: if the CPU average is 0, the condition is false, so your probe will go into error and generate a notification. So use this instead: if(oid("1.3.6.1.4.1.9.2.1.57.0")<> "", "", "Cant read CPU"). In English that is: if the OID is not equal to nothing, return "" (service up), else return "Cant read CPU".
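The zero-value pitfall can be seen in a small model. This is a Python sketch, not Dude code: `dude_if` is a hypothetical stand-in for the Dude `if` function, and it assumes that 0 and the empty string count as false, as described above. A failed read is represented by an empty string.

```python
def dude_if(condition, when_true, when_false):
    """Stand-in for Dude's if(): second argument if the condition is true, else the third.

    Assumption: 0 and "" are treated as false, matching the behaviour described in the text.
    """
    return when_true if condition else when_false


def naive_error_line(cpu):
    # Models: if(oid(...), "", "Cant read CPU")
    return dude_if(cpu, "", "Cant read CPU")


def safe_error_line(cpu):
    # Models: if(oid(...) <> "", "", "Cant read CPU")
    return dude_if(cpu != "", "", "Cant read CPU")


if __name__ == "__main__":
    for reading in (37, 0, ""):
        print(repr(reading), "naive:", repr(naive_error_line(reading)),
              "safe:", repr(safe_error_line(reading)))
```

With a reading of 0 the naive line reports an error even though the device answered, while the comparison against "" only fails when nothing was read.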

You might want to correct for a default design choice. The default poll times use 30 second reads, but failures are negatively cached for 300 seconds, so a failed probe stays down for 5 minutes. The issue is that the negative cache time should be set only when there are no retries left, but it is being set on the first poll. As a result a single poll failure causes the device to show down for that probe for a long time (relative to networks)...

Instead you can specify the cache time and negative cache time: if(oid("1.3.6.1.4.1.9.2.1.57.0",10,29)<> "", "", "Cant read CPU"). That reads: if the OID returns a value, remember it for 10 seconds; if it fails, remember that it failed for 29 seconds...
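To see why the shorter negative cache time matters, here is a toy Python model, not Dude code. It assumes the cache is a simple time-to-live: a successful read is reused for the cache time, a failed read for the negative cache time. `CachedOid` and `flaky_device` are hypothetical names, and the defaults of 5 and 300 seconds come from the oid description quoted earlier.

```python
class CachedOid:
    """Toy model of oid(name, cache_time, negative_cache_time)."""

    def __init__(self, fetch, cache_time=5.0, negative_cache_time=300.0):
        self.fetch = fetch  # callable returning a value, or None when the read fails
        self.cache_time = cache_time
        self.negative_cache_time = negative_cache_time
        self.value = None
        self.expires = None  # nothing cached yet

    def read(self, now):
        if self.expires is not None and now < self.expires:
            return self.value  # still cached, no request is sent
        self.value = self.fetch()
        ttl = self.cache_time if self.value is not None else self.negative_cache_time
        self.expires = now + ttl
        return self.value


def flaky_device():
    """Return a fetch function that fails on the first poll, then answers 12."""
    answers = iter([None])

    def fetch():
        return next(answers, 12)

    return fetch


if __name__ == "__main__":
    default = CachedOid(flaky_device())
    tuned = CachedOid(flaky_device(), 10, 29)
    for t in (0, 30, 60):
        print("t =", t, "default:", default.read(t), "tuned:", tuned.read(t))
```

In this model one lost poll at t = 0 keeps the default reader failed at the 30 and 60 second polls, while the reader with a 29 second negative cache recovers at the next 30 second poll.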

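Now we want to know if CPU is not nothing OR is over 95% utilization. if(or(oid("1.3.6.1.4.1.9.2.1.57.0",10,29)<>"", oid("1.3.6.1.4.1.9.2.1.57.0",10,29)<95), "","CPU Problem") Don't do code like that... It repeats the OID call, and because of the "or" the check passes whenever the value can be read at all, even when it is above 95.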

Make a function to test your oid and create a better error line - [Both function and probe credit gsandul] if(Cisco_CPU_a()<>"False",if(Cisco_CPU_a() < 60, "", concatenate("Warning: high CPU =", Cisco_CPU_a(), "%")), "Failed read")
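The three outcomes of that nested error line can be traced with a small Python model, not Dude code. `cisco_cpu_a` is a hypothetical stand-in for the Cisco_CPU_a function: it takes the raw reading as an argument (None meaning the read failed) and mirrors the +1 from the function definition, so the reported number is one higher than the raw value.

```python
def cisco_cpu_a(raw):
    """Stand-in for Cisco_CPU_a(): raw reading + 1, or "False" when nothing was read."""
    return raw + 1 if raw is not None else "False"


def error_line(raw, limit=60):
    """Model of the nested if: "" means up, any other string is the error message."""
    cpu = cisco_cpu_a(raw)
    if cpu == "False":
        return "Failed read"
    if cpu < limit:
        return ""
    # Models concatenate("Warning: high CPU =", Cisco_CPU_a(), "%")
    return f"Warning: high CPU ={cpu}%"


if __name__ == "__main__":
    for raw in (None, 10, 95):
        print(raw, "->", repr(error_line(raw)))
```

Checking for "False" first keeps the numeric comparison from ever running against a failed read.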

Notice that OIDs and Functions can be used interchangeably.

Math and other logic functions can be performed here as well. Say you are reading a negative value like RSSI but want the graph to be positive, and you want to send a notification if your wireless bridge signal is too weak. if((oid("1.3.6.1.4.1.9.2.1.57.0",10,29)*-1) <85, "", "Bridge signal low") I don't think this exact line works, but hopefully you get the idea.

[Value line] Now it is time to graph. In the Value line place the OID that you want to graph. oid("1.3.6.1.4.1.9.2.1.57.0",10,29)

If your OID has a lot of precision you can round it. Many things about the values you want to graph can be fixed here.

round(oid("1.3.6.1.4.1.9.2.1.57.0",10,29))

[Unit line] In the unit put % if it is percent, put ms if it is in milliseconds... All the graphs with the same Unit will be graphed on one graph.

Leave the rate at none. As far as I can tell, that setting is how often the probe is executed, and with none the default polling interval will be used. (I think)

Looking around you will find built in things like [Device.name] that can be used in your own probes and functions. I don't know of a list of them, but you can find them in various function settings and probes.

Lebowski

Edit: removed the incriminating evidence and touched up some SNMP, MIB and OID stuff. Also many edits to facilitate ease of reading. Improved "Available" section. Improved reading a little added more detail on probes and functions. Corrected errors, bold some text.