Getting started with Functions and probes
Here is a not so quick guide to probes...
The Dude can easily be called "an SNMP poller". SNMP allows The Dude to read a value from a device, graph that value and do other things, including calling external executables.
So you will need to SNMPwalk the devices to find the OIDs and values you are interested in reporting on. SNMPwalk can be found in the Tools menu. It requires a working SNMP setup; if SNMP is not working, SNMPwalk will fail to read any values. Right click on a device and SNMPwalk it. Are values returned? If not, go to the Settings menu and set up SNMP.
If you know SNMP, skip to the next paragraph. If you don't, the basic idea is that you put a "string" (the SNMP community string) on the device you are trying to access. The default is "public" for read-only access. You should access the device and change it to something else, like "snmpreadonly"; anything besides "public" will do. On most devices you can also set access rules that allow reads only from a specific IP. Put the same string in the SNMP settings in The Dude. With the same string on both The Dude and the device, The Dude should be able to read the device's Management Information Base (MIB). Each entry in the MIB is identified by an Object Identifier (OID).
If you know the exact OID you are interested in, you can place it in the "OID" field of SNMPwalk, then click Stop and then Start.
To understand more about OIDs, place a portion of the OID you are interested in into the OID field of SNMPwalk.
For example, try 1.3.6.1.2.1.1.3, then 1.3.6.1.2.1.1, then 1.3.6.1.2.1.
Click Stop and Start each time and notice that more and more of the MIB is displayed.
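SNMPwalk matches by OID prefix, arc by arc. The Python sketch below is only a model of that behavior, run against a tiny hypothetical MIB held in a dict. TOY_MIB, parse_oid and walk are illustrative names, not part of The Dude, and a real walk queries the device. Note that the comparison is done per arc, so 1.3.6.1.2.1.1 does not match 1.3.6.1.2.1.11.

```python
# Hypothetical, hand-made MIB: a real device returns far more entries.
TOY_MIB = {
    "1.3.6.1.2.1.1.1.0": "example router",  # sysDescr
    "1.3.6.1.2.1.1.3.0": 123456,            # sysUpTime (hundredths of a second)
    "1.3.6.1.2.1.1.5.0": "core-sw1",        # sysName
    "1.3.6.1.2.1.2.1.0": 4,                 # ifNumber
    "1.3.6.1.2.1.11.1.0": 9876,             # snmpInPkts
}


def parse_oid(oid: str) -> tuple[int, ...]:
    """Turn "1.3.6.1" into (1, 3, 6, 1) so OIDs compare numerically."""
    return tuple(int(arc) for arc in oid.strip(".").split("."))


def walk(mib: dict[str, object], prefix: str) -> list[tuple[str, object]]:
    """Return every entry under prefix, in OID order, like an SNMP walk."""
    root = parse_oid(prefix)
    hits = [(oid, value) for oid, value in mib.items()
            if parse_oid(oid)[:len(root)] == root]
    return sorted(hits, key=lambda item: parse_oid(item[0]))


for prefix in ("1.3.6.1.2.1.1.3", "1.3.6.1.2.1.1", "1.3.6.1.2.1"):
    print(prefix, "->", [oid for oid, _ in walk(TOY_MIB, prefix)])
```

Each shorter prefix returns a superset of the previous one (1, then 3, then 5 entries here), which is the "more and more of the MIB" effect.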
Note: Some counters are defined by a MIB and you might need that specific MIB to see the correct counter.
I have not found a programmer's guide for Dude code, but it is simple enough. "if, and, or, not, rand, rate, round, array_size" are some of the functions provided.
Note: To understand the built-in functions, open the Functions panel and read the descriptions.
We will be using IF and OID extensively, so I have included their descriptions here.
The description of the "if" function is "first parameter - condition, second - returned if condition yields logical true, third - returned otherwise". So an error line of the form if(oid("1.2.3..."), "", "there is a problem") returns "" when everything is fine and "there is a problem" when the condition is false.
The description of the "OID" function is "returns value of given SNMP OID. Only first parameter mandatory. First parameter - oid string, second - cache time - default 5 seconds (5.0), third - negative cache time - default 5 minutes (300.0), fourth - IP address (overrides context device), fifth - SNMP profile (overrides context device)"
To use "OID" in a function you only need the first parameter, the OID string.
Now let's use OID to display the up-time on the label of a device.
Right click on the device and click Appearance. In the label box, click the drop-down arrow on the right; this displays the default label configured in the main settings under device appearance. Note: You can change the appearance of a single device, of all the devices on one map, or globally.
up-time: [oid("1.3.6.1.2.1.1.3.0")] <--- place that on the appearance label of a single device.
Now you see the up-time of that device on the device label. The point of doing this is that you can verify the value on a label before you use it in a probe.
When The Dude reads an OID from a system, it sends a packet asking for that value and then displays the result. So take care with how often, and how many, probes you install: putting too many probes on a device can run its CPU up. The amount of data you request from a device affects performance as well. Don't try something like array_find(oid("1.3.6"),"stuff"): that requests almost the entire MIB, and if you do it every 30 seconds the system will spend a lot of time reading the MIB. Create probes with a very narrow focus, on just a few OIDs or only one.
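A back-of-the-envelope calculation shows why narrow probes matter: every OID read is a request to the device, repeated on every poll. The Python sketch below just multiplies it out. The device, probe and MIB-size figures in the example are made up for illustration, and the model ignores caching, retries and bulk requests.

```python
def requests_per_minute(devices: int, probes_per_device: int,
                        oids_per_probe: int, poll_seconds: float) -> float:
    """Rough count of SNMP values requested per minute (no caching assumed)."""
    polls_per_minute = 60.0 / poll_seconds
    return devices * probes_per_device * oids_per_probe * polls_per_minute


# 50 devices, 4 single-OID probes each, polled every 30 seconds.
narrow = requests_per_minute(50, 4, 1, 30)
# One probe on a single device that walks a (hypothetical) 2000-entry MIB.
broad = requests_per_minute(1, 1, 2000, 30)
print(narrow, broad)
```

Under these assumptions, the one broad probe costs ten times more than all 200 narrow probes combined.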
So now let's look at building a probe based on an OID and then graphing the values. First find an OID you are interested in; in this case the 1-minute Cisco CPU average, oid("1.3.6.1.4.1.9.2.1.57.0"). If you have a Cisco device, place that on the device's appearance label, or use SNMPwalk and the Internet to find the OIDs you are interested in. Place the OID on the label and verify that you get the value you expect.
As long as the value you are interested in appears on the label, proceed; if not, figure out what is wrong. Usually it is a typo, the OID needs a .0 at the end, or a bracket, quote or parenthesis is missing.
The correct method to build a probe is to leverage a function so you can perform additional steps. So let's make the function first.
A function has only three fields: the name (which is what a probe calls), the description and the code.
Open the Functions panel, click the plus sign and put the following in your new function:
Name: Cisco_CPU_a
Description: Reads the 1 minute CPU of a Cisco device.
Code: if(array_size(oid_column("1.3.6.1.4.1.9.2.1.57", 10, 29)), oid("1.3.6.1.4.1.9.2.1.57.0", 10, 29)+1, "False")
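This code reads the oid_column for the CPU OID. If that column has any entries (array_size is non-zero), the device supports the OID and the function returns the CPU value; otherwise it returns "False". Notice the +1, which makes the probe work when the CPU is reporting 0% utilization (more on that later). It also means the value returned is one higher than the actual CPU percentage.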
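To make the logic of Cisco_CPU_a easier to follow, here is a Python model of it. This is not Dude code. cisco_cpu_a is a hypothetical stand-in that takes the list oid_column would return. It assumes that list is empty when the device lacks the OID, and that its first entry is the .0 instance.

```python
from typing import Union


def cisco_cpu_a(column: list[int]) -> Union[int, str]:
    """Model of: if(array_size(oid_column(...)), oid(... .0)+1, "False")."""
    if len(column) > 0:        # array_size(...) is non-zero: the OID exists
        return column[0] + 1   # +1 so a 0% reading is never 0 (i.e. never false)
    return "False"             # not a Cisco device / OID not supported


print(cisco_cpu_a([0]), cisco_cpu_a([37]), cisco_cpu_a([]))
```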
Now that we have a function we are ready to create a probe.
To create a probe, open the Probes panel, click +, give it a type of Function and leave Agent at default (the agent is the server that actually makes the request). There are 4 lines we are interested in here: Available, Error, Value and Unit. Put the following into the Available and Error lines; more detail about each entry follows.
Available: Cisco_CPU_a() <> "False"
Error: if(Cisco_CPU_a()<>"False",if(Cisco_CPU_a() < 60, "", concatenate("Warning: high CPU =", Cisco_CPU_a(), "%")), "Failed read")
[Available line] - used to determine whether the probe can be installed on the device.
Using the OID directly in the Available line can be problematic: it sometimes does not return false, and the probe then installs on devices that do not have those OIDs. You can use the OID directly, but you will have to test the probe to determine whether "Available" is resolving correctly.
The description of the Error line is "if the return string is empty the service is assumed up". This can be confusing: if you return anything other than "", the probe will be in an error state and you will get a notification. Remember the IF function is "if X, return Y, else return Z". When the probe is in error, Z is sent to your notification.
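The Error-line contract is easy to model. An empty string means up; anything else puts the probe in the error state, and the string itself becomes the notification text. The Python sketch below assumes exactly that, based on the description above. dude_if and probe_state are illustrative names only.

```python
def dude_if(condition: bool, when_true: str, otherwise: str) -> str:
    """if X, return Y, else return Z."""
    return when_true if condition else otherwise


def probe_state(error_result: str) -> tuple[str, str]:
    """Map an Error-line result to (state, notification text)."""
    if error_result == "":
        return ("up", "")
    return ("error", error_result)


print(probe_state(dude_if(True, "", "there is a problem")))
print(probe_state(dude_if(False, "", "there is a problem")))
```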
Disclaimer :) These error lines are NOT tested, so make sure you test them. I highly recommend reading the probe thread in The Dude forum to get a feel for the code.
The Error line is where all the work is done, and it will be in the format: if(oid("1.2.3..."), "", "there is a problem")
Keep in mind how the error line works and how the if function works to make your probe.
So on the Error line, if you put:
if(oid("1.3.6.1.4.1.9.2.1.57.0"), "", "Can't read CPU")
in English that would be: if the OID returns a true (non-zero) value, return "", else return "Can't read CPU".
But there is a problem: what if the CPU average is 0? Then the condition is false, and your probe generates a notification even though nothing is wrong. So use this instead:
if(oid("1.3.6.1.4.1.9.2.1.57.0") <> "", "", "Can't read CPU")
In English: if the OID value is not equal to nothing (""), return "", else return "Can't read CPU".
Notice the <> "" comparison.
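A small Python model shows why the <> "" check matters. It makes two assumptions that the Dude documentation does not confirm: that the condition treats 0 and "" as false, and that a failed read comes back as "". The function names are hypothetical.

```python
def dude_truthy(value: object) -> bool:
    """Assumption: 0 and "" count as false, like most scripting languages."""
    return value != 0 and value != ""


def naive_error_line(cpu: object) -> str:
    # if(oid(...), "", "Can't read CPU")
    return "" if dude_truthy(cpu) else "Can't read CPU"


def fixed_error_line(cpu: object) -> str:
    # if(oid(...) <> "", "", "Can't read CPU")
    return "" if cpu != "" else "Can't read CPU"


for reading in (12, 0, ""):   # busy CPU, idle CPU, failed read
    print(repr(reading), repr(naive_error_line(reading)), repr(fixed_error_line(reading)))
```

Only the idle reading (0) differs: the naive line raises a false alarm and the fixed one does not. Both still report a failed read.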
To expand on the Error line, you might raise an alarm if the CPU cannot be read OR is at 95% utilization or above. This Error line returns "" only when the CPU is readable AND below 95%:
if(and(oid("1.3.6.1.4.1.9.2.1.57.0",10,29)<>"", oid("1.3.6.1.4.1.9.2.1.57.0",10,29)<95), "", "CPU Problem")
Don't write code like that, though... We made a function, so just call it:
if(Cisco_CPU_a()<>"False",if(Cisco_CPU_a() < 60, "", concatenate("Warning: high CPU =", Cisco_CPU_a(), "%")), "Failed read")
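Here is that nested Error line traced in Python, taking the value Cisco_CPU_a() returns as input. It assumes concatenate joins its arguments with no separator, and cpu_error_line is a hypothetical name. Because the function adds 1, the < 60 test actually trips once the real CPU reaches 59%.

```python
from typing import Union


def cpu_error_line(cpu_plus_one: Union[int, str]) -> str:
    """Model of:
    if(Cisco_CPU_a()<>"False",
       if(Cisco_CPU_a() < 60, "", concatenate("Warning: high CPU =", Cisco_CPU_a(), "%")),
       "Failed read")
    """
    if cpu_plus_one == "False":      # function could not read the OID
        return "Failed read"
    if cpu_plus_one < 60:            # healthy: empty string means "up"
        return ""
    return "Warning: high CPU =" + str(cpu_plus_one) + "%"


for value in ("False", 38, 60, 96):
    print(repr(value), "->", repr(cpu_error_line(value)))
```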
Now it is time to graph. In the Value line, place the OID that you want to graph.
If your OID has a lot of precision you can round it with round(); many adjustments to the values you want to graph can be made here.
In the Unit line put % if the value is a percentage, ms if it is in milliseconds, and so on. All probes with the same unit are drawn on one graph.
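The unit grouping rule can be pictured as a dictionary keyed by unit. This Python sketch only illustrates the rule described above. The probe names are made up, and The Dude does the grouping itself.

```python
from collections import defaultdict


def group_by_unit(probes: dict[str, str]) -> dict[str, list[str]]:
    """Map each unit to the probes that will share its graph."""
    graphs: dict[str, list[str]] = defaultdict(list)
    for name, unit in probes.items():
        graphs[unit].append(name)
    return dict(graphs)


# Hypothetical probes and their Unit settings.
print(group_by_unit({"cisco_cpu": "%", "memory_used": "%", "ping_time": "ms"}))
```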
Leave Rate set to none; that is how often the probe is executed, and the default polling interval will be used. (I think)
You might want to correct for a default design... The default poll time is 30 seconds, but failed reads are negatively cached for 300 seconds, so a failed probe stays down for 5 minutes. The issue is that the negative cache time should be applied only when there are no retries left, but it is applied on the first poll. With a negative cache time of 300 seconds, a single polling failure causes a 300-second outage.
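Instead, you can specify the cache time and negative cache time yourself:
if(oid("1.3.6.1.4.1.9.2.1.57.0",10,29) <> "", "", "Can't read CPU")
That is: read the OID and remember the value for 10 seconds; if the read fails, remember the failure for 29 seconds. Because 29 seconds is just under the 30-second poll interval, a single failed read is retried on the next poll instead of holding the probe down for 5 minutes.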
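To see the difference the negative cache time makes, the Python simulation below polls every 30 seconds and fails exactly one read, at t=30 s. It is a simplified model with three assumptions: there is one cached value, the device is re-read only when the cache has expired, and there are no retries. simulate is a hypothetical name and does not reflect Dude internals.

```python
def simulate(fail_times: set[int], cache: float, neg_cache: float,
             poll: int = 30, duration: int = 330) -> list[bool]:
    """Return whether each poll from t=0 to duration sees a good read."""
    results: list[bool] = []
    cached_ok = True
    cached_until = 0.0                 # nothing cached yet
    for t in range(0, duration + 1, poll):
        if t >= cached_until:          # cache expired: ask the device again
            cached_ok = t not in fail_times
            cached_until = t + (cache if cached_ok else neg_cache)
        results.append(cached_ok)      # otherwise reuse the cached result
    return results


default_neg = simulate({30}, cache=10, neg_cache=300)
tuned_neg = simulate({30}, cache=10, neg_cache=29)
print(default_neg.count(False), tuned_neg.count(False))
```

With the default 300-second negative cache, one bad read marks the probe down for 10 polls (5 minutes). With 29 seconds, it is down for just that one poll.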
Notice that OIDs and Functions can be used interchangeably in probes.
Math and other logic functions can be performed here as well. Say you are reading a negative value like RSSI but want the graph to be positive, and you also want a notification if your wireless bridge signal is too weak:
if((oid("1.3.6.1.4.1.9.2.1.57.0",10,29)*-1) < 85, "", "Bridge signal low")
I don't think this exact line works, but hopefully you get the idea.
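The sign flip is plain arithmetic. This Python sketch mirrors the example line, with an RSSI reading in dBm passed in directly. bridge_error_line and the 85 threshold follow the example. Whether a real device reports RSSI as a negative integer depends on its MIB.

```python
def bridge_error_line(rssi_dbm: float, threshold: float = 85) -> str:
    """Model of: if((oid(...)*-1) < 85, "", "Bridge signal low")."""
    graphed = rssi_dbm * -1            # -62 dBm becomes 62 for the graph
    return "" if graphed < threshold else "Bridge signal low"


for rssi in (-62, -84, -85, -90):
    print(rssi, repr(bridge_error_line(rssi)))
```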
Looking around, you will find built-in variables like [Device.name] that can be used in your own probes and functions. I don't know of a list of them, but you can find them in the settings of various functions and probes.
Edit: removed the incriminating evidence and touched up some SNMP, MIB and OID stuff. Also many edits to facilitate ease of reading. Improved "Available" section. Improved reading a little added more detail on probes and functions. Corrected errors, bold some text. Tried to improve readability. [Both function and probe credit gsandul]