Install Domoticz and Razberry2 on Raspbian 2017-01-11

I just installed domoticz with the following setup:

* Razberry2
* Raspberry Pi 3
* Raspbian Jessie, 2017-01-11

There are a couple of things to keep in mind for the Razberry2 to work properly, especially with the later jessie releases:

* The serial port has to be turned ON
* Console on the serial port has to be turned OFF
* Bluetooth has to be disabled
* hciuart.service can optionally be disabled (to get rid of an error message during boot)

So, the minor issue is that when you use “raspi-config” to turn off the serial console, it does not only turn off the console output on the serial port. It also turns off the serial port itself, which is not really what we want. That is why most people get a bit confused and fiddle around until they figure out that the “enable_uart=0” entry in /boot/config.txt should be “enable_uart=1”, without ever realizing why it was set that way.

The “console output” to serial is configured in /boot/cmdline.txt with the entry “console=serial0,115200”, which we need to get rid of, while still making sure that there is no “enable_uart=0” in /boot/config.txt.
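A minimal sketch of those two edits as a shell function. The mount point argument and the function name are my own assumptions; on the running Pi the boot partition is /boot, on your PC it depends on where the SD card is mounted.

```shell
# fix_boot_files: drop the serial console entry from cmdline.txt and make
# sure the UART stays enabled in config.txt. $1 is the mount point of the
# SD card's boot partition (an assumption; adjust to your setup).
fix_boot_files() {
  sed -i 's/console=serial0,115200 *//' "$1/cmdline.txt"
  sed -i '/^enable_uart=0/d' "$1/config.txt"
  grep -q '^enable_uart=1' "$1/config.txt" || echo 'enable_uart=1' >> "$1/config.txt"
}
# Usage (SD card mounted on your PC): fix_boot_files /media/$USER/boot
```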

Unless you really want to, there is no need to redistribute the GPU RAM mapping.

So, a working setup (as of 2017-01-20) is:

* Create an SD card with 2017-01-11-raspbian-jessie.img
* Before you unmount it from your PC, change the following files on the SD card:

/boot/cmdline.txt

[code]
cat /boot/cmdline.txt
dwc_otg.lpm_enable=0 console=tty1 root=/dev/mmcblk0p2 rootfstype=ext4 elevator=deadline fsck.repair=yes rootwait
[/code]

/boot/config.txt

[code]
enable_uart=1
dtoverlay=pi3-disable-bt
[/code]

* Boot the raspberry pi
* Disable the hciuart service

[code]
sudo systemctl stop hciuart
sudo systemctl disable hciuart
[/code]

* Ensure you have a /dev/ttyAMA0 file

[code]
ls -la /dev/ttyAMA0
crw-rw---- 1 root dialout 204, 64 Jan 20 08:19 /dev/ttyAMA0
[/code]

* Install domoticz as described above by kent

[code]
mkdir ~/domoticz
cd ~/domoticz
wget http://releases.domoticz.com/releases/release/domoticz_linux_armv7l.tgz
tar xvfz domoticz_linux_armv7l.tgz
rm domoticz_linux_armv7l.tgz
sudo cp domoticz.sh /etc/init.d
sudo chmod +x /etc/init.d/domoticz.sh
sudo update-rc.d domoticz.sh defaults
sudo service domoticz.sh start
[/code]

* Go to “Setup”->”Hardware”
* Add an OpenZWave USB device with the serial port: /dev/ttyAMA0

Done.

Network interface naming in Ubuntu 16 back to eth0

In Ubuntu 16, the network interface naming has changed, so you won’t get your usual “ethX” naming.

If you, for any reason, would like to revert back to the old behavior, do the following:

  • Add the following line to /etc/default/grub
GRUB_CMDLINE_LINUX="net.ifnames=0 biosdevname=0"
  • Run “sudo update-grub”
  • Reboot
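The edit can also be scripted; a sketch that applies the line from the step above (the function name is mine, and it overwrites any existing GRUB_CMDLINE_LINUX value):

```shell
# set_old_ifnames: force the classic ethX naming by rewriting the
# GRUB_CMDLINE_LINUX line. $1 is the grub defaults file (/etc/default/grub).
set_old_ifnames() {
  sed -i 's/^GRUB_CMDLINE_LINUX=.*/GRUB_CMDLINE_LINUX="net.ifnames=0 biosdevname=0"/' "$1"
}
# Afterwards: sudo update-grub, then reboot.
```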

That’s all.

wpa_supplicant in debian/ubuntu/raspbian – getting it to work on your raspberry pi

Hi again,

This time, I will go through a couple of things that I learned today, which have bugged me for quite some time. It took me forever to figure out why I could not get it to work the way I wanted to. It was not made easier by a recent change in the network management of Raspbian Jessie. Proper wifi configuration (on the command line) on the Raspberry Pi 3 with its built-in wifi is simple, but not easy. This blog entry is about the scenario when you do all your configuration on the command line. I have no clue how to do it through any GUI tools.

What I wanted to do could in principle be solved by the first two examples below, but I really wanted to both understand what I was doing and have a flexible solution where my raspberry pi would also be a bit more mobile. By setting the ssid and wpa2 passwords directly in the /etc/network/interfaces file, I would have to remember and reconfigure the wifi configuration (preferably before I unplug the device) if I were to move a raspberry pi from my home to my office (since they have different SSIDs and network configurations).

So I will touch on these topics, since they are closely related:

  • Proper use of wpa_supplicant.conf in a wifi setup
  • Why on earth the raspberry pi claims a DHCP IP address on eth0, wlan0, etc. even if you configure a static IP address

For the quick solution, there are basically two scenarios:

  • If you want wifi, and are ok with DHCP: stay with the default configuration of /etc/network/interfaces, and just add a “network” section to /etc/wpa_supplicant/wpa_supplicant.conf
  • If you want wifi and a static IP address: the key items to remember are id_str in /etc/wpa_supplicant/wpa_supplicant.conf and wpa-roam instead of wpa-conf in /etc/network/interfaces.

In the “jessie” release of Raspbian, the /etc/network/interfaces configuration changed from using wpa-roam to using wpa-conf, to fit with the change to a system component called dhcpcd. Basically, without any reconfiguration the dhcpcd daemon will monitor the state of every network interface (also eth0) and request an IP address from your DHCP server (which most likely is your internet router at home). In the bulk of installations this is perfectly ok, even for advanced users, and for beginners it just works out of the box. The default configuration of the /etc/network/interfaces file will now expect you to use the stanza “iface xxx inet manual” instead of “iface xxx inet dhcp”. The DHCP stanza will still work, but it is sort of redundant, since dhcpcd should be taking care of your DHCP requests.

Most how-tos on the internet showing you how to set up a static IP address on an interface will work as expected (almost). You will get a static IP address configured on your interface, but in the background you will also get a second IP address through the dhcpcd daemon, which you sadly will not see with the “ifconfig -a” command. You will, though, see it with either “hostname -I” or “ip addr show”. One side effect (which is little more than an annoyance) is that you get a second route to your default gateway. Since the route you configure with your static IP configuration takes precedence, you will not even notice it, but it is there (which can be seen by using “netstat -nr” or “ip route show”).

Wifi configurations that come quickly to a reasonably well-rounded linux admin (who would not use wpa_supplicant.conf at all):

  • Putting the ssid and WPA2 password directly in the /etc/network/interfaces, using DHCP
allow-hotplug wlan0
iface wlan0 inet dhcp
wpa-ssid "MY_SSID"
wpa-psk "MY_SECRET_PASSWORD"
  • Using the ssid and WPA password directly in the /etc/network/interfaces, using static IP configuration
allow-hotplug wlan0
iface wlan0 inet static
wpa-ssid "MY_SSID"
wpa-psk "MY_SECRET_PASSWORD"
address 192.168.3.11
netmask 255.255.255.0
network 192.168.3.0
broadcast 192.168.3.255
gateway 192.168.3.1
dns-nameservers 192.168.3.1

Although both these ways of configuring the wifi on my raspberry pi work, they are both very static. As mentioned above, I would have to reconfigure the /etc/network/interfaces file before I shut the raspberry pi down if I were to move it to another wifi network. I would like to just shut it down, move it, and start it up with a working configuration.

In the second case I would also get that “shadow” ip address:

pi@raspberrypi:~ $ hostname -I
192.168.3.11 192.168.3.133

The difference between “wpa-conf” and “wpa-roam” in the /etc/network/interfaces file is that with “wpa-conf” you should not be moving your gear around that much. If you want to use DHCP, just set up your network in /etc/wpa_supplicant/wpa_supplicant.conf or directly in your /etc/network/interfaces file. One network to rule them all. If you move your device, you should either be prepared before you shut down, or be prepared for a big hassle when you arrive at your new destination. When using “wpa-roam” you should either accept that you get a shadow IP address, disable the wlan0 interface in the dhcpcd configuration, or turn off dhcpcd once and for all.

I set up a simple test, which still turned out to be 20 scenarios to go through. Sorry for the crappy table format. The table is mostly for show. Green is where the configuration is ok, yellow is where it looks to work at first glance but there is something lurking in the background.

The two result columns are with dhcpcd enabled (“sudo update-rc.d dhcpcd enable; sudo shutdown -r now”) and with dhcpcd disabled (“sudo update-rc.d dhcpcd disable; sudo shutdown -r now”). The configuration is the contents of /etc/network/interfaces; tests 9 and 10 also change /etc/dhcpcd.conf.

Test 1:
allow-hotplug wlan0
iface wlan0 inet dhcp
wpa-conf /etc/wpa_supplicant/wpa_supplicant.conf

dhcpcd enabled: works (ping 192.168.3.1); dhclient is running
dhcpcd disabled: works (ping 192.168.3.1); dhclient is running

Test 2:
allow-hotplug wlan0
iface wlan0 inet manual
wpa-conf /etc/wpa_supplicant/wpa_supplicant.conf

dhcpcd enabled: works (ping 192.168.3.1); no dhcp client running after reboot
dhcpcd disabled: does not work, but if you start dhclient manually you will get an IP address

Test 3:
allow-hotplug wlan0
iface wlan0 inet static
wpa-conf /etc/wpa_supplicant/wpa_supplicant.conf

dhcpcd enabled: invalid config without the “address” variable
dhcpcd disabled: invalid config without the “address” variable

Test 4:
allow-hotplug wlan0
iface wlan0 inet static
wpa-conf /etc/wpa_supplicant/wpa_supplicant.conf
address 192.168.3.11

dhcpcd enabled: works (ping 192.168.3.1); ifconfig shows only one IP address; “ip addr show” shows 2 IP addresses; “netstat -nr” shows a default route (from dhcp)
dhcpcd disabled: works (ping 192.168.3.1), but no default route; ifconfig shows only one IP address; “ip addr show” shows 1 IP address; “netstat -nr” shows no default route

Test 5:
allow-hotplug wlan0
iface wlan0 inet static
wpa-conf /etc/wpa_supplicant/wpa_supplicant.conf
address 192.168.3.11
# bad gateway address
gateway 192.168.3.99

dhcpcd enabled: works (ping 192.168.3.1), but no working default route; ifconfig shows only one IP address; “ip addr show” shows 2 IP addresses; “netstat -nr” shows 2 default routes, but the bad one has precedence; after removing the bad route (“sudo route del -net 0.0.0.0 gw 192.168.3.99”) all works
dhcpcd disabled: works (ping 192.168.3.1), but no default route; ifconfig shows only one IP address; “ip addr show” shows 1 IP address; “netstat -nr” shows no default route

Test 6:
allow-hotplug wlan0
iface wlan0 inet static
wpa-conf /etc/wpa_supplicant/wpa_supplicant.conf

iface stg inet static
address 192.168.3.11
netmask 255.255.255.0
gateway 192.168.3.1
broadcast 192.168.3.255
dns-nameservers 192.168.3.1

dhcpcd enabled: does not work, missing required variable: address (it is set in the “stg” stanza, not in the wlan0 stanza)
dhcpcd disabled: does not work, missing required variable: address

Test 7:
allow-hotplug wlan0
iface wlan0 inet manual
wpa-conf /etc/wpa_supplicant/wpa_supplicant.conf

iface stg inet static
address 192.168.3.11
netmask 255.255.255.0
gateway 192.168.3.1
broadcast 192.168.3.255
dns-nameservers 192.168.3.1

dhcpcd enabled: works (ping 192.168.3.1); ifconfig shows only one IP address; “ip addr show” shows 1 IP address (dhcp); “netstat -nr” shows a default route (from dhcp)
dhcpcd disabled: does not work, although “sudo wpa_cli status” shows a proper connection to the ssid; running dhclient manually gives wlan0 an IP address

Test 8:
allow-hotplug wlan0
iface wlan0 inet manual
wpa-roam /etc/wpa_supplicant/wpa_supplicant.conf

iface stg inet static
address 192.168.3.11
netmask 255.255.255.0
gateway 192.168.3.1
broadcast 192.168.3.255
dns-nameservers 192.168.3.1

dhcpcd enabled: works (ping 192.168.3.1); proper gw, dns, all; “wpa_cli status” shows a proper connection; “ip addr show” shows 2 IP addresses; “netstat -nr” shows 2 routes to the default gateway
dhcpcd disabled: works (ping 192.168.3.1); proper gw, dns, all; “wpa_cli status” shows a proper connection; “ip addr show” shows only 1 IP address

Test 9:
On the command line first:
echo "denyinterfaces wlan0" | sudo tee -a /etc/dhcpcd.conf
sudo service dhcpcd restart

/etc/network/interfaces configuration:
allow-hotplug wlan0
iface wlan0 inet manual
wpa-roam /etc/wpa_supplicant/wpa_supplicant.conf

iface stg inet static
address 192.168.3.11
netmask 255.255.255.0
gateway 192.168.3.1
broadcast 192.168.3.255
dns-nameservers 192.168.3.1

dhcpcd enabled: works as expected; static IP set; ping 192.168.3.1 ok; proper gw, dns, all; “wpa_cli status” shows a proper wifi connection; “ip addr show” shows 1 IP address; “hostname -I” shows 1 IP address; “netstat -nr” shows 1 default route
dhcpcd disabled: n/a, will work, see test 8

Test 10:
Same command line as test 9. /etc/network/interfaces configuration:
allow-hotplug wlan0
iface wlan0 inet manual
wpa-roam /etc/wpa_supplicant/wpa_supplicant.conf

iface default inet dhcp

dhcpcd enabled: works as expected; DHCP address; “wpa_cli status” shows a proper wifi connection; dhclient is running; ping 192.168.3.1 ok; proper gw, dns, all; “ip addr show” shows 1 IP address; “hostname -I” shows 1 IP address; “netstat -nr” shows 1 default route
dhcpcd disabled: n/a, will work, see test 8

You can make dhcpcd ignore your wlan0 interface by doing the following (which will survive a reboot).

echo "denyinterfaces wlan0" | sudo tee -a /etc/dhcpcd.conf
sudo service dhcpcd restart

In the end, the choice is yours. If you are decently experienced and you disable dhcpcd, you will not lose much at all. I would even recommend it.

  • Disable dhcpcd
sudo update-rc.d dhcpcd disable
sudo service dhcpcd stop
# sudo shutdown -r now
  • /etc/wpa_supplicant/wpa_supplicant.conf
country=GB
ctrl_interface=DIR=/var/run/wpa_supplicant GROUP=netdev
update_config=1

network={
  ssid="AT_HOME"
  psk="SUPERSECRET"
  id_str="HOME_SSID"
  priority=15
}

network={
  ssid="OFFICE_SSID"
  psk="SUPERSECRET"
  id_str="OFFICE"
  priority=14
}
  • /etc/network/interfaces
auto lo
iface lo inet loopback

allow-hotplug wlan0
iface wlan0 inet manual
    wpa-roam /etc/wpa_supplicant/wpa_supplicant.conf

iface default inet dhcp

iface HOME inet static
  address 192.168.2.11
  netmask 255.255.255.0
  gateway 192.168.2.1
  broadcast 192.168.2.255
  dns-nameservers 192.168.2.1

iface OFFICE inet static
  address 192.168.3.11
  netmask 255.255.255.0
  gateway 192.168.3.1
  broadcast 192.168.3.255
  dns-nameservers 192.168.3.1

Useful commands:

  • wpa_cli status
  • sudo iw wlan0 scan | grep -i ssid
  • hostname -I
  • iwconfig wlan0
  • ip addr show
  • ip route show

References:

  • https://wiki.debian.org/WPA
  • http://manual.aptosid.com/en/inet-setup-en.htm
  • https://www.raspberrypi.org/forums/viewtopic.php?t=110606

Cut-Through Forwarding necessary on dd-wrt and tomato for throughput higher than 300Mbit

Short: dd-wrt does not yet support Cut-Through Forwarding (CTF). Tomato does. If your internet connection is faster than 250Mbit, use Tomato and CTF.

Note that in the release of Advanced Tomato that I tested (2.7 128 release) CTF will break NAT Loopback. If you are not sure what that is, have a look here.

It all started when I decided to upgrade my router at home. Since my setup is a bit different from most other people’s, I wanted a couple of routers that I could tweak in any way I liked. Therefore I wanted to find a fast router that also works with dd-wrt or Tomato. I want to connect my home in Zürich with my vacation home in Sweden through openvpn as two subnets, so that I can stream music and video from my NAS in Switzerland to my TV in Sweden. I also have a couple of Raspberry Pi’s in the basement of my vacation home in Sweden, which I want to access directly over my home network.

ctf for dd-wrt - conceptual view

 

If you don’t see the beauty of this already, this blog post is not for you. =)

First I opted for the Linksys WRT1900AC, which I thought was intended to be the new open flagship for home grown router firmware. I was wrong. I gave up on the dd-wrt installation on this router, but kept on looking around for something else. Not that I couldn’t get my environment running the way I wanted with the Linksys. It is not a bad router, it is actually quite good. It just did not meet my expectations regarding dd-wrt.

After talking to a couple of friends of mine, I decided to give the Netgear R8000 a go. It had dd-wrt support (according to some websites), and looked impressive by the specs. I was too early. The first release of dd-wrt on the Netgear R8000 had just been released, and it was far from stable. Also, in my setup (where I put it between my computer and my internal network), I did not at all get the throughput I had expected. When copying a large file from my NAS, I got only ca 30 MB per second transfer rate, even when I was physically connected per ethernet cable. I reverted to the default firmware from Netgear, tested again, and… I could transfer the same file at almost 1Gbit per second.

I stumbled across a forum post where someone complained about his LAN-WAN throughput being throttled at 300Mbit by his router, in a setup where he had a 1Gbit connection to the internet over a Netgear R8000 running dd-wrt (the forum has since upgraded its software and all old posts seem to be gone, so I cannot reference it). At first, I thought it was something I would not have to worry about until I get a much faster connection than I have now. But then I just could not let go of it. I thought that it was due to the new firmware; that somehow the drivers were not yet complete, or whatever.

So I did what any person in my situation would do. I solved the problem by throwing money at it. I bought a Netgear R7000. I was wrong. Again. The R7000 had the exact same issue. Since the highest priority was to replace an old router in Sweden and it was time to go there, I just had no time to investigate further. I accepted the fact, and brought the R7000 with me and set it up with dd-wrt (v24-sp2 (03/31/15) std (SVN revision 26622)).

When I got home I looked around even more, talked to some people about my experience with dd-wrt. Two of my friends recommended the ASUS RT-AC68U, and to put Tomato on it. Since a customer of mine needed a new router for his home office, I decided to order two.


With this extensive collection of wifi-routers it was time to get going. First I had some trouble getting the Tomato firmware onto the router, which I resolved by using tftp [see 1]. In the end I opted for the AdvancedTomato firmware, mainly since I found the look and feel of the GUI appealing. I ended up with the 2.7 128 release. At first I could not get the router to work at all. No web GUI, even though I could ping the router. By pure luck I figured out that I could connect to it via ssh. Then I found some weird entries related to httpd in /var/log/messages. In despair I tried to erase NVRAM and reboot, which worked for me, by issuing the command “nvram erase” from the command line when connected to the router.

  • ssh root@192.168.1.1 (password: admin)
  • nvram erase
  • reboot

Not that any of this made my life any better. Close to 1Gbit transfer speed when running the default stock firmware from ASUS, and I had really poor performance when running the AdvancedTomato firmware. The average was around 30-35MByte per second with peaks a bit higher. Still not happy.

ctf for tomato - throughput with ctf disabled

A very old forum discussion got me on track to figure this out, http://www.dd-wrt.com/phpBB2/viewtopic.php?p=544534. At least it wouldn’t hurt to try it. I turned on Cut-Through Forwarding, and voilà! Problem solved!

ctf for tomato - gui

And now, when I copy files from my NAS, I get the expected throughput.

ctf for tomato - throughput with ctf enabled

There is of course a long discussion on which firmware to use. There are differences between dd-wrt, Tomato, AdvancedTomato, and other open firmware projects. As of now, dd-wrt does not offer Cut-Through Forwarding, crippling the throughput somewhat for people with very fast internet connections. Tomato and AdvancedTomato are built around closed-source Broadcom drivers, hence not as open as dd-wrt, but they offer the CTF functionality. In the end it is your choice, and at the moment not too many people actually have an internet connection faster than 250Mbit.

Tested routers:

  • Linksys WRT1900AC
  • Netgear R7000
  • Netgear R8000
  • ASUS RT-AC68U

References:

  1. Uploading firmware to ASUS RT-AC86U from a Mac OSX, https://chrishardie.com/2013/02/asus-router-firmware-windows-mac-linux/
  2. AdvancedTomato, https://advancedtomato.com
  3. The forum entry that got me on the right track. http://www.dd-wrt.com/phpBB2/viewtopic.php?p=544534
  4. http://serverfault.com/questions/55611/loopback-to-forwarded-public-ip-address-from-local-network-hairpin-nat

 

Meta monitoring revisited – EC2 meta monitoring

This post is a revisit of a topic I have already blogged about, Who monitors the monitor?

Meta monitoring -> frequently compare an inventory with your monitoring configuration.

In my terminology I call this meta monitoring, since it is not actively monitoring a business function or the functionality of an infrastructure item. By using meta monitoring I am making myself aware of the completeness of my monitoring. Meta monitoring should answer the question: Is there something I am missing?

Well, as most of you will say; we always miss something. I agree. But with meta monitoring, we will aim to limit the unknown to a bare minimum. If you don’t do it, your configuration will be hopelessly out of date within days.


My take on meta monitoring is to make a list of something that could be monitored, filter away known exceptions, then compare it with the monitoring system configuration.

There are plenty of tools on the market that will help you make inventories of more or less every aspect of your infrastructure. They are usually very expensive. And, honestly, to do this yourself is not even hard.

  • Get a list of items from your current infrastructure (may it be vCenter or Amazon Cloud)
  • Remove items that you know should not be monitored
  • Compare this list with your monitoring system.

In an OP5 environment, you can even do this in a “one-liner”, for example:

root@op5-system:~# echo mysql-v001fry magnus monitor synology03 | sed -e 's/ /\n/g' | grep -wv "$(printf "GET hosts\nColumns:name\n" | unixcat /opt/monitor/var/rw/live)" | xargs -I"{}" echo "Host \"{}\" is not configured in OP5"
Host "magnus" is not configured in OP5
Host "synology03" is not configured in OP5

 

Now, this one-liner lists all configured hosts in your monitoring environment and uses that list to filter away known (monitored) hosts from the list you echo, so it is probably not the most efficient way to do it, but it works. See it as an example of how easy it can be.
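The same comparison can be wrapped in a small function, so the inventory comes in on stdin and the monitored host list is a plain file. The function name and the temp-file layout are mine, not part of OP5; a sketch only:

```shell
# missing_hosts: read candidate host names on stdin (one per line) and print
# the ones that do not appear in the monitored-hosts file given as $1.
missing_hosts() {
  grep -wv -f "$1" | xargs -I'{}' echo "Host \"{}\" is not configured in OP5"
}
# Usage:
#   printf "GET hosts\nColumns:name\n" | unixcat /opt/monitor/var/rw/live > /tmp/monitored
#   echo mysql-v001fry magnus | sed -e 's/ /\n/g' | missing_hosts /tmp/monitored
```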

Gathering the complete (or partial) inventory from your infrastructure is also not that hard. In its simplest form you just copy/paste from your favorite excel sheet, or you request it from your infrastructure through an API. Amazon EC2 has a very powerful API. Just create a read-only user with access to your environment, and use a simple ruby script to get the names from EC2. Note that you need to point out which region you would like to list, and optionally add your proxy URI to the script below.

Example:

#!/usr/bin/ruby
# Uses the (old) v1 aws-sdk gem API (AWS::EC2).
%w[ rubygems aws-sdk ].each { |f| require f }

aws_api = AWS::EC2.new(:access_key_id => 'YOUR_ACCESS_KEY',
                       :secret_access_key => 'YOUR_SECRET_KEY',
                       :region => 'us-west-2', :proxy_uri => '')

# Print the value of the "Name" tag for every instance in the region.
aws_api.client.describe_instances[:reservation_set].each do |instance|
  instance[:instances_set][0][:tag_set].each do |tag|
    puts tag[:value] if tag[:key] == 'Name'
  end
end

Running this script will give you a list of your instances in Amazon EC2. I called this script “listEC2Instances.minimal.rb” and put it together with my one-liner:

root@op5-system:/opt/kmg/ec2/bin# ./listEC2Instances.minimal.rb
vpn-v001ec2
kmg-test002ec2

root@op5-system:/opt/kmg/ec2/bin# ./listEC2Instances.minimal.rb | sed -e 's/ /\n/g' | grep -wv "$(printf "GET hosts\nColumns:name\n" | unixcat /opt/monitor/var/rw/live)" | xargs -I"{}" echo "Host \"{}\" is not configured in OP5"
Host "vpn-v001ec2" is not configured in OP5

 

Now you know which hosts in your Amazon cloud are not monitored. Do something about it! =)

 

Avoid passwords, and use /bin/false as login shell

On computers that are exposed to the internet, hackers are constantly trying to guess passwords for common usernames. The easiest way to do this for a hacker is to first figure out what type of system you’ve got online, then go through a list of known usernames and the corresponding default passwords. That is the basics, though, and the techniques used are getting more and more refined.

Don’t use passwords, use ssh keys. At least until the day when it is obvious that the private/public key scheme is broken, but we are not there yet.


I remember an incident some years ago, when I had a fileshare on my test box at home, where I wanted to share data to my X-box (yes, many years ago) and I set up a Samba-user, which required me to also create a linux user. Out of convenience I did so, created a user “xbox” with the fancy password “xbox”. I can’t say that was the most clever thing I had done in a long while.

It took only a couple of days before someone in Romania had passed by my box and guessed the right password. I figured it out by pure chance, disabled the account and reinstalled my system. This happens all the time, just look in your /var/log/auth.log file, and you will see quite a lot of break-in attempts.

Example (on my Ubuntu box):

MYUSER@MYHOSTNAME:/var/log$ sudo tail -10000 /var/log/auth.log | grep -E "sales|php|mysql" | grep "Failed"
Jan 30 15:35:28 MYHOSTNAME sshd[22370]: Failed password for invalid user mysql from 203.130.49.10 port 38333 ssh2
Jan 30 15:35:34 MYHOSTNAME sshd[22381]: Failed password for invalid user mysql from 203.130.49.10 port 39204 ssh2
Jan 30 15:41:27 MYHOSTNAME sshd[23533]: Failed password for invalid user mysql from 203.130.49.10 port 48675 ssh2
Jan 30 15:41:35 MYHOSTNAME sshd[23535]: Failed password for invalid user mysql from 203.130.49.10 port 49507 ssh2
Jan 30 15:41:42 MYHOSTNAME sshd[23539]: Failed password for invalid user mysql from 203.130.49.10 port 50411 ssh2
Jan 30 16:35:09 MYHOSTNAME sshd[3051]: Failed password for invalid user sales from 203.130.49.10 port 54920 ssh2
Jan 30 16:35:21 MYHOSTNAME sshd[3053]: Failed password for invalid user sales from 203.130.49.10 port 56226 ssh2
Jan 30 16:35:28 MYHOSTNAME sshd[3055]: Failed password for invalid user sales from 203.130.49.10 port 57014 ssh2
Jan 30 17:27:43 MYHOSTNAME sshd[14408]: Failed password for invalid user mysql from 203.130.49.10 port 53484 ssh2
Jan 30 17:27:51 MYHOSTNAME sshd[14410]: Failed password for invalid user sales from 203.130.49.10 port 54622 ssh2
Jan 30 17:28:04 MYHOSTNAME sshd[14412]: Failed password for invalid user sales from 203.130.49.10 port 55982 ssh2
Jan 30 17:43:41 MYHOSTNAME sshd[17905]: Failed password for invalid user php from 203.130.49.10 port 43304 ssh2
Jan 30 17:43:56 MYHOSTNAME sshd[17909]: Failed password for invalid user phpbb from 203.130.49.10 port 44942 ssh2
Jan 30 17:44:09 MYHOSTNAME sshd[17921]: Failed password for invalid user phpBB from 203.130.49.10 port 46286 ssh2
Jan 30 17:44:22 MYHOSTNAME sshd[17932]: Failed password for invalid user phpbb2 from 203.130.49.10 port 47606 ssh2

These are just some random user names I picked from my head. In your log file you will find a good list of attempts (which could be used to discuss the name of your next child, if you lack the imagination to come up with your own). In my current log file I could count over 4000 different user names that someone has used to try to log in.

MYUSER@MYHOSTNAME:/var/log$ sudo cat auth.log | cut -d" " -f 11 | sort -u |wc -l
4114
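Field positions in auth.log vary with the message type, so `cut -f 11` over-counts a bit. A slightly more careful sketch that keys on the phrase instead (the function name is mine):

```shell
# invalid_users: pull the attempted usernames out of
# "Failed password for invalid user NAME from ..." lines in $1.
invalid_users() {
  grep -o 'invalid user [^ ]*' "$1" | awk '{ print $3 }' | sort -u
}
# Usage (as root): invalid_users /var/log/auth.log | wc -l
```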

So, what could I have done? Well, using a harder to guess password is one thing, but far from optimal. Nobody really needs a password. Not even a hard to guess password. I usually say “use the key, Luke, use the key”; meaning ssh-keys. As long as you keep your keys safe, you are the only one who can log in (until someone figures out a way to hack that too). But that is not always enough.

If your name is apache, of course you should be allowed to use that as a username.

I mean, a user that is called “oracle”, “apache”, “mysql”, or whatever user name that does not belong to a person, should never even be able to log in. There are also rare cases where you might actually want to have a password set.

The secret here is to lock the user by giving it “/bin/false” as the login shell. In your /etc/passwd file you have plenty of examples:

MYUSER@MYHOSTNAME:/var/log$ cat /etc/passwd | grep false
syslog:x:101:103::/home/syslog:/bin/false
messagebus:x:102:105::/var/run/dbus:/bin/false
whoopsie:x:103:106::/nonexistent:/bin/false
...

Making sure that your non-human users have “/bin/false” as their login shell ensures that nobody can log into your system with them via ssh. It also disables the possibility to do a simple “sudo su - USERNAME”.
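A quick audit sketch in the same spirit: list the accounts that still have a real login shell. Everything here is stock passwd(5) parsing; the function name is mine, and the list of "locked" shells is an assumption you may want to extend (e.g. with /sbin/nologin):

```shell
# list_login_shells: print user:shell for accounts whose shell is neither
# /bin/false nor /usr/sbin/nologin. $1 is a passwd-format file.
list_login_shells() {
  awk -F: '$7 != "/bin/false" && $7 != "/usr/sbin/nologin" { print $1 ":" $7 }' "$1"
}
# Usage: list_login_shells /etc/passwd
```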

– But! (someone will say) What if I really need to switch to that user and do some interactive work?

Well, don’t. You can do more or less anything with a proper sudo setup. Given an environment where you have locked down a user called “dummy”:

MYUSER@MYHOSTNAME:/tmp$ grep dummy /etc/passwd
dummy:x:10001:10001:dummy,,,:/:/bin/false

The “sudo -u dummy vi your.file” construct will allow you to edit a file as the user “dummy”. And… if you really, really need to work interactively as that user, you can use…

sudo su -s /bin/bash - dummy

The “-s” parameter tells the “su” command which shell to start, instead of the one configured in /etc/passwd.

For those of you with some mileage under your belt: cron

Cron on a decently modern box _will_ run your crontab even if you don’t have a proper login shell.

MYUSER@MYHOSTNAME:/var/log $grep dummy /etc/passwd
dummy:x:10001:10001:dummy,,,:/:/bin/false
MYUSER@MYHOSTNAME:/var/log $sudo crontab -u dummy -l | grep -v "#"
*/1 * * * * echo test >> /tmp/test.file 2>&1
MYUSER@MYHOSTNAME:/var/log $ls -la /tmp/test.file
-rw-r--r-- 1 dummy 10001 10 Jan 30 18:17 /tmp/test.file

There is not much more to it. Lock your anonymous users, and enjoy your increased security.

Using sudo and redirecting output to a file

Sudo is a cool tool to have under your belt.

If you, as a UNIX/Linux sysadmin does not know how to use sudo, get a new job.

I have heard excuses for not using sudo so many times, that if I was given a penny every time I would at least be able to buy myself a coffee. At Starbucks. This behavior usually ends up with an anonymous user account with a very poor password, often shared between way too many people. I’ve heard:

  • It takes too long time to write all that crap. -> No it doesn’t, unless you lack 9 fingers.
  • Sudo does not allow me to do what I want to do. -> Yes it does, you just have not learned how to, yet.

And lastly:

  • It is impossible to redirect output into a file when using sudo.


Almost… I have to admit, it took me some time to figure that one out. And until then, I took it as a good excuse for doing a “sudo su -”. Here is a simple example: you would like to transform the output from one command and store it in a file, as root, in a directory where you have no permission to write.

Example:

USERNAME@MYHOSTNAME:/var $sudo cat /etc/shadow | cut -d":" -f 1 > /var/test.file
-bash: /var/test.file: Permission denied

This is because the pipe and the redirection (the | and > signs) are parsed by _your_ shell, which runs as _you_. Both the “cut” command and the redirection into /var/test.file are executed as yourself, not as root.

The solution is simple. And it is easy. Just run the whole thing using “sh -c”

USERNAME@MYHOSTNAME:/var $sudo sh -c "cat /etc/shadow | cut -d: -f 1 > /var/test.file"
USERNAME@MYHOSTNAME:/var $ls -la /var/test.file
-rw-r--r-- 1 root root 156 Jan 30 18:37 /var/test.file
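Another way around it, for the cases where only the final write needs to happen as root, is to let “sudo tee” do the writing. Sketched here as a tiny helper; the name `root_write` is mine:

```shell
# root_write: read stdin and write it to $1 as root; only tee runs under
# sudo, the rest of the pipeline stays in your own shell.
root_write() {
  sudo tee "$1" > /dev/null
}
# Example: sudo cat /etc/shadow | cut -d: -f1 | root_write /var/test.file
```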

One excuse less for not using sudo.  Rock on!

One liner to kill Oracle expdp job

This is a very obscure one-liner to kill a running Oracle expdp job.

Background:

  • expdp/impdp are the Oracle Data Pump tools for exporting and importing data
  • Killing the expdp process is not enough to stop an export job
  • To kill an export, you have to attach with “expdp attach=JOB_NAME” and issue a “KILL_JOB” command

Prerequisites for my one-liner:

  • You have a log file in the current directory called exp_something.log (by using the LOGFILE=exp_something.log in your parameter file)

Here comes the one-liner which works in ksh:

expdp attach=$(grep Starting $(ls -tr exp*.log | tail -1) | cut -d":" -f 1 | cut -d"." -f 2 | sed -e 's/"//g') < $(printf "/ as sysdba\nKILL_JOB\nyes\n" > /tmp/someFile; echo /tmp/someFile)

That’s it! If the one-liner exits with an exit code > 0 (echo $?), it failed; just run it again. The output will hang for a few seconds at the question “are you really really sure: [yes]/no:”, which is normal. Just wait.
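The same one-liner is easier to audit when split into steps. This is just a sketch under the same assumptions (the newest exp_*.log in the current directory, its Starting line containing the quoted job name, and expdp on the PATH); /tmp/killJob.$$ is a name I made up for the answer file:

```shell
# Pick the newest data pump log and extract the job name from its
# 'Starting "SCHEMA"."JOB_NAME": ...' line.
logFile=$(ls -tr exp*.log | tail -1)
jobName=$(grep Starting "$logFile" | cut -d":" -f 1 | cut -d"." -f 2 | sed -e 's/"//g')

# Answer the interactive prompts (login, KILL_JOB, confirmation)
# from a temporary file.
printf "/ as sysdba\nKILL_JOB\nyes\n" > /tmp/killJob.$$
expdp attach="$jobName" < /tmp/killJob.$$
rm -f /tmp/killJob.$$
```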

Over and out!

Hotplugging more than 15 scsi devices in Ubuntu

Hi all,

Today I ran into something that took me a bit to figure out. I could not add new disks to a virtual machine running Ubuntu.

The basic scenario:

  • VMWare ESXi 5.1
  • Ubuntu 12.04.3 LTS
  • 15 virtual hard drives already configured

I had to add more space to a filesystem without rebooting the server, which normally is very simple. This is what I normally do:

  1. Add a virtual disk in vSphere Client
  2. “rescan-scsi-bus -w -c” on the guest system (Ubuntu)
  3. fdisk -> create partition and set device id to 8e
  4. pvcreate /dev/sdX1
  5. vgextend vgName /dev/sdX1
  6. lvextend -L +100G /dev/vgName/lvName
  7. sudo fsadm -v resize /dev/vgName/lvName
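For reference, steps 2 and 4-7 above can be sketched as a small dry-run helper that only prints the commands it would run; vgName and lvName are placeholders, and the interactive fdisk step is left manual:

```shell
# Print the grow-a-filesystem command sequence for a new disk,
# volume group and logical volume. Nothing is executed.
grow_fs() {
  disk=$1; vg=$2; lv=$3; size=$4
  echo "rescan-scsi-bus -w -c"
  echo "pvcreate ${disk}1"
  echo "vgextend $vg ${disk}1"
  echo "lvextend -L $size /dev/$vg/$lv"
  echo "fsadm -v resize /dev/$vg/$lv"
}

grow_fs /dev/sdb vgName lvName +100G
```

Once you are happy with the output, pipe it to “sudo sh” to actually run it.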

I tried to do this, but no matter what, I just could not get my Ubuntu box to see the new disks (I added a few).

Now over to the solution: After quite some research, I figured out what I had to do, but first some theory.

VMWare: When you add your 16th scsi device, VMWare will not only add the disk you ask it to add, but also add a new scsi controller, and attach the new disk to that controller. This is because a SCSI bus supports at most 16 devices, one of which is the controller itself. The new disk will be “SCSI (1:0)”. If you want to test this, add a new disk to your VM and, in the last section of the wizard, assign it to (1:0). Before you apply the change to your VM, you will see that you are adding not only a new disk but also a new scsi controller.

Ubuntu: If you just run rescan-scsi-bus on your Ubuntu system, it will happily do so, but it will not see your new disk, since it does not yet know about your new scsi controller. You can tell, because the adapters are listed at the beginning of the output:

maglub@nfs-v001alt:~$ sudo rescan-scsi-bus -c -w
/sbin/rescan-scsi-bus: line 592: [: 1.03: integer expression expected
Host adapter 0 (ata_piix) found.
Host adapter 1 (ata_piix) found.
Host adapter 2 (mptspi) found.
...

So, the million dollar question is: How do you add this adapter without rebooting?

First, check the PCI bus, just to see that you don’t have the new scsi controller listed:

maglub@nfs-v001alt:~$ lspci
00:00.0 Host bridge: Intel Corporation 440BX/ZX/DX - 82443BX/ZX/DX Host bridge (rev 01)
...
00:10.0 SCSI storage controller: LSI Logic / Symbios Logic 53c1030 PCI-X Fusion-MPT Dual Ultra320 SCSI (rev 01)

No trace of the new controller. This is because you need to rescan the PCI bus as well. To do this, run the following (as root):

echo "1" > /sys/bus/pci/rescan

If you check your PCI bus now, you will see the new scsi-controller:

root@nfs-v001alt:~# lspci
00:00.0 Host bridge: Intel Corporation 440BX/ZX/DX - 82443BX/ZX/DX Host bridge (rev 01)
...
00:10.0 SCSI storage controller: LSI Logic / Symbios Logic 53c1030 PCI-X Fusion-MPT Dual Ultra320 SCSI (rev 01)
...
02:02.0 SCSI storage controller: LSI Logic / Symbios Logic 53c1030 PCI-X Fusion-MPT Dual Ultra320 SCSI (rev 01)
02:03.0 SCSI storage controller: LSI Logic / Symbios Logic 53c1030 PCI-X Fusion-MPT Dual Ultra320 SCSI (rev 01)

This will also add your new disks, but if you are curious, you can scan your scsi bus for new disks to see what happens:

root@nfs-v001alt:~# rescan-scsi-bus -w -c
/sbin/rescan-scsi-bus: line 592: [: 1.03: integer expression expected
Host adapter 0 (ata_piix) found.
Host adapter 1 (ata_piix) found.
Host adapter 2 (mptspi) found.
Host adapter 3 (mptspi) found.
...

The rescan-scsi-bus command can see your new scsi adapter! Voila!

Who monitors the monitor?

How do you know that all virtual machines (VMs) in a VMWare environment are actually monitored in your monitoring system (read Nagios, Op5)?

The follow-up question is: is this really important? The answer is: yes, it is. Of course there might be virtual machines in your environment that you really don’t care about. But there will come a day when you wish you had monitored that one machine in your environment that just was not.

There are only two ways to know:

  1. Your deployment system/process/whatever of VM’s also adds the new virtual machine to your monitoring system
  2. You make a list of existing virtual machines and compare it to what is monitored

You decide what is easier for you. In most environments (1) just doesn’t happen. So, what if you are left with (2)? How do you do this automatically? You are not alone: (2) is common, but it is a tedious job. I call (2) “meta monitoring”: the monitoring of the monitoring. In my environment I have a set of monitoring checks that tell me if I am doing my job properly. This is one of them.

Most people are aware that they have a handful of virtual machines in their environment that they really don’t want to monitor. You might run a temporary VM for a test, or a development system under construction. Whatever your reason might be, it can be a valid one. The common denominator is usually that you _know_ that you don’t want to monitor it.

The following approach will tell you what is not monitored in your virtual environment, while still allowing you to have the occasional test system running. What I advocate is an approach which is illegal in business, called “negative confirmation”: you give an explanation and make an active decision only when you do not want a virtual machine to be monitored. What I usually do to accomplish this is to add a custom attribute to the virtual machines in vCenter called noMonitoring, where one should write a note if monitoring is not desired. If this field is empty, it implies that the system should be monitored.

Sounds simple, no?

Given environment:

  • VMWare hypervisor (formerly known as ESXi)
  • VMWare Virtual Center
  • A read-only user in vCenter, in my case “op5”
  • OP5, version 6.0.7 or higher
  • VMware vSphere SDK for Perl installed on your OP5 installation

In vCenter, set up a custom field called noMonitoring (Management->User defined Attributes->Add (Global attribute). I usually also want to keep track of ownership, so I have added two more custom fields; ownerCustomer and ownerTech, so that I know which customer a VM belongs to, and who is responsible for the VM from a technical point of view.

This way, you can use this field to note why a virtual machine should not be monitored. My recommendation is to use the field such that, if you write nothing into it when you create a virtual machine, you intend for it to be monitored; if you write anything into it, even a single character, you mean for it not to be monitored. The best way to keep track of the whole thing is to write a short note on why you don’t want the system to be monitored, for example “2013-05-20, LUM, demo system” or similar. This way other people will know why you don’t want the system to be monitored.

But, then, how do we get this information into OP5?

I have two scripts to do this:

  • getVMsAndCustomAttributes.pl
  • check_metaMonitoring_vmWare

The perl script connects to a vCenter and reads out all virtual machines and a handful of attributes (of which noMonitoring is one). The attributes are separated by a semicolon (“;”).

Example:

root@op5-v005fry:/opt/plugins/kmg# ./getVMsAndCustomAttributes.pl --server=192.168.2.30 --username=op5 --password=op5
#vm;onHost;dataStore;noMonitoring;ownerCustomer;ownerTech
kmg-guran-0001;192.168.2.204;NFSProd;;;;
kmg-op5-0001;192.168.2.204;NFSProd,Synology02;2013-05-12, LUM, To be decommissioned;;;
kmg-zenLoadbalancer-0001;192.168.2.204;NFSDev,Synology02;2013-02-05, LUM, To be decommissioned;;;
kmg-web-0001;192.168.2.204;NFSDev,Synology02;;;;
kmg-web-0002;192.168.2.204;NFSDev,Synology02;;;;
kmg-jumphost-0002;192.168.2.204;NFSProd,Synology02;;asdf;;
kmg-sandbox-0003;192.168.2.204;NFSProd,Synology02;;;;
kmg-buildbox-0001;192.168.2.204;NFSDev,Synology02;LUM, To be decommissioned;;;
kmg-plex-0001;192.168.2.204;NFSProd,Synology02;;;;
kmg-winxp-0001;192.168.2.204;NFSDev;2012-01-12, Windows client, no monitoring;;;
kmg-op5-0004;192.168.2.204;NFSDev;2013-04-20, Quarantin, to be decommissioned when v6 works well in prod.;;;
kmg-sandbox-0005;192.168.2.204;NFSDev,Synology02;2012-10-01, LUM, To be decommissioned;;;
jira-v001fry;192.168.2.204;NFSDev,Synology02;;;;
proxy-v001fry;192.168.2.204;NFSProd,Synology02;2013-03-20, LUM, Under construction 4;;;
kmg-pfsense-0001;192.168.2.204;datastore1,Synology02;2012-12-20, Quarantin;;;
op5-v005fry;192.168.2.204;NFSProd;;;;
backup-v001fry;192.168.2.204;NFSProd,Synology02;2013-05-02, LUM, Under construction;Maggan;;
guran-v001fry;192.168.2.204;NFSProd,Synology02;2013-05-10, LUM, New server, Under construction 2;;;
vcenter-v001fry;192.168.2.204;NFSProd;;;;

Field number 4 represents my custom field “noMonitoring”.

noMonitoring field

In principle, I just have to check field number 4 of the output, and print field number 1 to get a decent list to check against my monitoring system.

root@op5-v005fry:/opt/plugins/kmg# ./getVMsAndCustomAttributes.pl --server=192.168.2.30 --username=op5 --password=op5 | awk -F";" ' $4 == "" {print $1}'
kmg-guran-0001
kmg-web-0001
kmg-web-0002
kmg-jumphost-0002
kmg-sandbox-0003
kmg-plex-0001
jira-v001fry
op5-v005fry
vcenter-v001fry

To check this against my OP5 configuration, I just have to ask my monitoring system if the host is monitored. Had I used an older version of OP5, I would have done this by either using grep on /opt/monitor/etc/hosts (grep host_name /opt/monitor/etc/hosts.cfg | grep kmg-guran-0001 | wc -l) or connecting to the merlin database and issuing a clever sql query (no example).

But now we are on version 6, where Op5 uses MK Livestatus, which in itself deserves some attention. Long story short: instead of parsing text files or updating a database, MK Livestatus hooks into Nagios to keep track of the configuration and the status of the system. The benefit: less disk IO. Asking your monitoring installation about more or less anything is now very easy, by communicating with MK Livestatus over a unix socket. In this case, I will make an extremely simple query: give me the host name of a configured host that has the host name xxyy. For more inspirational references, look here: http://mathias-kettner.de/checkmk_livestatus.html.

Example:

root@op5-v005fry:/opt/plugins/kmg# printf "GET hosts\nColumns: host_name host_address\nFilter: host_name = kmg-guran-0001\n" | unixcat /opt/monitor/var/rw/live
kmg-guran-0001;192.168.2.37
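If you query Livestatus a lot, the query string can be wrapped in a tiny shell helper. A sketch; the socket path is the one from my installation:

```shell
# Build a Livestatus query for one host. Pipe the output to
# 'unixcat /opt/monitor/var/rw/live' on the OP5 server to run it.
lql_host_query() {
  printf "GET hosts\nColumns: host_name host_address\nFilter: host_name = %s\n" "$1"
}

lql_host_query kmg-guran-0001
```

Usage: lql_host_query kmg-guran-0001 | unixcat /opt/monitor/var/rw/live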

We put this together into a check script, check_metaMonitoring_vmWare, which I use to keep track of unmonitored systems.

root@op5-v005fry:/opt/plugins/kmg# ./check_metaMonitoring_vmWare  2>/dev/null
WARN - H: 19 M: 7 !M: 12 ok!M: 8 nok!M: 4
Hosts:  kmg-plex-0001 jira-v001fry op5-v005fry vcenter-v001fry
| hosts=19 monitored=7 notMonitored=12 okNotMonitored=8 nokNotMonitored=4

I have added this as a service check to my installation (just add the command to checkcommands.cfg and add a service check to your vcenter host in your monitoring), and can see the following:

meta monitoring - service check

In the output you can see the following:

  • H: 19 -> VMs in this installation
  • M: 8 -> Number of monitored VM’s
  • !M: 11 -> Number of VM’s that are not monitored
  • ok!M: 8 -> Non monitored VM’s that are ok (to not be monitored)
  • nok!M: 3 -> Not OK -> This is what we try to catch: VM’s that should be monitored.

What can you do to remedy this? You have two possibilities:

  1. Add the VM’s to your monitoring system
  2. Add a comment in the “noMonitoring” fields in your vCenter

Simple as that. I guess I have to add a few VM’s to my monitoring now.

Here, the sweets:

[1] getVMsAndCustomAttributes.pl

#!/usr/bin/perl
## -----------------------------------------------
# Script: getVMsAndCustomAttributes
# Author: magnus.luebeck@kmggroup.ch
# Date: 2013-05-20
#
# Description: This script will output a semicolon (";") separated list
#              of VMs from a vCenter, together with the custom
#              attributes:
#                - noMonitoring - Empty field = VM should be monitored
#                - noMonitoring - Non empty = good excuse for not monitoring
#                - ownerCustomer
#                - ownerTech
#
# Usage: ./getVMsAndCustomAttributes.pl --server=192.168.2.30 --username=USERNAME --password=PASSWORD
## Script inspired by/to large extent copied from Reuben Stump
## (rstump@vmware.com | http://www.virtuin.com)
## http://www.virtuin.com/2012/11/best-practices-for-faster-vsphere-sdk.html
## http://communities.vmware.com/docs/DOC-10220 /
## http://communities.vmware.com/servlet/JiveServlet/download/10220-4-24610/queryVMCustomField.pl
## and http://communities.vmware.com/message/519501
## -----------------------------------------------

use strict;
use warnings;
 
use VMware::VIRuntime;
   
Opts::parse();
Opts::validate();
   
Util::connect();
     
# Fetch all VirtualMachines from SDK, limiting the property set
my $vm_views = Vim::find_entity_views(view_type => "VirtualMachine",
                                      properties => ['name', 'runtime.host', 'datastore', 'summary']) ||
                                      die "Failed to get VirtualMachines: $!";
                 
# Fetch all HostSystems from SDK, limiting the property set
my $host_views = Vim::find_entity_views(view_type => "HostSystem",
                                        properties => ['name']) ||
                                        die "Failed to get HostSystems: $!";

# Fetch all Datastores from SDK, limiting the property set
my $datastore_views = Vim::find_entity_views(view_type => "Datastore",
                                        properties => ['name']) ||
                                        die "Failed to get Datastores: $!";

# Create hash tables with key = entity.mo_ref.value          
my %host_map = map { $_->get_property('mo_ref.value') => $_ } @{ $host_views || [] };
my %ds_map = map { $_->get_property('mo_ref.value') => $_ } @{ $datastore_views || [] };


#--- The correlation between a custom field ID and its name is only found in
#--- the customFields manager
my $sc = Vim::get_service_content();
my $customFieldsMgr = Vim::get_view( mo_ref => $sc->customFieldsManager );

# Create hash table with key = keyName => value
my %keys_map = map { $_->name => $_->key } @{ $customFieldsMgr->field || [] };

# Enumerate VirtualMachines
printf ("#vm;onHost;dataStore;noMonitoring;ownerCustomer;ownerTech\n");
foreach my $vm ( @{$vm_views || []} ) {
  # Get HostSystem from the host map
  my $host_ref = $vm->get_property('runtime.host')->{'value'};
  my $host = $host_map{$host_ref};

  # Get array of datastore moref values
  my @ds_refs = map($_->{'value'}, @{$vm->get_property('datastore') || []});

  # Get array of datastore entities from the datastore map by slicing %ds_map
  my @datastores = @ds_map{@ds_refs};

  # Map the custom field values to a hash
  my %cVals = map { $_->key => $_->value } @{$vm->summary->customValue || []} ;

  my $noMonitoring = "";
  my $ownerCustomer = "";
  my $ownerTech = "";

  $noMonitoring = $cVals{$keys_map{"noMonitoring"}} if (defined($cVals{$keys_map{"noMonitoring"}}));
  $ownerCustomer = $cVals{$keys_map{"ownerCustomer"}} if (defined($cVals{$keys_map{"ownerCustomer"}}));
  $ownerTech = $cVals{$keys_map{"ownerTech"}} if (defined($cVals{$keys_map{"ownerTech"}}));

  printf("%s;%s;%s;%s;%s;%s;\n",
    $vm->get_property('name'),
    $host->get_property('name'),
    join(',', map($_->get_property('name'), @datastores) ),
    $noMonitoring,
    $ownerCustomer,
    $ownerTech
  );

}

# Disable SSL hostname verification for vCenter self-signed certificate
BEGIN {
$ENV{PERL_LWP_SSL_VERIFY_HOSTNAME} = 0;
}

[2] check_metaMonitoring_vmWare

#!/bin/bash

## -----------------------------------------------
# Script: check_metaMonitoring_vmWare
# Author: magnus.luebeck@kmggroup.ch
# Date: 2013-05-20
#
# Description: This script will check if your VMs are monitored
#              in your Op5-environment.
#
## -----------------------------------------------

this_dir=$(cd `dirname $0`;pwd)
live_path=$(awk '/broker_module.*live/ { print $NF}' /opt/monitor/etc/nagios.cfg)


thresholdWarning=0
thresholdCritical=10

#-- initialize the counters, so the threshold comparisons below work
#-- even when a category stays empty
numHosts=0
numMonitoredHosts=0
numNotMonitoredHosts=0
numNotMonitoredWithGoodExcuseHosts=0
numNotMonitoredWithoutExcuseHosts=0
hostsToOutput=""

OLD_IFS=$IFS
IFS='
'

checkHostExist(){
  curHost=$1

unixcat <<EOT $live_path
GET hosts
Columns: host_name
Filter: host_name = $curHost
EOT
 
}

#-- field 4 is the noMonitoring field
for row in $($this_dir/getVMsAndCustomAttributes.pl --server=192.168.2.30 --username=op5 --password=op5 | grep -v "^#")
do
  #echo $row
  IFS=";"
  set $row
  hostName=$1
  IP=$2
  dataStores=$3
  noMonitoring=$4
  ownerCustomer=$5
  ownerTech=$6

  result=$(checkHostExist $hostName)

  (( numHosts += 1 ))

  [ -n "$result" ] && { echo "$hostName is monitored" 1>&2 ; (( numMonitoredHosts += 1 )) ; }
  [ -z "$result" ] && { echo "$hostName is NOT monitored" 1>&2 ; (( numNotMonitoredHosts += 1 )) ; }

  #--- the secret sauce - noMonitoring field is empty -> should be monitored
  [[ -n "$noMonitoring" && -z "$result" ]] && { echo "  - But does not have to: $noMonitoring" 1>&2 ; (( numNotMonitoredWithGoodExcuseHosts += 1 )) ; }
  [[ -z "$noMonitoring" && -z "$result" ]] && { echo "  - Should be monitored" 1>&2 ; (( numNotMonitoredWithoutExcuseHosts += 1 )) ; hostsToOutput="$hostsToOutput $hostName" ; }
 

done

[ $numNotMonitoredWithoutExcuseHosts -le $thresholdWarning ] && { retVal=0 ; retPrefix=OK ; }
[ $numNotMonitoredWithoutExcuseHosts -gt $thresholdWarning ] && { retVal=1 ; retPrefix=WARN ; }
[ $numNotMonitoredWithoutExcuseHosts -gt $thresholdCritical ] && { retVal=2 ; retPrefix=CRIT ; }


echo "$retPrefix - H: $numHosts M: $numMonitoredHosts !M: $numNotMonitoredHosts ok!M: $numNotMonitoredWithGoodExcuseHosts nok!M: $numNotMonitoredWithoutExcuseHosts"
[ -n "$hostsToOutput" ] && echo "Hosts: $hostsToOutput"
echo "| hosts=$numHosts monitored=$numMonitoredHosts notMonitored=$numNotMonitoredHosts okNotMonitored=$numNotMonitoredWithGoodExcuseHosts nokNotMonitored=$numNotMonitoredWithoutExcuseHosts"

exit $retVal