Ansible: why did my remote script not start my service?

Introduction

Guest blogger: Andrin Linggi

Today we solved a tricky problem for a customer.

At a customer site we needed to start a Java program with a bash script that is stored on a remote server. The Java program runs in the background until another script kills it.

Starting the program with the script works fine when we run it locally on the remote server, and also when started over an ssh command.

user@remote-server $ /path/to/script.sh
user@ansible-server $ ssh remote-server /path/to/script.sh

But with Ansible and the shell module it didn't work. Ansible didn't report any error, and an application config file that the start script also creates was present on the remote server afterwards. Nevertheless, the expected process wasn't running.

Our first idea was that something had to be different between the two calls (locally and with Ansible). We checked environment variables, the classpath and folder handling in the bash script, but nothing helped…

To troubleshoot further, we added a pgrep at the end of the script to see whether the java process had started and was running. And… it was running for as long as the startup script was running!

someuser       1514  1.4 62.8 7435456 3820348 ?     Sl   Apr14 306:42 /path/to/java  ...
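A minimal sketch of that kind of check, assuming pgrep (from procps) is installed on the remote server. The function name check_started and its pattern argument are hypothetical and not taken from the customer's script:

```shell
#!/bin/bash
# check_started PATTERN: hypothetical helper, not part of the original script.
# Prints whether any process whose command line matches PATTERN exists.
check_started() {
    local pattern="$1"
    # -f matches against the full command line, not just the process name
    if pgrep -f "$pattern" >/dev/null; then
        echo "RUNNING: $pattern"
    else
        echo "NOT RUNNING: $pattern"
    fi
}
```

Calling check_started /path/to/java as the last line of the start script shows whether the process is alive at the moment the script exits. Keep in mind that pgrep -f can also match the calling script itself if the pattern appears in its own command line.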

Now it became clear that Ansible was killing the process. This is apparently a known feature of Ansible, and it has implications for how you need to write your start/stop scripts on Linux. See github

Ansible:

  • Kills any subprocesses still hanging around after it runs the shell module, but
  • Leaves processes started with nohup alone

Many sysadmins write start/stop scripts that start a program with:

/path/to/my/program >/dev/null 2>&1 &

This mostly works, because a standard shell or ssh session does not go around killing the processes you spawn off.

The proper way to do this, though, is to use nohup. It makes the background process ignore the hangup signal (SIGHUP) and, together with the redirections, breaks it away from the TTY and from the caller's output file handles.

nohup /path/to/my/program >/dev/null 2>&1 &
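As a sketch of how this could look inside a start script, here is a small wrapper. The function name start_detached is hypothetical, and redirecting stdin from /dev/null is an extra precaution we assume is harmless for most daemons; it is not something the original script did:

```shell
#!/bin/bash
# start_detached CMD [ARGS...]: hypothetical helper, not from the original script.
# Starts CMD in the background so that it survives the end of the calling shell.
start_detached() {
    # nohup makes the child ignore SIGHUP; the three redirections make sure
    # the child holds no reference to the caller's stdin, stdout or stderr.
    nohup "$@" </dev/null >/dev/null 2>&1 &
    # $! is the PID of the background job; nohup execs CMD, so this is CMD's PID.
    echo "$!"
}
```

A call such as pid=$(start_detached /path/to/my/program) returns right away, and the PID can be written to a pid file for the stop script to use.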

So, to solve our issue, we added nohup before the java call, and it worked as long as we did not pipe the stdout from the java process to anything else. But that would be too easy…

The next problem we had was that the java call was piped to logsave to handle the logs. nohup and pipes are not best friends: nohup only applies to the single command it is given, not to whatever sits on the other side of the pipe. One possible workaround is to inline a script in our script, and spawn off a shell that runs the java program and logsave together. Thanks Stack Overflow!

nohup $SHELL << EOF &
java blabla | logsave blabla
EOF
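A variation on the same idea, shown here only as a sketch and not as what we deployed: hand the whole pipeline to bash -c as one string, so that nohup wraps the shell that runs the pipe. The function name start_pipeline_detached is hypothetical:

```shell
#!/bin/bash
# start_pipeline_detached PIPELINE: hypothetical helper, not from the original script.
# PIPELINE is a complete shell pipeline passed as a single string.
start_pipeline_detached() {
    # nohup sets SIGHUP to "ignore" and then execs bash; that setting is
    # inherited by every command bash starts, so both sides of the pipe get it.
    nohup bash -c "$1" </dev/null >/dev/null 2>&1 &
    echo "$!"   # PID of the wrapping bash process, not of the individual commands
}
```

With the placeholders from above, the call would be start_pipeline_detached "java blabla | logsave blabla". Unlike the here-document version, stdin of the spawned shell is /dev/null rather than the inlined script, and quoting inside the string needs some care.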

Hopefully someone finds this explanation useful if they run into a similar situation.

References