The JML Continuum: December 2013

2013-12-18

More OpenShift Oddities

I had to fight with OpenShift a bit more today to get my application up and running after a botched code push. Restarting from the website didn't work, and simply re-pushing the code with git didn't help either... so time to dig in. As you can see below, [node] being in brackets meant it wasn't really running; it was in the process of starting or stopping. In fact, it kept cycling quite frequently according to a tail -f on /nodejs/logs/node.log. So I decided I had to stop it restarting, but how?

[(app name).rhcloud.com (username)]\> ps aux
USER        PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
1313     240483  0.0  0.0 105068  3152 ?        S    17:02   0:00 sshd: (user)@pts/1
1313     240486  0.0  0.0 108608  2100 pts/1    Ss   17:02   0:00 /bin/bash --init-file /usr/bin/rhcsh -i
1313     249661  0.1  0.4 397100 35224 ?        Sl   17:08   0:00 /usr/bin/mongod --auth -f /var/lib/openshift/(user)/mongodb//conf/mongodb.conf run
1313     261473  5.5  0.0      0     0 ?        R    17:15   0:00 [node]
1313     261476  2.0  0.0 110244  1156 pts/1    R+   17:15   0:00 ps aux
1313     390906  8.1  0.2 1021240 20196 ?       Sl   Dec10 321:14 node /opt/rh/nodejs010/root/usr/bin/supervisor -e node|js|coffee -p 1000 -- server.js
[(app name).rhcloud.com (username)]\> kill 390906

That killed "supervisor", the process that re-spawns the node process. This is generally helpful, but today node kept dying and being re-spawned under a new PID, apparently more often than the gear could attempt to stop it. Unfortunately, now I couldn't restart it (rerunning the supervisor command from the ps output just gave me an error complaining about an Unhandled 'error' event in the supervisor script), so I decided to start the node service myself.

There are a few ways of doing this. You can go to your code and run 'node' or you can use gear start. But if you try gear start, well, it won't start if it thinks it's already running. After killing supervisor, the node process was not attempting to restart, but gear start didn't work either. I tried tricking it by clearing out the $OPENSHIFT_NODEJS_PID_DIR/cartridge.pid file, but that didn't work either... It did point out something I could use though.

[(appname).rhcloud.com (username)]\> gear stop
Stopping gear...
Stopping NodeJS cartridge
usage: kill [ -s signal | -p ] [ -a ] pid ...
       kill -l [ signal ]
Stopping MongoDB cartridge
[(appname).rhcloud.com (username)]\> gear start
Starting gear...
Starting MongoDB cartridge
Starting NodeJS cartridge
Application 'deploy' failed to start
An error occurred executing 'gear start' (exit code: 1)
Error message: Failed to execute: 'control start' for /var/lib/openshift/(username)/nodejs

For more details about the problem, try running the command again with the '--trace' option.

What I found interesting was that it apparently tried to pass the empty pid from the $OPENSHIFT_NODEJS_PID_DIR/cartridge.pid file along to kill, and kill didn't know what to do with that. In fact, kill returns a non-zero exit code if you don't tell it what to kill OR if you tell it to kill something that isn't there (the original issue). So instead of getting an 'okay' back when the gear script ran kill, it got a failure, and that meant problems for gear. So I thought that if I got something running on a PID that it COULD kill and put that PID in the file, it would kill it successfully and everything would be back to normal. The easiest thing I could think of was to add the '}' I'd forgotten to my script and run that.
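To check ahead of time whether a pid file holds something kill can actually act on, here is a minimal sketch. The function name is made up for illustration, and this is not how the gear scripts do it; it only relies on the fact that kill -0 sends no signal and just reports, through its exit status, whether the PID could be signaled.

```shell
# Hypothetical helper: succeed only if the pid file names a process we can signal.
pid_file_is_killable() {
    pid_file=$1
    # An unreadable or empty file leaves pid empty.
    pid=$(cat "$pid_file" 2>/dev/null)
    [ -n "$pid" ] || return 1
    # kill -0 delivers no signal; it fails if the PID does not exist
    # (or is not ours to signal).
    kill -0 "$pid" 2>/dev/null
}
```

Something like pid_file_is_killable "$OPENSHIFT_NODEJS_PID_DIR/cartridge.pid" || echo "stale or empty pid file" would flag both an empty file and a PID that is no longer running.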

The node code is stored in /app-deployments/<datestamp>/repo/, but don't expect things you put here to stick around.

\> node server.js 
^Z
\> ps aux
USER        PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
1313     240483  0.0  0.0 105068  3152 ?        S    17:02   0:00 sshd: (user)@pts/1
1313     240486  0.0  0.0 108608  2124 pts/1    Ss   17:02   0:00 /bin/bash --init-file /usr/bin/rhcsh -i
1313     275483  0.3  0.4 467788 36892 ?        Sl   17:24   0:01 /usr/bin/mongod --auth -f /var/lib/openshift/(user)/mongodb//conf/mongodb.conf run
1313     284292  2.5  0.6 732440 45924 pts/1    Sl   17:30   0:02 node server.js
1313     287036  2.0  0.0 110240  1156 pts/1    R+   17:32   0:00 ps aux
\> echo "284292" > $OPENSHIFT_NODEJS_PID_DIR/cartridge.pid

So, PID is in the file, and the PID is a valid running node process. Then I did my git commit of my fix, and ran git push... and it was back to normal!

Counting objects: 5, done.
Delta compression using up to 8 threads.
Compressing objects: 100% (3/3), done.
Writing objects: 100% (3/3), 344 bytes | 0 bytes/s, done.
Total 3 (delta 2), reused 0 (delta 0)
remote: Stopping NodeJS cartridge
remote: Stopping MongoDB cartridge
remote: Saving away previously installed Node modules
remote: Building git ref 'master', commit f5e40ef
remote: Building NodeJS cartridge
remote: npm info it worked if it ends with ok
...
remote: npm info ok 
remote: Preparing build for deployment
remote: Deployment id is aa38fed5
remote: Activating deployment
remote: Starting MongoDB cartridge
remote: Starting NodeJS cartridge
remote: Result: success
remote: Activation status: success
remote: Deployment completed with status: success

So, now that the PID was stable and correct, it seemed to deploy properly and I've had no troubles since!
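The same result can be had without the ^Z and ps aux hunt, since the shell keeps the PID of the last background job in $!. A minimal sketch, assuming a POSIX-style shell; start_and_record is a made-up name, not an OpenShift command:

```shell
# Hypothetical helper: run a command in the background and write its PID
# to a pid file.
start_and_record() {
    pid_file=$1
    shift
    # The remaining arguments are the command to run, e.g. node server.js
    "$@" &
    # $! holds the PID of the most recent background job.
    printf '%s\n' "$!" > "$pid_file"
}
```

On the gear that would look like start_and_record "$OPENSHIFT_NODEJS_PID_DIR/cartridge.pid" node server.js, run from the repo directory.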

2013-12-12

OpenShift: Solving the 'PID does not match' Error

OpenShift is a great free service (paid for larger requirements) for running your Java, Node.js, Ruby, Python and Perl apps in the cloud quickly and easily. Basically, for the uninitiated, it works like this:

Sign Up
Get assigned a git repo
Clone the repo locally
Put code in the repo
git commit and then git push

Upon pushing the code up, it will execute your server (sometimes you'll need to write a small config file to tell it what to run) and you're done. I'm using it for Node.js along with what they call a cartridge for MongoDB. It was just purring right along until yesterday, when I got this error during a git push:

remote: Stopping NodeJS cartridge
remote: Warning: Application '(appname)' nodejs PID (322361) does not match '$OPENSHIFT_NODEJS_PID_DIR/cartridge.pid' (14154
remote: 390925).  Use force-stop to kill.
remote: An error occurred executing 'gear prereceive' (exit code: 141)
remote: Error message: Failed to execute: 'control stop' for /var/lib/openshift/(username)/nodejs
remote: 
remote: For more details about the problem, try running the command again with the '--trace' option.
To ssh://(username)@(app name).rhcloud.com/~/git/(app name).git/
 ! [remote rejected] master -> master (pre-receive hook declined)
error: failed to push some refs to 'ssh://(username)@(app name).rhcloud.com/~/git/(app name).git/'

Well, that's annoying. I did find that I can connect in and manually restart the app by killing the running node pid (ps aux lists it, then kill (pid) kills it). Because they're running 'supervisor', it'll respawn the node process. However, the restart didn't actually pick up my git push either, so I was now re-running the old code. Not very handy. Of course, there's no way I can git push 'force-stop' and have it actually be valid, so I was left wondering what I could do to get back up and developing.

Turns out, it's not that hard to fix. Observe:

# ssh (username)@(app name).rhcloud.com
(app name) (username)]> ps aux
USER        PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
1313     315834  0.3  0.4 471888 34752 ?        Sl   21:51   0:02 /usr/bin/mongod --auth -f /var/lib/openshift/(username)/mongodb//conf/mongodb.
1313     319791  0.0  0.0 104916  3120 ?        S    21:52   0:00 sshd: (username)@pts/0
1313     319809  0.0  0.0 108608  2064 pts/0    Ss   21:52   0:00 /bin/bash --init-file /usr/bin/rhcsh -i
1313     322361  0.6  0.7 733380 57516 ?        Sl   21:53   0:03 node server.js
1313     360061  2.0  0.0 110240  1148 pts/0    R+   22:02   0:00 ps aux
1313     390906  8.1  0.1 1021104 9008 ?        Sl   Dec10 226:34 node /opt/rh/nodejs010/root/usr/bin/supervisor -e node|js|coffee -p 1000 -- server.js
(app name) (username)]> vi $OPENSHIFT_NODEJS_PID_DIR/cartridge.pid

Now, that will open up vi with the pid file in it. Your PIDs might vary, but you're going to want to delete whatever is in this file and put in the pid of the node process (the 'node server.js' line above). In my case, it was 322361. Once you put that in there and save it (ESC :wq <= for you non-vi types), you should be back in business. Run another git push and you should be back to your normal git output, something along these lines:

Counting objects: 8, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (6/6), done.
Writing objects: 100% (6/6), 679 bytes | 0 bytes/s, done.
Total 6 (delta 4), reused 0 (delta 0)
remote: Stopping NodeJS cartridge
remote: Stopping MongoDB cartridge
remote: Saving away previously installed Node modules
remote: Building git ref 'master', commit f9d21d1
remote: Building NodeJS cartridge
remote: npm info it worked if it ends with ok
remote: npm info using npm@1.2.17
remote: npm info using node@v0.10.5
...
remote: npm info ok 
remote: Preparing build for deployment
remote: Deployment id is a878ff76
remote: Activating deployment
remote: Starting MongoDB cartridge
remote: Starting NodeJS cartridge
remote: Result: success
remote: Activation status: success
remote: Deployment completed with status: success
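The vi edit can also be done non-interactively. A minimal sketch, assuming the app was started as exactly 'node server.js' as in the ps listing above; node_pid_from_ps is a hypothetical name:

```shell
# Hypothetical helper: read `ps aux` output on stdin and print the PID of the
# first process whose command line is exactly "node server.js".
# In ps aux output the command starts at field 11, so the supervisor line
# (node /opt/.../supervisor ...) does not match.
node_pid_from_ps() {
    awk 'NF == 12 && $11 == "node" && $12 == "server.js" { print $2; exit }'
}

# Usage on the gear (commented out here; only overwrite the pid file
# when a PID was actually found):
#   pid=$(ps aux | node_pid_from_ps)
#   [ -n "$pid" ] && echo "$pid" > "$OPENSHIFT_NODEJS_PID_DIR/cartridge.pid"
```

Checking that the result is non-empty before writing matters, because an empty cartridge.pid causes its own trouble.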

2013-12-10

Too many authentication failures for

Lately I've been getting this lovely error when trying to ssh to certain hosts (not all, of course):

# ssh ssh.example.com
Received disconnect from 192.168.1.205: 2: Too many authentication failures for

My first thought is "But you didn't even ASK me for a password!" My second thought is "And you're supposed to be using ssh keys anyway!"

So, I decide I need to specify the key to use on the command line with the -i option.

# ssh ssh.example.com -i myAwesomeKey
Received disconnect from 192.168.1.205: 2: Too many authentication failures for

Well, that didn't help. Adding a -v shows that it tried a lot of keys... including the one I asked it to. Apparently this is the crux of the issue. You see, ssh looks through the config file (mine is fairly extensive, as I deal with a few hundred hosts, most of which share a subset of keys, but not all of them), and it doesn't necessarily try the key I specified FIRST. So if you have more than, say, 5 keys defined, it may offer anything from the config file before the key you want, and the server cuts you off before it gets there. Yes, even if you have them defined per host. For instance, my config file goes something like this:

Host src.example.com
 User frank.user
 Compression yes
 CompressionLevel 9
 IdentityFile /home/username/.ssh/internal

Host puppet.example.com
 User john.doe
 Compression yes
 CompressionLevel 9
 IdentityFile /home/username/.ssh/jdoe

Apparently, this means ssh will try both of these keys for any host that isn't those two. If the third one you define, "Host ssh.example.com" in our case, is the one you want, it'll try that one THIRD, even though the Host line matches. The fix is simple: tack "IdentitiesOnly yes" in there. It tells ssh to offer ONLY the identities configured for that host (its IdentityFile entries, or a key passed with -i), rather than every key it has available.

Host src.example.com
 User frank.user
 Compression yes
 CompressionLevel 9
 IdentitiesOnly yes
 IdentityFile /home/username/.ssh/internal

The side effect of the default behavior (without IdentitiesOnly) is that you don't have to define an IdentityFile line for EVERY HOST. ssh will offer all the keys it knows about to all of the Host entries in the config, and indeed to every host you ssh to, listed or not. This is why it didn't always fail: there was a good chance the first one or two in the list worked. It was only when the first 5 it tried didn't work that it failed.
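To see how many keys ssh actually offers before the server hangs up, the -v output can be counted. A rough sketch: count_offered_keys is a made-up name, and the 'Offering ... public key' wording of the debug line is an assumption that varies between OpenSSH versions, so check it against your own -v output first.

```shell
# Hypothetical helper: count "Offering ... public key" debug lines read on stdin.
count_offered_keys() {
    # grep -c prints the number of matching lines (0 if none).
    grep -c 'Offering .*public key'
}

# Usage (commented out; ssh writes its debug output to stderr):
#   ssh -v ssh.example.com 2>&1 | count_offered_keys
#   ssh -v -o IdentitiesOnly=yes -i myAwesomeKey ssh.example.com 2>&1 | count_offered_keys
```

With IdentitiesOnly in effect, the second count should come out noticeably lower than the first if a pile of unrelated keys was the problem.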