
2016-12-21

Keeping a process running with Flock and Cron

We've got a few processes here that aren't system services, but need to be running in the background and should be restarted if they die. The same method also works for a cron job that regularly runs past its normal re-execution time (say, an every-5-minutes cron that sometimes takes 7 minutes): it will prevent multiple copies from running simultaneously.

First off, in your crontab, you can add a line like this:

* * * * * flock -x -n /tmp/awesomenessRunning.lock -c "/usr/local/bin/myAwesomeScript.sh" >/dev/null 2>&1

What happens here is fairly straightforward:
  • Every minute, cron invokes flock, which executes your script, in this case "/usr/local/bin/myAwesomeScript.sh".
  • flock takes an exclusive write lock on the lock file, here named "/tmp/awesomenessRunning.lock". When the script finishes, it releases the lock.
  • The next time the cron fires, flock again tries to get an exclusive lock on that lock file... but it can't if the script is still running, so it gives up and tries again the next time the cron runs (see the quick demo below).
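
You can see that give-up behavior from a shell before trusting it to cron. A quick sketch, using the same lock file as above; with -n, flock exits with status 1 when it can't get the lock:
$ flock -x -n /tmp/awesomenessRunning.lock -c "sleep 60" &
$ flock -x -n /tmp/awesomenessRunning.lock -c "echo got it"; echo "exit code: $?"
exit code: 1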

Now, generally, if I'm doing this as a system-level item, I'll put the line in a file named for the job (or what the job does) and drop it in /etc/cron.d/. All the files there get pulled together into the system cron, which helps other admins (or your future self) find and disable it later. If you do that, remember to put the user the cron should run as between the *'s and the flock!
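
For example, a sketch of what such a file might look like (the filename is just descriptive; I'm assuming root is the right user here, swap in whoever should own the job):
# /etc/cron.d/awesomeness
* * * * * root flock -x -n /tmp/awesomenessRunning.lock -c "/usr/local/bin/myAwesomeScript.sh" >/dev/null 2>&1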

2015-06-08

Writing Upstart Scripts

Upstart scripts are a great way to deal with starting and stopping system daemons in Ubuntu and CentOS 6, as well as many other flavors of Linux. While later replaced by systemd, Upstart allows quite a bit of customization when creating init scripts, without as much of the hassle and bash programming knowledge required by init.d System V scripts. In this tutorial, I'll show you how to create one for the Kibana 4.x executable distributed by Elastic.co, the makers of ElasticSearch and LogStash. If you haven't messed with Kibana, I highly recommend it alongside LogStash and fluentd, but that's another tutorial. Any executable could be substituted, because Kibana itself is simply a binary distribution.

There are many extra potential pieces in Upstart scripts, but we're going to start with what we need for a simple start/stop script, one that runs on system boot and shuts down properly on the flip side. All Upstart init scripts live in the /etc/init/ directory and end with '.conf'. This one is simply "kibana.conf". Consider the following config:
# cat /etc/init/kibana.conf
description 'Kibana Service startup'
author 'Joe Legeckis'
env NAME='kibana'
env LOG_FILE=/var/log/kibana.log
env USER=nginx
env BASEDIR=/var/www/kibana/
env BIN=/var/www/kibana/bin/kibana
env PARAMS=''
start on started network
stop on stopping network
# Respawn in case of a crash, with default parameters
respawn
script
 # Make sure logfile exists and can be written by the user we drop privileges to
 touch $LOG_FILE
 chown $USER:$USER $LOG_FILE
 date >> $LOG_FILE
 cd $BASEDIR
 exec su -s /bin/sh -c 'exec "$0" "$@"' $USER -- $BIN $PARAMS >> $LOG_FILE 2>&1
end script

post-start script
 echo "app $NAME post-start event" >> $LOG_FILE
end script

It starts off with simple human-readable information about the job we're defining: the description and author. After that, you'll see a number of environment variables defined with "env NAME=value". These are sometimes used by the script/binary itself, and other times, as seen here, used by the Upstart script later on to execute said script/binary. In this case, we're defining the name, log file, user to run as, base directory where Kibana lives, the binary executed to run Kibana, and any additional parameters.

Next, we describe when it should start and stop. In this case, and in most cases, the following two lines will be accurate, but more detail can be provided if you need to chain things together, or make sure that something stops or starts according to another script's timing. Here we're simply starting up after the network starts successfully, and stopping before the network stops.

We then define what should happen after a crash by simply specifying that the job should respawn. Importantly, there is a default limit, which varies from system to system, that determines how many times a failed or prematurely exited job is respawned and whether any delay is enforced first. 'respawn' does not check the exit value of the script: if you didn't ask the job to stop, Upstart will run the respawn process and start it back up! It may be prudent to add a line after this:
respawn limit COUNT INTERVAL | unlimited

e.g.
respawn limit 5 10
The example line will restart the process up to 5 times, with 10 seconds between a failure and the next execution. This helps prevent rapid respawn loops. Specifying 0 (as in zero) for the count will restart it an unlimited number of times.
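
In kibana.conf that would sit right next to the bare respawn stanza, something like this (the count and interval are just an example, pick what makes sense for your service):
# Respawn in case of a crash, but give up if it dies 5 times within 10 seconds
respawn
respawn limit 5 10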

And now we get to the meat of the script. Between "script" and "end script" is where you put exactly what you want to run every time the daemon starts up. In this case, we make sure our log file exists, has the right permissions, and gets a fresh datestamp (handy when it dies and you're not sure when it started back up). Then we change to the base directory and run our executable, using the previously defined variables to fill out the command. It executes as the correct user and directs all of its output to the log file, including standard out, just in case. After that we close it up with "end script" and move on!

"post-start" is simply what happens after starting and before exiting upstart. You can use it for all sorts of things like tests or initial setup checks if need be, here we're just adding a line to the log file.

Now, as long as you've put this script in your /etc/init/ directory (NOT /etc/init.d/) as 'name.conf', or in our case 'kibana.conf', you can start it by running "initctl start name". Stop should work, as well as restart!
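
For our kibana.conf that means something like the following; status and the log file are handy for confirming the respawn settings behave the way you expect:
# initctl start kibana
# initctl status kibana
# initctl restart kibana
# initctl stop kibana
# tail -f /var/log/kibana.log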

2015-06-05

Resizing Online Disks in Linux with LVM and No Reboots

When we set this up, we only had a 16GB primary disk... plenty of space. Until you start to write lots of logs and data... and then it fills up quickly. So let's talk about how to resize an LVM-based partition on a live server without reboots... reboots are for Windows! This system is a CentOS 6.x machine running in VMWare 5.x that currently has a 16GiB VMDK-based drive. Let's see what we've got to work with:
$ df -h
Filesystem                                         Size  Used Avail Use% Mounted on
/dev/mapper/myfulldisk--vg-root                     12G   11G     0 100% /
none                                               4.0K     0  4.0K   0% /sys/fs/cgroup
udev                                               7.4G  4.0K  7.4G   1% /dev
tmpfs                                              1.5G  572K  1.5G   1% /run
none                                               5.0M     0  5.0M   0% /run/lock
none                                               7.4G     0  7.4G   0% /run/shm
none                                               100M     0  100M   0% /run/user
/dev/sda1                                          236M   37M  187M  17% /boot
/dev/sdc1                                          246G   44G  190G  19% /data

Hmm... time to get on this then. Now, luckily we're running in VMWare. A quick edit to our VM to enlarge the VMDK (not covered in this how-to) will fix this... First, what device are we talking about?

$ dmesg|grep sd
[    1.562363] sd 2:0:0:0: Attached scsi generic sg1 type 0
[    1.562384] sd 2:0:0:0: [sda] 33554432 512-byte logical blocks: (17.1 GB/16.0 GiB)
[    1.562425] sd 2:0:0:0: [sda] Write Protect is off
[    1.562426] sd 2:0:0:0: [sda] Mode Sense: 61 00 00 00
[    1.562460] sd 2:0:0:0: [sda] Cache data unavailable
[    1.562461] sd 2:0:0:0: [sda] Assuming drive cache: write through
[    1.563331] sd 2:0:0:0: [sda] Cache data unavailable
[    1.563451] sd 2:0:1:0: Attached scsi generic sg2 type 0
[    1.563452] sd 2:0:1:0: [sdb] 8388608 512-byte logical blocks: (4.29 GB/4.00 GiB)
[    1.563479] sd 2:0:1:0: [sdb] Write Protect is off
[    1.563481] sd 2:0:1:0: [sdb] Mode Sense: 61 00 00 00
[    1.563507] sd 2:0:1:0: [sdb] Cache data unavailable
[    1.563508] sd 2:0:1:0: [sdb] Assuming drive cache: write through
[    1.563755] sd 2:0:2:0: Attached scsi generic sg3 type 0
[    1.563881] sd 2:0:2:0: [sdc] 524288000 512-byte logical blocks: (268 GB/250 GiB)
[    1.563942] sd 2:0:2:0: [sdc] Write Protect is off
[    1.563944] sd 2:0:2:0: [sdc] Mode Sense: 61 00 00 00
[    1.564008] sd 2:0:2:0: [sdc] Cache data unavailable
[    1.564010] sd 2:0:2:0: [sdc] Assuming drive cache: write through
[    1.564282] sd 2:0:2:0: [sdc] Cache data unavailable
[    1.564283] sd 2:0:2:0: [sdc] Assuming drive cache: write through
[    1.564360] sd 2:0:1:0: [sdb] Cache data unavailable
[    1.564362] sd 2:0:1:0: [sdb] Assuming drive cache: write through
[    1.564989] sd 2:0:0:0: [sda] Assuming drive cache: write through
[    1.571010]  sdb: sdb1
[    1.571426] sd 2:0:1:0: [sdb] Cache data unavailable
[    1.571514] sd 2:0:1:0: [sdb] Assuming drive cache: write through
[    1.571626] sd 2:0:1:0: [sdb] Attached SCSI disk
[    1.574181]  sda: sda1 sda2 < sda5 >
[    1.574797] sd 2:0:0:0: [sda] Cache data unavailable
[    1.574888] sd 2:0:0:0: [sda] Assuming drive cache: write through
[    1.575003] sd 2:0:0:0: [sda] Attached SCSI disk
[    1.579250]  sdc: sdc1
[    1.579805] sd 2:0:2:0: [sdc] Cache data unavailable
[    1.579944] sd 2:0:2:0: [sdc] Assuming drive cache: write through
[    1.580141] sd 2:0:2:0: [sdc] Attached SCSI disk
[    6.922330] Adding 4193276k swap on /dev/sdb1.  Priority:-1 extents:1 across:4193276k FS
[    7.137134] EXT4-fs (sda1): mounting ext2 file system using the ext4 subsystem
[    7.142419] EXT4-fs (sda1): mounted filesystem without journal. Opts: (null)
[    7.218150] EXT4-fs (sdc1): mounted filesystem with ordered data mode. Opts: (null)
[    7.384566] Installing knfsd (copyright (C) 1996 okir@monad.swb.de).

The first one is the 16GB drive in question. Take the SCSI address on that line (2:0:0:0 here) and use it in the next step:

$ echo 1 > /sys/class/scsi_device/2\:0\:0\:0/device/rescan
$ dmesg |tail
[1918441.322362] sd 2:0:0:0: [sda] 209715200 512-byte logical blocks: (107 GB/100 GiB)
[1918441.322596] sd 2:0:0:0: [sda] Cache data unavailable
[1918441.330685] sd 2:0:0:0: [sda] Assuming drive cache: write through
[1918441.489622] sda: detected capacity change from 17179869184 to 107374182400

So, that's good, it sees our increased size. Now, let's enlarge that Volume Group. First, we get some info about the volume group.

$ vgdisplay
  --- Volume group ---
  VG Name               myfulldisk-vg
  System ID             
  Format                lvm2
  Metadata Areas        1
  Metadata Sequence No  3
  VG Access             read/write
  VG Status             resizable
  MAX LV                0
  Cur LV                2
  Open LV               2
  Max PV                0
  Cur PV                1
  Act PV                1
  VG Size               15.76 GiB
  PE Size               4.00 MiB
  Total PE              4034
  Alloc PE / Size       4028 / 15.73 GiB
  Free  PE / Size       6 / 24.00 MiB
  VG UUID               dv3URd-EVvz-oTwY-WiDW-RPt1-4rbD-FnPxxM

That '6' is only 24MiB; it doesn't see our new space yet. In order to get it to, we need to make a new partition of the right type, then add it to the volume group. We'll then end up with more free PEs. Here we go:
$ fdisk /dev/sda
Command (m for help): p

Disk /dev/sda: 107.4 GB, 107374182400 bytes
255 heads, 63 sectors/track, 13054 cylinders, total 209715200 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x000ade37

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *        2048      499711      248832   83  Linux
/dev/sda2          501758    33552383    16525313    5  Extended
/dev/sda5          501760    33552383    16525312   8e  Linux LVM

Command (m for help): n
Partition type:
   p   primary (1 primary, 1 extended, 2 free)
   l   logical (numbered from 5)
Select (default p): p
Partition number (1-4, default 3): 3
First sector (499712-209715199, default 499712): 33552384
Last sector, +sectors or +size{K,M,G} (33552384-209715199, default 209715199): 
Using default value 209715199

Command (m for help): t
Partition number (1-5): 3
Hex code (type L to list codes): 8e
Changed system type of partition 3 to 8e (Linux LVM)

Command (m for help): w
The partition table has been altered!

Calling ioctl() to re-read partition table.

WARNING: Re-reading the partition table failed with error 16: Device or resource busy.
The kernel still uses the old table. The new table will be used at
the next reboot or after you run partprobe(8) or kpartx(8)
Syncing disks.

At this point you could reboot... but we're not going to. Even though this is our root drive, which makes things a little trickier, it's nothing we can't fix:
$ partprobe /dev/sda

Hopefully, partprobe has found your new partition for you and enlightened the kernel with its wisdom (or at least a fresh load of zeros). Now we need to make the new partition available for expanding the disk. This consists of making it a 'Physical Volume', then adding that physical volume to the Volume Group containing the disk we want to expand.

$ pvcreate /dev/sda3
  Physical volume "/dev/sda3" successfully created
$ pvdisplay
  --- Physical volume ---
  PV Name               /dev/sda5
  VG Name               myfulldisk-vg
  PV Size               15.76 GiB / not usable 2.00 MiB
  Allocatable           yes 
  PE Size               4.00 MiB
  Total PE              4034
  Free PE               6
  Allocated PE          4028
  PV UUID               a3mhvZ-ogyk-ao4y-2JSM-KVfL-i9no-q0LAUk
   
  "/dev/sda3" is a new physical volume of "84.00 GiB"
  --- NEW Physical volume ---
  PV Name               /dev/sda3
  VG Name               
  PV Size               84.00 GiB
  Allocatable           NO
  PE Size               0   
  Total PE              0
  Free PE               0
  Allocated PE          0
  PV UUID               IpyjOU-1GDy-bTLL-U9kE-iSGP-BYg1-a25LIm
   
$ vgextend /dev/myfulldisk-vg /dev/sda3
  Volume group "myfulldisk-vg" successfully extended
$ vgdisplay
  --- Volume group ---
  VG Name               myfulldisk-vg
  System ID             
  Format                lvm2
  Metadata Areas        2
  Metadata Sequence No  4
  VG Access             read/write
  VG Status             resizable
  MAX LV                0
  Cur LV                2
  Open LV               2
  Max PV                0
  Cur PV                2
  Act PV                2
  VG Size               99.76 GiB
  PE Size               4.00 MiB
  Total PE              25538
  Alloc PE / Size       4028 / 15.73 GiB
  Free  PE / Size       21510 / 84.02 GiB
  VG UUID               dv3URd-EVvz-oTwY-WiDW-RPt1-4rbD-FnPxxM
Awesome, we now have 21510 free PEs that we can use... that's, apparently, 84.02GiB in this case. Next up, we need to know which logical volume in the VG to extend. Looking back up at the 'df' output, and knowing our system, it says "root" in there; a quick ls of /dev/myfulldisk-vg/ shows the only real choice is between "root" and "swap". So, knowing it's root, we move on with:
$ lvextend -L95G /dev/myfulldisk-vg/root
  Extending logical volume root to 95.00 GiB
  Logical volume root successfully resized
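
As an aside, if you'd rather hand the logical volume every free extent in the VG instead of picking a size, lvextend accepts extent counts with -l; something like this should do the same job (not what I ran here):
$ lvextend -l +100%FREE /dev/myfulldisk-vg/root
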
$ df -h
Filesystem                                         Size  Used Avail Use% Mounted on
/dev/mapper/myfulldisk--vg-root                     12G   11G  114M  99% /
none                                               4.0K     0  4.0K   0% /sys/fs/cgroup
udev                                               7.4G  4.0K  7.4G   1% /dev
tmpfs                                              1.5G  576K  1.5G   1% /run
none                                               5.0M     0  5.0M   0% /run/lock
none                                               7.4G     0  7.4G   0% /run/shm
none                                               100M     0  100M   0% /run/user
/dev/sda1                                          236M   37M  187M  17% /boot
/dev/sdc1                                          246G   44G  190G  19% /data
Okay, the logical volume might be bigger, but no one else knows that, because the filesystem ON it is still the same size. Luckily there's a command for that too!
$ resize2fs /dev/myfulldisk-vg/root
resize2fs 1.42.9 (4-Feb-2014)
Filesystem at /dev/myfulldisk-vg/root is mounted on /; on-line resizing required
old_desc_blocks = 1, new_desc_blocks = 6
The filesystem on /dev/myfulldisk-vg/root is now 24903680 blocks long.

$ df -h
Filesystem                                         Size  Used Avail Use% Mounted on
/dev/mapper/myfulldisk--vg-root                     94G   11G   79G  12% /
none                                               4.0K     0  4.0K   0% /sys/fs/cgroup
udev                                               7.4G  4.0K  7.4G   1% /dev
tmpfs                                              1.5G  576K  1.5G   1% /run
none                                               5.0M     0  5.0M   0% /run/lock
none                                               7.4G     0  7.4G   0% /run/shm
none                                               100M     0  100M   0% /run/user
/dev/sda1                                          236M   37M  187M  17% /boot
/dev/sdc1                                          246G   44G  190G  19% /data


Ha, there we go! 79GB available, enjoy!
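
One caveat before wrapping up: resize2fs works here because the root filesystem is ext-family. If your root were XFS instead (the CentOS 7 default, for instance), everything above stays the same but the final online grow step would be:
$ xfs_growfs /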

2014-01-03

WordPress asks for FTP Credentials

Being a modern Systems Administrator, I'm sometimes asked to manage things that throw me for a loop. WordPress is one of those things. It's both really simple and really complex, and sometimes not direct in its response to problems. I've noticed with a fresh WordPress install that when my users wanted to upload a new theme, they were presented with a normal 'upload' link: click the button, browse to your file, hit okay, then hit upload. All well and good. But then it prompted them for FTP or SFTP credentials. No error message, no reason why FTP would be needed in light of the previous, seemingly successful upload. We don't run FTP here in relation to WordPress, nor would I want to add that complexity to the setup or allow another potential access point for an attacker.

After digging a bit, the reason came down to my being a bit too aggressive about security and clamping down the web server user's permissions on the WordPress files too far. I found the following issues during troubleshooting:

  • The default install from a tarball doesn't make the wp-content/uploads directory. You've got to make it yourself.
  • The uploads dir must allow writes (for obvious reasons) by apache or www-user or whoever your web user is.
  • The target of your upload has to go somewhere... Theme uploads need apache to be able to write to wp-content/themes/, upgrades to wp-content/upgrades/, etc.

Fixing these issues made the FTP prompt stop showing up, and we were good to go.
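
For reference, a minimal sketch of the fix; the path and user here are assumptions for illustration (a tarball install in /var/www/wordpress with the web server running as apache), so adjust both for your setup:
# create the missing uploads directory and let the web user write where it needs to
mkdir -p /var/www/wordpress/wp-content/uploads
chown -R apache:apache /var/www/wordpress/wp-content/uploads /var/www/wordpress/wp-content/themes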

2013-12-10

Too many authentication failures for

Lately I've been getting this lovely error when trying to ssh to certain hosts (not all, of course):

# ssh ssh.example.com
Received disconnect from 192.168.1.205: 2: Too many authentication failures for 

My first thought is "But you didn't even ASK me for a password!" My second thought is "And you're supposed to be using ssh keys anyway!"

So, I decide I need to specify a specific key to use on the command line with the -i option.

# ssh ssh.example.com -i myAwesomeKey
Received disconnect from 192.168.1.205: 2: Too many authentication failures for 

Well, that didn't help. Adding a -v shows that it tried a lot of keys... including the one I asked it to. That, apparently, is the crux of the issue. ssh looks through the config file (of which mine is fairly extensive, as I deal with a few hundred hosts, most of which share a subset of keys, but not all of them), and it doesn't necessarily try the key I specified FIRST. So if you have more than, say, 5 keys defined, the server may cut you off before ssh ever offers the key you wanted, because ssh will offer anything from the config file. Yes, even if you have the keys defined per host. For instance, my config file goes something like this:

Host src.example.com
 User frank.user
 Compression yes
 CompressionLevel 9
 IdentityFile /home/username/.ssh/internal

Host puppet.example.com
 User john.doe
 Compression yes
 CompressionLevel 9
 IdentityFile /home/username/.ssh/jdoe


Apparently, this means ssh will try both of these keys for any host that isn't those two. If the third Host you define, "Host ssh.example.com" in our case, is the one you want, ssh will offer its key THIRD, even though that Host entry matches. The fix is simple: tack "IdentitiesOnly yes" in there. It tells ssh to apply ONLY the IdentityFile entries having to do with that host TO that host.

Host src.example.com
 User frank.user
 Compression yes
 CompressionLevel 9
 IdentitiesOnly yes
 IdentityFile /home/username/.ssh/internal

The side effect of the default behavior is that you don't have to define an IdentityFile line for EVERY host: ssh will offer all the keys it knows about to all of the Host entries in the config, and indeed to every ssh you attempt, listed or not. This is also why it didn't always fail; there was a good chance the first one or two keys in the list worked. It was only when the first 5 it tried didn't work that it failed.
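
If you'd rather flip that default once instead of per host, a catch-all block at the bottom of the config should do it (a sketch; each host then only offers keys from its own matching IdentityFile lines, or the standard ~/.ssh identities if it has none):

Host *
 IdentitiesOnly yes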

2013-11-20

Adding Swap Space in Linux Without a Reboot

So, let's say you've got a server running out of memory. Not just RAM, but swap too. Now, generally, there are a few well known ways to solve this issue.

  • Close/Kill processes you don't need
  • Reboot
  • Add another swap partition
  • Buy more RAM
  • Buy more Hardware

Now, in our scenario, the first option isn't helping, and the second is just the nuclear version of the first. We've got one huge process, and it's not all active memory... it's just consuming a lot of RAM and swap, and we want it to succeed. Buying more RAM is the best idea, but this server won't take any more, or we're not sure we'll have this workload often, so we can't justify spending money on more hardware. We've got to get creative before it fills up and gets OOM-killed. Adding another swap partition is a great idea, but we're out of available disk partitions or drives to throw at it. However, we do have some free space on an existing partition, so we can leverage that.

$ df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/md2               47G   11G   35G  23% /
/dev/hda1              99M   20M   74M  22% /boot

Alright, looking at top or vmstat, we know we've got 4GB of RAM in here and another 2GB of swap. Knowing the size of the process, we figure doubling that swap will give us plenty of overhead for the moment. Let's do this!

$ dd if=/dev/zero of=/newswap bs=32k count=64k

65536+0 records in
65536+0 records out
2147483648 bytes (2.1 GB) copied, 18.9618 seconds, 113 MB/s

$ ls -al /newswap
-rw-r--r-- 1 root root 2147483648 Nov 19 23:02 /newswap
$ mkswap /newswap
Setting up swapspace version 1, size = 2147479 kB
$ swapon /newswap

And that's it. A quick check should find that we now have another 2GB of swap space and a system that can breathe a little more.
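
A couple of quick ways to run that check; swapon -s lists each active swap area (the new /newswap file should show up alongside the original partition), and free shows the enlarged swap total:
$ swapon -s
$ free -m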

Note: The size of the swap space is determined by the size of the file. 'bs' is the block size and 'count' is the number of blocks. I generally stick to 32k or 64k block sizes and then adjust the count from there: 64k & 64k is 4GB, 64k & 128k is 8GB, etc.
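
If you want to sanity-check the math before kicking off dd, the shell can do it for you; this matches the 2GB file created above:
$ echo $((32*1024 * 64*1024))
2147483648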

Now, this won't stick after a reboot as-is. If you'd like it to, change the process a bit: it's the same up through the mkswap command, but instead of running swapon, open /etc/fstab in your favorite editor (vi /etc/fstab) and add another swap line, after the line for the disk the file lives on, like so:

/newswap         swap                    swap    defaults        0 0

Then you can run 'swapon -a' and it will enable ALL swap entries in the fstab.

Note: Swap automatically stripes across multiple swap partitions of the same priority. It might be useful to make swap partitions on multiple drives to allow for faster RAID-0 type speeds across drives!

Hope this helped someone out. I had to use it the other day and was able to save a long-running process that was eating up RAM like candy. It finished a few hours after I put this fix in place. Since I don't run that process often, I simply removed the line from /etc/fstab, and the next time the machine rebooted, it was back to its normal swap sizes. I then deleted the file and it was like nothing ever happened!