2014-03-10

LogStash ElasticSearch Index Cleanup

LogStash is a great way to track logs from lots of different sources and store them in a central location where metrics and monitoring can occur. I've started pushing LOTS of data into our setup, which uses the ElasticSearch back end. To quote their site, "ElasticSearch is a flexible and powerful open source, distributed, real-time search and analytics engine," and I think it has a really bright future... but currently it's soaking up a lot of disk space. I'm sure I'm not the only one with this issue; after all, when something can handle LOADS of data, you want to give it all you've got! We've got 3 hosts running ElasticSearch processes, each with 250GB of data storage, and sometimes one will start to fill up. Looking into the API, I found it's really, REALLY easy to delete old data to keep the size within requested parameters. First off, LogStash's ElasticSearch output plugin notes that by default LogStash indexes are named "logstash-%{+YYYY.MM.dd}". Keeping that in mind, the following info would work for anything as long as you know the indexes you want to delete, but let's start off simple.

curl -s -XDELETE 'http://127.0.0.1:9200/logstash-2014.02.28'

That'll delete the "logstash-2014.02.28" index. I've had to connect in and do this sometimes. Great to do when you need it on demand, but we can do better. Assuming that I'm cool keeping the last 7 days up there, let's write a quick bash script:

#!/bin/bash
DATETODELETE=`date +%Y.%m.%d -d '7 days ago'`
curl -s -XDELETE  http://127.0.0.1:9200/logstash-${DATETODELETE}
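
If the script misses a few runs you can end up with stragglers older than the cutoff. A loop over dates sweeps those up too. This is a dry-run sketch that just prints the curl commands; it assumes GNU date, the default logstash-%{+YYYY.MM.dd} naming, and an arbitrary 30-day look-back horizon:

```shell
#!/bin/bash
# Dry run: print a delete command for each index older than KEEP_DAYS.
# 30 is an arbitrary horizon; nonexistent indexes just return an error.
KEEP_DAYS=7
for i in $(seq $((KEEP_DAYS + 1)) 30); do
    INDEX="logstash-$(date +%Y.%m.%d -d "${i} days ago")"
    echo "curl -s -XDELETE http://127.0.0.1:9200/${INDEX}"
done
```

Piping the echoed commands through sh (or dropping the echo) turns the dry run into the real thing.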

Now, we could put that in the crontab, have it run once or twice a day and be good to go... And if you knew you could ALWAYS keep 7 days' worth of data on your system, that'd be acceptable. But let's have some more fun. Let's assume that we want to keep as much as we can on our system while still keeping 10% of the space free, and that the drive we store this on is mounted on /data.

#!/bin/bash
# Desired free space: ~10% of a 250GB (decimal) volume, in df's 1K blocks
# (250e9 bytes * 0.10 / 1024 ≈ 24,414,000)
DESIRED=24410000
# -P keeps df's output on one line even for long device names
AVAIL=`df -P /data|grep -v Filesystem|awk '{print $4}'`
if [ $AVAIL -lt $DESIRED ]
then
    curl -s -XDELETE 127.0.0.1:9200/`curl -s 127.0.0.1:9200/_stats?pretty|grep logstash|sort|awk -F'"' '{ print $2 }'|head -n1`
fi

Let's explain this sample a bit... First off, we set DESIRED to be the amount of "Available" space we desire the system to retain. In our case above, I calculated 10% of a 250GB drive and put that in. So if the free space ever goes below 10% remaining (90%+ used), the if statement will fire.

Next, I pull the Available space. If you take what's in the backquotes and put it on a command line, you'll see what happens. I run df, limited to just the filesystem I care about, I get rid of the line with the labels and then awk pulls out the 4th column (Avail). This number gets stored in AVAIL and we move on.
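
To see exactly what that pipeline captures, you can feed it a canned df line (the numbers below are made up for illustration):

```shell
# Canned two-line df output; column 4 is "Available" (in 1K blocks).
DF_SAMPLE='Filesystem     1K-blocks      Used Available Use% Mounted on
/dev/sdb1      244140544 219726490  24414054  90% /data'
# Drop the header line, then pull column 4.
AVAIL=$(echo "$DF_SAMPLE" | grep -v Filesystem | awk '{print $4}')
echo "$AVAIL"    # prints 24414054
```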

The if statement then compares the two: if AVAIL is less than DESIRED, we are bumping our limit and something's got to give, so we run the curl... This curl is actually a combination of two. Starting inside out, we do a "curl -s 127.0.0.1:9200/_stats?pretty" which prints out a list of indexes and a bunch of cool stats about them... then we grep for logstash to get rid of all the cool stats and just keep the names, then we use sort to make sure they're in order of oldest to newest (since the names contain dates like 2014.03.04, that works), and then we use some awk magic to pull out JUST the name of the index and get rid of the other chars 'pretty' uses. That then gets placed back in the right place for the outer curl to execute a delete on it, and bye, bye index!
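
Here's the inner pipeline run against a few canned lines shaped like the "pretty" _stats output, so you can watch the oldest index fall out the bottom (the index names are made up):

```shell
# Three canned index-name lines as they appear in _stats?pretty output.
STATS_SAMPLE='    "logstash-2014.03.04" : {
    "logstash-2014.03.02" : {
    "logstash-2014.03.03" : {'
# grep keeps the names, sort puts oldest first, awk strips the quoting.
OLDEST=$(echo "$STATS_SAMPLE" | grep logstash | sort | awk -F'"' '{ print $2 }' | head -n1)
echo "$OLDEST"   # prints logstash-2014.03.02
```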

If you put this in your crontab and run it often (it won't do anything if the drive has more than the desired available space remaining), you'll be able to maintain free space on your ElasticSearch hosts without having to set a hard limit on days to keep.

Thinking further into it, you can use the same script with different commands inside the if statement to keep free space on many other systems as well.
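
As a sketch, that skeleton looks like this, with the curl swapped for a placeholder; MOUNT, DESIRED, and the echo are all stand-ins for whatever system you're guarding (in the post's case MOUNT would be /data):

```shell
#!/bin/bash
# Generic free-space guard: run some cleanup when a mount dips below a floor.
MOUNT=/                # filesystem to watch (e.g. /data)
DESIRED=24410000       # free-space floor, in df's 1K blocks
# -P keeps each filesystem on one line even for long device names
AVAIL=$(df -P "$MOUNT" | grep -v Filesystem | awk '{print $4}')
if [ "$AVAIL" -lt "$DESIRED" ]; then
    echo "low space on $MOUNT"   # replace with your cleanup command
fi
```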

2014-03-05

Running Custom SNMPd Checks in CentOS 6

I've been fighting with a problem between CentOS 5.x and CentOS 6.x SNMPd configs. In CentOS 5.x we have two lines in /etc/snmp/snmpd.conf like the following:
exec 1.3.6.1.4.1.5001.100 mailq-check /usr/local/nagios/libexec/check_mailq -w2 -c4 -t7 
exec 1.3.6.1.4.1.5002.1 fscheck /bin/touch /opt/rocheck && /bin/touch /tmp/rocheck
They're simple commands that extend snmpd. In CentOS 5.x this form is "exec <return oid> <name> <command>", and once set up, the command or script's output will be returned via snmpd to those that request the MIB/OID. All well and good, but copying this into the standard snmpd.conf file from CentOS 6 gave me nothing. Running snmpd with verbose logging gave me nothing useful other than being able to see that it was being requested. Digging deeper, I found the following command and its output:
# snmpwalk -v 2c 10.93.90.209 -c rsprod .1.3.6.1.4.1.2021.8
UCD-SNMP-MIB::extIndex.1 = INTEGER: 1
UCD-SNMP-MIB::extIndex.2 = INTEGER: 2
UCD-SNMP-MIB::extNames.1 = STRING: 1.3.6.1.4.1.5001.100
UCD-SNMP-MIB::extNames.2 = STRING: 1.3.6.1.4.1.5002.1
UCD-SNMP-MIB::extCommand.1 = STRING: mailq-check
UCD-SNMP-MIB::extCommand.2 = STRING: fscheck
UCD-SNMP-MIB::extResult.1 = INTEGER: 1
UCD-SNMP-MIB::extResult.2 = INTEGER: 1
UCD-SNMP-MIB::extOutput.1 = STRING: mailq-check: No such file or directory
UCD-SNMP-MIB::extOutput.2 = STRING: fscheck: No such file or directory
UCD-SNMP-MIB::extErrFix.1 = INTEGER: 0
UCD-SNMP-MIB::extErrFix.2 = INTEGER: 0
UCD-SNMP-MIB::extErrFixCmd.1 = STRING: 
UCD-SNMP-MIB::extErrFixCmd.2 = STRING: 
Looking at this, it doesn't look like it's running my script "/usr/local/nagios/libexec/check_mailq -w2 -c4 -t7" but attempting to run the name "mailq-check". Of course, that isn't the script name+path, so it doesn't find it. It turns out that in CentOS 6, you don't use the OID portion:

exec mailq-check "/usr/local/nagios/libexec/check_mailq -w2 -c4 -t7 -Mpostfix"
exec fscheck "/bin/touch /opt/rocheck && /bin/touch /tmp/rocheck"

Which, when called with the same snmpwalk above, comes back as:

UCD-SNMP-MIB::extIndex.1 = INTEGER: 1
UCD-SNMP-MIB::extIndex.2 = INTEGER: 2
UCD-SNMP-MIB::extNames.1 = STRING: mailq-check
UCD-SNMP-MIB::extNames.2 = STRING: fscheck
UCD-SNMP-MIB::extCommand.1 = STRING: /usr/local/nagios/libexec/check_mailq -w2 -c4 -t7 -Mpostfix
UCD-SNMP-MIB::extCommand.2 = STRING: /bin/touch /opt/rocheck && /bin/touch /tmp/rocheck
UCD-SNMP-MIB::extResult.1 = INTEGER: 0
UCD-SNMP-MIB::extResult.2 = INTEGER: 0
UCD-SNMP-MIB::extOutput.1 = STRING: OK: mailq reports queue is empty|unsent=0;2;4;0
UCD-SNMP-MIB::extOutput.2 = STRING: 
UCD-SNMP-MIB::extErrFix.1 = INTEGER: 0
UCD-SNMP-MIB::extErrFix.2 = INTEGER: 0
UCD-SNMP-MIB::extErrFixCmd.1 = STRING: 
UCD-SNMP-MIB::extErrFixCmd.2 = STRING: 

Now, let me explain what all of these MIBs mean.
  • extIndex.#
    The Index number of the command
  • extNames.#
    The Name you gave the command. This is the first parameter you pass to exec
  • extCommand.#
    The full command you had SNMPd call to return you this info. Second parameter you passed to exec
  • extResult.#
    Exit Code from the command exec'd
  • extOutput.#
    Raw output of the command exec'd
  • extErrFix.#
    This is a bit that can be flipped by the client. Flipping it to 1 makes snmpd kick off the extErrFixCmd. Generally this is used to 'fix' an 'error' condition.
  • extErrFixCmd.#
    The command to be executed by the SNMPd server
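
On the monitoring side, the value you usually care about is extOutput. A live poll would be something like "snmpget -v 2c -c <community> <host> UCD-SNMP-MIB::extOutput.1"; here's a small sketch that pulls the check output out of a canned walk line the way a wrapper script might:

```shell
# One canned line of snmpwalk output for the mailq check.
WALK_LINE='UCD-SNMP-MIB::extOutput.1 = STRING: OK: mailq reports queue is empty|unsent=0;2;4;0'
# Strip everything up to and including "STRING: " to leave the raw output.
OUTPUT=${WALK_LINE#*STRING: }
echo "$OUTPUT"   # prints OK: mailq reports queue is empty|unsent=0;2;4;0
```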

2014-03-04

Restoring MS SQL Analysis Server Database or Cube with Specific ID

Restoring an MS SQL Analysis Services database in 2008 and later is really easy, but I recently found that it won't always result in the database having the ID you want it to. In my case, I needed to restore the database next to one that already existed. Upon doing so, it automatically changed the ID of the database to match the name I gave it. This is nice, but not what I wanted. I have a lot of jobs that refer to the database by its ID, and you can't change an ID once it's in place. Furthermore, the restore dialog doesn't give you the option to specify your ID either; however, the Script button does... XMLA scripts to the rescue!

Step Through It

  1. Open up Management Studio
  2. Connect to your Analysis Services Server
  3. Right Click on the Databases and hit "Restore..."
  4. When the Restore window comes up, fill in as much info as you can, but instead of hitting "OK", hit the "Script" button at the top. A new script will open behind your window. Close the window and go to the script, it should look something like this.
<Restore xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
  <File>\\Gibson\Awesome\AnalysisServices\Awesome_Data\Awesome_Data_20140201030032.abf</File>
  <DatabaseName>Awesome Data New</DatabaseName>
  <DatabaseID>Awesome ID</DatabaseID>
  <AllowOverwrite>true</AllowOverwrite>
</Restore>

Finally, add that DatabaseID line and hit F5 to run the restore!

When it's done you should have a newly restored Cube/Database, with the ID you were hoping for!