2013-09-23

Setting up Replication in MySQL of 3+ cluster nodes.

I've been running a multinode MySQL Replication Loop for a while and thought it'd be useful to write up how replication is synced between the peers. I'm writing this up as if there are three nodes, Adam, Ben and Charlie in a loop, but you can do this with any number of nodes. In our setup Adam and Ben are in our primary location, with Charlie sitting in our DR setup. This means Charlie is always on but doesn't get many queries on a regular basis. Adam and Ben have heartbeatd setup as a failover pair so that when Adam has a vacation (downtime) everything fails over to Ben and continues to run. We do this simply with a floating IP.

1. Decide which node to start with

Because of this setup, prior to doing a resync, it's important to connect to the node who currently has the active IP and start things from there. To find out who that is, connect to the nodes and run:
$ /sbin/ip addr
You'll see something like:
: eth0:  mtu 1500 qdisc pfifo_fast qlen 1000
    link/ether 00:50:56:82:29:01 brd ff:ff:ff:ff:ff:ff
    inet 192.168.1.56/24 brd 192.168.1.255 scope global eth0
    inet 192.168.1.59/24 brd 192.168.1.255 scope global secondary eth0
Note how this one has both the primary (eth0) and "secondary eth0" address on it. Also, we know our floating IP is 192.167.1.59, so we know it's here. Because of that, this node has the latest info and will thus be the beginning of our resync process.

2. Stop the Slaves

To stop the slaves, connect to mysql as follows:
$ mysql -p -u root

mysql> stop slave;
Query OK, 0 rows affected (0.01 sec)
I recommend going ahead and stopping the slave on both Adam and Charlie as well at this point.

3. Get a fresh mysqldump

To start the resync process, we need to take a dump of all databases with masterdata in a single-transation:
/bin/mysqldump -p -u root --all-databases --single-transaction --master-data | bzip2 > 20120912-midsync-`hostname`.sql.bz2
This will make sure that we get all the info we need in the backup. Without Master Data we'd have to remember when we took the backup when we restored it in order to not miss any updates, which is almost impossible when running this against an active loop. When this is done, transfer the backup from Adam to Ben. Then, stop the slaves around the circle.

4. Transfer dump to the next node in the circle

You can use scp or what ever you'd like. just get the backup over there.

5. Import the dump

Then, on the first target (Ben in this case), bunzip2 the file, and then import it:
$ mysql -p -u root < 20120912-midsync-adam.sql
Because we included Master Data, this will automatically set the master position to be correct to resume the slave... but don't do it yet.

6. Rinse, repeat

We're going to continue around the circle making backups of each, passing them onto the next and then restoring. in our case, we need a fresh backup of Ben, same way we did the original Adam backup. Transfer that one to Charlie. bunzip2 it, and import it with the MySQL command. If you have more nodes, you'll continue this around until all have fresh restores.
If you have multiple primary or online nodes, you'll want to 'start slave;' right after you import the backup. This will make sure they don't have stale data or data that will conflict with another node's data.
For us, we're going to take advantage of the fact that Charlie is in our DR site and thus doesn't have much written to it, but still we'll need to do this quickly.

7. The Last Hop

Okay, you've got all of the up to date, but we haven't made that last hop where 0 pulls from N. Instead of the normal restore process, we're going to switch it up a bit:
$ ls
20120912-BenBackup.sql.bz2
$ bunzip 20120912-BenBackup.sql.bz2
$ echo "reset master;" >> 20120912-BenBackup.sql
$ mysqp -u root -p < 20120912-BenBackup.sql
Adding on 'reset master' will reset the master files and counter. Now, bring up both this node's terminal and Adam's terminal up next to each other. Login to Adam's MySQL prompt and type in "CHANGE MASTER TO MASTER_LOG_FILE='', MASTER_LOG_POS=###;", but don't hit enter. On Charlie, at the MySQL prompt, type in 'show master status;' and hit enter. Now, copy the file name in the first data column over to your command on Adam. Then, move your cursor to the ###'s, and run the 'show master status' command again copying the number over and hitting enter as quick as you can accurately.

8. Gentlemen, start your slaves!

Once you have it set, run "start slave" on Adam, then on Charlie, and finally on Ben. It's important that this number matches.

9. Check your status

When the slave is started, it pulls any changes since the backups' restore (or master reset) from the node behind it (Charlie pulls from Ben, for instance). By working backward, we have a better chance of the replication circle staying in sync and once they're all up, you're almost done. Give it a minute and run "Show Slave Status\G" on each. You should see something similar to this:
mysql> show slave status\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 192.168.1.57
                  Master_User: repluser
                  Master_Port: 3306
                Connect_Retry: 10
              Master_Log_File: ben-bin.000022
          Read_Master_Log_Pos: 888045912
               Relay_Log_File: charlie-relay-bin.000002
                Relay_Log_Pos: 81957
        Relay_Master_Log_File: ben-bin.000022
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
              Replicate_Do_DB: 
          Replicate_Ignore_DB: 
           Replicate_Do_Table: 
       Replicate_Ignore_Table: 
      Replicate_Wild_Do_Table: 
  Replicate_Wild_Ignore_Table: 
                   Last_Errno: 0
                   Last_Error: 
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 888045912
              Relay_Log_Space: 82121
              Until_Condition: None
               Until_Log_File: 
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File: 
           Master_SSL_CA_Path: 
              Master_SSL_Cert: 
            Master_SSL_Cipher: 
               Master_SSL_Key: 
        Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error: 
               Last_SQL_Errno: 0
               Last_SQL_Error: 
  Replicate_Ignore_Server_Ids: 
             Master_Server_Id: 2
1 row in set (0.00 sec)