Engineering For Zero Downtime – Advanced RDS Database Migrations

zero downtime engineering

Having more users means putting more thought into how we minimize disruption.


Scaling successfully brings challenges of its own. For Learnosity, one of the most important of these is engineering for zero downtime. It’s something we take very seriously, which is pretty clearly reflected in how we approach the design and upgrades of our systems.

One of the most difficult parts of any high-volume system to upgrade is the Database layer, as there are two ways to approach it:

  1. Easy way  have an outage and do a backup and restore of databases.
  2. The Right way  sync up a new database and switch over to it live.

Learnosity is built on Amazon’s Web Services (AWS) and their RDS service provides great tools that solve many of the core heavy lifting that our systems require. Many of the things that were painful are now problems that have been solved:

  • Backup and restore of DB’s
  • Creation of Read Replicas
  • Database encryption

However, there are still places where it’s necessary to dive a bit deeper, and I found one of those recently.

Migrating to Encrypted Volumes with Zero Downtime

With RDS, there is currently no built-in way to migrate from a non-encrypted to an encrypted volume so we had to go off-piste a bit.

Knowing that it is possible to migrate into RDS using MySQL/MariaDB standard replication – and that it is also possible to migrate out of RDS using the same technique  we figured it would be possible to do this internally between 2 differently configured RDS machines. It turns out we were correct, and here is how we did it.

Key components:

  • RDS Master DB  source-master
  • RDS Read Only Slave  source-slave
  • RDS New Master DB  target-master

Setup

For testing I set up the following machines:

RDS Master DB:

Name: source-master
Not encrypted
User: master
Pwd: password
DBName: test
MariaDB 10.0.24

RDS Read Only Replica:

Setup via AWS console, so has the same configuration as the RDS Master DB.

Name: source-slave

The RDS Target Master DB was configured as follows:

Name: target-master
Encrypted
User: master
Pwd: password
DBName: test
MariaDB 10.1.23

 

Instructions

  1. Wait for source-master and source-slave to both have the Available status.
  2. Open a MySQL connection to source-slave and run the following command so that it retains is logs for longer while doing this.:
    call mysql.rds_set_configuration('binlog retention hours', 24);
  3. On source-master create a replication user (with more secure passwords)
    CREATE USER 'repl’@‘%’ IDENTIFIED BY 'pass';
    GRANT REPLICATION SLAVE ON *.* TO 'repl’@’%';
  4. Open a MySQL connection to source-slave and run the commands:
    call mysql.rds_stop_replication;
    SHOW SLAVE STATUS;
  5. Note down the values in Master_Log_File and Read_Master_Log_Pos as shown in the snippet:
    Slave_IO_State:
    Master_Host: 10.0.0.189
    Master_User: rdsrepladmin
    Master_Port: 3306
    Connect_Retry: 60
    Master_Log_File: mysql-bin-changelog.000013
    Read_Master_Log_Pos: 1246
    Relay_Log_File: relaylog.000002
    Relay_Log_Pos: 918
    Relay_Master_Log_File: mysql-bin-changelog.000013
    Slave_IO_Running: No
    .....
    Master_SSL_Crlpath:
    Using_Gtid: Slave_Pos
    Gtid_IO_Pos: 0-1385452505-37
  6. Run a mysqldump from source-slave with the following command:
    mysqldump -h source-slave -u user -p password --single-transaction
      --routines --triggers --databases database1 database2 
      | gzip > source-slave-dump.sql.gz
  7. Import the DB dump to target-master:
    zcat source-slave-dump.sql.gz | mysql -h target-master -u user -p password
  8. Open a MySQL connection to target-master and set the source as source-master using the following command:
    CALL mysql.rds_set_external_master (
    'source-master.xxxxxxx.ap-southeast-2.rds.amazonaws.com'
    , 3306
    , 'repl'
    , 'pass'
    , 'mysql-bin-changelog.000013'
    , 1246
    , 0
    );
  9. Start replication by running on target-master and monitoring progress using:
    CALL mysql.rds_start_replication;
    SHOW SLAVE STATUS;
  10. You can now safely delete the source-slave machine
  11. You can now transition your writes over to the new master and proceed with near zero downtime (depending on your application)
  12. When you want to disconnect the target-master from the source-master use the following:
    CALL mysql.rds_stop_replication;
    CALL mysql.rds_reset_external_master;
  13. That’s it!

Mark Lynch is co-founder and CTO of Learnosity.

References


This post was posted in , by on