In this article, we will discuss some of the common scenarios and solutions for dealing with a MySQL master crash in a master-slave replication setup. We will assume that both servers use InnoDB as the storage engine and that replication is based on the binary log, using log file and position coordinates rather than GTIDs.
Scenario 1: The master crashes and no data changes on the slave
This is the simplest and most favorable scenario: the master server crashes, but nothing modifies the data on the slave. In this case, the slave can continue to serve read-only queries until the master is back online. Because the slave holds the same data the master had before the crash, no data loss or inconsistency occurs.
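While the master is down, it helps to make sure no stray writes reach the slave. A minimal sketch, assuming MySQL 5.7 or later for super_read_only (the host name and credentials are placeholders):

```bash
# Keep the surviving slave strictly read-only while the master is down.
# Host and credentials are placeholders; adjust to your environment.
mysql -h slave-host -u root -p -e "
  SET GLOBAL read_only = ON;
  SET GLOBAL super_read_only = ON;  -- 5.7+: blocks writes even from SUPER users
"
```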
To recover the master server, we need to perform the following steps (a command-level sketch follows the list):
Step 1: Stop the slave server and copy its database files over to the master.
Step 2: From the relay-log.info and master.info files in the slave's data directory, find the master binary log file and position that the slave had last executed before the master crashed.
Step 3: On the master, replay its binary logs from that position to the end of the logs (for example with the mysqlbinlog utility).
Step 4: Reset the slave server and restart replication from the last position executed by the slave before the master crashed.
This procedure is based on [this answer] from Server Fault.
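A rough command-level sketch of steps 2 through 4 (step 1, the file copy, depends on your environment; the paths, host names, log file names, and positions below are placeholders, and we assume the master's binary logs survived the crash):

```bash
# Step 2: on the stopped slave, read the last executed master coordinates.
cat /var/lib/mysql/relay-log.info   # shows e.g. mysql-bin.000042 and position 1234
cat /var/lib/mysql/master.info      # read position and connection settings

# Step 3: on the master (after copying the slave's data files over and starting
# mysqld), replay the surviving binary logs from that position to the end.
# --disable-log-bin keeps the replayed events out of the binary log, so the
# slave will not be sent them a second time.
mysqlbinlog --disable-log-bin --start-position=1234 \
    /var/lib/mysql/mysql-bin.000042 /var/lib/mysql/mysql-bin.000043 | mysql -u root -p

# Step 4: on the slave, reset replication and resume from the same coordinates.
mysql -u root -p -e "
  STOP SLAVE;
  RESET SLAVE;
  CHANGE MASTER TO
    MASTER_HOST='master-host',
    MASTER_LOG_FILE='mysql-bin.000042',
    MASTER_LOG_POS=1234;
  START SLAVE;
"
```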
Scenario 2: The master crashes and some data changes on the slave
This scenario is more complicated: the master server crashes and some data has changed on the slave. This can happen if the slave is configured to allow write operations or if maintenance tasks are performed on it. In this case, the slave may hold extra transactions that were never executed on the master, which can lead to data inconsistency and replication errors.
To recover the master server, we need to perform the following steps (example commands for the checksum and sync step follow the list):
Step 1: Stop the slave server and enable binary logging on it, if it is not already enabled.
Step 2: From the relay-log.info file in the slave's data directory, find the master binary log file and position that the slave had last executed before the master crashed.
Step 3: On the master, replay its binary logs from that position to the end of the logs.
Step 4: Use a tool such as pt-table-checksum or pt-table-sync from the Percona Toolkit to compare and synchronize the data between the master and the slave.
Step 5: Reset both servers and restart the replication from a consistent point.
This procedure is based on [this article] from ScaleGrid.
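A minimal sketch of how the Percona Toolkit tools in step 4 are typically invoked (the host name, user, password, and database are placeholders):

```bash
# Checksum the tables through the master; the checksum queries replicate to the
# slave, so any differences become visible there.
pt-table-checksum h=master-host,u=repl_user,p=secret --databases=mydb

# Show (--print) or apply (--execute) the changes needed to bring the slave back
# in line with the master, based on the recorded checksums.
pt-table-sync --replicate percona.checksums h=master-host,u=repl_user,p=secret --print
```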
Scenario 3: The master crashes and some data changes on both servers
This scenario is even more complex: both servers hold diverged data after the crash. This can happen if both servers are configured to allow write operations or if maintenance tasks are performed on both of them. In this case, each server may hold conflicting transactions that are not present on the other, which can lead to data inconsistency and replication errors.
To recover both servers, we need to choose one of the following options (a sketch of Option 1 appears after the list):
Option 1: Choose one server as the new master and discard or overwrite any changes made on the other server. This option may result in data loss, since changes made only on the discarded server are thrown away.
Option 2: Merge any changes made on both servers using a tool like pt-table-checksum or pt-table-sync. This option may require manual intervention and conflict resolution.
Option 3: Restore both servers from a backup, taken either before or after the crash. This option may require downtime, and the amount of data loss depends on when the backup was taken.
This procedure is based on [this answer] from Stack Overflow.
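As an illustration of Option 1, a hedged sketch of promoting the slave to be the new master and rebuilding the old master as its replica (host names, credentials, and log coordinates are placeholders, and the promoted server must have binary logging enabled):

```bash
# On the slave being promoted: stop and clear its replication state, then note
# its current binary log coordinates so the old master can replicate from them.
mysql -u root -p -e "STOP SLAVE; RESET SLAVE ALL; SHOW MASTER STATUS\G"

# On the old master, after restoring its data from the new master (discarding
# its own divergent changes): point it at the new master and start replicating.
mysql -u root -p -e "
  CHANGE MASTER TO
    MASTER_HOST='new-master-host',
    MASTER_USER='repl_user',
    MASTER_PASSWORD='secret',
    MASTER_LOG_FILE='mysql-bin.000101',
    MASTER_LOG_POS=4;
  START SLAVE;
"
```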
Conclusion
In this article, we have discussed some of the common scenarios and solutions for dealing with a MySQL master crash in a master-slave replication setup. We have seen that the recovery process depends on the replication mode, the data changes, and the backup strategy. We have also seen that some tools can help us to compare and synchronize the data between the servers. However, we should always test and verify the recovery procedure before applying it in a production environment.