CREATE and INSERT Work but Not DROP or DELETE in MariaDB Galera Cluster
I am using MariaDB 10.3.10 with a 3-node cluster. I created a table and inserted data on node 1, and I could see the new table and the inserted data on nodes 2 and 3. But if I perform a DELETE or DROP TABLE on node 1, the table or data still exists on nodes 2 and 3.
3-node cluster. Last night, I inserted 1 GB of data into table_1 on node 1. The data synced to the other two nodes almost instantaneously.
Then I deleted all the data in table_1 by the following command on node 1:
delete from table_1;
All the data (1 GB, 7,266,722 rows) was deleted from table_1 on node 1; the record count there is zero. But all 7,266,722 rows still exist in table_1 on nodes 2 and 3. Then I tried to execute a new INSERT command on node 3 to insert a record into table_1.
The command hung, but when I queried table_1 on node 1, the record count was 1, i.e. the INSERT issued on node 3 had been applied on node 1. I queried table_1 on nodes 2 and 3 again: all the rows deleted on node 1 still existed there.
This morning I came back to the office to check the result. There is still 1 record in table_1 on node 1, but all 7,266,722 rows still exist on nodes 2 and 3. Then I tried to shut down MariaDB on both node 2 and node 3 with the following command:
systemctl stop mysql
The stop command hung, so I used 'kill -9 [process id]' to kill the mysqld process on both node 2 and node 3.
I then started MariaDB on nodes 2 and 3 with the following command:
systemctl start mysql
MariaDB started up successfully, and both node 2 and node 3 rejoined the cluster. The contents of table_1 were then synced from node 1 to nodes 2 and 3, i.e. only 1 record exists in table_1 on all nodes.
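As a side note, whether a restarted node has actually rejoined the cluster can be confirmed from the wsrep status variables; this is a minimal check using standard MariaDB statements, run on any node:

```sql
-- Number of nodes currently in the cluster component (should be 3 here)
SHOW GLOBAL STATUS LIKE 'wsrep_cluster_size';
-- Whether this node is part of the primary component
SHOW GLOBAL STATUS LIKE 'wsrep_cluster_status';
```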
Evaluating your scenario. Can you please let us know the exact steps you are carrying out to drop the table? It seems highly unlikely that a table dropped on one node of a Galera cluster would not be reflected on the other nodes in the cluster; that is just not the way Galera works. Have you taken a look at the wsrep_OSU_method variable while you are making these changes?
Please take a look at http://galeracluster.com/documentation-webpages/mysqlwsrepoptions.html#wsrep-osu-method and https://severalnines.com/blog/online-schema-upgrade-mysql-galera-cluster-using-rsu-method and let us know if this is the problem.
I dropped the table table_1 on node 1 and created the same table again, this time with fewer records: 159 MB, 1,036,245 rows. After deleting all the records with the following statement on node 1:
delete from table_1;
there are still 1,036,245 records in table_1 on nodes 2 and 3.
The only way I can get the nodes back in sync is to stop MariaDB on nodes 2 and 3 (using 'kill', because they cannot be stopped with 'systemctl stop mysql'), then start nodes 2 and 3 so they rejoin the cluster.
Simply speaking, may I interpret TOI and RSU as follows:
- TOI: a table update on one node will replicate to all the other nodes.
- RSU: a table update on one node will NOT replicate to the other nodes until wsrep_OSU_method is set back to TOI.
I have checked wsrep_OSU_method on all nodes. They are all TOI, the default value.
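For reference, the OSU method can be inspected and changed with standard SHOW/SET statements; a minimal check, run on each node:

```sql
-- Check the current schema-upgrade method on this node (default is TOI)
SHOW GLOBAL VARIABLES LIKE 'wsrep_OSU_method';

-- It can be changed globally, e.g. back to total-order isolation:
SET GLOBAL wsrep_OSU_method = 'TOI';
```

Note that, as mentioned below, this variable only governs schema changes (DDL), not DML such as DELETE.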
I have performed more testing on the record-deletion synchronization issue.
When deleting 11 MB of data from the table on one node and inserting a new record on another node, it took 13 seconds for all nodes to sync.
When deleting 22 MB of data from the table on one node and inserting a new record on another node, it took 3 minutes for all nodes to sync.
Yesterday, when I deleted 1 GB of data on one node, the nodes still had not synced after more than 8 hours!
After triggering the DELETE SQL command, I queried all the nodes and found that they all reported 'Synced' status:
wsrep_local_state_comment = Synced
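A few related wsrep status variables are worth checking alongside the state comment, since a node can report Synced while its applier queue is still draining a large write-set. A minimal check on each node:

```sql
-- Node state as reported by Galera
SHOW GLOBAL STATUS LIKE 'wsrep_local_state_comment';
-- Write-sets received but not yet applied; a large or growing value
-- means this node is lagging behind the cluster
SHOW GLOBAL STATUS LIKE 'wsrep_local_recv_queue';
-- Fraction of time replication was paused by flow control
SHOW GLOBAL STATUS LIKE 'wsrep_flow_control_paused';
```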
Can you please send the complete error logs from all three nodes as well as the my.cnf? The wsrep_OSU_method variable does not apply to changes to table data, only to schema upgrades such as ALTER, OPTIMIZE, etc. It would be much clearer where the problem lies if we could take a look at the error logs from all three nodes from while the issue was occurring.
The logs are too large to be pasted here.
I found these statements in the error log:
2018-12-04 11:37:19 2 [Note] The slave is applying a ROW event on behalf of a DELETE statement on table table_1 and is currently taking a considerable amount of time (61 seconds). This is due to the fact that it is scanning the table while looking up records to be processed. Consider adding a primary key (or unique key) to the table to improve performance.
Can you please create a ticket for this issue? We can discuss it there, and you will be able to attach the logs for further investigation. You can visit https://support.severalnines.com/hc/en-us and create a service request. The issue seems to be the missing primary key: the DELETE will do a full table scan every time it is applied and cause the cluster to stall. But we can only be sure from the error logs and the configuration of the nodes in the cluster. Hence, can you please create a ticket so we can discuss this in detail there.
After looking at the Galera documentation, I found that DELETEs on tables without a primary key are not supported. It is one of the known limitations of Galera. Please take a look at https://mariadb.com/kb/en/library/mariadb-galera-cluster-known-limitations/
- All tables should have a primary key (multi-column primary keys are supported). DELETE operations are unsupported on tables without a primary key. Also, rows in tables without a primary key may appear in a different order on different nodes.
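One way to find every table affected by this limitation is to query information_schema for base tables that lack a PRIMARY KEY constraint; this is a sketch (the excluded system schemas are the usual ones):

```sql
-- List user tables that have no PRIMARY KEY defined
SELECT t.table_schema, t.table_name
FROM information_schema.tables t
LEFT JOIN information_schema.table_constraints c
       ON c.table_schema     = t.table_schema
      AND c.table_name       = t.table_name
      AND c.constraint_type  = 'PRIMARY KEY'
WHERE t.table_type = 'BASE TABLE'
  AND t.table_schema NOT IN ('mysql', 'information_schema',
                             'performance_schema', 'sys')
  AND c.constraint_name IS NULL;
```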
So once you add a primary key to the table, the issue should be resolved.
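If table_1 has no natural key, a surrogate auto-increment column can serve as the primary key; a minimal sketch (the column name `id` is illustrative, not from the original schema):

```sql
-- Add a surrogate primary key; AUTO_INCREMENT backfills existing rows
ALTER TABLE table_1
  ADD COLUMN id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY;
```

In a Galera cluster, auto-increment offsets are managed per node (wsrep_auto_increment_control), so inserts on different nodes will not collide on this key.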