Galera Cluster - Different DB size on each node / Resync Node
Hi all,
I have some questions and hope you can give me some answers:
- I've noticed that the DB size is different on each node. For example:
root@db01:/var/lib/mysql# du -sh
32G .
vs.
root@db02:/var/lib/mysql# du -sh
33G .
Or:
db01 --> mysql> SELECT table_schema "database name",
-> sum( data_length + index_length ) / 1024 / 1024 "database size in MB",
-> sum( data_free )/ 1024 / 1024 "free space in MB"
-> FROM information_schema.TABLES
-> GROUP BY table_schema;
+--------------------+---------------------+------------------+
| database name      | database size in MB | free space in MB |
+--------------------+---------------------+------------------+
| db1                |          3.50000000 |       4.00000000 |
| information_schema |          0.17187500 |      88.00000000 |
| mysql              |          2.66300869 |       4.00000000 |
| performance_schema |          0.00000000 |       0.00000000 |
| db2                |         99.56250000 |      53.00000000 |
| db3                |      18880.06250000 |      72.00000000 |
| sys                |          0.01562500 |       0.00000000 |
+--------------------+---------------------+------------------+
7 rows in set (0.02 sec)
db02 --> mysql> SELECT table_schema "database name",
-> sum( data_length + index_length ) / 1024 / 1024 "database size in MB",
-> sum( data_free )/ 1024 / 1024 "free space in MB"
-> FROM information_schema.TABLES
-> GROUP BY table_schema;
+--------------------+---------------------+------------------+
| database name      | database size in MB | free space in MB |
+--------------------+---------------------+------------------+
| db1                |          3.50000000 |       4.00000000 |
| information_schema |          0.17187500 |      88.00000000 |
| mysql              |          2.64738369 |       4.00000000 |
| performance_schema |          0.00000000 |       0.00000000 |
| db2                |        100.54687500 |      55.00000000 |
| db3                |      20125.09375000 |      61.00000000 |
| sys                |          0.01562500 |       0.00000000 |
+--------------------+---------------------+------------------+
7 rows in set (0.02 sec)
How can this happen? Is there anything that is not in sync? db01 is the only Galera node where writes are performed.
- I've noticed that there are some GRA_*.log files in the data directory, and I've already read a bit about them (http://www.geoffmontee.com/ddl-failures-in-mariadb-galera-cluster/ and http://galeracluster.com/documentation-webpages/schemaupgrades.html#toi). I'm looking for a way to get the exact command that caused the problem so I can identify the reason. I found this article: https://www.percona.com/blog/2012/12/19/percona-xtradb-cluster-pxc-what-about-gra_-log-files/ but there is no header for 5.7.
In general, are those GRA files always a bad sign, and does this happen often on other clusters?
Because I found those files, I wanted to check the node with mysqlcheck, and that caused db02 to get kicked out of the cluster with this error message:
2018-12-28T13:49:45.473787Z 0 [Note] Plugin 'FEDERATED' is disabled.
2018-12-28T13:49:45.477692Z 0 [Note] Plugin mysqlx reported: 'Scheduler "work" started.'
2018-12-28T13:49:45.477769Z 0 [Note] Plugin mysqlx reported: 'Scheduler "network" started.'
2018-12-28T13:49:45.477835Z 0 [Note] InnoDB: Starting recovery for XA transactions...
2018-12-28T13:49:45.477841Z 0 [Note] InnoDB: Transaction 2585205 in prepared state after recovery
2018-12-28T13:49:45.477844Z 0 [Note] InnoDB: Transaction contains changes to 1 rows
2018-12-28T13:49:45.477847Z 0 [Note] InnoDB: Transaction 2585204 in prepared state after recovery
2018-12-28T13:49:45.477850Z 0 [Note] InnoDB: Transaction contains changes to 1 rows
2018-12-28T13:49:45.477852Z 0 [Note] InnoDB: 2 transactions in prepared state after recovery
2018-12-28T13:49:45.477861Z 0 [Note] Found 2 prepared transaction(s) in InnoDB
2018-12-28T13:49:45.477879Z 0 [ERROR] Found 2 prepared transactions! It means that mysqld was not shut down properly last time and critical recovery information (last binlog or tc.log file) was manually deleted after a crash. You have to start mysqld with --tc-heuristic-recover switch to commit or rollback pending transactions.
2018-12-28T13:49:45.477883Z 0 [ERROR] Aborting
and
root@db02:/var/lib/mysql# cat grastate.dat
# GALERA saved state
version: 2.1
uuid: 00000000-0000-0000-0000-000000000000
seqno: -1
safe_to_bootstrap: 0
So I deleted the whole /var/lib/mysql folder and started MySQL again, so the node was completely resynced. My question: is that the correct way to resync a node, and in general a good idea in this case?
And I would really like to know whether it's safe to run mysqlcheck on Galera.
- Last question: I read that you have to manually add users to new nodes, but in this case the node already had all users synced.
We're running Percona XtraDB Cluster 5.7.
Any help is appreciated.
-
Hi,
Regarding the GRA header for 5.7, you may try this one: https://www.dropbox.com/s/xeokdy4ok2cyc3o/GRA_header57.tar.gz?dl=0
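The decoding steps are the same as in the Percona post you linked, just with the 5.7 header. A rough sketch (the header file name and GRA file name below are placeholders, use your actual files):
# prepend the binlog header to the GRA writeset dump, then decode it with mysqlbinlog
cat GRA_header57 GRA_1_12345.log > GRA_1_12345-full.log
mysqlbinlog -vvv GRA_1_12345-full.log > GRA_1_12345-decoded.sql
The decoded output should contain the statement that was rejected, which answers your question about finding the exact command.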
Regarding mysqlcheck:
AFAIK, the mysqlcheck command is not replicated in Galera, and it uses explicit LOCK/UNLOCK TABLES statements during the operation, which are unsupported. Details on this can be found at https://mariadb.com/kb/en/library/mariadb-galera-cluster-known-limitations/ . The error log you've shown relates to an error while MySQL was starting up, not the actual reason why the node got kicked out of the cluster in the first place. If you could share the whole MySQL error log, that would be helpful.
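If you still want to run it, one pattern you could try (just a sketch, assuming wsrep_desync is available in your PXC version; I have not tested it on your setup) is to desync the node first and limit mysqlcheck to a check-only pass:
# hedged sketch: take the node out of flow control, run a check-only pass locally, resync
mysql -e "SET GLOBAL wsrep_desync=ON;"
mysqlcheck --all-databases --check   # check only; do not use --repair or --optimize on Galera
mysql -e "SET GLOBAL wsrep_desync=OFF;"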
Regarding node resync:
Yes, wiping out the /var/lib/mysql content (or simply deleting grastate.dat) is the best way to fully resync a node in Galera. Depending on the DB size, it can cause a performance impact on the donor node, so handle it with care.
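For reference, the procedure is usually something like this (a sketch; it assumes the data directory is /var/lib/mysql and the service is called mysql, adjust to your setup):
# force a full SST on the joining node - destructive, run it only on the node to be resynced
systemctl stop mysql
rm -rf /var/lib/mysql/*        # wipe the datadir contents (or only: rm /var/lib/mysql/grastate.dat)
systemctl start mysql          # the node rejoins and pulls a full SST from a donor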
Regarding adding users to new nodes:
When a node rejoins and resyncs via SST, it should receive all the users from the node it synced from. If you add a user via CREATE USER/GRANT statements, the user will be synced by Galera automatically. I'm not sure what you mean by manually adding users to a new node; one case I can think of is adding a user via an INSERT INTO statement against the mysql.user table, which requires FLUSH PRIVILEGES and won't be synced by Galera because that table uses the MyISAM storage engine.
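To illustrate the difference (user name, host and password below are just placeholders):
# replicated to all nodes by Galera (DDL statements go through TOI)
mysql -e "CREATE USER 'app'@'%' IDENTIFIED BY 'secret';"
mysql -e "GRANT SELECT, INSERT ON db3.* TO 'app'@'%';"
# NOT replicated: direct DML on the MyISAM mysql.user table stays local to one node
mysql -e "UPDATE mysql.user SET Host='10.0.104.%' WHERE User='app'; FLUSH PRIVILEGES;"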
Regards,
Ashraf -
Hi Ashraf,
Thanks for your help.
Yesterday the same node got kicked out again. We do a full mysqldump on that node every hour (command: /usr/bin/mysqldump --defaults-file=/etc/mysql/my.cnf --opt --single-transaction --skip-lock-tables --skip-add-locks --set-gtid-purged=OFF --routines --triggers --all-databases > destination.sql ).
Is that command right?
Here are the errors on that node, before it got kicked out:
2019-01-24T11:01:02.875232Z 8 [ERROR] Slave SQL: Could not execute Write_rows event on table db3.UserEvent; Cannot add or update a child row: a foreign key constraint fails (`db3`.`UserEvent`, CONSTRAINT `UserEvent_ibfk_1` FOREIGN KEY (`eid`) REFERENCES `Event` (`uuid`) ON DELETE CASCADE), Error_code: 1452; handler error HA_ERR_NO_REFERENCED_ROW; the event's master log FIRST, end_log_pos 241, Error_code: 1452
2019-01-24T11:01:02.875247Z 8 [Warning] WSREP: RBR event 3 Write_rows apply warning: 151, 12370423
2019-01-24T11:01:02.875331Z 8 [Note] WSREP: Applier statement rollback needed
2019-01-24T11:01:02.875394Z 8 [Warning] WSREP: Failed to apply app buffer: seqno: 12370423, status: 1
at galera/src/trx_handle.cpp:apply():353
Retrying 2th time
2019-01-24T11:01:02.875537Z 8 [ERROR] Slave SQL: Could not execute Write_rows event on table db3.UserEvent; Cannot add or update a child row: a foreign key constraint fails (`db3`.`UserEvent`, CONSTRAINT `UserEvent_ibfk_1` FOREIGN KEY (`eid`) REFERENCES `Event` (`uuid`) ON DELETE CASCADE), Error_code: 1452; handler error HA_ERR_NO_REFERENCED_ROW; the event's master log FIRST, end_log_pos 241, Error_code: 1452
2019-01-24T11:01:02.875544Z 8 [Warning] WSREP: RBR event 3 Write_rows apply warning: 151, 12370423
2019-01-24T11:01:02.875608Z 8 [Note] WSREP: Applier statement rollback needed
2019-01-24T11:01:02.875636Z 8 [Warning] WSREP: Failed to apply app buffer: seqno: 12370423, status: 1
at galera/src/trx_handle.cpp:apply():353
Retrying 3th time
2019-01-24T11:01:02.875753Z 8 [ERROR] Slave SQL: Could not execute Write_rows event on table db3.UserEvent; Cannot add or update a child row: a foreign key constraint fails (`db3`.`UserEvent`, CONSTRAINT `UserEvent_ibfk_1` FOREIGN KEY (`eid`) REFERENCES `Event` (`uuid`) ON DELETE CASCADE), Error_code: 1452; handler error HA_ERR_NO_REFERENCED_ROW; the event's master log FIRST, end_log_pos 241, Error_code: 1452
2019-01-24T11:01:02.875758Z 8 [Warning] WSREP: RBR event 3 Write_rows apply warning: 151, 12370423
2019-01-24T11:01:02.875917Z 8 [Note] WSREP: Applier statement rollback needed
2019-01-24T11:01:02.875951Z 8 [Warning] WSREP: Failed to apply app buffer: seqno: 12370423, status: 1
at galera/src/trx_handle.cpp:apply():353
Retrying 4th time
2019-01-24T11:01:02.876072Z 8 [ERROR] Slave SQL: Could not execute Write_rows event on table db3.UserEvent; Cannot add or update a child row: a foreign key constraint fails (`db3`.`UserEvent`, CONSTRAINT `UserEvent_ibfk_1` FOREIGN KEY (`eid`) REFERENCES `Event` (`uuid`) ON DELETE CASCADE), Error_code: 1452; handler error HA_ERR_NO_REFERENCED_ROW; the event's master log FIRST, end_log_pos 241, Error_code: 1452
2019-01-24T11:01:02.876081Z 8 [Warning] WSREP: RBR event 3 Write_rows apply warning: 151, 12370423
2019-01-24T11:01:02.876135Z 8 [Note] WSREP: Applier statement rollback needed
2019-01-24T11:01:02.889001Z 8 [ERROR] WSREP: Failed to apply trx: source: 4827da4a-0223-11e9-be7f-87eaa7558c60 version: 4 local: 0 state: APPLYING flags: 1 conn_id: 41296663 trx_id: 5250016608 seqnos (l: 10841157, g: 12370423, s: 12370422, d: 12370320, ts: 3259055077710382)
2019-01-24T11:01:02.889019Z 8 [ERROR] WSREP: Failed to apply trx 12370423 4 times
2019-01-24T11:01:02.889023Z 8 [ERROR] WSREP: Node consistency compromised, aborting...
2019-01-24T11:01:02.895259Z 8 [Note] WSREP: turning isolation on
2019-01-24T11:01:02.896259Z 8 [Note] WSREP: Closing send monitor...
2019-01-24T11:01:02.896269Z 8 [Note] WSREP: Closed send monitor.
2019-01-24T11:01:02.896276Z 8 [Note] WSREP: gcomm: terminating thread
2019-01-24T11:01:02.896287Z 8 [Note] WSREP: gcomm: joining thread
2019-01-24T11:01:02.898108Z 8 [Note] WSREP: gcomm: closing backend
2019-01-24T11:01:03.197365Z 8 [Note] WSREP: (d41e767f, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers: tcp://10.0.104.13:4567 tcp://10.0.104.15:4567
2019-01-24T11:01:07.898246Z 8 [Note] WSREP: declaring node with index 0 suspected, timeout PT5S (evs.suspect_timeout)
2019-01-24T11:01:07.898264Z 8 [Note] WSREP: declaring node with index 1 suspected, timeout PT5S (evs.suspect_timeout)
2019-01-24T11:01:07.898272Z 8 [Note] WSREP: evs::proto(d41e767f, LEAVING, view_id(REG,2e409302,41)) suspecting node: 2e409302
2019-01-24T11:01:07.898278Z 8 [Note] WSREP: evs::proto(d41e767f, LEAVING, view_id(REG,2e409302,41)) suspected node without join message, declaring inactive
2019-01-24T11:01:07.898283Z 8 [Note] WSREP: evs::proto(d41e767f, LEAVING, view_id(REG,2e409302,41)) suspecting node: 4827da4a
2019-01-24T11:01:07.898287Z 8 [Note] WSREP: evs::proto(d41e767f, LEAVING, view_id(REG,2e409302,41)) suspected node without join message, declaring inactive
2019-01-24T11:01:07.898313Z 8 [Note] WSREP: Current view of cluster as seen by this node
view (view_id(NON_PRIM,2e409302,41)
memb {
d41e767f,0
}
joined {
}
left {
}
partitioned {
2e409302,0
4827da4a,0
}
)
2019-01-24T11:01:07.898341Z 8 [Note] WSREP: Current view of cluster as seen by this node
view ((empty))
2019-01-24T11:01:07.899501Z 8 [Note] WSREP: gcomm: closed
2019-01-24T11:01:07.899570Z 0 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 1
2019-01-24T11:01:07.899853Z 0 [Note] WSREP: Flow-control interval: [100, 100]
2019-01-24T11:01:07.899859Z 0 [Note] WSREP: Trying to continue unpaused monitor
2019-01-24T11:01:07.899863Z 0 [Note] WSREP: Received NON-PRIMARY.
2019-01-24T11:01:07.899866Z 0 [Note] WSREP: Shifting SYNCED -> OPEN (TO: 12370428)
2019-01-24T11:01:07.899881Z 0 [Note] WSREP: Received self-leave message.
2019-01-24T11:01:07.899887Z 0 [Note] WSREP: Flow-control interval: [0, 0]
2019-01-24T11:01:07.899890Z 0 [Note] WSREP: Trying to continue unpaused monitor
2019-01-24T11:01:07.899894Z 0 [Note] WSREP: Received SELF-LEAVE. Closing connection.
2019-01-24T11:01:07.899897Z 0 [Note] WSREP: Shifting OPEN -> CLOSED (TO: 12370428)
2019-01-24T11:01:07.900345Z 0 [Note] WSREP: RECV thread exiting 0: Success
2019-01-24T11:01:07.901703Z 8 [Note] WSREP: recv_thread() joined.
2019-01-24T11:01:07.901717Z 8 [Note] WSREP: Closing replication queue.
2019-01-24T11:01:07.901721Z 8 [Note] WSREP: Closing slave action queue.
2019-01-24T11:01:07.902133Z 8 [Note] WSREP: /usr/sbin/mysqld: Terminated.
We are only sending traffic to the first node; this is the second one. I only see errors on the second and third node. The third node is always used as the donor in such cases. Do you have an idea what the problem is and how we can solve it? Does this happen very often on other clusters?
With a normal MySQL installation we never had problems.
-
Hi,
Can you check if you have any tables without a primary key defined? I don't know which version you are running, but your issue seems related to this bug: https://jira.percona.com/browse/PXC-1566 .
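To double-check, a query along these lines lists user tables that have no PRIMARY KEY (a sketch, run on any node):
mysql -e "
SELECT t.table_schema, t.table_name
FROM information_schema.tables t
LEFT JOIN information_schema.table_constraints c
  ON c.table_schema = t.table_schema
 AND c.table_name   = t.table_name
 AND c.constraint_type = 'PRIMARY KEY'
WHERE t.table_type = 'BASE TABLE'
  AND t.table_schema NOT IN ('mysql','information_schema','performance_schema','sys')
  AND c.constraint_type IS NULL;"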
Since you mentioned that only the first node receives all the traffic, that node is the best candidate for a donor if you want to resync everything again.
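If you want to force it, you can pin the donor with wsrep_sst_donor (a sketch; 'db01' is a placeholder and must match the donor node's wsrep_node_name):
# on the joining node, add to the [mysqld] section of its my.cnf:
#   wsrep_sst_donor=db01
# then restart that node so the next SST is taken from the chosen donor
systemctl restart mysql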
Regards,
Ashraf -
No, we do not have any tables without primary keys. Our version is:
Percona-XtraDB-Cluster-server 5.7.23-23-57
Percona XtraDB Cluster (GPL), Release rel23, Revision f5578f0, WSREP version 31.31, wsrep_31.31
So I think it's quite up to date.
How can I choose which node to use as the donor? The node is running with binary logging enabled and is shown as Master in ClusterControl. Would I have to shut down nodes 3 and 2 (so only node 1 is left) and then let node 2 resync?
-
Hi M Mueller,
Your issue is mainly data inconsistency. Galera does have data inconsistency detection while a transaction is being processed. You can read the FAQ here: http://galeracluster.com/documentation-webpages/faq.html. Possible scenarios include a split brain (see https://severalnines.com/blog/how-recover-galera-cluster-or-mysql-replication-split-brain-syndrome), or, for instance, wsrep_on having been set to OFF on one of the nodes.
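To rule out the wsrep_on scenario, you can quickly check each node (a sketch):
# run on every node: wsrep_on should be ON, the cluster Primary, the node Synced
mysql -e "SHOW GLOBAL VARIABLES LIKE 'wsrep_on'; SHOW GLOBAL STATUS LIKE 'wsrep_cluster_status'; SHOW GLOBAL STATUS LIKE 'wsrep_local_state_comment';"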
I suggest you identify the most consistent node and fix or sync the affected table. You can use mysqldump for this, or the pt-table-checksum tool from Percona (see this post: https://www.percona.com/blog/2012/10/15/using-pt-table-checksum-with-percona-xtradb-cluster/). Once you have identified the most correct and consistent node, you can sync from that node to the affected node(s).
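A minimal pt-table-checksum run could look like this (a sketch; host, user and password are placeholders, the checksum user needs the privileges described in the Percona post, and your Percona Toolkit version needs to support --recursion-method=cluster):
# run from one node; --recursion-method=cluster discovers the other Galera nodes
pt-table-checksum h=db01,u=checksum_user,p=secret \
  --databases=db3 \
  --recursion-method=cluster
Any DIFFS in the output point at the tables you need to resync from the consistent node.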