ClusterControl (CMON) on Galera Cluster not working
Hi,
I have installed CMON v1.1.33 from the .tar.gz file, following the instructions at
http://support.severalnines.com/entries/20613923-installation-on-an-existing-cluster
to monitor my test cluster (Percona XtraDB Cluster).
When I go to the web interface (Dashboard) I can see the cluster, but its status is "FAILURE - NO CONTACT".
When I switch to "Actions => View Cluster" I can see my two MySQL servers with the following status:
MySQL Status: OK
Galera Status: N/A
Server Stats: N/A
Host Status: OK (ping, CPU, uptime are OK)
Last Check: hours ago!
Under "View Graphs" no graphs are available, but the host stats tell me:
MySQL Status = unknown
Last Check = OK (the time is current here)
Host Stats = looks good
When I switch to the Dashboard (MySQL Servers),
everything seems to be OK (Top, Load, ... and graphs are available).
I have checked the communication between the control host and one of the MySQL servers with tcpdump and tshark.
The only SQL statements sent from the control host to the MySQL host are:
SET WSREP_ON=0
select 1
SET WSREP_ON=0
select 1
SET WSREP_ON=0
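For reference, the capture was along these lines (just a sketch; it assumes MySQL listens on the default port 3306, and the interface name will differ per host):
# capture traffic between the control host and one MySQL node (interface/port are assumptions)
tcpdump -i eth0 -s 0 -w /tmp/cmon-mysql.pcap 'tcp port 3306 and host 10.212.11.91'
# decode the captured packets as MySQL and print the statement text
tshark -r /tmp/cmon-mysql.pcap -d tcp.port==3306,mysql -T fields -e mysql.query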
System Details
ControlHost:
-----------------
OS: debian Squeeze
MySQL: Server version 5.5.25a-27.1
Cluster Hosts (2x)
-------------------
OS: debian Squeeze
MySQL: Percona XTRADB Cluster binaries (Percona-XtraDB-Cluster-5.5.24-23.6.342.Linux.x86_64)
(Server version: 5.5.24-23.6-log Percona XtraDB Cluster (GPL) 5.5.24-23.6, Revision 342, wsrep_23.6.r342)
Cluster Status from mysql
-----------------------------------
root@mysql-host-control:~# mysql -h 10.212.11.91 -u cmon -p
Enter password:
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 29
Server version: 5.5.24-23.6-log Percona XtraDB Cluster (GPL) 5.5.24-23.6, Revision 342, wsrep_23.6.r342
Copyright (c) 2000, 2011, Oracle and/or its affiliates. All rights reserved.
Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
mysql> show status like 'wsrep%';
+----------------------------+--------------------------------------+
| Variable_name | Value |
+----------------------------+--------------------------------------+
| wsrep_local_state_uuid | 32ca6938-f10f-11e1-0800-54b1dd29fa87 |
| wsrep_protocol_version | 4 |
| wsrep_last_committed | 8 |
| wsrep_replicated | 0 |
| wsrep_replicated_bytes | 0 |
| wsrep_received | 3 |
| wsrep_received_bytes | 253 |
| wsrep_local_commits | 0 |
| wsrep_local_cert_failures | 0 |
| wsrep_local_bf_aborts | 0 |
| wsrep_local_replays | 0 |
| wsrep_local_send_queue | 0 |
| wsrep_local_send_queue_avg | 0.000000 |
| wsrep_local_recv_queue | 0 |
| wsrep_local_recv_queue_avg | 0.000000 |
| wsrep_flow_control_paused | 0.000000 |
| wsrep_flow_control_sent | 0 |
| wsrep_flow_control_recv | 0 |
| wsrep_cert_deps_distance | 0.000000 |
| wsrep_apply_oooe | 0.000000 |
| wsrep_apply_oool | 0.000000 |
| wsrep_apply_window | 0.000000 |
| wsrep_commit_oooe | 0.000000 |
| wsrep_commit_oool | 0.000000 |
| wsrep_commit_window | 0.000000 |
| wsrep_local_state | 4 |
| wsrep_local_state_comment | Synced (6) |
| wsrep_cert_index_size | 0 |
| wsrep_causal_reads | 0 |
| wsrep_cluster_conf_id | 12 |
| wsrep_cluster_size | 3 |
| wsrep_cluster_state_uuid | 32ca6938-f10f-11e1-0800-54b1dd29fa87 |
| wsrep_cluster_status | Primary |
| wsrep_connected | ON |
| wsrep_local_index | 1 |
| wsrep_provider_name | Galera |
| wsrep_provider_vendor | Codership Oy <info@codership.com> |
| wsrep_provider_version | 2.1(r113) |
| wsrep_ready | ON |
+----------------------------+--------------------------------------+
39 rows in set (0.00 sec)
Any hints?
Kind regards
Stefan
Attachments: cmon_controller_log.txt, cmon.cnf
-
Hi Stefan,
We have not yet verified this on Percona, but can you try the following:
Edit cmon.cnf and set:
cmon_core_dir=/root/s9s-galera-2.1.0/
Save and exit.
Then do
mkdir -p /root/s9s-galera-2.1.0/mysql/config
Now, since this is a cluster, I trust the nodes have equivalent my.cnf files. The cmon controller needs to read a my.cnf for the Galera nodes in order to draw some conclusions about the data directories of the MySQL servers etc. (this requirement will go away very soon, but for now it is like this).
Copy one of the my.cnf files to /root/s9s-galera-2.1.0/mysql/config, so that you have:
$ ls /root/s9s-galera-2.1.0/mysql/config/my.cnf
/root/s9s-galera-2.1.0/mysql/config/my.cnf
(Please note the s9s-galera-2.1.0 directory can be called whatever you like, but it must have mysql/config underneath it.)
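Put together, the steps are roughly the following (a sketch only; adjust the cmon.cnf location, the core directory name and the source of the node's my.cnf to your own setup):
# in /etc/cmon.cnf on the controller host (assumed location):
cmon_core_dir=/root/s9s-galera-2.1.0/
# then on the controller host:
$ mkdir -p /root/s9s-galera-2.1.0/mysql/config
$ scp root@10.212.11.90:/path/to/my.cnf /root/s9s-galera-2.1.0/mysql/config/my.cnf
$ ls /root/s9s-galera-2.1.0/mysql/config/my.cnf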
How well the recovery works depends on how you have set up your Galera Cluster. Most tutorials on the internet are wrong, and as I said we have never tested on Percona XtraDB Cluster, but we do plan to support it quite soon. So your input is very much appreciated!
Try the above, let us know what happens, and then send a new set of logs.
If you can also provide the different my.cnf files that would be great too!
Thanks,
Johan
-
Hi Johan,
thanks for your quick response!
I have dropped the database and started from scratch, and have also done what you advised.
Additionally, I have stopped the cmon agents on the cluster hosts.
I have also put a symlink to the my.cnf file on the cluster hosts, as it is not in /etc;
according to the log, this seems to be a must:
Aug 31 16:23:01 : (INFO) ssh -q -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no -oNumberOfPasswordPrompts=0
-oIdentityFile=/root/.ssh/id_rsa root@10.212.11.90 "/bin/cat /etc//my.cnf" )
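For completeness, the symlink on each cluster host is simply this (the source path below is a placeholder; point it at wherever the node's real my.cnf lives):
$ ln -s /path/to/real/my.cnf /etc/my.cnf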
After that I tried to start the cmon controller:
/usr/local/cmon/sbin/cmon --config-file=/etc/cmon.cnf
but I got a segfault. I tried this several times (about 7x) and suddenly it was running.
I have attached the console output; the syslog file has the following messages:
Aug 31 16:37:45 mysql-host-control kernel: [ 1990.327832] cmon[925]: segfault at 7fc823128000 ip 0000000000429f97 sp 00007fc823125eb0 error 4 in cmon[400000+3e3000]
Aug 31 16:39:10 mysql-host-control kernel: [ 2075.448526] cmon[1255]: segfault at 0 ip 00007f1fb71e4de3 sp 00007f1fb5770ab0 error 6 in libstdc++.so.6.0.13[7f1fb717b000+f6000]
Aug 31 16:39:18 mysql-host-control kernel: [ 2082.959326] cmon[1374]: segfault at 7fd6a233d000 ip 0000000000429f97 sp 00007fd6a233aeb0 error 4 in cmon[400000+3e3000]
Aug 31 16:39:20 mysql-host-control kernel: [ 2084.996815] cmon[1420]: segfault at 7f5bdb6e7000 ip 0000000000429f97 sp 00007f5bdb6e4eb0 error 4 in cmon[400000+3e3000]
Aug 31 16:39:21 mysql-host-control kernel: [ 2086.586962] cmon[1467]: segfault at 7f586cca8000 ip 0000000000429f97 sp 00007f586cca5eb0 error 4 in cmon[400000+3e3000]
Aug 31 16:39:23 mysql-host-control kernel: [ 2087.746793] cmon[1511]: segfault at 7fa8501c3000 ip 0000000000429f97 sp 00007fa8501c0eb0 error 4 in cmon[400000+3e3000]
Aug 31 16:39:24 mysql-host-control kernel: [ 2088.890331] cmon[1555]: segfault at 7f451add9000 ip 0000000000429f97 sp 00007f451add6eb0 error 4 in cmon[400000+3e3000]
Aug 31 16:39:43 mysql-host-control kernel: [ 2108.513499] cmon[1669]: segfault at 7fcbafc86000 ip 0000000000429f97 sp 00007fcbafc83eb0 error 4 in cmon[400000+3e3000]
I will test this for a while and then activate the agents on the cluster hosts.
Thanks,
Stefan
-
I have now created core dumps.
It took 6 attempts to get cmon running (the first 5 attempts segfaulted).
I have tried to backtrace the core dump, but I think I would need a "debug" version of cmon with debug symbols.
Anyway, I have attached the results.
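For the record, the backtrace attempt was roughly this (a sketch; the core file name depends on your core_pattern setting and the binary path on your install):
$ gdb /usr/local/cmon/sbin/cmon /path/to/core
(gdb) bt full
Without debug symbols most frames only show up as "??", which is why a debug build would help.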
Once the controller is running, it seems to be stable; agent communication is working too.
regards
Stefan
-
Hi,
Thanks, I will look at them. I am currently setting up a Percona XtraDB Cluster. You are a bit in uncharted territory right now (normally people deploy using our deployment packages, and then we know the infrastructure, config settings, locations, etc.), but it should naturally work on an existing cluster too (it does for MySQL Cluster and Replication, but Galera lacks some testing in that area for now).
Best regards,
Johan