I am currently wondering whether it makes sense for us to keep using Cluster Control, given its poor stability. At the moment I am somewhat disappointed with the product. I know it is Open Source, offered free of charge and freely available to everyone, and I know this always takes a big effort from the developers!
Here are some more details to explain my opinion:
I am using cmon 1.1.30 at the moment, and I have tried to upgrade with each new version:
1 - On every node, I continually get messages in /var/log/cmon.log such as:
Jun 20 09:38:11 : (INFO) Checking if there is a MySQL Server running @ 127.0.0.1
Jun 20 09:38:20 : (WARNING) Query select p.pidfile, p.exec_cmd, p.process, p.id as pid, p.hid, p.pgrep_expr from processes p where p.hid=2 and p.active=1 and p.cid=1 failed: Error: Unknown column 'p.pgrep_expr' in 'field list'
First, I don't know why I get this warning. Second, I can't understand why this INFO message is there: it is useless in my opinion, and it fills the log to the point that it is the only thing I see.
2 - Very often, in the nodes' logs and even in the cmon.log of the server that hosts the Cluster Control MySQL server and web interface, we have:
Jun 20 09:42:45 : (WARNING) Could not open /proc/573/stat file
Jun 20 09:42:45 : (WARNING) Could not open /proc/647/stat file
Jun 20 09:42:45 : (WARNING) Could not open /proc/883/stat file
Jun 20 09:42:45 : (WARNING) Could not open /proc/936/stat file
In my case, these messages have been there since we started using the tool (so probably since 1.1.16 or 1.1.18, if I remember correctly). Why? I don't know.
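To show how much of the log these two kinds of entries take up, the lines can be tallied by severity and message. The following is only a minimal sketch: it assumes the line layout seen in the excerpts above, and the `summarize` helper and its regular expression are hypothetical, not part of cmon.

```python
import re
from collections import Counter

# Assumed line layout, taken from the excerpts above:
#   "Jun 20 09:42:45 : (WARNING) Could not open /proc/573/stat file"
LINE_RE = re.compile(r"^\w{3}\s+\d+\s+[\d:]+\s+:\s+\((\w+)\)\s+(.*)$")


def summarize(lines):
    """Count log entries by (severity, message), with digit runs masked."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skip lines that do not follow the assumed layout
        severity, message = match.groups()
        # Mask numbers so /proc/573/stat and /proc/647/stat group together.
        counts[(severity, re.sub(r"\d+", "N", message))] += 1
    return counts


sample = [
    "Jun 20 09:38:11 : (INFO) Checking if there is a MySQL Server running @ 127.0.0.1",
    "Jun 20 09:42:45 : (WARNING) Could not open /proc/573/stat file",
    "Jun 20 09:42:45 : (WARNING) Could not open /proc/647/stat file",
]

for (severity, message), n in summarize(sample).most_common():
    print(n, severity, message)
```

Run against a real /var/log/cmon.log (by passing an open file object instead of `sample`), this would show which messages dominate the log.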
3 - Also very often, I find that the cmon agent on the nodes or on the Cluster Control server is dead: the process is no longer running, and there are no log entries at all, which makes it impossible to know the cause of this 'crash'.
4 - Graph/RRD problems:
I have lost count of the times my graphs have stopped working and/or the cron job that generates the graphs has produced weird log entries. Below is a sample:
ERROR: /var/lib/cmon//cluster_1_mysql_192.168.0.1|3306_stats.rrd: found extra data on update argument: 0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:3009621:0:0:0:0:18:3:21:1:86492:86492:0:0:0:0:93:357:33:3:0:0:0:0:0:0:0:0:0:826393:4:11:38029:0:0:0:0:0:0:4:0:0:4:0:0:2:0:434863:223124655:38134:37751134
ERROR: /var/lib/cmon//cluster_1_mysql_192.168.0.5|3306_stats.rrd: found extra data on update argument: 0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:2485179:0:0:0:0:22:3:25:1:88011:88011:0:0:0:0:93:357:33:3:0:0:0:0:0:0:0:0:0:826393:0:0:36750:2:0:0:0:0:0:4:0:0:4:0:0:2:0:443950:229454199:36848:35546852
ERROR: /var/lib/cmon//cluster_1_mysql_192.168.0.10|3306_stats.rrd: found extra data on update argument: 0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:8874516:0:0:0:0:31:5:101:0:155268:155268:0:0:0:0:93:357:33:3:0:0:0:0:0:0:0:0:0:826393:1:1:750726:1:0:0:1:0:0:4:0:0:4:0:0:2:0:76965:73500161:751278:380746069
ERROR: /var/lib/cmon//cluster_1_stats.rrd: expected 9 data source readings (got 1) from N
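Both kinds of error look like a mismatch between the number of values passed to the RRD update and the number of data sources defined in the .rrd file, in one case too many and in the other too few. The sketch below only illustrates that kind of mismatch. It assumes the usual `timestamp:value1:value2:...` update format (with `N` meaning "now"); the `check_update` helper is hypothetical and is neither how cmon nor how rrdtool performs the check.

```python
def check_update(expected_ds, update_arg):
    """Compare the value count in an update argument with the expected DS count."""
    # The first field is the timestamp (or "N"); the rest are readings.
    _timestamp, *values = update_arg.split(":")
    got = len(values)
    if got > expected_ds:
        return f"too many values: expected {expected_ds}, got {got}"
    if got < expected_ds:
        return f"too few values: expected {expected_ds}, got {got}"
    return "ok"


print(check_update(3, "N:1:2:3"))  # counts match
print(check_update(2, "N:1:2:3"))  # more readings than data sources
print(check_update(9, "N"))        # no readings at all
```

If this reading is right, the .rrd files and the statistics being collected are out of step, for example after an upgrade that changed the set of collected counters.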
5 - Would it be possible to have (or does there already exist) a detailed ChangeLog of what has been changed/upgraded/added/deleted/fixed in every new version, please?
One more question: is there a way to remove clustercontrol's authorisation to stop/start mysql/Galera on the nodes, or to reduce the number of commands it is authorised to run on remote hosts? I have no visibility into what Cluster Control can do or is doing on the nodes, and I really find the product too buggy at the moment. I would like it to be able only to collect data to fill the web interface, the graphs and the mysql database, and not to stop/start services on remote nodes.
I am at your disposal to discuss these points, or to provide more information if needed, with the aim of improving the product.