cmon cannot start after power failure (CentOS 7)
I deployed an HA cluster using ClusterControl 5 days ago, but today the server was shut down twice due to a power issue.
The front end says:
Cluster details cannot be retrieved. Please check the CMON process status (service cmon status). Also, ensure the dcps.apis token matches the rpc_key in /etc/cmon.cnf.
Failed connect to 127.0.0.1:9500; Connection refused
Token is identical
netstat -tulnp|grep 9500
nothing is listening (no cmon daemon launched)
[root@ha-1 ~]# /usr/sbin/cmon -d
The Cmon Controller is initializing.
Aug 13 20:57:57 : (INFO) Loaded configuration file '/etc/cmon.cnf'.
Aug 13 20:57:57 : (INFO) Initializing the user manager.
Aug 13 20:57:57 : (INFO) User manager is connecting the Cmon Database.
Aug 13 20:57:57 : (INFO) Checking that the system users exist.
Aug 13 20:57:57 : (INFO) Starting user manager thread.
Aug 13 20:57:57 : (INFO) Server started at tcp://127.0.0.1:9500
Aug 13 20:57:57 : (INFO) Server started at tls://127.0.0.1:9501
Aug 13 20:57:57 : (INFO) Initializing controller manager.
Aug 13 20:57:57 : (INFO) Cmon HA is disabled, running in single mode.
Aug 13 20:57:57 : (INFO) Checking command handler.
Aug 13 20:57:57 : (INFO) CDT entry '/.runtime/controller_clock' found.
Aug 13 20:57:57 : (ERROR) The controller on host ha-1 with pid 11733 is already running (seen -17820s ago).
Aug 13 20:57:58 : (ERROR) The Cmon Controller is exiting...
Closed all, terminating. Bye.
[root@ha-1 ~]# ps -aux|grep cmon
nothing
[root@ha-1 ~]# ps -aux|grep 11733
nothing
How do I start it? It was supposed to be an HA cluster, but meh ...
-
Official comment
Hi,
Does /var/run/cmon.pid exist? If yes, what does it contain? Please remove it first, since cmon is not actually running on the host. It might be left over from the unclean shutdown during the power issue.
Regards,
Ashraf
-
There is no /var/run/cmon.pid.
An ack search finds PID 11733 only in an earlier CentOS audit log.
A MySQL dump searched with grep and ack shows that "11733" is in the mariadb/cmon/cmon_status table.
But cmon refuses to start. I tried killing almost all daemons and flushed most of the PID files in /var/run/.
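For anyone repeating the check above, here is a minimal sketch for confirming that the PID named in the log is really gone. The helpers pid_alive and describe_pid are hypothetical names for illustration, not part of ClusterControl, and the sketch assumes it is run as root.

```shell
#!/bin/sh
# pid_alive PID: hypothetical helper; succeeds if a process with that PID exists.
# kill -0 sends no signal, it only checks that the PID can be signalled,
# so run it as root to avoid false negatives from permission errors.
pid_alive() {
    kill -0 "$1" 2>/dev/null
}

# describe_pid PID: prints one line saying whether the PID is live or stale.
describe_pid() {
    if pid_alive "$1"; then
        echo "pid $1 is running"
    else
        echo "pid $1 is not running (stale reference)"
    fi
}
```

Calling describe_pid 11733 on the affected host would show whether the PID in the error message still belongs to a live process. If it does not, the "already running" message is coming from stored state rather than from an actual cmon process.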
-
After a forced fsck and a dozen reboots, I was finally able to start ClusterControl. Unfortunately, I don't know what exactly went wrong in the first place.
Now my paranoia is kicking in, and I am worried that ClusterControl will try to change the settings of the running pgsql hosts. I don't want ClusterControl deleting the psql /main dir on a running psql node or changing the master automatically, since I set them up manually while ClusterControl was unable to start.
How do I disable all auto-recovery and automatic master selection in ClusterControl before joining it back onto the network? Is there a way to do a dry run or to forbid ClusterControl from changing nodes?
For now I have disconnected ClusterControl from the network so it will not automatically change settings on the manually started, running PostgreSQL cluster.
Best regards,
Tenuun
-
If someone sees this issue, just add the line
enable_autorecovery=0
inside
/etc/cmon.cnf
https://severalnines.com/blog/installing-clustercontrol-standby-server
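A minimal sketch of making that change from the shell, so the line is updated if it already exists and appended otherwise. The function set_cnf_key is a hypothetical helper, not a ClusterControl tool, and it assumes GNU sed (as shipped on CentOS and Ubuntu) and a plain key=value file. Back up the file before trying it.

```shell
#!/bin/sh
# set_cnf_key FILE KEY VALUE: hypothetical helper for key=value config files.
# Replaces an existing "KEY=..." line, or appends "KEY=VALUE" if none exists.
set_cnf_key() {
    file=$1
    key=$2
    value=$3
    if grep -q "^${key}=" "$file"; then
        # Key already present: rewrite that line in place (GNU sed -i).
        sed -i "s|^${key}=.*|${key}=${value}|" "$file"
    else
        # Make sure the file ends with a newline before appending.
        if [ -n "$(tail -c1 "$file")" ]; then
            echo >> "$file"
        fi
        printf '%s=%s\n' "$key" "$value" >> "$file"
    fi
}
```

For the setting above, the call would be set_cnf_key /etc/cmon.cnf enable_autorecovery 0. The cmon service presumably has to be restarted before the new value takes effect.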
-
Found the error (CentOS 7).
The culprit was the time settings on MariaDB.
The timestamps in journalctl -u mariadb were wrong, and checking now() yielded the wrong time, so I updated MariaDB to use the system time.
Make sure you update the time with ntpd and then run hwclock --systohc.
Until the time is fixed, the CMON service will start but not work completely: netstat -tulpn | grep 9500 will be empty, no errors are given, and nothing is written to /var/log/cmon.log.
mysql> select now();
+---------------------+
| now()               |
+---------------------+
| 2017-10-26 15:13:16 |
+---------------------+
1 row in set (0.07 sec)

mysql> SET GLOBAL time_zone = 'SYSTEM';
Query OK, 0 rows affected (0.07 sec)

mysql> show global variables like '%time_zone%';
+------------------+---------------------+
| Variable_name    | Value               |
+------------------+---------------------+
| system_time_zone | India Standard Time |
| time_zone        | SYSTEM              |
+------------------+---------------------+
2 rows in set (0.18 sec)

mysql>
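A minimal sketch for checking whether the database and the operating system agree on the current time. It compares two 'YYYY-MM-DD HH:MM:SS' strings and prints the difference in seconds. The helper names to_epoch and skew_seconds are hypothetical, and the sketch assumes GNU date (for the -d option).

```shell
#!/bin/sh
# to_epoch 'YYYY-MM-DD HH:MM:SS': converts a local timestamp to epoch seconds.
# Relies on GNU date's -d option.
to_epoch() {
    date -d "$1" +%s
}

# skew_seconds DB_TIME SYS_TIME: prints DB_TIME minus SYS_TIME in seconds.
# A result far from zero means the database and the OS disagree on the time.
skew_seconds() {
    db=$(to_epoch "$1")
    sys=$(to_epoch "$2")
    echo $(( db - sys ))
}
```

On the controller host this could be called as skew_seconds "$(mysql -N -s -e 'SELECT NOW()')" "$(date '+%Y-%m-%d %H:%M:%S')", assuming the mysql client can log in without prompting. A large offset would be consistent with the negative "seen ... ago" values in the cmon log, although that link is an inference from this thread and not something the log states.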
-
Hi
I have the same problem, but on Ubuntu.
2023-12-14T17:00:07.435Z : (INFO) Thread UserManager is running (LWP 5079).
2023-12-14T17:00:07.436Z : (INFO) CmonRpcServerPrivate::log()
2023-12-14T17:00:07.436Z : (INFO) Server started at tcp://127.0.0.1:9500
2023-12-14T17:00:07.437Z : (INFO) CmonRpcServerPrivate::log()
2023-12-14T17:00:07.437Z : (INFO) Thread RpcServer:9500 is running (LWP 5080).
2023-12-14T17:00:07.437Z : (INFO) Server started at tls://127.0.0.1:9501
2023-12-14T17:00:07.437Z : (INFO) Checking command handler.
2023-12-14T17:00:07.437Z : (INFO) Thread RpcServer:9501 is running (LWP 5081).
2023-12-14T17:00:07.445Z : (INFO) CDT entry '/.runtime/controller_clock' found.
2023-12-14T17:00:07.445Z : (ERROR) The controller on host clustercontrol with pid 1268 is already running (seen -24920s ago).
2023-12-14T17:00:08.445Z : (ERROR) The Cmon Controller is exiting...
I have tried all the methods above, but it is still not working. Any idea? I restored my server to a known working restore point and it worked, but once I reboot the server, cmon stops working.
-
Since you have already filed a ticket, we'll take it from there.