cmon cannot start after power failure (CentOS 7)

Tenuun

August 13, 2019 21:05

I deployed an HA cluster using ClusterControl 5 days ago, but today the server was shut down twice due to a power issue.

The front end says:

Cluster details cannot be retrieved. Please check the CMON process status (service cmon status). Also, ensure the dcps.apis token matches the rpc_key in /etc/cmon.cnf.

Failed connect to 127.0.0.1:9500; Connection refused

Token is identical

netstat -tulnp|grep 9500

nothing is listening (no cmon daemon launched)

[root@ha-1 ~]# /usr/sbin/cmon -d

The Cmon Controller is initializing.
Aug 13 20:57:57 : (INFO) Loaded configuration file '/etc/cmon.cnf'.
Aug 13 20:57:57 : (INFO) Initializing the user manager.
Aug 13 20:57:57 : (INFO) User manager is connecting the Cmon Database.
Aug 13 20:57:57 : (INFO) Checking that the system users exist.
Aug 13 20:57:57 : (INFO) Starting user manager thread.
Aug 13 20:57:57 : (INFO) Server started at tcp://127.0.0.1:9500
Aug 13 20:57:57 : (INFO) Server started at tls://127.0.0.1:9501
Aug 13 20:57:57 : (INFO) Initializing controller manager.
Aug 13 20:57:57 : (INFO) Cmon HA is disabled, running in single mode.
Aug 13 20:57:57 : (INFO) Checking command handler.
Aug 13 20:57:57 : (INFO) CDT entry '/.runtime/controller_clock' found.
Aug 13 20:57:57 : (ERROR) The controller on host ha-1 with pid 11733 is already running (seen -17820s ago).
Aug 13 20:57:58 : (ERROR) The Cmon Controller is exiting...
Closed all, terminating. Bye.

[root@ha-1 ~]# ps -aux|grep cmon
nothing

[root@ha-1 ~]# ps -aux|grep 11733
nothing

How do I start it? It was supposed to be an HA cluster, but meh...
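Editor's note: the key line in the output above is the "already running (seen -17820s ago)" error. A negative age is worth a closer look, since it would mean the last-seen time recorded for PID 11733 lies ahead of the clock the new cmon process is reading. The sketch below pulls that number out of a log line. The function names seen_offset and check_seen_offset are made up for illustration, and the reading of the field is an assumption based only on the message text, not on cmon documentation.

```shell
# Extract the "seen Ns ago" value (in seconds) from a cmon log line.
# Prints nothing if the line carries no such field.
seen_offset() {
    printf '%s\n' "$1" |
        sed -n 's/.*(seen \(-\{0,1\}[0-9][0-9]*\)s ago).*/\1/p'
}

# Turn the value into a short hint. A negative value is treated as
# "recorded timestamp is ahead of the current clock" (an assumption).
check_seen_offset() {
    offset=$(seen_offset "$1")
    if [ -z "$offset" ]; then
        echo "no offset found"
    elif [ "$offset" -lt 0 ]; then
        echo "recorded timestamp is $(( 0 - offset ))s ahead: compare system and database clocks"
    else
        echo "recorded timestamp is ${offset}s old"
    fi
}

# Example (not run automatically):
#   check_seen_offset "$(/usr/sbin/cmon -d 2>&1 | grep 'already running')"
```

If the hint reports a timestamp ahead of the clock, checking `date` on the host and `select now();` in the database is a reasonable next step.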

Comments

Official comment
Ashraf Sharif

August 14, 2019 02:22
Hi,

Does /var/run/cmon.pid exist? If so, what does it contain? Please remove it first, since cmon is not actually running on the host. It may have been left behind by the unclean shutdown during the power failure.

Regards,
Ashraf
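
Editor's note: the check suggested above can be sketched as a small helper that only reports a PID file as stale when no live process matches it. The function name pid_file_is_stale is hypothetical. The sketch assumes it runs as root, because `kill -0` also fails for a live process owned by another user.

```shell
# Return 0 (true) if the PID file exists but no process with that PID is alive.
# Return 1 if the file is missing or the process is still running.
pid_file_is_stale() {
    pid_file=$1
    [ -f "$pid_file" ] || return 1        # no file, nothing to clean up
    pid=$(cat "$pid_file")
    case $pid in
        ''|*[!0-9]*) return 0 ;;          # empty or non-numeric content: treat as stale
    esac
    if kill -0 "$pid" 2>/dev/null; then   # signal 0 only tests for existence
        return 1
    fi
    return 0
}

# Example (not run automatically):
#   pid_file_is_stale /var/run/cmon.pid && rm -f /var/run/cmon.pid
```

Removing the file only when the check passes avoids deleting the PID file of a cmon process that is in fact running.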
Tenuun

August 14, 2019 07:21
There is no /var/run/cmon.pid.

An ack search finds PID 11733 only in an earlier CentOS audit log.

A MySQL dump searched with grep and ack also shows "11733" in the cmon_status table of the cmon database in MariaDB.

But cmon still refuses to start. I tried killing almost all daemons and removed most of the PID files in /var/run/.

Tenuun

August 14, 2019 08:09
After a forced fsck and a dozen reboots, I was finally able to start ClusterControl. Unfortunately, I don't know what exactly went wrong in the first place.

Now my paranoia is kicking in, and I am worried that ClusterControl will try to change the settings of the running PostgreSQL hosts. I don't want ClusterControl deleting the PostgreSQL /main directory on a running node or changing the master automatically, since I set the nodes up manually while ClusterControl was unable to start.

How do I disable all auto-recovery and automatic master selection in ClusterControl before joining it back onto the network? Is there a way to do a dry run, or to forbid ClusterControl from making changes to the nodes?

For now I have disconnected the ClusterControl host from the network so that it cannot automatically change settings on the manually started, running PostgreSQL cluster.

Best regards,

Tenuun

Tenuun

August 16, 2019 18:00
If someone runs into this issue, just add the line

enable_autorecovery=0

inside

/etc/cmon.cnf

https://severalnines.com/blog/installing-clustercontrol-standby-server
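
Editor's note: adding the line by hand works, but a small helper makes the change repeatable and avoids duplicate entries. This is a minimal sketch, not a ClusterControl tool. The function name set_cnf_option is hypothetical, and it assumes options are written as key=value with no spaces around the equals sign and that the file ends with a newline.

```shell
# Set key=value in a cnf-style file: rewrite an existing "key=" line,
# or append a new line if the key is not present yet.
# A backup of the previous file is kept as <file>.bak when a line is rewritten.
set_cnf_option() {
    file=$1
    key=$2
    value=$3
    if grep -q "^${key}=" "$file"; then
        sed -i.bak "s|^${key}=.*|${key}=${value}|" "$file"
    else
        printf '%s=%s\n' "$key" "$value" >> "$file"
    fi
}

# Example (not run automatically):
#   set_cnf_option /etc/cmon.cnf enable_autorecovery 0
```

cmon presumably reads the file at startup, so the service most likely needs a restart before the setting takes effect.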


Tenuun

August 19, 2019 05:31

Found the error (CentOS 7).

The culprit was the time settings on MariaDB.

The timestamps in journalctl -u mariadb were wrong, so now() returned the wrong time. I updated MariaDB to use the system time.

Make sure you sync the clock with ntpd and then run hwclock --systohc.

Otherwise the cmon service will start but not work completely: netstat -tulpn | grep 9500 will be empty, no errors are given, and nothing is written to /var/log/cmon.log.

mysql> select now();
+---------------------+
| now()               |
+---------------------+
| 2017-10-26 15:13:16 |
+---------------------+
1 row in set (0.07 sec)

mysql> SET GLOBAL time_zone = 'SYSTEM';
Query OK, 0 rows affected (0.07 sec)

mysql> show global variables like '%time_zone%';
+------------------+---------------------+
| Variable_name    | Value               |
+------------------+---------------------+
| system_time_zone | India Standard Time |
| time_zone        | SYSTEM              |
+------------------+---------------------+
2 rows in set (0.18 sec)

mysql>
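
Editor's note: to confirm the kind of mismatch described above, the database time can be compared with the system time. The sketch below only does the arithmetic. The function name timestamp_diff is hypothetical, and it relies on the -d option of GNU date (the default date on CentOS 7). The mysql invocation in the comments assumes the client can log in without extra options.

```shell
# Print the difference in seconds between two "YYYY-MM-DD HH:MM:SS"
# timestamps (first minus second). Both are interpreted as UTC.
timestamp_diff() {
    a=$(date -u -d "$1" +%s) || return 1
    b=$(date -u -d "$2" +%s) || return 1
    echo $(( a - b ))
}

# Example (not run automatically): database clock minus system clock.
#   db_now=$(mysql -N -B -e 'SELECT UTC_TIMESTAMP()')
#   sys_now=$(date -u '+%Y-%m-%d %H:%M:%S')
#   timestamp_diff "$db_now" "$sys_now"
```

A result far from zero would point to the same class of problem: the database and the operating system disagree about the current time.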

MOHD HAFRIZ NURAL AZHAN

December 14, 2023 17:03
Hi

I have the same problem, but on Ubuntu.

2023-12-14T17:00:07.435Z : (INFO) Thread UserManager is running (LWP 5079).
2023-12-14T17:00:07.436Z : (INFO) CmonRpcServerPrivate::log()
2023-12-14T17:00:07.436Z : (INFO) Server started at tcp://127.0.0.1:9500
2023-12-14T17:00:07.437Z : (INFO) CmonRpcServerPrivate::log()
2023-12-14T17:00:07.437Z : (INFO) Thread RpcServer:9500 is running (LWP 5080).
2023-12-14T17:00:07.437Z : (INFO) Server started at tls://127.0.0.1:9501
2023-12-14T17:00:07.437Z : (INFO) Checking command handler.
2023-12-14T17:00:07.437Z : (INFO) Thread RpcServer:9501 is running (LWP 5081).
2023-12-14T17:00:07.445Z : (INFO) CDT entry '/.runtime/controller_clock' found.
2023-12-14T17:00:07.445Z : (ERROR) The controller on host clustercontrol with pid 1268 is already running (seen -24920s ago).
2023-12-14T17:00:08.445Z : (ERROR) The Cmon Controller is exiting...

I have tried all the methods above, but it is still not working. Any ideas? I tried restoring my server to a known working restore point, and that works, but once I reboot the server, cmon stops working.

Paul Namuag

December 14, 2023 17:44
Hi MOHD HAFRIZ NURAL AZHAN,

Since you have already filed a ticket, we'll take it from there.

