ClusterControl 1.8 or 1.9 not starting in Rancher/Kubernetes
Hi,
we have a Rancher/Kubernetes platform. I've created a YAML file with the following content:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: clustercontrol
  namespace: xxx
  labels:
    app: clustercontrol
spec:
  selector:
    matchLabels:
      app: clustercontrol
  replicas: 1
  template:
    metadata:
      labels:
        app: clustercontrol
    spec:
      containers:
        - name: clustercontrol
          image: severalnines/clustercontrol:1.7.0
          imagePullPolicy: IfNotPresent
          ports:
            - containerPort: 80
              name: clustercontrol
          env:
            - name: TZ
              value: "Atlantic/Canary"
            - name: MYSQL_ROOT_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: clustercontrol-mysql-secrets
                  key: ROOT_PASSWORD
            - name: MYSQL_CMON_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: clustercontrol-mysql-secrets
                  key: CMON_PASSWORD
          volumeMounts:
            - mountPath: /var/lib/mysql
              name: clustercontrol-data
              subPath: datadir
            - mountPath: /etc/cmon.d
              name: clustercontrol-data
              subPath: cmon.d
            - mountPath: /root/.ssh
              name: clustercontrol-data
              subPath: sshkey
            - mountPath: /var/lib/cmon
              name: clustercontrol-data
              subPath: cmonlib
      volumes:
        - name: clustercontrol-data
          persistentVolumeClaim:
            claimName: pvc-clustercontrol
It works fine, but when I delete the PVC and recreate it with a new YAML using the 1.8.0 or 1.9.0 image, the pod keeps restarting all the time... The logs when it works and when it doesn't are identical, except that when it fails the output stops at:
>> Stopping MySQL daemon so Supervisord can take over
210901 09:01:45 mysqld_safe mysqld from pid file /var/lib/mysql/mysqld.pid ended
>> Sleeping 15s for the stopping processes to clean up..
It doesn't show the following lines, as it does with 1.7.0:
>> Starting Supervisord and all related services:
>> sshd, httpd, cmon, cmon-events, cmon-ssh, cc-auto-deployment
Can someone help me please? Thanks in advance.
Cheers...
-
Official comment
Hi Oliver,
It looks like the pod was terminated/rescheduled before its start-up process completed. As you can tell from the output, the entrypoint in 1.8.0 and later adds an extra 15 seconds of sleep time for the MySQL clean-up process.
I believe you need to configure liveness and readiness probes in your YAML, especially the initialDelaySeconds. A successful start-up should end with the ">> Starting Supervisord.." line.
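For example, something along these lines under the container spec (a rough sketch only; the probe path, port and timings are assumptions you will need to tune for your environment):
          readinessProbe:
            httpGet:
              path: /
              port: 80
            initialDelaySeconds: 120
            periodSeconds: 10
          livenessProbe:
            httpGet:
              path: /
              port: 80
            initialDelaySeconds: 300
            periodSeconds: 30
            failureThreshold: 3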
Regards,
Ashraf
-
Hi,
thanks for your response. I'm not sure I understand your reply. Why is it terminating before the start-up process is complete?
I've configured a readiness probe, but I'm not sure how to configure it properly. I've set initialDelaySeconds to 60 seconds. Should I use a lower value? I'm checking with httpGet. Thanks in advance.
-
Hi,
As mentioned in the ticket that you have opened, the problem is related to a slow environment and we weren't able to reproduce it in our tests. The pod may terminate if the start-up takes too long. The best we can recommend is, as my colleague mentioned above, to define liveness and readiness probes. You can read more about those here: https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/
Please keep in mind that the Docker image we provide is not intended to be run in a Kubernetes cluster. It may or may not work; we do not provide documentation or support for it, nor do we test such environments, so your mileage may vary when using the CC image with Kubernetes.
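For slow-starting containers, the page linked above also describes a startupProbe, which holds off the liveness checks until the application has come up. A minimal sketch, with timings that are only guesses and will need tuning:
          startupProbe:
            httpGet:
              path: /
              port: 80
            # allow up to 30 * 10s = 300s for the entrypoint to finish
            failureThreshold: 30
            periodSeconds: 10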
-
Hi,
thanks for the response, but I've continued investigating...
I've compared the entrypoint.sh from 1.7.5 (the last version that works for me) with 1.7.6, and you have added a line with a ping_stats command (also present in 1.8 and 1.9) right below the point where it fails in my environment. I'm going to test without that line.
Is it necessary? I've looked at what it does and it seems to send info to your site, is that correct? My pod has no Internet connection configured, so it may be getting stuck there and breaking the start-up. Can I delete that line? I can configure an Internet connection for the pod, but I would prefer to remove the line if it isn't necessary. Thanks in advance.
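In case it helps, one way to test this without rebuilding the image is to override the container command in the Deployment so the line is stripped before start-up. This is only a sketch and assumes the image's entrypoint script is /entrypoint.sh (check with docker inspect); it is not an officially supported approach:
          # hypothetical override; adjust the script path to match the image
          command: ["/bin/bash", "-c"]
          args:
            - "sed -i '/ping_stats/d' /entrypoint.sh && exec /entrypoint.sh"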
Cheers...