ClusterControl 1.8 or 1.9 not starting in Rancher/Kubernetes
Hi,
we have a Rancher/Kubernetes platform. I've created a YAML file with the following content:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: clustercontrol
  namespace: xxx
  labels:
    app: clustercontrol
spec:
  selector:
    matchLabels:
      app: clustercontrol
  replicas: 1
  template:
    metadata:
      labels:
        app: clustercontrol
    spec:
      containers:
        - name: clustercontrol
          image: severalnines/clustercontrol:1.7.0
          imagePullPolicy: IfNotPresent
          ports:
            - containerPort: 80
              name: clustercontrol
          env:
            - name: TZ
              value: "Atlantic/Canary"
            - name: MYSQL_ROOT_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: clustercontrol-mysql-secrets
                  key: ROOT_PASSWORD
            - name: MYSQL_CMON_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: clustercontrol-mysql-secrets
                  key: CMON_PASSWORD
          volumeMounts:
            - mountPath: /var/lib/mysql
              name: clustercontrol-data
              subPath: datadir
            - mountPath: /etc/cmon.d
              name: clustercontrol-data
              subPath: cmon.d
            - mountPath: /root/.ssh
              name: clustercontrol-data
              subPath: sshkey
            - mountPath: /var/lib/cmon
              name: clustercontrol-data
              subPath: cmonlib
      volumes:
        - name: clustercontrol-data
          persistentVolumeClaim:
            claimName: pvc-clustercontrol
It works fine, but when I delete the PVC and recreate it with a new YAML using the 1.8.0 or 1.9.0 image, the pod keeps restarting all the time... The logs when it works and when it doesn't are identical, except that when it fails the output stops at:
>> Stopping MySQL daemon so Supervisord can take over
210901 09:01:45 mysqld_safe mysqld from pid file /var/lib/mysql/mysqld.pid ended
>> Sleeping 15s for the stopping processes to clean up..
It doesn't show the following lines, as it does with 1.7.0:
>> Starting Supervisord and all related services:
>> sshd, httpd, cmon, cmon-events, cmon-ssh, cc-auto-deployment
Can someone help me please? Thanks in advance.
Cheers...
-
Official comment
Hi Oliver,
It looks like the pod was terminated/rescheduled before its start-up process completed. As you can tell from the output, the entrypoint in 1.8.0 and later adds an extra 15 seconds of sleep time for the MySQL clean-up process.
I believe you need to configure liveness and readiness probes in your YAML, especially the initialDelaySeconds. A successful start-up should end with the ">> Starting Supervisord.." line.
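For example, something along these lines under the container spec (a rough sketch only; the probe path, port and timings are assumptions you will need to tune for your environment):
          readinessProbe:
            httpGet:
              path: /
              port: 80
            initialDelaySeconds: 120
            periodSeconds: 10
          livenessProbe:
            httpGet:
              path: /
              port: 80
            initialDelaySeconds: 300
            periodSeconds: 30
            failureThreshold: 3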
Regards,
Ashraf
-
Hi,
thanks for your response. I'm not sure I understand your reply. Why is it terminating before the start-up process is complete?
I've configured a readiness probe, but I'm not sure how to configure it properly. I've set initialDelaySeconds to 60 seconds. Should I use a lower value? I'm checking with httpGet. Thanks in advance.
-
Hi,
As mentioned in the ticket that you have opened, the problem is related to a slow environment and we weren't able to reproduce it in our tests. The pod may terminate if the start-up takes too long. The best we can recommend is, as my colleague mentioned above, to define liveness and readiness probes. You can read more about those here: https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/
Please keep in mind that the Docker image we provide is not intended to be run in a Kubernetes cluster. It may or may not work; we do not provide documentation or support for it, nor do we test such environments, so your mileage may vary when using the CC image with Kubernetes.
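For slow-starting containers, the page linked above also describes a startupProbe, which holds off the liveness checks until the application has come up. A minimal sketch, with timings that are only guesses and will need tuning:
          startupProbe:
            httpGet:
              path: /
              port: 80
            # allow up to 30 * 10s = 300s for the entrypoint to finish
            failureThreshold: 30
            periodSeconds: 10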
-
Hi,
thanks for the response, but I've continued investigating...
I've compared the entrypoint.sh from 1.7.5 (the last version that works for me) with 1.7.6, and you have added a line with a ping_stats command (also present in 1.8 and 1.9) right below the point where it fails in my environment. I'm going to test without that line.
Is it necessary? I've looked at what it does and it seems to send info to your site, is that correct? My pod has no Internet connection configured, so it may be getting stuck there and breaking the start-up. Can I delete that line? I can configure an Internet connection for the pod, but I would prefer to remove the line if it isn't necessary. Thanks in advance.
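In case it helps, one way to test this without rebuilding the image is to override the container command in the Deployment so the line is stripped before start-up. This is only a sketch and assumes the image's entrypoint script is /entrypoint.sh (check with docker inspect); it is not an officially supported approach:
          # hypothetical override; adjust the script path to match the image
          command: ["/bin/bash", "-c"]
          args:
            - "sed -i '/ping_stats/d' /entrypoint.sh && exec /entrypoint.sh"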
Cheers...