Best Practices for Setting Up a Highly Available ClusterControl Architecture

Comments

2 comments

  • Official comment
    Paul Namuag

    Hi Rosie

    Here are a few points in answer to your questions:

    • What’s the best way to configure CMON HA so that ClusterControl doesn’t become a single point of failure?

    - With CMON HA, ClusterControl has no single point of failure. I recommend checking our documentation here: https://docs.severalnines.com/clustercontrol/latest/admin-guide/redundancy-high-availability/ Basically, you have to deploy a 3-node MariaDB Galera Cluster, which keeps the cmon DB in sync across the cluster. We also encourage you to check the manual by running `man cmon-ha`.

    • Have you used Galera or another backend for the CMON nodes? Any tips for configuring quorum or failover?

    I suggest you check our articles on understanding Galera: https://severalnines.com/blog/galera-cluster-recovery-101-deep-dive-network-partitioning/ and https://severalnines.com/blog/how-clustercontrol-performs-automatic-database-recovery-and-failover/.

    • For load balancing between CMON instances, do you recommend a specific setup (HAProxy, etc)?

    It depends on your requirements. HAProxy is lightweight, but if you need a truly database-aware load balancer, ProxySQL is widely used, especially if you need more observability and advanced monitoring. This blog post should help: https://severalnines.com/blog/database-aware-load-balancing-how-migrate-haproxy-proxysql/.

    • Any gotchas or lessons learned when running multiple ClusterControl controllers in a production environment? 

    Make sure you have enough resources to manage your database clusters. See the requirements here: https://docs.severalnines.com/clustercontrol/latest/getting-started/overview/requirements/?h=requirements. This should guide you if you have doubts about how to use ClusterControl, especially in a production environment. Several of our customers deploy ClusterControl to manage their large-scale database clusters, and most of them provision enough hardware resources to meet that need. ClusterControl is also used as the heart and soul of our CCX, which can manage multiple deployments and thousands of nodes in the cloud or as a sovereign DBaaS.

  • Jarrett VonRueden

    Good questions. Making the controller highly available is important because it really can become a single point of failure if not planned well.

    Setting up CMON in HA usually works best when you run at least three CMON nodes with a shared backend. Most people use Galera because it gives synchronous replication and removes a lot of worry about data drift. Make sure the nodes are in different availability zones so that quorum stays stable. Keeping network latency low between them is also important.
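
    As a rough illustration of why zone placement matters for quorum, here is a minimal Python sketch. It assumes every Galera node carries equal weight (no custom weighting, no arbitrator) and that a partition keeps quorum only when it holds strictly more than half of the nodes. The function names and zone labels are hypothetical and are not part of ClusterControl or Galera.

```python
from typing import Mapping


def has_quorum(reachable: int, total: int) -> bool:
    """True if the reachable nodes form a strict majority of the cluster."""
    return 2 * reachable > total


def survives_any_zone_loss(nodes_per_zone: Mapping[str, int]) -> bool:
    """True if losing any single zone still leaves a majority of nodes."""
    total = sum(nodes_per_zone.values())
    return all(
        has_quorum(total - lost, total) for lost in nodes_per_zone.values()
    )


if __name__ == "__main__":
    # One node per zone: any single zone can fail and two of three remain.
    print(survives_any_zone_loss({"az-a": 1, "az-b": 1, "az-c": 1}))
    # Two nodes share a zone: losing that zone leaves one of three.
    print(survives_any_zone_loss({"az-a": 2, "az-b": 1}))
```

    Under these assumptions, three nodes in three zones tolerate the loss of any one zone, while packing two of the three nodes into the same zone does not.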

    For load balancing, HAProxy is a solid and simple choice. Just point your applications and automation to the VIP so controller failover does not break anything. Keep the health checks tight so a failing node is pulled out quickly.
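
    To make "tight health checks" a little more concrete, the sketch below models the consecutive-failure and consecutive-success counting that HAProxy exposes through its `fall` and `rise` server settings. It is a simplified simulation written for illustration, not HAProxy's implementation; the class name is hypothetical, and the defaults of 2 and 3 mirror HAProxy's documented defaults as I understand them.

```python
from dataclasses import dataclass


@dataclass
class HealthTracker:
    rise: int = 2     # consecutive successes needed to mark a server up
    fall: int = 3     # consecutive failures needed to mark a server down
    up: bool = True   # current state as seen by the load balancer
    streak: int = 0   # consecutive results that contradict the current state

    def observe(self, check_ok: bool) -> bool:
        """Record one health-check result and return the resulting state."""
        if check_ok == self.up:
            # The result agrees with the current state, so reset the streak.
            self.streak = 0
        else:
            self.streak += 1
            threshold = self.fall if self.up else self.rise
            if self.streak >= threshold:
                self.up = not self.up
                self.streak = 0
        return self.up


if __name__ == "__main__":
    tracker = HealthTracker()
    for result in [False, False, False, True, True]:
        print(result, "->", "up" if tracker.observe(result) else "down")
```

    A lower `fall` value or a shorter check interval pulls a failing controller out sooner, at the cost of reacting to brief blips, so the values are worth tuning against your own network.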

    In production, the main lesson is to test failover before you depend on it. A lot of issues only show up when you simulate node loss, packet drops, or storage glitches. Also, watch resource usage. CMON nodes under heavy load can lag a bit if the cluster is large.
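
    One way to make a failover test measurable is to probe the controller endpoint at a fixed interval while you take a node down, then compute the longest window in which probes failed. The helper below is a hypothetical sketch that works on already-collected samples; how you probe (HTTP, TCP, a CLI call) and the sample format are assumptions, not something ClusterControl prescribes.

```python
from typing import Iterable, Tuple


def longest_outage(samples: Iterable[Tuple[float, bool]]) -> float:
    """Return the longest span between a failed probe and the next success.

    `samples` is a time-ordered sequence of (timestamp_seconds, probe_ok).
    An outage still open at the end is measured up to the last sample.
    """
    longest = 0.0
    started = None
    last_t = None
    for t, ok in samples:
        last_t = t
        if not ok and started is None:
            started = t                      # outage begins
        elif ok and started is not None:
            longest = max(longest, t - started)
            started = None                   # outage ends
    if started is not None and last_t is not None:
        longest = max(longest, last_t - started)
    return longest


if __name__ == "__main__":
    drill = [(0, True), (1, False), (2, False), (3, True), (4, False), (5, True)]
    print(longest_outage(drill))
```

    Comparing this number across drills (node stop, network drop, storage stall) gives a simple baseline for whether a configuration change made failover faster or slower.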

    If you combine a three-node setup, stable replication, and proper load balancing, the controller becomes much more resilient and you avoid sudden outages.
