Xtradb handle High load

Comments

7 comments

  • Avatar
    Ashraf Sharif

    Hi,

    It's workload dependant. How big was your dataset? Did it fit inside the buffer pool? Did you perform big transaction (transaction that updates many rows per commit)? You might find a standalone MySQL server performs better in some workloads, but you are exposed to SPOF. Galera offers better availability and consistency, with a bit overhead due to write-set replication.

    It's possible to reach 3k TPS, or even more than that. With an increasing number of writeable masters, the transaction rollback rate may increase, especially if there is write contention on the same dataset. In most cases, 3-node Galera Cluster usually performs better than 5 nodes.

    There are also a number of configuration options that you can tweak for Galera, for example:
    - tweak wsrep_slave_threads according to wsrep_cert_deps_distance
    - use leastconn for HAProxy balancing algorithm
    - max_connection=5000 is too high. If you want to reach more than 2000 connections per second, it doesn't necessarily mean you have to increase this value way higher than you desire. You have 3 nodes to balance the connection so 1000 seems more reasonable. Take note that unused max_connections slots will waste the memory.
    - Reduce the MySQL timeouts, especially interactive_timeout, wait_timeout, net_read_timeout, net_write_timeout.

    You probably want to read this blog post:
    https://severalnines.com/blog/9-tips-going-production-galera-cluster-mysql

     

    Regards,
    Ashraf

  • Avatar
    Johan

    Hi,

    Can you also bypass the HAProxy and redirect traffic to only one mysql server in the galera cluster?

    This way it is very easy to see if your application is subject to collisions and that flow control is increasing.
    What does the Wsrep flow control graph look like in the CC UI ? You find it on the Overview page.

    BR

    johan

  • Avatar
    mhs

    thanks for your reply .

    First of all I have a big problem with number of connection when connection increase from 800 to 1024 all nodes will fail and wont work  but without cluster single node can easily handle more than 3000 connections !!

    With 5 nodes Galera cluster Xtradb can handle less than 1000 connections with around 800 TPS !!!

     

    I've changed my config like below configuration but mysql cluster fails when we have more than 1000 connection .

     

    Could you please help me  and why whole nodes will fail when we have only more than 1100 connections with load balancer?? 

     

     

     my.cnf

    [MYSQLD]
    user=mysql
    basedir=/usr/
    datadir=/var/lib/mysql
    socket=/var/lib/mysql/mysql.sock
    pid-file=/var/lib/mysql/mysql.pid
    port=3306
    log_error=/var/log/mysql/mysqld.log
    log_warnings=2
    # log_output = FILE
    ### INNODB OPTIONS
    innodb_buffer_pool_size=12055M
    ##Edited By Mohsen Change 1 to 0
    innodb_flush_log_at_trx_commit=1
    ###Added By Mohsen
    # trx_commit = 0
    innodb_file_per_table=1
    innodb_data_file_path = ibdata1:100M:autoextend
    ## You may want to tune the below depending on number of cores and disk sub
    innodb_read_io_threads=4
    innodb_write_io_threads=4
    innodb_doublewrite=1
    innodb_log_file_size=1024M
    innodb_log_buffer_size=96M
    innodb_buffer_pool_instances=-1
    innodb_log_files_in_group=2
    innodb_thread_concurrency=64
    innodb_io_capacity=15000
    innodb_io_capacity_max=25000
    # innodb_file_format = barracuda
    innodb_flush_method = O_DIRECT
    # innodb_locks_unsafe_for_binlog = 1
    innodb_autoinc_lock_mode=2
    ## avoid statistics update when doing e.g show tables
    innodb_stats_on_metadata=0
    default_storage_engine=innodb
    back_log=1500
    # CHARACTER SET
    # collation_server = utf8_unicode_ci
    # init_connect = 'SET NAMES utf8'
    # character_set_server = utf8
    thread_handling = pool-of-threads
    # REPLICATION SPECIFIC
    server_id=1
    binlog_format=ROW
    # log_bin = binlog
    # log_slave_updates = 1
    # gtid_mode = ON
    # enforce_gtid_consistency = 1
    # relay_log = relay-bin
    # expire_logs_days = 7

    # OTHER THINGS, BUFFERS ETC
    tmp_table_size = 64M
    max_heap_table_size = 64M
    max_allowed_packet = 512M
    # sort_buffer_size = 256K
    # read_buffer_size = 256K
    # read_rnd_buffer_size = 512K
    # myisam_sort_buffer_size = 8M
    skip_name_resolve
    memlock=0
    sysdate_is_now=1
    max_connections=5000
    thread_cache_size=512k
    query_cache_type = 0
    query_cache_size = 0
    table_open_cache=1024
    lower_case_table_names=0
    # 5.6 backwards compatibility (FIXME)
    # explicit_defaults_for_timestamp = 1
    ##
    ## WSREP options
    ##

    performance_schema = ON
    performance-schema-max-mutex-classes = 0
    performance-schema-max-mutex-instances = 0

    # Full path to wsrep provider library or 'none'
    wsrep_provider=/usr/lib/libgalera_smm.so
    wsrep_on=ON
    wsrep_node_address=192.168.100.11

    # Provider specific configuration options
    wsrep_provider_options="base_port=4567; gcache.size=1024M; gmcast.segment=0"

    # Logical cluster name. Should be the same for all nodes.
    wsrep_cluster_name="my_wsrep_cluster"

    # Group communication system handle
    wsrep_cluster_address=gcomm://192.168.100.11,192.168.100.12,192.168.100.13,192.168.100.14,192.168.100.15

    # Human_readable node name (non-unique). Hostname by default.
    wsrep_node_name=192.168.100.11

    # Address for incoming client connections. Autodetect by default.
    #wsrep_node_incoming_address=

    # How many threads will process writesets from other nodes
    wsrep_slave_threads=8

    # DBUG options for wsrep provider
    #wsrep_dbug_option

    # Generate fake primary keys for non-PK tables (required for multi-master
    # and parallel applying operation)
    wsrep_certify_nonPK=1

    # Location of the directory with data files. Needed for non-mysqldump
    # state snapshot transfers. Defaults to mysql_real_data_home.
    #wsrep_data_home_dir=

    # Maximum number of rows in write set
    wsrep_max_ws_rows=131072

    # Maximum size of write set
    wsrep_max_ws_size=1073741824

    # to enable debug level logging, set this to 1
    wsrep_debug=0

    # convert locking sessions into transactions
    wsrep_convert_LOCK_to_trx=0

    # how many times to retry deadlocked autocommits
    wsrep_retry_autocommit=1

    # change auto_increment_increment and auto_increment_offset automatically
    wsrep_auto_increment_control=1

    # replicate myisam, not supported in PXC 5.7
    wsrep_replicate_myisam=0

    # retry autoinc insert, which failed for duplicate key error
    wsrep_drupal_282555_workaround=0

    # enable "strictly synchronous" semantics for read operations
    wsrep_causal_reads=0

    # Command to call when node status or cluster membership changes.
    # Will be passed all or some of the following options:
    # --status  - new status of this node
    # --uuid    - UUID of the cluster
    # --primary - whether the component is primary or not ("yes"/"no")
    # --members - comma-separated list of members
    # --index   - index of this node in the list
    #wsrep_notify_cmd=

    ##
    ## WSREP State Transfer options
    ##

    # State Snapshot Transfer method
    # ClusterControl currently DOES NOT support wsrep_sst_method=mysqldump
    wsrep_sst_method=xtrabackup-v2

    # Address on THIS node to receive SST at. DON'T SET IT TO DONOR ADDRESS!!!
    # (SST method dependent. Defaults to the first IP of the first interface)
    #wsrep_sst_receive_address=



    # Desired SST donor name.
    #wsrep_sst_donor=

    # Protocol version to use
    # wsrep_protocol_version=

    [MYSQL]
    socket=/var/lib/mysql/mysql.sock
    # default_character_set = utf8
    [client]
    socket=/var/lib/mysql/mysql.sock

     

     

     

     

     

     

  • Avatar
    Johan

    Hi!

    Can you send error logs from the failed nodes (you may need  to create a support ticket for this as i think it does not work to attach logs here).


    Or if you can paste  the logs from a complete "run" from server start to server fail on e.g https://paste.ubuntu.com/ and share the link here?

    Also what version do you use of PXC?
    BR

    johan

  • Avatar
    Johan

    Also if you can share:

     

    SHOW GLOBAL VARIABLES;

    SHOW GLOBAL STATUS;

     

    would be nice.

     

    THanks

  • Avatar
    mhs

    Hi Johan.

    Unfortunately Cluster dont work perfectly .

    E.g When 3 users want to register on My app simultaneously it takes some times to update other master DB.

    I want to know is there any solution to decrease sync time or write to only one node multi master clustering ??

    Also However I have 5 nodes I think it has low TPS .

    Max : 1100tps .

    I Have to many warning logs like these on heavy load :

    2017-05-07T04:27:15.528403Z 0 [Warning] WSREP: Failed to report last committed 5089923, -4 (Interrupted system call)
    2017-05-07T04:27:32.524480Z 0 [Warning] WSREP: Failed to report last committed 5110188, -4 (Interrupted system call)
    2017-05-07T04:27:59.741547Z 0 [Warning] WSREP: Failed to report last committed 5124111, -4 (Interrupted system call)
    2017-05-07T04:28:02.533566Z 0 [Warning] WSREP: Failed to report last committed 5124349, -4 (Interrupted system call)
    2017-05-07T04:28:06.180804Z 0 [Warning] WSREP: Failed to report last committed 5126017, -4 (Interrupted system call)
    2017-05-07T04:28:08.736537Z 0 [Warning] WSREP: Failed to report last committed 5126429, -4 (Interrupted system call)
    2017-05-07T04:28:11.708098Z 0 [Warning] WSREP: Failed to report last committed 5128027, -4 (Interrupted system call)
    2017-05-07T04:28:15.411354Z 0 [Warning] WSREP: Failed to report last committed 5134626, -4 (Interrupted system call)
    2017-05-07T04:28:17.411568Z 0 [Warning] WSREP: Failed to report last committed 5135617, -4 (Interrupted system call)
    2017-05-07T04:28:20.803860Z 0 [Warning] WSREP: Failed to report last committed 5137324, -4 (Interrupted system call)
    2017-05-07T04:28:24.980570Z 0 [Warning] WSREP: Failed to report last committed 5142745, -4 (Interrupted system call)
    2017-05-07T04:28:29.174663Z 0 [Warning] WSREP: Failed to report last committed 5149315, -4 (Interrupted system call)
    2017-05-07T04:28:32.995625Z 0 [Warning] WSREP: Failed to report last committed 5152289, -4 (Interrupted system call)
    2017-05-07T04:28:36.173443Z 0 [Warning] WSREP: Failed to report last committed 5156942, -4 (Interrupted system call)
    2017-05-07T04:28:39.886460Z 0 [Warning] WSREP: Failed to report last committed 5159441, -4 (Interrupted system call)
    2017-05-07T04:28:42.524749Z 0 [Warning] WSREP: Failed to report last committed 5162342, -4 (Interrupted system call)
    2017-05-07T04:28:47.230677Z 0 [Warning] WSREP: Failed to report last committed 5170459, -4 (Interrupted system call)
    2017-05-07T04:28:51.550293Z 0 [Warning] WSREP: Failed to report last committed 5173522, -4 (Interrupted system call)
    2017-05-07T04:28:54.020691Z 0 [Warning] WSREP: Failed to report last committed 5175825, -4 (Interrupted system call)
    2017-05-07T08:38:38.299896Z 0 [Warning] WSREP: Failed to report last committed 5271641, -4 (Interrupted system call)
    2017-05-07T08:38:40.300119Z 0 [Warning] WSREP: Failed to report last committed 5271991, -4 (Interrupted system call)
    2017-05-07T08:38:45.265835Z 0 [Warning] WSREP: Failed to report last committed 5272033, -4 (Interrupted system call)
    2017-05-07T08:39:08.900942Z 0 [Warning] WSREP: Failed to report last committed 5292896, -4 (Interrupted system call)
    2017-05-07T08:39:25.294561Z 0 [Warning] WSREP: Failed to report last committed 5310540, -4 (Interrupted system call)
    2017-05-07T08:39:55.673318Z 0 [Warning] WSREP: Failed to report last committed 5329074, -4 (Interrupted system call)
    2017-05-07T08:39:59.503639Z 0 [Warning] WSREP: Failed to report last committed 5329309, -4 (Interrupted system call)
    2017-05-07T08:40:02.711629Z 0 [Warning] WSREP: Failed to report last committed 5330665, -4 (Interrupted system call)
    2017-05-07T08:40:05.267378Z 0 [Warning] WSREP: Failed to report last committed 5331465, -4 (Interrupted system call)
    2017-05-07T08:40:08.238001Z 0 [Warning] WSREP: Failed to report last committed 5336816, -4 (Interrupted system call)

    and also

    too many errors like these .

     

     

    ' (Got an error reading communication packets)
    2017-05-07T09:21:05.711982Z 12749 [Note] Aborted connection 12749 to db: 'test' user: 'root' host: '192.168.100.6' (Got an error reading communication packets)
    2017-05-07T11:29:08.274080Z 21812 [Note] Aborted connection 21812 to db: 'test' user: 'root' host: '192.168.100.6' (Got an error reading communication packets)
    2017-05-07T11:46:36.851563Z 22849 [Note] Aborted connection 22849 to db: 'test' user: 'root' host: '192.168.100.6' (Got an error reading communication packets)
    2017-05-07T12:07:49.184649Z 24096 [Note] Aborted connection 24096 to db: 'test' user: 'root' host: '192.168.100.6' (Got an error reading communication packets)
    2017-05-07T12:13:45.027936Z 24435 [Note] Aborted connection 24435 to db: 'test' user: 'root' host: '192.168.100.6' (Got an error reading communication packets)
    2017-05-08T10:02:10.988238Z 100447 [Note] Aborted connection 100447 to db: 'test' user: 'root' host: '192.168.100.6' (Got an error reading communication packets)
    2017-05-08T14:06:01.036241Z 113236 [Note] Aborted connection 113236 to db: 'unconnected' user: 'root' host: '192.168.100.6' (Got an error reading communication packets)
    2017-05-09T00:00:09.383318Z 150915 [Note] Aborted connection 150915 to db: 'unconnected' user: 'root' host: '192.168.100.6' (Got an error reading communication packets)
    2017-05-09T02:00:13.235268Z 158234 [Note] Aborted connection 158234 to db: 'test' user: 'root' host: '192.168.100.6' (Got an error reading communication packets)
    2017-05-09T02:32:35.760538Z 160143 [Note] Aborted connection 160143 to db: 'test' user: 'root' host: '192.168.100.6' (Got an error reading communication packets)
    2017-05-09T03:26:35.691508Z 163052 [Note] Aborted connection 163052 to db: 'test' user: 'root' host: '192.168.100.6' (Got an error reading communication packets)
    2017-05-09T03:26:35.691574Z 162985 [Note] Aborted connection 162985 to db: 'unconnected' user: 'root' host: '192.168.100.6' (Got an error reading communication packets)
    2017-05-09T03:39:46.222497Z 164099 [Note] Aborted connection 164099 to db: 'test' user: 'root' host: '192.168.100.6' (Got an error reading communication packets)
    2017-05-09T03:58:58.228964Z 165170 [Note] Aborted connection 165170 to db: 'unconnected' user: 'root' host: '192.168.100.6' (Got an error reading communication packets)
    2017-05-09T04:39:42.517901Z 165649 [Note] Aborted connection 165649 to db: 'test' user: 'root' host: '192.168.100.6' (Got an error reading communication packets)
    2017-05-09T07:39:19.088908Z 172008 [Note] Aborted connection 172008 to db: 'test' user: 'root' host: '192.168.100.6' (Got an error reading communication packets)

     

    Show Global Status ;

    https://paste.ubuntu.com/24541970/

    Show Global Variable :

    https://paste.ubuntu.com/24541973/

    .

    .

    .

     

     

     

     

  • Avatar
    Johan

    Hi,

    I think you should do like this:

    1) Remove two nodes you have only 3 nodes in the galera cluster

    2) remove the load balancer (if you have one)
    3) read and write from only one node.

    What perf do you get now?

    It will not be as good as the single node due to the nature of synchronous replication.

    BR
    johan

Please sign in to leave a comment.

Powered by Zendesk