Continual Node Failure

Comments

13 comments

  • Avatar
    Jason Mallory

    /etc/init.d/mysql start
    Starting MySQL (Percona XtraDB Cluster).State transfer in progress, setting sleep higher
    ...Terminated

  • Avatar
    Jason Mallory

    2015-08-04 15:19:38 60388 [Warning] WSREP: Failed to prepare for incremental state transfer: Local state UUID (00000000-0000-0000-0000-000000000000) does not match group state UUID (f236f776-fe7c-11e4-9e85-0aaa745d2414): 1 (Operation not permitted)

  • Avatar
    Krzysztof Ksiazek

    Can you please attach MySQL error logs from all Galera instances? If you don't want to do it here, you can open a support ticket with us. We'd also need to look at the innobackup.prepare.log (if it exists on the node that can't get online) and on innobackup.backup.log that should be located on the donor node. Both those logs should exist in either MySQL data directory or datadir/.sst on the broken node. Warning that you mention is very frequent and it doesn't impact node's ability to get the state transfer. Something else has to play a role here.

  • Avatar
    Johan

    Depending of the contents of the failed node's log, then it would be easier to determine the problem.

    If you see any "[ERROR] Innodb" , then it is probably data corruption, then removing the contents inside the datadir of the mysql server helps (and the .sst folder too). This will then force an SST to happen (Full data sync).

    If it is a xtrabackup problem, the innobackup.backup.log log file from the Donor could indicate if there is a problem with too many open files, or if the "tmpdir" is too small e.g

    If the problem is on the joiner (e.g connection refused) it could indicate there is a firewall problem.

    BR
    johan

  • Avatar
    Jason Mallory

    Case in point:

     

    Wanted to change the configuration. Did a reboot from ClusterControl. Of course, it fails:

     

    Starting node failed, error message: Starting mysqld service failed..

    Exit Code: 1

    1143
    06 Aug 2015 13:19:05

    last 100 lines of mysql error log: 2015-08-06 20:18:39 13498 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins 2015-08-06 20:18:39 13498 [Note] InnoDB: Memory barrier is not used 2015-08-06 20:18:39 13498 [Note] InnoDB: Compressed tables use zlib 1.2.3 2015-08-06 20:18:39 13498 [Note] InnoDB: Using Linux native AIO 2015-08-06 20:18:39 13498 [Note] InnoDB: Using CPU crc32 instructions 2015-08-06 20:18:39 13498 [Note] InnoDB: Initializing buffer pool, size = 8.7G 2015-08-06 20:18:40 13498 [Note] InnoDB: Completed initialization of buffer pool 2015-08-06 20:18:40 13498 [Note] InnoDB: Highest supported file format is Barracuda. 2015-08-06 20:18:41 13498 [Note] InnoDB: 128 rollback segment(s) are active. 2015-08-06 20:18:41 13498 [Note] InnoDB: Waiting for purge to start 2015-08-06 20:18:41 13498 [Note] InnoDB: Percona XtraDB (http://www.percona.com) 5.6.22-72.0 started; log sequence number 4625944265946 2015-08-06 20:18:41 13498 [ERROR] /usr/sbin/mysqld: unknown variable 'SET GLOBAL event_scheduler=ON;' 2015-08-06 20:18:41 13498 [ERROR] Aborting 2015-08-06 20:18:42 13498 [Note] WSREP: (52c9c913, 'tcp://0.0.0.0:4567') turning message relay requesting off 2015-08-06 20:18:43 13498 [Note] WSREP: Closing send monitor... 2015-08-06 20:18:43 13498 [Note] WSREP: Closed send monitor. 2015-08-06 20:18:43 13498 [Note] WSREP: gcomm: terminating thread 2015-08-06 20:18:43 13498 [Note] WSREP: gcomm: joining thread 2015-08-06 20:18:43 13498 [Note] WSREP: gcomm: closing backend 2015-08-06 20:18:43 13498 [Note] WSREP: view(view_id(NON_PRIM,17eb193d,874) memb { 52c9c913,0 } joined { } left { } partitioned { 17eb193d,0 3d591ad3,0 883cc3d5,0 }) 2015-08-06 20:18:43 13498 [Note] WSREP: view((empty)) 2015-08-06 20:18:43 13498 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 1 2015-08-06 20:18:43 13498 [Note] WSREP: gcomm: closed 2015-08-06 20:18:43 13498 [Note] WSREP: Flow-control interval: [16, 16] 2015-08-06 20:18:43 13498 [Note] WSREP: Received NON-PRIMARY. 2015-08-06 20:18:43 13498 [Note] WSREP: Shifting SYNCED -> OPEN (TO: 1042291938) 2015-08-06 20:18:43 13498 [Note] WSREP: Received self-leave message. 2015-08-06 20:18:43 13498 [Note] WSREP: Flow-control interval: [0, 0] 2015-08-06 20:18:43 13498 [Note] WSREP: Received SELF-LEAVE. Closing connection. 2015-08-06 20:18:43 13498 [Note] WSREP: Shifting OPEN -> CLOSED (TO: 1042291938) 2015-08-06 20:18:43 13498 [Note] WSREP: RECV thread exiting 0: Success 2015-08-06 20:18:43 13498 [Note] WSREP: recv_thread() joined. 2015-08-06 20:18:43 13498 [Note] WSREP: Closing replication queue. 2015-08-06 20:18:43 13498 [Note] WSREP: Closing slave action queue. 2015-08-06 20:18:43 13498 [Note] WSREP: Service disconnected. 2015-08-06 20:18:43 13498 [Note] WSREP: rollbacker thread exiting 2015-08-06 20:18:44 13498 [Note] WSREP: Some threads may fail to exit. 2015-08-06 20:18:44 13498 [Note] Binlog end 2015-08-06 20:18:44 13498 [Note] Shutting down plugin 'partition' 2015-08-06 20:18:44 13498 [Note] Shutting down plugin 'ARCHIVE' 2015-08-06 20:18:44 13498 [Note] Shutting down plugin 'BLACKHOLE' 2015-08-06 20:18:44 13498 [Note] Shutting down plugin 'PERFORMANCE_SCHEMA' 2015-08-06 20:18:44 13498 [Note] Shutting down plugin 'INNODB_CHANGED_PAGES' 2015-08-06 20:18:44 13498 [Note] Shutting down plugin 'INNODB_SYS_DATAFILES' 2015-08-06 20:18:44 13498 [Note] Shutting down plugin 'INNODB_SYS_TABLESPACES' 2015-08-06 20:18:44 13498 [Note] Shutting down plugin 'INNODB_SYS_FOREIGN_COLS' 2015-08-06 20:18:44 13498 [Note] Shutting down plugin 'INNODB_SYS_FOREIGN' 2015-08-06 20:18:44 13498 [Note] Shutting down plugin 'INNODB_SYS_FIELDS' 2015-08-06 20:18:44 13498 [Note] Shutting down plugin 'INNODB_SYS_COLUMNS' 2015-08-06 20:18:44 13498 [Note] Shutting down plugin 'INNODB_SYS_INDEXES' 2015-08-06 20:18:44 13498 [Note] Shutting down plugin 'INNODB_SYS_TABLESTATS' 2015-08-06 20:18:44 13498 [Note] Shutting down plugin 'INNODB_SYS_TABLES' 2015-08-06 20:18:44 13498 [Note] Shutting down plugin 'INNODB_FT_INDEX_TABLE' 2015-08-06 20:18:44 13498 [Note] Shutting down plugin 'INNODB_FT_INDEX_CACHE' 2015-08-06 20:18:44 13498 [Note] Shutting down plugin 'INNODB_FT_CONFIG' 2015-08-06 20:18:44 13498 [Note] Shutting down plugin 'INNODB_FT_BEING_DELETED' 2015-08-06 20:18:44 13498 [Note] Shutting down plugin 'INNODB_FT_DELETED' 2015-08-06 20:18:44 13498 [Note] Shutting down plugin 'INNODB_FT_DEFAULT_STOPWORD' 2015-08-06 20:18:44 13498 [Note] Shutting down plugin 'INNODB_METRICS' 2015-08-06 20:18:44 13498 [Note] Shutting down plugin 'INNODB_BUFFER_POOL_STATS' 2015-08-06 20:18:44 13498 [Note] Shutting down plugin 'INNODB_BUFFER_PAGE_LRU' 2015-08-06 20:18:44 13498 [Note] Shutting down plugin 'INNODB_BUFFER_PAGE' 2015-08-06 20:18:44 13498 [Note] Shutting down plugin 'INNODB_CMP_PER_INDEX_RESET' 2015-08-06 20:18:44 13498 [Note] Shutting down plugin 'INNODB_CMP_PER_INDEX' 2015-08-06 20:18:44 13498 [Note] Shutting down plugin 'INNODB_CMPMEM_RESET' 2015-08-06 20:18:44 13498 [Note] Shutting down plugin 'INNODB_CMPMEM' 2015-08-06 20:18:44 13498 [Note] Shutting down plugin 'INNODB_CMP_RESET' 2015-08-06 20:18:44 13498 [Note] Shutting down plugin 'INNODB_CMP' 2015-08-06 20:18:44 13498 [Note] Shutting down plugin 'INNODB_LOCK_WAITS' 2015-08-06 20:18:44 13498 [Note] Shutting down plugin 'INNODB_LOCKS' 2015-08-06 20:18:44 13498 [Note] Shutting down plugin 'INNODB_TRX' 2015-08-06 20:18:44 13498 [Note] Shutting down plugin 'XTRADB_RSEG' 2015-08-06 20:18:44 13498 [Note] Shutting down plugin 'XTRADB_INTERNAL_HASH_TABLES' 2015-08-06 20:18:44 13498 [Note] Shutting down plugin 'XTRADB_READ_VIEW' 2015-08-06 20:18:44 13498 [Note] Shutting down plugin 'InnoDB' 2015-08-06 20:18:44 13498 [Note] InnoDB: FTS optimize thread exiting. 2015-08-06 20:18:44 13498 [Note] InnoDB: Starting shutdown... 2015-08-06 20:18:46 13498 [Note] InnoDB: Shutdown completed; log sequence number 4625944301433 2015-08-06 20:18:46 13498 [Note] Shutting down plugin 'CSV' 2015-08-06 20:18:46 13498 [Note] Shutting down plugin 'MyISAM' 2015-08-06 20:18:46 13498 [Note] Shutting down plugin 'MRG_MYISAM' 2015-08-06 20:18:46 13498 [Note] Shutting down plugin 'MEMORY' 2015-08-06 20:18:46 13498 [Note] Shutting down plugin 'sha256_password' 2015-08-06 20:18:46 13498 [Note] Shutting down plugin 'mysql_old_password' 2015-08-06 20:18:46 13498 [Note] Shutting down plugin 'mysql_native_password' 2015-08-06 20:18:46 13498 [Note] Shutting down plugin 'wsrep' 2015-08-06 20:18:46 13498 [Note] Shutting down plugin 'binlog' 2015-08-06 20:18:46 13498 [Note] /usr/sbin/mysqld: Shutdown complete Error in my_thread_global_end(): 1 threads didn't exit 150806 20:18:51 mysqld_safe mysqld from pid file /var/lib/mysql/mysqld.pid ended

    Exit Code: 0

    1143
    06 Aug 2015 13:19:05

    Retrieved /var/lib/mysql/innobackup.backup.log from 10.1.105.47.

    Exit Code: 0

    1143
    06 Aug 2015 13:19:05

    Retrieved /var/lib/mysql/innobackup.prepare.log from 10.1.105.47.

    Exit Code: 0

    1143
    06 Aug 2015 13:19:05

    Retrieved /var/log/mysqld.log from 10.1.105.47.

    Exit Code: 0

    1143
    06 Aug 2015 13:19:04

    Collecting error logs from '10.1.105.47', accessible from UI -> Nodes -> select node -> Logs

    Exit Code: 0

    1143
    06 Aug 2015 13:19:04

    Retrieved /var/lib/mysql/innobackup.backup.log from 10.1.105.48.

    Exit Code: 0

    1143
    06 Aug 2015 13:19:03

    Retrieved /var/lib/mysql/innobackup.prepare.log from 10.1.105.48.

    Exit Code: 0

    1143
    06 Aug 2015 13:19:03

    Retrieved /var/log/mysqld.log from 10.1.105.48.

    Exit Code: 0

    1143
    06 Aug 2015 13:19:03

    Collecting error logs from '10.1.105.48', accessible from UI -> Nodes -> select node -> Logs

    Exit Code: 0

    1143
    06 Aug 2015 13:19:02

    Retrieved /var/lib/mysql/innobackup.backup.log from 10.1.105.46.

    Exit Code: 0

    1143
    06 Aug 2015 13:18:55

    Retrieved /var/lib/mysql/innobackup.prepare.log from 10.1.105.46.

    Exit Code: 0

    1143
    06 Aug 2015 13:18:55

    Retrieved /var/log/mysqld.log from 10.1.105.46.

    Exit Code: 0

    1143
    06 Aug 2015 13:18:55

    Collecting error logs from '10.1.105.46', accessible from UI -> Nodes -> select node -> Logs

    Exit Code: 0

    1143
    06 Aug 2015 13:18:55

    Starting mysqld service failed.

    Exit Code: 1

    1143
    06 Aug 2015 13:18:52

    Starting 10.1.105.48:3306.

    Exit Code: 0

    1143
    06 Aug 2015 13:18:37

    Starting cluster service.

    Exit Code: 0

    1143
    06 Aug 2015 13:18:36

    10.1.105.48:3306: Stopped mysqld.

    Exit Code: 0

    1143
    06 Aug 2015 13:18:36

    10.1.105.48: Sent stop to mysqld.

    Exit Code: 0

    1143
    06 Aug 2015 13:17:40

    10.1.105.48:3306: Stopping mysqld (timeout=1800, force stop after timeout=0).

    Exit Code: 0

    1143
    06 Aug 2015 13:17:20

    Stopping node on 10.1.105.48.

    Exit Code: 0

    1143
    06 Aug 2015 13:17:20

    Message sent to controller

    Exit Code: 0

  • Avatar
    Jason Mallory

    The problem is these servers have about 500GB of data. If all I do is issue a reboot to change the configuration causes this kind of outage, we can't have this software in our environment

  • Avatar
    Johan

    Hi,

    it seems you have written in the config file:
    'SET GLOBAL event_scheduler=ON'

    But it should be:

    event_scheduler=ON

     

    BR

    johan

  • Avatar
    Jason Mallory

    Weird. We don't have any events turned on for these servers

  • Avatar
    Johan

    Maybe, but that is the problem anyways that someone has typed "SET GLOBAL event_scheduler=ON" in the my.cnf file. 

    it needs to be taken out or replaced with:

    event_scheduler=ON

    BR

    johan

  • Avatar
    Jason Mallory

    Still won't start right. Here is the config of the node:

     

    # All files in this package is subject to the GPL v2 license
    # More information is in the COPYING file in the top directory of this package.
    # Copyright (C) 2011 severalnines.com

    [MYSQLD]
    user=mysql
    basedir=/usr/
    datadir=/var/lib/mysql
    socket=/var/lib/mysql/mysql.sock
    pid_file=mysqld.pid
    port=3306
    log_error=/var/log/mysqld.log
    log_warnings=2
    event_scheduler = ON
    #log_output=FILE
    ### INNODB OPTIONS
    innodb_buffer_pool_size=8936M
    innodb_flush_log_at_trx_commit=2
    innodb_file_per_table=1
    innodb_data_file_path = ibdata1:100M:autoextend
    ## You may want to tune the below depending on number of cores and disk sub
    innodb_read_io_threads=4
    innodb_write_io_threads=4
    innodb_io_capacity=200
    innodb_doublewrite=1
    innodb_log_file_size=1024M
    innodb_log_buffer_size=96M
    innodb_buffer_pool_instances=8
    innodb_log_files_in_group=2
    innodb_thread_concurrency=64
    #innodb_file_format=barracuda
    innodb_flush_method = O_DIRECT
    innodb_autoinc_lock_mode=2
    ## avoid statistics update when doing e.g show tables
    innodb_stats_on_metadata=0
    default_storage_engine=innodb
    tmpdir = /var/tmp

    # CHARACTER SET
    #collation_server = utf8_unicode_ci
    #init_connect='SET NAMES utf8'
    #character_set_server = utf8

    # REPLICATION SPECIFIC
    #server_id must be unique across all mysql servers participating in replication.
    #server_id=SERVERID
    binlog_format=ROW
    #log_slave_updates=1
    #log_bin=binlog
    #relay_log=relay-bin
    #expire_logs_days=7
    #gtid_mode=ON
    #enforce_gtid_consistency=1
    # OTHER THINGS, BUFFERS ETC
    key_buffer_size = 24M
    tmp_table_size = 64M
    max_heap_table_size = 64M
    max_allowed_packet = 512M
    sort_buffer_size = 24M
    #read_buffer_size = 256K
    read_rnd_buffer_size = 24M
    #myisam_sort_buffer_size = 8M
    skip_name_resolve
    memlock=0
    sysdate_is_now=1
    max_connections=150
    thread_cache_size=512
    query_cache_type = 0
    query_cache_size = 0
    table_open_cache=1024
    lower_case_table_names=0
    # 5.6 backwards compatibility
    explicit_defaults_for_timestamp=1
    ##
    ## WSREP options
    ##

    # Full path to wsrep provider library or 'none'
    wsrep_provider=/usr/lib64/libgalera_smm.so

    wsrep_node_address=10.1.105.48
    # Provider specific configuration options
    wsrep_provider_options="gcache.size=16384M; gmcast.segment=0"

    # Logical cluster name. Should be the same for all nodes.
    wsrep_cluster_name="my_wsrep_cluster"

    # Group communication system handle
    wsrep_cluster_address=gcomm://10.1.105.46,10.1.105.47,10.1.105.48

    # Human_readable node name (non-unique). Hostname by default.
    wsrep_node_name=10.1.105.48

    # Address for incoming client connections. Autodetect by default.
    #wsrep_node_incoming_address=

    # How many threads will process writesets from other nodes
    wsrep_slave_threads=8

    # DBUG options for wsrep provider
    #wsrep_dbug_option

    # Generate fake primary keys for non-PK tables (required for multi-master
    # and parallel applying operation)
    wsrep_certify_nonPK=1

    # Location of the directory with data files. Needed for non-mysqldump
    # state snapshot transfers. Defaults to mysql_real_data_home.
    #wsrep_data_home_dir=

    # Maximum number of rows in write set
    wsrep_max_ws_rows=131072

    # Maximum size of write set
    wsrep_max_ws_size=1073741824

    # to enable debug level logging, set this to 1
    wsrep_debug=0

    # convert locking sessions into transactions
    wsrep_convert_LOCK_to_trx=0

    # how many times to retry deadlocked autocommits
    wsrep_retry_autocommit=1

    # change auto_increment_increment and auto_increment_offset automatically
    wsrep_auto_increment_control=1

    # replicate myisam
    wsrep_replicate_myisam=1
    # retry autoinc insert, which failed for duplicate key error
    wsrep_drupal_282555_workaround=0

    # enable "strictly synchronous" semantics for read operations
    wsrep_causal_reads=0

    # Command to call when node status or cluster membership changes.
    # Will be passed all or some of the following options:
    # --status  - new status of this node
    # --uuid    - UUID of the cluster
    # --primary - whether the component is primary or not ("yes"/"no")
    # --members - comma-separated list of members
    # --index   - index of this node in the list
    #wsrep_notify_cmd=

    ##
    ## WSREP State Transfer options
    ##

    # State Snapshot Transfer method
    # ClusterControl currently DOES NOT support wsrep_sst_method=mysqldump
    wsrep_sst_method=xtrabackup-v2

    # Address on THIS node to receive SST at. DON'T SET IT TO DONOR ADDRESS!!!
    # (SST method dependent. Defaults to the first IP of the first interface)
    #wsrep_sst_receive_address=

    # SST authentication string. This will be used to send SST to joining nodes.
    # Depends on SST method. For mysqldump method it is root:<root password>
    wsrep_sst_auth=root:9bPOO03

    # Desired SST donor name.
    #wsrep_sst_donor=

    # Protocol version to use
    # wsrep_protocol_version=
    [MYSQL]
    socket=/var/lib/mysql/mysql.sock
    #default_character_set=utf8
    [client]
    socket=/var/lib/mysql/mysql.sock
    #default_character_set=utf8
    [mysqldump]
    max_allowed_packet = 512M
    socket=/var/lib/mysql/mysql.sock
    #default_character_set=utf8
    [MYSQLD_SAFE]
    pid_file=mysqld.pid
    log_error=/var/log/mysqld.log
    basedir=/usr/
    datadir=/var/lib/mysql

  • Avatar
    Jason Mallory

    It is constantly rebuilding the data .sst drive over and over. This needs to stop

  • Avatar
    Ashraf Sharif

    Hi Jason,

    It would be helpful if you can generate an error report and attach it here so we can understand your cluster. Instructions at: http://support.severalnines.com/entries/21091498-Error-reporting-

    Since it will contain sensitive information, if you don't want to do it here, you can open a support ticket with us.

    Regards,

    Ashraf

  • Avatar
    Jason Mallory

    2015-08-07 04:11:59 16359 [Note] WSREP: RECV thread exiting 0: Success
    2015-08-07 04:11:59 16359 [Note] WSREP: recv_thread() joined.
    2015-08-07 04:11:59 16359 [Note] WSREP: Closing replication queue.
    2015-08-07 04:11:59 16359 [Note] WSREP: Closing slave action queue.
    2015-08-07 04:11:59 16359 [Note] WSREP: Service disconnected.
    2015-08-07 04:11:59 16359 [Note] WSREP: rollbacker thread exiting
    2015-08-07 04:12:00 16359 [ERROR] WSREP: Requesting state transfer failed: -77(File descriptor in bad state)
    2015-08-07 04:12:00 16359 [ERROR] WSREP: State transfer request failed unrecoverably: 77 (File descriptor in bad state). Most likely it is due to inability to communicate with the cluster primary component. Restart required.
    2015-08-07 04:12:00 16359 [Note] WSREP: /usr/sbin/mysqld: Terminated

Please sign in to leave a comment.

Powered by Zendesk