Continual Node Failure
Cannot get a node to start:
Starting MySQL (Percona XtraDB Cluster).State transfer in progress, setting sleep higher
..../sbin/service: line 66: 41983 Terminated env -i PATH="$PATH" TERM="$TERM" "${SERVICEDIR}/${SERVICE}" ${OPTIONS}
-
Can you please attach MySQL error logs from all Galera instances? If you don't want to do it here, you can open a support ticket with us. We'd also need to look at the innobackup.prepare.log (if it exists on the node that can't get online) and on innobackup.backup.log that should be located on the donor node. Both those logs should exist in either MySQL data directory or datadir/.sst on the broken node. Warning that you mention is very frequent and it doesn't impact node's ability to get the state transfer. Something else has to play a role here.
-
Depending of the contents of the failed node's log, then it would be easier to determine the problem.
If you see any "[ERROR] Innodb" , then it is probably data corruption, then removing the contents inside the datadir of the mysql server helps (and the .sst folder too). This will then force an SST to happen (Full data sync).
If it is a xtrabackup problem, the innobackup.backup.log log file from the Donor could indicate if there is a problem with too many open files, or if the "tmpdir" is too small e.g
If the problem is on the joiner (e.g connection refused) it could indicate there is a firewall problem.
BR
johan -
Case in point:
Wanted to change the configuration. Did a reboot from ClusterControl. Of course, it fails:
Starting node failed, error message: Starting mysqld service failed..
Exit Code: 1
114306 Aug 2015 13:19:05last 100 lines of mysql error log: 2015-08-06 20:18:39 13498 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins 2015-08-06 20:18:39 13498 [Note] InnoDB: Memory barrier is not used 2015-08-06 20:18:39 13498 [Note] InnoDB: Compressed tables use zlib 1.2.3 2015-08-06 20:18:39 13498 [Note] InnoDB: Using Linux native AIO 2015-08-06 20:18:39 13498 [Note] InnoDB: Using CPU crc32 instructions 2015-08-06 20:18:39 13498 [Note] InnoDB: Initializing buffer pool, size = 8.7G 2015-08-06 20:18:40 13498 [Note] InnoDB: Completed initialization of buffer pool 2015-08-06 20:18:40 13498 [Note] InnoDB: Highest supported file format is Barracuda. 2015-08-06 20:18:41 13498 [Note] InnoDB: 128 rollback segment(s) are active. 2015-08-06 20:18:41 13498 [Note] InnoDB: Waiting for purge to start 2015-08-06 20:18:41 13498 [Note] InnoDB: Percona XtraDB (http://www.percona.com) 5.6.22-72.0 started; log sequence number 4625944265946 2015-08-06 20:18:41 13498 [ERROR] /usr/sbin/mysqld: unknown variable 'SET GLOBAL event_scheduler=ON;' 2015-08-06 20:18:41 13498 [ERROR] Aborting 2015-08-06 20:18:42 13498 [Note] WSREP: (52c9c913, 'tcp://0.0.0.0:4567') turning message relay requesting off 2015-08-06 20:18:43 13498 [Note] WSREP: Closing send monitor... 2015-08-06 20:18:43 13498 [Note] WSREP: Closed send monitor. 2015-08-06 20:18:43 13498 [Note] WSREP: gcomm: terminating thread 2015-08-06 20:18:43 13498 [Note] WSREP: gcomm: joining thread 2015-08-06 20:18:43 13498 [Note] WSREP: gcomm: closing backend 2015-08-06 20:18:43 13498 [Note] WSREP: view(view_id(NON_PRIM,17eb193d,874) memb { 52c9c913,0 } joined { } left { } partitioned { 17eb193d,0 3d591ad3,0 883cc3d5,0 }) 2015-08-06 20:18:43 13498 [Note] WSREP: view((empty)) 2015-08-06 20:18:43 13498 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 1 2015-08-06 20:18:43 13498 [Note] WSREP: gcomm: closed 2015-08-06 20:18:43 13498 [Note] WSREP: Flow-control interval: [16, 16] 2015-08-06 20:18:43 13498 [Note] WSREP: Received NON-PRIMARY. 2015-08-06 20:18:43 13498 [Note] WSREP: Shifting SYNCED -> OPEN (TO: 1042291938) 2015-08-06 20:18:43 13498 [Note] WSREP: Received self-leave message. 2015-08-06 20:18:43 13498 [Note] WSREP: Flow-control interval: [0, 0] 2015-08-06 20:18:43 13498 [Note] WSREP: Received SELF-LEAVE. Closing connection. 2015-08-06 20:18:43 13498 [Note] WSREP: Shifting OPEN -> CLOSED (TO: 1042291938) 2015-08-06 20:18:43 13498 [Note] WSREP: RECV thread exiting 0: Success 2015-08-06 20:18:43 13498 [Note] WSREP: recv_thread() joined. 2015-08-06 20:18:43 13498 [Note] WSREP: Closing replication queue. 2015-08-06 20:18:43 13498 [Note] WSREP: Closing slave action queue. 2015-08-06 20:18:43 13498 [Note] WSREP: Service disconnected. 2015-08-06 20:18:43 13498 [Note] WSREP: rollbacker thread exiting 2015-08-06 20:18:44 13498 [Note] WSREP: Some threads may fail to exit. 2015-08-06 20:18:44 13498 [Note] Binlog end 2015-08-06 20:18:44 13498 [Note] Shutting down plugin 'partition' 2015-08-06 20:18:44 13498 [Note] Shutting down plugin 'ARCHIVE' 2015-08-06 20:18:44 13498 [Note] Shutting down plugin 'BLACKHOLE' 2015-08-06 20:18:44 13498 [Note] Shutting down plugin 'PERFORMANCE_SCHEMA' 2015-08-06 20:18:44 13498 [Note] Shutting down plugin 'INNODB_CHANGED_PAGES' 2015-08-06 20:18:44 13498 [Note] Shutting down plugin 'INNODB_SYS_DATAFILES' 2015-08-06 20:18:44 13498 [Note] Shutting down plugin 'INNODB_SYS_TABLESPACES' 2015-08-06 20:18:44 13498 [Note] Shutting down plugin 'INNODB_SYS_FOREIGN_COLS' 2015-08-06 20:18:44 13498 [Note] Shutting down plugin 'INNODB_SYS_FOREIGN' 2015-08-06 20:18:44 13498 [Note] Shutting down plugin 'INNODB_SYS_FIELDS' 2015-08-06 20:18:44 13498 [Note] Shutting down plugin 'INNODB_SYS_COLUMNS' 2015-08-06 20:18:44 13498 [Note] Shutting down plugin 'INNODB_SYS_INDEXES' 2015-08-06 20:18:44 13498 [Note] Shutting down plugin 'INNODB_SYS_TABLESTATS' 2015-08-06 20:18:44 13498 [Note] Shutting down plugin 'INNODB_SYS_TABLES' 2015-08-06 20:18:44 13498 [Note] Shutting down plugin 'INNODB_FT_INDEX_TABLE' 2015-08-06 20:18:44 13498 [Note] Shutting down plugin 'INNODB_FT_INDEX_CACHE' 2015-08-06 20:18:44 13498 [Note] Shutting down plugin 'INNODB_FT_CONFIG' 2015-08-06 20:18:44 13498 [Note] Shutting down plugin 'INNODB_FT_BEING_DELETED' 2015-08-06 20:18:44 13498 [Note] Shutting down plugin 'INNODB_FT_DELETED' 2015-08-06 20:18:44 13498 [Note] Shutting down plugin 'INNODB_FT_DEFAULT_STOPWORD' 2015-08-06 20:18:44 13498 [Note] Shutting down plugin 'INNODB_METRICS' 2015-08-06 20:18:44 13498 [Note] Shutting down plugin 'INNODB_BUFFER_POOL_STATS' 2015-08-06 20:18:44 13498 [Note] Shutting down plugin 'INNODB_BUFFER_PAGE_LRU' 2015-08-06 20:18:44 13498 [Note] Shutting down plugin 'INNODB_BUFFER_PAGE' 2015-08-06 20:18:44 13498 [Note] Shutting down plugin 'INNODB_CMP_PER_INDEX_RESET' 2015-08-06 20:18:44 13498 [Note] Shutting down plugin 'INNODB_CMP_PER_INDEX' 2015-08-06 20:18:44 13498 [Note] Shutting down plugin 'INNODB_CMPMEM_RESET' 2015-08-06 20:18:44 13498 [Note] Shutting down plugin 'INNODB_CMPMEM' 2015-08-06 20:18:44 13498 [Note] Shutting down plugin 'INNODB_CMP_RESET' 2015-08-06 20:18:44 13498 [Note] Shutting down plugin 'INNODB_CMP' 2015-08-06 20:18:44 13498 [Note] Shutting down plugin 'INNODB_LOCK_WAITS' 2015-08-06 20:18:44 13498 [Note] Shutting down plugin 'INNODB_LOCKS' 2015-08-06 20:18:44 13498 [Note] Shutting down plugin 'INNODB_TRX' 2015-08-06 20:18:44 13498 [Note] Shutting down plugin 'XTRADB_RSEG' 2015-08-06 20:18:44 13498 [Note] Shutting down plugin 'XTRADB_INTERNAL_HASH_TABLES' 2015-08-06 20:18:44 13498 [Note] Shutting down plugin 'XTRADB_READ_VIEW' 2015-08-06 20:18:44 13498 [Note] Shutting down plugin 'InnoDB' 2015-08-06 20:18:44 13498 [Note] InnoDB: FTS optimize thread exiting. 2015-08-06 20:18:44 13498 [Note] InnoDB: Starting shutdown... 2015-08-06 20:18:46 13498 [Note] InnoDB: Shutdown completed; log sequence number 4625944301433 2015-08-06 20:18:46 13498 [Note] Shutting down plugin 'CSV' 2015-08-06 20:18:46 13498 [Note] Shutting down plugin 'MyISAM' 2015-08-06 20:18:46 13498 [Note] Shutting down plugin 'MRG_MYISAM' 2015-08-06 20:18:46 13498 [Note] Shutting down plugin 'MEMORY' 2015-08-06 20:18:46 13498 [Note] Shutting down plugin 'sha256_password' 2015-08-06 20:18:46 13498 [Note] Shutting down plugin 'mysql_old_password' 2015-08-06 20:18:46 13498 [Note] Shutting down plugin 'mysql_native_password' 2015-08-06 20:18:46 13498 [Note] Shutting down plugin 'wsrep' 2015-08-06 20:18:46 13498 [Note] Shutting down plugin 'binlog' 2015-08-06 20:18:46 13498 [Note] /usr/sbin/mysqld: Shutdown complete Error in my_thread_global_end(): 1 threads didn't exit 150806 20:18:51 mysqld_safe mysqld from pid file /var/lib/mysql/mysqld.pid ended
Exit Code: 0
114306 Aug 2015 13:19:05Retrieved /var/lib/mysql/innobackup.backup.log from 10.1.105.47.
Exit Code: 0
114306 Aug 2015 13:19:05Retrieved /var/lib/mysql/innobackup.prepare.log from 10.1.105.47.
Exit Code: 0
114306 Aug 2015 13:19:05Retrieved /var/log/mysqld.log from 10.1.105.47.
Exit Code: 0
114306 Aug 2015 13:19:04Collecting error logs from '10.1.105.47', accessible from UI -> Nodes -> select node -> Logs
Exit Code: 0
114306 Aug 2015 13:19:04Retrieved /var/lib/mysql/innobackup.backup.log from 10.1.105.48.
Exit Code: 0
114306 Aug 2015 13:19:03Retrieved /var/lib/mysql/innobackup.prepare.log from 10.1.105.48.
Exit Code: 0
114306 Aug 2015 13:19:03Retrieved /var/log/mysqld.log from 10.1.105.48.
Exit Code: 0
114306 Aug 2015 13:19:03Collecting error logs from '10.1.105.48', accessible from UI -> Nodes -> select node -> Logs
Exit Code: 0
114306 Aug 2015 13:19:02Retrieved /var/lib/mysql/innobackup.backup.log from 10.1.105.46.
Exit Code: 0
114306 Aug 2015 13:18:55Retrieved /var/lib/mysql/innobackup.prepare.log from 10.1.105.46.
Exit Code: 0
114306 Aug 2015 13:18:55Retrieved /var/log/mysqld.log from 10.1.105.46.
Exit Code: 0
114306 Aug 2015 13:18:55Collecting error logs from '10.1.105.46', accessible from UI -> Nodes -> select node -> Logs
Exit Code: 0
114306 Aug 2015 13:18:55Starting mysqld service failed.
Exit Code: 1
114306 Aug 2015 13:18:52Starting 10.1.105.48:3306.
Exit Code: 0
114306 Aug 2015 13:18:37Starting cluster service.
Exit Code: 0
114306 Aug 2015 13:18:3610.1.105.48:3306: Stopped mysqld.
Exit Code: 0
114306 Aug 2015 13:18:3610.1.105.48: Sent stop to mysqld.
Exit Code: 0
114306 Aug 2015 13:17:4010.1.105.48:3306: Stopping mysqld (timeout=1800, force stop after timeout=0).
Exit Code: 0
114306 Aug 2015 13:17:20Stopping node on 10.1.105.48.
Exit Code: 0
114306 Aug 2015 13:17:20Message sent to controller
Exit Code: 0
-
Still won't start right. Here is the config of the node:
# All files in this package is subject to the GPL v2 license
# More information is in the COPYING file in the top directory of this package.
# Copyright (C) 2011 severalnines.com
[MYSQLD]
user=mysql
basedir=/usr/
datadir=/var/lib/mysql
socket=/var/lib/mysql/mysql.sock
pid_file=mysqld.pid
port=3306
log_error=/var/log/mysqld.log
log_warnings=2
event_scheduler = ON
#log_output=FILE
### INNODB OPTIONS
innodb_buffer_pool_size=8936M
innodb_flush_log_at_trx_commit=2
innodb_file_per_table=1
innodb_data_file_path = ibdata1:100M:autoextend
## You may want to tune the below depending on number of cores and disk sub
innodb_read_io_threads=4
innodb_write_io_threads=4
innodb_io_capacity=200
innodb_doublewrite=1
innodb_log_file_size=1024M
innodb_log_buffer_size=96M
innodb_buffer_pool_instances=8
innodb_log_files_in_group=2
innodb_thread_concurrency=64
#innodb_file_format=barracuda
innodb_flush_method = O_DIRECT
innodb_autoinc_lock_mode=2
## avoid statistics update when doing e.g show tables
innodb_stats_on_metadata=0
default_storage_engine=innodb
tmpdir = /var/tmp
# CHARACTER SET
#collation_server = utf8_unicode_ci
#init_connect='SET NAMES utf8'
#character_set_server = utf8
# REPLICATION SPECIFIC
#server_id must be unique across all mysql servers participating in replication.
#server_id=SERVERID
binlog_format=ROW
#log_slave_updates=1
#log_bin=binlog
#relay_log=relay-bin
#expire_logs_days=7
#gtid_mode=ON
#enforce_gtid_consistency=1
# OTHER THINGS, BUFFERS ETC
key_buffer_size = 24M
tmp_table_size = 64M
max_heap_table_size = 64M
max_allowed_packet = 512M
sort_buffer_size = 24M
#read_buffer_size = 256K
read_rnd_buffer_size = 24M
#myisam_sort_buffer_size = 8M
skip_name_resolve
memlock=0
sysdate_is_now=1
max_connections=150
thread_cache_size=512
query_cache_type = 0
query_cache_size = 0
table_open_cache=1024
lower_case_table_names=0
# 5.6 backwards compatibility
explicit_defaults_for_timestamp=1
##
## WSREP options
##
# Full path to wsrep provider library or 'none'
wsrep_provider=/usr/lib64/libgalera_smm.so
wsrep_node_address=10.1.105.48
# Provider specific configuration options
wsrep_provider_options="gcache.size=16384M; gmcast.segment=0"
# Logical cluster name. Should be the same for all nodes.
wsrep_cluster_name="my_wsrep_cluster"
# Group communication system handle
wsrep_cluster_address=gcomm://10.1.105.46,10.1.105.47,10.1.105.48
# Human_readable node name (non-unique). Hostname by default.
wsrep_node_name=10.1.105.48
# Address for incoming client connections. Autodetect by default.
#wsrep_node_incoming_address=
# How many threads will process writesets from other nodes
wsrep_slave_threads=8
# DBUG options for wsrep provider
#wsrep_dbug_option
# Generate fake primary keys for non-PK tables (required for multi-master
# and parallel applying operation)
wsrep_certify_nonPK=1
# Location of the directory with data files. Needed for non-mysqldump
# state snapshot transfers. Defaults to mysql_real_data_home.
#wsrep_data_home_dir=
# Maximum number of rows in write set
wsrep_max_ws_rows=131072
# Maximum size of write set
wsrep_max_ws_size=1073741824
# to enable debug level logging, set this to 1
wsrep_debug=0
# convert locking sessions into transactions
wsrep_convert_LOCK_to_trx=0
# how many times to retry deadlocked autocommits
wsrep_retry_autocommit=1
# change auto_increment_increment and auto_increment_offset automatically
wsrep_auto_increment_control=1
# replicate myisam
wsrep_replicate_myisam=1
# retry autoinc insert, which failed for duplicate key error
wsrep_drupal_282555_workaround=0
# enable "strictly synchronous" semantics for read operations
wsrep_causal_reads=0
# Command to call when node status or cluster membership changes.
# Will be passed all or some of the following options:
# --status - new status of this node
# --uuid - UUID of the cluster
# --primary - whether the component is primary or not ("yes"/"no")
# --members - comma-separated list of members
# --index - index of this node in the list
#wsrep_notify_cmd=
##
## WSREP State Transfer options
##
# State Snapshot Transfer method
# ClusterControl currently DOES NOT support wsrep_sst_method=mysqldump
wsrep_sst_method=xtrabackup-v2
# Address on THIS node to receive SST at. DON'T SET IT TO DONOR ADDRESS!!!
# (SST method dependent. Defaults to the first IP of the first interface)
#wsrep_sst_receive_address=
# SST authentication string. This will be used to send SST to joining nodes.
# Depends on SST method. For mysqldump method it is root:<root password>
wsrep_sst_auth=root:9bPOO03
# Desired SST donor name.
#wsrep_sst_donor=
# Protocol version to use
# wsrep_protocol_version=
[MYSQL]
socket=/var/lib/mysql/mysql.sock
#default_character_set=utf8
[client]
socket=/var/lib/mysql/mysql.sock
#default_character_set=utf8
[mysqldump]
max_allowed_packet = 512M
socket=/var/lib/mysql/mysql.sock
#default_character_set=utf8
[MYSQLD_SAFE]
pid_file=mysqld.pid
log_error=/var/log/mysqld.log
basedir=/usr/
datadir=/var/lib/mysql -
Hi Jason,
It would be helpful if you can generate an error report and attach it here so we can understand your cluster. Instructions at: http://support.severalnines.com/entries/21091498-Error-reporting-
Since it will contain sensitive information, if you don't want to do it here, you can open a support ticket with us.
Regards,
Ashraf
-
2015-08-07 04:11:59 16359 [Note] WSREP: RECV thread exiting 0: Success
2015-08-07 04:11:59 16359 [Note] WSREP: recv_thread() joined.
2015-08-07 04:11:59 16359 [Note] WSREP: Closing replication queue.
2015-08-07 04:11:59 16359 [Note] WSREP: Closing slave action queue.
2015-08-07 04:11:59 16359 [Note] WSREP: Service disconnected.
2015-08-07 04:11:59 16359 [Note] WSREP: rollbacker thread exiting
2015-08-07 04:12:00 16359 [ERROR] WSREP: Requesting state transfer failed: -77(File descriptor in bad state)
2015-08-07 04:12:00 16359 [ERROR] WSREP: State transfer request failed unrecoverably: 77 (File descriptor in bad state). Most likely it is due to inability to communicate with the cluster primary component. Restart required.
2015-08-07 04:12:00 16359 [Note] WSREP: /usr/sbin/mysqld: Terminated
Please sign in to leave a comment.
Comments
13 comments