Cannot join the New Cluster
I am using MariaDB 10.3.10 and am trying to set up a 3-node cluster, starting with the first two nodes. I successfully started the first node with the command 'galera_new_cluster'. But when I tried to start the second node with 'systemctl start mysql', it failed with these errors:
[ERROR] WSREP: failed to open gcomm backend connection: 110: failed to reach primary view: 110 (Connection timed out) at gcomm/src/pc.cpp:connect():158
[ERROR] WSREP: gcs/src/gcs_core.cpp:gcs_core_open():209: Failed to open backend connection: -110 (Connection timed out)
There is a network switch between the two nodes, and all ports are allowed through. I can ping node 2 from node 1 and vice versa. The configurations and error log are below.
1st node (10.15.6.1) server.cnf
[galera]
# Mandatory settings
wsrep_on=ON
wsrep_provider=/usr/lib64/galera/libgalera_smm.so
wsrep_cluster_name=Cluster
wsrep_cluster_address="gcomm://10.15.6.1,10.15.6.2"
wsrep_node_name=node1
wsrep_node_address=10.15.6.1
binlog_format=row
default_storage_engine=InnoDB
innodb_autoinc_lock_mode=2
wsrep_provider_options="pc.weight=2"
2nd node (10.15.6.2) server.cnf
[galera]
# Mandatory settings
wsrep_on=ON
wsrep_provider=/usr/lib64/galera/libgalera_smm.so
wsrep_cluster_name=Cluster
wsrep_cluster_address="gcomm://10.15.6.1,10.15.6.2"
wsrep_node_name=node2
wsrep_node_address=10.15.6.2
binlog_format=row
default_storage_engine=InnoDB
innodb_autoinc_lock_mode=2
wsrep_provider_options="pc.weight=2"
Node 2 (10.15.6.2) error log after executing "systemctl start mysql"
2018-11-28 11:11:30 0 [Note] WSREP: Read nil XID from storage engines, skipping position init
2018-11-28 11:11:30 0 [Note] WSREP: wsrep_load(): loading provider library '/usr/lib64/galera/libgalera_smm.so'
2018-11-28 11:11:30 0 [Note] WSREP: wsrep_load(): Galera 25.3.24(r3825) by Codership Oy <info@codership.com> loaded successfully.
2018-11-28 11:11:30 0 [Note] WSREP: CRC-32C: using hardware acceleration.
2018-11-28 11:11:30 0 [Warning] WSREP: Could not open state file for reading: '/var/lib/mysql//grastate.dat'
2018-11-28 11:11:30 0 [Note] WSREP: Found saved state: 00000000-0000-0000-0000-000000000000:-1, safe_to_bootstrap: 1
2018-11-28 11:11:30 0 [Note] WSREP: Passing config to GCS: base_dir = /var/lib/mysql/; base_host = 10.15.6.2; base_port = 4567; cert.log_conflicts = no; debug = no; evs.auto_evict = 0; evs.delay_margin = PT1S; evs.delayed_keep_period = PT30S; evs.inactive_check_period = PT0.5S; evs.inactive_timeout = PT15S; evs.join_retrans_period = PT1S; evs.max_install_timeouts = 3; evs.send_window = 4; evs.stats_report_period = PT1M; evs.suspect_timeout = PT5S; evs.user_send_window = 2; evs.view_forget_timeout = PT24H; gcache.dir = /var/lib/mysql/; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = /var/lib/mysql//galera.cache; gcache.page_size = 128M; gcache.recover = no; gcache.size = 128M; gcomm.thread_prio = ; gcs.fc_debug = 0; gcs.fc_factor = 1.0; gcs.fc_limit = 16; gcs.fc_master_slave = no; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = no; gmcast.segment = 0; gmcast.version = 0; pc.announce_timeout = PT3S; pc.checksum = false; pc.ig
2018-11-28 11:11:31 0 [Note] WSREP: GCache history reset: 00000000-0000-0000-0000-000000000000:0 -> 00000000-0000-0000-0000-000000000000:-1
2018-11-28 11:11:31 0 [Note] WSREP: Assign initial position for certification: -1, protocol version: -1
2018-11-28 11:11:31 0 [Note] WSREP: wsrep_sst_grab()
2018-11-28 11:11:31 0 [Note] WSREP: Start replication
2018-11-28 11:11:31 0 [Note] WSREP: Setting initial position to 00000000-0000-0000-0000-000000000000:-1
2018-11-28 11:11:31 0 [Note] WSREP: protonet asio version 0
2018-11-28 11:11:31 0 [Note] WSREP: Using CRC-32C for message checksums.
2018-11-28 11:11:31 0 [Note] WSREP: backend: asio
2018-11-28 11:11:31 0 [Note] WSREP: gcomm thread scheduling priority set to other:0
2018-11-28 11:11:31 0 [Warning] WSREP: access file(/var/lib/mysql//gvwstate.dat) failed(No such file or directory)
2018-11-28 11:11:31 0 [Note] WSREP: restore pc from disk failed
2018-11-28 11:11:31 0 [Note] WSREP: GMCast version 0
2018-11-28 11:11:31 0 [Note] WSREP: (4da1a035, 'tcp://0.0.0.0:4567') listening at tcp://0.0.0.0:4567
2018-11-28 11:11:31 0 [Note] WSREP: (4da1a035, 'tcp://0.0.0.0:4567') multicast: , ttl: 1
2018-11-28 11:11:31 0 [Note] WSREP: EVS version 0
2018-11-28 11:11:31 0 [Note] WSREP: gcomm: connecting to group 'Cluster', peer '10.15.6.1:,10.15.6.2:'
2018-11-28 11:11:31 0 [Note] WSREP: (4da1a035, 'tcp://0.0.0.0:4567') connection established to 4da1a035 tcp://10.15.6.2:4567
2018-11-28 11:11:31 0 [Warning] WSREP: (4da1a035, 'tcp://0.0.0.0:4567') address 'tcp://10.15.6.2:4567' points to own listening address, blacklisting
2018-11-28 11:11:34 0 [Note] WSREP: (4da1a035, 'tcp://0.0.0.0:4567') connection to peer 4da1a035 with addr tcp://10.15.6.2:4567 timed out, no messages seen in PT3S
2018-11-28 11:11:34 0 [Warning] WSREP: no nodes coming from prim view, prim not possible
2018-11-28 11:11:34 0 [Note] WSREP: view(view_id(NON_PRIM,4da1a035,1) memb {
4da1a035,0
} joined {
} left {
} partitioned {
})
2018-11-28 11:11:34 0 [Warning] WSREP: last inactive check more than PT1.5S ago (PT3.50385S), skipping check
2018-11-28 11:12:04 0 [Note] WSREP: view((empty))
2018-11-28 11:12:04 0 [ERROR] WSREP: failed to open gcomm backend connection: 110: failed to reach primary view: 110 (Connection timed out)
at gcomm/src/pc.cpp:connect():158
2018-11-28 11:12:04 0 [ERROR] WSREP: gcs/src/gcs_core.cpp:gcs_core_open():209: Failed to open backend connection: -110 (Connection timed out)
2018-11-28 11:12:04 0 [ERROR] WSREP: gcs/src/gcs.cpp:gcs_open():1458: Failed to open channel 'Cluster' at 'gcomm://10.15.6.1,10.15.6.2': -110 (Connection timed out)
2018-11-28 11:12:04 0 [ERROR] WSREP: gcs connect failed: Connection timed out
2018-11-28 11:12:04 0 [ERROR] WSREP: wsrep::connect(gcomm://10.15.6.1,10.15.6.2) failed: 7
2018-11-28 11:12:04 0 [ERROR] Aborting
Node 1 grastate.dat content after executing 'galera_new_cluster'
# GALERA saved state
version: 2.1
uuid: 853bac76-f2ba-11e8-95c8-f30792fe8b00
seqno: -1
safe_to_bootstrap: 1
Node 2 grastate.dat content after executing 'systemctl start mysql'
# GALERA saved state
version: 2.1
uuid: 00000000-0000-0000-0000-000000000000
seqno: -1
safe_to_bootstrap: 1
Node 1 and 2 sestatus
disabled
Node 1 and 2 firewalld
disabled
-
Hi,
This line tells us that no node in the cluster is currently in the primary state:
[Warning] WSREP: no nodes coming from prim view, prim not possible
On node1, what is the output of:
mysql> SHOW STATUS LIKE 'wsrep%';
Node 2 tried to establish communication with the primary component but failed, probably because node 1 was not in the Primary state. You can verify this by checking wsrep_cluster_status in the output of the statement above. Did you find any clues in node 1's MySQL error log?
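The check described above can be scripted. The following is a minimal sketch, assuming the status output is captured as tab-separated text (for example with `mysql -N -B -e "SHOW STATUS LIKE 'wsrep%'"`). The function names `parse_wsrep_status` and `is_primary` are hypothetical helpers, not part of MariaDB or Galera.

```python
def parse_wsrep_status(text):
    """Turn tab-separated 'name<TAB>value' lines into a dict of wsrep_* variables."""
    status = {}
    for line in text.splitlines():
        name, sep, value = line.partition("\t")
        name = name.strip()
        if sep and name.startswith("wsrep_"):
            status[name] = value.strip()
    return status


def is_primary(status):
    """True if the node reports that it belongs to the primary component
    and is connected and ready; a joiner needs such a node to sync from."""
    return (
        status.get("wsrep_cluster_status") == "Primary"
        and status.get("wsrep_connected") == "ON"
        and status.get("wsrep_ready") == "ON"
    )


if __name__ == "__main__":
    # Sample text for illustration only; not taken from a real node.
    sample = (
        "wsrep_cluster_size\t1\n"
        "wsrep_cluster_status\tPrimary\n"
        "wsrep_connected\tON\n"
        "wsrep_ready\tON\n"
    )
    print(is_primary(parse_wsrep_status(sample)))
```

If `is_primary` returns False on the bootstrapped node, the joining node has no primary component to reach, whatever the network state.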
Regards,
Ashraf
-
I found the solution: iptables was blocking traffic on port 4567. I added the following entries to /etc/sysconfig/iptables, and that made it work:
-A INPUT -m state --state NEW -m tcp -p tcp --dport 4567 -j ACCEPT
-A INPUT -m state --state NEW -m tcp -p tcp --dport 4444 -j ACCEPT
Port 4567 is for gcomm while port 4444 is for rsync.
Are there any other ports I need to open for normal operation of MariaDB and Galera clustering?
-
Hi,
That was my hunch, but I didn't pursue it because I assumed you had disabled the firewall completely, since you said "Node 1 and 2 firewalld - disabled".
Anyway, we have listed the necessary ports for Galera on our documentation page: https://severalnines.com/docs/requirements.html#firewall-and-security-groups
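Since ping only exercises ICMP, a successful ping does not show that the Galera TCP ports are reachable. The sketch below probes them with plain TCP connects. The port list reflects the usual Galera defaults (3306, 4567, 4568, 4444) and is an assumption; adjust it if your configuration overrides them. `probe` and `check_peer` are hypothetical helper names.

```python
import socket

# Usual default ports (assumed; change these if your setup overrides them).
GALERA_TCP_PORTS = {
    3306: "MySQL client connections",
    4567: "Galera group communication (gcomm)",
    4568: "Incremental State Transfer (IST)",
    4444: "State Snapshot Transfer (SST)",
}


def probe(host, port, timeout=3.0):
    """Attempt a TCP connect and classify the outcome.

    'open'    : something accepted the connection
    'refused' : host reachable, but nothing is listening on that port
    'timeout' : no answer at all, which often points to a packet filter
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        return "refused"
    except socket.timeout:
        return "timeout"
    except OSError as exc:
        return "error: %s" % exc


def check_peer(host, ports=GALERA_TCP_PORTS, timeout=3.0):
    """Probe every port in `ports` on `host`; returns {port: result}."""
    return {port: probe(host, port, timeout) for port in ports}


# Example (run from node 2 against node 1):
#   for port, result in check_peer("10.15.6.1").items():
#       print(port, GALERA_TCP_PORTS[port], result)
```

Ports 4568 and 4444 normally only listen while a state transfer is in progress, so "refused" there is expected on an idle node. A "timeout" on 4567 against a running node is the result that suggests filtering.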
Regards,
Ashraf
-
I'm facing exactly the same error, but two of the error lines are missing from my log, and my local firewall is already disabled. (I've also checked the MySQL connection remotely.)
FIREWALL STATUS on both nodes :
# firewall-cmd --list-all
FirewallD is not running
NODE2 error log :
2019-01-11T12:51:19.429223Z 0 [Note] /usr/sbin/mysqld (mysqld 5.7.24) starting as process 29460 ...
2019-01-11T12:51:19.431879Z 0 [Note] WSREP: Setting wsrep_ready to 0
2019-01-11T12:51:19.431918Z 0 [Note] WSREP: Read nil XID from storage engines, skipping position init
2019-01-11T12:51:19.431927Z 0 [Note] WSREP: wsrep_load(): loading provider library '/usr/lib64/galera-3/libgalera_smm.so'
2019-01-11T12:51:19.437760Z 0 [Note] WSREP: wsrep_load(): Galera 3.25(rddf9876) by Codership Oy <info@codership.com> loaded successfully.
2019-01-11T12:51:19.437829Z 0 [Note] WSREP: CRC-32C: using hardware acceleration.
2019-01-11T12:51:19.438231Z 0 [Warning] WSREP: Could not open state file for reading: '/var/lib/mysql//grastate.dat'
2019-01-11T12:51:19.438290Z 0 [Note] WSREP: Found saved state: 00000000-0000-0000-0000-000000000000:-1, safe_to_bootstrap: 1
2019-01-11T12:51:19.446302Z 0 [Note] WSREP: Passing config to GCS: base_dir = /var/lib/mysql/; base_host = 192.172.0.22; base_port = 4567; cert.log_conflicts = ON; cert.optimistic_pa = yes; debug = no; evs.auto_evict = 0; evs.delay_margin = PT1S; evs.delayed_keep_period = PT30S; evs.inactive_check_period = PT0.5S; evs.inactive_timeout = PT15S; evs.join_retrans_period = PT1S; evs.max_install_timeouts = 3; evs.send_window = 4; evs.stats_report_period = PT1M; evs.suspect_timeout = PT5S; evs.user_send_window = 2; evs.view_forget_timeout = PT24H; gcache.dir = /var/lib/mysql/; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = /var/lib/mysql//galera.cache; gcache.page_size = 128M; gcache.recover = no; gcache.size = 128M; gcomm.thread_prio = ; gcs.fc_debug = 0; gcs.fc_factor = 1.0; gcs.fc_limit = 16; gcs.fc_master_slave = no; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = no; gmcast.segment = 0; gmcast.version = 0; pc.announce_timeout = PT3S;
2019-01-11T12:51:19.469268Z 0 [Note] WSREP: GCache history reset: 00000000-0000-0000-0000-000000000000:0 -> 00000000-0000-0000-0000-000000000000:-1
2019-01-11T12:51:19.471111Z 0 [Note] WSREP: Assign initial position for certification: -1, protocol version: -1
2019-01-11T12:51:19.471148Z 0 [Note] WSREP: wsrep_sst_grab()
2019-01-11T12:51:19.471156Z 0 [Note] WSREP: Start replication
2019-01-11T12:51:19.471172Z 0 [Note] WSREP: Setting initial position to 00000000-0000-0000-0000-000000000000:-1
2019-01-11T12:51:19.471271Z 0 [Note] WSREP: protonet asio version 0
2019-01-11T12:51:19.471405Z 0 [Note] WSREP: Using CRC-32C for message checksums.
2019-01-11T12:51:19.471458Z 0 [Note] WSREP: backend: asio
2019-01-11T12:51:19.471540Z 0 [Note] WSREP: gcomm thread scheduling priority set to other:0
2019-01-11T12:51:19.471659Z 0 [Warning] WSREP: access file(/var/lib/mysql//gvwstate.dat) failed(No such file or directory)
2019-01-11T12:51:19.471679Z 0 [Note] WSREP: restore pc from disk failed
2019-01-11T12:51:19.471901Z 0 [Note] WSREP: GMCast version 0
2019-01-11T12:51:19.472111Z 0 [Note] WSREP: (975977c6, 'tcp://0.0.0.0:4567') listening at tcp://0.0.0.0:4567
2019-01-11T12:51:19.472140Z 0 [Note] WSREP: (975977c6, 'tcp://0.0.0.0:4567') multicast: , ttl: 1
2019-01-11T12:51:19.472506Z 0 [Note] WSREP: EVS version 0
2019-01-11T12:51:19.472645Z 0 [Note] WSREP: gcomm: connecting to group 'g1', peer '192.172.0.21:3306,192.172.0.22:3306'
2019-01-11T12:51:22.473549Z 0 [Note] WSREP: (975977c6, 'tcp://0.0.0.0:4567') connection to peer 00000000 with addr tcp://192.172.0.21:3306 timed out, no messages seen in PT3S
2019-01-11T12:51:22.474056Z 0 [Warning] WSREP: no nodes coming from prim view, prim not possible
2019-01-11T12:51:22.474114Z 0 [Note] WSREP: view(view_id(NON_PRIM,975977c6,1) memb {
975977c6,0
} joined {
} left {
} partitioned {
})
2019-01-11T12:51:22.974257Z 0 [Warning] WSREP: last inactive check more than PT1.5S ago (PT3.50175S), skipping check
2019-01-11T12:51:26.974048Z 0 [Note] WSREP: (975977c6, 'tcp://0.0.0.0:4567') connection to peer 00000000 with addr tcp://192.172.0.21:3306 timed out, no messages seen in PT3S
2019-01-11T12:51:52.482535Z 0 [ERROR] WSREP: failed to open gcomm backend connection: 110: failed to reach primary view: 110 (Connection timed out)
at gcomm/src/pc.cpp:connect():158
2019-01-11T12:51:52.482563Z 0 [ERROR] WSREP: gcs/src/gcs_core.cpp:gcs_core_open():209: Failed to open backend connection: -110 (Connection timed out)
2019-01-11T12:51:52.482639Z 0 [ERROR] WSREP: gcs/src/gcs.cpp:gcs_open():1458: Failed to open channel 'g1' at 'gcomm://192.172.0.21:3306,192.172.0.22:3306': -110 (Connection timed out)
2019-01-11T12:51:52.482653Z 0 [ERROR] WSREP: gcs connect failed: Connection timed out
2019-01-11T12:51:52.482662Z 0 [ERROR] WSREP: wsrep::connect(gcomm://192.172.0.21:3306,192.172.0.22:3306) failed: 7
2019-01-11T12:51:52.482671Z 0 [ERROR] Aborting
2019-01-11T12:51:52.482676Z 0 [Note] WSREP: unireg_abort
2019-01-11T12:51:52.482682Z 0 [Note] Giving 0 client threads a chance to die gracefully
2019-01-11T12:51:52.482696Z 0 [Note] WSREP: Service disconnected.
2019-01-11T12:51:53.482853Z 0 [Note] WSREP: Some threads may fail to exit.
2019-01-11T12:51:53.482970Z 0 [Note] Binlog end
2019-01-11T12:51:53.483126Z 0 [Note] /usr/sbin/mysqld: Shutdown complete
node1 conf :
[mysqld]
datadir=/var/lib/mysql
socket=/var/lib/mysql/mysql.sock
user=mysql
binlog_format=ROW
bind-address=0.0.0.0
default_storage_engine=innodb
innodb_autoinc_lock_mode=2
innodb_flush_log_at_trx_commit=0
innodb_buffer_pool_size=122M
wsrep_provider=/usr/lib64/galera-3/libgalera_smm.so
wsrep_provider_options="gcache.size=300M; gcache.page_size=300M;socket.checksum=1"
wsrep_cluster_name="g1"
wsrep_cluster_address="gcomm://192.172.0.21:3306,192.172.0.22:3306"
wsrep_sst_method=rsync
server_id=21
wsrep_node_address="192.172.0.22"
wsrep_node_name="node2"
wsrep_log_conflicts=ON
wsrep_provider_options="cert.log_conflicts=ON"
wsrep_debug=ON
log-error=/var/log/mysqld.log
pid-file=/var/run/mysqld/mysqld.pid
connect_timeout=60
wait_timeout=60
interactive_timeout=60
net_read_timeout=60
net_write_timeout=60
max_allowed_packet=128M
log_error_verbosity = 3
node2 conf :
[mysqld]
datadir=/var/lib/mysql
socket=/var/lib/mysql/mysql.sock
user=mysql
binlog_format=ROW
bind-address=0.0.0.0
default_storage_engine=innodb
innodb_autoinc_lock_mode=2
innodb_flush_log_at_trx_commit=0
innodb_buffer_pool_size=122M
wsrep_provider=/usr/lib64/galera-3/libgalera_smm.so
wsrep_provider_options="gcache.size=300M; gcache.page_size=300M;socket.checksum=1"
wsrep_cluster_name="g1"
wsrep_cluster_address="gcomm://192.172.0.21:3306,192.172.0.22:3306"
wsrep_sst_method=rsync
server_id=22
wsrep_node_address="192.172.0.22"
wsrep_node_name="atish-vm-3"
wsrep_log_conflicts=ON
wsrep_provider_options="cert.log_conflicts=ON"
wsrep_debug=ON
log-error=/var/log/mysqld.log
pid-file=/var/run/mysqld/mysqld.pid
connect_timeout=60
wait_timeout=60
interactive_timeout=60
net_read_timeout=60
net_write_timeout=60
max_allowed_packet=128M
log_error_verbosity = 3
-
The lines missing from my log, compared to the post above, are:
2018-11-28 11:11:31 0 [Note] WSREP: (4da1a035, 'tcp://0.0.0.0:4567') connection established to 4da1a035 tcp://10.15.6.2:4567
2018-11-28 11:11:31 0 [Warning] WSREP: (4da1a035, 'tcp://0.0.0.0:4567') address 'tcp://10.15.6.2:4567' points to own listening address, blacklisting
I've tried multiple different servers but couldn't get it working. Can anyone point out what I'm missing? :(
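One detail in the log above may be worth checking: the node listens on tcp://0.0.0.0:4567 but tries to reach its peer at 192.172.0.21:3306, the port given in wsrep_cluster_address. The sketch below is a hedged sanity check, not a confirmed diagnosis. It splits a gcomm:// address into host/port pairs and flags entries that point at the MySQL client port instead of the usual gcomm default, 4567. The helper names are hypothetical, and IPv6 literals are not handled.

```python
GCOMM_DEFAULT_PORT = 4567   # usual Galera group communication port (assumed default)
MYSQL_CLIENT_PORT = 3306    # usual MySQL/MariaDB client port


def parse_gcomm_address(address):
    """Split 'gcomm://host[:port],host[:port]' into a list of (host, port)."""
    prefix = "gcomm://"
    if not address.startswith(prefix):
        raise ValueError("not a gcomm:// address: %r" % address)
    # Drop any '?option=value' suffix, then split the peer list.
    peer_list = address[len(prefix):].split("?", 1)[0]
    peers = []
    for item in peer_list.split(","):
        item = item.strip()
        if not item:
            continue
        host, sep, port = item.rpartition(":")
        if sep and port.isdigit():
            peers.append((host, int(port)))
        else:
            # No explicit port given: the default applies.
            peers.append((item, GCOMM_DEFAULT_PORT))
    return peers


def peers_on_mysql_port(address):
    """Return the peers whose port is the MySQL client port."""
    return [(h, p) for h, p in parse_gcomm_address(address)
            if p == MYSQL_CLIENT_PORT]


if __name__ == "__main__":
    for addr in ("gcomm://10.15.6.1,10.15.6.2",
                 "gcomm://192.172.0.21:3306,192.172.0.22:3306"):
        print(addr, "->", peers_on_mysql_port(addr))
```

If the check flags entries, comparing them against the port the other node actually listens on (base_port in its log) is a reasonable next step.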