Installing MySQL Cluster on Two Amazon EC2 instances fails because ndb_mgmd can't be started
Hello,
I just tried to install MySQL Cluster using the Cluster Control Configurator on severalnines.com. Unfortunately, after installing everything, the system failed to start ndb_mgmd:
/usr/sbin//ndb_mgmd --ndb-nodeid=1 -c "10.48.98.35:1186" -f /etc//config.ini --configdir=/etc// --reload --initial
Warning: could not add log destination 'FILE:filename=ndb_1_cluster.log,maxsize=10000000,maxfiles=6'. Reason: Permission denied
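The warning suggests that the user running ndb_mgmd cannot write its cluster log. A quick check I would do first (my own debugging sketch; the path is the mgm_datadir from my configuration further down):
# ndb_mgmd writes ndb_1_cluster.log into its DataDir, so the user
# starting it needs write access there:
$ ls -ld /storage/data/mysqlcluster_mgm/
$ touch /storage/data/mysqlcluster_mgm/ndb_1_cluster.log && echo writable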
I used the ec2-user account, as proposed in the following blog: http://johanandersson.blogspot.com/2011/08/test-deployment-of-mysql-cluster-on.html
I will now try it with the root user.
/Manuel
-
Here is what actually happens:
$ ./deploy.sh
.... [lots of status messages with ok (about 5 minutes of processing)]
-----------------------------------
*******************************************************************************
* Creating datadirs and starting management servers *
*******************************************************************************
10.48.98.35: Executing 'rm -rf /storage/data/mysqlcluster/'[ok]
10.48.98.35: Executing 'killall -9 ndb_mgmd cmon'[ok]
10.50.6.150: Executing 'rm -rf /storage/data/mysqlcluster/'[ok]
10.50.6.150: Executing 'killall -9 ndb_mgmd cmon'[ok]
Test starting the management servers. If this fails you have a config problem.
Creating /storage/data/mysqlcluster_mgm/ and /etc// for management server on 10.48.98.35
10.48.98.35: Executing 'mkdir -p /etc//' [ok]
10.48.98.35: Executing 'mkdir -p /storage/data/mysqlcluster_mgm/' [ok]
10.48.98.35: Starting management server (nodeid=1)
10.48.98.35: Copying '../config/config.ini' [ok]
10.48.98.35: Executing 'sync' [ok]
10.48.98.35: Executing 'mv /tmp/config.ini /etc//config.ini' [ok]
10.48.98.35: Executing 'sync' [ok]
10.48.98.35: Checking if '/etc//config.ini' exists [ok]
10.48.98.35: Checking if '/usr/sbin//ndb_mgmd' exists [ok]
10.48.98.35: Executing 'rm -rf /etc//ndb_*bin*'[ok]
10.48.98.35: Executing '/usr/sbin//ndb_mgmd --ndb-nodeid=1 -c "10.48.98.35:1186" -f /etc//config.ini --configdir=/etc// --reload --initial' [ok]
10.48.98.35: Checking pid existance of 'ndb_mgmd' (timeout=30): ...............................[failed]
10.48.98.35: Failed to start management server
10.48.98.35: Error message:
WARNING! FAILED TO START MANAGEMENT SERVER ON 10.48.98.35
WARNING! Bootstrap failed!! Aborting...
-
Just for reference, here is my deployment configuration. I am using only two machines on EC2 (small instances; I am a poor guy ;-) )
mysql_password='*******'
cmon_password='*******'
cmon_monitor='10.48.98.35'
mysql_port=3306
port=1186
repl_user=''
repl_pass=''
os_user='ec2-user'
connectstring='10.48.98.35:1186,10.50.6.150:1186'
ndb_binary='ndbd'
configini=/etc//config.ini
configdir=/etc/
mycnf=/etc//my.cnf
installconfigpath=/etc//
socket='/var/lib/mysql/mysql.sock'
slave_datadir=''
master_datadir=''
mgm_datadir='/storage/data/mysqlcluster_mgm/'
ndbd_datadir='/storage/data/mysqlcluster/'
backup_datadir='/storage/data/mysqlcluster//backup/'
mysql_datadir='/storage/data/mysql/'
rsync_bw_limit=5000
libdir=/usr/lib/
bindir=/usr/bin/
libexec=/usr/sbin/
scriptdir=/usr/bin/
basedir=/usr/
installdir=/usr/bin/
installdir_cmon=/usr/bin/
platform='i686'
rhel=rhel5
TOOL='yum -y install '
LIBAIO='libaio'
CHKCONFIG='/sbin/chkconfig '
WEBSERVER='httpd'
WWWROOT='/var/www/html/'
APACHE_USER='apache'
UPDATE_PACKAGE_MGR='yum -y update'
INSTALL_CMON_PACKAGES='yum -y install httpd php php-mysql php-gd rrdtool.i386 -x mysql-libs'
EPEL='rpm -Uvh http://download.fedora.redhat.com/pub/epel/5/i386/epel-release-5-4.noarch.rpm'
IDENTITY='-t -i/home/ec2-user/ClusterSSHKey/manuelblechschmidt.pem'
IDENTITY2='-i/home/ec2-user/ClusterSSHKey/manuelblechschmidt.pem'
-
Here is a strace showing what the sshd daemon is doing:
$ ssh -q -t -i/home/ec2-user/ClusterSSHKey/manuelblechschmidt.pem ec2-user@10.48.98.35 "sudo /usr/sbin//ndb_mgmd --ndb-nodeid=1 -c "10.48.98.35:1186" -f /etc//config.ini --configdir=/etc// --reload --initial "
$ sudo strace -p 1139
Process 1139 attached - interrupt to quit
select(8, [3 4], NULL, NULL, NULL) = 1 (in [3])
accept(3, {sa_family=AF_INET, sin_port=htons(53642), sin_addr=inet_addr("10.48.98.35")}, [16]) = 5
fcntl64(5, F_GETFL) = 0x2 (flags O_RDWR)
pipe([6, 7]) = 0
socketpair(PF_FILE, SOCK_STREAM, 0, [8, 9]) = 0
clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0xb78eb7a8) = 31668
close(7) = 0
write(8, "\0\0\2@\0", 5) = 5
write(8, "\0\0\0027\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nProtocol"..., 575) = 575
close(8) = 0
close(9) = 0
close(5) = 0
select(8, [3 4 6], NULL, NULL, NULL) = 1 (in [6])
close(6) = 0
select(8, [3 4], NULL, NULL, NULL) = ? ERESTARTNOHAND (To be restarted)
--- SIGCHLD (Child exited) @ 0 (0) ---
waitpid(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 255}], WNOHANG) = 31668
waitpid(-1, 0xbfa60df8, WNOHANG) = 0
rt_sigaction(SIGCHLD, NULL, {0xf1fd17, [], 0}, 8) = 0
sigreturn() = ? (mask now [])
select(8, [3 4], NULL, NULL, NULL) = 1 (in [3])
accept(3, {sa_family=AF_INET, sin_port=htons(53644), sin_addr=inet_addr("10.48.98.35")}, [16]) = 5
fcntl64(5, F_GETFL) = 0x2 (flags O_RDWR)
pipe([6, 7]) = 0
socketpair(PF_FILE, SOCK_STREAM, 0, [8, 9]) = 0
clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0xb78eb7a8) = 31684
close(7) = 0
write(8, "\0\0\2@\0", 5) = 5
write(8, "\0\0\0027\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nProtocol"..., 575) = 575
close(8) = 0
close(9) = 0
close(5) = 0
select(8, [3 4 6], NULL, NULL, NULL) = 1 (in [6])
close(6) = 0
select(8, [3 4], NULL, NULL, NULL) = ? ERESTARTNOHAND (To be restarted)
--- SIGCHLD (Child exited) @ 0 (0) ---
waitpid(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 255}], WNOHANG) = 31684
waitpid(-1, 0xbfa60df8, WNOHANG) = 0
rt_sigaction(SIGCHLD, NULL, {0xf1fd17, [], 0}, 8) = 0
sigreturn() = ? (mask now [])
select(8, [3 4], NULL, NULL, NULL
As you can see, sshd spawns a new child, starts the ndb_mgmd daemon, and directly afterwards kills all its children and exits itself.
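This can be reproduced outside the deploy scripts (my own sketch; somehost is a placeholder). With a forced pty, the remote shell is the session leader, and when it exits a SIGHUP goes to its process group, taking background children with it unless they ignore the signal:
$ ssh -t somehost 'sleep 100 &'
# sleep is killed as soon as the ssh session ends
$ ssh -t somehost 'nohup sleep 100 > /dev/null 2>&1 &'
# nohup ignores SIGHUP, so sleep survives the disconnect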
-
Adding nohup to the following line in start-mgmd.sh:
remote_cmd ${host} "nohup $libexec/ndb_mgmd --ndb-nodeid=${nodeid} -c \"$host:1186\" -f $configini --configdir=/etc// --reload --initial"
fixes the startup problem of ndb_mgmd. Now I have the problem that my data nodes are not starting up :-)
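As a side note, a more defensive variant of that line (my own addition, untested) also redirects output so no open file descriptors keep the ssh session alive:
remote_cmd ${host} "nohup $libexec/ndb_mgmd --ndb-nodeid=${nodeid} -c \"$host:1186\" -f $configini --configdir=/etc// --reload --initial > /dev/null 2>&1"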
-
Furthermore, the MySQL password should be quoted in bootstrap.sh:
remote_cmd_nosudo ${host} "LD_LIBRARY_PATH=$libdir:$libdir/mysql $bindir/mysql --defaults-file=$installconfigpath/my.cnf -uroot -p$mysql_password -e \"SET SQL_LOG_BIN=0; GRANT ALL ON *.* TO 'root'@'$host' IDENTIFIED BY '$mysql_password'\""
=>
remote_cmd_nosudo ${host} "LD_LIBRARY_PATH=$libdir:$libdir/mysql $bindir/mysql --defaults-file=$installconfigpath/my.cnf -uroot -p'$mysql_password' -e \"SET SQL_LOG_BIN=0; GRANT ALL ON *.* TO 'root'@'$host' IDENTIFIED BY '$mysql_password'\""
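To illustrate why (a made-up example, not from the scripts): the whole command string is re-parsed by the remote shell, so metacharacters in an unquoted password break the command.
# Suppose the password contains a shell metacharacter:
mysql_password='se;cret'
# Unquoted, the remote shell parses "mysql -uroot -pse;cret" and treats
# ';' as a command separator, truncating the password to 'se':
$ ssh somehost "mysql -uroot -p$mysql_password -e 'SELECT 1'"
# Quoted, the single quotes survive into the remote command line and the
# remote shell hands the full password to mysql:
$ ssh somehost "mysql -uroot -p'$mysql_password' -e 'SELECT 1'"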
I also think that the path to the MySQL start script was wrong:
/etc/rc.d/mysql
was cited in bootstrap.sh, but it actually is:
/etc/rc.d/init.d/mysql
So the MySQL alive checks failed. I just started the services manually on the command line; with 2 nodes this is still easy.
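For the record, roughly how the manual starts looked (reconstructed from memory; paths and connect string are the ones from my configuration above):
# On 10.48.98.35 (management node, nodeid 1; correspondingly nodeid 2 on 10.50.6.150):
$ sudo /usr/sbin/ndb_mgmd --ndb-nodeid=1 -c "10.48.98.35:1186" -f /etc/config.ini --configdir=/etc/ --reload
# On each of the two machines, a data node:
$ sudo /usr/sbin/ndbd -c "10.48.98.35:1186,10.50.6.150:1186"
# And mysqld via the corrected init script path:
$ sudo /etc/rc.d/init.d/mysql start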
After about two days of configuring and debugging, I finally get:
Cluster Configuration
---------------------
[ndbd(NDB)] 2 node(s)
id=3 @10.48.98.35 (mysql-5.1.56 ndb-7.1.15, Nodegroup: 0, Master)
id=4 @10.50.6.150 (mysql-5.1.56 ndb-7.1.15, Nodegroup: 0)

[ndb_mgmd(MGM)] 2 node(s)
id=1 @10.48.98.35 (mysql-5.1.56 ndb-7.1.15)
id=2 @10.50.6.150 (mysql-5.1.56 ndb-7.1.15)

[mysqld(API)] 9 node(s)
id=5 @10.48.98.35 (mysql-5.1.56 ndb-7.1.15)
id=6 @10.48.98.35 (mysql-5.1.56 ndb-7.1.15)
id=7 @10.50.6.150 (mysql-5.1.56 ndb-7.1.15)
id=8 @10.50.6.150 (mysql-5.1.56 ndb-7.1.15)
id=9 (not connected, accepting connect from any host)
id=10 (not connected, accepting connect from any host)
id=11 (not connected, accepting connect from 10.48.98.35)
id=12 (not connected, accepting connect from 10.50.6.150)
id=13 (not connected, accepting connect from 10.48.98.35)

@s9s: Thanks a lot for these installation scripts for MySQL Cluster. Even though there are some small issues, I am sure this will get better in the future.
-
The paths to the cmon RPM files for the i386 packages are also no longer correct.
Here are the correct ones:
http://www.severalnines.com/downloads/cmon/cmon-agent-1.1.10-1.i386.rpm
http://www.severalnines.com/downloads/cmon/cmon-controller-1.1.10-1.i386.rpm
It seems that the 32-bit version is not available as 1.1.11.
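For completeness, they can be installed directly with rpm (the same invocation style as the EPEL line in my configuration above):
$ sudo rpm -Uvh http://www.severalnines.com/downloads/cmon/cmon-agent-1.1.10-1.i386.rpm
$ sudo rpm -Uvh http://www.severalnines.com/downloads/cmon/cmon-controller-1.1.10-1.i386.rpm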