Installing MySQL Cluster on two Amazon EC2 instances fails because ndb_mgmd can't be started

Comments


  • Manuel Blechschmidt

    It seems that the folder /storage/mysqlcluster is owned by root while the process runs as ec2-user. I will dig into it further.

  • Manuel Blechschmidt

    It seems I found the problem: I used the same datadir for ndb_mgmd and ndbd. I have changed this now.

  • Manuel Blechschmidt

    That did not help.

    But now I think I found the real problem: the script start-mgmd.sh opens an SSH connection to start the management server, but the connection is closed again right away, so the ndb_mgmd process is no longer running. The subsequent check whether the process is running therefore fails.
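    This is the classic pattern of a remote command dying together with its SSH session. A minimal sketch of the effect (hypothetical commands; the exact behaviour depends on the shell and sshd settings):

    $ ssh -t ec2-user@10.48.98.35 'sleep 60 &'               # session ends immediately
    $ ssh ec2-user@10.48.98.35 'pgrep -x sleep || echo gone' # often prints "gone"

    The deploy scripts force a tty with -t, so the remote side hangs up when the session ends and a freshly started background process can be killed with SIGHUP.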

  • Manuel Blechschmidt

    Here is what actually happens:

    $./deploy.sh

    .... [lots of status messages with ok (about 5 minutes of processing)]

    -----------------------------------
    *******************************************************************************
    * Creating datadirs and starting management servers *
    *******************************************************************************
    10.48.98.35: Executing 'rm -rf /storage/data/mysqlcluster/'[ok]
    10.48.98.35: Executing 'killall -9 ndb_mgmd cmon'[ok]
    10.50.6.150: Executing 'rm -rf /storage/data/mysqlcluster/'[ok]
    10.50.6.150: Executing 'killall -9 ndb_mgmd cmon'[ok]
    Test starting the management servers. If this fails you have a config problem.
    Creating /storage/data/mysqlcluster_mgm/ and /etc// for management server on 10.48.98.35
    10.48.98.35: Executing 'mkdir -p /etc//' [ok]
    10.48.98.35: Executing 'mkdir -p /storage/data/mysqlcluster_mgm/' [ok]
    10.48.98.35: Starting management server (nodeid=1)
    10.48.98.35: Copying '../config/config.ini' [ok]
    10.48.98.35: Executing 'sync' [ok]
    10.48.98.35: Executing 'mv /tmp/config.ini /etc//config.ini' [ok]
    10.48.98.35: Executing 'sync' [ok]
    10.48.98.35: Checking if '/etc//config.ini' exists [ok]
    10.48.98.35: Checking if '/usr/sbin//ndb_mgmd' exists [ok]
    10.48.98.35: Executing 'rm -rf /etc//ndb_*bin*'[ok]
    10.48.98.35: Executing '/usr/sbin//ndb_mgmd --ndb-nodeid=1 -c "10.48.98.35:1186" -f /etc//config.ini --configdir=/etc// --reload --initial' [ok]
    10.48.98.35: Checking pid existance of 'ndb_mgmd' (timeout=30): ...............................[failed]
    10.48.98.35: Failed to start management server
    10.48.98.35: Error message:
    WARNING! FAILED TO START MANAGEMENT SERVER ON 10.48.98.35
    WARNING! Bootstrap failed!! Aborting...
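    When the pid check fails like this, a first place to look is the management server's cluster log, which ndb_mgmd names after its node id and writes into its DataDir (assuming the mgm_datadir from the configuration posted below):

    $ sudo cat /storage/data/mysqlcluster_mgm/ndb_1_cluster.log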

  • Manuel Blechschmidt

    I can start the server manually from a second SSH session running at the same time on the same machine:

    $ sudo /usr/sbin//ndb_mgmd --ndb-nodeid=1  -c "10.48.98.35:1186"  -f /etc//config.ini --configdir=/etc//  --reload --initial

    This works.
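    Once it is up, the node can be checked with the ndb_mgm client that ships with MySQL Cluster:

    $ ndb_mgm -c 10.48.98.35:1186 -e show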

  • Manuel Blechschmidt

    Just for reference, here is my configuration file for the installation scripts. I am only using two machines on EC2 (small instances; I am a poor guy ;-) )

    mysql_password='*******'
    cmon_password='*******'
    cmon_monitor='10.48.98.35'
    mysql_port=3306
    port=1186
    repl_user=''
    repl_pass=''
    os_user='ec2-user'
    connectstring='10.48.98.35:1186,10.50.6.150:1186'
    ndb_binary='ndbd'
    configini=/etc//config.ini
    configdir=/etc/
    mycnf=/etc//my.cnf
    installconfigpath=/etc//
    socket='/var/lib/mysql/mysql.sock'
    slave_datadir=''
    master_datadir=''
    slave_datadir=''
    mgm_datadir='/storage/data/mysqlcluster_mgm/'
    ndbd_datadir='/storage/data/mysqlcluster/'
    backup_datadir='/storage/data/mysqlcluster//backup/'
    mysql_datadir='/storage/data/mysql/'
    rsync_bw_limit=5000
    libdir=/usr/lib/
    bindir=/usr/bin/
    libexec=/usr/sbin/
    scriptdir=/usr/bin/
    basedir=/usr/
    installdir=/usr/bin/
    installdir_cmon=/usr/bin/
    platform='i686'
    rhel=rhel5
    TOOL='yum -y install '
    LIBAIO='libaio'
    CHKCONFIG='/sbin/chkconfig '
    WEBSERVER='httpd'
    WWWROOT='/var/www/html/'
    APACHE_USER='apache'
    UPDATE_PACKAGE_MGR='yum -y update'
    INSTALL_CMON_PACKAGES='yum -y install httpd php php-mysql php-gd rrdtool.i386 -x mysql-libs'
    EPEL='rpm -Uvh http://download.fedora.redhat.com/pub/epel/5/i386/epel-release-5-4.noarch.rpm'
    IDENTITY='-t -i/home/ec2-user/ClusterSSHKey/manuelblechschmidt.pem'
    IDENTITY2='-i/home/ec2-user/ClusterSSHKey/manuelblechschmidt.pem'

  • Manuel Blechschmidt

    Here is a strace showing what the sshd daemon is doing:

    $ ssh -q -t -i/home/ec2-user/ClusterSSHKey/manuelblechschmidt.pem ec2-user@10.48.98.35 "sudo /usr/sbin//ndb_mgmd --ndb-nodeid=1  -c "10.48.98.35:1186"  -f /etc//config.ini --configdir=/etc//  --reload --initial "

    $ sudo strace -p 1139
    Process 1139 attached - interrupt to quit
    select(8, [3 4], NULL, NULL, NULL) = 1 (in [3])
    accept(3, {sa_family=AF_INET, sin_port=htons(53642), sin_addr=inet_addr("10.48.98.35")}, [16]) = 5
    fcntl64(5, F_GETFL) = 0x2 (flags O_RDWR)
    pipe([6, 7]) = 0
    socketpair(PF_FILE, SOCK_STREAM, 0, [8, 9]) = 0
    clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0xb78eb7a8) = 31668
    close(7) = 0
    write(8, "\0\0\2@\0", 5) = 5
    write(8, "\0\0\0027\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nProtocol"..., 575) = 575
    close(8) = 0
    close(9) = 0
    close(5) = 0
    select(8, [3 4 6], NULL, NULL, NULL) = 1 (in [6])
    close(6) = 0
    select(8, [3 4], NULL, NULL, NULL) = ? ERESTARTNOHAND (To be restarted)
    --- SIGCHLD (Child exited) @ 0 (0) ---
    waitpid(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 255}], WNOHANG) = 31668
    waitpid(-1, 0xbfa60df8, WNOHANG) = 0
    rt_sigaction(SIGCHLD, NULL, {0xf1fd17, [], 0}, 8) = 0
    sigreturn() = ? (mask now [])
    select(8, [3 4], NULL, NULL, NULL) = 1 (in [3])
    accept(3, {sa_family=AF_INET, sin_port=htons(53644), sin_addr=inet_addr("10.48.98.35")}, [16]) = 5
    fcntl64(5, F_GETFL) = 0x2 (flags O_RDWR)
    pipe([6, 7]) = 0
    socketpair(PF_FILE, SOCK_STREAM, 0, [8, 9]) = 0
    clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0xb78eb7a8) = 31684
    close(7) = 0
    write(8, "\0\0\2@\0", 5) = 5
    write(8, "\0\0\0027\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nProtocol"..., 575) = 575
    close(8) = 0
    close(9) = 0
    close(5) = 0
    select(8, [3 4 6], NULL, NULL, NULL) = 1 (in [6])
    close(6) = 0
    select(8, [3 4], NULL, NULL, NULL) = ? ERESTARTNOHAND (To be restarted)
    --- SIGCHLD (Child exited) @ 0 (0) ---
    waitpid(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 255}], WNOHANG) = 31684
    waitpid(-1, 0xbfa60df8, WNOHANG) = 0
    rt_sigaction(SIGCHLD, NULL, {0xf1fd17, [], 0}, 8) = 0
    sigreturn() = ? (mask now [])
    select(8, [3 4], NULL, NULL, NULL

    As you can see, sshd accepts the connection and spawns a new child for each session; the child runs the ndb_mgmd command, but directly afterwards the session child exits (WEXITSTATUS 255) and everything started under it is gone as well.
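    For a next round of debugging, strace's -f flag also follows the children that sshd forks, which would show the session child's exit directly:

    $ sudo strace -f -p 1139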

  • Johan

    Thanks Manuel for this,

    I will try to reproduce this.

  • Manuel Blechschmidt

    Adding nohup to start-mgmd.sh in the following line:

    remote_cmd ${host} "nohup $libexec/ndb_mgmd --ndb-nodeid=${nodeid}  -c \"$host:1186\"  -f $configini --configdir=/etc//  --reload --initial"

    This fixes the startup problem of ndb_mgmd. Now I have the problem that my nodes are not starting up :-)
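    If the process still dies on some setups, or ssh blocks until the remote output is closed, the usual companion to nohup is redirecting stdin/stdout/stderr; a sketch of the same line with redirections added (same script variables as above):

    remote_cmd ${host} "nohup $libexec/ndb_mgmd --ndb-nodeid=${nodeid} -c \"$host:1186\" -f $configini --configdir=/etc// --reload --initial </dev/null >/dev/null 2>&1"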

  • Manuel Blechschmidt

    Furthermore, the MySQL password should be quoted in bootstrap.sh:

    remote_cmd_nosudo ${host}  "LD_LIBRARY_PATH=$libdir:$libdir/mysql $bindir/mysql  --defaults-file=$installconfigpath/my.cnf  -uroot -p$mysql_password -e \"SET SQL_LOG_BIN=0; GRANT ALL ON *.* TO 'root'@'$host' IDENTIFIED BY '$mysql_password'\""

    =>

    remote_cmd_nosudo ${host}  "LD_LIBRARY_PATH=$libdir:$libdir/mysql $bindir/mysql  --defaults-file=$installconfigpath/my.cnf  -uroot -p'$mysql_password' -e \"SET SQL_LOG_BIN=0; GRANT ALL ON *.* TO 'root'@'$host' IDENTIFIED BY '$mysql_password'\""
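    The reason is that the remote shell expands anything in the unquoted password that looks like a variable; a small illustration (hypothetical password and host):

    $ mysql_password='se$cret'
    $ ssh host "echo -p$mysql_password"     # remote shell expands $cret: prints -pse
    $ ssh host "echo -p'$mysql_password'"   # single quotes survive: prints -pse$cret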

     

    I also think that the path to the MySQL start script was wrong:

    /etc/rc.d/mysql

    was cited in bootstrap.sh, but it actually is:

    /etc/rc.d/init.d/mysql

    So the MySQL alive checks failed. I just started the services manually on the command line; with 2 nodes this is still easy.
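    For reference, the manual start boils down to something like this (a sketch; it assumes the binaries and paths from the configuration posted above):

    # on each node: start the data node against both management servers
    $ sudo /usr/sbin/ndbd -c '10.48.98.35:1186,10.50.6.150:1186'
    # then start mysqld via the corrected init script
    $ sudo /etc/rc.d/init.d/mysql start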

    After about 2 days of configuring and debugging I finally get ...

    Cluster Configuration
    ---------------------
    [ndbd(NDB)] 2 node(s)
    id=3 @10.48.98.35 (mysql-5.1.56 ndb-7.1.15, Nodegroup: 0, Master)
    id=4 @10.50.6.150 (mysql-5.1.56 ndb-7.1.15, Nodegroup: 0)

    [ndb_mgmd(MGM)] 2 node(s)
    id=1 @10.48.98.35 (mysql-5.1.56 ndb-7.1.15)
    id=2 @10.50.6.150 (mysql-5.1.56 ndb-7.1.15)

    [mysqld(API)] 9 node(s)
    id=5 @10.48.98.35 (mysql-5.1.56 ndb-7.1.15)
    id=6 @10.48.98.35 (mysql-5.1.56 ndb-7.1.15)
    id=7 @10.50.6.150 (mysql-5.1.56 ndb-7.1.15)
    id=8 @10.50.6.150 (mysql-5.1.56 ndb-7.1.15)
    id=9 (not connected, accepting connect from any host)
    id=10 (not connected, accepting connect from any host)
    id=11 (not connected, accepting connect from 10.48.98.35)
    id=12 (not connected, accepting connect from 10.50.6.150)
    id=13 (not connected, accepting connect from 10.48.98.35)

    @s9s: Thanks a lot for these installation scripts for MySQL Cluster. Even though there are still some small issues, I am sure this will get better in the future.

  • Manuel Blechschmidt

    Sometimes the mysql_password appeared in plaintext. My password contains a $, which also screws up the installation process (see the quoting example above) :-)

  • Johan

    Hi again,

    Thanks for this. I will give it a stab.

    What AMI did you use?

    BR
    johan

  • Manuel Blechschmidt

    The paths to the cmon RPM files for the i386 packages are also no longer correct.

    Here are the correct ones:

    http://www.severalnines.com/downloads/cmon/cmon-agent-1.1.10-1.i386.rpm

    http://www.severalnines.com/downloads/cmon/cmon-controller-1.1.10-1.i386.rpm
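    They can be installed directly with rpm, following the same pattern the scripts already use for the EPEL package (adjust the version if a newer i386 build appears):

    $ sudo rpm -Uvh http://www.severalnines.com/downloads/cmon/cmon-agent-1.1.10-1.i386.rpm
    $ sudo rpm -Uvh http://www.severalnines.com/downloads/cmon/cmon-controller-1.1.10-1.i386.rpm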

    It seems that the 32-bit version is not available as 1.1.11.

  • Manuel Blechschmidt

    32-bit: ami-973b06e3

  • Manuel Blechschmidt

    Yeah, cmon is also working. It seems that I am only allowed to see one node, though.

  • Severalnines Support

    Hi there. Actually, what you are seeing is the dashboard, which shows all your clusters (in your case, you have 1 cluster).

    If you click on 'View Cluster', you will be able to see all your nodes. 

    Vinay

