Why do I get: (ERROR) cmon_coredir not set, set it in /etc/cmon.cnf

Comments

14 comments

  • Johan

    Hi Clint,

    It should be:

    cmon_core_dir=/root/clustercontrol

    (take out the cmon_coredir)
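
    A quick way to double-check that the file you edited actually has the right key name:

      grep core_dir /etc/cmon.cnf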

    Are you installing on an existing cluster or on a cluster deployed with the Severalnines Deployment package from the Configurator?

    Thank you,

    Johan

  • Clint Alexander

     

    The "cmon_coredir" is in the message printed by cmon; the config file has "cmon_core_dir"... I just wanted to be sure.

     

    This is an existing cluster setup previously without cmon.

  • Clint Alexander

    FYI --

    Server: RedHat Enterprise / CentOS 5.8
    Cluster Network:
    - 4 data nodes with 2 replicas
    - 2 api/mysql nodes
    - 1 management node
    - The cmon database and controller are on the management node.

    - On the management node, while ndb_mgmd is running, the local database is not connected to the cluster in any way.

    Local Paths:
      /usr/local/cmon/ --- the source (tar.gz)
      /var/www/cluster.management/ --- the cmon website/files

    I followed the instructions given here:
    http://support.severalnines.com/entries/20613923-installation-on-an-existing-cluster

    I was using "service cmon start [restart|stop|status]" to manage the process.

    -----------------------------------------------------------
    I did try the RPM a few times, but I like using src files because I usually can find whatever I'm missing and put things together by hand if necessary.

    After re-copying /usr/local/cmon/etc/init.d/cmon to /etc/init.d/cmon (which I've done many times already), the cmon process for some reason began to run. I really don't know why.

    -----------------------------------------------------------
    There are some new errors, but I'm assuming this is due to a problem I'm now having on an agent (mentioned below). The errors are:
    Aug 06 00:35:40 : (ERROR) Failed to init - check you have set ssh user, staging dir, and bindir in the Cluster Settings and Global Settings
    Aug 06 00:35:40 : (WARNING) Workflow failed to init - proceeding but with limited functionality.
    Aug 06 00:35:40 : (ERROR) MYSQL BINDIR not set - set in 'Cluster Setup'
    Aug 06 00:35:40 : (WARNING) Could not init Galera sub

    ... as before, these values are defined in the cmon.cnf.
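
    To rule out a typo or a stale copy of the config, I grep the config on the controller (just a sanity check, nothing cmon-specific about it):

      grep -E 'bindir|basedir|staging|ssh' /etc/cmon.cnf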

    -----------------------------------------------------------
    In my search for a resolution, I looked into the bin/cmon_rrdcreate code, and I don't have a cmon_rrd file or anything related to rrdtool in /etc/*.

    As expected, I don't see any graphs being made.


    -----------------------------------------------------------
    ON MySQL API - AGENT

    On an agent, which is an API node on the cluster network, startup fails with:

    Aug 06 01:55:59 : (INFO) IPv4 address: 192.168.1.2 (cluster02)
    Aug 06 01:55:59 : (INFO) Setting up threads to handle monitoring
    Aug 06 01:55:59 : (INFO) Starting MySQLCollectorThread
    Aug 06 01:55:59 : (INFO) Starting host collector
    Aug 06 01:55:59 : (INFO) Starting Process Manager Thread
    Aug 06 01:55:59 : (INFO) Starting Bencher Thread
    Aug 06 01:55:59 : (INFO) Connected to MySQL Server @ 127.0.0.1
    Aug 06 01:55:59 : (INFO) Hostname=cluster02, ip=192.168.1.2, skip_name_resolve=0
    Aug 06 01:55:59 : (INFO) Connected to local mysql server!
    Aug 06 01:55:59 : (ERROR) Critical error (mysql error code -1) occured - shutting down

    MySQL error log on the Controller has:
    Error: Got an error reading communication packet

    This is usually a network error, but the internal network is working just fine.
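
    If it turns out to be a privileges problem rather than a network one, I assume the fix is re-issuing the grant for the monitoring user from the agent's address on the controller's mysql, something like this (the user name, password, and privilege level below are placeholders, not necessarily what cmon actually requires):

      mysql -u root -p -e "GRANT ALL PRIVILEGES ON *.* TO 'cmon'@'192.168.1.2' IDENTIFIED BY 'secret'; FLUSH PRIVILEGES;"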

    I've used both the RPM and the source in an attempt to fix this. I also used "cmon_install_agent.sh" (or whatever the name was). It actually made more work on the cmon.cnf side, since I had to change default values that were not asked about during installation.


    Other smaller issues I've run into:

    When using 127.0.0.1, it fails if you don't have a /tmp/mysql.sock. There is no option to change the mysql.sock path. I have to use hostnames instead to get the controller to run.
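
    The other workaround I can think of is simply symlinking the real socket into place (the path below is the stock CentOS location; adjust to wherever mysqld actually puts it):

      ln -s /var/lib/mysql/mysql.sock /tmp/mysql.sock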

    The cron scripts are not mentioned in the online installation guide:
    http://support.severalnines.com/entries/20613923-installation-on-an-existing-cluster
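
    For anyone else following along, something like this presumably belongs in root's crontab on the controller; the schedule and arguments are my guess, based on how cmon_rrd_all is invoked further down in this thread:

      */5 * * * * /usr/local/cmon/bin/cmon_rrd_all /usr/local/cmon/bin > /dev/null 2>&1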

    The pid file /var/run/cmon* and the lock file /var/lock/cmon* are not removed on failures.
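
    Which means a failed start currently needs a manual cleanup before the next attempt, e.g.:

      rm -f /var/run/cmon* /var/lock/cmon*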


    -----------------------------------------------------------
    I don't see anyone else having these problems, which concerns me, as if I did something unusual. However, I do not have an unusual setup, just a very basic layout inside a normal internal network. So I'm not sure what else it could be...

  • Clint Alexander

    Additional small issue -- the RPM init.d script does not have a "status" option, but the one in the src tar.gz does.

  • Clint Alexander

    Additional small issue - the rrd errors do not reflect the actual file names, etc. (much like the cmon_coredir error before):

    [root@clustermgmt ~]# /usr/local/cmon/bin/cmon_rrd_all /usr/local/cmon/bin
    /usr/local/cmon/bin
    usage: ./create_rrd <path to rrd.conf>

    Why does it say "./create_rrd"? Isn't it supposed to read $0?

    The file that comes close to this is cmon_create_rrd.
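
    In other words, I'd expect the usage message to be built from $0, roughly like this (a sketch of what I mean, not the actual script):

      #!/bin/sh
      # print usage based on the script's own name instead of a hard-coded ./create_rrd
      if [ $# -lt 1 ]; then
          echo "usage: $0 <path to rrd.conf>"
          exit 1
      fi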

  • Johan

    Hi Clint,

    I am looking at this.

    Thanks for your comments about cmon_coredir vs cmon_core_dir.

    On the controller, did you try to set (if using RPMs):

    mysql_basedir=/usr/

    or, if using a mysql-cluster tarball install, e.g.:

    mysql_basedir=/usr/local/mysql
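
    If you are not sure which of the two applies to your install, the server itself will tell you:

      mysql -u root -p -e "SELECT @@basedir;"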

    I saw this was missing in the install instructions here: http://support.severalnines.com/entries/20613923-installation-on-an-existing-cluster

    BR
    Johan

  • Clint Alexander

    Yes, I did. I made sure every single key/value was set correctly.

    I went ahead and converted all ndbcluster tables to MyISAM and rebuilt the cluster through the automatic script.
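
    The conversion itself is just the usual per-table statement, for example (database and table names here are placeholders):

      mysql -u root -p -e "ALTER TABLE mydb.mytable ENGINE=MyISAM;"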

    I had to run the script a few times because it didn't expect to see another cmon installation.

     

    Also, I had to reinstall php53 because the auto-install script installs PHP 5.1 on CentOS. It would probably be best to simply provide links for the admin to download and install httpd and php themselves. In fact, the instructions should probably link to the generic download page, so you don't have to update it every version, and let the sysadmin do the actual selection and download.

    The auto-script came out successful, but I had to use a different mysql for the api for fear it would destroy the api installation. I was right... it removed mysql, if found, prior to reinstalling it. This caused a few issues that I had to fix as well.

    I think I'll put together an errata and send it over to you. I like the cluster management tool; it saves me from writing my own (which is what I do on my network). Unfortunately, we'll never be able to authorize the license purchase, as this is a gov. project and getting them to spend anything is almost impossible. I say this because I went to remove that temporary api and got the message about needing a key... I removed it from the config files, but I can deal with it if it doesn't remove it completely...

  • Clint Alexander

    I just noticed (or was informed, actually) that the management server's mysql (where the cmon db is located) has 137 processes in the processlist. It prevented other things from running because max_connections was reached.

    Is cmon not closing its connections or something, to create so many connections like this?
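
    A quick way to see where they come from, grouped per user:

      mysql -u root -p -e "SELECT USER, COUNT(*) FROM information_schema.PROCESSLIST GROUP BY USER;"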

  • Johan

    Can you attach your /var/log/cmon.log here? 

  • Clint Alexander

    attached.

    FYI...

    192.168.1.8 | server08 = cluster manager & cmon controller

    192.168.1.4 | server04 = cluster data node

    192.168.1.5 | server05 = cluster data node

    192.168.1.6 | server06 = cluster data node

    192.168.1.7 | server07 = cluster data node

    192.168.1.2 | server02 = cluster api node

    192.168.1.3 | server03 = cluster api node

    192.168.1.17 | server17 = cluster api node (temporary)

    server02 and server03 were not installed with the configurator script (as I said before, I did not want to risk the current production databases).

    I attempted to add api nodes server02 and server03 by adding them in server08:/etc/cmon.cnf, installing the agent RPM, updating its cnf file, and finally starting cmon on those servers.

    I also noticed that on server02 cmon is running several processes at one time. I can't figure that one out, but shutting it down stopped maxing out the connections.
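
    The duplicates are easy to see with a plain process listing on server02:

      ps -ef | grep '[c]mon'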

  • Johan

    Clint,

    I have fixed the issues you have seen here. We will build a new version; we have also fixed a lot in the cmon_install_controller.sh script, which lets you set up the correct cmon.cnf files and also shows you what grants need to be granted.

    I will keep you posted here when you can try the new version, and it would be great if you could provide feedback on that part.

    Best regards,

  • Clint Alexander

    No problem. I'll give it a thorough review.

     

    PS - I sent an email to 'sales@' asking about some info but haven't received any response. Can you check on that for me?

  • Johan

    Hi Clint, I just checked our CRM system and could not see any request from you there. Can you resend it to sales@severalnines.com, in case there was a typo? Thanks.

    Johan

  • Clint Alexander

    .... sent.

