RRDTool-Graphs are broken!

Comments

30 comments

  • Johan

    Hi,

    About the first problem 

    test -x  $CMON_BINDIR/cmon_rrd_all && $CMON_BINDIR/cmon_rrd_all $CMON_BINDIR > /dev/null
    ERROR: /usr/local/cmon/data/cluster_1_mysql_localhost|3306_stats.rrd: expected 408 data source readings (got 2) from N

    What version of CMON are you using?

    The second one looks crazy. What OS (and version) have you installed on your VM?

    What rrdtool version are you using?

    rrdtool --version   

    BR

    johan

  • Chris B.

    Hi Johan,

    thanks for your fast answer!

    I downloaded this package, "cmon-1.1.16-64bit-glibc23-mc70.tar", from your site.

    The other details:

    OS:

    SUSE Linux Enterprise Server 11 (x86_64) PATCHLEVEL = 1

    RRDtool:

     rrdtool --version
    RRDtool 1.3.4  Copyright 1997-2008 by Tobias Oetiker <tobi@oetiker.ch>
                   Compiled Feb 23 2009 19:49:14

    Usage: rrdtool [options] command command_options

    Valid commands: create, update, updatev, graph, graphv,  dump, restore,
                    last, lastupdate, first, info, fetch, tune,
                    resize, xport
    RRDtool is distributed under the Terms of the GNU General
    Public License Version 2. (www.gnu.org/copyleft/gpl.html)
    For more information read the RRD manpages

     

    Sorry, that is not the newest, I think!

    BR

  • Johan

    Hi,

    We have not tested on SUSE that much. I would advise the following:

    0) Upgrade to a newer version of rrdtool (it works with rrdtool 1.4.3). There may also be a font package missing.

    1) Download cmon 1.1.20:

     wget -O cmon-1.1.20-64bit-glibc23-mc70.tar.gz http://www.severalnines.com/downloads/cmon/cmon-1.1.20-64bit-glibc23-mc70.tar.gz

    Just unpack it as you did with 1.1.16 and, if needed, change the symlink /usr/local/cmon so that it points to the new installation.
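
    A minimal sketch of the unpack-and-repoint step, assuming the tarball unpacks into a directory named cmon-1.1.20-64bit-glibc23-mc70 and that /usr/local is the install location. The helper name repoint_symlink is hypothetical and not part of cmon.

```shell
#!/bin/sh
# Sketch: repoint a symlink at a newly unpacked installation.
# All paths below are assumptions for illustration; adjust to your layout.

# repoint_symlink TARGET LINK: make LINK point at TARGET,
# replacing an existing symlink of the same name.
repoint_symlink() {
    target=$1
    link=$2
    if [ ! -d "$target" ]; then
        echo "target directory $target does not exist" >&2
        return 1
    fi
    # -n: do not follow an existing symlink to a directory; -f: replace it.
    ln -sfn "$target" "$link"
}

# Example usage (commented out, paths are assumed):
# tar xzf cmon-1.1.20-64bit-glibc23-mc70.tar.gz -C /usr/local
# repoint_symlink /usr/local/cmon-1.1.20-64bit-glibc23-mc70 /usr/local/cmon
```

    If /usr/local/cmon is a real directory rather than a symlink, ln would create the link inside it, so check with "ls -ld /usr/local/cmon" first.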

    2) remove old rrd graphs

    sudo rm -rf  /var/lib/cmon/*

    3) upgrade the database schema

    Either drop the old cmon db and recreate it with the cmon_db.sql and cmon_data.sql files,
    or
    apply the following in this order (the .sql files are in cmon-1.1.20-64bit-glibc23-mc70/sql):

    mysql -ucmon -p  cmon -h127.0.0.1  < cmon_db_mods-1.1.16-1.1.17.sql
    mysql -ucmon -p  cmon < cmon_db_mods-1.1.17-1.1.19.sql

    # the below will recreate triggers/events/sprocs so you get up to the latest;
    mysql -ucmon -p  cmon < cmon_db.sql
    mysql -ucmon -p  cmon < cmon_data.sql

    4) Use the new version of cmon_rrd_all

    and run it, for example:

    /usr/local/cmon/bin/cmon_rrd_all

    What  happens?

     

    BR

    johan

  • Chris B.

    Hi,

    sorry for the delay!

    It is not really easy to update rrdtool in my environment, because the original SLES repo provides no updated version.

    However, I followed your steps 1-4 (step 3 by applying the patches), but the graph texts are still corrupted.

    Here is the output of cmon_rrd_all:

     

    ./cmon_rrd_all
    /usr/local/cmon/cmon/bin
    Creating RRDtool DB for cluster statistics
    Done!
    Creating RRDtool DB for cluster mem usage
    Done!
    Creating RRDtool DB for cluster mem usage
    Done!
    Creating RRDtool DB for cluster tablespace usage
    Done!
    Creating RRDtool DB for mysql statistics
    Done!
    Creating RRDtool DB for node statistics
    Done!
    Creating RRDtool DB for node mem usage
    Done!
    Creating RRDtool DB for node mem usage
    Done!
    Creating RRDtool DB for node statistics
    Done!
    Creating RRDtool DB for node mem usage
    Done!
    Creating RRDtool DB for node mem usage
    Done!
    Creating RRDtool DB for mysql statistics
    Creating RRDtool DB for mysql statistics
    Done!
    Creating RRDtool DB for mysql statistics
    Creating RRDtool DB for mysql statistics
    Done!
    ERROR: /usr/local/cmon/data/cluster_1_mysql_localhost|3306_stats.rrd: expected 408 data source readings (got 2) from N

    ;-(

     

    Do you have any idea how I can find out (debug) which fonts rrdtool is missing?!

    Thx!

  • Johan

    Hi,  

    strange...

    Can you try to compile rrdtool?

    wget http://oss.oetiker.ch/rrdtool/pub/rrdtool-1.4.7.tar.gz

    tar xvfz rrdtool-1.4.7.tar.gz
    # the below is for CentOS, so package names may be different on SUSE

    yum -y install libxml++-devel gcc make autoconf automake libtool gcc-c++ pango-devel perl-ExtUtils-MakeMaker

    cd rrdtool-1.4.7

    ./configure

    make

    sudo make install

    cp /opt/rrdtool-1.4.7/bin/rrdtool  /usr/bin/rrdtool

    Try again:

    ./cmon_rrd_all

    What happens?

  • Chris B.

    Hi,

    yes, really strange...

    When I try to compile the latest version of rrdtool, it requires many new dependencies like "glib" and "zlib", so it is really difficult to compile :(

    Nevertheless, I found an updated version of rrdtool (<4.0.0) as an rpm package, which I have installed successfully!

    Now my rrdtool is on version 1.3.7 (before 1.3.5), but I get the same error (see above).

     

    BR

  • Chris B.

    Hi Johan,

    today I spent a lot of time debugging this graph problem, and I have solved it now.

    What I have done:

    We know that the problem lies in the creation of the graphs. Therefore I checked the "cmon_create_graphs" script and picked one of the graph-creation functions.

    In this case "function create_graph_cluster_stats()" in line 62. Then I saw that the execution of rrdtool (line 92) wasn't logged: its output went to /dev/null (line 146).

    So I changed the path to e.g. /tmp/rrdtool.log and executed "cmon_rrd_all" once again.
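
    A minimal sketch of that kind of change, assuming a small wrapper is acceptable instead of editing the redirect in place. The helper name run_logged and the LOGFILE variable are hypothetical; the real rrdtool arguments live in cmon_create_graphs and are not reproduced here.

```shell
#!/bin/sh
# Sketch: send a command's output to a log file instead of /dev/null,
# so that warnings (for example from Pango) become visible.
# LOGFILE and run_logged are hypothetical names.

LOGFILE=${LOGFILE:-/tmp/rrdtool.log}

# run_logged CMD [ARGS...]: run CMD, append its stdout and stderr to
# $LOGFILE, and return CMD's exit status.
run_logged() {
    "$@" >>"$LOGFILE" 2>&1
}

# Example (arguments elided on purpose):
# run_logged rrdtool graph ...
```

    Appending with >> keeps the output of earlier runs, which helps when the script is started from cron.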

    The following output was logged in rrdtool.log, which led to the solution:


     tail -f /tmp/rrdtool.log
    PangoFc will not work correctly.
    This probably means there was an error in the creation of:
      '/etc/pango/pango64.modules'
    You should create this file by running:
      pango-querymodules > '/etc/pango/pango64.modules'
    (process:17620): Pango-WARNING **: failed to choose a font, expect ugly output. engine-type='PangoRenderFc', script='common'
    (process:17620): Pango-WARNING **: failed to choose a font, expect ugly output. engine-type='PangoRenderFc', script='latin'
    345x356



    The mentioned command "pango-querymodules" was not available on my system because it is 64-bit, so I executed the following instead:

    pango-querymodules-64 > '/etc/pango/pango64.modules'
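
    A small sketch for systems where it is unclear which of the two binaries is installed. The two candidate names and the output path come from the log and the command above; the helper first_available is hypothetical.

```shell
#!/bin/sh
# Sketch: pick whichever pango-querymodules variant exists on this system.
# first_available is a hypothetical helper name.

# first_available NAME...: print the first NAME that is found on PATH,
# or fail if none of them is.
first_available() {
    for name in "$@"; do
        if command -v "$name" >/dev/null 2>&1; then
            echo "$name"
            return 0
        fi
    done
    return 1
}

# Example (needs root, so it is left commented out):
# qm=$(first_available pango-querymodules-64 pango-querymodules) &&
#     "$qm" > /etc/pango/pango64.modules
```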

     

    Now I can execute "cmon_rrd_all" and the graphs are fine!!!

    Unfortunately the MySQL error (mentioned before) still appears! :(

     

    Have a nice weekend!

    BR


  • Johan

    Hi Chris, 

    excellent!  

    Did you upgrade to the 1.1.20?

    BR

    johan

  • Chris B.

    Yes, cmon is running on 1.1.20!

    I followed the second way which you described (manual update with the SQL patches, without dropping the db).

    Should I try the first way (with recreating the db) to fix the MySQL problem anyway?!

    I'll continue with the next steps on Monday!
    Nice weekend  :)

    BR
    Chris

  • Chris B.

    Hi Johan,

    do you have any new idea how to fix this MySQL issue?!

    BR

  • Chris B.

    Okay, I think I have solved the MySQL problem myself!

     

    The Situation:

    Every time the cron job started, it executed the "cmon_rrd_all" script, which only executes "cmon_create_rrd", "cmon_update_rrd" and "cmon_create_graphs" (in this order).
    When "cmon_update_rrd" was executed, it threw the following error:
    "ERROR: /usr/local/cmon/data/cluster_1_mysql_localhost|3306_stats.rrd: expected 408 data source readings (got 2) from N"

     

    Debugging:

    I searched the "cmon_update_rrd" script for "cluster_1_mysql_localhost|3306_stats.rrd" and found it in the part beginning at line 201.
    In this part the script does the following:
    1. Checks how many MySQL instances are available (SELECT on the "mysql.servers" table in the cmon database)
    2. Collects all MySQL statistics from every server found in the "mysql.servers" table
    3. Executes rrdtool to update the RRD file with the new statistics

    The last step throws the error mentioned above, so the problem has to lie in the collection of the statistics data or in the server table.
    Therefore I added an "echo $mysqlservers" after line 204 and an "echo test $data" after line 214.
    After re-running the update script, it printed this:
     |3306 localhost|3306
    test 0:0:2:3:4:5:6............and so on....
    test NULL

    ==> Okay, this shows me that there are two servers listed in the "mysql.servers" table:
    First:  |3306 (without server name)
    Second:  localhost|3306 (WITH server name)

    So the error is that the SELECT statement from step 2 only returns data for the first entry in this table.

    Workaround:
    I have deleted the second server entry and the script is now working!
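
    As an alternative to deleting the row, the update step could skip a server whose collected data is unusable. The following is only a sketch under assumptions: the helper names are hypothetical, the data is assumed to be the colon-separated string shown by the "echo test $data" output above, and the expected count would be the number of data sources in the target .rrd file. How rrdtool itself counts readings in its error message is not asserted here.

```shell
#!/bin/sh
# Sketch: sanity-check a colon-separated value string before handing it
# to "rrdtool update". count_values and values_ok are hypothetical helpers.

# count_values DATA: print how many colon-separated fields DATA contains.
count_values() {
    if [ -z "$1" ]; then
        echo 0
        return
    fi
    # Translate ":" to newlines and count the resulting lines.
    printf '%s\n' "$1" | tr ':' '\n' | wc -l | tr -d ' '
}

# values_ok DATA EXPECTED: succeed only if DATA has exactly EXPECTED
# fields and is not the literal string NULL.
values_ok() {
    [ "$1" != "NULL" ] && [ "$(count_values "$1")" -eq "$2" ]
}

# Example (commented out; $rrdfile, $data and the count are placeholders):
# values_ok "$data" 408 && rrdtool update "$rrdfile" "N:$data"
```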

     


    I don't know why there were two entries with the same port in this table, although I have installed two MySQL APIs (with different ports!!! 3306+3307) locally in my virtual machine (cmon mode=dual --> controller+agent). Is it possible that dual mode doesn't support checking two local MySQL APIs? Or how can I add monitoring for the second MySQL API?!

    Nice weekend!


    BR
    Chris

  • Johan

    Great you solved this.

    This has to do with problems resolving the hostname.

    We have fixed this in 1.1.19 and onwards. There is a new parameter called 'hostname' in the cmon.cnf files.

    The 'hostname' should be set to:

    hostname=<the ip of the host the agent is running on OR the hostname, but hostname must not resolve to 127.0.0.1>

    How do you know whether the hostname resolves to 127.0.0.1?

    # hostname -i 

    will tell.

    Previously, we did a gethostbyname or gethostbyaddr, and that worked poorly due to the wide variety in how users set up their /etc/hosts files or DNS resolution.
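
    The check can be scripted. This is a minimal sketch: is_loopback is a hypothetical helper, and treating the whole 127.x range (and ::1) as loopback is an assumption that goes slightly beyond the single address named above.

```shell
#!/bin/sh
# Sketch: decide whether any of the given addresses is a loopback address.
# is_loopback is a hypothetical helper name.

# is_loopback ADDR...: succeed if any ADDR starts with "127." or is "::1".
is_loopback() {
    for addr in "$@"; do
        case "$addr" in
            127.*|::1) return 0 ;;
        esac
    done
    return 1
}

# Example: "hostname -i" may print several addresses, so pass them unquoted.
# if is_loopback $(hostname -i); then
#     echo "set hostname= to the host's real IP in cmon.cnf" >&2
# fi
```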

    I will put your own comment as the answer.

    Thank you!

    Johan

  • Chris

    Hi Johan

     

    I've begun testing ClusterControl™ for MySQL Galera just today. So far it's good and sync works well, but I have a problem with cmon_rrd_all: it just gets stuck and consumes CPU. I think this topic is related to it. I've also upgraded cmon to 1.1.21 (my system is Debian, so the TAR file was used: untar, new link, DB upgrade). The problem is still there.

     

    Any ideas? Manual call of cmon_rrd_all shows:

    root@debian-cc:/usr/local# cmon_rrd_all
    /usr/bin/
    ERROR: /var/lib/cmon//cluster_1_stats.rrd: expected 9 data source readings (got 1) from N

    ... and then it hangs (45% of CPU is taken).

     

    Chriss

    BTW Great tool! Good job :)

  • Chris

    quick update:

     

    I've added a few echos to cmon_rrd_all, and it gets stuck in cmon_create_graphs. As this script is not small, I have no idea how to debug it.

  • Johan

    Hi Krzysztof,

    Thank you. 

    It would be great if you could check the CPU while it runs cmon_create_graphs. Can you do

    ps -ef | grep rrd

    and send what you get?

    It would be interesting to see exactly what it stumbles on.

    When you say it gets stuck at 45%, does it go down to 0 after a while, or is it stuck indefinitely?

    Best regards,

    Johan

  • Chris

    Hi


    I've put a lot of echos into the cmon_rrd_log script to trace what is going on, and I see that create_graph_cluster_mysql_stats_counter is called very many times. There are lots of "counters", and each single call of create_graph_cluster_mysql_stats_counter takes approx. 2 sec! Total execution time of cmon_rrd_all on my virtual machine (XEN, 2 VCPU E5420, 1 GB) is over 7 minutes.


    So when cron calls this script every 5 min, after 24 h the load of the machine is... wait for it... 54! Some simple lock mechanism is required, so that the script is not started if the last call is still running. Is the 5 min period in cron important? Can I change it to 10 min?


    Is that state OK? Maybe some tuning needs to be done (indexes, table refactoring etc.).

     

    Chris

  • Johan

    Hi,

    rrdtool is quite slow. The 5 min period is important for feeding the RRD database (it assumes data every 5 mins), but the graphs can be generated less often.

    Attached you can find a new /usr/bin/cmon_rrd_all  and a new cron job script that runs the graphs less often.

    Put cmon_rrd_all  in /usr/bin/

    and the cmon file in:

    /etc/cron.d/cmon

    Let me know how it works with that.
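
    The attached files are not reproduced in this thread. As a rough sketch of the idea only (not the attached script), a wrapper could feed the RRD files on every run and draw graphs only on some runs. GRAPH_INTERVAL and graphs_due are hypothetical names; the three cmon_* script names are the ones cmon_rrd_all was reported to call earlier in the thread.

```shell
#!/bin/sh
# Sketch: update RRD data on every cron run, draw graphs less often.
# GRAPH_INTERVAL and graphs_due are hypothetical.

GRAPH_INTERVAL=${GRAPH_INTERVAL:-30}   # minutes between graph runs (assumed)

# graphs_due MINUTE INTERVAL: succeed when MINUTE is a multiple of INTERVAL.
graphs_due() {
    minute=${1#0}           # drop a leading zero so "08" is not read as octal
    minute=${minute:-0}     # "0" becomes empty after stripping, restore it
    [ $(( minute % $2 )) -eq 0 ]
}

# Example wiring (commented out):
# cmon_create_rrd
# cmon_update_rrd
# if graphs_due "$(date +%M)" "$GRAPH_INTERVAL"; then
#     cmon_create_graphs
# fi
```

    This relies on cron starting the job on the expected minute; a run that starts late would skip its graph pass until the next matching minute.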

    A lock file would be the next step.
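
    A minimal sketch of such a lock, assuming a lock directory under /tmp is acceptable. LOCKDIR and the helper names are hypothetical. A directory is used because mkdir either creates it or fails in one step, so two runs cannot both acquire it.

```shell
#!/bin/sh
# Sketch: skip a cron run while the previous one is still going.
# LOCKDIR, acquire_lock and release_lock are hypothetical names.

LOCKDIR=${LOCKDIR:-/tmp/cmon_rrd_all.lock}

# acquire_lock: succeed if we created the lock directory, fail if it exists.
acquire_lock() {
    mkdir "$LOCKDIR" 2>/dev/null
}

# release_lock: remove the lock directory.
release_lock() {
    rmdir "$LOCKDIR" 2>/dev/null
}

# Example wiring (commented out):
# if acquire_lock; then
#     trap release_lock EXIT
#     /usr/bin/cmon_rrd_all
# else
#     echo "previous cmon_rrd_all run still active, skipping" >&2
# fi
```

    If a run is killed hard, the lock directory stays behind and has to be removed by hand before the next run can start.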

    BR
    johan

  • Chris

    Hi

    Johan, this is it. Now the load is OK, no overlapping runs, only disk space is blowing up. What are the recommended minimums for RAM and disk?

    Chris  

  • Laurent

    Hi Johan,

    Sorry but I'm not properly understanding what you mean :

    The 'hostname' should be set to:
    > hostname=<the ip of the host the agent is running on OR the hostname, but hostname must not resolve to 127.0.0.1>

    Could you please confirm if I have :
    node1 = 192.168.0.1
    node2 = 192.168.0.110
    node3 = 192.168.0.120
    ClusterControl Server with mysqld server = 192.168.0.130

    Which IP address should be filled in for the hostname= entry in each node's cmon.cnf, please?

    Thks.
    BR,

    Laurent 

  • Johan

    Hi Laurent,

    On node1's cmon.cnf you should set:
    hostname=192.168.0.1

    On node2's cmon.cnf you should set:
    hostname=192.168.0.110 

    On node3's cmon.cnf you should set:
    hostname=192.168.0.120

    And on clustercontrol server's cmon.cnf file:
    hostname=192.168.0.130

    This is just to avoid potential problems with hostname resolution.

    I hope this helps.

    Best regards,

    Johan

  • Laurent

    Ok, thks a lot for your quick answer and time ! :)

    I was asking because I currently have a problem with my ClusterControl for Galera test environment: it does not properly display the status of two of my 3 available nodes (mysqld is running properly on the nodes, but ClusterControl reports it as not running). A screenshot is attached. I was wondering whether this parameter was the culprit, but it seems not...
    Do you know how the mysqld status is checked internally against all nodes, please?

    Thks,
    BR.

    Laurent 

  • Johan

    What is written in /var/log/cmon.log on 192.168.0.110 and 192.168.0.130?

    It looks like there is a privilege problem.

    Thank you,

    Johan

     

     

  • Laurent

    Johan,

    By the way, I also did a test with the cmon_rrd_all script, and the following entries are reported each time it is executed:

    ERROR: /var/lib/cmon//cluster_1_stats.rrd: expected 9 data source readings (got 1) from N
    ERROR: /var/lib/cmon//cluster_1_mysql_cygnus|3306_stats.rrd: expected 363 data source readings (got 340) from N
    ERROR: /var/lib/cmon//cluster_1_mysql_vmdebian1|3306_stats.rrd: expected 363 data source readings (got 348) from N
    ERROR: /var/lib/cmon//cluster_1_mysql_vmdebian2|3306_stats.rrd: expected 363 data source readings (got 348) from N

    Maybe it is related to the fact that some graphs cannot be viewed and show the message 'Graph requires rrdtool' (see attached screenshot), even though I already have rrdtool installed:

    root@vmdebian3:/var/log# which rrdtool
    /usr/bin/rrdtool
    root@vmdebian3:/var/log# /usr/bin/rrdtool -v
    RRDtool 1.4.3 Copyright 1997-2009 by Tobias Oetiker <tobi@oetiker.ch>
    Compiled Mar 24 2010 16:17:45

     Thanks.
    BR,

    Laurent 

  • Laurent

    Johan,

    > What is written in /var/log/cmon.log on 192.168.0.110  and 192.168.0.130 
    > It looks like there is a privilege problem.

    Nothing really interesting in my opinion, except another problem related to a timeout when sending alert mails, but that will be another subject in another thread I think lol :)
    I attached the two log files to this comment. Please look at the end of the files, as there were some misconfigurations at the beginning of the logfiles which are solved by now.

    Thks.
    BR,

    Laurent 

  • Chris B.

    Hi Laurent,

    @Graphs: Maybe it is the same problem as I had (see above). Try deleting the other (not the first!) server entries in the table "mysql_servers" in the "cmon" database (you can export the table first if you want), and after that take a look at the graphs and the logs. If the graphs are okay, you can try to manually add the mysqld's in the cmon frontend.

    BR

    Chris

  • Laurent

    Hi Chris,

    Thks for your answer.
    I read your previous post, but when checking the 'mysql_server' table entries, I saw the three nodes composing my Galera cluster, so it seems OK to me? (see attached screenshot)
    By the way, I can try to mysqldump the table and then delete the entries in there. What should I do after deleting the entries: add the MySQL servers again in the ClusterControl web frontend?

    Thks.
    BR,

    Laurent 

  • Chris B.

    Yes, test it in that order please:

    1. Make a dump of the "mysql_servers" table (backup)

    2. Delete the second and the third entry in the "mysql_servers"-table.

    3. Restart cmon

    4. Take a look at your graphs (maybe it will take a few moments). Are they okay now?


    If the graphs are okay, you can try to add the servers manually in the frontend.
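
    For step 1, a minimal sketch of the backup, assuming the database and user are both named "cmon" and the table is called "mysql_servers" as written above (check the actual table name in your schema first). backup_path and the output directory are hypothetical.

```shell
#!/bin/sh
# Sketch: back up one table before deleting rows from it.
# backup_path and the /root output directory are hypothetical.

# backup_path DIR TABLE: print a timestamped file name for the dump.
backup_path() {
    echo "$1/$2-$(date +%Y%m%d-%H%M%S).sql"
}

# Example (prompts for the password, so it is left commented out):
# dump=$(backup_path /root mysql_servers)
# mysqldump -ucmon -p cmon mysql_servers > "$dump"
# To restore later:
# mysql -ucmon -p cmon < "$dump"
```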

    Good luck!

  • Johan

    @chris - many thanks for your input here.

    @laurent - if the problem persists, perhaps we can set up a remote session and see why. Please contact me at johan@severalnines.com

  • Laurent

    Hi,

    I finally tried dumping the mysql_server table for backup purposes and then deleting two entries in there. The MySQL servers for the two nodes then appeared as OK in the GUI, but after a while the first node, which had been OK before, became disconnected (even though it was really UP on the host)...
    Besides, there are a lot of problems with mail alerts, which seem not to work. The cron job for the graphs is set to run every 5 mins, but the script takes more than 5 mins, so there are some problems, and there is no check for an already running session.
    I will try to completely reinstall the solution to be sure, but at the moment it seems very unstable to me.

    BR,

    Laurent 

  • Johan

    Hi Laurent,

    Sorry for the late reply, we missed this note.

    I would be more than happy to assist you remotely on this if possible. Please contact me at johan@severalnines.com

    Best regards,

    Johan
