This is a solution that also works in production. I had to create it because of runaway websites and similar problems, to find out exactly what makes a given server run out of memory and go down as a result. I originally posted this code on coderprofile, so the description below is the one I wrote there in English:
Description
This is a bash script that I wrote when I was bored on a Saturday night.
One of our systems had memory issues that we couldn’t debug, because nobody really monitored the server; all we knew was that within 24 hours the memory filled up and the server died. So I created this script to warn me when memory usage goes up and to list the processes with the highest memory usage.
This wasn’t a solution to the problem. :-) Later I checked the server manually every hour, listed the processes with ‘ps’, and so on, and then I found the problem. We had a cronjob that opened a link with lynx every 5 minutes, but lynx never exited, so every 5 minutes one more lynx process piled up. The first version of this script couldn’t detect that, because it was waiting for a single process with huge memory usage, while the cronjob created many processes with low memory usage each, so the server died again. :-)
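The failure mode described here (many small processes instead of one large one) shows up once processes are grouped by command name. Below is a minimal sketch of that idea, not part of the original script: the function name `summarize_by_command` is hypothetical, and the input is assumed to be lines of RSS in kB followed by a command name.

```shell
#!/bin/bash
# Hypothetical helper, not part of the original script.
# Reads lines of "<rss_kB> <command>" on stdin and prints, per command,
# the number of processes and their summed RSS, largest total first.
summarize_by_command() {
    awk '{ count[$2]++; rss[$2] += $1 }
         END { for (c in count) printf "%s %d %d\n", c, count[c], rss[c] }' |
        sort -k3,3nr -k1,1
}

# Fixed sample data standing in for live process data (assumed values):
# five small lynx processes together outweigh one larger apache2.
printf '%s\n' '2000 lynx' '2000 lynx' '2000 lynx' '2000 lynx' '2000 lynx' \
    '9000 apache2' | summarize_by_command
```

On a live system the input could come from something like `ps -eo rss=,comm=`. The sketch only looks at the first word of the command name, so names containing spaces would be truncated.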
Then I developed the script into the version you see now. It also monitors total memory usage and takes the cached memory size, among other things, into account. So now it can really warn you if free memory is running out.
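The "real" usage mentioned here is the used memory minus the cached part, expressed as a share of the total. A minimal sketch of that arithmetic follows; the function name `real_used_percent` is hypothetical, and the numbers are passed as arguments rather than parsed, because the column layout of `free` differs between versions.

```shell
#!/bin/bash
# Hypothetical helper: percentage of memory in use once cache is excluded.
# Arguments: total, used, cached (all in the same unit, e.g. MB).
real_used_percent() {
    local total=$1 used=$2 cached=$3
    if [ "$total" -le 0 ]; then
        echo "total must be positive" >&2
        return 1
    fi
    # Integer arithmetic; the result is truncated toward zero.
    echo $(( (used - cached) * 100 / total ))
}

# Assumed sample values: 2016 MB total, 1900 MB used, 1200 MB of it cache.
real_used_percent 2016 1900 1200   # prints 34
```

The script itself uses `bc` to keep two decimals; integer math is used here only to keep the sketch free of external tools.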
In the first section of the script you will find the config, where you should adjust the values to your system, especially limit_vsz and limit_rss; I just set them based on what I saw in ‘top’.
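One possible way to pick starting values for limit_vsz and limit_rss is to look at the largest values currently present on the machine. The sketch below is only an illustration of that: `observed_maxima` is a hypothetical name, and the input is assumed to be lines of VSZ, RSS (both in kB) and a command name.

```shell
#!/bin/bash
# Hypothetical helper: reads "<vsz_kB> <rss_kB> <command>" lines on stdin
# and prints the largest VSZ and RSS seen, as a starting point for limits.
observed_maxima() {
    awk '$1 + 0 > vsz + 0 { vsz = $1 }
         $2 + 0 > rss + 0 { rss = $2 }
         END { printf "max_vsz=%d max_rss=%d\n", vsz, rss }'
}

# Assumed sample rows standing in for live process data.
printf '%s\n' '18000 6000 apache2' '9000 8500 mysqld' '3000 1200 lynx' |
    observed_maxima
```

On a live system the rows could come from something like `ps -eo vsz=,rss=,comm=`. How much headroom to add on top of the observed maxima is a judgment call for each server.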
Also, if you disable mail_cmd, no email will be sent, but you will still get nice output on the console, although the warning email is worth a look too. ;-)
PS: I had some trouble with the mail sending part. It would be better to set an email address and a subject in the config and then use them at the end of the script to send the mail, but on our server that produced a lot of errors I couldn’t make sense of, although I copied solutions from tutorials. That’s why I made a mail_cmd config variable instead.
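For what it is worth, one common cause of such errors is a subject containing spaces being split into several words when the command is built from a plain string. The sketch below shows one way around that, assuming a mailx-style `mail` that accepts `-s subject address`; the names `mail_to`, `mail_subject`, `mailer`, `send_report` and `show_args` are hypothetical and not part of the original script, and this may not be the problem that actually occurred on that server.

```shell
#!/bin/bash
# Hypothetical config names; the original script only has mail_cmd.
mail_to="monitor@email-address.com"
mail_subject="Memory usage check"
mailer="mail"

# Send the file given as $1. Quoting keeps the subject a single argument
# even though it contains spaces.
send_report() {
    "$mailer" -s "$mail_subject" "$mail_to" < "$1"
}

# Dry run: swap in a stand-in that prints its arguments instead of mailing.
show_args() { printf '[%s]\n' "$@"; }
mailer=show_args
send_report /dev/null
```

The dry run prints each argument in brackets on its own line, which makes it easy to see that the subject arrives as one argument. Setting `mailer` back to `mail` and passing the report file would send the real message.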
Technical
OS: Linux (tested on Debian 3)
Requirements: probably nothing more than a Linux server already has (the script uses free, ps, awk, bc and a mail command).
Installation: it should be run as a cronjob, although it also has nice console output.
Setup: in the first (config) section, change the variables to suit your needs, especially mail_cmd, which is used to send the warning email (if needed).
Config variables and meanings:
- mail_cmd – command to send email
- limit_vsz – virtual memory usage size limit per process (in kilobytes)
- limit_rss – resident (physical) memory usage size limit per process (in kilobytes)
- limit_warning – percentage of total memory in use above which a warning email is sent
- limit_critical – percentage of total memory in use above which a critical warning email is sent
- limit_report_lines – number of top memory using processes to list in email
- report_file – file in which the contents of the email are generated
- memget_file – a temporary file to store the data returned by ‘free’
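The two percentage limits above split the total usage into three states. A minimal sketch of that decision follows; `classify_usage` is a hypothetical name, and unlike the script, which uses strict comparisons, this sketch assumes that a value exactly at a limit already counts as exceeding it.

```shell
#!/bin/bash
# Same names and defaults as the config section of the script.
limit_warning=90   # in %
limit_critical=95  # in %

# Hypothetical helper: prints OK, WARNING or CRITICAL for the usage
# percentage given as $1 (an integer).
classify_usage() {
    local percent=$1
    if [ "$percent" -ge "$limit_critical" ]; then
        echo "CRITICAL"
    elif [ "$percent" -ge "$limit_warning" ]; then
        echo "WARNING"
    else
        echo "OK"
    fi
}

# Three sample percentages, one for each state.
for p in 50 92 97; do
    echo "$p% -> $(classify_usage "$p")"
done
```

Checking the critical limit first keeps the two branches from overlapping, so each percentage lands in exactly one state.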
Cron line for running the script every 5 minutes (ignoring console output):
*/5 * * * * bash /path/to/file/check_memory_usage.sh > /dev/null 2>&1
Source code
#!/bin/bash
#############################################################################
# Check memory usage script                                                 #
# This script will check for the highest virtual and physical memory usage. #
# If they exceed a given limit then a report will be sent to a given email. #
# Written by: Tommey <http://coderprofile.com/coder/Tommey> @ 2009.02.27    #
#############################################################################

# Config - set mail sending command, limits (in kB), report tmp file
mail_cmd="mail monitor@email-address.com -s MemUsageCheck"
limit_vsz=20000 # in kB
limit_rss=10000 # in kB
limit_warning=90 # in %
limit_critical=95 # in %
limit_report_lines=10
report_file="/tmp/check_mem_usage_report"
memget_file="/tmp/check_mem_usage_get"

# Init
echo "---------------------------"
echo "Check memory usage (script)"
echo "---------------------------"
send_email=0
echo "Memory Usage Statistics (MB)" > $report_file
free -m -t > $memget_file
get_memory=`cat $memget_file | head -2 | tail -1 | awk '{print " Memory: "$2"\t"$3"\t"$4"\t"$7}'`
get_swap=`cat $memget_file | head -4 | tail -1 | awk '{print " Swap: "$2"\t"$3"\t"$4}'`
get_total=`cat $memget_file | tail -1 | awk '{print " Total: "$2"\t"$3"\t"$4}'`
get_max=`cat $memget_file | tail -1 | awk '{print $2}'`
get_total_used=`cat $memget_file | tail -1 | awk '{print $3}'`
get_cached=`cat $memget_file | head -2 | tail -1 | awk '{print $7}'`
get_real_used=`printf "%.0lf" $(echo "scale=4; $get_total_used-$get_cached" | bc)`
get_limit_warning=`printf "%.0lf" $(echo "scale=4; $get_max*$limit_warning/100" | bc)`
get_limit_critical=`printf "%.0lf" $(echo "scale=4; $get_max*$limit_critical/100" | bc)`
get_usage_percent=`printf "%0.2lf" $(echo "scale=4; $get_real_used/$get_max*100" | bc)`
get_physical=`cat $memget_file | head -2 | tail -1 | awk '{print $3" "$2" "$7}'`
get_virtual=`cat $memget_file | head -4 | tail -1 | awk '{print $3" "$2}'`

# Check total memory usage
echo -n "Checking total memory usage..."
if [ $get_limit_warning -lt $get_real_used ] && [ $get_real_used -lt $get_limit_critical ]
then
    echo "WARNING, usage is above $limit_warning% -> $get_usage_percent%"
    echo " " >> $report_file
    echo " WARNING - Total memory usage is above $limit_warning%!" >> $report_file
    echo " " >> $report_file
    let send_email+=1
elif [ $get_limit_critical -lt $get_real_used ]
then
    echo "CRITICAL, usage is above $limit_critical% -> $get_usage_percent%"
    echo " " >> $report_file
    echo " CRITICAL - Total memory usage is above $limit_critical%!" >> $report_file
    echo " " >> $report_file
    let send_email+=1
else
    echo "OK, usage: $get_usage_percent% - $get_real_used MB / $get_max MB"
    printf " - Physical: %3d MB / %3d MB, cached: %d MB\n" $get_physical
    printf " - Virtual: %3d MB / %3d MB\n" $get_virtual
fi
echo " Total Used Free Cached" >> $report_file
echo "$get_memory" >> $report_file
echo "$get_swap" >> $report_file
echo "$get_total" >> $report_file
echo " " >> $report_file

# Check virtual memory usage
max_vsz=`ps axo vsz --sort -vsz | head -2 | tail -1`
report_vsz=`ps axo pid,pcpu,vsz,rss,comm,cmd --sort -vsz | head -$(($limit_report_lines+1))`
echo -n "Checking virtual memory usage..."
if [ $max_vsz -gt $limit_vsz ]
then
    echo "EXCEEDED with size: $(($max_vsz-$limit_vsz)) kB, limit: $limit_vsz kB"
    echo "Top $limit_report_lines processes by Virtual Memory Usage (limit: $limit_vsz kB)" >> $report_file
    echo "$report_vsz" >> $report_file
    echo " " >> $report_file
    let send_email+=1
else
    echo "OK - max: $max_vsz kB, limit: $limit_vsz kB"
fi

# Check physical memory usage
max_rss=`ps axo rss --sort -rss | head -2 | tail -1`
report_rss=`ps axo pid,pcpu,vsz,rss,comm,cmd --sort -rss | head -$(($limit_report_lines+1))`
echo -n "Checking physical memory usage..."
if [ $max_rss -gt $limit_rss ]
then
    echo "EXCEEDED with size: $(($max_rss-$limit_rss)) kB, limit: $limit_rss kB"
    echo "Top $limit_report_lines processes by Physical Memory Usage (limit: $limit_rss kB)" >> $report_file
    echo "$report_rss" >> $report_file
    let send_email+=1
else
    echo "OK - max: $max_rss kB, limit: $limit_rss kB"
fi

# Check if report created and send
if [ $send_email -gt 0 ]
then
    echo "$send_email check(s) FAILED!"
    echo -n "Sending notify email..."
    $mail_cmd < $report_file
    echo "DONE."
else
    echo "All checks PASSED!"
fi
rm -f $report_file