15 January 2014

Capturing per-process bandwidth usage using nethogs

On some of our servers at Directi, we use nethogs to capture per-process bandwidth on an interface over a period of time. The author's official description of nethogs is as follows.

http://nethogs.sourceforge.net/

NetHogs is a small 'net top' tool. Instead of breaking the traffic down per protocol or per subnet, like most tools do, it groups bandwidth by process. NetHogs does not rely on a special kernel module to be loaded. If there's suddenly a lot of network traffic, you can fire up NetHogs and immediately see which PID is causing this. This makes it easy to identify programs that have gone wild and are suddenly taking up your bandwidth. Since NetHogs heavily relies on /proc, it currently runs on Linux only.

The -t parameter: trace mode

Trace mode prints each refresh as plain text lines on standard output instead of drawing the interactive screen. You enable it, collect the output stream over a period of time, and calculate the results you need from that log. By default nethogs refreshes every second and reports rates in KB/s (kilobytes per second).

How?

Start nethogs listening on an interface and redirect its output to a file.

    /usr/sbin/nethogs -t eth1 &> output

You will probably want to run this in the background, which is simple:

    nohup /usr/sbin/nethogs -t eth1 &> output &

Remember that this command keeps running until you kill it manually. To stop it automatically, follow the time-out-and-move-on paradigm. There are many ways to do this in bash; one compact way, with TIMEOUT set to the capture length in seconds, is:

    sh -ic "{ /usr/sbin/nethogs -t eth1 > output 2>&1; \
    kill 0; } | { sleep $TIMEOUT; \
    kill 0; }" 3>&1 2>/dev/null
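On systems with GNU coreutils, the `timeout` command is another way to get the same time-out-and-move-on behaviour. The following is a minimal sketch, not the method used in this post: the helper name `capture_for` and its argument order are hypothetical, and it assumes `timeout` is installed.

```shell
#!/bin/bash
# capture_for SECONDS OUTFILE COMMAND [ARGS...]
# Hypothetical helper: runs COMMAND with stdout and stderr written to OUTFILE.
# GNU timeout sends SIGTERM to the command once SECONDS have elapsed and
# then exits with status 124.
capture_for() {
    local seconds=$1 outfile=$2
    shift 2
    timeout "$seconds" "$@" > "$outfile" 2>&1
}

# Example (needs root, like any nethogs run):
#   capture_for 3600 output /usr/sbin/nethogs -t eth1
```

An exit status of 124 means the time limit was reached, which is the expected outcome for a capture; any other status comes from the command itself.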

The output:

The output of nethogs in trace mode looks something like this.

    Refreshing:
    /usr/sbin/exim/689041/47 0.0167969 0.289844
    10.10.10.10:80-11.11.11.11:58485/0/0 0.0105469 0.194141
    pop3/703049/2926 2.46855 0.140625
    /usr/local/apache/bin/httpd/700895/99 4.44023 0.0949219
    /usr/local/apache/bin/httpd/669759/99 1.69883 0.0832031
    /usr/local/apache/bin/httpd/678264/99 1.71016 0.0738281
    /usr/local/apache/bin/httpd/673811/99 2.99844 0.0730469
    /usr/local/apache/bin/httpd/578292/99 0.588281 0.0621094
    unknown TCP/0/0 0.855469 0.0503906
    /usr/local/apache/bin/httpd/701327/99 3.65273 0.046875
    pure-ftpd (PRIV)/680421/0 0.0574219 0.0460938
    imap-login/88707/97 0.0183594 0.0271484
    sshd: boopathi@pts/7/656215/32076 0.0730469 0.0222656
    12.12.12.12:80-13.13.13.13:58271/0/0 0 0.0144531
    /usr/local/apache/bin/httpd/677948/99 0.291797 0.0117188
    Unknown connection: 60.60.60.60:80-66.66.66.66:36813
    Unknown connection: 100.200.105.205:80-65.56.56.56:42293

The fields (for lines that are neither warnings nor errors) are:

    Process / Process id / User id    sent KB/s    received KB/s
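To make the field layout concrete, here is a minimal sketch that reads a trace log on standard input and adds up the sent and received figures per program name. It is not the nethogs-parser project; the function name `summarize_trace` is hypothetical. It assumes the last two whitespace-separated fields of a data line are the two rates and that everything before them is `process/pid/uid`. Because the samples are rates, the sums are best read as a rough relative ranking, not an exact byte count.

```shell
#!/bin/bash
# summarize_trace: hypothetical helper, reads nethogs -t output on stdin and
# prints "sent<TAB>received<TAB>program", highest sent total first.
summarize_trace() {
    awk '
        # Data lines end in two numbers; "Refreshing:" and
        # "Unknown connection: ..." lines do not, so they are skipped.
        NF >= 3 &&
        $(NF-1) ~ /^[0-9.]+(e[-+]?[0-9]+)?$/ &&
        $NF     ~ /^[0-9.]+(e[-+]?[0-9]+)?$/ {
            # Rebuild the first column; process names may contain spaces.
            key = $1
            for (i = 2; i <= NF - 2; i++) key = key " " $i
            sub(/\/[0-9]+\/[0-9]+$/, "", key)   # drop the trailing /pid/uid
            sent[key] += $(NF-1)
            recv[key] += $NF
        }
        END {
            for (k in sent) printf "%.3f\t%.3f\t%s\n", sent[k], recv[k], k
        }
    ' | sort -k1,1nr
}

# Example:
#   summarize_trace < /var/log/nethogs/some_capture.log
```

Grouping by program name instead of by PID merges all the httpd workers into one row; to keep them separate, skip the `sub()` call so the key keeps its `/pid/uid` suffix.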

Parser

Source code: https://github.com/boopathi/nethogs-parser

Automating

Trace mode writes a fresh block of lines every second, so the log grows steadily. To keep each file to a manageable size, split the capture into fixed time windows; let's fix the window at one hour. nethogs listens on an interface for one hour, and a new process is started for the next hour. The simplest way to do this is to write a script that runs for an hour and call it from an hourly cron job.

monitor.sh:

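    #!/bin/bash
    # Capture nethogs trace output on one interface for a fixed window.

    TIMEOUT=3600                            # capture length in seconds
    IFACE=eth0
    TIMESTAMP=$(date +%b_%d_%Y_%H_%M_%S)    # month_day_year_hour_min_sec
    OUTPUT=/var/log/nethogs/$TIMESTAMP.log

    mkdir -p /var/log/nethogs               # make sure the log directory exists

    # Left side: nethogs writes the log. Right side: sleep, then kill the pipeline.
    sh -ic "{ /usr/sbin/nethogs -t $IFACE >$OUTPUT 2>&1; \
    kill 0; } | { sleep $TIMEOUT; \
    kill 0; }" 3>&1 2>/dev/null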

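cron: /etc/cron.d/nethogs.cron

    0 5,6,7,10,11,20,21 * * *    root    /path/to/monitor.sh

This entry starts a one-hour capture at 05:00, 06:00, 07:00, 10:00, 11:00, 20:00 and 21:00. To capture every hour, set the hour field to `*`.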

Hi, I'm Boopathi Rajaa.

An Infatuation for Breaking Things