> Indeed, unlike on BSD systems, signals under Linux are reset to their
> default behavior when raised.
I've put a new release of hf here:
ftp://ftp.ee.lbl.gov/hf.tar.Z
It uses the setsignal() routine I wrote for tcpdump, which handles the
various behaviors of signal(). I believe that under Linux it uses
sigaction() to set up a persistent signal handler.
My only Linux box died mysteriously a few months ago and resisted attempts
to reinstall the OS, so I can't easily test this change. If it tests
out OK, please let us know so Vern can include this version of hf in
the next Bro distribution.
Craig
Hi Vern and other Bro users,
We had some trouble with the Bro tool hf when resolving names in logs of
150k lines.
The problem occurs on Linux: when names are not found in DNS,
gethostbyaddr() hangs.
We saw in the code that you (Vern) were already aware of this problem and
had already implemented a timeout mechanism (-t time option).
But even with the -t option, it doesn't help, because of a bug in the
program.
Indeed, unlike on BSD systems, signals under Linux are reset to their
default behavior when raised.
Reinstalling the signal handler inside the handler doesn't work either,
because the longjmp() out of the handler prevents it from returning normally.
The consequence of all this is that the alarm in hf only works once.
The solution is to save not only the environment but also the signal mask.
That way, once we come back from the interrupt, the signal mask saved
before the alarm (in which SIGALRM is not blocked) is restored.
So we suggest patching hf.l as follows:
1) replace all "setjmp(alrmenv)" with "sigsetjmp(alrmenv,1)"
2) replace "longjmp(alrmenv,1)" with "siglongjmp(alrmenv,1)"
Some other small suggestions:
1) put alarm(0) just after doingdns=0
2) move alarm(tmo) & doingdns=1 after "if (sigsetjmp(alrmenv,1)) { }"
Best regards,
Alexandre Dumortier & Patrick Verstraete
Universite catholique de Louvain, Belgium
> without seeing a SYN-ack from B.80 in between. This then leads to
> Bro holding state for the half-established connection after it sees
> A.1234 -> B.80.
I should add that I diagnosed this because the connection summaries
Bro generated on stdout looked like:
925897359.600000 0.26 http ? 1775 199.108.25.84 130.104.28.234 SHR X
"SHR" indicates a half-established connection that was closed by the
responder. (It's the responder in this case because the only packets
Bro saw were the SYN-ack [rather than the SYN] and the FIN.)
This is a highly unusual state for normal traffic, i.e., when Bro sees
both sides of the connection.
Vern
Thanks for sending the trace. The problem is that either you have
split routing, in which the monitor isn't seeing both sides of most
connections, or the packet filter is dropping a whole lot of packets,
so that effectively the monitor again doesn't see both sides.
So Bro sees patterns like:
A.1234 -> B.80 SYN
...
A.1234 -> B.80 FIN
without seeing a SYN-ack from B.80 in between. This then leads to
Bro holding state for the half-established connection after it sees
A.1234 -> B.80.
That's arguably a bug; Bro should just flush the connection after it sees
the half-close. The patch below makes it do this, and then instead of
requiring 100+ MB to process the file you sent me, it needs about 20 MB.
Give it a try and let me know how well it works.
Vern
*** TCP.cc- Thu May 6 16:49:13 1999
--- TCP.cc Thu May 6 16:50:26 1999
***************
*** 1711,1718 ****
// connection has likely terminated.
if ( (orig->did_close && resp->did_close) ||
(orig->state == TCP_RESET ||
! resp->state == TCP_RESET) )
! { // Either both closed, or one RST.
// The Timer has Ref()'d us and won't Unref()
// us until we return, so it's safe to have
// the session remove and Unref() us here.
--- 1711,1720 ----
// connection has likely terminated.
if ( (orig->did_close && resp->did_close) ||
(orig->state == TCP_RESET ||
! resp->state == TCP_RESET) ||
! (orig->state == TCP_INACTIVE ||
! resp->state == TCP_INACTIVE) )
! { // Either both closed, or one RST, or half-opened.
// The Timer has Ref()'d us and won't Unref()
// us until we return, so it's safe to have
// the session remove and Unref() us here.
Thanks, Vern, for your help.
As suggested, we tried:
bro -i eth0 -w bro.dump -f "(tcp[13] & 0x7 != 0) or tcp port telnet or tcp port finger or tcp port ftp or port 111" ../policy/mt.bro >> bro.out 2>> bro.err
But unfortunately it didn't help.
The only difference is that it takes a bit longer before crashing.
If you want, we can send you the compressed dump file (3 MB) in a
separate mail.
Alexandre Dumortier
Patrick Verstraete
UCL, Belgium
This is the 'ps' output, taken every 10 minutes since Bro started:
Wed May 5 11:50:01 CEST 1999
100100 0 9775 1 12 5 7800 7256 R N p0 0:20 bro -i et
Wed May 5 12:00:00 CEST 1999
100100 0 9775 1 10 5 16188 15644 R N p0 0:50 bro -i et
Wed May 5 12:10:00 CEST 1999
100100 0 9775 1 10 5 24244 23708 R N p0 1:20 bro -i et
Wed May 5 12:20:00 CEST 1999
100100 0 9775 1 10 5 32624 32088 R N ? 1:50 bro -i et
Wed May 5 12:30:00 CEST 1999
100100 0 9775 1 12 5 40348 39812 R N ? 2:19 bro -i et
Wed May 5 12:40:00 CEST 1999
100100 0 9775 1 12 5 46248 45712 R N ? 2:45 bro -i et
Wed May 5 12:50:00 CEST 1999
100100 0 9775 1 11 5 53040 52504 R N ? 3:10 bro -i et
Wed May 5 13:00:02 CEST 1999
100100 0 9775 1 13 5 59320 58784 R N ? 3:30 bro -i et
Wed May 5 13:10:04 CEST 1999
100100 0 9775 1 9 5 66356 58788 R N ? 3:51 bro -i et
Wed May 5 13:20:03 CEST 1999
100100 0 9775 1 13 5 72412 58052 R N ? 4:12 bro -i et
Wed May 5 13:30:08 CEST 1999
100100 0 9775 1 11 5 79700 58908 wait_on_pag D N ? 4:35 bro -i et
Wed May 5 13:40:09 CEST 1999
100100 0 9775 1 11 5 86904 58608 wait_on_pag D N ? 4:59 bro -i et
Wed May 5 13:50:11 CEST 1999
100100 0 9775 1 9 5 94572 58756 wait_on_pag D N ? 5:24 bro -i et
At this point, Bro crashed.
> > > We've got some trouble with Bro...
> > > After about 2 hours running Bro (mt script), Bro crashes with a
> > > "Virtual Memory exceeded in 'new'" error.
> >
> > How large a traffic stream are you monitoring? (How many hosts,
> > connections/sec, raw link speed?) What filter (bro -F) are you using?
>
> # hosts: about 60
> # connections/sec: no idea. A lot of HTTP connections
> # raw link speed: 10 Mb/s (shared Ethernet)
That's not much load at all. (Does it really run out of memory in 2 hours?
Later you discuss running it over the weekend, which sounds like you run it
a lot longer than 2 hours.)
However, I wonder if:
> Bro runs with no filter specified (bro -i eth0 mt.bro)
this is tickling a memory leak somewhere, since I always run it with a
filter so it only captures the traffic it's interested in. Try running
with the following filter:
-F "(tcp[13] & 0x7 != 0) or tcp port telnet or tcp port finger or tcp port ftp or port 111"
and let me know if that does the trick. If not, and if you're willing to
send me a trace file (you can make one using bro -w <file>), then I'll see
if I can find the problem.
> Another remark: during our monitoring of the network, we get
> entries in bro.log:
> pm_getport unknown-1073741824 (timeout)
> How could such a huge port number be used?
That's a 32-bit RPC program number (as looked up via the portmapper), not a
16-bit TCP/UDP port. See /etc/rpc (and Bro's portmapper.bro) for mappings
from program numbers to services.
Vern
> We've got some trouble with Bro...
> After about 2 hours running Bro (mt script), Bro crashes with a
> "Virtual Memory exceeded in 'new'" error.
How large a traffic stream are you monitoring? (How many hosts,
connections/sec, raw link speed?) What filter (bro -F) are you using?
> Can anyone suggest what I can do about this? (I've already increased
> RAM and swap from 32 MB to 64 MB.)
For our environment (5000+ hosts, FDDI link) we run with a lot more memory
than that.
> Is there a way to check which part of Bro is taking all this memory (stream
> buffers, active sessions table, ...)?
It will mostly be the active sessions.
Vern