Hi all,
Recently I have run into a problem with Bro and PF_RING in a cluster.
On my server, when I have fewer than 32 worker threads (rings),
everything is okay, but when I use more than 32 worker threads, pf_ring
starts to receive duplicate packets. For example, with fewer than 32 rings, I
send 400000 packets to the server and the pf_ring info in /proc shows
400000 packets in the rings; but with more than 32 rings I see 800000
packets with 33 rings, 1200000 packets with 34 rings, and so on.
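For reference, I am reading those counts from the per-ring entries under /proc/net/pf_ring. A rough way to total them up is something like the following, where eth0 stands in for the capture interface (the "Tot Packets" field name and the file naming may differ between PF_RING versions):

  # sum the per-ring packet counters for the capture interface
  grep "Tot Packets" /proc/net/pf_ring/*-eth0.* | awk -F: '{ sum += $NF } END { print sum }'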
I wonder whether there is some rule that pf_ring or a Bro cluster can only
support fewer than 32 rings or worker threads on a single server, or whether
there is some other reason?
Any insight would be helpful.
Zeek community,
We’re writing to let you know about zq <https://github.com/brimsec/zq>, an open source command-line processor for structured logs, built for Zeek. (In fact, we’ve been told zq is “like zeek-cut on steroids”.)
Those of you who were on the “Ask the Zeeksperts” call on January 16th saw Seth Hall and Justin Azoff give an early peek at zq (thanks guys!), so this is just an “official” announcement. Come one, come all!
You can get involved by:
• Checking out the zq GitHub repo <https://github.com/brimsec/zq> for install info, code, and docs
• Joining our public Slack <https://join.slack.com/t/brimsec/shared_invite/enQtOTMwMDczODg2ODgyLTk1NTdj…> workspace for announcements, Q&A, and to trade query ideas
• Contacting us directly via email <mailto:info@brimsecurity.com> to schedule a Zoom videoconference
All you need is some Zeek logs (and there are sample logs <https://github.com/brimsec/zq-sample-data> to help you get started). Here’s just a taste of what’s possible:
- A table of top hosts in a subnet that are experiencing the most SYNs-without-ACK:
zq -f table "10.164.94.0/24 conn_state=S0 | count() by id.orig_h | sort -r" *
- A regex search for certain HTTP methods, with full events output as NDJSON:
zq -f ndjson "method=/^(PUT|PATCH|UPDATE)$/" *
- Connections that stay open a long time with low traffic, printed as a Zeek TSV log:
zq -f zeek "duration>1000 orig_bytes<10 resp_bytes<10" *
Of course, that’s just scratching the surface. Please try it out and let us know what you think on GitHub <https://github.com/brimsec/zq> or Slack <https://join.slack.com/t/brimsec/shared_invite/enQtOTMwMDczODg2ODgyLTk1NTdj…>.
Happy hunting, Zeeking, & zq’ing!
--
The Brim team
Hi Everyone,
with the Zeek 3.1 release around the corner, I just wanted to outline my
current plan for the binary packages.
As we outlined in
https://blog.zeek.org/2019/04/new-zeek-release-schedule.html, 3.1 will
be the first “feature release”, which will exist alongside Zeek 3.0
(which will still get patches).
I currently plan to update the “zeek” package to 3.1, and to
introduce a new zeek-lts package for people who want to stay on 3.0. The
zeek-lts package will continue to track 3.0 until zeek 4.0 is released -
at which point zeek-lts will be updated to 4.0. This means with the 4.0
release zeek-lts and zeek will essentially be the same package - until
the release of 4.1 when they will diverge again.
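To make the intent concrete: once both packages exist, the choice on (for
example) a Debian-style system is simply which one to install - a sketch,
assuming your package repository is already set up as documented for the
current packages:

  apt install zeek       # tracks feature releases (3.1, 3.2, ...)
  apt install zeek-lts   # stays on the 3.0.x line until 4.0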
Please let me know if you have any feelings about this - if I don’t
hear anything back, I will create the zeek-lts package in the next few
days - and write another message about it to this thread.
Johanna
I'm seeing an interesting problem on zeek 3.0.1 (running a stock
SecurityOnion sensor setup) where the main thread suddenly spikes to 100%
CPU and stays there.
The base setup: Ubuntu 16.04 kernel 4.15.0-88-generic, zeek is using
AF_PACKET to distribute packets across 8 worker processes. Baseline load
for each worker is around 18-22% CPU. Running strace for 1 second and
filtering with 'sort strace.out | uniq -c' on a normal worker looks like
this:
   31 futex(0x7f9586384a38, FUTEX_WAKE_PRIVATE, 1) = 1
   33 futex(0x7f9586384a64, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x7f9586384a60, {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1}) = 1
    1 futex(0x7f9586d03a0c, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x7f9586d03a08, {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1}) = 1
 6671 nanosleep({0, 1000}, NULL) = 0
    1 read(12, "@", 1) = 1
    1 read(14, "@", 1) = 1
   26 read(16, "@", 1) = 1
 6699 select(17, [6 8 10 11 12 14 16], [0], [0], {0, 0}) = 1 (out [0], left {0, 0})
  198 select(17, [6 8 10 11 12 14 16], [0], [0], {0, 0}) = 2 (in [11], out [0], left {0, 0})
    1 select(17, [6 8 10 11 12 14 16], [0], [0], {0, 0}) = 2 (in [12], out [0], left {0, 0})
    1 select(17, [6 8 10 11 12 14 16], [0], [0], {0, 0}) = 2 (in [14], out [0], left {0, 0})
   21 select(17, [6 8 10 11 12 14 16], [0], [0], {0, 0}) = 2 (in [16], out [0], left {0, 0})
    1 select(17, [6 8 10 11 12 14 16], [0], [0], {0, 0}) = 3 (in [11 16], out [0], left {0, 0})
    1 select(17, [6 8 10 11 12 14 16], [0], [0], {0, 0} <detached ...>
  101 select(17, [6 8 10 12 14 16], [], [], {0, 0}) = 0 (Timeout)
   92 stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=127, ...}) = 0
Notice there are close to the same number of nanosleep calls as there are
select calls.
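(For completeness, the capture behind these numbers is just strace attached to the worker process for about a second and then summarized; the exact invocation below is an approximation of what I run, with <pid> being the worker's process ID:)

  timeout -s INT 1 strace -p <pid> -o strace.out
  sort strace.out | uniq -c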
After some time - I've seen it happen anywhere from 4 minutes after start to
several hours afterwards - the usage suddenly spikes to 100%. Packets continue
to be processed, and the load on the remaining workers stays about the same,
as does the load on logger, manager, and proxy. Changing the number of
worker processes doesn't seem to prevent it. There is no degradation in
output logging, since I have enough cores to compensate for that single hot
process. Running strace for 1 second and filtering with 'sort strace.out |
uniq -c' looks like this:
    1 futex(0x7f270ef7ea38, FUTEX_WAKE_PRIVATE, 1) = 0
   30 futex(0x7f270ef7ea38, FUTEX_WAKE_PRIVATE, 1) = 1
   35 futex(0x7f270ef7ea64, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x7f270ef7ea60, {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1}) = 1
    1 futex(0x7f270f900a6c, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x7f270f900a68, {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1}) = 1
    1 poll([{fd=16, events=POLLIN}], 1, -1) = 1 ([{fd=16, revents=POLLIN}])
    1 read(12, "@", 1) = 1
    1 read(14, "@", 1) = 1
   28 read(16, "@", 1) = 1
    1 select(12, [10 11], [0], [0], {0, 0}) = 1 (out [0], left {0, 0})
21703 select(17, [6 8 10 11 12 14 16], [0], [0], {0, 0}) = 2 (in [16], out [0], left {0, 0})
  141 select(17, [6 8 10 11 12 14 16], [0], [0], {0, 0}) = 3 (in [11 16], out [0], left {0, 0})
    1 select(17, [6 8 10 11 12 14 16], [0], [0], {0, 0}) = 3 (in [12 16], out [0], left {0, 0})
  109 select(17, [6 8 10 12 14 16], [], [], {0, 0}) = 1 (in [16], left {0, 0})
  106 stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=127, ...}) = 0
Note the complete lack of nanosleeps and the jump in the number of calls to
select. Also notice that fds 12, 14, and 16 continue to be read, but
select always returns with data available on fd16. procfs shows this (also
showing fd17 since that appears to be the other end of the pipe):
root:/proc/32327# ls -l fd/16 fd/17
lr-x------ 1 root root 64 Feb 20 20:43 fd/16 -> pipe:[48070104]
l-wx------ 1 root root 64 Feb 20 20:43 fd/17 -> pipe:[48070104]
root:/proc/32327# cat fdinfo/16
pos: 0
flags: 02004000
mnt_id: 12
root:/proc/32327# cat fdinfo/17
pos: 0
flags: 02000001
mnt_id: 12
lsof output shows these file descriptors in use on the main thread (and
thus on all):
zeek 32327 sguil   3u  a_inode       0,13      0    12847  [eventpoll]
zeek 32327 sguil   4r  FIFO          0,12    0t0 48059893  pipe
zeek 32327 sguil   5w  FIFO          0,12    0t0 48059893  pipe
zeek 32327 sguil   6r  FIFO          0,12    0t0 48059895  pipe
zeek 32327 sguil   7w  FIFO          0,12    0t0 48059895  pipe
zeek 32327 sguil   8r  FIFO          0,12    0t0 48059896  pipe
zeek 32327 sguil   9w  FIFO          0,12    0t0 48059896  pipe
zeek 32327 sguil  10u  IPv4      48059899    0t0      UDP  localhost:55072->localhost:domain
zeek 32327 sguil  11u  pack      48059900    0t0      ALL  type=SOCK_RAW
zeek 32327 sguil  12r  FIFO          0,12    0t0 48048835  pipe
zeek 32327 sguil  13w  FIFO          0,12    0t0 48048835  pipe
zeek 32327 sguil  14r  FIFO          0,12    0t0 48048836  pipe
zeek 32327 sguil  15w  FIFO          0,12    0t0 48048836  pipe
zeek 32327 sguil  16r  FIFO          0,12    0t0 48070104  pipe
zeek 32327 sguil  17w  FIFO          0,12    0t0 48070104  pipe
zeek 32327 sguil  18u  IPv6      48038789    0t0      TCP  *:47770 (LISTEN)
zeek 32327 sguil  19u  IPv4      48038790    0t0      TCP  localhost:47996->localhost:47761 (ESTABLISHED)
zeek 32327 sguil  20u  IPv4      48038791    0t0      TCP  localhost:42076->localhost:47763 (ESTABLISHED)
zeek 32327 sguil  21u  IPv4      48038792    0t0      TCP  localhost:34920->localhost:47762 (ESTABLISHED)
Of the 7 threads created, the 4th (judging by the thread's pid) is calling
'write(17, "@", 1)' 10x per second and getting a return value of 1.
Any ideas what might be wrong? Any suggestions for further diagnosis?
These are in production, so I can't do too much other than a restart and an
occasional strace. I cannot reproduce in lab conditions.
I've just upgraded 2 of my sensors from 2.6.4 to 3.0.1, and this is
happening on only one of them, but it's the more heavily loaded one. I'm hoping to
resolve this before upgrading the remaining 10 sensors, as I don't want to
see it on others...
--
Pete
Hi, friends:
I use restrict_filters to filter the traffic, but the setting did not take effect: all of the traffic was filtered. What should I do?
My script is as follows:
redef restrict_filters += {
    ["unmonitored host"] = "host 123.2.15.75"
};
I am looking forward to your reply. Thanks.
Good morning,
I'm new to the list, and have been working on inheriting an existing zeek
deployment that we have here. I'm trying to track down some (to me)
excessive packet dropping.
We had an older version of zeek (bro) installed and mostly functional,
though as I recall they were having issues with workers occasionally
crashing.
Before I started looking into things, a new version of zeek was deployed
(from a binary) and is mostly vanilla. We've included the bhr and myricom
plugins, but that's about it.
Zeek master and workers run bare metal on 3 pretty big Intel hosts (192GB
memory, 2x Xeon E5-2690 with 14 cores/socket, Debian 9). The workers have
Myricom interfaces. There are span ports at the edge that feed into Arista
switches, which feed the Myricom interfaces in the workers.
We have a few issues:
1) If I try to start up the workers with any more than ~8 threads,
packet drops and memory usage go through the roof in pretty short order.
If I try to pin them, the first "worker" CPUs get pegged pretty high and
the others stay more or less idle (though that could be due to the amount of
traffic the second worker interface is receiving).
2) If I try to start up "1" worker (per worker node), using the
"myricom::*" interface, the worker node goes unresponsive and needs to be
hardware bounced. (Driver issue?)
3) I can start worker nodes with multiple workers and ~5 threads each
(currently "unpinned"), but after a few days, packet drop is still
excessive.
My current node.cfg is below [1]. Output from 'zeekctl netstats' is also
below [2]. It's been up since Friday ~2:00pm Eastern. Load average is
higher than I would think it should be (given how much CPU these workers
actually have, and how idle most of the CPUs actually are). Htop output is
included [3].
I understand we should probably be pinning the worker threads, but the
output of 'lstopo-no-graphics --of txt' is terrible to try to trace with
56 threads available. Also, do I want to use the "P" or the "L" listings?
I can include that as a follow-up if necessary.
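For reference, my understanding is that pinning would just mean adding a
pin_cpus line (one core per lb_procs process) to each worker entry, something
like the sketch below - the core numbers here are made up and would need to
follow the real topology:

[worker-1]
type=worker
host=WORKER 1
lb_method=custom
lb_procs=5
interface=myricom::eth4
pin_cpus=2,4,6,8,10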
Please help!
[1]
==================
[manager]
type=manager
host=THE MASTER
[logger]
type=logger
host=THE MASTER
[proxy-1]
type=proxy
host=THE MASTER
[worker-1]
type=worker
host=WORKER 1
lb_method=custom
lb_procs=5
interface=myricom::eth4
[worker-2]
type=worker
host=WORKER 2
lb_method=custom
lb_procs=5
interface=myricom::eth4
[worker-3]
type=worker
host=WORKER 1
lb_method=custom
lb_procs=5
interface=myricom::eth5
[worker-4]
type=worker
host=WORKER 2
lb_method=custom
lb_procs=5
interface=myricom::eth5
=================================================
[2]
================
bro@bro-master-1:~$ zeekctl netstats
Warning: ZeekControl plugin uses legacy BroControl API. Use
'import ZeekControl.plugin' instead of 'import BroControl.plugin'
worker-1-1: 1581949346.194441 recvd=2178149468 dropped=2260820124 link=15063051356
worker-1-2: 1581949346.194473 recvd=274557259 dropped=2260820124 link=13159459147
worker-1-3: 1581949346.168558 recvd=1888926901 dropped=2260820124 link=14773828789
worker-1-4: 1581949346.081130 recvd=2110377092 dropped=2260820124 link=14995278980
worker-1-5: 1581949346.234478 recvd=1032618510 dropped=2260820124 link=13917520398
worker-2-1: 1581949346.269794 recvd=1551167612 dropped=640636540 link=14436069500
worker-2-2: 1581949346.271224 recvd=2811566586 dropped=640636540 link=15696468474
worker-2-3: 1581949346.292474 recvd=3295536154 dropped=640636540 link=16180438042
worker-2-4: 1581949346.314556 recvd=2505663441 dropped=640636540 link=15390565329
worker-2-5: 1581949343.011855 recvd=3459004896 dropped=640636540 link=20638874080
worker-3-1: 1581949346.239424 recvd=938819819 dropped=0 link=938819819
worker-3-2: 1581949346.249540 recvd=890104345 dropped=0 link=890104345
worker-3-3: 1581949346.259501 recvd=894787204 dropped=0 link=894787204
worker-3-4: 1581949346.269501 recvd=895479546 dropped=0 link=895479546
worker-3-5: 1581949346.274490 recvd=878546610 dropped=0 link=878546610
worker-4-1: 1581949346.329587 recvd=892356780 dropped=0 link=892356780
worker-4-2: 1581949346.344510 recvd=922981664 dropped=0 link=922981664
worker-4-3: 1581949346.349568 recvd=855515132 dropped=0 link=855515132
worker-4-4: 1581949346.359652 recvd=931447757 dropped=0 link=931447757
worker-4-5: 1581949346.368349 recvd=876976485 dropped=0 link=876976485
===========================================================
[3]
===================
 1 [|| 3.3%]     15 [ 0.0%]        29 [|| 6.1%]       43 [||||||91.6%]
 2 [|| 7.9%]     16 [ 0.0%]        30 [||| 14.2%]     44 [ 0.0%]
 3 [| 3.3%]      17 [| 1.4%]       31 [|||| 20.2%]    45 [|| 1.9%]
 4 [|| 3.3%]     18 [ 0.0%]        32 [|| 4.7%]       46 [| 0.5%]
 5 [|| 3.7%]     19 [||||||76.3%]  33 [|| 4.2%]       47 [|| 9.1%]
 6 [|| 5.2%]     20 [ 0.0%]        34 [||||||39.5%]   48 [|| 3.3%]
 7 [|| 2.8%]     21 [ 0.0%]        35 [|| 5.2%]       49 [ 0.0%]
 8 [|| 5.6%]     22 [|| 1.4%]      36 [|| 3.7%]       50 [|| 3.3%]
 9 [|| 6.0%]     23 [| 0.5%]       37 [||| 16.7%]     51 [ 0.0%]
10 [|| 1.9%]     24 [| 0.5%]       38 [||||||56.5%]   52 [|| 7.1%]
11 [|| 2.8%]     25 [||||||88.4%]  39 [|| 6.6%]       53 [ 0.0%]
12 [||| 13.6%]   26 [|| 1.4%]      40 [||||| 30.7%]   54 [|| 0.9%]
13 [||| 15.2%]   27 [ 0.0%]        41 [|| 3.3%]       55 [|| 0.9%]
14 [|| 4.8%]     28 [ 0.0%]        42 [|| 8.1%]       56 [|| 2.3%]
Mem[||||||||||||||||||||||119G/188G]
Swp[ 0K/191G]
Tasks: 58, 107 thr; 3 running
Load average: 6.79 6.10 5.83
Uptime: 2 days, 18:59:33
Hi all,
After re-installing my Zeek hosts with version 3.0.2 in my home lab, I haven't received any mail from the cron task or from any Zeek-related process/alert. But I do see some emails queued in the /var/zeek/spool/tmp directory, like this:
-rw-r--r--. 1 zeek idps 296 Feb 27 07:30 mail.1493.tmp
With the following content:
From: admin.zeek(a)domain.org
Subject: [Zeek] cron: expire-logs failed
To: myadmin(a)otherdomain.org
User-Agent: ZeekControl 2.0.0
expire-logs failed
expire-logs: directory not found: /var/zeek/logs/stats
creating directory for stats file: /var/zeek/logs/stats
--
[Automatically generated.]
In /var/log/maillog, there is an error when Zeek tries to send any email:
Feb 26 09:20:07 aberdeen postfix/sendmail[21353]: fatal: zeek(994): No recipient addresses found in message header
Feb 26 13:10:07 aberdeen postfix/sendmail[27852]: fatal: zeek(994): No recipient addresses found in message header
Feb 26 13:20:08 aberdeen postfix/sendmail[27999]: fatal: zeek(994): No recipient addresses found in message header
Feb 26 16:40:08 aberdeen postfix/sendmail[718]: fatal: zeek(994): No recipient addresses found in message header
Feb 26 17:00:08 aberdeen postfix/sendmail[1529]: fatal: zeek(994): No recipient addresses found in message header
Feb 26 17:30:07 aberdeen postfix/sendmail[1968]: fatal: zeek(994): No recipient addresses found in message header
Feb 26 17:50:07 aberdeen postfix/sendmail[2261]: fatal: zeek(994): No recipient addresses found in message header
The MailTo = and MailFrom = options contain the values indicated in the mail I have shown.
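That is, the relevant lines in zeekctl.cfg are essentially the following (addresses written out here instead of the obfuscated forms above):

MailFrom = admin.zeek@domain.org
MailTo = myadmin@otherdomain.org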
Is this a bug? In previous versions I didn't have this problem.
Regards,
C. L. Martinez
Dear Zeek Community,
I'm new to Zeek, but I'm now working on a project where I need to solve a problem with anomaly detection on Wi-Fi. Is there any way to detect 802.11-specific frames, such as EAPOL frames?
Thanks in advance,
Karel K.
Recently I got into Zeek and started to play around with BinPAC plugin
development. BinPAC allowed me to pretty easily write a protocol parser
for IKE messages. However, I stumbled upon a problem. As I have already read
on the mailing list, BinPAC is aimed at parsing protocols which run on
top of UDP or TCP. I also read that to parse protocols at lower layers
(say, at the transport layer itself), BinPAC won't be able to help you
anymore. The solution that was proposed in a few messages I read
was to modify the source code of Zeek to support layer 4 protocols other
than TCP, UDP and ICMP.
First and foremost: before posting this message, that's exactly what I
did. My approach was to look at the implementation of ICMP and UDP in
Zeek (which are also layer 4 protocols). Based on this, I tried my best
at writing a protocol analyzer alongside these protocols. However, after
spending a good number of hours trying to write a protocol parser for
ESP messages (protocol number 50), I came to the conclusion that the code
had become quite messy. Most importantly, I didn't get the ESP parser to
work properly. And even if I had gotten it working, the code wouldn't have
been patch-safe against future versions of Zeek.
My issue is as follows: I only want to be able to detect that a protocol
number 50 packet has been seen, parsing at most its very first field.
Is the only way to get this working to take another shot at modifying
the source code, or is there a cleaner/more patch-friendly path to
travel? Even a gentle push in the right direction would be very much
appreciated.
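To make the goal concrete, what I'm after is roughly the level of the sketch below - this uses the script-layer esp_packet event, assuming that event is in fact raised for protocol 50 packets without a dedicated analyzer (I have not verified this):

# A minimal detector: reports whenever an ESP (IP protocol 50) packet is
# seen, without any payload parsing. Only the IPv4 case is handled here.
event esp_packet(p: pkt_hdr)
	{
	if ( p?$ip )
		print fmt("ESP packet seen: %s -> %s", p$ip$src, p$ip$dst);
	}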