On Sat, Jan 8, 2022 at 11:27 AM <ravi48(a)purdue.edu> wrote:
We've set up a Zeek cluster (version 4.1.1) with 8 worker nodes and a manager node
(which is also the logger and the proxy). All nodes are on the same physical rack and
configured to be on the same subnet. We have an issue where the zeekctl cron job
intermittently reports that one (or a few) hosts are down. Within 5 minutes, when the cron
job runs again, we get an email saying that the hosts are back up. There don't seem
to be any notable reasons for this behavior. We've checked everything from the
firewall rules to the connection timeout (which we increased). CPU and memory usage look
fine too. Whenever we run 'zeekctl status' manually, the output shows all nodes as
running, and logs are indeed being generated.
The exact same hardware (and network architecture) had been running Bro (version 2.5.4)
for 2+ years without any issues. While we used to see such alert emails once a month, we
now see them as often as 5 times a day. It would be great if someone could help us
diagnose this issue.
Thanks and Regards
Not sure what is wrong, but as a first step you could add debug=1 to your
zeekctl.cfg. That will make zeekctl create a debug.log in the spool
directory (I believe; it's been a while). The information in that
log may shed some light on why the check is failing.
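A minimal sketch of that change, assuming a standard install where zeekctl.cfg sits in Zeek's etc/ directory (the exact path depends on how Zeek was installed):

```
# zeekctl.cfg
# Turn on verbose zeekctl debugging; output is written to debug.log in the spool directory
debug = 1
```

After a few cron runs have reported hosts down, debug.log should cover those runs, which is where to look for the failing check.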
Justin Azoff | Sr. Staff Engineer