Good morning, everyone.
I'm researching compression of Zeek data. I'm currently dumping Zeek data
into Parquet files, and one of the most challenging fields to compress is
uid because of its high entropy.
I'm wondering if there's any interest in changing the format of the uid to
something like ULID <https://github.com/ulid/spec>, for which a C++
implementation <https://github.com/suyash/ulid> already exists.
A ULID-based uid implementation would:
- allow uids to be sorted, which isn't helpful in and of itself but is
  very helpful for compression
- still be URL-safe
- always be 26 characters long, for simpler storage
- be case-insensitive
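
To make the properties above concrete, here is a minimal sketch of the ULID layout as I understand the spec: a 48-bit millisecond timestamp followed by 80 random bits, encoded as 26 Crockford base32 characters. The function names encode_ulid and new_ulid are made up for illustration; this is not the API of the C++ library linked above, nor how Zeek's UID code works today.

```python
import os
import time

# Crockford base32 alphabet from the ULID spec (omits I, L, O, U).
# The characters are in ascending ASCII order, so a plain string sort
# of encoded values matches the numeric order of the underlying bits.
CROCKFORD = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"


def encode_ulid(timestamp_ms: int, randomness: bytes) -> str:
    """Encode a 48-bit timestamp and 80 random bits as 26 characters."""
    if not 0 <= timestamp_ms < 2**48:
        raise ValueError("timestamp must fit in 48 bits")
    if len(randomness) != 10:
        raise ValueError("randomness must be exactly 10 bytes (80 bits)")
    # Timestamp occupies the high bits, so it dominates the sort order.
    value = (timestamp_ms << 80) | int.from_bytes(randomness, "big")
    # 26 characters * 5 bits = 130 bits; the top 2 bits are always zero.
    chars = []
    for _ in range(26):
        chars.append(CROCKFORD[value & 0x1F])
        value >>= 5
    return "".join(reversed(chars))


def new_ulid() -> str:
    """Hypothetical helper: ULID for the current wall-clock time."""
    return encode_ulid(int(time.time() * 1000), os.urandom(10))
```

Because the timestamp sits in the leading characters, ids generated close together in time share a long common prefix, which is the property I would expect to help columnar compression. Ids created within the same millisecond are only ordered by their random part unless the spec's optional monotonic mode is used.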
Looking through the code (UID.h
<https://github.com/bro/bro/blob/master/src/UID.h> and UID.cc
<https://github.com/bro/bro/blob/master/src/UID.cc>) and its usages, the
change doesn't look technically difficult, but I'm sure I'm missing some
reasons not to make it.
For example, I noticed that prefixes such as the letter 'C' are used to
denote kinds of connections. Perhaps that data can be extracted to another
field instead?
Anyway, I'm looking for thoughts, comments, suggestions, and anything else.
Thank you!
--
Karl
Master has code for setting up the cluster framework with time machine
nodes, and is_external_connection is a BIF that determines whether a
connection has been received from an external source. In Broker, however,
I don't see how I would send a packet into the Zeek packet processing
system.
Does such functionality exist? Or was it planned to be added later but
still needs to be implemented?
Thanks,
--Vlad
Just wanted to point out that I was surprised this morning when I
recalled for the first time in about 10 years that the Zeek parser can't
handle multiline strings...
event zeek_init()
{
print "Hello,
World!";
}
That code doesn't work. :)
.Seth
--
Seth Hall * Corelight, Inc * www.corelight.com