I've found it convenient to use an undocumented feature of SumStats:
changing the epoch. This comes in particularly handy when creating
statistics for human consumption, where it's often useful to synchronize to
a logging interval. For example, if hourly stats are desired, one can use a
shorter epoch for the initial sumstat to align with the hour, then have
subsequent sumstats trigger on the hour.
While researching this, I realized that the epoch can be changed if the
argument to *SumStats::create* is a variable, rather than the usual style
of an anonymous record argument. Then, in *epoch_result* or
*epoch_finished*, the timeout for the next epoch can be recomputed on the
fly using *calc_next_rotate()*.
However, this fails to work as expected, since the next sumstat is
scheduled prior to executing *epoch_result* and *epoch_finished*. What does
work is the following hack:
1. Create the initial sumstat with an epoch that will synchronize to the
logging interval.
2. Immediately change the epoch to the desired interval.
Example:
    event bro_init()
        {
        # So network_time() will be initialized...
        schedule 0 usec { setup_sumstat() };
        }

    event setup_sumstat()
        {
        # ... blah ...
        local mysumstat: SumStats::SumStat;
        mysumstat = [
            $name="mysumstat",
            # calc_next_rotate() already returns the interval until the
            # next rotation boundary, so it can be used directly.
            $epoch=calc_next_rotate(10 min),
            # etc...
            ];
        SumStats::create(mysumstat);
        # Now the SumStat has been created and the initial epoch scheduled;
        # change epoch to the regular interval for the future.
        mysumstat$epoch = 10 min;
        }
It would be convenient if the epoch could be changed in *epoch_result* or
*epoch_finished*, but some internals would need to change: the reschedule
would have to take place after processing results, which could throw the
timing off a bit. On the other hand, unless one is interested in exact
statistics over a known time period (as I am), the small amount of jitter
probably wouldn't be noticeable or significant.
The above is horribly hackish; a different approach for accomplishing the
goal would be to allow user scripts to schedule the end of the epoch:
1. Mark *epoch* as *&optional*.
2. Expose and document *SumStats::finish_epoch* as part of the public API.
3. Make the minor changes needed to not schedule *SumStats::finish_epoch*
if *epoch* is undefined.
By not defining *epoch*, a script would indicate that it will manage epoch
timing itself. The script would schedule the first epoch based on the
logging interval, and in the *epoch_finished* function schedule each
successive epoch to stay in sync with the logging interval.
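Under that proposal, usage might look like the following sketch. (This is
hypothetical: it assumes *$epoch* may be left unset and that
*SumStats::finish_epoch* is public and takes the SumStat record, neither of
which is true of the current API.)

    global mysumstat: SumStats::SumStat;

    event do_epoch()
        {
        SumStats::finish_epoch(mysumstat);
        # Reschedule each epoch to stay aligned with the logging interval.
        schedule calc_next_rotate(10 min) { do_epoch() };
        }

    event setup_sumstat()
        {
        mysumstat = [ $name="mysumstat" ];  # note: no $epoch
        SumStats::create(mysumstat);
        schedule calc_next_rotate(10 min) { do_epoch() };
        }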
Any comments, suggestions, etc.?
Jim
That could get very messy in the real world. How about start of first gap, length of first gap, total number of gaps?
From: anthony kasza <anthony.kasza@gmail.com>
Date: Wednesday, Apr 10, 2019, 12:18 AM
To: Jim Mellander <jmellander@lbl.gov>
Cc: zeek-dev@zeek.org, Vern Paxson <vern@corelight.com>
Subject: [EXT] Re: [Zeek-Dev] connection $history - 'g' for gap
I like the idea of logging gap ranges for a connection. Could a vector be used to store gap start and gap stop offsets?
-AK
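One way to represent that could be a record along these lines (a sketch;
the type and field names are invented, and a vector like this could grow
large on very lossy connections):

    type GapRange: record {
        start_offset: count;   # byte offset where the gap began
        stop_offset:  count;   # byte offset where data resumed
    };

    redef record Conn::Info += {
        content_gaps: vector of GapRange &log &optional;
    };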
On Tue, Apr 9, 2019, 11:01 Jim Mellander <jmellander@lbl.gov> wrote:
Thanks. I was thinking of something a bit different: the total amount of the content gap is useful, but in some cases it might also be useful to know where the content gaps occurred - whether in the head of the connection, where they likely impact protocol analysis, or in a long tail, where they probably don't affect analysis.
Perhaps some tunable setting indicating that "I only care about content gaps in the first 10K (or whatever) of the connection" could address that...
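If implemented, such a knob might look like this (purely hypothetical; the
name is invented and no such option exists today):

    # Only flag content gaps that begin within the first N bytes of the
    # connection; gaps in a long tail would be ignored.
    const gap_report_head_limit: count = 10240 &redef;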
On Tue, Apr 9, 2019 at 9:36 AM Justin Azoff <justin@corelight.com> wrote:
On Mon, Apr 8, 2019 at 8:13 PM Jim Mellander <jmellander@lbl.gov> wrote:
It might be valuable to have some (optional) way of accessing the byte counts constituting the content gap(s). If the content gap is somewhere in a long tail but DPD still fails, then the explanation could be something other than a content gap.
On the other hand, maybe you're just thinking about content gaps at the head of a connection before it has been fully analyzed.
This is the missed_bytes field:
missed_bytes: count &log &default = 0 &optional
Indicates the number of bytes missed in content gaps, which is representative of packet loss. A value other than zero will normally cause protocol analysis to fail but some analysis may have been completed prior to the packet loss.
--
Justin
I'm finding it would be handy to be able to glance at a connection log line
and know that the analysis for the connection experienced a content gap.
For example, this can immediately explain why DPD failed to identify a
known server.
Proposal: add 'g'/'G' connection history values, scaled in the same
exponential way as for 'c', 't' and 'w'.
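For reference, those existing letters repeat each time a per-connection
counter crosses a power of ten (1, 10, 100, ...), with upper/lower case
distinguishing originator from responder. A sketch of that threshold logic
(my own illustration; the real implementation lives in Zeek's C++ core):

    # Returns true when the updated count n has just reached a new
    # power-of-ten threshold, i.e. when the history letter should be
    # appended again.
    function hit_threshold(n: count): bool
        {
        local x = n;
        while ( x > 1 && x % 10 == 0 )
            x = x / 10;
        return x == 1;
        }

A 'g'/'G' letter would presumably be appended under the same rule, driven
by a per-connection gap counter.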
Any thoughts/objections before I go ahead and implement this?
Vern
Hello All,
Is there a way to add a Bro server hostname field to all the Bro log types? We have 5 Bro servers capturing traffic on different network nodes, and we are trying to add each server/sensor hostname to all the log types so analysts can identify where the logs are coming from.
v/r
Jawad Rajput
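One approach that may fit here is the logging framework's extension
mechanism (*Log::default_ext_func*, added in Bro 2.6), which appends extra
fields to every log stream. A sketch, with invented record/field names,
assuming each sensor exports its name in an environment variable:

    type SensorFields: record {
        sensor: string &log;
    };

    function add_sensor_field(path: string): SensorFields
        {
        # SENSOR_NAME is an assumed per-sensor environment variable.
        return SensorFields($sensor=getenv("SENSOR_NAME"));
        }

    redef Log::default_ext_func = add_sensor_field;

Note that extension fields are prefixed with *Log::default_ext_prefix*
("_" by default), so the column would appear as "_sensor" in the logs.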