I have been thinking and trying different things, but for now it appears that if we want to share policies around, there is no easy way to distribute input files along with policy files.
Basically, right now I use
redef Scan::whitelist_ip_file = "/usr/local/bro/feeds/ip-whitelist.scan" ;
and then expect everyone to edit the path as their setup demands and to place the accompanying sample file in that directory, or create one for themselves. This introduces errors as well as slowing down deployment.
Is there a way I can use relative paths instead of absolute paths for input-framework digestion? At present, a new-heuristics dir can have a __load__.bro that loads all its policies, but the input framework won't read files relative to that directory or to wherever it is placed, e.g.:
redef Scan::whitelist_ip_file = "../feeds/ip-whitelist.scan" ;
That is, something similar to the __load__.bro model.
Also, one question I have: should all input files go into a 'standard' feeds/input directory in Bro, or be scattered around alongside their accompanying Bro policies (i.e., in individual directories)?
Something to think about: with more and more reliance on the input framework, I think there is a need for 'standardization' of where input files go and how to easily find and read them.
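One possible workaround, sketched below and untested in this exact layout: Bro's `@DIR` directive expands to the directory containing the script being parsed, so a feed path can be built relative to the package rather than hard-coded. (The `new-heuristics/feeds` layout here is hypothetical.)

```bro
# Hypothetical layout:
#   new-heuristics/__load__.bro   (loads this script)
#   new-heuristics/feeds/ip-whitelist.scan
#
# @DIR expands to the directory of the current script at parse time,
# so the redef no longer depends on the install prefix.
redef Scan::whitelist_ip_file = @DIR + "/feeds/ip-whitelist.scan";
```

This keeps the feed file shippable alongside the policy scripts, though it doesn't answer the broader standardization question.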
I tried doing that and then merging it with an existing (initialized) Bloom filter on a worker.
I see this error:
1493427133.170419 Reporter::INFO calling inside the m_w_add_bloom worker-1 -
1493427133.170419 Reporter::ERROR incompatible hashers in BasicBloomFilter merge (empty) -
1493427133.170419 Reporter::ERROR failed to merge Bloom filter (empty) -
1493427115.582247 Reporter::INFO calling inside the m_w_add_bloom worker-6 -
1493427115.582247 Reporter::ERROR incompatible hashers in BasicBloomFilter merge (empty) -
1493427115.582247 Reporter::ERROR failed to merge Bloom filter (empty) -
1493427116.358858 Reporter::INFO calling inside the m_w_add_bloom worker-20 -
1493427116.358858 Reporter::ERROR incompatible hashers in BasicBloomFilter merge (empty) -
1493427116.358858 Reporter::ERROR failed to merge Bloom filter (empty) -
1493427115.935649 Reporter::INFO calling inside the m_w_add_bloom worker-7 -
1493427115.935649 Reporter::ERROR incompatible hashers in BasicBloomFilter merge (empty) -
1493427115.935649 Reporter::ERROR failed to merge Bloom filter (empty) -
1493427115.686241 Reporter::INFO calling inside the m_w_add_bloom worker-16 -
1493427115.686241 Reporter::ERROR incompatible hashers in BasicBloomFilter merge (empty) -
1493427115.686241 Reporter::ERROR failed to merge Bloom filter (empty) -
I'm not sure if the error is because an opaque of bloomfilter cannot be sent over worker2manager_events and manager2worker_events, or if I am doing something else not quite right.
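For what it's worth, the "incompatible hashers" message usually means the two filters were created with different hash seeds, which happens when each node initializes its filter independently without a shared name or a fixed global_hash_seed. A minimal sketch of one way to make the instances mergeable (the name "scan-bf" and the parameters are placeholders):

```bro
# Sketch: give every node's filter the same explicit name so all
# instances derive the same hasher seed and can be merged.
# (Alternatively, redef global_hash_seed to a fixed string.)
global bf: opaque of bloomfilter;

event bro_init()
	{
	# Same false-positive rate, capacity, and name on every node.
	bf = bloomfilter_basic_init(0.001, 1000000, "scan-bf");
	}
```

If the seeds already match, then the serialization of the opaque over the cluster events would be the next thing to rule out.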
I am writing a TCP application analyzer that depends on packet order to build a full PDU across many TCP packets. Occasionally I receive a packet out of order in my analyzer's DeliverStream function.
Is there a way to ensure I am getting packets in order? Or, any advice on debugging the reassembly?
I recently got a minimal CAF-based run loop for Bro working, did crude performance comparisons, and wanted to share.
The approach was to measure the average time between calls to net_packet_dispatch() and also the average time it takes to analyze a packet. The former attempts to measure the overhead imposed by the loop implementation, and the latter gives an idea of how significant a chunk of time that is relative to Bro’s main workload. I found that the overhead of the loop can be ~5-10% of the packet processing time, so it does seem worthwhile to keep the run loop overhead low.
Initial testing of the CAF-based loop showed the overhead increased by ~1.8x, but there was still a major difference between the implementations: the standard Bro loop only invokes its IOSource polling mechanism (select) once every 25 cycles of the loop, while the CAF implementation’s polling mechanism (actor/thread scheduling + messaging + epoll) runs on every cycle/packet. As one would expect, just trivially spinning the main process() function in a loop for 25 iterations brings the overhead of the CAF-based loop back into line with the standard run loop.
To better measure the actual differences between polling mechanism implementations, I quickly hacked Bro’s standard run loop to select() on every packet instead of every 25th, and found that its overhead comes within ±10% of the 1.8x overhead increase of the initial CAF-based loop. So is the cost of the extra system call for epoll/select per packet the main thing to avoid? Sort of. I again hacked Bro’s standard loop to use either epoll or poll instead of select and found that those do better: the overhead increase is about 1.3x (still doing one “poll” per packet) relative to the standard run loop. So there is a measurable trend in polling mechanism performance (for a sparse number of FDs/sources): poll comes in first, epoll second, with CAF and select about tied for third.
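The per-call cost difference between poll and select is easy to reproduce outside of Bro. This is not the C++ harness used for the numbers below, just a rough standalone Python sketch that times one no-op poll()/select() call on an idle pipe:

```python
import os
import select
import time

def avg_ns_per_call(fn, n=10000):
    """Average nanoseconds per call of fn()."""
    start = time.perf_counter_ns()
    for _ in range(n):
        fn()
    return (time.perf_counter_ns() - start) / n

r, w = os.pipe()

# poll(2): register the read end once, then poll with a 0ms timeout.
p = select.poll()
p.register(r, select.POLLIN)
poll_ns = avg_ns_per_call(lambda: p.poll(0))

# select(2): rebuilds the fd set on every call, like Bro's standard loop.
select_ns = avg_ns_per_call(lambda: select.select([r], [], [], 0))

print(f"poll:   {poll_ns:.0f} ns/call")
print(f"select: {select_ns:.0f} ns/call")

os.close(r)
os.close(w)
```

The absolute numbers vary by system and interpreter overhead, but the shape of the comparison (one "poll" per iteration vs. the cost of the surrounding work) is the same as in the measurements below.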
(1) Regardless of run loop implementation or polling mechanism choice, performing the polling operation once per packet should probably be avoided. In principle, it’s an easy way to get a 2-5% speedup relative to total packet processing time.
(2) Related to (1), though not in the sense of performance: even with a CAF-based loop it still seems somewhat difficult to reason about how IOSources are actually prioritized. In the standard loop, the priority of an IOSource is a combination of its “idle” state, the polling frequency, and a timestamp, which it often chooses arbitrarily as the “time of last packet” just so that it gets processed with higher priority than subsequent packets. Making IOSource prioritization more explicit/well-defined could be another thread of discussion, but my initial thought is that the whole IOSource abstraction may be over-generalized and maybe not even needed.
(3) The performance overhead of a CAF-based loop doesn’t seem like a showstopper for proceeding with it as a replacement for the current loop. It’s not significantly worse than the current loop (provided we still throttle the polling ratio when packet sources are saturated), and even the most minimal loop implementation of just poll() would only be about a 1% speedup relative to the total packet processing workload.
Just raw data below, for those interested:
I tested against the pcaps from http://tcpreplay.appneta.com/wiki/captures.html
(I was initially going to use tcpreplay to test performance against a live interface, but decided reading from a file is easier and just as good for what I wanted to measure).
Numbers are measured in “ticks”, which are equivalent to nanoseconds on the test system.
Bro and CAF are both compiled w/ optimizations.
bigFlows.pcap, 1 “poll” per packet
('avg overhead', 1018.8868239999998)
('avg process', 11664.4968147)
('avg overhead', 1114.2168096999999)
('avg process', 11680.6078816)
('avg overhead', 1515.9933343999996)
('avg process', 11914.897109200003)
('avg overhead', 1792.8142910999995)
('avg process', 11863.308550400001)
bigFlows.pcap, Polling Throttled to 1 per 25 packets
('avg overhead', 772.6118347999999)
('avg process', 11504.2397625)
('avg overhead', 814.4771509)
('avg process', 11547.058394900001)
('avg overhead', 847.6571822)
('avg process', 11681.377972700002)
('avg overhead', 855.2147494000001)
('avg process', 11585.1111236)
smallFlows.pcap, 1 “poll” per packet
('avg overhead', 1403.8950280800004)
('avg process', 22202.960570839998)
('avg overhead', 1470.0554376)
('avg process', 22210.3240474)
('avg overhead', 2305.6278429200006)
('avg process', 22549.29251384)
('avg overhead', 2405.1401093399995)
('avg process', 23401.66596454)
smallFlows.pcap, Polling Throttled to 1 per 25 packets
('avg overhead', 1156.0900352)
('avg process', 22113.8645395)
('avg overhead', 1192.37176)
('avg process', 22000.2246757)
('avg overhead', 1269.0761219)
('avg process', 22017.891367999997)
('avg overhead', 1441.6064868)
('avg process', 22658.534969599998)
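The headline figures above can be recomputed directly from the first bigFlows samples, e.g.:

```python
# First-run samples from the raw data above (ticks = nanoseconds).
per_packet = {"overhead": 1018.8868239999998, "process": 11664.4968147}
throttled  = {"overhead": 772.6118347999999,  "process": 11504.2397625}

# Loop overhead as a fraction of packet-processing work (the ~5-10% claim).
frac = per_packet["overhead"] / per_packet["process"]
print(f"overhead/process: {frac:.1%}")  # ~8.7%

# Cost of polling on every packet vs. once per 25 packets.
ratio = per_packet["overhead"] / throttled["overhead"]
print(f"per-packet vs throttled overhead: {ratio:.2f}x")  # ~1.32x
```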
I'm casting around for thoughts on making the ConfigurePackaging cmake packaging mechanism available to plugins without having to replicate the cmake script from the main cmake repository or make near-clones of it. Is there some way a plugin could use that script from the main Bro repository without needing to include a copy of it?
I would like to ask how to compile and install a Bro plugin at the same time as the rest of Bro is compiled/installed. I would like to enable the redis plugin (but only this one), so I copied BroPluginStatic.cmake into its cmake dir, but I am not sure how to call it from the main CMake.
For example, when I added
"CheckOptionalBuildSources(aux/plugins/redis Redis true)"
to the top-level CMakeLists.txt, it was not finding e.g. plugin/Plugin.h or logging/WriterBackend.h.
I can somehow hack it in, in the way the dynamic cmake setup suggests, but I would like to know if there is a neater way to enable it.
Thanks for any help on this,