I have been thinking about and trying different things, but for now it appears that if we want to share policies around, there is no easy way to distribute input files along with policy files.
Basically, right now I use
redef Scan::whitelist_ip_file = "/usr/local/bro/feeds/ip-whitelist.scan" ;
and then expect everyone to edit the path as their setup demands and to place the accompanying sample file in that directory (or create one for themselves). This introduces errors and slows down deployment.
Is there a way I can use relative paths instead of absolute paths for input-framework ingestion? At present a new-heuristics dir can have a __load__.bro with all the policies, but the input framework won't read files relative to that directory (or wherever it is placed).
redef Scan::whitelist_ip_file = "../feeds/ip-whitelist.scan" ;
Something similar to the __load__.bro model would be ideal.
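For illustration, the kind of thing I am after would look roughly like this (just a sketch; it assumes the @DIR directive, which expands to the directory of the current script, can be used for this purpose):

# Hypothetical: resolve the feed relative to the directory of the loading script.
redef Scan::whitelist_ip_file = @DIR + "/../feeds/ip-whitelist.scan" ;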
Also, one question I have: should all input files go into a 'standard' feeds/input dir in Bro, or be scattered around along with their accompanying Bro policies (i.e. in individual directories)?
Something to think about: with more and more reliance on the input framework, I think there is a need for 'standardization' of where to put input files and how to easily find and read them.
Aashish
The package manager client is at a point now where I think it would be usable. Documentation is here:
https://bro.github.io/package-manager/
There is a branch in the ‘bro’ repo called ‘package-manager’ that simply changes CMake scripts to install ‘bro-pkg’ along with bro. Here’s an example usage/session:
$ git clone --recursive --branch=package-manager git://bro.org/bro
...
$ cd bro && ./configure && make install
...
$ /usr/local/bro/bin/bro-pkg list all
default/jsiwek/bro-test-package
$ /usr/local/bro/bin/bro-pkg install bro-test-package
installed "bro-test-package"
loaded "bro-test-package"
$ /usr/local/bro/bin/bro packages
loaded bro-test-package plugin
loaded bro-test-package scripts
$ /usr/local/bro/bin/broctl
Test package: initialized
…
That test package shows that bro-pkg was able to install a package containing Bro scripts, a Bro plugin, and a BroControl plugin, and everything should “just work” without needing any configuration.
Roadmap/TODO/Questions:
* Add a way for packages to define “discoverability metadata”.
E.g., the original plan for this would involve putting something like a “tags” field in each package’s pkg.meta file, but the problem with that is the client would need to either download every package to be able to search this data or have a third party periodically aggregate it.
My current idea is that, instead of putting this type of data inside the package’s own metadata, the user puts it in the package source’s metadata. They do this on first registration and may update it whenever they like. That way bro-pkg always has access to the latest discoverability metadata, with no need for a separate aggregation process. It’s also something that will rarely change, so it’s not a problem for that data to live in a repo not owned by the package author, and not much of an increased burden for the Bro Team to accept pull requests to update it. Thoughts?
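To make that a bit more concrete, a purely hypothetical sketch of what such an entry in a package source’s metadata could look like (field names, URL, and layout are illustrative only, not a proposed final format):

[jsiwek/bro-test-package]
url = https://github.com/jsiwek/bro-test-package
description = Test package with Bro scripts, a Bro plugin, and a BroControl plugin.
tags = example, testing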
* Automatic inter-package dependency analysis
Simply a TODO. I put it at lower priority since I don’t think it will be common right off the bat to have complex package dependencies, and users can always resolve dependencies manually for now.
* Is it acceptable to depend on GitPython and semantic_version python packages?
Both are replaceable implementation details; I just didn’t want to write something myself if it wasn’t necessary, in the interest of time.
* Documentation is hosted on GitHub at the moment, move to bro.org?
Mostly just on GitHub now to be able to show something without having to touch any of the master bro/www doc generation processes, but maybe it’s a nice thing to start keeping docs more compartmentalized? The current doc/www setup feels like it’s getting rather large/monolithic, and maybe that contributes to the difficulty of approaching/understanding it when there are breakages. Just an idea.
* Thoughts on when to merge ‘package-manager’ branch in ‘bro’ ?
IMO, it can be done now or soon after I address responses/feedback to this email.
- Jon
I took a closer look at scan-NG and at the scan.bro that shipped with 1.5 to understand how the detection could be better than what we have now. 1.5 wasn't fundamentally better, but compared to what we are doing now it has an unfair advantage :-)
I found that it used tables like this:
global distinct_ports: table[addr] of set[port]
&read_expire = 15 mins &expire_func=port_summary &redef;
Not only is it using a default timeout of 15 minutes vs 5 minutes, it is using read_expire. This means that an attacker can send one packet every 14 minutes 25 times and still be tracked.
Meaning scan.bro as shipped with 1.5 can pick up slow scans over as much as a 6 hour period.
The sumstats based scan.bro can only detect scans that fit in the fixed time window (it is effectively using create_expire, but as Aashish points out, limited even further since the 'creation time' is a fixed interval regardless of when the attacker is first seen)
The tracking that 1.5 scan.bro has isn't doing anything inherently better than what we have now, it's just doing it over a much longer period of time. The actual detection it uses has the same limitations the current sumstats based scan.bro has: it does not detect fully randomized port scans. It would benefit from the same "unification" changes.
Since fixing sumstats and adding new functionality to solve this problem in a generic way is a huge undertaking, I tried instead to just have scan.bro do everything itself. We may not be able to easily fix sumstats, but I think we can easily fix scan.bro by making it not use sumstats.
To see if this was even viable or a waste of time I wrote the script: it works. It sends new scan attempts to the manager and stores them in a similar '&read_expire = 15 mins' table. This should detect everything that the 1.5 based version did, plus all the fully random scans that were previously missed. And with the simpler unified data structure and capped set sizes it will use almost zero resources.
Attached is the code I just threw on our dev cluster. It's the implementation of "What is the absolute simplest thing that could possibly work". It uses 1 event and 2 tables, one for the workers and one for the manager.
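To give a feel for the shape of it, here is a stripped-down sketch of that structure (not the attached script itself; names are illustrative and the worker-side table plus the cluster event-forwarding boilerplate are omitted):

module Scan;

export {
    # Illustrative threshold; the real script makes this configurable.
    const scan_threshold = 25 &redef;
}

# Manager-side table: one entry per scanner holding "victim/port" strings,
# kept alive as long as the scanner stays active (read_expire, not create_expire).
global attempts: table[addr] of set[string] &read_expire = 15 mins;

# Workers raise this for each new attempt they see.
global scan_attempt: event(scanner: addr, victim: addr, p: port);

event scan_attempt(scanner: addr, victim: addr, p: port)
    {
    # Runs on the manager.
    if ( scanner !in attempts )
        attempts[scanner] = set();

    # Cap the per-scanner set so it cannot grow unbounded.
    if ( |attempts[scanner]| <= scan_threshold + 2 )
        add attempts[scanner][fmt("%s/%s", victim, p)];

    # Threshold check and NOTICE generation would go here.
    }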
What does this look like from a CPU standpoint?
[attached graph: worker CPU usage over the course of the experiments described below]
This graph shows a number of experiments.
* The first block around 70% is the unified sumstats based scan.bro plus hacked up sumstats/cluster.bro to do data transfer more efficiently
* The next block at 40% was the unified scan.bro hacked up to make the manager do all the sumstats (worked, but had issues)
* The small spike upwards back to 70% was a return to the unified scan.bro that is in git with the threshold changed back to 25
* The spike up to 170-200% was a return to stock sumstats/cluster.bro. This is what 2.5 would be with sumstats based scan.bro
* The drop back down to 40% is the switch to the attached scan.bro that does not use sumstats at all.
The 'duration' is TODO in the notices, but otherwise everything works. I want to just get the start time directly from the time information in the table... I'm not sure if Bro exposes it or even stores it in a usable way. If there's no way to get it out of the table, I just need to track when an attacker is first seen separately, but that is easy enough to do.
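If Bro doesn't expose the table's internal timing, the separate tracking is only a few more lines on the manager, e.g. (a sketch, reusing the hypothetical scan_attempt event from the fragment above):

global first_seen: table[addr] of time &read_expire = 15 mins;

event scan_attempt(scanner: addr, victim: addr, p: port)
    {
    if ( scanner !in first_seen )
        first_seen[scanner] = network_time();
    # The notice's duration is then simply:
    #   network_time() - first_seen[scanner]
    }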
--
- Justin Azoff
Hi Daniel,
Are there any specific node.cfg or broctl.cfg settings needed to run the logging node? Could you please point me to the right locations.
Thanks,
Aashish
I'm in the process of documenting Broker with Sphinx. With minimal
effort, I put up a scaffold that looks like this:
http://bro.github.io/broker/
It's the bootstrap theme for sphinx, as an alternative to the classic
read-the-docs theme. I've hacked the sidebar so that it shows the table
of contents.
The nice thing about this setup is that it doesn't require any
server-side support. I just type `make doc` locally and can open the
HTML pages. I pushed the above to broker's gh-pages branch so that you
can view it under the above github.io URL.
Search is also implemented via JS and works great.
Sphinx also has a plugin to generate a PDF version of the manual,
which I've put here: http://docdro.id/rHNvn1X.
Don't look too much at the content, I'm just getting started. But the
whole setup looks really simple and could be a good starting point for
the next Bro documentation overhaul.
Matthias
TL;DR:
- Does anyone use Broker's RocksDB backend?
- Brief overview of the revamped data store frontend API
I've been working on the Broker data store API a bit, trying to come up
with the smallest common denominator possible for an initial release. So far
I have ported the in-memory and SQLite backends over. This made me wonder: did
anyone ever use (or want to use) the RocksDB backend in production? I wonder
if we can keep it out for Bro 2.5.
Regarding the API, here's a snippet that illustrates the user-facing
parts:
// Setup an endpoint.
context ctx;
auto ep = ctx.spawn<blocking>();
// Attach a master datastore with backend. The semantics of
// "attaching" are open-or-create: if a master exists under the
// given name, use it, otherwise create it.
backend_options opts;
opts["path"] = "/tmp/test.db";
auto ds = ep.attach<master, sqlite>("foo", std::move(opts));
if (!ds)
std::terminate();
// Perform some asynchronous operations.
ds->put("foo", 4.2);
ds->put(42, set{"x", "y", "z"});
ds->remove(42, "z"); // data at key 42 is now {"x", "y"}
ds->increment("foo", 1.7); // data at key "foo" is now 5.7
// Add a value that expires after 10 seconds.
ds->put("bar", 4.2, time::now() + std::chrono::seconds(10));
// Get data in a blocking fashion.
auto x = ds->get<blocking>("foo"); // Equivalent to: get("foo"), the
// blocking API is the default.
// Get data in a non-blocking fashion. The function then() returns
// immediately and one MUST NOT capture any variables on the stack by
// reference in the callback. The runtime invokes the callback as soon
// as the result has arrived.
ds->get<nonblocking>("foo").then(
  [=](const data& d) {
    cout << "data at key 'foo': " << d << endl;
  },
  [=](const error& e) {
    if (e == ec::no_such_key)
      cout << "no such key: foo" << endl;
  }
);
Here's another setup with two peering endpoints, one having a master and
one a clone (directly taken from the unit tests). This illustrates how
data stores and peering go hand in hand.
context ctx;
auto ep0 = ctx.spawn<blocking>();
auto ep1 = ctx.spawn<blocking>();
ep0.peer(ep1);
auto m = ep0.attach<master, memory>("flaka");
auto c = ep1.attach<clone>("flaka");
REQUIRE(m);
REQUIRE(c);
c->put("foo", 4.2);
std::this_thread::sleep_for(propagation_delay); // master -> clone
auto v = c->get("foo");
REQUIRE(v);
CHECK_EQUAL(v, data{4.2});
c->decrement("foo", 0.2);
std::this_thread::sleep_for(propagation_delay); // master -> clone
v = c->get("foo");
REQUIRE(v);
CHECK_EQUAL(v, data{4.0});
I think this API covers the most common use cases. It's always easy to
add functionality later, so my goal is to find the smallest common
denominator.
Matthias
Hi,
I'm having problems with IP-in-IP tunneled traffic that contains an
Ethernet frame check sequence (FCS).
1) Bro seems to attribute the FCS to the length of the outer IP packet
and then complains that the inner IP packet is too small compared to the
capture length (in weird.log: "inner_IP_payload_length_mismatch")
Then I thought it would be OK to simply drop the corresponding check in
Sessions.cc: ParseIPPacket(), because too much content shouldn't "hurt".
- if ( (uint32)caplen != inner->TotalLen() )
- return (uint32)caplen < inner->TotalLen() ? -1 : 1;
+ if ( (uint32)caplen < inner->TotalLen() )
+ return -1;
Would that be ok in your opinion? If not, what would be a better way to
deal with this?
2) With the above patch applied, Bro correctly sees the inner traffic,
but from time to time it segfaults (roughly every other day). So far I
have figured out the following, but cannot really see what's
going wrong:
a) Bro always crashes at a tunneled TCP packet with an active reset flag.
b) I see very few such packets (it might be that the crashing one
is the only one within quite some time before the crash; I don't have all
the traffic available).
c) I cannot reproduce the problem by simply running Bro on a pcap
file containing the offending packet (and ~100 MB of traffic before the
crash); even valgrind doesn't report anything useful.
From the stack trace of the core file (cf. below) it looks as if
PacketWithRST() somehow triggered the destructor of (my own) SIP plugin.
However, I have no idea how that could happen.
Could you help me with this problem?
Thanks,
Dirk
#0 std::_List_base<plugin::BifItem, std::allocator<plugin::BifItem>
>::_M_clear (this=this@entry=0x2f373b0) at
/usr/include/c++/4.7/bits/list.tcc:74
#1 0x00000000006a0ade in ~_List_base (this=0x2f373b0,
__in_chrg=<optimized out>) at /usr/include/c++/4.7/bits/stl_list.h:379
#2 ~list (this=0x2f373b0, __in_chrg=<optimized out>) at
/usr/include/c++/4.7/bits/stl_list.h:436
#3 plugin::Plugin::~Plugin (this=0x2f37360, __in_chrg=<optimized out>)
at bro/src/plugin/Plugin.cc:136
#4 0x00007f1fa7d2ef77 in ~Plugin (this=0x2f37360, __in_chrg=<optimized
out>) at sip/src/Plugin.cc:8
#5 plugin::Consistec_SIP::Plugin::~Plugin (this=0x2f37360,
__in_chrg=<optimized out>) at sip/src/Plugin.cc:8
#6 0x000000000079d4bd in PacketWithRST (this=0x3482680) at
bro/src/analyzer/protocol/tcp/TCP.cc:1810
#7 analyzer::tcp::TCP_Analyzer::DeliverPacket (this=0x3482680, len=0,
data=0x7f1fa16f9aca <Address 0x7f1fa16f9aca out of bounds>,
is_orig=false, seq=<optimized out>, ip=0x34e05c0, caplen=0)
at bro/src/analyzer/protocol/tcp/TCP.cc:1280
#8 0x0000000000807a6a in analyzer::Analyzer::NextPacket (this=0x3482680,
len=20, data=<optimized out>, is_orig=<optimized out>, seq=<optimized
out>, ip=<optimized out>, caplen=20)
at bro/src/analyzer/Analyzer.cc:222
#9 0x000000000055ecee in Connection::NextPacket (this=0x2f48c00,
t=<optimized out>, is_orig=<optimized out>, ip=<optimized out>,
len=<optimized out>, caplen=<optimized out>, data=<optimized out>,
record_packet=@0x7ffc33d50898: 1,
record_content=@0x7ffc33d5089c: 1, hdr=0x7ffc33d50b10,
pkt=0x7f1fa16f9aa2 <Address 0x7f1fa16f9aa2 out of bounds>, hdr_size=0)
at bro/src/Conn.cc:260
#10 0x00000000005f819a in NetSessions::DoNextPacket
(this=this@entry=0xf25000, t=1468916092.7505391, t@entry=<error reading
variable: Could not find type for DW_OP_GNU_const_type>,
hdr=hdr@entry=0x7ffc33d50b10,
ip_hdr=ip_hdr@entry=0x34e05c0, pkt=pkt@entry=0x7f1fa16f9aa2 <Address
0x7f1fa16f9aa2 out of bounds>, hdr_size=hdr_size@entry=0,
encapsulation=0x0, encapsulation@entry=0x34b3138)
at bro/src/Sessions.cc:757
#11 0x00000000005f91a4 in NetSessions::DoNextInnerPacket (this=0xf25000,
t=1468916092.7505391, hdr=<optimized out>, inner=0x34e05c0,
prev=<optimized out>, ec=...)
at bro/src/Sessions.cc:805
#12 0x00000000005f88ca in NetSessions::DoNextPacket
(this=this@entry=0xf25000, t=1468916092.7505391, t@entry=<error reading
variable: Could not find type for DW_OP_GNU_const_type>,
hdr=hdr@entry=0xf762a0, ip_hdr=<optimized out>,
ip_hdr@entry=0x7ffc33d50e60, pkt=pkt@entry=0x7f1fa16f9a80 <Address
0x7f1fa16f9a80 out of bounds>, hdr_size=hdr_size@entry=14,
encapsulation=encapsulation@entry=0x0)
at bro/src/Sessions.cc:665
#13 0x00000000005f96d6 in NetSessions::NextPacket (this=0xf25000,
t=1468916092.7505391, hdr=0xf762a0, pkt=0x7f1fa16f9a80 <Address
0x7f1fa16f9a80 out of bounds>, hdr_size=14)
at bro/src/Sessions.cc:231
#14 0x00000000005c8048 in net_packet_dispatch (t=1468916092.7505391,
hdr=0xf762a0, pkt=0x7f1fa16f9a80 <Address 0x7f1fa16f9a80 out of bounds>,
hdr_size=14, src_ps=0xf76160)
at bro/src/Net.cc:277
--
Dr.-Ing. Dirk Leinenbach - Leitung Softwareentwicklung
consistec Engineering & Consulting GmbH
------------------------------------------------------------------
Europaallee 5 Fon: +49 (0)681 / 959044-0
D-66113 Saarbrücken Fax: +49 (0)681 / 959044-11
http://www.consistec.de e-mail: dirk.leinenbach(a)consistec.de
Registergericht: Amtsgericht Saarbrücken
Registerblatt: HRB12003
Geschäftsführer: Dr. Thomas Sinnwell, Volker Leiendecker, Stefan Sinnwell
As part of the sumstats things I've been looking into I tried refactoring scan.bro to put less load on sumstats.
The refactored script is at https://gist.github.com/JustinAzoff/fe68223da6f81319d3389c605b8dfb99
It is.. amazing! The unified code is simpler, uses less memory, puts less load on sumstats, generates nicer notice messages, and detects attackers scanning across multiple victims AND ports.
Details:
The current scan.bro maintains two sumstats streams, keyed by [attacker, port] and [attacker, victim].
When an attacker attempts to connect to a victim on port 22, sumstats effectively creates:
an [attacker 22] key with data containing [victim]
an [attacker victim] key with data containing [22]
It does this so it can figure out if an attacker is scanning lots of victims on one port, or lots of ports on one victim.
When an attacker does the equivalent of 'nmap -p 22 your/16', sumstats ends up with 65536 extra [attacker victim] keys. This kills the sumstats :-)
My refactored version simply creates:
an [attacker] key containing [victim/22, othervictim/22, ...]
This means that no matter how many hosts or ports attacker scans, there will only ever be one key.
Additionally, since the reducer is configured as
... $apply=set(SumStats::UNIQUE), $unique_max=double_to_count(scan_threshold+2)
the data the key references cannot grow unbounded, so a full /16 port scan can only ever create one key and scan_threshold+2 values per worker process. This is a huge reduction in the amount of data stored.
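Roughly, the observation and reducer side of the unified version look like this (a simplified sketch, not the gist verbatim; stream and variable names are illustrative, and the "failed connection" test is reduced to a bare connection_attempt handler):

# One sumstats key per attacker; each observation is a "victim/port" string,
# and UNIQUE (capped by unique_max) counts and retains them.
const scan_threshold = 100.0 &redef;   # illustrative

event connection_attempt(c: connection)
    {
    SumStats::observe("scan.fail",
                      [$host=c$id$orig_h],
                      [$str=fmt("%s/%s", c$id$resp_h, c$id$resp_p)]);
    }

event bro_init()
    {
    local r = SumStats::Reducer($stream="scan.fail",
                                $apply=set(SumStats::UNIQUE),
                                $unique_max=double_to_count(scan_threshold+2));
    SumStats::create([$name="scan",
                      $epoch=5 mins,
                      $reducers=set(r),
                      $threshold=scan_threshold,
                      $threshold_val(key: SumStats::Key, result: SumStats::Result) =
                          {
                          return result["scan.fail"]$unique + 0.0;
                          },
                      $threshold_crossed(key: SumStats::Key, result: SumStats::Result) =
                          {
                          # Walk result["scan.fail"]$unique_vals here to summarize
                          # which hosts/ports were scanned for the notice.
                          }]);
    }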
The downside of this was that the notices were effectively "attacker scanned... something!", but I realized I could analyze all the victim/port strings in unique_vals and figure out what was scanned. With that in place, Bro now generates notices like this:
Scan::Scan 198.20.69.98 made 102 failed connections on 102 hosts and 77 ports in 4m59s
Scan::Scan 198.20.99.130 made 102 failed connections on 102 hosts and 78 ports in 4m59s
Scan::Scan 36.101.163.186 made 102 failed connections on port 23 in 0m14s
Scan::Scan 91.212.44.254 made 102 failed connections on ports 135, 445 in 4m59s
Scan::Scan 207.244.70.169 made 103 failed connections on port 389 in 5m0s
Scan::Scan 222.124.28.164 made 102 failed connections on port 23 in 0m14s
Scan::Scan 91.236.75.4 made 102 failed connections on ports 8080, 3128 in 4m58s
Scan::Scan 177.18.254.165 made 102 failed connections on port 23 in 0m38s
Scan::Scan 14.169.221.169 made 102 failed connections on port 23 in 0m36s
Scan::Scan 192.99.58.163 made 100 failed connections on 100 hosts and 100 ports in 4m55s
The only downside is that 192.99.58.163 appears to be backscatter (conn_state and history are OTH H), but that's an issue inside is_failed_conn somewhere which is unchanged from scan.bro
It should be a drop-in replacement for scan.bro, other than that any notice policies or scan policy hooks will need to be changed.
It could possibly be changed to still raise Address_Scan/Port_Scan notices at least in some cases. I don't know how people may be using those notices differently - we handle them the same, so the change to a unified notice type is a non-issue for us.
--
- Justin Azoff
What does everyone think of making some change for 2.5 so that certificates from SSL aren't logged in files.log by default? I've heard grumblings from quite a few people about the number of certs that show up, and I've personally noticed that the number of certificates will dwarf all other file types pretty badly, which makes the output look a bit odd since very few people are ever interested in looking at those files in files.log.
Certificates would still be passed through the files framework, so it's not an architectural change; it would all be related to just not doing the log. There is one minor issue this brings up, though: right now certificate hashes are all given in files.log. We could move them elsewhere, like x509.log or ssl.log, but I'm curious whether anyone has thoughts on what would be most useful.
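For sites that already want this behavior today, something along these lines should do it as local policy (a sketch; it assumes certificate entries can be recognized by their "SSL" source in Files::Info):

event bro_init()
    {
    # Replace the default files.log filter with one that skips entries
    # originating from the SSL analyzer (i.e. certificates).
    Log::remove_default_filter(Files::LOG);
    Log::add_filter(Files::LOG, [$name="files-without-certs",
                                 $pred(rec: Files::Info) =
                                     {
                                     return ! (rec?$source && rec$source == "SSL");
                                     }]);
    }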
.Seth
--
Seth Hall
International Computer Science Institute
(Bro) because everyone has a network
http://www.bro.org/
*tl;dr*
I have continued my work on the Bro deep cluster over the last months and just
want to share my results so far and my future plans with you:
1. I want to get your opinion on my Broker enhancement that allows routing
messages between peers that are not directly connected (given that there
is a path between them).
2. I want to share some preliminary thoughts on how to extend the
sumstats framework to a deep-cluster setting, so that it is possible to
dynamically create (multiple) subgroups of Bros in the deep cluster
that can share and aggregate information.
Criticism, opinions, and further suggestions are very welcome!
Best,
Mathias
--------------------
Summary deep cluster
--------------------
A deep cluster provides one administrative interface for several
conventional clusters and/or standalone Bro nodes at once. A deep
cluster eases the monitoring of several links at once and can
interconnect different Bros and different Bro clusters in different
places. Due to its better scalability it can bring monitoring from the
edge of the monitored network into its depth (-> deep cluster).
Moreover, it enables and facilitates information exchange between
different Bros and different Bro clusters.
can be seen as a P2P overlay network of Bro nodes, so that all Bros can
communicate with each other.
In summary, my results so far towards building such a deep cluster are
the following:
* a modified multi-hop Broker that allows forwarding content between
peers that are only indirectly connected with each other
* some Bro modifications, in code and foremost in Bro script land
* an enhanced broctl that operates as a daemon and can initiate
connections to other such daemons (all communication based on Broker),
including a JSON-based configuration of nodes and connections
A summary of all changes can be found on the bro website (including
instruction on how to run the current version of the deep cluster):
https://www.bro.org/development/projects/deep-cluster.html
Ongoing work is currently the adaptation of my multi-hop Broker to the
currently revised Broker and the adaptation of sumstats to work in a
deep-cluster setting. Both are described in detail in the long (sorry)
remainder of this email.
----------------------------------
Multi-hop broker
----------------------------------
I enhanced Broker to support publish/subscribe-based communication
between nodes that are not connected directly, but that are connected by
a path of other nodes. The working title of this is multi-hop Broker. As
Broker will get a significant revision soon, I want to share my design for
multi-hop Broker with you, so that I can include your comments when
adding my multi-hop functionality to the revised Broker.
A specific challenge here is the routing of publications to all
interested subscribers. For that, routing tables need to be established
among all nodes in a deep cluster. These routing tables are established
by flooding subscriptions in the deep cluster. Afterwards, publications
can be routed along the shortest paths to all interested subscribers.
In that context, two issues arise, namely loop detection and avoidance
as well as limiting the scope of subscriptions for rudimentary access
control. Both issues are described in detail in the following.
*** Loop detection and avoidance
There is no longer a unique identifier (like an IP address) on whose
basis you can forward information. There might be only one recipient for
a publish operation, but there can also be many of them. This can result in
routing loops, so that messages are forwarded endlessly in the Broker
topology that results from the peerings between Broker endpoints. Such
loops have to be avoided, as they would falsify results, e.g., results
stored in data stores.
There are basically two options here:
1. Loop avoidance: During the setup phase of the deep cluster, it needs
to be ensured that the topology does not contain loops.
2. Loop detection: Detect loops and drop duplicate messages. This
requires either storing each forwarded message locally to detect
duplicates or, more lightweight, attaching a TTL value to every Broker
message. When the TTL reaches 0, the message gets dropped. However, the
TTL does not prevent duplicates completely.
For multi-hop Broker we chose a hybrid of the two options.
Loops in the Broker topology need to be avoided during the initial
configuration of the deep cluster. A TTL attached to every
Broker message allows detecting routing loops and results in an
error output. The TTL value can be configured; its default value is 32.
However, there are certain configurations that require a denser
interconnection of nodes. In conventional Bro clusters all workers are
connected to the manager and the data node, while the manager is also
connected to the data node. Obviously this already represents a loop.
To avoid such routing loops we introduced an additional endpoint flag,
``AUTO_ROUTING``. It indicates whether the respective endpoint is allowed to
route message topics on behalf of other nodes.
Multi-hop topics are only stored locally and propagated if this flag is
set. If an auto-routing endpoint is coupled with an ordinary endpoint,
only the auto-routing endpoint will forward messages on behalf of the
other endpoint. As a result, not every node will forward subscriptions
received from others, so loops can be prevented even though the
interconnection of nodes in the deep cluster contains topological loops.
*** Rudimentary Access Control
To prevent subscriptions from being disseminated throughout the whole deep
cluster, single-hop (= local) and multi-hop (= global) subscriptions are
introduced. Single-hop subscriptions are shared with direct neighbors only
and are thus only available within the one-hop neighborhood. In
contrast, multi-hop subscriptions get flooded throughout the whole deep
cluster. The differentiation between subscriptions with local
(``LOCAL_SCOPE``) and global scope (``GLOBAL_SCOPE``) is intended to
provide better efficiency and is configured as an additional parameter when
creating a Broker ``message_queue``. The default setting is always
``LOCAL_SCOPE``.
---------------------------------
Deep Sumstats
---------------------------------
The intention is to extend sumstats to be used within a deep cluster to
aggregate results at large scale, but also to form sumstats groups on
the fly, e.g., as a result of detected events.
In the original sumstats, only directly connected nodes in a
cluster setup exchanged messages. By using multi-hop Broker, we can
extend this to the complete deep cluster. We can form small groups of
nodes that are not directly connected to each other, but that rather are
connected indirectly by their subscriptions to a group id (e.g.,
"/bro/sumstats/port-scan-detected").
To adapt sumstats to the deep cluster two basic approaches are feasible:
1. Sumstats groups: Instead of a cluster we apply sumstats on a group of
nodes in the deep cluster. This means that we keep the basic structure
and functioning of the current sumstats. We only replace direct links by
multi-hop links via multi-hop Broker. However, we need a coordinator per
group (in the original sumstats the manager took on this task). This
manager will initiate queries and will retrieve all results via the
routing mechanisms of multi-hop Broker. There will be no processing or
aggregation of information directly in the deep cluster. Only nodes in
the group and foremost the manager will be able to process and aggregate
information. The deep cluster will only provide a routing service
between all members of the group.
2. Sumstats and deep cluster become one: We integrate the data
forwarding and the data storage with each other. The deep cluster is
used to aggregate and process results in a completely distributed
manner, while forwarding data to its destination. This means that all
members of a sumstats group get interconnected by the deep cluster (and
thus multi-hop broker) as in option 1, but now we have additional
processing and aggregation of information while it is forwarded towards
the manager by nodes of the deep cluster that are not part of the
sumstats group. That is definitely the most challenging option, but in
the long-term probably the most valuable one.
I am currently working on option 1 as it is the straightforward option
and as it is also a necessary intermediate step to get to option 2. I
would be especially grateful for additional input / alternate views here.