#8: Filesize issues
-------------------------------------------------+-------------------------
Reporter: gregor (original via mail from | Owner: somebody
Martin Holste) | Status: new
Type: defect | Milestone:
Priority: major | Version:
Component: component1 |
Keywords: |
-------------------------------------------------+-------------------------
{{{
#!rst
Cut and paste from several emails describing the problem:
* With filesize set at exactly 280g (279g does not produce the problem)
tm will create one disk fifo file per packet in the workdir for each
evicted packet with a disk setting of 1000g. I am only using one
default class for "all."
* That sounds like something is wrapping and going negative at
the 2^38 barrier.
* Further testing shows that filesize > 2000m (including 2g for
some reason) leads to tm not rolling the file ever.
}}}
--
Ticket URL: <http://tracker.icir.org/time-machine/ticket/8>
The Time Machine <http://tracker.icir.org/time-machine>
High-volume network traffic stream recorder.
Dear TimeMachine users,
following the infrastructure changes for Bro, the Time Machine
now also has a new website and repository:
You can find the TimeMachine's website at:
http://tracker.bro-ids.org/time-machine
The code repository now uses git and can be found at
git://git.bro-ids.org/time-machine
--
Gregor Maier
<gregor(a)icir.org> <gregor(a)icsi.berkeley.edu>
Int. Computer Science Institute (ICSI)
1947 Center St., Ste. 600
Berkeley, CA 94704, USA
http://www.icir.org/gregor/
Repository : ssh://git@bro-ids.icir.org/time-machine
On branch : master
>---------------------------------------------------------------
commit 12029bb296c62c3595a878c653b358b4b6140bb5
Author: Gregor Maier <gregor(a)icir.org>
Date: Thu Jul 21 15:13:56 2011 -0700
Updating TODO list.
Mostly adding ideas that have been floating around for a while (but also
adding some newer ideas).
>---------------------------------------------------------------
TODO | 99 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++--
1 files changed, 96 insertions(+), 3 deletions(-)
diff --git a/TODO b/TODO
index 8e10af4..50f78f8 100644
--- a/TODO
+++ b/TODO
@@ -1,6 +1,101 @@
TODOs
-* ORDER AND CLEANUP THIS LIST ;-)
+A) The smaller items:
+=====================
+
+* Move build system to cmake. Can probably borrow quite some chunks from
+ Bro's new build system.
+
+* TM restart. This is probably the most pressing issue!
+ Currently when the TM restarts (or crashes) it cannot use
+ the data it still has on its disk. It would be great if the restart
+ could take this data into account. There are several options to do
+ this:
+
+ + Re-read the stored files (they are in pcap format) and/or the index
+ files and rebuild the full internal state and continue after the
+ last file. This enables queries for stored data. However, when
+ large disk storage is used, re-reading the files might well take
+ several hours.
+
+ + Do not re-read the stored files, but learn about them and include
+ them in buffer management. I.e., the TM starts building its internal
+ state only from data that newly arrives, but it knows that there are
+ old files lying around and it will delete the old files in order to
+ stay within the buffer budget. Then the old files can be searched
+ manually with tcpdump/tcpslice/whatever and restart is pretty much
+ instantaneous.
+
+* TM-cluster mode. This should be fairly easy. We would need a TM
+ cluster front-end. Bro would then communicate with the front-end. The
+ front-end sends the request to its workers (maybe with some
+ intelligence to only query the workers that see the traffic according
+ to the load balancing scheme) and gathers the results from the workers,
+ sorts them (by time) and delivers them to the requesting Bro.
+
+* Use a directory/inventory for disk searches. Currently disk queries
+ are done using pcapnav to try to find the "right" location in a
+ file (probabilistically jump to an offset and try to see if the
+ offset is a valid start of a pcap-record).
+ It would be good if the TM could store a directory for each pcap file
+ it writes. The directory could then contain the file offset of
+ each n-th packet + timestamp. A query can then just check the
+ directory for the best location to jump to. No probabilistic search.
+ (Maybe this should be part of (B) though.)
+ We can then get rid of pcapnav as a dependency.
+
+
+B) TM code rework
+=================
+
+In general some of the biggest problems of the TM are IMHO:
+
+* poor write performance. The memory buffer is not really used as elastic
+ storage (packets are only moved from memory to disk once the memory buffer
+ is full). Thus the disk can block the capture thread and lead to packet
+ loss.
+ Solution: have the most current packets in memory and on disk at the same
+ time by
+ + write to memory first. A second disk-writer thread will then read
+ packets from memory and write them to disk as soon as possible (TODO:
+ try to minimize lock/unlock operations)
+ + write to memory and a to-disk-queue at the same time. A disk-writer
+ thread will then pick up the data from this to-disk-queue and write it
+ to disk.
+
+
+* Index generation. Currently the capture thread generates for each stored
+ packet IndexField* (i.e., the index keys for this packet) and then places
+ these pointers in per-Index queues. The index threads then pick up the
+ IndexFields from these queues and store them.
+ When we rework how packets are stored on disk (see above) it might be
+ worthwhile to also change the way the IndexFields are passed to the
+ Index threads. E.g., if we start using a disk-writer thread then this
+ disk writer thread could generate the IndexField* and pass them on to
+ the Index threads. This would reduce the number of lock/unlocks the
+ capture thread needs to do.
+
+* inflexible, hard-coded indexes. Slow-ish lookup performance for on-disk
+ queries.
+ All possible key combinations (e.g.,
+ 2-tuple, 5-tuple, etc.) have to be specified at compile-time. It would
+ be great if the TM could support queries for any combination of
+ IP,IP,port,port,transport.
+ Using fastbit and an indirection could help here. I have some
+ early ideas on how this could be done.
+
+* Keep flow records in addition to packet data and keep it *longer*.
+ The TM already implicitly keeps "flow" data for the connections it
+ has in its storage. We could extend this to actually write the flow
+ records to disk and assign a separate disk budget for such flow
+ records. This would allow us to store flow records for significantly
+ longer than just packet data. So it would increase the amount of time
+ we can "travel back", but with less information.
+
+
+-------------------------------
+UNSORTED ITEMS:
+
FOR THE PAPER
* Concurrent queries
@@ -30,8 +125,6 @@ intervals
* Handle Queries with syntax errors
* There's an awful mix of iostreams, Strings, char *, stdout, stderr .... ==> Solve this.
* Make stats logfile configureable
-* There are heaps of different typedefs for sizes in the Fifos, but none is used
-consistently.
* held_bytes / stored_bytes / total_bytes / ... whatever they may be called are
inconsistend. Some use caplen, some wirelen, some caplen+pcap_header, etc.
#6: ‘stderr’ was not declared in this scope
------------------------+----------------------
Reporter: sroddy | Owner: somebody
Type: defect | Status: new
Priority: minor | Milestone:
Component: component1 | Version:
Keywords: |
------------------------+----------------------
Hash.cc: In member function ‘void Hash<K, V>::debugPrint()’:
Hash.cc:181: error: ‘stderr’ was not declared in this scope
make[1]: *** [Hash.o] Error 1
make[1]: Leaving directory `/home/sroddy/tm-20090206'
make: *** [all] Error 2
Building on Ubuntu 10.04.1 server.
Adding '#include <cstdio>' to Hash.hh fixes build problem.
--
Ticket URL: <http://tracker.icir.org/time-machine/ticket/6>
#5: ‘uint64_t’ does not name a type
------------------------+----------------------
Reporter: sroddy | Owner: somebody
Type: defect | Status: new
Priority: trivial | Milestone:
Component: component1 | Version:
Keywords: |
------------------------+----------------------
Building on Ubuntu server 10.04.1 LTS, I get:
In file included from Connections.hh:42,
from Connections.cc:37:
tm.h:55: error: ‘uint64_t’ does not name a type
tm.h:56: error: ‘uint64_t’ does not name a type
tm.h:57: error: ‘uint64_t’ does not name a type
tm.h:58: error: ‘uint64_t’ does not name a type
tm.h:59: error: ‘uint64_t’ does not name a type
tm.h:60: error: ‘uint64_t’ does not name a type
make[1]: *** [Connections.o] Error 1
make[1]: Leaving directory `/home/sroddy/tm-20090206'
make: *** [all] Error 2
Adding '#include <stdint.h>' to tm.h fixes this problem.
--
Ticket URL: <http://tracker.icir.org/time-machine/ticket/5>
Author: gregor
Date: 2011-07-20 11:56:46 -0700 (Wed, 20 Jul 2011)
New Revision: 270
Repository: svn.icir.org/time-machine
Removed:
trunk/config.h.in
Modified:
trunk/Hash.cc
trunk/tm.h
trunk/types.h
Log:
config.h.in
Modified: trunk/Hash.cc
===================================================================
--- trunk/Hash.cc 2011-07-20 17:46:47 UTC (rev 269)
+++ trunk/Hash.cc 2011-07-20 18:56:46 UTC (rev 270)
@@ -38,6 +38,7 @@
#include <algorithm>
#include <assert.h>
+#include <stdio.h>
#include "Hash.hh"
Deleted: trunk/config.h.in
===================================================================
--- trunk/config.h.in 2011-07-20 17:46:47 UTC (rev 269)
+++ trunk/config.h.in 2011-07-20 18:56:46 UTC (rev 270)
@@ -1,101 +0,0 @@
-/* config.h.in. Generated from configure.in by autoheader. */
-
-/* Define to 1 if you have the <broccoli.h> header file. */
-#undef HAVE_BROCCOLI_H
-
-/* Define to 1 if you have the <inttypes.h> header file. */
-#undef HAVE_INTTYPES_H
-
-/* Define to 1 if you have the `broccoli' library (-lbroccoli). */
-#undef HAVE_LIBBROCCOLI
-
-/* Define to 1 if you have the `pcap' library (-lpcap). */
-#undef HAVE_LIBPCAP
-
-/* Define to 1 if you have the `pcapnav' library (-lpcapnav). */
-#undef HAVE_LIBPCAPNAV
-
-/* Define to 1 if you have the `pcre' library (-lpcre). */
-#undef HAVE_LIBPCRE
-
-/* Define to 1 if you have the `pcrecpp' library (-lpcrecpp). */
-#undef HAVE_LIBPCRECPP
-
-/* Define to 1 if you have the `pthread' library (-lpthread). */
-#undef HAVE_LIBPTHREAD
-
-/* Define to 1 if you have the <memory.h> header file. */
-#undef HAVE_MEMORY_H
-
-/* Define to 1 if you have the <pcapnav.h> header file. */
-#undef HAVE_PCAPNAV_H
-
-/* Define to 1 if you have the <pcap.h> header file. */
-#undef HAVE_PCAP_H
-
-/* Define to 1 if you have the <pcrecpp.h> header file. */
-#undef HAVE_PCRECPP_H
-
-/* Define to 1 if you have the <pthread.h> header file. */
-#undef HAVE_PTHREAD_H
-
-/* Define to 1 if you have the <readline/readline.h> header file. */
-#undef HAVE_READLINE_READLINE_H
-
-/* Define to 1 if you have the <stdint.h> header file. */
-#undef HAVE_STDINT_H
-
-/* Define to 1 if you have the <stdlib.h> header file. */
-#undef HAVE_STDLIB_H
-
-/* Define to 1 if you have the <strings.h> header file. */
-#undef HAVE_STRINGS_H
-
-/* Define to 1 if you have the <string.h> header file. */
-#undef HAVE_STRING_H
-
-/* Define to 1 if you have the <sys/stat.h> header file. */
-#undef HAVE_SYS_STAT_H
-
-/* Define to 1 if you have the <sys/types.h> header file. */
-#undef HAVE_SYS_TYPES_H
-
-/* Define to 1 if you have the <unistd.h> header file. */
-#undef HAVE_UNISTD_H
-
-/* Define to 1 if your C compiler doesn't accept -c and -o together. */
-#undef NO_MINUS_C_MINUS_O
-
-/* Name of package */
-#undef PACKAGE
-
-/* Define to the address where bug reports for this package should be sent. */
-#undef PACKAGE_BUGREPORT
-
-/* Define to the full name of this package. */
-#undef PACKAGE_NAME
-
-/* Define to the full name and version of this package. */
-#undef PACKAGE_STRING
-
-/* Define to the one symbol short name of this package. */
-#undef PACKAGE_TARNAME
-
-/* Define to the home page for this package. */
-#undef PACKAGE_URL
-
-/* Define to the version of this package. */
-#undef PACKAGE_VERSION
-
-/* The size of `void*', as computed by sizeof. */
-#undef SIZEOF_VOIDP
-
-/* Define to 1 if you have the ANSI C header files. */
-#undef STDC_HEADERS
-
-/* Version number of package */
-#undef VERSION
-
-/* Define to 1 if `lex' declares `yytext' as a `char *' by default, not a
- `char[]'. */
-#undef YYTEXT_POINTER
Modified: trunk/tm.h
===================================================================
--- trunk/tm.h 2011-07-20 17:46:47 UTC (rev 269)
+++ trunk/tm.h 2011-07-20 18:56:46 UTC (rev 270)
@@ -39,6 +39,8 @@
#include <string>
+#include "types.h"
+
// #define QUERY_RACE_PROTECT
Modified: trunk/types.h
===================================================================
--- trunk/types.h 2011-07-20 17:46:47 UTC (rev 269)
+++ trunk/types.h 2011-07-20 18:56:46 UTC (rev 270)
@@ -37,12 +37,12 @@
#ifndef TM_TYPES_H
#define TM_TYPES_H
-#include <sys/types.h>
-#include <sys/time.h>
// Expose C99 functionality from inttypes.h, which would otherwise not be
// available in C++.
#define __STDC_FORMAT_MACROS
#include <inttypes.h>
+#include <sys/types.h>
+#include <sys/time.h>
#include "config.h"