I'd say the tooling is still Java-focused, but I found a decent CLI at
https://github.com/apache/parquet-mr/tree/master/parquet-tools.
Specifically, I used the convert command to go from JSON -> Parquet.
Going from JSON.gz to Parquet (with the gzip compression codec) saved us
about 35%.
When you say "log writer", do you mean a custom Zeek writer
<https://docs.zeek.org/en/stable/frameworks/logging.html> that writes to
The major issue we're facing is that the schema for Zeek output can change
over time (more columns can be added). That's a problem for Parquet,
since each file is written with a fixed schema.
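
A minimal sketch of one way to handle added columns before conversion,
assuming the Zeek logs are newline-delimited JSON. The sample records, the
field values, and the helper names union_columns and normalize are
hypothetical, and this is not what parquet-tools does internally. The idea
is to collect the union of all column names first, then pad every record
to that union so a fixed-schema writer sees the same columns in each row.

```python
import json


def union_columns(records):
    """Return every column name seen in any record, in first-seen order."""
    columns = {}
    for rec in records:
        for key in rec:
            columns.setdefault(key, None)
    return list(columns)


def normalize(records, columns):
    """Pad each record to the full column list; missing values become None."""
    return [{col: rec.get(col) for col in columns} for rec in records]


# Two made-up log lines: the second has a column the first one lacks.
lines = [
    '{"ts": 1567189260.0, "uid": "CaAAAA1", "id.orig_h": "10.0.0.1"}',
    '{"ts": 1567189261.0, "uid": "CbBBBB2", "id.orig_h": "10.0.0.2", '
    '"community_id": "1:abc"}',
]

records = [json.loads(line) for line in lines]
columns = union_columns(records)
rows = normalize(records, columns)
```

The catch is that this takes two passes over the data (or buffering), and
it only covers added columns, not columns whose type changes.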
On Fri, Aug 30, 2019 at 2:21 PM Justin Azoff <justin(a)corelight.com> wrote:
On Fri, Aug 30, 2019 at 2:17 PM Karl Pietrzak
Good morning everyone.
I'm researching compression of Zeek data. I'm currently dumping Zeek
data into Parquet files
I don't have much feedback on the uid bits, but I'm very interested in
Parquet! I had looked into doing this a while back, but the tooling around
Parquet was very Java/big-data focused and not very CLI-friendly. Are you
using the new C++ implementation in a log writer, or are you converting
JSON to Parquet?