Hi,
Below is a script that extracts files and renames them to include the file's MD5 hash.
I use the file_sniff event to start the extraction, and at that point I name each file
using the timestamp and the file ID. Extracted files are saved in a top-level directory.
Later, in the file_state_remove event (by which point the file's MD5 should be
available), I rename the file to its MD5 hash, keeping the file's extension, and move it
into a sub-directory named after the date on which the file was seen.
I say "should" because the MD5 is not always available in file_state_remove. One
situation in which it is missing is when Zeek has missed some of the file's bytes. In
that case the script leaves the file under its original name in the top-level directory.
The script below lets you customise the MIME types of the files to extract and restricts
extraction to files downloaded by one given IP address. Feel free to adapt it to your
needs. The location where files are extracted can be customised as well.
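As an illustration of the customisation described above, the client address could be
overridden from local.zeek instead of editing the script. This is only a minimal sketch:
the file name extract-md5-rename.zeek and the address 192.168.1.10 are hypothetical, and
it relies on target_client being declared with &redef in the script.

```zeek
# local.zeek (sketch)

# Hypothetical file name under which the script from this message was saved.
@load ./extract-md5-rename

# Hypothetical client address; replace it with the host you are interested in.
redef target_client = 192.168.1.10;
```

The MIME type list is not declared with &redef, so changing it means editing the set in
the script itself.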
Cheers,
Liviu
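# Exec::run (used below to move the file) is provided by base/utils/exec.
@load base/utils/exec

# MIME types to be extracted
const extracted_mime_types = set(
    # Images:
    "image/jpeg",
    "image/png"
);

# Client for which to extract files
const target_client = 10.0.0.1 &redef;

# Top-level directory where extracted files are saved
redef FileExtract::prefix = "/data/zeek/extracted_files/";

export {
    ## Path where extracted files are saved
    # Note: the events below use FileExtract::prefix; this constant is not referenced.
    const file_extract_path: string = "/data/zeek/extracted_files/" &redef;
}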

# File extraction
event file_sniff(f: fa_file, meta: fa_metadata)
    {
    # Only extract files whose MIME type is in the list above.
    if ( ! meta?$mime_type || meta$mime_type !in extracted_mime_types )
        return;

    # Only extract files received by the target client.
    if ( target_client !in f$info$rx_hosts )
        return;

    for ( i in meta$mime_types )
        {
        if ( meta$mime_types[i]$mime in extracted_mime_types )
            {
            # Use the MIME subtype (e.g. "jpeg") as the file extension.
            local fext = split_string(meta$mime_types[i]$mime, /\//)[1];
            local ntime = fmt("%D", network_time());
            # Temporary name: <timestamp>_<file ID>.<extension>
            local fname = fmt("%s_%s.%s", ntime, f$id, fext);
            Files::add_analyzer(f, Files::ANALYZER_EXTRACT,
                                [$extract_filename=fname]);
            break;
            }
        }
    }
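
# Rename the extracted file once its MD5 is known
event file_state_remove(f: fa_file)
    {
    # Nothing to do if the file was not extracted or has no MD5.
    if ( ! f$info?$extracted || ! f$info?$md5 || FileExtract::prefix == "" )
        return;

    local orig = f$info$extracted;

    # Keep the extension that was chosen at extraction time.
    local split_orig = split_string(f$info$extracted, /\./);
    local extension = split_orig[|split_orig|-1];

    # The first 10 characters of the timestamp are the date.
    local ntime = fmt("%D", network_time());
    local ndate = sub_bytes(ntime, 1, 10);

    # Per-day sub-directory under the extraction prefix.
    local dest_dir = fmt("%s%s", FileExtract::prefix, ndate);
    mkdir(dest_dir);

    local dest = fmt("%s/%s.%s", dest_dir, f$info$md5, extension);
    local cmd = fmt("mv %s/%s %s", FileExtract::prefix, orig, dest);

    # The move runs asynchronously; its result is not checked.
    when ( local result = Exec::run([$cmd=cmd]) )
        {
        }

    # Point the file's info record at the new location.
    f$info$extracted = dest;
    }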
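
Since the script silently skips files that have no MD5, a small additional handler can
help to spot them. The following is only a sketch and not part of the script above. It
assumes that printing to standard output is acceptable, and it uses the missing_bytes
field of fa_file to show whether Zeek missed part of the file.

```zeek
# Sketch: report extracted files that ended up without an MD5 hash.
event file_state_remove(f: fa_file)
    {
    # Skip files that were not extracted or that do have an MD5.
    if ( ! f?$info || ! f$info?$extracted || f$info?$md5 )
        return;

    # missing_bytes counts the bytes of this file that Zeek did not see.
    print fmt("no MD5 for %s (missing bytes: %d)",
              f$info$extracted, f$missing_bytes);
    }
```

Files reported this way stay in the top-level directory under their timestamp and file ID
name.
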
On Sun, 2019-04-28 at 10:04 +0300, william de ping wrote:
Hi everyone,
I want to extract files and have their names include their md5 hash.
The problem is that the MD5 hashing happens in the file_hash event, while file extraction
starts in earlier events such as file_new or file_over_new_connection.
Any ideas on how to accomplish this?
Thanks
B
_______________________________________________
Zeek mailing list
zeek(a)zeek.org
http://mailman.ICSI.Berkeley.EDU/mailman/listinfo/zeek