On 6/1/07, jmzhou.ml@gmail.com <jmzhou.ml@gmail.com> wrote:
Oooops, I think there is a potential flaw in my Analysis point 2. Imagine
that a parsing will just consume an incoming packet - nothing more and
nothing less. After the parsing, data_available probably will not be false
because of the way flow buffer is implemented - orig_data_begin_ will only
be increased to orig_data_end_ upon the next request! So, at the while loop,
data_available = true, have_pending_request = true (from last request), and
ready = true (from last request, too). So we end up with entering the loop
again! In fact, to check orig_data_begin < orig_data_end_ in the current
data_available is wrong from its intention.
 
I was well aware of that fact, but did not see a problem here, as in my SSL analyzer I manually assure that after parsing a data packet, the corresponding data is consumed. In that case, data_available() works just as intended.
 
I don't think, that it is a good idea to introduce side effects into the end() function just for a slight increase in performance. On one hand, the end() function is in the public interface of the buffer and you should never significantly change the behavior of a public method (some binpac users might rely on that behavior), except for very good reasons. On the other hand, in cases where there might be a serious performance problem with the current behavior, the programmer can manually avoid this by explicitly consuming the data, like I do in my analyzer (of course this should be documented ;-).
 
I think we need to do something with the flow buffer implementation. The
current method to increase orig_data_begin_ upon new request is tricky and
difficult to make things correct. My suggestion is to increase orig_data_begin_
in the end() method. The reason is: whenever you request data from the flow
buffer, you always call begin() and end() in sequence. These methods are the
only ways that you can access the data in the flow buffer. After the call
of end(), the decoder will not call begin() and end() again except in one
case which I will address shortly. Thus, it is safe to increase orig_data_begin_
pointer in the end() call. After this change, checking orig_data_begin_ <
orig_data_end_ in data_available() method is correct. BTW, it is not needed
to check buffer_n_ > 0 in data_available(). Because if something is buffered,
and we reach the entry point of the while loop, orig_data_begin_ must be
equal to orig_data_end_. The next parsing must use new data instead of the
data in flow buffer. In addition, in NewFrame and NewLine, there's no need
to increase orig_data_begin_ by frame_length_ after the change of end().
 
Again, data_available() is a public method that is not just intended for the single use we have discussed so far (I use it explicitly in my analyzer). You can always use it to check whether there is currently any data available, and of course this includes the buffer.

Tobias