responses on DSN reviews

Dear Mr. JeeHyun Hwang:
I am sorry to inform you that the following submission was not selected by the program
committee to appear at DSN-DCCS 2009:
Fault Localization for Firewall Policies
The paper selection process was very competitive. Only 37 papers out of 177
submissions were selected for this year's program (21% acceptance rate).
However, I hope the review comments are constructive and helpful to your future work.
I also hope that you can attend DSN 2009 in Estoril, Portugal at the end of June this
year. Note that the many workshops at DSN could give you an opportunity to present
your work at this year's DSN (deadline is March 16); see www.dsn.org for details.
I have enclosed the reviewer comments for your perusal. Note that the reviewers may
have augmented their reviews and changed their scores after the author response
period based on on-line discussions before the PC meeting and face-to-face discussions
at the PC meeting.
Best regards and good luck with your future work!
Matti Hiltunen DSN-DCCS 2009 PC chair
============================================================================
DSN-DCCS 2009 Reviews for Submission #185
============================================================================
Title: Fault Localization for Firewall Policies
Authors: JeeHyun Hwang, Tao Xie, Fei Chen and Alex Liu
====================================================================
REVIEWER #1
--------------------------------------------------------------------------
Reviewer's Scores
Presentation and English: 3
Relevance for DCCS: 4
Novelty: 3
Contribution: 3
Technical correctness: 2
Confidence on the review: 2
Overall recommendation: 4
--------------------------------------------------------------------------
Comments
This paper describes a method for isolating flaws in firewall specifications. The idea is
that one uses test packets to identify a flaw, and then uses their reduction techniques to
determine automatically which rules produced the flaw. The test packets themselves
are generated automatically in order to guarantee coverage.
The authors also include
results of an experiment involving mutated firewall policies, and report good results.
This looks like interesting work that could be useful.
A few comments:
The description of the technique, which is the meat of this paper, begins with some
examples and then continues with a high-level overview. This makes it difficult to get
a firm grasp on how their technique works. If the authors included, say, a pseudo-code
description of their algorithm, that would be helpful. I would also recommend that they
put the high-level overview *before* the examples.
Throughout this paper I kept asking the question: how do you know whether a firewall
policy is correct? The paper doesn't attempt to answer this question. It assumes that
there is correct, expected behavior and that it is possible to tell whether acceptance or
rejection of a packet conforms to this expected behavior.
It is beyond the scope of this
paper to attempt to come up with a way of specifying correct behavior, but according to
the related work section of the paper others have addressed this problem.
It would be
helpful if the authors devoted more space to the discussion of this work and its
relevance to their own.
If the authors could show that their experimental approach
addresses the kinds of correctness criteria and the kinds of mistakes that arise in real
life, this would be very useful.
Even if it doesn't, a discussion of how their
experimental approach could be extended and/or modified to address this would be very
informative.
====================================================================
REVIEWER #2
--------------------------------------------------------------------------
Reviewer's Scores
Presentation and English: 3
Relevance for DCCS: 4
Novelty: 3
Contribution: 2
Technical correctness: 1
Confidence on the review: 3
Overall recommendation: 2
--------------------------------------------------------------------------
Comments
The paper presents techniques for localizing faults that cause failed acceptance tests
during firewall testing. The localization can occur either as an identification of a single
faulty rule (for faults that have incorrect decisions), or as a ranked list of a reduced set
of rules indicating the order in which rules are likely to have the fault. The topic is an
interesting one, and although the paper would be strengthened if the authors had found
data to show the rate of prevalence of firewall problems and security incidents related
to incorrect rules in the field, I am willing to accept that this is a real problem.
Although I would like to encourage the authors to continue exploring ideas on this topic,
I feel that in the current state, this work is still too preliminary and lacking sufficient
depth for a full technical paper. While the proposed methods make some smart
observations, there are problems with them -- the method to identify rule decision
changes seems to be incomplete, and an important fault class is not considered at all
(see comments below).
None of the experimental results are overwhelming (e.g., based on a large study of real
firewall faults). Finally, the reduction in the number of rules that have to be considered
is not that large either (about a 30% reduction), so the paper does not merit inclusion on
any of the above grounds alone. My comments below also highlight areas in which the
work could be improved to make a nice paper.
Detailed comments:
* The organization of the paper could be improved by simply merging Section 3 and 5
(since there isn't that much depth in Section 5), and putting the new section after
Section 4. In the current form, Section 5 doesn't have that much to add that isn't
covered in Section 3.
* Section 2.4: test packets are generated based on solving constraints imposed by
each rule in isolation. Does this affect test coverage? E.g., a rule that is deep down,
such as the default deny rule, is hard to reach because of previous rules. Do you have
any insight from the policies you have studied how prevalent the problem is? Would a
combined constraint system be a better solution for this?
* On the technique to detect RDC's in Section 3.1, 5.1: it seems you have missed the
case where all the failed tests belong to a single rule r, but that r also has instances of
successful tests. In that case, it is not possible to assume that the fault is necessarily
an RDC.
* An important firewall fault class, "incorrect rule order", is completely omitted by the
paper. This fault class can show symptoms that resemble both an RDC (e.g., if a
redundant rule is moved up the chain so that it no longer becomes redundant), or
those of an RFC. Therefore, the techniques, as they currently stand, cannot
differentiate between the faults they were designed for vs. incorrect rule order faults,
making the applicability of the proposed techniques to real problems suspect.
* The description in Section 5.2 and 5.3 is sloppy - the notation (e.g., rs, c, etc.) needs
to be defined. There are also several typos (in Sec 5.3 especially), making equation 3
incorrect (I assume FF(c) means FF(r)?). The description in the text for the equation
does not match the equation itself.
* When computing the ranking-based reduction percentage, shouldn't you use r'
instead of r? I.e., further reductions due to ranking should be compared against the
reduced rule-set (and not the original).
* I liked the results section, and the authors have done a fairly good job at generating
lots of scenarios to test their techniques. However, how realistic are the fault models
used? Couldn't you use the previous literature on "common firewall mistakes" (e.g.,
[2, 17]) to produce more realistic firewall faults for your experiments?
* I notice that none of the techniques proposed by the authors use information from
successful tests for further localization. It seems to me like there could be a lot of
useful information and scope for better localization if both successes and failures are
used in a smart manner. Perhaps this is an opportunity for the authors to strengthen
the work.
* The comparison to Marmorstein et al. in Sec 7 didn't make sense to me. In one
sentence, you say that they propose a technique to identify 2 or 3 faulty rules in a
firewall policy. However, in the next sentence, you say that they do not provide a
methodology to identify faulty rules? Don't these sentences contradict each other?
====================================================================
REVIEWER #3
--------------------------------------------------------------------------
Reviewer's Scores
Presentation and English: 4
Relevance for DCCS: 4
Novelty: 3
Contribution: 2
Technical correctness: 3
Confidence on the review: 3
Overall recommendation: 3
--------------------------------------------------------------------------
Comments
This paper describes an approach to find faulty firewall rules. Single faults are assumed.
Debugging firewall rules is currently a manual/tedious process and the contribution is
thus relevant. The proposed approach is based on the testing procedure published by
the same authors in SRDS 2008, reference [8] in the submission. In that paper the focus
is on determining the packet set for testing a firewall, while in this paper the focus is on
finding the faulty rules.
The proposed approach is based on the following. (1) If a faulty firewall rule exists,
then this rule is covered by a failed test; with this approach one can locate decision
faults, in which a rule incorrectly specifies "accept" for a packet to which it should have
specified "reject" (or vice-versa). (2) For interval faults, in which a firewall rule
specifies the incorrect interval for acceptance/rejection of a packet, an approach for
reducing the number of rules to be inspected is proposed. First of all the rules for
inspection are located above the first rule that results in a failed test. Then (3) the
faulty rule's decision must be different from the decision of rules covered by failed
tests. (4) Finally an approach is given to rank rules based on test coverage.
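For concreteness, the reduction strategy summarized above can be sketched as follows
(an illustrative sketch only: the rule and packet representations and helper names are
hypothetical, the policy is assumed to end with a catch-all default rule, and this is not
the authors' implementation):

    # Illustrative sketch of the localization strategy summarized above.
    def matches(rule, packet):
        """A packet matches a rule if every field value lies in the rule's interval."""
        return all(lo <= packet[f] <= hi for f, (lo, hi) in rule["fields"].items())

    def first_match(policy, packet):
        """First-match semantics: index of the first rule the packet hits."""
        for i, rule in enumerate(policy):
            if matches(rule, packet):
                return i

    def localize(policy, failed_packets):
        """failed_packets: packets whose observed decision differs from the expected one."""
        hit = {first_match(policy, p) for p in failed_packets}
        if len(hit) == 1:
            # Step (1): all failures land on one rule, so its decision is the RDC suspect.
            return "RDC suspect", sorted(hit)
        # Steps (2)-(3): for interval (RFC) faults, only rules at or above the earliest
        # failing match need inspection, excluding rules that share the observed
        # (failed) decision, since the faulty rule's decision must differ from it.
        earliest = min(hit)
        observed = {policy[i]["decision"] for i in hit}
        suspects = [i for i in range(earliest + 1)
                    if policy[i]["decision"] not in observed]
        return "RFC suspects", suspects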
These strategies are presented in an ad hoc way, i.e. this reviewer would like to have
found a proof of correctness of each of the proposed procedures. Instead of presenting
a section with short examples (sec. 3) and then only specifying the strategy in section 5,
this reviewer suggests that the authors write one section describing (actually, specifying)
each strategy followed by examples. This will avoid the redundancy in explaining
concepts.
In order to evaluate the proposed approach the authors executed a large enough
number of experiments using real firewall policies and fault injection. The results show
that about a third of the rules on average are reduced for inspection (lowest value 7%,
highest 51%), while 50% of the rules are reduced after the ranking approach is
employed (lowest 33%, highest 68%).
The paper is very well written and organized, a few suggestions for improvement
follow:
* [Sec. 1] "help reduce effort to locate" -> "help reduce THE effort to locate"
* [Sec. 1] "including a sing fault" -> "including a single fault"
* [Sec. 1] The Introduction becomes redundant when you list the "four main
contributions", they were made clear in earlier paragraphs.
* [Sec. 1] "Section 8 concludes." -> "Section 8 concludes the paper."
* [Sec. 2] Please clarify what you mean by "When firewall policies do not include many
conflicts, this technique can effectively generate packets..."
* [Sec. 3] This section might be called "Description with Examples" instead of just
"Example"; see comment above on structure.
* [Sec. 3] "This ranking technique.... and compute suspicions" -> "computeD
suspicions"
* [Sec. 4] why do you use the word "change" in "Therefore, a change (fault) in p can
introduce a faulty behavior"
* [Sec. 5] "In our approach, we first the first technique"
* [Sec. 5] "Using two preceding" -> "using THE two preceding"
* [SubSec. 5.3] "a technique to rank based" -> "a technique to rank rules based"
* [SubSec. 5.3] "For each rule, Suspicions value" -> "For each rule, the Suspicions
index"
* [SubSec. 6.1] Please avoid repeating the expression "Our tool" in each sentence; for
instance, you might use "The tool then" to improve the text.
* [Sec. 6] "We defin define the ranking"
* [Sec. 7] Please explain why "it is difficult to use their techniques to locate faults...", in
which sense is it difficult?
====================================================================
REVIEWER #4
--------------------------------------------------------------------------
Reviewer's Scores
Presentation and English: 3
Relevance for DCCS: 4
Novelty: 3
Contribution: 3
Technical correctness: 2
Confidence on the review: 2
Overall recommendation: 4
--------------------------------------------------------------------------
Comments
Firewall policy testing is used to detect conflicts or inconsistencies in a firewall policy
by emulating firewall filtering against test packets and observing their decision by
various rules in the policy. "When a conflict/inconsistency is detected during such
testing, how does one narrow down and locate the individual rules responsible for the
conflict/inconsistency?"
This practically important question (currently answered manually) is addressed by the
techniques presented in the paper. The paper builds upon
the previous work of the authors for structural testing of firewall policies published in
SRDS 2008. The paper models two fault types, Rule Decision Change (RDC), and Rule
Field Interval Change (RFC), based on whether the fault is in the decision part of a
firewall rule or in the predicate part. Three techniques for fault localization (one for
RDC and two for RFC) are presented, and evaluated using instrumented versions of
real-life firewall policies.
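For concreteness, the two fault models can be read as the following mutation operators
(an illustrative sketch with a hypothetical rule representation, not the paper's
instrumentation):

    import copy

    # Hypothetical rule representation used only for this illustration:
    # each field maps to a closed [low, high] interval.
    example_rule = {"fields": {"src_ip": (0, 255), "dst_port": (80, 80)},
                    "decision": "accept"}

    def inject_rdc(rule):
        """Rule Decision Change: flip the decision part of the rule."""
        mutant = copy.deepcopy(rule)
        mutant["decision"] = "discard" if rule["decision"] == "accept" else "accept"
        return mutant

    def inject_rfc(rule, field, delta=5):
        """Rule Field Interval Change: perturb one interval in the rule's predicate."""
        mutant = copy.deepcopy(rule)
        lo, hi = mutant["fields"][field]
        mutant["fields"][field] = (max(0, lo - delta), hi + delta)
        return mutant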
The paper is a difficult read in certain parts, and the presentation could be improved:
* Given that the number of tests in Figure 2 and 3 are small (only 10 and 12 respectively),
the description can be made clearer by including the actual test packet-decision pairs
used in those examples.
* What is the rationale behind the metric for ranking-based rule reduction percentage
presented in Section 6.2?
* Run a spell and grammar checker.
* The suspicion metric for Rule R3 in Figure 3 is given as 0.44, but in the description it
is incorrectly given as 0.33.
* Change PT(r) in Section 5.3 to FT(r).
* Change "sing fault" in Section 1 to "single fault". Run a spelling & grammar checker to
correct such errors found throughout the paper.
====================================================================
REVIEWER #5
--------------------------------------------------------------------------
Reviewer's Scores
Presentation and English: 4
Relevance for DCCS: 3
Novelty: 3
Contribution: 2
Technical correctness: 3
Confidence on the review: 2
Overall recommendation: 3
--------------------------------------------------------------------------
Comments
This paper builds on a previous one (ref 6).
The previous paper describes the
production of reduced test sets to test firewall policies.
This new paper uses
exceptions to that testing to identify possibly erroneous firewall policies.
This is almost entirely a theoretical approach.
Two plausible error types are assumed,
created, and detected without any reference to actual real-world errors.
This
approach detects what it detects, and provides shortened lists of rules to check. This is
a worthwhile goal.
But I can't tell if it would actually be useful, or if the approach has much of a future.
Presumably, it didn't detect anything in the eleven available firewall rulesets.
The lengths of these rulesets are typical for a small installation, and fairly easy to audit
by eyeball in most cases. Reducing this number is not a big deal.
I would be much more excited about this paper if the techniques were run on one of the
rulesets reported by Wool that had 5,000 entries.
Discovery of problems, and
excitement on the part of that firewall administrator, would make this a much stronger
paper.
(I realize the difficulty in obtaining access to such rulesets. But the promise of
a cleaner ruleset ought to be enough to find a collaborator.) The results in ref 13 are a
good example.
I like the idea of applying the lessons learned from software development and
dependability to firewall rulesets, but those fields wouldn't have much to offer if all the
programs were a couple dozen lines long.
Some parts of the explanation are murky. In Figure 2, why do we know, or even
suspect, that R4 is erroneous? It would be appropriate if we wanted to accept [2,2,5].
Your explanation of why a rule test failure implies a possible error could use some
clarification.
Page 2, para 4, s/sing/single
In section 3.2, para 4, sentence 3 is cast awkwardly; the series of commas in ri, r1, r2,
etc. confuses the sentence.
--------author response by JeeHyun Hwang--------------------------------
We greatly appreciate the reviewers' detailed and constructive reviews.
To Reviewer 1
(1) The correctness of the RDC fault localization technique, the first rule
reduction technique, and the second rule reduction technique can be formally
proved; the proofs will be included in the final version.
To Reviewer 2
(1) The rationale behind the metrics for the rule ranking technique is that
the number of clauses in a predicate that are specified wrong is typically
small. Therefore, we first examine the potential faulty rule that has the
smallest number of false clauses over all failed packets.
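As an illustrative sketch only (the rule representation and helper names below are
hypothetical, and simplified with respect to our actual Suspicion formula), this ranking
can be read as:

    def false_clause_count(rule, packet):
        """Number of field clauses in the rule's predicate the packet does not satisfy."""
        return sum(1 for f, (lo, hi) in rule["fields"].items()
                   if not (lo <= packet[f] <= hi))

    def rank_candidates(candidate_rules, failed_packets):
        """Inspect first the rule whose predicate is closest to matching all failures."""
        return sorted(candidate_rules,
                      key=lambda r: sum(false_clause_count(r, p) for p in failed_packets))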
To Reviewer 3
(1) We are conducting our experiments over firewall policies with hundreds
of rules; in fact, the larger the firewall policies our approach is applied to,
the more benefit our approach provides.
To Reviewer 4
(1) The two types of firewall faults RDC and RFC that we proposed are
realistic. The typical scenario for the RDC fault is that a firewall
administrator forgot to change the decision of a legacy rule that is not valid
anymore. The typical scenario for the RFC fault is that the firewall
administrator adds a new rule to a firewall policy, but the new rule
overwrites the rules below it and causes errors.
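As a tiny hypothetical illustration of this scenario (field values invented for this example),
the newly added rule R1 is broader than intended, so the more specific rule R2 below it
no longer matches any packet:

    # Hypothetical two-rule, first-match policy: the new, overly broad R1 shadows R2.
    policy = [
        {"name": "R1", "fields": {"dst_port": (0, 1023)}, "decision": "accept"},   # new rule, too broad
        {"name": "R2", "fields": {"dst_port": (23, 23)},  "decision": "discard"},  # now unreachable
    ]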
(2) In Figure 2, R4 is a faulty rule because we inject this fault by changing
its decision to be incorrect.
To Reviewer 5
(1) Firewall errors are mostly caused by incorrect rules. In paper "A
quantitative study of firewall configuration errors", by studying a large
number of real-life firewalls, Wool found that more than 90% of them have
errors in their rules.
(2) On the choices of test generation based on constraint solving, our
previous SRDS'08 paper observed that tests generated by solving each rule's
constraints individually can achieve coverage comparable to tests generated by
solving the combined constraints, but require a much lower analysis cost.
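As an illustrative sketch only (our SRDS'08 encoding is not reproduced here, and the
rule representation is hypothetical), solving each rule's constraints in isolation amounts
to picking one concrete value inside each of the rule's field intervals, ignoring the rules
above it:

    def packet_for_rule(rule):
        """Pick any concrete value from each field interval of the rule."""
        return {field: lo for field, (lo, hi) in rule["fields"].items()}

    def generate_tests(policy):
        """One test packet per rule; an earlier rule may still capture the packet first."""
        return [packet_for_rule(rule) for rule in policy]

This keeps analysis cost low, at the risk raised in the reviews: a packet generated for a
rule deep in the policy may be captured by an earlier rule and never reach it.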
(3) For the case mentioned by reviewer 5, the technique in Section 3.1 is
not applicable, and the technique in Section 3.2 is applied instead.
(4) We cannot use the anomaly models used in the papers [2,17] for two
reasons. First, there are often many anomalies in a firewall based on their
definition. Second, many such anomalies are not errors. For example, they
have defined a conflict between any two rules as an anomaly; however, such
a conflict is often not an error.
(5) Marmorstein's work does propose a method for identifying faulty rules;
however, their method is not systematic. Our proposed techniques have
three main advantages over Marmorstein's work. First, Marmorstein's work
misses all the potential faulty rules that we covered in the RFC fault
because they consider only the rules that match a failed packet whereas
we examine the rules above the rule that failed packets match. Second, we
reduced the number of possible faulty rules, while Marmorstein's work can
only find all the potential faulty rules that cover the failed packets. Third,
we ranked the potential f