Dataflow: update fieldFlowBranchLimit semantics #15599

aschackmull · 2024-02-13T13:12:39Z

This makes two changes to the fieldFlowBranchLimit interpretation:

- The count is adjusted to properly count virtual dispatch instead of nodes. This will block less flow and hence result in more computation and more alerts - hopefully fixing some FNs.
- The blocking condition on return edges is changed to only care about virtual dispatch count and not the number of call sites. This will block more flow and hopefully reduce FPs and performance problems based on uncertain dispatch. This has anecdotally been identified as the core issue in a couple of poorly performing cases.

For the return edge condition, special care is taken to still follow call edges that are determined by the call context.

All qltests are also updated to use the default fieldFlowBranchLimit instead of an inflated value to better reflect what's actually used in queries.
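
For illustration, an inflated limit in a test configuration might look roughly like the sketch below. This is not taken from the PR: the module names `ExampleConfig` and `ExampleFlow`, the `source`/`sink` method names, and the value `1000` are assumptions made up for the example. It also assumes the module-based `DataFlow::ConfigSig` API for Java, where `fieldFlowBranchLimit` defaults to 2 as far as I know.

```ql
import java
import semmle.code.java.dataflow.DataFlow

// Hypothetical test configuration; names and values are illustrative only.
module ExampleConfig implements DataFlow::ConfigSig {
  // Assumed test convention: calls to a method named `source` produce tracked values.
  predicate isSource(DataFlow::Node source) {
    source.asExpr().(MethodCall).getMethod().hasName("source")
  }

  // Assumed test convention: arguments of calls to a method named `sink` are sinks.
  predicate isSink(DataFlow::Node sink) {
    exists(MethodCall mc |
      mc.getMethod().hasName("sink") and
      sink.asExpr() = mc.getAnArgument()
    )
  }

  // An inflated limit of the kind the tests used to set. Deleting this
  // predicate makes the configuration fall back to the library default.
  int fieldFlowBranchLimit() { result = 1000 }
}

module ExampleFlow = DataFlow::Global<ExampleConfig>;

from DataFlow::Node source, DataFlow::Node sink
where ExampleFlow::flow(source, sink)
select source, sink
```

With the override in place, a test can report flow that a production query running with the default limit would not find, which is the mismatch the test update is meant to remove.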

MathiasVP · 2024-04-19T10:06:29Z

cpp/ql/lib/semmle/code/cpp/ir/dataflow/internal/DataFlowPrivate.qll

+  result = n.asInstruction() or
+  result = n.asOperand().getUse() or
+  result = n.(SsaPhiNode).getPhiNode().getBasicBlock().getFirstInstruction() or
+  n.(IndirectInstruction).hasInstructionAndIndirectionIndex(result, _) or


`n.hasInstructionAndIndirectionIndex(instr, index)` may actually hold for multiple `(instr, index)` pairs, so this predicate really should be called `getAnInstruction`. Additionally, this seems to miss a case for `IndirectOperand`s, which I guess means that slightly fewer nodes than expected will have a second-level scope. However, since the shared library handles missing second-level scopes, it's probably not a big deal.

The C/C++ team will probably fix this once this PR has been merged 👍 There are some more follow-ups that we want to do, and I'll write these up as an internal C/C++ issue.

hvitved

LGTM, one question.

hvitved · 2024-04-22T09:26:23Z

shared/dataflow/codeql/dataflow/internal/DataFlowImpl.qll

+    private int ctxDispatchFanoutOnReturn(NodeEx out, DataFlowCall ctx) {
+      exists(DataFlowCall call, DataFlowCallable c |
+        simpleDispatchFanoutOnReturn(call, out) > 1 and
+        not Stage1::revFlow(out, false) and


Why is this restriction needed?

hvitved · 2024-04-22T09:29:01Z

shared/dataflow/codeql/dataflow/internal/DataFlowImpl.qll

+        simpleDispatchFanoutOnReturn(call, out) > 1 and
+        not Stage1::revFlow(out, false) and
+        call.getEnclosingCallable() = c and
+        returnCallEdge1(c, _, ctx, _) and


So we are only considering contexts ctx that we also return to; why is that? (and I guess that answers my question above).

aschackmull · 2024-04-22

This ensures that a call-context is always associated with the dispatch fanout at `call`. Either this is nested flow-through, in which case the context will be active in the initial flow-in when entering `call`, or this is returning flow that didn't come from a parameter, in which case the return-call-context will be the thing that reduces the eventual fanout.

This does miss the case where we enter the surrounding scope with a call-context and flow through at `call` without returning further, but I think that's ok: for the "MaD model does lambda callback" case this would only be problematic if there were two lambdas being passed in, with flow through the first and then entering the second. If we used the dual constraint `not fwdFlow(out, false)` to ensure a call-context from a parameter instead, then we'd miss the important case where a source is inside the lambda callback. So to cover this case we'd need a separate range on contexts to count, and I don't think that's worth the effort.

github-actions bot added DataFlow Library Java labels Feb 13, 2024

aschackmull force-pushed the dataflow/fieldflowbranchlimit-v2 branch from 0844619 to 912c8fa Compare February 21, 2024 11:07

github-actions bot added C# Swift C++ Python Go Ruby labels Feb 21, 2024

aschackmull force-pushed the dataflow/fieldflowbranchlimit-v2 branch from 15a07b2 to 26c3de5 Compare February 26, 2024 10:04

aschackmull mentioned this pull request Feb 26, 2024

Wip: test changes to fieldflowbranchlimit semantics #10025

Closed

aschackmull changed the title ~~Dataflow: wip test of fieldflowbranchlimit adjustment~~ Dataflow: update fieldFlowBranchLimit semantics Feb 26, 2024

aschackmull force-pushed the dataflow/fieldflowbranchlimit-v2 branch from 26c3de5 to a6419dc Compare February 26, 2024 13:37

github-actions bot removed Python Go Ruby labels Feb 26, 2024

aschackmull force-pushed the dataflow/fieldflowbranchlimit-v2 branch from bdfd86b to 3abae04 Compare March 5, 2024 13:10

github-actions bot added Python Go Ruby labels Mar 5, 2024

aschackmull force-pushed the dataflow/fieldflowbranchlimit-v2 branch 2 times, most recently from db93e59 to 3a7d795 Compare March 7, 2024 11:02

MathiasVP mentioned this pull request Mar 11, 2024

C++: Clean up cpp/non-constant-format #15875

Merged

aschackmull force-pushed the dataflow/fieldflowbranchlimit-v2 branch from 0adaa99 to d6338b9 Compare April 11, 2024 09:01

aschackmull added 6 commits April 15, 2024 15:12

Dataflow: Adjust fieldFlowBranchLimit count (block less) and adjust return edge condition (block more)

82afbbc

Dataflow: Simplify branch and join.

f945687

Dataflow: Use default fieldFlowBranchLimit in qltests.

b87b832

C++: Update qltest.

9e39be5

C++: Count return dispatch based on 2nd level scopes.

db6d27b

Dataflow: Add dummy DataFlowSecondLevelScope implementations.

2f0987e

These could be an empty type, but Unit was available and it probably doesn't matter.

Java: Count second level scopes for fieldFlowBranchLimit.

3c69f8f

aschackmull force-pushed the dataflow/fieldflowbranchlimit-v2 branch from d6338b9 to 3c69f8f Compare April 15, 2024 13:18

Dataflow: Add change note.

5950149

github-actions bot added the documentation label Apr 19, 2024

aschackmull marked this pull request as ready for review April 19, 2024 06:46

aschackmull requested review from a team as code owners April 19, 2024 06:46

owen-mc approved these changes Apr 19, 2024

MathiasVP reviewed Apr 19, 2024

hvitved reviewed Apr 22, 2024

hvitved approved these changes Apr 23, 2024

aschackmull merged commit b2f0994 into github:main Apr 23, 2024
55 checks passed

aschackmull deleted the dataflow/fieldflowbranchlimit-v2 branch April 23, 2024 08:08

MathiasVP mentioned this pull request Apr 23, 2024

C++: fieldFlowBranchLimit follow-up (1) #16302

Merged

hvitved mentioned this pull request Apr 24, 2024

Data flow: Fix bad join #16313

Merged

MathiasVP added a commit to MathiasVP/ql that referenced this pull request May 1, 2024

Revert "Merge pull request github#15599 from aschackmull/dataflow/fieldflowbranchlimit-v2"

b3d83be

This reverts commit b2f0994, reversing changes made to 19974f0.
