
Rust: Data flow improvements to unlock flow in sqlx test #18291

Merged (9 commits), Dec 18, 2024
24 changes: 22 additions & 2 deletions rust/ql/lib/codeql/rust/dataflow/internal/DataFlowImpl.qll
@@ -712,6 +712,11 @@ private class CapturedVariableContent extends Content, TCapturedVariableContent
override string toString() { result = "captured " + v }
}

/** A value refered to by a reference. */
Contributor (marked as fixed):
Typo: "refered" should be "referred".

final class ReferenceContent extends Content, TReferenceContent {
override string toString() { result = "&ref" }
}

/**
* An element in an array.
*/
@@ -1040,6 +1045,13 @@ module RustDataFlow implements InputSig<Location> {
["crate::option::Option::Some", "crate::result::Result::Ok"]
)
or
exists(PrefixExprCfgNode deref |
c instanceof ReferenceContent and
deref.getOperatorName() = "*" and
node1.asExpr() = deref.getExpr() and
node2.asExpr() = deref
)
or
VariableCapture::readStep(node1, c, node2)
)
or
@@ -1123,6 +1135,12 @@ module RustDataFlow implements InputSig<Location> {
node2.(PostUpdateNode).getPreUpdateNode().asExpr() = index.getBase()
)
or
exists(RefExprCfgNode ref |
c instanceof ReferenceContent and
node1.asExpr() = ref.getExpr() and
node2.asExpr() = ref
)
or
VariableCapture::storeStep(node1, c, node2)
)
or
@@ -1382,7 +1400,8 @@ private module Cached {
e =
[
any(IndexExprCfgNode i).getBase(), any(FieldExprCfgNode access).getExpr(),
any(TryExprCfgNode try).getExpr()
any(TryExprCfgNode try).getExpr(),
any(PrefixExprCfgNode pe | pe.getOperatorName() = "*").getExpr()
]
} or
TSsaNode(SsaImpl::DataFlowIntegration::SsaNode node) or
@@ -1482,7 +1501,8 @@ private module Cached {
TStructFieldContent(StructCanonicalPath s, string field) {
field = s.getStruct().getFieldList().(RecordFieldList).getAField().getName().getText()
} or
TCapturedVariableContent(VariableCapture::CapturedVariable v)
TCapturedVariableContent(VariableCapture::CapturedVariable v) or
TReferenceContent()

cached
newtype TContentSet = TSingletonContentSet(Content c)
@@ -46,6 +46,8 @@ module RustTaintTracking implements InputSig<Location, RustDataFlow> {
RustDataFlow::readStep(pred, cs, succ) and
cs.getContent() instanceof ArrayElementContent
)
or
pred.asExpr() = succ.asExpr().(RefExprCfgNode).getExpr()
Contributor:
Checking my understanding: when you take a reference &f, you get data flow from f to the ReferenceContent of &f, and you get taint flow from f to &f without content?

What sorts of cases do we need the contentless taint flow for?
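In Rust terms, the two new data-flow steps line up with the borrow and dereference operators: the store step on `RefExprCfgNode` puts `f` into the `ReferenceContent` of `&f`, and the read step on a `*` `PrefixExprCfgNode` takes it back out. A minimal sketch of the source pattern (the function names are hypothetical, not taken from the PR's tests), annotated with the step each line should correspond to:

```rust
/// Hypothetical example: a value goes into a reference with `&`
/// and comes back out with `*`.
fn through_reference(source: i64) -> i64 {
    let f = source;
    let r = &f; // store step: `f` -> ReferenceContent of `&f`
    *r // read step: ReferenceContent of `r` -> `*r`
}

/// Nested references stack the content: `&r` stores a value that already
/// carries ReferenceContent, so reaching the payload takes one read per `*`.
fn through_two_references(source: i64) -> i64 {
    let f = source;
    let r = &f;
    let rr = &r;
    **rr
}
```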

Contributor (PR author):
Yes, that is right. I added the taint flow to support this line in the SQL injection test:

let unsafe_query_1 = String::from("SELECT * FROM people WHERE firstname='") + &remote_string + "'";

Here remote_string is tainted, and the extra taint step makes unsafe_query_1 tainted as well. One could argue that the reference itself isn't really tainted, but on the other hand the only thing it can be used for is to access tainted data, and it seemed like a simple way to unlock some additional flow. Alternatively, we could extend the handling of + to read ReferenceContent as well?
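For context, the right-hand operand of the first + in that test line is a reference: `String` implements `Add<&str>`, and `&remote_string` (a `&String`) is deref-coerced to `&str`. So data from `remote_string` can only reach `unsafe_query_1` by coming out of the `ReferenceContent` of `&remote_string`, either through a contentless taint step on the borrow or through a model of + that reads that content, which are the two options discussed in this thread. A self-contained sketch of the same expression (the function name `build_query` is hypothetical):

```rust
/// Hypothetical helper reproducing the shape of the sqlx test line.
/// `String + &str` consumes the left `String` and appends the borrowed
/// right-hand side; `&remote_string` is a `&String` that deref-coerces to `&str`.
fn build_query(remote_string: String) -> String {
    let unsafe_query_1 =
        String::from("SELECT * FROM people WHERE firstname='") + &remote_string + "'";
    unsafe_query_1
}
```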

Contributor:
My intuition is that having + read the ReferenceContent is more accurate, but I'm worried this will be a can of worms if we go this way. So I guess we should probably leave it the way it is.

@hvitved do you have an opinion on this?

Contributor:
Modelling store steps as also taint steps has proven bad in the past, so I think it would be better to provide a taint flow summary for + which pops ReferenceContent.

Contributor (PR author):
What would be the best way to do that for a built-in operator?

Contributor:
I still think we should revert this.

Contributor (PR author):
> Modelling store steps as also taint steps has proven bad in the past

Re. this, we also do that right now for arrays (which was inspired by Ruby). Do we want to remove that as well (later)?

hvitved (Contributor), Dec 18, 2024:
Hopefully we only add taint steps for reads out of arrays, and not for stores into arrays?

Contributor (PR author):
Yes, that's right. Got it: taint steps for read steps are fine, but taint steps for store steps are not.

Contributor (PR author):
> I still think we should revert this.

Done 👍

)
or
FlowSummaryImpl::Private::Steps::summaryLocalStep(pred.(Node::FlowSummaryNode).getSummaryNode(),
@@ -59,7 +61,10 @@ module RustTaintTracking implements InputSig<Location, RustDataFlow> {
bindingset[node]
predicate defaultImplicitTaintRead(Node::Node node, ContentSet cs) {
exists(node) and
cs.(SingletonContentSet).getContent() instanceof ArrayElementContent
exists(Content c | c = cs.(SingletonContentSet).getContent() |
c instanceof ArrayElementContent or
c instanceof ReferenceContent
)
}

/**
1 change: 0 additions & 1 deletion rust/ql/lib/codeql/rust/elements/internal/VariableImpl.qll
@@ -484,7 +484,6 @@ module Impl {
class VariableReadAccess extends VariableAccess {
VariableReadAccess() {
not this instanceof VariableWriteAccess and
not this = any(RefExpr re).getExpr() and
Contributor:
I think it may be better to only consider these reads for the SSA library. It should be enough to change certain = false to certain = true here.

Contributor (PR author):
Done. I also had to handle RefExpr in variableReadActual.

not this = any(CompoundAssignmentExpr cae).getLhs()
}
}
6 changes: 6 additions & 0 deletions rust/ql/lib/codeql/rust/frameworks/reqwest.model.yml
@@ -0,0 +1,6 @@
extensions:
- addsTo:
pack: codeql/rust-all
extensible: summaryModel
data:
- ["repo:https://github.com/seanmonstar/reqwest:reqwest", "<crate::blocking::response::Response>::text", "Argument[self]", "ReturnValue", "taint", "manual"]
Contributor:
I believe it should be ReturnValue.Variant[crate::result::Result::Ok(0)].
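The suggestion follows from the return type: reqwest's blocking `Response::text(self)` returns a `Result<String>`, so the response body sits inside the `Ok` payload rather than being the `Result` as a whole. The sketch below uses a hypothetical stand-in (the `Response` struct and its UTF-8 error type are invented for illustration and are NOT reqwest's API) to show where the data ends up:

```rust
use std::string::FromUtf8Error;

/// Hypothetical stand-in for an HTTP response; this is not reqwest's type.
struct Response {
    body: Vec<u8>,
}

impl Response {
    /// Mirrors the shape of `text(self) -> Result<String>`: on success the
    /// body is returned inside `Ok(..)`, i.e. at
    /// `ReturnValue.Variant[crate::result::Result::Ok(0)]`.
    fn text(self) -> Result<String, FromUtf8Error> {
        String::from_utf8(self.body)
    }
}
```

Targeting the `Ok` payload also lets the existing value summary for `<crate::result::Result>::unwrap` (from `Argument[self].Variant[crate::result::Result::Ok(0)]` to `ReturnValue`) carry the body through a call such as `response.text().unwrap()` without any extra `taint` model.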

13 changes: 13 additions & 0 deletions rust/ql/lib/codeql/rust/frameworks/stdlib/lang-core.model.yml
@@ -3,4 +3,17 @@ extensions:
pack: codeql/rust-all
extensible: summaryModel
data:
# Option
- ["lang:core", "<crate::option::Option>::unwrap", "Argument[self].Variant[crate::option::Option::Some(0)]", "ReturnValue", "value", "manual"]
- ["lang:core", "<crate::option::Option>::unwrap", "Argument[self]", "ReturnValue", "taint", "manual"]
Contributor:
None of these taint models should be needed after altering the summary above.

Contributor (PR author):
I've moved one of them. But some of our sources specify taint on the entire Result, so I think it'd be fine to keep the others until that is no longer the case.

Contributor:
I'd rather we remove these lines and not have flow for now; we should soon be able to have it once #18298 lands. Otherwise I fear we'll forget to remove these lines.

- ["lang:core", "<crate::option::Option>::unwrap_or", "Argument[self].Variant[crate::option::Option::Some(0)]", "ReturnValue", "value", "manual"]
- ["lang:core", "<crate::option::Option>::unwrap_or", "Argument[0]", "ReturnValue", "value", "manual"]
- ["lang:core", "<crate::option::Option>::unwrap_or", "Argument[self]", "ReturnValue", "taint", "manual"]
# Result
- ["lang:core", "<crate::result::Result>::unwrap", "Argument[self].Variant[crate::result::Result::Ok(0)]", "ReturnValue", "value", "manual"]
- ["lang:core", "<crate::result::Result>::unwrap", "Argument[self]", "ReturnValue", "taint", "manual"]
- ["lang:core", "<crate::result::Result>::unwrap_or", "Argument[self].Variant[crate::result::Result::Ok(0)]", "ReturnValue", "value", "manual"]
- ["lang:core", "<crate::result::Result>::unwrap_or", "Argument[0]", "ReturnValue", "value", "manual"]
- ["lang:core", "<crate::result::Result>::unwrap_or", "Argument[self]", "ReturnValue", "taint", "manual"]
# String
- ["lang:alloc", "<crate::string::String>::as_str", "Argument[self]", "ReturnValue", "taint", "manual"]
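The `value` models above mirror the runtime behaviour of these methods: `unwrap` returns the payload of `Some(0)` or `Ok(0)`, and `unwrap_or` returns either that payload or its first argument (`Argument[0]`). A small illustrative check of those semantics follows; the helper names are hypothetical, and the comments name the model row each call corresponds to:

```rust
/// Option: `unwrap` and `unwrap_or` (Option<i32> is Copy, so it can be reused).
fn option_cases() -> (i32, i32, i32) {
    let some: Option<i32> = Some(1);
    let none: Option<i32> = None;
    (
        some.unwrap(),     // Argument[self].Variant[Option::Some(0)] -> ReturnValue
        some.unwrap_or(9), // the Some(0) payload wins over Argument[0]
        none.unwrap_or(9), // Argument[0] -> ReturnValue
    )
}

/// Result: same shape, with the Ok(0) payload.
fn result_cases() -> (i32, i32, i32) {
    let ok: Result<i32, String> = Ok(2);
    let err: Result<i32, String> = Err(String::from("e"));
    (
        ok.clone().unwrap(), // Argument[self].Variant[Result::Ok(0)] -> ReturnValue
        ok.unwrap_or(8),     // the Ok(0) payload wins over Argument[0]
        err.unwrap_or(8),    // Argument[0] -> ReturnValue
    )
}

/// String::as_str: modelled above as a taint step from Argument[self] to ReturnValue.
fn string_case(s: &String) -> &str {
    s.as_str()
}
```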
@@ -1,2 +1,2 @@
identityLocalStep
| main.rs:404:7:404:18 | phi(default_name) | Node steps to itself |
| main.rs:394:7:394:18 | phi(default_name) | Node steps to itself |