Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle case mismatches when looking up env vars in the Config snapshot #11824

Merged
merged 7 commits into from
Mar 17, 2023

Conversation

kylematsuda
Copy link
Contributor

What does this PR try to resolve?

Fixes #11814.

Windows environment variables are case-insensitive, which causes problems when looking them up in the Config env snapshot.

This PR adds another member (case_insensitive_env) in Config that maps upper-cased keys to their original values in the env (for example, "PATH" => "Path"). If lookup in self.env fails, this PR converts the key to upper case and looks it up in self.case_insensitive_env to obtain the correct key name if it exists (on Windows only).

How should we test and review this PR?

Please see the new tests in testsuite/config.rs and testsuite/cargo_command.rs.

Additional information

Currently, this uses str::to_uppercase to uppercase the keys. This requires key to be valid UTF-8, and may disagree with how the OS uppercases things (see the link in this comment for details).

@rustbot
Copy link
Collaborator

rustbot commented Mar 10, 2023

r? @weihanglo

(rustbot has picked a reviewer for you, use r? to override)

@rustbot rustbot added A-configuration Area: cargo config files and env vars S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Mar 10, 2023
@weihanglo weihanglo added the beta-nominated Nominated to backport to the beta branch. label Mar 10, 2023
@ehuss
Copy link
Contributor

ehuss commented Mar 10, 2023

r? @ehuss

@rustbot rustbot assigned ehuss and unassigned weihanglo Mar 10, 2023
Copy link
Contributor

@ehuss ehuss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does get_env_str() and env_has_key() also need to be updated?

Comment on lines 278 to 280
// Only keep entries where both the key and value are valid UTF-8
.filter_map(|(k, v)| Some((k.to_str()?, v.to_str()?)))
.map(|(k, _)| (k.to_uppercase().replace("-", "_"), k.to_owned()))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you say why these lines changed? I would not expect any changes to normalized_env.

Though if you want to clean this up, it seems like it would be nicer to use .keys() instead of .iter().

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry about that, I meant to add a comment about this but forgot. I realized when I was looking back at #11727 that we might have accidentally subtly changed the behavior of normalized_env there.

Before #11727, the normalized_env (called upper_case_env there) was generated from self.env, which held pairs where both the key and value were required to be valid UTF-8. In #11727, upper_case_env started using all the keys that were valid UTF-8 (regardless of whether the value was also UTF-8), so in principle more keys might potentially be included.

Since I noticed it here, I put back the check that both the env key and env var are valid UTF-8 before using the key in normalized_env. I don't really have a sense of whether this small change is important. If you think that it's probably not, then I'd be glad to use .keys() instead, which I'd agree is more readable.

src/cargo/util/config/mod.rs Outdated Show resolved Hide resolved
@kylematsuda
Copy link
Contributor Author

Does get_env_str() and env_has_key() also need to be updated?

I think they don't need to be updated -- here's my reasoning:

Those two methods were added for convenience when self.env changed from HashMap<String, String> to HashMap<OsString, OsString>, and they replaced calls to self.env.get() and self.env.contains_key(). This means that these call sites were accessing data in self.env both before and after #11727. Therefore, if case mismatch is an issue with these, it would have already been an issue prior to #11727. From the perspective of these methods, the only difference after #11727 is that self.env might also contain a few items that aren't valid UTF-8.

On the other hand, adding config.get_env and config.get_env_os changed a bunch of calls to std::env::var(_os) to go through self.env instead, so it makes sense that those ones are now causing case mismatch issues.

Hopefully this makes sense, please let me know if anything is unclear or if I'm thinking about this wrong.

Copy link
Member

@weihanglo weihanglo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks reasonable to me!

Does it make sense that we make three env hashmaps (env, case_insensitive_env, and normalized_env) into a single struct, say Envs? The struct only exposes a limited API for Config, and all access to env must get through it.

That will be clearer what is accessible and what is not. We can also put that struct under its own module (config/mod.rs has grown too big 😆).

@kylematsuda
Copy link
Contributor Author

Does it make sense that we make three env hashmaps (env, case_insensitive_env, and normalized_env) into a single struct, say Envs? The struct only exposes a limited API for Config, and all access to env must get through it.

Sure! I agree that would be clearer, I'll give that a shot now.

@kylematsuda
Copy link
Contributor Author

Hi @weihanglo, I've implemented your suggestion in the most recent commit. Please take a look when you have a chance and let me know if you have any suggestions 😃

Copy link
Member

@weihanglo weihanglo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great. Thank you for extracting them as a struct! For other issues let's wait for Eric's feedback :)

@kylematsuda
Copy link
Contributor Author

Sounds great, thanks for taking a look!

Copy link
Member

@weihanglo weihanglo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I read through this again. Seems good to merge.

Regarding UTF-8 validity of env value #11824 (comment), we just go back in the day before #11727 so should be no problem at all I guess. We can lift the restriction if needed after.

@ehuss, do you have any other thought on this?

Copy link
Contributor

@ehuss ehuss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! I like wrapping things in the Env struct.

.collect();
let normalized_env = env
.iter()
// Only keep entries where both the key and value are valid UTF-8
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this comment also mention why it is skipping over non-utf-8 values? Perhaps something like "Because the config env vars only support utf-8, this needs to be kept in sync with that, otherwise the normalized map warning could incorrectly warn about entries that can't be read by the config system." Or something shorter if you can word it better.

Comment on lines +126 to +127
/// This is intended for use in private methods of `Config`,
/// and does not check for env key case mismatch.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this include a comment explaining why it is case-sensitive?

I was looking over the history, and it looks like this was an unintended regression in 1.28 via #5552. I think I somehow forgot about windows case sensitivity at that time, and have been confused on a few occasions (like #9169) due to the fact that Command used to force keys to be uppercase on Windows (fixed in 1.55 via rust-lang/rust#85270). When launching cargo from within cargo's testsuite, it used Command which ended up mucking with the environment keys.

I don't think this needs to be fixed here in this PR. I don't think anyone has complained about this since then, so maybe it's best just to leave it.

So, a comment could explain something like:

This is case-sensitive on Windows (even though Windows is usually case-insensitive) due to an unintended regression in 1.28 (via #5552). This should only affect keys used for cargo's config-system env variables (CARGO_ prefixed ones) which are currently all uppercase. We may want to consider rectifying it if users report issues. One thing that adds a wrinkle here is the unstable advanced-env option which requires case-sensitive keys.

Do not use this for any other purposes. Use [Env::get_env_os] or [Env::get_env] instead, which properly handle case insensitivity on Windows.

@kylematsuda kylematsuda force-pushed the windows-config-env-fix branch from ae52cb2 to 7af55d8 Compare March 16, 2023 21:15
@kylematsuda
Copy link
Contributor Author

Thanks for the helpful feedback and context! I added those comments you suggested in the latest commit.

I also added a bit more documentation on Env itself to try to provide more context. I think something that has been confusing me is that environment variables can be accessed through two different APIs on Config: Config::get_env[_os] or Config::get (for CARGO_ prefixed keys). The Env::case_insensitive_env is only used in the former; Env::normalized_env is only used in the latter. So I tried to point this out explicitly in the docstrings for Env and its fields. Please let me know if you think it's confusing/too much/etc., I'm happy to remove it if it's not helpful.

@weihanglo
Copy link
Member

Thank you for the write-up! We have really wanted to document every design rationale to help ourselves and contributors.

BTW, CI is currently blocked. We are waiting for the next nightly release.

@weihanglo
Copy link
Member

Talked to ehuss and we both agree it is pretty good to merge!

@bors r+

@bors
Copy link
Contributor

bors commented Mar 17, 2023

📌 Commit 7af55d8 has been approved by weihanglo

It is now in the queue for this repository.

@bors bors removed the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Mar 17, 2023
@bors bors added the S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. label Mar 17, 2023
@ehuss
Copy link
Contributor

ehuss commented Mar 17, 2023

Thanks! It is much appreciated.

@bors r+

@bors
Copy link
Contributor

bors commented Mar 17, 2023

💡 This pull request was already approved, no need to approve it again.

@bors
Copy link
Contributor

bors commented Mar 17, 2023

📌 Commit 7af55d8 has been approved by ehuss

It is now in the queue for this repository.

@bors
Copy link
Contributor

bors commented Mar 17, 2023

⌛ Testing commit 7af55d8 with merge 6270d15...

bors added a commit that referenced this pull request Mar 17, 2023
Handle case mismatches when looking up env vars in the Config snapshot

### What does this PR try to resolve?

Fixes #11814.

Windows environment variables are case-insensitive, which causes problems when looking them up in the `Config` env snapshot.

This PR adds another member (`case_insensitive_env`) in `Config` that maps upper-cased keys to their original values in the env (for example, `"PATH" => "Path"`). If lookup in `self.env` fails, this PR converts the key to upper case and looks it up in `self.case_insensitive_env` to obtain the correct key name if it exists (on Windows only).

### How should we test and review this PR?

Please see the new tests in `testsuite/config.rs` and `testsuite/cargo_command.rs`.

### Additional information

Currently, this uses `str::to_uppercase` to uppercase the keys. This requires key to be valid UTF-8, and may disagree with how the OS uppercases things (see the link in [this comment](#11814 (comment)) for details).
@bors
Copy link
Contributor

bors commented Mar 17, 2023

⌛ Testing commit 7af55d8 with merge c9faf70...

@bors
Copy link
Contributor

bors commented Mar 17, 2023

☀️ Test successful - checks-actions
Approved by: ehuss
Pushing c9faf70 to master...

@bors bors merged commit c9faf70 into rust-lang:master Mar 17, 2023
weihanglo pushed a commit to weihanglo/cargo that referenced this pull request Mar 17, 2023
…=ehuss

Handle case mismatches when looking up env vars in the Config snapshot

### What does this PR try to resolve?

Fixes rust-lang#11814.

Windows environment variables are case-insensitive, which causes problems when looking them up in the `Config` env snapshot.

This PR adds another member (`case_insensitive_env`) in `Config` that maps upper-cased keys to their original values in the env (for example, `"PATH" => "Path"`). If lookup in `self.env` fails, this PR converts the key to upper case and looks it up in `self.case_insensitive_env` to obtain the correct key name if it exists (on Windows only).

### How should we test and review this PR?

Please see the new tests in `testsuite/config.rs` and `testsuite/cargo_command.rs`.

### Additional information

Currently, this uses `str::to_uppercase` to uppercase the keys. This requires key to be valid UTF-8, and may disagree with how the OS uppercases things (see the link in [this comment](rust-lang#11814 (comment)) for details).
bors added a commit that referenced this pull request Mar 17, 2023
[beta-1.69] backport #11824

Beta backports:

- #11824

In order to make CI pass, the following PRs are also cherry-picked:

- —
bors added a commit to rust-lang-ci/rust that referenced this pull request Mar 19, 2023
…nglo

[beta-1.69] cargo beta backports

3 commits in 9880b408a3af50c08fab3dbf4aa2a972df71e951..7b18c85808a6b45ec8364bf730617b6f13e0f9f8
2023-02-28 19:39:39 +0000 to 2023-03-17 12:29:33 +0000
- [beta-1.69] backport rust-lang/cargo#11824 (rust-lang/cargo#11863)
- [beta-1.69] backport rust-lang/cargo#11820 (rust-lang/cargo#11823)
- chore: Backport rust-lang/cargo#11630 to `1.69.0` (rust-lang/cargo#11806)

r? `@ghost`
bors added a commit to rust-lang-ci/rust that referenced this pull request Mar 22, 2023
Update cargo

11 commits in 4a3c588b1f0a8e2dc8dd8789dbf3b6a71b02ed49..15d090969743630bff549a1b068bcaa8174e5ee3
2023-03-14 14:05:36 +0000 to 2023-03-21 17:54:28 +0000
- docs(contrib): Move higher level resolver docs into doc comments (rust-lang/cargo#11870)
- docs(contrib): Pull impl info out of architecture (rust-lang/cargo#11869)
- Update curl-sys (rust-lang/cargo#11871)
- Poll loop fixes (rust-lang/cargo#11624)
- clippy: warn `disallowed_methods` for `std::env::var` and friends (rust-lang/cargo#11828)
- Add --ignore-rust-version flag to cargo install (rust-lang/cargo#11859)
- Handle case mismatches when looking up env vars in the Config snapshot (rust-lang/cargo#11824)
- align semantics of generated vcs ignore files (rust-lang/cargo#11855)
- Add more information to wait-for-publish (rust-lang/cargo#11713)
- docs: Address warnings (rust-lang/cargo#11856)
- docs(contrib): Create a file overview in the nightly docs (rust-lang/cargo#11850)
RalfJung pushed a commit to RalfJung/miri that referenced this pull request Mar 22, 2023
Update cargo

11 commits in 4a3c588b1f0a8e2dc8dd8789dbf3b6a71b02ed49..15d090969743630bff549a1b068bcaa8174e5ee3
2023-03-14 14:05:36 +0000 to 2023-03-21 17:54:28 +0000
- docs(contrib): Move higher level resolver docs into doc comments (rust-lang/cargo#11870)
- docs(contrib): Pull impl info out of architecture (rust-lang/cargo#11869)
- Update curl-sys (rust-lang/cargo#11871)
- Poll loop fixes (rust-lang/cargo#11624)
- clippy: warn `disallowed_methods` for `std::env::var` and friends (rust-lang/cargo#11828)
- Add --ignore-rust-version flag to cargo install (rust-lang/cargo#11859)
- Handle case mismatches when looking up env vars in the Config snapshot (rust-lang/cargo#11824)
- align semantics of generated vcs ignore files (rust-lang/cargo#11855)
- Add more information to wait-for-publish (rust-lang/cargo#11713)
- docs: Address warnings (rust-lang/cargo#11856)
- docs(contrib): Create a file overview in the nightly docs (rust-lang/cargo#11850)
@ehuss ehuss added this to the 1.70.0 milestone Mar 26, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-configuration Area: cargo config files and env vars beta-nominated Nominated to backport to the beta branch. S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Config env snapshot doesn't handle case-insensitive env vars correctly on windows
5 participants