Replace uppercase chars to make behaviour consistent in RequestHandler - DefaultCharReplacements #12850

Merged

Conversation

bjarnef
Contributor

@bjarnef bjarnef commented Aug 16, 2022

Prerequisites

  • I have added steps to test this contribution in the description below

If there's an existing issue for this PR then this fixes #12798

Description

Currently there seems to be some inconsistent behaviour in char replacements for URLs.
In older versions of Umbraco I recall that æ, ø and å were replaced, but I think that later on it only happened for lowercase chars.

I think if a char is replaced by default, it should be consistent where char replacements are handled.
In a new project on v10 I noticed that Æ and æ were both replaced by ae by default, but Ø and Å were not replaced.

Furthermore, oe should give a more SEO-friendly URL than o, e.g. for a page "Søg" ("Search" in English).
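To make the inconsistency concrete, here is a minimal Python sketch. It is not Umbraco's code: the table contents, `with_uppercase` and `replace_chars` are hypothetical names and values chosen for illustration. It shows how a lowercase-only replacement table leaves uppercase chars untouched, and how adding the uppercase variants makes the result consistent.

```python
# Hypothetical lowercase-only table; the real default list may differ.
LOWERCASE_ONLY = {"æ": "ae", "ø": "oe", "å": "aa"}


def with_uppercase(table):
    """Return a copy of the table that also maps each uppercase variant."""
    extended = dict(table)
    for char, replacement in table.items():
        # Keep an explicit uppercase entry if one is already defined.
        extended.setdefault(char.upper(), replacement)
    return extended


def replace_chars(text, table):
    """Replace every char found in the table, leave all others as they are."""
    return "".join(table.get(ch, ch) for ch in text)


print(replace_chars("Søg", LOWERCASE_ONLY))                    # Soeg
print(replace_chars("Århus", LOWERCASE_ONLY))                  # Århus (not replaced)
print(replace_chars("Århus", with_uppercase(LOWERCASE_ONLY)))  # aarhus
```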

This is probably a breaking change, so it would probably only be relevant for a minor or major version.

@github-actions

github-actions bot commented Aug 16, 2022

Hi there @bjarnef, thank you for this contribution! 👍

While we wait for one of the Core Collaborators team to have a look at your work, we wanted to let you know that we have a checklist for some of the things we will consider during review:

  • It's clear what problem this is solving, there's a connected issue or a description of what the changes do and how to test them
  • The automated tests all pass (see "Checks" tab on this PR)
  • The level of security for this contribution is the same or improved
  • The level of performance for this contribution is the same or improved
  • Avoids creating breaking changes; note that behavioral changes might also be perceived as breaking
  • If this is a new feature, Umbraco HQ provided guidance on the implementation beforehand
  • 💡 The contribution looks original and the contributor is presumably allowed to share it

Don't worry if you got something wrong. We like to think of a pull request as the start of a conversation, we're happy to provide guidance on improving your contribution.

If you realize that you might want to make some changes then you can do that by adding new commits to the branch you created for this work and pushing new commits. They should then automatically show up as updates to this pull request.

Thanks, from your friendly Umbraco GitHub bot 🤖 🙂

@mikecp
Contributor

mikecp commented Aug 27, 2022

Thanks @bjarnef for this update, it makes the replacements more consistent indeed 👍

Cheers!

@mikecp mikecp merged commit 68cf801 into umbraco:v10/contrib Aug 27, 2022
@bjarnef bjarnef deleted the v10/feature/replace-uppercase-chars branch August 27, 2022 14:49
@nul800sebastiaan
Member

nul800sebastiaan commented Sep 8, 2022

Unfortunately this causes a breaking change when URL tracking is disabled:

  • If you have disabled URL tracking and have no other URL tracking package / custom code
  • And you save a node with upper case Å, Æ, Ø, Ä, Ü or Ö in it
  • The existing URL will change, with no redirects to the new URL so people who have saved the old URL will get a 404

I've reverted this commit in 5f42cf0 and cherry-picked it to v11 instead in 68ff7b2.
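The scenario above can be sketched in Python as follows. This is a simplified illustration, not Umbraco's implementation: `OLD_TABLE`, `NEW_TABLE`, `to_segment` and `changed_urls` are hypothetical, and the old behaviour for uppercase chars is assumed to be "leave as is". It lists the URL segments that change once the new table applies, which are exactly the ones that would need a redirect.

```python
# Hypothetical tables: NEW_TABLE adds the uppercase variants to OLD_TABLE.
OLD_TABLE = {" ": "-", "æ": "ae", "ø": "oe", "å": "aa"}
NEW_TABLE = {**OLD_TABLE, "Æ": "ae", "Ø": "oe", "Å": "aa"}


def to_segment(name, table):
    """Simplified URL segment: apply the table, then lowercase the result."""
    replaced = "".join(table.get(ch, ch) for ch in name)
    return replaced.lower()


def changed_urls(names, old_table, new_table):
    """Map old segment -> new segment for every node whose URL would change."""
    changes = {}
    for name in names:
        old = to_segment(name, old_table)
        new = to_segment(name, new_table)
        if old != new:
            changes[old] = new
    return changes


# Only the node with an uppercase Å gets a new URL on its next save.
print(changed_urls(["Århus", "Søg", "Om os"], OLD_TABLE, NEW_TABLE))
# {'århus': 'aarhus'}
```

Without URL tracking, nothing records the entries in that mapping, so requests for the old segment end in a 404.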

@nul800sebastiaan
Member

I also find it weird that Ë and Ï aren't covered by default, by the way. This list is still pretty random. Also, why are most special chars blanked out while + turns into plus and * turns into star? By that logic & should become ampersand.

Just thinking out loud here, we should probably have better defaults in general.

@nul800sebastiaan
Member

Documented this change in our announcements umbraco/Announcements#4

Normally we wouldn't announce a behavioral change but since it can lead to quite the SEO headaches if you're not aware of this, we decided to document it in the Announcements repo this time.

@bjarnef
Contributor Author

bjarnef commented Sep 8, 2022

@nul800sebastiaan not sure whether + should turn into plus, * into star and & into ampersand. It might make sense in English, but probably not in other cultures.

Not sure what the best practice is for replacing these chars, i.e. whether they should be replaced by an empty string, a hyphen or something else.

@mikecp
Contributor

mikecp commented Sep 8, 2022

Just FYI @nul800sebastiaan, this PR only introduced the replacement of the capital letters; the rest of the replacements were already there. But indeed, it might be worth reconsidering the list as a whole, especially if it is introduced as a breaking change in V11 anyway.
Also, I did not dig further, but I noticed that some special characters in French are replaced by some other logic. E.g. "ç" becomes "c" but does not appear in the list used here...

@nul800sebastiaan
Member

@bjarnef Yeah I know, it's all a bit arbitrary, I personally would probably just turn them into dashes, and then filter the whole URL one more time to remove duplicate dashes (so that we don't get about----us but about-us, for example).
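A minimal Python sketch of that idea, under stated assumptions: the `SPECIAL` set and `to_dashed_segment` are hypothetical and only illustrate the "turn into dashes, then collapse duplicates" approach, not any existing Umbraco behaviour.

```python
import re

# Hypothetical set of special chars to turn into dashes.
SPECIAL = set("+*&?#%")


def to_dashed_segment(name):
    """Turn special chars and whitespace into dashes, then collapse repeats."""
    dashed = "".join("-" if ch in SPECIAL or ch.isspace() else ch for ch in name)
    # Second pass: collapse runs of dashes and trim them from both ends.
    collapsed = re.sub(r"-{2,}", "-", dashed).strip("-")
    return collapsed.lower()


print(to_dashed_segment("About & Us"))   # about-us
print(to_dashed_segment("about----us"))  # about-us
```

The second pass is what keeps the result independent of how many special chars sit next to each other in the node name.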

Ah, I didn't notice that @mikecp, but I think it might be because of the other option, ConvertUrlsToAscii, which is set to try by default. I like that it's now at least consistent, but it would be good to completely reconsider the defaults.
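As an illustration of why an ASCII conversion step could turn "ç" into "c" without a list entry, here is a Python sketch of generic diacritic stripping. This is an assumption about how such a conversion may behave, not Umbraco's actual ConvertUrlsToAscii code, and `fold_to_ascii` is a hypothetical helper.

```python
import unicodedata


def fold_to_ascii(text):
    """Strip combining marks after canonical decomposition (NFD)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(fold_to_ascii("ç"))  # c  (decomposes into c + combining cedilla)
print(fold_to_ascii("Ë"))  # E
print(fold_to_ascii("Å"))  # A  (a list entry could map it to aa instead)
print(fold_to_ascii("ø"))  # ø  (no decomposition, so it needs a list entry)
```

With this kind of folding, chars such as ø and æ are not handled at all, which is one reason an explicit replacement list is still needed alongside it.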

@nul800sebastiaan nul800sebastiaan changed the title Replace uppercase chars to make behaviour consistent Replace uppercase chars to make behaviour consistent in RequestHandler - DefaultCharReplacements Oct 25, 2022

Successfully merging this pull request may close these issues.

CharCollection doesn't replace Danish Å
3 participants