Use POSIX collation in the Postgres registry #765
base: develop
As I would always have to reindex all indexes in a database with reindexdb, I don't see any advantage. |
That's what I was wondering about.
I believe you can still use UTF-8 bytes, it's just that things are indexed purely in byte order, not a character encoding order. I think that's right, but I'm not positive, admittedly. |
The more I think about it, the more convinced I am that this isn't the right thing to do. We shouldn't be making collation decisions for the people using the database. It might even be surprising that the registry tables use a different collation than the rest of the database. And as @sjstoelting points out, one ought to expect to reindex all indexes in a database where the collation has changed; and including Sqitch's indexes should be pretty obvious (and generally low-overhead, it's not usually much data). What do you think, @datafoo? |
Sqitch should use whatever collation is best for itself. As far as users are concerned, this is an implementation detail so, as I see it, Sqitch would not be "making decisions for the people using the database".
It might be obvious to you because you know Sqitch inside out, but I can tell you it was not obvious to me. All I had was "Rebuild all objects affected by this collation" when faced with the error:
I think we should do what's right. PostgreSQL indicates:
Is there any reason to use locales? |
The thing is, we don't. We just default to whatever the DBA has configured things for. It might be more surprising to them if it wasn't the same. |
Then I believe we should follow PostgreSQL's recommendation.
Again, this is an implementation detail so it should not impact the rest of the user's database. Can you think of anything that might be problematic and/or more difficult for a DBA if Sqitch were to use POSIX collation? |
And again, there's a world outside the USA that needs and uses different collations.
If you don't take the characters used in other countries seriously, I will boycott your products wherever possible.
Leave it to the users, and don't make decisions that don't fit elsewhere.
And again, I even have email addresses with domains containing umlauts.
|
Yeah, that's the thing, I think we have no business choosing a collation. |
I believe there is a misunderstanding. In issue #763, I am suggesting that Sqitch explicitly use a collation, not a character set and not a character encoding. If you run

    initdb --auth=trust --pgdata="./mydata" --encoding=UTF-8 --no-locale --username="postgres"

... you end up with UTF8 encoding and C collation:

    postgres=# \l
                                                 List of databases
       Name    |  Owner   | Encoding | Collate | Ctype | ICU Locale | Locale Provider |   Access privileges
    -----------+----------+----------+---------+-------+------------+-----------------+-----------------------
     postgres  | postgres | UTF8     | C       | C     |            | libc            |
     template0 | postgres | UTF8     | C       | C     |            | libc            | =c/postgres          +
               |          |          |         |       |            |                 | postgres=CTc/postgres
     template1 | postgres | UTF8     | C       | C     |            | libc            | =c/postgres          +
               |          |          |         |       |            |                 | postgres=CTc/postgres
    (3 rows)

UTF8 being an encoding for Unicode, you should be able to use any character you like. |
What is the goal of having data stored as UTF-8 and sorted by something that doesn't contain most of the characters that are available?
You can change your own stuff to whatever you like; if you love POSIX, use it in your own stuff.
But I still believe this is a decision for its users, not something to dictate to everyone.
|
As a user of Sqitch, I do not interact with Sqitch using SQL, I interact with sqitch using the
I am not dictating anything to anyone, I am interacting on this issue to make sqitch even better than it already is. If you share this goal, you could perhaps try to explain how this proposal is creating a problem. |
I'm beginning to come back around to @datafoo's point of view, because using C/POSIX collation has no effect on the data you can store: it's all UTF-8. What it affects is the sort order:
All the character encodings remain, although the sort ordering might not be what some locale or another expects: it's simply sorted in lexicographic order, that is, by UTF-8 byte order. |
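To make the distinction between storage and sort order concrete, here is a minimal Python sketch. It is only an illustration of byte-ordered sorting, not Sqitch or PostgreSQL code; the function name `posix_sort` and the sample strings are made up for the example. It assumes that a C/POSIX collation compares strings by their raw UTF-8 bytes, as described above.

```python
def posix_sort(values):
    """Sort strings by their raw UTF-8 bytes (a stand-in for C/POSIX collation)."""
    return sorted(values, key=lambda s: s.encode("utf-8"))


# Hypothetical sample data; \u00c4 is "A with diaeresis", \u00e9 is "e with acute".
names = ["zebra", "\u00c4pfel", "apple", "Zoo", "\u00e9clair"]

# Storage is unaffected: every string survives a UTF-8 round trip unchanged.
round_trip_ok = all(n.encode("utf-8").decode("utf-8") == n for n in names)

# Only the ordering differs from what a language-aware locale would produce:
# uppercase ASCII first, then lowercase ASCII, then multi-byte characters.
ordered = posix_sort(names)
print(round_trip_ok, ordered)
```

The umlaut and the accented character are stored and returned intact; they simply sort after all ASCII letters because their UTF-8 encodings start with a byte above 0x7F.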
Applies to Postgres 9.1 and higher and Yugabyte 2.9 and higher. Using the POSIX collation ensures that index ordering never changes when the database is upgraded. Such changes are a particular problem with glibc collations, but since POSIX/C collation is strictly byte-ordered, it should be unaffected. Of course, any use of `ORDER BY` on such columns will return unexpected results for users accustomed to other locales, but since Sqitch itself only ever orders by timestamp, this should not be an issue for its own use. Closes #763.
Also applies to Yugabyte; add separately for Cockroach. Using the POSIX collation ensures that index ordering never changes when the database is upgraded. Such changes are a particular problem with glibc collations, but since POSIX/C collation is strictly byte-ordered, it should be unaffected. Of course, any use of `ORDER BY` on such columns will return unexpected results for users accustomed to other locales, but since Sqitch itself only ever orders by timestamp, this should not be an issue for its own use. Closes #763.
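The reasoning about index stability can be sketched in Python. This is a toy model under stated assumptions, not how PostgreSQL implements indexes: a sorted list stands in for an index, binary search stands in for an index lookup, and `old_rules` and `new_rules` are hypothetical comparison rules standing in for two versions of a locale library. Only `byte_order` reflects the actual proposal, which is to compare by raw bytes.

```python
import bisect


def build_index(values, sort_key):
    """Toy index: the values kept in the order defined by sort_key."""
    return sorted(values, key=sort_key)


def index_contains(index, value, sort_key):
    """Binary-search lookup that trusts the index to be ordered by sort_key."""
    keys = [sort_key(v) for v in index]
    pos = bisect.bisect_left(keys, sort_key(value))
    return pos < len(index) and index[pos] == value


# Hypothetical stand-ins for a locale's rules before and after a library upgrade.
old_rules = str.casefold
new_rules = str.swapcase


def byte_order(s):
    """Comparison key for C/POSIX-style ordering: the raw UTF-8 bytes."""
    return s.encode("utf-8")


values = ["B", "a", "c"]

# Index built under the old rules, then searched after the rules changed:
# the stored order no longer matches the comparison, so a lookup can miss.
locale_index = build_index(values, old_rules)
found_after_upgrade = index_contains(locale_index, "c", new_rules)

# Byte order has no versions, so the same index stays valid.
posix_index = build_index(values, byte_order)
found_with_bytes = all(index_contains(posix_index, v, byte_order) for v in values)

print(found_after_upgrade, found_with_bytes)
```

In this model the value `"c"` is present in the list but the lookup misses it once the comparison rules change, which is the kind of inconsistency a reindex repairs. The byte-ordered index never hits that case because its comparison rule cannot change.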