Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FEATURE: Deterministic charset and collation #14

Merged
merged 6 commits into from
Jan 19, 2024

Conversation

bwaidelich
Copy link
Member

Resolves: #12

@bwaidelich bwaidelich marked this pull request as draft January 15, 2024 13:27
@bwaidelich bwaidelich marked this pull request as ready for review January 16, 2024 16:08
@bwaidelich
Copy link
Member Author

Added some more optimizations after talking to @kitsunet (Thanks again for rubber ducking!)

I intentionally left out the binary optimization for id columns as this adds quite a lot of special handling for different database platforms.
We can address this with a future release (see #16) or create dedicated adapters.

@@ -254,7 +284,7 @@ private function commitEvent(StreamName $streamName, Event $event, Version $vers
'recordedat' => $this->clock->now(),
],
[
'version' => Types::INTEGER,
'version' => Types::BIGINT,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is technically wrong in sqlite I guess? But it shouldn't matter much, so no need to change IMHO given that it works.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

eagle-eye Christian :)

The type comes in relevant in Connection:getBindingInfo()

For the value itself, it does not make a difference, because neither IntegerType nor BigIntType implement an overload for convertToDatabaseValue()

But I realized, that the returned $bindingType is different (INTEGER for IntegerType and STRING for BigIntType).

I'm not 100% sure whether it makes a difference in our case...

Copy link
Member

@kitsunet kitsunet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks nice, thank you!

@bwaidelich
Copy link
Member Author

I did some benchmarks to estimate the effect of this PR and unfortunately all numbers are the same or worse with the patch applied..

Performance is equally bad (there are some differences, but I didn't test in fully isolated conditions):

To write 1mio events

  • SQLite: ~420s
  • MariaDB: ~380s
  • PostgreSQL: ~380s

To read 1mio events from the $all stream

  • SQLite: ~4s (!)
  • MariaDB: ~30s
  • PostgreSQL: ~20s

The size for 1mio events (with empty payload) increases with the patch:

  • SQLite: 176MB (stays the same)
  • MariaDB: 211MB (changes to 227MB)
  • PostgreSQL: 248MB (changes to 293MB)

Here the detailed Benchmark: https://gist.github.com/bwaidelich/c7106d6fe39ff855ae94568e01c37fbb
And the diff (without => with patch): https://gist.github.com/bwaidelich/c7106d6fe39ff855ae94568e01c37fbb/revisions

@kitsunet
Copy link
Member

Where is the benchmark code again? Soemthing must be off, or we missed something, I don#t expect much speed gain here given that we do not do any actual queries that could benefit from anything we do, but the size should not be bigger.

@bwaidelich
Copy link
Member Author

@kitsunet Ah right, I forgot to add the benchmark script itself. Added it now to the gist

I don#t expect much speed gain here

Yes, I didn't either – but I wanted to verify that too

the size should not be bigger

Yes, that puzzled me as well.. But there is not so much that could go wrong really.
I drop the tables for each run (and delete the SQLite file) and then used the total_size determined by TablePlus.

Here's the created table structures without and with this change applied:

Without patch (branch main)

SQLite

CREATE TABLE benchmark_events (
  sequencenumber INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,
  stream VARCHAR(255) NOT NULL,
  version BIGINT UNSIGNED NOT NULL,
  type VARCHAR(255) NOT NULL,
  payload CLOB NOT NULL,
  metadata CLOB DEFAULT NULL,
  id VARCHAR(255) NOT NULL,
  correlationid VARCHAR(255) DEFAULT NULL,
  causationid VARCHAR(255) DEFAULT NULL,
  recordedat DATETIME NOT NULL --(DC2Type:datetime_immutable)
)

MariaDB

CREATE TABLE `benchmark_events` (
  `sequencenumber` int(11) NOT NULL AUTO_INCREMENT,
  `stream` varchar(255) NOT NULL,
  `version` bigint(20) unsigned NOT NULL,
  `type` varchar(255) NOT NULL,
  `payload` longtext NOT NULL,
  `metadata` longtext DEFAULT NULL,
  `id` varchar(255) NOT NULL,
  `correlationid` varchar(255) DEFAULT NULL,
  `causationid` varchar(255) DEFAULT NULL,
  `recordedat` datetime NOT NULL COMMENT '(DC2Type:datetime_immutable)',
  PRIMARY KEY (`sequencenumber`),
  UNIQUE KEY `UNIQ_38ABE26EBF396750` (`id`),
  UNIQUE KEY `UNIQ_38ABE26EF0E9BE1CBF1CD3C3` (`stream`,`version`),
  KEY `IDX_38ABE26ED575A589` (`correlationid`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb3 COLLATE=utf8mb3_unicode_ci;

PostgreSQL

CREATE SEQUENCE IF NOT EXISTS benchmark_events_sequencenumber_seq;
CREATE TABLE "public"."benchmark_events" (
    "sequencenumber" int4 NOT NULL DEFAULT nextval('benchmark_events_sequencenumber_seq'::regclass),
    "stream" varchar NOT NULL,
    "version" int8 NOT NULL,
    "type" varchar NOT NULL,
    "payload" text NOT NULL,
    "metadata" text,
    "id" varchar NOT NULL,
    "correlationid" varchar DEFAULT NULL::character varying,
    "causationid" varchar DEFAULT NULL::character varying,
    "recordedat" timestamp NOT NULL,
    PRIMARY KEY ("sequencenumber")
);
COMMENT ON COLUMN "public"."benchmark_events"."recordedat" IS '(DC2Type:datetime_immutable)';

With patch (branch feature/12-deterministic-charset-and-collation)

SQLite

CREATE TABLE benchmark_events (
  sequencenumber INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,
  stream VARCHAR(100) NOT NULL,
  version INTEGER UNSIGNED NOT NULL,
  type VARCHAR(100) NOT NULL,
  payload CLOB NOT NULL,
  metadata CLOB DEFAULT NULL --(DC2Type:json)
  id CHAR(36) NOT NULL,
  causationid VARCHAR(40) DEFAULT NULL,
  correlationid VARCHAR(40) DEFAULT NULL,
  recordedat DATETIME NOT NULL --(DC2Type:datetime_immutable)
)

MariaDB

CREATE TABLE `benchmark_events` (
  `sequencenumber` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  `stream` varchar(100) CHARACTER SET ascii COLLATE ascii_general_ci NOT NULL,
  `version` bigint(20) unsigned NOT NULL,
  `type` varchar(100) CHARACTER SET ascii COLLATE ascii_general_ci NOT NULL,
  `payload` longtext NOT NULL,
  `metadata` longtext DEFAULT NULL COMMENT '(DC2Type:json)',
  `id` char(36) CHARACTER SET ascii COLLATE ascii_general_ci NOT NULL,
  `causationid` varchar(40) CHARACTER SET ascii COLLATE ascii_general_ci DEFAULT NULL,
  `correlationid` varchar(40) CHARACTER SET ascii COLLATE ascii_general_ci DEFAULT NULL,
  `recordedat` datetime NOT NULL COMMENT '(DC2Type:datetime_immutable)',
  PRIMARY KEY (`sequencenumber`),
  UNIQUE KEY `UNIQ_38ABE26EBF396750` (`id`),
  UNIQUE KEY `UNIQ_38ABE26EF0E9BE1CBF1CD3C3` (`stream`,`version`),
  KEY `IDX_38ABE26ED575A589` (`correlationid`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb3 COLLATE=utf8mb3_unicode_ci;

PostgresSQL

CREATE SEQUENCE IF NOT EXISTS benchmark_events_sequencenumber_seq;
CREATE TABLE "public"."benchmark_events" (
    "sequencenumber" int8 NOT NULL DEFAULT nextval('benchmark_events_sequencenumber_seq'::regclass),
    "stream" varchar NOT NULL,
    "version" int8 NOT NULL,
    "type" varchar NOT NULL,
    "payload" text NOT NULL,
    "metadata" json,
    "id" bpchar NOT NULL,
    "causationid" varchar DEFAULT NULL::character varying,
    "correlationid" varchar DEFAULT NULL::character varying,
    "recordedat" timestamp NOT NULL,
    PRIMARY KEY ("sequencenumber")
);
COMMENT ON COLUMN "public"."benchmark_events"."recordedat" IS '(DC2Type:datetime_immutable)';

@bwaidelich
Copy link
Member Author

bwaidelich commented Jan 18, 2024

I'm going ahead and merge this anyways. The numbers are not better, but they are not really much worse either and the changes still make sense to me.
We can always fix/adjust things in future bugfix/feature releases
WDYT, @kitsunet ?

@kitsunet
Copy link
Member

Yep fine by me, I can test this myself over hte next days, I think the direction is definitely right.

@bwaidelich bwaidelich merged commit a65fdbc into main Jan 19, 2024
3 checks passed
@bwaidelich bwaidelich deleted the feature/12-deterministic-charset-and-collation branch January 19, 2024 09:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Deterministic charset and collation
2 participants