
Buddy processes falling down on unserialization step #421

Open

djklim87 opened this issue Dec 13, 2024 · 10 comments
Assignees
Labels
bug Something isn't working

Comments

@djklim87 (Contributor)

Bug Description:

Reproducible in master

mysql> CREATE SOURCE kafka (id bigint, term text, abbrev text, GlossDef json) type='kafka' broker_list='kafka:9092' topic_list='my-data' consumer_group='manticore' num_consumers='2' batch=50;
CREATE TABLE destination_kafka (id bigint, name text, short_name text, received_at text, size multi);
CREATE MATERIALIZED VIEW view_table TO destination_kafka AS SELECT id, renamed_term as name, abbrev as short_name, UTC_TIMESTAMP() as received_at, GlossDef.size as size FROM kafka;

It seems the file is written twice, with two different serialized payloads concatenated, instead of the expected single record per file:

a:2:{i:0;s:9:"runWorker";i:1;a:1:{i:0;a:10:{s:2:"id";i:5479352753408966658;s:4:"type";s:5:"kafka";s:4:"name";s:5:"kafka";s:9:"full_name";s:7:"kafka_1";s:12:"buffer_table";s:15:"_buffer_kafka_1";s:14:"original_query";s:199:"CREATE SOURCE kafka (id bigint, renamed_term "$term" text, abbrev text, GlossDef json) type='kafka' broker_list='kafka:9092' topic_list='my-data' consumer_group='manticore' num_consumers='2' batch=50";s:5:"attrs";s:72:"{"broker":"kafka:9092","topic":"my-data","group":"manticore","batch":50}";s:14:"custom_mapping";s:24:"{"renamed_term":"$term"}";s:16:"destination_name";s:17:"destination_kafka";s:5:"query";s:129:"SELECT id, renamed_term AS name, abbrev AS short_name, UTC_TIMESTAMP() AS received_at, GlossDef.size AS size FROM _buffer_kafka_1";}}}a:2:{i:0;s:9:"runWorker";i:1;a:1:{i:0;a:10:{s:2:"id";i:5479352753408966657;s:4:"type";s:5:"kafka";s:4:"name";s:5:"kafka";s:9:"full_name";s:7:"kafka_0";s:12:"buffer_table";s:15:"_buffer_kafka_0";s:14:"original_query";s:199:"CREATE SOURCE kafka (id bigint, renamed_term "$term" text, abbrev text, GlossDef json) type='kafka' broker_list='kafka:9092' topic_list='my-data' consumer_group='manticore' num_consumers='2' batch=50";s:5:"attrs";s:72:"{"broker":"kafka:9092","topic":"my-data","group":"manticore","batch":50}";s:14:"custom_mapping";s:24:"{"renamed_term":"$term"}";s:16:"destination_name";s:17:"destination_kafka";s:5:"query";s:129:"SELECT id, renamed_term AS name, abbrev AS short_name, UTC_TIMESTAMP() AS received_at, GlossDef.size AS size FROM _buffer_kafka_0";}}}
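For illustration, this is the same class of failure you get whenever two independently valid payloads are concatenated and handed to a parser that expects exactly one message. A minimal Python analogy (JSON standing in for PHP's serialize format; this is not Buddy's actual code):

```python
import json

# Two messages, each valid on its own, end up in a single read.
msg1 = json.dumps({"id": 1, "worker": "kafka_0"})
msg2 = json.dumps({"id": 2, "worker": "kafka_1"})
buffer = msg1 + msg2  # concatenated, just like the two PHP payloads above

try:
    # A parser that expects exactly one message chokes on the second payload.
    json.loads(buffer)
except json.JSONDecodeError as e:
    print(e.msg)  # "Extra data" -- the same class of error PHP's unserialize() raises
```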

Manticore Search Version:

Manticore 6.3.9 74f607887@24121214 dev (columnar 2.3.1 edadc69@24112219) (secondary 2.3.1 edadc69@24112219) (knn 2.3.1 edadc69@24112219)

Operating System Version:

Test kit

Have you tried the latest development version?

Yes

Internal Checklist:

To be completed by the assignee. Check off tasks that have been completed or are not applicable.

  • Implementation completed
  • Tests developed
  • Documentation updated
  • Documentation reviewed
  • Changelog updated
@donhardman (Contributor) commented Dec 16, 2024

There was a core logic issue in how we read data in Process, due to how Swoole handles large payloads. We should verify whether the fix resolves the issue.
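One standard way to avoid this class of bug is length-prefixed framing, so the reader always knows exactly where one message ends and the next begins. A sketch of the idea in Python (Buddy is PHP on Swoole, so this is only an illustration of the framing technique, not its implementation):

```python
import socket
import struct

def send_msg(sock: socket.socket, payload: bytes) -> None:
    # Prefix each message with its 4-byte big-endian length.
    sock.sendall(struct.pack(">I", len(payload)) + payload)

def recv_exact(sock: socket.socket, n: int) -> bytes:
    # Loop until exactly n bytes arrive; a single recv() may return a partial chunk.
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed")
        buf += chunk
    return buf

def recv_msg(sock: socket.socket) -> bytes:
    (length,) = struct.unpack(">I", recv_exact(sock, 4))
    return recv_exact(sock, length)

a, b = socket.socketpair()
send_msg(a, b"first")
send_msg(a, b"second")           # both messages sit in the buffer back to back
print(recv_msg(b), recv_msg(b))  # cleanly separated despite the shared buffer
```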

@donhardman (Contributor)

@djklim87 I was unable to reproduce it locally because:

  • it requires setting up Kafka
  • it's not reproducible with simple data

We should consider adding tests to cover this case since it only occurs in specific scenarios and doesn't work like this in most cases.

@donhardman donhardman assigned djklim87 and unassigned donhardman Dec 16, 2024
@djklim87 (Contributor, Author)

No, it doesn't require Kafka. The issue was related to worker creation. If it successfully creates workers, there's no need to write additional tests, except for tests addressing the fixes you mentioned in the previous comment.

@djklim87 djklim87 assigned donhardman and unassigned djklim87 Dec 26, 2024
@donhardman (Contributor)

I have fixed the issue but need confirmation, since I cannot reproduce it without access to your specific data: the tests must be run with your data set, as the problem does not occur with other data. I did not observe any issues with worker creation; the problem was only with data submitted through the channel, which has now been fixed and merged (awaiting confirmation and testing).

@donhardman donhardman assigned djklim87 and unassigned donhardman Dec 26, 2024
@donhardman (Contributor)

Can you reproduce it with the latest master?

@djklim87 (Contributor, Author)

Again, you don’t need Kafka for this.

No datasets, no Kafka. Just use the command below:

CREATE SOURCE kafka (id bigint, term text, abbrev text, GlossDef json) type='kafka' broker_list='kafka:9092' topic_list='my-data' consumer_group='manticore' num_consumers='2' batch=50; 
CREATE TABLE destination_kafka (id bigint, name text, short_name text, received_at text, size multi);
CREATE MATERIALIZED VIEW view_table TO destination_kafka AS SELECT id, renamed_term as name, abbrev as short_name, UTC_TIMESTAMP() as received_at, GlossDef.size as size FROM kafka;

And yes, everything works fine now. However, you didn't write tests for your fixes.

Adding those tests will be enough to close this issue.

@djklim87 djklim87 assigned donhardman and unassigned djklim87 Dec 26, 2024
@donhardman (Contributor) commented Dec 29, 2024

@djklim87 please provide your data to create tests for @PavelShilin89

The issue occurs only with specific data transfers like the ones you experienced, so it doesn't affect 100% of cases. To cover this, we should write CLT tests.

@donhardman donhardman assigned djklim87 and unassigned donhardman Dec 29, 2024
@donhardman (Contributor) commented Dec 29, 2024

> No, it doesn't require Kafka. The issue was related to worker creation. If it successfully creates workers, there's no need to write additional tests, except for the tests addressing the fixes you mentioned in the previous comment

To clarify, the issue is not related to worker creation. The cause was data transfer between processes: in some cases, when the data is large enough, it gets chunked by Swoole. That's why we should write CLT tests with the custom data you used in your local environment.
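The chunking behavior described here can be illustrated with a buffered reader that reassembles messages no matter where the chunk boundaries fall. A minimal sketch (newline-delimited JSON in Python as a stand-in for the PHP/Swoole pipe; the names are illustrative):

```python
import json

def iter_messages(chunks):
    # Reassemble newline-delimited JSON messages from arbitrary chunk
    # boundaries, the way a pipe reader must when the kernel or Swoole
    # splits a large write across several reads.
    buf = ""
    for chunk in chunks:
        buf += chunk
        while "\n" in buf:
            line, buf = buf.split("\n", 1)
            yield json.loads(line)

# One message arrives split across two reads, and a second message
# arrives glued to the tail of the first -- both cases are handled.
chunks = ['{"task": "runWo', 'rker", "id": 1}\n{"task"', ': "runWorker", "id": 2}\n']
print(list(iter_messages(chunks)))
# [{'task': 'runWorker', 'id': 1}, {'task': 'runWorker', 'id': 2}]
```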

@djklim87 djklim87 assigned donhardman and djklim87 and unassigned djklim87 and donhardman Jan 3, 2025
@djklim87 (Contributor, Author) commented Jan 6, 2025

OK, so @PavelShilin89, please write tests for this issue:

Just run the queries below 5 times (without Kafka, just Buddy):

CREATE SOURCE kafka (id bigint, term text, abbrev text, GlossDef json) type='kafka' broker_list='kafka:9092' topic_list='my-data' consumer_group='manticore' num_consumers='2' batch=50; 
CREATE TABLE destination_kafka (id bigint, name text, short_name text, received_at text, size multi);
CREATE MATERIALIZED VIEW view_table TO destination_kafka AS SELECT id, renamed_term as name, abbrev as short_name, UTC_TIMESTAMP() as received_at, GlossDef.size as size FROM kafka;

Then check that the logs don't contain errors with the text Uncaught ErrorException: unserialize()

Full error text:


[BUDDY] Fatal error: Uncaught ErrorException: unserialize(): Extra data starting at offset 701 of 1402 bytes in /usr/share/manticore/modules/manticore-buddy/vendor/manticoresoftware/buddy-core/src/Process/BaseProcessor.php:75
[BUDDY] Stack trace:
[BUDDY] #0 [internal function]: buddy_error_handler(2, 'unserialize(): ...', '/usr/share/mant...', 75)
[BUDDY] #1 /usr/share/manticore/modules/manticore-buddy/vendor/manticoresoftware/buddy-core/src/Process/BaseProcessor.php(75): unserialize('a:2:{i:0;s:9:"r...')
[BUDDY] #2 /usr/share/manticore/modules/manticore-buddy/vendor/manticoresoftware/buddy-core/src/Process/Process.php(146): Manticoresearch\Buddy\Core\Process\BaseProcessor->parseMessage('a:2:{i:0;s:9:"r...')
[BUDDY] #3 [internal function]: Manticoresearch\Buddy\Core\Process\Process::Manticoresearch\Buddy\Core\Process\{closure}(Object(Swoole\Process))
[BUDDY] #4 /usr/share/manticore/modules/manticore-buddy/src/Network/Server.php(259): Swoole\Server->start()
[BUDDY] #5 /usr/share/manticore/modules/manticore-buddy/src/main.php(155): Manticoresearch\Buddy\Base\Network\Server->start()
[BUDDY] #6 {main}
[BUDDY]   thrown in /usr/share/manticore/modules/manticore-buddy/vendor/manticoresoftware/buddy-core/src/Process/BaseProcessor.php on line 75
[BUDDY] [2025-01-03 15:24:30 $136.0]	WARNING	Server::check_worker_exit_status(): worker(pid=143, id=2) abnormal exit, status=255, signal=0
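The log check above could boil down to scanning the Buddy output for the fatal signature; a minimal sketch (the helper name and sample lines are illustrative, not the actual CLT harness):

```python
NEEDLE = "Uncaught ErrorException: unserialize()"

def log_is_clean(log_text: str) -> bool:
    # The run passes only if no line carries the fatal unserialize() signature.
    return NEEDLE not in log_text

# Against the failing excerpt above this reports a failure:
bad = '[BUDDY] Fatal error: Uncaught ErrorException: unserialize(): Extra data starting at offset 701'
print(log_is_clean(bad))                  # False
print(log_is_clean("[BUDDY] started"))    # True
```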

@PavelShilin89

@djklim87 testing done in PR manticoresoftware/manticoresearch#2905, please give feedback or approval to merge.
