--merge-every-nth counts total messages, not pulses? #244

Open
SimonHeybrock opened this issue Oct 7, 2024 · 9 comments
Labels: bug (Something isn't working)

Comments

@SimonHeybrock
Member

Not sure this is what is going on, but I have a file with 3 NXevent_data groups. When I use `--merge-every-nth=2` it only passes the first two to the data reduction workflow. I would have expected it to load two pulses from each group?

This is for the Loki@Larmo file.

SimonHeybrock added the bug label Oct 7, 2024
@jokasimr
Contributor

jokasimr commented Oct 29, 2024

> This is for the Loki@Larmo file

This file? `ess.loki.data.loki_tutorial_sample_run_60339()`

Yes, the current version emits a new message every n-th message received.
It does seem more sensible to always emit messages containing a whole number of pulses.

But the details are a bit fuzzy, especially in the presence of several NXevent_data groups.
If merge-every-nth=2, do we emit a new message when we have received two pulses from each of the detectors?
Or do we emit a new message when we have received two pulses from any one of the detectors?
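
For concreteness, a minimal sketch of the first interpretation (emit once every source has delivered n whole pulses). The class and its API are hypothetical, not Beamlime's actual handler code:

```python
from collections import defaultdict


class PulseBuffer:
    """Hypothetical accumulator: emit once every source has delivered n pulses.

    A 'pulse' is identified by the message's reference time, so several
    messages sharing one reference time count as a single pulse.
    """

    def __init__(self, sources: set[str], merge_every_nth: int):
        self.sources = sources
        self.n = merge_every_nth
        self.pulses = defaultdict(set)  # source -> distinct reference times seen
        self.messages = []

    def add(self, source: str, reference_time: int, message) -> list | None:
        self.messages.append(message)
        self.pulses[source].add(reference_time)
        # Emit only when *each* source has contributed n distinct pulses.
        if all(len(self.pulses[s]) >= self.n for s in self.sources):
            merged, self.messages = self.messages, []
            self.pulses.clear()
            return merged
        return None
```

The "any one of the detectors" interpretation would instead emit as soon as a single source reaches n pulses, which risks splitting pulses across emitted chunks for the slower sources.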

jokasimr self-assigned this Oct 30, 2024
@YooSunYoung
Member

I'm not sure we can automate this at all,
because we never know how many detectors are flushing how many messages per pulse,
and the order of messages is also not guaranteed (though they are most likely sorted, since we only have one consumer).

We could add another parameter that tells the handler how many messages are expected per pulse,
but then it's easier to just multiply it by the number of pulses we want and use that as merge-every-nth.

Or we could maybe wait, with a certain timeout, until a message has been received from every detector source...?
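
Roughly, that timeout idea could look like this (a sketch only; `consume` and its return shape are assumptions, not existing Beamlime code):

```python
import time


def collect_pulse(consume, sources: set[str], timeout: float = 1.0) -> list:
    # Gather one message per detector source, or give up after `timeout`
    # seconds. `consume()` is assumed to return (source_name, message),
    # or None when nothing is available yet.
    pending = set(sources)
    collected = []
    deadline = time.monotonic() + timeout
    while pending and time.monotonic() < deadline:
        item = consume()
        if item is None:
            continue
        source, message = item
        collected.append(message)
        pending.discard(source)
    return collected  # may be incomplete if the timeout expired
```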

@SimonHeybrock
Member Author

SimonHeybrock commented Nov 1, 2024

I think we should probably look at the event_time_zero of the messages (or just the wall-clock time in Beamlime?) and, instead of merging every N frames, merge messages for X seconds?

@YooSunYoung
Member

> I think we should probably look at the event_time_zero of the messages (or just the wall-clock time in Beamlime?) and, instead of merging every N frames, merge messages for X seconds?

We already have an option, `max-seconds-between-messages`.

@SimonHeybrock
Member Author

What does that do? It is not clear from the name.

@YooSunYoung
Member

> What does that do? It is not clear from the name.

If given, the handler accumulates messages for `max-seconds-between-messages` seconds.

It's passed to this function:

```python
import time
from numbers import Number


def maxcount_or_maxtime(maxcount: Number, maxtime: Number):
    if maxcount <= 0:
        raise ValueError("maxcount must be positive")
    if maxtime <= 0:
        raise ValueError("maxtime must be positive")

    count = 0
    last = time.time()

    def run():
        # Returns True (and resets the counters) every `maxcount` calls,
        # or once `maxtime` seconds have elapsed since the last reset.
        nonlocal count, last
        count += 1
        if count >= maxcount or time.time() - last >= maxtime:
            count = 0
            last = time.time()
            return True

    return run
```
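
Roughly, the returned closure acts as a flush predicate; illustrative usage only (the surrounding loop and `process` are hypothetical, not the actual handler code):

```python
should_flush = maxcount_or_maxtime(maxcount=100, maxtime=10)

buffer = []
for message in message_stream:  # hypothetical incoming message iterator
    buffer.append(message)
    if should_flush():  # True every 100 messages, or after 10 s at the latest
        process(buffer)  # hypothetical downstream step
        buffer.clear()
```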

Actually, the help message seems wrong...

What would you call it then...?

@jokasimr
Contributor

jokasimr commented Nov 1, 2024

ev44 has a field that determines order, so in principle we can handle messages arriving out of order, but it might be a bit tricky.
Do we have more than one Kafka partition per topic? If we don't, then messages are guaranteed to be ordered.

@YooSunYoung
Member

YooSunYoung commented Nov 4, 2024

> Do we have more than one Kafka partition per topic?

Yes, I asked ECDC, and all topics have (and will have) multiple partitions, as recommended for performance. So messages are not guaranteed to be ordered.
We cannot guarantee the order unless we flush the data only after some delay: we can't wait for a missing piece, because we simply can't determine whether anything is missing.

But in reality, most event data will arrive pretty much in order.
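
A minimal sketch of that "flush only after a while" idea, sorting buffered messages by reference time before emitting (all names are illustrative, not existing Beamlime code):

```python
import time


class DelayedFlushBuffer:
    # Hold messages for `delay` seconds, then emit them sorted by reference
    # time. Out-of-order arrivals within the window are repaired by sorting;
    # anything arriving after its window has flushed is effectively lost,
    # since we cannot tell that it was ever missing.

    def __init__(self, delay: float = 5.0):
        self.delay = delay
        self.buffer = []  # list of (reference_time, message)
        self.started = time.monotonic()

    def add(self, reference_time: int, message) -> list | None:
        self.buffer.append((reference_time, message))
        if time.monotonic() - self.started >= self.delay:
            self.buffer.sort(key=lambda item: item[0])
            ordered = [msg for _, msg in self.buffer]
            self.buffer = []
            self.started = time.monotonic()
            return ordered
        return None
```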

jokasimr removed their assignment Dec 5, 2024
jokasimr moved this from In progress to Selected in Development Board Dec 5, 2024
@jokasimr
Contributor

jokasimr commented Dec 5, 2024

Not sure how to do this right now, so putting it back into the selected pile.

SimonHeybrock moved this from Selected to Triage in Development Board Dec 6, 2024