Add Performance and Blocking specification #130
The Performance and Blocking specification is written as a separate document and is linked from the Language Library Design Principles document. Implements issue: open-telemetry#94
specification/performance.md
- **Library should not block end-user application.**
- **Library should not consume infinite memory resource.**

Tracer should not degrade the end-user application. So that it should not block the end-user application nor consume too much memory resource.
One suggestion: during application exit/shutdown, it might make sense to block for a user-configurable grace period. This gives the library enough time to persist or flush the telemetry data.
Same for Flush. If we decide we need a Flush method in the SDK, it must block for at least a configurable period.
That makes complete sense. I added a "Shutdown and explicit flushing could block" section to describe the shutdown/flush behavior: 53b0519#diff-44dc82a7e6286380ed89736215beda74R41
specification/performance.md
Here are the key principles:

- **Library should not block end-user application.**
- **Library should not consume infinite memory resource.**
The definition of "infinite" is vague. Do we want to make this user-configurable? (It doesn't have to be rocket science, though.)
Suggestion: "unbounded" may be clearer here?
This could be configurable, but it sounds like this has more to do with the implementation than with the actual API.
I replaced "unlimited" with "unbounded".
IMHO, it may not be easy to configure the memory resource itself (it depends on the implementation), but resource usage should be controlled indirectly via the size of a queue or something similar. So I think @songy23's suggestion ("unbounded") is a good word to describe it.
LGTM. I've left two minor suggestions.
LGTM overall
specification/performance.md
If there is such trade-off in language library, it should provide the following options to end-user:

- **Prevent blocking**: Dropping some information under overwhelming load and show warning log to inform when information loss starts and when recovered
  - Should be a default option
This should also apply to metrics recording (as an example see census-instrumentation/opencensus-python#684).
(However we may not want to make it the default option for logging APIs.)
I applied the same principle to metrics.
For logging, I applied the "Prevent information loss" policy by default and added an "End-user application should aware of the size of logs" section: 53b0519#diff-44dc82a7e6286380ed89736215beda74R33
What do you think?
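As a rough illustration of the "Prevent blocking" option quoted above (drop under overwhelming load, and warn when information loss starts and when it recovers), here is a minimal Python sketch. `DroppingBuffer`, `offer`, and `drain` are hypothetical names, not part of any OpenTelemetry API; the buffer is assumed to sit between the API and an exporter.

```python
import collections
import logging
import threading

logger = logging.getLogger("telemetry")


class DroppingBuffer:
    """Hypothetical bounded buffer that never blocks the caller and drops on overflow."""

    def __init__(self, capacity):
        self._items = collections.deque()
        self._capacity = capacity
        self._lock = threading.Lock()
        self._dropping = False  # True while an information-loss episode is ongoing
        self.dropped = 0        # total number of items dropped so far

    def offer(self, item):
        """Try to add item without waiting; return False if it was dropped."""
        message = None
        with self._lock:
            if len(self._items) >= self._capacity:
                self.dropped += 1
                accepted = False
                if not self._dropping:
                    self._dropping = True
                    message = "telemetry buffer full: information loss started"
            else:
                self._items.append(item)
                accepted = True
                if self._dropping:
                    self._dropping = False
                    message = f"telemetry buffer recovered: {self.dropped} items dropped in total"
        if message:
            logger.warning(message)  # log outside the lock to keep the critical section short
        return accepted

    def drain(self, max_items):
        """Remove and return up to max_items items, e.g. for an exporter to send."""
        with self._lock:
            batch = []
            while self._items and len(batch) < max_items:
                batch.append(self._items.popleft())
            return batch
```

Because a warning is only logged on a state change, a sustained overload produces one "started" line and one "recovered" line rather than one line per dropped item.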
specification/performance.md
- **Library should not block end-user application.**
- **Library should not consume infinite memory resource.**

Tracer should not degrade the end-user application. So that it should not block the end-user application nor consume too much memory resource.
Can you please expand it to the entire API, not just Tracer? Metrics and distributed context MUST behave the same way.
I expanded it to the entire API, not just tracing.
I think dropping logs by default is not good behavior, so I applied the "Prevent information loss" (but may block) policy to logging. I also added an "End-user application should aware of the size of logs" section: 53b0519#diff-44dc82a7e6286380ed89736215beda74R33
My biggest ask is to make it clear in the text that this doc applies to the entire API surface, not only to tracers.
specification/performance.md
Here are the key principles:

- **Library should not block end-user application.**
So we do not allow sync upload of telemetry as an alternative implementation? Or are you assuming the default implementation here?
I think async should always be the case (with the exception of something like a mock component). I haven't seen any telemetry system doing sync upload of data; the only cases would be a flush (or a close) call.
I don't think this is sufficiently specific. What counts as blocking work here? Obviously we shouldn't block on network or disk IO, but what about doing expensive copies, computing aggregate metrics, etc.?
Disallowing sync upload will also make it difficult to use OT for auditing, i.e. when the user would rather have the call fail than have it succeed without emitting telemetry data.
So we do not allow sync upload of telemetry as an alternative implementation? Or you assume the default implementation here?
Disallowing sync upload will also make it difficult to use OT for auditing, i.e. when the user would rather have the call fail than have it succeed without emitting telemetry data.
I added the phrase "by default" to the policy:
Library should not block end-user application by default.
I think it is okay to implement a synchronous (blocking) option, so I left room for one.
What counts as blocking work here? Obviously we shouldn't block on network or disk IO, but what about doing expensive copies, computing aggregate metrics, etc.?
Sharp question.
Actually, we cannot implement a library with zero overhead. Any tracing/metrics/logging library "blocks" the application for some nanoseconds, microseconds, or milliseconds, and the acceptable duration of that "blocking" depends on end-user needs, the application domain, and the runtime environment.
I think we can agree that "we shouldn't block on network or disk IO". But it is difficult to define a general rule for how much CPU usage and computational overhead is allowed.
This is why I intentionally wrote about I/O but not about computational (CPU) overhead.
If we need to define it explicitly, I feel it can be done in a separate PR/issue.
specification/performance.md
  - Should provide option to change threshold of the dropping
  - Better to provide metric that represents effective sampling ratio
- **Prevent information loss**: Preserve all information but possible to consume unlimited resource
  - Should be supported as an alternative option
s/Should/Might/
Similarly, it feels this is more related to the actual implementation (although I agree we should recommend something like this, but more as a guideline).
I agree. Replaced "should" with "might".
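One way a library might offer both options from the snippet above behind a single setting, sketched in Python under assumptions: `OverflowPolicy`, `TelemetryQueue`, `drop_threshold`, and `effective_sampling_ratio` are illustrative names only. `DROP` corresponds to "Prevent blocking" with a configurable dropping threshold, `PRESERVE` to "Prevent information loss", and the accepted/offered ratio is one possible definition of an effective sampling ratio metric.

```python
import enum
import threading


class OverflowPolicy(enum.Enum):
    DROP = "drop"          # "Prevent blocking": drop once the threshold is reached
    PRESERVE = "preserve"  # "Prevent information loss": keep everything, memory may grow


class TelemetryQueue:
    """Hypothetical queue whose overflow behavior is chosen by the end user."""

    def __init__(self, policy=OverflowPolicy.DROP, drop_threshold=2048):
        self.policy = policy
        self.drop_threshold = drop_threshold  # user-configurable threshold of the dropping
        self._items = []
        self._lock = threading.Lock()
        self.offered = 0   # items handed to the queue
        self.accepted = 0  # items actually kept

    def offer(self, item):
        """Add item without ever waiting; return False if it was dropped."""
        with self._lock:
            self.offered += 1
            if self.policy is OverflowPolicy.DROP and len(self._items) >= self.drop_threshold:
                return False
            self._items.append(item)  # under PRESERVE, unbounded growth is the accepted cost
            self.accepted += 1
            return True

    def effective_sampling_ratio(self):
        """Fraction of offered items that were kept (1.0 means nothing was dropped)."""
        with self._lock:
            return 1.0 if self.offered == 0 else self.accepted / self.offered
```

Exposing the ratio as a metric lets operators see how much data the `DROP` policy is discarding instead of having to infer it from warning logs.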
## Key principles

Here are the key principles:
Consider putting "Instrumentation cannot be a failure modality".
Hmm... I agree with the policy itself: "Instrumentation cannot be a failure modality". But this file/PR focuses on performance/blocking matters.
Could you open it as a separate GitHub issue or something similar?
The topic relates to error handling, recovery, retry, logging, handling information loss, etc., so it does not seem so simple.
Some grammar nits, and I think it would be useful to do some work to write all these docs in the same style.
The explanation that libraries should drop messages before using more than some configurable amount of memory is good, and the behavior that the doc describes looks great. It is light on specifics, though.
- Write about Metrics & Logging to cover entire API
- Write about shut down / flush operations
- Leave room for blocking implementation options (should not block "as default behavior")
- Grammar & syntax fix
- Not limit for tracing, metrics.
Here are the key principles:

- **Library should not block end-user application by default.**
- **Library should not consume unbounded memory resource.**
Are there libraries for which this is not true?
For blocking, opencensus-java blocks the end-user application when its queue gets full: census-instrumentation/opencensus-java#1837 (this is why I am concerned about these matters).
An easy solution to the blocking problem is to use an unbounded queue. In exchange, it consumes memory. I don't know of any monitoring library that uses an unbounded queue, but I think clarifying this is meaningful.
Also, the unbounded memory usage issue is related to the log volume issue described in the "End-user application should aware of the size of logs" section: https://github.com/open-telemetry/opentelemetry-specification/pull/130/files#diff-44dc82a7e6286380ed89736215beda74R33 .
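The trade-off described above can be seen with Python's standard `queue.Queue`, used here purely as an illustration (this is not how opencensus-java is implemented). `try_record` is a hypothetical helper: with a bounded queue, a caller either drops the item when the queue is full or waits for space, while `queue.Queue()` with no `maxsize` never reports full and instead grows without limit.

```python
import queue


def try_record(q, item, block_timeout_s=None):
    """Hypothetical helper: enqueue item and report whether it was kept.

    block_timeout_s=None -> never wait; drop immediately if q is full.
    block_timeout_s=x    -> wait up to x seconds for space (the blocking behavior).
    """
    try:
        if block_timeout_s is None:
            q.put_nowait(item)
        else:
            q.put(item, timeout=block_timeout_s)
        return True
    except queue.Full:
        return False
```

With a bounded queue, overload becomes either dropped items or added caller latency; the unbounded queue avoids both at the cost of memory, which is exactly the trade-off the principles try to make explicit.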
specification/performance.md
- **Library should not block end-user application by default.**
- **Library should not consume unbounded memory resource.**

API should not degrade the end-user application. So that it should not block the end-user application nor consume too much memory resource.
"should not degrade" is not really a feasible requirement. It creates unrealistic expectations that collecting telemetry might be free. There's always perf overhead. What implementation should do is provide levers to control that overhead at the expense of collecting less data. Note that "less data" and "dropping data due to buffer overflow" are different remediation mechanisms, with different QOS guarantees.
"should not degrade" is not really a feasible requirement. It creates unrealistic expectations that collecting telemetry might be free. There's always perf overhead.
It is true that there is no zero-overhead perf library. A very similar conversation: #130 (comment)
Considering this and the above feedback, I changed the sentence you pointed out as follows:
Although there is inevitable overhead to achieve monitoring, API should not degrade the end-user application as possible.
Does this make sense?
What implementation should do is provide levers to control that overhead at the expense of collecting less data.
Another important point is the library design itself. For example, tracing should not block end-user applications even if network I/O takes a long time; such operations should be done asynchronously. Levers cannot compensate for such a design failure. This is why I wrote these principles.
As for levers, I believe each API provides its own (e.g. tracing can be tuned by sampling), so I did not mention levers in this document. Should they be mentioned here as well?
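A minimal sketch of the asynchronous design described above, assuming a hypothetical `BackgroundExporter` whose `export_batch` callable stands in for a slow network exporter. The application thread only enqueues (dropping when the bounded queue is full), and a daemon worker thread performs the export, so network latency never reaches the caller.

```python
import queue
import threading


class BackgroundExporter:
    """Hypothetical exporter: callers enqueue, a worker thread does the slow I/O."""

    def __init__(self, export_batch, max_queue_size=1024, batch_size=64):
        self._export_batch = export_batch  # e.g. a function that sends data over the network
        self._queue = queue.Queue(maxsize=max_queue_size)
        self._batch_size = batch_size
        self.dropped = 0
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def record(self, item):
        """Called on the application thread: never waits for I/O or for queue space."""
        try:
            self._queue.put_nowait(item)
        except queue.Full:
            self.dropped += 1

    def _run(self):
        while True:
            batch = [self._queue.get()]  # wait (on the worker thread) for at least one item
            while len(batch) < self._batch_size:
                try:
                    batch.append(self._queue.get_nowait())
                except queue.Empty:
                    break
            self._export_batch(batch)  # slow call happens here, off the caller's thread
```

Batching in the worker is a common design choice because it amortizes per-request network cost; the caller-facing `record` stays cheap regardless of batch size.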
specification/performance.md
### Shutdown and explicit flushing could block

The language library could block the end-user application when it shut down. On shutdown, it has to flush data to prevent information loss.
It still should not block indefinitely. If we want to codify blocking flushing, we should require support for a user-supplied timeout.
Sure. I added the sentence "The language library should support user-configurable timeout if it blocks on shut down."
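A sketch of shutdown with a user-configurable timeout, as suggested in this thread. `ShutdownableExporter` and `shutdown_timeout_s` are hypothetical names, and the queue is left unbounded only to keep the example short. `shutdown()` asks the worker to finish the pending data and waits at most the configured grace period, returning whether everything was flushed.

```python
import queue
import threading


class ShutdownableExporter:
    """Hypothetical exporter whose shutdown blocks for at most a user-configured timeout."""

    _STOP = object()  # sentinel telling the worker to exit after pending items

    def __init__(self, export_batch, shutdown_timeout_s=5.0):
        self._export_batch = export_batch
        self.shutdown_timeout_s = shutdown_timeout_s  # user-configurable grace period
        self._queue = queue.Queue()  # unbounded only for brevity in this sketch
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def record(self, item):
        self._queue.put(item)

    def _run(self):
        while True:
            item = self._queue.get()
            if item is self._STOP:
                return
            self._export_batch([item])

    def shutdown(self):
        """Flush pending data, blocking at most shutdown_timeout_s; True if fully flushed."""
        self._queue.put(self._STOP)
        self._worker.join(timeout=self.shutdown_timeout_s)
        return not self._worker.is_alive()
```

Returning a boolean instead of raising lets the application decide whether unflushed data at exit is worth a warning.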
Here are the key principles:

- **Library should not block end-user application by default.**
- **Library should not consume unbounded memory resource.**
Why do we single out memory only? CPU and latency impact are often more important.
It is true that computational overhead (CPU usage, latency) is also a possible cause of unwelcome blocking, as discussed in #130 (comment).
Could you read the above comment?
If we need to dive deep into CPU and latency impact, I feel it is better to create a separate PR/issue.
- Mentioned about inevitable overhead
- Shutdown may block, but it should support configurable timeout also
It feels like the scopes of the two core parts of this (blocking and memory usage) are, while definitely linked, separate issues.
Blocking is pretty easily addressed, e.g. with the suggestion (from the PR) that libraries should use something like async I/O or background threads.
Memory usage, on the other hand, is a lot more complicated: there are many solutions, each with different trade-offs in terms of usability, ease of implementation, etc. Many of the previously used solutions - e.g., the limits in OpenCensus - are mitigations but not outright solutions. For instance, individual logs may be large enough to cause problems, even if there aren't many logs per span.
Similarly, there are a lot of big questions that are touched on here, but that we haven't answered yet. How do we want to convey dropped data? How do we ensure that disk I/O doesn't become the problem - especially if, as I think is suggested in the change here (sorry if I'm misreading!!), the idea is to log data that would otherwise be lost?
There are also some promising mitigations that haven't been explored as much yet, e.g.:
- Streaming requests
- Providing verbosity levels for logs and/or spans or metrics themselves
- More fine-tuning of request frequency
Another big piece that hasn't really been addressed yet in OpenTracing, OpenCensus, or OpenTelemetry is the testing side of things - i.e., testing that a) instrumentation doesn't add too much overhead, and that b) given certain failure modes or degraded circumstances, the instrumented application is still able to function as expected.
(I believe that CPU and latency are in a similar camp. I do agree with @yurishkuro that they should be part of the discussion, but I think the discussion of them shouldn't be "concluded" for the same reason as the memory discussion.)
(cc @jmacd - probably of interest to you!)
Hi @SergeyKanzhelev, as discussed in the thread of #130 (comment), I expanded this document to cover the entire API, not only Tracer. Are there still any blockers?
I agree with some comments that this document may need to be expanded to cover more use cases and aspects of performance in the future. It is a good starting point, though.
Dear Reviewers, are there any blockers to merging this PR?
I think it's fine to merge this PR. Remaining comments are a bit beyond the original scope and can be addressed in follow-up PRs.
@songy23 @SergeyKanzhelev @reyang Thank you for the approval! Who can merge this PR? Or is there anything I can do to help merge this branch?
@open-telemetry/specs-approvers any objections to merging this doc? It looks like a good starting point to discuss perf, and it highlights basic principles to think about when implementing the SDK.
@open-telemetry/spec-approvers (just a reminder) Are we ready to merge this? I think merging this helps anyone who wants to open a PR proposing more in-depth guidelines.
specification/performance.md
  - Better to provide metric that represents effective sampling ratio
  - Language library might provide this option for Logging
- **Prevent information loss**: Preserve all information but possible to consume many resources
  - Should be a default option for Logging
Why is logging different? It should be an option for every signal and probably the default for all.
Thank you for approval!
Why logging is different?
For metrics/tracing, most existing libraries support sampling and asynchronous processing. They drop some data at a fixed (or dynamic) rate and sometimes also drop data due to an internal buffer overflow. Such instrumentation is generally expected to sample information, but not to degrade the end user's application in order to capture complete data.
For logging, as far as I know, most existing logging libraries do not drop logs but may block the application. I feel dropping logs by default would be surprising behavior for most users.
Related materials: Logging v. instrumentation, Question: Can zipkin be used to replace logging?
Related conversation in this PR: #130 (comment)
My point was mostly: why not make everything blocking by default? We should support non-blocking for sure, but I feel that blocking can be the default for all implementations.
@saiya just a data point: https://github.com/uber-go/zap by default samples (throttles) logs
We should support non blocking for sure but I feel that blocking can be default for all the implementations.
In my opinion, the blocking strategy is a little bit surprising for users who use instrumentation (tracing, metrics). But the non-blocking strategy can be an opt-in option as you mentioned. Both are fine for me.
Should we apply the blocking strategy for all components (tracing, metrics, and logging)?
If so, I will update this PR.
I think the key distinction to make here is dropping (sampling) vs throttling when it comes to logs:
- Dropping logs arbitrarily will definitely come as a surprise for most users and may lose valuable information.
- Throttling however can be really useful. If an application starts flooding the log for whatever reason, intelligent throttling along the lines of "log message "...." appeared X times in the last minute" does not lose information and can prevent severe performance degradations.
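A rough sketch of the throttling idea above, assuming a hypothetical `LogThrottle` wrapper around whatever function writes log lines. Each distinct message is written once per window; repeats inside the window are counted, and the count is reported as a summary line when the next window for that message starts. The injectable `clock` exists only to make the example testable.

```python
import time


class LogThrottle:
    """Hypothetical throttle: write each distinct message once per window, count repeats."""

    def __init__(self, emit, window_s=60.0, clock=time.monotonic):
        self._emit = emit          # function that actually writes a log line
        self._window_s = window_s
        self._clock = clock
        self._window_start = {}    # message -> start time of its current window
        self._suppressed = {}      # message -> repeats suppressed in that window

    def log(self, message):
        now = self._clock()
        start = self._window_start.get(message)
        if start is not None and now - start < self._window_s:
            self._suppressed[message] += 1  # repeat inside the window: count it, don't write it
            return
        # A new window starts for this message: first report what the old one suppressed.
        repeats = self._suppressed.pop(message, 0)
        if repeats:
            self._emit(f'log message "{message}" repeated {repeats} more times '
                       f'in the last {self._window_s:g}s')
        self._window_start[message] = now
        self._suppressed[message] = 0
        self._emit(message)
```

As written, a summary only appears when the same message recurs; a real implementation would presumably also flush pending counts on a timer or at shutdown so that no information is lost.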
Dropping logs arbitrarily will definitely come as a surprise for most users and may lose valuable information.
Yes. This is what I meant and why I distinguished logs from other instruments (tracing, metrics).
Throttling however can be really useful. If an application starts flooding the log for whatever reason, intelligent throttling along the lines of "log message "...." appeared X times in the last minute" does not lose information and can prevent severe performance degradations.
Such intelligent throttling seems awesome to me.
In my understanding, OpenTelemetry's logging specification is not yet defined. Thus such techniques should be discussed as part of the logging API definition, IMHO.
I am surprised that you care so much about the default value and that you treat metrics differently from logs. In big companies (at least some that I know), metrics are more important than logs; for example, monitoring/alerting is done using metrics, not logs. Also, we do use blocking for all the signals and we don't run into issues.
So I would suggest treating all the signals the same: probably go with a default of blocking or non-blocking, but the same for all the signals.
I understand that prescribing a default strategy (blocking or information loss) is a difficult topic. At first, I assumed most users would prefer non-blocking (information loss) for metrics/tracing, but now I understand there are various needs.
So I deleted sentences such as "Should be a default option other than Logging" and "Should be a default option for Logging" in 34380b3.
In this PR, I want to clarify the principle (avoid degrading the user application as much as possible) and describe the desired options for dealing with the trade-off. If we need to prescribe a default, I feel it can be done in a separate PR.
specification/performance.md
Although there are inevitable overhead to achieve monitoring, API should not degrade the end-user application as possible. So that it should not block the end-user application nor consume too much memory resource.

Especially, most telemetry exporters need to call API of servers to export telemetry data. API call operations should be performed in asynchronous I/O or background thread to prevent blocking end-user applications.
This is not important here, and it is also covered by #186. So I think it is fine to just say that we should support non-blocking behavior, which the previous sentence already does.
Thank you. I read #186 and deleted this sentence as you suggested.
specification/performance.md
The language library could block the end-user application when it shut down. On shutdown, it has to flush data to prevent information loss. The language library should support user-configurable timeout if it blocks on shut down.

If the language library supports an explicit flush operation, it could block also.
You probably also want to support a timeout for this operation, where a negative or zero timeout means blocking without a time limit.
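The flush semantics suggested in this comment (a timeout, where zero or negative means wait without a limit) could look roughly like this in Python. `FlushableExporter` and `flush(timeout_s)` are hypothetical names; `threading.Condition.wait_for` does the waiting and returns the predicate's final value, so a timed-out flush returns `False`.

```python
import threading


class FlushableExporter:
    """Hypothetical exporter whose flush(timeout_s) treats timeout_s <= 0 as "no time limit"."""

    def __init__(self, export_batch):
        self._export_batch = export_batch  # stands in for a slow exporter call
        self._pending = []
        self._in_flight = 0
        self._cond = threading.Condition()
        threading.Thread(target=self._run, daemon=True).start()

    def record(self, item):
        with self._cond:
            self._pending.append(item)
            self._cond.notify_all()

    def _run(self):
        while True:
            with self._cond:
                self._cond.wait_for(lambda: self._pending)
                batch, self._pending = self._pending, []
                self._in_flight = len(batch)
            self._export_batch(batch)  # export outside the lock so record() never waits on I/O
            with self._cond:
                self._in_flight = 0
                self._cond.notify_all()

    def _idle(self):
        return not self._pending and self._in_flight == 0

    def flush(self, timeout_s):
        """Wait until nothing is pending or in flight; return False if the timeout expires."""
        with self._cond:
            if timeout_s <= 0:
                return self._cond.wait_for(self._idle)  # zero or negative: no time limit
            return self._cond.wait_for(self._idle, timeout=timeout_s)
```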
- Remove duplication with open-telemetry#186
- Mention about configurable timeout of flush operation
- Not specify default strategy (blocking or information loss)
As discussed, the default option (prefer blocking or prefer information loss) is not obvious, because there are various needs. Thus I deleted the sentences that prescribed a default option. I think the issues are resolved now. Are there still any blockers to merging this?
@open-telemetry/spec-approvers (just a reminder) Are we ready to merge this? As written in my previous comment, all the open points seem to be resolved.
👍 Thank you for taking care of this PR, everyone!
* Add Performance and Blocking specification
  Performance and Blocking specification is specified in a separate document and is linked from Language Library Design principles document. Implements issue: open-telemetry#94
* PR fix (open-telemetry#94).
  - Write about Metrics & Logging to cover entire API
  - Write about shut down / flush operations
  - Leave room for blocking implementation options (should not block "as default behavior")
  - Grammar & syntax fix
* PR fix (open-telemetry#94).
  - Not limit for tracing, metrics.
* PR fix (open-telemetry#94).
  - Mentioned about inevitable overhead
  - Shutdown may block, but it should support configurable timeout also
* PR fix (open-telemetry#94)
  - s/traces/telemetry data/
  - Syntax fix
  Co-Authored-By: Yang Song
* PR fix (open-telemetry#130)
  - Remove duplication with open-telemetry#186
  - Mention about configurable timeout of flush operation
* PR fix (open-telemetry#130)
  - Not specify default strategy (blocking or information loss)
I wrote this to document minimal principles as a starting point. If we need more detailed guidelines, I think they can be added in a separate PR. Any recommendations are also welcome :-)
Related issues in OpenCensus: census-instrumentation/opencensus-java#1837, census-instrumentation/opencensus-specs#262