-
Notifications
You must be signed in to change notification settings - Fork 717
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
GSoC: Distributed error reporting #12489
base: develop
Are you sure you want to change the base?
Conversation
Distributed error reporting: Setting up the database that stores all the errors
…ask2 Distributed error reporting: Create model to store captured Errors
…ask3 Distributed error reporting: Middleware to catch runtime exception in backend
…ask4 Distributed error reporting: Endpoint /api/errorreports/report to store frontend error
…d, remove mark_as_reported
…ask7 Distributed error reporting: Capture more information about the error and its environment
reruns migrations
remove installation_type and release_version\n move request_time_to_error to context\n remove sensitive info from the requests info\n only use traceback and error_msg to fingerprint an error_report
DER: Move request_time_to_error to context and remove sensitive info from the requests info
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Main blockers: we should add more logic to remove parameters like passwords from request data, and we should have request timeouts configured on the report requests
}, | ||
}; | ||
|
||
Resource.client = jest.fn(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ideally, whenever you mock something in Python or JS, you want to ensure that the original implementation can be restored after the test is completed. That approach can keep tests from interfering with other tests, because of their use of mocks.
Since this is a direct replace of Resource.client
, there isn't a way for it to be restored. So it would be better to use mock.spyOn
or mock.replaceProperty
here, and do so in the beforeEach
. Then instead of clearAllMocks
in afterEach
(which only clears the mock state), I would suggest using restoreAllMocks
as that would ensure any mocks are restored to what they should be (assuming the appropriate approach was used to create the mock in the first place).
height: window.screen.height, | ||
available_width: window.screen.availWidth, | ||
available_height: window.screen.availHeight, | ||
}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was discussion about using the screen size breakpoints instead of the actual width and height. Is that the case, because it doesn't look like it? The reason is that it protects privacy. Specific sizes can be used to identify users, which reduces the anonymity of the data
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we should definitely make this update. Although I am also noticing that this file shouldn't exist, because it has been moved into the plugin to make this behaviour pluggable.
request_headers.pop("Cookie", None) | ||
|
||
request_get = dict(request.GET) | ||
request_get.pop("token", None) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In addition, probably for POST, we should ensure passwords are not sent?
kolibri/core/errorreports/models.py
Outdated
error_report.context = context | ||
|
||
error_report.save() | ||
logger.error("ErrorReports: Database updated.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This feels more like info-type logging?
ping_once(started, server=server) | ||
pingback_id = ping_once(started, server=server) | ||
if pingback_id: | ||
ping_error_reports.enqueue(args=(server, pingback_id)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see this is creating two different pathways that hinges on the pingback_id
. In utils.py
, there already exists logic dependent on if "id" in data:
, which is the same condition here. It seems like this fits alongside the existing logic there.
Vue.config.errorHandler = function (err, vm) { | ||
logging.error(`Unexpected Error: ${err}`); | ||
const error = new VueErrorReport(err, vm); | ||
ErrorReportResource.report(error); | ||
}; | ||
|
||
window.addEventListener('error', e => { | ||
logging.error(`Unexpected Error: ${e.error}`); | ||
const error = new JavascriptErrorReport(e); | ||
ErrorReportResource.report(error); | ||
}); | ||
|
||
window.addEventListener('unhandledrejection', event => { | ||
event.preventDefault(); | ||
logging.error(`Unhandled Rejection: ${event.reason}`); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know the unhandledrejection
listener will prevent default logging of the error, so in regards to that and the other logging statements, I'm concerned whether these are suppressing necessary log information, i.e. a stack trace, that developers would need? If logging.error
outputs a stack trace, that may not be the same trace as the error itself.
kolibri/core/errorreports/tasks.py
Outdated
join_url(server, "/api/v1/errors/report/"), | ||
data=errors_json, | ||
headers={"Content-Type": "application/json"}, | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this is using raw python requests, lets ensure this has explicit timeouts configured, and ideally separate timeouts for connection vs request.
Make error reports into a plugin
Summary
This pr implements the distributed error reporting feature as part of the Google Summer of Code(GSoC) 2024 program. See the epic for details.
References
Closes #12214
Reviewer guidance
All tests associated with the implementation must run successfully
Testing checklist
PR process
Reviewer checklist
yarn
andpip
)