Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix offline condition when one call fails #198

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

sammiq
Copy link

@sammiq sammiq commented Dec 31, 2024

Proposed change

This change redoes the API request logic to not get knocked offline for a bad response code, changing it to only go offline if an API causes an exception (like timeout or a request error, etc). Resolves #191.

Type of change

  • Bugfix
  • New feature
  • Code quality improvements to existing code or addition of tests
  • Documentation

Additional information

I have been having the same issue as #191 so decided to rework the logic slightly, also fixed the lock handling to be a little cleaner.

Checklist

  • The code change is tested and works locally.
  • The code has been formatted using Black.
  • Tests have been added to verify that the new code works.
  • Documentation added/updated if required.

@tomaae
Copy link
Owner

tomaae commented Dec 31, 2024

Just had a quick look, but this is breaking some things.
You have removed no_response error completely, that will obviously mess up setups when there is no access to said host.
Also error code 500 bug workaround has been removed. This is not yet fixed by ix systems.

But I dont see what this should fix, its just rewritten in bit different way. Offline condition when call fails is intended of course, that is how integration is supposed to act.

@sammiq
Copy link
Author

sammiq commented Dec 31, 2024

I'm not sure of how this is true, I've made it so that any exception will set self._connected = False which will take things offline, which occurs when you don't get any response from the host or some other unexpected behavior occurs. The error 500 will be handled the same way as any non-200 response will be handled, returning None and leaving things online, this mimics exactly your explicit handling of 500 errors, and just expands it to other errors as well, which stops the entire integration from going offline when you get a 404 or other error code, and will still set the error externally to 500 in those cases allowing the other code to handle those situations as they already do.

To be clear, the existing code will take everything not a 200 or 500 offline (exception or not), whereas this code will only take the integration offline when an exception is thrown, everything else just returns None and sets the error accordingly. This stops the integration from not working when the charts/release data is not available, but will still set everything offline if the truenas server is offline.

EDIT: I see that you are mapping the errors to translations so I may have broken that by using the exceptions directly, but I'm not sure that this needs translating TBH as the exceptions give you more information as to exactly what went wrong, but i'll push an update that mimics that behavior.

@tomaae
Copy link
Owner

tomaae commented Jan 1, 2025

You are setting no_response every time there on exception without trying to get status code. I had it written in such way because exception there arises for other reasons then no response error.
Also, written this way, error handling from codes has been removed, thats why it does not go offline on "charts/release" failure. So that means that when TrueNAS API does down fully/partially or just goes crazy, possibly even authentication error, it will still act as if everything was ok. It is necessery to handle both no_response and error code type errors.

@sammiq
Copy link
Author

sammiq commented Jan 2, 2025

You are setting no_response every time there on exception without trying to get status code. I had it written in such way because exception there arises for other reasons then no response error.

The way i read your code, the only exceptions thrown in that block would be from the request (in which case there would no response) or from the json parsing (which would but the status code would be 200). The only case I can see here that this code now differs is if there is a parse exception on the json.

written this way, error handling from codes has been removed

That's not quite accurate, the modified code still returns None if there are status codes other than 200 when there is no exception, and acts exactly like it did before for your handled cases other than not going offline on purpose. The point of the PR is to stop it from going offline when another status code is returned, so that one failed API call does not mean that everything is failed. I don't see this as acting as everything ok, just that the server is up but not returning data, which is what this method still returns and the outer dict is cleared, which should lead to no data available which seems correct.

If there is some other opaque factor external then I'm not sure I understand it, so feel free to reject this PR as in that case, I feel that in that case the needed changes to handle partial API outages in a way to address your concerns are larger.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Bug] TrueNAS disconnected due to single field not returned
2 participants