Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve error handling in network functions #157

Open
4 tasks done
TomasTurina opened this issue Sep 13, 2024 · 1 comment · Fixed by #472 · May be fixed by #474
Open
4 tasks done

Improve error handling in network functions #157

TomasTurina opened this issue Sep 13, 2024 · 1 comment · Fixed by #472 · May be fixed by #474
Assignees
Labels

Comments

@TomasTurina
Copy link
Member

TomasTurina commented Sep 13, 2024

Description

As part of the development of the communicator component of the new agent, we chose to use boost asio/beast as our networking library, which also works with C++20 coroutines.

The first implementation is missing to check for many network errors that can occur during communication. Currently, any failure is not reported correctly and no recovery attempt is made, so we need to improve error handling.

Tasks to be done:

  • Ensure that all functions that use the Boost networking library are within the HTTPClient class.
  • Identify functions within the HTTPClient class that do not have error handling.
  • Apply changes to improve error handling.
  • Ensure that components that use the HTTPClient class (i.e. Communicator) are properly handling network errors/exceptions.

Errors reported during testing

When performing manual tests for other developments we have already encountered some cases that we have to solve:

Handle jwt exceptions

In the following line

if (const auto decoded = jwt::decode<jwt::traits::nlohmann_json>(*m_token); decoded.has_payload_claim("exp"))

a case occurred where a token was returned with an invalid jwt format and the co-routine that handled this task crashed, causing no further attempt to obtain the token and no errors were logged.

Chrash when getting reference from null object

In the following line

return nlohmann::json::parse(boost::beast::buffers_to_string(res.body().data())).at("token").get_ref<const std::string&>();

if the json object has a different format than expected causing “token” to not be where you are looking for it, it causes the crash when trying to do the get_ref() since the at() above would fail.

Example of response that produces the crash:

{
"no-token": "value"
}

End of co-routines in case of write/read failure

In Co_PerformHttpRequest when a failure occurs during AsyncWrite/AsyncRead the co_routine is being terminated completely, the correct operation of this should be:

  • close the socket
  • do a wait
  • try again later

but do not stop the co-routine.

Timeouts in operations with sockets

Timeouts must be added to all socket operations that can block the process/co_routine indefinitely. For example Connect, Write and Read.

@aritosteles
Copy link
Contributor

aritosteles commented Nov 29, 2024

Update

  • 2024/11/28:
    Reviewing the code. Looks like some of the errors are already fixed. Investigating how to deal with boost::asio timeouts.
  • 2024/12/05:
    Working on adding timeout to both async_connect and connect calls.
  • 2024/12/06:
    Implemented timeout on synchoronous connect call. Looking to modify server mock to force timeouts and run tests.
  • 2024/12/09:
    Further investigation and tests over http synchronous connect call. No satisfactory results yielded, will focus on asynchronous connect moving forward.
  • 2024/12/10:
    Trying different approaches to setting a timeout. So far I've failed to devise a way to reliably test my prototype.
  • 2024/12/11:
    Successfully implemented and tested a timeout for HttpSocket::Connect() function. Moving on to implement the same in https, read and write.
  • 2024/12/13:
    Working on AsyncConnect() implementation.
  • 2024/12/16:
    Completed AsyncConnect(), Write() and AsyncWrite() implementation and testing for both Http and Https Socket classes. Working on Read() and AsyncRead().
  • 2024/12/17:
    Finished implementation and fixed the existing tests for HttpClient that were broken. Working on tests specific to HttpSocket and HttpsSocket timeout feature.
  • 2024/12/17:
    Continued working on timeout tests.
  • 2024/12/18:
    It was decided to get rid of the timeouts in synchronous operations.
  • 2024/12/20:
    Found and fixed warning in gcc 13.x
    Made Communicator::GetGroupConfigurationFromManager() into a coroutine
    Removed unused functions PerformHttpRequestDownload() and ReadToFile()
  • 2024/12/23:
    Fixed failing tests. Ready for review.
  • 2024/12/24:
    Addressed some comments in the review. Found some errors in agent, we are trying to determine if they are related to this PR.
  • 2024/12/26:
    Tracked down the source a "Operation cancelled" warning log that looked suspicious. It was caused by the server taking too long to communicate with the indexer and the agent's timeout setting to be too short. Extended the agent's timeout to 60 seconds and made the log to be of 'debug' type. PR is ready for review.
    -2024/12/31:
    Found asio::beast::tcp_stream::expires_after() which should provide the functionality we are looking for. Closed current PR and will start a new one. Merged a few changes that will be useful but will re-do the main functionality.
    -2025/01/02:
    Started working on new implementation using asio::beast::tcp_stream::expires_after(). Synchronous operations of HttpSocket seem to be working properly with a timeout.

@aritosteles aritosteles linked a pull request Dec 12, 2024 that will close this issue
1 task
@wazuhci wazuhci moved this from In progress to Pending review in Release 5.0.0 Dec 23, 2024
@wazuhci wazuhci moved this from Pending review to In progress in Release 5.0.0 Dec 31, 2024
@TomasTurina TomasTurina linked a pull request Dec 31, 2024 that will close this issue
3 tasks
@TomasTurina TomasTurina reopened this Dec 31, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment