Normalize Empty and Invalid Data in Inventory #411

Open
Tracked by #241
cborla opened this issue Dec 10, 2024 · 10 comments · May be fixed by #446
Assignees
Labels
level/task Task issue module/agent module/inventory Inventory module mvp Minimum Viable Product refinement type/enhancement Enhancement issue

Comments

@cborla
Member

cborla commented Dec 10, 2024

Description

The current implementation of the inventory system has inconsistencies in how empty data and invalid types are handled. This issue proposes improvements to standardize data representation and validation, ensuring cleaner and more reliable outputs. The system should report null for fields that it could not collect, rather than inventing values or using ambiguous placeholders.

Tasks

  1. Report Empty Data as Null:
    • Fields that the system could not collect should be represented as null in the output JSON.
    • Avoid interpreting a space (" ") or any other placeholder as empty data. The DataProvider must not fabricate values and should return null for missing fields.
  2. Validate Data Types:
    • Ensure that all data provided by DataProvider strictly adheres to the defined types (e.g., array, date, number, string, etc.).
    • Develop unit tests to verify that each field is either null or conforms to its expected format.
  3. Define Inventory Specification:
    • Create a detailed specification for all fields related to Inventory that the agent can report, covering both stateful and stateless messages.
    • Include the data type of each field and indicate whether it is mandatory.
    • The mandatory status of a field should be based on whether the agent would encounter an error if it fails to populate the field.
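A minimal sketch of how tasks 2 and 3 could fit together, assuming a specification that records each field's type and mandatory status. All names here (`FieldValue`, `FieldSpec`, `isValid`, and the example fields) are hypothetical, and `std::monostate` only stands in for JSON null; this is an illustration, not the agent's actual code.

```cpp
#include <cassert>
#include <string>
#include <variant>

// Stand-in for a reported value: std::monostate plays the role of JSON null.
using FieldValue = std::variant<std::monostate, long long, std::string>;

enum class FieldType { Number, String };

// One entry of a hypothetical inventory specification.
struct FieldSpec
{
    FieldType type;
    bool mandatory; // true if the agent would fail without this field
};

// A value is valid when it is null and the field is optional,
// or when it holds the type declared in the specification.
inline bool isValid(const FieldSpec& spec, const FieldValue& value)
{
    if (std::holds_alternative<std::monostate>(value))
    {
        return !spec.mandatory;
    }
    switch (spec.type)
    {
        case FieldType::Number: return std::holds_alternative<long long>(value);
        case FieldType::String: return std::holds_alternative<std::string>(value);
    }
    return false;
}

// Usage example; the field names and mandatory flags are assumptions.
inline void checkExamples()
{
    const FieldSpec cpuCores {FieldType::Number, false};
    const FieldSpec hostName {FieldType::String, true};
    assert(isValid(cpuCores, FieldValue {}));                    // null is allowed
    assert(isValid(cpuCores, FieldValue {4LL}));                 // correct type
    assert(!isValid(cpuCores, FieldValue {std::string {" "}})); // placeholder string rejected
    assert(!isValid(hostName, FieldValue {}));                   // mandatory field cannot be null
}
```

A unit test per inventory table could then iterate over the specification and call a check like `isValid` on every reported field.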

Acceptance Criteria

  • Fields that the system could not collect are represented as null.
  • The DataProvider does not generate placeholder values for missing data.
  • All fields strictly adhere to their defined data types.
  • Comprehensive unit tests ensure proper data validation.
  • A clear and complete inventory specification is available, detailing field types and mandatory statuses.

References

This is a continuation of issue #397.

@cborla cborla added level/task Task issue type/bug Bug issue module/agent module/inventory Inventory module labels Dec 10, 2024
@wazuhci wazuhci moved this to Backlog in Release 5.0.0 Dec 10, 2024
@cborla cborla mentioned this issue Dec 10, 2024
53 tasks
@vikman90 vikman90 added the mvp Minimum Viable Product refinement label Dec 11, 2024
@wazuhci wazuhci moved this from Backlog to In progress in Release 5.0.0 Dec 16, 2024
@cborla
Member Author

cborla commented Dec 16, 2024

@nbertoldo
Member

Update 2024/12/17

  • The data provider's UNKNOWN_VALUE was changed from " " to nullptr, which required refactoring several functions and unit tests.
  • As a consequence, many std::string variables had to be replaced by std::optional<std::string>. When a value cannot be obtained, the variable now holds std::nullopt, since a std::string cannot be assigned a std::nullptr_t.
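The change described above can be sketched as follows. This is an illustrative example under assumptions: `ParsedFields`, `getField`, and `toJsonText` are hypothetical names, and the text rendering skips JSON escaping for brevity.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>

// Hypothetical container of key/value pairs already read from the system.
using ParsedFields = std::map<std::string, std::string>;

// Returns the collected value, or std::nullopt when the key was not found.
// Previously a function like this would have returned the " " placeholder.
inline std::optional<std::string> getField(const ParsedFields& parsed, const std::string& key)
{
    const auto it = parsed.find(key);
    if (it == parsed.end())
    {
        return std::nullopt;
    }
    return it->second;
}

// Renders the value as JSON text: null when missing, a quoted string otherwise.
// Escaping of special characters is intentionally omitted in this sketch.
inline std::string toJsonText(const std::optional<std::string>& value)
{
    return value ? "\"" + *value + "\"" : std::string {"null"};
}

// Usage example with made-up data.
inline void fieldExample()
{
    const ParsedFields parsed {{"iface", "eth0"}};
    assert(toJsonText(getField(parsed, "iface")) == "\"eth0\"");
    assert(toJsonText(getField(parsed, "gateway")) == "null");
}
```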

@vikman90 vikman90 added type/enhancement Enhancement issue and removed type/bug Bug issue labels Dec 18, 2024
@nbertoldo nbertoldo linked a pull request Dec 19, 2024 that will close this issue
3 tasks
@nbertoldo
Member

nbertoldo commented Dec 19, 2024

Update 2024/12/18

For functions that return integer values, for example int getCpuCores(), changing the return type to std::optional<int> getCpuCores() produces a compile error when the result is assigned to the JSON object:

hardware["cpu_cores"] = getCpuCores().value_or(UNKNOWN_VALUE);

Error:

/home/vagrant/wazuh-agent/src/common/data_provider/src/sysInfoLinux.cpp:276:51:   required from here
/usr/include/c++/13/optional:1042:25: error: static assertion failed
 1042 |           static_assert(is_convertible_v<_Up&&, _Tp>);
      |                         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/include/c++/13/optional:1042:25: note: 'std::is_convertible_v<std::nullptr_t&, int>' evaluates to false
/usr/include/c++/13/optional:1047:20: error: invalid 'static_cast' from type 'std::nullptr_t' to type 'int'
 1047 |             return static_cast<_Tp>(std::forward<_Up>(__u));

This happens because UNKNOWN_VALUE (now nullptr) is not convertible to int, the type contained in the std::optional. std::optional::value_or requires its argument to be convertible to the contained type, so the static assertion fails.

Possible solutions:

  1. Handle value manually:
if (auto cpuCores = getCpuCores())
{
    hardware["cpu_cores"] = *cpuCores;
}
else
{
    hardware["cpu_cores"] = UNKNOWN_VALUE;
}
  2. Use a different UNKNOWN_VALUE that is convertible to int, such as 0 or -1.
  3. Have the data provider function add the appropriate value directly to the JSON object.
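The first option can be wrapped in a small helper so that the if/else is not repeated at every call site. This is a sketch under assumptions: `JsonValue` is a stand-in built on `std::variant` so the example compiles without a JSON library, and `valueOrUnknown` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <variant>

// Stand-in for a JSON value that can hold null, an integer, or a string.
using JsonValue = std::variant<std::nullptr_t, int, std::string>;

constexpr std::nullptr_t UNKNOWN_VALUE = nullptr;

// Returns the contained value if present, otherwise UNKNOWN_VALUE.
// Unlike value_or, the fallback does not need to be convertible to T:
// each branch only needs to be convertible to JsonValue.
template<typename T>
JsonValue valueOrUnknown(const std::optional<T>& value)
{
    if (value.has_value())
    {
        return JsonValue {*value};
    }
    return JsonValue {UNKNOWN_VALUE};
}

// Usage example with made-up values.
inline void cpuCoresExample()
{
    const JsonValue known = valueOrUnknown(std::optional<int> {4});
    const JsonValue unknown = valueOrUnknown(std::optional<int> {});
    assert(std::get<int>(known) == 4);
    assert(std::holds_alternative<std::nullptr_t>(unknown));
}
```

The second option avoids the type error too, but 0 or -1 would be indistinguishable from a collected value, which conflicts with the goal of reporting null for missing data.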

@nbertoldo
Member

nbertoldo commented Dec 20, 2024

Update 2024/12/19

  • Data provider functions that obtain information from Linux systems are normalized.

@nbertoldo
Member

Update 2024/12/20

  • Fixed an exception raised during hash ID calculation:
[2024-12-20 19:00:39.926] [wazuh-agent] [error] [ERROR] [inventory.cpp:136] [LogErrorInventory] {"data":[{"adapter":"","address":"172.18.0.1","broadcast":"172.18.255.255","dhcp":"unknown","gateway":" ","iface":"br-6e3f5b470628","iface_type":"ethernet","mac":"02:42:d6:31:50:c1","metric":"0","mtu":1500,"netmask":"255.255.0.0","network_item_id":"8d00c355ec9b2857892546bfa4bbfafa1779e4ce","proto_type":"ipv4","rx_bytes":0,"rx_dropped":0,"rx_errors":0,"rx_packets":0,"state":"down","tx_bytes":0,"tx_dropped":0,"tx_errors":0,"tx_packets":0},{"adapter":"","address":"172.17.0.1","broadcast":"172.17.255.255","dhcp":"unknown","gateway":" ","iface":"docker0","iface_type":"ethernet","mac":"02:42:ff:dd:16:46","metric":"0","mtu":1500,"netmask":"255.255.0.0","network_item_id":"fae3b9f3354ce8946db29fbf6c1811172ec77ea9","proto_type":"ipv4","rx_bytes":0,"rx_dropped":0,"rx_errors":0,"rx_packets":0,"state":"down","tx_bytes":0,"tx_dropped":0,"tx_errors":0,"tx_packets":0},{"adapter":"","address":"10.0.2.15","broadcast":"10.0.2.255","dhcp":"unknown","gateway":"10.0.2.2","iface":"eth0","iface_type":"ethernet","mac":"08:00:27:64:e1:ff","metric":"100","mtu":1500,"netmask":"255.255.255.0","network_item_id":"147678404f9105937ae41106c00c244f7a12a24b","proto_type":"ipv4","rx_bytes":104876780,"rx_dropped":0,"rx_errors":0,"rx_packets":102958,"state":"up","tx_bytes":15505902,"tx_dropped":0,"tx_errors":0,"tx_packets":36647},{"adapter":"","address":"fe80::a00:27ff:fe64:e1ff","broadcast":"","dhcp":"unknown","gateway":"10.0.2.2","iface":"eth0","iface_type":"ethernet","mac":"08:00:27:64:e1:ff","metric":"","mtu":1500,"netmask":"ffff:ffff:ffff:ffff::","network_item_id":"3522ff7dca7dafb2911a7d9036ce4f5169827d63","proto_type":"ipv6","rx_bytes":104876780,"rx_dropped":0,"rx_errors":0,"rx_packets":102958,"state":"up","tx_bytes":15505902,"tx_dropped":0,"tx_errors":0,"tx_packets":36647},{"adapter":"","address":"192.168.56.132","broadcast":"192.168.56.255","dhcp":"unknown","gateway":" ","iface":"eth1","iface_type":"ethernet","mac":"08:00:27:cb:72:00","metric":"0","mtu":1500,"netmask":"255.255.255.0","network_item_id":"2773cc1128fc9f3528bffa310529c68119fb5b45","proto_type":"ipv4","rx_bytes":19589372,"rx_dropped":0,"rx_errors":0,"rx_packets":108523,"state":"up","tx_bytes":50445560,"tx_dropped":0,"tx_errors":0,"tx_packets":94644},{"adapter":"","address":"fe80::a00:27ff:fecb:7200","broadcast":"","dhcp":"unknown","gateway":" ","iface":"eth1","iface_type":"ethernet","mac":"08:00:27:cb:72:00","metric":"","mtu":1500,"netmask":"ffff:ffff:ffff:ffff::","network_item_id":"b4a2ceac790ada8223e380cef68173b0c685acf9","proto_type":"ipv6","rx_bytes":19589372,"rx_dropped":0,"rx_errors":0,"rx_packets":108523,"state":"up","tx_bytes":50445560,"tx_dropped":0,"tx_errors":0,"tx_packets":94644}],"exception":"[json.exception.type_error.302] type must be string, but is null","table":"networks"}
  • Updated unit tests
  • Reviewed the inventory tables and their related ECS fields: Inventory tables
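The exception text in the log, "type must be string, but is null", is the kind of error a JSON library raises when a null field is read as a string. The sketch below shows one way a hash input could be built so that null fields are tolerated. It is an assumption for illustration, not the fix that was applied: `buildHashInput` and the ':' separator are hypothetical.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Concatenates the fields that identify an item into a single hash input.
// A null field contributes an empty segment instead of being read as a string.
inline std::string buildHashInput(const std::vector<std::optional<std::string>>& fields)
{
    std::string input;
    for (const auto& field : fields)
    {
        if (field.has_value())
        {
            input += *field;
        }
        input += ':'; // separator keeps field positions distinguishable
    }
    return input;
}

// Usage example with made-up values: the middle field was not collected.
inline void hashInputExample()
{
    const std::string input = buildHashInput({std::string {"eth0"}, std::nullopt, std::string {"ipv4"}});
    assert(input == "eth0::ipv4:");
}
```

In this sketch a null field and an empty string produce the same segment; a distinct marker for null would be needed if the two must hash differently.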

@nbertoldo
Member

Update 2024/12/27

  • Refactored the Linux data provider functions to make it easier to report uncollected fields as null.

@nbertoldo
Member

Update 2024/12/30

  • Refactor data provider Windows functions.
  • Testing event generation in Windows agent.

@nbertoldo
Member

Update 2024/12/31

  • Fix Windows packages and ports scan.
  • Testing event generation in Windows agent.

@nbertoldo
Member

Update 2025/01/02

  • Refactor data provider macOS functions.

@nbertoldo
Member

Update 2025/01/03

  • Refactor data provider macOS functions.
  • Update unit tests.
  • Testing event generation in macOS agent.
