Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

remove defusedxml in favor of lxml #9840

Merged
merged 2 commits into from
Jul 29, 2024

Conversation

manuel-sommer
Copy link
Contributor

I would recommend to either go with lxml or defusedxml, but don't use a mixture of both.
https://discuss.python.org/t/status-of-defusedxml-and-recommendation-in-docs/34762/19

I chose lxml as defusedxml got the last update in March 2021

@manuel-sommer
Copy link
Contributor Author

@Maffooch this is a followup of #9812

Copy link

dryrunsecurity bot commented Mar 28, 2024

Hi there 👋, @DryRunSecurity here, below is a summary of our analysis and findings.

DryRun Security Status Findings
Server-Side Request Forgery Analyzer 0 findings
Configured Codepaths Analyzer 0 findings
IDOR Analyzer 0 findings
Sensitive Files Analyzer 0 findings
SQL Injection Analyzer 0 findings
Authn/Authz Analyzer 0 findings
Secrets Analyzer 0 findings

Note

🟢 Risk threshold not exceeded.

Change Summary (click to expand)

The following is a summary of changes in this pull request made by me, your security buddy 🤖. Note that this summary is auto-generated and not meant to be a definitive list of security issues but rather a helpful summary from a security perspective.

Summary:

The code changes in this pull request focus on improving the security and robustness of the various XML parsers used in the DefectDojo application security management tool. The key changes include:

  1. Replacing the defusedxml library with the lxml library for XML parsing, which provides better security and performance.
  2. Configuring the XML parsers to disable the resolution of external XML entities, effectively mitigating the risk of XML External Entity (XXE) vulnerabilities.
  3. Enhancing the parsing of vulnerability details, such as mitigation information, severity levels, and deduplication of findings.
  4. Improving the handling of various report formats, including Nessus, Veracode, Nikto, and others, to ensure accurate and comprehensive security data is imported into the DefectDojo application.

These changes demonstrate a security-focused approach to the application's XML parsing functionality, addressing potential vulnerabilities and improving the overall quality and reliability of the security data processed by the tool.

Files Changed:

  • dojo/tools/appspider/parser.py: The changes replace the defusedxml library with lxml and disable XML entity resolution to mitigate XXE vulnerabilities.
  • dojo/tools/burp_dastardly/parser.py: Similar changes to the Burp Dastardly parser, addressing potential XML-related vulnerabilities.
  • dojo/tools/burp/parser.py: The Burp Suite XML parser is updated with the same security-focused changes.
  • dojo/tools/acunetix/parse_acunetix_xml.py: The Acunetix XML parser is updated to use lxml and disable entity resolution.
  • dojo/tools/dependency_check/parser.py: The Dependency Check parser is improved, including better handling of identifiers and suppressed vulnerabilities.
  • dojo/tools/cyclonedx/xml_parser.py: The CycloneDX XML parser is updated with the lxml library and entity resolution disabled.
  • dojo/tools/crashtest_security/parser.py: The Crashtest Security parser is updated with the lxml library and entity resolution disabled.
  • And similar changes are made to the parsers for Fortify, Checkmarx, Nikto, Qualys, Nexpose, OpenSCAP, Nmap, OpenVAS, Outpost24, Qualys, SpotBugs, SSLyze, sslscan, Tenable, VCG, Wapiti, Xanitizer, and Veracode.

Powered by DryRun Security

@manuel-sommer
Copy link
Contributor Author

FYI: I can do a followup PR to fix the added TRY200 regarding ruff linter, but I did prioritize this PR on migrating to lxml, but not on fixing all linter failures of previous implemented parsers.

@cneill
Copy link
Contributor

cneill commented Mar 28, 2024

I believe we originally started using defusedxml to avoid potential security issues parsing untrusted XML with lxml. bandit now reports many of these calls as vulnerable:

...
>> Issue: [B320:blacklist] Using lxml.etree.parse to parse untrusted XML data is known to be vulnerable to XML attacks. Replace lxml.etree.parse with its defusedxml equivalent function.
   Severity: Medium   Confidence: High
   CWE: CWE-20 (https://cwe.mitre.org/data/definitions/20.html)
   More Info: https://bandit.readthedocs.io/en/1.7.8/blacklists/blacklist_calls.html#b313-b320-xml-bad-etree
   Location: ./dojo/tools/zap/parser.py:28:15
27	    def get_findings(self, file, test):
28	        tree = etree.parse(file)
29	        items = list()

--------------------------------------------------
>> Issue: [B320:blacklist] Using lxml.etree.parse to parse untrusted XML data is known to be vulnerable to XML attacks. Replace lxml.etree.parse with its defusedxml equivalent function.
   Severity: Medium   Confidence: High
   CWE: CWE-20 (https://cwe.mitre.org/data/definitions/20.html)
   More Info: https://bandit.readthedocs.io/en/1.7.8/blacklists/blacklist_calls.html#b313-b320-xml-bad-etree
   Location: ./unittests/tools/test_vcg_parser.py:48:18
47	        single_finding = open("unittests/scans/vcg/one_finding.xml")
48	        vcgscan = etree.parse(single_finding)
49	        finding = self.parser.parse_issue(vcgscan.findall("CodeIssue")[0], Test())
...

The defusedxml README describes these various vulnerabilities in more detail and provides some suggestions for safely using lxml. Additionally, lxml has some suggestions on their site.

tl;dr: I think we should either do the legwork to make lxml safe, or convert the few cases where lxml is already used to defusedxml calls:

Test results:
>> Issue: [B320:blacklist] Using lxml.etree.fromstring to parse untrusted XML data is known to be vulnerable to XML attacks. Replace lxml.etree.fromstring with its defusedxml equivalent function.
   Severity: Medium   Confidence: High
   CWE: CWE-20 (https://cwe.mitre.org/data/definitions/20.html)
   More Info: https://bandit.readthedocs.io/en/1.7.8/blacklists/blacklist_calls.html#b313-b320-xml-bad-etree
   Location: ./dojo/tools/api_sonarqube/importer.py:385:18
384	        parser = etree.HTMLParser()
385	        details = etree.fromstring(vuln_details, parser)
386	

--------------------------------------------------
>> Issue: [B320:blacklist] Using lxml.etree.parse to parse untrusted XML data is known to be vulnerable to XML attacks. Replace lxml.etree.parse with its defusedxml equivalent function.
   Severity: Medium   Confidence: High
   CWE: CWE-20 (https://cwe.mitre.org/data/definitions/20.html)
   More Info: https://bandit.readthedocs.io/en/1.7.8/blacklists/blacklist_calls.html#b313-b320-xml-bad-etree
   Location: ./dojo/tools/burp_enterprise/parser.py:23:15
22	        parser = etree.HTMLParser()
23	        tree = etree.parse(filename, parser)
24	        if tree:

--------------------------------------------------
>> Issue: [B320:blacklist] Using lxml.etree.parse to parse untrusted XML data is known to be vulnerable to XML attacks. Replace lxml.etree.parse with its defusedxml equivalent function.
   Severity: Medium   Confidence: High
   CWE: CWE-20 (https://cwe.mitre.org/data/definitions/20.html)
   More Info: https://bandit.readthedocs.io/en/1.7.8/blacklists/blacklist_calls.html#b313-b320-xml-bad-etree
   Location: ./dojo/tools/sonarqube/parser.py:37:19
36	            parser = etree.HTMLParser()
37	            tree = etree.parse(filename, parser)
38	            if self.mode not in [None, "detailed"]:

--------------------------------------------------
>> Issue: [B320:blacklist] Using lxml.etree.fromstring to parse untrusted XML data is known to be vulnerable to XML attacks. Replace lxml.etree.fromstring with its defusedxml equivalent function.
   Severity: Medium   Confidence: High
   CWE: CWE-20 (https://cwe.mitre.org/data/definitions/20.html)
   More Info: https://bandit.readthedocs.io/en/1.7.8/blacklists/blacklist_calls.html#b313-b320-xml-bad-etree
   Location: ./dojo/tools/sonarqube/parser.py:69:38
68	                parser = etree.HTMLParser()
69	                html_desc_as_e_tree = etree.fromstring(issue_detail["htmlDesc"], parser)
70	                issue_description = self.get_description(html_desc_as_e_tree)

--------------------------------------------------

@manuel-sommer manuel-sommer marked this pull request as draft March 28, 2024 17:14
Copy link
Contributor

github-actions bot commented Apr 6, 2024

This pull request has conflicts, please resolve those before we can evaluate the pull request.

Copy link
Contributor

Conflicts have been resolved. A maintainer will review the pull request shortly.

@manuel-sommer manuel-sommer marked this pull request as ready for review April 20, 2024 19:20
@manuel-sommer
Copy link
Contributor Author

@cneill, I updated how lxml is used (with resolve_entities=False)

@manuel-sommer
Copy link
Contributor Author

Could you also take a look @Maffooch if lxml is fine to be used like this?

@manuel-sommer
Copy link
Contributor Author

reopening to retrigger failed tests.

Copy link
Contributor

@mtesauro mtesauro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved

Copy link
Contributor

This pull request has conflicts, please resolve those before we can evaluate the pull request.

Copy link
Contributor

Conflicts have been resolved. A maintainer will review the pull request shortly.

Copy link
Contributor

This pull request has conflicts, please resolve those before we can evaluate the pull request.

Copy link
Contributor

Conflicts have been resolved. A maintainer will review the pull request shortly.

Copy link
Contributor

This pull request has conflicts, please resolve those before we can evaluate the pull request.

Copy link
Contributor

Conflicts have been resolved. A maintainer will review the pull request shortly.

@manuel-sommer
Copy link
Contributor Author

Friendly reminder @Maffooch

Copy link
Contributor

github-actions bot commented Jul 3, 2024

This pull request has conflicts, please resolve those before we can evaluate the pull request.

@Maffooch
Copy link
Contributor

@manuel-sommer I apologize for the delay on this one, and I thank you very much for your patience.

After much deliberation, we believe it would be the best decision to remain with defusedXML. These are the high level reasons that contributed to that decision:

  • This library only warps the stdlib and does not have any dependencies. There really is not a good reason to rev versions unless it is absolutely needed
    • This is largely in response to the initial purpose of moving away from defusedXML
  • Python docs suggest using defusedXML in place of the stdlib
  • A few security tools will raise issues when using the stdlib, such as bandit
    • This may restrict some very compliancy strict organizations from using DefectDojo, and therefore slow adoption
  • DefectDojo is viewed as a pillar in security, and by deviating away from the recommended security best practices, it could raise some eyebrows for a few folks

I agree with your intention of consolidating to a single XML parsing library, and we believe that defusedXML would be in the best interest of the project

@mtesauro
Copy link
Contributor

@manuel-sommer Thanks for all your hard work on this - once we did the research, it proved that defusedXML was the best path forward. We're going to update the "Writing a parser" to include instructions to use this library going forward. We'd probably not have done the research if you'd not done this PR so even though it didn't get merged, it DID help. 👍

@manuel-sommer manuel-sommer reopened this Jul 29, 2024
@manuel-sommer
Copy link
Contributor Author

@mtesauro and @Maffooch I added the docs here.

Copy link

dryrunsecurity bot commented Jul 29, 2024

DryRun Security Summary

The pull request focuses on improving the security and robustness of the parser implementation in the DefectDojo project by emphasizing the use of secure libraries, recommending best practices for handling data, and emphasizing the importance of comprehensive unit testing.

Expand for full summary

Summary:

The code changes in this pull request are focused on improving the security and robustness of the parser implementation in the DefectDojo project. The key changes include:

  1. Emphasizing the use of secure and well-maintained libraries for parsing various file formats, such as using defusedXML instead of lxml for parsing XML data.
  2. Recommending that parsers should not set attributes if the data is not available, rather than filling them with placeholder values, to maintain the integrity of the processed data.
  3. Suggesting the addition of checks to avoid potential KeyError exceptions when accessing fields that may not always be present in the input data, to ensure that the parser can gracefully handle edge cases and unexpected data.
  4. Recommending the use of pre-defined deduplication algorithms, such as the "legacy" algorithm, instead of implementing custom deduplication logic, to maintain consistency and reliability across the project.
  5. Emphasizing the importance of having comprehensive unit tests for each parser, covering common cases as well as checking the attributes of the findings, to ensure the robustness and correctness of the parser implementation.

From an application security perspective, these changes are positive as they help to improve the overall security and reliability of the DefectDojo project by reducing the risk of vulnerabilities and improving the overall quality of the parser implementation.

Files Changed:

  • docs/content/en/contributing/how-to-write-a-parser.md: This file contains the documentation for writing parsers in the DefectDojo project. The changes focus on providing guidance and recommendations for improving the security and robustness of the parser implementation, as outlined in the summary.

Code Analysis

We ran 9 analyzers against 1 file and 0 analyzers had findings. 9 analyzers had no findings.

Riskiness

🟢 Risk threshold not exceeded.

View PR in the DryRun Dashboard.

Copy link
Contributor

Conflicts have been resolved. A maintainer will review the pull request shortly.

Copy link

@Maffooch Maffooch merged commit 19bab59 into DefectDojo:dev Jul 29, 2024
126 checks passed
@manuel-sommer manuel-sommer deleted the rm_defusedxml branch July 29, 2024 22:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants