remove defusedxml in favor of lxml #9840

manuel-sommer · 2024-03-28T08:17:39Z

I would recommend going with either lxml or defusedxml, but not a mixture of both.
https://discuss.python.org/t/status-of-defusedxml-and-recommendation-in-docs/34762/19

I chose lxml because defusedxml received its last update in March 2021.

manuel-sommer · 2024-03-28T08:17:56Z

@Maffooch this is a followup of #9812

dryrunsecurity · 2024-03-28T08:19:35Z

Hi there 👋, @DryRunSecurity here, below is a summary of our analysis and findings.

DryRun Security	Status	Findings
Server-Side Request Forgery Analyzer	✅	0 findings
Configured Codepaths Analyzer	✅	0 findings
IDOR Analyzer	✅	0 findings
Sensitive Files Analyzer	✅	0 findings
SQL Injection Analyzer	✅	0 findings
Authn/Authz Analyzer	✅	0 findings
Secrets Analyzer	✅	0 findings

Note

🟢 Risk threshold not exceeded.

Change Summary

The following is a summary of changes in this pull request made by me, your security buddy 🤖. Note that this summary is auto-generated and not meant to be a definitive list of security issues but rather a helpful summary from a security perspective.

Summary:

The code changes in this pull request focus on improving the security and robustness of the various XML parsers used in the DefectDojo application security management tool. The key changes include:

Replacing the defusedxml library with the lxml library for XML parsing, which provides better security and performance.
Configuring the XML parsers to disable the resolution of external XML entities, effectively mitigating the risk of XML External Entity (XXE) vulnerabilities.
Enhancing the parsing of vulnerability details, such as mitigation information, severity levels, and deduplication of findings.
Improving the handling of various report formats, including Nessus, Veracode, Nikto, and others, to ensure accurate and comprehensive security data is imported into the DefectDojo application.

These changes demonstrate a security-focused approach to the application's XML parsing functionality, addressing potential vulnerabilities and improving the overall quality and reliability of the security data processed by the tool.

Files Changed:

dojo/tools/appspider/parser.py: The changes replace the defusedxml library with lxml and disable XML entity resolution to mitigate XXE vulnerabilities.
dojo/tools/burp_dastardly/parser.py: Similar changes to the Burp Dastardly parser, addressing potential XML-related vulnerabilities.
dojo/tools/burp/parser.py: The Burp Suite XML parser is updated with the same security-focused changes.
dojo/tools/acunetix/parse_acunetix_xml.py: The Acunetix XML parser is updated to use lxml and disable entity resolution.
dojo/tools/dependency_check/parser.py: The Dependency Check parser is improved, including better handling of identifiers and suppressed vulnerabilities.
dojo/tools/cyclonedx/xml_parser.py: The CycloneDX XML parser is updated with the lxml library and entity resolution disabled.
dojo/tools/crashtest_security/parser.py: The Crashtest Security parser is updated with the lxml library and entity resolution disabled.
And similar changes are made to the parsers for Fortify, Checkmarx, Nikto, Qualys, Nexpose, OpenSCAP, Nmap, OpenVAS, Outpost24, Qualys, SpotBugs, SSLyze, sslscan, Tenable, VCG, Wapiti, Xanitizer, and Veracode.

manuel-sommer · 2024-03-28T08:21:23Z

FYI: I can do a follow-up PR to fix the newly reported TRY200 findings from the ruff linter. In this PR I prioritized migrating to lxml rather than fixing every linter failure in previously implemented parsers.

cneill · 2024-03-28T17:03:18Z

I believe we originally started using defusedxml to avoid potential security issues parsing untrusted XML with lxml. bandit now reports many of these calls as vulnerable:

...
>> Issue: [B320:blacklist] Using lxml.etree.parse to parse untrusted XML data is known to be vulnerable to XML attacks. Replace lxml.etree.parse with its defusedxml equivalent function.
   Severity: Medium   Confidence: High
   CWE: CWE-20 (https://cwe.mitre.org/data/definitions/20.html)
   More Info: https://bandit.readthedocs.io/en/1.7.8/blacklists/blacklist_calls.html#b313-b320-xml-bad-etree
   Location: ./dojo/tools/zap/parser.py:28:15
27	    def get_findings(self, file, test):
28	        tree = etree.parse(file)
29	        items = list()

--------------------------------------------------
>> Issue: [B320:blacklist] Using lxml.etree.parse to parse untrusted XML data is known to be vulnerable to XML attacks. Replace lxml.etree.parse with its defusedxml equivalent function.
   Severity: Medium   Confidence: High
   CWE: CWE-20 (https://cwe.mitre.org/data/definitions/20.html)
   More Info: https://bandit.readthedocs.io/en/1.7.8/blacklists/blacklist_calls.html#b313-b320-xml-bad-etree
   Location: ./unittests/tools/test_vcg_parser.py:48:18
47	        single_finding = open("unittests/scans/vcg/one_finding.xml")
48	        vcgscan = etree.parse(single_finding)
49	        finding = self.parser.parse_issue(vcgscan.findall("CodeIssue")[0], Test())
...

The defusedxml README describes these various vulnerabilities in more detail and provides some suggestions for safely using lxml. Additionally, lxml has some suggestions on their site.

tl;dr: I think we should either do the legwork to make lxml safe, or convert the few cases where lxml is already used to defusedxml calls:

Test results:
>> Issue: [B320:blacklist] Using lxml.etree.fromstring to parse untrusted XML data is known to be vulnerable to XML attacks. Replace lxml.etree.fromstring with its defusedxml equivalent function.
   Severity: Medium   Confidence: High
   CWE: CWE-20 (https://cwe.mitre.org/data/definitions/20.html)
   More Info: https://bandit.readthedocs.io/en/1.7.8/blacklists/blacklist_calls.html#b313-b320-xml-bad-etree
   Location: ./dojo/tools/api_sonarqube/importer.py:385:18
384	        parser = etree.HTMLParser()
385	        details = etree.fromstring(vuln_details, parser)
386	

--------------------------------------------------
>> Issue: [B320:blacklist] Using lxml.etree.parse to parse untrusted XML data is known to be vulnerable to XML attacks. Replace lxml.etree.parse with its defusedxml equivalent function.
   Severity: Medium   Confidence: High
   CWE: CWE-20 (https://cwe.mitre.org/data/definitions/20.html)
   More Info: https://bandit.readthedocs.io/en/1.7.8/blacklists/blacklist_calls.html#b313-b320-xml-bad-etree
   Location: ./dojo/tools/burp_enterprise/parser.py:23:15
22	        parser = etree.HTMLParser()
23	        tree = etree.parse(filename, parser)
24	        if tree:

--------------------------------------------------
>> Issue: [B320:blacklist] Using lxml.etree.parse to parse untrusted XML data is known to be vulnerable to XML attacks. Replace lxml.etree.parse with its defusedxml equivalent function.
   Severity: Medium   Confidence: High
   CWE: CWE-20 (https://cwe.mitre.org/data/definitions/20.html)
   More Info: https://bandit.readthedocs.io/en/1.7.8/blacklists/blacklist_calls.html#b313-b320-xml-bad-etree
   Location: ./dojo/tools/sonarqube/parser.py:37:19
36	            parser = etree.HTMLParser()
37	            tree = etree.parse(filename, parser)
38	            if self.mode not in [None, "detailed"]:

--------------------------------------------------
>> Issue: [B320:blacklist] Using lxml.etree.fromstring to parse untrusted XML data is known to be vulnerable to XML attacks. Replace lxml.etree.fromstring with its defusedxml equivalent function.
   Severity: Medium   Confidence: High
   CWE: CWE-20 (https://cwe.mitre.org/data/definitions/20.html)
   More Info: https://bandit.readthedocs.io/en/1.7.8/blacklists/blacklist_calls.html#b313-b320-xml-bad-etree
   Location: ./dojo/tools/sonarqube/parser.py:69:38
68	                parser = etree.HTMLParser()
69	                html_desc_as_e_tree = etree.fromstring(issue_detail["htmlDesc"], parser)
70	                issue_description = self.get_description(html_desc_as_e_tree)

--------------------------------------------------
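
As a rough illustration of the second option, the sketch below swaps an `etree.parse(file)` call for its defusedxml equivalent and shows that a document declaring an entity is rejected. The helper names `load_report` and `is_rejected` and the sample payloads are hypothetical and not taken from the DefectDojo code base. The sketch covers XML input only and does not address the `etree.HTMLParser()` call sites flagged above.

```python
import io

from defusedxml import ElementTree
from defusedxml.common import EntitiesForbidden


def load_report(file_obj):
    """Parse an uploaded XML report and return its root element.

    Hypothetical helper: defusedxml forbids entity declarations by default.
    """
    tree = ElementTree.parse(file_obj)
    return tree.getroot()


def is_rejected(xml_bytes):
    """Return True if defusedxml refuses the document because it declares entities."""
    try:
        load_report(io.BytesIO(xml_bytes))
    except EntitiesForbidden:
        return True
    return False


# Illustrative payloads, not real scanner output.
BENIGN = b"<report><item>ok</item></report>"
WITH_ENTITY = b'<!DOCTYPE report [<!ENTITY x "boom">]><report><item>&x;</item></report>'
```

Because the returned root is a regular ElementTree element, existing `find` / `findall` calls in a parser should carry over with few changes.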

github-actions · 2024-04-06T08:31:58Z

This pull request has conflicts, please resolve those before we can evaluate the pull request.

github-actions · 2024-04-20T17:37:46Z

Conflicts have been resolved. A maintainer will review the pull request shortly.

manuel-sommer · 2024-04-20T19:22:07Z

@cneill, I updated how lxml is used (with resolve_entities=False)
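
A minimal sketch of what such a parser configuration can look like, assuming lxml is installed. `make_hardened_parser` and `parse_report` are hypothetical names, not functions from this PR, and `no_network=True` is lxml's default that is spelled out here only for clarity.

```python
from lxml import etree


def make_hardened_parser():
    """Build an XML parser that leaves entity references unexpanded."""
    return etree.XMLParser(resolve_entities=False, no_network=True)


def parse_report(xml_bytes):
    """Parse report bytes with the hardened parser and return the root element."""
    return etree.fromstring(xml_bytes, make_hardened_parser())


# Illustrative payload: an internal entity that should not be expanded.
WITH_ENTITY = b'<!DOCTYPE report [<!ENTITY x "boom">]><report><item>&x;</item></report>'

root = parse_report(WITH_ENTITY)
# The entity's replacement text must not show up as the element's text.
item_text = root.find("item").text or ""
```

Entity resolution is only one of the attack classes described in the defusedxml README, so this setting alone may not satisfy tools such as bandit, which flag the `lxml.etree` calls themselves.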

manuel-sommer · 2024-04-20T19:34:09Z

@Maffooch, could you also check whether lxml is fine to use like this?

manuel-sommer · 2024-04-20T20:58:44Z

Reopening to retrigger the failed tests.

mtesauro

Approved

github-actions · 2024-04-24T17:02:49Z

This pull request has conflicts, please resolve those before we can evaluate the pull request.

github-actions · 2024-04-24T17:54:25Z

Conflicts have been resolved. A maintainer will review the pull request shortly.

github-actions · 2024-04-30T14:46:29Z

This pull request has conflicts, please resolve those before we can evaluate the pull request.

github-actions · 2024-05-30T18:10:35Z

Conflicts have been resolved. A maintainer will review the pull request shortly.

github-actions · 2024-06-25T17:57:29Z

This pull request has conflicts, please resolve those before we can evaluate the pull request.

github-actions · 2024-06-25T20:35:03Z

Conflicts have been resolved. A maintainer will review the pull request shortly.

manuel-sommer · 2024-07-02T09:02:02Z

Friendly reminder @Maffooch

github-actions · 2024-07-03T20:19:25Z

This pull request has conflicts, please resolve those before we can evaluate the pull request.

Maffooch · 2024-07-29T17:46:37Z

@manuel-sommer I apologize for the delay on this one, and I thank you very much for your patience.

After much deliberation, we believe the best decision is to remain with defusedXML. These are the high-level reasons that contributed to that decision:

- This library only wraps the stdlib and does not have any dependencies. There really is not a good reason to rev versions unless it is absolutely needed
  - This is largely in response to the initial purpose of moving away from defusedXML
- Python docs suggest using defusedXML in place of the stdlib
- A few security tools, such as bandit, will raise issues when using the stdlib
  - This may restrict some very compliance-strict organizations from using DefectDojo, and therefore slow adoption
- DefectDojo is viewed as a pillar in security, and deviating from the recommended security best practices could raise some eyebrows for a few folks

I agree with your intention of consolidating to a single XML parsing library, and we believe that defusedXML would be in the best interest of the project

mtesauro · 2024-07-29T19:32:59Z

@manuel-sommer Thanks for all your hard work on this. Once we did the research, it showed that defusedXML was the best path forward. We're going to update the "Writing a parser" docs to include instructions to use this library going forward. We probably would not have done the research if you had not opened this PR, so even though it didn't get merged, it DID help. 👍

manuel-sommer · 2024-07-29T20:33:12Z

@mtesauro and @Maffooch I added the docs here.

dryrunsecurity · 2024-07-29T20:33:14Z

DryRun Security Summary

The pull request focuses on improving the security and robustness of the parser implementation in the DefectDojo project by emphasizing the use of secure libraries, recommending best practices for handling data, and emphasizing the importance of comprehensive unit testing.

Summary:

The code changes in this pull request are focused on improving the security and robustness of the parser implementation in the DefectDojo project. The key changes include:

Emphasizing the use of secure and well-maintained libraries for parsing various file formats, such as using defusedXML instead of lxml for parsing XML data.
Recommending that parsers should not set attributes if the data is not available, rather than filling them with placeholder values, to maintain the integrity of the processed data.
Suggesting the addition of checks to avoid potential KeyError exceptions when accessing fields that may not always be present in the input data, to ensure that the parser can gracefully handle edge cases and unexpected data.
Recommending the use of pre-defined deduplication algorithms, such as the "legacy" algorithm, instead of implementing custom deduplication logic, to maintain consistency and reliability across the project.
Emphasizing the importance of having comprehensive unit tests for each parser, covering common cases as well as checking the attributes of the findings, to ensure the robustness and correctness of the parser implementation.
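
The recommendations about missing data and `KeyError` can be sketched roughly as follows. `build_finding` and the field names are hypothetical and only illustrate the pattern; they do not reflect a real report schema or the DefectDojo `Finding` model.

```python
def build_finding(item):
    """Map one raw report entry (a dict) to finding attributes.

    Hypothetical helper: "title" is assumed to be mandatory in this sketch,
    every other field is treated as optional.
    """
    finding = {"title": item["title"]}
    # Optional fields: use .get() to avoid KeyError, and set the attribute
    # only when the report actually provides a value (no placeholders).
    for key in ("description", "mitigation", "cwe"):
        value = item.get(key)
        if value is not None:
            finding[key] = value
    return finding


# Illustrative input with some optional fields missing.
sample = build_finding({"title": "Reflected XSS", "cwe": 79})
```

A unit test for a parser can then assert both the attributes that are present and the ones that were deliberately left unset.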

From an application security perspective, these changes are positive as they help to improve the overall security and reliability of the DefectDojo project by reducing the risk of vulnerabilities and improving the overall quality of the parser implementation.

Files Changed:

docs/content/en/contributing/how-to-write-a-parser.md: This file contains the documentation for writing parsers in the DefectDojo project. The changes focus on providing guidance and recommendations for improving the security and robustness of the parser implementation, as outlined in the summary.

Code Analysis

We ran 9 analyzers against 1 file and 0 analyzers had findings. 9 analyzers had no findings.

Riskiness

🟢 Risk threshold not exceeded.

github-actions · 2024-07-29T20:35:30Z

Conflicts have been resolved. A maintainer will review the pull request shortly.

sonarqubecloud · 2024-07-29T20:37:35Z

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

github-actions bot added unittests parser labels Mar 28, 2024

manuel-sommer force-pushed the rm_defusedxml branch from 0d05ed6 to 0ba0ec8 Compare March 28, 2024 08:36

manuel-sommer marked this pull request as draft March 28, 2024 17:14

github-actions bot added the conflicts-detected label Apr 6, 2024

manuel-sommer force-pushed the rm_defusedxml branch from 0ba0ec8 to 4b9b5c6 Compare April 20, 2024 17:37

github-actions bot removed the conflicts-detected label Apr 20, 2024

manuel-sommer marked this pull request as ready for review April 20, 2024 19:20

manuel-sommer closed this Apr 20, 2024

manuel-sommer reopened this Apr 20, 2024

mtesauro approved these changes Apr 21, 2024

github-actions bot added conflicts-detected and removed conflicts-detected labels Apr 24, 2024

manuel-sommer closed this Apr 24, 2024

manuel-sommer reopened this Apr 24, 2024

manuel-sommer closed this Apr 24, 2024

manuel-sommer reopened this Apr 24, 2024

github-actions bot added the conflicts-detected label Apr 30, 2024

github-actions bot removed the conflicts-detected label May 30, 2024

manuel-sommer closed this May 30, 2024

manuel-sommer reopened this May 30, 2024

github-actions bot added the conflicts-detected label Jun 25, 2024

github-actions bot removed the conflicts-detected label Jun 25, 2024

github-actions bot added the conflicts-detected label Jul 3, 2024

manuel-sommer mentioned this pull request Jul 27, 2024

Update Qualys WebApp parser to use DefusedXML #10637

Merged

manuel-sommer closed this Jul 29, 2024

manuel-sommer force-pushed the rm_defusedxml branch from 21f46f7 to 4b60cef Compare July 29, 2024 19:18

update to docs

eacc0c7

manuel-sommer reopened this Jul 29, 2024

fix

ab69475

github-actions bot added docs and removed unittests parser conflicts-detected labels Jul 29, 2024

Maffooch approved these changes Jul 29, 2024

Maffooch merged commit 19bab59 into DefectDojo:dev Jul 29, 2024
126 checks passed

manuel-sommer deleted the rm_defusedxml branch July 29, 2024 22:11
