Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace REXML with Nokogiri for consistency #110

Merged
merged 7 commits into from
Jul 31, 2024

Conversation

Sgtpluck
Copy link
Member

@Sgtpluck Sgtpluck commented Jul 12, 2024

As I've been trying to clean up parts of this code, I noticed a bunch of repetition around XML processing because the XMLSecurity class was using both REXML and Nokogiri for processing. I decided to take a stab at choosing one to make it clearer and more consistent, so that we can focus on understanding the SAML processing rather than how the XML parser is working.

The tests are all passing, which is a great sign, but I'd love some feedback about whether this seems like a positive change, or if there are more tests I can/should write to create more confidence in the change.

def digest_method_algorithm(ref, hash, digest_method_fix_enabled)
if digest_method_fix_enabled
algorithm(REXML::XPath.first(ref, '//ds:DigestMethod', hash))
def digest_method_algorithm(ref, digest_method_fix_enabled)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this method is not long for this world, we are attempting to reach out to the one partner who will be affected by fixing this bug totally

Copy link
Member

@vrajmohan vrajmohan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changes seem to all make sense. I have a couple of minor comments that don't have to be addressed. I don't have any experience with REXML or Nokogiri.

lib/saml_idp/xml_security.rb Outdated Show resolved Hide resolved
lib/saml_idp/xml_security.rb Outdated Show resolved Hide resolved
@Sgtpluck
Copy link
Member Author

just waiting on a review from @lmgeorge, as she has XML expertise!

@@ -97,10 +95,10 @@ def signature_algorithm(options)
if options[:get_params] && options[:get_params][:SigAlg]
algorithm(options[:get_params][:SigAlg])
else
ref_elem = REXML::XPath.first(self, '//ds:Reference', { 'ds' => DSIG })
ref_elem = document.at_xpath('//ds:Reference | //Reference', DS_NS)
Copy link

@lmgeorge lmgeorge Jul 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I recommend this XPath query as it is less brittle:

ref_elem = document.at_xpath(
    "//*[namespace-uri()='#{DSIG}'][local-name()='Reference'] | //Reference"
)

This will always find the right node for the following cases:

  1. When the namespace is defined with an unknown namespace prefix (e.g., mydsig instead of ds)
  2. When the namespace is declared as the default namespace on the node (i.e., <Reference xmlns="http://www.w3.org/2000/09/xmldsig#">)
  3. When the namespace is declared as the default namespace on an ancestor node
  4. When no namespace, default or otherwise, is defined for the node or any of its ancestors (it lives in the document's global namespace).
    • Note: When searching from the document root, it would be safer to use the full node path (e.g., //Signature/SignedInfo/Reference) since some of the node names are fairly generic.

Copy link

@lmgeorge lmgeorge Jul 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probable bug: the OR clause | //Reference only finds nodes without any namespace declaration, but won't find nodes where the namespace is declared as the default namespace (see: spec/support/responses/no_signature_ns.xml), which seems to be an expected behavior.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hm, i had initially used wildcards (:*//Reference) rather than the OR clause, but i decided since the original code ONLY found either no namespace or ds, i didn't want to change it (and that felt way too permissible), since i didn't know what impact that would have to existing integrations. so '//ds:Reference | //Reference' finds either the ds namespace if it exists, or no namespace, but that's it. i think we should definitely think about adding the ability to have different namespaces, but that would be a feature, not a refactor.

with that extra context, does it seem like this will work as expected? thanks for helping me think through this!

Copy link

@lmgeorge lmgeorge Jul 31, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just retested this. I think the issue is that when I first looked at it, I was testing validations with REXML which is stricter than Nokogiri by default. All of the use cases I mentioned should work with '//ds:Reference | //Reference', even if the namespace prefix isn't ds. However, use case 3 and 4 may only be working because no namespace validation is occuring.

To ensure your namespace is actually what it should be for those cases, you would have to use the XPath query I provided.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it makes sense to address this in a different PR as it is working, but may be more permissive that we'd like.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lmgeorge i've been trying to figure out how to generate signed saml assertions that would allow me to test things like an unknown namespace, but i haven't been able to figure out how. do you have any tools you use for this kind of thing? a lot of the automated testing is a little hand-wavey since tampering with it will lead to signature validation errors.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can make a ticket to define how we want namespace handling to work (ie, do we want to allow namespaces beyond ds) and refine from there!

lib/saml_idp/xml_security.rb Show resolved Hide resolved
return nil unless ref_elem

algorithm(REXML::XPath.first(ref_elem, '//ds:DigestMethod', { 'ds' => DSIG }))
algorithm(ref_elem.at_xpath('//ds:DigestMethod | //DigestMethod', DS_NS))

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same probable bug as mentioned earlier

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(I'll only mark this one as this pattern is seen through out the code)

lib/saml_idp/xml_security.rb Outdated Show resolved Hide resolved
spec/lib/saml_idp/logout_response_builder_spec.rb Outdated Show resolved Hide resolved
@@ -97,10 +95,10 @@ def signature_algorithm(options)
if options[:get_params] && options[:get_params][:SigAlg]
algorithm(options[:get_params][:SigAlg])
else
ref_elem = REXML::XPath.first(self, '//ds:Reference', { 'ds' => DSIG })
ref_elem = document.at_xpath('//ds:Reference | //Reference', DS_NS)
Copy link

@lmgeorge lmgeorge Jul 31, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just retested this. I think the issue is that when I first looked at it, I was testing validations with REXML which is stricter than Nokogiri by default. All of the use cases I mentioned should work with '//ds:Reference | //Reference', even if the namespace prefix isn't ds. However, use case 3 and 4 may only be working because no namespace validation is occuring.

To ensure your namespace is actually what it should be for those cases, you would have to use the XPath query I provided.

lib/saml_idp/xml_security.rb Show resolved Hide resolved
@Sgtpluck Sgtpluck merged commit b94ee37 into main Jul 31, 2024
1 check passed
@Sgtpluck Sgtpluck deleted the dmm/remove-rexml-from-xml-security branch July 31, 2024 19:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants