Description
App Version
v3.25.14
API Provider
Anthropic
Model Used
Claude 4 Sonnet
Roo Code Task Links (Optional)
Description
In our project, the apply_diff tool, orchestrated by src/core/tools/multiApplyDiffTool.ts, relies on fast-xml-parser to process its instructions. The utility function responsible for this lives in src/utils/xml.ts.
We've identified a critical issue: the tool fails when the XML content contains text with entity-encoded special characters such as &amp;. The root cause is that fast-xml-parser's default configuration decodes entities during parsing (e.g., converting &amp; to &). When the tool's diffing strategy later compares the decoded content against the original file content, the two no longer match, and the tool fails with a "No sufficiently similar match found" error.
This behavior is particularly problematic and non-obvious for tools that require byte-for-byte or character-for-character accuracy between the original source and the parsed representation.
🔁 Steps to Reproduce
1. Use a tool that wraps file content in an XML structure for processing, like our apply_diff tool.
2. Attempt to modify a file where the search block contains a special character written as an HTML entity (e.g., & written as &amp;).
3. The apply_diff tool calls parseXml (from src/utils/xml.ts), which uses fast-xml-parser with its default settings.
4. The parser returns an object where the string now contains & instead of &amp;.
5. The diffing engine compares "Team Identity & Project Positioning" (from the parser) with "Team Identity &amp; Project Positioning" (from the file) and fails to find a match.
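The steps above can be sketched without the library at all. The snippet below is only an illustration under stated assumptions: `decodeXmlEntities` is a hypothetical stand-in for the parser's default entity decoding (limited to the five predefined XML entities), and a plain `String.prototype.includes` check stands in for the tool's real diffing strategy, which is similarity-based and not shown in this issue.

```typescript
// Hypothetical stand-in for the parser's default entity decoding.
// It only handles the five predefined XML entities.
function decodeXmlEntities(text: string): string {
  return text
    .replace(/&lt;/g, "<")
    .replace(/&gt;/g, ">")
    .replace(/&quot;/g, '"')
    .replace(/&apos;/g, "'")
    .replace(/&amp;/g, "&"); // decoded last so "&amp;lt;" becomes "&lt;", not "<"
}

// The file on disk stores the entity literally (assumed example content).
const fileContent = "# Team Identity &amp; Project Positioning\n\nSome body text.\n";

// The search block exactly as it appears inside the XML payload.
const rawSearch = "Team Identity &amp; Project Positioning";

// What the diffing code receives when entities are decoded (the default).
const decodedSearch = decodeXmlEntities(rawSearch);

// What it receives when entities are left untouched (processEntities: false).
const preservedSearch = rawSearch;

// Simplified exact-match check standing in for the real diff strategy.
const matchesWhenDecoded = fileContent.includes(decodedSearch);
const matchesWhenPreserved = fileContent.includes(preservedSearch);

console.log({ decodedSearch, matchesWhenDecoded, matchesWhenPreserved });
```

The decoded search text is four characters shorter than the text in the file, so a literal comparison cannot succeed, while the preserved text matches as-is.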
Minimal Reproducible Example
```typescript
import { XMLParser } from "fast-xml-parser";

// Simulate the XML content passed to the tool
const xmlInput = `<args>
  <file>
    <path>./doc.md</path>
    <diff>
      <content>Team Identity &amp; Project Positioning</content>
    </diff>
  </file>
</args>`;

// Default parser configuration (problematic)
const defaultParser = new XMLParser();
const parsedWithDefaults = defaultParser.parse(xmlInput);
console.log("Parsed with defaults:", parsedWithDefaults.args.file.diff.content);
// Expected output: "Team Identity &amp; Project Positioning"
// Actual output:   "Team Identity & Project Positioning"

// Correct parser configuration (works as expected)
const correctParser = new XMLParser({
  processEntities: false, // The key fix
});
const parsedCorrectly = correctParser.parse(xmlInput);
console.log("Parsed correctly:", parsedCorrectly.args.file.diff.content);
// Expected output: "Team Identity &amp; Project Positioning"
// Actual output:   "Team Identity &amp; Project Positioning"
```
Expected Behavior
The XML parser should provide a configuration option to disable HTML entity processing and, for a tool designed for precise text manipulation, this option should arguably be disabled by default or at least clearly documented as a potential "gotcha". The parsed content should exactly match the original string content from the file.
Actual Behavior
fast-xml-parser decodes entities by default, causing silent data corruption for use cases that depend on literal string matching.
💥 Outcome Summary
Suggested Solution
The fix is to explicitly set processEntities: false in the fast-xml-parser options within our src/utils/xml.ts file. This prevents the library from decoding entities and preserves the original string.
We recommend either:
Changing the default behavior of fast-xml-parser to not process entities unless explicitly enabled.
Or, more prominently documenting this default behavior in the README as a critical consideration for any application using the library for text-based diffing or validation.
Environment:
Library: fast-xml-parser
Version: 5.x.x
Context: Node.js-based developer tool, specifically affecting file operations in [src/core/tools/multiApplyDiffTool.ts]
\src\utils\xml.ts:28-56
```typescript
/**
 * Parses an XML string for diffing purposes, ensuring no HTML entities are decoded.
 * This is a specialized version of parseXml to be used exclusively by diffing tools
 * to prevent mismatches caused by entity processing.
 * @param xmlString The XML string to parse
 * @param stopNodes Optional tag paths whose content is returned as raw text instead of being parsed
 * @returns Parsed JavaScript object representation of the XML
 * @throws Error if the XML is invalid or parsing fails
 */
export function parseXmlForDiff(xmlString: string, stopNodes?: string[]): unknown {
	const _stopNodes = stopNodes ?? []
	try {
		const parser = new XMLParser({
			ignoreAttributes: false,
			attributeNamePrefix: "@_",
			parseAttributeValue: false,
			parseTagValue: false,
			trimValues: true,
			processEntities: false, // Do not process HTML entities, keep them as is
			stopNodes: _stopNodes,
		})

		return parser.parse(xmlString)
	} catch (error) {
		// Enhance error message for better debugging
		const errorMessage = error instanceof Error ? error.message : "Unknown error"
		throw new Error(`Failed to parse XML: ${errorMessage}`)
	}
}
```
\src\core\tools\multiApplyDiffTool.ts:15-15
```typescript
import { parseXmlForDiff } from "../../utils/xml"
```
\src\core\tools\multiApplyDiffTool.ts:111-111
```typescript
const parsed = parseXmlForDiff(argsXmlTag, ["file.diff.content"]) as ParsedXmlResult
```
📄 Relevant Logs or Errors (Optional)