Skip to content

Conversation

@nanorepublica
Copy link
Owner

No description provided.

Root cause identified: When using nested include() with mixed path() and
re_path(), regex anchors (^ and $) end up embedded in the middle of
accumulated URL patterns. normalize_path() only strips anchors from
start/end, so patterns like 'nutritionist/^client/mfp/$' don't match
href '/nutritionist/client/mfp/'.

Fix: Change normalize_path() to use str.replace() instead of checking
startswith()/endswith().

Also adds specs for:
- Static JavaScript file scanning (--scan-static flag)
- Detection aggressiveness levels (--url-detection flag)
When using nested include() with mixed path() and re_path(), regex
anchors (^ and $) end up embedded in the middle of accumulated URL
patterns. For example:

  path('nutritionist/', include(...)) + re_path(r'^client/mfp/$', ...)

produces the pattern: 'nutritionist/^client/mfp/$'

The previous normalize_path() only stripped ^ from start and $ from end,
so patterns like this would never match their corresponding hrefs.

Fix: Change normalize_path() to use str.replace() to strip ALL
occurrences of ^ and $ from anywhere in the pattern.

This fixes false positives where URLs used in inline JavaScript were
incorrectly reported as unreferenced because the pattern matching failed.

Tests added:
- test_normalize_embedded_caret_anchor
- test_normalize_embedded_dollar_anchor
- test_normalize_multiple_embedded_anchors
- test_normalize_real_world_nested_include_pattern
- TestEmbeddedAnchorMatching class with integration tests
Adds support for scanning JavaScript files in static directories for
URL references. This helps detect URLs used in external .js files that
would otherwise be reported as unreferenced.

New CLI option:
  --scan-static    Scan JavaScript files in static directories

Features:
- Discovers static directories from STATICFILES_DIRS and app static folders
- Scans .js and .mjs files for URL references
- Skips minified files (.min.js) and vendor directories
- Filters by BASE_DIR to exclude third-party code
- URLs found in static files are included in href matching

New TemplateAnalyzer methods:
- find_all_static_files(): Discover and analyze JS files
- analyze_static_file(): Analyze a single static file
- normalize_static_path(): Get relative path for static files

Tests added for:
- URL extraction from JavaScript content
- jQuery AJAX patterns (the original use case)
- Fetch API calls
- Comment exclusion in JS files
- Static path normalization
Adds support for detecting URLs in JavaScript template literals when
using the 'extended' detection mode.

New CLI option:
  --url-detection [basic|extended]
    - basic - default - Only detect static string URLs in quotes
    - extended - Also detect URLs in template literals

Features:
- Extracts static URL prefixes from template literals before interpolation
- Works in both template files and static JavaScript files
- Combines with existing basic detection

Tests added for template literal detection in various scenarios.
- Fix trailing whitespace in CHANGELOG.md
- Use union syntax in isinstance call (UP038)
- Fix line length issues in test docstrings
- Apply ruff formatting
@nanorepublica nanorepublica merged commit 8d20104 into main Dec 4, 2025
19 checks passed
@nanorepublica nanorepublica deleted the claude/fix-unreferenced-url-012EZbq8XwJPj8Ao4wWW6VAf branch December 4, 2025 19:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants