fix(v3): reject U+0085 (NEXT LINE) in URL validation#5704
Conversation
ValidateAndSanitizeURL lists every other non-ASCII White_Space character in its dangerous-unicode class but omits U+0085 (NEXT LINE). Because U+0085 is a C1 control it is also missed by the r < 32 loop, so a URL containing it passes the validator. Add it to the class and a test case.
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (3)
Walkthrough
URL Validator: U+0085 fix
Estimated code review effort🎯 1 (Trivial) | ⏱️ ~3 minutes Possibly related PRs
Suggested labels
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✨ Finishing Touches🧪 Generate unit tests (beta)
Warning There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure. 🔧 golangci-lint (2.12.2)level=error msg="[linters_context] typechecking error: pattern ./...: directory prefix . does not contain modules listed in go.work or their selected dependencies" Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Description
ValidateAndSanitizeURL(v3/pkg/application/urlvalidator.go) rejects control characters and a set of dangerous/invisible Unicode characters in URLs. TheunicodeDangerousclass lists every non-ASCII whitespace character (U+00A0, U+1680, U+2000-U+200A, U+2028, U+2029, U+202F, U+205F, U+3000) except U+0085 (NEXT LINE).U+0085 has
White_Space=Yesin the Unicode character database (PropList.txt), the same property as the other spaces already in the class. It is also a C1 control character, so it is not caught by ther < 32loop earlier in the function (that loop only covers C0 controls). The result is that a URL containing NEL passes the validator untouched, even though it is the same kind of invisible whitespace as the characters the class already rejects.This adds U+0085 to the class so the whitespace coverage is complete, and adds a test case alongside the existing zero-width / RTL-override cases.
Changes
v3/pkg/application/urlvalidator.go: addU+0085tounicodeDangerous, with a short comment noting it is a C1 control that ther < 32loop does not reach.v3/pkg/application/urlvalidator_test.go: add anext line (NEL)case that expects thedangerous unicodeerror.v3/UNRELEASED_CHANGELOG.md: add a Fixed entry.Testing
go test ./pkg/application/ -run TestValidateURLpasses. Reverting only the regex change (keeping the new test) makes thenext line (NEL)case fail withexpected error containing 'dangerous unicode', got nil, which confirms the test exercises this change and that nothing else is affected.Summary by CodeRabbit