booru-viewer/booru_viewer
pax 5a511338c8 security: fix #14 — cap category_fetcher HTML body before regex walk
CategoryFetcher.fetch_post pulls a post-view HTML page and runs
_TAG_ELEMENT_RE.finditer over the full body. The regex itself is
linear (no catastrophic backtracking shape), but a hostile server
returning hundreds of MB of HTML still pegs CPU walking the buffer.
Caps the body the regex sees at 2MB — well above any legit
Gelbooru/Moebooru post page (~30-150KB).

Truncation rather than streaming because httpx already buffers the
body before _request returns; the cost we're cutting is the regex
walk, not the memory hit. A full streaming refactor of fetch_post
is a follow-up that the audit explicitly flagged as out of scope
("not catastrophic — defense in depth").

Audit-Ref: SECURITY_AUDIT.md finding #14
Severity: Informational
2026-04-11 16:26:00 -05:00
..