category_fetcher: reject XML responses with DOCTYPE/ENTITY declarations

User-configurable sites could return XXE or billion-laughs payloads
in tag category API responses. Reject any XML body containing
<!DOCTYPE or <!ENTITY before passing it to ET.fromstring.
pax 2026-04-12 14:55:30 -05:00
parent 56c5eac870
commit ad6f876f40

@@ -593,6 +593,9 @@ def _parse_tag_response(resp) -> list[tuple[str, int]]:
         return []
     out: list[tuple[str, int]] = []
     if body.startswith("<"):
+        if "<!DOCTYPE" in body or "<!ENTITY" in body:
+            log.warning("XML response contains DOCTYPE/ENTITY, skipping")
+            return []
         try:
             root = ET.fromstring(body)
         except ET.ParseError as e: