pax f0fe52c886 fix: HTML parser two-pass rewrite + fire-and-forget prefetch
Three fixes:

1. HTML parser completely rewritten with two-pass approach:
   - Pass 1: regex finds each tag-type element and its full inner
     content (up to closing </li|span|td|div>)
   - Pass 2: within the content, extracts the tag name from the
     tags=NAME URL parameter in the search link
   The old single-pass regex captured the ? wiki-link (first <a>)
   instead of the tag name (second <a>). The URL-param extraction
   works on Rule34 (40 tags), Safebooru.org (47 tags), and
   yande.re (3 tags). Gelbooru proper returns 0 (post page only
   has ? links with no tags= param) which is correct — Gelbooru
   uses the batch tag API instead.

2. prefetch_batch is now truly fire-and-forget:
   gelbooru.py and moebooru.py use asyncio.create_task instead of
   await for prefetch_batch. search() returns immediately. The
   probe + batch/HTML fetch runs in the background. Previously
   search() blocked on the probe, which made Rule34 searches take
   5+ seconds (slow/broken Rule34 API response time).

3. Partial cache compose already fixed in the previous commit
   complements this: posts with 49/50 cached tags now show all
   available categories instead of nothing.
2026-04-09 19:31:43 -05:00
..