category_fetcher: fix _do_ensure to try batch API when not yet probed

_do_ensure only tried the batch API when _batch_api_works was True,
but after removing the search-time prefetch (where the probe used
to run), _batch_api_works stayed None forever. Gelbooru's only
viable path IS the batch API (its post-view HTML has no tag links),
so clicks on Gelbooru posts produced zero categories.

Fix: _do_ensure now tries the batch API when _batch_api_works is
not False (i.e., both True and None). When None, the call doubles
as an inline probe: if the batch produced categories, save True;
if nothing useful came back, save False and fall through to HTML.

This is simpler than the old prefetch_batch probe because it runs
on ONE post at a time — no batch/HTML mixing concerns, no "single
path per invocation" rule. The probe result is persisted to DB so
it only fires once per site ever.
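
As a rough illustration of the "persisted once per site" idea, here is a
minimal sketch using sqlite3. The table name site_probe, its columns, and
the helpers open_probe_db / save_probe_result / load_probe_result are all
hypothetical; the real _save_probe_result and schema are not shown in
this commit and may differ.

```python
import sqlite3
from typing import Optional


def open_probe_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open a DB with a hypothetical one-row-per-site probe table."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS site_probe ("
        "site_id INTEGER PRIMARY KEY, batch_api_works INTEGER)"
    )
    return db


def save_probe_result(db: sqlite3.Connection, site_id: int, works: bool) -> None:
    """Store the probe outcome as 1 (works) or 0 (broken)."""
    db.execute(
        "INSERT OR REPLACE INTO site_probe (site_id, batch_api_works) VALUES (?, ?)",
        (site_id, int(works)),
    )
    db.commit()


def load_probe_result(db: sqlite3.Connection, site_id: int) -> Optional[bool]:
    """Return True/False if the site was probed, None if it never was."""
    row = db.execute(
        "SELECT batch_api_works FROM site_probe WHERE site_id = ?", (site_id,)
    ).fetchone()
    if row is None or row[0] is None:
        return None
    return bool(row[0])
```

A missing row maps to None, so a site that was never probed (or whose
probe hit a transient error) is simply probed again on the next click.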

Dispatch matrix in _do_ensure:
  _batch_api_works True  + auth → batch API (Gelbooru proper)
  _batch_api_works None  + auth → batch as probe → True or False
  _batch_api_works False        → HTML scrape (Rule34)
  no auth                       → HTML scrape (Safebooru.org)
  transient error               → stays None, retry next click
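
The matrix can also be read as two small pure functions. This is only a
sketch for checking the table by hand: choose_path and state_after_batch
are hypothetical names that do not exist in the codebase, and has_auth
stands in for whatever _batch_api_available() actually checks.

```python
from typing import Optional


def choose_path(batch_api_works: Optional[bool], has_auth: bool) -> str:
    """Return the path the matrix selects: 'batch', 'probe', or 'html'."""
    if not has_auth or batch_api_works is False:
        return "html"
    # True means a known-good batch API; None means not yet probed.
    return "batch" if batch_api_works is True else "probe"


def state_after_batch(
    batch_api_works: Optional[bool], got_categories: bool, errored: bool
) -> Optional[bool]:
    """Tri-state flag after one batch attempt.

    Only an unprobed flag (None) is ever updated, and a transient
    error leaves the flag untouched so the next call retries.
    """
    if errored or batch_api_works is not None:
        return batch_api_works
    return got_categories
```

For example, choose_path(None, True) gives 'probe', and
state_after_batch(None, False, True) stays None after a transient error.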

Verified all three sites from clean cache: Gelbooru 55/56+49/50
(batch), Rule34 40/40+38/38 (HTML), Safebooru.org 47/47+47/47
(HTML).
pax 2026-04-09 19:53:20 -05:00
parent 35424ff89d
commit f168bece00

@@ -332,13 +332,34 @@ class CategoryFetcher:
             self._inflight.pop(post.id, None)
 
     async def _do_ensure(self, post: "Post") -> None:
-        """Inner dispatch for ensure_categories."""
-        # Batch API path (for single-post ensure, e.g. click or save)
-        if self._batch_api_works is True and self._batch_api_available():
-            await self.fetch_via_tag_api([post])
-            if post.tag_categories:
-                return
-        # HTML fallback
+        """Inner dispatch for ensure_categories.
+
+        Tries the batch API when it's known to work (True) OR not yet
+        probed (None). The result doubles as an inline probe: if the
+        batch produced categories, it works (save True); if it
+        returned nothing useful, it's broken (save False). Falls
+        through to HTML scrape as the universal fallback.
+        """
+        if self._batch_api_works is not False and self._batch_api_available():
+            try:
+                await self.fetch_via_tag_api([post])
+            except Exception as e:
+                log.debug("Batch API ensure failed (transient): %s", e)
+                # Leave _batch_api_works at None → retry next call
+            else:
+                if post.tag_categories:
+                    if self._batch_api_works is None:
+                        self._batch_api_works = True
+                        self._save_probe_result(True)
+                    return
+                # Batch returned nothing → broken API (Rule34) or
+                # the specific post has only unknown tags (very rare).
+                if self._batch_api_works is None:
+                    self._batch_api_works = False
+                    self._save_probe_result(False)
+        # HTML scrape fallback (works on Rule34/Safebooru.org/Moebooru,
+        # returns empty on Gelbooru proper which is fine because the
+        # batch path above covers Gelbooru)
         await self.fetch_post(post)
 
     # ----- dispatch: prefetch (batch, fire-and-forget) -----