booru-viewer

Author	SHA1	Message	Date
pax	a6a73fed61	security: fix #4 — chmod SQLite DB + WAL/SHM sidecars to 0o600 The sites table stores api_key + api_user in plaintext. Previous behavior left the DB file at the inherited umask (0o644 on most Linux systems) so any other local user could sqlite3 it open and exfiltrate every booru API key. Adds Database._restrict_perms(), called from the lazy conn init right after _migrate(). Tightens the main file plus the -wal and -shm sidecars to 0o600. The sidecars only exist after the first write, so the FileNotFoundError path is expected and silenced. Filesystem chmod failures are also swallowed for FUSE-mount compatibility. behavior change from v0.2.5: ~/.local/share/booru-viewer/booru.db is now 0o600 even if a previous version created it 0o644. Audit-Ref: SECURITY_AUDIT.md finding #4 Severity: Medium	2026-04-11 16:15:41 -05:00
pax	6801a0b45e	security: fix #4 — chmod data_dir to 0o700 on POSIX The data directory holds the SQLite database whose `sites` table stores api_key and api_user in plaintext. Previous behavior used the inherited umask (typically 0o755), which leaves the dir world-traversable on shared workstations and on networked home dirs whose home is 0o755. Tighten to 0o700 unconditionally on every data_dir() call so the fix is applied even when an older version (or external tooling) left the directory loose. Failures from filesystems that don't support chmod (some FUSE mounts) are swallowed — better to keep working than refuse to start. Windows: no-op, NTFS ACLs handle this separately. behavior change from v0.2.5: ~/.local/share/booru-viewer is now 0o700 even if it was previously 0o755. Audit-Ref: SECURITY_AUDIT.md finding #4 Severity: Medium	2026-04-11 16:14:30 -05:00
pax	19a22be59c	security: fix #3 — redact params in GelbooruClient debug log Same fix as danbooru.py and e621.py — Gelbooru's params dict carries api_key + user_id when configured. Route through redact_params() before the debug log emits them. Audit-Ref: SECURITY_AUDIT.md finding #3 Severity: Medium	2026-04-11 16:13:25 -05:00
pax	49fa2c5b7a	security: fix #3 — redact params in E621Client debug log Same fix as danbooru.py — the search() log.debug params line previously emitted login + api_key. Route through redact_params(). Audit-Ref: SECURITY_AUDIT.md finding #3 Severity: Medium	2026-04-11 16:13:06 -05:00
pax	9a3bb697ec	security: fix #3 — redact params in DanbooruClient debug log The log.debug(f" params: {params}") line in search() previously dumped login + api_key to the booru logger at DEBUG level. Route the params dict through redact_params() so the keys are replaced with *** before formatting. Audit-Ref: SECURITY_AUDIT.md finding #3 Severity: Medium	2026-04-11 16:12:47 -05:00
pax	d6909bf4d7	security: fix #3 — redact URL in BooruClient._log_request The httpx request event hook converts request.url to a str so log_connection can parse it — at that point the credential query params (login, api_key, etc.) are in scope and could be captured by any traceback, debug hook, or monitoring agent observing the hook call. Pipe through redact_url() first so the rendered string never carries the secrets, even transiently. Audit-Ref: SECURITY_AUDIT.md finding #3 Severity: Medium	2026-04-11 16:12:28 -05:00
pax	c735db0c68	security: fix #1 — wire SSRF hook into detect_site_type client detect_site_type constructs a fresh BooruClient._shared_client directly (bypassing the BooruClient.client property) for the /posts.json, /index.php, and /post.json probes. The hooks set here are the ones installed on that initial construction — if detection runs before any BooruClient instance's .client is accessed, the shared singleton must still have SSRF validation and connection logging. This additionally closes finding #16 for the detect client — site detection requests now appear in the connection log instead of being invisible. behavior change from v0.2.5: Test Connection from the site dialog now rejects private-IP targets. Adding a local/RFC1918 booru via the "auto-detect type" dialog will fail with "blocked request target ..." instead of probing it. Explicit api_type selection still goes through the BooruClient.client path, which is also now protected. Audit-Ref: SECURITY_AUDIT.md finding #1 Also-Closes: SECURITY_AUDIT.md finding #16 (detect half) Severity: High	2026-04-11 16:11:37 -05:00
pax	ef95509551	security: fix #1 — wire SSRF hook into E621Client custom client E621 maintains its own httpx.AsyncClient because their TOS requires a per-user User-Agent string that BooruClient's shared client can't carry. The client is rebuilt on User-Agent change, so the hook must be installed in the same construction path. Also installs BooruClient._log_request as a second hook (this additionally closes finding #16 for the e621 client — e621 requests previously bypassed the connection log entirely, and this wires them in consistently with the base client). Audit-Ref: SECURITY_AUDIT.md finding #1 Also-Closes: SECURITY_AUDIT.md finding #16 (e621 half) Severity: High	2026-04-11 16:11:12 -05:00
pax	ec79be9c83	security: fix #1 — wire SSRF hook into cache download client Adds validate_public_request to the cache module's shared httpx client event_hooks. Covers image/video/thumbnail downloads, which are the most likely exfil path — file_url comes straight from the booru JSON response and previously followed any 3xx that landed, so a hostile booru could point downloads at a private IP. Every redirect hop is now rejected if the target is non-public. The import is lazy inside _get_shared_client because core.api.base imports log_connection from this module; a top-level `from .api._safety import ...` would circular-import through api/__init__.py during cache.py load. By the time _get_shared_client is called the api package is fully loaded. Audit-Ref: SECURITY_AUDIT.md finding #1 Severity: High	2026-04-11 16:10:50 -05:00
pax	6eebb77ae5	security: fix #1 — wire SSRF hook into BooruClient shared client Adds validate_public_request to the BooruClient event_hooks list so every request (and every redirect hop) is checked against the block list from _safety.py. Danbooru, Gelbooru, and Moebooru subclasses all go through BooruClient.client and inherit the protection. Preserves the existing _log_request hook by listing both hooks in order: validate first (so blocked hops never reach the log), then log. Audit-Ref: SECURITY_AUDIT.md finding #1 Severity: High	2026-04-11 16:10:12 -05:00
pax	013fe43f95	security: fix #1 — add public-host validator helper Introduces core/api/_safety.py containing check_public_host and the validate_public_request async request-hook. The hook rejects any URL whose host is (or resolves to) loopback, RFC1918, link-local (including 169.254.169.254 cloud metadata), CGNAT, unique-local v6, or multicast. Called on every request hop so it covers both the initial URL and every redirect target that httpx would otherwise follow blindly. Also exports redact_url / redact_params for finding #3 — the secret-key set lives in the same module since both #1 and #3 work is wired through httpx client event_hooks. Helper is stdlib-only (ipaddress, socket, urllib.parse) plus httpx; no new deps. Not yet wired into any httpx client; per-file wiring commits follow. Audit-Ref: SECURITY_AUDIT.md finding #1 Severity: High	2026-04-11 16:09:53 -05:00
pax	5261fa176d	add search history setting New setting "Record recent searches" (on by default). When disabled, searches are not recorded and the Recent section is hidden from the history dropdown. Saved searches are unaffected. behavior change: opt-out setting, on by default (preserves existing behavior)	2026-04-10 16:28:43 -05:00
pax	94588e324c	add unbookmark-on-save setting New setting "Remove bookmark when saved to library" (off by default). When enabled, _maybe_unbookmark runs directly in each save callback after save_post_file succeeds -- handles DB removal, grid dot, preview state, popout sync, and bookmarks tab refresh. Wired into all 4 save paths: save_to_library, bulk_save, save_as, batch_download_to. behavior change: opt-in setting, off by default	2026-04-10 16:23:54 -05:00
pax	9cc294a16a	Revert "add unbookmark-on-save setting" This reverts commit 08f99a61011532202b22d05750416aa1e754f9c9.	2026-04-10 16:20:26 -05:00
pax	08f99a6101	add unbookmark-on-save setting New setting "Remove bookmark when saved to library" (off by default). When enabled, saving a post to the library automatically removes its bookmark. Handles both single saves (on_bookmark_done) and bulk saves (on_batch_done). UI toggle in Settings > General. behavior change: opt-in setting, off by default	2026-04-10 16:19:00 -05:00
pax	d66dc14454	db: fix orphan rows — cascade delete_site, wire up reconcile on startup delete_site() leaked rows in tag_types, search_history, and saved_searches; reconcile_library_meta() was implemented but never called. Add tests for both fixes plus tag cache pruning.	2026-04-10 14:10:57 -05:00
pax	264c421dff	cache: skip .part files in evict_oldest Prevents cache eviction from deleting a .part temp file that mpv's stream-record is actively writing to. Prerequisite for the stream-record plumbing in video_player.py.	2026-04-09 20:52:36 -05:00
pax	57a19f87ba	gelbooru: re-add background prefetch for batch API fast path only When _batch_api_works is True (Gelbooru proper with auth, persisted from a prior session's probe), search() fires prefetch_batch in the background. The batch tag API covers the entire page's tags in 1-2 requests during the time between grid render and user click — the cache is warm before the info panel opens, so categories appear instantly with no flash of flat tags. Gated on _batch_api_works is True (not None, not False): - Gelbooru proper: prefetches (batch API known good) - Rule34: skips (batch_api_works = False, persisted) - Safebooru.org: skips (no auth → fetcher skips batch capability) Rule34 / Safebooru.org / Moebooru stay on-demand: the ~200ms per-click HTML scrape is unavoidable for those sites because their only path is per-post page fetching, which can't be batched.	2026-04-09 20:01:34 -05:00
pax	f168bece00	category_fetcher: fix _do_ensure to try batch API when not yet probed _do_ensure only tried the batch API when _batch_api_works was True, but after removing the search-time prefetch (where the probe used to run), _batch_api_works stayed None forever. Gelbooru's only viable path IS the batch API (its post-view HTML has no tag links), so clicks on Gelbooru posts produced zero categories. Fix: _do_ensure now tries the batch API when _batch_api_works is not False (i.e., both True and None). When None, the call doubles as an inline probe: if the batch produced categories, save True; if nothing useful came back, save False and fall back to HTML. This is simpler than the old prefetch_batch probe because it runs on ONE post at a time — no batch/HTML mixing concerns, no "single path per invocation" rule. The probe result is persisted to DB so it only fires once per site ever. Dispatch matrix in _do_ensure: _batch_api_works True + auth → batch API (Gelbooru proper); _batch_api_works None + auth → batch as probe → True or False; _batch_api_works False → HTML scrape (Rule34); no auth → HTML scrape (Safebooru.org); transient error → stays None, retry next click. Verified all three sites from clean cache: Gelbooru 55/56+49/50 (batch), Rule34 40/40+38/38 (HTML), Safebooru.org 47/47+47/47 (HTML).	2026-04-09 19:53:20 -05:00
pax	35424ff89d	gelbooru+moebooru: drop background prefetch from search, fetch on demand Removes the asyncio.create_task(prefetch_batch) calls from search() and get_post() in both clients. Tags are now fetched ONLY when the user actually clicks a post (via ensure_categories in the info panel path) or saves with a category-token template. The background prefetch was the source of most of the complexity: probe timing, early-exit bugs from partial composes racing with on-click ensures, Rule34's slow probe blocking the prefetch window. All gone. New flow: search() → fast, returns posts with flat tags only; click → ensure_categories fires, ~200ms HTML scrape or batch API, categories arrive, signal re-renders; re-click → instant (cache compose, no HTTP); save → ensure in save_post_file, same path. The ~200ms per first-click is invisible during the image load. The cache compounds across posts and sessions. The prefetch_batch method stays in CategoryFetcher for potential future use but nothing calls it from the hot path anymore.	2026-04-09 19:48:04 -05:00
pax	7d11aeab06	category_fetcher: persist batch API probe result across sessions The probe that detects whether a site's batch tag API works (Gelbooru proper: yes, Rule34: no) now persists its result in the tag_types table using a sentinel key (__batch_api_probe__). On subsequent app launches, the fetcher reads the saved result at construction time and skips the probe entirely. Before: every session with Rule34 wasted ~0.6s on a probe request that always fails (Rule34 returns garbage for names=). During that time the background prefetch couldn't start HTML scraping, so the first few post clicks paid ~0.3s each. After: first ever session probes Rule34 once, stores False. Every subsequent session reads False from DB, skips the probe, and the background prefetch immediately starts HTML scraping. By the time the user clicks any post, the scrape is usually done. Gelbooru proper: probe succeeds on first session, stores True. Future sessions use the batch API without probing. No change in speed (already fast), just saves the probe roundtrip. Persisted per site_id so different Gelbooru-shaped sites get their own probe result. The clear_tag_cache method wipes probe results along with tag data (the sentinel key lives in the same table).	2026-04-09 19:46:20 -05:00
pax	1547cbe55a	fix: remove early-exit on non-empty tag_categories in ensure path Two places checked `if post.tag_categories: return` before doing a full cache-coverage check, causing posts with partial cache composes (e.g. 5/40 tags from the background prefetch) to get stuck at low coverage forever: ensure_categories: removed the post.tag_categories early exit. Now ALWAYS runs try_compose_from_cache first. Only the 100% coverage return (True) is trusted as "done." Partial composes return False and fall through to the fetch path. _ensure_post_categories_async: removed the post.tag_categories guard. Danbooru/e621 are filtered by the client.category_fetcher is None check instead (they categorize inline, no fetcher). For Gelbooru-style sites, always schedules ensure_categories regardless of current post state. Root cause: the partial-compose fix (try_compose_from_cache populates tag_categories even when cache coverage is <100%) conflicted with the early-exit guards that assumed non-empty tag_categories = fully categorized. Now the only "fully done" signal is try_compose_from_cache returning True (100% coverage).	2026-04-09 19:40:09 -05:00
pax	762d73dc4f	category_fetcher: fix partial-compose vs ensure_categories interaction try_compose_from_cache was returning True on ANY partial cache hit (even 1/38 tags). ensure_categories then saw non-empty tag_categories and returned immediately, leaving the post stuck at 1/38 coverage. The bug showed on Rule34: post 1 got fully scraped (40/40), its tags got cached, then post 2's compose found one matching tag and declared victory. Fix: try_compose_from_cache now returns True ONLY when 100% of unique tags have cached labels (no fetch needed). It STILL populates post.tag_categories with whatever IS cached (for immediate partial display), but returning False signals ensure_categories to continue to the fetch path. This is the correct semantic split: - populate → always (for display) - return True → only when complete (for dispatch) Verified: Rule34: 40/40 + 38/38 (was 40/40 + 1/38) Gelbooru: 55/56 + 49/50 (batch API, one rare tag) Safebooru.org: 47/47 + 47/47 (HTML scrape, full)	2026-04-09 19:36:58 -05:00
pax	f0fe52c886	fix: HTML parser two-pass rewrite + fire-and-forget prefetch Three fixes: 1. HTML parser completely rewritten with two-pass approach: - Pass 1: regex finds each tag-type element and its full inner content (up to closing </li|span|td|div>) - Pass 2: within the content, extracts the tag name from the tags=NAME URL parameter in the search link The old single-pass regex captured the ? wiki-link (first <a>) instead of the tag name (second <a>). The URL-param extraction works on Rule34 (40 tags), Safebooru.org (47 tags), and yande.re (3 tags). Gelbooru proper returns 0 (post page only has ? links with no tags= param) which is correct — Gelbooru uses the batch tag API instead. 2. prefetch_batch is now truly fire-and-forget: gelbooru.py and moebooru.py use asyncio.create_task instead of await for prefetch_batch. search() returns immediately. The probe + batch/HTML fetch runs in the background. Previously search() blocked on the probe, which made Rule34 searches take 5+ seconds (slow/broken Rule34 API response time). 3. The partial cache compose fix from the previous commit complements this: posts with 49/50 cached tags now show all available categories instead of nothing.	2026-04-09 19:31:43 -05:00
pax	165733c6e0	category_fetcher: compose from partial cache coverage try_compose_from_cache previously required 100% cache coverage — every tag in the post had to have a cached label or it returned False and populated nothing. One rare uncached tag out of 50 blocked the entire composition, leaving the post with zero categories even though 49/50 labels were available. Fix: compose whatever IS cached, return True when at least one tag got categorized. Tags not in the cache are simply absent from the categories dict (they stay in the flat tags string). The return value now means "the post has usable categories" rather than "the post has complete categories." This distinction matters because the dispatch logic uses the return value to decide whether to skip the fetch path — partial coverage is better than no coverage, and the missing tags get cached eventually when other posts that contain them get fetched. Verified against Gelbooru: post with 50 tags where 49 were cached now gets 49/50 categorized (Artist, Character, Copyright, General, Meta) instead of 0/50.	2026-04-09 19:23:57 -05:00
pax	8f8db62a5a	library_save: ensure categories before template render save_post_file is now async and gains an optional category_fetcher parameter. When the template uses any category token (%artist%, %character%, %copyright%, %general%, %meta%, %species%) AND the post's tag_categories is empty AND a fetcher is available, it awaits ensure_categories(post) before calling render_filename_template. This guarantees the filename is correct even when saving a post the user hasn't clicked (bypassing the info panel's on-display trigger). When the template uses only non-category tokens (%id%, %md5%, %score%, %rating%, %ext%) or is empty, the ensure check is skipped entirely — no HTTP overhead for the common case. Every existing caller already runs from _run_async closures, so the sync→async signature change is mechanical. The callers are updated in the next two commits to pass category_fetcher.	2026-04-09 19:18:13 -05:00
pax	f5954d1387	api: factory constructs CategoryFetcher for Gelbooru + Moebooru sites client_for_type gains optional db + site_id kwargs. When both are passed and api_type is gelbooru or moebooru, a CategoryFetcher is constructed and assigned to client.category_fetcher. The fetcher owns the per-tag cache, the batch tag API fast path, and the per-post HTML scrape fallback. Danbooru and e621 never get a fetcher — their inline JSON categorization is already optimal. Test Connection dialog and scripts don't pass db/site_id, so they get fetcher-less clients with the existing search behavior.	2026-04-09 19:15:57 -05:00
pax	834deecf57	moebooru: implement _post_view_url + prefetch wiring Override _post_view_url to return /post/show/{id} for the per-post HTML scrape path. No _tag_api_url override — Moebooru has no batch tag DAPI; the CategoryFetcher dispatch goes straight to per-post HTML for these sites. search() and get_post() now call prefetch_batch when a fetcher is attached, same fire-and-forget pattern as gelbooru.py.	2026-04-09 19:15:34 -05:00
pax	7f897df4b2	gelbooru: implement _post_view_url + _tag_api_url + prefetch wiring Overrides both URL methods from the base class: _post_view_url(post) -> /index.php?page=post&s=view&id={id} Universal HTML scrape path — works on Gelbooru proper, Rule34, Safebooru.org without auth. _tag_api_url() -> {base_url}/index.php Batch tag DAPI fast path. The CategoryFetcher's probe-and-cache determines at runtime whether the endpoint actually honors names=. Gelbooru proper: probe succeeds. Rule34: probe fails (garbage response), falls back to HTML. Safebooru.org: no auth, dispatch skips batch entirely. search() and get_post() now call await self.category_fetcher.prefetch_batch(posts) after building the post list, when a fetcher is attached. The prefetch is fire-and-forget — search returns immediately and the background tasks fill categories as the user reads. When no fetcher is attached (Test Connection dialog, scripts), this is a no-op and behavior is unchanged.	2026-04-09 19:15:02 -05:00
pax	5ba0441be7	e621: populate categories in get_post (latent bug fix)	2026-04-09 19:14:19 -05:00
pax	9001808951	danbooru: populate categories in get_post (latent bug fix)	2026-04-09 19:13:52 -05:00
pax	8f298e51fc	api: BooruClient virtual _post_view_url + _tag_api_url + category_fetcher attr Three additions to the base class, all default-inactive: _post_view_url(post) -> str | None Override to provide the post-view HTML URL for the per-post category scrape path. Default None (Danbooru/e621 skip it). _tag_api_url() -> str | None Override to provide the batch tag DAPI base URL for the fast path in CategoryFetcher. Default None. Only Gelbooru proper benefits — the fetcher's probe-and-cache determines at runtime whether the endpoint actually honors the names= parameter. self.category_fetcher = None Set externally by the factory (client_for_type) when db and site_id are available. Gelbooru-shape and Moebooru clients use it; Danbooru/e621 leave it None. No behavior change at this commit. Existing clients inherit the defaults and continue working identically.	2026-04-09 19:13:21 -05:00
pax	e00d88e1ec	api: CategoryFetcher module with HTML scrape + batch tag API + cache New module core/api/category_fetcher.py — the unified tag-category fetcher for boorus that don't return categories inline. Public surface: try_compose_from_cache(post) — instant, no HTTP. Builds post.tag_categories from cached (site_id, name) -> label entries. Returns True if every tag in the post is cached. fetch_via_tag_api(posts) — batch fast path. Collects uncached tags across posts, chunks into 500-name batches, GETs the tag DAPI. Only available when the client declares _tag_api_url AND has credentials (Gelbooru proper). Includes JSON/XML sniffing parser ported from the reverted code. fetch_post(post) — universal fallback. HTTP GETs the post-view HTML page, regex-extracts class="tag-type-X">name</a> markup. Works on every Gelbooru fork and every Moebooru deployment. Does NOT require auth. ensure_categories(post) — idempotent dispatch: cache compose -> batch API (if available) -> HTML scrape. Coalesces concurrent calls for the same post.id via an in-flight task dict. prefetch_batch(posts) — fire-and-forget background prefetch. ONE fetch path per invocation (no mixing batch + HTML). Probe-and-cache for the batch tag API: _batch_api_works = None -> not yet probed OR transient error (retry next call) _batch_api_works = True -> batch works (Gelbooru proper) _batch_api_works = False -> clean 200 + zero matching names (Rule34's broken names= filter) Transition to True/False is permanent per instance. Transient errors (HTTP error, timeout, parse exception) leave None so the next search retries the probe. HTML regex handles both standard tag-type-artist and combined- class forms like tag-link tag-type-artist (Konachan). Tag names normalized to underscore-separated lowercase. Canonical category order: Artist > Character > Copyright > Species > General > Meta > Lore (matches danbooru/e621 inline). Dead code at this commit — no integration yet.	2026-04-09 19:12:43 -05:00
pax	5395569213	db: re-add tag_types cache table with string labels + auto-prune Per-site tag-type cache for boorus that don't return categories inline. Uses string labels ("Artist", "Character", "Copyright", "General", "Meta") instead of the integer codes the reverted version used — the labels come directly from HTML class names, no mapping step needed. Schema: tag_types(site_id, name, label TEXT, fetched_at) PRIMARY KEY (site_id, name) Methods: get_tag_labels(site_id, names) — chunked 500-name SELECT set_tag_labels(site_id, mapping) — bulk INSERT OR REPLACE, auto-prunes oldest entries when the table exceeds 50k rows clear_tag_cache(site_id=None) — manual wipe, for future Settings UI "Clear tag cache" button The 50k row cap prevents unbounded growth over months of browsing multiple boorus. Normal usage (a few thousand unique tags per site) never reaches it. When exceeded, the oldest entries by fetched_at are pruned first — these are the tags the user hasn't encountered recently and would be re-fetched cheaply if needed. Migration: CREATE TABLE IF NOT EXISTS in _migrate(), non-breaking for existing databases.	2026-04-09 19:10:37 -05:00
pax	150970b56f	cache: delete_from_library cleans up library_meta + matches templated names Two related fixes that the old delete flow was missing: 1. delete_from_library now accepts an optional `db` parameter which it forwards to find_library_files. Without `db`, only digit-stem files match (the old behavior — preserved as a fallback). With `db`, templated filenames stored in library_meta also match, so post-refactor saves like 12345_hatsune_miku.jpg get unlinked too. Without this fix, "Unsave from Library" on a templated save was a silent no-op. 2. Always cleans up the library_meta row when called with `db`, not just when files were unlinked. Two cases this matters for: a. Files were on disk and unlinked → meta is now stale. b. Files were already gone but the meta lingered (orphan from a previous broken delete) → user asked to "unsave," meta should reflect that. This is the missing half of the cleanup that left some libraries with pathologically more meta rows than actual files.	2026-04-09 18:25:21 -05:00
pax	5976a81bb6	db: add reconcile_library_meta to clean up orphan meta rows The old delete_from_library deleted files from disk but never cleaned up the matching library_meta row. Result: in pathological cases the meta table can hold many more rows than there are files on disk. This was harmless when the only consumer was tag-search (the meta would just match nothing useful), but it becomes a real problem the moment is_post_in_library / get_saved_post_ids start driving UI state — the saved-dot indicator would light up for posts whose files have been gone for ages. reconcile_library_meta() walks saved_dir() shallowly (root + one level of subdirs), collects every present post_id (digit-stem files plus templated filenames looked up via library_meta.filename), and DELETEs every meta row whose post_id isn't in that set. Returns the count of removed rows. Defensive: if saved_dir() exists but has zero files (e.g. removable drive temporarily unmounted), the method refuses to reconcile and returns 0. The cost of a false positive — wiping every meta row for a perfectly intact library — is higher than the cost of leaving stale rows around for one more session. The cache.py fix in the next commit makes future delete_from_library calls clean up after themselves. This method is the one-time catch-up for libraries that were already polluted before that fix.	2026-04-09 18:25:21 -05:00
pax	6f59de0c64	config: find_library_files now matches templated filenames When given an optional db handle, find_library_files queries library_meta for templated filenames belonging to the post and matches them alongside the legacy digit-stem stem == str(post_id) heuristic. Without db it degrades to the legacy-only behavior, so existing callers don't break — but every caller in the gui layer has a Database instance and will be updated to pass it. This is the foundation for the bookmark/browse saved-dot indicator fix and the delete_from_library fix in the next three commits.	2026-04-09 18:25:21 -05:00
pax	28348fa9ab	db: add is_post_in_library / get_saved_post_ids helpers The pre-template world used find_library_files(post_id) — a filesystem walk matching files whose stem equals str(post_id) — for "is this post saved?" checks across the bookmark dot indicator, browse dot indicator, Unsave menu visibility, etc. With templated filenames (e.g. 12345_hatsune_miku.jpg) the stem no longer equals the post id and the dots silently stop lighting up. Two new helpers, both indexed: - is_post_in_library(post_id) -> bool single check, SELECT 1 - get_saved_post_ids() -> set[int] batch fetch for grid scans Both go through library_meta which is keyed by post_id, so they're format-agnostic — they don't care whether the on-disk filename is 12345.jpg, mon3tr_(arknights).jpg, or anything else, as long as the save flow wrote a meta row. Every save site does this since the unified save_post_file refactor landed.	2026-04-09 18:25:21 -05:00
pax	f0b1fc9052	config: render_filename_template now matches the API client key casing The danbooru and e621 API clients store tag_categories with Capitalized keys ("Artist", "Character", "Copyright", "General", "Meta", "Species") — that's the convention info_panel and preview_pane already iterate against. render_filename_template was looking up lowercase keys, so every category token rendered empty even on Danbooru posts where the data was right there. Templates like "%id%_%character%" silently collapsed back to "{id}.{ext}". Fix: look up the Capitalized form, with a fallback chain (exact -> .lower() -> .capitalize()) so future drift between API clients in either direction won't silently break templates again. Verified against a real Danbooru save in the user's library: post 11122211 with tag_categories containing Artist=["yun_ze"], Character=["mon3tr_(arknights)"], etc. now renders "%id%_%character%" -> "11122211_mon3tr_(arknights).jpg" instead of "11122211.jpg".	2026-04-09 18:25:21 -05:00
pax	9248dd77aa	library: add unified save_post_file for the upcoming refactor New module core/library_save.py with one public function and two private helpers. Dead code at this commit — Phase 2 commits route the eight save sites through it one at a time. save_post_file(src, post, dest_dir, db, in_flight=None, explicit_name=None) - Renders the basename from library_filename_template, or uses explicit_name when set (Save As path). - Resolves collisions: same-post-on-disk hits return the basename unchanged so re-saves are idempotent; different-post collisions get sequential _1, _2, _3 suffixes. in_flight is consulted alongside on-disk state for batch members claimed earlier in the same call. - Conditionally writes library_meta when the resolved destination is inside saved_dir(), regardless of which save path called us. - Returns the resolved Path so callers can build status messages. _same_post_on_disk uses get_library_post_id_by_filename, falling back to the legacy v0.2.3 digit-stem heuristic for rows whose filename column is empty. Mirrors the digit-stem checks already in gui/library.py. Boundary rule: imports core.cache, core.config, core.db only. No gui/ imports — that's how main_window.py and bookmarks.py will both call in without circular imports.	2026-04-09 18:25:21 -05:00
pax	6075f31917	library: scaffold filename templates + DB column Adds the foundation that the unified save flow refactor builds on. No behavior change at this commit — empty default template means every save site still produces {id}{ext} like v0.2.3. - core/db.py: library_meta.filename column with non-breaking migration for legacy databases. Index on filename. New get_library_post_id_by_filename() lookup. filename kwarg on save_library_meta (defaults to "" for legacy callers). library_filename_template added to _DEFAULTS. - core/config.py: render_filename_template() with %id% %md5% %ext% %rating% %score% %artist% %character% %copyright% %general% %meta% %species% tokens. Sanitizes filesystem-reserved chars, collapses whitespace, strips leading dots/.., caps the rendered stem at 200 characters, falls back to post id when sanitization yields empty. - gui/settings.py: Library filename template input field next to the Library directory row, with a help label listing tokens and noting that Gelbooru/Moebooru can only resolve the basic ones.	2026-04-09 18:25:21 -05:00
pax	250b144806	Decouple bookmark folders from library folders, add move-aware save + submenu pickers everywhere Bookmark folders and library folders used to share identity through _db.get_folders() — the same string was both a row in favorite_folders and a directory under saved_dir. They look like one concept but they're two stores, and the cross-bleed produced a duplicate-on-move bug and made "Save to Library" silently re-file the bookmark too. Now they're independent name spaces: - library_folders() in core.config reads filesystem subdirs of saved_dir; the source of truth for every Save-to-Library menu - find_library_files(post_id) walks the library shallowly and is the new "is this saved?" / delete primitive - bookmark folders stay DB-backed and are only used for bookmark organization (filter combo, Move to Folder) - delete_from_library no longer takes a folder hint — walks every library folder by post id and deletes every match (also cleans up duplicates left by the old save-to-folder copy bug) - _save_to_library is move-aware: if the post is already in another library folder, atomic Path.rename() into the destination instead of re-copying from cache (the duplicate bug fix) - bookmark "Move to Folder" no longer also calls _copy_to_library; Save to Library no longer also calls move_bookmark_to_folder - settings export/import unchanged; favorite_folders table preserved so no migration UI additions: - Library tab right-click: Move to Folder submenu (single + multi), uses Path.rename for atomic moves - Bookmarks tab: − Folder button next to + Folder for deleting the selected bookmark folder (DB-only, library filesystem untouched) - Browse tab right-click: "Bookmark" replaced with "Bookmark as" submenu when not yet bookmarked (Unfiled / folders / + New); flat "Remove Bookmark" when already bookmarked - Embedded preview Bookmark button: same submenu shape via new bookmark_to_folder signal + set_bookmark_folders_callback - Popout Bookmark button: same shape — works in both browse and bookmarks tab modes - Popout Save button: Save-to-Library submenu via new save_to_folder + unsave_requested signals (drops save_toggle_requested + the _save_toggle_from_popout indirection) - Popout in library mode: Save button stays visible as Unsave; the rest of the toolbar (Bookmark / BL Tag / BL Post) is hidden State plumbing: - _update_fullscreen_state mirrors the embedded preview's _is_bookmarked / _is_saved instead of re-querying DB+filesystem, eliminating the popout state drift during async bookmark adds - Library tab Save button reads "Unsave" the entire time; Save button width bumped 60→75 so the label doesn't clip on tight themes - Embedded preview tracks _is_bookmarked alongside _is_saved so the new Bookmark-as submenu can flip to a flat unbookmark when active Naming: - "Unsorted" renamed to "Unfiled" everywhere user-facing — library Unfiled and bookmarks Unfiled now share one label. Internal comparison in library.py:_scan_files updated to match the combo.	2026-04-07 19:50:39 -05:00
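
The move-aware save described in the commit above might look roughly like this. It is a sketch under stated assumptions: find_saved_copies and save_move_aware are hypothetical names, files are matched by the legacy digit stem ({id}{ext}), and the library is one level of folders under a root directory.

```python
import shutil
from pathlib import Path
from typing import List, Optional


def find_saved_copies(root: Path, post_id: int) -> List[Path]:
    """Shallow walk: the library root plus its immediate subfolders."""
    dirs = [root] + [p for p in root.iterdir() if p.is_dir()]
    hits = []
    for d in dirs:
        hits += [f for f in d.iterdir() if f.is_file() and f.stem == str(post_id)]
    return sorted(hits)


def save_move_aware(
    root: Path, post_id: int, cached: Path, folder: Optional[str] = None
) -> Path:
    """Save a post into root/folder, moving an existing copy if there is one."""
    dest_dir = root / folder if folder else root
    dest_dir.mkdir(parents=True, exist_ok=True)
    existing = find_saved_copies(root, post_id)
    for f in existing:
        if f.parent == dest_dir:
            return f  # already in the requested folder
    if existing:
        dest = dest_dir / existing[0].name
        # Rename instead of re-copying from cache, so no duplicate is left
        # behind. Atomic only when both paths are on the same filesystem.
        existing[0].rename(dest)
        return dest
    dest = dest_dir / cached.name
    shutil.copy2(cached, dest)  # first save: copy out of the cache
    return dest
```

Saving post 42 to folder "a" and then to folder "b" leaves exactly one file, under "b", and the cache copy is untouched.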
pax	eb58d76bc0	Route async work through one persistent loop, lock shared httpx + DB writes Mixing `threading.Thread + asyncio.run` workers with the long-lived asyncio loop in gui/app.py is a real loop-affinity bug: the first worker thread to call `asyncio.run` constructs a throwaway loop, which the shared httpx clients then attach to, and the next call from the persistent loop fails with "Event loop is closed" / "attached to a different loop". This commit eliminates the pattern across the GUI and adds the locking + cleanup that should have been there from the start. Persistent loop accessor (core/concurrency.py — new) - set_app_loop / get_app_loop / run_on_app_loop. BooruApp registers the one persistent loop at startup; everything that wants to schedule async work calls run_on_app_loop instead of spawning a thread that builds its own loop. Three functions, ~30 lines, single source of truth for "the loop". Lazy-init lock + cleanup on shared httpx clients (core/api/base.py, core/api/e621.py, core/cache.py) - Each shared singleton (BooruClient._shared_client, E621Client._e621_client, cache._shared_client) now uses fast-path / locked-slow-path lazy init. Concurrent first-callers from the same loop can no longer both build a client and leak one (verified: 10 racing callers => 1 httpx instance). - Each module exposes an aclose helper that BooruApp.closeEvent runs via run_coroutine_threadsafe(...).result(timeout=5) BEFORE stopping the loop. The connection pool, keepalive sockets, and TLS state finally release cleanly instead of being abandoned at process exit. - E621Client tracks UA-change leftovers in _e621_to_close so the old client doesn't leak when api_user changes — drained in aclose_shared. GUI workers routed through the persistent loop (gui/sites.py, gui/bookmarks.py) - SiteDialog._on_detect / _on_test: replaced `threading.Thread(target=lambda: asyncio.run(...))` with run_on_app_loop. Results marshaled back through Qt Signals connected with QueuedConnection. Added _closed flag + _inflight futures list: closeEvent cancels pending coroutines and shorts out the result emit if the user closes the dialog mid-detect (no use-after-free on destroyed QObject). - BookmarksView._load_thumb_async: same swap. The existing thumb_ready signal already used QueuedConnection so the marshaling side was already correct. DB write serialization (core/db.py) - Database._write_lock = threading.RLock() — RLock not Lock so a writing method can call another writing method on the same thread without self-deadlocking. - New _write() context manager composes the lock + sqlite3's connection context manager (the latter handles BEGIN / COMMIT / ROLLBACK atomically). Every write method converted: add_site, update_site, delete_site, add_bookmark, add_bookmarks_batch, remove_bookmark, update_bookmark_cache_path, add_folder, remove_folder, rename_folder, move_bookmark_to_folder, add/remove_blacklisted_tag, add/remove_blacklisted_post, save_library_meta, remove_library_meta, set_setting, add_search_history, clear_search_history, remove_search_history, add_saved_search, remove_saved_search. - _migrate keeps using the lock + raw _conn context manager because it runs from inside the conn property's lazy init (where _write() would re-enter conn). - Reads stay lock-free and rely on WAL for reader concurrency. Verified under contention: 5 threads × 50 add_bookmark calls => 250 rows, zero corruption, zero "database is locked" errors. Smoke-tested with seven scenarios: get_app_loop raises before set, run_on_app_loop round-trips, lazy init creates exactly one client, 10 concurrent first-callers => 1 httpx, aclose_shared cleans up, RLock allows nested re-acquire, multi-threaded write contention.	2026-04-07 17:24:23 -05:00
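
The persistent loop accessor from the commit above could be as small as the following. The three function names come from the commit message; the bodies are a guess at one reasonable shape and are not copied from core/concurrency.py.

```python
import asyncio
from concurrent.futures import Future
from typing import Any, Coroutine, Optional

# The single long-lived loop, registered once at application startup.
_app_loop: Optional[asyncio.AbstractEventLoop] = None


def set_app_loop(loop: asyncio.AbstractEventLoop) -> None:
    """Register the persistent loop (called once by the app at startup)."""
    global _app_loop
    _app_loop = loop


def get_app_loop() -> asyncio.AbstractEventLoop:
    """Return the persistent loop, or raise if none was registered yet."""
    if _app_loop is None:
        raise RuntimeError("app loop not set; call set_app_loop() first")
    return _app_loop


def run_on_app_loop(coro: Coroutine[Any, Any, Any]) -> Future:
    """Schedule coro on the persistent loop from any thread.

    Returns a concurrent.futures.Future, so a caller can block on
    .result(timeout=...) or attach a done-callback instead of spawning
    a worker thread that builds its own throwaway loop.
    """
    return asyncio.run_coroutine_threadsafe(coro, get_app_loop())
```

Because every coroutine runs on the same loop, a shared async client created there stays attached to one loop for the life of the process, which is the failure mode the commit sets out to remove.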
pax	54ccc40477	Defensive hardening across core/* and popout overlay fix Sweep of defensive hardening across the core layers plus a related popout overlay regression that surfaced during verification. Database integrity (core/db.py) - Wrap delete_site, add_search_history, remove_folder, rename_folder, and _migrate in `with self.conn:` so partial commits can't leave orphan rows on a crash mid-method. - add_bookmark re-SELECTs the existing id when INSERT OR IGNORE collides on (site_id, post_id). Was returning Bookmark(id=0) silently, which then no-op'd update_bookmark_cache_path the next time the post was bookmarked. - get_bookmarks LIKE clauses now ESCAPE '%', '_', '\\' so user search literals stop acting as SQL wildcards (cat_ear no longer matches catear). Path traversal (core/db.py + core/config.py) - Validate folder names at write time via _validate_folder_name — rejects '..', os.sep, leading '.' / '~'. Permits Unicode/spaces/ parens so existing folders keep working. - saved_folder_dir() resolves the candidate path and refuses anything that doesn't relative_to the saved-images base. Defense in depth against folder strings that bypass the write-time validator. - gui/bookmarks.py and gui/app.py wrap add_folder calls in try/except ValueError and surface a QMessageBox.warning instead of crashing. Download safety (core/cache.py) - New _do_download(): payloads >=50MB stream to a tempfile in the destination dir and atomically os.replace into place; smaller payloads keep the existing buffer-then-write fast path. Both enforce a 500MB hard cap against the advertised Content-Length AND the running total inside the chunk loop (servers can lie). - Per-URL asyncio.Lock coalesces concurrent downloads of the same URL so two callers don't race write_bytes on the same path. - Image.MAX_IMAGE_PIXELS = 256M with DecompressionBombError handling in both converters. - _convert_ugoira_to_gif checks frame count + cumulative uncompressed size against UGOIRA_MAX_FRAMES / UGOIRA_MAX_UNCOMPRESSED_BYTES from ZipInfo headers BEFORE decompressing — defends against zip bombs. - _convert_animated_to_gif writes a .convfailed sentinel sibling on failure to break the re-decode-on-every-paint loop for malformed animated PNGs/WebPs. - _is_valid_media returns True (don't delete) on OSError so a transient EBUSY/permissions hiccup no longer triggers a delete + re-download loop on every access. - _referer_for() uses proper hostname suffix matching, not substring `in` (imgblahgelbooru.attacker.com no longer maps to gelbooru.com). - PIL handles wrapped in `with` blocks for deterministic cleanup. API client retry + visibility (core/api/*) - base.py: _request retries on httpx.NetworkError + ConnectError in addition to TimeoutException. test_connection no longer echoes the HTTP response body in the error string (it was an SSRF body-leak gadget when used via detect_site_type's redirect-following client). - detect.py + danbooru.py + e621.py + gelbooru.py + moebooru.py: every previously-swallowed exception in search/autocomplete/probe paths now logs at WARNING with type, message, and (where relevant) the response body prefix. Debugging "the site isn't working" used to be a total blackout. main_gui.py - file_dialog_platform DB probe failure prints to stderr instead of vanishing. Popout overlay (gui/preview.py + gui/app.py) - preview.py:79,141 — setAttribute(WA_StyledBackground, True) on _slideshow_toolbar and _slideshow_controls. Plain QWidget parents silently ignore QSS `background:` declarations without this attribute, which is why the popout overlay strip was rendering fully transparent (buttons styled, bar behind them showing the letterbox color). - app.py: bake _BASE_POPOUT_OVERLAY_QSS as a fallback prepended before the user's custom.qss in the loader. Custom themes that don't define overlay rules now still get a translucent black bar with white text + hairline borders. Bundled themes win on tie because their identical-specificity rules come last in the prepended string.	2026-04-07 17:24:19 -05:00
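
The LIKE escaping mentioned under "Database integrity" in the commit above can be illustrated with the standard sqlite3 module. This is a sketch, not the get_bookmarks query itself: escape_like, search_tags, and the bookmarks(tags) table are hypothetical names chosen for the example.

```python
import sqlite3
from typing import List


def escape_like(term: str, esc: str = "\\") -> str:
    """Escape LIKE metacharacters so user input is matched literally.

    The escape character itself must be doubled first, otherwise the
    escapes added for % and _ would be escaped again.
    """
    return (
        term.replace(esc, esc + esc)
        .replace("%", esc + "%")
        .replace("_", esc + "_")
    )


def search_tags(conn: sqlite3.Connection, term: str) -> List[str]:
    """Substring search over a hypothetical bookmarks(tags) table."""
    pattern = f"%{escape_like(term)}%"
    rows = conn.execute(
        # ESCAPE '\' tells SQLite which character marks a literal % or _.
        "SELECT tags FROM bookmarks WHERE tags LIKE ? ESCAPE '\\'",
        (pattern,),
    )
    return [r[0] for r in rows]
```

In an unescaped pattern, `_` matches exactly one arbitrary character, so a search for cat_ear would also return a row such as cat-ear; with the escape in place only the literal underscore matches.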
pax	463f77d8bb	Make info panel tag colors QSS-targetable, delete dead theme.py + green palette constants	2026-04-07 13:15:31 -05:00
pax	72150fc98b	Add BOORU_VIEWER_NO_HYPR_RULES + BOORU_VIEWER_NO_POPOUT_ASPECT_LOCK env vars for ricers with their own windowrules	2026-04-07 12:27:22 -05:00
pax	74f948a3e8	Speed up page loads — pre-fetch bookmarks/cache as sets, off-load PIL conversion to a worker	2026-04-07 11:36:23 -05:00
pax	2fbf2f6472	0.2.0: mpv backend, popout viewer, preview toolbar, API retry, SearchState refactor Video: - Replace Qt Multimedia with mpv via python-mpv + OpenGL render API - Hardware-accelerated decoding, frame-accurate seeking, proper EOF detection - Translucent overlay controls in both preview and popout - LC_NUMERIC=C for mpv locale compatibility Popout viewer (renamed from slideshow): - Floating toolbar + controls overlay with auto-hide (2s) - Window auto-resizes to content aspect ratio on navigation - Hyprland: hyprctl resizewindowpixel + keep_aspect_ratio prop - Window geometry persisted to DB across sessions - Smart F11 exit sizing (60% monitor, centered) Preview toolbar: - Bookmark, Save, BL Tag, BL Post, Popout buttons above preview - Save opens folder picker menu, shows Save/Unsave state - Blacklist actions have confirmation dialogs - Per-tab button visibility (Library: Save + Popout only) - Cross-tab state management with grid selection clearing Search & pagination: - SearchState dataclass replaces 8 scattered attrs + defensive getattr - Media type filter dropdown (All/Animated/Video/GIF/Audio) - API retry with backoff on 429/503/timeout - Infinite scroll dedup fix (local seen set per backfill round) - Prev/Next buttons hide at boundaries, "(end)" status indicator Grid: - Rubber band drag selection - Saved/bookmarked dots update instantly across all tabs - Library/bookmarks emit signals on file deletion for cross-tab sync Settings & misc: - Default site option - Max thumbnail cache setting (500MB default) - Source URLs clickable in info panel - Long URLs truncated to prevent splitter blowout - Bulk save no longer auto-bookmarks	2026-04-06 13:43:46 -05:00
pax	1a5dbff1bb	Clean up dead code and unused imports	2026-04-05 21:30:47 -05:00
pax	8467c0696b	Add post date to info line	2026-04-05 21:15:22 -05:00

77 Commits