20 Commits

Author SHA1 Message Date
pax
b65f8da837 security: fix #10 — validate media magic in first 16 bytes of stream
The previous flow streamed the full body to disk and called
_is_valid_media after completion. A hostile server that omits
Content-Type (so the early text/html guard doesn't fire) could
burn up to MAX_DOWNLOAD_BYTES (500MB) of bandwidth and cache-dir
write/delete churn before the post-download check rejected.

Refactors _do_download to accumulate chunks into a small header
buffer until at least 16 bytes have arrived, then runs
_looks_like_media against the buffer before committing to writing
the full payload. The 16-byte minimum handles servers that send
tiny chunks (chunked encoding with 1-byte chunks, slow trickle,
TCP MSS fragmentation) without false-failing on the first chunk.

Extracts _looks_like_media(bytes) as a sibling to _is_valid_media
(path) sharing the same magic-byte recognition. _looks_like_media
fails closed on empty input — when called from the streaming
validator, an empty header means the server returned nothing
useful. _is_valid_media keeps its OSError-fallback open behavior
for the on-disk path so transient EBUSY doesn't trigger a delete
+ re-download loop.

Audit-Ref: SECURITY_AUDIT.md finding #10
Severity: Low
2026-04-11 16:24:59 -05:00
pax
8f9e4f7e65 security: fix #8 — drop duplicate MAX_IMAGE_PIXELS set from cache.py
The cap is now installed by core/__init__.py (previous commit), so
the line in cache.py is redundant. Removing it leaves a single
authoritative location for the security-critical PIL setting.

Audit-Ref: SECURITY_AUDIT.md finding #8
Severity: Low
2026-04-11 16:21:37 -05:00
pax
5d348fa8be security: fix #5 — LRU cap on _url_locks to prevent memory leak
Replaces the unbounded defaultdict(asyncio.Lock) with an OrderedDict
guarded by _get_url_lock() and _evict_url_locks(). The cap is 4096
entries; LRU semantics keep the hot working set alive and oldest-
unlocked-first eviction trims back toward the cap on each new
insertion.

Eviction skips locks that are currently held — popping a lock that
a coroutine is mid-`async with` on would break its __aexit__. The
inner loop's evicted-flag handles the edge case where every
remaining entry is either the freshly inserted hash or held; in
that state the cap is briefly exceeded and the next insertion
retries, instead of looping forever.

Audit-Ref: SECURITY_AUDIT.md finding #5
Severity: Medium
2026-04-11 16:16:52 -05:00
pax
ec79be9c83 security: fix #1 — wire SSRF hook into cache download client
Adds validate_public_request to the cache module's shared httpx
client event_hooks. Covers image/video/thumbnail downloads, which
are the most likely exfil path — file_url comes straight from the
booru JSON response and previously followed any 3xx that landed,
so a hostile booru could point downloads at a private IP. Every
redirect hop is now rejected if the target is non-public.

The import is lazy inside _get_shared_client because
core.api.base imports log_connection from this module; a top-level
`from .api._safety import ...` would circular-import through
api/__init__.py during cache.py load. By the time
_get_shared_client is called the api package is fully loaded.

Audit-Ref: SECURITY_AUDIT.md finding #1
Severity: High
2026-04-11 16:10:50 -05:00
pax
264c421dff cache: skip .part files in evict_oldest
Prevents cache eviction from deleting a .part temp file that mpv's
stream-record is actively writing to. Prerequisite for the stream-record
plumbing in video_player.py.
2026-04-09 20:52:36 -05:00
pax
150970b56f cache: delete_from_library cleans up library_meta + matches templated names
Two related fixes that the old delete flow was missing:

1. delete_from_library now accepts an optional `db` parameter which
   it forwards to find_library_files. Without `db`, only digit-stem
   files match (the old behavior — preserved as a fallback). With
   `db`, templated filenames stored in library_meta also match,
   so post-refactor saves like 12345_hatsune_miku.jpg get unlinked
   too. Without this fix, "Unsave from Library" on a templated
   save was a silent no-op.

2. Always cleans up the library_meta row when called with `db`, not
   just when files were unlinked. Two cases this matters for:
     a. Files were on disk and unlinked → meta is now stale.
     b. Files were already gone but the meta lingered (orphan from
        a previous broken delete) → user asked to "unsave," meta
        should reflect that.
   This is the missing half of the cleanup that left some libraries
   with pathologically more meta rows than actual files.
2026-04-09 18:25:21 -05:00
pax
250b144806 Decouple bookmark folders from library folders, add move-aware save + submenu pickers everywhere
Bookmark folders and library folders used to share identity through
_db.get_folders() — the same string was both a row in favorite_folders
and a directory under saved_dir. They look like one concept but they're
two stores, and the cross-bleed produced a duplicate-on-move bug and
made "Save to Library" silently re-file the bookmark too.

Now they're independent name spaces:
  - library_folders() in core.config reads filesystem subdirs of
    saved_dir; the source of truth for every Save-to-Library menu
  - find_library_files(post_id) walks the library shallowly and is the
    new "is this saved?" / delete primitive
  - bookmark folders stay DB-backed and are only used for bookmark
    organization (filter combo, Move to Folder)
  - delete_from_library no longer takes a folder hint — walks every
    library folder by post id and deletes every match (also cleans up
    duplicates left by the old save-to-folder copy bug)
  - _save_to_library is move-aware: if the post is already in another
    library folder, atomic Path.rename() into the destination instead
    of re-copying from cache (the duplicate bug fix)
  - bookmark "Move to Folder" no longer also calls _copy_to_library;
    Save to Library no longer also calls move_bookmark_to_folder
  - settings export/import unchanged; favorite_folders table preserved
    so no migration

UI additions:
  - Library tab right-click: Move to Folder submenu (single + multi),
    uses Path.rename for atomic moves
  - Bookmarks tab: − Folder button next to + Folder for deleting the
    selected bookmark folder (DB-only, library filesystem untouched)
  - Browse tab right-click: "Bookmark" replaced with "Bookmark as"
    submenu when not yet bookmarked (Unfiled / folders / + New); flat
    "Remove Bookmark" when already bookmarked
  - Embedded preview Bookmark button: same submenu shape via new
    bookmark_to_folder signal + set_bookmark_folders_callback
  - Popout Bookmark button: same shape — works in both browse and
    bookmarks tab modes
  - Popout Save button: Save-to-Library submenu via new save_to_folder
    + unsave_requested signals (drops save_toggle_requested + the
    _save_toggle_from_popout indirection)
  - Popout in library mode: Save button stays visible as Unsave; the
    rest of the toolbar (Bookmark / BL Tag / BL Post) is hidden

State plumbing:
  - _update_fullscreen_state mirrors the embedded preview's
    _is_bookmarked / _is_saved instead of re-querying DB+filesystem,
    eliminating the popout state drift during async bookmark adds
  - Library tab Save button reads "Unsave" the entire time; Save
    button width bumped 60→75 so the label doesn't clip on tight themes
  - Embedded preview tracks _is_bookmarked alongside _is_saved so the
    new Bookmark-as submenu can flip to a flat unbookmark when active

Naming:
  - "Unsorted" renamed to "Unfiled" everywhere user-facing — library
    Unfiled and bookmarks Unfiled now share one label. Internal
    comparison in library.py:_scan_files updated to match the combo.
2026-04-07 19:50:39 -05:00
pax
eb58d76bc0 Route async work through one persistent loop, lock shared httpx + DB writes
Mixing `threading.Thread + asyncio.run` workers with the long-lived
asyncio loop in gui/app.py is a real loop-affinity bug: the first worker
thread to call `asyncio.run` constructs a throwaway loop, which the
shared httpx clients then attach to, and the next call from the
persistent loop fails with "Event loop is closed" / "attached to a
different loop". This commit eliminates the pattern across the GUI and
adds the locking + cleanup that should have been there from the start.

Persistent loop accessor (core/concurrency.py — new)
- set_app_loop / get_app_loop / run_on_app_loop. BooruApp registers the
  one persistent loop at startup; everything that wants to schedule
  async work calls run_on_app_loop instead of spawning a thread that
  builds its own loop. Three functions, ~30 lines, single source of
  truth for "the loop".

Lazy-init lock + cleanup on shared httpx clients (core/api/base.py,
core/api/e621.py, core/cache.py)
- Each shared singleton (BooruClient._shared_client, E621Client._e621_client,
  cache._shared_client) now uses fast-path / locked-slow-path lazy init.
  Concurrent first-callers from the same loop can no longer both build
  a client and leak one (verified: 10 racing callers => 1 httpx instance).
- Each module exposes an aclose helper that BooruApp.closeEvent runs via
  run_coroutine_threadsafe(...).result(timeout=5) BEFORE stopping the
  loop. The connection pool, keepalive sockets, and TLS state finally
  release cleanly instead of being abandoned at process exit.
- E621Client tracks UA-change leftovers in _e621_to_close so the old
  client doesn't leak when api_user changes — drained in aclose_shared.

GUI workers routed through the persistent loop (gui/sites.py,
gui/bookmarks.py)
- SiteDialog._on_detect / _on_test: replaced
  `threading.Thread(target=lambda: asyncio.run(...))` with
  run_on_app_loop. Results marshaled back through Qt Signals connected
  with QueuedConnection. Added _closed flag + _inflight futures list:
  closeEvent cancels pending coroutines and shorts out the result emit
  if the user closes the dialog mid-detect (no use-after-free on
  destroyed QObject).
- BookmarksView._load_thumb_async: same swap. The existing thumb_ready
  signal already used QueuedConnection so the marshaling side was
  already correct.

DB write serialization (core/db.py)
- Database._write_lock = threading.RLock() — RLock not Lock so a
  writing method can call another writing method on the same thread
  without self-deadlocking.
- New _write() context manager composes the lock + sqlite3's connection
  context manager (the latter handles BEGIN / COMMIT / ROLLBACK
  atomically). Every write method converted: add_site, update_site,
  delete_site, add_bookmark, add_bookmarks_batch, remove_bookmark,
  update_bookmark_cache_path, add_folder, remove_folder, rename_folder,
  move_bookmark_to_folder, add/remove_blacklisted_tag,
  add/remove_blacklisted_post, save_library_meta, remove_library_meta,
  set_setting, add_search_history, clear_search_history,
  remove_search_history, add_saved_search, remove_saved_search.
- _migrate keeps using the lock + raw _conn context manager because
  it runs from inside the conn property's lazy init (where _write()
  would re-enter conn).
- Reads stay lock-free and rely on WAL for reader concurrency. Verified
  under contention: 5 threads × 50 add_bookmark calls => 250 rows,
  zero corruption, zero "database is locked" errors.

Smoke-tested with seven scenarios: get_app_loop raises before set,
run_on_app_loop round-trips, lazy init creates exactly one client,
10 concurrent first-callers => 1 httpx, aclose_shared cleans up,
RLock allows nested re-acquire, multi-threaded write contention.
2026-04-07 17:24:23 -05:00
pax
54ccc40477 Defensive hardening across core/* and popout overlay fix
Sweep of defensive hardening across the core layers plus a related popout
overlay regression that surfaced during verification.

Database integrity (core/db.py)
- Wrap delete_site, add_search_history, remove_folder, rename_folder,
  and _migrate in `with self.conn:` so partial commits can't leave
  orphan rows on a crash mid-method.
- add_bookmark re-SELECTs the existing id when INSERT OR IGNORE
  collides on (site_id, post_id). Was returning Bookmark(id=0)
  silently, which then no-op'd update_bookmark_cache_path the next
  time the post was bookmarked.
- get_bookmarks LIKE clauses now ESCAPE '%', '_', '\\' so user search
  literals stop acting as SQL wildcards (cat_ear no longer matches
  catear).

Path traversal (core/db.py + core/config.py)
- Validate folder names at write time via _validate_folder_name —
  rejects '..', os.sep, leading '.' / '~'. Permits Unicode/spaces/
  parens so existing folders keep working.
- saved_folder_dir() resolves the candidate path and refuses anything
  that doesn't relative_to the saved-images base. Defense in depth
  against folder strings that bypass the write-time validator.
- gui/bookmarks.py and gui/app.py wrap add_folder calls in try/except
  ValueError and surface a QMessageBox.warning instead of crashing.

Download safety (core/cache.py)
- New _do_download(): payloads >=50MB stream to a tempfile in the
  destination dir and atomically os.replace into place; smaller
  payloads keep the existing buffer-then-write fast path. Both
  enforce a 500MB hard cap against the advertised Content-Length AND
  the running total inside the chunk loop (servers can lie).
- Per-URL asyncio.Lock coalesces concurrent downloads of the same
  URL so two callers don't race write_bytes on the same path.
- Image.MAX_IMAGE_PIXELS = 256M with DecompressionBombError handling
  in both converters.
- _convert_ugoira_to_gif checks frame count + cumulative uncompressed
  size against UGOIRA_MAX_FRAMES / UGOIRA_MAX_UNCOMPRESSED_BYTES from
  ZipInfo headers BEFORE decompressing — defends against zip bombs.
- _convert_animated_to_gif writes a .convfailed sentinel sibling on
  failure to break the re-decode-on-every-paint loop for malformed
  animated PNGs/WebPs.
- _is_valid_media returns True (don't delete) on OSError so a
  transient EBUSY/permissions hiccup no longer triggers a delete +
  re-download loop on every access.
- _referer_for() uses proper hostname suffix matching, not substring
  `in` (imgblahgelbooru.attacker.com no longer maps to gelbooru.com).
- PIL handles wrapped in `with` blocks for deterministic cleanup.

API client retry + visibility (core/api/*)
- base.py: _request retries on httpx.NetworkError + ConnectError in
  addition to TimeoutException. test_connection no longer echoes the
  HTTP response body in the error string (it was an SSRF body-leak
  gadget when used via detect_site_type's redirect-following client).
- detect.py + danbooru.py + e621.py + gelbooru.py + moebooru.py:
  every previously-swallowed exception in search/autocomplete/probe
  paths now logs at WARNING with type, message, and (where relevant)
  the response body prefix. Debugging "the site isn't working" used
  to be a total blackout.

main_gui.py
- file_dialog_platform DB probe failure prints to stderr instead of
  vanishing.

Popout overlay (gui/preview.py + gui/app.py)
- preview.py:79,141 — setAttribute(WA_StyledBackground, True) on
  _slideshow_toolbar and _slideshow_controls. Plain QWidget parents
  silently ignore QSS `background:` declarations without this
  attribute, which is why the popout overlay strip was rendering
  fully transparent (buttons styled, bar behind them showing the
  letterbox color).
- app.py: bake _BASE_POPOUT_OVERLAY_QSS as a fallback prepended
  before the user's custom.qss in the loader. Custom themes that
  don't define overlay rules now still get a translucent black
  bar with white text + hairline borders. Bundled themes win on
  tie because their identical-specificity rules come last in the
  prepended string.
2026-04-07 17:24:19 -05:00
pax
74f948a3e8 Speed up page loads — pre-fetch bookmarks/cache as sets, off-load PIL conversion to a worker 2026-04-07 11:36:23 -05:00
pax
2fbf2f6472 0.2.0: mpv backend, popout viewer, preview toolbar, API retry, SearchState refactor
Video:
- Replace Qt Multimedia with mpv via python-mpv + OpenGL render API
- Hardware-accelerated decoding, frame-accurate seeking, proper EOF detection
- Translucent overlay controls in both preview and popout
- LC_NUMERIC=C for mpv locale compatibility

Popout viewer (renamed from slideshow):
- Floating toolbar + controls overlay with auto-hide (2s)
- Window auto-resizes to content aspect ratio on navigation
- Hyprland: hyprctl resizewindowpixel + keep_aspect_ratio prop
- Window geometry persisted to DB across sessions
- Smart F11 exit sizing (60% monitor, centered)

Preview toolbar:
- Bookmark, Save, BL Tag, BL Post, Popout buttons above preview
- Save opens folder picker menu, shows Save/Unsave state
- Blacklist actions have confirmation dialogs
- Per-tab button visibility (Library: Save + Popout only)
- Cross-tab state management with grid selection clearing

Search & pagination:
- SearchState dataclass replaces 8 scattered attrs + defensive getattr
- Media type filter dropdown (All/Animated/Video/GIF/Audio)
- API retry with backoff on 429/503/timeout
- Infinite scroll dedup fix (local seen set per backfill round)
- Prev/Next buttons hide at boundaries, "(end)" status indicator

Grid:
- Rubber band drag selection
- Saved/bookmarked dots update instantly across all tabs
- Library/bookmarks emit signals on file deletion for cross-tab sync

Settings & misc:
- Default site option
- Max thumbnail cache setting (500MB default)
- Source URLs clickable in info panel
- Long URLs truncated to prevent splitter blowout
- Bulk save no longer auto-bookmarks
2026-04-06 13:43:46 -05:00
pax
39733e4865 Convert animated PNG and WebP to GIF for Qt playback
PIL extracts frames with durations, saves as animated GIF.
Non-animated PNG/WebP kept as-is. Converted on download and
on cache access (for already-cached files). Same pattern as
ugoira zip conversion.
2026-04-05 19:41:34 -05:00
pax
1807f77dd4 Fix Gelbooru CDN — pass Referer header per-request
Shared client doesn't set Referer globally since it varies per
domain. Now passed as per-request header so Gelbooru CDN doesn't
return HTML captcha pages.
2026-04-05 17:45:57 -05:00
pax
ee9d06755f Connection pooling for thumbnails, wider score spinbox
Shared httpx.AsyncClient reuses TLS connections across thumbnail
downloads — avoids 40 separate TLS handshakes per page on Windows.
Score spinbox widened to 90px with explicit arrow buttons.
2026-04-05 16:19:10 -05:00
pax
e8f72c6fe6 Add Network tab to settings — shows all connected hosts
Logs every outgoing connection (API requests and image downloads)
with timestamps. Network tab in Settings shows all hosts contacted
this session with request counts. No telemetry, just transparency.
2026-04-04 21:11:01 -05:00
pax
148c1c3a26 Session cache option, zip conversion fix, page boundary nav
- Add "Clear cache on exit" checkbox in Settings > Cache
- Fix ugoira zip conversion: don't crash if frames fail, verify gif
  exists before deleting zip
- Arrow key past last/first post loads next/prev page
2026-04-04 19:49:09 -05:00
pax
ce51bfd98d Fix ugoira conversion — filter non-image files, skip bad frames 2026-04-04 19:43:47 -05:00
pax
2f029da916 Fix ugoira zip conversion — convert cached zips before returning
Previously cached .zip files passed the valid media check and were
returned without conversion. Now zips are detected and converted
to GIF before the general cache check.
2026-04-04 19:42:36 -05:00
pax
526606c7c5 Convert Pixiv ugoira zips to animated GIFs, add filetype to info panel
- Detect .zip files (Pixiv ugoira) and convert frames to animated GIF
- Cache the converted GIF so subsequent loads are instant
- Add filetype field to the info panel
- Add ZIP to valid media magic bytes
2026-04-04 19:40:49 -05:00
pax
b10c00d6bf Initial release — booru image viewer with Qt6 GUI and Textual TUI
Supports Danbooru, Gelbooru, Moebooru, and e621. Features include tag search
with autocomplete, favorites with folders, save-to-library, video playback,
drag-and-drop, multi-select, custom CSS theming, and cross-platform support.
2026-04-04 06:00:50 -05:00