booru-viewer/tests/core/test_db.py
pax bf14466382 Add Phase A test suite for core/ primitives
First regression-test layer for booru-viewer. Pure Python — no Qt, no
mpv, no network, no real filesystem outside tmp_path. Locks in the
security and concurrency invariants from the 54ccc40 + eb58d76 hardening
commits ahead of the upcoming popout state machine refactor (Prompt 3),
which needs a stable baseline to refactor against.

16 tests across five files mirroring the source layout under
booru_viewer/core/:

- tests/core/test_db.py (4):
  - _validate_folder_name rejection rules (.., /foo, \\foo, .hidden,
    ~user, empty) and acceptance categories (unicode, spaces, parens)
  - add_bookmark INSERT OR IGNORE collision returns the existing row
    id (locks in the lastrowid=0 fix)
  - get_bookmarks LIKE escaping (literal cat_ear does not match catear)

- tests/core/test_cache.py (7):
  - _referer_for hostname suffix matching (gelbooru.com / donmai.us
    apex rewrite, both exact-match and subdomain)
  - _referer_for rejects substring attackers
    (imgblahgelbooru.attacker.com does NOT pick up the booru referer)
  - ugoira frame-count and uncompressed-size caps refuse zip bombs
    before any decompression
  - _do_download MAX_DOWNLOAD_BYTES enforced both at the
    Content-Length pre-check AND in the chunk-loop running total
  - _is_valid_media returns True on OSError (no delete + redownload
    loop on transient EBUSY)

- tests/core/test_config.py (2):
  - saved_folder_dir rejects literal .. and ../escape
  - find_library_files walks root + 1 level, filters by
    MEDIA_EXTENSIONS, exact post-id stem match

- tests/core/test_concurrency.py (2):
  - get_app_loop raises RuntimeError before set_app_loop is called
  - run_on_app_loop round-trips a coroutine result from a worker
    thread loop back to the test thread

- tests/core/api/test_base.py (1):
  - BooruClient._shared_client lazy singleton constructor-once under
    10-thread first-call race

Plus tests/conftest.py with fixtures: tmp_db, tmp_library,
reset_app_loop, reset_shared_clients. All fixtures use tmp_path or
reset module-level globals around the test so the suite is parallel-
safe.

pyproject.toml:
- New [project.optional-dependencies] test extra: pytest>=8.0,
  pytest-asyncio>=0.23
- New [tool.pytest.ini_options]: asyncio_mode = "auto",
  testpaths = ["tests"]

README.md:
- Linux install section gains "Run tests" with the
  pip install -e ".[test]" + pytest tests/ invocation

Phase B (post-sweep VideoPlayer regression tests for the seek slider
pin, _pending_mute lazy replay, and volume replay) is deferred to
Prompt 3's state machine work — VideoPlayer cannot be instantiated
without QApplication and a real mpv, which is out of scope for a
unit test suite. Once the state machine carves the pure-Python state
out of VideoPlayer, those tests become trivial against the helper
module.

Suite runs in 0.07s (16 tests). Independent of Qt/mpv/network/ffmpeg.

Test cases for Prompt 3:
- (already covered) — this IS the test suite Prompt 3 builds on top of
2026-04-08 18:50:00 -05:00

99 lines
4.0 KiB
Python

"""Tests for `booru_viewer.core.db` — folder name validation, INSERT OR
IGNORE collision handling, and LIKE escaping.
These tests lock in the `54ccc40` security/correctness fixes:
- `_validate_folder_name` rejects path-traversal shapes before they hit the
filesystem in `saved_folder_dir`
- `add_bookmark` re-SELECTs the actual row id after an INSERT OR IGNORE
collision so the returned `Bookmark.id` is never the bogus 0 that broke
`update_bookmark_cache_path`
- `get_bookmarks` escapes the SQL LIKE wildcards `_` and `%` so a search for
`cat_ear` doesn't bleed into `catear` / `catXear`
"""
from __future__ import annotations
import pytest
from booru_viewer.core.db import _validate_folder_name
# -- _validate_folder_name --
def test_validate_folder_name_rejects_traversal():
"""Every shape that could escape the saved-images dir or hit a hidden
file must raise ValueError. One assertion per rejection rule so a
failure points at the exact case."""
with pytest.raises(ValueError):
_validate_folder_name("") # empty
with pytest.raises(ValueError):
_validate_folder_name("..") # dotdot literal
with pytest.raises(ValueError):
_validate_folder_name(".") # dot literal
with pytest.raises(ValueError):
_validate_folder_name("/foo") # forward slash
with pytest.raises(ValueError):
_validate_folder_name("foo/bar") # embedded forward slash
with pytest.raises(ValueError):
_validate_folder_name("\\foo") # backslash
with pytest.raises(ValueError):
_validate_folder_name(".hidden") # leading dot
with pytest.raises(ValueError):
_validate_folder_name("~user") # leading tilde
def test_validate_folder_name_accepts_unicode_and_punctuation():
"""Common real-world folder names must pass through unchanged. The
guard is meant to block escape shapes, not normal naming."""
assert _validate_folder_name("miku(lewd)") == "miku(lewd)"
assert _validate_folder_name("cat ear") == "cat ear"
assert _validate_folder_name("日本語") == "日本語"
assert _validate_folder_name("foo-bar") == "foo-bar"
assert _validate_folder_name("foo.bar") == "foo.bar" # dot OK if not leading
# -- add_bookmark INSERT OR IGNORE collision --
def test_add_bookmark_collision_returns_existing_id(tmp_db):
"""Calling `add_bookmark` twice with the same (site_id, post_id) must
return the same row id on the second call, not the stale `lastrowid`
of 0 that INSERT OR IGNORE leaves behind. Without the re-SELECT fix,
any downstream `update_bookmark_cache_path(id=0, ...)` silently
no-ops, breaking the cache-path linkage."""
site = tmp_db.add_site("test", "http://example.test", "danbooru")
bm1 = tmp_db.add_bookmark(
site_id=site.id, post_id=42, file_url="http://example.test/42.jpg",
preview_url=None, tags="cat",
)
bm2 = tmp_db.add_bookmark(
site_id=site.id, post_id=42, file_url="http://example.test/42.jpg",
preview_url=None, tags="cat",
)
assert bm1.id != 0
assert bm2.id == bm1.id
# -- get_bookmarks LIKE escaping --
def test_get_bookmarks_like_escaping(tmp_db):
"""A search for the literal tag `cat_ear` must NOT match `catear` or
`catXear`. SQLite's LIKE treats `_` as a single-char wildcard unless
explicitly escaped — without `ESCAPE '\\\\'` the search would return
all three rows."""
site = tmp_db.add_site("test", "http://example.test", "danbooru")
tmp_db.add_bookmark(
site_id=site.id, post_id=1, file_url="http://example.test/1.jpg",
preview_url=None, tags="cat_ear",
)
tmp_db.add_bookmark(
site_id=site.id, post_id=2, file_url="http://example.test/2.jpg",
preview_url=None, tags="catear",
)
tmp_db.add_bookmark(
site_id=site.id, post_id=3, file_url="http://example.test/3.jpg",
preview_url=None, tags="catXear",
)
results = tmp_db.get_bookmarks(search="cat_ear")
tags_returned = {b.tags for b in results}
assert tags_returned == {"cat_ear"}