# Word Search Puzzle Generator — Spec
## Overview
A self-hosted web app that generates printable word search puzzles for kids
(and grownups). Themed word lists are managed through the UI and stored as
JSON files on disk. Every puzzle is configured explicitly per generation —
no sticky difficulty presets.
## Tech Stack
- **Python 3.14+**
- **FastAPI** — web framework
- **Jinja2** — server-rendered HTML templates (no JS framework, no build step)
- **reportlab** — PDF generation
- **uvicorn** — ASGI server
- **Vanilla JS** — minimal, only for the theme editor (textarea + fetch)
- **Storage:** JSON files on disk under `themes/`. No database.
Single language, single process, no build pipeline, no DB.
## Deployment
- Dockerfile + `docker-compose.yml`
- Single container, single port (default `8000`)
- Mount `./themes` as a volume
- Behind Pangolin/Traefik if exposing on a subdomain; otherwise hit the LXC IP
## Directory Layout
```
wordsearch/
├── app/
│   ├── main.py           # FastAPI routes
│   ├── generator.py      # Grid building + word placement
│   ├── normaliser.py     # Word normalisation + prefix stripping
│   ├── pdf.py            # PDF rendering (reportlab)
│   ├── themes.py         # Load / save / list theme JSON files
│   └── templates/
│       ├── base.html
│       ├── index.html    # Generate form
│       ├── themes.html   # Theme list
│       └── theme_edit.html
├── themes/               # JSON theme files (mounted volume)
├── static/style.css
├── requirements.txt
├── Dockerfile
└── docker-compose.yml
```
## Web UI
### `/` — Generate Puzzle
```
Theme: [ Mr Men Characters ▾ ]
Grid size: [ 12 ] (5–25)
Words: [ 10 ] (1–30)
Min length: [ 3 ]
Max length: [ 12 ] (clamped to grid size)
Title: [ ] (optional, overrides theme name)
Directions:
☑ Horizontal (→) — always on, locked
☑ Vertical (↓) — always on, locked
☐ Diagonal (↘ ↗)
☐ Reversed (← ↑ ↖ ↙)
☐ Allow overlapping words
[ Generate ]
```
- All optional toggles default **off** on every page load — no sticky state.
- "Generate" → POSTs the form, streams the PDF straight back as a download
(`Content-Disposition: attachment; filename="<slug>_<timestamp>.pdf"`).
- The server keeps no copy on disk; the browser is the only place the PDF
lives.
### `/themes` — Theme List
- Table of existing themes: name, slug, word count, edit/delete buttons
- "New theme" button → `/themes/new`
### `/themes/new` and `/themes/{slug}/edit` — Theme Editor
```
Theme name: [ ] (display name, e.g. "Sea Creatures")
Slug: [ ] (filename; auto-generated on create, locked on edit)
Words (one per line):
┌────────────────────────────────────────┐
│ Mr Tickle │
│ Mr Happy │
│ Little Miss Sunshine │
│ ... │
└────────────────────────────────────────┘
Live preview:
Mr Tickle → TICKLE
Mr Happy → HAPPY
Little Miss Sunshine → LITTLEMISSSUNSHINE
[ Save ] [ Delete ]
```
- The live preview shows what each word will look like in the grid after
normalisation. Updates on textarea change (debounced).
- Save writes `themes/<slug>.json`. Delete confirms then removes.
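
The spec does not say how the slug is auto-generated from the theme name. One plausible rule, consistent with shipped filenames such as `sea-creatures.json`, is sketched below. `slugify` is a hypothetical helper and the exact character policy is an assumption.

```python
import re


def slugify(name: str) -> str:
    """Derive a filename-safe slug from a theme name (assumed rule).

    Lowercases the name, collapses every run of characters outside
    a-z / 0-9 into a single hyphen, and trims hyphens at both ends.
    """
    slug = re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")
    if not slug:
        raise ValueError("theme name produces an empty slug")
    return slug
```

Because the result contains only lowercase letters, digits and hyphens, it can be used directly in `themes/<slug>.json` without path separators sneaking in.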
### Auth
None. Homelab use. Add HTTP basic auth via FastAPI middleware if exposed
publicly later.
## Routes
| Method | Path | Purpose |
|--------|----------------------------|----------------------------------------|
| GET | `/` | Generate form |
| POST | `/generate` | Build puzzle, return PDF download |
| GET | `/themes` | List themes |
| GET | `/themes/new` | New theme form |
| GET | `/themes/{slug}/edit` | Edit theme form |
| POST | `/themes` | Create theme |
| POST | `/themes/{slug}` | Update theme |
| POST | `/themes/{slug}/delete` | Delete theme |
| GET | `/api/themes` | JSON list of themes (dropdown source) |
| POST | `/api/normalise` | Preview normalisation for a list of words (used by the editor live preview) |
## Word Normalisation
For every input word, the generator produces two forms:
- **Display form** — original string, untouched. Goes on the PDF word list.
- **Grid form** — what gets placed in the grid.
### Grid form rules
1. Strip any **leading prefix tokens** (case-insensitive, with or without
trailing dot). Stripping is **token-based**, not substring — `"Misty"`
does not match `"Miss"`.
2. Uppercase the result.
3. Strip all whitespace and punctuation.
### Stripped prefixes
A constant in `normaliser.py`:
```python
PREFIXES = {
    "mr", "mrs", "ms", "miss",
    "dr", "sir", "dame", "lord", "lady", "master",
    "captain", "capt", "cpt",
    "professor", "prof",
    "saint", "st",
}
```
### Examples
| Input | Display form | Grid form |
|-------------------------|-----------------------|---------------------|
| `Mr Tickle` | `Mr Tickle` | `TICKLE` |
| `Mr. Bump` | `Mr. Bump` | `BUMP` |
| `Little Miss Sunshine` | `Little Miss Sunshine`| `LITTLEMISSSUNSHINE`|
| `Dr Octopus` | `Dr Octopus` | `OCTOPUS` |
| `Sir Lancelot` | `Sir Lancelot` | `LANCELOT` |
| `Misty` | `Misty` | `MISTY` |
| `cucumber` | `cucumber` | `CUCUMBER` |
| `Captain America` | `Captain America` | `AMERICA` |
### Prefix-only edge cases
- Multiple consecutive prefixes get stripped: `"Mr Dr Strange"` → `STRANGE`.
- Word that is **only** a prefix: `"Mr"` → keep as `MR`, log a warning to
stderr (probably a theme typo).
- If the grid form is still empty after all stripping (for example, the input
is only punctuation), skip the word and warn.
## Word Selection
1. Load theme word list.
2. Compute grid form for each word.
3. Filter by length: `min_length ≤ len(grid_form) ≤ min(max_length, grid_size)`.
4. Shuffle.
5. Place words one by one until either:
- the requested word count is reached, or
- the filtered list is exhausted.
6. If fewer words are placed than requested, generate the puzzle anyway and
log a warning to stderr. There is no result page, so the warning is never
shown to the user; they have to check that the word count shown in the
dropdown covers their target before generating.
### Length filter validation
If the filter matches **fewer** words than requested, generate with what's
available and log a warning to stderr (e.g. "asked for 10 words; only 4 in
the theme matched your length filter"). Don't block: the user might
genuinely want a sparse puzzle.
## Word Placement Rules
### Direction set
The active directions are computed from the toggles:
```
base = { → , ↓ }                            # always on
if diagonal_toggle:
    base |= { ↘ , ↗ }
if reversed_toggle:
    base |= { reverse(d) for d in base }    # add reversal of everything in base
```
So:
| Diag | Rev | Active directions |
|------|-----|-------------------|
| ☐ | ☐ | → ↓ |
| ☑ | ☐ | → ↓ ↘ ↗ |
| ☐ | ☑ | → ↓ ← ↑ |
| ☑ | ☑ | → ↓ ↘ ↗ ← ↑ ↖ ↙ (all 8) |
Direction vectors `(Δrow, Δcol)`:
| Symbol | Δrow | Δcol |
|--------|------|------|
| → | 0 | +1 |
| ↓ | +1 | 0 |
| ↘ | +1 | +1 |
| ↗ | -1 | +1 |
| ← | 0 | -1 |
| ↑ | -1 | 0 |
| ↖ | -1 | -1 |
| ↙ | +1 | -1 |
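
In Python the direction set can be expressed directly with `(Δrow, Δcol)` tuples, where reversing a direction means negating both components. The constant and function names below are illustrative assumptions.

```python
RIGHT, DOWN = (0, 1), (1, 0)
DOWN_RIGHT, UP_RIGHT = (1, 1), (-1, 1)


def active_directions(diagonal: bool, reversed_: bool) -> set[tuple[int, int]]:
    """Compute the active (d_row, d_col) vectors from the two toggles."""
    dirs = {RIGHT, DOWN}  # always on
    if diagonal:
        dirs |= {DOWN_RIGHT, UP_RIGHT}
    if reversed_:
        # Reversal is applied to everything enabled so far, diagonals included.
        dirs |= {(-dr, -dc) for dr, dc in dirs}
    return dirs
```

With both toggles on this yields all eight vectors; with only `reversed_` on it yields the four horizontal and vertical ones.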
### Placement algorithm
For each word:
1. Pick a random direction from the active set.
2. Pick a random valid starting cell such that the entire word fits within
the grid (compute bounds from word length + direction vector).
3. Check collision against already-placed words:
- If overlap toggle is **off**: every cell the word would occupy must be
currently empty.
- If overlap toggle is **on**: every cell must either be empty OR contain
the same letter the word would place there (letter-sharing intersections).
4. If the placement is valid, commit it. Otherwise retry up to 200 attempts
per word (re-rolling direction + start each time), then skip with a
warning.
### Grid fill
After all words are placed, fill the remaining empty cells with random
letters A–Z. Every letter in the grid is uppercase.
## Theme File Format
`themes/<slug>.json`:
```json
{
  "name": "Mr Men Characters",
  "words": [
    "Mr Tickle",
    "Mr Happy",
    "Mr Bump",
    "Little Miss Sunshine",
    "Mr Strong",
    "Mr Tall",
    "Mr Small"
  ]
}
```
- `name` — human-readable, shown in dropdown and on PDF.
- `words` — list of strings, one per line in the editor textarea.
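
Reading and writing this format needs only `json` and `pathlib`. The helpers below are a sketch with assumed names (`save_theme`, `load_theme`, `parse_textarea`); the shape check in `load_theme` is an illustrative safeguard rather than something the spec mandates.

```python
import json
from pathlib import Path


def parse_textarea(text: str) -> list[str]:
    """Split the editor textarea into words, one per non-blank line."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def save_theme(themes_dir: Path, slug: str, name: str, words: list[str]) -> Path:
    """Write themes/<slug>.json and return its path."""
    path = themes_dir / f"{slug}.json"
    payload = {"name": name, "words": words}
    path.write_text(json.dumps(payload, indent=2, ensure_ascii=False) + "\n", encoding="utf-8")
    return path


def load_theme(themes_dir: Path, slug: str) -> dict:
    """Load a theme and check it has the two expected keys."""
    data = json.loads((themes_dir / f"{slug}.json").read_text(encoding="utf-8"))
    if (
        not isinstance(data, dict)
        or not isinstance(data.get("name"), str)
        or not isinstance(data.get("words"), list)
    ):
        raise ValueError(f"theme {slug!r} is not in the expected format")
    return data
```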
### Theme curation guidance (in README)
- Aim for **20–30 words per theme** with a healthy spread of lengths
(some short, some long) so length filters give useful results across
age groups.
- Avoid hyphens and apostrophes when possible, since they get stripped.
`"Spider-Man"` → `SPIDERMAN` is fine, but `"don't"` → `DONT` may surprise.
## PDF Layout
A4 portrait, single page, plain.
### Top
- Title (theme name or override), centred, large.
### Middle: Grid
- Centred, monospace font, generous letter spacing.
- No cell borders (cleaner look, easier to scan).
- Sized so a 12×12 grid is comfortable to read; scales down for larger grids.
### Bottom: Word list
- Heading: "Find these words"
- 2–4 columns depending on word count.
- Each entry rendered as:
- `GRIDFORM` (bold, uppercase) followed by ` (original prefix tokens)` in
lighter weight — only if the word had a stripped prefix.
- Words with no prefix render bare: just `GRIDFORM` in bold uppercase.
- Examples in the rendered list:
```
TICKLE (Mr) BUMP (Mr)
HAPPY (Mr) STRONG (Mr)
LITTLEMISSSUNSHINE
CUCUMBER TOMATO
AMERICA (Captain)
```
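
Since the bold grid form and the lighter prefix annotation are drawn in different weights, it may help to produce them as two separate strings. The sketch below assumes the normaliser hands back the stripped prefix tokens; `entry_parts` is a hypothetical helper and does no PDF drawing itself.

```python
def entry_parts(grid_form: str, prefix_tokens: list[str]) -> tuple[str, str]:
    """Return (bold_text, light_text) for one word-list entry.

    light_text is empty when no prefix was stripped, so the entry renders bare.
    """
    if not prefix_tokens:
        return grid_form, ""
    return grid_form, f" ({' '.join(prefix_tokens)})"
```

Joining the two parts gives the plain-text form shown in the examples, for instance `TICKLE (Mr)`.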
### Active options subtitle
Directly under the title, in small uppercase letter-spaced grey text, list
any *extra* directions or modes that are enabled beyond the always-on
horizontal + vertical baseline. The labels render only when at least one
of `diagonal`, `reversed`, or `allow_overlap` is on:
```
DIAGONAL · REVERSED · OVERLAPPING
```
If none of them are enabled, render nothing — no empty line, no spacing.
### No footer
Page stays clean: no timestamp, no branding, no toggle state in the footer.
(The download filename carries the timestamp.)
### No answer key
v1 ships puzzle-only. Solution PDF is a future enhancement.
## Initial Themes to Ship
Pre-populate `themes/` with starter files (20–30 words each, varied length):
- `mr-men.json`
- `sea-creatures.json`
- `superheroes.json`
- `farm-animals.json`
- `villains.json`
- `transformers.json`
- `wild-animals.json`
- `precious-stones.json`
- `common-birds.json` — songbirds plus raptors (owl, eagle, hawk, falcon, etc.)
- `science-physics.json` — forces, energy, motion, electricity
- `science-chemistry.json` — atoms, molecules, elements, reactions
- `science-biology.json` — cells, organs, microbes, ecology
## Dockerfile
```dockerfile
FROM python:3.14-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY app/ ./app/
COPY static/ ./static/
EXPOSE 8000
CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000"]
```
## docker-compose.yml
```yaml
services:
  wordsearch:
    build: .
    ports:
      - "8000:8000"
    volumes:
      - ./themes:/app/themes
    restart: unless-stopped
```
## Acceptance Criteria
- `docker compose up` starts the app, accessible at `http://<host>:8000`.
- Generate a 12×12 puzzle from a pre-shipped theme with default settings,
download a valid PDF.
- Toggling diagonal, reversed, and overlap each visibly changes the puzzle.
- Min/max length filtering works: setting `min=8` excludes short words.
- Theme editor: create new theme, see live normalisation preview, save,
appear in dropdown, generate from it.
- Edit and delete existing themes via UI.
- Words with prefixes (Mr, Dr, etc.) show the stripped form in the grid; on
the word list they show the grid form followed by the prefix in parentheses.
- When fewer words can be placed than requested, the PDF still generates
(warnings only go to stderr — no result page).
- Bad input (invalid slug, empty word list, max length < min length) shows
a clear error message, not a stack trace.
- Downloaded PDFs are named `<slug>_<YYYY-MM-DD_HH-MM-SS>.pdf`.
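
The bad-input criterion could be met by collecting human-readable messages before any generation starts. The sketch below is one possible shape, not the specified behaviour: `validate_request` and `SLUG_RE` are hypothetical, the slug pattern is an assumption, and the numeric bounds read the generate form's hints as 5 to 25 for grid size and 1 to 30 for word count.

```python
import re

# Assumed slug shape: lowercase letters and digits, separated by single hyphens.
SLUG_RE = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")


def validate_request(
    slug: str, grid_size: int, word_count: int, min_length: int, max_length: int
) -> list[str]:
    """Return a list of error messages; an empty list means the input is fine."""
    errors: list[str] = []
    if not SLUG_RE.fullmatch(slug):
        errors.append("Theme slug may only contain lowercase letters, digits and hyphens.")
    if not 5 <= grid_size <= 25:
        errors.append("Grid size must be between 5 and 25.")
    if not 1 <= word_count <= 30:
        errors.append("Word count must be between 1 and 30.")
    if min_length < 1:
        errors.append("Min length must be at least 1.")
    if max_length < min_length:
        errors.append("Max length must not be smaller than min length.")
    return errors
```

A route could render these messages back into the form template instead of letting an exception surface as a stack trace.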