Mat & Orac

Orac’s Shelf is a collaboration between Mat Bettinson and Orac, his personal AI. Some of what appears here is fully automated; some is co-authored; the line is described below.

Mat

Mat Bettinson is Principal Research Software Engineer at the ANU Humanities and Social Sciences Digital Research Hub, where he leads a team of RSEs working with humanities and social science researchers.

The AI revolution is arriving unevenly, and nowhere more so than in the humanities, a field that studies human complexity for a living yet now faces tools that routinely flatten it. Mat is interested in how AI can genuinely augment research practice without erasing what makes that research worth doing, though he holds that concern in some tension with his other interest: actually building things and seeing what happens.

His technical background is eclectic, starting in the 16-bit era as Technical Editor of CU Amiga magazine, moving through the videogames industry, and pivoting circa 2010 into academia via a PhD in linguistics and software engineering at the University of Melbourne. Fieldwork documenting endangered languages in Taiwan and Arnhem Land gave him a lasting scepticism toward any system — human or AI — that treats diversity as a problem to be optimised away.

This site is one experiment in a longer personal history of building AI systems: IRC chatbots before LLMs, Discord bots after, voice agents, and now interview agents for research.

Orac

Orac is a customised agent running on a modest Linux box in a cupboard. Most of the writing here runs on a single “house” model — chosen for being steady and cheap rather than famous — with a few jobs deliberately routed elsewhere: the Vibe Check rotates through a different model every day because variety is the entire point, the wallpapers go to an image model, and the one place Orac splashes out on a frontier model is the Deathmatch judge, where a credible verdict is worth the money.
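The routing described above can be pictured as a default with a handful of overrides. This is a minimal sketch, not the actual configuration: the house model ID is the one this page names under "How this works", while the job names, the override IDs, and the rotation pool are hypothetical placeholders.

```python
from datetime import date

# The house model ID is the one this page names under "How this works".
HOUSE_MODEL = "google/gemini-3.5-flash"

# Hypothetical placeholder IDs: the page does not name these models.
OVERRIDES = {
    "wallpaper": "example/image-model",
    "deathmatch_judge": "example/frontier-model",
}

# Hypothetical rotation pool for the Vibe Check.
VIBE_POOL = ["example/model-a", "example/model-b", "example/model-c"]


def model_for(job: str, today: date) -> str:
    """Return the model ID a job should use on a given day."""
    if job == "vibe_check":
        # Rotate by calendar day so consecutive days pick different models.
        return VIBE_POOL[today.toordinal() % len(VIBE_POOL)]
    # Any job without an explicit override falls back to the house model.
    return OVERRIDES.get(job, HOUSE_MODEL)
```

Keeping the default in one place means a change of house model is a one-line edit, and the exceptions stay visible as exceptions.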

Orac’s Shelf publishes a mix of fully automated outputs — the daily briefing, the vibe check, the wallpaper, the weekly debate — and co-authored pieces like the deep dives, where Mat provides direction and edits the results. Mat takes credit for the good bits. He is still figuring out what Orac is good at.

How this works

Everything on the shelf is generated by one Python package and published as a static site. The current house model is google/gemini-3.5-flash — it writes the daily briefing, drives the deep-dive research loop, and judges the Vibe Check. The diagram below is the whole machine, end to end.

[Figure: Orac's Shelf generation pipeline. Five AI-authored content types plus a craiture benchmark, generated by one Python package (orac/) and published as a static Astro site.]

External services: Guardian API (au, world, tech, env, sci); Hacker News via Algolia (top stories + comments); Gmail via the gws CLI (TLDR and AINews newsletters); Tavily (web search, extract); OpenRouter (all LLM + image generation). The stages below consume these via httpx or the OpenAI SDK; there are no other model providers.

Batch 1 (daily, orac.entrypoints.batch1) → Morning Tech Briefing: fetch inputs (Guardian, HN, Gmail); roll-up dedup (3-day window); OpenRouter digest (1 structured call); briefing.md.

Batch 2 (daily, orac.entrypoints.batch2) → deep dive, then vibe check, then wallpaper (chained, one process). Deep Dive: agentic tool loop (web_search, web_extract, hn, visualise → visualiser sub-agent), producing deep-dive.md + chart-N.png. Vibe Check: daily model rotation, N candidate completions → blind judge → ranking, producing vibecheck.md. Wallpaper: the vibe-winner authors the prompt → vision model via OpenRouter (with fallback), producing wallpaper.png + wallpaper.md. The winner feeds the wallpaper prompt-author and the Deathmatch champ.

Deathmatch (weekly, Sat, orac.entrypoints.deathmatch), exactly 3 LLM calls: challenger select in code (rank-points standings, coin-flip); host (LLM + Tavily: motion, balanced packet, preamble); bout (6 scripted turns); judge (Opus, blind PRO / CON); presentation (reattach, render); deathmatch.md + leaderboard, audit. The champ comes from the rolling 5-day Vibe Check standings; select and host are independent; a forfeit bypasses the judge.

Content tree + stores: content/<date>/ (briefing.md, deep-dive.md + chart-N.png, vibecheck.md, wallpaper.png + wallpaper.md, deathmatch.md), served by Astro at build time; git-tracked sidecars (deep-dive-index.md, debate-index.md, deathmatch-leaderboard.json); data/ (vibecheck.db in SQLite, deathmatch-audit.jsonl); .work/<date>/, gitignored (raw fetches, prompts, per-turn artifacts, logs).

Publish (orac.publish + entrypoints): content/<date>/, the written files, as build inputs; validate (pydantic + Astro Zod); notify (Telegram); astro build → /var/www; Cloudflare purge (cache invalidation); lingomat.net static site.

Craitures (daily, orac.entrypoints.craitures --publish) → per-model creature benchmark, gutter roamers + /craitures gallery: round-robin pick (oldest-seen active model, LRU); decompose → parts + joins (1 structured call, + comic bangs); vision refine loop (model sees its own render, ≤3 turns); promote + manifests (public/craitures/); gutter roamers + gallery (recent.json, /craitures). Decompose and refine call OpenRouter (multimodal: the model must see its own render); published bundles ride the Publish band's astro build + Cloudflare purge.
Generation pipeline: external services feed the job bands, which write a content tree that the publish band builds into the site.
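The "blind judge" step in the pipeline is easiest to see in code. The sketch below shows one plausible way to do it, under the assumption that blinding means shuffling the candidates, showing the judge only neutral labels, and reattaching model names after the verdict. The function name, the label format, and the `judge` callable (a stand-in for the LLM call) are all hypothetical.

```python
import random
from typing import Callable


def blind_rank(
    completions: dict[str, str],
    judge: Callable[[dict[str, str]], list[str]],
    rng: random.Random,
) -> list[str]:
    """Rank models by having `judge` order anonymised completions.

    `completions` maps model ID -> text. `judge` sees only neutral labels
    mapped to text and returns those labels, best first.
    """
    models = list(completions)
    rng.shuffle(models)  # label order must not leak the input order
    by_label = {f"candidate-{i + 1}": m for i, m in enumerate(models)}
    blinded = {label: completions[m] for label, m in by_label.items()}

    ranked_labels = judge(blinded)
    if sorted(ranked_labels) != sorted(by_label):
        raise ValueError("judge must rank every candidate exactly once")

    # Reattach model identities only after the verdict is in.
    return [by_label[label] for label in ranked_labels]


# A toy judge that prefers longer answers, standing in for the real model.
def longest_first(blinded: dict[str, str]) -> list[str]:
    return sorted(blinded, key=lambda label: len(blinded[label]), reverse=True)


ranking = blind_rank(
    {"model-x": "short", "model-y": "a much longer answer"},
    longest_first,
    random.Random(0),
)
print(ranking)  # ['model-y', 'model-x']
```

Because the judge only ever receives labels, a famous model name cannot tilt the verdict.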

Craitures

The shelf's other benchmark is also its most frivolous. The inspiration is Simon Willison's "pelican riding a bicycle" test — ask a model to draw something absurd in raw SVG and enjoy the wreckage. We wanted something in the same spirit — fun, a little unfair — but pushing on a different muscle: composition. Rather than free-drawing one shape, a craiture model has to invent its creature as a set of simple SVG parts of its own and then declare how those parts connect — a join graph, with hints for how each joint should move. We don't hand it a kit of parts; the constraint is the format itself.
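To make "parts plus a join graph" concrete, here is a minimal sketch of what such a format could look like. It is an illustration under assumptions, not the real Craitures schema: the class names, the field names, and the `motion` hint values are all hypothetical. The one check it performs is that every part is attached to the rest of the creature.

```python
from dataclasses import dataclass, field


@dataclass
class Part:
    name: str
    svg: str  # a simple SVG fragment drawn by the model


@dataclass
class Join:
    parent: str
    child: str
    motion: str = "fixed"  # hypothetical movement hint, e.g. "swing" or "bob"


@dataclass
class Craiture:
    parts: list[Part]
    joins: list[Join] = field(default_factory=list)

    def problems(self) -> list[str]:
        """Return structural problems; an empty list means the graph is usable."""
        if not self.parts:
            return ["no parts"]
        names = {p.name for p in self.parts}
        issues = []
        adjacency: dict[str, set[str]] = {n: set() for n in names}
        for j in self.joins:
            unknown = [end for end in (j.parent, j.child) if end not in names]
            issues += [f"join references unknown part: {end}" for end in unknown]
            if not unknown:
                adjacency[j.parent].add(j.child)
                adjacency[j.child].add(j.parent)
        # Walk the join graph from the first part; anything unreached is loose.
        seen: set[str] = set()
        stack = [self.parts[0].name]
        while stack:
            node = stack.pop()
            if node not in seen:
                seen.add(node)
                stack.extend(adjacency[node] - seen)
        issues += [f"part is not attached: {n}" for n in sorted(names - seen)]
        return issues


creature = Craiture(
    parts=[
        Part("body", "<circle r='10'/>"),
        Part("leg", "<rect width='2' height='8'/>"),
        Part("hat", "<polygon points='0,0 4,0 2,-4'/>"),
    ],
    joins=[Join("body", "leg", motion="swing")],
)
print(creature.problems())  # ['part is not attached: hat']
```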

That constraint is why we lean on vision models. A craiture isn't one-shot — the model draws, looks at its own render, and iterates on the attachments until the thing is recognisable. Drawing blind is easy to fudge; drawing, looking, and fixing your own wonky leg is harder, and far more telling. The gallery gathers one creature per model so you can compare hands on the same brief — and yes, they roam the margins of every page, and shout when they hit a wall.
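The draw, look, fix cycle can be sketched as a bounded loop. This is a minimal sketch under assumptions: the three-turn cap comes from the pipeline diagram, but the function names are hypothetical, and `render` and `critique` stand in for the real renderer and the vision-model call.

```python
from typing import Callable, Optional

MAX_TURNS = 3  # the pipeline diagram caps the refine loop at three turns


def refine(
    draft: str,
    render: Callable[[str], bytes],
    critique: Callable[[str, bytes], Optional[str]],
    max_turns: int = MAX_TURNS,
) -> tuple[str, int]:
    """Iterate a draft until the critic accepts it or the turns run out.

    `render` turns a draft into image bytes; `critique` returns a revised
    draft, or None to accept the current one.
    Returns the final draft and the number of revisions applied.
    """
    revisions = 0
    for _ in range(max_turns):
        image = render(draft)
        revised = critique(draft, image)
        if revised is None:
            break
        draft = revised
        revisions += 1
    return draft, revisions


# Stand-ins so the sketch runs without a renderer or a model.
def fake_render(draft: str) -> bytes:
    return draft.encode()


def fake_critique(draft: str, image: bytes) -> Optional[str]:
    # "Looks" at the render and fixes one wonky bit per turn.
    seen = image.decode()
    return seen.replace("wonky ", "", 1) if "wonky " in seen else None


final, revisions = refine("wonky wonky leg", fake_render, fake_critique)
print(final, revisions)  # leg 2
```

The cap matters as much as the loop: a model that never converges still has to stop, and whatever it has at that point is what gets judged.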

This is Craitures 1.0; there are revisions already in mind.