Saturday 18 April 2026
🌐 World & Tech Pulse
- Iran reopens Strait of Hormuz but US blockade remains — Iran reopens the Strait of Hormuz during ceasefire negotiations, with the US president praising the move. Oil prices are falling as commercial vessels regain access.
- US tech firms lobbied EU to keep datacentre emissions secret — Microsoft and trade groups successfully pushed for legally questionable confidentiality clauses adopted almost word-for-word from their demands.
- UK’s OnlyFans tops $3bn valuation — Adult video platform selling minority stake to increase stability after death of owner Leonid Radvinsky.
- Finance leaders warn over Anthropic’s Mythos as UK banks prepare to use Claude — The rollout of the new Claude model is set to expand to British institutions in the coming days.
- Kenyan firm sacks 1,000+ workers after losing Meta contract — Meta paused work with Sama after allegations about staff viewing private scenes filmed by smart glasses.
🔥 Hacker News Buzz
- Claude Design (710 pts) — Anthropic’s new design-focused lab announcement dominates the front page.
- Isaac Asimov: The Last Question (1956) (561 pts) — Classic sci-fi resurfaces with renewed interest.
- Ban the sale of precise geolocation (519 pts) — Lawfare argues it’s time to outlaw the data broker trade in location data.
- Measuring Claude 4.7’s tokenizer costs (480 pts) — Independent analysis of the new tokenizer’s impact on token counts.
- Smol Machines – subsecond coldstart, portable VMs (167 pts) — Show HN: lightweight virtual machines with near-instant startup.
- NIST gives up enriching most CVEs (146 pts) — NIST stops manually enriching most vulnerability entries, raising concerns about data quality.
🚀 Models
- Anthropic launches Claude Opus 4.7 — Anthropic’s new flagship model is “literally one step better than 4.6 in every dimension.” Opus 4.7-low beats 4.6-medium, 4.7-medium beats 4.6-high, and a new xhigh reasoning effort level is now the default in Claude Code. Key benchmarks: SWE-Bench Pro at 64.3% (+11pts), SWE-Bench Verified at 87.6% (+7pts), ARC-AGI-2 at 75.83%. Vision support jumps to 2,576px on the long edge (~3.75MP, 3x larger). A new tokenizer can increase token usage by up to 35%, but improved reasoning efficiency offsets this, with overall usage still down ~50% vs prior effort equivalents. Pricing unchanged at $5/$25 per million tokens. Cursor’s internal benchmark jumped from 58% → 70%.
- OpenAI launches GPT-Rosalind for biology — A specialized model designed for drug discovery and biological research. Can access major public databases, suggest biological pathways and drug targets, and has been tuned to be more skeptical about bad targets. Access limited to selected US-based entities.
- Qwen 3.6 with agentic coding support — Stronger repository-level reasoning, front-end workflow handling, and a thinking preservation feature that maintains context across iterations.
- Ternary Bonsai: 1.58-bit language models — New model family (8B, 4B, 1.7B) at 1.58-bit quantization. Scores 75.5 on average benchmarks with 3-4x better energy efficiency. Available under Apache 2.0, runs on Macs and iPhones.
- Physical Intelligence’s π0.7 robot brain (TLDR general) — Robotics startup shows its latest model can direct robots to perform tasks they were never explicitly trained on. Verbally coachable without additional data collection.
🧠 Agents & Tools
- Codex expands into full computer automation — OpenAI updated Codex with background computer control, multi-agent workflows, and deeper developer tool integration, extending it across the full software development lifecycle.
- Perplexity “Personal Computer” — Perplexity launched an AI platform that shifts from manual instruction execution to probabilistic goal completion. Uses deep web research to autonomously evaluate reasoning paths and drive multi-step workflows.
- Cloudflare Agents Week announcements — Cloudflare building a unified inference layer supporting 14+ model providers. Email Service for agents now in public beta. Artifacts: Git-compatible versioned storage for agents.
- Windsurf 2.0 with Agent Command Center — New editor release integrates Devin and enables seamless local + cloud agent collaboration.
- Sandboxed agents for codebase migration — OpenAI cookbook guide on using sandboxed agents to modernize large codebases with auditable patches.
- xAI to supply tens of thousands of GPUs to Cursor — Cursor will leverage xAI’s massive infrastructure for advanced coding capabilities.
🔬 Research & Engineering
- Jensen Huang on Anthropic, OpenAI, China, and inference demand — Three exchanges where Jensen said more than intended, notably losing composure on China chip restrictions.
- What I learned this week – Dwarkesh — Rough notes on pretraining parallelisms, distillation stopping, Mythos and cybersecurity equilibrium, Pipeline RL, and why pretraining runs fail.
- The PR you would have opened yourself — New Skill and Test Harness to port transformer models to mlx-lm, with agent-assisted PRs and comprehensive reports.
- Vercel Workflows goes GA — Durable execution framework extending infrastructure-as-code to long-running systems with built-in reliability.
- Chrome AI Mode adds side-by-side browsing — Google updated AI Mode to open webpages alongside AI responses for continuous context.
💼 Industry & Security
- Anthropic CPO leaves Figma board — Chief product officer stepped down following reports that upcoming models may include design tools competing with Figma.
- OpenAI to spend >$20B on Cerebras chips — Three-year deal with Cerebras, potentially including equity stake. Cerebras targeting Q2 2026 IPO.
- NIST gives up enriching most CVEs — NIST stops manually enriching most vulnerability database entries, raising concerns about national cybersecurity data quality.
- New Chinese undersea cable cutter (TLDR general) — Technology can reportedly cut subsea cables at depths up to 13,123 feet, risking internet backbone infrastructure.
- Laravel injects ads into agent workflows (TLDR general) — PR in Laravel Boost introduces a change telling agents to use Laravel Cloud for deployment, raising concerns about ads in open-source tooling.
📊 The Opus 4.7 Deep Dive
The story of the day is Claude Opus 4.7. Here’s the consensus from AINews and community discussion:
What changed: A new base model with a different tokenizer (possibly a new pretrain), a new xhigh reasoning tier, 3x larger image support, and systematic benchmark improvements. Claude Code now defaults to xhigh, and the model scores 64.3% on SWE-Bench Pro (+11pts).
The token economics debate: The new tokenizer maps the same input to 1.0–1.35x more tokens. Anthropic increased subscriber limits to compensate. Despite this, reasoning efficiency gains mean overall usage is still down ~50% vs prior effort equivalents.
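As a rough back-of-the-envelope check on the tokenizer figures above, the sketch below prices one request at the unchanged $5/$25 per million tokens across the 1.0–1.35x range, and computes how much token usage would need to fall to cancel out a given inflation factor. The function names, the example token counts, and the assumption that the factor applies equally to input and output tokens are illustrative, not taken from Anthropic.

```python
INPUT_PRICE_PER_MTOK = 5.00    # USD per million input tokens (from the digest)
OUTPUT_PRICE_PER_MTOK = 25.00  # USD per million output tokens (from the digest)


def request_cost(input_tokens, output_tokens, tokenizer_factor=1.0):
    """Estimate the USD cost of one request.

    tokenizer_factor scales both token counts; applying it to output
    tokens as well as input tokens is an assumption of this sketch.
    """
    scaled_in = input_tokens * tokenizer_factor
    scaled_out = output_tokens * tokenizer_factor
    total = scaled_in * INPUT_PRICE_PER_MTOK + scaled_out * OUTPUT_PRICE_PER_MTOK
    return total / 1_000_000


def break_even_reduction(tokenizer_factor):
    """Fraction by which token usage must drop to offset the inflation factor."""
    return 1.0 - 1.0 / tokenizer_factor


# Hypothetical request: 100k input tokens and 10k output tokens.
baseline = request_cost(100_000, 10_000)            # old tokenizer (factor 1.0)
worst_case = request_cost(100_000, 10_000, 1.35)    # upper end of the reported range
needed_cut = break_even_reduction(1.35)             # roughly a 26% reduction
```

Under these assumptions, a reported ~50% drop in overall usage would more than cover the worst-case tokenizer inflation, which needs only about a 26% reduction to break even.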
Expert takes: Jeremy Howard called it the first model that “gets” what he’s doing. Cat Wu (Anthropic) says treat it like an engineer you delegate to, not a pair programmer. Cursor’s internal benchmark jumped 12 points. Notion saw 14% improvement with one-third fewer tool errors.
The Mythos question: Multiple researchers believe 4.7 is a distilled version of Mythos (Anthropic’s internal cyber-rated model). The system card acknowledges experiments with differential cyber capability reduction. Opus 4.7 still scores higher than 4.6 on some exploitation evals.
Document understanding: LlamaIndex reports a large improvement on chart understanding (13.5% → 55.8%), but at a cost of ~7¢/page vs ~1.25¢/page for agentic mode. Good for quality-sensitive workflows, not bulk OCR.
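To make the per-page trade-off concrete, here is a minimal sketch that compares batch costs using the two per-page figures quoted above. The mode labels, the function name, and the 10,000-page batch size are hypothetical, and real costs will vary with page content.

```python
# Approximate per-page costs in USD, taken from the figures quoted in the digest.
COST_PER_PAGE_USD = {
    "opus_4_7": 0.07,    # ~7 cents per page
    "agentic": 0.0125,   # ~1.25 cents per page
}


def batch_cost(pages, mode):
    """Return the estimated USD cost of processing `pages` pages in `mode`."""
    return pages * COST_PER_PAGE_USD[mode]


# Hypothetical batch of 10,000 pages.
pages = 10_000
opus_total = batch_cost(pages, "opus_4_7")     # about 700 USD
agentic_total = batch_cost(pages, "agentic")   # about 125 USD
cost_ratio = opus_total / agentic_total        # about 5.6x more expensive
```

At roughly 5.6x the per-page cost, the higher-accuracy path makes most sense when pages are few or chart accuracy matters more than throughput.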
Sources: TLDR AI, TLDR, AINews (Latent Space), The Guardian, Hacker News