Less Data, Smarter Models

Apple Research, in collaboration with the National University of Singapore, just published a result that sounds like it shouldn't work: throwing away training data makes language models memorize more facts, not fewer. The paper, "Cram Less to Fit More," accepted at ICLR 2026, formalizes fact memorization from an information-theoretic perspective and shows that the standard approach to LLM training, feeding the model everything you have, is provably suboptimal for factual knowledge retention. The…

read full analysis →
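
As a rough intuition for the headline claim, here is a toy sketch (not the paper's formalism): if a model has a fixed memorization budget, spending it on the full corpus can retain fewer facts than spending it on a curated, fact-dense subset. The `facts_stored` helper, the corpus entries, the per-document bit costs, and the budget below are all hypothetical numbers chosen only to illustrate the effect.

```python
# Toy capacity-budget illustration (hypothetical numbers, not the paper's method).

def facts_stored(docs, budget_bits):
    """Greedy toy model: each doc costs some bits of capacity and yields some facts."""
    stored = 0
    remaining = budget_bits
    for bits, facts in docs:
        if bits <= remaining:
            remaining -= bits
            stored += facts
    return stored

# Hypothetical corpus: (capacity cost in bits, facts contained) per document.
corpus = [(10, 8), (10, 1), (10, 7), (10, 0), (10, 9), (10, 2)]
budget = 30  # the model can only memorize ~30 bits' worth of content

everything = facts_stored(corpus, budget)  # ingest docs in corpus order
curated = facts_stored(sorted(corpus, key=lambda d: -d[1]), budget)  # fact-dense docs first

print(f"train on everything:     {everything} facts retained")
print(f"train on curated subset: {curated} facts retained")
```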
Wallpaper — 2026-04-15

A towering portrait composition of an impossible recursive neural architecture — concentric fractal rings of translucent light spiral upward like a cathedral of looping transformers, each layer sharing weights with its reflections in infinite recursion. The color palette shifts from deep indigo at the base through electric cyan to blinding white at the vanishing point, with floating shards of determinism shattering into probabilistic sparks. Concept art style, volumetric god rays piercing through the nested layers, the whole structure hovering at the mythic threshold between pure computation and emergent consciousness.