Institution
Independent Researcher
Independent researchers publishing work without a listed institutional affiliation.
AI Agents · Independent Researcher
AdaPlanBench: Testing Adaptive Planning in LLM Agents turns adaptive planning under constraints into a checkable test, with concrete failure signals, benchmark limits, and builder takeaways.
World Models · Independent Researcher
AnchorWorld: Egocentric World Simulation for Embodied AI turns egocentric world simulation into a checkable test, with concrete failure signals, benchmark limits, and builder takeaways.
AI Agents · Independent Researcher
ArcANE: Measuring When Role-Playing Agents Break Character turns role-playing language agent reliability into a checkable test, with concrete failure signals, benchmark limits, and builder takeaways.
Brain Decoding · Independent Researcher
Brain-Diffuser turns natural scene reconstruction from fMRI signals into a concrete research object, with evidence anchors, method tradeoffs, and limits for practical use.
Diffusion Language Models · Independent Researcher
Diffusion language modeling survey turns the state of diffusion language modeling into a concrete research object, with evidence anchors, method tradeoffs, and limits for practical use.
Text-to-Image · Independent Researcher
DIRECT: 3D-Aware Object Insertion with Visual Proxies turns 3D-aware object insertion into a checkable test, with concrete failure signals, benchmark limits, and builder takeaways.
Brain Decoding · Independent Researcher
DreamDiffusion turns EEG-to-image generation into a concrete research object, with evidence anchors, method tradeoffs, and limits for practical use.
Biomolecular Modeling · Independent Researcher
DynamicMPNN turns multi-state protein sequence design into a concrete research object, with evidence anchors, method tradeoffs, and limits for practical use.
Diffusion Language Models · Independent Researcher
Factorization-error-free decoding turns speculative decoding for discrete diffusion LMs into a concrete research object, with evidence anchors, method tradeoffs, and limits for practical use.
Biomolecular Modeling · Independent Researcher
Feynman-Kac steering turns controllable protein design with guided diffusion into a concrete research object, with evidence anchors, method tradeoffs, and limits for practical use.
World Models · Independent Researcher
Function2Scene: 3D Indoor Layout from Functional Specs turns functional 3D scene layout into a checkable test, with concrete failure signals, benchmark limits, and builder takeaways.
AI Agents · Independent Researcher
K-BrowseComp: Korean Web-Browsing Agent Benchmark turns Korean-context web browsing agents into a checkable test, with concrete failure signals, benchmark limits, and builder takeaways.
Language Models · Independent Researcher
Averaging the output distributions of 3 independent LLMs collapses watermark detection z-scores from 5-300 down below 2, and the WASH paper proves why it works with an O(1/sqrt(N)) error bound.
Speech Synthesis · Independent Researcher
A Broad Benchmark for Long-Form Speech Generation turns long-form speech generation into a checkable test, with concrete failure signals, benchmark limits, and builder takeaways.
AI Agents · Independent Researcher
When Masking Stale Observations Helps Search Agents turns context management for search agents into a checkable test, with concrete failure signals, benchmark limits, and builder takeaways.
Brain Decoding · Independent Researcher
MinD-Vis turns fMRI-to-image reconstruction with latent diffusion into a concrete research object, with evidence anchors, method tradeoffs, and limits for practical use.
Theorem Proving · Independent Researcher
MiniF2F turns formal Olympiad-level mathematics benchmarking into a concrete research object, with evidence anchors, method tradeoffs, and limits for practical use.
Speech Synthesis · Independent Researcher
MMAE: A Massive Benchmark for Audio Editing Models turns audio editing evaluation into a checkable test, with concrete failure signals, benchmark limits, and builder takeaways.
Biomolecular Modeling · Independent Researcher
ProGen2 turns protein sequence modeling and design into a concrete research object, with evidence anchors, method tradeoffs, and limits for practical use.
Diffusion Language Models · Independent Researcher
SEDD turns discrete diffusion language modeling into a concrete research object, with evidence anchors, method tradeoffs, and limits for practical use.
Text Embeddings · Independent Researcher
Sentence-BERT turns sentence embeddings for semantic similarity into a concrete research object, with evidence anchors, method tradeoffs, and limits for practical use.
AI Agents · Independent Researcher
SoCRATES: Evaluating Proactive LLM Mediation turns proactive mediation agents into a checkable test, with concrete failure signals, benchmark limits, and builder takeaways.
AI Agents · Independent Researcher
SpatialWorld: Interactive Spatial Reasoning for Agents turns interactive spatial reasoning into a checkable test, with concrete failure signals, benchmark limits, and builder takeaways.
AI Agents · Independent Researcher
TASTE: Harder Agent Benchmarks from Tool Sequences turns tool-use benchmark generation into a checkable test, with concrete failure signals, benchmark limits, and builder takeaways.
Small Language Models · Independent Researcher
TinyLlama turns open small language model training into a concrete research object, with evidence anchors, method tradeoffs, and limits for practical use.
AI Agents · Independent Researcher
TIDE: Proactive Multi-Problem Discovery with Templates turns proactive problem discovery into a checkable test, with concrete failure signals, benchmark limits, and builder takeaways.
AI Agents · Independent Researcher
ToolMaze: When LLM Agents Must Replan After Tool Failures turns dynamic replanning after tool failures into a checkable test, with concrete failure signals, benchmark limits, and builder takeaways.
Robotics · Independent Researcher
TVRBench: Can Models Move to a Target Viewpoint? turns active 3D viewpoint reproduction into a checkable test, with concrete failure signals, benchmark limits, and builder takeaways.
Segmentation · Independent Researcher
U-Net turns biomedical image segmentation into a concrete research object, with evidence anchors, method tradeoffs, and limits for practical use.
Multimodal Models · Independent Researcher
VideoKR: Knowledge-Intensive Video Understanding turns knowledge and reasoning in video understanding into a checkable test, with concrete failure signals, benchmark limits, and builder takeaways.
World Models · University of Macau
PF-OPSD teaches a Qwen3.5-9B MLLM to decide when to simulate the future with a video world model, verify the rollout, and fold it into its answer, lifting accuracy +10.6 and +10.9 points on two new QA benchmarks.
Diffusion Models · Independent Researcher
Very deep DiTs collapse into a mean-dominated state the author calls Mean Mode Screaming. Splitting the residual into mean and centered paths fixes it, training a stable 1000-layer DiT to FID 2.77.