AntiPaSTO: Self-Supervised Value Steering for Debugging Alignment
Tue, 13 Jan 2026 12:55:19 GMT
Contra Dance as a Model For Post-AI Culture
Tue, 13 Jan 2026 06:50:18 GMT
Making LLM Graders Consistent
Tue, 13 Jan 2026 03:32:09 GMT
Attempting to influence transformer representations via initialization
Tue, 13 Jan 2026 00:49:47 GMT
When does competition lead to recognisable values?
Mon, 12 Jan 2026 23:13:18 GMT
Lies, Damned Lies, and Proofs: Formal Methods are not Slopless
Mon, 12 Jan 2026 22:32:18 GMT
Pro or Average Joe? Do models infer our technical ability and can we control this judgement?
Tue, 13 Jan 2026 05:49:44 GMT
Dating Roundup #10: Gendered Expectations
Mon, 12 Jan 2026 20:30:20 GMT
Automated Interpretability-Driven Model Auditing and Control: A Research Agenda
Mon, 12 Jan 2026 19:55:21 GMT
Tensor-Transformer Variants are Surprisingly Performant
Mon, 12 Jan 2026 19:43:15 GMT