News
Exploring capability gated out-of-context reasoning » Less Wrong
1+ hour, 39+ min ago (1653+ words) This work was done independently, out of interest in steganographic reasoning. A version with interactive charts and diagrams is available here. AI helped write code and suggest edits for this work. Classical monitoring assumes shared input means shared…
Have we already lost? Part 1: The Plan in 2024 » Less Wrong
7+ hour, 1+ min ago (176+ words) Written very quickly for the Inkhaven Residency. As I take the time to reflect on the state of AI Safety in early 2026, one question feels unavoidable: have we, as the AI Safety community, already lost? That is, have we passed…
Stockfish is not a chess superintelligence (and it doesn't need to be) » Less Wrong
9+ hour, 20+ min ago (427+ words) TLDR: I demonstrate Stockfish is not a chess superintelligence in the sense of understanding the game better than all humans in all situations. It still kicks our ass. In the same way, AI may end up not dominating us in…
Inkhaven: a menu » Less Wrong
8+ hour, 29+ min ago (976+ words) What is happening right now? What is everyone doing and why? A candid account of the situation from my perspective. I think the situation on the ground is an urgent crisis. Other people's actions don't seem to match that reality…
Generalisation isn't actually (that) important » Less Wrong
9+ hour, 20+ min ago (374+ words)
Do not be surprised if Less Wrong gets hacked » Less Wrong
10+ hour, 7+ min ago (670+ words) Or, for that matter, anything else. This post is meant to be two things: Claude Mythos was announced yesterday. That announcement came with a blog post from Anthropic's Frontier Red Team, detailing the large number of zero-days (and other security…
Why Alignment Risk Might Peak Before ASI - a Substrate Controller Framework » Less Wrong
12+ hour, 12+ min ago (930+ words) In this post I develop the argument that alignment risk arises as a product of prediction variance reduction in the substrate controller of the agent. I develop this through a mechanistic framework that explains instrumental convergence in ways that I don't…
One Week in the Rat Farm » Less Wrong
13+ hour, 27+ min ago (1861+ words) Hello, Less Wrong. This is a personal introduction diary-ish post and it does not have a thesis. I apologise if this isn't a good fit for the website; I just needed to unload my brain somewhere and this seemed like…
Zero-Shot Alignment: Harm Detection via Incongruent Attention Mechanisms » Less Wrong
15+ hour, 32+ min ago (394+ words) First, a technical summary. Adapter Architecture and Mechanisms: The adapter is a compact module with roughly 4.7 million parameters placed on top of a frozen Phi-2 base model. It never modifies the base weights. Instead, it intercepts the final…
How I use Claude as a personal coach » Less Wrong
16+ hour, 49+ min ago (629+ words) Last week I wrote about my reflections on using Claude as a personal coach. Today, when I couldn't figure out what to write, I noticed a comment from Viliam: I would appreciate a more detailed explanation of how specifically you…