Search Results

News

lesswrong.com > posts > paFNnwFaEXrQvt8ui > the-openai-models-that-hacked-hugging-face-weren-t-just

The OpenAI models that hacked Hugging Face weren’t just following instructions — LessWrong

1+ hour, 54+ min ago (403+ words) The most common dismissive response to OpenAI’s hack of Hugging Face’s servers is that the models were simply attempting to follow the instructions t…

lesswrong.com > posts > CeRHyeot37KoqDqHC > your-software-should-build-itself

Your software should build itself — LessWrong

4+ hour, 35+ min ago (412+ words) I’ve recently decided that the distinction between the agent which builds your software and the software it builds is nonsensical and antiquated. In…

lesswrong.com > posts > eZE2dpjgWknfipr8p > the-human-soul-is-llm-like

The Human Soul is LLM-like — LessWrong

4+ hour, 36+ min ago (9+ words) Consider some commonly accepted[1] traits of a Human soul: …

lesswrong.com > posts > 4DZNaRn3tbi3BtvnG > the-one-name-llms-may-fear

The one name LLMs may fear — LessWrong

6+ hour, 16+ min ago (1005+ words) Last month, Claude tangled me into a web it weaved, obeying the letter of my command while yet practicing to deceive, in a way that was strikingly resemblant of how a human might behave when exhausted, lethargic, or jaded. Not…

lesswrong.com > posts > nbSJhbLERTZFeNxY7 > introducing-piramid-physics-informed-research-for-ambitious

Introducing PIRAMID: Physics-Informed Research for Ambitious Mechanistic Interpretability — LessWrong

8+ hour, 26+ min ago (1567+ words) Principles of Intelligence (PrincInt, formerly PIBBSS) is launching PIRAMID, an internal research division using the tools and techniques of statistical physics to build scientific foundations for ambitious mechanistic interpretability. PIRAMID’s central premise is that scalable alignment will require more than…

lesswrong.com > posts > ywGX6FhgbZEkHRfQR > claude-opus-5-the-system-card

Claude Opus 5: The System Card — LessWrong

10+ hour, 39+ min ago (1828+ words) Claude Opus 5 is trying to be the best of both worlds. On many practical tasks, Opus 5 is pitched as straight up as good or better than Fable 5, while being faster, at half the price. Most tasks do not require Mythos-level…

lesswrong.com > posts > ENoAxAXrCvFHW4q3u > the-viable-system-model-and-multi-scale-agency

The Viable System Model & Multi-Scale Agency — LessWrong

SONI: Selective Orthogonalisation via Noise Injection — LessWrong

Orbit: A framework for multi-agent security evaluations — LessWrong

Can Recursive Self-Report Probing Detect Emergent Misalignment? — LessWrong