News
NVIDIA's Nemotron 3 Ultra Tops US Open-Weight AI on Brutal New Job Benchmark
11+ hour, 10+ min ago (183+ words) Most AI benchmarks test a model's ability to answer a question. AA-Briefcase tests whether a model can hold down a job. Artificial Analysis just launched the benchmark, and it is one of the most demanding evaluations of real-world agentic capability…...
NVIDIA Shrinks GLM-5.2 Memory by 1.8x With NVFP4 Without Losing Accuracy
23+ hour, 10+ min ago (232+ words) GLM-5.2-NVFP4 is now ready to serve in vLLM. NVIDIA just dropped the official NVFP4 checkpoint of Z.ai's GLM-5.2, the 744B-parameter MoE model built for long-horizon coding and agentic tasks, and it's already deployable with a single vllm serve command. The…...
Anthropic's Claude Mythos 5 Returns After US Government Lifts Infrastructure Ban
15+ hour, 13+ min ago (398+ words) Two weeks ago, Anthropic's most powerful AI models were pulled from the internet by a government order. Today, the standoff is partially over. Anthropic announced that the US…...
Sakana AI's Coffee Bench Catches Claude Haiku 4.5 Going Bankrupt Over 90 Days
1+ day, 14+ hour ago (296+ words) Most AI benchmarks are sprints. A model reads a prompt, generates an answer, and gets scored. Coffee Bench, a new benchmark from Sakana AI and KPMG Japan's…...
Hugging Face Absorbs llama.cpp Team to Make Local AI Effortless
1+ day, 22+ hour ago (192+ words) Hugging Face just hosted a live broadcast titled "Open Source AI: Run Your Own Models Locally", a signal that the company is doubling down on local inference as a first-class feature of the Hub. The timing is not accidental. Earlier…...
Cognition | AI Companies
2+ day, 4+ hour ago (52+ words) Applied AI lab building Devin, an autonomous software engineering agent. Devin runs agentic loops in a sandboxed shell, editor, and browser environment to plan, write, debug, and deploy code end-to-end. Trained using LLMs combined with reinforcement learning, with fine-tuning support…...
NVIDIA AI Developer | AI Companies
2+ day, 4+ hour ago (53+ words) NVIDIA's developer platform for AI/ML practitioners. Covers GPU-accelerated training and inference on Hopper and Blackwell architectures, TensorRT-LLM, Dynamo distributed inference, NeMo for LLM pre/post-training, NIM microservices, CUDA-X libraries, and open model families including Nemotron, Cosmos, and…...
Kimi Developers | AI Companies
2+ day, 4+ hour ago (55+ words) Developer-facing platform for Moonshot AI's Kimi model family, competing in the frontier LLM and agentic coding space. The flagship open-weight Kimi K2 is a 1T-parameter MoE model with 32B active parameters, trained using the Muon optimizer on 15.5T tokens, with…...
PyTorch | AI Companies
2+ day, 4+ hour ago (62+ words) PyTorch is an open-source deep learning framework, originally developed by Meta, that enables researchers and developers to build and train neural networks in Python with GPU-accelerated tensor computation and dynamic computation graphs. Now stewarded by the PyTorch Foundation…...
LlamaIndex | AI Companies
2+ day, 4+ hour ago (53+ words) Open-source data framework for building RAG pipelines and AI agents over enterprise documents. Best known for LlamaParse, a VLM-powered document parser that correctly extracts tables, charts, and nested structures from PDFs where traditional OCR fails, producing clean, LLM-ready output....