Results for: "Inference"
Keyword search: 9 results
Taalas ASIC Chip: Llama 3.1 Inference at 17,000 Tokens/Second
THE GIST: Taalas' ASIC chip runs Llama 3.1 at 17,000 tokens/second, claiming 10x cost and energy efficiency over GPUs by hardwiring model weights.
InferShield: Open-Source Security Proxy for LLM Inference
THE GIST: InferShield is an open-source security proxy for LLM inference, providing real-time threat detection, policy enforcement, and audit trails without code changes.
Taalas Encodes AI Models onto Transistors for Inference Boost
THE GIST: Startup Taalas encodes AI inference weights directly into transistors, eliminating software overhead and boosting performance.
Reface and Prisma Founders Develop On-Device AI Inference with Mirai
THE GIST: Mirai, founded by Reface and Prisma co-founders, aims to improve on-device AI model inference.
OpenAI Partners with Tata for 100MW AI Data Center in India
THE GIST: OpenAI partners with Tata Group to establish a 100MW AI-ready data center in India, with plans to scale to 1GW.
PicoLM: Run a 1B Parameter LLM on a $10 Board
THE GIST: PicoLM enables running a 1-billion parameter LLM on a $10 board with minimal resources and no internet.
AgenticMemory: A Binary Graph Format for AI Agent Memory
THE GIST: AgenticMemory is a binary graph format enabling AI agents to store and retrieve cognitive events with sub-millisecond query speeds.
Understanding LLM Serving: Prefill, Decode, and Goodput
THE GIST: DistServe optimizes LLM serving by maximizing 'goodput'—the request rate that meets latency SLOs—considering prefill and decode phases.
NVIDIA Run:ai Enables Massive Token Throughput via GPU Fractioning
THE GIST: NVIDIA Run:ai, with Nebius AI Cloud, dramatically increases LLM inference capacity through dynamic GPU fractioning, achieving near-linear throughput scaling and improved resource utilization.