Blogs

InfiniBand vs RoCE for AI Inference Workloads: Performance, Cost, and Scalability

Compare InfiniBand and RoCE v2 for AI inference workloads. Learn how each network option affects latency, cost, and scalability so you can choose the right infrastructure for your cluster.
Jason
Apr 29, 2026
DAC vs AOC vs Optical Transceiver: Which Interconnect Should You Choose for AI Inference Workloads?

Explore the differences between DAC, AOC, and optical transceivers in AI inference workloads. Learn how to choose the right interconnect solution based on cost, latency, scalability, and deployment scenarios.
Jason
Apr 29, 2026
What Are AI Inference Workloads? Why AI Inference Workloads Are Growing Rapidly

This article systematically analyzes the definition, types, and working principles of AI inference workloads, outlines the factors driving their rapid growth, compares training with inference, and explains how inference shapes performance, cost, and scalability, along with the hardware foundation that supports it.
Gavin
Apr 29, 2026
Optimizing AI Inference Workloads: Reducing Latency, Boosting Throughput, and Cutting Costs

Optimizing AI inference workloads requires more than compute power. Learn how to reduce latency, boost throughput, and control costs through high-performance networking, InfiniBand vs. RoCE selection, and efficient interconnect design.
Jason
Apr 24, 2026
Top 5 Challenges in Large-Scale AI Inference Workloads

The large-scale deployment of AI inference workloads faces key challenges such as memory bottlenecks, latency constraints, and high-concurrency communication. This article focuses on the critical bottlenecks in inference systems and provides an in-depth analysis of key technical directions, including memory optimization, low-latency interconnects, and novel hardware architectures.
Quinn
Apr 24, 2026
Analysis of Prefix Caching in Large Language Model Inference

Learn how prefix caching optimizes LLM inference by reusing KV cache states across requests. Explore its working principles, key differences from standard KV caching, and real-world applications including multi-turn chat, RAG, and few-shot learning.
Jason
Apr 3, 2026
Training vs Inference: Why Your AI Network Architecture Needs to Be Different

AI training and inference have fundamentally different network requirements. Learn how the shift from training to inference workloads is driving the rise of RoCE—and how NADDOD's RoCEv2 solutions deliver the performance, cost efficiency, and scalability your AI infrastructure needs.
Jason
Apr 3, 2026
NVIDIA DGX Rubin NVL8 Technical Analysis: AI Training and Inference Accelerator

Learn how NVIDIA DGX Rubin NVL8 enables scalable AI training and inference with Rubin GPUs, NVLink 6.0, high-bandwidth memory, and optimized system architecture.
Jason
Apr 2, 2026
In-Depth Analysis of OCS: Optical-Layer Direct-Connect Switching Technology

An in-depth analysis of OCS (Optical Circuit Switching) in AI training and high-performance computing (HPC) data centers, covering its optical-layer direct-connect architecture, its low-latency and high-bandwidth advantages, and its potential and limitations in complementing traditional electrical switching networks and optimizing large-scale collective communication.
Neo
Mar 27, 2026