Terminology

The following terminology is often used when working with AI Hypercomputer.

Block
A collection of sub-blocks that are interconnected with non-blocking fabric, which provides high-bandwidth connectivity between all hosts.

Cluster
A collection of blocks interconnected by a high-speed network fabric. Each cluster is globally unique. For A4X, A4, and A3 Ultra machines, a cluster provides a common, non-blocking network fabric for your blocks of accelerator capacity. Within a cluster, east-west networking is non-blocking across the entire collection of blocks.

Dense deployment
A resource request that allocates your accelerator resources physically close to each other to minimize network hops and optimize for the lowest latency.

Network fabric
A network fabric provides high-bandwidth, low-latency connectivity across all blocks and Google Cloud services in a cluster. Jupiter is Google's data center network architecture that uses software-defined networking and optical circuit switches to evolve the network and optimize its performance.

Node or host
A single physical server machine in the data center. Each host has associated compute resources, such as accelerators. The number and configuration of these compute resources depend on the machine family. Compute Engine instances are provisioned on top of a physical host.

NVLink domain
An NVLink domain, also referred to as a sub-block, is the core unit of capacity for A4X Max and A4X machines. An NVLink domain consists of 18 A4X Max or A4X instances (72 GPUs) that are connected by a multi-node NVLink system.

Sub-block
A group of hosts and associated connectivity hardware that are on a single physical rack. In the context of A4X Max and A4X machines, a sub-block is also referred to as an NVLink domain.
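
The containment relationship among these terms (a cluster contains blocks, a block contains sub-blocks, a sub-block contains hosts) can be sketched as a small data model. This is an illustrative sketch only, not a Google Cloud API: all class and function names are hypothetical, the figure of 4 GPUs per instance is derived from the 18 instances and 72 GPUs stated for an NVLink domain, and the model assumes one instance per host for simplicity.

```python
from dataclasses import dataclass, field

# Figures stated in the NVLink domain definition (A4X Max and A4X machines).
INSTANCES_PER_NVLINK_DOMAIN = 18
GPUS_PER_NVLINK_DOMAIN = 72
# Derived value, not stated directly in the text: 72 GPUs / 18 instances.
GPUS_PER_INSTANCE = GPUS_PER_NVLINK_DOMAIN // INSTANCES_PER_NVLINK_DOMAIN


@dataclass
class Host:
    """A single physical server (assumption: one instance per host)."""
    name: str
    gpus: int = GPUS_PER_INSTANCE


@dataclass
class SubBlock:
    """Hosts on a single physical rack; an NVLink domain on A4X machines."""
    name: str
    hosts: list = field(default_factory=list)

    def gpu_count(self) -> int:
        return sum(host.gpus for host in self.hosts)


@dataclass
class Block:
    """Sub-blocks interconnected with a non-blocking fabric."""
    name: str
    sub_blocks: list = field(default_factory=list)

    def gpu_count(self) -> int:
        return sum(sb.gpu_count() for sb in self.sub_blocks)


@dataclass
class Cluster:
    """Blocks interconnected by a high-speed network fabric."""
    name: str
    blocks: list = field(default_factory=list)

    def gpu_count(self) -> int:
        return sum(block.gpu_count() for block in self.blocks)


def make_nvlink_domain(name: str) -> SubBlock:
    """Build a hypothetical sub-block with the stated 18 instances."""
    hosts = [Host(f"{name}-host-{i}") for i in range(INSTANCES_PER_NVLINK_DOMAIN)]
    return SubBlock(name, hosts)


if __name__ == "__main__":
    domain = make_nvlink_domain("sb0")
    print(len(domain.hosts), domain.gpu_count())
```

The number of sub-blocks per block and blocks per cluster is not given in these definitions, so the model leaves those lists open rather than fixing a size.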

More information

The following documents provide further explanations of the terms relevant to the corresponding topics: