
Collectives and Synchronization

Collectives are often the throughput bottleneck at scale. Document both the algorithm and the topology it targets.

Common patterns

  • Tree-based broadcast
  • Ring-based allreduce
  • Dissemination barriers
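
The three patterns above can be sketched as single-process simulations of their communication schedules. This is a minimal illustration, not a real MPI or NCCL implementation: the function names are hypothetical, the broadcast is assumed to use a binomial tree, and "messages" are plain list copies between simulated ranks.

```python
def binomial_broadcast_rounds(p, root=0):
    """Tree-based broadcast: rounds of (src, dst) sends for p ranks."""
    rounds, step = [], 1
    while step < p:
        # Every rank that already has the data (relative ranks < step)
        # forwards it to the rank `step` positions away.
        sends = [((rel + root) % p, (rel + step + root) % p)
                 for rel in range(step) if rel + step < p]
        rounds.append(sends)
        step *= 2
    return rounds


def ring_allreduce(vectors):
    """Ring-based allreduce (sum): reduce-scatter, then allgather."""
    p, n = len(vectors), len(vectors[0])
    bounds = [(i * n // p, (i + 1) * n // p) for i in range(p)]
    data = [list(v) for v in vectors]

    # Reduce-scatter: in step s, rank r sends chunk (r - s) % p to rank r + 1,
    # which adds it into its own copy of that chunk.
    for s in range(p - 1):
        msgs = []
        for r in range(p):
            lo, hi = bounds[(r - s) % p]
            msgs.append((lo, data[r][lo:hi]))
        for r, (lo, payload) in enumerate(msgs):
            dst = (r + 1) % p
            for i, x in enumerate(payload):
                data[dst][lo + i] += x

    # Allgather: rank r now owns the fully reduced chunk (r + 1) % p and
    # passes completed chunks around the ring, overwriting stale copies.
    for s in range(p - 1):
        msgs = []
        for r in range(p):
            lo, hi = bounds[(r + 1 - s) % p]
            msgs.append((lo, hi, data[r][lo:hi]))
        for r, (lo, hi, payload) in enumerate(msgs):
            data[(r + 1) % p][lo:hi] = payload
    return data


def dissemination_barrier_rounds(p):
    """Dissemination barrier: in round k, rank r signals rank (r + 2**k) % p."""
    rounds, dist = [], 1
    while dist < p:
        rounds.append([(r, (r + dist) % p) for r in range(p)])
        dist *= 2
    return rounds


def barrier_knowledge(p):
    """Track which ranks each rank knows have arrived after all rounds."""
    known = [{r} for r in range(p)]
    for sends in dissemination_barrier_rounds(p):
        snapshot = [set(k) for k in known]  # all sends in a round are concurrent
        for src, dst in sends:
            known[dst] |= snapshot[src]
    return known
```

The tree broadcast and the dissemination barrier both finish in ceil(log2(p)) rounds, while the ring allreduce takes 2(p - 1) steps but moves only one chunk per link per step. For example, `ring_allreduce([[1, 2], [3, 4]])` leaves every simulated rank holding `[4, 6]`.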

Scaling questions

  • How does performance change with placement across racks?
  • Where does congestion or oversubscription appear?
  • Which topology assumptions are baked into the algorithm?
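
One way to start on the placement and oversubscription questions is to count how much ring traffic crosses rack boundaries for a given rank-to-rack mapping. The sketch below is an assumption-laden model, not a measurement: `rack_of_rank` is a hypothetical list mapping each rank to a rack ID, the ring is assumed to follow rank order, and each ring link is assumed to carry 2(p - 1)/p times the buffer size over a full ring allreduce.

```python
def cross_rack_hops(rack_of_rank):
    """Count ring edges (r -> r + 1, wrapping) whose endpoints sit in different racks."""
    p = len(rack_of_rank)
    return sum(1 for r in range(p)
               if rack_of_rank[r] != rack_of_rank[(r + 1) % p])


def ring_bytes_per_link(p, nbytes):
    """Bytes each ring link carries during one ring allreduce of nbytes.

    Reduce-scatter and allgather each send (p - 1) chunks of nbytes / p.
    """
    return 2 * (p - 1) * nbytes / p


def cross_rack_bytes(rack_of_rank, nbytes):
    """Total bytes crossing rack boundaries for one ring allreduce."""
    p = len(rack_of_rank)
    return cross_rack_hops(rack_of_rank) * ring_bytes_per_link(p, nbytes)


# Two hypothetical placements of 8 ranks across 2 racks.
contiguous = [0, 0, 0, 0, 1, 1, 1, 1]   # ring crosses racks twice
interleaved = [0, 1, 0, 1, 0, 1, 0, 1]  # every ring edge crosses racks
```

Under these assumptions the interleaved placement pushes four times as much traffic through the inter-rack links as the contiguous one, which is where oversubscription would be expected to show up first. The topology assumption baked into the ring here is that neighboring ranks are cheap to reach; the model says nothing about latency or link speeds.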