Collectives and Synchronization
Collectives are often the throughput bottleneck at scale. For each collective, document both the algorithm it uses and the network topology that algorithm targets.
Common patterns
- Tree-based broadcast
- Ring-based allreduce
- Dissemination barriers
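The three patterns above can be sketched as small single-process simulations. This is a minimal illustration, not GASNet code: the function names (binomial_broadcast_schedule, ring_allreduce, dissemination_rounds) are hypothetical, the tree is assumed to be binomial, and the allreduce is assumed to be a sum computed as reduce-scatter followed by allgather.

```python
def binomial_broadcast_schedule(n, root=0):
    """Per-round (sender, receiver) pairs for a binomial-tree broadcast."""
    rounds = []
    have = 1  # ranks (numbered relative to root) that already hold the data
    while have < n:
        pairs = []
        for rel in range(have):
            dst = rel + have
            if dst < n:
                pairs.append(((rel + root) % n, (dst + root) % n))
        rounds.append(pairs)
        have *= 2  # the set of holders doubles each round
    return rounds


def ring_allreduce(vectors):
    """Simulate a sum-allreduce on a ring; vectors[r] is rank r's input."""
    n = len(vectors)
    length = len(vectors[0])
    # Chunk c covers indices bounds[c]:bounds[c + 1].
    bounds = [c * length // n for c in range(n + 1)]
    data = [list(v) for v in vectors]

    # Reduce-scatter: in step s, rank r sends chunk (r - s) % n to rank r + 1,
    # which adds it into its own copy.
    for s in range(n - 1):
        sends = [(r, (r - s) % n) for r in range(n)]
        payloads = [data[r][bounds[c]:bounds[c + 1]] for r, c in sends]
        for (r, c), payload in zip(sends, payloads):
            dst = (r + 1) % n
            for i, x in enumerate(payload):
                data[dst][bounds[c] + i] += x

    # Rank r now holds the fully reduced chunk (r + 1) % n.
    # Allgather: in step s, rank r forwards chunk (r + 1 - s) % n to rank r + 1.
    for s in range(n - 1):
        sends = [(r, (r + 1 - s) % n) for r in range(n)]
        payloads = [data[r][bounds[c]:bounds[c + 1]] for r, c in sends]
        for (r, c), payload in zip(sends, payloads):
            dst = (r + 1) % n
            data[dst][bounds[c]:bounds[c + 1]] = payload
    return data


def dissemination_rounds(n):
    """Per-round (signaller, waiter) pairs for a dissemination barrier."""
    rounds = []
    dist = 1
    while dist < n:
        # Every rank signals the rank `dist` positions ahead, wrapping around.
        rounds.append([(r, (r + dist) % n) for r in range(n)])
        dist *= 2
    return rounds
```

In this sketch the broadcast and the barrier finish in ceil(log2(n)) rounds, while the ring allreduce takes 2(n - 1) steps and only ever communicates between ring neighbours. That difference is why the ring is sensitive to how consecutive ranks are placed on the network.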
Broadcast examples
Broadcast across bindings
- C: gasnet_coll_broadcast(team, dest, root, src, nbytes, flags);
- Python: gasnet.collectives.broadcast(team, dest, root, src, nbytes)
- Rust: gasnet::collectives::broadcast(team, dest, root, src, nbytes);
Scaling questions
- How does performance change with placement across racks?
- Where does congestion or oversubscription appear?
- Which topology assumptions are baked into the algorithm?
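One way to start answering the placement question is to count how many of an algorithm's communication edges cross a rack boundary. The sketch below does this for a ring, where rank r talks to rank (r + 1) mod n. The helper names and the two placement policies (block and round-robin) are illustrative assumptions, not part of any GASNet API, and the count is only a proxy for congestion, not a performance prediction.

```python
def cross_rack_links(rank_to_rack):
    """Count ring edges (r -> r + 1 mod n) whose endpoints are in different racks."""
    n = len(rank_to_rack)
    return sum(
        1 for r in range(n)
        if rank_to_rack[r] != rank_to_rack[(r + 1) % n]
    )


def block_placement(n, racks):
    """Fill rack 0 with consecutive ranks, then rack 1, and so on."""
    per_rack = -(-n // racks)  # ceiling division
    return [r // per_rack for r in range(n)]


def round_robin_placement(n, racks):
    """Deal consecutive ranks out across racks one at a time."""
    return [r % racks for r in range(n)]


if __name__ == "__main__":
    n, racks = 16, 4
    print("block:", cross_rack_links(block_placement(n, racks)))
    print("round-robin:", cross_rack_links(round_robin_placement(n, racks)))
```

With 16 ranks on 4 racks, block placement leaves 4 ring edges crossing racks, while round-robin placement makes all 16 cross. The ring therefore bakes in the assumption that consecutive ranks are close together on the network.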