Topic: performance
Distribution can hurt: network bandwidth and latency bottlenecks
Lots of tricks, e.g. caching, concurrency, pre-fetch
Distribution can help: parallelism, pick server near client
Idea: scalable design
Nx servers -> Nx total performance
Need a way to divide the load by N
Divide data over many servers ("sharding" or "partitioning")
By hash of file name?
By user?
Move files around dynamically to even out load?
"Stripe" each file's blocks over the servers?
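Two of the options above (hash of file name, striping blocks) can be sketched in a few lines. This is only an illustration under assumptions: the names `shard_for` and `stripe_blocks`, the choice of SHA-256, and the round-robin block layout are hypothetical and not taken from any particular system.

```python
import hashlib
from collections import Counter


def shard_for(file_name: str, num_servers: int) -> int:
    """Pick a server by hashing the file name (hypothetical helper).

    A cryptographic hash is used only because it is stable across runs;
    Python's built-in hash() is randomized per process for strings.
    """
    digest = hashlib.sha256(file_name.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_servers


def stripe_blocks(num_blocks: int, num_servers: int) -> list:
    """Assign block i of one file to server i mod N (round-robin striping)."""
    return [block % num_servers for block in range(num_blocks)]


# Hashing many distinct names spreads files roughly evenly over 4 servers.
names = [f"file{i}.txt" for i in range(10000)]
files_per_server = Counter(shard_for(name, 4) for name in names)

# Striping spreads the blocks of a single file over all servers.
layout = stripe_blocks(6, 3)  # [0, 1, 2, 0, 1, 2]
```

Note that `hash % N` moves most files when N changes; schemes that limit this movement are outside the scope of this sketch.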
Performance scaling is rarely perfect
Some operations are global and hit all servers (e.g. search)
Load imbalance
Everyone wants to get at a single popular file
-> one server at 100% load, added servers mostly idle
Question: why doesn't search performance improve with N machines?
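The two scaling limits above can be made concrete by counting requests per server. The sketch below is a toy model, not a measurement: `per_server_requests`, the hash-by-name placement, and the 900/100 workload split are all assumptions chosen for illustration. A point read goes to one server; a global search is assumed to be sent to every server.

```python
import hashlib


def shard_for(file_name: str, num_servers: int) -> int:
    """Hypothetical placement: hash the file name, take it mod N."""
    digest = hashlib.sha256(file_name.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_servers


def per_server_requests(reads, num_searches: int, num_servers: int) -> dict:
    """Count how many requests each server must handle.

    reads:        list of file names, one entry per read request
    num_searches: number of global searches; each one hits all servers
    """
    load = {server: 0 for server in range(num_servers)}
    for name in reads:
        load[shard_for(name, num_servers)] += 1
    for server in load:
        load[server] += num_searches
    return load


# Hot file: 900 of 1000 reads target one name, so one server gets >= 900.
reads = ["hot.mp4"] * 900 + [f"file{i}" for i in range(100)]
hot_load = per_server_requests(reads, 0, 4)

# Global search: every server sees every query, with 4 servers or with 8.
search_load_4 = per_server_requests([], 50, 4)
search_load_8 = per_server_requests([], 50, 8)
```

In this model, going from 4 to 8 servers leaves each server handling all 50 searches, so the number of searches per second the system can accept does not grow with N. Each server may scan less data per search if the data is partitioned, which can shorten a single search, but that is a separate effect from throughput scaling.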