Fululu

This blog has not been updated since 2024-09-02. For newer content, please go to fuguirong.de.

BD9 Performance at large scale

System Big Data

Learning Notes

Publish Date: 2019-01-25

Measurements

Prefixes:

scale out
scale up: more memory, disk, CPU, network … easy, but a last resort
code:
- look for large loops
- avoid exception catching
- avoid polymorphism
- avoid virtual functions
- go low level if needed
size of chunks: smaller chunks spread work more evenly across machines (better liquidity), but chunks that are too small add per-chunk overhead and hurt latency.
storage format: textual (syntax) vs. binary format
network usage:
- keep shuffling to a minimum
- push down projection and selection as close as possible to the source.
architecture
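
To make the "push down projection and selection" item concrete, here is a minimal sketch in plain Python. The `scan` function, the `SOURCE` table, and the "count of field values shipped" cost measure are all hypothetical stand-ins for a real storage layer and network; the point is only that filtering rows and dropping columns at the source leaves less data to move.

```python
def scan(rows, columns=None, predicate=None):
    """Read rows at the source, optionally applying selection and projection there."""
    for row in rows:
        if predicate is not None and not predicate(row):
            continue  # selection: drop the row before it leaves the source
        # projection: keep only the requested columns
        yield row if columns is None else {c: row[c] for c in columns}


def shipped_values(rows):
    """Count field values that would cross the network (a crude cost proxy)."""
    return sum(len(r) for r in rows)


# Hypothetical source table: 1000 rows, 4 columns, one of them bulky.
SOURCE = [
    {"id": i, "country": "DE" if i % 4 == 0 else "FR",
     "payload": "x" * 100, "amount": i}
    for i in range(1000)
]


def is_german(row):
    return row["country"] == "DE"


# Late filtering: ship every row and column, then filter at the consumer.
late_shipped = list(scan(SOURCE))
late_result = [{"id": r["id"], "amount": r["amount"]}
               for r in late_shipped if is_german(r)]

# Pushed down: the source applies the predicate and the column list itself.
early_shipped = list(scan(SOURCE, columns=["id", "amount"], predicate=is_german))
early_result = early_shipped
```

Both variants produce the same result; they differ only in how much data leaves the source, which is what `shipped_values` compares.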

Sources of latency variability: shared resources, queues, garbage collection, energy management…

Duplicate tasks: run the same task on several machines; the first copy to finish wins.

Once a task has been running longer than the 95th-percentile latency, launch a duplicate of it.
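
The 95th-percentile rule can be sketched with `asyncio`. This is only an illustration under assumptions: `hedged_call`, `p95`, and the `make_request` callable are hypothetical names, and a real system would send the duplicate to a different replica rather than call the same function again.

```python
import asyncio
import statistics


def p95(latencies):
    """95th percentile of observed latencies (needs at least two samples)."""
    # quantiles(n=20) returns 19 cut points; the last one is the 95% cut.
    return statistics.quantiles(latencies, n=20)[-1]


async def hedged_call(make_request, hedge_after):
    """Run make_request(); if it is still running after hedge_after seconds,
    launch one duplicate and return the result of whichever finishes first."""
    primary = asyncio.create_task(make_request())
    done, _ = await asyncio.wait({primary}, timeout=hedge_after)
    if done:
        return primary.result()  # fast path: no duplicate was needed

    # The primary is slower than the threshold: start a duplicate.
    backup = asyncio.create_task(make_request())
    done, pending = await asyncio.wait(
        {primary, backup}, return_when=asyncio.FIRST_COMPLETED
    )
    for task in pending:
        task.cancel()  # first done wins; drop the slower copy
    return next(iter(done)).result()
```

In use, `hedge_after` would be something like `p95(recent_latencies)`, so that only about 5% of requests ever trigger a duplicate and the extra load stays small.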

Basic idea of request cancellation: a task is duplicated and waits in several queues. As soon as one copy starts executing, the other copies are cancelled.
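
A minimal sketch of this cancellation scheme follows. `TiedTask`, `submit`, and `worker_step` are hypothetical names, and the shared claim flag stands in for the cancellation message that servers would exchange in a real deployment.

```python
import threading
from collections import deque


class TiedTask:
    """A task enqueued on several queues; only the first worker to start it runs it."""

    def __init__(self, name, fn):
        self.name = name
        self.fn = fn
        self._lock = threading.Lock()
        self._claimed_by = None

    def try_claim(self, worker):
        """Return True for exactly one caller; later callers see the task as cancelled."""
        with self._lock:
            if self._claimed_by is None:
                self._claimed_by = worker
                return True
            return False


def submit(task, queues):
    """Place one copy of the task in every queue."""
    for q in queues:
        q.append(task)


def worker_step(worker, queue):
    """Pop tasks until one can be claimed, run it, and return its result.
    Copies already started elsewhere are skipped. Returns None if the queue
    is drained without finding a runnable task."""
    while queue:
        task = queue.popleft()
        if task.try_claim(worker):
            return task.fn()
    return None
```

For example, after `submit(task, [q1, q2])`, a call to `worker_step("a", q1)` runs the task, and a later `worker_step("b", q2)` skips the stale copy instead of running it again.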