Original issue 223 created by alghak on 2014-01-14T10:00:19.000Z:
allowed_seeks is assigned file_size / 16KB, which is only appropriate under some specific circumstances.
Here are the assumptions in the code:
// We arrange to automatically compact this file after
// a certain number of seeks. Let's assume:
//   (1) One seek costs 10ms
//   (2) Writing or reading 1MB costs 10ms (100MB/s)
//   (3) A compaction of 1MB does 25MB of IO:
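To make the arithmetic behind these assumptions explicit, here is a rough sketch of the break-even point they imply. The constant and function names are mine, for illustration only, and are not taken from the leveldb source.

```cpp
// Cost model from the quoted comment. Names are illustrative only.
constexpr double kSeekCostMs = 10.0;           // (1) one seek costs 10ms
constexpr double kIoCostMsPerMB = 10.0;        // (2) 1MB of IO costs 10ms (100MB/s)
constexpr double kCompactionIoMBPerMB = 25.0;  // (3) compacting 1MB does 25MB of IO

// Number of seeks that cost as much as compacting 1MB of data.
constexpr double BreakEvenSeeksPerMB() {
  return kCompactionIoMBPerMB * kIoCostMsPerMB / kSeekCostMs;
}

// Bytes of compaction that cost as much as a single seek.
constexpr double BreakEvenBytesPerSeek() {
  return 1048576.0 / BreakEvenSeeksPerMB();
}
```

Under these numbers, 25 seeks cost as much as compacting 1MB, so one seek is worth roughly 40KB of compaction. The file_size / 16KB rule allows more seeks than that break-even before compacting, and the result still depends entirely on the 10ms seek cost in assumption (1).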
About assumption (1):
A Get operation that consults several files does not necessarily hit the disk when it seeks the first file. Very likely the bloom filter tells us that the data we want is not in that file, and the filter data itself is likely in RAM as long as the table cache is big enough. On the other hand, even if the bloom filter returns a false positive, the file data is likely in the leveldb block cache or the system page cache. So a seek does not necessarily cost 10ms.
In some read-heavy workloads, people simply disable read-triggered compaction, although this may degrade read performance.
My suggestion: how about a tunable allowed_seeks value that can be set for the specific circumstance, based on measurement?
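As a minimal sketch of what such a tunable could look like: the struct `SeekCompactionOptions`, its fields, and `ComputeAllowedSeeks` below are all hypothetical and do not exist in leveldb. The default of 16384 bytes per seek reproduces the file_size / 16KB rule, and the floor of 100 is an assumed lower bound so that small files are not compacted after only a handful of seeks.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical knobs; leveldb::Options has no such fields.
struct SeekCompactionOptions {
  // Bytes of file size that "pay for" one seek. 16384 matches file_size / 16KB.
  uint64_t bytes_per_allowed_seek = 16384;
  // Assumed floor on the seek budget for small files.
  int min_allowed_seeks = 100;
};

// Seek budget for a file of the given size under the given options.
int ComputeAllowedSeeks(uint64_t file_size, const SeekCompactionOptions& opts) {
  assert(opts.bytes_per_allowed_seek > 0);
  const uint64_t seeks = file_size / opts.bytes_per_allowed_seek;
  if (seeks < static_cast<uint64_t>(opts.min_allowed_seeks)) {
    return opts.min_allowed_seeks;
  }
  const uint64_t max_int =
      static_cast<uint64_t>(std::numeric_limits<int>::max());
  // Clamp so that very large files cannot overflow the int result.
  return static_cast<int>(seeks < max_int ? seeks : max_int);
}
```

With the defaults, a 2MB file gets 128 allowed seeks. If measurement shows that most seeks are served from cache, lowering bytes_per_allowed_seek (say to 4096) raises the budget to 512 and delays read-triggered compaction.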
Thanks in advance.