LucidLink uses a block size which can be set upon initialization. If it is not set, the default block size is 256K. The block size cannot be changed afterwards. Because LucidLink splits each stored file into individual blocks, with at least one block per file, this could mean object sizes of anywhere from 1-257K on the back-end object storage.
Different workloads have different requirements and may require a different block size. For data sets with small files and frequently accessed data, it may make sense to use the default block size. For data sets with large files and less frequently accessed data, a 1MB block size makes a lot more sense.
Choosing a larger block size results in fewer, but larger, objects on object storage. This can reduce garbage collection overhead, and may help alleviate issues where an object storage provider can only operate reliably up to a certain total number of objects in a storage bucket.
Remember that files smaller than the block size will be stored as their own individual objects. Small objects are not write-coalesced, as this would reduce garbage collection efficiency.
This means you need to have some level of awareness of the type of file data, and the application access patterns involved, before choosing your block size.
Average File Size | Recommended Block Size | Objects per File | Object Size |
---|---|---|---|
4 KB | 256 KB | 1 | 4 KB |
4 MB | 256 KB | 16 | 256 KB |
4 GB | 1 MB | 4096 | 1 MB |
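The figures in the table can be reproduced with a short calculation. The following is a minimal sketch, assuming a simplified model in which a file is cut into fixed-size blocks and each block becomes one object; it ignores any metadata, compression, or encryption overhead. The function name `objects_for_file` is hypothetical and is not part of LucidLink.

```python
KB = 1024
MB = 1024 * KB
GB = 1024 * MB


def objects_for_file(file_size, block_size):
    """Return (object_count, largest_object_size) in bytes for one file.

    Simplified model (an assumption, not LucidLink's actual layout):
    one object per block, and at least one object per file.
    """
    if file_size < 0 or block_size <= 0:
        raise ValueError("file_size must be >= 0 and block_size must be > 0")
    # Ceiling division, with a floor of one object per file.
    count = max(1, -(-file_size // block_size))
    # A file smaller than the block size is stored as one smaller object.
    largest = min(file_size, block_size)
    return count, largest


if __name__ == "__main__":
    rows = [(4 * KB, 256 * KB), (4 * MB, 256 * KB), (4 * GB, 1 * MB)]
    for file_size, block_size in rows:
        count, largest = objects_for_file(file_size, block_size)
        print(f"{file_size:>12} B file, {block_size:>8} B blocks: "
              f"{count} objects, largest {largest} B")
```

Running the same function against the real file-size distribution of a data set, rather than a single average, gives a better estimate of the total object count.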
Some object stores limit how many objects can be created, so it is wise to take this into account when choosing a block size. For example:
- An object store with a single bucket limit of 1 billion objects can only store ~0.25 PB of data when using a 256 KB block size, and ~1 PB when using a 1 MB block size.
- An object store with a single bucket limit of 100 million objects can only store ~25 TB of data when using a 256 KB block size, and ~100 TB when using a 1 MB block size.
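The capacity ceilings above follow from multiplying the object limit by the block size. The sketch below is an illustration under stated assumptions: every object is a full block, and the bucket holds nothing else. Real capacity will be lower when many files are smaller than the block size. The function name `max_capacity_bytes` is hypothetical. Block sizes are given in binary units and results are reported in decimal TB, so the output lands close to, but not exactly on, the rounded figures in the list.

```python
KB = 1024
MB = 1024 * KB
TB_DECIMAL = 10 ** 12


def max_capacity_bytes(object_limit, block_size):
    """Upper bound on stored data for a bucket with an object-count limit.

    Assumes every object is exactly one full block (best case).
    """
    if object_limit < 0 or block_size <= 0:
        raise ValueError("object_limit must be >= 0 and block_size must be > 0")
    return object_limit * block_size


if __name__ == "__main__":
    for limit in (1_000_000_000, 100_000_000):
        for block_size in (256 * KB, 1 * MB):
            capacity = max_capacity_bytes(limit, block_size)
            print(f"limit={limit:>13,} objects, block={block_size // KB:>4} KB: "
                  f"about {capacity / TB_DECIMAL:,.1f} TB")
```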