Enumerations | |
enum | MDBX_option_t { MDBX_opt_max_db , MDBX_opt_max_readers , MDBX_opt_sync_bytes , MDBX_opt_sync_period , MDBX_opt_rp_augment_limit , MDBX_opt_loose_limit , MDBX_opt_dp_reserve_limit , MDBX_opt_txn_dp_limit , MDBX_opt_txn_dp_initial , MDBX_opt_spill_max_denominator , MDBX_opt_spill_min_denominator , MDBX_opt_spill_parent4child_denominator , MDBX_opt_merge_threshold_16dot16_percent , MDBX_opt_writethrough_threshold , MDBX_opt_prefault_write_enable , MDBX_opt_gc_time_limit , MDBX_opt_prefer_waf_insteadof_balance , MDBX_opt_subpage_limit , MDBX_opt_subpage_room_threshold , MDBX_opt_subpage_reserve_prereq , MDBX_opt_subpage_reserve_limit } |
MDBX environment extra runtime options. | |
enum | MDBX_warmup_flags_t { MDBX_warmup_default = 0 , MDBX_warmup_force = 1 , MDBX_warmup_oomsafe = 2 , MDBX_warmup_lock = 4 , MDBX_warmup_touchlimit = 8 , MDBX_warmup_release = 16 } |
Warming up options. | |
Functions | |
LIBMDBX_API int | mdbx_env_set_option (MDBX_env *env, const MDBX_option_t option, uint64_t value) |
Sets the value of an extra runtime option for an environment. | |
LIBMDBX_API int | mdbx_env_get_option (const MDBX_env *env, const MDBX_option_t option, uint64_t *pvalue) |
Gets the value of an extra runtime option from an environment. | |
int | mdbx_env_set_syncbytes (MDBX_env *env, size_t threshold) |
Sets the threshold to force a flush of the data buffers to disk, even if the MDBX_SAFE_NOSYNC flag is set in the environment. | |
int | mdbx_env_set_syncperiod (MDBX_env *env, unsigned seconds_16dot16) |
Sets the relative period since the last unsteady commit to force a flush of the data buffers to disk, even if the MDBX_SAFE_NOSYNC flag is set in the environment. | |
LIBMDBX_API int | mdbx_env_warmup (const MDBX_env *env, const MDBX_txn *txn, MDBX_warmup_flags_t flags, unsigned timeout_seconds_16dot16) |
Warms up the database by loading pages into memory, optionally locking them. | |
LIBMDBX_API int | mdbx_env_set_flags (MDBX_env *env, MDBX_env_flags_t flags, bool onoff) |
Set environment flags. | |
LIBMDBX_API int | mdbx_env_set_geometry (MDBX_env *env, intptr_t size_lower, intptr_t size_now, intptr_t size_upper, intptr_t growth_step, intptr_t shrink_threshold, intptr_t pagesize) |
Set all size-related parameters of environment, including page size and the min/max size of the memory map. | |
int | mdbx_env_set_mapsize (MDBX_env *env, size_t size) |
int | mdbx_env_set_maxreaders (MDBX_env *env, unsigned readers) |
Set the maximum number of threads/reader slots for all processes interacting with the database. | |
int | mdbx_env_set_maxdbs (MDBX_env *env, MDBX_dbi dbs) |
Set the maximum number of named tables for the environment. | |
LIBMDBX_API int | mdbx_env_set_userctx (MDBX_env *env, void *ctx) |
Sets application information (a context pointer) associated with the environment. | |
enum MDBX_option_t |
MDBX environment extra runtime options.
Enumerator | |
---|---|
MDBX_opt_max_db | Controls the maximum number of named tables for the environment. By default only the unnamed key-value table can be used; an appropriate value should be set by |
MDBX_opt_max_readers | Defines the maximum number of threads/reader slots for all processes interacting with the database. This defines the number of slots in the lock table that is used to track readers in the environment. The default is about 100 for a 4K system page size. Starting a read-only transaction normally ties a lock table slot to the current thread until the environment closes or the thread exits. If MDBX_NOSTICKYTHREADS is in use, mdbx_txn_begin() instead ties the slot to the MDBX_txn object until it or the MDBX_env object is destroyed. This option may only be set after mdbx_env_create() and before mdbx_env_open(), and has an effect only when the database is opened by the first process that interacts with the database. |
MDBX_opt_sync_bytes | Controls the interprocess/shared threshold to force a flush of the data buffers to disk, if MDBX_SAFE_NOSYNC is used. |
MDBX_opt_sync_period | Controls the interprocess/shared relative period since the last unsteady commit to force a flush of the data buffers to disk, if MDBX_SAFE_NOSYNC is used. |
MDBX_opt_rp_augment_limit | Controls the in-process limit on growing the list of reclaimed/recycled page numbers used to find a sequence of contiguous pages for large data items.
Long values require allocation of contiguous database pages. To find such sequences, it may be necessary to accumulate very large lists, especially when placing very long values (more than a megabyte) in large databases (several tens of gigabytes), which is very expensive in extreme cases. This threshold allows you to avoid such costs by allocating new pages at the end of the database (with its possible growth on disk), instead of further accumulating/reclaiming Garbage Collection records. On the other hand, too small a threshold will lead to unreasonable database growth and/or to the inability to put long values. The |
MDBX_opt_loose_limit | Controls the in-process limit on growing the cache of dirty pages kept for reuse in the current transaction. A 'dirty page' is a page that has been updated in memory only; its changes are not yet stored on disk. To reduce overhead, it is reasonable not to release all such pages immediately, but to leave some in the cache for reuse in the current transaction. The |
MDBX_opt_dp_reserve_limit | Controls the in-process limit of pre-allocated memory items for dirty pages. A 'dirty page' is a page that has been updated in memory only; its changes are not yet stored on disk. Without MDBX_WRITEMAP dirty pages are allocated from memory and released when a transaction is committed. To reduce overhead, it is reasonable not to release all of them, but to leave some allocations in reserve for reuse in the next transaction(s). The |
MDBX_opt_txn_dp_limit | Controls the in-process limit of dirty pages for a write transaction. A 'dirty page' is a page that has been updated in memory only; its changes are not yet stored on disk. Without MDBX_WRITEMAP dirty pages are allocated from memory and remain busy until they are written to disk. Therefore, for large transactions it is reasonable to stop collecting dirty pages above some threshold and spill to disk instead. The |
MDBX_opt_txn_dp_initial | Controls the in-process initial allocation size for the dirty pages list of a write transaction. Default is 1024. |
MDBX_opt_spill_max_denominator | Controls the in-process maximal part of the dirty pages that may be spilled when necessary. The Should be in the range 0..255, where zero means no limit, i.e. all dirty pages could be spilled. Default is 8, i.e. no more than 7/8 of the current dirty pages may be spilled when the condition described above is reached. |
MDBX_opt_spill_min_denominator | Controls the in-process minimal part of the dirty pages that should be spilled when necessary. The Should be in the range 0..255, where zero means no restriction at the bottom. Default is 8, i.e. at least 1/8 of the current dirty pages should be spilled when the condition described above is reached. |
MDBX_opt_spill_parent4child_denominator | Controls the in-process part of the parent transaction's dirty pages that will be spilled when starting each child transaction. The For a stack of nested transactions each dirty page can be spilled only once, and a parent's dirty pages cannot be spilled while child transaction(s) are running. Therefore a child transaction could reach MDBX_TXN_FULL either when the parent transaction(s) have spilled too little (and the child reaches the limit of dirty pages), or when the parent(s) have spilled too much (since the child can't spill already spilled pages). So there is no universal golden ratio. Should be in the range 0..255, where zero means no explicit spilling will be performed when starting nested transactions. Default is 0, i.e. by default no spilling is performed when starting nested transactions, which corresponds to the historical behaviour. |
MDBX_opt_merge_threshold_16dot16_percent | Controls the in-process threshold for merging semi-empty pages. This option sets the minimum page fill, expressed as the used space in percent of a page. Neighbouring pages emptier than this value are candidates for merging. The threshold value is specified in 1/65536 points of a whole page, which is equivalent to the 16-dot-16 fixed point format. The specified value must be in the range from 12.5% (almost empty page) to 50% (half empty page), which corresponds to the range from 8192 to 32768 units, respectively. |
MDBX_opt_writethrough_threshold | Controls the choice between write-through disk writes and usual writes followed by a flush by the Depending on the operating system, storage subsystem characteristics and the use case, higher performance can be achieved either by write-through or by a series of usual/lazy writes followed by a flush-to-disk. Basically, for N chunks the latency/cost of write-through is: latency = N * (emit + round-trip-to-storage + storage-execution); and for a series of lazy writes with a flush it is: latency = N * (emit + storage-execution) + flush + round-trip-to-storage. So, for large N and/or notable round-trip-to-storage the write+flush approach wins, but for small N and/or near-zero NVMe-like latency write-through is better. To solve this issue libmdbx provides
|
MDBX_opt_prefault_write_enable | Controls prevention of page-faults of reclaimed and allocated pages in the MDBX_WRITEMAP mode by clearing ones through file handle before touching. |
MDBX_opt_gc_time_limit | Controls the in-process limit on time spent searching for consecutive pages inside the GC.
Sets a time limit, in 1/65536 fractions of a second, that may be spent during a write transaction searching for page sequences inside the GC/freelist after the limit set by the MDBX_opt_rp_augment_limit option has been reached. The time check is not performed when searching for or allocating single pages, nor when allocating pages for the needs of the GC itself (while updating the GC during transaction commit). The time limit is measured by wall clock and is tracked per transaction; it is inherited by nested transactions and accumulated in the parent when they commit. The time check is performed only once the limit set by MDBX_opt_rp_augment_limit has been reached, which allows flexible control of the behavior using both options. By default the limit is 0, which stops the search in the GC immediately once MDBX_opt_rp_augment_limit is reached in the transaction's internal state, and matches the behavior before the option appeared |
MDBX_opt_prefer_waf_insteadof_balance | Controls the choice between striving for evenly filled pages and reducing the number of modified and written pages. After delete operations, pages that contain fewer than the minimum number of keys, or that have been emptied down to MDBX_opt_merge_threshold_16dot16_percent, are subject to merging with one of their neighbours. If the pages to the right and to the left of the current one are both "dirty" (modified during the transaction and due to be written to disk), or both "clean" (not modified in the current transaction), the less-filled page is always chosen as the merge target. When only one of the neighbours is "dirty" and the other is "clean", two tactics for choosing the merge target are possible:
|
MDBX_opt_subpage_limit | Sets, as a percentage, the maximum size of nested pages used to hold a small number of multi-values associated with a single key. Using nested pages, instead of moving the values out to separate pages of a nested tree, reduces the amount of unused space and thereby increases data density. However, as the size of nested pages grows, more leaf pages of the main tree are required, which also increases the height of the main tree. In addition, modifying data on nested pages requires extra copying, so the cost may be higher in many scenarios. min 12.5% (8192), max 100% (65535), default = 100% |
MDBX_opt_subpage_room_threshold | Sets, as a percentage, the minimum amount of free space on the main page; when that much is not available, nested pages are moved out into a separate tree. min 0, max 100% (65535), default = 0 |
MDBX_opt_subpage_reserve_prereq | Sets, as a percentage, the minimum amount of free space on the main page that must be available for space to be reserved in a nested page. If there is not enough free space on the main page, the nested page will be of minimal size. In turn, when a nested page has no reserve, every addition of items to it requires rebuilding the main page and moving all data nodes. Therefore reserving space is usually beneficial in scenarios with intensive insertion of short multi-values, for example when indexing. However, it reduces data density and correspondingly increases the database size and the volume of I/O operations. min 0, max 100% (65535), default = 42% (27525) |
MDBX_opt_subpage_reserve_limit | Sets, as a percentage, the limit on space reservation in nested pages. min 0, max 100% (65535), default = 4.2% (2753) |
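Several options in the table above take a percentage expressed in 1/65536 units (the 16-dot-16 format). The following is a minimal sketch of that conversion, not part of libmdbx: the helper names `percent_to_16dot16` and `percent_from_16dot16` are hypothetical, and the rounding and the clamp to 65535 are assumptions chosen to reproduce the values quoted in the table (12.5% = 8192, 42% = 27525, 4.2% = 2753, 100% = 65535).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not a libmdbx function): converts a percentage of a
 * page into 1/65536 units, rounding to nearest and clamping to 65535, the
 * maximum quoted for the subpage options. */
static uint64_t percent_to_16dot16(double percent) {
  assert(percent >= 0.0 && percent <= 100.0);
  uint64_t units = (uint64_t)(percent * 65536.0 / 100.0 + 0.5);
  return units > 65535 ? 65535 : units;
}

/* Hypothetical inverse, e.g. for a value read back with
 * mdbx_env_get_option(). */
static double percent_from_16dot16(uint64_t units) {
  return (double)units * 100.0 / 65536.0;
}
```

With such a helper, a call like `mdbx_env_set_option(env, MDBX_opt_merge_threshold_16dot16_percent, percent_to_16dot16(25.0))` would presumably request a 25% merge threshold; the range limits stated for each option still apply.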
enum MDBX_warmup_flags_t |
Enumerator | |
---|---|
MDBX_warmup_default | By default mdbx_env_warmup() just asks the OS kernel to asynchronously prefetch database pages. |
MDBX_warmup_force | Peeks at all pages of the allocated portion of the database to force them to be loaded into memory. However, the pages are just peeked sequentially, so unused pages that are in the GC will be loaded in the same way as those that contain payload. |
MDBX_warmup_oomsafe | Uses system calls to peek pages instead of accessing them directly, which at the cost of additional overhead avoids the current process being killed by the OOM-killer in a low-memory condition.
|
MDBX_warmup_lock | Try to lock database pages in memory by Such locking in memory requires that the corresponding resource limits (e.g. On success, all currently allocated pages, both unused ones in the GC and those containing payload, will be locked in memory until the environment closes, or they are explicitly unlocked by using MDBX_warmup_release, or the database geometry is changed, including its auto-shrinking. |
MDBX_warmup_touchlimit | Alters the corresponding current resource limits to be sufficient for locking pages by MDBX_warmup_lock. However, this option should be used only in simpler applications, since it takes into account only the current size of this environment, disregarding all other factors. For a real-world database application you will need full-fledged management of resources and their limits with respective engineering. |
MDBX_warmup_release | Release the lock that was performed before by MDBX_warmup_lock. |
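Because the warm-up flags listed above are distinct bits (0, 1, 2, 4, 8, 16), they can be combined with bitwise OR. The sketch below only illustrates that arithmetic with a local mirror of the values; the `DEMO_`/`demo_` names are hypothetical, and real code would include mdbx.h and use MDBX_warmup_flags_t directly.

```c
#include <assert.h>

/* Local mirror of the values listed in the table above, for illustration
 * only; these names are not part of libmdbx. */
enum demo_warmup_flags {
  DEMO_WARMUP_DEFAULT = 0,
  DEMO_WARMUP_FORCE = 1,
  DEMO_WARMUP_OOMSAFE = 2,
  DEMO_WARMUP_LOCK = 4,
  DEMO_WARMUP_TOUCHLIMIT = 8,
  DEMO_WARMUP_RELEASE = 16
};

/* Returns nonzero if every bit of `flag` is set in `flags`. */
static int demo_has_flag(unsigned flags, unsigned flag) {
  return (flags & flag) == flag;
}

/* Combination meaning: force-load pages, peek them through system calls,
 * and try to lock them in memory. */
static unsigned demo_force_oomsafe_lock(void) {
  return DEMO_WARMUP_FORCE | DEMO_WARMUP_OOMSAFE | DEMO_WARMUP_LOCK;
}
```

Whether a particular combination is meaningful to mdbx_env_warmup() is not checked here; the sketch shows only how the bit values compose.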
LIBMDBX_API int mdbx_env_get_option | ( | const MDBX_env * | env, |
const MDBX_option_t | option, | ||
uint64_t * | pvalue | ||
) |
Gets the value of an extra runtime option from an environment.
[in] | env | An environment handle returned by mdbx_env_create(). |
[in] | option | The option from MDBX_option_t whose value is to be retrieved. |
[out] | pvalue | The address where the option's value will be stored. |
LIBMDBX_API int mdbx_env_set_flags | ( | MDBX_env * | env, |
MDBX_env_flags_t | flags, | ||
bool | onoff | ||
) |
Set environment flags.
This may be used to set some flags in addition to those from mdbx_env_open(), or to unset these flags.
[in] | env | An environment handle returned by mdbx_env_create(). |
[in] | flags | The env_flags to change, bitwise OR'ed together. |
[in] | onoff | A non-zero value sets the flags, zero clears them. |
MDBX_EINVAL | An invalid parameter was specified. |
LIBMDBX_API int mdbx_env_set_geometry | ( | MDBX_env * | env, |
intptr_t | size_lower, | ||
intptr_t | size_now, | ||
intptr_t | size_upper, | ||
intptr_t | growth_step, | ||
intptr_t | shrink_threshold, | ||
intptr_t | pagesize | ||
) |
Set all size-related parameters of environment, including page size and the min/max size of the memory map.
In contrast to LMDB, MDBX provides automatic size management of a database according to the given parameters, including shrinking and resizing on the fly. From the user's point of view all of this just works. Nevertheless, it is reasonable to know some details in order to make optimal decisions when choosing parameters.
Both mdbx_env_set_geometry() and legacy mdbx_env_set_mapsize() are inapplicable to an environment opened read-only.
Both mdbx_env_set_geometry() and legacy mdbx_env_set_mapsize() can be called either before or after mdbx_env_open(), and either within a write transaction run by the current thread or not:
If mdbx_env_set_geometry() or legacy mdbx_env_set_mapsize() is called BEFORE mdbx_env_open(), i.e. for a closed environment, then the specified parameters will be used for new database creation, or will be applied during opening if the database exists and no other process is using it.
If the database already exists, is opened with MDBX_EXCLUSIVE or is not used by any other process, and the parameters specified by mdbx_env_set_geometry() are incompatible (for instance, a different page size), then mdbx_env_open() will return the MDBX_INCOMPATIBLE error.
Otherwise, if the database is opened read-only or is in use by another process when mdbx_env_open() is called, the specified parameters will be silently discarded (open the database with the MDBX_EXCLUSIVE flag to avoid this).
Essentially a concept of "automatic size management" is simple and useful:
So, there are some considerations about choosing these parameters:
size_now argument) is an auxiliary parameter for simulating legacy mdbx_env_set_mapsize() and as a workaround for Windows issues (see below).
Unfortunately, Windows has several issues with resizing a memory-mapped file:
MDBX bypasses all Windows issues, but at a cost:
SlimReadWriteLock during each read-only transaction.
To create a new database with particular parameters, including the page size, mdbx_env_set_geometry() should be called after mdbx_env_create() and before mdbx_env_open(). Once the database is created, the page size cannot be changed. If you do not specify all or some of the parameters, the corresponding default values will be used. For instance, the default for database size is 10485760 bytes.
If the mapsize is increased by another process, MDBX silently and transparently adopts these changes at the next transaction start. However, mdbx_txn_begin() will return MDBX_UNABLE_EXTEND_MAPSIZE if the new mapping size could not be applied for the current process (for instance, if the address space is busy). Therefore, in the case of an MDBX_UNABLE_EXTEND_MAPSIZE error you need to close and reopen the environment to resolve it.
mdbx_chk with the -v option.
Legacy mdbx_env_set_mapsize() corresponds to calling mdbx_env_set_geometry() with the arguments size_lower, size_now, size_upper equal to the size and -1 (i.e. default) for all other parameters.
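The correspondence between the legacy call and mdbx_env_set_geometry() can be sketched as plain argument arithmetic. The struct `demo_geometry` and the function `demo_geometry_from_mapsize` below are hypothetical names introduced for illustration; they are not libmdbx types or functions and only mirror the six size arguments documented here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical container mirroring the six size arguments of
 * mdbx_env_set_geometry(); not a libmdbx type. */
struct demo_geometry {
  intptr_t size_lower, size_now, size_upper;
  intptr_t growth_step, shrink_threshold, pagesize;
};

/* Builds the argument set that, per the text above, the legacy
 * mdbx_env_set_mapsize(env, size) call corresponds to. */
static struct demo_geometry demo_geometry_from_mapsize(size_t size) {
  assert(size <= (size_t)INTPTR_MAX);
  struct demo_geometry g;
  g.size_lower = (intptr_t)size;
  g.size_now = (intptr_t)size;
  g.size_upper = (intptr_t)size;
  g.growth_step = -1;      /* keep current or use default */
  g.shrink_threshold = -1; /* keep current or use default */
  g.pagesize = -1;         /* keep current or use default */
  return g;
}
```

The resulting fields would then be passed, in order, as the arguments after `env` in a call to mdbx_env_set_geometry().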
[in] | env | An environment handle returned by mdbx_env_create() |
[in] | size_lower | The lower bound of database size in bytes. Zero value means "minimal acceptable", and negative means "keep current or use default". |
[in] | size_now | The size in bytes to set up the database size for now. Zero value means "minimal acceptable", and negative means "keep current or use default". So, it is recommended to always pass -1 in this argument except in some special cases. |
[in] | size_upper | The upper bound of database size in bytes. Zero value means "minimal acceptable", and negative means "keep current or use default". It is recommended to avoid changing the upper bound while the database is used by other processes or threads (i.e. just pass -1 in this argument unless absolutely necessary). Otherwise you must be ready for MDBX_UNABLE_EXTEND_MAPSIZE error(s), unexpected pauses during remapping and/or system errors like "address busy", and so on. In other words, there is no way to handle a growth of the upper bound robustly because there may be a lack of appropriate system resources (which are extremely volatile in a multi-process multi-threaded environment). |
[in] | growth_step | The growth step in bytes, must be greater than zero to allow the database to grow. Negative value means "keep current or use default". |
[in] | shrink_threshold | The shrink threshold in bytes, must be greater than zero to allow the database to shrink and greater than growth_step to avoid shrinking right after growing. Negative value means "keep current or use default". Default is 2*growth_step. |
[in] | pagesize | The database page size for new database creation or -1 otherwise. Once the database is created, the page size cannot be changed. Must be power of 2 in the range between MDBX_MIN_PAGESIZE and MDBX_MAX_PAGESIZE. Zero value means "minimal acceptable", and negative means "keep current or use default". |
MDBX_EINVAL | An invalid parameter was specified, or the environment has an active write transaction. |
MDBX_EPERM | Two specific cases for Windows: 1) Shrinking was disabled before via geometry settings and is now enabled, but there are reading threads that don't use the additional SRWL (which is required to avoid Windows issues). 2) Temporarily closing the memory mapping is required to change geometry, but read transaction(s) are running and no corresponding thread(s) could be suspended since the MDBX_NOSTICKYTHREADS mode is used. |
MDBX_EACCESS | The environment is opened in read-only mode. |
MDBX_MAP_FULL | The specified size is smaller than the space already consumed by the environment. |
MDBX_TOO_LARGE | The specified size is too large, i.e. too many pages for the given size, or a 32-bit process requests too many bytes for the 32-bit address space. |
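The parameter rules above lend themselves to a quick pre-check before calling mdbx_env_set_geometry(). The following is a sketch only: the function names are hypothetical, the page size bounds are passed in by the caller (real code would use MDBX_MIN_PAGESIZE and MDBX_MAX_PAGESIZE from mdbx.h), and the library performs its own validation regardless.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical pre-check of the pagesize argument: zero ("minimal
 * acceptable") and negative ("keep current or use default") are accepted;
 * otherwise the value must be a power of 2 within [min_ps, max_ps]. */
static int demo_pagesize_ok(intptr_t pagesize, intptr_t min_ps,
                            intptr_t max_ps) {
  if (pagesize <= 0)
    return 1;
  if ((pagesize & (pagesize - 1)) != 0)
    return 0; /* not a power of two */
  return pagesize >= min_ps && pagesize <= max_ps;
}

/* Hypothetical check that shrinking is enabled and will not kick in right
 * after a growth step: both values positive, shrink_threshold larger. */
static int demo_shrink_ok(intptr_t growth_step, intptr_t shrink_threshold) {
  return growth_step > 0 && shrink_threshold > growth_step;
}
```

A failed pre-check of this kind would roughly correspond to the cases where the library is documented to return MDBX_EINVAL, though the exact conditions are decided by libmdbx itself.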
inline int mdbx_env_set_maxdbs (MDBX_env *env, MDBX_dbi dbs)
Set the maximum number of named tables for the environment.
This function is only needed if multiple tables will be used in the environment. Simpler applications that use the environment as a single unnamed table can ignore this option. This function may only be called after mdbx_env_create() and before mdbx_env_open().
Currently a moderate number of slots are cheap but a huge number gets expensive: 7-120 words per transaction, and every mdbx_dbi_open() does a linear search of the opened slots.
[in] | env | An environment handle returned by mdbx_env_create(). |
[in] | dbs | The maximum number of tables. |
MDBX_EINVAL | An invalid parameter was specified. |
MDBX_EPERM | The environment is already open. |
inline int mdbx_env_set_maxreaders (MDBX_env *env, unsigned readers)
Set the maximum number of threads/reader slots for all processes interacting with the database.
This defines the number of slots in the lock table that is used to track readers in the environment. The default is about 100 for a 4K system page size. Starting a read-only transaction normally ties a lock table slot to the current thread until the environment closes or the thread exits. If MDBX_NOSTICKYTHREADS is in use, mdbx_txn_begin() instead ties the slot to the MDBX_txn object until it or the MDBX_env object is destroyed. This function may only be called after mdbx_env_create() and before mdbx_env_open(), and has an effect only when the database is opened by the first process that interacts with the database.
[in] | env | An environment handle returned by mdbx_env_create(). |
[in] | readers | The maximum number of reader lock table slots. |
MDBX_EINVAL | An invalid parameter was specified. |
MDBX_EPERM | The environment is already open. |
LIBMDBX_API int mdbx_env_set_option | ( | MDBX_env * | env, |
const MDBX_option_t | option, | ||
uint64_t | value | ||
) |
Sets the value of an extra runtime option for an environment.
[in] | env | An environment handle returned by mdbx_env_create(). |
[in] | option | The option from MDBX_option_t whose value is to be set. |
[in] | value | The value of the option to be set. |
inline int mdbx_env_set_syncbytes (MDBX_env *env, size_t threshold)
Sets the threshold to force a flush of the data buffers to disk, even if the MDBX_SAFE_NOSYNC flag is set in the environment.
The threshold value affects all processes that operate on the given environment until the last process closes the environment or a new value is set.
Data is always written to disk when mdbx_txn_commit() is called, but the operating system may keep it buffered. MDBX always flushes the OS buffers upon commit as well, unless the environment was opened with MDBX_SAFE_NOSYNC, MDBX_UTTERLY_NOSYNC or in part MDBX_NOMETASYNC.
The default is 0, which means no threshold is checked and no additional flush will be made.
[in] | env | An environment handle returned by mdbx_env_create(). |
[in] | threshold | The size in bytes of summary changes when a synchronous flush would be made. |
inline int mdbx_env_set_syncperiod (MDBX_env *env, unsigned seconds_16dot16)
Sets the relative period since the last unsteady commit to force a flush of the data buffers to disk, even if the MDBX_SAFE_NOSYNC flag is set in the environment.
The relative period value affects all processes that operate on the given environment until the last process closes the environment or a new value is set.
Data is always written to disk when mdbx_txn_commit() is called, but the operating system may keep it buffered. MDBX always flushes the OS buffers upon commit as well, unless the environment was opened with MDBX_SAFE_NOSYNC or in part MDBX_NOMETASYNC.
The configured period is not checked asynchronously, but only by the mdbx_txn_commit() and mdbx_env_sync() functions. Therefore, in cases where transactions are committed infrequently and/or irregularly, polling by mdbx_env_sync() may be a reasonable way to enforce the timeout.
The default is 0, which means no timeout is checked and no additional flush will be made.
[in] | env | An environment handle returned by mdbx_env_create(). |
[in] | seconds_16dot16 | The period, in 1/65536 of a second, after which a synchronous flush would be made since the last unsteady commit. |
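Since the period is given in 1/65536 of a second, a small conversion helper can make call sites easier to read. This is a sketch under stated assumptions: `demo_seconds_to_16dot16` is a hypothetical name, not a libmdbx function, and it assumes `unsigned` is at least 32 bits wide and rounds to the nearest unit.

```c
#include <assert.h>

/* Hypothetical helper: converts seconds to 16.16 fixed point (units of
 * 1/65536 second), rounding to nearest. Assumes `unsigned` has at least
 * 32 bits, so the input is limited to 65535 seconds. */
static unsigned demo_seconds_to_16dot16(double seconds) {
  assert(seconds >= 0.0 && seconds <= 65535.0);
  return (unsigned)(seconds * 65536.0 + 0.5);
}
```

For example, `mdbx_env_set_syncperiod(env, demo_seconds_to_16dot16(2.5))` would presumably request a flush no later than 2.5 seconds after the last unsteady commit, subject to the polling caveat described above. The same unit appears in the `timeout_seconds_16dot16` argument of mdbx_env_warmup().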
LIBMDBX_API int mdbx_env_set_userctx | ( | MDBX_env * | env, |
void * | ctx | ||
) |
Sets application information (a context pointer) associated with the environment.
[in] | env | An environment handle returned by mdbx_env_create(). |
[in] | ctx | An arbitrary pointer for whatever the application needs. |
LIBMDBX_API int mdbx_env_warmup | ( | const MDBX_env * | env, |
const MDBX_txn * | txn, | ||
MDBX_warmup_flags_t | flags, | ||
unsigned | timeout_seconds_16dot16 | ||
) |
Warms up the database by loading pages into memory, optionally locking them.
Depending on the specified flags, it notifies the OS kernel about upcoming access, force-loads the database pages, locks them in memory, or releases such a lock. However, the function does not analyze the b-tree nor the GC. Therefore unused pages that are in the GC are handled (i.e. will be loaded) in the same way as those that contain payload.
At least one of the env or txn arguments must be non-null.
[in] | env | An environment handle returned by mdbx_env_create(). |
[in] | txn | A transaction handle returned by mdbx_txn_begin(). |
[in] | flags | The warmup_flags, bitwise OR'ed together. |
[in] | timeout_seconds_16dot16 | Optional timeout, which is checked only while explicitly peeking database pages to load them, i.e. if the MDBX_warmup_force option was specified. |
MDBX_ENOSYS | The system does not support requested operation(s). |
MDBX_RESULT_TRUE | The specified timeout was reached while loading data into memory. |