Using LMDB from Common Lisp

LMDB is a fast key-value store. This talk is a useful introduction for those who want to learn more. The design of LMDB means that many of the things that are standard in other databases – write-ahead logs, and all the filesystem housekeeping necessary to implement concurrent transactions – are unnecessary in LMDB.

The bindings are implemented as two separate libraries: liblmdb, a low-level, autogenerated CFFI binding; and lmdb, the high-level CLOS binding.

Usage

Using LMDB requires some work. A simple query requires setting up (and tearing down) a whole stack of objects, namely:

  1. An environment, which is, essentially, a collection of databases. You create the environment object by passing a directory where LMDB will store its data.
  2. A transaction within that environment. All queries have to take place inside a transaction.
  3. A database access object, which is created from a transaction, after the transaction’s been started. The database object keeps the name of the database we’re accessing within the environment.

Or, as a diagram:

[Figure: LMDB lifecycle diagram]

When you have the database object, you can set, retrieve and delete key-value pairs. For more complex operations, you have to use cursors, which add another level of lifecycle management within databases.

Examples

Start by loading LMDB and Alexandria:

CL-USER> (ql:quickload '(:lmdb :alexandria))
To load "lmdb":
  Load 1 ASDF system:
    lmdb
; Loading "lmdb"
.............
To load "alexandria":
  Load 1 ASDF system:
    alexandria
; Loading "alexandria"

(:LMDB :ALEXANDRIA)

We’ll store the database in your home directory under lmdb-test/, and use a hardcoded database name:

CL-USER> (defparameter +directory+
           (merge-pathnames #p"lmdb-test/" (user-homedir-pathname)))
+DIRECTORY+

CL-USER> (defparameter +db-name+ "mydb")
+DB-NAME+

First, let’s abstract away all of the housekeeping:

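CL-USER> (defmacro with-db ((db) &body body)
           ;; ENV and TXN are gensyms so they cannot capture names used in BODY.
           (alexandria:with-gensyms (env txn)
             `(let ((,env (lmdb:make-environment +directory+)))
                (lmdb:with-environment (,env)
                  (let ((,txn (lmdb:make-transaction ,env)))
                    (lmdb:begin-transaction ,txn)
                    ;; Bind the symbol supplied by the caller, not a literal DB.
                    (let ((,db (lmdb:make-database ,txn +db-name+)))
                      (lmdb:with-database (,db)
                        ;; Return the value of BODY, committing afterwards.
                        (prog1
                            (progn
                              ,@body)
                          (lmdb:commit-transaction ,txn)))))))))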
WITH-DB

We can retrieve keys using the get function:

CL-USER> (with-db (db)
           (lmdb:get db #(1)))
NIL

Obviously this returns NIL, since we haven’t actually set anything. To add or overwrite a key-value pair, use put:

CL-USER> (with-db (db)
           (lmdb:put db #(1) #(1 2 3)))
#(1 2 3)

CL-USER> (with-db (db)
           (lmdb:get db #(1)))
#(1 2 3)

That’s better. But raw byte vectors are unwieldy: how can we store actual data?

First, let’s get rid of this key/value pair so we can get back to a blank slate. We use the del function for that:

CL-USER> (with-db (db)
           (lmdb:del db #(1)))
T

CL-USER> (with-db (db)
           (lmdb:get db #(1)))
NIL

Alright, so, real data. These bindings only handle byte vectors: fancier datatypes are explicitly anti-features. Serialization of more complex data structures to byte vectors should be done by a higher-level library – maybe I’ll write a Moneta clone for Common Lisp.

Storing strings is pretty simple. All you need is the trivial-utf-8 library:

CL-USER> (ql:quickload :trivial-utf-8)
To load "trivial-utf-8":
  Load 1 ASDF system:
    trivial-utf-8
; Loading "trivial-utf-8"

(:TRIVIAL-UTF-8)

CL-USER> (defun str->vec (str)
           (trivial-utf-8:string-to-utf-8-bytes str))
STR->VEC

CL-USER> (defun vec->str (vec)
           (trivial-utf-8:utf-8-bytes-to-string vec))
VEC->STR
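As a quick sanity check, the sketch below round-trips a string with a non-ASCII character through the two helpers. It repeats the definitions of str->vec and vec->str so that it can be run on its own, and it assumes Quicklisp is available and that the implementation supports Unicode characters. The sample string is made up for illustration.

```lisp
(ql:quickload :trivial-utf-8 :silent t)

;; Same helpers as above, repeated so this block runs on its own.
(defun str->vec (str)
  (trivial-utf-8:string-to-utf-8-bytes str))

(defun vec->str (vec)
  (trivial-utf-8:utf-8-bytes-to-string vec))

;; Hypothetical sample: "caf" followed by U+00E9 (e with an acute accent),
;; built with CODE-CHAR so the source stays plain ASCII.
(defparameter *sample* (format nil "caf~C" (code-char 233)))

(defparameter *sample-bytes* (str->vec *sample*))

(list (length *sample*)                               ; number of characters
      (length *sample-bytes*)                         ; number of bytes
      (string= *sample* (vec->str *sample-bytes*)))   ; round trip intact?
```

On a Unicode-capable implementation the final form should evaluate to (4 5 T): the byte vector is one element longer than the string, because UTF-8 needs two bytes for the accented character. This matters whenever you reason about key or value sizes in bytes rather than characters.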

Now we can use these helpers like this:

CL-USER> (with-db (db)
           (lmdb:put db (str->vec "Common Lisp")
                        (str->vec "An ANSI-standarized Lisp dialect")))
#(65 110 32 65 78 83 73 45 115 116 97 110 100 97 114 105 122 101 100 32 76 105
  115 112 32 100 105 97 108 101 99 116)

CL-USER> (with-db (db)
           (vec->str (lmdb:get db (str->vec "Common Lisp"))))
"An ANSI-standarized Lisp dialect"

How about integers? We use bit-smasher for that:

CL-USER> (ql:quickload :bit-smasher)
To load "bit-smasher":
  Load 1 ASDF system:
    bit-smasher
; Loading "bit-smasher"
...
(:BIT-SMASHER)

CL-USER> (defun int->vec (int)
           (bit-smasher:int->octets int))
INT->VEC

CL-USER> (defun vec->int (vec)
           (bit-smasher:octets->int vec))
VEC->INT

And usage:

CL-USER> (with-db (db)
           (lmdb:put db (str->vec "Common Lisp/age")
                        (int->vec 21)))
#(21)

CL-USER> (with-db (db)
           (vec->int (lmdb:get db (str->vec "Common Lisp/age"))))
21
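To make the encoding less of a black box, here is a minimal sketch of a big-endian integer-to-octets conversion in plain Common Lisp. It is not bit-smasher’s implementation: the names integer->octets and octets->integer are hypothetical, and the sketch only handles non-negative integers. For 21 it produces the same one-element vector shown above.

```lisp
(defun integer->octets (n)
  "Encode the non-negative integer N as a big-endian octet vector."
  (check-type n (integer 0))
  ;; Use at least one octet, so that 0 encodes as #(0) rather than #().
  (let* ((len (max 1 (ceiling (integer-length n) 8)))
         (vec (make-array len :element-type '(unsigned-byte 8))))
    (dotimes (i len vec)
      ;; Octet I, counted from the least significant end, goes into slot
      ;; LEN - 1 - I, so the most significant octet comes first.
      (setf (aref vec (- len 1 i))
            (ldb (byte 8 (* 8 i)) n)))))

(defun octets->integer (octets)
  "Decode a big-endian octet vector into a non-negative integer."
  (reduce (lambda (acc octet) (+ (ash acc 8) octet))
          octets
          :initial-value 0))
```

For example, (integer->octets 256) gives #(1 0), and octets->integer turns that back into 256. Because the length of the vector depends on the magnitude of the integer, nothing in the stored bytes says "this is an integer"; the reader has to know which decoder to apply to which key.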

This works with Common Lisp’s arbitrary-precision integers as well. Let’s try ten to the three hundredth power [1]:

CL-USER> (expt 10 300)
1e300

CL-USER> (integer-length *)
997

Nine hundred and ninety-seven bits is larger than the average machine word, and will be until we start dismantling planets into computers [2]. Let’s see how it works:

CL-USER> (with-db (db)
           (lmdb:put db (str->vec "big integer")
                        (int->vec (expt 10 300))))
#(23 228 60 136 0 117 155 165 156 8 225 76 124 215 170 216 106 74 69 129 9 249
  28 33 197 113 219 232 77 82 217 54 244 74 190 138 61 91 72 193 0 149 157 157
  11 108 200 86 179 173 201 59 103 174 168 248 224 103 210 200 208 75 193 119
  247 180 40 122 110 63 205 163 111 163 179 52 46 174 180 66 225 93 69 9 82 244
  221 16 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
  0 0)

CL-USER> (with-db (db)
           (vec->int (lmdb:get db (str->vec "big integer"))))
1e300
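The long run of zeros at the end of the stored vector is not padding. Ten to the three hundredth is 2^300 times 5^300, so its lowest 300 bits are all zero, and a most-significant-byte-first encoding puts them at the end. The sketch below checks that arithmetic in plain Common Lisp; trailing-zero-bits is a hypothetical helper written for this illustration.

```lisp
(defun trailing-zero-bits (n)
  "Count the zero bits below the lowest set bit of the positive integer N."
  (check-type n (integer 1))
  (loop for i from 0
        until (logbitp i n)
        count t))

(let* ((n (expt 10 300))
       (bits (integer-length n))        ; bits needed to represent N
       (octets (ceiling bits 8))        ; bytes needed to hold those bits
       (zeros (trailing-zero-bits n)))  ; low-order zero bits
  (list bits octets zeros))
;; => (997 125 300)
```

So the value needs 125 bytes, and the last 37 of them (296 bits) are entirely zero, with the remaining four zero bits sitting in the low half of the byte before them.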

So all of this is fine, but what if we don’t know the contents of the database? That’s what cursors are for, but we don’t need to deal with them directly because the bindings provide do-pairs, which abstracts them:

CL-USER> (with-db (db)
           (lmdb:do-pairs (db key value)
             (format t "~A: ~A~%~%" key value)))
#(67 111 109 109 111 110 32 76 105 115 112): #(65 110 32 65 78 83 73 45 115 116
                                               97 110 100 97 114 105 122 101
                                               100 32 76 105 115 112 32 100 105
                                               97 108 101 99 116)

#(67 111 109 109 111 110 32 76 105 115 112 47 97 103 101): #(21)

#(98 105 103 32 105 110 116 101 103 101 114): #(23 228 60 136 0 117 155 165 156
                                                8 225 76 124 215 170 216 106 74
                                                69 129 9 249 28 33 197 113 219
                                                232 77 82 217 54 244 74 190 138
                                                61 91 72 193 0 149 157 157 11
                                                108 200 86 179 173 201 59 103
                                                174 168 248 224 103 210 200 208
                                                75 193 119 247 180 40 122 110
                                                63 205 163 111 163 179 52 46
                                                174 180 66 225 93 69 9 82 244
                                                221 16 0 0 0 0 0 0 0 0 0 0 0 0
                                                0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
                                                0 0 0 0 0 0 0 0 0)

NIL

This is not very informative, since the keys and values are printed as raw byte vectors.
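One way to make the listing more readable is to decode each vector before printing it. The sketch below is a heuristic in plain Common Lisp: it shows a vector as text when every byte is printable ASCII, and as hexadecimal otherwise. The names printable-ascii-p and octets->display are hypothetical, and the sketch assumes the implementation’s character codes agree with ASCII, as they do in the common implementations.

```lisp
(defun printable-ascii-p (octets)
  "True when every element of OCTETS is a printable ASCII code (32 to 126)."
  (every (lambda (octet) (<= 32 octet 126)) octets))

(defun octets->display (octets)
  "Return OCTETS as a string: text if it looks like ASCII, hex otherwise."
  (if (and (plusp (length octets))
           (printable-ascii-p octets))
      (map 'string #'code-char octets)
      ;; Two uppercase hex digits per byte, prefixed with #x.
      (format nil "#x~{~2,'0X~}" (coerce octets 'list))))

;; Possible use with the macros from this article:
;;   (with-db (db)
;;     (lmdb:do-pairs (db key value)
;;       (format t "~A: ~A~%"
;;               (octets->display key)
;;               (octets->display value))))
```

With this, the key #(67 111 109 109 111 110 32 76 105 115 112) displays as "Common Lisp" and the value #(21) as "#x15". The guess can be wrong, though: an integer such as 65 is stored as #(65) and would display as "A". A more robust design records the type alongside the data, which is the job of the higher-level serialization library mentioned earlier.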

Finally, you probably don’t want to keep the test database directory around, so delete it:

CL-USER> (uiop:delete-directory-tree +directory+ :validate t)
#P"/home/eudoxia/lmdb-test/"

Footnotes

  1. I replaced the actual integer representation from the REPL with 1e300 for brevity. 

  2. A 1024-bit wide machine word is probably overkill even then.