How FKV Came To Be

New Data Logging Methods From a Hobby Project



~Written by Doug Currie, SLI Fellow

Recently we were challenged with a logging requirement for a small wearable device. The RAM available on the processor was miniscule (16kB), and the log requirements were huge (4GB). We could wedge in a FAT file system, but it would use most of the RAM, and have all of the FAT disadvantages including fragility, and performance degradation over time. FAT wasn’t designed for flash memories with their huge erase blocks, and wear leveling needs.

To address this challenge Sunrise developed a piece of technology called SLI-FKV (for Sunrise Labs, Inc. Flash Key-Value store) that’s been deployed now in several devices. It’s a very successful software component for highly reliable systems with cost and space constraints. SLI-FKV is used in Class III life sustaining medical devices, tiny wearable health monitoring devices, and in-home radon monitors. It scales from storage of a few megabytes to hundreds of gigabytes. How did it come to be?

I have a fascination with “small programming languages,” and one of my hobbies is to implement, port, improve, and test open source languages -- dozens of them over the years.[1] My experience as a compiler implementer in the 1980s is still useful to these projects! These language implementations often use novel algorithms and data structures, so I always learn new things from them. One that caught my eye several years ago was persistent search trees; these are useful in compilers because a tree structure can be shared among many “snapshots” of the search tree associated with different naming/binding scopes in the code. I coded and open-sourced a couple implementations about ten years ago based on left-leaning red-black trees, and another on weight-balanced trees.

When several years later we were faced with that logging challenge, I remembered having read about log-structured databases (another hobby) with Multi-Version Concurrency Control, MVCC, and wondered if we could apply the persistent search trees technology to flash storage. A web search led quickly to the conclusion that this approach was being researched by several people. None of those approaches were ideal for our application, many designed for PCs rather than embedded microcontrollers, so we took them as “proof of concept” and dove into our own implementation.

I developed the first version on my Mac using a mmap region of memory as a virtual flash; this proved out the algorithms, and was a fast platform for testing. Working under a medical software process has instilled in me an appreciation for unit testing and exhaustive code coverage. Testing on a fast platform let me run the equivalent of months of operation on a microcontroller in a couple days. Heuristic guided random test generation was used to simulate a wide variety of workloads, and gave me confidence in the algorithms and implementation.

At that point I turned the code over to the embedded software team to integrate with their scheduler and hardware. We tweaked some configuration parameters to get the best performance from their SD Card… every flash family/size seems to have slightly different erase block and write block sizes. The team got things up and running in a few days, and deployed within a few weeks.

Over the years since, we’ve added some features to FKV, ported it to several flash configurations, SPI, SD Card, NAND, etc., and several flash sizes; moved the code from Subversion to a Git bitbucket repository; added support for multiple flash partitions on a flash device; added partition copy functions for log extraction from non-removable media; created a PC-based program to convert an FKV partition to SQLite for analysis and queries. But the core algorithms and code has stood the test of time.

What contributed to the success of SLI-FKV? It wouldn’t have happened without that initial curiosity about small languages, the experience to implement these ideas, the creativity to see how those persistent search trees could be applied to a real world challenge brought by our client, the process to validate and share the code at Sunrise, and the skill and teamwork to deploy and extend the core technology.

Several Sunrise clients have benefited from their use of SLI-FKV in their products. Perhaps more will, too. Undoubtedly, though, many will benefit from the:

  • Experience

  • Curiosity

  • Challenge

  • Creativity

  • Skill

  • Process

  • Teamwork

that were essential in making it come to be.

[1] Some of these include: Gambit Scheme, Standard ML of NJ, SICStus Prolog, Moscow ML, Paracell, micropython, Owl-lisp, Picrin, Wren, femtolisp

Appendix - References

[1] Demetrios Zeinalipour-Yazti, Song Lin, Vana Kalogeraki, Dimitrios Gunopulos, and Walid A. Najjar. 2005. Microhash: an efficient index structure for fash-based sensor devices. In Proceedings of the 4th conference on USENIX Conference on File and Storage Technologies - Volume 4 (FAST'05), Vol. 4. USENIX Association, Berkeley, CA, USA, 3-3.

[2] Biplob Debnath, Sudipta Sengupta, and Jin Li. 2011. SkimpyStash: RAM space skimpy key-value store on flash-based storage. In Proceedings of the 2011 ACM SIGMOD International Conference on Management of data (SIGMOD '11). ACM, New York, NY, USA, 25-36. DOI=http://dx.doi.org/10.1145/1989323.1989327

[3] Shaoyi Yin, Philippe Pucheral, and Xiaofeng Meng. 2009. A sequential indexing scheme for flash-based embedded systems. In Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology (EDBT '09), Martin Kersten, Boris Novikov, Jens Teubner, Vladimir Polutin, and Stefan Manegold (Eds.). ACM, New York, NY, USA, 588-599. DOI=http://dx.doi.org/10.1145/1516360.1516429