Storing Link Keys in Flash Memory

For two Bluetooth devices to communicate securely, pairing is required to authenticate and encrypt the communication between them by entering a PIN or comparing a number shown on both devices. If the created secret data is stored, the process is called bonding.

For Bluetooth Classic, a shared link key is created. For LE, a long term key (similar to link key) is created, but also additional device information is exchanged like Identify Resolving Key (IRK) and the public BD_ADDR. This information allows to recognize and reconnect to previously bonded device, even if it is using a resolvable private private Bluetooth address.

Bluetooth Controllers have the ability to store a few Classic Link Keys, but the storage is limited and might not be sufficient for a particular project. LE bonding data cannot be stored in the Controller at all. Because of this, we chose to only provide two APIs (btstack_link_key_db.h and le_device_db.h) and let them to be implemented by the firmware engineers.

Over time, we also did a few implementations of these APIs for customers and found it quite tedious: copying & pasting parts of the key management code and replacing bits of code that access the Flash memory itself. As this wasn’t fun, we decided to re-think the whole problem and came up with a multi-layer architecture that includes a portable Flash abstraction and a Tag-Length-Value (TLV) storage engine. The TLV can also be used by any BTstack applications to store persistent data.

Flash Memory in Microcontrollers

On most microcontrollers, Flash cannot be read/written on an individual byte level like you’d store data in RAM (after all, RAM stands for Random Access Memory). Instead, it must be erased in blocks or sectors before it can be written to. On the STM32F407 used on the F4 Discovery board (see previous block post), e.g., erase is only possible sector-wise. The size of one sector varies from 16 kB (sector 0-3), over 64 kB (sector 4), to 128 kB (sectors 5-11), see table.

STM32F407 Flash Sector Size

If we would treat Flash like RAM, updating a single byte would become quite complex: erase another sector, copy all data from the current sector to the new sector while updating the modified byte to the new value.

Clearly, erasing 16 kB (or more), just to update a single link key is rather inefficient. Instead, it makes more sense to use the Flash mainly as append-only storage, where it’s easy to append new data at the end.

The general strategy is to always append new entries within a sector, mark existing entries as invalid, and migrate all valid entries from one sector to another one when sector becomes full to defragment the sector and gain free space again.

A fair metaphor for this could be a sheet of paper: you can add new entries at the end and strike out existing ones. You cannot update an existing entry though.

In the following, we refer to one or more of such Flash sectors that are used together (e.g., erased at the same time) as a bank in reference to the 80’s area banks-switched memory. We will use two banks: one is the ‘current’ bank where all data is kept and where new entries are appended, and a ‘free’ bank, that is used to defragment the current bank once it gets full.

This high-level view glosses over a few details: – how to migrate one bank to another? – how to detect the current bank on boot? – how to deal with a reset during Flash writes? We’ll get back to these in a bit.

Tag-Length-Value (TLV) Storage

A common way to store a list of entries in memory or disk is the TLV format.

---------------------------
| Tag 1 | Len 1 | Value 1 |
---------------------------
| Tag 2 | Len 2 | Value 2 |
---------------------------
| ...   | ...   | ...     |
---------------------------
| Tag n | Len n | Value n |
---------------------------

First, an identifier (tag) is stored, then the length of the value and then the value itself. Each value can have a different length.

This provides a basic key-value store. To find an entry, it’s necessary to iterate over all entries using the length field to find the next entry, but only the tag needs to be read.

With such a TLV storage engine, we implemented btstack_link_key_db.h and le_device_db.h on top of the TLV storage. We use the tag format ‘BTLx to store up to 256 link keys and the tag format ‘BTMx‘ to store up to 256 LE Device DB entries.

Architecture Diagram

Overall, we end up with this architecture:

 ---------------------------------------------------------------------------------
|             Link Key DB TLV             |             LE Device DB              |
|    (implements btstack_link_key_db.h)   |      (implements le_device_db.h)      |
----------------------------------------------------------------------------------
|                          TLV DB (implements btstack_tlv.h)                      |
----------------------------------------------------------------------------------
|                      HAL Flash Bank (implements hal_flash_bank.h)               |   
 ---------------------------------------------------------------------------------

Let’s start with the interfaces for the higher layer btstack_tlv.h and then lower layer hal_flash_bank.h.

btstack_tlv.h: Interface for Storing/Retrieving TLV Entries

As erased Flash has all bits set, we define 0xffffffff as empty tag. Similarly, every byte in Flash memory can be overwritten by 0x00, so we define 0x0000000 as invalid tag.

The btstack_tlv_t API in btstack_tlv.h is straight forward as it only provides operations to get/put/delete an entry by its tag.

/**
 * Get Value for Tag
 * @param context
 * @param tag
 * @param buffer
 * @param buffer_size
 * @returns size of value
 */
int (*get_tag)(void * context, uint32_t tag, uint8_t * buffer, uint32_t buffer_size);

/**
 * Store Tag
 * @param context
 * @param tag
 * @param data
 * @param data_size
 * @returns 0 on success
 */
int (*store_tag)(void * context, uint32_t tag, const uint8_t * data, uint32_t data_size);

/**
 * Delete Tag
 * @param context
 * @param tag
 */
void (*delete_tag)(void * context, uint32_t tag);

hal_flash_bank.h: Interface for Erasing Banks, Read/Write Data

In general Flash memory in common microcontrollers supports the following three operations: – erase larger block of Flash: sets all bits to ‘1’ = all bytes to 0xff – write one or more bytes once after erase – overwrite one or more byte with ‘0’ = all bytes to 0xff

On the Maxim 326330 (ARM Cortex M4F) writes can only performed on 32-bit boundaries and only in multiples of 32-bit words. We added a get_alignment() function to check if alignment requirements are met.

To be able to defragment the data in one bank, we need to have an additional empty bank to copy the valid data to. So, the API needs to support multiple banks.

We came up with the hal_flash_bank_t interface in hal_flash_bank.h.

/**
 * Get size of flash banks
 */
uint32_t (*get_size)(void * context);

/**
 * Get flash read/write alignment requirements
 */
uint32_t (*get_alignment)(void * context);

/**
 * Erase a bank
 * @param context
 * @param bank
 */
void (*erase)(void * context, int bank);

/**
 * Read from flash into provided buffer
 * @param context
 * @param bank
 * @param offset into flash bank
 * @param buffer to read data
 * @param size of data to read
 */
void (*read)(void * context, int bank, uint32_t offset, uint8_t * buffer, uint32_t size);

/**
 * Write data into flash. Each offset can only be written once after bank was erased
 * @param context
 * @param bank
 * @param offset into flash bank
  to read data
 * @param size of data to store
 */
void (*write)(void * context, int bank, uint32_t offset, const uint8_t * data, uint32_t size);

The implementation of interface on top of the vendor-provided peripheral libraries is simple in most cases. Here are two examples:

Implementation of btstack_tlv.h based on HAL Flash Bank

Finally, let’s have a look into the implementation of btstack_tlv_t based on the hal_flash_bank.h interface – and all the details skipped over before.

Our TLV implementation uses two banks: one is the ‘current’ bank where all data is kept and where new entries are appended, and a ‘free’ bank, that is used to defragment the current bank once it gets full.

We start by figuring out which bank is currently used to read and append new entries.

Detecting the current bank

After bootup, we first need to check which banks are valid, and if more than one are valid, which one contains the newest data.

To identify a valid bank, we use the first 8 bytes for a bank header and store the string ‘BTstack’ (in lack of a better idea) in it. Checking which bank is valid is reduced to read the bank header and check for the expected string.

In order to find out which one has the newest data, a simple counter can be used and also stored in the header. If the current bank gets full, we copy all data over to the free bank and increase the counter by one. Comparing the counters will tell us which one is newer.

There’s a minor caveat: every counter potentially overflows (although the MCU probably will die before that happens, if we would use a 32-bit counter). Instead, we wonder what’s the minimum amount of bits to detect correctly which is the newer bank if we assume that the counter is incremented by one by each migration.

With a bit of trial and error and scrap paper, we figured out that a 2-bit counter is sufficient for this. Let’s see how this works out:

Step Bank A Counter Bank B Counter Current Bank
0 invalid invalid none
1 0 invalid A
2 0 1 B
3 2 1 A
4 2 3 B
5 0 3 A
6 0 1 B
.. .. .. ..

If one of the banks is valid and the other not, the valid bank is newer. If the counter(bank A) == counter(bank B) + 1 modulo 4, then Bank A is newer, otherwise Bank B is newer.

Here’s the complete code. We’ve called the counters ‘epoch’.

// check both banks for headers and pick the one with the higher epoch % 4
// @returns bank or -1 if something is invalid
static int btstack_tlv_flash_bank_get_latest_bank(btstack_tlv_flash_bank * self){
    uint8_t header0[BTSTACK_TLV_HEADER_LEN];
    uint8_t header1[BTSTACK_TLV_HEADER_LEN];
    self->hal_flash_bank_impl->read(self->hal_flash_bank_context, 0, 0, &header0[0], BTSTACK_TLV_HEADER_LEN);
    self->hal_flash_bank_impl->read(self->hal_flash_bank_context, 1, 0, &header1[0], BTSTACK_TLV_HEADER_LEN);
    int valid0 = memcmp(header0, btstack_tlv_header_magic, BTSTACK_TLV_HEADER_LEN-1) == 0;
    int valid1 = memcmp(header1, btstack_tlv_header_magic, BTSTACK_TLV_HEADER_LEN-1) == 0;
    if (!valid0 && !valid1) return -1;
    if ( valid0 && !valid1) return 0;
    if (!valid0 &&  valid1) return 1;
    int epoch0 = header0[BTSTACK_TLV_HEADER_LEN-1] & 0x03;
    int epoch1 = header1[BTSTACK_TLV_HEADER_LEN-1] & 0x03;
    if (epoch0 == ((epoch1 + 1) & 0x03)) return 0;
    if (epoch1 == ((epoch0 + 1) & 0x03)) return 1;
    return -1;  // invalid, must not happen
}

Get Tag

Getting a tag is straight forward as all tags are kept unique: we iterate over the Flash memory and return when we find the tag that we’re looking for.

static int btstack_tlv_flash_bank_get_tag(void * context, uint32_t tag, uint8_t * buffer, uint32_t buffer_size){

    btstack_tlv_flash_bank_t * self = (btstack_tlv_flash_bank_t *) context;

    // abort if data size not aligned with flash requirements
    if (!btstack_tlv_flash_bank_verify_alignment(self, buffer_size)) return 0;

    uint32_t tag_index = 0;
    uint32_t tag_len   = 0;
    tlv_iterator_t it;
    btstack_tlv_flash_bank_iterator_init(self, &it, self->current_bank);
    while (btstack_tlv_flash_bank_iterator_has_next(self, &it)){
        if (it.tag == tag){
            log_info("Found tag '%x' at position %u", tag, it.offset);
            tag_index = it.offset;
            tag_len   = it.len;
            break;
        }
        tlv_iterator_fetch_next(self, &it);
    }
    if (tag_index == 0) return 0;
    if (!buffer) return tag_len;
    int copy_size = btstack_min(buffer_size, tag_len);
    self->hal_flash_bank_impl->read(self->hal_flash_bank_context, self->current_bank, tag_index + 8, buffer, copy_size);
    return copy_size;
}

Deleting Tag

To delete all entries with a given tag, we iterate over all entries and overwrite the tag field with the invalid tag (0x00000000) – which is always possible with Flash memory.

Store/Update Tag

To store a new tag, we first need to check if we have enough space left. If not, we trigger a migration. If that doesn’t help, we return an error.

Then, to avoid incomplete data in the Flash, we first write the value and only then write the tag and the length.

Finally, we delete all entries before the new one.

static int btstack_tlv_flash_bank_store_tag(void * context, uint32_t tag, const uint8_t * data, uint32_t data_size){

    btstack_tlv_flash_bank_t * self = (btstack_tlv_flash_bank_t *) context;

    // abort if data size not aligned with flash requirements
    if (!btstack_tlv_flash_bank_verify_alignment(self, data_size)) return 0;

    if (self->write_offset + 8 + data_size > self->hal_flash_bank_impl->get_size(self->hal_flash_bank_context)){
        btstack_tlv_flash_bank_migrate(self);
    }

    // check again
    if (self->write_offset + 8 + data_size > self->hal_flash_bank_impl->get_size(self->hal_flash_bank_context)) return 0;

    // prepare entry
    uint8_t entry[8];
    big_endian_store_32(entry, 0, tag);
    big_endian_store_32(entry, 4, data_size);

    log_info("write '%x', len %u at %u", tag, data_size, self->write_offset);

    // write value first
    self->hal_flash_bank_impl->write(self->hal_flash_bank_context, self->current_bank, self->write_offset + 8, data, data_size);

    // then entry
    self->hal_flash_bank_impl->write(self->hal_flash_bank_context, self->current_bank, self->write_offset, entry, sizeof(entry));

    // overwrite old entries (if exists)
    btstack_tlv_flash_bank_delete_tag_until_offset(self, tag, self->write_offset);

    // done
    self->write_offset += sizeof(entry) + data_size;

    return 1;
}

Migration

Now, if tags are updated over and over, or if the bank is rather small, a migration will become necessary. As all entries are unique, the migration simple iterates over the stored entries and copies all non-deleted entries into the other bank. After that, it increments the bank epoch and writes the bank header with the current epoch.

static void btstack_tlv_flash_bank_migrate(btstack_tlv_flash_bank_t * self){

    int next_bank = 1 - self->current_bank;
    log_info("migrate bank %u -> bank %u", self->current_bank, next_bank);
    // erase bank (if needed)
    btstack_tlv_flash_bank_erase_bank(self, next_bank);
    int next_write_pos = 8;

    tlv_iterator_t it;
    btstack_tlv_flash_bank_iterator_init(self, &it, self->current_bank);
    while (btstack_tlv_flash_bank_iterator_has_next(self, &it)){
        // skip deleted entries
        if (it.tag) {
            uint32_t tag_len = it.len;
            uint32_t tag_index = it.offset;

            // copy
            int bytes_to_copy = 8 + tag_len;
            log_info("migrate pos %u, tag '%x' len %u -> new pos %u", tag_index, it.tag, bytes_to_copy, next_write_pos);
            uint8_t copy_buffer[32];
            while (bytes_to_copy){
                int bytes_this_iteration = btstack_min(bytes_to_copy, sizeof(copy_buffer));
                self->hal_flash_bank_impl->read(self->hal_flash_bank_context, self->current_bank, tag_index, copy_buffer, bytes_this_iteration);
                self->hal_flash_bank_impl->write(self->hal_flash_bank_context, next_bank, next_write_pos, copy_buffer, bytes_this_iteration);
                tag_index      += bytes_this_iteration;
                next_write_pos += bytes_this_iteration;
                bytes_to_copy  -= bytes_this_iteration;
            }
        }
        tlv_iterator_fetch_next(self, &it);
    }

    // prepare new one
    uint8_t epoch_buffer;
    self->hal_flash_bank_impl->read(self->hal_flash_bank_context, self->current_bank, BTSTACK_TLV_HEADER_LEN-1, &epoch_buffer, 1);
    btstack_tlv_flash_bank_write_header(self, next_bank, (epoch_buffer + 1) & 3);
    self->current_bank = next_bank;
    self->write_offset = next_write_pos;
}

Handling Crashes during Write Operation

Now, as we have all functionality, let’s look at potential error cases.

Each update first writes the value and then the tag-length header. A crash before the tag-length header was written would not invalidate the previous value, but it would write some bytes that cannot be (over)written correctly later.

Similar, if the there’s a crash after the write is complete but before the value was deleted, there would be two entries for the same value in the Flash storage. That’s not a problem by itself, but it’s not elegant.

To fix the latter, we decided to delete all entries for the last written tag on init. This requires only a single iteration over the Flash storage. Then, to address the former, we check that all Flash from the write offset until the end is erased. If not, a migration is be triggered.

btstack_tlv.h in ESP32 Port

A quick look into the esp-idf documentations reveals an API for non-volatile flash. It is very similar to our btstack_tlv.h interface and allows read/write/delete of key-value pairs. As the keys in the nvs-flash API cannot contain zero bytes, we use the hex representation of our tags: e.g. ‘BTL0’ becomes “4253343430”. We’ve implemented the btstack_tlv.h directly as btstack_tlv_flash_bank_esp32.c.

A2DP Sink and Source on STM32 F4 Discovery Board
USB Protocol Analyzer for Bluetooth Communication Logging