BTstack Port for STM32 USB

In this blog post, we provide an overview of how to use Bluetooth Controllers via USB and report on our journey of porting BTstack to the STM32 F4 Discovery board, which has a USB On-The-Go (OTG) interface.

Using a USB Bluetooth Controller is more complex than using one connected via UART, mainly because it requires a USB Host Stack. However, such a controller comes in handy when we need to a) add Bluetooth functionality to existing devices with a USB port, and/or b) avoid radio emission testing for low-volume products, as the device is shipped without RF functionality.

As there were requests in the past for a BTstack USB port for MCUs, and a few implementations have been created by the BTstack community, we’ve decided to create such a port from scratch to learn about the challenges. For this port, we’ve selected the STM32 F4 Discovery board that we’ve used before. Starting with an existing port has the benefit that everything but the Bluetooth transport has already been taken care of. To keep the flexibility, we avoided using an RTOS; an RTOS could still be added later if needed.

Scenario

In a nutshell, we have a USB Bluetooth dongle in hand and need to understand how this piece of hardware works. We also need a USB Host Stack on a microcontroller with a USB port. After we choose the development tools, we are ready to draft a rough plan for our porting excursion.

What is a USB Bluetooth Dongle

As a basis for this port, we’ve relied on the excellent USB in a Nutshell guide and the Bluetooth Core Specification. From the documentation, we learned that a USB Bluetooth dongle:

  • is a USB Full-Speed (FS) device,
  • can be recognized by its USB Device class of 0xE0, and
  • uses four Endpoints.

There is one USB endpoint for each Bluetooth HCI packet type:

  • HCI Commands are sent via the USB Control endpoint,
  • HCI Events are received via a USB Interrupt Endpoint and
  • HCI ACL packets are exchanged over USB Bulk Endpoints, one for each direction.
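
The mapping above can be written down as a small lookup. This is only a sketch: the packet type values are the ones BTstack uses for HCI packets, while the enum and the helper `usb_xfer_for_hci_packet` are hypothetical names made up for this illustration.

```c
#include <stdint.h>

/* HCI packet types as used by BTstack */
#define HCI_COMMAND_DATA_PACKET 0x01
#define HCI_ACL_DATA_PACKET     0x02
#define HCI_EVENT_PACKET        0x04

/* USB transfer types, numbered like the type bits in an endpoint's bmAttributes */
typedef enum {
    USB_XFER_CONTROL   = 0,
    USB_XFER_BULK      = 2,
    USB_XFER_INTERRUPT = 3,
    USB_XFER_NONE      = 0xFF   /* hypothetical marker: packet type not handled here */
} usb_xfer_type_t;

/* hypothetical helper: which USB transfer type carries a given HCI packet type */
static usb_xfer_type_t usb_xfer_for_hci_packet(uint8_t packet_type){
    switch (packet_type){
        case HCI_COMMAND_DATA_PACKET: return USB_XFER_CONTROL;    /* host to controller only */
        case HCI_ACL_DATA_PACKET:     return USB_XFER_BULK;       /* one endpoint per direction */
        case HCI_EVENT_PACKET:        return USB_XFER_INTERRUPT;  /* controller to host only */
        default:                      return USB_XFER_NONE;       /* e.g. SCO, not covered here */
    }
}
```

Control plus Interrupt plus two Bulk endpoints gives the four endpoints listed above.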

USB Host Stack

While the STM32 F4 has a USB peripheral built in, a USB Host Stack is required to actually use USB. Such a stack has to handle at least the device management: detect a new device, enumerate it, query device descriptors, etc. ST provides a USB Host stack as part of their STM32 Cube environment, and we’ve decided to use it for this port. When looking for alternatives, we found the TinyUSB project, which looks quite promising if you want to implement a USB Device. Unfortunately, it seems that it currently supports USB Host only on a few NXP LPC MCUs and the new Raspberry Pi Pico. TinyUSB even provides basic Hub (one level) support, which isn’t the case with the ST stack.

Tool Chest

Now, how do we get started? Well, let’s start by collecting the development tools:

We have a TotalPhase Beagle USB 12 Protocol Analyzer at hand. We didn’t try it, but the Saleae Logic analyzer seems to support USB decoding as well, which would be better than trying to use USB blindly, i.e. only with the debug information on the MCU itself. Given the issues we’ve seen, we would not suggest trying to implement a new USB device class without a USB protocol analyzer.

Plan

With the tools in place, the rough plan is:

  • Step 1: Try HID Host example from STM32 Cube MX.
  • Step 2: Create Bluetooth class driver and dump USB configuration.
  • Step 3: Test sending HCI Reset and receive HCI Command Complete.
  • Step 4: Integrate in BTstack run loop and implement hci_transport.h interface for STM32 Cube USB Host.

Step 1: HID Host example from STM32 Cube MX

Let’s start with a fresh project for the F4 Discovery board (STM32F407G-DISC1) using STM32 Cube MX and configure the board for our use. Before we let Cube MX generate code, we need to perform the basic configuration of our project. As the last step, we ensure that we have debug messages in place.

STM32 Cube MX: Pinout & Configuration Tab

In the Pinout & Configuration tab, we first go to ‘System Core->RCC’ and set the High Speed Clock (HSE) to ‘Crystal/Ceramic Resonator’, as USB requires a 48 MHz clock.

Then, in Connectivity->USB_OTG_FS, we select Host_Only mode and enable Activate_VBUS. While there, we also enable the ‘USB On-the-Go FS global interrupt’ in the NVIC Interrupt Table.

Finally for this tab, we click on Middleware->USB_HOST and, for now, select “Human Interface Device” in the upper Mode frame. In the lower Configuration frame, we click on the Platform Settings tab, where a warning shows that DRIVE_VBUS_FS is undefined. To fix this, we select GPIO:Output via PC0: [OTG_FS_PowerSwitchOn] for the Drive_VBUS_FS row.

STM32 Cube MX: Clock Configuration Tab

Next, we go to the Clock Configuration tab. There’s a warning about Clock issues and we let the tool do its work. It sets the SYSCLK to 72 MHz and correctly configures the 48 MHz clock.

Test Run and USB Debug Messages

With this basic configuration done, we press on “Generate Code” and are curious how this worked out.

After running make, we start Ozone, select STM32F407VG as MCU and the J-Link OB as debug interface, pick the compiled .elf file, and realize that we didn’t configure any output. So we browse around and find some (disabled) debug messages (USBH_UsrLog) in the usbh_core.c file. Putting some breakpoints near these debug messages allowed us to verify that the code is reached after plugging in a USB mouse.

To get debug messages, as a first step we add support for SEGGER RTT by adding the files SEGGER_RTT.c and SEGGER_RTT_Syscalls_GCC.c from the J-Link support package to the test project. After that, printf("Hello RTT!\n") already works in the Ozone RTT terminal.

To actually enable these USB debug messages, we need to set USBH_DEBUG_LEVEL to 3 in the usbh_conf.h file. Another way to do this would be to update the USB_HOST parameter settings in Cube MX; next time, we’ll update it there. Looking at the other settings in usbh_conf.h, we also see that USBH_MAX_NUM_ENDPOINTS (maximum number of supported endpoints) is only set to 2. We quickly update this to 4 for Bluetooth.

We get the following in Ozone’s Terminal:

Hello RTT
USB Device Connected
USB Device Reset Completed
PID: 608dh
VID: 17efh
Address (#1) assigned.
Manufacturer : PixArt
Product : Lenovo USB Optical Mouse
Serial Number : N/A
Enumeration done.
This device has only 1 configuration.
Default configuration set.
Device remote wakeup enabled
Switching to Interface (#0)
Class    : 3h
SubClass : 1h
Protocol : 2h
Mouse device found!
HID class started.

That’s rather promising. Without having written any code or looking at the HID Class documentation, it’s no surprise that nothing happens when the mouse is moved or buttons are clicked.

We consider this first test of the vendor-provided USB Host stack and HID Class driver to be successful and move on to implementing a Bluetooth Class driver.

Step 2: Bluetooth Class Driver

Reading through the USB Host library application note, we learn that we need to implement a USBH_ClassTypeDef, which contains a name, the USB Class Code, and several functions that are called from the USB Host driver. Aside from Init and DeInit, there is Requests, which is called during class initialization, and BgndProcess, which is polled from the core state machine. That’s where the work will be done. SOFProcess is called from the Start-Of-Frame interrupt handler, but can be left empty.

/* USB Host Class structure */
typedef struct {
    const char          *Name;
    uint8_t              ClassCode;
    USBH_StatusTypeDef(*Init)(struct _USBH_HandleTypeDef *phost);
    USBH_StatusTypeDef(*DeInit)(struct _USBH_HandleTypeDef *phost);
    USBH_StatusTypeDef(*Requests)(struct _USBH_HandleTypeDef *phost);
    USBH_StatusTypeDef(*BgndProcess)(struct _USBH_HandleTypeDef *phost);
    USBH_StatusTypeDef(*SOFProcess)(struct _USBH_HandleTypeDef *phost);
    void                *pData;
} USBH_ClassTypeDef;

So, we’ve implemented this:

#define USB_BLUETOOTH_CLASS 0xE0U
USBH_ClassTypeDef Bluetooth_Class = {
    "Bluetooth",
    USB_BLUETOOTH_CLASS,
    USBH_Bluetooth_InterfaceInit,
    USBH_Bluetooth_InterfaceDeInit,
    USBH_Bluetooth_ClassRequest,
    USBH_Bluetooth_Process,
    NULL,
    NULL,
};

With that in place, we can add our new usbh_bluetooth.c implementation to the project and update the MX_USB_HOST_Init function in usb_host.c to register our Bluetooth_Class implementation instead of the USBH_HID_CLASS.
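
To illustrate why the class code in this structure matters, here is a simplified model of how a host core could pick a class driver for a newly enumerated device. It is not ST's code: `class_driver_t`, `register_class` and `find_class` are hypothetical names, and the real stack works on USBH_ClassTypeDef and its own handle type.

```c
#include <stdint.h>
#include <stddef.h>

/* hypothetical, stripped-down stand-in for a class driver entry */
typedef struct {
    const char * name;
    uint8_t      class_code;
    int        (*init)(void);
} class_driver_t;

#define MAX_CLASS_DRIVERS 4
static const class_driver_t * registered[MAX_CLASS_DRIVERS];
static size_t num_registered;

/* remember a driver; returns -1 if the table is full */
static int register_class(const class_driver_t * driver){
    if (num_registered >= MAX_CLASS_DRIVERS) return -1;
    registered[num_registered++] = driver;
    return 0;
}

/* return the first registered driver whose class code matches, or NULL */
static const class_driver_t * find_class(uint8_t class_code){
    for (size_t i = 0; i < num_registered; i++){
        if (registered[i]->class_code == class_code) return registered[i];
    }
    return NULL;
}

static int bluetooth_init(void){ return 0; }   /* dummy init for the sketch */
static const class_driver_t bluetooth_driver = { "Bluetooth", 0xE0, bluetooth_init };
```

With only the Bluetooth driver registered, a lookup for 0xE0 succeeds, while the mouse from Step 1 (class 3h) would find no driver.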

Inspecting USB Interfaces and Endpoints

The next goal is to get it to compile and dump the list of available interfaces and endpoints. For this, we’ll start with USBH_Bluetooth_InterfaceInit and provide dummy functions that print a log message with their name and return USBH_OK.

After replacing the USB Mouse with the Bluetooth dongle, we try this code:

USBH_StatusTypeDef USBH_Bluetooth_InterfaceInit(USBH_HandleTypeDef *phost){
    log_info("USBH_Bluetooth_InterfaceInit");
    // dump everything
    uint8_t num_interfaces = phost->device.CfgDesc.bNumInterfaces;
    uint8_t interface_index;
    for (interface_index=0;interface_index<num_interfaces;interface_index++){
        USBH_InterfaceDescTypeDef * interface = &phost->device.CfgDesc.Itf_Desc[interface_index];
        uint8_t num_endpoints = interface->bNumEndpoints;
        uint8_t ep_index;
        for (ep_index=0;ep_index<num_endpoints;ep_index++){
            USBH_EpDescTypeDef * ep_desc = &interface->Ep_Desc[ep_index];
            printf("Interface %u, endpoint #%u: address 0x%02x, attributes 0x%02x\n", 
            interface_index, ep_index, ep_desc->bEndpointAddress, ep_desc->bmAttributes);
        }
    }
    return USBH_OK;
}

and get this output:

USBH_Bluetooth_InterfaceInit
Interface 0, endpoint #0: address 0x81, attributes 0x03
Interface 0, endpoint #1: address 0x02, attributes 0x02
Interface 0, endpoint #2: address 0x82, attributes 0x02
Interface 1, endpoint #0: address 0x03, attributes 0x01
Interface 1, endpoint #1: address 0x83, attributes 0x01
Bluetooth class started.
USBH_Bluetooth_ClassRequest

Our USB Class was loaded and, as expected, it lists an interrupt endpoint (0x81) for HCI Events and the two bulk endpoints 0x02 and 0x82 for ACL Out and In, respectively, on Interface #0. Interface #1 shows the two isochronous endpoints for SCO Out and In. We’ll ignore the endpoints for SCO for the rest of this blog post.

HCI Commands are sent over the USB Control Endpoint with address 0x00 which is mandatory (it is not listed here, as it is already used by the stack to get the list of endpoints).

With this knowledge, we can add code to identify these.

// type interrupt, direction incoming
if  (((ep_desc->bEndpointAddress & USB_EP_DIR_MSK) == USB_EP_DIR_MSK) && (ep_desc->bmAttributes == USB_EP_TYPE_INTR)){
    puts("-> HCI Event");
}
// type bulk, direction incoming
if  (((ep_desc->bEndpointAddress & USB_EP_DIR_MSK) == USB_EP_DIR_MSK) && (ep_desc->bmAttributes == USB_EP_TYPE_BULK)){
    puts("-> HCI ACL IN");
}
// type bulk, direction outgoing
if  (((ep_desc->bEndpointAddress & USB_EP_DIR_MSK) == 0) && (ep_desc->bmAttributes == USB_EP_TYPE_BULK)){
    puts("-> HCI ACL OUT");
}

We then get:

USBH_Bluetooth_InterfaceInit
Interface 0, endpoint #0: address 0x81, attributes 0x03
-> HCI Event
Interface 0, endpoint #1: address 0x02, attributes 0x02
-> HCI ACL OUT
Interface 0, endpoint #2: address 0x82, attributes 0x02
-> HCI ACL IN
Interface 1, endpoint #0: address 0x03, attributes 0x01
Interface 1, endpoint #1: address 0x83, attributes 0x01
Bluetooth class started.
USBH_Bluetooth_ClassRequest
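
The same classification can be tried on a PC without the USB Host stack. The sketch below is a standalone variant with hypothetical names (`ep_role_t`, `classify_endpoint`). Unlike the snippet above, it masks bmAttributes down to the two transfer type bits before comparing, because isochronous endpoints may use the upper bits for synchronization and usage information.

```c
#include <stdint.h>

#define EP_DIR_IN_MASK 0x80u   /* bit 7 of bEndpointAddress: 1 = IN (device to host) */
#define EP_TYPE_MASK   0x03u   /* bits 1..0 of bmAttributes: transfer type */

typedef enum {
    EP_ROLE_OTHER,
    EP_ROLE_HCI_EVENT,
    EP_ROLE_ACL_IN,
    EP_ROLE_ACL_OUT
} ep_role_t;

/* hypothetical helper: map an endpoint descriptor to its role for HCI */
static ep_role_t classify_endpoint(uint8_t address, uint8_t attributes){
    int is_in = (address & EP_DIR_IN_MASK) != 0;
    switch (attributes & EP_TYPE_MASK){
        case 0x03:  /* interrupt */
            return is_in ? EP_ROLE_HCI_EVENT : EP_ROLE_OTHER;
        case 0x02:  /* bulk */
            return is_in ? EP_ROLE_ACL_IN : EP_ROLE_ACL_OUT;
        default:    /* control (0) and isochronous (1) are not classified here */
            return EP_ROLE_OTHER;
    }
}
```

Fed with the five address/attribute pairs from the dump, it reports the same three roles on Interface #0 and leaves the two isochronous endpoints on Interface #1 unclassified.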

After querying the USB descriptor to identify all endpoints for Bluetooth, we can move on and try to actually send an HCI Command.

Step 3: Send HCI Reset and Receive HCI Command Complete

In all Bluetooth ports, sending HCI Reset and receiving the corresponding event is usually the first milestone: up to this point, it was merely setup (toolchain, platform, USB/UART driver…), but now we’re getting into the Bluetooth realm.

In the previous section, we’ve identified the necessary endpoints. In order to use them later, we follow the vendor examples and store all info about them in a single struct and store a pointer to it in our BluetoothClass instance.

typedef struct {
    uint8_t acl_in_ep;
    uint8_t acl_in_pipe;
    uint16_t acl_in_len;
    uint8_t acl_out_ep;
    uint8_t acl_out_pipe;
    uint16_t acl_out_len;
    uint8_t event_in_ep;
    uint8_t event_in_pipe;
    uint16_t event_in_len;
} USB_Bluetooth_t;
...
USBH_StatusTypeDef USBH_Bluetooth_InterfaceInit(USBH_HandleTypeDef *phost){
    ...
    // setup
    memset(&usb_bluetooth, 0, sizeof(USB_Bluetooth_t));
    phost->pActiveClass->pData = (void*) &usb_bluetooth;
    ..
}

To send an HCI Command, we need to send a request to the Control endpoint. The Bluetooth specification describes how to set bmRequestType, bRequest, wValue, and wLength. We then send the request via the USBH_CtlReq function and provide a pointer to our HCI Reset Command.

static const uint8_t hci_reset[] = { 0x03, 0x0c, 0x00};
...
USBH_StatusTypeDef USBH_Bluetooth_Process(USBH_HandleTypeDef *phost){
    //  log_info("USBH_Bluetooth_Process");
    static int state = 0;
    USBH_StatusTypeDef status;
    switch (state){
        case 0:
            // just send HCI Reset naively
            phost->Control.setup.b.bmRequestType = USB_H2D | USB_REQ_RECIPIENT_INTERFACE | USB_REQ_TYPE_CLASS;
            phost->Control.setup.b.bRequest = 0;
            phost->Control.setup.b.wValue.w = 0;
            phost->Control.setup.b.wIndex.w = 0U;
            phost->Control.setup.b.wLength.w = sizeof(hci_reset);
            status = USBH_CtlReq(phost, (uint8_t *)  hci_reset, sizeof(hci_reset));
            if (status == USBH_OK) {
                puts("HCI Reset Sent");
                state++;
            }
            break;
        default:
            break;
    }
    return USBH_OK;
}

The USBH_CtlReq function is a bit peculiar, as it actually implements a state machine and has to be called repeatedly until it returns USBH_OK. It feels weird to set up the request and pass in the data on every call; if polling is needed, we would have expected one call to set up the request and another call to poll/check whether it has completed. So, we add a variable to track the state and verify that the HCI Reset is sent. Indeed, after calling USBH_CtlReq repeatedly, it’s sent via USB.

USB HCI Reset
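
For reference, the bytes involved can be spelled out by hand. The sketch below builds the 8-byte SETUP packet with the same request flags as the code above, and composes an HCI opcode from its OGF and OCF parts. The helper names are hypothetical and nothing here depends on the ST stack.

```c
#include <stdint.h>
#include <string.h>

/* bmRequestType bit fields as defined by the USB specification */
#define REQ_DIR_HOST_TO_DEVICE  0x00u
#define REQ_TYPE_CLASS          0x20u
#define REQ_RECIPIENT_INTERFACE 0x01u

/* hypothetical helper: fill the SETUP packet that precedes an HCI Command */
static void hci_command_setup_packet(uint8_t setup[8], uint16_t command_len){
    memset(setup, 0, 8);                        /* bRequest, wValue and wIndex stay 0 */
    setup[0] = REQ_DIR_HOST_TO_DEVICE | REQ_TYPE_CLASS | REQ_RECIPIENT_INTERFACE;
    setup[6] = (uint8_t)(command_len & 0xFF);   /* wLength, little endian */
    setup[7] = (uint8_t)(command_len >> 8);
}

/* an HCI opcode combines a 6-bit OGF and a 10-bit OCF */
static uint16_t hci_opcode(uint8_t ogf, uint16_t ocf){
    return (uint16_t)((ogf << 10) | ocf);
}
```

For HCI Reset (OGF 0x03, OCF 0x003), the opcode is 0x0C03, which is sent little endian as 03 0C and followed by a parameter length of 00. The flags combine to a bmRequestType of 0x21.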

Next, it would be great to also receive the corresponding HCI Command Complete event. For this, we need to store the information about the Interrupt endpoint in our Bluetooth_Class instance, and create a pipe.

USBH_StatusTypeDef USBH_Bluetooth_InterfaceInit(USBH_HandleTypeDef *phost){
...
        // type interrupt, direction incoming
        if  (((ep_desc->bEndpointAddress & USB_EP_DIR_MSK) == USB_EP_DIR_MSK) && (ep_desc->bmAttributes == USB_EP_TYPE_INTR)){
            event_in = ep_index;
        }
...
    // check that all endpoints were found
    if ((acl_in < 0) || (acl_out < 0) || (event_in < 0)) {
        log_info("Could not find all endpoints");
        return USBH_FAIL;
    }

    // Event In
    USB_Bluetooth_t * usb = &usb_bluetooth;
    usb->event_in_ep   = interface->Ep_Desc[event_in].bEndpointAddress;
    usb->event_in_len  = interface->Ep_Desc[event_in].wMaxPacketSize;
    usb->event_in_pipe = USBH_AllocPipe(phost, usb->event_in_ep);

    /* Open pipe for IN endpoint */
    USBH_OpenPipe(phost, usb->event_in_pipe, usb->event_in_ep, phost->device.address,
    phost->device.speed, USB_EP_TYPE_INTR, usb->event_in_len);

    // the data toggle is tracked per pipe, so pass the pipe number
    USBH_LL_SetToggle(phost, usb->event_in_pipe, 0U);
    return USBH_OK;
}

The semantics for receiving packets from an Interrupt endpoint seem straightforward: call USBH_InterruptReceiveData and then check the URB status until we get either a USBH_URB_NAK or a USBH_URB_DONE. Well, almost. Let’s see what happens.

USBH_StatusTypeDef USBH_Bluetooth_Process(USBH_HandleTypeDef *phost){
...
    switch (state){
    ...
        case 1:
            // schedule interrupt transfer
            USBH_InterruptReceiveData(phost, hci_event, (uint8_t) sizeof(hci_event), usb->event_in_pipe);
            state++;
            return USBH_BUSY;
        case 2:
            // poll URB state
            urb_state = USBH_LL_GetURBState(phost, usb->event_in_pipe);
            switch (urb_state){
                case USBH_URB_IDLE:
                    break;
                case USBH_URB_DONE:
                    state++;
                    puts("Data received");
                    break;
                case USBH_URB_NAK:
                    puts("Restart transfer");
                    state = 1;
                    break;
                default:
                    puts("other");
                    break;
            }
            break;
        ...
    }
...
}

When running this, neither “Data received” nor “Restart transfer” shows up in the console. Switching over to the USB Analyzer, we find a single Interrupt request, which is answered with a NAK from the Bluetooth Controller.

Browsing through the other USB class drivers provided by ST, we realize that they don’t check for USBH_URB_NAK anywhere. Instead, they store the frame number and restart the transfer two frames later.

Let’s try this.

USBH_StatusTypeDef USBH_Bluetooth_Process(USBH_HandleTypeDef *phost){
...
    switch (state){
    ...
       case 1:
            // schedule interrupt transfer
            USBH_InterruptReceiveData(phost, hci_event, (uint8_t) sizeof(hci_event), usb->event_in_pipe);
            usb->event_in_frame = phost->Timer;
            state++;
            break;
        case 2:
            // poll URB state
            urb_state = USBH_LL_GetURBState(phost, usb->event_in_pipe);
            switch (urb_state){
                case USBH_URB_DONE:
                    state++;
                    puts("Data received");
                    break;
                default:
                    break;
            }
            if ((phost->Timer - usb->event_in_frame) > 2){
                // restart request
                state = 1;
            }
            break;
        ...
    }
...
}

With this change, we see the Data received message and the USB analyzer shows that we received the event correctly.

USB HCI Command Complete Event

Because using the frame number seems a bit strange (after all, the USB analyzer shows that the Bluetooth Controller did respond with a NAK), we’ve also tried to modify the USB Host stack to update the URBState to USBH_URB_NAK to allow sending the transfer again. However, we couldn’t get it to work, so we accepted this idiosyncrasy of the STM32 USB Host stack, as it only introduces a minor delay of at most 1-2 milliseconds when receiving HCI Events.
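
One detail of the frame-based retry is worth a closer look: the comparison is done on the difference of two counter values. Assuming phost->Timer is a free-running unsigned 32-bit counter that advances once per frame (an assumption, check the type in your version of the stack), the subtraction stays correct when the counter wraps around. A minimal sketch with a hypothetical helper name:

```c
#include <stdint.h>
#include <stdbool.h>

/* true once more than 'frames' frames have passed since 'start';
   unsigned subtraction wraps modulo 2^32, so a counter overflow is harmless */
static bool frames_elapsed_more_than(uint32_t now, uint32_t start, uint32_t frames){
    return (uint32_t)(now - start) > frames;
}
```

For example, with start = 0xFFFFFFFE and now = 1, the difference is 3, so a check against 2 frames triggers even though now is numerically smaller than start.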

Step 4: Implement hci_transport.h Interface

Add BTstack

After successfully sending the HCI Reset Command, we can now drag the rest of BTstack into the project. For this, we can re-use much of the existing F4 Discovery port. In more detail:

  • we copy the existing port/main.c file, but delete the code that was responsible for communication with the Bluetooth Controller via UART,
  • we reuse the hal_flash_bank_stm32.c implementation,
  • as we need an hci_transport_t implementation, we create hci_transport_h2_stm32.h and hci_transport_h2_stm32.c, but only provide stub implementations for now; the transport is then used in the call to hci_init(..), and
  • we merge the Makefiles to build everything.

Run Loop Integration

To use the new hci_transport_h2_stm32.c with BTstack’s Run Loop, we set up a polling data source in its open function.

// data source for integration with BTstack Runloop
static btstack_data_source_t transport_data_source;
...
static int hci_transport_h2_stm32_open(void){
    // set up polling data_source
    btstack_run_loop_set_data_source_handler(&transport_data_source, &hci_transport_h2_stm32_process);
    btstack_run_loop_enable_data_source_callbacks(&transport_data_source, DATA_SOURCE_CALLBACK_POLL);
    btstack_run_loop_add_data_source(&transport_data_source);
    return 0;
}

The data source handler then points to the hci_transport_h2_stm32_process function, which polls the USB Host state machine.

static void hci_transport_h2_stm32_process(btstack_data_source_t *ds, btstack_data_source_callback_type_t callback_type) {
    switch (callback_type){
        case DATA_SOURCE_CALLBACK_POLL:
            MX_USB_HOST_Process();
            break;
        default:
            break;
    }
}

HCI Commands

With the USB Host state machine polled from BTstack’s run loop, we start to implement send/receive for the supported HCI packet types. Before we get to HCI Commands, we have one more thing to set up: the can-send-now function that HCI uses to check whether it can send the next packet of a specific type.

For this, we decide to support only a single outgoing packet at a time. This allows us to use a state variable for outgoing packets like this:

static enum {
    USBH_OUT_OFF,
    USBH_OUT_IDLE,
    USBH_OUT_CMD,
    USBH_OUT_ACL
} usbh_out_state;

It also simplifies the check if we can send a packet:

bool usbh_bluetooth_can_send_now(void){
    return usbh_out_state == USBH_OUT_IDLE;
}

To send, we just store the request:

void usbh_bluetooth_send_cmd(const uint8_t * packet, uint16_t len){
    btstack_assert(usbh_out_state == USBH_OUT_IDLE);
    cmd_packet = packet;
    cmd_len    = len;
    usbh_out_state = USBH_OUT_CMD;
}

In the existing USBH_Bluetooth_Process, we replace the placeholder state variable with the new usbh_out_state and use the stored pointer to the HCI Command. Finally, when the HCI Command was sent, we need to notify the higher layer.

USBH_StatusTypeDef USBH_Bluetooth_Process(USBH_HandleTypeDef *phost){
    USBH_StatusTypeDef status;
    switch (usbh_out_state){
        case USBH_OUT_CMD:
            // class request to the interface, carrying the stored HCI Command
            phost->Control.setup.b.bmRequestType = USB_H2D | USB_REQ_RECIPIENT_INTERFACE | USB_REQ_TYPE_CLASS;
            phost->Control.setup.b.bRequest = 0;
            phost->Control.setup.b.wValue.w = 0;
            phost->Control.setup.b.wIndex.w = 0U;
            phost->Control.setup.b.wLength.w = cmd_len;
            // USBH_CtlReq is a state machine: keep calling it until it returns USBH_OK
            status = USBH_CtlReq(phost, (uint8_t *) cmd_packet, cmd_len);
            if (status == USBH_OK) {
                usbh_out_state = USBH_OUT_IDLE;
                // notify host stack
                (*usbh_packet_sent)();
            }
            break;
        default:
            break;
    }
    return USBH_OK;
}

This already lets any example start up BTstack and send the HCI Reset Command.

Bluetooth class started.
[00:00:00.936] EVT <= 6E 00 
[00:00:00.936] CMD => 03 0C 00 
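
The interplay between the can-send-now check, the send function and the polled process function can be tried out in isolation. The following sketch replaces the USB transfer with a boolean flag and uses hypothetical names throughout. Unlike the code above, `send_cmd` reports a busy transport with a return value instead of an assert.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stddef.h>

typedef enum { OUT_OFF, OUT_IDLE, OUT_CMD } out_state_t;

static out_state_t     out_state = OUT_IDLE;
static const uint8_t * pending_packet;   /* packet owned by the caller until sent */
static uint16_t        pending_len;
static int             packets_sent;     /* counts "packet sent" notifications */

static bool can_send_now(void){
    return out_state == OUT_IDLE;
}

/* store the request; the actual transfer happens later in process() */
static bool send_cmd(const uint8_t * packet, uint16_t len){
    if (!can_send_now()) return false;
    pending_packet = packet;
    pending_len    = len;
    out_state      = OUT_CMD;
    return true;
}

/* stand-in for the polled process function:
   transfer_done mimics the control request reporting completion */
static void process(bool transfer_done){
    if ((out_state == OUT_CMD) && transfer_done){
        out_state = OUT_IDLE;
        packets_sent++;   /* here the real code notifies the HCI layer */
    }
}
```

While a command is pending, can_send_now() returns false, so the upper layer does not hand over a second packet before the first one has gone out.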

Next, we’d like to receive the response properly.

HCI Events

Similar to the HCI Command state, we define a state variable for HCI Events via the Interrupt Endpoint.

static enum {
    USBH_IN_OFF,
    USBH_IN_SUBMIT_REQUEST,
    USBH_IN_POLL,
} usbh_in_state;

We now get:

Bluetooth class started.
[00:00:00.935] EVT <= 6E 00 
[00:00:00.935] CMD => 03 0C 00 
[00:00:00.936] EVT <= 6E 00 
[00:00:01.050] EVT <= 0E 04 01 03 0C 00 
[00:00:01.050] CMD => 01 10 00 
[00:00:01.051] EVT <= 6E 00 
[00:00:01.051] EVT <= 0E 0C 01 01 10 00 06 BB 22 06 0A 00 BB 22 
[00:00:01.051] LOG -- hci.c.2117: Manufacturer: 0x000a
[00:00:01.051] CMD => 14 0C 00 
[00:00:01.053] EVT <= 6E 00 
[00:00:01.055] EVT <= 0E FC 01 14 0C 00 43 53 52 38 35 31 30 20 41 31 
[00:00:01.055] LOG -- hci.c.2261: event_handler called with packet of wrong size 16, expected 254 => dropping packet

The last event is the response to HCI Read Local Name, which has a total length of 254 bytes, but the USB driver only delivers the first 16 bytes. This reminds us that the max packet size of the Interrupt Endpoint is also 16. Following along, if the HCI Event is received in chunks of at most 16 bytes, how do we know that the packet is complete?

In addition, both the libusb and the winusb drivers automatically receive complete HCI Events. We don’t know exactly how this works, but we’ve got a hunch: if an Interrupt transfer is shorter than the max packet size, the packet is complete. If an Interrupt transfer has the maximal length, the packet might not be complete. To indicate the end of the packet, there should be a valid transfer with length zero. We’ve added a printf message to report a successful transfer with len = 0.

Nevertheless, we can still add a bit of logic to first receive the HCI Event header and then re-submit the transfer until the packet is complete. Here’s the final code to receive HCI Events:

USBH_URBStateTypeDef urb_state;
uint8_t  event_transfer_size;
uint16_t event_size;
switch (usbh_in_state){
    case USBH_IN_SUBMIT_REQUEST:
        event_transfer_size = btstack_min( usb->event_in_len, sizeof(hci_event) - hci_event_offset);
        USBH_InterruptReceiveData(phost, &hci_event[hci_event_offset], event_transfer_size, usb->event_in_pipe);
        usb->event_in_frame = phost->Timer;
        usbh_in_state = USBH_IN_POLL;
        break;
    case USBH_IN_POLL:
        urb_state = USBH_LL_GetURBState(phost, usb->event_in_pipe);
        switch (urb_state){
            case USBH_URB_IDLE:
                break;
            case USBH_URB_DONE:
                usbh_in_state = USBH_IN_SUBMIT_REQUEST;
                event_transfer_size = USBH_LL_GetLastXferSize(phost, usb->event_in_pipe);
                hci_event_offset += event_transfer_size;
                if (hci_event_offset < 2) break;
                event_size = 2 + hci_event[1];
                // event complete
                if (hci_event_offset >= event_size){
                    (*usbh_packet_received)(HCI_EVENT_PACKET, hci_event, event_size);
                    hci_event_offset = 0;
                }
                break;
            default:
                log_info("URB State Event: %02x", urb_state);
                break;
        }
        if ((phost->Timer - usb->event_in_frame) > 2){
            usbh_in_state = USBH_IN_SUBMIT_REQUEST;
        }
        break;
    default:
        break;
}
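
The reassembly logic itself does not depend on USB and can be exercised on its own. The sketch below keeps only the length handling: each received chunk is appended, and the event counts as complete once 2 + parameter length bytes have arrived. The names `event_buffer`, `event_offset` and `event_add_chunk` are hypothetical, and chunks are copied here, whereas the code above receives directly into the buffer.

```c
#include <stdint.h>
#include <string.h>

#define HCI_EVENT_MAX_SIZE 257   /* 2 byte header + up to 255 parameter bytes */

static uint8_t  event_buffer[HCI_EVENT_MAX_SIZE];
static uint16_t event_offset;

/* Append one received chunk. Returns the event size once the event is
   complete (it then starts at event_buffer[0]), otherwise 0. */
static uint16_t event_add_chunk(const uint8_t * chunk, uint16_t len){
    if (len > sizeof(event_buffer) - event_offset){
        event_offset = 0;                    /* would overflow: drop and resync */
        return 0;
    }
    memcpy(&event_buffer[event_offset], chunk, len);
    event_offset += len;
    if (event_offset < 2) return 0;          /* header not complete yet */
    uint16_t event_size = 2u + event_buffer[1];
    if (event_offset < event_size) return 0; /* more chunks needed */
    event_offset = 0;                        /* ready for the next event */
    return event_size;
}
```

For the 254-byte Read Local Name response and a max packet size of 16, this takes 16 transfers: 15 full chunks of 16 bytes and a final chunk of 14 bytes.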

With this, HCI Commands and HCI Events are working and BTstack fully starts up. Now we can run non-connecting examples, e.g. the gap_le_advertisements or the gap_inquiry example.

Great! We’re close, let’s continue with ACL In packets.

ACL In Packets

Going through the API and the other USB Class drivers, we learn that Bulk transfers don’t need any form of polling. The transfer is started and USBH_URB_DONE notifies us that it is complete. Now, as ACL packets can be longer than the max USB Bulk packet size, we need to implement support for re-assembling ACL packets, similar to the HCI Events.

A first sketch looks like this:

...
static uint16_t hci_acl_in_offset;
static uint8_t  hci_acl_in_buffer[HCI_INCOMING_PRE_BUFFER_SIZE + HCI_ACL_BUFFER_SIZE];
static uint8_t  * hci_acl_in_packet = &hci_acl_in_buffer[HCI_INCOMING_PRE_BUFFER_SIZE];
...
USBH_StatusTypeDef usbh_bluetooth_start_acl_in_transfer(USBH_HandleTypeDef *phost, USB_Bluetooth_t * usb){
    uint16_t acl_in_transfer_size = btstack_min(usb->acl_in_len, HCI_ACL_BUFFER_SIZE - hci_acl_in_offset);
    return USBH_BulkReceiveData(phost, &hci_acl_in_packet[hci_acl_in_offset], acl_in_transfer_size, usb->acl_in_pipe);
}
...
USBH_StatusTypeDef USBH_Bluetooth_InterfaceInit(USBH_HandleTypeDef *phost){
    ...
    // ACL In
    usb->acl_in_ep  =  interface->Ep_Desc[acl_in].bEndpointAddress;
    usb->acl_in_len =  interface->Ep_Desc[acl_in].wMaxPacketSize;
    usb->acl_in_pipe = USBH_AllocPipe(phost, usb->acl_in_ep);
    USBH_OpenPipe(phost, usb->acl_in_pipe, usb->acl_in_ep, phost->device.address, phost->device.speed, USB_EP_TYPE_BULK, usb->acl_in_len);
    USBH_LL_SetToggle(phost, usb->acl_in_pipe, 0U);
    hci_acl_in_offset = 0;
    usbh_bluetooth_start_acl_in_transfer(phost, usb);
    ...
}

USBH_StatusTypeDef USBH_Bluetooth_Process(USBH_HandleTypeDef *phost){
    ...
    // ACL In
    uint16_t acl_transfer_size;
    uint16_t acl_size;
    urb_state = USBH_LL_GetURBState(phost, usb->acl_in_pipe);
    switch (urb_state){
        case USBH_URB_IDLE:
        case USBH_URB_NOTREADY:
            break;
        case USBH_URB_DONE:
            acl_transfer_size = USBH_LL_GetLastXferSize(phost, usb->acl_in_pipe);
            hci_acl_in_offset += acl_transfer_size;
            if (hci_acl_in_offset < 4) break;
            acl_size = 4 + little_endian_read_16(hci_acl_in_packet, 2);
            // acl complete
            if (hci_acl_in_offset >= acl_size){
                (*usbh_packet_received)(HCI_ACL_DATA_PACKET, hci_acl_in_packet, acl_size);
                hci_acl_in_offset = 0;
            }
            usbh_bluetooth_start_acl_in_transfer(phost, usb);
            break;
        default:
            log_info("URB State ACL In: %02x", urb_state);
            break;
    }
    ...
}

To test ACL, we switch to the gatt_counter example, which implements a simple LE Peripheral that we can connect to from a smartphone. However, before we get to test the new ACL In functionality, the output on the RTT Terminal in Ozone does not stop and seems to repeat:

...
[00:00:01.340] CMD => 08 20 20 13 02 01 06 0B 09 4C 45 20 43 6F 75 6E 74 65 72 03 02 10 FF 00 00 00 00 00 00 00 00 00 00 00 00 
[00:00:01.345] EVT <= 6E 00 
[00:00:01.348] EVT <= 0E 04 01 08 20 00 
[00:00:01.349] CMD => 0A 20 01 01 
[00:00:01.351] EVT <= 6E 00 
[00:00:01.354] EVT <= 0E 04 01 0A 20 00 

 FF 00 00 00 00 00 00 00 00 00 00 00 00 
[00:00:01.345] EVT <= 6E 00 
[00:00:01.348] EVT <= 0E 04 01 08 20 00 
[00:00:01.349] CMD => 0A 20 01 01 
[00:00:01.351] EVT <= 6E 00 
[00:00:01.354] EVT <= 0E 04 01 0A 20 00 
 FF 00 00 00 00 00 00 00 00 00 00 00 00 

Looking closely at the timestamps, we realize that it repeats a part of the output buffer. Interestingly, when we move on and initiate a connection from the smartphone, we get new output, but it also starts to repeat.

0:00
[00:00:10.258] LOG -- sm.c.2120: device type 254, addr: 00:00:00:00:00:00
[00:00:10.260] LOG -- sm.c.2120: device type 254, addr: 00:00:00:00:00:00
[00:00:10.261] LOG -- sm.c.2120: device type 254, addr: 00:00:00:00:00:00
[00:00:10.262] LOG -- sm.c.2154: LE Device Lookup: not found
[00:00:10.263] EVT <= D6 09 44 00 01 25 8F A7 F4 02 54 
[00:00:10.265] LOG -- att_server.c.419: SM_EVENT_IDENTITY_RESOLVING_FAILED
[00:00:10.427] ACL <= 44 20 07 00 03 00 04 00 02 B9 00 
[00:00:10.427] EVT <= 78 02 04 00 
[00:00:10.427] ACL => 44 00 07 00 03 00 04 00 03 B9 00 
0:00
[00:00:10.258] LOG -- sm.c.2120: device type 254, addr: 00:00:00:00:00:00
[00:00:10.260] LOG -- sm.c.2120: device type 254, addr: 00:00:00:00:00:00
[00:00:10.261] LOG -- sm.c.2120: device type 254, addr: 00:00:00:00:00:00
[00:00:10.262] LOG -- sm.c.2154: LE Device Lookup: not found
[00:00:10.263] EVT <= D6 09 44 00 01 25 8F A7 F4 02 54 
[00:00:10.265] LOG -- att_server.c.419: SM_EVENT_IDENTITY_RESOLVING_FAILED
[00:00:10.427] ACL <= 44 20 07 00 03 00 04 00 02 B9 00 
[00:00:10.427] EVT <= 78 02 04 00 
[00:00:10.427] ACL => 44 00 07 00 03 00 04 00 03 B9 00 
...

While trying to solve this, we also disabled the call to enter sleep mode in the hal_cpu.h implementation. With this, the repetition was gone. It’s unclear what exactly happens here, especially as USB is functional and the MCU gets woken up at least every millisecond by the SOF interrupt. Nevertheless, we’re glad that we can move on by disabling the sleep mode temporarily.

Looking at the last snippet, we can also see that we have received an ACL packet – an ATT MTU Request.
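
To see why this is an ATT MTU Request, the received bytes can be decoded layer by layer: a 4-byte HCI ACL header, a 4-byte L2CAP header, and then the ATT PDU. The sketch below is a standalone decoder with hypothetical names. It only handles this one layout and is not how BTstack parses packets.

```c
#include <stdint.h>

typedef struct {
    uint16_t con_handle;   /* lower 12 bits of the first ACL header field */
    uint8_t  pb_flag;      /* packet boundary flag, bits 13..12 */
    uint16_t acl_len;      /* number of bytes after the 4-byte ACL header */
    uint16_t l2cap_len;    /* number of bytes after the 4-byte L2CAP header */
    uint16_t l2cap_cid;    /* 0x0004 is the fixed channel for ATT */
    uint8_t  att_opcode;   /* 0x02 = Exchange MTU Request, 0x03 = Response */
    uint16_t att_mtu;      /* only meaningful for the two opcodes above */
} acl_att_info_t;

static uint16_t read_le16(const uint8_t * buffer, int pos){
    return (uint16_t)(buffer[pos] | (buffer[pos + 1] << 8));
}

/* returns 1 on success, 0 if the packet is too short for this layout */
static int decode_acl_att_mtu(const uint8_t * packet, uint16_t size, acl_att_info_t * info){
    if (size < 11) return 0;   /* 4 (ACL) + 4 (L2CAP) + 1 (opcode) + 2 (MTU) */
    uint16_t handle_and_flags = read_le16(packet, 0);
    info->con_handle = handle_and_flags & 0x0FFF;
    info->pb_flag    = (uint8_t)((handle_and_flags >> 12) & 0x03);
    info->acl_len    = read_le16(packet, 2);
    info->l2cap_len  = read_le16(packet, 4);
    info->l2cap_cid  = read_le16(packet, 6);
    info->att_opcode = packet[8];
    info->att_mtu    = read_le16(packet, 9);
    return 1;
}
```

For 44 20 07 00 03 00 04 00 02 B9 00, this yields connection handle 0x044, an ACL length of 7, an L2CAP length of 3 on CID 0x0004, ATT opcode 0x02 and an MTU of 0x00B9 = 185. The outgoing packet in the log carries opcode 0x03 with the same MTU value.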

ACL Out

To send the ATT MTU Response, we need to be able to send ACL packets to the Bulk Out Endpoint. Aside from the additional code in the HCI Transport implementation and the USB Class Driver, sending happens in the regular process state machine as well.

From hci_transport_h2_stm32.c:

static int hci_transport_h2_stm32_send_packet(uint8_t packet_type, uint8_t * packet, int size){
    switch (packet_type){
        ...
        case HCI_ACL_DATA_PACKET:
            usbh_bluetooth_send_acl(packet, size);
            return 0;
        ...
    }
...
}

From usbh_bluetooth.c:

...
static enum {
    USBH_OUT_OFF,
    USBH_OUT_IDLE,
    USBH_OUT_CMD,
    USBH_OUT_ACL_SEND,
    USBH_OUT_ACL_POLL,
} usbh_out_state;
...
static const uint8_t * acl_packet;
static uint16_t        acl_len;
...
USBH_StatusTypeDef USBH_Bluetooth_InterfaceInit(USBH_HandleTypeDef *phost){
...
    // ACL Out
    usb->acl_out_ep  =  interface->Ep_Desc[acl_out].bEndpointAddress;
    usb->acl_out_len =  interface->Ep_Desc[acl_out].wMaxPacketSize;
    usb->acl_out_pipe = USBH_AllocPipe(phost, usb->acl_out_ep);
    USBH_OpenPipe(phost, usb->acl_out_pipe, usb->acl_out_ep, phost->device.address, phost->device.speed, USB_EP_TYPE_BULK, usb->acl_out_len);
    USBH_LL_SetToggle(phost, usb->acl_out_pipe, 0U);
...
}

USBH_StatusTypeDef USBH_Bluetooth_Process(USBH_HandleTypeDef *phost){
...
    switch (usbh_out_state){
        ...
        case USBH_OUT_ACL_SEND:
            USBH_BulkSendData(phost, (uint8_t *) acl_packet, acl_len, usb->acl_out_pipe, 0);
            usbh_out_state = USBH_OUT_ACL_POLL;
            break;
        case USBH_OUT_ACL_POLL:
            urb_state = USBH_LL_GetURBState(phost, usb->acl_out_pipe);
            switch (urb_state){
                case USBH_URB_IDLE:
                    break;
                case USBH_URB_NOTREADY:
                    usbh_out_state = USBH_OUT_ACL_SEND;
                    break;
                case USBH_URB_DONE:
                    usbh_out_state = USBH_OUT_IDLE;
                    // notify host stack
                    (*usbh_packet_sent)();
                    break;
                default:
                    log_info("URB State ACL Out: %02x", urb_state);
                    break;
            }
            break;
            ...
    }
}
...
void usbh_bluetooth_send_acl(const uint8_t * packet, uint16_t len){
    btstack_assert(usbh_out_state == USBH_OUT_IDLE);
    acl_packet = packet;
    acl_len    = len;
    usbh_out_state = USBH_OUT_ACL_SEND;
}
...

When testing this using the GATT Streamer demo, we’re receiving incomplete packets on the remote side. Comparing the HCI log with the trace from the USB analyzer, we realize that a single USB Bulk Transfer might fail, which isn’t a problem by itself as it can be repeated. However, the parts that were transferred successfully have already been sent, such that a re-submit causes duplicate data to be sent over the air. It looks like the USB Host implementation is able to split the outgoing data into multiple chunks, but lacks a way to report or handle a transfer that fails halfway.

The simple fix for this is similar to the one for HCI packet types: manually send the packet as multiple requests. This gives us a chance to only resend the failed chunk if needed.

USBH_StatusTypeDef USBH_Bluetooth_Process(USBH_HandleTypeDef *phost){
    ...
    uint16_t transfer_size;
    switch (usbh_out_state){
        ...
        case USBH_OUT_ACL_SEND:
            transfer_size = btstack_min(usb->acl_out_len, acl_len);
            USBH_BulkSendData(phost, (uint8_t *) acl_packet, transfer_size, usb->acl_out_pipe, 1);
            usbh_out_state = USBH_OUT_ACL_POLL;
            break;
        case USBH_OUT_ACL_POLL:
            urb_state = USBH_LL_GetURBState(phost, usb->acl_out_pipe);
            switch (urb_state){
                case USBH_URB_IDLE:
                    break;
                case USBH_URB_NOTREADY:
                    usbh_out_state = USBH_OUT_ACL_SEND;
                    break;
                case USBH_URB_DONE:
                    transfer_size = btstack_min(usb->acl_out_len, acl_len);
                    acl_len -= transfer_size;
                    if (acl_len == 0){
                        usbh_out_state = USBH_OUT_IDLE;
                        // notify host stack
                        (*usbh_packet_sent)();
                    } else {
                        acl_packet += transfer_size;
                        usbh_out_state = USBH_OUT_ACL_SEND;
                    }
                    break;
                default:
                    log_info("URB State ACL Out: %02x", urb_state);
                    break;
            }
            break;        
        ...
    }
}
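The chunking logic can also be tried on a host machine. The following is only a sketch under simplifying assumptions: it uses a blocking loop instead of the polled state machine above, and chunk_send_t, send_in_chunks, fake_pipe_t and fake_pipe_send are hypothetical names, with the fake pipe standing in for a bulk OUT transfer that sometimes has to be retried.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

// Hypothetical stand-in for one USB bulk OUT transfer: returns 1 if the chunk
// was accepted (URB done), 0 if it has to be retried (URB not ready).
typedef int (*chunk_send_t)(const uint8_t * data, uint16_t len, void * context);

// Sends 'len' bytes in chunks of at most 'max_packet_size' bytes and retries only
// the chunk that failed. Returns the number of send attempts.
// Note: loops forever if 'send' never succeeds, which is good enough for a sketch.
static unsigned int send_in_chunks(const uint8_t * packet, uint16_t len, uint16_t max_packet_size,
                                   chunk_send_t send, void * context){
    assert(max_packet_size > 0);
    unsigned int attempts = 0;
    while (len > 0){
        uint16_t transfer_size = (len < max_packet_size) ? len : max_packet_size;
        attempts++;
        if (send(packet, transfer_size, context) == 0) continue;   // retry the same chunk
        packet += transfer_size;
        len     = (uint16_t)(len - transfer_size);
    }
    return attempts;
}

// Test double: stores all accepted bytes and rejects every third attempt
typedef struct {
    uint8_t      received[256];
    uint16_t     received_len;
    unsigned int calls;
} fake_pipe_t;

static int fake_pipe_send(const uint8_t * data, uint16_t len, void * context){
    fake_pipe_t * pipe = (fake_pipe_t *) context;
    pipe->calls++;
    if ((pipe->calls % 3) == 0) return 0;
    assert((size_t) pipe->received_len + len <= sizeof(pipe->received));
    memcpy(&pipe->received[pipe->received_len], data, len);
    pipe->received_len = (uint16_t)(pipe->received_len + len);
    return 1;
}
```

With a 150 byte packet and a max packet size of 64, the chunks are 64, 64 and 22 bytes. The third attempt is rejected and repeated, so four attempts are needed and the receiver still sees exactly 150 bytes without duplicates.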

With the support for outgoing ACL packets, the main HCI packets are fully implemented. We’re almost there, but there are two more issues to report.

Submitting Bulk In transfer fails silently

Firstly, when testing Classic transfer with SPP Streamer, the data stops quickly. Upon further investigation, we learn that the remote side is constantly sending, but we’re not receiving more packets. Checking with the USB analyzer shows that no new ACL IN transfer is scheduled anymore, although the USBH_BulkReceiveData function was called by the stack. When querying the transfer state, we expect to initially get USBH_URB_IDLE, before getting USBH_URB_NOTREADY or USBH_URB_DONE eventually. In the failure case, the state simply stays USBH_URB_IDLE. As a workaround, we add code to re-submit the transfer if the URB state stays USBH_URB_IDLE for 2 subsequent frames.

USBH_StatusTypeDef usbh_bluetooth_start_acl_in_transfer(USBH_HandleTypeDef *phost, USB_Bluetooth_t * usb){
    ...
    usb->acl_in_frame = phost->Timer;
    ...
}

USBH_StatusTypeDef USBH_Bluetooth_Process(USBH_HandleTypeDef *phost){
    ...
    // ACL In
    uint16_t acl_transfer_size;
    urb_state = USBH_LL_GetURBState(phost, usb->acl_in_pipe);
    uint16_t acl_packet_start;
    switch (urb_state){
        case USBH_URB_IDLE:
            // If state stays IDLE for longer than a full frame, something went wrong with submitting the request,
            // just re-submit the request
            if ((phost->Timer - usb->acl_in_frame) > 2){
                status = usbh_bluetooth_start_acl_in_transfer(phost, usb);
                btstack_assert(status == USBH_OK);
            }
            break;
        ...
    }
    ...
}        
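The timeout check relies on unsigned arithmetic. As a minimal sketch, assuming that phost->Timer is an unsigned 32-bit counter that is incremented once per frame, the comparison keeps working when the counter wraps around. The idle_watchdog_* names are hypothetical and only used for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical helper that remembers when a transfer was submitted
typedef struct {
    uint32_t start_frame;       // frame counter value when the transfer was submitted
    uint32_t max_idle_frames;   // number of frames the URB may stay idle
} idle_watchdog_t;

static void idle_watchdog_start(idle_watchdog_t * watchdog, uint32_t now, uint32_t max_idle_frames){
    assert(watchdog != NULL);
    watchdog->start_frame     = now;
    watchdog->max_idle_frames = max_idle_frames;
}

// Returns 1 if more than max_idle_frames frames have passed since the start.
// The unsigned subtraction yields the elapsed frames even after a wrap-around.
static int idle_watchdog_expired(const idle_watchdog_t * watchdog, uint32_t now){
    assert(watchdog != NULL);
    return (uint32_t)(now - watchdog->start_frame) > watchdog->max_idle_frames;
}
```

For example, a watchdog started at frame 0xFFFFFFFE with a limit of 2 has not expired at frame 0 (2 frames elapsed), but has expired at frame 1 (3 frames elapsed).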

Great. All transfers work, we even get decent throughput rates for both SPP Streamer (Classic) as well as the GATT Streamer (LE).

Show Time!

As all examples compile and run, let’s try the A2DP Source Demo to celebrate and listen to the famous ‘Nao Deceased’ MOD song. The demo starts and successfully connects to the Bluetooth speaker, however, the music just stutters – no party yet 🙁

But what happens? We’re running on the not-so-weak STM32F411, even if only at 72 MHz, and that has been sufficient for the CC2564 port before. The question now is: where is all the time spent?

After adding more log output, we finally realize that the function a2dp_demo_fill_sbc_audio_buffer, which generates music samples and encodes them with the SBC encoder, takes up to 50 ms, although we should send a new A2DP packet every 20 ms.

Irritated, we go back to the CC2564 port and check the time for the same function. It’s less than 10 ms. It’s the same code with the same compiler settings, so why does it take longer here? As there’s no RTOS involved, the main suspects are interrupts. As a quick test, we disable IRQs during a2dp_demo_fill_sbc_audio_buffer. With this, the duration of the call is below 10 ms again and, interestingly, audio also plays correctly over the Bluetooth speaker.

Now, what happens? Spending a bit more time in the debugger, we learn that the USB IRQ handler is called quite (too) often, which means that there was a state change in the USB peripheral. Looking at the USB trace again in more detail, we see that there is a large number of Bulk In transfers which all fail with a NAK. Even more, they are back to back. With enough information, we turn to a search engine with ‘STM32 USB Host Too Many Interrupts’ and find an explanation. In the current version of the USB Host stack, the stack re-submits a Bulk In transfer immediately if it receives a NAK. While this results in maximum throughput and might be OK when you know when to expect a packet, for Bluetooth, where packets can be received at any time, this leads to an avalanche of IRQ requests, which ultimately slows down the MCU too much. After more browsing, we find a post that suggests: on NAK, retry in the next frame.

We follow this idea and make this change to the USB Host stack, in stm32f4xx_hal_hcd.c.

static void HCD_HC_IN_IRQHandler(HCD_HandleTypeDef *hhcd, uint8_t chnum){
...
    else if (hhcd->hc[ch_num].state == HC_NAK)
    {
      hhcd->hc[ch_num].urb_state  = URB_NOTREADY;

      // BK: don't re-activate the channel for BULK endpoints
      if (hhcd->hc[ch_num].ep_type != EP_TYPE_BULK){
          /* re-activate the channel  */
          tmpreg = USBx_HC(ch_num)->HCCHAR;
          tmpreg &= ~USB_OTG_HCCHAR_CHDIS;
          tmpreg |= USB_OTG_HCCHAR_CHENA;
          USBx_HC(ch_num)->HCCHAR = tmpreg;
      }
      // BK: end fix
    }
...
}

And trigger a re-submit in the next frame:

USBH_StatusTypeDef USBH_Bluetooth_Process(USBH_HandleTypeDef *phost){
    ...
    switch (urb_state){
        ...
        case USBH_URB_NOTREADY:
            // The original USB Host code re-submits the request when it receives a NAK, resulting in about 80% MCU load
            // With our patch, NOTREADY is returned, which allows to re-submit the request in the next frame.
            if (phost->Timer != hci_acl_in_timer){
                status = usbh_bluetooth_start_acl_in_transfer(phost, usb);
                btstack_assert(status == USBH_OK);
            }
            break;
        ...
    }
    ...
}
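To get a feeling for the effect of this policy, here is a small host-side model. It is a sketch only: nak_retry_t and its functions are hypothetical names, and the frame number stands in for phost->Timer. The helper allows at most one submission per frame, no matter how often the process function polls while the device keeps answering with NAK.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical bookkeeping for the "on NAK, retry in next frame" policy
typedef struct {
    uint32_t last_submit_frame;   // frame in which the transfer was last submitted
    uint32_t submit_count;        // total number of submissions so far
} nak_retry_t;

// Call when the first transfer is submitted
static void nak_retry_init(nak_retry_t * retry, uint32_t current_frame){
    assert(retry != NULL);
    retry->last_submit_frame = current_frame;
    retry->submit_count      = 1;
}

// Call when the URB state is NOTREADY. Returns 1 if the transfer should be
// re-submitted now, 0 if it was already submitted in the current frame.
static int nak_retry_should_resubmit(nak_retry_t * retry, uint32_t current_frame){
    assert(retry != NULL);
    if (current_frame == retry->last_submit_frame) return 0;
    retry->last_submit_frame = current_frame;
    retry->submit_count++;
    return 1;
}
```

If the process function polls 50 times per frame over 10 frames and every transfer is answered with a NAK, this model submits the transfer 10 times in total, once per frame, instead of once per poll.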

Epilogue

It was a rather long article with many code snippets (not exactly literate programming, but not far from that either, we think). We hope this makes using USB Host and implementing your own USB Device Class more accessible.

The USB IP in the F411 is from Synopsys and it can be found in many other STM32 MCUs and also in MCUs by other vendors, so there’s a good chance this guide will be applicable to many other targets.

Here’s a final note of caution, picked up from a friend: vendor libraries are rarely battle-tested or 100% correct. See for example our fixes needed for the F4 CC2564 port. Here, the same happens: the vendor libraries provide a great starting point, but fixes (that require low-level knowledge) are necessary to fully use them.
