
True Wireless Stereo with LE Audio
As LE Audio is getting added to most new in-ear headphones and is supported in newer flagship Android phones, we’d like to highlight an important feature of the technology: True Wireless Stereo (TWS) playback without the Link Layer shenanigans of current Bluetooth Classic in-ear headphones.
The image above shows the I2S signals (Bit Clock, Word Clock, Audio Data) of two CYW55513 Controllers configured for LC3 Offloading. The Unicast Gateway is sending a sine wave and we’ve marked the highest value of the sine wave in both signals, which are less than one audio sample apart (13 us in this case). As both Controllers have their own independent audio clocks, the audio sample is played at the position closest to the expected playback time. When the LC3 codec runs on the MCU itself, the Host first needs to synchronize its clock with the Controller; for this, a free GPIO of the Controller is connected to the Host MCU and the following procedure is used:
– The Host sends an HCI vendor-specific command to the Controller
– The Controller raises the GPIO and captures its local time
– The Host captures the rising edge of the GPIO, preferably with a hardware timer capture
– The Controller reports the local time of the GPIO raise in the response to the vendor-specific command
We have implemented this mechanism for newer Nordic LE Controllers by extending the official Zephyr HCI UART Bridge with the new HCI VS ISO Time Sync command.
With this primitive, we periodically sample timestamp pairs on the Host and the Controller, which allows us to map a Bluetooth timestamp to a local timestamp. With this, we know accurately when to play a received ISO packet. Now, we “just” need to figure out how to do this.
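As a minimal sketch of this mapping, assuming 32-bit microsecond timestamps on both sides and that re-synchronization happens often enough that clock drift between two sync points can be ignored (names and struct layout are ours, not an existing BTstack or Zephyr API):

```c
#include <stdint.h>
#include <stdio.h>

// One (Host, Controller) timestamp pair collected via the GPIO handshake.
typedef struct {
    uint32_t host_time_us;        // Host timer capture of the GPIO rising edge
    uint32_t controller_time_us;  // Controller time reported in the VS command response
} time_sync_sample_t;

// Map a Controller timestamp (e.g. the ISO Synchronization Reference Point)
// into the Host time domain using the most recent sync sample.
// Unsigned 32-bit arithmetic keeps the calculation wrap-around safe.
static uint32_t controller_to_host_time_us(const time_sync_sample_t * sync,
                                           uint32_t controller_time_us){
    uint32_t delta_us = controller_time_us - sync->controller_time_us;
    return sync->host_time_us + delta_us;
}

int main(void){
    time_sync_sample_t sync = { .host_time_us = 1000000, .controller_time_us = 5000000 };
    uint32_t iso_timestamp_us = 5007500;  // Controller time attached to an ISO packet
    printf("host playback reference: %lu us\n",
           (unsigned long) controller_to_host_time_us(&sync, iso_timestamp_us));
    return 0;
}
```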
We set up TIM2 as a free-running 1 MHz timer for all playback timing needs. Then, we connect the I2S Bit Clock (SCK) of I2S3 to the External Clock input of TIM3 (ETR 2) to let it count I2S bits (see green connection in the diagram above). We also connect the TIM3 Channel 1 Output Compare (blue connection), which triggers after a fixed number of pulses, to the TIM2 Channel 1 Input Capture. As mentioned before, the TIM2 Channel 2 Input Capture is used for HCI ISO Time Sync (red connection).
With this setup, we configure TIM3 to count (num_samples * num_bits_per_sample * num_channels) clock pulses before starting I2S output.
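For illustration, assuming 10 ms DMA buffers at 48 kHz with 16-bit stereo samples (these figures are just an example configuration), the compare value works out as follows:

```c
#include <stdint.h>
#include <stdio.h>

int main(void){
    const uint32_t num_samples         = 480;  // 10 ms DMA buffer at 48 kHz
    const uint32_t num_bits_per_sample = 16;
    const uint32_t num_channels        = 2;

    // Number of I2S bit-clock (SCK) pulses after which the TIM3 Channel 1
    // Output Compare fires, i.e. when exactly one DMA buffer has been shifted out.
    uint32_t tim3_compare_value = num_samples * num_bits_per_sample * num_channels;
    printf("TIM3 compare value: %lu SCK pulses\n", (unsigned long) tim3_compare_value); // 15360
    return 0;
}
```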
As we start in a defined state, the first match for TIM3 will occur when one DMA buffer has been played back. We have extended BTstack’s Audio API to provide the playback start time.
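Under these assumptions (TIM2 ticking at 1 MHz, a fixed buffer duration, and neglecting drift between the I2S clock and TIM2 between captures), the start time of any later buffer follows directly from the captured value. A sketch, with names of our own choosing:

```c
#include <stdint.h>

// TIM2 runs at 1 MHz, so one tick equals one microsecond.
//
// captured_time_us:   TIM2 Channel 1 Input Capture value, latched when TIM3
//                     signalled that the first DMA buffer had been shifted out,
//                     i.e. the instant the next buffer started to play.
// buffers_completed:  number of DMA buffers completed since that capture.
// buffer_duration_us: e.g. 10000 us for 480 samples at 48 kHz.
static uint32_t buffer_start_time_us(uint32_t captured_time_us,
                                     uint32_t buffers_completed,
                                     uint32_t buffer_duration_us){
    return captured_time_us + buffers_completed * buffer_duration_us;
}
```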
To fill buffer B, we go through the list of received audio frames and drop expired ones, i.e. those where end_time = playback_time + frame_duration is earlier than the playback time of the first sample in the audio buffer. With all outdated frames removed, part of the next audio frame (in the picture, the last millisecond of frame #1) can be placed at the beginning of buffer B. As buffer B isn’t filled completely yet, the samples from the next audio frame, here frame #2, are added. Finally, the first millisecond of frame #3 is also stored.
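The following is a simplified sketch of this projection step, assuming mono 16-bit PCM, already-decoded frames, and microsecond timestamps; names and structures are ours, not BTstack’s:

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define SAMPLE_RATE_HZ  48000
#define FRAME_SAMPLES     480   // 10 ms audio frames
#define BUFFER_SAMPLES    480   // one DMA buffer, also 10 ms here

typedef struct {
    uint32_t playback_time_us;          // when sample 0 of this frame is due
    int16_t  samples[FRAME_SAMPLES];    // decoded PCM
    bool     valid;
} audio_frame_t;

// Copy the overlapping part of each frame into the buffer that starts playing
// at buffer_start_us. Frames that end before the buffer starts are dropped.
static void project_frames(audio_frame_t * frames, unsigned num_frames,
                           int16_t * buffer, uint32_t buffer_start_us){
    const uint32_t frame_duration_us = (FRAME_SAMPLES * 1000000u) / SAMPLE_RATE_HZ;
    memset(buffer, 0, BUFFER_SAMPLES * sizeof(int16_t));
    for (unsigned i = 0; i < num_frames; i++){
        if (!frames[i].valid) continue;

        // expired: end_time = playback_time + frame_duration lies before the buffer start
        uint32_t end_time_us = frames[i].playback_time_us + frame_duration_us;
        if ((int32_t)(end_time_us - buffer_start_us) <= 0){
            frames[i].valid = false;
            continue;
        }

        // signed position of the frame's first sample relative to the buffer start
        int32_t delta_us = (int32_t)(frames[i].playback_time_us - buffer_start_us);
        int32_t offset   = (int32_t)(((int64_t) delta_us * SAMPLE_RATE_HZ) / 1000000);

        for (int32_t s = 0; s < FRAME_SAMPLES; s++){
            int32_t pos = offset + s;
            if (pos < 0) continue;              // this part was due in an earlier buffer
            if (pos >= BUFFER_SAMPLES) break;   // the rest goes into the next buffer
            buffer[pos] = frames[i].samples[s];
        }
    }
}
```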
By using the time information of the audio frames, it’s also possible to decode LC3 frames just-in-time instead of decoding them upfront and storing PCM data, which would require roughly six times as much memory as storing the LC3 frames.
Example: At 48000 Hz, an audio frame of 10 ms contains 480 samples. An LC3 frame in high quality (48_6) uses 155 bytes for this. In contrast, the same frame as 16-bit PCM samples requires 960 bytes.
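To put these per-frame numbers into a buffer-sized perspective (the 10-frame jitter-buffer depth below is an arbitrary figure chosen only for illustration):

```c
#include <stdio.h>

int main(void){
    const unsigned frames_buffered = 10;       // e.g. a 100 ms jitter buffer (arbitrary depth)
    const unsigned lc3_frame_bytes = 155;      // 48_6: one 10 ms frame at 48 kHz
    const unsigned pcm_frame_bytes = 480 * 2;  // 480 samples of 16-bit PCM

    printf("LC3: %u bytes, PCM: %u bytes (factor %.1f)\n",
           frames_buffered * lc3_frame_bytes,
           frames_buffered * pcm_frame_bytes,
           (double) pcm_frame_bytes / (double) lc3_frame_bytes);
    return 0;
}
```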
With this in place, we set up a test with one Unicast Gateway running on a laptop and two F4 Discovery boards. The Unicast Gateway streams a mono sine wave on both channels to both boards. In the resulting logic trace below, the time offset between the samples for 0xFFF4 on the two boards is again less than the duration of a single audio sample!
True Wireless Stereo (TWS) Playback on Bluetooth Classic A2DP
Up until now, Bluetooth did not provide the necessary synchronization primitives to play back audio accurately on two individual devices. To build in-ear headphones nevertheless, a range of different TWS approaches have been used. Qualcomm provides a nice overview of these in their whitepaper The Evolution of True Wireless Technology. The following is just a crude summary. In early attempts, both earbuds could be paired, but only one was connected at a time. The connected earbud received the Bluetooth stereo A2DP stream, which was then forwarded to the second earbud using a different radio technology. That radio technology had to provide time synchronization primitives in order to play the left and right stereo samples at the same time; ideally, the time offset between both sides should be less than 50 microseconds. Later approaches allowed pairing only a single earbud, which then exchanged the bonding information with the other earbud. The bonding information allowed the second earbud to implement some kind of promiscuous mode, i.e. to also receive, using only Bluetooth technology, the stereo A2DP stream that is sent to the first earbud. Unfortunately, these TWS approaches require modifications to the Link Layer that runs on the Bluetooth Controller, preventing a regular Bluetooth Host stack like BTstack from supporting them.
Introducing LE Audio
As Bluetooth Classic Audio has been around for more than two decades, the limitations of A2DP (high quality, but playback only) and HFP (bi-directional, but only mono with low audio quality) have become obvious. The new LE Audio standard addresses these shortcomings, provides a new Low Complexity Communication Codec called “LC3”, and also provides time synchronization primitives that allow for audio playback with better than single-sample (20 microseconds) accuracy. It’s great that TWS playback is now available to all Bluetooth Host stacks, but how does it actually work and how can we use it?
Microsecond Audio Timestamps for ISO Packets
For LE Audio, either Unicast or Broadcast, so-called Isochronous Streams are used. The sender and the receiver agree on a schedule where ISO packets are periodically exchanged at a fixed interval. As part of the setup, the maximum time the sender may use to correctly transmit an ISO packet to the receiver is specified. After this time, the packet is considered lost. As all receivers are aware of this agreement, they are able to provide this timestamp, also known as the Synchronization Reference Point, with each received or lost ISO packet. During a Unicast setup, the so-called Presentation Time is negotiated, which indicates how long after the Synchronization Reference Point the audio needs to be played back. With this accurate time information, we only need to play the first sample of such an ISO packet at time = Synchronization Reference Point + Presentation Time. As that’s easier to explain than to implement, the easiest way for a Bluetooth Host stack is to delegate this task to the Bluetooth Controller, which then takes care of the accurate playback timing. This is currently supported on Infineon’s CYW5551x as well as Realtek’s RTL8761CTV Bluetooth Controllers. Infineon calls this feature “LC3 Offloading”, while Realtek uses the term “Codec Offload”.
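To make the timing relation concrete, here is a one-line illustration (all values in microseconds; names are ours, not taken from the Bluetooth specification or BTstack):

```c
#include <stdint.h>

// The first sample of an ISO SDU is due at its Synchronization Reference Point
// plus the negotiated Presentation Time.
static uint32_t iso_playback_time_us(uint32_t sync_reference_us,
                                     uint32_t presentation_time_us){
    return sync_reference_us + presentation_time_us;
}
```

For example, with a Presentation Time of 40 ms (an illustrative value), an SDU whose Synchronization Reference Point lies at t = 1.000000 s has to start playing at t = 1.040000 s.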
Running the LC3 Codec on the MCU
Now… while LC3 Offloading is probably the way to go for most projects, we were interested in running the LC3 Codec on a regular MCU, which allows the use of Bluetooth Controllers without a built-in LC3 Codec, such as the lower-cost Nordic nRF54L15 from their nRF54L Series. This has become a longer exercise than expected due to two “minor” details, namely synchronizing the MCU clock to the Bluetooth clock and playing an audio sample at a precise time in the future.
ISO Time Sync
The first detail we ran into is that the timestamps attached to the ISO packets are based on the free-running clock of the Bluetooth Controller itself. To make use of these timestamps, the first step is to synchronize the MCU’s local clock with the Bluetooth clock. Unfortunately, the Bluetooth Core specification does not explain or even mention how the Bluetooth Host should synchronize with the Bluetooth Controller, so we were free to come up with our own design. Given that the Host exchanges HCI packets with the Controller over a transport with undefined latency, it is not possible to simply instruct the Controller to send its current time over HCI. Instead, we need a way to get a local timestamp from both Host and Controller for an event that both can observe. We chose to connect a free GPIO of the Controller to a GPIO of the Host MCU and implemented the four-step procedure listed earlier.
Accurate I2S Playback
I2S peripherals are commonly used with a DMA unit and either a double-buffer or a circular-buffer strategy to send audio data continuously without gaps. In the scope of this report, the two halves of a circular buffer can be seen as a double buffer, so we will assume a double-buffer strategy is used. To be able to play the received audio frames at the correct time, we need to know when the first sample of the next buffer will be played. With this information, we can then ‘project’ the received audio samples into the I2S buffer. Unfortunately, getting a timestamp for the first sample of a DMA buffer isn’t directly supported on most MCUs. As in the past, we’re using the STM32 F4 Discovery board with an STM32F407 for this project. While the MCU supports timer capture on a GPIO raise, which we can use for the ISO Time Sync, there is no obvious way to get a timestamp for the moment the first sample is sent over I2S. Browsing around in the F4 Reference Manual, we found that it is possible to use a timer unit to count clock pulses and also to generate an IRQ or toggle a GPIO when a predefined number of pulses has been counted. With these primitives, we came up with the timer-based system setup described earlier: TIM2 as a free-running 1 MHz timebase and TIM3 counting I2S bit clocks.
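To make the double-buffer strategy concrete, here is a minimal sketch using the STM32 HAL I2S driver with a circular DMA. It is an illustration, not the actual BTstack F4 audio driver: hi2s3 and the DMA configuration are assumed to exist elsewhere, and fill_half() stands in for the audio frame projection described below.

```c
#include <string.h>
#include "stm32f4xx_hal.h"

#define HALF_SAMPLES 480                      // 10 ms at 48 kHz, mono for brevity
static int16_t i2s_buffer[2 * HALF_SAMPLES];  // two halves, streamed back-to-back

extern I2S_HandleTypeDef hi2s3;               // assumed to be set up by board/CubeMX code

// Placeholder: the real implementation projects the received audio frames
// into this half, as described in the projection section.
static void fill_half(int16_t * half){
    memset(half, 0, HALF_SAMPLES * sizeof(int16_t));
}

void audio_playback_start(void){
    fill_half(&i2s_buffer[0]);
    fill_half(&i2s_buffer[HALF_SAMPLES]);
    // Circular DMA: the HAL keeps streaming and invokes the callbacks below.
    HAL_I2S_Transmit_DMA(&hi2s3, (uint16_t *) i2s_buffer, 2 * HALF_SAMPLES);
}

// First half has been sent out: refill it while the second half is playing.
void HAL_I2S_TxHalfCpltCallback(I2S_HandleTypeDef *hi2s){
    (void) hi2s;
    fill_half(&i2s_buffer[0]);
}

// Second half has been sent out: refill it while the first half is playing.
void HAL_I2S_TxCpltCallback(I2S_HandleTypeDef *hi2s){
    (void) hi2s;
    fill_half(&i2s_buffer[HALF_SAMPLES]);
}
```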
Audio Frame Projection into DMA buffer
With the previous steps, we now have a local clock, a number of audio frames with playback times, and an audio buffer with the time of its first sample. With this information, what’s left is to fill the audio buffer. The diagram shows 3 received audio frames (#1, #2, and #3) and the I2S playback buffers. The samples of buffer A are currently being sent over I2S to the Digital-to-Analog Converter (DAC), which is connected to the speaker. We assume that the playback callback is called to fill buffer B, whose playback time is 99 ms.
