High-Fidelity Audio Architecture

Major rewrite of the voice assistant’s audio processing pipeline. Playback now runs at the native 24kHz rate instead of being downsampled to 16kHz, and long responses no longer suffer from clicks, pops, or timing drift.

STATUS [v0.9.2]

Problems Solved

| Issue | Cause | Solution |
|---|---|---|
| Muffled voice | 24kHz→16kHz downsampling | Dual AudioContext architecture |
| Exact ±1.0 peaks | Implicit endianness | DataView with explicit little-endian |
| Clicks/pops | Queue-based callback latency | Proactive scheduling |
| Timing drift | Cumulative callback delays | Immediate chunk scheduling |
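
The endianness fix can be illustrated with a minimal sketch. It assumes the incoming audio is 16-bit signed PCM, which these notes do not state explicitly, and the function name `decodePcm16LE` is hypothetical rather than taken from the codebase. The point is that `DataView.getInt16` takes an explicit little-endian flag, so the byte order is set by the call itself rather than left to a default.

```typescript
// Sketch only: assumes 16-bit signed PCM input (an assumption, not stated in the notes).
// `decodePcm16LE` is a hypothetical name used for illustration.
function decodePcm16LE(bytes: Uint8Array): Float32Array {
  // Honor byteOffset/byteLength in case `bytes` is a view into a larger buffer.
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const sampleCount = Math.floor(bytes.byteLength / 2);
  const out = new Float32Array(sampleCount);
  for (let i = 0; i < sampleCount; i++) {
    // The second argument `true` requests little-endian byte order explicitly.
    // Dividing by 32768 maps the int16 range onto [-1, 1).
    out[i] = view.getInt16(i * 2, true) / 32768;
  }
  return out;
}

// Bytes 0x00 0x40 form the little-endian value 0x4000 (16384), which is 0.5.
const halfScale = decodePcm16LE(new Uint8Array([0x00, 0x40]));
console.log(halfScale[0]); // 0.5
```

Reading the same two bytes as big-endian would give 0x0040 (64), a sample about 256 times quieter, which is why the byte order is worth making explicit.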

Architecture Improvements

  • Dual AudioContext: Separate contexts for capture (16kHz) and playback (24kHz)
  • Proactive Scheduling: Chunks scheduled immediately with precise timing, no queue latency
  • Explicit Endianness: DataView ensures correct little-endian PCM handling per Google’s recommendation
  • Master Gain Control: Single gain node for consistent volume across all chunks
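
The proactive scheduling idea above can be sketched as a small timing cursor. This is an illustration under stated assumptions, not the project's implementation: `Clock`, `ChunkScheduler`, and the `leadTime` parameter are hypothetical names, and the clock is abstracted so the logic runs without a browser. Each chunk is assigned a start time the moment it arrives, back to back with the previous chunk, so per-callback delays cannot accumulate.

```typescript
// Minimal clock shape (hypothetical). A Web Audio AudioContext fits it,
// because it exposes `currentTime` in seconds.
interface Clock {
  readonly currentTime: number;
}

// Hypothetical scheduler: hands out gapless start times for incoming chunks.
class ChunkScheduler {
  private clock: Clock;
  private sampleRate: number;
  private leadTime: number;
  private nextStartTime = 0;

  constructor(clock: Clock, sampleRate: number, leadTime: number = 0.05) {
    this.clock = clock;
    this.sampleRate = sampleRate;
    this.leadTime = leadTime; // assumed safety margin in seconds, used after an underrun
  }

  // Returns the start time for a chunk of `sampleCount` samples and advances
  // the cursor by exactly the chunk's duration.
  schedule(sampleCount: number): number {
    const now = this.clock.currentTime;
    if (this.nextStartTime < now) {
      // First chunk, or playback ran dry: restart slightly ahead of the clock.
      this.nextStartTime = now + this.leadTime;
    }
    const startAt = this.nextStartTime;
    this.nextStartTime += sampleCount / this.sampleRate;
    return startAt;
  }
}
```

In a browser, the returned value would typically be passed to `AudioBufferSourceNode.start(startAt)` on a playback context created with `new AudioContext({ sampleRate: 24000 })`, with every source connected through one shared `GainNode`. A separate context at 16000 Hz would handle capture. That wiring is described here as an assumption about how the pieces fit together and is left out of the sketch so the timing logic stays testable on its own.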