GitHub: dhamuvkl ESP32-S3 AI Voice Assistant

Emily Johnson

A fully custom, open-source AI voice assistant powered by the ESP32-S3 and the Xiaozhi AI framework. This project is a complete DIY AI voice assistant built around the ESP32-S3 microcontroller. It combines custom PCB design, advanced audio processing, and cloud-based AI to create a device that rivals commercial smart speakers in functionality while remaining fully open-source and customizable. Unlike simple voice-controlled devices, this assistant leverages the Xiaozhi AI framework to provide natural language understanding through large language models (LLMs) such as Qwen, DeepSeek, and GPT. The system uses a hybrid architecture: lightweight tasks run locally on the ESP32-S3, while computationally intensive AI processing happens on cloud servers. 📥 Full BOM with part numbers: Download BOM.csv

The custom PCB is a 2-layer design measuring approximately 80 × 60 mm. This is an open-source ESP32 project released under the MIT license, allowing anyone to use it for free, including for commercial purposes. We hope this project helps everyone understand AI hardware development and apply rapidly evolving large language models to real hardware devices. If you have any ideas or suggestions, please feel free to raise Issues or join the QQ group: 1011329060

Voice-controlled smart devices have transformed the way we interact with technology, and with the arrival of Espressif’s ESP32-S3 platform, building a compact and intelligent voice assistant is now within reach for makers. To explore the possibilities of on-device AI and low-power voice interaction, I designed a custom portable AI voice assistant using ESP32 that integrates Espressif’s Audio Front-End (AFE) framework with the Xiaozhi MCP chatbot system. The result is a self-contained, always-on smart home controller capable of understanding and responding to natural voice commands without needing a phone. This DIY AI voice assistant project demonstrates how accessible embedded AI has become for electronics enthusiasts. The project centres on the ESP32-S3-WROOM-1-N16R8 module, which provides the processing power and Wi-Fi and Bluetooth connectivity required for both local and cloud-based operations. Its dual-core architecture and AI acceleration support allow real-time keyword detection and low-latency response.

For clear voice capture, the system uses two TDK InvenSense ICS-43434 digital MEMS microphones configured in a microphone array, enabling the AFE to perform echo cancellation, beamforming, and noise suppression effectively. Audio output is handled by the MAX98357A I2S amplifier, which drives a small speaker to deliver natural and clear voice feedback. The board’s power section is built around the BQ24250 charger and MAX20402 DC-DC converter, ensuring stable operation under both USB and battery modes. This allows the assistant to function efficiently on wall power or run portably on a Li-ion battery. Careful layout and decoupling were applied to minimise noise and maintain clean signal integrity across the analog and digital domains. To enhance user interaction, WS2812B RGB LEDs were added for visual indication, and tactile switches allow manual control and reset functions.
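The ICS-43434 microphones deliver 24-bit samples left-justified in 32-bit I2S frames, while 16-bit PCM is the usual format for downstream voice processing. A minimal sketch of that conversion step is shown below; the function name is illustrative and not taken from the project repository:

```cpp
#include <cstdint>
#include <cstddef>

// Convert ICS-43434 I2S frames (24-bit data left-justified in 32-bit
// words) to 16-bit PCM by keeping the top 16 bits of each sample.
void i2s_frames_to_pcm16(const int32_t *in, int16_t *out, size_t n) {
    for (size_t i = 0; i < n; ++i) {
        out[i] = static_cast<int16_t>(in[i] >> 16);
    }
}
```

Discarding the low bits loses some dynamic range, but for near-field speech capture the top 16 bits are typically sufficient.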

Each component was selected to balance performance, efficiency, and compactness, resulting in a robust design suited for continuous operation. As a voice interface, the device leverages the Xiaozhi MCP chatbot framework, which connects embedded systems to large language models. Through MCP, the assistant can communicate across multiple terminals, enabling multi-device synchronisation and smart home control. When paired with Espressif’s AFE, this setup provides reliable local wake-word detection and command recognition while extending to cloud AI platforms like Qwen and DeepSeek for complex conversation and natural language understanding. This hybrid approach ensures responsive operation with enhanced cloud intelligence when connected. The firmware was developed in VS Code using the ESP-IDF plugin (version 5.4 or above), with Espressif’s AFE library integrated for real-time voice processing.
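The hybrid flow described above — local wake-word detection, cloud-side conversation, local playback — can be modeled as a small state machine. The states and events below are illustrative, not the project's actual firmware API:

```cpp
#include <cstdint>

// Illustrative interaction states for the hybrid pipeline: the AFE's
// wake-word detector runs locally; audio is streamed to the cloud LLM
// only after a detection, and the device returns to idle after playback.
enum class VaState : uint8_t { Idle, Listening, Thinking, Speaking };

enum class VaEvent : uint8_t { WakeWord, EndOfSpeech, ReplyReady, PlaybackDone };

// Advance the state machine; unexpected events leave the state unchanged.
VaState va_step(VaState s, VaEvent e) {
    switch (s) {
        case VaState::Idle:      return e == VaEvent::WakeWord     ? VaState::Listening : s;
        case VaState::Listening: return e == VaEvent::EndOfSpeech  ? VaState::Thinking  : s;
        case VaState::Thinking:  return e == VaEvent::ReplyReady   ? VaState::Speaking  : s;
        case VaState::Speaking:  return e == VaEvent::PlaybackDone ? VaState::Idle      : s;
    }
    return s;
}
```

Keeping the transition logic in one pure function like this makes the firmware's behaviour easy to test off-target.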

I2S, I2C, and GPIO interfaces were configured for peripheral communication, while network connectivity handled both MQTT-based smart device control and MCP protocol data exchange. Thanks to the open-source nature of the Xiaozhi framework, adapting the system for different AI services or custom wake words was straightforward, allowing easy experimentation with different model backends and conversational logic. The complete ESP32 AI voice assistant GitHub repository includes schematics, firmware code, and detailed build instructions for makers looking to replicate this project.
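For the MQTT-based device control, the firmware ultimately has to map a topic and payload to an action. A hedged sketch of such a mapping is shown below; the `home/<room>/<device>/set` topic scheme is an assumption for illustration, not the scheme used by the project:

```cpp
#include <string>

// Hypothetical topic scheme: "home/<room>/<device>/set" with payload
// "on" or "off". Returns true and fills `on` if the message is a valid
// switch command; this scheme is illustrative, not the project's actual API.
bool parse_switch_command(const std::string &topic, const std::string &payload, bool &on) {
    const std::string prefix = "home/";
    const std::string suffix = "/set";
    if (topic.compare(0, prefix.size(), prefix) != 0) return false;
    if (topic.size() < prefix.size() + suffix.size()) return false;
    if (topic.compare(topic.size() - suffix.size(), suffix.size(), suffix) != 0) return false;
    if (payload == "on")  { on = true;  return true; }
    if (payload == "off") { on = false; return true; }
    return false;
}
```

Rejecting malformed topics early keeps the control loop robust against unrelated traffic on a shared broker.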

ESP32S3 AI voice assistant is a voice interaction system based on the ESP32-S3, implemented with the Arduino IDE. In this program, Baidu Cloud is used for STT (Speech to Text) and TTS (Text to Speech).
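A cloud STT reply of this kind typically arrives as a small JSON document from which the firmware only needs the transcript string. The sketch below shows a naive extraction under the assumption that the reply looks like `{"err_no":0,"result":["…"]}`; a real firmware would use a proper JSON parser, and this version does not handle escaped quotes:

```cpp
#include <string>

// Naive extraction of the first transcript from an STT-style JSON reply
// such as {"err_no":0,"result":["turn on the light"]}. Illustrative only:
// assumes exact key formatting and no escaped quotes in the transcript.
std::string extract_first_result(const std::string &json) {
    const std::string key = "\"result\":[\"";
    size_t start = json.find(key);
    if (start == std::string::npos) return "";
    start += key.size();
    size_t end = json.find('"', start);
    if (end == std::string::npos) return "";
    return json.substr(start, end - start);
}
```

On a memory-constrained target, a bounded scan like this avoids pulling in a full JSON library, at the cost of fragility if the reply format changes.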

The LLM (Large Language Model) used in the system is TONGYIQIANWEN from Aliyun. Arduino IDE version: V2.3.2. Arduino ESP32 core version: V3.0.4 (set in Tools > Board > Boards Manager...). Hardware: a Seeed Studio XIAO ESP32S3 Sense with its digital microphone (see https://wiki.seeedstudio.com/xiao_esp32s3_getting_started/) and a MAX98357A I2S audio amplifier module with a speaker.

An ESP32-S3 voice assistant for Home Assistant.

It is lightweight and should compile on low-end systems. The main goal of this project was to create a voice assistant (VA) for HA that does not depend on external files*. A second requirement was that compiling should not need many resources. This YAML compiles many times faster than the ESP32-S3-BOX version. This version of the voice assistant can also set multiple timers. * There is one small dependency in the code if you want a sound when a timer is finished.

This dependency is only present when compiling. A future release may solve this; for now it isn't a big showstopper. While version 1 works, it still suffered from stalls from time to time. The ESP32-S3-BOX version that I also have proved to be far more stable. Because of this, I abandoned the 'no external files' concept, went for the full-blown version, and took the ESP32-S3-BOX version as the starting point.

I stripped everything that deals with the screen and the buttons on the box, and added my own sensors and other tools: things I like to see and control in HA, such as diagnostics, request and response sensors, and timer info. The hardware stayed the same, so this installs directly on the VoicePuck. This version is a lot heavier to compile. My advice is to start by cleaning your build files and then start compiling.

When it fails, just restart it. File locks will occur, but they will be cleared. Files that are already compiled will be skipped, so even lower-end systems will eventually complete a full compile. This version is nearly the same as version 2, but it makes a continuous conversation possible.
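The multi-timer support mentioned earlier implies some bookkeeping: the firmware must know which timer fires next and discard timers that have expired. A minimal sketch of that bookkeeping is below; the `Timer` struct and function names are illustrative, not taken from the project's YAML or lambdas:

```cpp
#include <vector>
#include <cstdint>
#include <algorithm>

// Minimal bookkeeping for multiple concurrent timers (illustrative;
// not the project's actual code).
struct Timer { uint32_t id; uint64_t expires_ms; };

// Return the id of the next timer to fire, or -1 if none are pending.
int64_t next_to_fire(const std::vector<Timer> &timers) {
    if (timers.empty()) return -1;
    auto it = std::min_element(timers.begin(), timers.end(),
        [](const Timer &a, const Timer &b) { return a.expires_ms < b.expires_ms; });
    return it->id;
}

// Remove timers that have already expired at time now_ms.
void expire(std::vector<Timer> &timers, uint64_t now_ms) {
    timers.erase(std::remove_if(timers.begin(), timers.end(),
        [now_ms](const Timer &t) { return t.expires_ms <= now_ms; }),
        timers.end());
}
```

A linear scan is perfectly adequate here, since a voice assistant rarely tracks more than a handful of timers at once.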
