nthlam's xiaozhi-esp32 Fork: An MCP-Based Chatbot on GitHub

Emily Johnson

👉 Human: Give AI a camera vs AI: instantly notices the owner hasn't washed their hair in three days【bilibili】 This is an open-source ESP32 project by Xiage (虾哥), released under the MIT license, free for anyone to use, including for commercial purposes. We hope this project helps people learn AI hardware development and bring today's rapidly evolving large language models into real hardware devices. If you have any ideas or suggestions, feel free to open an Issue or join QQ group 1011329060. The XiaoZhi AI chatbot serves as a voice-interaction entry point, leveraging the AI capabilities of large models such as Qwen and DeepSeek and achieving multi-terminal control through the MCP protocol. For a software engineer, this project offers several valuable opportunities, primarily in IoT (Internet of Things), edge computing, and AI integration.

This project is beneficial for software engineers in several key areas:

Low-cost AI deployment: You can use cheap, readily available ESP32 microcontrollers to create sophisticated voice assistants. This is a game-changer for budget-conscious projects or mass-market IoT devices.

Edge-computing focus: The ESP32 handles Voice Activity Detection (VAD) and audio processing locally. This is an example of edge computing, where processing happens close to the user, reducing latency and the reliance on constant, high-bandwidth cloud connectivity for all tasks.

Protocol implementation: The project uses MCP (Model Context Protocol), an open protocol for AI-powered device control.
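To make the on-device VAD idea concrete, here is a minimal energy-gate sketch in C. The actual project relies on Espressif's AFE/ESP-SR pipeline, which is far more robust; this hypothetical `frame_is_speech` helper only illustrates the core concept of deciding locally whether a PCM frame is worth streaming to the cloud.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if the mean absolute amplitude of a 16-bit PCM frame
 * exceeds `threshold`, else 0. A pure energy gate: real VADs add
 * noise-floor tracking, hangover timing, or a small neural model. */
int frame_is_speech(const int16_t *samples, size_t n, int32_t threshold)
{
    if (n == 0)
        return 0;
    int64_t acc = 0;
    for (size_t i = 0; i < n; i++)
        acc += samples[i] < 0 ? -(int64_t)samples[i] : (int64_t)samples[i];
    return (acc / (int64_t)n) > threshold ? 1 : 0;
}
```

Only frames that pass the gate would be forwarded upstream, which is exactly the latency- and bandwidth-saving trade the paragraph above describes.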

Engineers gain experience in implementing custom or open protocols over WebSocket or UDP for real-time, low-latency communication with a cloud server hosting the LLM/TTS (Text-to-Speech) APIs. Commercial voice assistants like Alexa and Google Assistant are impressive, but they often come with trade-offs: privacy concerns, limited customisation, and cloud lock-in. For makers and engineers, that naturally raises a question: Can we build our own ESP32 AI Voice Assistant - one that’s open, hackable, and truly ours? With the ESP32-S3 and the Xiaozhi AI framework, the answer is yes. In this article, I will walk through the design and implementation of a portable ESP32-S3 AI voice assistant that supports wake-word detection, natural conversation, smart-device control, and battery operation.
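The low-latency transport work mentioned above usually comes down to framing: putting a small fixed header in front of each audio chunk so the server can reorder and timestamp datagrams. The xiaozhi wire format is not documented here, so the 8-byte header below ([seq:2][timestamp:4][len:2], big-endian) is purely a hypothetical layout to show the pack/unpack pattern.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Packs a hypothetical UDP audio datagram: 8-byte big-endian header
 * followed by the payload. Returns total bytes written, 0 if `out`
 * is too small. */
size_t pack_audio_frame(uint16_t seq, uint32_t ts,
                        const uint8_t *payload, uint16_t len,
                        uint8_t *out, size_t out_cap)
{
    size_t total = 8 + (size_t)len;
    if (out_cap < total)
        return 0;
    out[0] = (uint8_t)(seq >> 8); out[1] = (uint8_t)seq;
    out[2] = (uint8_t)(ts >> 24); out[3] = (uint8_t)(ts >> 16);
    out[4] = (uint8_t)(ts >> 8);  out[5] = (uint8_t)ts;
    out[6] = (uint8_t)(len >> 8); out[7] = (uint8_t)len;
    memcpy(out + 8, payload, len);
    return total;
}

/* Parses the header back out; returns the payload length,
 * or -1 if the buffer is truncated. */
int unpack_audio_frame(const uint8_t *in, size_t in_len,
                       uint16_t *seq, uint32_t *ts, const uint8_t **payload)
{
    if (in_len < 8)
        return -1;
    *seq = (uint16_t)(((uint16_t)in[0] << 8) | in[1]);
    *ts = ((uint32_t)in[2] << 24) | ((uint32_t)in[3] << 16) |
          ((uint32_t)in[4] << 8) | in[5];
    uint16_t len = (uint16_t)(((uint16_t)in[6] << 8) | in[7]);
    if (in_len < 8u + len)
        return -1;
    *payload = in + 8;
    return (int)len;
}
```

A sequence number lets the receiver drop late datagrams instead of waiting for them, which is why UDP plus a tiny header beats TCP for real-time voice.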

This project combines embedded systems, real-time audio processing, and cloud-based large language models into a single, open-source device. This DIY AI voice assistant is built around the ESP32-S3-WROOM-1-N16R8, paired with a dual-microphone array, an I²S audio amplifier, and robust power management for portable use. Voice-controlled smart devices have transformed the way we interact with technology, and with the arrival of Espressif’s ESP32-S3 platform, building a compact and intelligent voice assistant is now within reach for makers. To explore the possibilities of on-device AI and low-power voice interaction, I designed a custom portable AI voice assistant using ESP32 that integrates Espressif’s Audio Front-End (AFE) framework with the Xiaozhi MCP chatbot system. The result is a self-contained, always-on smart home controller capable of understanding and responding to natural voice commands without needing a phone. This DIY AI voice assistant project demonstrates how accessible embedded AI has become for electronics enthusiasts.

The project centres on the ESP32-S3-WROOM-1-N16R8 module, which provides the processing power and Wi-Fi and Bluetooth connectivity required for both local and cloud-based operations. Its dual-core architecture and AI acceleration support allow real-time keyword detection and low-latency response. For clear voice capture, the system uses two TDK InvenSense ICS-43434 digital MEMS microphones configured in a microphone array, enabling the AFE to perform echo cancellation, beamforming, and noise suppression effectively. Audio output is handled by the MAX98357A I2S amplifier, which drives a small speaker to deliver natural and clear voice feedback. The board’s power section is built around the BQ24250 charger and MAX20402 DC-DC converter, ensuring stable operation under both USB and battery modes. This allows the assistant to function efficiently on wall power or run portably on a Li-ion battery.
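The dual-mic beamforming mentioned above is handled inside Espressif's AFE, but the underlying delay-and-sum idea is simple enough to sketch: delay one microphone's signal so that sound from the steered direction lines up, then average the two channels so it reinforces while off-axis noise partially cancels. The following is an illustrative host-side sketch, not the AFE's actual implementation.

```c
#include <stddef.h>
#include <stdint.h>

/* Two-mic delay-and-sum beamformer: delays mic2 by `delay` samples
 * and averages it with mic1. Samples before the delay window are
 * treated as zero. The delay corresponds to the extra acoustic path
 * length between the two microphones for the steered direction. */
void delay_and_sum(const int16_t *mic1, const int16_t *mic2,
                   size_t n, size_t delay, int16_t *out)
{
    for (size_t i = 0; i < n; i++) {
        int32_t m2 = (i >= delay) ? mic2[i - delay] : 0;
        out[i] = (int16_t)(((int32_t)mic1[i] + m2) / 2);
    }
}
```

In practice the AFE also performs echo cancellation and noise suppression in the same pipeline, which is why the project uses it rather than hand-rolled DSP.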

Careful layout and decoupling were applied to minimise noise and maintain clean signal integrity across the analog and digital domains. To enhance user interaction, WS2812B RGB LEDs were added for visual indication, and tactile switches allow manual control and reset functions. Each component was selected to balance performance, efficiency, and compactness, resulting in a robust design suited for continuous operation. As a voice interface, the device leverages the Xiaozhi MCP chatbot framework, which connects embedded systems to large language models. Through MCP, the assistant can communicate across multiple terminals, enabling multi-device synchronisation and smart home control. When paired with Espressif’s AFE, this setup provides reliable local wake-word detection and command recognition while extending to cloud AI platforms like Qwen and DeepSeek for complex conversation and natural language understanding.
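One small gotcha with the WS2812B status LEDs mentioned above: the pixels expect their 24 data bits in G-R-B order, not R-G-B. A tiny encoding helper (with an optional brightness scale, since full-brightness status LEDs are blinding) looks like this; the function names are illustrative, not from the project's firmware.

```c
#include <stdint.h>

/* WS2812B pixels clock in 24 bits per LED in G-R-B order, MSB first.
 * Fills out[3] with the bytes to shift out for one pixel. */
void ws2812_encode(uint8_t r, uint8_t g, uint8_t b, uint8_t out[3])
{
    out[0] = g;
    out[1] = r;
    out[2] = b;
}

/* Same, with a global 0-255 brightness scale applied first. */
void ws2812_encode_scaled(uint8_t r, uint8_t g, uint8_t b,
                          uint8_t brightness, uint8_t out[3])
{
    out[0] = (uint8_t)((g * brightness) / 255);
    out[1] = (uint8_t)((r * brightness) / 255);
    out[2] = (uint8_t)((b * brightness) / 255);
}
```

On the ESP32-S3 the encoded bytes are typically shifted out with the RMT peripheral to meet the strip's strict bit timing.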

This hybrid approach ensures responsive operation with enhanced cloud intelligence when connected. The firmware was developed in VS Code using the ESP-IDF plugin (version 5.4 or above), with Espressif's AFE library integrated for real-time voice processing. I2S, I2C, and GPIO interfaces were configured for peripheral communication, while network connectivity handled both MQTT-based smart device control and MCP protocol data exchange. Thanks to the open-source nature of the Xiaozhi framework, adapting the system for different AI services or custom wake words was straightforward, allowing easy experimentation with different model backends and conversational logic. The complete ESP32 AI voice assistant GitHub repository includes schematics, firmware code, and detailed build instructions for makers looking to replicate this ESP32 voice assistant DIY project.
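For the MQTT-based device control mentioned above, the firmware ultimately has to map an incoming topic and payload to a device action. The topic scheme below ("<room>/<device>/set" with "on"/"off" payloads) is purely hypothetical, chosen to show the parsing pattern; the real scheme is whatever your broker and devices agree on.

```c
#include <string.h>

/* Maps a hypothetical "<room>/<device>/set" topic plus an "on"/"off"
 * payload to a device state. Returns 1 for on, 0 for off, and -1 if
 * the topic suffix or payload doesn't match. */
int parse_set_command(const char *topic, const char *payload)
{
    const char *suffix = "/set";
    size_t tl = strlen(topic), sl = strlen(suffix);
    if (tl < sl || strcmp(topic + tl - sl, suffix) != 0)
        return -1;
    if (strcmp(payload, "on") == 0)
        return 1;
    if (strcmp(payload, "off") == 0)
        return 0;
    return -1;
}
```

Keeping command parsing this strict makes it safe to subscribe with a wildcard (e.g. a "+/+/set" filter) and ignore anything that doesn't match.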

👉 Handcraft your AI girlfriend, beginner's guide【bilibili】 The current v2 version is incompatible with the v1 partition table, so it is not possible to upgrade from v1 to v2 via OTA. For partition table details, see partitions/v2/README.md.

All hardware running v1 can be upgraded to v2 by manually flashing the firmware.
