COMMON MODULES - ESP_STUDY - shuaiwen-cui.github.io
The standard input/output library is part of the C standard library. It provides input and output functions such as printf and scanf, declared in the header file stdio.h; after including this header, you can use these functions. The string library is likewise part of the C standard library. It provides string-handling functions such as strcpy and strcat, declared in string.h; after including this header, you can use these functions.
ESP_LOG is the logging module of ESP-IDF. It provides log output macros such as ESP_LOGI (info level) and ESP_LOGE (error level), declared in esp_log.h; after including this header, you can use the logging facilities. ESP_TIMER is the high-resolution timer module of ESP-IDF. It provides timer functions such as esp_timer_create and esp_timer_start_once, declared in esp_timer.h; after including this header, you can use the timer facilities.
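A minimal sketch combining the two modules is shown below: a one-shot esp_timer that logs a message when it fires. This is ESP-IDF firmware code, so it compiles only inside an ESP-IDF project (it assumes the SDK's app_main entry point), not as a standalone program:

```c
#include "esp_log.h"
#include "esp_timer.h"

static const char *TAG = "demo";

/* Callback invoked by the esp_timer task when the one-shot timer fires. */
static void on_timeout(void *arg)
{
    ESP_LOGI(TAG, "one-shot timer fired");
}

void app_main(void)
{
    const esp_timer_create_args_t args = {
        .callback = on_timeout,
        .name = "one_shot_demo",
    };
    esp_timer_handle_t timer;

    if (esp_timer_create(&args, &timer) != ESP_OK) {
        ESP_LOGE(TAG, "failed to create timer");
        return;
    }
    /* Fire once after 1 second (the timeout is given in microseconds). */
    esp_timer_start_once(timer, 1000 * 1000);
}
```

For a periodic timer, esp_timer_start_periodic can be used instead of esp_timer_start_once.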
ESP_RANDOM is the random number module of ESP-IDF. It provides functions such as esp_random, which returns a 32-bit random value, and esp_fill_random, which fills a buffer with random bytes; both are declared in esp_random.h. After including this header, you can use the random number facilities.
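A short sketch of the random number API, again as ESP-IDF firmware code (assumes an ESP-IDF project with app_main):

```c
#include <inttypes.h>
#include "esp_log.h"
#include "esp_random.h"

static const char *TAG = "rng";

void app_main(void)
{
    /* esp_random() returns a hardware-generated 32-bit random value. */
    uint32_t value = esp_random();

    /* esp_fill_random() fills a buffer with random bytes. */
    uint8_t buf[8];
    esp_fill_random(buf, sizeof(buf));

    ESP_LOGI(TAG, "random value: %" PRIu32, value);
}
```

Note that the hardware RNG produces truly random numbers only while the RF subsystem (Wi-Fi or Bluetooth) is enabled; otherwise the values are pseudo-random.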
This project is for ESP32 study and practice. Espressif Systems is a semiconductor company based in China, known for low-power wireless solutions, including Wi-Fi and Bluetooth modules and SoCs (systems on chips). Its products, such as the ESP8266 and ESP32 series, are popular in IoT, embedded systems, and wireless communication because of their low cost, power efficiency, and ease of use. Espressif also provides a range of development tools and software support, so its products are widely adopted by developers and engineers across industries. ESP-IDF (Espressif IoT Development Framework) is the official development framework for the ESP32 series of chips from Espressif.
It provides a comprehensive set of tools and libraries to help developers create robust applications for ESP32-based devices. ESP-IDF supports FreeRTOS, Wi-Fi, Bluetooth, and a wide range of peripherals. It is based on the GCC toolchain and supports both C and C++. The framework includes components for networking, security, power management, and driver development, making it suitable for a wide range of IoT applications. There are two ways to use ESP-IDF: the ESP-IDF command prompt, or a GUI-based IDE such as Visual Studio Code with the ESP-IDF extension; the latter is the more popular approach.
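Whichever front end you use, the underlying workflow is driven by the idf.py tool. A typical session from a project directory looks like this (the commands require the ESP-IDF environment to be set up and exported first):

```shell
idf.py set-target esp32   # select the chip target
idf.py menuconfig         # open the project configuration menu
idf.py build              # compile the project
idf.py flash monitor      # flash the board and open the serial monitor
```

The VS Code extension exposes the same actions as buttons, but it invokes these idf.py commands under the hood.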
For rapid prototyping, we use MicroPython; for high-performance applications, we use ESP-IDF.