Lyriclens Semantic Search For Song Lyrics With A By Udirno Jan

Emily Johnson

I have over 9,000 songs on Spotify. When I want to make a playlist for a specific mood, I’m stuck scrolling or using keyword search, which is useless for abstract themes. If I want “songs about ignoring your problems productively” or “conquering your mind,” there’s no way to find them. The Spotify AI DJ beta is good but inconsistent when searching within my saved songs. Harder still, 30% of my library is Hindi, Punjabi, Spanish, French, and Afrobeat music that I can’t browse manually because I don’t understand the lyrics. I wanted to search by meaning instead of keywords, so I set out to build a semantic search system using RAG.

For full technical implementation details, see the deep dive. Repository: github.com/udirno/lyric-lens

I built the search system in a few days. The architecture was straightforward: Genius API for lyrics, ChromaDB for embeddings, Claude 3.5 for responses. Getting honest measurement took longer. The AI wrote the data pipeline scripts, handled ChromaDB integration, and implemented the evaluation framework.
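The retrieval step in an architecture like this comes down to nearest-neighbour search over embedding vectors. The following is a minimal sketch of that idea, not the repository's code: the three-dimensional vectors and song IDs are made up for illustration, and the real system delegates storage and search to ChromaDB rather than a hand-written loop.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction to compare
    return dot / (norm_a * norm_b)


def top_k(query_vec, index, k=5):
    """Return the k song IDs whose vectors are most similar to the query."""
    scored = [(cosine_similarity(query_vec, vec), song_id)
              for song_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [song_id for _, song_id in scored[:k]]


# Hypothetical toy index: in practice each vector would come from an
# embedding model applied to a song's lyrics.
toy_index = {
    "song_a": [0.9, 0.1, 0.0],
    "song_b": [0.1, 0.9, 0.0],
    "song_c": [0.7, 0.3, 0.1],
}
results = top_k([1.0, 0.0, 0.0], toy_index, k=2)
print(results)
```

In a RAG setup, the IDs returned here would be used to look up lyrics, which are then passed to the language model as context for its response.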

This division of labor meant I spent 70% of my time on design and evaluation and 30% on implementation. My first evaluation run showed 7.9% Recall@5, meaning the system was finding only about 1 in 13 relevant songs. But when I dug into the diagnostic tools I’d built, I discovered the problem wasn’t the search system; it was the ground truth dataset.

Semantic search for song lyrics: find songs by meaning, not keywords. The evaluation used 15 test queries and 178 ground truth songs.
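For readers unfamiliar with the metric, Recall@5 is the fraction of a query's relevant songs that appear in the top five results. Below is a minimal sketch; the function names and sample IDs are hypothetical, and averaging per query is an assumption, since the source does not say how the per-query scores were combined.

```python
def recall_at_k(retrieved, relevant, k=5):
    """Fraction of relevant items that appear in the top-k retrieved items."""
    relevant = set(relevant)
    if not relevant:
        return 0.0  # no ground truth for this query
    hits = len(set(retrieved[:k]) & relevant)
    return hits / len(relevant)


def mean_recall_at_k(runs, k=5):
    """Average recall@k over (retrieved, relevant) pairs, one pair per query."""
    scores = [recall_at_k(retrieved, relevant, k) for retrieved, relevant in runs]
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical query: one of four relevant songs lands in the top five.
retrieved = ["s1", "s2", "s3", "s4", "s5", "s6"]
relevant = {"s2", "s9", "s10", "s11"}
score = recall_at_k(retrieved, relevant, k=5)
print(score)
```

One property worth keeping in mind when reading such numbers: if a query has more than k relevant songs, recall@k cannot reach 100%, because at most k of them can be retrieved.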

Built with Python, Sentence Transformers, Chroma, Claude API.

LyricLens, developed by Musixmatch, is a production AI system that extracts semantic meaning, themes, entities, cultural references, and sentiment from music lyrics at scale. The platform analyzes over 11 million songs using Amazon Bedrock's Nova family of foundation models to provide real-time insights for brands, artists, developers, and content moderators. By migrating from a previous provider to Amazon Nova models, Musixmatch achieved over 30% cost savings while maintaining accuracy, processing over 2.5 billion tokens. The system employs a multi-level semantic engine with knowledge graphs, supports content moderation with granular PG ratings, and enables natural language queries for playlist generation and trend analysis across demographics, genres, and time periods.

LyricLens represents a comprehensive production deployment of large language models for analyzing music lyrics at scale.

Developed by Musixmatch, one of Italy’s leading scale-ups, the platform demonstrates sophisticated LLMOps practices including model selection, evaluation frameworks, prompt optimization, and cost management. The presentation was delivered by Eduardo Randazzo from AWS and Bruno Zambolin, Director of Innovation at Musixmatch, at an AWS event.

The core problem LyricLens addresses is extracting deep semantic meaning from music lyrics beyond simple text analysis. Music serves as a universal cultural language that connects generations, geographies, and identities, and lyrics contain rich information about themes, moods, cultural references, social movements, and emotional states. By analyzing millions of songs using AI and correlating this data with listening trends and demographics, LyricLens enables actionable insights for multiple stakeholders: brands wanting to speak the language of their communities, developers building...

The platform processes over 11 million songs (a number that grows daily) through a multi-level semantic engine.

At its core, LyricLens employs Amazon Bedrock’s Nova family of foundation models, selected for this use case after a thorough evaluation period. The selection was driven by several key factors that align with production LLMOps requirements. Amazon Bedrock provides a fully serverless infrastructure, so Musixmatch doesn’t need to manage underlying hardware or scaling concerns. The Nova model family includes several variants optimized for different use cases: Nova Micro (text-only, fastest and most economical), Nova Lite (multimodal with 300K token context window), Nova Pro (higher accuracy with 300K context...

This is the technical deep dive for LyricLens, a semantic search system for song lyrics.

For the high-level story and key lessons, read the overview first. This post covers implementation details, evaluation methodology, and systematic experiments. Repository: github.com/udirno/lyric-lens

The system is fully open source, and you can run it on your own Spotify library. You’ll need two free API keys: a Genius API token (https://genius.com/api-clients/new) for fetching lyrics, and an Anthropic API key (https://console.anthropic.com/) for Claude.
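Before starting a multi-hour indexing run, it can help to confirm that both keys are actually set. The sketch below assumes the keys are supplied as environment variables. `ANTHROPIC_API_KEY` is the variable the Anthropic Python SDK reads by default, while `GENIUS_ACCESS_TOKEN` is an assumed name; check the repository's README for the names it really expects.

```python
import os

# Assumed variable names; the repository may use different ones.
REQUIRED_KEYS = ("GENIUS_ACCESS_TOKEN", "ANTHROPIC_API_KEY")


def missing_keys(env=None, required=REQUIRED_KEYS):
    """Return the names of required keys that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


if __name__ == "__main__":
    missing = missing_keys()
    if missing:
        print("Missing API keys: " + ", ".join(missing))
    else:
        print("Both API keys are set.")
```

Passing a plain dictionary as `env` makes the check easy to exercise without touching the real environment.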

Export your Spotify library through Settings → Privacy → Download your data, save it as data/YourLibrary.json, then run the indexing and search scripts. Indexing takes a few hours because of API rate limits, but you can start searching after the first 100 songs are processed.

LyricLens is a music-tech prototype that explores how song meaning can be surfaced directly within a streaming experience without redistributing copyrighted lyrics. Inspired by Spotify’s AI-driven discovery features, LyricLens generates human-readable interpretations of songs using audio features and metadata, with optional user-provided lyric line explanations. Music discovery often focuses on what to listen to, not why a song feels meaningful. LyricLens explores a product direction where interpretation and context are built directly into the listening experience.

This project intentionally avoids storing or redistributing full song lyrics and focuses on explainable, metadata-driven interpretation.
