A Model Context Protocol (MCP) server for advanced audio transcription and processing using OpenAI's Whisper and GPT-4o models.
MCP Server Whisper provides a standardized way to process audio files through OpenAI's latest transcription and speech services. By implementing the Model Context Protocol, it enables AI assistants like Claude to seamlessly interact with audio processing capabilities.
Key features include audio file search and filtering, format conversion and compression, advanced transcription, interactive audio chat, enhanced transcript templates, and text-to-speech generation.
```bash
# Clone the repository
git clone https://github.com/arcaputo3/mcp-server-whisper.git
cd mcp-server-whisper

# Install dependencies using uv
uv sync

# Set up pre-commit hooks
uv run pre-commit install
```
Create a `.env` file with the following variables:

```bash
OPENAI_API_KEY=your_openai_api_key
AUDIO_FILES_PATH=/path/to/your/audio/files
```
To run the MCP server in development mode:

```bash
mcp dev src/mcp_server_whisper/server.py
```

To install the server for use with Claude Desktop or other MCP clients:

```bash
mcp install src/mcp_server_whisper/server.py [--env-file .env]
```
- `list_audio_files` - Lists audio files with comprehensive filtering and sorting options
- `get_latest_audio` - Gets the most recently modified audio file with model support info
- `convert_audio` - Converts audio files to supported formats (mp3 or wav)
- `compress_audio` - Compresses audio files that exceed size limits
- `transcribe_audio` - Advanced transcription using OpenAI's models: `whisper-1`, `gpt-4o-transcribe`, and `gpt-4o-mini-transcribe`
- `chat_with_audio` - Interactive audio analysis using GPT-4o audio models: `gpt-4o-audio-preview-2024-10-01`, `gpt-4o-audio-preview-2024-12-17`, and `gpt-4o-mini-audio-preview-2024-12-17`
- `transcribe_with_enhancement` - Enhanced transcription with specialized templates:
  - `detailed` - Includes tone, emotion, and background details
  - `storytelling` - Transforms the transcript into a narrative form
  - `professional` - Creates formal, business-appropriate transcriptions
  - `analytical` - Adds analysis of speech patterns and key points
- `create_claudecast` - Generates text-to-speech audio using OpenAI's TTS API: `gpt-4o-mini-tts` (preferred) and other speech models

| Model | Supported Formats |
|---|---|
| Transcribe | flac, mp3, mp4, mpeg, mpga, m4a, ogg, wav, webm |
| Chat | mp3, wav |
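Note that `list_audio_files` filters filenames with regular expressions rather than shell globs. A quick way to sanity-check a pattern before handing it to the tool (the filenames below are hypothetical):

```python
import re

# The same kind of pattern passed to list_audio_files' `pattern` argument
pattern = re.compile(r".*interview.*\.mp3")

files = ["interview_2024.mp3", "interview.wav", "notes.mp3"]
matches = [name for name in files if pattern.fullmatch(name)]
# Only "interview_2024.mp3" matches: "interview.wav" has the wrong
# extension, and "notes.mp3" lacks the word "interview".
print(matches)
```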
Note: Files larger than 25MB are automatically compressed to meet API limits.
> Claude, please transcribe my latest audio file with detailed insights.

Claude will automatically:

1. Find the file with `get_latest_audio`
2. Run `transcribe_with_enhancement` using the "detailed" template

> Claude, list all my audio files that are longer than 5 minutes and were created after January 1st, 2024, sorted by size.

Claude will call `list_audio_files` with appropriate filters:

- `min_duration_seconds: 300` (5 minutes)
- `min_modified_time: <timestamp for Jan 1, 2024>`
- `sort_by: "size"`

> Claude, find all MP3 files with "interview" in the filename and create professional transcripts for each one.

Claude will:

1. Call `list_audio_files` with:
   - `pattern: ".*interview.*\\.mp3"`
   - `format: "mp3"`
2. Run `transcribe_with_enhancement` on each file with:
   - `enhancement_type: "professional"`
   - `model: "gpt-4o-mini-transcribe"` (for efficiency)

> Claude, create a claudecast with this script: "Welcome to our podcast! Today we'll be discussing artificial intelligence trends in 2025." Use the shimmer voice.

Claude will use the `create_claudecast` tool with:

- `text_prompt` containing the script
- `voice: "shimmer"`
- `model: "gpt-4o-mini-tts"` (default high-quality model)
- `instructions: "Speak in an enthusiastic, podcast host style"` (optional)
- `speed: 1.0` (default, can be adjusted)

Add this to your `claude_desktop_config.json`:
```json
{
  "mcpServers": {
    "whisper": {
      "command": "uvx",
      "args": [
        "--with",
        "aiofiles",
        "--with",
        "mcp[cli]",
        "--with",
        "openai",
        "--with",
        "pydub",
        "mcp-server-whisper"
      ],
      "env": {
        "OPENAI_API_KEY": "your_openai_api_key",
        "AUDIO_FILES_PATH": "/path/to/your/audio/files"
      }
    }
  }
}
```
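A malformed config file is one reason the server may not appear in Claude Desktop after a restart. A small sketch for validating the `whisper` entry before reloading (the helper name and required keys are assumptions based on the config shown above):

```python
import json

def check_config(text: str) -> None:
    """Raise if the whisper server entry is missing or malformed."""
    config = json.loads(text)  # raises json.JSONDecodeError on syntax errors
    server = config["mcpServers"]["whisper"]  # KeyError if entry is absent
    for key in ("command", "args", "env"):
        if key not in server:
            raise KeyError(f"whisper entry missing {key!r}")

# Usage: check_config(open(path_to_claude_desktop_config).read())
```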
For example, set `AUDIO_FILES_PATH` to `/Users/<user>/Movies/Omi Screen Recorder` (replacing `<user>` with your username) to work with Omi screen recordings.

This project uses modern Python development tools including `uv`, `pytest`, `ruff`, and `mypy`.
```bash
# Run tests
uv run pytest

# Run with coverage
uv run pytest --cov=src

# Format code
uv run ruff format src

# Lint code
uv run ruff check src

# Run type checking (strict mode)
uv run mypy --strict src

# Run the pre-commit hooks
pre-commit run --all-files
```
The project uses GitHub Actions for CI/CD:
To create a new release version:
```bash
git checkout main

# Make sure everything is up to date
git pull

# Create a new version tag
git tag v0.1.1

# Push the tag
git push origin v0.1.1
```
For detailed architecture information, see Architecture Documentation.
MCP Server Whisper is built on the Model Context Protocol, which standardizes how AI models interact with external tools and data sources.
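Blocking audio work (such as pydub's decoding and encoding) is typically fanned out to worker threads so asyncio can process many files concurrently. A simplified illustration of that pattern — not the server's actual code, and `process_one` is a stand-in for real pydub calls:

```python
import asyncio

def process_one(path: str) -> str:
    # Stand-in for blocking pydub work, e.g.
    # AudioSegment.from_file(path).export(out_path, format="mp3")
    return f"processed:{path}"

async def process_all(paths: list[str]) -> list[str]:
    # pydub calls would block the event loop, so run each one in a
    # worker thread and gather the results concurrently.
    return await asyncio.gather(
        *(asyncio.to_thread(process_one, p) for p in paths)
    )

results = asyncio.run(process_all(["a.wav", "b.wav"]))
print(results)
```

`asyncio.gather` preserves input order, so results line up with the paths passed in.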
Under the hood, it uses:

- `pydub` for audio file manipulation
- `asyncio` for concurrent processing

Contributions are welcome! Please follow these steps:
1. Create a feature branch (`git checkout -b feature/amazing-feature`)
2. Make your changes and run the checks (`uv run pytest && uv run ruff check src && uv run mypy --strict src`)
3. Commit your changes (`git commit -m 'Add some amazing feature'`)
4. Push the branch (`git push origin feature/amazing-feature`)

This project is licensed under the MIT License - see the LICENSE file for details.