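System Architecture
Core Architecture Philosophy
This project uses a layered, decoupled, modular architecture designed for maintainability and extensibility. Its central idea is **Separation of Concerns**:
- Low-level capability subsystems (`asr/`, `tts/`, `llm/`, `audio/`): each module is an independent, highly cohesive functional unit (for example, speech recognition or audio I/O). They contain no business logic and expose only pure capabilities.
- Service orchestration layer (`services/`): orchestrates and schedules the capabilities of the underlying subsystems to implement concrete business workflows (for example, voice dialogue and status monitoring).
- Interface layer (`api/`, `cli/`): serves as the application's entry point, receives external commands, and delegates them to the service layer.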
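The three layers above can be sketched roughly as follows. This is a minimal illustration and not the project's actual code: `Recognizer`, `Generator`, `Synthesizer`, `DialogueService`, `cli_main`, and the stub implementations are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from typing import Protocol


# Capability layer: pure abilities, no business logic (hypothetical interfaces).
class Recognizer(Protocol):
    def recognize(self, audio: bytes) -> str: ...


class Generator(Protocol):
    def generate(self, prompt: str) -> str: ...


class Synthesizer(Protocol):
    def synthesize(self, text: str) -> bytes: ...


# Stub capabilities so the sketch runs without any real models.
class EchoRecognizer:
    def recognize(self, audio: bytes) -> str:
        return audio.decode("utf-8")


class UpperGenerator:
    def generate(self, prompt: str) -> str:
        return prompt.upper()


class BytesSynthesizer:
    def synthesize(self, text: str) -> bytes:
        return text.encode("utf-8")


# Service layer: orchestrates capabilities into one business workflow.
@dataclass
class DialogueService:
    asr: Recognizer
    llm: Generator
    tts: Synthesizer

    def handle_turn(self, audio: bytes) -> bytes:
        text = self.asr.recognize(audio)
        reply = self.llm.generate(text)
        return self.tts.synthesize(reply)


# Interface layer: receives the external request and delegates to the service.
def cli_main(audio: bytes) -> bytes:
    service = DialogueService(EchoRecognizer(), UpperGenerator(), BytesSynthesizer())
    return service.handle_turn(audio)
```

Because the service depends only on the capability interfaces, a different recognition or synthesis strategy could be swapped in without touching the service or the entry point.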
Data Flow Diagram (CLI Mode)
User voice input → Audio Subsystem (Capture) → ASR Subsystem (Recognize) → LLM Subsystem (Generate) → TTS Subsystem (Synthesize) → Audio Subsystem (Player)
↑ ↓
└─────────────────────────────────────────── real-time voice interaction loop ───────────────────────────────────────────┘
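As a rough illustration of the stage order in the diagram, the sketch below pushes each user turn through capture, recognition, generation, synthesis, and playback. Every function here is a hypothetical placeholder that only wraps its input in a label; none of them is the project's real API.

```python
# Placeholder stages: each one labels its input so the order is visible.
def capture(turn: str) -> str:
    return f"audio({turn})"


def recognize(audio: str) -> str:
    return f"text({audio})"


def generate(text: str) -> str:
    return f"reply({text})"


def synthesize(reply: str) -> str:
    return f"speech({reply})"


def play(speech: str) -> str:
    return f"played({speech})"


# Stages that run after capture, in the order shown in the diagram.
STAGES = [recognize, generate, synthesize, play]


def interaction_loop(user_turns: list[str]) -> list[str]:
    """Run every user turn through the full pipeline, one turn after another."""
    played = []
    for turn in user_turns:
        data = capture(turn)
        for stage in STAGES:
            data = stage(data)
        played.append(data)
    return played
```

Calling `interaction_loop(["hi"])` returns a single string whose nesting mirrors the stage order, with the capture label innermost and the playback label outermost.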
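Core Components and Layers
| Layer | Module/Component | Responsibility |
|---|---|---|
| Interface layer | `api/`, `cli/` | Provides HTTP and command-line interfaces as the system's entry points. |
| Service layer | `services/` | Orchestrates business workflows. For example, `PlayerService` handles the business logic of playback tasks (such as interruption and history) and calls the underlying player. |
| Capability subsystems | `asr/` | **Automatic speech recognition (ASR)**. Contains multiple recognition strategies (such as FunASR and Whisper) and their management. |
| | `tts/` | **Text-to-speech (TTS)**. Contains multiple speech synthesis strategies (such as Kokoro and Moyo) and their management. |
| | `llm/` | **Large language model (LLM)**. Handles the text generation logic. |
| | `audio/` | Audio I/O. Provides pure audio input (capture) and output (player) capabilities, with no business logic. |
| Core framework | `core/` | Application skeleton. Contains the thread base classes, global constants, the state manager, and the system launcher. |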
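Multithreaded Architecture
The system uses a multithreaded design in which components communicate through queues, keeping them decoupled:
- Audio capture thread (`audio.AecCapture` / `audio.PyAudioCapture`): continuously captures audio data.
- Speech monitor thread (`services.SpeechMonitor`): detects user voice activity.
- ASR worker thread (`asr.models.*`): performs speech recognition.
- LLM worker thread (`llm.generator`): performs text generation.
- TTS worker thread (`tts.models.*`): performs speech synthesis.
- Playback service thread (`services.PlayerService`): handles the business logic of playback tasks.
- Audio playback thread (`audio.AudioPlayer`): plays the decoded raw audio data.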
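The queue-based hand-off between worker threads can be sketched with the standard library alone. This is an assumption about the general pattern, not the project's implementation: `worker`, `run_chain`, and the `STOP` sentinel are hypothetical names, and the stages here are simple string transforms standing in for ASR, LLM, and TTS work.

```python
import queue
import threading

# Sentinel that tells each worker to shut down and pass the signal along.
STOP = object()


def worker(transform, inbox: queue.Queue, outbox: queue.Queue) -> None:
    """Read items from inbox, apply transform, and write results to outbox."""
    while True:
        item = inbox.get()
        if item is STOP:
            outbox.put(STOP)  # propagate shutdown to the next stage
            break
        outbox.put(transform(item))


def run_chain(items, transforms):
    """Run one thread per transform, connected by FIFO queues."""
    queues = [queue.Queue() for _ in range(len(transforms) + 1)]
    threads = [
        threading.Thread(target=worker, args=(t, queues[i], queues[i + 1]), daemon=True)
        for i, t in enumerate(transforms)
    ]
    for th in threads:
        th.start()

    # Feed the first queue, then signal that no more input is coming.
    for item in items:
        queues[0].put(item)
    queues[0].put(STOP)

    # Drain the last queue until the sentinel arrives.
    results = []
    while True:
        out = queues[-1].get()
        if out is STOP:
            break
        results.append(out)

    for th in threads:
        th.join()
    return results
```

Each stage only knows about its own input and output queue, so a slow stage delays its downstream neighbours without blocking the threads that feed it. With one thread per stage and FIFO queues, results come out in the order the inputs went in.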