VoiceDialogue / Vibevoice /architecture.md

liumaolin

Update project docs.

2f3888a 11 months ago

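System Architecture

Core Architectural Philosophy

This project uses a layered, decoupled, modular architecture designed for maintainability and extensibility. Its core idea is **Separation of Concerns**:

Low-level capability subsystems (asr/, tts/, llm/, audio/): Each module is an independent, highly cohesive functional unit (for example, speech recognition or audio I/O). These modules contain no business logic and expose only pure capabilities.
Service orchestration layer (services/): Orchestrates and schedules the capabilities of the low-level subsystems to implement concrete business flows (for example, voice dialogue or status monitoring).
Interface layer (api/, cli/): Serves as the application's entry point. It receives external commands and delegates them to the service layer.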
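
The layering above can be illustrated with a minimal sketch. It borrows the names `AudioPlayer` and `PlayerService` from this document, but the method names (`play`, `submit`, `interrupt`), the `cli_play` function, and all behavior shown here are assumptions made for illustration, not the project's actual code.

```python
from dataclasses import dataclass, field


class AudioPlayer:
    """Capability layer: plays raw audio and knows nothing about business rules."""

    def __init__(self) -> None:
        self.played: list[bytes] = []

    def play(self, pcm: bytes) -> None:
        # A real player would write to an output device; this stub only records.
        self.played.append(pcm)


@dataclass
class PlayerService:
    """Service layer: adds business logic (history, interruption) on top of the player."""

    player: AudioPlayer
    history: list[str] = field(default_factory=list)
    interrupted: bool = False

    def interrupt(self) -> None:
        self.interrupted = True

    def submit(self, label: str, pcm: bytes) -> bool:
        if self.interrupted:
            return False  # hypothetical rule: drop playback after an interruption
        self.history.append(label)
        self.player.play(pcm)
        return True


def cli_play(service: PlayerService, label: str, pcm: bytes) -> str:
    """Interface layer: receives a command and delegates it to the service."""
    return "played" if service.submit(label, pcm) else "skipped"
```

Dependencies point in one direction only (interface, then service, then capability), so the player stub could be swapped for a real device backend without touching the service or the CLI function.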

Data Flow Diagram (CLI Mode)

User voice input → Audio Subsystem (Capture) → ASR Subsystem (Recognize) → LLM Subsystem (Generate) → TTS Subsystem (Synthesize) → Audio Subsystem (Player)
↑                                                                                                                                                         ↓
└─────────────────────────────────────────────────────────── Real-time voice interaction loop ────────────────────────────────────────────────────────────┘
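
As a rough illustration of the loop in the diagram, the sketch below chains the four stages as plain callables. The function `dialogue_loop`, the stub stages, and the rule of skipping empty recognition results are all assumptions for the example. The real system runs these stages in separate threads rather than in one synchronous loop.

```python
from typing import Callable, Iterable, Iterator


def dialogue_loop(
    utterances: Iterable[bytes],
    recognize: Callable[[bytes], str],
    generate: Callable[[str], str],
    synthesize: Callable[[str], bytes],
) -> Iterator[bytes]:
    """Yield one synthesized reply per recognized utterance, in order."""
    for audio in utterances:       # Audio Subsystem (Capture)
        text = recognize(audio)    # ASR Subsystem (Recognize)
        if not text.strip():
            continue               # assumed rule: nothing recognized, keep listening
        reply = generate(text)     # LLM Subsystem (Generate)
        yield synthesize(reply)    # TTS Subsystem (Synthesize); caller plays the result


# Stub stages so the sketch runs without any model.
def fake_asr(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_llm(prompt: str) -> str:
    return f"echo: {prompt}"


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")


replies = list(dialogue_loop([b"hello", b"  ", b"bye"], fake_asr, fake_llm, fake_tts))
```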

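Core Components and Layers

| Layer | Module/Component | Responsibility |
| --- | --- | --- |
| Interface layer | `api/`, `cli/` | Provides HTTP and command-line interfaces as the system's entry points. |
| Service layer | `services/` | Orchestrates business flows. For example, `PlayerService` handles the business logic of playback tasks (such as interruption and history) and calls the underlying player. |
| Capability subsystems | `asr/` | Automatic speech recognition (ASR). Contains multiple recognition strategies (such as FunASR and Whisper) and their management. |
| | `tts/` | Text-to-speech (TTS). Contains multiple speech synthesis strategies (such as Kokoro and Moyo) and their management. |
| | `llm/` | Large language model (LLM). Handles text generation logic. |
| | `audio/` | Audio I/O. Provides pure audio input (`capture`) and output (`player`) capabilities, with no business logic. |
| Core framework | `core/` | Application skeleton. Contains the thread base class, global constants, the state manager, and the system launcher. |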

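Multithreaded Architecture

The system uses a multithreaded design. Components communicate through queues, which keeps them decoupled:

Audio capture thread: (audio.AecCapture/audio.PyAudioCapture) Continuously captures audio data.
Speech monitoring thread: (services.SpeechMonitor) Detects user voice activity.
ASR worker thread: (asr.models.*) Performs speech recognition.
LLM worker thread: (llm.generator) Performs text generation.
TTS worker thread: (tts.models.*) Performs speech synthesis.
Playback service thread: (services.PlayerService) Handles the business logic of playback tasks.
Audio playback thread: (audio.AudioPlayer) Plays the decoded raw audio data.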
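
The queue-based hand-off between worker threads can be sketched with the standard library alone. The helper `start_worker`, the `_STOP` sentinel, the queue names, and the lambda stages are assumptions for this example. The project's own thread base class in `core/` is not shown in this document and may work differently.

```python
import queue
import threading

_STOP = object()  # sentinel that tells a worker to shut down


def start_worker(name, fn, inbox, outbox):
    """Start a thread that applies fn to each item from inbox and puts the result on outbox."""
    def run():
        while True:
            item = inbox.get()
            if item is _STOP:
                outbox.put(_STOP)  # propagate shutdown to the next stage
                break
            outbox.put(fn(item))

    thread = threading.Thread(target=run, name=name, daemon=True)
    thread.start()
    return thread


# One queue between each pair of adjacent stages.
audio_q, text_q, reply_q, speech_q = (queue.Queue() for _ in range(4))

workers = [
    start_worker("asr", lambda pcm: pcm.decode("utf-8"), audio_q, text_q),
    start_worker("llm", lambda text: text.upper(), text_q, reply_q),
    start_worker("tts", lambda text: text.encode("utf-8"), reply_q, speech_q),
]

# The main thread stands in for the capture thread here.
for chunk in (b"hello", b"world"):
    audio_q.put(chunk)
audio_q.put(_STOP)

# The main thread also stands in for the playback thread.
played = []
while (item := speech_q.get(timeout=5)) is not _STOP:
    played.append(item)

for worker in workers:
    worker.join()
```

Because each stage only sees its inbox and outbox, a slow stage (typically the LLM) delays downstream work without blocking audio capture, and each stage can be replaced or tested in isolation.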