jt-live-whisper: Installation and Usage

System requirements

macOS

macOS (Apple Silicon / Intel) · Python 3.12+ · Homebrew (must be installed beforehand) · BlackHole 2ch (virtual audio driver; the installer sets it up automatically)

Windows

Windows 10 or later · Python 3.12+ (install from python.org and tick "Add to PATH") · PowerShell 5.1+ (built into Windows 10)

Common

A local LLM server for translation and summaries (Ollama recommended). The tool also works without one: you can switch to the NLLB or Argos offline translation engines, but summaries require an LLM.
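Because both platforms need Python 3.12 or newer, a quick pre-flight check can save a failed install. The bash sketch below is illustrative only: `py_version_ok` is a hypothetical helper, not part of install.sh or install.ps1, and it only compares a version string you pass in.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of install.sh / install.ps1).
# Succeeds if a "MAJOR.MINOR[.PATCH]" version string is 3.12 or newer.
py_version_ok() {
  local major minor
  IFS=. read -r major minor _ <<< "$1"
  # Anything that is not numeric counts as a failed check.
  [[ $major =~ ^[0-9]+$ && $minor =~ ^[0-9]+$ ]] || return 1
  # 10# forces base 10 so a value like "08" is not read as octal.
  (( 10#$major > 3 || (10#$major == 3 && 10#$minor >= 12) ))
}
```

For example: `py_version_ok "$(python3 -c 'import platform; print(platform.python_version())')" && echo "Python OK"`.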

Disk space

The installer checks that enough free space is available before it starts.

Local machine

Install tier	Size	Includes
Minimal	~3 GB	venv + 1 Whisper model + core packages
Recommended	~8 GB	Adds the HuggingFace cache (for offline processing)
Full	~14 GB	All Whisper models + cache + Moonshine
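The installer performs its own space check; the following bash sketch only illustrates the idea by mapping free space to the tiers in the table above. `free_gb` and `pick_tier` are hypothetical helpers, and because the table's sizes are approximate, treat a borderline result with caution.

```bash
#!/usr/bin/env bash
# Hypothetical helpers; the real installer performs its own check.

# Free space, in whole GiB (rounded down), on the filesystem holding $1.
# `df -Pk` prints POSIX-format output in 1024-byte blocks; on the second
# line, column 4 is the available space.
free_gb() {
  df -Pk "${1:-.}" | awk 'NR == 2 { print int($4 / 1024 / 1024) }'
}

# Largest local install tier (per the disk-space table) that fits.
pick_tier() {
  local gb=$1
  if   (( gb >= 14 )); then echo full
  elif (( gb >= 8 ));  then echo recommended
  elif (( gb >= 3 ));  then echo minimal
  else echo insufficient
  fi
}
```

For example: `pick_tier "$(free_gb "$HOME")"`.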

GPU server (optional)

Install tier	Size	Includes
Minimal	~5 GB	PyTorch + 1 model
Full	~12 GB	PyTorch + all 5 models + speaker-diarization packages

Quick start

1. Install with three commands

macOS

mkdir -p ~/Apps/jt-live-whisper && cd ~/Apps/jt-live-whisper
curl -fsSL https://raw.githubusercontent.com/jasoncheng7115/jt-live-whisper/main/install.sh -o install.sh
bash install.sh

Windows (PowerShell)

mkdir C:\jt-live-whisper -Force | Out-Null; cd C:\jt-live-whisper
irm https://raw.githubusercontent.com/jasoncheng7115/jt-live-whisper/main/install.ps1 -OutFile install.ps1
powershell -ExecutionPolicy Bypass -File install.ps1

The installer downloads and configures all local AI models and dependencies automatically, then asks whether you want to set up a GPU speech-recognition server (optional). A first install takes about 10–20 minutes depending on network speed; on macOS it also compiles whisper.cpp.

2. Set up audio devices

macOS: BlackHole

After installing BlackHole, restart the computer, then create the virtual devices in "Audio MIDI Setup":

Multi-Output Device (required): check your speakers / headphones + BlackHole 2ch and set BlackHole as the primary device, then select this device under System Settings → Sound → Output. System audio then goes to your headphones and to BlackHole at the same time, which is how the program captures the other party's voice.
Aggregate Device (optional, to record both sides): check BlackHole 2ch + your microphone, set the clock source to BlackHole, and enable "Drift Correction" on the other devices. The program detects it automatically as the recording device.

In Zoom / Teams, set the speaker output to the Multi-Output Device, not directly to AirPods; otherwise BlackHole receives no audio. Leave the microphone setting unchanged.
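To confirm that macOS can see the devices you created, you can search the system's audio device list. A minimal bash sketch, assuming the default device names (renamed devices will not match); `check_devices` is a hypothetical helper that reads any device listing on stdin.

```bash
#!/usr/bin/env bash
# Hypothetical helper: reads a device listing on stdin and reports each
# expected name as found or missing (case-insensitive substring match).
# Returns non-zero if any name is missing.
check_devices() {
  local listing name missing=0
  listing=$(cat)
  for name in "$@"; do
    if grep -qiF -- "$name" <<< "$listing"; then
      echo "found:   $name"
    else
      echo "missing: $name"
      missing=1
    fi
  done
  return "$missing"
}
```

For example: `system_profiler SPAudioDataType | check_devices "BlackHole 2ch" "Multi-Output Device"`. The output of `./start.sh --list-devices` can be piped in the same way, although its exact format is not documented here.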

Windows: WASAPI

Windows needs no extra virtual audio driver: the program captures system playback directly through WASAPI Loopback, so in most cases no manual setup is required.

If auto-detection fails, enable "Stereo Mix": right-click the volume icon → Sound settings → Recording tab of the classic Sound control panel → right-click and choose "Show Disabled Devices" → enable "Stereo Mix".
Verify: run .\start.ps1 --list-devices and confirm that a loopback device appears in the list.

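3. Install a local LLM (for translation / summaries)

# macOS: install via Homebrew, then start the Ollama server in the background
# (or run `ollama serve` in another terminal)
brew install ollama
brew services start ollama
# Windows: download the installer from https://ollama.com/
# Pull the recommended translation model (same on both platforms)
ollama pull qwen2.5:14b

Recommended hardware: if you have an NVIDIA DGX Spark (128 GB), running Ollama on it is excellent value; point the program at it with --llm-host. You can translate without an LLM by switching to the NLLB (Chinese / Japanese / English in any direction) or Argos (English → Chinese only) offline engines, but summaries still require an LLM.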
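Before launching, it can help to confirm that Ollama is running and the model has been pulled. Ollama's HTTP API lists local models at `GET /api/tags` (default port 11434). The bash sketch below assumes that endpoint and uses a hypothetical helper, `has_model`, that does plain-text matching instead of real JSON parsing.

```bash
#!/usr/bin/env bash
# Hypothetical helper: reads the JSON from Ollama's GET /api/tags on stdin
# and succeeds if a model with exactly the given name is listed.
# Whitespace is stripped first so compact and pretty-printed JSON both match.
# Plain-text matching only; use a JSON tool such as jq for anything serious.
has_model() {
  tr -d ' \n\t' | grep -qF -- "\"name\":\"$1\""
}
```

For example: `curl -fsS http://localhost:11434/api/tags | has_model qwen2.5:14b && echo "model ready"`. `ollama list` shows the same information interactively.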

4. Launch

# macOS
cd ~/Apps/jt-live-whisper
./start.sh
# Windows (PowerShell)
cd C:\jt-live-whisper
.\start.ps1

The program opens an interactive menu that walks you through the mode, translation engine, AI recognition model, and other options. Audio devices are detected automatically; there is nothing to select by hand.

Usage

The examples below use the macOS commands; on Windows, replace ./start.sh with .\start.ps1. All other flags are identical.

Web UI (recommended)

./start.sh --webui            # macOS
.\start.ps1 --webui           # Windows

Opens your browser automatically (default http://localhost:19781). Complete all settings on the page and click "Start". It covers every live and offline feature and supports light / dark themes as well as phones and tablets.

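Live mode (transcribe as you listen)

# CLI mode (skips the menu)
./start.sh --mode en2zh --engine llm --llm-model qwen2.5:14b
# English ↔ Chinese two-way subtitles (their English → Chinese, your Chinese → English)
./start.sh --mode en_zh
# Japanese ↔ Chinese two-way subtitles
./start.sh --mode ja_zh
# Live translation while also transcribing the microphone
./start.sh --mode en2zh --mic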

Offline audio files

# English → Chinese + automatic summary
./start.sh --input meeting.mp3 --summarize
# Speaker diarization
./start.sh --input meeting.mp3 --diarize
# Fixed number of speakers + summary
./start.sh --input meeting.mp3 --diarize --num-speakers 3 --summarize

Batch summary

./start.sh --summarize logs/英翻中_逐字稿_20260101_120000.txt
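To summarize the most recent transcript without typing its name, a small bash helper can pick the newest file. This is a sketch under assumptions: `latest_transcript` is hypothetical, and the naming pattern (`<mode>_逐字稿_YYYYMMDD_HHMMSS.txt`) is inferred from the single example above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: prints the newest transcript in a directory, assuming
# names like <mode>_逐字稿_YYYYMMDD_HHMMSS.txt. It compares the embedded
# timestamps, not whole names, so files from different modes sort correctly.
latest_transcript() {
  local dir=${1:-logs} f ts best="" best_ts=0
  for f in "$dir"/*.txt; do
    [[ -f $f && ${f##*/} == *逐字稿* ]] || continue
    [[ $f =~ _([0-9]{8})_([0-9]{6})\.txt$ ]] || continue
    ts=${BASH_REMATCH[1]}${BASH_REMATCH[2]}    # e.g. 20260101120000
    if (( 10#$ts > 10#$best_ts )); then
      best_ts=$ts
      best=$f
    fi
  done
  # Fails (non-zero) when no matching transcript was found.
  [[ -n $best ]] && printf '%s\n' "$best"
}
```

For example: `./start.sh --summarize "$(latest_transcript logs)"`.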

Live-mode shortcuts: Ctrl+C stops transcription · Ctrl+P pauses / resumes.

Interactive menu overview

Launch without any flags to enter the menu, which guides you through the setup step by step and prints the equivalent CLI command at the end.

Live-mode menu

#	Step	Notes
1	Input source	Live audio / read a file
2	Mode	10 modes, shown in groups
3	Microphone	Transcription modes ask whether to also transcribe the mic
4	Recognition location	GPU server / local (shown only if configured)
5	ASR engine	Whisper / Moonshine (Moonshine is English-only)
6	Recognition model	Auto-recommended from device performance
7	Translation engine	LLM / NLLB / Argos
8	Translation model	Queried live from the server
9	Meeting topic	Optional; improves terminology accuracy
10	Audio scene	Meeting / training / quick subtitles
11	Recording	Mixed / playback audio only / none
12	Confirm and start	Shows the equivalent CLI command

Offline menu

#	Step	Notes
1	Mode	9 modes (record-only excluded)
2	Recognition location	GPU server is 5–10× faster
3	Recognition model	Server mode shows cache tags
4	LLM server	Asked only in translation modes; server type auto-detected
5	Translation model	Server models + local offline options
6	Speaker diarization	None / auto / fixed 2–20 speakers
7	Summary and correction	Summary + correction / summary only / transcript only
8	Summary model	120B or larger recommended
9	Meeting topic	Optional; improves translation and summary quality
10	Confirm and start	Shows the equivalent CLI command and a settings overview

Command-line flags

Flag	Description	Default
`--webui`	Launch the Web UI
`--mode MODE`	Mode (en2zh / zh2en / ja2zh / zh2ja / en_zh / ja_zh / en / zh / ja / record)	`en2zh`
`--asr ASR`	ASR engine (whisper / moonshine / faster-whisper)	`whisper`
`-m, --model`	Whisper model (base.en … large-v3-turbo / large-v3)	auto (by device)
`--moonshine-model`	Moonshine model (medium / small / tiny)	`medium`
`-s, --scene`	Scene (meeting / training / presentation / subtitle)	`training`
`-e, --engine`	Translation engine (llm / nllb / argos)	`llm`
`--llm-model`	LLM translation model	`qwen2.5:14b`
`--llm-host HOST`	LLM server address (server type auto-detected)
`--topic TOPIC`	Meeting topic (improves translation and summary quality)
`-d, --device ID`	Audio device ID	auto-detected
`--list-devices`	List available audio devices and exit
`--input FILE […]`	Process audio files offline
`--diarize`	Enable speaker diarization (requires --input)
`--num-speakers N`	Number of speakers (requires --diarize)	auto
`--summarize [FILE …]`	Generate an AI summary
`--summary-model`	LLM model used for summaries	`gpt-oss:120b`
`--mic`	Also transcribe the microphone (live mode)
`--record`	Record audio during live mode
`--rec-device ID`	Recording device ID
`--denoise`	Enable background noise reduction (live mode)
`--local-asr`	Force local recognition (ignore the GPU server)
`--restart-server`	Force-restart the GPU server
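A few flags depend on others (`--diarize` needs `--input`, `--num-speakers` needs `--diarize`), and the offline menu offers 2–20 speakers. The bash sketch below mirrors only those documented rules as a pre-flight check for wrapper scripts; `check_flags` is hypothetical, and the program still does its own validation.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check for wrapper scripts. It mirrors only the
# documented rules: --diarize needs --input, --num-speakers needs
# --diarize, and the menu offers 2-20 speakers.
check_flags() {
  local has_input=0 has_diarize=0 speakers=""
  while (( $# > 0 )); do
    case $1 in
      --input)   has_input=1 ;;
      --diarize) has_diarize=1 ;;
      --num-speakers)
        speakers=${2-missing}            # "missing" fails the numeric check
        if (( $# > 1 )); then shift; fi ;;
    esac
    shift
  done
  if (( has_diarize && ! has_input )); then
    echo "error: --diarize requires --input" >&2; return 1
  fi
  if [[ -n $speakers ]]; then
    if (( ! has_diarize )); then
      echo "error: --num-speakers requires --diarize" >&2; return 1
    fi
    if ! [[ $speakers =~ ^[0-9]+$ ]] || (( 10#$speakers < 2 || 10#$speakers > 20 )); then
      echo "error: --num-speakers must be between 2 and 20" >&2; return 1
    fi
  fi
  return 0
}
```

For example, in a wrapper script: `check_flags "$@" && ./start.sh "$@"`.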

Directory layout and architecture

Directory layout

jt-live-whisper/
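  translate_meeting.py     # main program (cross-platform)
  webui.py / webui.html    # WebUI backend + frontend
  subtitle_overlay.py      # floating subtitle window (PyQt6)
  start.sh / start.ps1     # launch scripts
  install.sh / install.ps1 # install scripts
  remote_whisper_server.py # GPU server-side service (optional)
  config.json              # user settings (auto-generated)
  SOP.md / CHANGELOG.md    # manual / changelog
  logs/  recordings/       # log files / temporary audio
  whisper.cpp/  venv/      # live recognition engine / virtual environment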

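Pipeline

# Live mode
System audio (BlackHole / WASAPI)
  → Whisper / Moonshine    # local recognition
    → LLM / NLLB / Argos   # local translation
      → terminal subtitles + log file

# Offline mode
Audio file → ffmpeg → faster-whisper
  → (optional) speaker diarization
    → LLM translation + AI summary

# WebUI (--webui)
webui.py (FastAPI + WebSocket)
  → browser settings page
  → launches translate_meeting.py as a subprocess
  → receives its events over TCP localhost:19780
  → pushes them to the browser over WebSocket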
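Because the WebUI uses two local ports (19781 for the browser, 19780 for events from the subprocess), another process already holding one of them would keep the WebUI from binding it. A minimal bash sketch of a port check, assuming the services listen on 127.0.0.1; `port_in_use` is a hypothetical helper built on bash's `/dev/tcp` redirection.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeeds if something is already listening on
# 127.0.0.1:<port>. Uses bash's built-in /dev/tcp redirection, so no extra
# tools are needed; the connection is opened in a subshell and closed when
# the subshell exits.
port_in_use() {
  (exec 3<>"/dev/tcp/127.0.0.1/$1") 2>/dev/null
}

# Ports used by the WebUI, per the pipeline description.
for port in 19780 19781; do
  if port_in_use "$port"; then
    echo "port $port: in use (an earlier instance may still be running)"
  else
    echo "port $port: free"
  fi
done
```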

Hardware guide

All AI inference runs locally, so your hardware directly determines recognition speed and the overall experience.

macOS

Config	RAM	Use case
Apple (M2 or later)	16 GB	Live transcription and offline processing; GPU-accelerated mlx-whisper; large-v3-turbo recommended
Apple (M2 or later)	24 GB+	Live transcription + local LLM; can run Ollama 14B translation at the same time
Intel	8 GB+	Mainly offline processing; for live use, pair with a GPU server
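To see which row of the table describes your Mac, you can combine `uname -m` with `sysctl -n hw.memsize`. The bash sketch below is a rough, hypothetical helper (`mac_profile`); it cannot distinguish M1 from M2 or later, and a shell running under Rosetta reports `x86_64` even on Apple Silicon.

```bash
#!/usr/bin/env bash
# Hypothetical helper: maps an architecture (`uname -m`) and RAM in bytes
# (`sysctl -n hw.memsize`) to the closest row of the macOS hardware table.
# It cannot tell M1 from M2 or later; both report arm64.
mac_profile() {
  local arch=$1 gb=$(( $2 / 1024 / 1024 / 1024 ))
  case $arch in
    arm64)
      if   (( gb >= 24 )); then echo "live + local LLM (Ollama 14B alongside)"
      elif (( gb >= 16 )); then echo "live + offline (mlx-whisper, large-v3-turbo)"
      else echo "below the 16 GB Apple Silicon row"
      fi ;;
    x86_64)
      if (( gb >= 8 )); then echo "offline-focused; pair a GPU server for live"
      else echo "below the 8 GB Intel row"
      fi ;;
    *) echo "unknown architecture: $arch" >&2; return 1 ;;
  esac
}
```

For example: `mac_profile "$(uname -m)" "$(sysctl -n hw.memsize)"`.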

Windows

Config	Live recognition	Offline, 7-minute clip
CPU only (no discrete GPU)	Barely usable	~15–25 min
GTX 1660 Super (6 GB)	Usable	~1–2 min
RTX 4060 (8 GB)	Smooth	~30–40 s (best value)
RTX 4060 Ti (16 GB)	Smooth	~20–30 s
RTX 3060 (12 GB)	Smooth	~40–50 s

Windows + an NVIDIA GPU is the simplest high-performance setup: no extra hardware or server is needed, large-v3-turbo works right after installation, and both live and offline modes get CUDA acceleration. Recommended minimum: 6 GB of VRAM.
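To check a GPU against the 6 GB minimum, `nvidia-smi` can print each card's total memory in MiB. The bash sketch below (for Git Bash, WSL, or a Linux GPU server) uses a hypothetical helper, `vram_ok`; rounding to the nearest GiB is a judgment call, since drivers may report slightly less than a card's nominal size.

```bash
#!/usr/bin/env bash
# Hypothetical helper: reads per-GPU memory totals in MiB on stdin, one per
# line, as printed by
#   nvidia-smi --query-gpu=memory.total --format=csv,noheader,nounits
# and succeeds if the largest GPU has at least 6 GB, rounded to the
# nearest GiB.
vram_ok() {
  local mib max=0
  while read -r mib || [[ -n $mib ]]; do
    mib=${mib//[[:space:]]/}          # drop stray spaces or CR characters
    [[ $mib =~ ^[0-9]+$ ]] || continue
    if (( mib > max )); then max=$mib; fi
  done
  (( (max + 512) / 1024 >= 6 ))
}
```

For example: `nvidia-smi --query-gpu=memory.total --format=csv,noheader,nounits | vram_ok && echo "meets the 6 GB minimum"`.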

GPU server (optional, faster recognition)

GPU	VRAM	Offline, 7-minute clip	Notes
RTX 4060 or better	8 GB+	~20–30 s	Entry-level consumer
RTX 4090	24 GB	~10–15 s	Consumer flagship
NVIDIA DGX Spark	128 GB	~10 s	Runs the Ollama LLM and Whisper together on one machine

LLM server (optional, translation / summaries)

Use	Recommended model	RAM / VRAM
Translation	14B or larger (e.g. qwen2.5:14b)	~12 GB
Summary	120B or larger (e.g. gpt-oss:120b)	~80 GB

Upgrade

# macOS
./install.sh --upgrade
# Windows (PowerShell)
.\install.ps1 -Upgrade

Downloads the latest version from GitHub automatically. After upgrading, re-run the installer to confirm that all dependencies are in place.