Turn any sound into subtitles you understand, in real time

jt-live-whisper is a 100% on-device AI voice toolkit: live transcription, live translation, batch processing of audio files, speaker diarization, and meeting summaries. Every AI model runs on your own hardware, and no data passes through any cloud service.

Web UI English-Chinese bidirectional chat mode
Live translated subtitles · ZH / JA / EN translation · Speaker diarization · AI meeting summaries · Bidirectional subtitles · Terminal CLI · Web UI · Overlay subtitles · Keyword alerts · macOS · Windows

Your data stays in your hands, safe enough for confidential meetings

Speech recognition, translation, speaker diarization, and summarization all run on AI models on your own hardware. No cloud API key is needed, and no audio or meeting content is uploaded to a third party. This suits internal company meetings, confidential discussions, and any setting where privacy matters.

Why jt-live-whisper

One toolkit that covers everything from live translation to post-meeting summaries, all without leaving your device.

Fully on-device

Recognition, translation, speaker diarization, and summaries all use AI models on your own hardware: no cloud API key, no uploads.

Private & secure

Meeting content and audio stay on your own machine throughout, which suits internal company meetings and confidential discussions.

Zero subscription cost

No paid cloud APIs (ChatGPT, Claude, Gemini, and so on) are required; every AI model used is free and open source.

Any application

Audio is captured at the system audio device level, so in principle the sound output of any software can be processed (Zoom, Teams, Meet, YouTube, podcasts, and so on).

Complete toolkit

Live transcription and translation, offline audio processing, speaker diarization, and AI summaries, all in one tool.

Three-command install

Paste three commands and the install script downloads and builds every AI model and dependency for you. Both macOS and Windows are supported.

Origin: the author once attended a vendor's online technical course taught entirely in English and could follow only bits and pieces of it. To make up for the gap in English listening, the author built this tool for live translation, then kept adding features until it became what you see today. 😄

Core features

Live, offline, speakers, summaries: the whole meeting workflow is covered.

Live transcription & translation

Captures system audio; on-device AI transcribes it and translates it into Traditional Chinese subtitles shown in the terminal, in real time. Works for meetings, videos, and podcasts.

Offline batch processing of audio files

Supports mp3 / wav / m4a / flac. Files are transcribed and translated offline with faster-whisper, which suits producing a transcript after a meeting.

Speaker diarization

Automatically identifies the different speakers in the audio and marks each in a different color. The number of speakers can be auto-detected or set manually.

AI meeting summary

A local LLM produces a key-point summary plus a corrected transcript. Combined with speaker diarization, different speakers are color-coded in the summary.

Interactive timeline transcript

The HTML transcript embeds an audio player and a waveform. Click the waveform to jump to that point in time; the matching passage highlights during playback.

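Bidirectional subtitles + 10 modes

EN↔ZH and JA↔ZH bidirectional: system audio and the microphone are captured at the same time, so the other party's language is translated into Chinese and your Chinese is translated into theirs. There are 10 function modes in total.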

See all features →

Two ways to operate

Whether you are used to the terminal or prefer a graphical interface in the browser, both work: the same features, the same configuration.

Terminal CLI + interactive menu

Run ./start.sh to enter an interactive menu that guides you through every setting step by step; advanced users can launch in a single line with command-line flags.

  • Beginner-friendly menu; audio devices are detected automatically
  • The menu ends by printing the equivalent CLI command, ready to use next time
  • Subtitles appear directly in the terminal, the lightest option

Browser Web UI

./start.sh --webui lets you do all configuration and operation in the browser, with no commands to remember. It covers both live and offline features.

  • Chat / subtitle mode switching, light / dark themes
  • Live progress for each stage (recognition / diarization / correction / summary)
  • Remote viewing (password-protected); works on phones and tablets

Two ways to deploy

One computer is enough; for more speed, add a GPU server. You can switch between the two modes at any time, and the tool falls back automatically.

Single machine

One Mac or Windows PC handles all processing, with no extra hardware.

  • macOS Apple Silicon (M1–M4): mlx-whisper with Metal GPU acceleration, recognition in about 1–3 s
  • Windows + NVIDIA GPU: the installer enables CUDA automatically, recognition in about 0.5–1 s
  • No GPU / Intel Mac: CPU recognition, usable with the small model
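
The three cases in the list above can be read as a small decision rule. The following Python sketch illustrates that rule only; the function name `pick_backend` and the returned labels are hypothetical and are not taken from the jt-live-whisper source, and detecting whether an NVIDIA GPU is present is left to the caller.

```python
import platform


def pick_backend(system: str, machine: str, has_nvidia_gpu: bool) -> str:
    """Return an illustrative backend label for the given hardware.

    `system` and `machine` follow the values of platform.system() and
    platform.machine(); the labels are hypothetical, not the tool's own.
    """
    # Apple Silicon Macs report Darwin / arm64.
    if system == "Darwin" and machine == "arm64":
        return "mlx-whisper (Metal GPU)"
    # Windows with an NVIDIA GPU uses CUDA.
    if system == "Windows" and has_nvidia_gpu:
        return "CUDA"
    # Everything else (no GPU, Intel Mac) stays on the CPU with the small model.
    return "CPU + small model"


if __name__ == "__main__":
    # GPU detection is out of scope here, so it is passed in as False.
    print(pick_backend(platform.system(), platform.machine(), has_nvidia_gpu=False))
```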

Local machine + GPU server

The local machine handles audio capture and the interface; recognition and speaker diarization are handed to a GPU server on the LAN (both the system-audio and microphone streams can be sent to the remote server).

  • Offline recognition is 5–10× faster; live recognition takes about 0.3–0.5 s
  • The server can be a DGX Spark or a Linux host with an NVIDIA GPU (RTX 4090 / 5090 also work)
  • If the server goes offline, processing falls back to the local machine automatically, without interrupting use
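
The automatic fallback in the last bullet can be pictured as a reachability check before each job. The sketch below is one possible way to do it, not the tool's actual mechanism: it assumes the server is reached over plain TCP, and the names `server_reachable` and `choose_mode` are hypothetical.

```python
import socket


def server_reachable(host: str, port: int, timeout: float = 0.5) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out, unresolvable host, and similar failures.
        return False


def choose_mode(host: str, port: int) -> str:
    """Pick 'remote' when the GPU server answers, otherwise fall back to 'local'."""
    return "remote" if server_reachable(host, port) else "local"
```

A short timeout keeps the check from stalling live subtitles when the server is down.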

Screenshots

See it in action

Click any image to enlarge.

Live English-to-Chinese subtitles (macOS)
01

Live translated subtitles

Captures system audio; on-device AI transcribes as it listens and translates into Traditional Chinese. Subtitles appear in the terminal in real time, with translation-speed badges and an audio waveform, so you can keep up with meetings, videos, and podcasts.

Web UI settings page
02

Web UI in your browser

With --webui, all configuration and operation happens in the browser: live subtitles, offline processing, speaker diarization, and summaries. The recognition model is recommended automatically based on your device, progress is shown live for each stage, and phones and tablets work too.

Web UI subtitle mode - bidirectional
03

Bidirectional subtitles

EN↔ZH (en_zh) and JA↔ZH (ja_zh) bidirectional: system audio and the microphone are captured at the same time, so the other party's language is translated into Chinese and your Chinese is translated into theirs. This suits bilingual video calls. Large cinema-style captions on a black background are readable at a glance.

Offline processing menu
04

Offline batch processing

Import a recording to transcribe and translate it offline. The interactive menu walks you through mode, recognition location and model, translation engine, speaker diarization, and summary settings, then prints the equivalent CLI command for next time.

Speaker diarization results
05

Speaker diarization

Automatically identifies the different speakers in the audio and marks each clearly in a different color. The speaker count can be auto-detected or set manually to 2–20. Who said what, and when, is clear at a glance.

AI meeting summary
06

AI meeting summary

Generates summaries from log files in batch: a local LLM produces a key-point digest and a corrected transcript. When speaker diarization is on, different speakers are color-coded in the summary, so the takeaways are ready right after the meeting.

Timeline transcript HTML
07

Interactive timeline transcript

The timeline transcript HTML embeds an audio player and a waveform. Click anywhere on the waveform to jump to that point in time; the matching passage highlights during playback, which makes it easy to read along while listening. SRT / WebVTT subtitle files can be exported as well.
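
For readers unfamiliar with the SRT format mentioned above, this minimal Python sketch shows how timed segments map to SRT text. It illustrates the file format only and is not the tool's exporter; the `(start, end, text)` tuple layout and the function names are assumptions. WebVTT cues look similar but use a period instead of a comma before the milliseconds and begin the file with a `WEBVTT` header line.

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp, HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments) -> str:
    """Render (start_seconds, end_seconds, text) tuples as SRT cues.

    Each cue is a running index, a time range line, and the text;
    cues are separated by a blank line.
    """
    cues = []
    for index, (start, end, text) in enumerate(segments, start=1):
        cues.append(
            f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n"
        )
    return "\n".join(cues)
```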

Real-time keyword alerts
08

Keyword alerts

Set keywords and get an automatic full-screen alert plus a sound the moment live recognition picks them up. Use it to track key meeting topics, or to be reminded during an online course when the instructor says "please try this yourself" or "this will be on the exam". A built-in cooldown prevents repeated notifications.
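
The cooldown idea can be sketched in a few lines of Python. This is an illustration under assumptions, not the tool's implementation: the class name `KeywordAlerter`, the 30-second default, and the per-keyword cooldown are all hypothetical choices.

```python
import time


class KeywordAlerter:
    """Report keyword hits, suppressing repeats of a keyword during its cooldown."""

    def __init__(self, keywords, cooldown_s=30.0, clock=time.monotonic):
        self.keywords = [k.lower() for k in keywords]
        self.cooldown_s = cooldown_s
        self.clock = clock          # injectable so the logic can be tested
        self._last_fired = {}       # keyword -> time of the last alert

    def check(self, text):
        """Return the keywords in `text` that should trigger an alert now."""
        now = self.clock()
        lowered = text.lower()
        hits = []
        for keyword in self.keywords:
            if keyword not in lowered:
                continue
            last = self._last_fired.get(keyword)
            # Fire on the first sighting, or once the cooldown has elapsed.
            if last is None or now - last >= self.cooldown_s:
                self._last_fired[keyword] = now
                hits.append(keyword)
        return hits
```

Each recognized subtitle line would be passed to `check`, and a non-empty result would trigger the alert and sound.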

Overlay subtitles
09

Overlay subtitles

A translucent desktop subtitle overlay window (PyQt6) that can float above any application. Text scales automatically with the window size, the window can be dragged and resized, click-through and fade-in / fade-out animations are supported, and the height switches automatically between monolingual and bilingual display.

Subtitles forwarded to Telegram
10

Subtitle forwarding

Live subtitles are forwarded automatically to messaging platforms: Telegram / Slack / Discord / Teams / LINE / Nextcloud Talk / a generic API. Multiple platforms can be enabled at once, with a custom send interval and content (time / source text / translation).

Install in 3 commands

The install script automatically downloads and configures every on-device AI model and dependency. The first run takes about 10–20 minutes.

macOS: paste into Terminal
mkdir -p ~/Apps/jt-live-whisper && cd ~/Apps/jt-live-whisper
curl -fsSL https://raw.githubusercontent.com/jasoncheng7115/jt-live-whisper/main/install.sh -o install.sh
bash install.sh
Windows: open PowerShell as Administrator
mkdir C:\jt-live-whisper -Force | Out-Null; cd C:\jt-live-whisper
irm https://raw.githubusercontent.com/jasoncheng7115/jt-live-whisper/main/install.ps1 -OutFile install.ps1
powershell -ExecutionPolicy Bypass -File install.ps1
Launch (Web UI recommended)
# macOS
./start.sh --webui
# Windows
.\start.ps1 --webui

Translation works without an LLM: the program can switch to the NLLB (ZH / JA / EN) or Argos (EN→ZH only) offline translation engines, with no extra server required. Note: the summary feature still needs an LLM server (Ollama recommended).

Full install & usage docs →