Extension icon → record
Tap the Whispererr icon in your browser toolbar and recording starts instantly. The icon badge flips to “REC” so you know it’s listening, and you can still use the dot inside inputs if you prefer.
Hold your middle mouse button, speak in your own language, and release. Whispererr listens, transcribes with Whisper, can instantly translate (e.g. German → English), sends it through generative AI, and types the result directly into any input field in your browser.
Currently supports macOS + Chrome/Brave and assumes you know how to plug in your own OpenAI token.
Live demo
This is the experience: middle-click, speak in your own language, let Whispererr translate and drop polished text exactly where you need it. Watch the overlay pop up, the translation toggle flip, and the result appear instantly in the input field.
The video is captured straight from Brave on macOS – no edits, no mockups. This is what you get when you install the extension and the native helper with your own OpenAI key.
Whispererr is a browser extension that gives you voice-driven generative AI anywhere you can type. Middle-click in any text field, speak naturally, and Whispererr converts your speech into structured AI output: emails, tickets, docs, GitHub issues, answers, or even code.
Instead of juggling tabs, copy-paste, or separate prompt windows, you talk once and the text appears right where you need it. Your flow stays in the browser tab you’re in: Jira, Gmail, Notion, GitHub, ChatGPT – anywhere.
You don’t have to think like an AI whisperer to get value out of it either. Whispererr is built so you can just talk like a human, in your own language, and let the system do the prompt engineering, translation, and formatting behind the scenes.
A lightweight native host and the Whispererr browser extension listen for your middle-click in focused text fields. While you hold the button, Whispererr records audio, transcribes it via Whisper, wraps it in a smart prompt, and calls your preferred LLM.
The generated text is written straight into the input field you clicked on.
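To make that pipeline concrete, here is a minimal sketch of the speech-to-text step, assuming the native host hands over the recorded audio as a Blob together with your OpenAI key. The function and parameter names are illustrative, not Whispererr's actual source; only the two audio endpoints and the whisper-1 model name come from OpenAI's public API.

```typescript
// Illustrative sketch, not the actual Whispererr source: send the recorded
// audio to OpenAI and get text back. `translateToEnglish` mirrors the
// language toggle described on this page; the name is made up.
async function speechToText(
  audio: Blob,
  apiKey: string,
  translateToEnglish: boolean,
): Promise<string> {
  const endpoint = translateToEnglish
    ? "https://api.openai.com/v1/audio/translations"    // always returns English
    : "https://api.openai.com/v1/audio/transcriptions"; // keeps the spoken language

  const form = new FormData();
  form.append("file", audio, "speech.webm"); // filename is arbitrary
  form.append("model", "whisper-1");         // OpenAI's hosted Whisper model

  const res = await fetch(endpoint, {
    method: "POST",
    headers: { Authorization: `Bearer ${apiKey}` },
    body: form,
  });
  if (!res.ok) throw new Error(`Whisper request failed: ${res.status}`);

  const data = (await res.json()) as { text: string };
  return data.text; // plain transcript, ready to be wrapped in the prompt
}
```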
Whispererr eliminates the gap between “I know what I want to say” and “it’s written down – perfectly”. You stay in flow, speak your intent, and let AI handle the typing, formatting, and boilerplate.
Whether you’re a developer, founder, support agent or writer, it feels like adding a tiny assistant into your mouse: one that understands your language, your shortcuts, and your favourite tools.
Use the middle mouse button in any browser input field to start voice-to-AI instantly.
Dictate bug reports, commit messages, emails, or even code; Whispererr and your model do the typing.
Leverages Whisper-style speech recognition for fast, accurate transcripts as the base for AI.
Speak in your native language and let Whispererr translate it for you – for example, dictate in German, flip the language switch to EN, and export the final result in English.
Point Whispererr at your preferred AI backend and prompt presets – you stay in control of outputs and cost.
This is the “no shit, that’s cool” feature. You speak in your native language – for example German – and with a simple language switch Whispererr turns your words into fluent English you can drop into emails, tickets, docs, or code comments.
You stay in your language. Whispererr handles the translation and AI polish, then types the English result directly into whatever input field you were using.
Works everywhere
Whispererr hooks into any input field that your browser exposes, so even “exotic” apps like web.telegram get a native-feeling voice-to-AI upgrade. Middle-click, talk in your usual language, and watch the message box fill up without needing bots, slash commands, or weird integrations.
Need to message an English-speaking business partner? Keep speaking in your native language, flick the English output toggle, and Whispererr types a flawless Telegram message for you. No more noisy voice memos or typo-heavy texts—just clean, translated prose ready to send.
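How does the text actually land in those boxes? In broad strokes, a content script writes into whatever element currently has focus. A simplified sketch of that idea (the helper name is ours, not an official extension API):

```typescript
// Simplified sketch of how a content script can type a result into the
// focused field: plain inputs and textareas get their value patched,
// rich editors (contenteditable) get an insertText command.
function insertAtCursor(text: string): void {
  const el = document.activeElement as HTMLElement | null;
  if (!el) return;

  if (el instanceof HTMLInputElement || el instanceof HTMLTextAreaElement) {
    const start = el.selectionStart ?? el.value.length;
    const end = el.selectionEnd ?? start;
    el.value = el.value.slice(0, start) + text + el.value.slice(end);
    el.setSelectionRange(start + text.length, start + text.length);
    // Dispatch an input event so most frameworks notice the change.
    el.dispatchEvent(new Event("input", { bubbles: true }));
  } else if (el.isContentEditable) {
    // Deprecated but still the most compatible way to insert text
    // into rich editors without breaking their internal state.
    document.execCommand("insertText", false, text);
  }
}
```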
Google AI Studio + Whisper
Google AI Studio is fantastic for assembling Gemini-powered app logic, but its built-in voice capture can’t keep up with Whisper-level accuracy. Whispererr fixes that gap: the AI-enablement + AI-integration Brave extension lets you dictate whole batches of requirements, prompts, or copy, then pass the clean transcript straight into Gemini.
The same flow we show with Telegram on this page now lives inside AI Studio and our site: hold the trigger, speak in your native language, and Whispererr streams the best-in-class speech recognition output into Google’s UI so you can assemble generative AI apps without touching the keyboard.
You’re effectively pairing the market leader in speech recognition (Whisper) with the market leader for building generative AI applications (Gemini in Google AI Studio) – a super combo that makes shipping voice-dictated AI products feel instant.
Why Whisper?
Whisper is the foundation that makes Whispererr possible. It’s OpenAI’s open speech-recognition model trained on hundreds of thousands of hours of multilingual audio. Instead of brittle command-style systems, it uses a massive transformer to understand natural speech in many languages, detect accents, and auto-translate when you need it.
Long story short: you benefit from world-class research without building your own speech stack – we just point Whispererr at Whisper and everything downstream gets higher quality input.
Encrypted HTTPS to OpenAI
Multilingual transformer brain
Context-aware translation
Runs in the same infra that powers ChatGPT voice.
Whispererr leans on OpenAI’s Whisper stack because it’s built like serious infrastructure, not a toy dictation widget. Audio is streamed over HTTPS straight to OpenAI’s encrypted endpoints, processed inside hardened data centers, and returned as text over the same secure channel. Nothing is stored inside the extension – once the transcription lands, the audio buffer is discarded.
Whisper itself is trained on massive multilingual corpora, so your translation isn’t a literal word swap: the model understands context, tone, and domain-specific phrasing before handing back the text that flows into your LLM.
The result: enterprise-grade voice capture with accurate translation that you can trust inside regulated workflows.
Whispererr comes with a couple of professional “in-the-weeds” controls. Click the toolbar icon to trigger recording immediately, or use the floating dot in each input. Prefer shortcuts? Map any key combo – even your middle mouse button with tools like Keyboard Maestro or BetterTouchTool – and fire dictation without touching the UI.
Assign a global shortcut right in Whispererr or bridge your middle mouse button via Keyboard Maestro / BetterTouchTool / custom scripts. Hold that trigger, talk, and Whispererr handles the rest.
Whispererr is built for people who think faster than they can type. It keeps you in the tools you already use, but removes the slow part – turning half-formed ideas into clean, structured text.
Dictate commit messages, PR descriptions, GitHub issues, or “explain this code” prompts while your hands stay on the keyboard. Whispererr can translate from your native language, tighten up the wording, and post it straight into your dev tools.
Talk through what the customer is struggling with and let Whispererr draft a clear, empathetic reply in English. Use your own language to think and reason, then send polished responses that match your team’s tone of voice.
Brain-dump product ideas, meeting notes or rough outlines into any text field. Whispererr turns the ramble into structured specs, summaries, or first-draft copy that you can tweak instead of writing from scratch.
Whispererr isn’t a generic dictation toy. It’s for people who already live in their terminal, IDE, or browser dev tools, and want to move even faster: software engineers, cloud/SRE/DevOps folks, systems engineers, power users who tweak everything, and founders who ship their own product.
If you’re comfortable juggling tabs, APIs, and automation, Whispererr fits right in. If you’d rather have IT set it up for you, this probably isn’t your tool (yet).
It also assumes you can control your environment. Whispererr is engineered for focused makers working from a studio, home office, or any setup where talking to your computer is natural — not for open-plan floors where seventeen colleagues share the same desk pod. If you thrive in a quiet space, work remotely, and invest in good microphones and fast workflows, you’re exactly who we built it for.
In short: if you build or operate software and know how to grab an API token, Whispererr is for you.
Under the hood, Whispererr is a small set of sharp, opinionated features designed to kill busywork: speak once, get clean output, and keep your hands free for the parts that actually need your brain.
Hold the middle mouse button in any input field to start recording instantly. No extra windows, widgets, or overlays – just click, talk, release.
Speak in your native language – for example German – and let Whispererr translate and polish the result into clear English that’s ready to send.
Switch between “native language in, same language out” and German → English translation from the floating menu so you always dictate in whatever language feels natural.
Jira, Gmail, Notion, GitHub, helpdesk, your internal admin UI – if it has a text box in the browser, Whispererr can type into it.
A lightweight native host handles audio locally, sends just what’s needed to your models, and keeps the whole flow feeling instant.
Use your own OpenAI API key from the macOS Keychain and wire Whispererr into the models and prompts that fit your workflow and budget.
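As a rough illustration of the Keychain part, a native host on macOS could read the token with the built-in `security` command. The sketch below assumes the key is stored as a generic password under the service name o-api (the name used in the original prompt later on this page); the helper name itself is made up:

```typescript
// Sketch of a Node-based native host reading the OpenAI token from the
// macOS Keychain. Assumes a generic-password item with service name "o-api".
import { execFileSync } from "node:child_process";

function readOpenAiToken(): string {
  // `security find-generic-password -s <service> -w` prints only the secret.
  return execFileSync(
    "security",
    ["find-generic-password", "-s", "o-api", "-w"],
    { encoding: "utf8" },
  ).trim();
}
```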
On the web you see this walkie-talkie feature from chatgpt.com every day: press the mic, talk, and a few seconds later perfectly readable text appears in the window. Not the usual clunky word-by-word transcription the other tools spit out, but true speech recognition — fast, clean, stable.
There’s just one catch: you only get that quality on chatgpt.com.
Everywhere else online — comments, forms, support tickets, email, admin interfaces — you’re stuck typing the old-fashioned way. So it was obvious: we needed a way to bring this high-end speech recognition to any input field across the entire internet. One button → mic opens → you talk → the text lands right where your cursor is. Done.
Sure, there was a Chrome extension that tried to do something similar, but of course it was 19 €/month, bloated UI, endless options. When the trial ended, the UI just faded to black. So we said: If the world won’t hand us a good tool, we’ll build it ourselves.
Not the old-school way either — we’re doing it 2025 style.
The software? Whispererr – VoiceToEverywhere.
The workflow starts inside Visual Studio Code with an extension called “Whisper Assistant” (not our Whispererr system, but the stock extension). You hit record, speak freely in German, and the extension outputs clean English text:
Let’s say I don’t have a Brave Browser extension and imagine that you go to a field where you can enter something. And then the extension recognizes that you are in a field and then a small dot appears and you can click on the dot. And with the small dot, the dictation starts immediately, where you can dictate in OpenAI Whisper. When you are done dictating, you click on the dot again and then the whole thing is automatically inserted where your cursor is.
use the macOS Keychain for the OpenAI token: o-api
design proposal for the UI is attached.
(Original first prompt, btw! plus two UI concept screenshots.)
This is where the first piece of magic happens:
No structure. No analysis. No decomposition into components.
Just your words → dropped as a prompt in the editor. That’s the starting point.
Then the machine that does the heavy lifting takes over: the Codex 5.1 extension inside VS Code. Codex reads the Whisper prompt and handles the entire technical implementation. Not “a little assist” — we’re talking full ownership.
Please also explore the rest of this site: we walk through how generative AI doesn’t just write code, it orchestrates the entire file and folder structure step by step.
Codex writes line after line like a developer deep in the tunnel.
The extension ships fully generated — without you typing a single character. After roughly 90 minutes (Brave made loading the extension a little… stubborn), the first version already did exactly what that first prompt described.
Voilà — Whispererr – VoiceToEverywhere was born.
The web presence was dictated too — first in German, Codex answered in English. The machine generated the full microsite: static files, layout, styles, text, JS interactions, and finally a push.sh script for FTP uploads.
I created a folder named “web”; inside that folder we still need a small website, basically a product page that explains what the software does, how it works, what the advantages are — around 500 words. Please create an extra CSS file and an extra JavaScript file. Also make another folder called “images”; I’ll drop the assets in myself. Just start — language should be English.
please create a push.sh script for FTP upload with the following data: ...
Codex delivered all of it: the page itself, the extra CSS and JavaScript files, the images folder, and the push.sh deploy script.
In other words: every single piece you need for website deployment — generated.
Of course there are steps AI can’t automate because they require access to real systems (our site lives on a traditional hoster, not a hyperscaler API platform). That’s classic infrastructure work. A human has to click through those systems because AI can’t log into your hoster (again: hoster, not an API-first hyperscaler). Once that’s done, the machine takes back over — push.sh handles deployment like we’re on a full CI/CD stack.
A senior engineer plans, reviews, and signs off. The machine does the programming.
You define the requirement: what should happen, how should it behave, what should the UX feel like?
Whisper → turns speech into text. Codex → builds code, architecture, files, events, deployment. All generated, all end-to-end. No typing. No fiddling in an editor.
That’s not “AI integration”. That’s AI as the full software producer — you’re the architect, the machine is the crew on site. That’s exactly what 2025 calls an “AI application par excellence”. 🚀🔥
Whispererr is built out of a simple frustration: we think in paragraphs, but keyboards make us type in slow motion. With Whispererr, “coding”, “writing” and “explaining” become something you say – not something you grind out character by character.
Whispererr itself is just the tool. The only ongoing cost comes from your OpenAI usage – mainly the Whisper transcription API and the language model you plug in. The punchline: voice feels like a superpower, but the bill usually looks like loose change.
As of today, the OpenAI Whisper API is billed per minute of audio (a few tenths of a euro per hour), and the text model is billed per 1,000 tokens – a few cents for hundreds of messages. Light or heavy use, you only pay for what you actually dictate.
You add your own API key in the macOS Keychain, so you keep full control. If pricing changes, Whispererr automatically follows whatever your OpenAI account charges.
Say you go hard and dictate around 2 hours every workday – that’s roughly 50 hours per month. With current Whisper pricing (about 0.006 USD per minute), the math looks like:
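50 hours × 60 minutes × 0.006 USD per minute ≈ 18 USD a month for the Whisper transcription alone, plus a few euros of text-model tokens on top, depending on which model and prompts you plug in.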
In practice that’s in the ballpark of 15–25 € per month for intensive daily usage. Dictate less? Pay less.
If you’re paid even 30 € per hour, saving just 10 minutes per day easily covers the monthly API bill. Most people gain much more – especially if they spend hours writing tickets, emails, or documentation.
So you swap “hunting for letters on a keyboard” for talking in your own language, and the cost difference is tiny compared to the time you get back.
Stop pecking at a keyboard and start talking. Whispererr turns minutes of typing into seconds of speaking, so you can get back to building, shipping, and thinking instead of hunting for the right keys.
Draft the same email, ticket, or spec in a third of the time by speaking it once instead of hammering it out by hand.
Replace daily “writing chores” – status updates, summaries, replies – with a few minutes of voice and instant AI output.
Speak in your own language, get polished English in the same input field, and never bounce between tools or tabs.