Fixing ChatGPT Voice Mode in Manaby: The macOS Microphone Challenge
How we solved the microphone permission issues preventing ChatGPT Voice Mode from working in Electron-based browsers on macOS.
Table of Contents
- The Problem
- Why Desktop Apps Struggle
- The macOS Audio Device Maze
- How ChatGPT’s Desktop App Has the Same Issue
- Our Solution
- Technical Deep Dive
- The Result
The Problem
ChatGPT’s Voice Mode is one of its most impressive features. You can have a real-time conversation with the AI, and it responds naturally with voice. But when I tried using it in Manaby, nothing happened. The microphone button would appear, but clicking it did nothing. No error, no feedback—just silence.
This is the kind of bug that drives developers crazy. The feature worked perfectly in Safari and Chrome. Why not in an Electron-based browser?
Why Desktop Apps Struggle
Web browsers have been dealing with microphone permissions for years. They have mature, battle-tested permission systems. But Electron apps are different. We’re running Chromium inside a native shell, and that creates complications:
- System-level permissions - macOS requires the app to declare microphone use up front: an NSMicrophoneUsageDescription string in Info.plist and, under the hardened runtime, the audio-input entitlement
- User consent dialogs - The system needs to prompt the user before granting access
- Device enumeration - The app needs to correctly identify which audio devices are available
- Default device selection - When a user has multiple microphones, which one do we use?
Each of these can fail silently, leaving you with a microphone that just doesn’t work.
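One way to make the first two failure points visible rather than silent is to ask macOS what it currently thinks of the app’s microphone access. The sketch below is a minimal illustration under stated assumptions, not necessarily what ships in Manaby: `ensureMicAccess` is a hypothetical helper, and it takes Electron’s `systemPreferences` module as a parameter so it can be exercised outside Electron. The two calls it relies on, `getMediaAccessStatus` and `askForMediaAccess`, are part of Electron’s `systemPreferences` API.

```javascript
// Hypothetical helper: reports the microphone access status instead of failing silently.
// `systemPreferences` is Electron's module of the same name, passed in by the caller.
async function ensureMicAccess(systemPreferences, platform = process.platform) {
  // askForMediaAccess is macOS-only, so skip the check on other platforms.
  if (platform !== 'darwin') return { granted: true, status: 'not-applicable' };

  const status = systemPreferences.getMediaAccessStatus('microphone');
  if (status === 'granted') return { granted: true, status };

  if (status === 'not-determined') {
    // Triggers the system consent dialog and resolves to true or false.
    const granted = await systemPreferences.askForMediaAccess('microphone');
    return { granted, status: granted ? 'granted' : 'denied' };
  }

  // Any other status (for example 'denied' or 'restricted'): macOS will not
  // prompt again, so the user has to change the setting in System Settings.
  console.warn(`Microphone access is ${status}`);
  return { granted: false, status };
}
```

Called from the main process before the page asks for audio, something like `await ensureMicAccess(require('electron').systemPreferences)` would turn a silent failure into a status you can log or show to the user.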
The macOS Audio Device Maze
macOS has one of the most complex audio systems of any operating system. Users can have:
- Built-in MacBook microphones
- AirPods with microphones
- External USB microphones
- Bluetooth headsets
- Virtual audio devices (for screen recording, streaming, etc.)
The system maintains a “default” device, but this can change dynamically. Plug in headphones? The default changes. Connect AirPods? It changes again. Remove them? Back to the built-in mic.
For web apps in a normal browser, this is handled automatically. But for Electron apps, we need to manage this ourselves.
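In practice, a large part of managing this ourselves is reacting to the standard `devicechange` event and re-reading the device list each time it fires. Here is a minimal sketch of that idea; `watchAudioInputs` is a hypothetical helper name, and the `mediaDevices` parameter stands in for `navigator.mediaDevices` so the function can be tried with a fake object.

```javascript
// Hypothetical helper: re-reads the audio inputs whenever a device is added or removed.
// `mediaDevices` is normally `navigator.mediaDevices` in the renderer.
function watchAudioInputs(mediaDevices, onChange) {
  const refresh = async () => {
    try {
      const devices = await mediaDevices.enumerateDevices();
      onChange(devices.filter((d) => d.kind === 'audioinput'));
    } catch (err) {
      console.error('Could not enumerate audio devices:', err);
    }
  };

  // Fired when a device is plugged in, unplugged, or paired.
  mediaDevices.addEventListener('devicechange', refresh);
  refresh(); // Report the initial state as well.

  // Return a cleanup function so callers can stop watching.
  return () => mediaDevices.removeEventListener('devicechange', refresh);
}
```

The callback receives the fresh list of inputs every time, so any previously selected device can be re-validated against it rather than trusted blindly.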
How ChatGPT’s Desktop App Has the Same Issue
Here’s the interesting part: ChatGPT’s official desktop app for macOS has the exact same problem.
When you try to use Voice Mode in the ChatGPT desktop app, you can’t select which microphone to use. It tries to use the system default, but the selection doesn’t work correctly. Many users have reported this issue—Voice Mode simply doesn’t work in the desktop app.
This validated what we were seeing. The problem wasn’t unique to Manaby. It was a fundamental challenge with how Electron apps (and similar desktop frameworks) handle audio device permissions on macOS.
Our Solution
We implemented a multi-layer fix:
1. Cross-Origin Isolation
ChatGPT’s Voice Mode uses advanced web features that require cross-origin isolation. We added proper headers and permission handling:
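// Handle cross-origin isolation for advanced features.
// Assumes the default session; register after the app's 'ready' event.
const { session } = require('electron');

session.defaultSession.webRequest.onHeadersReceived((details, callback) => {
  const headers = details.responseHeaders || {};
  // Match on the parsed hostname so unrelated URLs that merely contain
  // these strings are not affected.
  const { hostname } = new URL(details.url);
  if (hostname === 'chat.openai.com' || hostname === 'chatgpt.com') {
    headers['Cross-Origin-Opener-Policy'] = ['same-origin'];
    headers['Cross-Origin-Embedder-Policy'] = ['require-corp'];
  }
  callback({ responseHeaders: headers });
});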
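One detail worth guarding against in the handler above: header names are case-insensitive, so if the server already sent a differently cased copy of either header, assigning a new key can leave two competing values in the response. The following is a sketch of one way to avoid that, with `withIsolationHeaders` as a hypothetical helper name; it only assumes that `responseHeaders` is a plain object mapping header names to arrays of values.

```javascript
// The two headers that together enable cross-origin isolation.
const ISOLATION_HEADERS = new Map([
  ['cross-origin-opener-policy', 'same-origin'],
  ['cross-origin-embedder-policy', 'require-corp'],
]);

// Hypothetical helper: returns a copy of the response headers with both isolation
// headers set, replacing any existing copies regardless of how their names are cased.
function withIsolationHeaders(responseHeaders = {}) {
  const merged = {};
  for (const [name, value] of Object.entries(responseHeaders)) {
    // Drop whatever the server sent for these two headers.
    if (!ISOLATION_HEADERS.has(name.toLowerCase())) merged[name] = value;
  }
  for (const [name, value] of ISOLATION_HEADERS) {
    merged[name] = [value]; // Keep the same array shape as the incoming values.
  }
  return merged;
}
```

Inside the handler this would be used as `callback({ responseHeaders: withIsolationHeaders(details.responseHeaders) })`. To confirm the headers took effect, check `self.crossOriginIsolated` in the page’s DevTools console.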
2. Enhanced Permission Handling
We improved how Manaby requests and manages microphone permissions:
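// Request permission for media devices.
// Assumes the default session; register after the app's 'ready' event.
const { session } = require('electron');

session.defaultSession.setPermissionRequestHandler((webContents, permission, callback) => {
  // Electron reports both microphone and camera requests as 'media'.
  const allowedPermissions = ['media', 'microphone', 'camera'];
  if (allowedPermissions.includes(permission)) {
    // Grant at the Electron level; macOS still applies its own system-level consent.
    callback(true);
  } else {
    callback(false);
  }
});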
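The handler above grants every media request from any page. A stricter variant could also look at who is asking and what they are asking for, using the optional fourth `details` argument that Electron passes to the handler. The sketch below is one possible policy, not Manaby’s actual one: `shouldGrant` and `ALLOWED_HOSTS` are hypothetical names, and the audio-only rule is an assumption about what Voice Mode needs.

```javascript
// Assumption: only the two ChatGPT domains mentioned in this post may use the microphone.
const ALLOWED_HOSTS = new Set(['chatgpt.com', 'chat.openai.com']);

// Hypothetical policy check for use inside setPermissionRequestHandler.
function shouldGrant(permission, details = {}) {
  // Electron reports microphone and camera requests as 'media'.
  if (permission !== 'media') return false;

  let host;
  try {
    host = new URL(details.requestingUrl).hostname;
  } catch {
    return false; // Missing or malformed URL: deny.
  }
  if (!ALLOWED_HOSTS.has(host)) return false;

  // Assumed policy: allow audio-only requests and refuse anything involving video.
  const types = details.mediaTypes || [];
  return types.includes('audio') && !types.includes('video');
}
```

It would be wired in as `setPermissionRequestHandler((wc, permission, callback, details) => callback(shouldGrant(permission, details)))`, which keeps the policy itself a plain function that is easy to test.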
3. Device Enumeration Fix
The key fix was in how we enumerate and select audio devices:
// Properly enumerate media devices (renderer code, inside an async function)
const devices = await navigator.mediaDevices.enumerateDevices();
const audioInputs = devices.filter(d => d.kind === 'audioinput');
// Find the default device or fall back to the first available
const defaultDevice = audioInputs.find(d => d.deviceId === 'default')
  || audioInputs[0];
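The same selection logic can be pulled into a small pure function, which also makes the edge cases explicit: no microphone at all, and a remembered device that has since disappeared. This is a sketch under assumptions; `pickAudioInput` and its `preferredId` parameter are hypothetical names, and the input is whatever `enumerateDevices()` returned.

```javascript
// Hypothetical helper: chooses an audio input from enumerateDevices() output.
// `preferredId` is an ID remembered from earlier and may no longer exist.
function pickAudioInput(devices, preferredId = null) {
  const inputs = devices.filter((d) => d.kind === 'audioinput');
  if (inputs.length === 0) return null; // No microphone available.

  return (
    // 1. The remembered device, if it is still present.
    inputs.find((d) => preferredId && d.deviceId === preferredId) ||
    // 2. The entry Chromium exposes for the current system default.
    inputs.find((d) => d.deviceId === 'default') ||
    // 3. Otherwise whatever comes first.
    inputs[0]
  );
}
```

Keep in mind that Chromium typically withholds real device IDs and labels until microphone permission has been granted, so enumerating too early can produce a list that looks valid but has nothing usable to match against.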
4. macOS Entitlements
We ensured the app’s entitlements properly declare microphone access:
<key>com.apple.security.device.audio-input</key>
<true/>
Technical Deep Dive
The root cause was a combination of issues:
- WebRTC initialization - ChatGPT’s Voice Mode uses WebRTC for real-time audio. Our WebRTC initialization wasn’t properly handling the permission flow.
- Device ID caching - macOS device IDs can change between sessions. We were caching IDs that became invalid.
- Permission timing - The permission request needs to happen at the right moment. Too early, and the system dialog gets dismissed. Too late, and the audio stream fails to initialize.
- getUserMedia constraints - We needed to be more flexible with audio constraints, allowing the system to choose the best device rather than forcing a specific one.
The fix involved rewriting portions of our permission service to handle these edge cases properly.
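To illustrate the device ID caching and getUserMedia constraints points together, here is a rough sketch of a fallback strategy: try the remembered device if there is one, and if it has gone stale, drop the device constraint and let the system choose. `openMicrophone` is a hypothetical helper rather than code from our permission service, and `mediaDevices` stands in for `navigator.mediaDevices`.

```javascript
// Hypothetical helper: tries a remembered device first, then lets the system choose.
// `mediaDevices` is normally `navigator.mediaDevices` in the renderer.
async function openMicrophone(mediaDevices, cachedId = null) {
  if (cachedId) {
    try {
      // `exact` makes the request reject if the cached ID no longer exists.
      return await mediaDevices.getUserMedia({
        audio: { deviceId: { exact: cachedId } },
      });
    } catch (err) {
      // Stale ID or unplugged device: fall through to the default below.
      // Anything else (such as a permission denial) is a real error.
      const recoverable = ['OverconstrainedError', 'NotFoundError'];
      if (!recoverable.includes(err.name)) throw err;
    }
  }
  // No device constraint: the browser picks the current default input.
  return mediaDevices.getUserMedia({ audio: true });
}
```

The important design choice is that only "device not found" style errors trigger the fallback. A `NotAllowedError` still propagates, so a permission problem is not masked by a retry.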
The Result
Voice Mode now works reliably in Manaby. Click the microphone button in ChatGPT, and you can have a natural voice conversation—just like in the regular browser.
What makes this particularly satisfying is that we solved a problem that even OpenAI’s own desktop app struggles with. When you use ChatGPT Voice Mode in Manaby, you’re getting a better experience than the official desktop app provides.
This is one of those bugs where the fix seems simple in retrospect, but finding the actual cause required hours of debugging across multiple layers of the stack. From JavaScript to Electron, from Electron to macOS, and from macOS back to the audio hardware.
If you’ve been frustrated by Voice Mode not working in desktop apps, give Manaby a try. We’ve done the hard work so you don’t have to.
Download Manaby 0.0.32 and try Voice Mode yourself.