Fixing ChatGPT Voice Mode in Manaby: The macOS Microphone Challenge

The Problem

ChatGPT’s Voice Mode is one of its most impressive features. You can have a real-time conversation with the AI, and it responds naturally with voice. But when I tried using it in Manaby, nothing happened. The microphone button would appear, but clicking it did nothing. No error, no feedback—just silence.

This is the kind of bug that drives developers crazy. The feature worked perfectly in Safari and Chrome. Why not in an Electron-based browser?

Why Desktop Apps Struggle

Web browsers have been dealing with microphone permissions for years. They have mature, battle-tested permission systems. But Electron apps are different. We’re running Chromium inside a native shell, and that creates complications:

System-level permissions - macOS requires the app to declare microphone use up front: an NSMicrophoneUsageDescription string in Info.plist and, for hardened-runtime builds, the audio-input entitlement
User consent dialogs - The system needs to prompt the user before granting access
Device enumeration - The app needs to correctly identify which audio devices are available
Default device selection - When a user has multiple microphones, which one do we use?

Each of these can fail silently, leaving you with a microphone that just doesn’t work.
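One way to make the first two failure points visible rather than silent is to check the OS-level microphone status before Voice Mode starts. The following is a minimal sketch of that idea, not Manaby's actual implementation. `ensureMicrophoneAccess` is a hypothetical helper; it receives Electron's `systemPreferences` module as a parameter so it can be exercised without Electron. `getMediaAccessStatus` and `askForMediaAccess` are real Electron calls, and `askForMediaAccess` is macOS-only.

```javascript
// Hypothetical helper: resolve to true only when the OS has granted mic access.
// `prefs` is expected to be Electron's `systemPreferences` module (main process).
async function ensureMicrophoneAccess(prefs, platform = process.platform) {
  // askForMediaAccess only exists on macOS. Assumption for this sketch:
  // other platforms are treated as having no OS-level prompt to trigger.
  if (platform !== 'darwin') return true;

  const status = prefs.getMediaAccessStatus('microphone');
  if (status === 'granted') return true;

  // 'not-determined' means the user has never been asked: show the system prompt.
  if (status === 'not-determined') {
    return prefs.askForMediaAccess('microphone');
  }

  // 'denied' or 'restricted': macOS will not prompt again, so the app should
  // tell the user to change the setting in the system privacy preferences.
  return false;
}
```

In the main process this would be called as `await ensureMicrophoneAccess(require('electron').systemPreferences)`, and a `false` result is the point at which to show the user an explanation instead of failing quietly.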

The macOS Audio Device Maze

macOS lets many audio inputs coexist, and the active one can change at any moment. Users can have:

Built-in MacBook microphones
AirPods with microphones
External USB microphones
Bluetooth headsets
Virtual audio devices (for screen recording, streaming, etc.)

The system maintains a “default” device, but this can change dynamically. Plug in headphones? The default changes. Connect AirPods? It changes again. Remove them? Back to the built-in mic.

For web apps in a normal browser, this is handled automatically. But for Electron apps, we need to manage this ourselves.
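To illustrate what "managing this ourselves" can look like, here is a small sketch that re-reads the input list whenever the system reports a device change. `watchAudioInputs` is a hypothetical name, not something from Manaby's code; `enumerateDevices` and the `devicechange` event are standard `navigator.mediaDevices` APIs. The media-devices object is passed in as a parameter so the function can be tried outside a browser.

```javascript
// Hypothetical helper: keep a caller informed about the current audio inputs.
// `mediaDevices` is expected to be `navigator.mediaDevices` (renderer process).
function watchAudioInputs(mediaDevices, onChange) {
  const refresh = async () => {
    const devices = await mediaDevices.enumerateDevices();
    onChange(devices.filter((d) => d.kind === 'audioinput'));
  };

  // 'devicechange' fires when an input or output is added or removed,
  // for example when AirPods connect or a USB microphone is unplugged.
  mediaDevices.addEventListener('devicechange', refresh);

  // Report the current list once up front.
  refresh();

  // Return a function that stops watching.
  return () => mediaDevices.removeEventListener('devicechange', refresh);
}
```

A caller would typically use the callback to drop any remembered device that no longer appears in the list.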

How ChatGPT’s Desktop App Has the Same Issue

Here’s the interesting part: ChatGPT’s official desktop app for macOS has the exact same problem.

When you try to use Voice Mode in the ChatGPT desktop app, you can’t select which microphone to use. It tries to use the system default, but the selection doesn’t work correctly. Many users have reported this issue—Voice Mode simply doesn’t work in the desktop app.

This validated what we were seeing. The problem wasn’t unique to Manaby. It was a fundamental challenge with how Electron apps (and similar desktop frameworks) handle audio device permissions on macOS.

Our Solution

We implemented a multi-layer fix:

1. Cross-Origin Isolation

ChatGPT’s Voice Mode relies on web features that are only available to cross-origin isolated pages, so we inject the two response headers that opt a page into isolation:

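// Handle cross-origin isolation for advanced features (main process, after app is ready)
const { session } = require('electron');

const ISOLATED_HOSTS = ['chat.openai.com', 'chatgpt.com'];

// webRequest lives on a Session instance, not on the `session` module itself
session.defaultSession.webRequest.onHeadersReceived((details, callback) => {
  const headers = details.responseHeaders || {};
  const { hostname } = new URL(details.url);

  // Match the hostname rather than the whole URL, so a query string that
  // merely contains "chatgpt.com" does not qualify
  if (ISOLATED_HOSTS.some(h => hostname === h || hostname.endsWith('.' + h))) {
    headers['Cross-Origin-Opener-Policy'] = ['same-origin'];
    headers['Cross-Origin-Embedder-Policy'] = ['require-corp'];
  }

  callback({ responseHeaders: headers });
});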
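Keeping the header logic in a pure function makes it easy to test outside Electron. The sketch below is one possible shape for that, with hypothetical names (`isIsolatedHost`, `withIsolationHeaders`). It matches on the hostname and replaces any existing COOP/COEP header regardless of letter case, since response header names can arrive in lowercase and a second, differently cased key would leave two conflicting values.

```javascript
// Hosts to isolate, taken from the handler above.
const ISOLATED_HOST_LIST = ['chat.openai.com', 'chatgpt.com'];

// True for a listed host or any of its subdomains.
function isIsolatedHost(hostname) {
  return ISOLATED_HOST_LIST.some((h) => hostname === h || hostname.endsWith('.' + h));
}

// Hypothetical helper: return headers with COOP/COEP set for matching URLs.
// Non-matching or unparseable URLs get the original object back unchanged.
function withIsolationHeaders(url, headers = {}) {
  let hostname;
  try {
    hostname = new URL(url).hostname;
  } catch {
    return headers;
  }
  if (!isIsolatedHost(hostname)) return headers;

  const overrides = {
    'Cross-Origin-Opener-Policy': ['same-origin'],
    'Cross-Origin-Embedder-Policy': ['require-corp'],
  };
  // Drop existing copies of these headers, whatever their case.
  const drop = new Set(Object.keys(overrides).map((k) => k.toLowerCase()));
  const kept = Object.fromEntries(
    Object.entries(headers).filter(([name]) => !drop.has(name.toLowerCase()))
  );
  return { ...kept, ...overrides };
}
```

Inside `onHeadersReceived`, the callback would then receive `{ responseHeaders: withIsolationHeaders(details.url, details.responseHeaders) }`. One caveat worth knowing: `require-corp` blocks cross-origin subresources that do not opt in, so it is worth checking that the page still loads fully after enabling it.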

2. Enhanced Permission Handling

We improved how Manaby requests and manages microphone permissions:

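// Request permission for media devices (main process, after app is ready)
const { session } = require('electron');

session.defaultSession.setPermissionRequestHandler((webContents, permission, callback) => {
  // Electron reports both microphone and camera requests as 'media'
  const allowedPermissions = ['media'];

  if (allowedPermissions.includes(permission)) {
    // This grants at the Electron level only; macOS applies its own
    // microphone consent separately
    callback(true);
  } else {
    callback(false);
  }
});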
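A handler like this grants media access to any page that asks. A tighter variant, sketched below under stated assumptions, decides per request using the `details` object Electron passes as the handler's fourth argument. The origin allowlist and the name `shouldGrantMedia` are hypothetical, and the audio-only rule is a design choice for this example rather than something the original handler does.

```javascript
// Origins allowed to use the microphone (assumed list, for illustration only).
const MIC_ORIGINS = new Set(['https://chatgpt.com', 'https://chat.openai.com']);

// Hypothetical helper: decide whether a permission request should be granted.
// `details` mirrors the fourth argument of Electron's permission request handler.
function shouldGrantMedia(permission, details = {}) {
  if (permission !== 'media') return false;

  // Audio only: refuse requests that also ask for video.
  const types = details.mediaTypes || [];
  if (!types.includes('audio') || types.includes('video')) return false;

  try {
    return MIC_ORIGINS.has(new URL(details.requestingUrl).origin);
  } catch {
    return false; // missing or malformed requesting URL
  }
}

// Wiring it up in the main process would look like:
// session.defaultSession.setPermissionRequestHandler(
//   (webContents, permission, callback, details) =>
//     callback(shouldGrantMedia(permission, details))
// );
```

Keeping the decision in a plain function means the policy can be unit-tested without launching Electron.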

3. Device Enumeration Fix

The key fix was in how we enumerate and select audio devices:

// Properly enumerate media devices
const devices = await navigator.mediaDevices.enumerateDevices();
const audioInputs = devices.filter(d => d.kind === 'audioinput');

// Find the default device or fall back to the first available
const defaultDevice = audioInputs.find(d => d.deviceId === 'default')
  || audioInputs[0];

4. macOS Entitlements

We ensured the app’s entitlements properly declare microphone access:

<key>com.apple.security.device.audio-input</key>
<true/>

Technical Deep Dive

The root cause was a combination of issues:

WebRTC initialization - ChatGPT’s Voice Mode uses WebRTC for real-time audio. Our WebRTC initialization wasn’t properly handling the permission flow.
Device ID caching - macOS device IDs can change between sessions. We were caching IDs that became invalid.
Permission timing - The permission request needs to happen at the right moment. Too early, and the system dialog gets dismissed. Too late, and the audio stream fails to initialize.
getUserMedia constraints - We needed to be more flexible with audio constraints, allowing the system to choose the best device rather than forcing a specific one.

The fix involved rewriting portions of our permission service to handle these edge cases properly.
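The device ID caching and constraint points can be combined into one pattern: try the remembered device, and if its ID has gone stale, fall back to letting the system choose. The sketch below shows that pattern only; it is not the permission service itself, and `openMicrophone` is a hypothetical name. `getUserMedia` and the `deviceId: { exact }` constraint are standard web APIs.

```javascript
// Hypothetical helper: try the remembered device first, then let the system
// choose. `mediaDevices` is expected to be `navigator.mediaDevices`.
async function openMicrophone(mediaDevices, cachedDeviceId) {
  if (cachedDeviceId) {
    try {
      return await mediaDevices.getUserMedia({
        audio: { deviceId: { exact: cachedDeviceId } },
      });
    } catch (err) {
      // A stale ID shows up as OverconstrainedError (or NotFoundError).
      // Anything else, such as NotAllowedError, is a real failure.
      const stale =
        err.name === 'OverconstrainedError' || err.name === 'NotFoundError';
      if (!stale) throw err;
    }
  }
  // No usable cached ID: accept whichever input the system treats as default.
  return mediaDevices.getUserMedia({ audio: true });
}
```

Rethrowing permission errors matters here: retrying with looser constraints after the user has declined would only hide the real cause.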

The Result

Voice Mode now works reliably in Manaby. Click the microphone button in ChatGPT, and you can have a natural voice conversation—just like in the regular browser.

What makes this particularly satisfying is that we solved a problem that even OpenAI’s own desktop app struggles with. When you use ChatGPT Voice Mode in Manaby, you’re getting a better experience than the official desktop app provides.

This is one of those bugs where the fix seems simple in retrospect, but finding the actual cause required hours of debugging across multiple layers of the stack: from JavaScript to Electron, from Electron to macOS, and from macOS back to the audio hardware.

If you’ve been frustrated by Voice Mode not working in desktop apps, give Manaby a try. We’ve done the hard work so you don’t have to.

Download Manaby 0.0.32 and try Voice Mode yourself.
