iPhone recording, NotebookLM for the transcript, ChatGPT or Claude for the summary. String those three free tools together, and it takes about 15 minutes to go from a recording to structured meeting notes. This article walks through each step, plus the small tricks I’ve picked up after using this flow for half a year.
If you don’t want to pay for Otter or Fireflies, this workflow is basically enough.
By the way, Whisper can also replace NotebookLM for transcription, but you need to set up a Python environment yourself, and you still need to run a separate summary step afterward. NotebookLM handles transcription + knowledge base in one place, so the integration is a bit smoother. If privacy matters and you don’t want to upload recordings to the cloud, a local Whisper setup is worth considering. I compare the options in the tool comparison.
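For readers leaning toward the local route, here is a minimal sketch of how the Whisper step might be scripted. It assumes the open-source openai-whisper package is installed (along with ffmpeg) and that its `whisper` command-line tool is on your PATH. The file name `meeting.m4a`, the output folder, and the helper name `build_whisper_command` are placeholders of mine, not part of this workflow. The function only assembles the command so you can inspect it before running anything.

```python
import shlex


def build_whisper_command(audio_path, model="large-v3", language="zh", output_dir="transcripts"):
    """Assemble (but do not run) a Whisper CLI call for one recording.

    Assumes the `whisper` command from the openai-whisper package is on PATH.
    """
    return [
        "whisper", audio_path,
        "--model", model,           # model to load locally
        "--language", language,     # skip auto-detection for Chinese meetings
        "--output_format", "txt",   # plain-text transcript
        "--output_dir", output_dir,
    ]


command = build_whisper_command("meeting.m4a")  # hypothetical file name
print(shlex.join(command))
```

To actually transcribe, you could pass the list to `subprocess.run(command, check=True)`. The transcript then still needs the separate summary step described in Step 3.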
Complete SOP
Step 1: Record
Tool: the iPhone Voice Memos app. Before the meeting starts, open the app, press record, put the phone on the table. Done.
A few details affect recognition quality:
Phone position. The center of the table is best, and the farthest speaker should be within 2 meters. If the phone is only in front of you, recognition for people across the table may drop by 20%.
Background noise. This is the biggest variable. A quiet meeting room can reach 95% recognition accuracy; a cafe is closer to 75%. If the environment is noisy, consider using a directional external microphone.
Recording format. The iPhone default is M4A, and NotebookLM can read it. No extra conversion needed.
Online meetings work differently: record system audio on the computer instead, which sounds much better than putting a phone next to it. On macOS, open QuickTime Player, choose File → New Audio Recording, and select a system-audio input. That input only appears once you install a virtual audio device such as BlackHole or Loopback.
Step 2: Upload to NotebookLM
After the meeting, open NotebookLM, create a new notebook, and upload the audio file.
NotebookLM automatically starts transcribing. A 30-minute recording usually finishes in about 3-5 minutes, and a two-hour meeting takes about 10-15 minutes.
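If you want a rough wait time for other recording lengths, the two data points above can be interpolated. This is only a back-of-the-envelope sketch: it assumes processing time grows linearly with recording length, which I have not verified, and the function name is my own.

```python
def estimate_transcription_minutes(recording_minutes):
    """Return a (low, high) wait estimate in minutes by linear interpolation.

    Anchored on the two observations in this article:
    30-minute recording -> 3-5 minutes, 120-minute recording -> 10-15 minutes.
    Values outside that range are extrapolated and are only a rough guess.
    """
    short_len, long_len = 30, 120
    short_low, short_high = 3, 5
    long_low, long_high = 10, 15
    fraction = (recording_minutes - short_len) / (long_len - short_len)
    low = short_low + fraction * (long_low - short_low)
    high = short_high + fraction * (long_high - short_high)
    return low, high


low, high = estimate_transcription_minutes(60)
print(f"60-minute recording: roughly {low:.1f} to {high:.1f} minutes")
```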
After transcription finishes, you can see the full transcript. Chinese recognition accuracy is around 90%; domain terms and people’s names are the most common failure points.
The practical move is to skip proofreading and go straight to summarization. AI summaries usually tolerate recognition mistakes: if “機器學習” (machine learning) comes out as “機器雪習”, the model can still tell from context what is being discussed. Unless you are publishing an official meeting transcript, proofreading is a waste of time.
NotebookLM has one extra benefit: it adds the recording to a knowledge base. Three months later, if you want to find “that meeting where we discussed pricing with the client,” you can just ask inside NotebookLM. The detailed transcript workflow is in the NotebookLM transcript guide.
Step 3: AI Summary
Copy the transcript out and send it to ChatGPT or Claude. My usual prompt:
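Below is a meeting transcript. Please organize it into:
1. A three-sentence summary (what the meeting discussed, what was concluded, and what the next step is)
2. Action items for each person (if the context shows who is responsible for what)
3. Things that need to be finished before the next meeting
4. Issues that are disputed or have not yet reached consensus
Transcript:
[paste transcript]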
The result usually comes back in 1-2 minutes.
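If you paste the same prompt after every meeting, a small helper can assemble it for you. The sketch below is one way to do that, not a required part of the flow. The English prompt wording is my rendering of the prompt above, and `build_summary_prompt` is a hypothetical name. It only builds the text; you still paste the result into ChatGPT or Claude yourself.

```python
SUMMARY_PROMPT = """Below is a meeting transcript. Please organize it into:
1. A three-sentence summary (what was discussed, what was concluded, what happens next)
2. Action items for each person (where the context shows who owns what)
3. Things that need to be finished before the next meeting
4. Issues that are disputed or have not yet reached consensus

Transcript:
{transcript}"""


def build_summary_prompt(transcript):
    """Insert a trimmed transcript into the fixed summary prompt."""
    cleaned = transcript.strip()
    if not cleaned:
        raise ValueError("transcript is empty")
    return SUMMARY_PROMPT.format(transcript=cleaned)


prompt = build_summary_prompt("A: Let's settle pricing today.\nB: I'll send the quote by Friday.")
print(prompt)
```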
If the meeting is long (over 1 hour), the transcript may exceed the AI input limit. In that case, first use NotebookLM’s AI feature to generate an initial summary, then send that summary to the model for a more detailed cleanup. Two-stage compression works well.
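A different way to handle an oversized transcript, which is not what I do above, is to cut it into pieces yourself, summarize each piece, and then summarize the summaries. The sketch below shows only the splitting part. The 12,000-character budget is a placeholder I picked for illustration; real limits depend on the model and are counted in tokens, not characters.

```python
def split_transcript(transcript, max_chars=12000):
    """Split a transcript into chunks of at most max_chars, breaking on line boundaries.

    A single line longer than max_chars is cut into fixed-size pieces.
    """
    chunks, current = [], ""
    for line in transcript.splitlines(keepends=True):
        # Hard-split any line that cannot fit in a chunk by itself.
        while len(line) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(line[:max_chars])
            line = line[max_chars:]
        # Start a new chunk when this line would overflow the current one.
        if len(current) + len(line) > max_chars:
            chunks.append(current)
            current = ""
        current += line
    if current:
        chunks.append(current)
    return chunks


transcript = "speaker line\n" * 5000  # stand-in for a long meeting
chunks = split_transcript(transcript)
print(len(chunks), "chunks, longest", max(len(c) for c in chunks), "characters")
```

Joining the chunks back together reproduces the original text, so nothing is dropped at the boundaries.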
Step 4: Archive
After you get the summary, save two copies:
- Keep the original transcript in NotebookLM (as the knowledge base)
- Save the summary to Notion or Google Docs (easy to share with attendees)
From the end of the recording to a finished summary, the whole flow takes about 15 minutes. Meetings under 30 minutes may be done in 10 minutes.

Quality Test
I took the same 45-minute meeting recording and ran it through NotebookLM (free) and Otter.ai (Pro, $16.99/month):
Recognition accuracy (Chinese):
- NotebookLM: about 90%
- Otter.ai: about 78%
Speaker recognition:
- NotebookLM: none
- Otter.ai: available, but Chinese accuracy is about 70%
Processing time:
- NotebookLM: 4 minutes
- Otter.ai: live (runs while recording)
Summary quality:
- NotebookLM + Claude: clear structure, focused points
- Otter.ai built-in summary: shorter, occasionally misses details
Based on numbers collected by the community, Whisper (running the large-v3 model locally) and Fireflies have Chinese recognition accuracy of about 90% and 85%, respectively. Fireflies Pro is $18/month. Its Chinese recognition is better than Otter’s, but not as good as NotebookLM’s.
Conclusion: for Chinese meetings, NotebookLM’s recognition quality is actually better than paid tools. Otter wins at English live transcription and speaker recognition, but both features are weaker in Chinese scenarios.
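If you want to run a similar comparison on your own recordings, one common way to score recognition is character-level accuracy: proofread a short passage by hand, then count how many single-character edits separate the tool’s output from it. The sketch below uses that definition. The function names are mine, and the percentages quoted in this section were not necessarily computed this way.

```python
def edit_distance(reference, hypothesis):
    """Levenshtein distance: minimum single-character insertions, deletions, substitutions."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            cost = 0 if ref_char == hyp_char else 1
            current.append(min(
                previous[j] + 1,         # delete ref_char
                current[j - 1] + 1,      # insert hyp_char
                previous[j - 1] + cost,  # match or substitute
            ))
        previous = current
    return previous[-1]


def character_accuracy(reference, hypothesis):
    """Return 1 minus the character error rate, floored at 0."""
    if not reference:
        raise ValueError("reference transcript is empty")
    error_rate = edit_distance(reference, hypothesis) / len(reference)
    return max(0.0, 1.0 - error_rate)


print(character_accuracy("機器學習", "機器雪習"))  # one wrong character out of four -> 0.75
```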

Gaps Compared with Paid Tools
This free workflow has three clear downsides.
No live transcription. You have to wait until after the meeting to process it. Paid tools such as Otter and Fireflies can transcribe during the meeting, which helps in some discussion scenarios.
No automatic speaker recognition. NotebookLM’s transcript is continuous text. It does not mark who said what. When there are many people in the meeting, going back to find “who said that sentence” is more of a hassle.
More manual steps. Uploading the recording, copying the transcript, and pasting it into ChatGPT add about 5 minutes of extra work. Paid tools can automate the whole thing.
Are those gaps worth $17-18 every month? It depends on usage. If you have 3-4 meetings a week, the extra manual time adds up to about 20 minutes, which most people can accept. If you have more than 5 meetings a day and need live transcription plus speaker recognition, paid tools make much more sense. The detailed comparison is in the tool comparison.
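The trade-off is simple arithmetic, sketched below so you can plug in your own numbers. The 5 minutes per meeting and the $16.99 Otter Pro price come from this article; the 4.33 weeks-per-month factor and both function names are my assumptions.

```python
def weekly_manual_minutes(meetings_per_week, minutes_per_meeting=5):
    """Extra hands-on time the free workflow adds each week."""
    return meetings_per_week * minutes_per_meeting


def subscription_cost_per_meeting(monthly_price, meetings_per_week, weeks_per_month=4.33):
    """Rough price paid per meeting for a subscription tool."""
    return monthly_price / (meetings_per_week * weeks_per_month)


print(weekly_manual_minutes(4))  # 4 meetings x 5 minutes -> 20
print(round(subscription_cost_per_meeting(16.99, 4), 2))
```

At 4 meetings a week, the subscription works out to roughly a dollar per meeting in exchange for skipping about 5 minutes of manual work each time.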

A Few Small Tricks
Test once before recording. When you arrive in the meeting room, start recording and say a few sentences, then play it back to check clarity. A room with heavy echo may produce audio that is almost impossible to recognize.
Have everyone say one sentence at the start. If you need to separate speakers, ask each person to do a 10-second self-introduction at the beginning. That section helps with manual speaker tagging later.
Pause recording during long breaks. Resume when everyone comes back, so the transcript doesn’t end up with a huge block of silence or small talk.
Create fixed prompt templates. I keep three versions: one for formal meetings (stricter output format), one for brainstorming (idea collection), and one for client meetings (action items and commitments).
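One way to keep those templates handy is a small lookup keyed by meeting type. The sketch below is illustrative only: the three categories match the ones I described, but the template wording here is invented for the example and is not my actual prompts, and `render_prompt` is a hypothetical helper name.

```python
PROMPT_TEMPLATES = {
    # Formal meetings: stricter output format
    "formal": (
        "Summarize this meeting transcript using exactly these headings: "
        "Decisions, Action Items (owner and due date), Open Questions.\n\n{transcript}"
    ),
    # Brainstorming: collect ideas rather than decisions
    "brainstorm": (
        "List every distinct idea raised in this transcript, grouped by theme. "
        "Do not rank or filter them.\n\n{transcript}"
    ),
    # Client meetings: action items and commitments
    "client": (
        "From this client meeting transcript, list the commitments each side made "
        "and the follow-up actions with owners.\n\n{transcript}"
    ),
}


def render_prompt(meeting_type, transcript):
    """Fill the template for the given meeting type; fail loudly on unknown types."""
    try:
        template = PROMPT_TEMPLATES[meeting_type]
    except KeyError:
        known = ", ".join(sorted(PROMPT_TEMPLATES))
        raise ValueError(
            f"unknown meeting type {meeting_type!r}; expected one of: {known}"
        ) from None
    return template.format(transcript=transcript.strip())


print(render_prompt("client", "Client: Can you deliver by March?\nUs: Yes, we will confirm by Monday."))
```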

Who This Workflow Fits
Good fit: no more than 5 meetings a week, meetings mainly in Chinese, and no desire to spend money on tools.
Not a good fit: you need live transcription (captions during the meeting), automatic Zoom/Teams integration, or a shared meeting notes platform for a team of 5 or more.
Tools keep improving. Half a year ago, NotebookLM’s Chinese recognition wasn’t this stable. Now it’s a step above paid competitors. This workflow will keep changing as the tools update.

Penchan’s Experience
iPhone recording + NotebookLM + large-model summaries is what I actually use; all 3-4 of my weekly meetings go through it. The whole setup costs nothing, and the downstream summary quality depends on the model I choose: Claude for formal notes (clean structure), ChatGPT for quickly scanning the highlights (a few more perspectives).
The habit of having everyone say one sentence at the beginning is one I picked up the hard way. For meetings where I skipped it, finding “who said that sentence” afterward could burn 5-10 minutes. If everyone introduces themselves once during recording, voice characteristics are usually enough to match things up during cleanup.
Otter / Fireflies / Plaud / Tinrec / Vocol have not made it into my daily workflow. In Chinese scenarios, NotebookLM’s recognition quality is noticeably better than Otter’s. Paid tools’ live transcription and speaker recognition are both weaker in Chinese, and the fixed monthly cost makes their ROI worse for me than the free three-tool combo.
CapCut’s Taiwanese Hokkien recognition is the key tool that fills NotebookLM’s weak spot. For mixed-language meetings or recordings of older speakers, I first use CapCut to convert speech to text, then load that text into NotebookLM for Q&A. For pure Taiwanese Hokkien content, CapCut’s recognition accuracy reaches about 70-80%, far above NotebookLM’s 30-40%.
The next thing I want to watch is how Taiwanese Hokkien support evolves. Once a steadier free Taiwanese Hokkien option appears, I will update this workflow again.
Further Reading
- NotebookLM Transcript Guide
- AI Meeting Notes Tool Overview
- AI Meeting Notes Tool Comparison
- CapCut AI Subtitle Guide | Automatic Taiwanese Hokkien Recognition
FAQ
Q: Is iPhone recording quality good enough?
Yes, if the phone is on the table within 2 meters. In my tests, the recognition error rate is around 5-10%. The real quality killer is background noise, not the microphone.
Q: How long does NotebookLM take to generate a transcript?
It depends on the recording length. A 30-minute recording usually finishes in about 3-5 minutes. A two-hour meeting takes roughly 10-15 minutes. After uploading, you can go do something else.
Q: What if the transcript has typos?
That’s normal. AI transcription can’t be 100% accurate, and names plus domain terms are the easiest places to fail. In practice, I don’t proofread line by line. I send it straight to AI for summarization. During the summary stage, AI usually ignores recognition mistakes and keeps the correct meaning.
Q: Can it recognize different speakers?
NotebookLM currently doesn’t automatically identify speakers. If you need to separate who said what, there are two options: have everyone introduce themselves at the start of the recording, or manually tag speakers after the meeting.
Q: How do I record online meetings?
When using Zoom or Meet, record system audio with your computer’s recording feature. On macOS, use QuickTime Player together with a virtual audio device such as BlackHole or Loopback. On Windows, use the built-in Xbox Game Bar. The audio quality is much better than placing a phone beside the computer.
Q: How is this workflow different from paid tools?
There are three gaps: no live transcription, so you can only process after the meeting; no automatic speaker recognition; and extra manual steps for uploading and summarizing. If you have no more than two meetings a day, those gaps are tolerable.
Q: Which is better for summaries, ChatGPT or Claude?
Claude’s summaries are more organized and cleaner in format. ChatGPT’s summaries feel more lively and sometimes add a few extra angles. I use Claude for formal notes and ChatGPT when I just want the highlights quickly.
— Penchan