The Audio File Format Guide for Voice Over Clients

Audio file format guide for voice over clients: WAV vs MP3, sample rates, bit depth, and what to request from your Spanish voice over artist.

WAV is almost always what you want. I could end this article here, but that would leave you knowing what to ask for without understanding why — and that understanding matters when your editor asks for something different, or when a platform rejects your file, or when the final video sounds like it was recorded inside a tin can.

The audio file format you request from your voice over artist determines what you can do with that audio later. Ask for the wrong thing and you limit your options. Ask for the right thing and you have flexibility you didn't know you needed.

WAV: The Format That Keeps Everything

WAV files are uncompressed audio. Every detail the microphone captured stays in the file. When I record a Spanish voice over for Ford or Netflix, I deliver WAV because it gives their audio engineers complete control. They can EQ, compress, normalize, mix with music, and export to any final format without quality loss.

The tradeoff is file size. A 60-second voice over at 48kHz, 24-bit mono runs just under 9MB as a WAV file. For a single spot, that's nothing. For a 200-module e-learning project, it adds up, but even then, storage is cheap and quality degradation is expensive.
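The size of an uncompressed file follows directly from its specs, so you can estimate it before anyone records a word. Here is a minimal sketch of that arithmetic. The function name and the three-minute module length are my own assumptions for illustration, and the result ignores the small WAV header.

```python
def wav_size_bytes(seconds, sample_rate=48_000, bit_depth=24, channels=1):
    """Approximate size of the audio data in an uncompressed PCM WAV file."""
    bytes_per_sample = bit_depth // 8
    return int(seconds * sample_rate * bytes_per_sample * channels)

# One 60-second spot at 48kHz, 24-bit, mono.
one_spot = wav_size_bytes(60)
print(f"60 s spot: {one_spot / 1_000_000:.2f} MB")        # 8.64 MB

# A 200-module course, assuming three minutes per module (hypothetical length).
course = 200 * wav_size_bytes(180)
print(f"200 modules: {course / 1_000_000_000:.2f} GB")    # 5.18 GB
```

Even the full course fits comfortably on a thumb drive, which is the point about storage being cheap.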

According to the Audio Engineering Society's technical guidelines, uncompressed formats like WAV preserve the full dynamic range and frequency response of the original recording. Once you compress audio, you cannot recover what was lost. This matters more than most clients realize.

MP3: When Size Matters More Than Quality

MP3 files use lossy compression. The algorithm removes audio information it considers "less important" to human hearing — high frequencies, quiet details, subtle harmonics. A 10MB WAV becomes a 1MB MP3. But that 9MB didn't disappear into nothing. It was your audio quality.
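How much smaller the MP3 gets depends on the bitrate you choose. The sketch below assumes a constant-bitrate file and ignores tags and headers; the function name is mine. It compares three common bitrates against a 60-second WAV at 48kHz, 24-bit, mono.

```python
def mp3_size_bytes(seconds, bitrate_kbps):
    """Approximate size of a constant-bitrate MP3 (ignores tags and headers)."""
    return int(seconds * bitrate_kbps * 1000 / 8)

WAV_60S = 60 * 48_000 * 3  # 60 s of 48kHz, 24-bit, mono audio data, in bytes

for kbps in (128, 192, 320):
    size = mp3_size_bytes(60, kbps)
    ratio = WAV_60S / size
    print(f"{kbps} kbps: {size / 1_000_000:.2f} MB, {ratio:.1f}x smaller than WAV")
# 128 kbps: 0.96 MB, 9.0x smaller than WAV
# 192 kbps: 1.44 MB, 6.0x smaller than WAV
# 320 kbps: 2.40 MB, 3.6x smaller than WAV
```

A reduction of roughly ten to one corresponds to about 128kbps. At 320kbps the file is still less than a third the size of the WAV.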

For final delivery to platforms that will compress again anyway, MP3 can work. Spotify, YouTube, and most social platforms apply their own compression regardless of what you upload. If your Spanish voice over is going straight to Instagram Stories with no post-production, MP3 at 320kbps is probably fine.

But here's what happens when you edit an MP3: every time you save, the audio is decoded and re-encoded, and each pass discards a little more. After a few generations the artifacts become audible: that underwater, slightly metallic quality that screams "low budget." I've heard this in corporate videos from companies that should know better.

Sample Rate: The Numbers That Actually Matter

Sample rate measures how many audio snapshots per second the file contains. The standard for voice over is 48kHz (48,000 samples per second). Some clients request 44.1kHz, which was the CD standard and remains common in music production.

For voice work, 48kHz is preferable because it's the video standard. If your Spanish corporate video will be edited in Premiere or Final Cut, your editor expects 48kHz audio. Mismatched sample rates don't break anything, but they force a conversion, and a careless conversion can introduce small artifacts.

Have you ever watched a corporate video where the voice seemed slightly disconnected from the visuals, almost imperceptibly out of sync? Sometimes that's a sample rate mismatch nobody caught in post.

Recording at higher sample rates like 96kHz is technically possible but pointless for voice over. A 48kHz file already captures frequencies up to 24kHz, which is beyond the roughly 20kHz limit of human hearing, and the human voice doesn't contain anything that benefits from higher sampling. You're just creating larger files for no quality improvement.

Bit Depth: 16 vs 24

Bit depth determines the dynamic range a file can capture — the distance between the quietest and loudest sounds. Voice over doesn't need extreme dynamic range (we're not recording orchestras), but 24-bit files give editors more headroom to work with.

A 16-bit file has a theoretical dynamic range of 96dB. A 24-bit file has 144dB. In practical terms, 24-bit means your editor can boost quiet passages or reduce loud ones without introducing digital noise. The difference between a voice over that sounds professional and one that sounds "edited" often lives in these details.
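Those two figures come from a simple relationship: each extra bit doubles the number of levels a sample can take, which adds about 6dB of theoretical range. A minimal sketch of the calculation, with a function name of my own choosing:

```python
import math

def dynamic_range_db(bit_depth):
    """Theoretical dynamic range of linear PCM: 20 * log10(2 ** bits)."""
    return 20 * math.log10(2 ** bit_depth)

for bits in (16, 24):
    print(f"{bits}-bit: {dynamic_range_db(bits):.1f} dB")
# 16-bit: 96.3 dB
# 24-bit: 144.5 dB
```

These are ceilings set by the file format. The real noise floor of a recording depends on the room, the microphone, and the preamp.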

I deliver 24-bit as standard. The file size increase is modest: a 24-bit file is 50% larger than a 16-bit file at the same sample rate, and the flexibility it provides is worth far more than the storage cost.

What to Request: The Practical Specs

For most video production: WAV, 48kHz, 24-bit, mono. This gives your editor maximum flexibility and matches industry standards. If your post house or agency has different specs, they'll tell you.

For e-learning platforms: Check your LMS requirements first. Some older systems prefer MP3 for faster loading. Articulate Storyline and most modern platforms handle WAV without issues. When in doubt, request WAV masters and generate MP3s yourself — you can always compress down, never up.

For podcasts: WAV masters, then export to whatever your hosting platform recommends. Most podcast hosts accept WAV uploads and handle compression on their end, which actually produces better results than pre-compressed files.

For social media: If the content goes through any post-production, start with WAV. If you're uploading raw voice directly to a platform (which I'd question for brand content), MP3 at 320kbps works.
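If you take the "WAV masters, MP3s yourself" route, the conversion is easy to script. The sketch below assumes ffmpeg is installed with the libmp3lame encoder, which is a common setup but not a given. The function names and the file name in the usage note are hypothetical.

```python
import subprocess
from pathlib import Path

def mp3_command(wav_path, bitrate_kbps=320):
    """Build an ffmpeg command that encodes a WAV master to MP3."""
    wav_path = Path(wav_path)
    mp3_path = wav_path.with_suffix(".mp3")   # same name, .mp3 extension
    return [
        "ffmpeg",
        "-i", str(wav_path),          # input: the WAV master
        "-codec:a", "libmp3lame",     # MP3 encoder (assumed to be available)
        "-b:a", f"{bitrate_kbps}k",   # target audio bitrate
        str(mp3_path),
    ]

def export_mp3(wav_path, bitrate_kbps=320):
    """Run the conversion; raises CalledProcessError if ffmpeg fails."""
    subprocess.run(mp3_command(wav_path, bitrate_kbps), check=True)
```

Calling export_mp3("module_01.wav") would write module_01.mp3 next to the master and leave the WAV untouched, so you can always re-export at a different bitrate later.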

Mono vs Stereo for Voice

Voice over is recorded and delivered in mono. Always. A single voice coming from a single microphone produces a mono signal. Delivering it in stereo just doubles the file size without adding information.
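If you already have a stereo voice file and want to know whether it carries anything a mono file wouldn't, you can compare the two channels. This is a minimal sketch using Python's standard wave module. The function name is my own, and it reads the whole file into memory, which is fine for typical voice over lengths.

```python
import wave

def is_dual_mono(path):
    """Return True if a stereo WAV has identical left and right channels."""
    with wave.open(path, "rb") as wav:
        if wav.getnchannels() != 2:
            raise ValueError("expected a stereo file")
        width = wav.getsampwidth()              # bytes per sample, e.g. 3 for 24-bit
        frames = wav.readframes(wav.getnframes())

    frame_size = 2 * width                      # one left sample plus one right sample
    for start in range(0, len(frames), frame_size):
        left = frames[start:start + width]
        right = frames[start + width:start + frame_size]
        if left != right:
            return False
    return True
```

A True result means the second channel is a copy of the first, so the stereo file is twice the size for the same audio.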

Some clients request stereo because their video editor told them to. What the editor probably meant was "stereo project settings," not "stereo voice file." (I've had this conversation approximately 400 times.) Your mono voice file will sit perfectly in a stereo mix — the editor places it center and moves on.

The only exception is character dialogue recorded with spatial positioning for gaming or VR, and that's a specialized workflow most clients never encounter.

The Format Your Editor Actually Needs

Before your session, ask your editor or post house what specs they want. Different workflows have different requirements. A broadcast commercial for Telemundo follows different delivery specs than a YouTube pre-roll or an internal training video.

If nobody on your team knows what to request, WAV 48kHz 24-bit mono is the safe default. Any professional can work with this format. And if you're working with a professional Spanish voice over artist who understands the full production chain, they'll deliver exactly what you need or tell you what to ask for.
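When a file arrives, you can confirm it matches that default without opening an audio editor. The sketch below uses Python's standard wave module. The function names and the delivery.wav file name are assumptions for illustration, and older Python versions may reject some 24-bit files that use the extensible WAV header.

```python
import wave

# The safe default described above.
EXPECTED = {"sample_rate": 48_000, "bit_depth": 24, "channels": 1}

def read_wav_specs(path):
    """Read sample rate, bit depth, and channel count from a WAV header."""
    with wave.open(path, "rb") as wav:
        return {
            "sample_rate": wav.getframerate(),
            "bit_depth": wav.getsampwidth() * 8,   # bytes per sample -> bits
            "channels": wav.getnchannels(),
        }

def spec_mismatches(path, expected=EXPECTED):
    """Return {field: (actual, expected)} for every spec that differs."""
    actual = read_wav_specs(path)
    return {
        field: (actual[field], wanted)
        for field, wanted in expected.items()
        if actual[field] != wanted
    }
```

An empty result from spec_mismatches("delivery.wav") means the file is 48kHz, 24-bit, mono. Anything else tells you exactly which spec to raise with your voice over artist or editor.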

I keep project files indefinitely. If your specs change six months later, I can re-export from the original session. That's worth knowing — and worth asking about when you hire someone new.

When Clients Request the Wrong Thing

About once a month, someone requests MP3 "for higher quality." They read somewhere that 320kbps is "CD quality" and assumed bigger numbers mean better sound. The reality is simpler: CD audio runs at about 1,411kbps, and MP3 at any bitrate contains less information than the WAV it came from.
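The numbers make the point quickly. Uncompressed audio has a data rate too, and it is well above anything MP3 offers. A minimal sketch of the comparison, with a function name of my own choosing:

```python
def pcm_bitrate_kbps(sample_rate, bit_depth, channels):
    """Data rate of uncompressed PCM audio in kilobits per second."""
    return sample_rate * bit_depth * channels / 1000

cd = pcm_bitrate_kbps(44_100, 16, 2)      # CD audio: 44.1kHz, 16-bit, stereo
voice = pcm_bitrate_kbps(48_000, 24, 1)   # voice over master: 48kHz, 24-bit, mono
best_mp3 = 320                            # highest standard MP3 bitrate

print(f"CD audio:       {cd:.0f} kbps")      # 1411 kbps
print(f"Voice over WAV: {voice:.0f} kbps")   # 1152 kbps
print(f"Best-case MP3:  {best_mp3} kbps")
```

Data rate alone doesn't measure how a file sounds, but it does show that 320kbps is a fraction of what the uncompressed master carries.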

I'll deliver whatever format you request. But if you ask for MP3 and I suspect you actually need WAV, I'll say something. That's part of working with someone who does this professionally rather than just filling orders.

The format decision isn't creative. It's technical. Get it right once and never think about it again.

Need a Spanish voice over for your next project? Get in touch and I'll get back to you within the hour.
