The music should come first. Always. If you're planning a Spanish voice over project and wondering about production order, this is the answer that 20 years of recording for brands like Coca-Cola, Nike, and Ford have confirmed over and over again.
Now let me tell you why.
Music sets the emotional container
When I record against a music track, I'm not just reading words. I'm inhabiting a space that already has a mood, a tempo, a personality. The music tells me whether this is an uplifting brand story or a serious corporate message. It tells me where to breathe, where to land the emotional beats, where the energy needs to rise or fall.
A 2022 study from the Journal of Advertising Research found that audio-visual congruence (when voice and music align emotionally) increases brand recall by up to 35%. That alignment doesn't happen by accident. It happens because somebody made a production decision early in the process.
Without music, I'm guessing. And a voice over artist guessing is a voice over artist who will need to re-record when the music finally shows up and doesn't match the delivery.
The tempo problem nobody anticipates
Here's what happens when you record voice first and choose music later: the editor tries to fit a 32-second read into a 30-second music bed with a specific rhythm. The music has natural peaks at seconds 8, 15, and 24. Your voice over was recorded without knowing this. The result sounds like two strangers sharing an elevator: technically present in the same space, fighting for territory.
According to Nielsen's research on audio effectiveness, consumers can detect misaligned audio elements within 3 seconds of exposure, even when they can't articulate what's wrong. They just feel discomfort. And that discomfort doesn't make them think "bad production"; it makes them think "bad brand."
Have you ever watched a car commercial where the voice and music seem to be telling different stories? That's almost always a production order problem.
When voice first actually works
There are exceptions. Very few.
E-learning modules where music is purely ambient: no rhythm, no emotional arc, just a soft bed underneath information. IVR systems where music is a brief intro before the functional message begins. Internal corporate videos where the budget doesn't include custom music and you're pulling from a library after the fact.
But advertising? Promotional content? Brand documentaries? Music first. Every time. The emotional stakes are too high to leave the relationship between music and voice to chance.
The practical workflow that works
Here's the production order I recommend after two decades in this industry:
Select your music track (or have your composer deliver at least a rough version). Send it to your voice over artist with the script. Record against the actual track, not against silence. Make adjustments in real time during the session because you can hear exactly how the voice sits in the mix.
This isn't complicated. But it requires planning the music selection before the voice over session, which means the creative team needs to make decisions earlier in the process than many are comfortable with. They want to "see options" for everything before committing to anything. That approach works for casting. It doesn't work for production order.
The Spanish script length factor
Spanish runs about 30% longer than English, a fact I've written about extensively. When you're recording a Spanish voice over against a music track that was timed for an English version, you're already fighting the clock. Having the music during recording lets me and the client hear exactly where the script is running long, where we need to trim, where the delivery needs to compress slightly to hit a musical beat.
Recording without music means discovering these problems in post-production, when fixing them is expensive and the deadline is tomorrow.
(I once had a client send music after approving the voice over, then call in a panic because the read was 4 seconds longer than the track. We re-recorded. Could have been avoided with a 30-second email the day before the session.)
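The length arithmetic above can be roughed out before the session with a few lines of Python. This is only a sketch under stated assumptions: the 30% figure is the one quoted above, while the 150 words-per-minute English pace, the name `check_fit`, and the choice to treat the 30% as extra read time are all illustrative guesses, not industry standards.

```python
import math
from dataclasses import dataclass

# From the article: Spanish runs about 30% longer than English.
SPANISH_EXPANSION = 1.30


@dataclass
class FitReport:
    estimated_seconds: float  # estimated length of the Spanish read
    overrun_seconds: float    # positive: the read is longer than the music
    words_to_trim: int        # approximate words to cut (0 if the read fits)


def check_fit(english_words, music_seconds, english_wpm=150.0,
              expansion=SPANISH_EXPANSION):
    """Estimate whether a Spanish read fits a music bed timed for English.

    english_wpm is an assumed speaking rate, not a measured one; swap in
    the pace of your own reference read.
    """
    if english_words < 0 or music_seconds <= 0 or english_wpm <= 0:
        raise ValueError("word count must be >= 0; duration and rate must be > 0")

    english_seconds = english_words / english_wpm * 60.0
    spanish_seconds = english_seconds * expansion
    overrun = spanish_seconds - music_seconds

    # Assumption: the read keeps roughly the same words-per-minute pace,
    # so each second of overrun costs about english_wpm / 60 words.
    words_to_trim = max(0, math.ceil(overrun * english_wpm / 60.0))

    return FitReport(round(spanish_seconds, 1), round(overrun, 1), words_to_trim)


report = check_fit(english_words=65, music_seconds=30.0)
print(report)
```

With these assumed numbers, a 65-word English script lands at roughly 33.8 seconds in Spanish against a 30-second track, which is about 10 words over. The estimate is no substitute for recording against the actual music; it only flags the overrun early enough to trim the script before anyone is in the booth.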
What about custom music scored to voice?
Sometimes the music is being composed specifically for the project, scored to picture and voice. In this case, the voice does come first, but this is a $50,000+ production with a composer who will write around the voice over artist's delivery. This isn't the same as choosing between recording voice first or music first. This is a completely different production model where the music is designed as a response to the voice.
If you have that budget, you probably also have a producer managing all of this. And that producer will tell you the same thing I'm telling you: the default is music first, and custom scoring is the exception that proves the rule.
The remote session advantage
With Source Connect, I can record from my studio while the client, editor, and music supervisor all listen in real time from wherever they are. Someone in Los Angeles can play the music track. Someone in New York can direct. I can record in Buenos Aires. Everyone hears exactly how the voice sits against the music, and we can adjust delivery, pacing, and emphasis on the fly.
This technology exists. Using it well requires having the music ready before the session. The technology doesn't help if the production order is wrong.
The mood mismatch disaster
I've seen this too many times: voice recorded with a warm, conversational, relaxed delivery. Music selected later turns out to be driving, urgent, energetic. The client hears them together and says "something feels off." They're right. Something is very off. The voice and music are telling different emotional stories, and the audience picks up on that conflict even when they can't name it.
Studies on multimodal perception from MIT's Media Lab show that humans process audio incongruence as a trust signal: when sound elements don't match, credibility drops. For a brand message, this is the opposite of what you want.
The first take and music
The first take is usually the best take. This is something I've learned over thousands of sessions, and it's true because the first take captures the most natural interpretation before overthinking sets in. But the first take is only the best take if the voice over artist understood the emotional context from the start. Music provides that context instantly, without explanation, without a creative brief full of adjectives that mean different things to different people.
"Make it warm but energetic" is a direction. Music playing underneath is an environment. One requires interpretation. The other requires inhabitation. Big difference.
Planning music for a Spanish voice over production
If you're managing a Spanish voice over project and want to get the production order right, here's the checklist: music selected, or at least narrowed to 2-3 options, before the voice session is scheduled; music tracks sent to the voice over artist at least 24 hours before recording; session time allocated for recording against each music option if the final selection hasn't been made; and the script edited for Spanish length with the music timing already factored in.
And if none of that is possible because the timeline is impossible? At least tell your voice over artist the genre, tempo, and emotional tone of the music you're planning to use. A reference track of something similar helps more than you might think.
The relationship between music and voice over in Spanish production isn't a mystery. It's a planning question, and the answer is clear: music comes first because the voice needs to live inside a world that already exists. Build the world, then populate it with voice. The reverse creates dissonance that your audience will feel, even if they never know why.
Need a Spanish voice over for your next project? Get in touch and I'll get back to you within the hour.



