Booth Basics

Why and how we do what we do.

Why Booth Basics

Serafim: When we were starting out, there were a lot of things we didn’t know then that we know now…

Sierra: As we sought feedback on our early efforts, it launched us on a series of quests with goals like “Passing Audiobook Submission Standards” and “Finding a Natural Sound” and “Getting Comfortable in the Booth/Studio.” Gradually, our way of working began to come together.

Serafim: In this series of postcards, we aim to cut through some of the noise (pun intended!) and demonstrate how one narrator-engineer team—day to day, project to project—goes about making the tools of the voice acting/audiobook trade work best for us. We can’t promise that everything we do will work for you (especially right out of the gate), but we hope it will help you ask better questions (as others’ dispatches helped us!) that will ease your entry into the industry.

Sierra: As in this post, everything bolded in black is a link (except this one, ha). In addition, we put links that jump you within the postcards in blue and we put non-linked terms in purple to direct you to menus within an independent (software or other) program. If you have questions about anything you see here or on our social media, please reach out. We’re all in this together after all.

Why IZotope RX?

Serafim: As you get more comfortable producing audiobooks, your ear will become more attuned to imperfections exacerbated by ACX’s and other platforms’ Loudness requirements. You’ll become more aware of producing mouth clicks or rogue breaths—which is good news. You’re a human being. Bodies make noise. You don’t need to freak out, but you do need a workflow that will give you a no-stress way to address these issues.

Sierra: So, along those lines, what is IZotope RX and why is it essential?

Serafim: IZotope RX is one of the most prominent players in the game of audio restoration. It’s a complex suite of tools that address all sorts of audio issues, including noise, interference, wind, crackles—people use it to digitize vinyl records.

Sierra: So it’s a remastering tool?

Serafim: Yes. And for voice actors, in particular, there are two convenient IZotope modules that can act as plug-ins within your DAW: Mouth Declick and Breath Control.
     Mouth Declick (as opposed to the more generic Declick) applies a sophisticated algorithm to determine what is and is not a mouth click.

Sierra: To hear mouth clicks in the wild, just turn on C-Span and listen to a congressperson speaking into a microphone. Or pay close attention to your favorite television show. In our daily lives, we usually just tune these noises out. But in the audiobook world, we don’t like them, so what can we do?

Serafim: Ideally, this IZotope module scans your file, locates especially egregious pops and clicks, and wipes them away without taking a bite out of normal articulation.
     Whether or not you choose to seek out more in-depth instruction, it’s worth taking some time to just play with the settings. For example, there are three key adjustable settings in Mouth DeClick: sensitivity, frequency skew, and click widening. To see what they do, feed in an audio sample, crank the values all the way up, then listen. (You have the option of clicking a checkbox and hearing output clicks only, so you know exactly what’s being eliminated.)

Sierra: At the most extreme settings, the module’s probably stripping away sounds you want to keep, right?

Serafim: Exactly. Like consonants.
     What’s most important is for you to develop an intuitive feel of what this or any program can do, first at its most extreme. Then you start to dial it down until you find your FX processing sweet spot.

Sierra: That’s when processing does what you want it to do but you still sound like yourself. So it’s a sweet spot that reconciles platform standards with your own.

Serafim: ACX or another platform isn’t necessarily going to flag you for having too many mouth clicks.

Sierra: Though listeners might do so in their reviews. People have become quite sensitive about mouth clicks in the audiobook world, less so when it comes to breaths, which we’ll get to in a bit.

Serafim: That’s why you want to seek feedback, ideally from listeners who care at least a little bit about audiobook sound. You’re going in with a specific question: Does this FX processing sound too heavy, not heavy enough, or just right? You might consider rendering several different versions and then setting up a blind test so as to get the most unbiased observations.

Sierra: We also like to audition our tracks with different headphones. Apple AirPods Pro have turned out to be a key part of our workflow, because they tend to favor higher frequencies, like many popular daily-use earbud headphones.

Serafim: You may hear more mouth noises in AirPods than you would hear in really nice professional headphones.

Sierra: That’s kind of a good thing, right? That enables you to be confident, once you’ve fine-tuned, that you’re giving the listener the experience that you want them to have and that you want to have when you’re listening to an audiobook. That also means you want to be confident in the tools you use, so you don’t have to crank up the volume to an extreme. Your ears are, after all, your most important technology.
     What about Breath Control?

Serafim: Your breathing is another key element in your own suite of expressive tools. You don’t necessarily want to eliminate every breath or even any breath. In some projects, on some days, you may find some breaths distracting. IZotope’s Breath Control module enables you to attenuate breaths across a project as desired. If you do crank up the Breath Control settings to their extreme, it’s probably going to sound unnatural.

Sierra: Even if it doesn’t, you may find that you’ve introduced a glitch into your audio. We discovered, when producing one particular audiobook, in listening to the entire file—which we are wont to do, often more than once—that the Breath Control was out of control: it had begun to interpret certain letters as breaths, creating little dips in volume throughout the file.

Serafim: Again, it’s finding the sweet spot, so you don’t have to micromanage.

Sierra: It’s definitely a tradeoff, becoming sensitive to otherwise minute noises: once you’ve heard them, you can’t go back, then you need to do something about them.

Serafim: We started with the technology for a reason. There are things a voice actor can do to prepare.

Sierra: The first thing everyone’s going to tell you, great advice, is hydration. Just as we’ve been advising you to experiment with your plug-in settings, you’ll need to discover, with time and trial, just how much water you need to drink and when, based on what feels good. Some voice actors swear off dairy products, others don’t like the after-effects of toothpaste. Something that bothers one person may have no effect on you at all. There is no average voice actor and no one-for-all solution.

Serafim: What about breathing?

Sierra: I like that there are competing schools of thought on breath. Some voice actors have developed a regular pattern of breath, like the tide going in and out. I happen to be someone who breathes very quietly. We did find ourselves using a bit of Breath Control during an especially hot summer when we had no AC. Good lesson: The environment of the booth will influence the environment of your mouth and the sounds you make. To the extent that you can make yourself comfortable in your booth, that’s going to have an impact. There’s also the iron law of body chemistry; you can’t control day-to-day fluctuations, but you can become more aware, say, when you need to take a drink.

Serafim: There’s only so much you can do to optimize your voice. You do your job and then let the engineer and the technology do theirs.

Why Noise Floor, Peak, RMS?

Sierra: So you want to put out an audiobook? Whatever platform you’re using for distribution (ACX, Findaway Voices, Author’s Republic, etc.), you’re going to need to pass certain submission requirements. On the way to doing so, you may have some questions. We have answers.

Serafim: There’s no better place to begin than these three key measurements: Noise floor, peak, and RMS. At this point, you may feel a little overwhelmed. That’s where Booth Basics come in—we want to help you customize your workflow, so you won’t have to abruptly change course in order to pass standards you neither understand nor appreciate.

Sierra: When you’re recording a whole book, it may be tempting to think of it as one unit, but you’ll need to apply standards at the chapter or file level (opening/closing credits, titles). Each of these smaller units will need a noise floor no greater than -60 dB, a peak that doesn’t exceed -3.5 dB, and RMS that averages between -18 and -23 dB.
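To make those three numbers concrete, here is a minimal sketch of how they can be measured, assuming your audio is already loaded as a list of floats between -1.0 and 1.0 (where 1.0 is full scale). The function names are ours for illustration, not part of any DAW, and the thresholds are simply the ones quoted above. Real meters differ in their details, so treat this as a way to see the arithmetic rather than a substitute for your platform's own checker.

```python
import math

def to_db(value):
    """Convert a linear amplitude (1.0 = full scale) to decibels."""
    return 20 * math.log10(value) if value > 0 else float("-inf")

def peak_db(samples):
    """Loudest single sample in the file."""
    return to_db(max(abs(s) for s in samples))

def rms_db(samples):
    """Root mean square: square every sample, average, take the square root."""
    mean_square = sum(s * s for s in samples) / len(samples)
    return to_db(math.sqrt(mean_square))

def check_file(voice, room_tone, noise_max=-60.0, peak_max=-3.5,
               rms_range=(-23.0, -18.0)):
    """Measure one file (hypothetical helper). The noise floor is taken
    as the RMS of a stretch of room tone with no speech in it."""
    results = {
        "noise_floor_db": rms_db(room_tone),
        "peak_db": peak_db(voice),
        "rms_db": rms_db(voice),
    }
    results["passes"] = (
        results["noise_floor_db"] <= noise_max
        and results["peak_db"] <= peak_max
        and rms_range[0] <= results["rms_db"] <= rms_range[1]
    )
    return results
```

You would call check_file once per chapter file, passing the narration and a few seconds of that chapter's room tone.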

Serafim: You’ll also need a few seconds of room tone [standards vary] at the beginning and end of each file.

Sierra: Make it easy for yourself: Record a few minutes of silence during a quiet moment, then set up a track with intro and outro silence so you can easily cut and paste into your projects—then it becomes part of your workflow, a box you check, nothing you need to think more about.
     As you either already know or will soon figure out, it’s incredibly time consuming to make an audiobook—fellow perfectionists, every chance you get to make something simpler, quicker, easier, just do it. You want to save your attention and energy for making the creative choices. On the way there—noise floor, what do we need to know?

Serafim: Your noise floor should be no higher than -60 dB. That means you want to arrive at a value between -60 and -90 dB. It also means that your recording won’t have a background hum and you won’t hear dogs barking or children laughing in the background, anything that could make it harder for listeners to hear and enjoy your work. As we’ve discussed elsewhere, if you are recording in a closet or a sound-insulated but not sound-proof booth, you will need help from software to meet this requirement.

Sierra: Why is there a lower limit? What if your noise floor falls below -90?

Serafim: You don’t want that because you’re approaching digital silence. When noise floor values plunge, it suggests either that there is an editing error or that very heavy processing has been applied.

Sierra: If you listen to some older audiobooks, you’ll hear that the sound appears to fall off a cliff at each break.

Serafim: Here’s where EQ comes in. Equalization enables you to select a certain frequency range and attenuate it. Think of a stereo with a knob that boosts or reduces bass; EQ is a digital knob that performs the same function. Most of the noise that you want to eliminate lives in the lower frequency range, so you can use a high-pass filter (a shelf blocking everything below 65 Hz) to carve away that segment without degrading your voice and with the result that your noise floor drops to an acceptable degree.
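For the curious, here is a rough sketch of what a high-pass filter does to the samples. It assumes the simplest textbook design, a first-order filter, which rolls off gently below the cutoff rather than blocking everything outright; the EQ plug-in in your DAW almost certainly uses a steeper slope. The 65 Hz default comes from the description above, and the function name and the 44100 sample rate are our assumptions.

```python
import math

def high_pass(samples, cutoff_hz=65.0, sample_rate=44100):
    """First-order high-pass filter (illustrative only).

    Content well above cutoff_hz passes almost untouched; content
    below it is progressively turned down.
    """
    rc = 1.0 / (2 * math.pi * cutoff_hz)
    dt = 1.0 / sample_rate
    alpha = rc / (rc + dt)

    out = []
    prev_in = 0.0
    prev_out = 0.0
    for x in samples:
        # Each output follows changes in the input but lets
        # slow-moving (low-frequency) content decay away.
        y = alpha * (prev_out + x - prev_in)
        out.append(y)
        prev_in, prev_out = x, y
    return out
```

With these settings a 30 Hz rumble comes out at well under half its original level, while a 1000 Hz tone, squarely in voice territory, is left almost unchanged.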

Sierra: What about peak?

Serafim: Peak refers to the max loudness reached by a signal in a recording. When recording, if you set your microphone gain too high [more on this when we talk about our interface: RME Fireface UFX II], it will send you across the 0 dB threshold and the sound will distort. For safe file exporting, maximum peak setting enables you to set another shelf through which sound will not go. In general, platforms will want some headroom, so you’ll set your peak to -3 dB or slightly lower, allowing the supervising engineers room to apply additional processing.

Sierra: So in a nutshell, noise floor helps you avoid a noisy recording and peak helps you avoid a distorted recording?

Serafim: Exactly.
     Peak and RMS are closely related (as distinct from noise floor); in order to master both, you’ll need to use compression. (LUFS is the modern equivalent of RMS, but the audiobook industry appears to be sticking with RMS.) Root mean square measures average loudness across a file; it matters because you want your listeners to be able to hear the full spectrum of emotions—from dead calm to wildly excited—without their needing to constantly fiddle with their volume, like a classical music fan does when taking in a symphony.

Sierra: You’ve got to have that remote in your hand! Not so audiobook listeners, thanks to RMS.

Serafim: ACX and other platforms are asking that your recording falls on average between -18 and -23 dB. If you’re anything like us, your initial average will probably be a little high. That’s natural.
     First, compression shaves off the highest peaks; then your software of choice pushes this compressed span up into a loudness sweet spot. We experimented with different software and arrived at Adobe Audition, because it allows the user to set peak and RMS values in a way that renders submission-ready files, no further manipulation necessary.

Sierra: It might be worth mentioning that, when confronted with the challenge of RMS, we did what many newbie producers probably do and raised our gain—oooh, it sounds so much louder, more present, better! But as you discover, the more you raise the gain, the more everything gets louder, so you ping pong back and forth between passing RMS but failing noise floor and vice versa. If that’s where you’ve landed, why you’re here, seeking harmony. . . 

Serafim: Then you need compression. To recap, unless you’re recording on a spaceship, your noise floor will be higher than -60 dB, so you will use EQ to create a high-pass filter; think of it like releasing those booster rockets. For peak and RMS, you can rely on your DAW’s compression tools; with Audition, it takes mere seconds. At some point, though, you may want more surgical precision. For example, you can further experiment with EQ settings to help your recording sound more like you, an idea we’ve touched on before in our mic postcard and will inevitably return to.
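Here is a small sketch of why gain alone falls short on peak and RMS, and what compression followed by make-up gain changes. It assumes a bare-bones "static" compressor that works one sample at a time with no attack or release smoothing, which real compressors (Audition's included) do use, so this shows only the level arithmetic, not how a plug-in sounds. The threshold, ratio, and -20 dB target are hypothetical values chosen for the demonstration.

```python
import math

def db_to_gain(db):
    return 10 ** (db / 20)

def peak_db(samples):
    return 20 * math.log10(max(abs(s) for s in samples))

def rms_db(samples):
    return 10 * math.log10(sum(s * s for s in samples) / len(samples))

def compress(samples, threshold_db=-20.0, ratio=4.0):
    """Turn down only what rises above the threshold, by the given ratio."""
    threshold = db_to_gain(threshold_db)
    out = []
    for s in samples:
        level = abs(s)
        if level > threshold:
            over_db = 20 * math.log10(level) - threshold_db
            new_db = threshold_db + over_db / ratio
            s = math.copysign(db_to_gain(new_db), s)
        out.append(s)
    return out

def normalize_rms(samples, target_db=-20.0):
    """Apply one overall gain so the average loudness lands on target_db."""
    gain = db_to_gain(target_db - rms_db(samples))
    return [s * gain for s in samples]

# A quiet passage with a few loud spikes (made-up test signal).
raw = [0.03, -0.03] * 500 + [0.5, -0.5, 0.5, -0.5, 0.5]

gain_only = normalize_rms(raw)                # RMS on target, peaks too hot
compressed = normalize_rms(compress(raw))     # RMS on target, peaks tamed
```

In this example gain_only reaches the RMS target but its peak lands above 0 dB, while compressed reaches the same RMS with a peak around -6.5 dB. One caution: any gain you add also lifts the noise floor by the same number of dB, so re-check it afterward.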

Sierra: It’s worth noting that passing audiobook standards alone will not make you sound the most like you. With a DAW like Audition, however, you can do more. [Audition requires a subscription, but it isn’t the only option, just the one we use.]

Serafim: We don’t anticipate that everyone reading these postcards will want to make the same exact choices, but we hope you’ll go away with a better understanding of how these different tools and values fit together. We also hope that you can now appreciate how seemingly nitpicky standards meaningfully shape the listening experience. It can get frustrating but meeting these standards is an important milestone on the road to great sound. Once you reach this milestone, well, it’s just one peak and you might already be thinking about the next mountain to climb. And that’s why there will be more postcards from the booth!

Why Reaper 

Sierra: Reaper is one of our most valued tools, but to unlock its potential, you first need to understand this concept of a DAW. Passing the mic to our resident sound engineer: What is a DAW and why do you need one if you are an audiobook narrator and/or all-around voice artist?

Serafim: Okay, so let me put my white coat and glasses on—DAW stands for a digital audio workstation. It’s a piece of software that marries all of your audio tools and enables you to edit your recorded material very conveniently on a timeline. It’s like Microsoft Word: there are simpler programs for typing, but Word is going to give you all these tools to organize text very neatly into sections and paragraphs. That’s the equivalent of a DAW but for audio.

Sierra: Nice analogy. Within our workflow, Reaper interacts with other tools, some of which are, in their own right, DAWs, specifically: Adobe Audition [subscription required] and Audacity [free].

Sierra: There’s an update practically every day, which could get a little much until we put it in the calendar for a once-per-week download. But over time those are improvements that we want.
     Let’s get a little more specific. The features we’ll be discussing are mostly not exclusive to Reaper, but we like how they work in Reaper. That’s why it’s our DAW of choice. If you’re considering Reaper or already working with it, we want to share some tips. For starters, there are two modes of recording, which we use all the time: Tape mode and Create New Take mode. [Note: terms in purple, though not links, will connect you via Reaper’s Help menu to the tool in question.]

Serafim: Tape mode is well-named: imagine you have a tape running, you make a mistake or you want to do it differently, so you rewind, press record, and it records over the previous take. And Create New Take mode (also known across DAWs as ‘comping’) enables you to record several versions of the same take, neatly stacked and represented with different color waveforms; you can go back later and choose your favorite take and then ‘glue’ it into the track, with a handy short-cut like ‘g’, which you can also set up in Reaper.

Sierra: We generally live in Tape mode and visit its counterpart, because, when I do make an error, we want to stop and take a second to reset [a luxury of the home studio]. This is going to be more time efficient than just leaving all these errors in, which would make it difficult for us to tap into the flow of what we’ve recorded. Thanks to this technique, even when editing doesn’t keep pace with recording (say, with each new chapter), we end up with more continuity of sound and storytelling as we move through a longer project.

Serafim: A few more favorite Reaper features: Via Layouts [in the Options Menu], you can customize the visual interface. For instance, you can make the meters on the channel as big as you want. If you’re recording alone, let’s say, and you’re keeping your laptop outside your booth [as recommended], you can make your meters as big as the entire screen, so you won’t need binoculars to see your levels.

Sierra: And you won’t compromise your larynx by jutting your head forward continuously as you strain to see better.
   Within Reaper, via Preferences, you can also set up Audition as an External Editor, which means that with the click of another shortcut [we chose ‘a’], you can shift a time selection over to Audition, clean it up as desired, then save and click back to Reaper. If it can’t be fixed via Audition, then we know to re-record it in the moment.

Serafim: A lot of programs allow you to make markers but Reaper takes it to the next level with its differentiation of markers by type [markers vs. regions], color, number, and name. It also features a tool called Region/Marker Manager, a little window that floats on your screen and enables you to jump easily from one marker to another. You can jump by color, say, across instances of a character’s voice in different scenes, or across repeat appearances of a word or name whose pronunciation you want to check later. It’s the best implementation of this particular feature that I’ve ever seen in a DAW.

Sierra: We sometimes like to edit away from the laptop, both because the listener probably won’t have the engineer’s headphones and because it’s nice to get away from the screen for a while. So we’ll process the time selection in question, a chapter, say, and upload it to a file sharing service (Overcast, in our case). I’ll pop in my AirPods Pro and then listen and jot notes on a post-it: Noisy breath, 3 minutes. Missing word, 5 minutes 17 seconds. When I go back to Reaper, I can put my cursor at the start of the track (as opposed to the overall project file), go to Project Settings and reset that start time so 0:00 appears where I need it. So no matter where I am in the overall project file, I can follow the ruler above and zip through my edits.
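If your DAW can't reset the start time this way, the same result is just addition: the chapter's position in the project plus the time on your post-it. Here is a tiny sketch of that arithmetic; the function names and the 1:12:40 chapter start are made up for illustration.

```python
def parse_timestamp(text):
    """Turn 'm:ss' or 'h:mm:ss' into a number of seconds."""
    seconds = 0
    for part in text.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds

def format_timestamp(seconds):
    """Turn a number of seconds back into 'h:mm:ss'."""
    hours, remainder = divmod(seconds, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours}:{minutes:02d}:{secs:02d}"

def project_position(note_time, chapter_start):
    """Where a note taken against the chapter file falls in the project."""
    total = parse_timestamp(chapter_start) + parse_timestamp(note_time)
    return format_timestamp(total)

# "Missing word, 5 minutes 17 seconds" in a chapter that starts
# 1:12:40 into the project (hypothetical start time):
print(project_position("5:17", "1:12:40"))  # prints 1:17:57
```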

Serafim: Reaper doesn’t have an incredibly sophisticated metering system, but it does have this plug-in called JS: Audio Statistics [located via FX Browser], which enables you to track your RMS levels and your noise level. We find Audition to be a better tool when it comes to meeting Loudness (ACX) standards, but the Audio Statistics plug-in will help you find and correct for dead silence. A recording is more often too noisy, but it can also be too quiet, sounding unnatural. ACX Check will ding your recording if you have even a second or two of dead silence, but it won’t tell you where those seconds are. That’s where Audio Statistics comes in—let’s say you left a little gap between two splices then glued the section, you’ll be able to see that by playing the file and watching the Audio Statistics window [RMS Window Min L showing a figure less than -90, e.g. -105]. At that point, you’ll also be able to see the flatline in the waveform.
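The idea behind that check can be sketched in a few lines: slide a short window along the file, measure the RMS inside each window, and note where it drops below -90 dB. This is our own illustration of the principle, not how the JS: Audio Statistics plug-in is implemented; the function name and the 0.1-second window length are assumptions.

```python
import math

def find_dead_silence(samples, sample_rate=44100, window_sec=0.1,
                      floor_db=-90.0):
    """Return (start_sec, end_sec) spans whose windowed RMS is below floor_db."""
    window = max(1, int(sample_rate * window_sec))
    spans = []
    start = None
    for i in range(0, len(samples), window):
        chunk = samples[i:i + window]
        rms = math.sqrt(sum(s * s for s in chunk) / len(chunk))
        level = 20 * math.log10(rms) if rms > 0 else float("-inf")
        t = i / sample_rate
        if level < floor_db:
            if start is None:
                start = t          # a silent stretch begins here
        elif start is not None:
            spans.append((start, t))  # the silent stretch just ended
            start = None
    if start is not None:
        spans.append((start, len(samples) / sample_rate))
    return spans

# Room tone at -60 dB with a half-second gap of digital silence,
# using a low sample rate to keep the example small.
clip = [0.001] * 1000 + [0.0] * 500 + [0.001] * 1000
print(find_dead_silence(clip, sample_rate=1000))  # prints [(1.0, 1.5)]
```

The timestamps it returns tell you where to paste room tone over the gap.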

Sierra: To wrap up, Reaper is attractive first because it’s inexpensive and easy to install. You can try it free for 60 days before paying $60 [as of May 2022] for a discounted use license, appropriate for most small businesses.

Serafim: And if you decide not to go with Reaper, you can still look for the tools that we’ve described, and their presence or absence in a different DAW might help you in your choice. If you are going forward with Reaper, I’ve found Kenny Gioia’s YouTube channel to be one of the most comprehensive tutorial resources. There’s also the Reaper Blog and Booth Junkie.

Sierra: If you have any questions or comments, please write us!

Why Neumann TLM 103

Serafim: We started our microphone search, informed both by your studying at Edge Studio and Gravy for the Brain and your experience in radio [as a freelance foreign correspondent in Beirut], as well as my years of experience with audio engineering, recording music primarily. If you happen to be in New York City, B&H has a small room with multiple microphones all connected, so you can speak (or sing) into them. [You can also order from them and try gear out in your own actual space :-)]

Sierra: Because of you, I walked into this room already knowing quite a bit about the kind of mic we needed—to start, a condenser mic with a cardioid polar pattern, right?

Serafim: Yes, cardioid means it has a heart shape—the bottom of the heart is pointing forward toward the speaker and the top of the heart blocks the sound that comes from behind the microphone—versus an omni-directional microphone that’s very good for recording when, let’s say, you want to capture the sound of a music hall. We ended up testing two microphones from AKG and two from Neumann, the 102 and 103.

Sierra: We went in knowing that the TLM 103 is the industry standard [in our price range, as opposed to the U87]. I had heard that from people in the voice over industry, and you were familiar with it because it is a multi-industry standard mic.
     But I do remember you telling me that the mic operates in concert with you the speaker. You, in essence, dance with it, so it’s really a question of who is going to be your best dance partner, not the universal industry-standard dance partner. That said, in our case, we did end up with the 103, because we liked the sound of it with my voice when we tested it against other comparable mics.

Serafim: Exactly. We liked it so much because it sounded the most like you. That said, you can spend an enormous amount of time chasing the microphone that sounds exactly like you. And once you have your mic and your booth, you’ll still need an FX chain: EQ, compression, saturation, and so on. The process of working all of that out took us a couple of years, during which we experimented with how to position both the speaker and the mic, right?

Sierra: Like many people, I think, I initially assumed that the mic should go right in front of your mouth and had to learn that you’re more likely to want about six inches distance in between. And it will probably serve you best if it’s off center, even slightly above you [if hung upside down like ours].

Sierra: That particular pop filter really curves around the mic so it takes up very little additional space. You can have a larger booth than ours and you’re still going to want to think about how you’re orienting yourself in that space right in front of the mic. Our set-up enables me to feel relaxed and confident, which is a huge part of anyone’s sound and perhaps an unsung part of the microphone experience. Also the FX chain, which is your domain.

Serafim: We use EQ to boost certain frequencies and reduce others, then we use a saturation plug-in, which adds a little analog filter to make the sound less brittle and digital sounding. And then we have a compressor that reduces the loudest sounds and then boosts everything; rather than getting a lot of dynamic range you’re arriving at something more consistent, easier on the ear.

Sierra: Now you’ve got me wondering: How does a microphone fit into a digital vs. analog system?

Serafim: So the mic is an analog component, but in order for it to communicate with the computer, it needs to go through a digital converter, i.e., an audio interface [RME Fireface UFX II, in our case]. When you’re recording in a vocal booth, you don’t have the natural reverberation of an ordinary room. What saturation does, and it’s gonna sound crazy, is add imperfections.

Sierra: So you have to add back in some of what you’ve taken out?

Serafim: Yes! That’s why vinyl records can sound more vibrant than digital, because vinyl has these imperfections. And we as humans tend to gravitate toward liking those imperfections. So we need saturation—and we call it saturation because it colors the sound. If you use saturation in Photoshop you’re adding extra depth to the color. We also talk about color in recording because otherwise it can be hard to verbally distinguish between different kinds of sound. With the right kind and amount of saturation, the recording stops being very pure and starts to sound more alive.

Why WhisperRoom

Serafim: The sound was good, it’s an excellent mic, but we realized fairly quickly that the recording wouldn’t meet ACX standards for background noise [-60 dB].

Sierra: We started out with this idea of, ‘Okay, we’re going to go to the experts.’ And the experts tended to say: ‘Eliminate as much noise as possible at the source. Really think about your studio build: You might need to open your studio walls, put insulation in.’ But we couldn’t just rip open the walls of our fifth-floor walkup so…

Serafim: Johnny Heller [Sierra’s coach via Edge Studio] has a WhisperRoom. We looked into it, then we bought our own right at the beginning of the pandemic. WhisperRoom is very transparent in calling their booths “sound isolation enclosures,” in other words, they reduce but do not eliminate noise [as typical of living room booths].

Serafim: Of course, just knowing we weren’t passing didn’t tell us how to pass. We needed to figure out how to use the software in a way that lowered our noise floor without degrading voice quality. We spent about a year experimenting before landing on our current set-up.
     You spend a lot more time in the booth, since I only do intros/outros. How comfortable do you find it?

Sierra: In the winter, the WhisperRoom is wonderfully cozy; in the summer, it’s hot. Since air conditioners are just too noisy for audiobook production, I know some voice actors swear by wet towels. Via an Audio Publishers Association webinar, we did hear about Polar Products, which sells vests fortified with special ice packs. We also added a thermometer to the booth and decided to stop working once it exceeded a certain temperature and/or reached a certain level of discomfort. Recording in the early morning helped.

Serafim: So to wrap up, some pros and cons: Overall, we’re happy with our booth! It’s very well made and helps us get a great sound. We feel confident saying that after nearly two years of use. It also looks good.

Sierra: Everyone asks about it! It’s a conversation starter.

Serafim: Do you think it’s affordable in comparison to other options?

Sierra: That’s tough to say. At more than $6,000 it’s not inexpensive in relative terms. But there aren’t that many options and it’s definitely cheaper than building your own custom booth. It’s more expensive than the perfect closet. 

Serafim: But that perfect closet wouldn’t have windows, whereas our booth has two, so you can look out and see me smiling at you.

Sierra: Most important pro!

Serafim: We sometimes wish we had bought the optional rollers, which would have meant we could move the WhisperRoom (without taking it apart first). Once we hired a mounting expert, we had more success keeping our microphone and iPad on the wall [via Triad Orbit]. I was a little concerned that screws and holes would compromise the integrity of the booth, but that doesn’t seem to have happened.

Sierra: As far as set-up goes, it’s worth mentioning that you can call WhisperRoom and a live, friendly human being will help you on the phone, which is amazing. 
    I do want to say something about the ventilation silencing system—like the booth itself, it reduces noise but doesn’t eliminate it, so it hasn’t worked for our purposes. The booth is modular so you can decide what pieces are essential to you. 

Serafim: For the reasons we’ve described above, we recommend WhisperRoom to fellow voice actors and authors planning frequent recordings. Feel free to write us with any specific questions.

Connect with Us