

Booth Basics

Why and how we do what we do.

Guide

Why Booth Basics

Serafim: Hello fellow audio producer! We are HiSierrafim Audio and this is Booth Basics. When Sierra and I were starting out in 2019, there was so much we didn’t know.

Sierra: As we sought feedback on our early efforts, it launched us on a series of quests with goals like “Passing Audiobook Submission Standards” and “Finding a Natural Sound” and “Getting Comfortable in the Booth/Studio.” Gradually, our way of working began to come together.

Serafim: In the spirit of camaraderie, we’d like to offer you a peek inside (and around) our vocal booth and share tips from our workflow. We want it to be easier for you because—well, why not?

Sierra: In this series of postcards from the booth, you can find more of what you need to know about some of the most popular and valuable tools (hardware and software) in wide use by audio professionals, tools we’re coming to know well. We link between postcards so you can easily jump from one to another to get an idea of how it all fits together and what purpose it serves: Audiobook Submission Standards, WhisperRoom (our recording booth), Neumann TLM 103 (our mic), Reaper, Adobe Audition, and iZotope RX (our software), with more to come! (FYI: We do not use affiliate links.) You only have a few minutes? Select from key topics via the golden Guide to your left.

Serafim: We aim to cut through some of the noise (pun intended!) and demonstrate how one narrator-engineer team—day to day, project to project—goes about recording, editing, and mastering audio. Not everything we do may work for you or your budget (especially right out of the gate), but we hope our postcards will help you to ask important questions (as others’ dispatches help us!) that will streamline your workflow and ease your way in the industry.

Sierra: As in this post, everything bolded in purple is an external link (except this one, ha). In addition, we put links that jump you within the postcards in blue, links to images within our social media in red, and we put non-linked terms in black to direct you to menus within an independent (software or other) program. We keep an eye out for any better way to do what we do (i.e. less expensive, more efficient, more elegant, etc.), so we update posts whenever necessary. If you have comments or questions about anything you see here or on our social media, please reach out. We’re all in this together after all.

Why Noise Floor, Peak, RMS

Sierra: So you want to put out an audiobook? Whatever platform you’re using for distribution (ACX, Findaway Voices, Author’s Republic, etc.), you’re going to need to pass certain submission requirements. On the way to doing so, you may have some questions. We have answers.

Serafim: There’s no better place to begin than these three key measurements: Noise floor, peak, and RMS. At this point, you may feel a little overwhelmed. That’s where Booth Basics come in—we want to help you customize your workflow, so you won’t have to abruptly change course in order to pass standards you neither understand nor appreciate.

Sierra: When you’re recording a whole book, it may be tempting to think of it as one unit, but you’ll need to apply standards at the chapter or file level (opening/closing credits, titles). Each of these smaller units will need a noise floor no higher than -60dB, a peak that doesn’t exceed -3dB, and an RMS that averages between -23 and -18dB.
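
A minimal Python sketch of the three measurements, for readers who like to see the arithmetic. It assumes samples are floats scaled so that 1.0 is full scale, and it treats the noise floor as the RMS of a stretch of room tone. The function names and default thresholds are hypothetical stand-ins (adjust them to your platform), and this is not how ACX Check or any particular DAW computes its figures.

```python
import math

def to_db(value):
    """Convert a linear amplitude (1.0 = full scale) to decibels."""
    return 20 * math.log10(value) if value > 0 else float("-inf")

def peak_db(samples):
    """Loudest single sample, in dB."""
    return to_db(max(abs(s) for s in samples))

def rms_db(samples):
    """Root mean square (average level) of the samples, in dB."""
    return to_db(math.sqrt(sum(s * s for s in samples) / len(samples)))

def check_file(speech, room_tone, peak_max=-3.0,
               rms_range=(-23.0, -18.0), floor_max=-60.0):
    """Compare one file's measurements against assumed submission limits."""
    peak = peak_db(speech)
    rms = rms_db(speech)
    floor = rms_db(room_tone)
    return {
        "peak_db": peak,
        "rms_db": rms,
        "noise_floor_db": floor,
        "peak_ok": peak <= peak_max,
        "rms_ok": rms_range[0] <= rms <= rms_range[1],
        "noise_floor_ok": floor <= floor_max,
    }
```

Run it once per chapter or credits file rather than once per book, since the limits apply to each submitted file.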

Serafim: You’ll also need a few seconds of room tone [standards vary] at the beginning and end of each submitted file.

Sierra: Make it easy for yourself: Record a few minutes of silence during a quiet moment, then set up a track with intro and outro silence that you can easily cut and paste into your projects—then it becomes part of your workflow, a box you check, nothing you need to think more about.

Sierra: As you either already know or will soon figure out, it’s incredibly time consuming to make an audiobook—fellow perfectionists, every chance you get to make something simpler, quicker, easier, just do it. You want to save your attention and energy for making the creative choices. On the way there—noise floor, what do we need to know?

Serafim: Your noise floor should be no higher than -60dB. That means you want to arrive at a value between -60 and -90dB. It also means that your recording won’t have a background hum and you won’t hear dogs barking or children laughing in the background, anything that could make it harder for listeners to hear and enjoy your work. As we discuss elsewhere, if you are recording in a closet or a sound-insulated but not sound-proof booth, you will need help from software to meet this requirement.

Sierra: Why is there a lower limit? What if your noise floor falls below -90?

Serafim: You don’t want that because you’re approaching digital silence. When noise floor values plunge, it suggests either that there is an editing error or that very heavy processing has been applied.

Sierra: If you listen to some older audiobooks, you’ll hear that the sound appears to fall off a cliff at each break.

Serafim: Here’s where EQ comes in. Equalization enables you to select a certain frequency range and boost or attenuate it. Think of a stereo with a knob that boosts or reduces bass; EQ is a digital knob that performs the same function. Most of the noise that you want to eliminate lives in the lower frequency range, so you can use a high-pass filter (one that lets higher frequencies through and cuts everything below a cutoff, 65Hz in our case) to carve away that segment without degrading your voice. The result is that your noise floor drops to an acceptable level.
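
To make the idea of a high-pass filter concrete, here is a rough Python sketch of the simplest kind, a first-order (one-pole) filter. It is an illustration only: the function name and the 44100 sample rate are assumptions, and the EQ in a real DAW uses steeper, more refined filters than this one.

```python
import math

def high_pass(samples, cutoff_hz=65.0, sample_rate=44100):
    """First-order high-pass: passes content above cutoff_hz, attenuates below it."""
    rc = 1.0 / (2 * math.pi * cutoff_hz)   # time constant for the cutoff
    dt = 1.0 / sample_rate                 # time between samples
    alpha = rc / (rc + dt)
    out = []
    prev_in = 0.0
    prev_out = 0.0
    for x in samples:
        # Slow changes (low frequencies) largely cancel out in (x - prev_in);
        # fast changes (higher frequencies) pass through nearly untouched.
        y = alpha * (prev_out + x - prev_in)
        out.append(y)
        prev_in, prev_out = x, y
    return out
```

A filter this gentle only tapers the rumble (roughly 6 dB per octave below the cutoff), which is why a dedicated EQ plug-in is the practical choice.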

Sierra: What about peak?

Serafim: Peak refers to the maximum level reached by a signal in a recording. When recording, if you set your microphone gain too high, the signal will cross the 0dB threshold and the sound will distort (clip). For safe file exporting, a maximum peak setting lets you set a ceiling for your sound. In general, distribution platforms will want some headroom, so you’ll set your peak to -3dB or slightly lower, allowing supervising engineers room to apply additional processing. [Note: In some instances, you may need to set an even lower peak value if you find that it’s jumping up once you apply your FX processing chain.]

Sierra: So in a nutshell, noise floor helps you avoid a noisy recording and peak helps you avoid a distorted recording?

Serafim: Exactly.

Serafim: Peak and RMS are closely related (as distinct from noise floor); in order to master both, you’ll need to use compression. (LUFS is the modern equivalent of RMS, but the audiobook industry appears to be sticking with RMS.) Root mean square (RMS) measures average loudness across a file; it matters because you want your listeners to be able to hear the full spectrum of emotions—from dead calm to wildly excited—without their needing to constantly fiddle with their volume, like a classical music fan does when taking in a symphony.

Sierra: You’ve got to have that remote in your hand! Not so audiobook listeners, thanks to RMS.

Serafim: ACX and other platforms ask that your recording’s RMS fall between -23 and -18dB. If you’re anything like us, your initial average will probably come in below that range [meaning your sound is too quiet]. That’s natural.

Serafim: First, compression shaves off the highest peaks; then your software of choice pushes this compressed span up into a loudness sweet spot. We experimented with different software and arrived at Adobe Audition, because it allows the user to set peak and RMS values in a way that renders submission-ready files, no further manipulation necessary.
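
The two steps just described (shave the peaks, then lift everything) can be sketched in a few lines of Python. This is a deliberately simplified, static compressor with hypothetical names and settings; real compressors follow the signal's envelope with attack and release times, and this is not a description of what Audition does internally.

```python
import math

def compress(samples, threshold=0.25, ratio=4.0):
    """Shave peaks: the part of each sample above threshold is divided by ratio."""
    out = []
    for s in samples:
        mag = abs(s)
        if mag > threshold:
            mag = threshold + (mag - threshold) / ratio
        out.append(math.copysign(mag, s))
    return out

def match_rms(samples, target_db=-20.0):
    """Apply one overall gain so the average (RMS) level lands on target_db."""
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    gain = 10 ** (target_db / 20) / rms
    return [s * gain for s in samples]
```

Because the second step raises everything, including what is left of the peaks, it is worth re-checking the peak level afterward.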

Sierra: As already implied, when first confronted with the challenge of RMS, we did what many newbie producers probably do and raised our gain—oooh, it sounds so much louder, more present, better! As you too may discover, the more you raise the gain, the more everything gets louder, so you ping pong back and forth between passing RMS but failing noise floor and vice versa. If that’s where you’ve landed, why you’re here, seeking harmony, well . . .
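
The ping-pong effect is plain arithmetic: gain adds the same number of decibels to everything. A tiny Python sketch, using made-up starting levels purely for illustration:

```python
def gain_needed(current_rms_db, target_rms_db):
    """How many dB of gain it takes to move the RMS onto the target."""
    return target_rms_db - current_rms_db

def after_gain(levels_db, gain_db):
    """Gain shifts every measurement by the same amount, noise included."""
    return {name: level + gain_db for name, level in levels_db.items()}

# Hypothetical raw take: too quiet, but with a passing noise floor.
before = {"rms": -28.0, "noise_floor": -62.0}
boost = gain_needed(before["rms"], -22.0)   # 6.0 dB
after = after_gain(before, boost)
# after["rms"] is now -22.0 (passes), but after["noise_floor"] is -56.0 (fails)
```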

Serafim: You need compression! To recap, unless you’re recording in outer space, your noise floor will be higher than -60dB, so you will use EQ to create a high-pass filter. Think of it like releasing those booster rockets. For peak and RMS, you can rely on your software’s compression tools. At some point, though, you may want more surgical precision. For example, you can further experiment with EQ settings to help your recording sound more like you, an idea we touch on in our mic postcard and will inevitably return to.

Sierra: It’s worth noting that passing audiobook standards alone will not make you sound the most like you. With a DAW like Audition, however, you can do more. [Audition requires a subscription, but it isn’t the only option, just the one we use.]

Serafim: We don’t anticipate that everyone reading these postcards will want to make the same exact choices, but we hope you’ll go away with a better understanding of how these different tools and values fit together. We also hope that you can now appreciate how seemingly nitpicky standards meaningfully shape the listening experience. It can get frustrating but meeting these standards is an important milestone on the road to great sound. Once you reach this milestone, well, it’s just one peak and you might already be thinking about the next mountain to climb. And that’s why there are more postcards from the booth!

Why WhisperRoom

Sierra: Before we even thought of getting a booth, we bought our mic [Neumann TLM 103]. We attached it via a boom arm to our bookshelf right next to the window, and we started recording!

Serafim: The sound was good (it’s an excellent mic), but then we discovered that the recording wouldn’t meet ACX standards for background noise [-60dB].

Sierra: We started out with this idea of, ‘Okay, we’re going to go to the experts.’ And the experts tended to say: ‘Eliminate as much noise as possible at the source. Really think about your studio build: You might need to open your studio walls, put insulation in.’ But we couldn’t just rip open the walls of our fifth-floor walkup so . . .

Serafim: Johnny Heller [Sierra’s coach via Edge Studio] has a WhisperRoom. We looked into it, then we bought our own right at the beginning of the pandemic. It came in pieces on a big truck and we put it together ourselves. WhisperRoom is very transparent in calling their booths “sound isolation enclosures”; in other words, they reduce but do not eliminate noise [as is typical of living room booths].

Sierra: So we needed a booth and we needed audio software. Tip: Even before we submitted our first book, we knew we weren’t passing ACX standards because of ACX Check, a free plug-in for the Audacity DAW. [We use Reaper for recording.] We find ACX Check more comprehensive than Audible’s also free Audio Lab.

Serafim: Of course, just knowing we weren’t passing standards didn’t tell us how to pass. We needed to figure out how to use the software in a way that lowered our noise floor without degrading voice quality. We spent about a year experimenting before landing on our current set-up.

Serafim: You spend a lot more time in the booth, since I only do intros/outros. How comfortable do you find it?

Sierra: In the winter, the WhisperRoom is wonderfully cozy; in the summer, it’s hot [and humid here in NY!]. Since air conditioners are just too noisy for audiobook production, I know some voice actors swear by wet towels. Via an Audio Publishers Association webinar, we heard about Polar Products, which sells vests fortified with special ice packs. We also added a thermometer to the booth and decided to stop working once it exceeded a certain temperature and/or we reached a certain level of discomfort. Recording in the early morning also helped.

Serafim: So to wrap up, some pros and cons: Overall, we’re happy with our booth! It’s very well made and helps us get a great sound. We feel confident saying that after nearly two years of use. It also looks good.

Sierra: Everyone asks about it! It’s a conversation starter.

Serafim: Do you think it’s affordable in comparison to other options?

Sierra: That’s tough to say. At more than $6,000 it’s not inexpensive in relative terms. But there aren’t that many options and it’s definitely cheaper than building your own custom booth. It’s more expensive than the perfect closet.

Serafim: But that perfect closet wouldn’t have windows, whereas our booth has two, so you can look out and see me smiling at you.

Sierra: Most important pro!

Serafim: We sometimes wish we had bought the optional rollers, which would have meant we could move the WhisperRoom (without taking it apart first). Once we hired a mounting expert [read: TaskRabbit], we had more success keeping our microphone and iPad on the wall [via Triad Orbit]. I was a little concerned that screws and holes would compromise the integrity of the booth, but that doesn’t seem to have happened.

Sierra: As far as set-up goes, it’s worth mentioning that you can call WhisperRoom and a live, friendly human being will help you on the phone, which is amazing.

Sierra: I do want to say something about the ventilation silencing system—like the booth itself, it reduces noise but doesn’t eliminate it, so it hasn’t worked for our purposes. The booth is modular so you can decide what pieces are essential to you.

Serafim: For the reasons we’ve described above, we recommend WhisperRoom to fellow voice actors and authors planning frequent recordings. Feel free to write us with any specific questions.

Why Neumann TLM 103

Serafim: We started our microphone search, informed both by your studying at Edge Studio and Gravy for the Brain and your experience in radio [as a freelance foreign correspondent in Beirut], as well as my years of experience with audio engineering, recording music primarily. If you happen to be in New York City, B&H has a small room with multiple microphones all connected, so you can speak (or sing) into them. [You can also order from them and try gear out in your own actual space :-)]

Sierra: Because of you, I walked into this mic room already knowing quite a bit about the kind we needed—to start, a condenser mic with a cardioid polar pattern, right?

Serafim: Yes, cardioid means it has a heart shape: the bottom of the heart is pointing forward toward the speaker and the top of the heart blocks the sound that comes from behind the microphone. Compare that with an omni-directional microphone, which is very good for recording when, let’s say, you want to capture the sound of a music hall. The mic is an analog component, but in order for it to communicate with the computer, it needs to go through a USB audio interface [RME Fireface UFX II, in our case]. That day at B&H, we ended up testing two mics from AKG and two from Neumann, the 102 and 103.

Sierra: We went in knowing that the TLM 103 is the industry standard [in our price range, as opposed to the U87]. I had heard that from people in the voice over industry, and you were familiar with it because it is a multi-industry standard mic.

Sierra: But I do remember you telling me that the mic operates in concert with you the speaker. You, in essence, dance with it, so it’s really a question of who is going to be your best dance partner, not the universal industry-standard dance partner. That said, in our case, we did end up with the 103, because we liked the sound of it with my voice when we tested it against other comparable mics.

Serafim: Exactly. We liked it so much because it sounded the most like you. That said, you can spend an enormous amount of time chasing the microphone that sounds exactly like you. And once you have your mic and your booth, you’ll still need an FX chain: EQ, compression, and so on. The process of working all of that out took us a couple of years, during which we experimented with how to position both the speaker and the mic, right?

Sierra: Like many people, I think, I initially assumed that the mic should go right in front of your mouth and had to learn that you’re more likely to want about six inches of distance between the two. And it will probably serve you best if it’s off center, even slightly above you [if hung upside down like ours].

Serafim: We ended up mounting the mic on the wall of our WhisperRoom using the Triad Orbit system. We also tried out a few different pop filters before landing on the Avantone PS1 Pro-Shield Studio pop filter, also from B&H.

Sierra: That particular pop filter really curves around the mic so it takes up very little additional space. You can have a larger booth than ours and you’re still going to want to think about how you’re orienting yourself in that space right in front of the mic. Our set-up enables me to feel relaxed and confident, which is a huge part of anyone’s sound and perhaps an unsung part of the microphone experience. Also the FX chain, which is mostly your domain.

Serafim: We use EQ to amplify certain frequencies and reduce others. We have a compressor that reduces the loudest sounds and then boosts everything; rather than getting a lot of dynamic range you’re arriving at something more consistent, easier on the ear.

Why Reaper

Sierra: Reaper is one of our most valued tools, but to unlock its potential, you first need to understand this concept of a DAW. Passing the mic to our resident sound engineer: What is a DAW and why do you need one if you are an audiobook narrator and/or all-around voice artist?

Serafim: Okay, so let me put my white coat and glasses on—DAW stands for a digital audio workstation. It’s a piece of software that marries all of your audio tools and enables you to edit your recorded material very conveniently on a timeline. It’s like Microsoft Word: there are simpler programs for typing, but Word is going to give you all these tools to organize text very neatly into sections and paragraphs.

Sierra: Within our workflow, Reaper interacts with other tools, some of which are, in their own right, DAWs, specifically: Adobe Audition [subscription required] and Audacity [free].

Serafim: Now people might be wondering, ‘Wow, so I need more than one DAW?’ You don’t actually need more than one, or rather, you’re not necessarily going to use them as DAWs per se. For instance, we use Audacity pretty much exclusively for ACX Check, which we already talked about here. It just happens that Audacity is also a DAW. [Reminder: terms in blue will jump you within the postcards.]

Serafim: We talk more about Audition in another Booth Basic; here, let us just say that it’s an asset when it comes to razor-fine editing tasks, e.g., removing noise that your filters don’t catch. Reaper excels as a DAW not because it’s “better” than Audition in absolute terms (though it might be better than Audacity) but because it’s inexpensive, infinitely customizable, and constantly evolving.

Sierra: There’s an update practically every day, which got to be a little much until we put it in the calendar for a once-per-week download. But over time those are improvements that we want.

Sierra: Let’s get a little more specific. The features we’ll be discussing are mostly not exclusive to Reaper, but we like how they work in Reaper. That’s why it’s our DAW of choice. If you’re considering Reaper or already working with it, we want to share some tips. For starters, there are two modes of recording, which we use all the time: Tape mode and Create New Take mode. [Reminder: terms in black, though not links, will connect you via Reaper’s Help menu to the tool in question.]

Serafim: Tape mode is well-named: imagine you have a tape running, you make a mistake or you want to do it differently, so you rewind, press record, and it records over the previous take. And Create New Take mode (also known across DAWs as ‘comping’) enables you to record several versions of the same take, neatly stacked and represented with different color waveforms; you can go back later and choose your favorite take and then ‘glue’ it into the track, with a handy short-cut like ‘g’, which you can also set up in Reaper.

Sierra: We generally live in Tape mode and visit its counterpart, because, when I do make an error, we want to stop and take a second to reset [a luxury of the home studio]. For us, this is more time efficient than just leaving all these errors in, which would make it difficult for us to tap into the flow of what we’ve recorded. Thanks to this technique, even when editing doesn’t keep pace with recording (say, with each new chapter), we end up with more continuity of sound and storytelling as we move through a longer project.

Serafim: A few more favorite Reaper features: Via Layouts [in the Options Menu], you can customize (video) the visual interface. For instance, you can make the meters on the channel as big as you want. If you’re recording alone, let’s say, and you’re keeping your laptop outside your booth [as generally recommended], you can make your meters as big as the entire screen, so you won’t need binoculars to see your levels.

Sierra: And you won’t compromise your larynx by jutting your head forward continuously as you strain to see better.

Within Reaper, via Preferences, you can also set up Audition as an External Editor, which means that with the click of another shortcut [we chose ‘a’], you can shift a time selection over to Audition, clean it up as desired, then save and click back to Reaper. If it can’t be fixed via Audition, then we know to re-record it in the moment.

Serafim: A lot of programs allow you to make markers but Reaper takes it to the next level with its differentiation of markers by type [markers vs. regions], color, number, and name. It also features a tool called Region/Marker Manager, a little window that floats on your screen and enables you to jump easily from one marker to another, by color, say, across instances of a character’s voice in different scenes or repeat appearance of a word or name, when you want to check the pronunciation later. It’s the best implementation of this particular feature that I’ve ever seen in a DAW.

Sierra: We sometimes like to proof away from the laptop on earbuds, because it’s nice to get away from the screen and because the listener likely won’t have the engineer’s fancy headphones. So we’ll process the time selection in question, a chapter, say, and upload it to a file sharing service (Overcast, in our case). I’ll pop in my AirPods Pro and then listen and jot notes on a post-it: Noisy breath, 3 minutes. Missing word, 5 minutes 17 seconds. When I go back to Reaper, I can put my cursor at the start of the track (as opposed to the overall project file), go to Project Settings and reset that start time so 0:00 appears where I need it. So no matter where I am in the overall project file, I can follow the ruler above and zip through my edits.
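
Resetting the start time saves you from doing timestamp math by hand. For anyone curious about what that math looks like, here is a small Python sketch; the function names and the example chapter position are hypothetical, and this is not a Reaper feature or script.

```python
def to_seconds(stamp):
    """Parse 'M:SS' or 'H:MM:SS' into a number of seconds."""
    total = 0
    for part in stamp.split(":"):
        total = total * 60 + int(part)
    return total

def format_stamp(seconds):
    """Format a number of seconds as H:MM:SS."""
    hours, rem = divmod(seconds, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours}:{minutes:02d}:{secs:02d}"

def project_position(note_stamp, chapter_start_stamp):
    """Where a note taken against the chapter lands in the overall project."""
    return format_stamp(to_seconds(chapter_start_stamp) + to_seconds(note_stamp))

# A "missing word, 5:17" note for a chapter that (hypothetically) starts at 1:42:30
print(project_position("5:17", "1:42:30"))  # 1:47:47
```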

Serafim: Reaper doesn’t have an incredibly sophisticated metering system, but it does have this plug-in called JS: Audio Statistics [located via FX Browser], which enables you to track your RMS levels and your noise level. We find Audition to be a better tool when it comes to meeting Loudness standards, but the Audio Statistics plug-in will help you find and correct for dead silence. A recording can be too noisy, but it can also be too quiet, which sounds unnatural. ACX Check will ding your recording if you have even a second or two of dead silence, but it won’t tell you where those seconds are. That’s where Audio Statistics comes in: let’s say you left a little gap between two splices, then glued the section. You’ll be able to see that by playing the file and watching the Audio Statistics window [RMS Window Min L showing a figure less than -90, e.g. -105]. At that point, you’ll also be able to see the flatline in the waveform.
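
The same idea (watch a short-window RMS and flag anything below -90) can be sketched in Python to report where the dead silence sits. The function name, window length, and threshold default are assumptions for illustration; this is not the JS: Audio Statistics plug-in's code.

```python
import math

def find_dead_silence(samples, sample_rate=44100, window_s=0.1, threshold_db=-90.0):
    """Return (start_s, end_s) spans whose windowed RMS falls below threshold_db."""
    window = max(1, round(sample_rate * window_s))
    spans = []
    start = None
    for i in range(0, len(samples), window):
        chunk = samples[i:i + window]
        rms = math.sqrt(sum(s * s for s in chunk) / len(chunk))
        level = 20 * math.log10(rms) if rms > 0 else float("-inf")
        t = i / sample_rate
        if level < threshold_db:
            if start is None:
                start = t              # a silent stretch begins here
        elif start is not None:
            spans.append((start, t))   # the silent stretch just ended
            start = None
    if start is not None:              # file ends while still silent
        spans.append((start, len(samples) / sample_rate))
    return spans
```

Each span it returns is a candidate spot for pasting in room tone.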

Sierra: To wrap up, Reaper is attractive because it’s inexpensive and easy to install. You can try it free for 60 days before paying $60 [as of May 2022] for a discounted use license, appropriate for most small businesses.

Serafim: And if you decide not to go with Reaper, you can still look for the tools that we’ve described, and their presence or absence in a different DAW might help you in your choice. If you are going forward with Reaper, I’ve found Kenny Gioia’s YouTube channel to be one of the most comprehensive tutorial resources. There’s also the Reaper Blog and Booth Junkie.

Why Adobe Audition

Sierra: In another Booth Basic, we named Reaper as our DAW of choice, but we have to give some more attention to Adobe Audition. As we’ve already covered, Audition is a DAW, you can record in it, and it will help you pass Audiobook Submission (aka Loudness) standards, which is why we love it. You can also combine it with a more customizable DAW like Reaper, in which case Audition plays one or both of two main roles in your workflow . . .

Serafim: We use Audition as an external editor within Reaper for more precise editing, [Simple Editing]—the external editor feature makes it easy to jump a snippet of sound in between Reaper and Audition via a customizable shortcut. If you set up Audition as an external editor within Reaper [under Preferences, scroll down to external editor and select Audition], you have the option of specifying: Open item copies in primary external editor [under the Actions menu]. That allows for non-destructive editing, which both Reaper and Adobe products make a priority.

Serafim: We also use Audition in its stand-alone mode to meet Loudness standards. With the click of a button [the aptly named ITU-R BS.1770-3 Loudness setting, listed under Match to], we can target RMS and peak levels for any file [having, at an earlier stage, reduced our noise floor as necessary with the help of EQ].

Sierra: So in goes a file with poor RMS, out comes a submission-ready track that passes ACX and other platform standards. Eureka! At which point, you might understandably ask: Are we done now?

Serafim: Not quite. When you successfully match platform standards, your volume gets boosted, along with noises made by the voice actor, whether mouth clicks, breaths, or rustles. Fortunately, Audition has an especially user-friendly form of what’s called spectral view.

Sierra: I’d previously thought of this as a specialist’s tool, the kind used by ornithologists or scientists, not by voice actors intent on basic sound engineering.

Serafim: In spectral view, sound is displayed by frequency over time, with color showing how strong each frequency is, so ornithologists do use it to analyze bird sounds; you can see the beginning, middle, and end of a sound and the gaps between words or phrases, where there shouldn’t necessarily be any color/sound. Audition also provides you with a spot healing brush tool, which enables you to touch up your audio in the same subtle way you would a photograph in Adobe Photoshop or another graphics program. Only you’re going to be listening and looking.

Sierra: Before we started using the spectral view, we did a lot of zooming and slicing in Reaper to try and clean up noise, and sometimes it worked and we felt good, but a lot of times we just had to go back and spend even more time re-recording. Not anymore. (Hat tip to George the Tech, a great resource for all things audio engineering, for pointing us in this direction.)

Serafim: There’s no equivalent in Reaper to Adobe’s spot healing tools. We prefer the rectangular Marquee Selection Tool: Click E to activate the tool; select (your best guess at) the noise; then hit Command/Control U to “heal.” You may need to try/try again to land on the right selection.

Serafim: When you work in Audition, you are preserving the integrity of the sound, excepting the intrusion. It is targeted removal.

Sierra: If you decide to use more than one DAW, you’re going to want to set up your FX chain in both. Then, when you select a segment that needs work, click ‘a’ (or your shortcut of choice), and jump that sound to Audition, you’ll be able to pinpoint the noise that sent you there. [Note: the same FX rack may vary slightly between programs.] Then you heal, hit save, and click back to Reaper.

Serafim: Spot healing, however, has its limits. You could surgically remove every single click or noise, but for a 10-hour book, that would require an enormous time commitment.

Sierra: Even for a 20-minute book.

Serafim: That’s where iZotope RX comes in!

Why iZotope RX

Serafim: As you get more comfortable producing audiobooks, your ear will become more attuned to imperfections exacerbated by ACX’s and other platforms’ Loudness requirements. You’ll become more aware of producing mouth clicks or rogue breaths—which is good news. It means you’re a human being. Bodies make noise. You don’t need to freak out, but you do need a workflow that will give you a no-stress way to address these issues.

Sierra: So, along those lines, what is iZotope RX and why is it essential?

Serafim: iZotope RX is one of the most prominent players in the game of audio restoration. It’s a complex suite of tools that address all sorts of audio issues, including noise, interference, wind, and crackles. People use it to digitize vinyl records.

Sierra: So it’s a remastering tool?

Serafim: Yes. And for voice actors, in particular, there are two convenient iZotope RX modules [only available in iZotope RX Standard] that can act as plug-ins within your DAW: Mouth Declick and Breath Control.

Serafim: Mouth Declick (as opposed to the more generic Declick) applies a sophisticated algorithm to determine what is and is not a mouth click. [Note: First, you’ll have made your best effort to distinguish between mouth clicks you can control (e.g., lip smacks) and those you often can’t (a veritable symphony of crinkles).]

Sierra: To hear mouth clicks in the wild, just turn on C-Span and listen to a congressperson speaking into a microphone. Or pay close attention to your favorite television show whenever music isn’t playing in the background. In our daily lives, we usually just tune these noises out. But they’re louder and more prominent in a recording, so what can we do?

Serafim: Ideally, this iZotope module scans your file, locates especially egregious pops and clicks, and wipes them away without taking a bite out of normal articulation.

Serafim: Whether or not you choose to seek out more in-depth instruction, it’s worth taking some time to just play with the settings. For example, there are three key adjustable settings in Mouth Declick: sensitivity, frequency skew, and click widening. To see what they do, feed in an audio sample, crank the values all the way up, then listen. (You have the option of clicking a checkbox and hearing output clicks only, so you know exactly what’s being eliminated.)

Sierra: At the most extreme settings, the module’s probably stripping away sounds you want to keep, right?

Serafim: Exactly. Like consonants.

Serafim: What’s most important is for you to develop an intuitive feel of what this or any program can do, first at its most extreme. Then you start to dial it down until you find your FX processing sweet spot.

Sierra: That’s when processing does what you want it to do but you still sound like yourself. So it’s a sweet spot that reconciles platform standards with your own.

Serafim: ACX or another platform isn’t necessarily going to flag you for having too many mouth clicks.

Sierra: Though listeners might do so in their reviews. People have become quite sensitive about mouth clicks in the audiobook world, less so, it seems, when it comes to breaths, which we’ll get to in a bit.

Serafim: That’s why you want to seek feedback, ideally from listeners who care at least a little bit about audiobook sound. You’re going in with a specific question: Does this FX processing sound too heavy, not heavy enough, or just right? You might consider rendering several different versions and then setting up a blind test so as to get the most unbiased observations.
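[Note: One way to set up such a blind test is to hide each rendered version behind a neutral label before sharing it. The sketch below is only an illustration. The file names and the `make_blind_test` helper are hypothetical, and it assumes you have already rendered the versions yourself.]

```python
import random


def make_blind_test(versions, seed=None):
    """Shuffle rendered versions and hide them behind neutral labels.

    Returns a dict mapping each label ("Sample A", "Sample B", ...)
    to the real file name. Keep this key private until feedback is in.
    """
    rng = random.Random(seed)
    shuffled = list(versions)  # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    labels = [f"Sample {chr(ord('A') + i)}" for i in range(len(shuffled))]
    return dict(zip(labels, shuffled))


# Hypothetical renders of the same passage with different FX settings.
renders = ["chapter1_light.wav", "chapter1_medium.wav", "chapter1_heavy.wav"]
key = make_blind_test(renders, seed=7)

for label in key:
    print(label)  # share only the labels with your listeners
```

Listeners see "Sample A," "Sample B," and so on. You consult the key only after they have told you which one sounds just right.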

Sierra: We also like to audition our tracks with different headphones. Apple AirPods Pro have turned out to be a key part of our workflow, because they tend to favor higher frequencies, like many popular daily-use earbud headphones.

Serafim: You may hear more mouth noises in AirPods than you would hear in really nice professional headphones.

Sierra: Kind of a good thing, right? That enables you to be confident, once you’ve fine-tuned, that you’re giving the listener the experience that you want them to have and that you want to have when you’re listening to an audiobook. That also means you want to be confident in the tools you use, so you’re not reduced to cranking up the volume to an extreme. Your ears are, after all, your most important technology.

Sierra: What about Breath Control?

Serafim: Your breathing is another key element in your own suite of expressive tools. You don’t necessarily want to eliminate every breath or even any breath. In some projects, on some days, you may find some breaths distracting. IZotope’s Breath Control module enables you to attenuate breaths across a project as desired. If you do crank up the Breath Control settings to their extreme, it’s probably going to sound unnatural.

Sierra: Even if it doesn’t, you may find that you’ve introduced a glitch into your audio. We discovered, when producing one particular audiobook, in listening to the entire file (which we are wont to do, often more than once), that the Breath Control was out of control: it had begun to interpret certain letters as breaths, creating little dips in volume throughout the file.
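[Note: To picture what attenuating a breath means in numbers, here is a toy sketch. It is not IZotope’s algorithm. It assumes the breath has already been located (the `start` and `end` positions are made up), and it only shows how a reduction in decibels turns into a smaller amplitude. If a consonant is mistaken for a breath, the same math produces the kind of volume dip described above.]

```python
def db_to_factor(db):
    """Convert a gain change in decibels to a linear amplitude factor."""
    return 10 ** (db / 20)


def attenuate(samples, start, end, reduction_db):
    """Turn down samples[start:end] by reduction_db; leave the rest alone."""
    factor = db_to_factor(-abs(reduction_db))
    return [
        s * factor if start <= i < end else s
        for i, s in enumerate(samples)
    ]


# Made-up amplitudes: speech at 0.5, a "breath" at 0.2 in positions 2 and 3.
samples = [0.5, 0.5, 0.2, 0.2, 0.5]
softened = attenuate(samples, start=2, end=4, reduction_db=6.0)
print(softened)
```

A 6 dB reduction roughly halves the amplitude, so the breath is still there, only quieter. Cranking the reduction much higher is what starts to sound unnatural.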

Serafim: Again, it’s finding the sweet spot, so you don’t have to micromanage.

Sierra: It’s definitely a tradeoff, becoming sensitive to otherwise minute noises: once you’ve heard them, you can’t go back, and then you need to do something about them.

Serafim: We started with the technology for a reason. There are lifestyle choices a voice actor can make to prepare.

Sierra: The first thing everyone’s going to tell you, great advice, is hydration. Just as we’ve been advising you to experiment with your plug-in settings, you’ll need to discover, with time and trial, just how much water you need to drink and when, based on what feels good. You may already drink enough, so don’t go crazy! Some voice actors swear off dairy products, others don’t like the after-effects of toothpaste. Something that bothers one person may have no effect on you at all. There is no average voice actor and no one-size-fits-all solution.

Serafim: What about breathing?

Sierra: I like that there are competing schools of thought on breath. Some voice actors have developed a regular pattern of breath, like the tide going in and out. I happen to be someone who breathes very quietly. We did find ourselves using a bit of Breath Control during an especially hot summer when we had no AC. Good lesson: The environment of the booth will influence the environment of your mouth and the sounds you make. To the extent that you can make yourself comfortable in your booth, that’s going to have an impact. There’s also the iron law of body chemistry; you can’t control day-to-day fluctuations, but you can become more aware, say, when you need to take a drink, a snack, or a break.

Serafim: There’s only so much you can do to optimize your voice. You do your job and then let the engineer and the technology do theirs.

Why RME Interface

Sierra: What is an interface and why do I need one?

Serafim: You need a USB interface to connect and power your mic and help it communicate with your computer and your headphones.

Sierra: It’s called a USB interface because it connects to the computer via USB, but it connects with the mic via XLR cable, right?

Serafim: Yes. More specifically, the interface is both a pre-amp and a converter: an ADC [analog-to-digital converter] for the signal coming in from the mic and a DAC [digital-to-analog converter] for the signal going out to your headphones. As a pre-amp it enables you to power your [condenser] microphone. It takes a very soft signal coming from the mic and amplifies it to what we call line level, the standard working level for audio gear. When you connect a mic, computer, and headphones to the interface, it’s converting back and forth between analog and digital in order to transmit information from one to the other so you can record sound and play it back.
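[Note: To get a feel for how much work the pre-amp does, here is a rough worked example. The mic level of -50 dBu is an assumed, illustrative figure, not a measurement of any particular mic. Professional line level is nominally +4 dBu, and 0 dBu corresponds to about 0.775 volts.]

```python
DBU_REF_VOLTS = 0.775  # 0 dBu is roughly 0.775 V RMS


def dbu_to_volts(dbu):
    """Convert a level in dBu to an approximate RMS voltage."""
    return DBU_REF_VOLTS * 10 ** (dbu / 20)


def gain_needed_db(mic_dbu, line_dbu=4.0):
    """Gain (in dB) needed to raise a mic-level signal to line level."""
    return line_dbu - mic_dbu


mic_level = -50.0  # assumed mic output level in dBu (illustrative only)
gain = gain_needed_db(mic_level)
voltage_ratio = 10 ** (gain / 20)

print(f"Gain needed: {gain:.0f} dB")
print(f"Voltage multiplied by about {voltage_ratio:.0f}x")
print(f"Mic signal: about {dbu_to_volts(mic_level) * 1000:.1f} mV")
print(f"Line level: about {dbu_to_volts(4.0):.2f} V")
```

Under these assumptions the pre-amp multiplies the mic’s voltage several hundred times over, which is why its quality shows up in the recording.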

Sierra: So, it sounds like one of those tools you absolutely need?

Serafim: Definitely. We started out with RME’s Babyface, having both had previous experience with Focusrite’s Scarlett.

Sierra: We both liked the Scarlett well enough, but you didn’t think that was the best choice for us for audio narration. Why is that?

Serafim: The Scarlett is great value. In the pro audio world, however, professionals favor RME, due to its quality: transparent preamps and excellent sound characteristics. The difference, then, between Scarlett and Babyface lies in the quality of audio capture from the microphone. If you have a really good mic, you want your interface to have a comparable quality, because the two build on each other.

Sierra: You’ve also said that the difference between the Babyface and the Scarlett isn’t that dramatic, especially given that the Babyface is considerably more expensive.

Serafim: That’s true, so why would you upgrade? Consider that your computer has a built-in DAC, but it’s extremely low quality. It’s just not a high priority for the laptop manufacturer to invest a lot in developing the best possible DAC, given the kind of headphones the average consumer is using, not to mention the average USB mic.

Serafim: When you buy a Scarlett, you are getting a huge jump in quality from that built-in DAC. When you jump from Scarlett to the Babyface, the gap in quality won’t be as vast as between the computer and the Scarlett. But it’s still noticeable. I can hear the difference. That said, the Scarlett might be the best choice for you given your set-up. An interface can only convey what your mic picks up in the first place.

Sierra: In a future Booth Basics, we’ll talk about what it means to work as a team so far as technology’s concerned. Let us briefly mention here that we started out with the Babyface but quickly moved over to the Fireface UFX II, a more sophisticated version of the Babyface.

Serafim: I already owned the Fireface since I used it for recording instrumental music. It has four microphone inputs. For solo voice-over needs, it’s overkill. So why are we mentioning it? Two people can use the Babyface at the same time, but if you adjust the volume for one person, the other person has to have the exact same volume. The advantage of the Fireface UFX II is that it has two independent headphone amplifiers, each with a dedicated volume control.

Sierra: That’s especially important to us because we use such different headphones, the topic of another future Booth Basic! Is there any workaround for voice artists with a Babyface, Scarlett, or another interface, collaborating on a temporary basis?

Serafim: You could technically get a separate headphone amplifier to connect to your Babyface, but it means adding an extra piece of technology and additional complexity. When we were producing a memoir narrated by the author, we had to go with an option for three headphones. As you may already have guessed, the Fireface also happens to have even higher quality pre-amps, yielding higher quality audio capture.

Sierra: We all want to sound as good as possible [i.e. like ourselves]. If someone only has enough to invest in a good mic or a good interface, which should they buy?

Serafim: Apart from your computer, your microphone is your most important tool. In an ideal world, though, you’d buy everything at once, all on a comparable level, whether budget-conscious or professional picks. Whatever you buy, try it out, do some blind tests. You deserve to be happy with your sound. If you’re seeking additional guidance, feel free to reach out! We’ll be back with another Booth Basic next month.
