Why ElevenLabs’ Audio Tags Are a Big Deal for AAC

A phone mockup shows text entered into the Spoken AAC app: “[Excited] Wow, that's awesome”. Floating around the phone are other bracketed words, including [laughing], [disbelief], [quietly], [cheerful], [gasps], and [shouting]. Two orange arrows point from the clusters of bracketed words toward the text box in the app.

Imagine you’re communicating with a text-to-speech voice. Without adding extra words, can you be sure “Great!” will be interpreted as sarcasm rather than genuine excitement? Probably not. Augmentative and alternative communication (AAC) technology has come a long way, but it still struggles to capture the nuances of spoken language. Even the most realistic-sounding text-to-speech voices can’t really express emotion or emphasis. That’s why ElevenLabs’ new audio tags feature might be a game changer for AAC.

Audio tags are commands that fine-tune the tone and delivery of digital voices. You insert them among your words anywhere you want to adjust pacing, energy, emotion, and so on. In the “Great!” example, you could put [sarcastic] in front of the word to make sure it sounds appropriately snarky.

What Are Audio Tags?

ElevenLabs introduced audio tags with their Eleven v3 model. They are short text commands placed inside square brackets that can shape the way your voice sounds. You can use them to adjust things like energy or emotion. This means you’re in control of not just what your message says, but how it sounds.

For example:

These are just a small sampling of the practically infinite tag options. Eleven v3 will try to interpret anything you enter in square brackets as an audio tag. For instance, you could try something silly like [goblin noises].

Spoken added audio tag compatibility alongside other ElevenLabs features in update 1.9.3. You can read our blog post about the update to learn about the other features, like voice clones and prompt-based voice design.

How to Turn On Audio Tags in Spoken

⚠️ This part’s important: audio tags are still experimental on ElevenLabs’ side. That means they’re in early testing, and they may not always work as expected. Besides that unreliability, there’s nothing risky about using them. The main reason we don’t enable them by default is that the model they require is much slower than the default one we chose, eleven_turbo_v2_5. Although this new feature is a game changer, we believe timely playback will still be a priority for most users.

To try audio tags in Spoken, you need to switch your voice model to the version that supports tags. Here’s how:

Eleven v3 is the only model that currently supports audio tags.

For a more thorough tutorial with images, see our help topic on audio tags.

How to Use Audio Tags in Spoken

Heads up: Before you start using audio tags, keep in mind that they might not always work. Like we said before, v3 can be unreliable. Sometimes the voice will read your tags out loud instead of applying them. Other times, a tag might be ignored completely. We imagine this will be fixed as ElevenLabs continues to develop the model.

To use audio tags, you must be using an ElevenLabs voice. You must also have eleven_v3 selected as your model, like we explained above.
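Because only eleven_v3 supports tags, an app could guard against bracketed text reaching a model that doesn't. The snippet below is a minimal sketch of that idea, not a description of how Spoken works. The function name prepare_text and the choice to strip tags for other models are assumptions made for illustration.

```python
import re

# Per this post, eleven_v3 is the only model that currently supports audio tags.
TAG_MODEL = "eleven_v3"

# Matches one bracketed tag such as "[excited]" plus any whitespace after it.
BRACKETED_TAG = re.compile(r"\[[^\[\]]+\]\s*")


def prepare_text(message: str, model_id: str) -> str:
    """Hypothetical helper: keep tags for eleven_v3, strip them otherwise."""
    if model_id == TAG_MODEL:
        return message
    # Assumption: for any other model, we drop the tags rather than send them.
    return BRACKETED_TAG.sub("", message).strip()


print(prepare_text("[excited] Wow, that's awesome", "eleven_v3"))
print(prepare_text("[excited] Wow, that's awesome", "eleven_turbo_v2_5"))
```

With eleven_v3 selected the message passes through untouched. With any other model ID, the sketch removes the bracketed tags and keeps only the words.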

Adding Audio Tags to Everything

Once you’ve selected eleven_v3, you will see a new text field appear below the model selector. Here, you can enter tags you’d like to apply to everything you say. Spoken will invisibly add them to the beginning of anything you type. We thought this would reduce the amount of typing required if you want to regularly include a personality trait, such as cheerfulness or warmth.

An example of the aforementioned text field from Spoken AAC filled in. The words entered are Affectionate, Kind, and Warm, separated by commas.

To use this field, you don’t need to surround your tags in brackets. You can simply enter them as plain text. If you want to add multiple, just separate them with commas like in the example above.

Another suggestion we have for this feature is to use it to apply an accent to your voices. If you type something like “strong Australian accent” here, ElevenLabs will try to apply that accent to any voice you have selected. This can be useful if your accent doesn’t have a lot of representation in the ElevenLabs voice library, or if a voice you designed via prompt didn’t come out with the correct accent (prompt-designed voices have an unfortunate tendency to trend American, even when you specify something else).
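To picture what “invisibly adding tags” might look like, here is a minimal sketch. It takes the comma-separated text from the field, wraps each entry in square brackets, and puts the result in front of a message. The function names are hypothetical, and the exact format Spoken sends to ElevenLabs (for example, one bracket pair per tag) is an assumption.

```python
def build_tag_prefix(field_text: str) -> str:
    """Turn comma-separated plain-text tags into bracketed audio tags."""
    tags = [part.strip() for part in field_text.split(",")]
    # Skip empty entries left behind by stray commas or blank input.
    return " ".join(f"[{tag}]" for tag in tags if tag)


def apply_global_tags(field_text: str, message: str) -> str:
    """Prepend the bracketed tags to a message (assumed behavior)."""
    prefix = build_tag_prefix(field_text)
    return f"{prefix} {message}" if prefix else message


print(apply_global_tags("Affectionate, Kind, Warm", "Good morning!"))
# prints: [Affectionate] [Kind] [Warm] Good morning!
```

The same approach would cover the accent suggestion: an entry like “strong Australian accent” simply becomes one bracketed tag.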

Adding Audio Tags to Individual Phrases

If you want to add audio tags on a message-by-message basis, it’s pretty simple. You just need to enter them when you’re composing a message in Spoken, right alongside your other words. Audio tags can go anywhere in a sentence, and you can add as many as you want. You can also shift your tone partway through a message. For instance, if you need to pause a joke to say something important, you can pivot from [humorous] to [serious] and back.

The text box in Spoken - Tap to Talk AAC with a message entered, using audio tags: [disappointed] It sure is a shame it's raining. [upbeat] Oh well, there's always tomorrow!

When you’re using audio tags in individual messages, you’ll need to surround them in square brackets so ElevenLabs understands they’re not just part of your sentence. If you don’t, they’ll never be applied.
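The brackets are what separate a tag from ordinary words. As a rough illustration, the sketch below splits a message into tags and plain text using a regular expression. This is only an approximation for demonstration purposes: how Eleven v3 actually interprets brackets is not documented in this post, and the function name split_tags is hypothetical.

```python
import re

# Matches text inside one pair of square brackets, e.g. "[upbeat]".
TAG_PATTERN = re.compile(r"\[([^\[\]]+)\]")


def split_tags(message: str) -> list[tuple[str, str]]:
    """Return (kind, value) pairs, where kind is 'tag' or 'text'."""
    parts = []
    position = 0
    for match in TAG_PATTERN.finditer(message):
        # Any words between the previous tag and this one are plain text.
        text = message[position:match.start()].strip()
        if text:
            parts.append(("text", text))
        parts.append(("tag", match.group(1).strip()))
        position = match.end()
    tail = message[position:].strip()
    if tail:
        parts.append(("text", tail))
    return parts


example = "[disappointed] It sure is a shame it's raining. [upbeat] Oh well, there's always tomorrow!"
for kind, value in split_tags(example):
    print(kind, "->", value)
```

Run on the same message without brackets, the sketch finds no tags at all and returns the whole message as plain text, which mirrors why unbracketed tags won't be applied.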

Audio tags can be used for more than just shifting tone. If you want to add laughter to something you say, you can do that too. Try [laughs] if you just want the laugh in isolation, or [laughing] if you want it integrated into your speech. Similar examples include [sighs] and [gasps]. The tags aren’t limited to noises you can make with your mouth, either. If needed, you can even add sound effects like [knocking] or [applause].

You can also adjust delivery. Need to whisper? Just add [whispering]. Need to shout? Well, you know what to do. You can even add [rhythmically] or [singsong] if you’re feeling particularly chipper. Or maybe you just need to slow things down so someone can understand you better. In that case, [slowly] will do the trick.

Why Audio Tags Might Be a Big Deal for AAC

Although providing access to speech is the most important part of AAC, that’s really just the baseline. In an ideal world, AAC technology should be as robust and efficient as verbal speech. At Spoken, one step we’ve taken toward that is removing the vocabulary limitations that traditional AAC systems impose. Now, we’re excited to introduce something that breaks another barrier. While we can’t take credit for ElevenLabs’ underlying innovation, we’re eager to explore how audio tags fit into the Spoken ecosystem and change how communication works for our users.

In verbal speech, how you say something matters just as much as what you say. Tone, pacing, emphasis, and delivery can communicate a lot. Audio tags offer a way to bring that color to AAC without being overly verbose. By embedding simple cues, AAC users can adjust the “performance” of anything they say.

This could transform not only day-to-day conversation but also self-expression. Imagine being able to tell a joke that actually sounds like one, or to express frustration without needing to overexplain. That means faster communication, fewer misunderstandings, and a stronger sense of ownership over one’s voice.

We’re confident AAC users will have a much easier time expressing themselves with audio tags, and we’re excited to see that take shape now that the feature is available in Spoken.

Making Audio Tags Easier to Use

We’ll be the first to admit that typing out audio tags isn’t ideal in an AAC app. We know typing is difficult or impossible for many users, which makes the current implementation of audio tags inaccessible to some. We only made the decision to add them as-is because we didn’t want to gatekeep such a groundbreaking feature.

If we see that users love audio tags (and we think you will), we plan to build a much easier system that lets you tap to pick a tone, without typing anything at all. This may take some time, but it’s definitely in our sights and may even increase in priority depending on user response.

In the meantime, you can try the text field on the Other Voices page to apply audio tags across many messages with a minimal amount of typing. It’s far from a complete solution, but it should at least ease the problem.

Audio Tag Examples

Practically anything can be an audio tag, as long as it’s surrounded by brackets. However, it’s still worth showing examples so you can get a better idea of the many ways they can be used.

Emotion and Tone

Delivery Style

Nonverbal Sounds

Accents and Persona

Sound Effects

Layered or Complex Combos

Of course, these are just a small sample of what’s possible with audio tags. You can find an even longer list of tags people have tested here. The list contains around 2,000 tags that can be sorted by category. It also displays user ratings, so you can get an idea of how well each tag has worked for others.

About Spoken

Spoken is an app that helps people with aphasia, nonverbal autism, and other speech and language disorders.