Why ElevenLabs’ Audio Tags Are a Big Deal for AAC

A mockup of a phone shows text entered into the Spoken AAC app: “[Excited] Wow, that's awesome”. Floating around the phone are other bracketed words, including [laughing], [disbelief], [quietly], [cheerful], [gasps], and [shouting]. Two orange arrows point from the clouds of bracketed words toward the text box in the app.

Imagine you’re communicating with a text-to-speech voice. Without adding extra words, can you be sure “Great!” will be interpreted as sarcasm rather than genuine excitement? Probably not. Augmentative and alternative communication (AAC) technology has come a long way, but it still struggles to capture the nuances of spoken language. Even the most realistic-sounding text-to-speech voices give you little control over emotion or emphasis. That’s why ElevenLabs’ new audio tags feature might be a game changer for AAC.

Audio tags are commands that fine-tune the tone and delivery of digital voices. You insert them among your words wherever you want to adjust pacing, energy, emotion, and so on. In the “Great!” example above, you could put [sarcastic] in front of the word to make sure it sounds appropriately snarky.

What Are Audio Tags?

ElevenLabs introduced audio tags with their Eleven v3 model. They are short text commands placed inside square brackets that can shape the way your voice sounds. You can use them to adjust things like energy or emotion. This means you’re in control of not just what your message says, but how it sounds.

For example:

Want to sound upbeat? Try adding the [cheerful] tag.
Feeling frustrated? Use [angry] or [annoyed].
Want to lower your voice to calm someone? Try [softly] or [gently].
Want to show that something is cracking you up? Throw in [laughing].

These are just a small sampling of the practically infinite tag options. Eleven v3 will try to interpret anything you enter in square brackets as an audio tag. For instance, you could try something silly like [goblin noises].

Spoken added support for audio tags, along with other ElevenLabs features, in update 1.9.3. You can read our blog post about the update to learn about the other features, like voice clones and prompt-based voice design.

How to Turn On Audio Tags in Spoken

⚠️ This part’s important: audio tags are still experimental on ElevenLabs’ side. That means they’re in early testing and may not always work as expected. Beyond that unreliability, there’s nothing risky about using them. The main reason we don’t enable them by default is that the model they require is much slower than our default model, eleven_turbo_v2_5. Although this new feature is a game changer, we believe timely playback will still be a priority for most users.

To try audio tags in Spoken, you need to switch your voice model to the version that supports tags. Here’s how:

Go to Settings in the Spoken app.
Tap on “Use Voices From Other Sources.”
Make sure you have entered a valid ElevenLabs API key and have an ElevenLabs voice selected.
Scroll to the bottom of the ElevenLabs voice list until you see the option to change models.
Tap it and select eleven_v3 from the list that appears.

Eleven v3 is the only model that currently supports audio tags.

For a more thorough tutorial with images, see our help topic on audio tags.

How to Use Audio Tags in Spoken

Heads up: before you start using audio tags, keep in mind that they might not always work. As we mentioned above, v3 can be unreliable. Sometimes the voice will read your tags out loud instead of applying them. Other times, a tag might be ignored completely. We expect this to improve as ElevenLabs continues to develop the model.

To use audio tags, you must be using an ElevenLabs voice. You must also have eleven_v3 selected as your model, as explained above.

Adding Audio Tags to Everything

Once you’ve selected eleven_v3, a new text field appears below the model selector. Here, you can enter tags you’d like to apply to everything you say. Spoken will invisibly add them to the beginning of anything you type. We thought this would reduce the typing required if you want to consistently include something like a personality trait (such as cheerfulness or warmth).

An example of the aforementioned text field from Spoken AAC filled in. The words entered are Affectionate, Kind, and Warm, separated by commas.

You don’t need to surround your tags with brackets in this field; just enter them as plain text. To add multiple tags, separate them with commas, like in the example above.
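For readers curious about what might happen behind the scenes, here is a hypothetical Python sketch of how a comma-separated tag field could be turned into bracketed tags and placed in front of a message. Spoken hasn’t published its implementation, so the function name `apply_default_tags` and the choice to join tags with no space between brackets (mirroring combos like [Disappointed][softly] in the examples later in this post) are assumptions for illustration only.

```python
def apply_default_tags(message: str, tag_field: str) -> str:
    """Hypothetical sketch (not Spoken's actual code): convert a
    comma-separated tag field into bracketed audio tags and prepend
    them to a message."""
    # Split on commas, trim whitespace, and drop empty entries
    # (for example, the empty piece left by a trailing comma).
    tags = [t.strip() for t in tag_field.split(",") if t.strip()]
    # Wrap each plain-text tag in square brackets, the form Eleven v3 reads as a tag.
    prefix = "".join(f"[{tag}]" for tag in tags)
    # With an empty field, the message is left untouched.
    if not prefix:
        return message
    return f"{prefix} {message}"


print(apply_default_tags("Good morning!", "Affectionate, Kind, Warm"))
# prints: [Affectionate][Kind][Warm] Good morning!
print(apply_default_tags("G'day!", "strong Australian accent"))
# prints: [strong Australian accent] G'day!
```

Because the tags are added to every message automatically, you only type them once, which is the typing-saving idea behind the field.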

Another use we suggest for this feature is applying an accent to your voices. If you type something like “strong Australian accent” here, ElevenLabs will try to apply that accent to whichever voice you have selected. This can be useful if your accent isn’t well represented in the ElevenLabs voice library, or if a voice you designed via prompt didn’t come out with the correct accent (prompt-designed voices have an unfortunate tendency to skew American, even when you specify something else).

Adding Audio Tags to Individual Phrases

Adding audio tags on a message-by-message basis is simple. Just enter them while you’re composing a message in Spoken, right alongside your other words. Audio tags can go anywhere in a sentence, and you can add as many as you want, so shifting your tone partway through a message is easy. For instance, if you need to step out of a lighthearted moment to say something important, you can pivot from [humorous] to [serious] and back.

The text box in Spoken - Tap to Talk AAC with a message entered, using audio tags: [disappointed] It sure is a shame it's raining. [upbeat] Oh well, there's always tomorrow!

When you’re using audio tags in individual messages, you need to surround them with square brackets so ElevenLabs knows they’re not part of your sentence. Without the brackets, they won’t be applied.
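To make the bracket rule concrete, the Python sketch below splits a message into the parts that would be treated as tags and the parts that would be spoken. This is only an illustration under our own assumptions: ElevenLabs interprets tags inside the model itself, and the helper `split_tags` and its regular expression are hypothetical, not part of Spoken or the ElevenLabs API.

```python
import re

# Hypothetical pattern: a tag is any run of text inside one pair of
# square brackets, e.g. "[upbeat]" (nested brackets are not handled).
TAG_PATTERN = re.compile(r"\[([^\[\]]+)\]")


def split_tags(message: str) -> list[tuple[str, str]]:
    """Return ("tag", ...) and ("text", ...) segments in message order."""
    segments = []
    pos = 0
    for match in TAG_PATTERN.finditer(message):
        # Any words before this tag are ordinary spoken text.
        text = message[pos:match.start()].strip()
        if text:
            segments.append(("text", text))
        segments.append(("tag", match.group(1)))
        pos = match.end()
    # Whatever follows the last tag is also spoken text.
    rest = message[pos:].strip()
    if rest:
        segments.append(("text", rest))
    return segments


msg = "[disappointed] It sure is a shame it's raining. [upbeat] Oh well, there's always tomorrow!"
for kind, value in split_tags(msg):
    print(f"{kind}: {value}")
# tag: disappointed
# text: It sure is a shame it's raining.
# tag: upbeat
# text: Oh well, there's always tomorrow!
```

Typing "upbeat Oh well" without brackets comes back as a single text segment, which mirrors why unbracketed tags are simply treated as words.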

Audio tags can do more than shift your tone. If you want to add laughter to something you say, you can do that too. Try [laughs] if you want the laugh on its own, or [laughing] if you want it integrated into your speech. Similar examples include [sighs] and [gasps]. The tags aren’t limited to sounds a person can make, either. If needed, you can even add sound effects like [knocking] or [applause].

You can also adjust delivery. Need to whisper? Just add [whispering]. Need to shout? Well, you know what to do. You can even add [rhythmically] or [singsong] if you’re feeling particularly chipper. Or maybe you just need to slow things down so someone can understand you better. In that case, [slowly] will do the trick.

Why Audio Tags Might Be a Big Deal for AAC

Although providing access to speech is the most important part of AAC, that’s really just the baseline. In an ideal world, AAC technology would be as robust and efficient as verbal speech. At Spoken, one step we’ve taken toward that is removing the vocabulary limits that traditional AAC systems impose. Now, we’re excited to introduce something that breaks down another barrier. While we can’t take credit for ElevenLabs’ underlying innovation, we’re eager to explore how audio tags fit into the Spoken ecosystem and how they change communication for our users.

In verbal speech, how you say something matters just as much as what you say. Tone, pacing, emphasis, and delivery can communicate a lot. Audio tags offer a way to bring that color to AAC without being overly verbose. By embedding simple cues, AAC users can adjust the “performance” of anything they say.

This could transform not only day-to-day conversation but also self-expression. Imagine being able to tell a joke that actually sounds like one, or to express frustration without needing to overexplain. That means faster communication, fewer misunderstandings, and a stronger sense of ownership over one’s voice.

We’re confident AAC users will have a much easier time expressing themselves with audio tags, and we’re excited to see that take shape now that the feature is available in Spoken.

Making Audio Tags Easier to Use

We’ll be the first to admit that typing out audio tags isn’t ideal in an AAC app. We know typing is difficult or impossible for many users, which makes the current implementation of audio tags inaccessible to some. We only made the decision to add them as-is because we didn’t want to gatekeep such a groundbreaking feature.

If we see that users love audio tags (and we think you will), we plan to build a much easier system that lets you tap to pick a tone, without typing anything at all. This may take some time, but it’s definitely in our sights and may even increase in priority depending on user response.

In the meantime, you can use the text field on the Other Voices page to apply audio tags across many messages with minimal typing. It’s far from a complete solution, but it should at least ease the problem.

Audio Tag Examples

Practically anything can be an audio tag, as long as it’s surrounded by brackets. However, it’s still worth showing examples so you can get a better idea of the many ways they can be used.

Emotion and Tone

[Excited] Woohoo! I finally did it!
[Calmly] Don’t worry. [Reassuring] It’s going to be okay.
[Disappointed][softly] I really wish that had gone differently.
[Sarcastically][under breath] Oh, great. Another update.
[Nervously] Uh… [pauses] can I ask you something?
[Angrily][clipped tone] That’s not what I said!
[Warmly] You don’t have to apologize. [Gently] I understand.
[Teasingly] You’re really going to wear that?

Delivery Style

[Whispering][urgent] Careful, they can smell fear.
[Shouting][panicked] Watch out!
[Slowly][thoughtfully] Let. Me. Think.
[Rhythmically][playful] Coffee first, then chaos.
[Robotically] Beep boop. Human detected.
[Dramatically] And then… everything changed.
Oh, [mockingly] so now you care?

Nonverbal Sounds

[Laughs] That was terrible—do it again.
[Sighs] Mondays.
[Gasps][excitedly] No way, seriously?
[Clears throat] Anyway, as I was saying…
[Groans] Why do I always fall for that?
[Snickers][quietly] You didn’t hear that from me.

Accents and Persona

[Posh British accent] Fancy a cuppa?
[US Southern accent][friendly] Well bless your heart.
[Australian accent][energetic] Let’s give it a go!
[Newscaster voice][neutral tone] In today’s breaking news…
[Villain voice] Soon, they’ll all understand. [maniacal laugh]

Sound Effects

[Knocking] Anyone home?
[Applause] Great job!
Hey! [Banging on door] Open up!

Layered or Complex Combos

[Whispering] Wait. [Nervously] Did you hear that?
We made it! [Cheering][excitedly] We actually made it!
That’s… [slowly][thoughtfully] not how I remember it.
[Calmly] Breathe. [Softly][encouraging] You’re doing great.
[Excited] You did it! [Proudly] I knew you could.
[Startled gasp] Wait! [Panicked] That wasn’t supposed to happen!
[Announcer voice] And the award goes to… [applause][excitedly] Me!
[Annoyed] It’s fine. [Through gritted teeth] Totally fine.
[Tiredly] I’ve said this a hundred times [resigned sigh] but I’ll say it again.
We could [mischievously] break the rules a little, [drawn out] riiight?

Of course, these are just a small sample of what’s possible with audio tags. You can find an even longer list of tags people have tested here. The list contains around 2,000 tags that can be sorted by category. It also displays user ratings, so you can get an idea of how well each tag has worked for others.

Evan Lauer

Designer
