
Vapi Glossary for Voice AI Terms

VAPI definitions

Uploaded by

Michael Puscar
Copyright
© All Rights Reserved

Definitions
Useful terms and definitions for Vapi & voice AI applications.

A
At-cost
“At-cost” is often used when discussing pricing. It means “without profit to the seller”. Vapi
charges at-cost for requests made to STT, LLM, & TTS providers.

B
Backchanneling
A backchannel occurs when a listener provides verbal or non-verbal feedback to a
speaker during a conversation.

Examples of backchanneling in English include such expressions as “yeah”, “OK”, “uh-huh”,
“hmm”, “right”, and “I see”.

This feedback is often not semantically significant to the conversation, but rather serves
to signify the listener’s attention, understanding, sympathy, or agreement.
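
As a rough illustration only, a system might check whether a short user utterance is one of
the backchannel expressions listed above before treating it as an interruption. The function
name and the exact-match approach are assumptions made for this sketch, not a description of
how Vapi detects backchannels.

```python
import string

# The example expressions from the definition above, lowercased.
BACKCHANNELS = {"yeah", "ok", "uh-huh", "hmm", "right", "i see"}


def is_backchannel(utterance: str) -> bool:
    """Return True if the whole utterance is one of the listed backchannel phrases."""
    # Lowercase, then trim surrounding whitespace and punctuation ("OK!" -> "ok").
    normalized = utterance.lower().strip(string.punctuation + string.whitespace)
    return normalized in BACKCHANNELS


print(is_backchannel("Uh-huh."))              # a pure backchannel
print(is_backchannel("I see what you mean"))  # carries content, so not a backchannel
```

An exact-match list is deliberately naive; it only shows why these phrases can be treated
differently from utterances that carry meaning.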

E
Endpointing
See speech endpointing.

I
Inbound Call
This is a call received by an assistant from another phone number (with the assistant being
the “person” answering). The call comes “in”-ward to a number (from an external caller),
hence the term “inbound call”.

Inference
“Inference” is the process of running a large language model against an input prompt to
produce text output. You will often hear this described as “running inference”.

L
Large Language Model
Large Language Models (or “LLMs”, for short) are machine learning models trained on large
amounts of text, & later used to generate text in a probabilistic manner, “token-by-token”.

For further reading, see the large language model wiki.
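
To make “probabilistic” and “token-by-token” concrete, here is a toy sketch. The probability
table and the `generate` function are invented for illustration; a real LLM computes these
probabilities with a neural network over a very large vocabulary, but the sampling loop has
the same shape.

```python
import random
from typing import Dict, List

# Hypothetical next-token probabilities, keyed by the current token.
NEXT_TOKEN_PROBS: Dict[str, Dict[str, float]] = {
    "<start>": {"hello": 0.7, "hi": 0.3},
    "hello": {"there": 0.6, "<end>": 0.4},
    "hi": {"there": 0.5, "<end>": 0.5},
    "there": {"<end>": 1.0},
}


def generate(rng: random.Random, max_tokens: int = 10) -> List[str]:
    """Sample one token at a time until "<end>" is drawn or max_tokens is reached."""
    tokens: List[str] = []
    current = "<start>"
    for _ in range(max_tokens):
        candidates = NEXT_TOKEN_PROBS[current]
        # Draw the next token at random, weighted by its probability.
        current = rng.choices(list(candidates), weights=list(candidates.values()))[0]
        if current == "<end>":
            break
        tokens.append(current)
    return tokens


print(" ".join(generate(random.Random())))
```

Because each token is sampled, running the same prompt twice can produce different text.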

LLM
See Large Language Model.

O
Outbound Call
This is a call made by an assistant to another target phone number (with the assistant
being the “person” dialing). The call goes “out”-ward to another number, hence the
term “outbound call”.

S
Server URL
A “server url” is an endpoint you expose to Vapi to receive conversation data in real-time.
Server urls can reply with meaningful responses, distinguishing them from traditional
webhooks.

See our server url guide to learn more.

SDK
Stands for “Software Development Kit”: pre-packaged libraries & platform-specific build
tools that a software publisher creates to make integration faster & easier for
developers.

Speech Endpointing
Speech endpointing is the process of detecting the start and end of (a line of) speech in
an audio signal. This is an important function in conversation turn detection.

A starting heuristic for the end of a user’s speech is the detection of silence. If someone
does not speak for a certain number of milliseconds, the utterance can be considered
complete.

A more robust & ideal approach is to actually understand what the user is saying (as well
as the current conversation’s state & the speech turn’s intent) to determine if the user is
just pausing for effect, or has actually finished speaking.

Vapi uses a combination of silence detection and machine learning models to properly
endpoint conversation speech (to prevent improper interruption & encourage proper
backchanneling).
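
The silence heuristic described above can be sketched as follows. This is a minimal
illustration, not Vapi’s implementation: the function name, the per-frame energy input, the
20ms frame size, the energy threshold, and the 500ms silence window are all assumed values
chosen for the example.

```python
from typing import Optional, Sequence


def find_end_of_utterance(
    frame_energies: Sequence[float],
    frame_ms: int = 20,                # assumed duration of each audio frame
    silence_threshold: float = 0.01,   # assumed energy level below which a frame is "silent"
    min_silence_ms: int = 500,         # assumed silence needed to call the utterance complete
) -> Optional[int]:
    """Return the index of the frame where the utterance is considered complete, else None."""
    heard_speech = False
    silent_ms = 0
    for i, energy in enumerate(frame_energies):
        if energy >= silence_threshold:
            # Speech frame: remember that the user started talking and reset the silence timer.
            heard_speech = True
            silent_ms = 0
        elif heard_speech:
            # Silent frame after speech: accumulate trailing silence.
            silent_ms += frame_ms
            if silent_ms >= min_silence_ms:
                return i
    return None


# 100ms of leading silence, 200ms of speech, then 600ms of silence.
frames = [0.0] * 5 + [0.5] * 10 + [0.0] * 30
print(find_end_of_utterance(frames))
```

A pause shorter than `min_silence_ms` resets the timer as soon as speech resumes, which is
exactly where this heuristic falls short: it cannot tell a thoughtful pause from a finished
turn.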

Additional reading on speech endpointing can be found here & on Deepgram’s docs.

STT
An abbreviation used for “Speech-to-text”. The process of converting physical sound
waves into raw transcript text (a process called “transcription”).

T
Telemarketing Sales Rule
The Telemarketing Sales Rule (or “TSR” for short) is a regulation established by the
Federal Trade Commission (ftc.gov) in the United States to protect consumers from
deceptive and abusive telemarketing practices.

You may only conduct outbound calls to phone numbers which you have consent to
contact. Violating TSR rules can result in significant civil (or even criminal) penalties.

Learn more on the FTC website.

TTS
An abbreviation used for “Text-to-speech”. The process of converting raw text into
playable audio data.

V
Voice-to-Voice
“Voice-to-voice” is a term that often comes up when discussing voice AI system latency: the
time between a user finishing their speech (however that endpoint is computed) and the AI
agent’s first speech chunk/byte being played back on the client’s device.

Ideally, this process should take less than 1s, & preferably closer to 500-700ms (responding
too quickly can be an issue as well). Voice AI developers should watch this metric closely to
keep their applications responsive & usable.
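
As a small sketch of how this metric might be tracked, the snippet below computes
voice-to-voice latency from two timestamps and buckets it using the rough targets quoted
above. The function names, the bucket labels, and the use of integer millisecond timestamps
are assumptions made for the example.

```python
def voice_to_voice_ms(speech_end_ms: int, first_audio_ms: int) -> int:
    """Latency between the end of user speech and the first agent audio being played."""
    if first_audio_ms < speech_end_ms:
        raise ValueError("first audio cannot precede the end of user speech")
    return first_audio_ms - speech_end_ms


def rate_latency(latency_ms: int) -> str:
    """Bucket a latency using the targets from this glossary entry (labels are illustrative)."""
    if latency_ms < 500:
        return "very fast (may feel abrupt)"
    if latency_ms <= 700:
        return "ideal"
    if latency_ms < 1000:
        return "acceptable"
    return "too slow"


latency = voice_to_voice_ms(speech_end_ms=12_000, first_audio_ms=12_650)
print(latency, rate_latency(latency))  # prints "650 ideal"
```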

W
Webhook
A webhook is a server endpoint you expose to external services with the intention of
receiving external data in real-time. Your exposed URL is essentially a “drop-bin” for data
to come in from external providers to update & inform your systems.

Traditionally, webhooks are unidirectional & stateless. Endpoints only reply with a status
code to signal acknowledgement.

Vapi’s endpoints work differently: certain requests made to your server (like assistant
requests) require a reply with meaningful data. To make the distinction clear, Vapi calls
these endpoints “server urls”.
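
The difference can be sketched with two plain handler functions. Everything specific here is
hypothetical: the `"type": "assistant-request"` field, the shape of the reply, and the
handler names are invented for illustration and are not Vapi’s actual request or response
schema.

```python
import json
from typing import Tuple


def traditional_webhook(body: bytes) -> Tuple[int, bytes]:
    """Traditional webhook: take in the event, acknowledge with a status code only."""
    event = json.loads(body)  # a real handler would store or process the event here
    _ = event
    return 204, b""


def server_url_handler(body: bytes) -> Tuple[int, bytes]:
    """Server-url style endpoint: some requests need meaningful data in the reply."""
    request = json.loads(body)
    if request.get("type") == "assistant-request":  # hypothetical field and value
        reply = {"assistant": {"name": "demo-assistant"}}  # hypothetical reply shape
        return 200, json.dumps(reply).encode("utf-8")
    # Other messages are simply acknowledged, as a traditional webhook would.
    return 204, b""


status, payload = server_url_handler(b'{"type": "assistant-request"}')
print(status, payload.decode("utf-8"))
```

The handlers return a `(status, body)` pair so the sketch stays independent of any web
framework.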
