I Built an AI to Replace My Voicemail (And You Can Too)

Build an AI Voicemail Agent | Full AI Avatar Tutorial (LiveKit + n8n + Bey)

Voicemail hasn't changed in decades. It’s a black hole where spam calls go to die and important messages get lost in a list of audio files you don’t want to listen to.

Today, we’re going to fix that.

In this tutorial, I’m going to show you how I built a Smart AI Voice Agent to completely replace my voicemail. This isn’t just a bot that records audio—it’s an intelligent receptionist that screens spam, interviews callers, logs everything to a database, and even debriefs me via a 3D avatar in my browser.

Here is the full breakdown of how to build it using Python, LiveKit, OpenAI, and n8n.

The Architecture: How It Works

Before we write code, let’s look at the logic flow. Our agent needs to handle two very different types of interactions:

  1. The Gatekeeper (Phone Mode): When a call comes in, the agent acts as a strict bouncer. It analyzes the caller’s intent in real time. If it hears "car warranty," it hangs up instantly. If it’s a friend, it takes a message.

  2. The Chief of Staff (Web Mode): When I connect via my browser, the agent switches personas. It becomes a helpful assistant that reads from our database and gives me a verbal summary of who called.

The system relies on this stack:

  • LiveKit: Handles the telephony (SIP) and real-time audio.

  • OpenAI Realtime API: Provides the intelligence and voice.

  • n8n & Google Sheets: Act as our backend memory and database.

  • Beyond Presence: Provides the 3D Avatar for the web interface.

Phase 1: The Setup

First, we need a development environment. We use uv for fast Python package management.

Bash

brew install uv python
uv init livekit-voice-agent
cd livekit-voice-agent
# The Beyond Presence plugin is published as livekit-plugins-bey
uv add livekit-agents livekit-plugins-openai livekit-plugins-bey

We create a simple agent.py file to get a basic "Hello World" bot running. At this stage, it’s generic—it listens and speaks, but it doesn't know who you are.

Phase 2: Connecting to the Telephone Network

To make this a real voicemail replacement, we need a phone number. LiveKit allows us to provision a dedicated SIP number directly from the command line:

Bash

lk number search --country-code US --area-code 650
lk number purchase --numbers +1650xxxxxxx

Once purchased, we set up a Dispatch Rule in the LiveKit dashboard. This tells LiveKit: "When this number rings, route the call to my Python agent."

Crucially, we configure the room name to start with call-. This allows us to add Context Awareness to our code:

Python

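# In agent.py, inside the entrypoint(ctx: JobContext) function
# (requires: from livekit.agents import AutoSubscribe)
await ctx.connect(auto_subscribe=AutoSubscribe.AUDIO_ONLY)

# The dispatch rule names phone rooms "call-...", so the prefix identifies a phone call
is_phone = ctx.room.name.startswith("call-")

if is_phone:
    greeting = "Hello, this is James's AI. Who is calling?"
else:
    greeting = "Welcome back, James."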

Now, the agent knows exactly how to greet us based on how we connect.
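The same room-name check can drive more than the greeting. Here is a minimal sketch of how the two personas might be bundled together. The Persona class, the select_persona helper, and the instruction strings are my own illustrative names and wording, not part of LiveKit or of the original agent.py:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Persona:
    """Everything that changes between phone mode and web mode (hypothetical helper)."""
    name: str
    greeting: str
    instructions: str


# Greetings are taken from the snippet above; the instructions are placeholder wording.
GATEKEEPER = Persona(
    name="gatekeeper",
    greeting="Hello, this is James's AI. Who is calling?",
    instructions="Screen incoming calls. Take messages from real callers and hang up on spam.",
)
CHIEF_OF_STAFF = Persona(
    name="chief_of_staff",
    greeting="Welcome back, James.",
    instructions="You are James's assistant. Summarize recent calls when asked.",
)


def select_persona(room_name: str, phone_prefix: str = "call-") -> Persona:
    """Pick the persona from the room name, mirroring the is_phone check."""
    return GATEKEEPER if room_name.startswith(phone_prefix) else CHIEF_OF_STAFF
```

Keeping the greeting and the system prompt in one object means a single lookup decides both, so the two modes cannot drift apart.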

Phase 3: The Gatekeeper (Spam Blocking)

This is where the AI earns its keep. We give the agent a Function Tool called hangup_call, which lets the LLM end the call on its own.

We update the system prompt with strict instructions: "If the caller mentions car warranty, insurance offers, or debt relief, immediately call the hangup tool."

Python

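from livekit.agents import RunContext, function_tool, get_job_context

@function_tool()
async def hangup_call(run_ctx: RunContext, is_spam: bool = False):
    """End the call. Set is_spam=True when the caller is a spammer."""
    if is_spam:
        message = "I'm not interested. Please remove this number. Goodbye."
    else:
        message = "Thanks for calling. I'll let James know."

    await run_ctx.session.generate_reply(instructions=message)
    await run_ctx.wait_for_playout()  # Crucial: wait for the audio to finish!

    # The JobContext is not in scope inside the tool, so fetch it explicitly
    ctx = get_job_context()
    await ctx.api.room.delete_room(...)  # Kill the call (request argument elided)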

Now, when a spammer calls, the agent hangs up as soon as it recognizes the pitch.
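If you want to tweak the spam list without rewriting the prompt by hand, the strict instructions can be generated from a tuple of topics. This is only a sketch: build_gatekeeper_instructions and SPAM_TOPICS are hypothetical names, and everything in the prompt beyond the three topics quoted above is my own assumed wording.

```python
# Topics quoted in the system prompt above; extend this tuple as new spam patterns show up.
SPAM_TOPICS = ("car warranty", "insurance offers", "debt relief")


def build_gatekeeper_instructions(owner: str, spam_topics=SPAM_TOPICS) -> str:
    """Assemble the phone-mode system prompt (illustrative wording, not the original)."""
    topics = ", ".join(spam_topics)
    return (
        f"You answer phone calls for {owner}. "
        f"If the caller mentions {topics}, "
        "immediately call the hangup_call tool with is_spam=True. "
        "Otherwise, ask who is calling and why, then call hangup_call "
        "with is_spam=False once you have the message."
    )
```

The resulting string would be passed as the agent's instructions in phone mode. Naming the tool and its is_spam argument in the prompt helps the model call it the way the code expects.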

Phase 4: The Memory (n8n & Google Sheets)

Hanging up is great, but we need to save the legitimate messages. Since the agent’s memory is wiped when the call ends, we need to ship the transcript to a database before we delete the room.

I used n8n (a workflow automation tool) to build a simple pipeline:

  1. Webhook (POST): Receives the transcript JSON from our Python code.

  2. AI Summary: OpenAI summarizes the call (e.g., "Mark wants lunch Tuesday").

  3. Google Sheets: Appends a new row with the summary, timestamp, and "Spam Risk" score.

In our Python code, we add a helper function, send_transcript_to_n8n(), that hangup_call invokes right before it deletes the room.
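A minimal sketch of what such a helper might look like, using only the standard library. The webhook URL, the payload field names, and the (role, text) message format are all assumptions for illustration; match them to whatever your own n8n Webhook node expects.

```python
import asyncio
import json
import urllib.request
from datetime import datetime, timezone

# Hypothetical placeholder: use the URL shown on your own n8n Webhook node.
N8N_WEBHOOK_URL = "https://n8n.example.com/webhook/voicemail"


def build_transcript_request(room_name, messages, is_spam, url=N8N_WEBHOOK_URL):
    """Build the POST request carrying the transcript as JSON (assumed payload shape)."""
    payload = {
        "room": room_name,
        "is_spam": is_spam,
        "ended_at": datetime.now(timezone.utc).isoformat(),
        "transcript": [{"role": role, "text": text} for role, text in messages],
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def _post(request, timeout):
    """Blocking HTTP call; returns the response status code."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


async def send_transcript_to_n8n(room_name, messages, is_spam, timeout=10.0):
    """Send the transcript without blocking the agent's event loop."""
    request = build_transcript_request(room_name, messages, is_spam)
    return await asyncio.to_thread(_post, request, timeout)
```

Running the blocking call through asyncio.to_thread keeps the audio pipeline responsive while the webhook request is in flight. Awaiting it before the room is deleted is what guarantees the transcript leaves the process in time.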

Phase 5: The Interface (Avatar & Retrieval)

Now that our data is in Google Sheets, we need a way to read it.

I built a second n8n workflow that GETs the recent rows from Sheets and returns them as text. We give our agent a new tool: get_call_debrief.

When I log into the web interface, the agent switches to "Chief of Staff" mode. I ask, "What did I miss?" and it queries the database to read back my messages.
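Under the hood, the debrief tool can be little more than an HTTP GET against the second workflow. The sketch below assumes the workflow returns plain text, as described above. The URL, the fallback messages, and the opener parameter (there only to make the function easy to test) are my own additions.

```python
import urllib.request

# Hypothetical placeholder: the webhook URL of the retrieval workflow.
N8N_DEBRIEF_URL = "https://n8n.example.com/webhook/debrief"


def get_call_debrief(url=N8N_DEBRIEF_URL, opener=urllib.request.urlopen, timeout=10.0):
    """Fetch the recent-calls text from n8n, with spoken-friendly fallbacks."""
    try:
        with opener(url, timeout=timeout) as response:
            text = response.read().decode("utf-8").strip()
    except OSError:
        # urllib's URLError and HTTPError are both subclasses of OSError.
        return "I couldn't reach the call log just now."
    return text or "No new calls since you last checked."
```

In the agent, this would be wrapped in a function tool so the LLM can call it when asked "What did I miss?". Returning a readable sentence on failure gives the avatar something sensible to say instead of surfacing an exception.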

To make this interaction feel futuristic, we added a Beyond Presence 3D avatar. This adds a visual face to the voice, making the "debrief" feel like a real face-to-face meeting.

The Deployment: The *71 Trick

At this point, the code works, but it’s just a sandbox. To make it my actual voicemail, I had to perform one final step: Conditional Call Forwarding.

On Verizon (my carrier), I dialed *71 followed by my new LiveKit phone number.

Plaintext

Dialing: *71 + 1 (650) 555-0199 ... Success.

Now, whenever I decline a call on my personal iPhone, my carrier forwards the call to my AI agent. The AI picks up, screens the spam, logs the message, and sends me the summary.

Conclusion

We built a full-stack communication platform in under an hour. It has real-time spam screening, a persistent database, and a multimodal web interface.

If you want to build this yourself, all the code and n8n workflow files are available in the GitHub repository linked below.

Resources:

Thanks for reading! If you enjoyed this build, check out the full video tutorial on YouTube - https://www.youtube.com/watch?v=xflh36L6InY