A Beginner's Guide to Building AI Agents with the Anthropic SDK
Table of Contents
- Prerequisites and Development Setup
- What You Need:
- Setting Up Your Environment
- The Core Anatomy of an AI Agent
- Making a Basic Claude Call
- Defining Our Tools
- Creating the Tool Schemas for Claude
- Why are the descriptions so detailed?
- Building the Agentic Loop
- The Message Array
- Handling Tool Use Blocks
- Why Do We Append to the Messages List?
- Assembling the Final Application
- Analyzing the Output
- Common Pitfalls for Beginners
- Conclusion and Next Steps
If you have ever interacted with Large Language Models (LLMs) like Claude, you know they are incredibly proficient at generating text, summarizing documents, and writing code. But what happens when you need an AI to actually do something in the real world? What if you want it to check the live weather, query your private database, perform high-precision mathematical calculations, or interact with a third-party API?
This is where AI Agents come into play.
An AI Agent is essentially an AI model equipped with a set of “tools” (functions) and an overarching logic loop that allows it to interact with its environment. Instead of simply generating an answer based solely on its static training data, an agent can pause and say, “I don’t know the current weather in Nairobi, but I have a tool that can check it. Let me run that tool, read the result, and then give you an informed answer”.
We are going to build a very simple AI agent using Python and the Anthropic SDK, and in the process, understand the fundamentals of building agentic workflows. We’ll build an agent that dynamically decides when to use a calculator and when to fetch mock live data.
Prerequisites and Development Setup
What You Need:
- Python 3.9 or higher installed on your machine.
- An Anthropic API Key.
Setting Up Your Environment
It is best practice to create a virtual environment for each Python project to keep your dependencies isolated.
# Create a new directory for your project
mkdir claude-agent
cd claude-agent
# Create a virtual environment named 'venv'
python3 -m venv venv
# Activate the virtual environment
# On macOS / Linux:
source venv/bin/activate
# On Windows:
venv\Scripts\activate
Next, install the required libraries. We will install anthropic, which is the official Python SDK, and python-dotenv, which will help us securely manage our API keys.
pip install anthropic python-dotenv
Finally, create a file named .env in the root of your project directory and add your Anthropic API key:
ANTHROPIC_API_KEY=sk-ant-your-api-key-here...
The Core Anatomy of an AI Agent
To build an agent, you must first understand how it “thinks”. The standard workflow for an agent is often referred to as the Reasoning and Acting (ReAct) loop. Here is how Anthropic’s Claude handles it:
- User Input: You prompt the model with a question.
- Evaluation: Claude evaluates the question and looks at the list of tools you have provided. It decides if it needs a tool to answer the question accurately.
- Tool Request: If a tool is needed, Claude stops generating text and instead outputs a structured request saying, “I need to use Tool X with Parameters Y”.
- Execution: Your Python script catches this request, pauses Claude, and runs the actual local Python function using the parameters Claude provided.
- Observation: Your Python script takes the output of the function and sends it back to Claude in a new message.
- Final Response: Claude reads the observation, integrates the new information, and formulates a final, human-friendly response.
This loop can happen multiple times in a single query. For example, if you ask, “What is the weather in Paris, and what is the square root of the temperature?”, Claude might call the weather tool, get the result, and then call the math tool before giving you the final answer.
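Before wiring up the real SDK, the control flow above can be sketched with a stubbed model. Everything in this snippet is a toy stand-in invented for illustration (the fake_model function, the simplified "tool" role, the hard-coded weather) — the real loop, using the actual Anthropic API and message format, is built later in the article:

```python
def fake_model(messages):
    """Stub model: asks for a tool on the first pass, then answers."""
    if not any(m["role"] == "tool" for m in messages):
        return {"stop": "tool_use", "tool": "get_weather", "input": "Nairobi"}
    return {"stop": "end", "text": "It is 25°C and sunny in Nairobi."}

def get_weather(city):
    return "25°C and sunny"  # mock observation

messages = [{"role": "user", "content": "Weather in Nairobi?"}]  # User Input
while True:
    step = fake_model(messages)                        # Evaluation / Tool Request
    if step["stop"] != "tool_use":                     # Final Response
        print(step["text"])
        break
    result = get_weather(step["input"])                # Execution
    messages.append({"role": "tool", "content": result})  # Observation
```

The point is the shape: a loop that alternates between asking the model what to do and feeding back what happened, until the model stops asking for tools.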
Making a Basic Claude Call
Let’s start by making sure we can talk to Claude without any tools. Create a file called agent.py and add the following code:
import os
from dotenv import load_dotenv
from anthropic import Anthropic
# Load the API key from the .env file
load_dotenv()
# Initialize the Anthropic client
client = Anthropic()
# Set the model we want to use
MODEL_NAME = "claude-sonnet-4-6"
def chat_with_claude(user_message):
    print(f"User: {user_message}")
    response = client.messages.create(
        model=MODEL_NAME,
        max_tokens=1024,
        messages=[
            {"role": "user", "content": user_message}
        ]
    )
    print(f"Claude: {response.content[0].text}")

# Test the function
if __name__ == "__main__":
    chat_with_claude("Hello, Claude! What is 256 times 409?")
If you run this script (python agent.py), Claude will happily multiply the numbers for you. However, LLMs are known to occasionally hallucinate math or fail at highly complex arithmetic because they generate text based on probabilities, not strict mathematical rules. This is exactly why we want to give Claude a dedicated calculator tool.
Defining Our Tools
To make our agent useful, we are going to give it two tools:
- A Weather Tool to check the weather (we will use mock data for simplicity, but in the real world, you would connect this to a weather API like OpenWeatherMap).
- A Math Tool to evaluate mathematical expressions safely.
Let’s define the actual Python functions that will do the heavy lifting. Add these to your agent.py file:
import ast
import operator
def get_weather(location):
    """A mock function to get the current weather."""
    location = location.lower()
    if "nairobi" in location:
        return "25°C and sunny"
    elif "london" in location:
        return "12°C and rainy"
    elif "tokyo" in location:
        return "18°C and cloudy"
    else:
        return "Weather data unavailable for this location."
def calculate_math(expression):
    """
    Safely evaluate a mathematical expression using AST parsing.
    Supports: +, -, *, /, **, %, // and unary negation.
    Rejects anything that is not a pure numeric calculation.
    """
    ALLOWED_OPS = {
        ast.Add: operator.add,
        ast.Sub: operator.sub,
        ast.Mult: operator.mul,
        ast.Div: operator.truediv,
        ast.Pow: operator.pow,
        ast.Mod: operator.mod,
        ast.FloorDiv: operator.floordiv,
        ast.USub: operator.neg,
    }

    def _eval(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        elif isinstance(node, ast.BinOp):
            op = ALLOWED_OPS.get(type(node.op))
            if op is None:
                raise ValueError(f"Unsupported operator: {type(node.op).__name__}")
            return op(_eval(node.left), _eval(node.right))
        elif isinstance(node, ast.UnaryOp):
            op = ALLOWED_OPS.get(type(node.op))
            if op is None:
                raise ValueError(f"Unsupported operator: {type(node.op).__name__}")
            return op(_eval(node.operand))
        else:
            raise ValueError(f"Unsupported expression: {type(node).__name__}")

    try:
        tree = ast.parse(expression, mode="eval")
        result = _eval(tree.body)
        return str(result)
    except Exception as e:
        return f"Error evaluating expression: {str(e)}"
We use a safer alternative to Python’s built-in eval() because eval() can execute arbitrary code. Our version only allows basic arithmetic, so anything else raises an error instead of being executed.
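To see what the AST guard is protecting against, consider a string that is valid Python but not arithmetic. The standalone snippet below (standard library only) parses such a string and shows that it produces a Call node — exactly the kind of node the whitelist-based walker above refuses to handle, so the string is rejected rather than executed:

```python
import ast

# Valid Python, but not arithmetic: raw eval() would execute this.
malicious = "__import__('os').system('echo pwned')"

# ast.parse happily builds a syntax tree for it...
tree = ast.parse(malicious, mode="eval")

# ...but the top-level node is a Call, not a Constant/BinOp/UnaryOp,
# so a walker that only handles those node types raises an error instead.
print(type(tree.body).__name__)  # Call
```

This is the key design choice: instead of trying to blacklist dangerous strings, we whitelist a tiny set of safe node types and reject everything else by default.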
Creating the Tool Schemas for Claude
Claude cannot read your Python code directly. We have to tell Claude exactly what tools are available, what they do, and what inputs they require using JSON schemas. Anthropic uses a specific format for this.
Add this list to your script:
agent_tools = [
    {
        "name": "get_weather",
        "description": "Get the current weather for a specific city. Use this when the user asks about weather conditions.",
        "input_schema": {
            "type": "object",
            "properties": {
                "location": {
                    "type": "string",
                    "description": "The name of the city, e.g., Nairobi, Tokyo"
                }
            },
            "required": ["location"]
        }
    },
    {
        "name": "calculate_math",
        "description": "Evaluate a mathematical expression. Use this for all math-related queries to ensure accuracy.",
        "input_schema": {
            "type": "object",
            "properties": {
                "expression": {
                    "type": "string",
                    "description": "A valid Python mathematical expression, e.g., '256 * 409' or '100 / 4'"
                }
            },
            "required": ["expression"]
        }
    }
]
Why are the descriptions so detailed?
Because LLMs rely heavily on semantic context. The model decides when to call tools based on the prompt and tool descriptions. The clearer your instructions, the smarter your agent will be.
It is also worth understanding what the model actually sees. The model never sees your Python implementation — only the schema. That schema is effectively your API contract with the model. If the schema is ambiguous or incomplete, the model will make poor decisions about when and how to call the tool, regardless of how well your underlying function is written.
One practical consequence of this: small changes in tool descriptions or prompts can significantly affect whether the model chooses to call a tool at all. If your agent seems to be ignoring a tool or calling the wrong one, the schema description is the first place to look.
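As a concrete illustration, compare a vague schema with an explicit one for the same underlying function. The wording here is invented for the example, not taken from Anthropic's documentation:

```python
# Same function, two schemas. The model only ever sees the schema,
# so the description is doing all of the routing work.
vague = {
    "name": "calculate_math",
    "description": "Does math",
    "input_schema": {
        "type": "object",
        "properties": {"expression": {"type": "string"}},
        "required": ["expression"],
    },
}

explicit = {
    **vague,
    "description": (
        "Evaluate an arithmetic expression such as '256 * 409'. "
        "Use this tool for ALL numeric calculations instead of "
        "computing the answer directly."
    ),
}

print(vague["description"])
print(explicit["description"])
```

With the vague version, the model has to guess whether "math" covers a word problem, a unit conversion, or a date calculation; the explicit version tells it when to reach for the tool.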
Building the Agentic Loop
Now comes the exciting part: building the “brain” of our agent. We need a function that manages the conversation history, checks if Claude wants to use a tool, runs the tool, and loops back.
One important thing to keep in mind before we dive in: tool usage is not deterministic. The same input may not always trigger the same tool call. The model is making a probabilistic decision each time based on the prompt, the tool descriptions, and context. This is expected behaviour, not a bug.
Let’s break down the logic of our run_agent function step-by-step.
The Message Array
In Anthropic’s Messages API, the conversation is an array of dictionaries representing the back-and-forth between the user and the assistant. We need to maintain this list so Claude remembers the context of the conversation.
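To make that concrete, here is a sketch of what the history looks like after one tool round trip. In the real SDK the assistant content arrives as content-block objects rather than plain dicts, and the tool-use id is generated by the API — the literals below (including "toolu_example") are illustrative stand-ins that only show the roles and ordering:

```python
messages = [
    # Turn 1: the user's question
    {"role": "user", "content": "What is 88 * 42?"},
    # Turn 2: the assistant asks to run a tool instead of answering
    {"role": "assistant", "content": [
        {"type": "tool_use", "id": "toolu_example", "name": "calculate_math",
         "input": {"expression": "88 * 42"}},
    ]},
    # Turn 3: our application replies with the tool's output,
    # echoing the same id so the result can be matched to the request
    {"role": "user", "content": [
        {"type": "tool_result", "tool_use_id": "toolu_example", "content": "3696"},
    ]},
]

print([m["role"] for m in messages])  # ['user', 'assistant', 'user']
```

Note that the tool result goes back in a user-role message: from the API's perspective, your application speaks on the user side of the conversation.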
Handling Tool Use Blocks
When Claude decides to use a tool, it returns a stop_reason of "tool_use". Its response content will contain a special block outlining the name of the tool and the inputs.
Note: The snippet below is a partial walkthrough to explain the logic step-by-step. A complete, runnable version is assembled at the end of the article.
def run_agent(user_query):
    print(f"\n--- New Request: {user_query} ---")

    # Initialize the conversation history
    messages = [
        {"role": "user", "content": user_query}
    ]

    # We use a bounded loop because the agent might need to take multiple steps.
    # MAX_ITERATIONS prevents an infinite loop if the model misbehaves.
    MAX_ITERATIONS = 10
    for _ in range(MAX_ITERATIONS):
        print("\n[Agent is thinking...]")

        # 1. Call Claude with the current messages and our tools
        response = client.messages.create(
            model=MODEL_NAME,
            max_tokens=1024,
            messages=messages,
            tools=agent_tools
        )

        # 2. Append Claude's response to the conversation history.
        # It's crucial to append the raw assistant message so Claude remembers its own tool calls.
        messages.append({"role": "assistant", "content": response.content})

        # 3. Check WHY Claude stopped generating text
        if response.stop_reason != "tool_use":
            # If it didn't use a tool, it is giving us the final answer!
            # Extract the text and break the loop.
            final_text = next(
                (block.text for block in response.content if block.type == "text"),
                "No text output."
            )
            print(f"\nFinal Answer: {final_text}")
            break

        # 4. If the stop_reason IS "tool_use", we need to execute the tool(s).
        # When multiple tools are requested, Claude returns multiple tool_use blocks
        # in a single response — we iterate over all of them before replying.
        tool_results = []
        for block in response.content:
            if block.type == "tool_use":
                tool_name = block.name
                tool_inputs = block.input
                tool_use_id = block.id

                print(f"[Action Needed] Claude wants to use: {tool_name} with inputs: {tool_inputs}")

                # In production, wrap tool execution in validation and error handling
                # to prevent malformed inputs or runtime failures from breaking the loop.
                try:
                    if tool_name == "get_weather":
                        result = get_weather(tool_inputs["location"])
                    elif tool_name == "calculate_math":
                        result = calculate_math(tool_inputs["expression"])
                    else:
                        result = f"Error: Unknown tool {tool_name}"
                except Exception as e:
                    result = f"Error executing tool {tool_name}: {str(e)}"

                print(f"[Tool Result] The system returned: {result}")

                # Format the result exactly how Anthropic expects it
                tool_results.append({
                    "type": "tool_result",
                    "tool_use_id": tool_use_id,
                    "content": result
                })

        # 5. Send the tool results back to Claude.
        # We append a new "user" message containing the tool results.
        messages.append({
            "role": "user",
            "content": tool_results
        })
        # The loop will now restart, and Claude will read the tool results!
Why Do We Append to the Messages List?
If you don’t save the conversation history and pass it back, Claude will suffer from amnesia. It needs to see:
- The original question.
- The fact that it asked to use a tool.
- The result of that tool, which your application sends back in a new user message as a tool_result block.
Assembling the Final Application
At this point, each piece is simple — but the power comes from how they work together. The tool functions handle execution, the schemas tell the model what’s available, and the loop ties them into a system that can reason across multiple steps. Let’s put all the pieces together into a clean, running script. You can replace the content of your agent.py file with the final code block below.
import os
import ast
import operator
from dotenv import load_dotenv
from anthropic import Anthropic
load_dotenv()
client = Anthropic()
MODEL_NAME = "claude-sonnet-4-6"
# --- 1. Python Tool Implementations ---
def get_weather(location):
    location = location.lower()
    if "nairobi" in location:
        return "25°C and sunny"
    elif "london" in location:
        return "12°C and rainy"
    elif "tokyo" in location:
        return "18°C and cloudy"
    else:
        return "Weather data unavailable for this location."
def calculate_math(expression):
    """Safely evaluate a mathematical expression using AST parsing."""
    ALLOWED_OPS = {
        ast.Add: operator.add,
        ast.Sub: operator.sub,
        ast.Mult: operator.mul,
        ast.Div: operator.truediv,
        ast.Pow: operator.pow,
        ast.Mod: operator.mod,
        ast.FloorDiv: operator.floordiv,
        ast.USub: operator.neg,
    }

    def _eval(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        elif isinstance(node, ast.BinOp):
            op = ALLOWED_OPS.get(type(node.op))
            if op is None:
                raise ValueError(f"Unsupported operator: {type(node.op).__name__}")
            return op(_eval(node.left), _eval(node.right))
        elif isinstance(node, ast.UnaryOp):
            op = ALLOWED_OPS.get(type(node.op))
            if op is None:
                raise ValueError(f"Unsupported operator: {type(node.op).__name__}")
            return op(_eval(node.operand))
        else:
            raise ValueError(f"Unsupported expression: {type(node).__name__}")

    try:
        tree = ast.parse(expression, mode="eval")
        result = _eval(tree.body)
        return str(result)
    except Exception as e:
        return f"Error evaluating expression: {str(e)}"
# --- 2. Tool JSON Schemas ---
agent_tools = [
    {
        "name": "get_weather",
        "description": "Get the current weather for a specific city.",
        "input_schema": {
            "type": "object",
            "properties": {
                "location": {"type": "string", "description": "City name"}
            },
            "required": ["location"]
        }
    },
    {
        "name": "calculate_math",
        "description": "Evaluate a mathematical expression. Use for all math queries.",
        "input_schema": {
            "type": "object",
            "properties": {
                "expression": {"type": "string", "description": "Python math expression"}
            },
            "required": ["expression"]
        }
    }
]
# --- 3. The Agent Loop ---
def run_agent(user_query):
    print("\n==============================================")
    print(f"USER QUERY: {user_query}")
    print("==============================================")

    messages = [{"role": "user", "content": user_query}]
    MAX_ITERATIONS = 10

    for _ in range(MAX_ITERATIONS):
        response = client.messages.create(
            model=MODEL_NAME,
            max_tokens=1024,
            messages=messages,
            tools=agent_tools
        )

        messages.append({"role": "assistant", "content": response.content})

        if response.stop_reason != "tool_use":
            final_text = next((block.text for block in response.content if block.type == "text"), "")
            print(f"\n🤖 FINAL ANSWER:\n{final_text}")
            break

        tool_results = []
        for block in response.content:
            if block.type == "tool_use":
                print(f"⚙️ EXECUTING TOOL: {block.name} | Inputs: {block.input}")

                try:
                    if block.name == "get_weather":
                        result = get_weather(block.input["location"])
                    elif block.name == "calculate_math":
                        result = calculate_math(block.input["expression"])
                    else:
                        result = f"Unknown tool: {block.name}"
                except Exception as e:
                    result = f"Error executing tool {block.name}: {str(e)}"

                print(f"✅ TOOL RESULT: {result}")
                tool_results.append({
                    "type": "tool_result",
                    "tool_use_id": block.id,
                    "content": result
                })

        messages.append({"role": "user", "content": tool_results})
# --- 4. Testing the Agent ---
if __name__ == "__main__":
    # Test 1: A simple conversational query (no tools needed)
    run_agent("Hi, my name is Joyce. How are you?")

    # Test 2: A query requiring one tool
    run_agent("What is the weather like in Tokyo right now?")

    # Test 3: A complex query requiring multiple tool calls in sequence
    run_agent("What is the weather in Nairobi? Also, if I multiply 88 by 42, what is the result?")
Analyzing the Output
When Test 3 is run, you will see output in your terminal that looks something like this:
==============================================
USER QUERY: What is the weather in Nairobi? Also, if I multiply 88 by 42, what is the result?
==============================================
⚙️ EXECUTING TOOL: get_weather | Inputs: {'location': 'Nairobi'}
✅ TOOL RESULT: 25°C and sunny
⚙️ EXECUTING TOOL: calculate_math | Inputs: {'expression': '88 * 42'}
✅ TOOL RESULT: 3696
🤖 FINAL ANSWER:
The current weather in Nairobi is 25°C and sunny.
As for your math question, if you multiply 88 by 42, the result is 3,696.
Notice how Claude recognized it needed both tools, formulated the inputs perfectly, paused for your script to evaluate them, and then synthesized the two distinct pieces of data into a smooth, conversational answer. This is the magic of agentic behavior.
Common Pitfalls for Beginners
- Forgetting to append the assistant’s tool-use message: If you only append the user’s tool_result and forget to save Claude’s original tool request (response.content), the API will throw an error. The conversation must follow a logical user -> assistant -> user order.
- Poor Tool Descriptions: If your agent is calling the wrong tool, the issue is almost always the description in the JSON schema. Be explicit. Instead of “Does math”, write “Evaluates numerical equations and arithmetic; strictly use this for numbers”.
- Data Privacy: Be mindful of what you pass to third-party tools or external databases. Even with the AST-based approach used here, never blindly trust user input — validate inputs before passing them to any function that interacts with real systems.
- This agent is not production-ready: The script above is intentionally minimal. Production agents typically add structured logging, retry logic on API failures, and guardrails that validate tool inputs and outputs before they re-enter the loop. Treat what we have built as a foundation, not a finished system.
Conclusion and Next Steps
We have successfully built an AI agent using Python and the Anthropic SDK. It is a simple agent, and not very useful in the real world, but it was enough to demonstrate the fundamentals of agentic workflows. We moved beyond simple text generation and created a system that can “think” about a problem, call Python functions, and observe the results before formulating an answer.
The iteration-bounded loop we built is the foundational architecture of almost all AI agent systems.
Where to go from here?
- Connect Real APIs: Replace the mock weather function with the OpenWeather API or the GitHub API to fetch real live data.
- Retrieval-Augmented Generation (RAG): Create a tool that searches through a local PDF or database to answer questions about proprietary documents.
- State Management: Right now, the agent resets after every run_agent() call because messages is initialized fresh inside the function. To make the agent remember previous turns, lift messages out of the function into a shared scope and pass it in as a parameter, then wrap your calls in a command-line loop (while True: user_input = input("You: ")).
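A minimal sketch of that last idea, with the agent's API and tool logic replaced by a stub: run_turn below is invented for this sketch (substitute the body of the agent loop from the article), and the hard-coded input list stands in for the interactive while True loop so the example runs end to end:

```python
def run_turn(messages):
    # Stub: in the real agent this is the Claude call + tool loop,
    # which appends assistant (and tool-result) messages to `messages`.
    reply = f"(stub reply to: {messages[-1]['content']})"
    messages.append({"role": "assistant", "content": reply})
    return reply

messages = []  # shared, persistent history — lives OUTSIDE the function
for user_input in ["Hi, my name is Joyce.", "What is my name?"]:
    messages.append({"role": "user", "content": user_input})
    print("Claude:", run_turn(messages))

print(len(messages))  # 4 — both turns are retained in the history
```

Because messages is created once and mutated on every turn, the second question is answered with the first turn still in context — which is exactly what the per-call initialization inside run_agent prevents.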
The shift from chatbots to agents is one of the most exciting paradigms in software development. We now have the tools to build systems that don’t just talk, but act.