Building AI agents
A whole infrastructure of my personal assistants
⌛ ~50 min · 🤓 Intermediate
02.08.2025
#171


This post is part of the AI web agents educational series from my free course. Keep in mind that the correct sequence of posts is outlined on the course page; in Research, the order can be arbitrary.

I'm also happy to announce that I've started working on standalone paid courses, so you can support my work and get affordable educational material. These courses will be of a completely different quality, with more theoretical depth and a narrower niche focus, and will feature challenging projects, quizzes, exercises, video lectures, and supplementary materials. Stay tuned!


The previous post, "AI web agents", covered the theoretical foundations and design principles of intelligent agents operating in web environments. That article focused on terminology and core concepts.

This time, let's build something useful. I have many tasks to automate that aren't well suited to explicit, hand-written instructions.


"Do I need AI here?"

Here's an immediate use case: automating interactions with people who text me. I receive a lot of messages that need to be reviewed, pondered, and replied to. Instead, I can:

  1. Consolidate all message sources into one so I can quickly review them one after another.
  2. Use an LLM to selectively respond to those requiring my involvement but not my creativity.
  3. Pull in information from search queries and websites that contain the necessary, regularly updated content for such responses.

The last point is where web agents may come into play.
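The three steps above can be sketched as a minimal pipeline. Everything here is a placeholder I made up for illustration (the `IncomingMessage` class, the question-mark triage rule), not the real system:

```python
from dataclasses import dataclass

@dataclass
class IncomingMessage:
    platform: str
    sender: str
    content: str

def consolidate(sources: dict[str, list[str]]) -> list[IncomingMessage]:
    """Flatten messages from several platforms into one review queue."""
    queue = []
    for platform, texts in sources.items():
        for text in texts:
            # sender detection is platform-specific and omitted here
            queue.append(IncomingMessage(platform, "unknown", text))
    return queue

def needs_involvement(msg: IncomingMessage) -> bool:
    """Placeholder triage rule: only messages with a question need a reply."""
    return "?" in msg.content

sources = {
    "email": ["Can you share the project status?"],
    "telegram": ["Nice post!", "Where can I find the course page?"],
}
queue = consolidate(sources)
to_answer = [m for m in queue if needs_involvement(m)]
print(len(queue), len(to_answer))
```

In the real system, the triage step is where the LLM comes in; the hard-coded rule above just marks where that decision happens.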

What if I move from the familiar retrieval-augmented generation (RAG) setup to an agent that browses web pages, collects information, and uses it in responses?

RAG implies that we update the internal database (information about me, my projects, current work, etc.) periodically, either manually or by explicitly scraping all the necessary web pages.

Assuming that we need non-primitive generations (i.e., not clichéd one- or two-sentence responses), the question is: which is more resource-efficient — explicit web scraping or a web agent?

I won't keep you in suspense: the main factor here is the LLM context window. Scraping would require additional steps to clean unnecessary data out of everything collected. Although the web agent consumes tokens while browsing, it makes collection a very precise operation: it accesses only the sources it needs and, moreover, lets us discover data that we couldn't explicitly specify as a scraping target. The difference may not be significant with small amounts of data, but it becomes noticeable as the context grows.

We can go further: a web agent is capable of accessing knowledge that isn't part of its training data. In other words, if a person asks me a question, the agent can consult Google to generate a more concise answer.
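The configuration shown later (`GOOGLE_SEARCH_API_KEY`, `GOOGLE_SEARCH_ENGINE_ID`) suggests Google's Custom Search JSON API as the lookup backend. A sketch of building such a request, with placeholder credentials; the real agent would load them from settings and fetch the URL with an HTTP client:

```python
from urllib.parse import urlencode

# Google Custom Search JSON API endpoint
SEARCH_ENDPOINT = "https://www.googleapis.com/customsearch/v1"

def build_search_url(query: str, api_key: str, engine_id: str, num: int = 3) -> str:
    """Build the GET request URL; a real call would be requests.get(url).json()."""
    params = {"key": api_key, "cx": engine_id, "q": query, "num": num}
    return f"{SEARCH_ENDPOINT}?{urlencode(params)}"

url = build_search_url("latest post on my blog", "API_KEY", "ENGINE_ID")
print(url)
```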

What if I also tell you that the agent can perform any online transaction on my behalf, as long as I give my consent? I'll be able to stop spending time on numerous websites; instead, I'll simply ask my own assistant to handle everything for me.

Here, we'll start by developing something simple, but eventually I'd like to design a whole infrastructure of personal AI agents. Each one will have its own set of responsibilities, and they'll all be connected to each other.

We can already do a great deal with this technology, and we'll be able to do even more in the future. This is just the beginning of a new era of automation.


Answering agent

Let's build a prototype starting from the central orchestrator.


import logging

from sqlalchemy.orm import Session

logger = logging.getLogger(__name__)

# Message (the SQLAlchemy model) and AIAgent are defined elsewhere in the project
class MessageManager:
    def __init__(self, db_session: Session):
        self.db = db_session
        self.ai_agent = AIAgent(db_session)
      
    def add_message(self, platform: str, sender: str, content: str) -> Message:
        try:
            # detect message type automatically
            message_type = self._detect_message_type(content, sender, platform)
            message = Message(
                platform=platform,
                sender=sender,
                content=content,
                status="pending",
                message_type=message_type
            )
            self.db.add(message)
            self.db.commit()
            return message
        except Exception as e:
            self.db.rollback()
            logger.error(f"Error adding message: {e}")
            raise
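
`add_message` calls `_detect_message_type`, which isn't shown here. One plausible keyword-based implementation, as a standalone sketch: the pattern table mirrors the `type_patterns` dictionary configured on the agent, and the `'general'` fallback label is my assumption.

```python
import re

# per-type regex alternations; first match wins, 'general' is the fallback
TYPE_PATTERNS = {
    "business": r"(meeting|call|schedule|proposal|contract)",
    "personal": r"(friend|family|personal|hobby|weekend)",
    "support": r"(help|support|issue|problem|error)",
}

def detect_message_type(content: str) -> str:
    """Return the first message type whose pattern matches, else 'general'."""
    text = content.lower()
    for message_type, pattern in TYPE_PATTERNS.items():
        if re.search(pattern, text):
            return message_type
    return "general"

print(detect_message_type("Can we schedule a call tomorrow?"))  # business
```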

The agent itself is, for now, a kind of enhanced OpenAI integration:


from typing import Optional

class AIAgent:
    def __init__(self, db_session: Optional[Session] = None):
        self.model = settings.OPENAI_MODEL
        self.max_length = settings.MAX_RESPONSE_LENGTH
        self.style = settings.RESPONSE_STYLE
        
        # web search configuration
        self.google_api_key = settings.GOOGLE_SEARCH_API_KEY
        self.google_engine_id = settings.GOOGLE_SEARCH_ENGINE_ID
        self.personal_website = settings.PERSONAL_WEBSITE
        self.github_profile = settings.GITHUB_PROFILE
        
        # message type patterns for classification
        self.type_patterns = {
            'business': {
                'keywords': ['meeting', 'call', 'schedule', 'proposal', 'contract'],
                'patterns': [r'(meeting|call|schedule)', r'(proposal|contract)']
            },
            'personal': {
                'keywords': ['friend', 'family', 'personal', 'hobby', 'weekend'],
                'patterns': [r'(friend|family|personal)', r'(hobby|weekend)']
            },
            'support': {
                'keywords': ['help', 'support', 'issue', 'problem', 'error'],
                'patterns': [r'(help|support|assistance)', r'(issue|problem)']
            }
        }
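
The response-generation side isn't shown yet, so here's a sketch of how the agent might assemble a prompt from these settings. `build_reply_prompt` and its defaults are hypothetical, and the commented-out OpenAI call only illustrates where the prompt would go:

```python
def build_reply_prompt(message: str, message_type: str,
                       style: str = "friendly", max_length: int = 300) -> str:
    """Assemble the system prompt the agent could send to the chat model."""
    return (
        f"You are a personal assistant answering on my behalf.\n"
        f"Message type: {message_type}. Respond in a {style} tone, "
        f"at most {max_length} characters.\n\n"
        f"Incoming message:\n{message}"
    )

prompt = build_reply_prompt("Where can I read your latest post?", "personal")
# The actual call would look roughly like:
# from openai import OpenAI
# client = OpenAI()
# reply = client.chat.completions.create(
#     model=settings.OPENAI_MODEL,
#     messages=[{"role": "system", "content": prompt}],
# ).choices[0].message.content
print(prompt.splitlines()[0])
```

Keeping prompt assembly as a pure function makes it easy to test without hitting the API.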

...
