Egyptian AI Lens: Architecture and Design of an LLM-Powered Art Analysis System

Introduction

In 2023 I visited Egypt, and the highlight of my trip was by far the tombs in the Valley of the Kings. I revisited them twice during the same trip, mesmerized by the intricate paintings and carvings of gods and pharaohs, priests and commoners.

Back then, I remember thinking that these images presented fascinating challenges for applying computer vision. I would have loved to have an app capable of identifying the characters in each illustration, pointing out interesting details, and even live-translating hieroglyphs.

Fast-forward two years, and the advent of powerful multimodal foundation models like Gemini has made it fast and easy to prototype solutions for these kinds of tasks.

In this blog post, I present Egyptian AI Lens, a web application that leverages Google’s Gemini model to analyze ancient Egyptian art. Users can upload images of tomb paintings, temple reliefs, or hieroglyphic inscriptions to receive detailed analyses, including character identification, historical context, and location insights.

Demo: Identification of characters in a carving

I designed this as a simple yet well-rounded project, encompassing model selection, performance evaluation, and full deployment via AWS Lambda and a web front end.

🏺 Try it live: Egyptian AI Lens

System Architecture Overview

The Egyptian AI Lens follows a clean frontend-backend separation architecture, designed for scalability and maintainability:

┌─────────────────┐    ┌──────────────────┐    ┌─────────────────┐
│   Next.js       │    │   AWS Lambda     │    │   Google        │
│   Frontend      │◄──►│   + API Gateway  │◄──►│   Gemini API    │─► WWW
│   (TypeScript)  │    │   (Python/Docker)│    │   (Vision)      │
└─────────────────┘    └──────────────────┘    └─────────────────┘

Note: if grounding is enabled, Gemini uses Google Search to ground its responses.

Frontend: Next.js with TypeScript

The Next.js + TypeScript frontend has been kept deliberately simple, since the focus of this project is the backend. Its role is to accept user-uploaded images along with the desired settings, forward the request to the AWS Lambda backend, and render the response.

Backend: AWS Lambda with Docker

The backend runs as a Dockerized AWS Lambda function with API Gateway integration, providing:

  • Serverless infrastructure with automatic scaling
  • Docker container deployment for consistent runtime environment
  • API Gateway integration with request throttling and API key authentication
  • CloudWatch and Gemini monitoring for logging and performance tracking

The Lambda code is written in Python. It dynamically crafts a prompt into which both the image and the user settings are injected, and sends it to Gemini. More details below.
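
As a rough illustration, the handler glues these pieces together. This is a simplified sketch: `call_gemini` and the request field names are illustrative, not the exact implementation.

import base64
import json

# Simplified sketch of the Lambda entry point (field names are assumptions)
def lambda_handler(event, context):
    body = json.loads(event["body"])
    image_bytes = base64.b64decode(body["image"])   # base64-encoded upload
    speed = body.get("speed", "fast")               # user-selected speed tier
    hint = body.get("image_type_hint", "unknown")   # tomb / temple / other

    model = get_model_by_speed(speed)               # defined below
    prompt = create_egyptian_art_prompt(hint)       # defined below
    analysis = call_gemini(model, prompt, image_bytes)  # hypothetical helper

    return {"statusCode": 200, "body": json.dumps(analysis)}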

Gemini API Integration

Model Selection Strategy

One of the key architectural decisions was implementing dynamic model selection based on user preferences:

def get_model_by_speed(speed: str) -> str:
    model_mapping = {
        'regular': 'gemini-2.5-pro',        # Most thorough
        'fast': 'gemini-2.5-flash',         # Balanced (default)
        'super-fast': 'gemini-2.5-flash-lite'  # Fastest
    }
    return model_mapping.get(speed, 'gemini-2.5-flash')

This approach allows users to trade off analysis quality against response time:

  • Regular (~15-30s): Most detailed analysis using the flagship model
  • Fast (~5-10s): Balanced performance, recommended for most users
  • Super Fast (~2-5s): Quick analysis for instant feedback

Prompt Engineering with Context Injection

The system uses context-aware prompting by injecting user-provided hints about the image type:

def create_egyptian_art_prompt(image_type_hint: str) -> str:
    base_prompt = """Analyze this ancient Egyptian art image and provide detailed analysis..."""

    if image_type_hint != 'unknown':
        context_hints = {
            'tomb': "This appears to be from a tomb or burial site...",
            'temple': "This appears to be from a temple complex...",
            'other': "This appears to be other Egyptian artwork..."
        }
        # .get avoids a KeyError on unexpected hint values
        hint = context_hints.get(image_type_hint)
        if hint:
            base_prompt += f"\n\nContext hint: {hint}"

    return base_prompt

This contextual prompting significantly improves accuracy by:

  • Focusing the model's attention on relevant historical periods
  • Reducing hallucinations through targeted context
  • Improving character identification with location-specific knowledge

Structured Output Implementation

A critical piece of this project is the use of response schemas to constrain Gemini's output. At my current company, which uses Gemini at scale, I reduced the operational error rate from 4% to under 0.5% by adopting structured output via response schemas. It strongly constrains the model to respond in the expected format.

from pydantic import BaseModel
from typing import List
 
class Character(BaseModel):
    character_name: str
    reasoning: str  
    description: str
    location: str
 
class EgyptianArtAnalysis(BaseModel):
    ancient_text_translation: str
    characters: List[Character]
    location_guess: str
    interesting_detail: str
    historical_date: str
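
For reference, here is roughly how such a schema can be passed to Gemini with the google-genai Python SDK. This is a minimal sketch, with client setup and image handling simplified:

from google import genai
from google.genai import types

client = genai.Client()  # picks up GOOGLE_API_KEY from the environment

def analyze_image(model: str, prompt: str, image_bytes: bytes) -> EgyptianArtAnalysis:
    response = client.models.generate_content(
        model=model,
        contents=[
            types.Part.from_bytes(data=image_bytes, mime_type="image/jpeg"),
            prompt,
        ],
        config=types.GenerateContentConfig(
            response_mime_type="application/json",
            response_schema=EgyptianArtAnalysis,  # Pydantic model constrains the output
        ),
    )
    return response.parsed  # already validated as an EgyptianArtAnalysis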

This approach provides several key benefits:

  1. Consistent Response Format: Eliminates parsing errors from inconsistent JSON
  2. Reduced Hallucinations: Structured fields guide the model's responses
  3. Type Safety: Automatic validation of response data types

Grounding

Grounding allows Gemini to search the web in order to craft a better response, and it's one of the most powerful ways to augment a foundation model. Gemini does not (yet) support structured output and grounding at the same time (see discussion), which is IMO a critical feature that should be available.

To support grounding as an optional feature, I implemented a second strategy: instead of passing the response schema as a parameter, the desired schema is injected directly into the prompt.
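
A minimal sketch of that strategy, assuming the `client` from the earlier snippet and a hypothetical `extract_json` helper that pulls the JSON object out of the raw reply:

import json

def analyze_with_grounding(model: str, prompt: str, image_bytes: bytes) -> dict:
    # The schema cannot be passed as a parameter alongside grounding,
    # so we append it to the prompt instead
    schema = EgyptianArtAnalysis.model_json_schema()
    grounded_prompt = (
        prompt
        + "\n\nRespond ONLY with a JSON object matching this schema:\n"
        + json.dumps(schema, indent=2)
    )
    response = client.models.generate_content(
        model=model,
        contents=[types.Part.from_bytes(data=image_bytes, mime_type="image/jpeg"),
                  grounded_prompt],
        config=types.GenerateContentConfig(
            tools=[types.Tool(google_search=types.GoogleSearch())],  # enable grounding
        ),
    )
    return extract_json(response.text)  # hypothetical helper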

Grounding can be enabled using a checkbox in the UI.

Retry Logic and Error Handling

Production systems require robust error handling. The Egyptian AI Lens implements exponential backoff retry logic:

import asyncio

async def analyze_with_retry(image_data: str, max_retries: int = 2):
    for attempt in range(max_retries + 1):
        try:
            result = await call_gemini_api(image_data)
            return result
        except Exception as e:
            if attempt < max_retries and is_retryable_error(e):
                wait_time = (2 ** attempt) * 1.0  # Exponential backoff: 1s, 2s, ...
                await asyncio.sleep(wait_time)
                continue
            raise e

This handles:

  • 5xx server errors from the Gemini API
  • Rate limiting during high-traffic periods
  • Temporary network issues
  • Service unavailability
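
For completeness, `is_retryable_error` might look something like this (a sketch, assuming errors surface as `google.genai` `APIError` instances carrying an HTTP status code):

from google.genai import errors

RETRYABLE_STATUS_CODES = {429, 500, 502, 503, 504}

def is_retryable_error(e: Exception) -> bool:
    # Retry on rate limits and server-side failures, not on client errors
    if isinstance(e, errors.APIError):
        return e.code in RETRYABLE_STATUS_CODES
    # Treat generic connectivity problems as retryable too
    return isinstance(e, (ConnectionError, TimeoutError))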

User Settings

To demonstrate how model settings can be picked dynamically, I added the following features (see the payload sketch after this list):

  • Dynamically pick the model (Pro, Flash, or Flash Lite) based on how long the user is willing to wait
  • Dynamically inject user-provided hints about the image location (tomb, temple, etc.)
  • Dynamically enable or disable grounding based on user preference
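
Together, these settings travel in the request body. An illustrative payload (field names are assumptions, mirroring the handler sketch earlier):

payload = {
    "image": "<base64-encoded image>",
    "speed": "fast",            # regular | fast | super-fast
    "image_type_hint": "tomb",  # tomb | temple | other | unknown
    "grounding": True,          # enable Google Search grounding
}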

Hosting and Deployment

Current Architecture: AWS Lambda with API Gateway

The current deployment leverages AWS Lambda with Docker containerization and API Gateway for request routing:

Architecture Components:

  1. Frontend (Vercel): Next.js static site hosted on Vercel
  2. API Gateway: AWS API Gateway handles CORS, request throttling, and API key authentication
  3. Lambda Function: Dockerized Python application running in AWS Lambda
  4. Google Gemini API: External AI service for image analysis

Deployment Process:

The Lambda function is deployed using a Docker container:

# Build and push Docker image to ECR
docker build -t egyptian-art-analyzer .
docker tag egyptian-art-analyzer:latest <ecr-repo>:latest
docker push <ecr-repo>:latest
 
# Create the Lambda function with the container image
aws lambda create-function \
  --function-name egyptianArtAnalyzer \
  --package-type Image \
  --code ImageUri=<ecr-repo>:latest \
  --role <execution-role>

# Subsequent deploys: point the function at the new image
aws lambda update-function-code \
  --function-name egyptianArtAnalyzer \
  --image-uri <ecr-repo>:latest

A Lambda was perfectly suited for my deployment needs:

  1. This is a personal project, so it will receive little traffic. Lambdas only run on demand.
  2. I don't want to spend much money on a project that generates no revenue. Lambdas are cheap, and you pay only for what you use.
  3. There is no need for heavy compute or long-running calculations. Lambdas are well suited to small tasks like this.

I wrote a full step-by-step guide on how to deploy a small model in a Lambda here.

Model Selection:

I annotated a small dataset of 30 images, documenting the names of the most prominent characters that appear in each image, as well as the period (Old, Middle, or New Kingdom).

Here is an example of the annotated set:

Example annotation: characters: Tutankhamun, Osiris; period: New Kingdom

I calculated stats on how good each model was at predicting the character names and the period across this test set.

  • The period prediction is a standard multi-class classification problem, so we can use accuracy as the metric.
  • Comparing the lists of identified characters is a little more complex, since the labels are variable-length lists of strings. Though I first tried straightforward string comparison to measure name matches, I settled on a more flexible LLM-as-a-Judge approach (sketched below). In this setup, Ra-horakhty is a match for Ra; and even Pharaoh is considered a match for Ramses II.
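
The judge itself can be as simple as a yes/no prompt per name pair. A hypothetical sketch, reusing the `client` from the earlier snippets:

def judge_character_match(predicted: str, annotated: str) -> bool:
    prompt = (
        "Do these two names refer to the same ancient Egyptian figure? "
        f"Answer YES or NO.\n\nName 1: {predicted}\nName 2: {annotated}"
    )
    response = client.models.generate_content(
        model="gemini-2.5-flash",
        contents=prompt,
    )
    return response.text.strip().upper().startswith("YES")

Results across the test set: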
| Variant | Period Accuracy | Character Partial Match | Character IoU |
|---|---|---|---|
| 2.5 Flash Lite | 81.82% | 63.64% | n/a |
| 2.5 Flash Lite w/ Grounding | 80.00% | 40.00% | n/a |
| 2.5 Flash | 81.82% | 86.00% | n/a |
| 2.5 Flash w/ Grounding | 81.82% | 54.55% | n/a |
| 2.5 Pro | 81.82% | 86.00% | n/a |
| 2.5 Pro w/ Grounding | 81.82% | 63.64% | n/a |

Take these metrics with a pinch of salt: they are only meant to give a high-level sense of how these models perform, and most of the output fields are not annotated.

The character-match evaluation is particularly incomplete, since the stats just compare literal strings for equality (ignoring case differences).

Current system performance metrics:

| Model Speed | Avg Response Time | Lambda Cold Start | Cost per Request |
|---|---|---|---|
| Regular | 15-30s | ~2-3s | ~$0.02 |
| Fast | 5-10s | ~2-3s | ~$0.008 |
| Super Fast | 2-5s | ~2-3s | ~$0.003 |

Environment Configuration:

  • Lambda Environment Variables: GOOGLE_API_KEY stored securely in Lambda configuration
  • Frontend Environment Variables: NEXT_PUBLIC_LAMBDA_API_URL and NEXT_PUBLIC_API_KEY for API Gateway endpoint and authentication
  • API Gateway: Configured with usage plans and throttling limits

Technical Lessons Learned

1. Structured Outputs Are Game-Changing

The single most impactful technical decision was implementing Pydantic-enforced structured outputs. This eliminated ~90% of parsing errors and dramatically improved response quality.

2. Context Injection Improves Accuracy

Allowing users to provide image type hints (tomb, temple, other) significantly improved model accuracy by focusing attention on relevant historical contexts.

3. Model Speed Options Enhance UX

Offering multiple speed tiers provides users control over the speed-accuracy tradeoff, essential for interactive applications.

4. Error Handling is Critical

Robust retry logic and error recovery transform a demo into a production-ready system. The Gemini API can be temperamental, making retry logic essential.

Conclusion

The Egyptian AI Lens demonstrates how modern LLM APIs can be effectively integrated into production web applications. Key architectural decisions include:

  • Clean frontend-backend separation for maintainability
  • AWS Lambda + API Gateway for scalable serverless infrastructure
  • Docker containerization for consistent runtime environments
  • Structured outputs to reduce hallucinations
  • Dynamic model selection for user control
  • Context injection for improved accuracy
  • Robust error handling for production reliability
  • API Gateway authentication with usage plans for security

The current AWS Lambda architecture provides production-ready infrastructure with:

  • Automatic scaling and pay-per-use pricing
  • CloudWatch monitoring for observability
  • API Gateway for request management and security
  • Docker deployment for consistent runtime environments

This architecture scales from personal projects to production workloads while maintaining cost efficiency and operational simplicity.

Try the system: Egyptian AI Lens
Source code: Available in the Lambda repository

The intersection of computer vision, large language models, and archaeology opens fascinating possibilities for making historical knowledge more accessible. This project serves as a blueprint for similar applications across other domains.