BhashyamAI Project Skills & Development Guidelines

This document outlines the operational patterns and developer skills for the BhashyamAI Research Portal.

πŸ—οΈ Architecture & Stack

  • Backend: FastAPI (Python). Serves the static production frontend and handles graph and vector search.
  • Frontend: React (TypeScript/Vite/Tailwind).
  • Database: ArcadeDB (Graph/FTS) and ChromaDB (Vector).
  • Schema: See DB_DESIGN.md for the authoritative graph structure and canonical metadata schema.

πŸ” Search & Research Logic

1. Hybrid Search Paradigm

  • Precision FTS: Full-text search (FTS) is combined with vector search to improve relevance.
  • Node-Based Graph Search: Prioritize specialized traversal tools for high-efficiency entity discovery:
    • search_by_topic, search_by_character, search_by_author, search_by_location.
  • Smart Disambiguation: Use check_entity_type(name) to verify entity types (Character vs. Topic) before executing a search.
  • Robust Randomization: Set is_random=True in search tools to offload sampling to the database.
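
The disambiguate-then-search pattern above can be sketched as follows. This is a minimal illustration, not the project's implementation: the tool names come from this document, but their signatures, return values, and the in-memory entity table are assumptions made so the example runs on its own.

```python
from typing import Callable, Dict, List

# ASSUMPTION: illustrative entity table; the real tool queries the graph.
_KNOWN_ENTITIES = {"Arjuna": "Character", "Dharma": "Topic"}


def check_entity_type(name: str) -> str:
    """Stub: return the entity type for a name, or 'Unknown'."""
    return _KNOWN_ENTITIES.get(name, "Unknown")


def search_by_character(name: str, is_random: bool = False) -> List[str]:
    """Stub: the real tool would traverse the graph from a Character node."""
    return [f"character:{name}:random={is_random}"]


def search_by_topic(name: str, is_random: bool = False) -> List[str]:
    """Stub: the real tool would traverse the graph from a Topic node."""
    return [f"topic:{name}:random={is_random}"]


# Map each entity type to the specialized traversal tool that handles it.
_DISPATCH: Dict[str, Callable[..., List[str]]] = {
    "Character": search_by_character,
    "Topic": search_by_topic,
}


def disambiguated_search(name: str, is_random: bool = False) -> List[str]:
    """Verify the entity type first, then call the matching search tool."""
    entity_type = check_entity_type(name)
    tool = _DISPATCH.get(entity_type)
    if tool is None:
        raise ValueError(f"No graph search tool for {name!r} (type {entity_type})")
    # is_random is passed through so sampling stays in the database layer.
    return tool(name, is_random=is_random)
```

Checking the type before dispatch avoids running a character traversal for a name that is actually a topic, which would return nothing or return misleading results.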

2. Scholarly Comparative Analysis

  • System Prompt: Generated dynamically to guide how the LLM handles comparative analysis.
  • Entity Linking: Use entity:// protocol for citations.
  • Cross-Scripture Search: Utilize POST /api/search/entity for paginated discovery across the entire graph.
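
As a rough sketch of how entity:// citations might be picked out of LLM output, the snippet below parses Markdown-style links. The URI layout `entity://<type>/<name>` and the function name `extract_entity_links` are assumptions for illustration only; this document does not define the exact format.

```python
import re
from typing import List, Tuple
from urllib.parse import unquote

# ASSUMPTION: citations look like [label](entity://<type>/<name>).
# The real entity:// layout is not specified in this document.
_ENTITY_LINK = re.compile(r"\[([^\]]+)\]\(entity://([^/\s)]+)/([^)\s]+)\)")


def extract_entity_links(text: str) -> List[Tuple[str, str, str]]:
    """Return (label, entity_type, entity_name) for each entity:// link."""
    return [
        (label, entity_type, unquote(name))  # decode %20 and similar escapes
        for label, entity_type, name in _ENTITY_LINK.findall(text)
    ]
```

A frontend or post-processing step could use the extracted triples to look up each entity and render the citation as a clickable reference.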

3. Data-Driven Hierarchical Search

  • Dynamic Discovery: Hierarchy prefixes and scripture metadata are discovered via SanatanConfig.
  • Regex-based Filtering: hierarchical_path filters are treated as permissive regular expressions.
  • Consistent Ordering: All retrieval methods return results strictly ordered by v._global_index ASC.
  • Graph-based TOC: The table of contents is built by hierarchical graph traversal (see DB_DESIGN.md).
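
The filtering and ordering rules above can be illustrated in memory. This is a sketch under stated assumptions: "permissive" is read here as an unanchored `re.search` match, the helper name `filter_and_order` is hypothetical, and the real system applies these rules in database queries, not in Python.

```python
import re
from typing import Dict, List, Optional


def filter_and_order(
    verses: List[Dict], path_pattern: Optional[str] = None
) -> List[Dict]:
    """Filter by a regex on hierarchical_path, then sort by _global_index."""
    if path_pattern:
        rx = re.compile(path_pattern)
        # ASSUMPTION: "permissive" means the pattern may match anywhere
        # in the path, so re.search is used instead of re.fullmatch.
        verses = [v for v in verses if rx.search(v.get("hierarchical_path", ""))]
    # Ascending _global_index keeps results in canonical reading order.
    return sorted(verses, key=lambda v: v["_global_index"])
```

Sorting after filtering means every retrieval path yields the same order for the same set of verses, regardless of how the database returned them.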

πŸ› οΈ Development Practices

  • Cypher Centralization: Always use modules/db/cypher_templates.py for queries.
  • Observability: Decorate all tool functions with @log_tool_entry (in modules/db/logger_utils.py) to capture invocations and tracebacks.
  • Standardization First: Always canonicalize documents using SanatanConfig().canonicalize_document before returning API responses.
  • API Speed: Use include_transliteration=False in canonicalize_document for high-volume or lean API endpoints.
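
For readers unfamiliar with the observability pattern, here is one way a decorator like @log_tool_entry could work. This is a hypothetical re-implementation for illustration; the actual decorator in modules/db/logger_utils.py may log different fields or behave differently. The logger name "tools" and the stub tool body are also assumptions.

```python
import functools
import logging

logger = logging.getLogger("tools")  # ASSUMPTION: logger name is illustrative


def log_tool_entry(func):
    """Log each invocation of a tool and the traceback of any failure."""

    @functools.wraps(func)  # preserve the tool's name and docstring
    def wrapper(*args, **kwargs):
        logger.info("tool=%s args=%r kwargs=%r", func.__name__, args, kwargs)
        try:
            return func(*args, **kwargs)
        except Exception:
            # logger.exception records the full traceback, then we re-raise
            # so the caller still sees the original error.
            logger.exception("tool=%s failed", func.__name__)
            raise

    return wrapper


@log_tool_entry
def search_by_author(name: str, is_random: bool = False):
    """Stub tool body; the real tool would query the graph."""
    if not name:
        raise ValueError("name must be non-empty")
    return [{"author": name, "is_random": is_random}]
```

Using `functools.wraps` matters when tools are registered by name, because without it every decorated function would report itself as `wrapper`.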

📋 Standard Workflow

  1. Frontend Changes: Update code in frontend/src.
  2. Build: Run npm run build in the frontend/ directory.
  3. Backend Changes: Add/modify endpoints in server.py or modules.
  4. Validation:
    • uv run python tests/test_api_endpoints.py
    • uv run python tests/debug_...py (Graph consistency checks).
  5. Serving: Ensure the production build is served by FastAPI.