Project Overview
Semantic Geospatial Agent: an initial prototype of a full-stack search engine, built to address data discovery challenges within the OS Open Rivers topology. The solution utilises PostGIS for coordinate transformation and efficient bounding-box visualisation of the complete 190,000-link network. In parallel, it integrates a Python-based Transformer model to enable semantic querying over a randomly sampled subset of the data, validating the feasibility of vector search for hydrological features without the overhead of full-scale indexing in a development environment.
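To make the sampled semantic search concrete, here is a minimal sketch of a brute-force similarity scan over a random sample. It is not the project's actual code: the helper names (`sample_records`, `brute_force_search`), the watercourse names, and the three-dimensional toy vectors are all assumptions, with the vectors standing in for real Transformer embeddings.

```python
import math
import random


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def sample_records(records, sample_size, seed=42):
    """Draw a reproducible random sample (hypothetical helper)."""
    if sample_size >= len(records):
        return list(records)
    return random.Random(seed).sample(records, sample_size)


def brute_force_search(query_vector, indexed, top_k=3):
    """Score every indexed record against the query: one O(N) pass."""
    scored = [(cosine_similarity(query_vector, vec), name) for name, vec in indexed]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


# Toy data: names and vectors are illustrative, not real embeddings.
records = [
    ("River Thames", [0.9, 0.1, 0.0]),
    ("River Severn", [0.8, 0.2, 0.1]),
    ("Grand Union Canal", [0.1, 0.9, 0.2]),
    ("Loch Ness", [0.0, 0.2, 0.9]),
]

indexed = sample_records(records, sample_size=4)
results = brute_force_search([1.0, 0.0, 0.0], indexed, top_k=2)
for score, name in results:
    print(f"{name}: {score:.3f}")
```

Because every query touches every indexed vector, the cost of this approach scales directly with the sample size, which is why a small sample keeps a development setup responsive.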
- Scalability: The current in-memory vector search is a brute-force scan with O(N) complexity, so query time grows linearly with the number of indexed records. To keep the PoC responsive in real time on a local development machine, the semantic index is intentionally restricted to a representative sample of 1,000 records. Implementing HNSW (Hierarchical Navigable Small World) indexing via pgvector would replace the linear scan with approximate nearest-neighbour search at roughly O(log N), making it practical to index the full network.
- Containerization: The dependency on a running local Python instance makes deployment fragile. Wrapping both the API and the Python service in Docker containers (via docker-compose) is a logical next step.
- Search Robustness: Currently, the system relies purely on semantic similarity. A hybrid search that combines keyword matching with semantic similarity would catch exact name matches that the semantic model sometimes overlooks.
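The hybrid search idea in the last point could be sketched as a weighted blend of a keyword score and a semantic score. This is one possible fusion scheme under stated assumptions, not a design the project has committed to: `keyword_score`, `hybrid_rank`, the `alpha` weight, and the candidate names and semantic scores below are all hypothetical.

```python
def keyword_score(query, name):
    """Crude lexical score: 1.0 for an exact match, 0.5 for a substring match."""
    q = query.strip().lower()
    n = name.lower()
    if not q:
        return 0.0
    if q == n:
        return 1.0
    if q in n:
        return 0.5
    return 0.0


def hybrid_rank(query, candidates, alpha=0.5):
    """Blend keyword and semantic scores.

    candidates: list of (name, semantic_score) pairs, scores assumed in [0, 1].
    alpha: weight given to the keyword score (assumed value, tune as needed).
    """
    ranked = []
    for name, semantic in candidates:
        score = alpha * keyword_score(query, name) + (1 - alpha) * semantic
        ranked.append((score, name))
    ranked.sort(key=lambda pair: pair[0], reverse=True)
    return ranked


# Made-up semantic scores in which the exact name match ranks last.
candidates = [
    ("River Cam", 0.40),
    ("River Camel", 0.55),
    ("Cam Brook", 0.60),
]

ranked = hybrid_rank("River Cam", candidates)
for score, name in ranked:
    print(f"{name}: {score:.3f}")
```

With these toy numbers, semantic similarity alone would place "Cam Brook" first, while the blended score promotes the exact match "River Cam" to the top. In a PostgreSQL setup the keyword side would more likely come from the database itself than from Python string matching.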