GeoVLMs - Earth Vision AI Alliance

Overview

GeoVLMs is our research initiative on vision-language models designed for geospatial and remote sensing applications. The project develops approaches that let AI systems understand, reason about, and interact with Earth observation data through natural language interfaces and multi-modal understanding.

Our work bridges computer vision, natural language processing, and geospatial intelligence. The goal is more intuitive tools for analyzing satellite imagery, understanding environmental change, and supporting decision-making in Earth observation tasks.

Research Projects

EarthDial


EarthDial turns multi-sensory Earth observations into interactive dialogues. Users can query, analyze, and interpret satellite imagery and other remote sensing data through a conversational interface.

[CVPR 2025] Soni, Sagar, et al. "EarthDial: Turning Multi-Sensory Earth Observations to Interactive Dialogues." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2025.

GeoVLM-R1


GeoVLM-R1 is a reinforcement fine-tuning framework that improves the reasoning of vision-language models on Earth observation tasks. It is designed to be flexible, scalable, and easy to experiment with, supporting reasoning across diverse remote sensing scenarios.

[arXiv 2025] Fiaz, Mustansar, et al. "GeoVLM-R1: Reinforcement Fine-Tuning for Improved Remote Sensing Reasoning." arXiv preprint arXiv:2509.25026, 2025.