An Agentic Navigation Framework Utilizing Vision-Language Models

Designed and implemented a vision-language navigation agent that uses Qwen2.5-VL (7B) as its cognitive “main brain”, supported by structured memory and reflective reasoning. Engineered custom system and user prompting strategies to enable contextual planning and semantic understanding for embodied tasks. Integrated and evaluated the framework within Habitat-Lab and Room-to-Room (R2R) environments, demonstrating improved instruction following and environment grounding.
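
To make the description above concrete, here is a minimal sketch of how such an agent loop might be organized. It is not the project's actual code. Every name in it (`Memory`, `StepRecord`, `build_user_prompt`, `reflect`, `run_episode`) is hypothetical, the four-action set is an assumption, observations are plain strings standing in for camera frames, the reflection step is a simple rule standing in for model-driven reflection, and the model is passed in as a callable so that no particular Qwen2.5-VL or Habitat-Lab API is implied.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Assumed discrete action set; the real project may use a different one.
ACTIONS = ("MOVE_FORWARD", "TURN_LEFT", "TURN_RIGHT", "STOP")

# Hypothetical system prompt: fixes the agent's role and output format.
SYSTEM_PROMPT = (
    "You are a navigation agent. Reply with exactly one action from: "
    + ", ".join(ACTIONS) + "."
)


@dataclass
class StepRecord:
    step: int
    observation: str
    action: str
    reflection: str = ""


@dataclass
class Memory:
    """Structured memory: a bounded list of the most recent steps."""
    max_steps: int = 5
    records: List[StepRecord] = field(default_factory=list)

    def add(self, record: StepRecord) -> None:
        self.records.append(record)
        self.records = self.records[-self.max_steps:]

    def render(self) -> str:
        if not self.records:
            return "(no previous steps)"
        return "\n".join(
            f"step {r.step}: saw {r.observation!r}, did {r.action}; "
            f"note: {r.reflection or 'none'}"
            for r in self.records
        )


def build_user_prompt(instruction: str, observation: str, memory: Memory) -> str:
    """User prompt: the task, the rendered memory, and the current view."""
    return (
        f"Instruction: {instruction}\n"
        f"History:\n{memory.render()}\n"
        f"Current observation: {observation}\n"
        "Next action:"
    )


def parse_action(reply: str) -> str:
    """Return the first known action found in the reply, else STOP."""
    upper = reply.upper()
    for action in ACTIONS:
        if action in upper:
            return action
    return "STOP"


def reflect(memory: Memory, action: str) -> str:
    """Rule-based stand-in for reflection: flag left/right oscillation."""
    if memory.records:
        last = memory.records[-1].action
        if {last, action} == {"TURN_LEFT", "TURN_RIGHT"}:
            return "turning back and forth; consider moving forward"
    return ""


def run_episode(
    instruction: str,
    observations: List[str],
    vlm: Callable[[str, str], str],  # (system_prompt, user_prompt) -> reply
    memory: Memory,
) -> List[str]:
    """Query the model once per observation until it chooses STOP."""
    actions = []
    for step, observation in enumerate(observations):
        reply = vlm(SYSTEM_PROMPT, build_user_prompt(instruction, observation, memory))
        action = parse_action(reply)
        memory.add(StepRecord(step, observation, action, reflect(memory, action)))
        actions.append(action)
        if action == "STOP":
            break
    return actions
```

In this sketch the system prompt stays fixed while the user prompt is rebuilt at every step from the instruction, the rendered memory, and the current observation. Defaulting to STOP on an unparseable reply is one conservative choice among several; a retry would be an equally reasonable alternative.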

Project Report PDF here