Agentic Aerial Cinematography: From Dialogue Cues to Cinematic Trajectories
Abstract
We present Agentic Aerial Cinematography: From Dialogue Cues to Cinematic Trajectories (ACDC), an autonomous drone cinematography system driven by natural language communication between human directors and drones. The main limitation of previous drone cinematography workflows is that they require manual selection of waypoints and view angles based on predefined human intent, which is labor-intensive and yields inconsistent performance. In this paper, we propose employing large language models (LLMs) and vision foundation models (VFMs) to convert free-form natural language prompts directly into executable indoor UAV video tours. Specifically, our method comprises a vision-language retrieval pipeline for initial waypoint selection, a preference-based Bayesian optimization framework that refines poses using aesthetic feedback, and a motion planner that generates safe quadrotor trajectories. We validate ACDC through both simulation and hardware-in-the-loop experiments, demonstrating that it robustly produces professional-quality footage across diverse indoor scenes without requiring expertise in robotics or cinematography. These results highlight the potential of embodied AI agents to close the loop from open-vocabulary dialogue to real-world autonomous aerial cinematography.
Methods
Given a natural-language prompt, an exploratory video, and a photorealistic 3D reconstruction, ACDC follows a three-stage approach:
- Waypoint Retrieval: Retrieves and orders initial waypoints via vision–language similarity
- Pose Refinement: Refines each pose with preference-based Bayesian optimization
- Trajectory Generation: Generates a smooth, collision-free, dynamically feasible quadrotor trajectory for execution
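The first stage can be illustrated with a minimal sketch of similarity-based retrieval. This is not the ACDC implementation: it assumes the prompt and each frame of the exploratory video have already been embedded into a shared space by some vision-language encoder, that each frame has a known camera position, and that the selected waypoints are ordered by frame index. The function name `retrieve_waypoints`, the parameter `k`, and the ordering rule are all hypothetical choices made for illustration.

```python
import numpy as np


def retrieve_waypoints(prompt_emb, frame_embs, frame_poses, k=3):
    """Pick the k frames most similar to the prompt (hypothetical helper).

    prompt_emb:  (d,) embedding of the text prompt.
    frame_embs:  (n, d) embeddings of the exploratory-video frames.
    frame_poses: (n, 3) camera positions associated with each frame.
    Returns a list of (frame_index, pose, similarity) tuples.
    """
    # Normalize so that the dot product equals cosine similarity.
    p = prompt_emb / np.linalg.norm(prompt_emb)
    f = frame_embs / np.linalg.norm(frame_embs, axis=1, keepdims=True)
    sims = f @ p

    # Indices of the k highest-similarity frames.
    top = np.argsort(-sims)[:k]

    # Assumption: visit waypoints in the order they appear in the video.
    ordered = np.sort(top)
    return [(int(i), frame_poses[i], float(sims[i])) for i in ordered]


# Toy data: four 2-D "embeddings" and their camera positions.
frame_embs = np.array([[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [-1.0, 0.0]])
frame_poses = np.array([[0.0, 0.0, 1.0], [1.0, 0.0, 1.0],
                        [2.0, 0.0, 1.0], [3.0, 0.0, 1.0]])
prompt_emb = np.array([1.0, 0.0])

waypoints = retrieve_waypoints(prompt_emb, frame_embs, frame_poses, k=2)
```

In this toy example, frames 0 and 2 point in nearly the same direction as the prompt embedding, so they are selected as the initial waypoints. In the pipeline described above, such waypoints would then be passed to pose refinement and trajectory generation.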
Results
Can you give an incoming Airbnb guest a detailed walkthrough of this house.
Create a drone tour that highlights the modern design of the cottage, emphasize the contemporary atmosphere.
Can you create a complete tour of this house, emphasizing a cozy and comfortable vibe.
Start from the entrance do these sequentially. 1 : go forward to shoot the dining table 2 : go to living room, circle around 3 : go to bedroom and its washroom 4 : go to kitchen and tour breakfast nook 5 : go to the second washroom.
Create a drone tour for our customers to show the comfortable and welcoming atmosphere of our restaurant's eating environment.
Create a virtual drone tour for a food safety inspector, showing that our restaurant maintains a highly sanitary food processing environment.