Integrating aerial imagery-based scene generation into applications like autonomous driving and gaming enhances realism in 3D environments, but challenges remain in creating detailed content for occluded areas and ensuring real-time, consistent rendering. In this paper, we introduce Skyeyes, a novel framework that generates photorealistic sequences of ground view images using only aerial view inputs, thereby creating a ground roaming experience. More specifically, we combine a 3D representation with a view-consistent generation model, which ensures coherence between generated images. This method allows for the creation of geometrically consistent ground view images, even with large view gaps. The images maintain improved spatial-temporal coherence and realism, enhancing scene comprehension and visualization from aerial perspectives. To the best of our knowledge, there are no publicly available datasets that contain pairwise geo-aligned aerial and ground view imagery. Therefore, we build a large, synthetic, geo-aligned dataset using Unreal Engine. Both qualitative and quantitative analyses on this synthetic dataset demonstrate superior results compared to other leading synthesis approaches.

MatrixCity | FID ↓ | PSNR ↑ | SSIM ↑ | LPIPS ↓ | KVD ↓ | FVD ↓ |
---|---|---|---|---|---|---|
MVS | 359.15 | 27.79 | 0.30 | 0.63 | 377.20 | 2846.69 |
NeRF | 317.09 | 27.94 | 0.28 | 0.68 | 382.57 | 2390.31 |
3DGS | 245.24 | 28.13 | 0.42 | 0.62 | 340.62 | 1926.74 |
SuGaR | 260.51 | 28.13 | 0.38 | 0.60 | 204.20 | 1157.64 |
ControlNet | 63.47 | 28.08 | 0.25 | 0.57 | 281.89 | 1205.81 |
Instruct-P2P | 100.47 | 28.04 | 0.25 | 0.58 | 428.88 | 1742.12 |
GVG | 29.62 | 28.29 | 0.33 | 0.47 | 141.33 | 715.97 |
Ours | 54.73 | 32.22 | 0.45 | 0.47 | 117.93 | 528.65 |

CARLA | FID ↓ | PSNR ↑ | SSIM ↑ | LPIPS ↓ | KVD ↓ | FVD ↓ |
---|---|---|---|---|---|---|
MVS | 388.37 | 27.82 | 0.40 | 0.53 | 562.21 | 3606.30 |
NeRF | 248.16 | 27.98 | 0.51 | 0.68 | 618.43 | 2571.87 |
3DGS | 228.92 | 28.32 | 0.59 | 0.48 | 573.05 | 2404.44 |
SuGaR | 202.38 | 28.13 | 0.53 | 0.48 | 679.40 | 2498.16 |
ControlNet | 75.26 | 27.97 | 0.58 | 0.50 | 277.89 | 1056.69 |
Instruct-P2P | 202.12 | 27.80 | 0.38 | 0.65 | 707.08 | 3327.93 |
GVG | 45.73 | 28.29 | 0.53 | 0.47 | 266.46 | 913.07 |
Ours | 57.95 | 33.37 | 0.69 | 0.44 | 218.29 | 693.28 |
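
For readers unfamiliar with the PSNR column, the following is a minimal sketch of the standard definition, PSNR = 10 · log10(MAX² / MSE). It assumes 8-bit intensities (a peak value of 255) and images passed as flat sequences of pixel values; the function name `psnr` and its parameters are illustrative and are not taken from the evaluation code behind the tables above.

```python
import math


def psnr(reference, generated, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized images.

    Both images are flat sequences of pixel intensities. The default
    max_value of 255 is an assumption (8-bit images).
    """
    if len(reference) != len(generated) or len(reference) == 0:
        raise ValueError("images must be non-empty and have the same size")

    # Mean squared error over all pixels.
    mse = sum((r - g) ** 2 for r, g in zip(reference, generated)) / len(reference)

    # Identical images have zero error, so PSNR is unbounded.
    if mse == 0:
        return math.inf

    return 10.0 * math.log10(max_value ** 2 / mse)


if __name__ == "__main__":
    # Two tiny "images" that differ by one intensity level at every pixel.
    print(psnr([10, 20, 30, 40], [11, 21, 31, 41]))
```

Higher values indicate a smaller pixel-wise error, which is why the column is marked with ↑.
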

@misc{gao2024skyeyesgroundroamingusing,
title={Skyeyes: Ground Roaming using Aerial View Images},
author={Zhiyuan Gao and Wenbin Teng and Gonglin Chen and Jinsen Wu and Ningli Xu and Rongjun Qin and Andrew Feng and Yajie Zhao},
year={2024},
eprint={2409.16685},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2409.16685},
}