Learned Off-aperture Encoding for
Wide Field-of-view RGBD Imaging

¹The University of Hong Kong (HKU), ²King Abdullah University of Science and Technology (KAUST)
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2025
(To be updated)

(Left) Depiction of three potential locations for integrating a diffractive optical element (DOE) for encoding purposes in an imaging system. The on-aperture DOE has many degrees of freedom to shape each ray bundle, but they are all applied globally to the whole image plane. A DOE near the sensor, on the other hand, provides localized control of the PSF, but with far fewer degrees of freedom per ray bundle. The off-aperture DOE strikes an optimal balance between these two extremes. (Center-left) Cross-sectional view of two custom-fabricated optical imaging systems; the DOEs and apertures are located on separate planes. (Center-right) Resolved wide-FoV results of App. 1 compared with the encoded measurements. (Right) Color and depth results of App. 2.

Abstract

End-to-end (E2E) designed imaging systems integrate coded optical designs with decoding algorithms to enhance imaging fidelity for diverse visual tasks. However, existing E2E designs struggle to maintain high image fidelity at wide fields of view (FoVs), owing to high computational complexity and the difficulty of modeling off-axis wave propagation while accounting for off-axis aberrations. In particular, the common practice of placing the encoding element in the aperture or pupil plane yields only global control of the wavefront. To overcome these limitations, this work explores an additional design choice: positioning a diffractive optical element (DOE) off-aperture, which spatially unmixes the degrees of freedom and provides local control of the wavefront across the image plane. Our approach further leverages hybrid refractive-diffractive optical systems by linking differentiable ray and wave optics modeling, thereby optimizing depth imaging quality and demonstrating system versatility. Experimental results show that the off-aperture DOE improves imaging quality by over 5 dB in PSNR at a FoV of approximately 45° when paired with a simple thin lens, outperforming traditional on-aperture systems. Furthermore, we successfully recover color and depth information at a FoV of nearly 28° using off-aperture DOE configurations with compound optics. Physical prototypes of both applications validate the effectiveness and versatility of the proposed method.
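To make the encoding model concrete, the sketch below applies a learnable DOE height map as a thin-element phase delay and propagates the resulting field to the sensor plane. This is a minimal PyTorch sketch under assumed choices (angular spectrum propagation, a plane-wave input, and placeholder wavelength, refractive index, distance, and pitch values), not the paper's implementation; names such as propagate_asm are illustrative.

    import torch

    def doe_phase(height_map, wavelength, n_doe=1.46):
        # Thin-element approximation (assumed): phase delay of a DOE height map.
        return 2 * torch.pi / wavelength * (n_doe - 1.0) * height_map

    def propagate_asm(field, wavelength, distance, pitch):
        # Angular spectrum propagation of a complex field over `distance`.
        n = field.shape[-1]
        fx = torch.fft.fftfreq(n, d=pitch)
        fxx, fyy = torch.meshgrid(fx, fx, indexing="ij")
        kz_sq = (1.0 / wavelength) ** 2 - fxx ** 2 - fyy ** 2
        kz = 2 * torch.pi * torch.sqrt(kz_sq.to(torch.complex64))
        return torch.fft.ifft2(torch.fft.fft2(field) * torch.exp(1j * kz * distance))

    # Learnable off-aperture DOE modulating the field arriving from the lens
    # (here a dummy plane wave); the squared magnitude at the sensor is the PSF.
    height_map = torch.zeros(512, 512, requires_grad=True)
    field_at_doe = torch.ones(512, 512, dtype=torch.complex64)
    phase = doe_phase(height_map, wavelength=550e-9)
    field = field_at_doe * torch.polar(torch.ones_like(phase), phase)
    psf = propagate_asm(field, wavelength=550e-9, distance=5e-3, pitch=2e-6).abs() ** 2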

Application 1: Wide-FoV Simple Lens Imaging

Our wide-FoV imaging setup consists of a simple focusing lens with an approximate FoV of 45° and a rotationally symmetric off-aperture DOE that effectively compensates for most aberrations. State-of-the-art coded-aperture and/or deep optics imaging solutions mostly assume spatially invariant PSF behavior and demonstrate FoVs of up to 30° in experiments.
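Because the App. 1 DOE is rotationally symmetric, its surface can be parameterized by a 1D radial profile rather than a full 2D height map, which greatly reduces the number of learnable variables. The sketch below shows one such parameterization; the profile length, pixel pitch, and linear interpolation are illustrative assumptions rather than the paper's exact design.

    import torch

    def radial_height_map(radial_profile, size, pixel_pitch):
        # Expand a learnable 1D radial profile into a rotationally symmetric
        # 2D height map by linear interpolation at each pixel's radius.
        coords = (torch.arange(size, dtype=torch.float32) - size / 2 + 0.5) * pixel_pitch
        xx, yy = torch.meshgrid(coords, coords, indexing="ij")
        r = torch.sqrt(xx ** 2 + yy ** 2)
        r_max = coords.abs().max() * 2 ** 0.5            # radius at the corner pixels
        idx = r / r_max * (radial_profile.numel() - 1)   # fractional profile index
        lo = idx.floor().long().clamp(max=radial_profile.numel() - 2)
        frac = idx - lo
        return (1 - frac) * radial_profile[lo] + frac * radial_profile[lo + 1]

    profile = torch.zeros(256, requires_grad=True)       # learnable radial heights
    height_map = radial_height_map(profile, size=512, pixel_pitch=2e-6)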

(Top-left) The conceptual system setup with an off-the-shelf thin lens and an off-aperture DOE. (Bottom-left) Fitted PSNR plots for recovered images across various DOE locations show that the optimal location is a trade-off point between the aperture and the sensor; for each dataset it is approximately 0.24. (Right) Comparison of PSF amplitudes at several FoVs when the DOE is placed at different locations.
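The location sweep behind this plot can be reproduced in spirit by treating the DOE position as a fraction of the aperture-to-sensor track, splitting the propagation into two legs, and scoring each candidate by reconstruction PSNR. The snippet below only sketches that bookkeeping; the normalized-location interpretation, the 25 mm track length, and the sample count are assumptions, and the PSF simulation and reconstruction steps are omitted.

    import torch

    def psnr(pred, target, max_val=1.0):
        # PSNR metric used to score the reconstruction at each candidate location.
        mse = torch.mean((pred - target) ** 2)
        return 10 * torch.log10(max_val ** 2 / mse)

    def split_track(normalized_location, aperture_to_sensor):
        # 0.0 places the DOE at the aperture, 1.0 at the sensor; the track is
        # split into an aperture-to-DOE leg and a DOE-to-sensor leg.
        d_aperture_to_doe = normalized_location * aperture_to_sensor
        d_doe_to_sensor = aperture_to_sensor - d_aperture_to_doe
        return d_aperture_to_doe, d_doe_to_sensor

    # Candidate normalized locations (placeholder track length of 25 mm).
    locations = torch.linspace(0.0, 1.0, 11)
    legs = [split_track(loc.item(), aperture_to_sensor=25e-3) for loc in locations]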

Application 2: Wide-FoV Compound Lens RGBD Imaging

We introduce a compound optics prototype designed for wide-FoV depth and color imaging. The system combines an optimized Cooke triplet, serving as the focusing lens module, with an off-aperture DOE, achieving a practical balance between compactness and aberration correction.

(Left) Optimized PSFs at depths from 0.8 m to 10 m and FoVs up to 28°, using the off-aperture large-FoV extended depth-of-field (EDoF) imaging system. Depth layers are sampled uniformly. (Top-right) The 3D model of the optimized system, where the dotted line marks the location of the near-aperture DOE; the figure is not drawn to scale. (Bottom-right) Optimized DOE height maps for the near-aperture (around 0.11) and off-aperture (around 0.24) settings.
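Under a simplified forward model, the coded measurement for such an extended-depth-of-field system can be rendered layer by layer: each depth layer is blurred with the PSF optimized for that depth and the results are composited. The PyTorch sketch below illustrates this idea with hard layer masks, shift-invariant per-layer PSFs, and no occlusion handling; it is an assumption-laden stand-in, not the paper's renderer.

    import torch
    import torch.nn.functional as F

    def render_layered(scene_layers, masks, psfs):
        # scene_layers: (D, 3, H, W), masks: (D, 1, H, W), psfs: (D, 3, k, k)
        # Blur each depth layer with its own per-channel PSF and sum the results.
        image = torch.zeros_like(scene_layers[0])
        for layer, mask, psf in zip(scene_layers, masks, psfs):
            blurred = F.conv2d((layer * mask).unsqueeze(0), psf.unsqueeze(1),
                               padding=psf.shape[-1] // 2, groups=3).squeeze(0)
            image = image + blurred
        return image

    D, H, W, k = 4, 128, 128, 21
    scene_layers = torch.rand(D, 3, H, W)
    masks = torch.rand(D, 1, H, W)
    psfs = torch.rand(D, 3, k, k)
    psfs = psfs / psfs.sum(dim=(-2, -1), keepdim=True)   # normalize each PSF
    coded = render_layered(scene_layers, masks, psfs)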

System Pipeline

Imaging pipeline of our proposed system. We model the camera's light propagation using a combination of ray tracing and wave propagation. The DOE, placed between the lens module and the sensor, encodes scene information. A multi-head decoding network based on the ResNet architecture supports optimization for multiple visual tasks.
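The multi-head decoder mentioned above can be approximated by a shared convolutional backbone with one head per task, for instance color restoration and depth estimation. The following sketch uses a torchvision ResNet-18 as an illustrative backbone; layer widths, the upsampling scheme, and the head structure are assumptions and do not reproduce the paper's network.

    import torch
    import torch.nn as nn
    import torchvision

    class MultiHeadDecoder(nn.Module):
        # Shared ResNet backbone with separate RGB and depth heads (illustrative).
        def __init__(self):
            super().__init__()
            backbone = torchvision.models.resnet18(weights=None)
            self.encoder = nn.Sequential(*list(backbone.children())[:-2])
            self.rgb_head = nn.Sequential(
                nn.Conv2d(512, 64, 3, padding=1), nn.ReLU(),
                nn.Upsample(scale_factor=32, mode="bilinear"),
                nn.Conv2d(64, 3, 3, padding=1))
            self.depth_head = nn.Sequential(
                nn.Conv2d(512, 64, 3, padding=1), nn.ReLU(),
                nn.Upsample(scale_factor=32, mode="bilinear"),
                nn.Conv2d(64, 1, 3, padding=1))

        def forward(self, coded_measurement):
            feats = self.encoder(coded_measurement)   # (B, 512, H/32, W/32)
            return self.rgb_head(feats), self.depth_head(feats)

    decoder = MultiHeadDecoder()
    rgb, depth = decoder(torch.rand(1, 3, 256, 256))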
(Left) The App. 1 prototype includes an aperture, a thin lens, and a DOE. (Center) The experimental setup. (Right) The App. 2 prototype includes an aperture, three refractive lenses, and a DOE.

BibTeX


      @article{wei2025learned,
        title   = {Learned Off-aperture Encoding for Wide Field-of-view RGBD Imaging},
        author  = {Wei, Haoyu and Liu, Xin and Liu, Yuhui and Fu, Qiang and Heidrich, Wolfgang and Lam, Edmund Y. and Peng, Yifan},
        journal = {IEEE Transactions on Pattern Analysis and Machine Intelligence},
        year    = {2025}
      }
    

Related Projects

You may also be interested in related projects in deep optics: