Efficient Visual Computing with Camera RAW Snapshots

T-PAMI 2024


Zhihao Li    Ming Lu    Xu Zhang    Xin Feng    M. Salman Asif    Zhan Ma

NJU Vision Lab




TLDR

This paper proposes ρ-Vision, a novel framework that performs high-level semantic understanding and low-level compression directly on RAW images. The framework provides better detection accuracy and compression than its RGB-domain counterparts, and it generalizes across different camera sensors and task-specific models. It also has the potential to reduce ISP computation and processing time.

raw-vision overview


Abstract

Conventional cameras capture image irradiance on a sensor and convert it to RGB images using an image signal processor (ISP). The images can then be used for photography or visual computing tasks in a variety of applications, such as public safety surveillance and autonomous driving. One can argue that since RAW images contain all the captured information, the conversion of RAW to RGB using an ISP is not necessary for visual computing. In this paper, we propose a novel ρ-Vision framework to perform high-level semantic understanding and low-level compression using RAW images without the ISP subsystem used for decades. Considering the scarcity of available RAW image datasets, we first develop an unpaired CycleR2R network based on unsupervised CycleGAN to train modular unrolled ISP and inverse ISP (invISP) models using unpaired RAW and RGB images. We can then flexibly generate simulated RAW images (simRAW) using any existing RGB image dataset and finetune different models originally trained for the RGB domain to process real-world camera RAW images. We demonstrate object detection and image compression capabilities in RAW-domain using RAW-domain YOLOv3 and RAW image compressor (RIC) on snapshots from various cameras. Quantitative results reveal that RAW-domain task inference provides better detection accuracy and compression compared to RGB-domain processing. Furthermore, the proposed ρ-Vision generalizes across various camera sensors and different task-specific models. Additional advantages of the proposed ρ-Vision that eliminates the ISP are the potential reductions in computations and processing times.


Overview

  • We propose an unpaired CycleR2R for conversion between RAW and RGB images. Using a CycleGAN, we train an ISP for the RAW-to-RGB transformation and an invISP for the RGB-to-RAW transformation (R2R) on unpaired RGB and RAW images. Such unsupervised learning with unpaired samples makes our approach much easier to implement in practice. In contrast, existing solutions (e.g., CycleISP, CIE-XYZ Net, and MBISPLD) are supervised models that require paired RAW and RGB images (from the same camera model).
  • RAW-domain models (in principle) can be obtained by retraining corresponding RGB-domain models using the simRAW images generated by the proposed invISP. Such domain adaptation approaches need to be separately engineered for each individual task.
  • To encourage reproducible research, we publicly release MultiRAW, a labeled dataset of more than 7k RAW images acquired with multiple camera sensors, for RAW-domain processing.
Unpaired CycleR2R Framework.
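
The ISP/invISP pair and the cycle constraint described above can be illustrated with a minimal NumPy sketch. It is not the CycleR2R network: the learned modules are replaced here by hand-written gamma and white-balance steps, the adversarial part of CycleGAN is omitted, and every name and constant (`WB_GAINS`, `GAMMA`, `toy_isp`, `toy_inv_isp`, `bayer_mosaic`, `cycle_loss`) is a hypothetical choice made only for illustration.

```python
import numpy as np

# Assumed, fixed parameters; in the paper the ISP/invISP modules are learned.
WB_GAINS = np.array([2.0, 1.0, 1.5])  # per-channel white-balance gains (R, G, B)
GAMMA = 2.2                           # display gamma


def toy_inv_isp(rgb, gains=WB_GAINS, gamma=GAMMA):
    """Map display RGB in [0, 1] to a 3-channel linear simRAW image."""
    linear = np.power(rgb, gamma)     # undo the gamma curve
    return linear / gains             # undo white balance


def toy_isp(raw, gains=WB_GAINS, gamma=GAMMA):
    """Map a 3-channel linear RAW image back to display RGB in [0, 1]."""
    balanced = np.clip(raw * gains, 0.0, 1.0)   # apply white balance
    return np.power(balanced, 1.0 / gamma)      # apply the gamma curve


def bayer_mosaic(raw3):
    """Sample an RGGB Bayer pattern from an (H, W, 3) image into (H, W)."""
    h, w, _ = raw3.shape
    mosaic = np.empty((h, w), dtype=raw3.dtype)
    mosaic[0::2, 0::2] = raw3[0::2, 0::2, 0]    # R sites
    mosaic[0::2, 1::2] = raw3[0::2, 1::2, 1]    # G sites (even rows)
    mosaic[1::2, 0::2] = raw3[1::2, 0::2, 1]    # G sites (odd rows)
    mosaic[1::2, 1::2] = raw3[1::2, 1::2, 2]    # B sites
    return mosaic


def cycle_loss(rgb, raw, isp, inv_isp):
    """L1 cycle-consistency computed on two unpaired batches."""
    rgb_cycle = isp(inv_isp(rgb))     # RGB -> simRAW -> RGB
    raw_cycle = inv_isp(isp(raw))     # RAW -> RGB -> RAW
    return np.abs(rgb_cycle - rgb).mean() + np.abs(raw_cycle - raw).mean()


rng = np.random.default_rng(0)
rgb_batch = rng.random((4, 4, 3))                 # stand-in for an RGB image
raw_batch = toy_inv_isp(rng.random((4, 4, 3)))    # stand-in for an unrelated RAW image

# A consistent ISP/invISP pair gives a near-zero cycle loss.
loss_ok = cycle_loss(rgb_batch, raw_batch, toy_isp, toy_inv_isp)

# A mismatched invISP (wrong gamma) is penalized by the cycle loss.
loss_bad = cycle_loss(rgb_batch, raw_batch, toy_isp,
                      lambda x: toy_inv_isp(x, gamma=1.8))

# simRAW generation: any RGB image becomes a single-channel Bayer mosaic,
# so the labels of the RGB dataset can be reused unchanged.
sim_raw = bayer_mosaic(toy_inv_isp(rgb_batch))
```

Because the two batches are never compared with each other, only with their own round-trip reconstructions, the loss needs no paired RAW/RGB captures, which is the property the unpaired training relies on.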


Results


Hardware Implementation
Hardware benchmark of the ρ-Vision framework.

RAW-Domain Detection
Zero-shot RAW-domain detection results.
Few-shot RAW-domain detection results.

RAW-Domain Segmentation
Zero-shot RAW-domain segmentation results.
Visualization of zero-shot RAW-domain segmentation results.

RAW-Domain Classification
Zero-shot RAW-domain classification results.
Visualization of zero-shot RAW-domain classification results.

RAW-Domain Compression
Lossy and lossless RAW compression results.
Visualization of lossy RAW compression.
Visualization of progressive decoding using the lossless RAW compressor.

The Impact of ISP on RGB-Domain Detection
Effect of the ISP on RGB-domain detection results.
Visualization of the impact of the ISP.


BibTeX Citation

@article{li2022efficient, 
	title={Efficient Visual Computing with Camera RAW Snapshots},
	author={Zhihao Li and Ming Lu and Xu Zhang and Xin Feng and M. Salman Asif and Zhan Ma},
	journal={arXiv preprint arXiv:2212.07778},
	url={https://arxiv.org/pdf/2212.07778.pdf}, 
	year={2022}, 
}