Efficient Visual Computing with Camera RAW Snapshots

T-PAMI 2024

Zhihao Li, Ming Lu, Xu Zhang, Xin Feng, M. Salman Asif, Zhan Ma

NJU Vision Lab

Code

Paper

Supplementary

Hardware Code

Dataset

TLDR

This paper proposes ρ-Vision, a framework that performs high-level semantic understanding and low-level compression directly on RAW images. It delivers better detection accuracy and compression than its RGB-domain counterparts and generalizes across different camera sensors and task-specific models. It also has the potential to reduce ISP computation and processing time.

Abstract

Conventional cameras capture image irradiance on a sensor and convert it to RGB images using an image signal processor (ISP). The images can then be used for photography or visual computing tasks in a variety of applications, such as public safety surveillance and autonomous driving. One can argue that since RAW images contain all the captured information, the conversion of RAW to RGB using an ISP is not necessary for visual computing. In this paper, we propose a novel ρ-Vision framework to perform high-level semantic understanding and low-level compression using RAW images without the ISP subsystem used for decades. Considering the scarcity of available RAW image datasets, we first develop an unpaired CycleR2R network based on unsupervised CycleGAN to train modular unrolled ISP and inverse ISP (invISP) models using unpaired RAW and RGB images. We can then flexibly generate simulated RAW images (simRAW) using any existing RGB image dataset and finetune different models originally trained for the RGB domain to process real-world camera RAW images. We demonstrate object detection and image compression capabilities in the RAW domain using RAW-domain YOLOv3 and a RAW image compressor (RIC) on snapshots from various cameras. Quantitative results reveal that RAW-domain task inference provides better detection accuracy and compression compared to RGB-domain processing. Furthermore, the proposed ρ-Vision generalizes across various camera sensors and different task-specific models. Because ρ-Vision eliminates the ISP, it also offers potential reductions in computation and processing time.

Overview

We propose an unpaired CycleR2R network for conversion between RAW and RGB images. Built on CycleGAN, it trains an ISP for RAW-to-RGB and an invISP for RGB-to-RAW transformation (R2R) using unpaired RGB and RAW images. Because this unsupervised training does not need paired samples, the approach is much easier to implement in practice. In contrast, existing solutions (e.g., CycleISP, CIE-XYZ Net, and MBISPLD) are supervised models that require paired RAW and RGB images (from the same camera model).
In principle, RAW-domain models can be obtained by retraining the corresponding RGB-domain models on the simRAW images generated by the proposed invISP. Such domain adaptation approaches need to be separately engineered for each individual task.
To encourage reproducible research, we publicly release MultiRAW, a labeled dataset of more than 7k RAW images acquired with multiple camera sensors, for RAW-domain processing.
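The simRAW idea above can be sketched with a hand-crafted stand-in for the invISP. This is only an illustration: the paper's invISP is a learned model, whereas the function below (`toy_inv_isp`, a hypothetical name) simply undoes an assumed display gamma and assumed white-balance gains, then samples an RGGB Bayer mosaic. The gain values, the gamma of 2.2, and the RGGB layout are all assumptions chosen for the example.

```python
import numpy as np

def toy_inv_isp(rgb, wb_gains=(2.0, 1.0, 1.5), gamma=2.2):
    """Hand-crafted stand-in for a learned invISP (NOT the paper's model).

    rgb: float array in [0, 1], shape (H, W, 3), with H and W even.
    Returns a single-channel RGGB Bayer mosaic of shape (H, W).
    """
    rgb = np.asarray(rgb, dtype=np.float64)
    linear = np.clip(rgb, 0.0, 1.0) ** gamma      # undo the assumed display gamma
    linear = linear / np.asarray(wb_gains)        # undo the assumed white-balance gains
    h, w, _ = linear.shape
    raw = np.empty((h, w), dtype=np.float64)
    raw[0::2, 0::2] = linear[0::2, 0::2, 0]       # R sites
    raw[0::2, 1::2] = linear[0::2, 1::2, 1]       # G sites on red rows
    raw[1::2, 0::2] = linear[1::2, 0::2, 1]       # G sites on blue rows
    raw[1::2, 1::2] = linear[1::2, 1::2, 2]       # B sites
    return raw

def simulate_dataset(rgb_images):
    """Convert RGB images to simRAW mosaics of the same height and width."""
    return [toy_inv_isp(img) for img in rgb_images]
```

Because the mosaic keeps the original pixel grid, annotations such as bounding boxes from the RGB dataset can be reused for the simulated images without remapping, which is what makes finetuning an RGB-domain model on simRAW data straightforward.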

Unpaired CycleR2R Framework.
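The cycle-consistency constraint that lets a CycleGAN-style model train on unpaired data can be illustrated with a minimal sketch. The two gamma curves below (`gamma_isp`, `gamma_inv_isp`) are hypothetical toy mappings, not the modular ISP and invISP networks of CycleR2R, and the adversarial terms that a CycleGAN also uses are omitted. The point is only that each batch is compared with its own round trip, so the RAW and RGB batches never need to show the same scene.

```python
import numpy as np

def gamma_isp(raw, gamma):
    """Toy RAW-to-RGB mapping: a single gamma curve (illustrative assumption)."""
    return np.clip(raw, 0.0, 1.0) ** (1.0 / gamma)

def gamma_inv_isp(rgb, gamma):
    """Toy RGB-to-RAW mapping: the inverse gamma curve (illustrative assumption)."""
    return np.clip(rgb, 0.0, 1.0) ** gamma

def cycle_consistency_loss(raw_batch, rgb_batch, gamma_fwd, gamma_bwd):
    """L1 cycle loss over two unpaired batches.

    RAW -> RGB -> RAW should reproduce raw_batch, and
    RGB -> RAW -> RGB should reproduce rgb_batch.
    """
    raw_cycle = gamma_inv_isp(gamma_isp(raw_batch, gamma_fwd), gamma_bwd)
    rgb_cycle = gamma_isp(gamma_inv_isp(rgb_batch, gamma_bwd), gamma_fwd)
    raw_term = np.mean(np.abs(raw_cycle - raw_batch))
    rgb_term = np.mean(np.abs(rgb_cycle - rgb_batch))
    return raw_term + rgb_term

# Unpaired batches: different sizes and unrelated content.
rng = np.random.default_rng(0)
raw_batch = rng.uniform(0.05, 1.0, size=(2, 8, 8))
rgb_batch = rng.uniform(0.05, 1.0, size=(3, 8, 8))
consistent = cycle_consistency_loss(raw_batch, rgb_batch, 2.2, 2.2)
mismatched = cycle_consistency_loss(raw_batch, rgb_batch, 2.2, 1.8)
```

When the two mappings are exact inverses, the loss is zero up to floating-point error. When they disagree, it grows, which is the signal a training loop would minimize alongside the adversarial objectives.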

Results

Hardware Implementation

RAW-Domain Detection

Zero-shot RAW-domain detection results.

Few-shot RAW-domain detection results.

RAW-Domain Segmentation

Zero-shot RAW-domain segmentation results.

Visualization of zero-shot RAW-domain segmentation results.

RAW-Domain Classification

Zero-shot RAW-domain classification results.

Visualization of zero-shot RAW-domain classification results.

RAW-Domain Compression

Lossy and lossless RAW compression results.

Visualization of lossy RAW compression.

The Impact of ISP on RGB-Domain Detection

ISP affects RGB-domain detection results.

Visualizing the impact of ISP.

BibTeX Citation

@article{li2022efficient,
	title={Efficient Visual Computing with Camera RAW Snapshots},
	author={Zhihao Li and Ming Lu and Xu Zhang and Xin Feng and M. Salman Asif and Zhan Ma},
	journal={arXiv},
	url={https://arxiv.org/pdf/2212.07778.pdf},
	year={2022},
}