Existing image compression frameworks built on multimodal large models often rely on a fragmented integration of semantic retrieval, latent compression, and generative models, resulting in suboptimal reconstruction fidelity and coding efficiency. To address these challenges, we propose ResULIC, a residual-guided ultra lowrate image compression framework that incorporates residual signals into both semantic retrieval and the diffusion-based generation process.
Specifically, we introduce Semantic Residual Coding (SRC) to capture the semantic disparity between the original image and its compressed latent representation. A perceptual fidelity optimizer is further applied for superior reconstruction quality. Additionally, we present the Compression-aware Diffusion Model (CDM), which establishes an optimal alignment between bitrates and diffusion time steps, improving compression-reconstruction synergy.
This work addresses two significant challenges in image compression based on generative models:
1. Multimodal Semantic Integration: Effectively integrating multimodal semantic information, minimizing redundancy while ensuring high perceptual fidelity at extremely low bitrates.
2. Compression-Generation Alignment: Modeling the relationship between the compression ratio and the noise scale in the diffusion process, enabling efficient and consistent reconstructions across varying compression levels.
For challenge 1, we integrate a semantic residual coding module into our multimodal image compression framework to minimize overall bitrate consumption. In addition, to optimize perceptual fidelity, we propose a differential prompt optimization strategy that finds the optimal text prompts for improving reconstruction consistency.
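As a rough illustration of the residual idea behind semantic coding, the toy sketch below keeps only the caption words that the decoder could not already infer from the compressed latent. It is not the authors' implementation: the function name `semantic_residual`, the word-level set difference, and the two example captions (standing in for the output of a multimodal captioning model) are all assumptions made for illustration.

```python
def semantic_residual(original_caption, decoded_caption):
    """Return the words of the original caption that are absent from the
    caption of the decoded image, preserving order and dropping repeats.

    Hypothetical stand-in for semantic residual coding: only this residual
    text would need to be transmitted alongside the compressed latent.
    """
    known = set(decoded_caption.lower().split())
    seen = set()
    residual = []
    for word in original_caption.lower().split():
        if word not in known and word not in seen:
            residual.append(word)
            seen.add(word)
    return residual


# Hypothetical captions; a real system would obtain them from a captioning model.
original = "a red vintage car parked near a stone bridge"
decoded = "a car near a bridge"
print(semantic_residual(original, decoded))
```

The fewer details the compressed latent loses, the shorter the residual text becomes, which is the sense in which residual coding avoids spending bits on redundant semantics.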
For challenge 2, the degradation introduced by compression and the diffusion noising process share a common characteristic: as the noise increases (or the compression ratio becomes higher), less information is preserved in the degraded image. Consequently, the compression ratio aligns inherently with the diffusion time steps. To model this correlation, we incorporate the latent residual into the diffusion process and propose a Compression-aware Diffusion Process, which effectively enhances reconstruction fidelity while significantly improving decoding efficiency.
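The correspondence between compression level and diffusion time step can be sketched with a simple heuristic: treat the latent residual as noise, compute its signal-to-noise ratio, and pick the diffusion step whose SNR is closest. This is only an illustrative assumption, not the alignment actually learned or derived in ResULIC; the linear beta schedule, the function names, and the uniform quantizer used as a stand-in codec are all hypothetical.

```python
import math


def linear_beta_schedule(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """DDPM-style linear beta schedule (a common default, assumed here)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def snr_per_step(betas):
    """SNR(t) = alpha_bar_t / (1 - alpha_bar_t); it decreases as t grows."""
    snrs, alpha_bar = [], 1.0
    for beta in betas:
        alpha_bar *= 1.0 - beta
        snrs.append(alpha_bar / (1.0 - alpha_bar))
    return snrs


def matched_timestep(latent, decoded_latent, snrs):
    """Pick the step whose log-SNR is closest to that of the compressed latent.

    The latent residual (latent - decoded_latent) is treated as the noise term.
    """
    signal = sum(x * x for x in latent)
    noise = sum((x - y) ** 2 for x, y in zip(latent, decoded_latent))
    if noise == 0.0:
        return 0  # lossless latent: no denoising steps needed
    target = math.log(signal / noise)
    return min(range(len(snrs)), key=lambda t: abs(math.log(snrs[t]) - target))


def quantize(latent, step):
    """Uniform scalar quantizer, used only as a toy stand-in for a codec."""
    return [round(x / step) * step for x in latent]


snrs = snr_per_step(linear_beta_schedule())
latent = [math.sin(i) for i in range(256)]
t_fine = matched_timestep(latent, quantize(latent, 0.1), snrs)
t_coarse = matched_timestep(latent, quantize(latent, 1.0), snrs)
print(t_fine, t_coarse)  # the coarser quantizer maps to a later time step
```

Under this heuristic, a lightly compressed latent starts denoising from an early step and needs few iterations, which is one way to read the decoding-efficiency benefit described above.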
Extensive experiments demonstrate the effectiveness of ResULIC, which achieves superior objective and subjective performance compared with the state-of-the-art diffusion-based method PerCo, with BD-rate savings of 80.7% and 66.3% in terms of LPIPS and FID, respectively.
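For readers unfamiliar with BD-rate, the sketch below shows how such a figure can be computed from two rate-quality curves. It is a simplified variant, assumed here for self-containment: the standard Bjøntegaard method fits a cubic polynomial to log-rate, whereas this version uses piecewise-linear interpolation. The function names and the sample points are hypothetical and are not results from the paper.

```python
import math


def _interp(x, xs, ys):
    """Linearly interpolate y at x; xs must be strictly increasing."""
    x = min(max(x, xs[0]), xs[-1])  # guard against floating-point overshoot
    for i in range(len(xs) - 1):
        if xs[i] <= x <= xs[i + 1]:
            frac = (x - xs[i]) / (xs[i + 1] - xs[i])
            return ys[i] + (ys[i + 1] - ys[i]) * frac
    raise ValueError("x outside the sampled range")


def bd_rate(anchor, test, samples=200):
    """Average bitrate change (in percent) of `test` relative to `anchor`
    at equal metric value. Each curve is a list of (bitrate, metric) points
    with distinct metric values. Negative output means `test` saves bits.
    """
    def curve(points):
        pts = sorted((metric, math.log(rate)) for rate, metric in points)
        return [p[0] for p in pts], [p[1] for p in pts]

    a_metric, a_lograte = curve(anchor)
    t_metric, t_lograte = curve(test)
    lo = max(a_metric[0], t_metric[0])
    hi = min(a_metric[-1], t_metric[-1])
    if lo >= hi:
        raise ValueError("curves do not overlap in metric range")

    grid = [lo + (hi - lo) * i / samples for i in range(samples + 1)]
    diffs = [_interp(m, t_metric, t_lograte) - _interp(m, a_metric, a_lograte)
             for m in grid]
    # Trapezoidal average of the log-rate difference over the shared range.
    avg = (sum(diffs) - 0.5 * (diffs[0] + diffs[-1])) / samples
    return (math.exp(avg) - 1.0) * 100.0


# Hypothetical (bitrate in bpp, LPIPS) points; the test codec needs half the bits.
anchor_curve = [(0.01, 0.30), (0.02, 0.20), (0.04, 0.12), (0.08, 0.08)]
test_curve = [(rate / 2, lpips) for rate, lpips in anchor_curve]
print(round(bd_rate(anchor_curve, test_curve), 2))
```

Because the comparison is made at matched metric values, the same routine applies to lower-is-better metrics such as LPIPS and FID without any sign change.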
@inproceedings{Ke2025resulic,
author = {Ke, Anle and Zhang, Xu and Chen, Tong and Lu, Ming and Zhou, Chao and Gu, Jiawen and Ma, Zhan},
title = {Ultra Lowrate Image Compression with Semantic Residual Coding and Compression-aware Diffusion},
booktitle = {International Conference on Machine Learning},
year = {2025},
}