Ultra Lowrate Image Compression with Semantic Residual Coding and Compression-aware Diffusion

Anle Ke1, Xu Zhang1, Tong Chen1, Ming Lu1, Chao Zhou2, Jiawen Gu2, Zhan Ma1
1Nanjing University    2Kuaishou Technology   
ICML 2025

News


[May 2025] Our paper has been accepted to ICML 2025.
[May 2025] Our project page has been released.

Abstract


Existing multimodal large model-based image compression frameworks often rely on a fragmented integration of semantic retrieval, latent compression, and generative models, resulting in suboptimal performance in both reconstruction fidelity and coding efficiency. To address these challenges, we propose ResULIC, a residual-guided ultra low-rate image compression framework that incorporates residual signals into both semantic retrieval and the diffusion-based generation process.

Specifically, we introduce Semantic Residual Coding (SRC) to capture the semantic disparity between the original image and its compressed latent representation. A perceptual fidelity optimizer is further applied for superior reconstruction quality. Additionally, we present the Compression-aware Diffusion Model (CDM), which establishes an optimal alignment between bitrates and diffusion time steps, improving compression-reconstruction synergy.


Overview


ResULIC Overview
ResULIC Overview: (1) The feature compressor transforms the original image $x$ into the compressed latent feature $z_c$. (2) The Semantic Residual Retrieval (Srr) module generates optimized captions by analyzing both the decoded image $x'$ and the original image $x$, with the plug-and-play Perceptual Fidelity Optimizer (Pfo) further improving reconstruction quality. (3) Text tokens are embedded into $c$ and combined with $z_c$ as conditions for the Compression-aware Diffusion Model (CDM), which generates the final reconstruction $x_r$.
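The data flow above can be summarized in a toy sketch. Everything below (module names, tensor shapes, and the crude rounding that stands in for entropy coding) is an illustrative assumption rather than the released implementation; it only traces how $z_c$, the Srr caption, and the text condition $c$ feed the CDM.

```python
# Toy sketch of the ResULIC data flow. All modules and shapes are illustrative
# stand-ins, not the actual released implementation.
import torch
import torch.nn as nn

class FeatureCompressor(nn.Module):
    """Stand-in for the feature compressor: image x -> compressed latent z_c."""
    def __init__(self, latent_dim=4):
        super().__init__()
        self.encoder = nn.Conv2d(3, latent_dim, kernel_size=8, stride=8)
        self.decoder = nn.ConvTranspose2d(latent_dim, 3, kernel_size=8, stride=8)

    def forward(self, x):
        z_c = torch.round(self.encoder(x))   # crude quantization in place of entropy coding
        x_prime = self.decoder(z_c)          # coarse decoded image x' used by Srr
        return z_c, x_prime

def semantic_residual_retrieval(x, x_prime):
    """Placeholder for Srr: compare x and x' and return a residual caption.
    A real system would query a multimodal captioner here."""
    return "a photo with details missing from the coarse reconstruction"

class ToyCDM(nn.Module):
    """Stand-in for the compression-aware diffusion model."""
    def __init__(self, latent_dim=4, text_dim=16):
        super().__init__()
        self.text_proj = nn.Linear(text_dim, latent_dim)
        self.denoiser = nn.Conv2d(2 * latent_dim, latent_dim, kernel_size=3, padding=1)
        self.to_image = nn.ConvTranspose2d(latent_dim, 3, kernel_size=8, stride=8)

    def forward(self, z_noisy, z_c, c):
        cond = z_c + self.text_proj(c).view(1, -1, 1, 1)   # fuse text and latent conditions
        z_denoised = self.denoiser(torch.cat([z_noisy, cond], dim=1))
        return self.to_image(z_denoised)

# One forward pass through the sketched pipeline.
x = torch.randn(1, 3, 256, 256)
compressor, cdm = FeatureCompressor(), ToyCDM()
z_c, x_prime = compressor(x)                        # step (1)
caption = semantic_residual_retrieval(x, x_prime)   # step (2); Pfo would refine this caption
c = torch.randn(1, 16)                              # step (3): text embedding of the caption (stubbed)
x_r = cdm(z_c + 0.1 * torch.randn_like(z_c), z_c, c)
print(x_r.shape)  # torch.Size([1, 3, 256, 256])
```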

This work addresses two key challenges in generative-model-based image compression:

Multimodal Semantic Integration: The effective integration of multimodal semantic information, minimizing redundancy while ensuring high perceptual fidelity at extremely low bitrates.

Compression-Generation Alignment: Aligning the compression ratio with the noise scale of the diffusion process, enabling efficient and consistent reconstruction across varying compression levels.

Semantic Residual Coding (SRC)

For challenge 1, we integrate a semantic residual coding module into our multimodal image compression framework to minimize the overall bitrate consumption. In addition, to optimize perceptual fidelity, we propose a differentiable prompt optimization strategy that searches for text prompts that improve reconstruction consistency.
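A simple way to picture the semantic residual idea is as a filter over caption phrases: anything the coarse decode $x'$ already conveys is redundant and need not be transmitted as text. The sketch below only illustrates that intuition; the phrase granularity, the threshold, and the image-text scorer (a CLIP-style model is one plausible choice) are hypothetical and do not reproduce the paper's actual retrieval or prompt optimization procedure.

```python
# Illustrative sketch: keep only the caption phrases that the coarse reconstruction
# x' fails to convey, so the transmitted text carries just the semantic residual.
from typing import Callable, List

def semantic_residual_caption(
    full_caption_phrases: List[str],
    score_against_decoded: Callable[[str], float],
    keep_below: float = 0.25,
) -> List[str]:
    """Return the phrases that are poorly represented by the decoded image x'."""
    residual = []
    for phrase in full_caption_phrases:
        # If x' already expresses the phrase well, it is redundant and not coded.
        if score_against_decoded(phrase) < keep_below:
            residual.append(phrase)
    return residual

# Toy usage with a fake scorer (a real system would score each phrase against x').
fake_scores = {"a red vintage car": 0.31, "rain droplets on the windshield": 0.12}
residual = semantic_residual_caption(
    list(fake_scores), lambda p: fake_scores[p], keep_below=0.25
)
print(residual)  # only the phrase that x' fails to convey is kept
```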


Compression-aware Diffusion Model (CDM)

For challenge 2, we observe that the degradation introduced by compression and the diffusion noising process share a common characteristic: as the noise increases (or the compression ratio becomes higher), less information is preserved in the degraded image. Consequently, the compression ratio aligns naturally with the diffusion time step. To model this correlation, we incorporate the latent residual into the diffusion process and propose a Compression-aware Diffusion Process, which enhances reconstruction fidelity while significantly improving decoding efficiency.
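One way to read this alignment: a higher bitrate leaves $z_c$ closer to the clean latent, so the reverse diffusion can begin at a smaller time step and finish in fewer denoising steps, which is where the decoding-efficiency gain comes from. The sketch below illustrates that intuition with a standard DDPM cumulative schedule; the bitrate-to-step mapping and the schedule parameters are assumptions made for illustration, not the paper's exact formulation.

```python
# Illustrative sketch: treat the compressed latent z_c as a partially noised latent
# and start denoising from an intermediate step that shrinks as the bitrate grows.
import torch

T = 1000
betas = torch.linspace(1e-4, 0.02, T)
alpha_bar = torch.cumprod(1.0 - betas, dim=0)   # standard DDPM cumulative schedule

def start_step_for_bitrate(bpp: float, bpp_min: float = 0.01, bpp_max: float = 0.10) -> int:
    """Map bits-per-pixel to a starting time step: lower bitrate -> more degradation
    -> start further into the noise schedule (larger t). Mapping is a toy assumption."""
    frac = (bpp - bpp_min) / (bpp_max - bpp_min)
    frac = min(max(frac, 0.0), 1.0)
    return int((1.0 - frac) * (T - 1))

def noisy_init_from_latent(z_c: torch.Tensor, t_start: int) -> torch.Tensor:
    """Initialize the reverse process at t_start from the compressed latent."""
    noise = torch.randn_like(z_c)
    return alpha_bar[t_start].sqrt() * z_c + (1.0 - alpha_bar[t_start]).sqrt() * noise

z_c = torch.randn(1, 4, 32, 32)
for bpp in (0.02, 0.05, 0.10):
    t_start = start_step_for_bitrate(bpp)
    z_t = noisy_init_from_latent(z_c, t_start)
    print(f"bpp={bpp:.2f} -> start denoising at t={t_start}, {t_start + 1} steps remain")
```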


Results


ResULIC achieves significant improvements in both Perceptual Realism and Perceptual Fidelity

Extensive experiments demonstrate the effectiveness of ResULIC, which achieves superior objective and subjective performance compared to state-of-the-art diffusion-based methods, with -80.7% and -66.3% BD-rate savings over PerCo in terms of LPIPS and FID, respectively.
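For context, BD-rate figures such as these are conventionally computed with the Bjøntegaard metric: fit log-rate as a polynomial function of quality for each method and average the gap over the shared quality range. The sketch below shows that standard computation; the rate/quality points are fabricated purely for illustration and are not results from the paper.

```python
# Bjøntegaard delta-rate (BD-rate) sketch: negative output means rate savings
# relative to the anchor. Sample points below are made up for illustration.
import numpy as np

def bd_rate(rate_anchor, quality_anchor, rate_test, quality_test):
    # For lower-is-better metrics such as LPIPS or FID, negate the quality values first.
    log_r_a, log_r_t = np.log(rate_anchor), np.log(rate_test)
    p_a = np.polyfit(quality_anchor, log_r_a, 3)
    p_t = np.polyfit(quality_test, log_r_t, 3)
    lo = max(min(quality_anchor), min(quality_test))
    hi = min(max(quality_anchor), max(quality_test))
    int_a = np.polyval(np.polyint(p_a), hi) - np.polyval(np.polyint(p_a), lo)
    int_t = np.polyval(np.polyint(p_t), hi) - np.polyval(np.polyint(p_t), lo)
    avg_diff = (int_t - int_a) / (hi - lo)
    return (np.exp(avg_diff) - 1.0) * 100.0   # percentage rate change vs. the anchor

# Fabricated rate (bpp) / quality points; prints roughly -25 (a 25% rate saving).
print(bd_rate([0.02, 0.04, 0.08, 0.16], [28, 31, 34, 37],
              [0.015, 0.03, 0.06, 0.12], [28, 31, 34, 37]))
```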


Citation


@inproceedings{Ke2025resulic,
  author    = {Ke, Anle and Zhang, Xu and Chen, Tong and Lu, Ming and Zhou, Chao and Gu, Jiawen and Ma, Zhan},
  title     = {Ultra Lowrate Image Compression with Semantic Residual Coding and Compression-aware Diffusion},
  booktitle = {International Conference on Machine Learning},
  year      = {2025},
}