News

2025.12.05 Open-sourced Unicorn version 1 and version 2!

2025.12.04 Unicorn version 3 was accepted by TCSVT.

2024.12.06 Open-sourced Unicorn Pre (SparsePCGC)!

2024.10.28 Unicorn version 2 was submitted in response to the MPEG Call for Proposals on AI-based Point Cloud Coding (m70061 & m70062).

2024.10.05 Initial release of part of the code and results. (The full source code will be released publicly after approval from the funding agency.)

2024.09.12 Unicorn version 1 was accepted by TPAMI. (Part I and Part II)

Abstract (Unicorn)

A universal multiscale conditional coding framework, Unicorn, is proposed to compress the geometry and attributes of any given point cloud. Geometry compression is addressed in Part I of this paper, while attribute compression is discussed in Part II.

For geometry compression, we construct multiscale sparse tensors for each voxelized point cloud frame and leverage lower-scale priors from the current frame and (previously processed) temporal reference frames to improve the conditional probability approximation (lossless mode) or the content-aware predictive reconstruction (lossy mode) of geometry occupancy.
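To make the multiscale construction concrete, the sketch below voxelizes a frame and progressively halves the integer voxel coordinates, deduplicating at each step; the plain-tensor representation and function names are illustrative assumptions (the released code builds on sparse-convolution libraries).

```python
import torch

def voxelize(points: torch.Tensor, voxel_size: float = 1.0) -> torch.Tensor:
    """Quantize raw xyz points to integer voxel coordinates; keep unique occupied voxels."""
    coords = torch.floor(points / voxel_size).long()
    return torch.unique(coords, dim=0)

def build_multiscale(coords: torch.Tensor, num_scales: int):
    """Progressively downscale occupied-voxel coordinates by 2x per scale.

    Returns [finest, ..., coarsest]; each entry is an (N_s, 3) tensor of occupied
    voxels. A parent voxel is occupied iff any of its eight children is occupied.
    """
    scales = [coords]
    for _ in range(num_scales - 1):
        coords = torch.unique(torch.div(coords, 2, rounding_mode="floor"), dim=0)
        scales.append(coords)
    return scales

points = torch.rand(10000, 3) * 256                  # toy frame
pyramid = build_multiscale(voxelize(points), num_scales=5)
print([s.shape[0] for s in pyramid])                 # occupied-voxel counts per scale
```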

For attribute compression, since attribute components exhibit very different intrinsic characteristics from the geometry (e.g., 8-bit RGB color versus 1-bit occupancy), we instead process the attribute residual between the lower-scale reconstruction and the current-scale data. As with geometry, we leverage spatially lower-scale priors in the current frame and the (previously processed) temporal reference frame to improve the probability estimation of attribute intensities through conditional residual prediction in lossless mode, or to enhance the attribute reconstruction through progressive residual refinement in lossy mode.
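As a toy illustration of the residual idea, the sketch below predicts each current-scale attribute from its parent voxel at the lower scale and codes only the difference; the copy-from-parent predictor is an assumed stand-in for Unicorn's learned conditional predictor.

```python
import torch

def predict_from_parent(child_coords, parent_coords, parent_attr):
    """Predict each child voxel's attribute from its parent at the next lower scale.

    Assumption: the simplest possible predictor (copy the parent's reconstructed
    value); Unicorn replaces this with a learned conditional predictor.
    """
    parents_of_children = torch.div(child_coords, 2, rounding_mode="floor")
    # Map each parent coordinate to its row index via a dict lookup (toy-scale only).
    index = {tuple(c.tolist()): i for i, c in enumerate(parent_coords)}
    rows = torch.tensor([index[tuple(c.tolist())] for c in parents_of_children])
    return parent_attr[rows]

# Lossless round trip: encode residual = x - pred, decode x = pred + residual.
child_coords = torch.tensor([[0, 0, 0], [1, 0, 0], [2, 2, 3]])
parent_coords = torch.tensor([[0, 0, 0], [1, 1, 1]])
parent_attr = torch.tensor([[100.], [200.]])       # e.g., reconstructed luma
x = torch.tensor([[101.], [97.], [205.]])          # current-scale attributes
pred = predict_from_parent(child_coords, parent_coords, parent_attr)
residual = x - pred                                # entropy-coded in lossless mode
assert torch.equal(pred + residual, x)             # decoder-side reconstruction
```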

The proposed Unicorn is a versatile, learning-based solution capable of compressing static and dynamic point clouds with diverse source characteristics in both lossy and lossless modes. Following the same evaluation criteria, Unicorn significantly outperforms standard-compliant approaches like MPEG G-PCC and V-PCC as well as other learning-based solutions, yielding state-of-the-art compression efficiency while presenting affordable complexity for practical implementations.

Contributions (Unicorn)

Comprehensive coding capability: Unicorn is the first versatile, learning-based PCC solution.
1) It can compress the geometry and attribute information of an input point cloud, either separately or jointly.
2) It flexibly supports static and dynamic coding of point clouds in either lossless or lossy mode.
3) It demonstrates leading performance across diverse point cloud types, including solid, dense, and sparse object point clouds, as well as scant LiDAR point clouds.
Better compression performance: Unicorn provides significant performance gains over existing approaches.
Low computational complexity: Unicorn is a low-complexity approach with runtimes comparable to the G-PCC codec and variable-rate coding capability using a single neural model.

Abstract (Unicorn v3)

Multiscale sparse representation offers significant advantages in point cloud geometry compression, delivering state-of-the-art performance compared with both standardized solutions and other learned approaches. A crucial component of this framework is cross-scale occupancy prediction, which employs the lower-scale reference representation from the current frame alone (static coding) or from both the current and temporal reference frames (dynamic coding) to establish conditional priors. However, existing works mainly use local computations, e.g., sparse convolutions and kNN attention, to exploit correlations in such a representation; these methods usually fail to adequately capture global coherence. In addition, the fixed configuration of lossless-lossy scales cannot adapt to temporal dynamics, which limits the reconstruction quality of temporal references in dynamic coding. These limitations constrain the generation of effective priors for conditional coding. To address these issues, we propose two new techniques. The first is KPA (Key Point-driven Attention), which integrates both local and global characteristics. The second is AdaScale (Adaptive Lossy/Lossless Scale), which decides whether a transitional scale should be coded in lossless or lossy mode based on temporal displacement, thereby enhancing the reconstruction quality of the temporal reference. Extensive experiments demonstrate that our approach significantly outperforms state-of-the-art methods, including rule-based standard codecs like G-PCC and V-PCC, as well as learning-based approaches like Unicorn and TMAP, across both static/dynamic and lossy/lossless coding scenarios.

KPA (Key Point-driven Attention): A novel attention mechanism that identifies salient key points to bridge local geometric details and global structural context. By focusing computation on these representative points, KPA enables long-range dependency modeling while maintaining efficiency, overcoming the limited receptive field of conventional sparse convolutions.
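A minimal sketch of the key-point cross-attention idea follows, assuming uniform-stride sampling over an already-serialized point sequence and a single attention head; the class name, dimensions, and stride are illustrative, not the released implementation.

```python
import torch
import torch.nn as nn

class KeyPointAttention(nn.Module):
    """Cross-attention from all points (queries) to a sampled key-point subset (keys/values).

    Attending to K sampled key points instead of all N points cuts the attention
    cost from O(N^2) to O(N*K) while keeping a global receptive field.
    """

    def __init__(self, dim: int, stride: int = 16):
        super().__init__()
        self.stride = stride
        self.q = nn.Linear(dim, dim)
        self.kv = nn.Linear(dim, 2 * dim)
        self.out = nn.Linear(dim, dim)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (N, dim) features of occupied voxels, already serialized
        # (e.g., along a space-filling curve) so uniform striding spreads
        # key points over the whole frame.
        keys = feats[:: self.stride]                                 # (K, dim)
        q = self.q(feats)                                            # (N, dim)
        k, v = self.kv(keys).chunk(2, dim=-1)                        # (K, dim) each
        attn = torch.softmax(q @ k.T / q.shape[-1] ** 0.5, dim=-1)   # (N, K)
        return feats + self.out(attn @ v)                            # residual connection

feats = torch.randn(4096, 64)
print(KeyPointAttention(64)(feats).shape)                            # torch.Size([4096, 64])
```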

AdaScale (Adaptive Lossy/Lossless Scale): An adaptive strategy that dynamically selects lossless or lossy coding for transitional scales based on inter-frame motion. When significant temporal displacement is detected, the scale is encoded losslessly to produce a high-quality reference frame; otherwise, it uses lossy compression for efficiency. This improves both rate-distortion performance and temporal prediction accuracy.

Contributions (Unicorn v3)

  • By integrating KPA and AdaScale into the multiscale sparse representation framework, this work demonstrates significant improvements over rule-based solutions like G-PCC and V-PCC, as well as learned approaches like Unicorn and TMAP.
    • In static coding, it surpasses TMAP by 9.1% to 22.3% in lossless mode and by 12.7% to 40.7% in lossy mode. For dynamic coding, it outperforms TMAP by 8.8% in lossless mode and 37.6% in lossy mode. Note that TMAP requires significantly longer processing time, likely due to redundant modules such as point-wise enhancement.
    • Compared to V-PCC, our method achieves BD-BR gains of about 56% in static lossy mode and 91% in dynamic lossy mode, while maintaining comparable decoding latency.
  • The impressive compression gains of this work are attributed to KPA and AdaScale: KPA combines the local and global characteristics of the input reference representation, while AdaScale improves the quality of the reference representation itself. Together, these mechanisms generate better conditional priors for occupancy prediction.
    • KPA samples a limited number of key points to perform cross-attention, maintaining global structural coherence while significantly reducing complexity.
    • AdaScale relaxes the fixed configuration of lossless-lossy scales, opening new opportunities to optimize reference-frame quality for better performance.

Method (Unicorn)

Data processing in Unicorn. A given frame P_t^k comprises the geometry part O_t^k and the attribute intensities I_t^k; the voxelized O_t^k is represented using a sparse tensor that contains only the occupied voxels.




Unicorn's Multiscale Sparse Representation. (left) geometry: 1 - Occupied voxel, 0 - Unoccupied voxel; (right) color attribute exemplified using luma or Y intensity. OPU is the Occupancy Processing Unit, and APU is the Attribute Processing Unit.




Cross-scale Processing Units. (left) OPU; (right) APU. Spatially or spatiotemporally lower-scale priors are used to support probability approximation in lossless mode or predictive/progressive reconstruction in lossy mode for respective static or dynamic coding.
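As a toy illustration of the lossless-mode OPU, the sketch below estimates the rate needed to code child-voxel occupancies from probabilities conditioned on lower-scale priors; the small MLP predictor is a hypothetical stand-in for Unicorn's learned networks.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical predictor: map each parent voxel's feature (aggregated from
# lower-scale priors) to occupancy probabilities of its 8 children.
predictor = nn.Sequential(nn.Linear(64, 64), nn.ReLU(), nn.Linear(64, 8))

parent_feats = torch.randn(1024, 64)                 # priors per parent voxel
child_occ = torch.randint(0, 2, (1024, 8)).float()   # ground-truth child occupancy

logits = predictor(parent_feats)
p = torch.sigmoid(logits)                            # P(child occupied | priors)

# Cross-entropy in bits equals the rate an ideal arithmetic coder would spend.
bits = F.binary_cross_entropy(p, child_occ, reduction="sum") / torch.log(torch.tensor(2.0))
print(f"estimated rate: {bits / child_occ.numel():.3f} bits per child voxel")
```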




Lossless Coder in Unicorn. (left) geometry; (right) attribute.



Lossy Coder in Unicorn. (left) geometry; (right) attribute.



Dynamic Coder in Unicorn. (left) geometry; (right) attribute.


Unified compression of geometry and attribute in Unicorn.

Method (Unicorn v3)

Multiscale Sparse Representation Framework: the backbone used for neural point cloud compression. The process starts with the full-resolution frame and progressively downscales it to create multiple representations. Compression begins with a coarse thumbnail and uses lower-scale references to code each subsequent scale.
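A minimal sketch of the decode-side coarse-to-fine loop under this framework, where `predict_child_probs` is a hypothetical stand-in for the learned occupancy-prediction module and simple thresholding replaces the actual arithmetic decoding:

```python
import torch

def predict_child_probs(parent_coords: torch.Tensor) -> torch.Tensor:
    """Hypothetical stand-in for the learned module: an occupancy probability
    for each of the 8 children of every parent voxel."""
    return torch.rand(parent_coords.shape[0], 8)

# Offsets of the 8 children of a parent voxel after a 2x upscale.
OFFSETS = torch.tensor([[x, y, z] for x in (0, 1) for y in (0, 1) for z in (0, 1)])

def decode_one_scale(parent_coords: torch.Tensor, thresh: float = 0.5) -> torch.Tensor:
    probs = predict_child_probs(parent_coords)                       # (N, 8)
    children = parent_coords[:, None, :] * 2 + OFFSETS[None, :, :]   # (N, 8, 3)
    keep = probs > thresh  # in the real codec: arithmetic-decode occupancy bits
    return children[keep]

coords = torch.tensor([[0, 0, 0], [3, 1, 2]])    # coarse "thumbnail" scale
for _ in range(3):                               # coarse-to-fine reconstruction
    coords = decode_one_scale(coords)
print(coords.shape)                              # occupied voxels at the finest scale
```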


Left: Uniform key point sampling from a 3D tensor serialized via Hilbert curve (green: valid points, red: sampled key points).

Right: Local feature extraction using multi-level sparse convolutions at varying strides (2D illustration).
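The stride-based sampling only spreads key points evenly because the voxels are first serialized along a space-filling curve; the sketch below uses Morton (Z-order) coding as a simpler stand-in for the Hilbert curve used in the paper.

```python
import numpy as np

def morton_code(coords: np.ndarray, bits: int = 10) -> np.ndarray:
    """Interleave the bits of (x, y, z) into a Morton (Z-order) code.

    A simpler stand-in for the Hilbert curve: both map 3D voxels to a 1D
    order that keeps spatial neighbors close in the sequence.
    """
    code = np.zeros(len(coords), dtype=np.int64)
    c = coords.astype(np.int64)
    for b in range(bits):
        for axis in range(3):
            code |= ((c[:, axis] >> b) & 1) << (3 * b + axis)
    return code

coords = np.random.randint(0, 1024, size=(4096, 3))
order = np.argsort(morton_code(coords))   # serialize voxels along the curve
serialized = coords[order]
key_points = serialized[::16]             # uniform-stride key-point sampling
print(key_points.shape)                   # (256, 3)
```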

Modular Component Details. (a) Connections among Occupancy Prediction, Extractor, and Predictor. (b) Structure of KPA used in Occupancy Prediction and Extractor. (c) Structure of Cross-Frame KPA used in Predictor.

AdaScale: Adapts the (m+1)-th scale of a point cloud frame to lossless mode if significant motion is detected, producing a high-quality reference frame (HF); otherwise, it uses lossy mode. Lossless scales reference the same scale in the previous frame, while lossy scales reference the most recent HF.
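A minimal sketch of the AdaScale decision rule, assuming temporal displacement is measured as the fraction of current-frame voxels unmatched in the previous frame at the same scale; both the metric and the threshold are illustrative assumptions.

```python
import torch

def displacement(prev_coords: torch.Tensor, curr_coords: torch.Tensor) -> float:
    """Fraction of current-frame voxels absent from the previous frame
    (an illustrative proxy for inter-frame motion at a given scale)."""
    prev_set = {tuple(c.tolist()) for c in prev_coords}
    missing = sum(tuple(c.tolist()) not in prev_set for c in curr_coords)
    return missing / max(len(curr_coords), 1)

def adascale_mode(prev_coords, curr_coords, thresh: float = 0.3) -> str:
    # Large displacement -> code the transitional scale losslessly so it can
    # serve as a high-quality reference (HF); otherwise stay lossy for rate savings.
    return "lossless" if displacement(prev_coords, curr_coords) > thresh else "lossy"

prev = torch.tensor([[0, 0, 0], [1, 1, 1], [2, 2, 2]])
curr = torch.tensor([[0, 0, 0], [5, 5, 5], [6, 6, 6]])
print(adascale_mode(prev, curr))   # "lossless": 2/3 of the voxels moved
```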

Experiments (Unicorn)

Point Cloud Examples. We conducted extensive experiments on various point cloud datasets to thoroughly understand the efficiency and generalization of Unicorn. These datasets include static and dynamic samples with diverse contents, densities, resolutions, and other characteristics.


Part I: Geometry



R-D comparison for static geometry coding.




R-D comparison for dynamic geometry coding.



Error map visualization of reconstructed point clouds.




Part II: Attribute



R-D comparison for static attribute coding.



R-D comparison for lossy compression of geometry & attribute.



R-D comparison for dynamic attribute coding.


Qualitative results of reconstructed point clouds with lossy attribute coding.


Qualitative visualization of reconstructed point clouds with lossy compression of both geometry & attribute.

Experiments (Unicorn v3)

Compression Gains: The proposed method demonstrates a significant advantage over both rule-based approaches, such as V-PCC, G-PCC, and GeS-TM, and learned methods like TMAP and Unicorn, across lossless/lossy and static/dynamic geometry coding modes. We evaluate lossless performance using the 8iVFB and Owlii datasets. TMAP, a recent AI-PCC model released by MPEG, serves as the anchor.
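Gains are reported as Bjøntegaard-delta bitrate (BD-BR), the average rate change at equal quality. Below is a compact sketch of the standard computation (cubic fit of log-rate versus PSNR, integrated over the overlapping quality range) on made-up RD points; it is not tied to the numbers reported here.

```python
import numpy as np

def bd_rate(rate_anchor, psnr_anchor, rate_test, psnr_test) -> float:
    """Bjøntegaard-delta bitrate: average rate change (%) of the test codec
    versus the anchor at equal quality. Negative means bitrate savings."""
    lr_a, lr_t = np.log10(rate_anchor), np.log10(rate_test)
    # Fit log-rate as a cubic function of PSNR for each codec.
    pa = np.polyfit(psnr_anchor, lr_a, 3)
    pt = np.polyfit(psnr_test, lr_t, 3)
    lo = max(min(psnr_anchor), min(psnr_test))
    hi = min(max(psnr_anchor), max(psnr_test))
    # Integrate both fits over the overlapping PSNR interval.
    ia = np.polyval(np.polyint(pa), hi) - np.polyval(np.polyint(pa), lo)
    it = np.polyval(np.polyint(pt), hi) - np.polyval(np.polyint(pt), lo)
    return (10 ** ((it - ia) / (hi - lo)) - 1) * 100

# Made-up RD points (rate in bpp, geometry PSNR in dB), for illustration only.
rd = bd_rate([0.2, 0.4, 0.8, 1.6], [62, 66, 70, 74],
             [0.1, 0.2, 0.4, 0.8], [61, 65, 69, 73])
print(f"BD-BR: {rd:.1f}%")   # negative: the test codec saves rate vs. the anchor
```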

Runtime Comparison: Average encoding and decoding time (seconds/frame) of different methods. Ours* is a lightweight version of our model.

Our Team (Unicorn)

Team members who contributed to Unicorn.

Ma Zhan

Professor at Nanjing University.

Email: mazhan@nju.edu.cn

Ding Dandan

Associate Professor at Hangzhou Normal University.

Email: DandanDing@hznu.edu.cn

Chen Tong

Associate Researcher at Nanjing University.

Email: chentong@nju.edu.cn

Wang Jianqiang

Ph.D. Candidate at Nanjing University.

Email: wangjq@smail.nju.edu.cn

Xue Ruixiang

Ph.D. Candidate at Nanjing University.

Email: xrxee@smail.nju.edu.cn

Li Jiaxin

Ph.D. Candidate at Nanjing University.

Email: lijiaxin@smail.nju.edu.cn

Our Team (Unicorn v3)

Team members who contributed to Unicorn v3.

Ma Zhan

Professor at Nanjing University.

Email: mazhan@nju.edu.cn

Ding Dandan

Associate Professor at Hangzhou Normal University.

Email: DandanDing@hznu.edu.cn

Li Zehong

Master's Degree Candidate at Hangzhou Normal University.

Email: 2024112011033@hznu.edu.cn

Zhu Jiahao

Master's Degree Candidate at Hangzhou Normal University.

Email: 2023112011030@hznu.edu.cn