CLAMP-ViT: Contrastive Data-Free Learning for Adaptive Post-Training Quantization of ViTs
Akshat Ramachandran1, Souvik Kundu2, Tushar Krishna1
1Georgia Institute of Technology, 2Intel Labs
Correspondence: akshat.r@gatech.edu
News
- CLAMP-ViT has been accepted to ECCV 2024!
Overview
We present CLAMP-ViT, a data-free post-training quantization method for vision transformers (ViTs). We identify the limitations of recent techniques, notably their inability to leverage meaningful inter-patch relationships, which leads to the generation of simplistic and semantically vague data and degrades quantization accuracy. CLAMP-ViT employs a two-stage approach, cyclically adapting between data generation and model quantization. Specifically, we incorporate a patch-level contrastive learning scheme to generate richer, semantically meaningful data. Furthermore, we leverage contrastive learning in a layer-wise evolutionary search for fixed- and mixed-precision quantization to identify optimal quantization parameters while mitigating the effects of a non-smooth loss landscape. Extensive evaluations across various vision tasks demonstrate the superiority of CLAMP-ViT, with performance improvements of up to 3% in top-1 accuracy for classification, 0.6 mAP for object detection, and 1.5 mIoU for segmentation at a similar or better compression ratio than existing alternatives.
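To make the patch-level contrastive idea concrete, below is a minimal, hypothetical sketch (not the released CLAMP-ViT code) of an InfoNCE-style loss over ViT patch embeddings. The shapes, the positive/negative pairing (same patch index across two views is the positive, all other patches are negatives), and the temperature value are illustrative assumptions.

```python
# Minimal sketch of a patch-level contrastive (InfoNCE-style) loss.
# Assumption: patch embeddings of shape (batch, num_patches, dim) from two views;
# this is NOT the authors' exact formulation, only an illustration of the idea.
import torch
import torch.nn.functional as F

def patch_contrastive_loss(patches_a: torch.Tensor,
                           patches_b: torch.Tensor,
                           temperature: float = 0.07) -> torch.Tensor:
    """Contrast corresponding patches across two views; other patches act as negatives."""
    b, n, d = patches_a.shape
    a = F.normalize(patches_a.reshape(b * n, d), dim=-1)
    p = F.normalize(patches_b.reshape(b * n, d), dim=-1)
    logits = a @ p.t() / temperature                 # (b*n, b*n) patch similarity matrix
    targets = torch.arange(b * n, device=a.device)   # positive = same patch index
    return F.cross_entropy(logits, targets)

# Example with random embeddings standing in for intermediate ViT features.
loss = patch_contrastive_loss(torch.randn(2, 196, 768), torch.randn(2, 196, 768))
```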
Method
CLAMP-ViT employs a two-stage cyclic process that alternates between generating semantically rich synthetic data using patch-level contrastive learning and optimizing quantization parameters through a layer-wise evolutionary search (see the sketch below).
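The following is a minimal, hypothetical sketch of the second stage: a layer-wise evolutionary search over per-layer bit-widths. The candidate bit-widths, population size, mutation scheme, and the toy fitness function are assumptions for illustration; in CLAMP-ViT the fitness would be the contrastive objective evaluated on the generated data.

```python
# Minimal sketch of a layer-wise evolutionary search over per-layer bit-widths.
# Assumption: hyperparameters and the fitness function are placeholders, not the
# paper's exact settings.
import random
from typing import Callable, List

def evolutionary_bit_search(num_layers: int,
                            fitness_fn: Callable[[List[int]], float],
                            candidate_bits=(4, 8),
                            population_size: int = 16,
                            generations: int = 10,
                            mutation_rate: float = 0.1) -> List[int]:
    """Return the per-layer bit-width assignment with the best fitness found."""
    population = [[random.choice(candidate_bits) for _ in range(num_layers)]
                  for _ in range(population_size)]
    for _ in range(generations):
        scored = sorted(population, key=fitness_fn, reverse=True)
        parents = scored[: population_size // 2]            # keep the fitter half
        children = [[b if random.random() > mutation_rate   # randomly mutate layers
                     else random.choice(candidate_bits) for b in p]
                    for p in parents]
        population = parents + children
    return max(population, key=fitness_fn)

# Toy usage: prefer lower average bit-width (stand-in for the real, data-driven fitness).
best = evolutionary_bit_search(12, lambda bits: -sum(bits) / len(bits))
```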
Results
Comparison of synthetic data generated by (a) PSAQ-ViT v1, (b) PSAQ-ViT v2, and (c) CLAMP-ViT (Ours). CLAMP-ViT generates detailed objects within contextually suitable backgrounds, boosting realism and informativeness.
Fixed-precision quantization accuracy comparison with SoTA on image classification with the ImageNet-1k test set. 'R' and 'S' signify real and synthetic calibration data, and W/A indicates weight/activation bit-width. Bold and underlined values indicate the best performance overall and the best with synthetic data, respectively.
Citation
Acknowledgement
This work was supported in part by CoCoSys, one of the seven centers in JUMP 2.0, a Semiconductor Research Corporation (SRC) program sponsored by DARPA.