Q-Diffusion: Quantizing Diffusion Models (ICCV 2023) Paper Review

사람지은 2024. 5. 27. 14:19

Li_Q-Diffusion_Quantizing_Diffusion_Models_ICCV_2023_paper

Q-Diffusion : Quantizing Diffusion Models

Abstract

Problem (motivation): The noise estimation process of diffusion models suffers from slow inference, high memory consumption, and heavy computation, which makes it hard to adopt diffusion models efficiently.

Previous state: PTQ is a promising way to address these drawbacks, but applying existing PTQ methods to diffusion models did little to resolve them.

Key difficulty of diffusion model quantization:

the output distribution of the noise estimation network changes across multiple time steps
the activation distribution of the shortcut layers within the noise estimation network is bimodal

Solution: To make the noise estimation network compressible, the paper proposes a new PTQ method tailored to the unique multi-timestep pipeline and the model architecture of diffusion models.

time-step aware calibration
split short-cut quantization

Result: The model is quantized to 4 bits in a training-free manner while maintaining performance (small FID change of at most 2.34 compared to >100 for traditional PTQ).

Introduction

Diffusion models have achieved strong results in both diversity (addressing a weakness of GANs) and fidelity (how closely samples resemble real data). However, the generation process requires iterative noise estimation (50 to 1,000 steps) with a complex neural network, so it is slow.

Prior work has improved speed by reducing the number of denoising steps, but it overlooks the fact that the noise estimation network is compute- and memory-intensive at every iteration. Skipping steps does not change this: the memory footprint and slow inference of each individual iteration remain.

This paper applies PTQ to the noise estimation model in order to accelerate every time step. PTQ is a go-to compression method: it requires minimal training data and can be deployed directly on hardware devices, which is why it is an active research area.

However, PTQ is difficult to apply to diffusion models because of

1. the iterative computation process they require

2. the model structure of the noise estimation network

The earlier PTQ4DM compresses diffusion models to 8 bits, but it focuses on small datasets and low resolutions.

The figure above shows the result of applying conventional PTQ. In (a), the output of the noise estimation network differs greatly from one time step to the next, and the resulting performance is poor. In (b), the iterative inference of the noise estimation network accumulates quantization error.

It is therefore important to design a new quantization scheme and new calibration objectives.

Q-Diffusion

Advantages:
Changes to the algorithm:

Related work

diffusion model

A diffusion model follows a Markov chain that adds Gaussian noise to the data (𝑥0 ~ 𝑞(𝑥)) at every time step.

forward: 𝑞(𝑥𝑡∣𝑥𝑡−1) = 𝑁(𝑥𝑡; √(1−𝛽𝑡) 𝑥𝑡−1, 𝛽𝑡𝐼)

𝛽𝑡 (the variance schedule) takes values between 0 and 1 and determines how much noise is added at each time step.

As 𝑇 goes to infinity, 𝑥𝑇 approaches an isotropic Gaussian distribution.
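The forward process can be sketched in a few lines. This is a minimal illustration, assuming the standard Gaussian transition 𝑞(𝑥𝑡∣𝑥𝑡−1) = 𝑁(√(1−𝛽𝑡) 𝑥𝑡−1, 𝛽𝑡𝐼) and a linear 𝛽 schedule; the schedule endpoints, the toy data, and all function names are assumptions made for this example, not values taken from the paper.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced beta_t values (an assumed schedule, for illustration only)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def forward_step(x_prev, beta_t, rng):
    """Sample x_t ~ N(sqrt(1 - beta_t) * x_{t-1}, beta_t * I), element by element."""
    scale = math.sqrt(1.0 - beta_t)
    sigma = math.sqrt(beta_t)
    return [scale * v + sigma * rng.gauss(0.0, 1.0) for v in x_prev]


def forward_process(x0, betas, rng):
    """Apply every forward step in order and return the final noisy sample x_T."""
    x = list(x0)
    for beta_t in betas:
        x = forward_step(x, beta_t, rng)
    return x


rng = random.Random(0)
betas = linear_beta_schedule(1000)
x0 = [3.0] * 1000  # toy "data" whose mean is far from zero
xT = forward_process(x0, betas, rng)

mean_T = sum(xT) / len(xT)
var_T = sum((v - mean_T) ** 2 for v in xT) / len(xT)
print(f"mean of x_T: {mean_T:.3f}, variance of x_T: {var_T:.3f}")
```

Even though the toy data starts far from zero, after many steps the sample mean ends up near 0 and the variance near 1, which matches the statement that 𝑥𝑇 approaches an isotropic Gaussian.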

reverse:

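Starting from a noisy sample (𝑥𝑇 ~ 𝑁(0,𝐼)), noise is removed at every step to obtain a high-fidelity image. The true conditional probability 𝑞(𝑥𝑡−1∣𝑥𝑡) is intractable, however, so the reverse process uses a learned conditional probability instead.

The mean and variance of the reverse diffusion distribution can be obtained through reparameterization.

At each time step, the noise is estimated from 𝑥𝑡 by a noise estimation model. This is usually a UNet (recent work also explores transformers), and its weights are shared across time steps.

This paper describes its new PTQ method with UNet as the reference architecture.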

PTQ

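Post-training quantization compresses a deep neural network by rounding its elements (𝑤) to discrete values.

Quantization and de-quantization follow ŵ = 𝑠 · clip(round(𝑤/𝑠), 𝑐𝑚𝑖𝑛, 𝑐𝑚𝑎𝑥), where 𝑠 is the quantization scale.

𝑠, 𝑐𝑚𝑖𝑛, and 𝑐𝑚𝑎𝑥 are estimated from the weight and activation distributions and then calibrated. round(·) denotes rounding, for which there are two techniques.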

AdaRound
Nearest Rounding
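The quantize and de-quantize steps described above can be sketched as follows. This is a minimal example that assumes a symmetric signed integer grid and a simple min-max rule for choosing the scale; both are illustrative choices rather than the paper's exact calibration procedure, and the function names are hypothetical. It uses nearest rounding only. AdaRound, which learns whether each weight rounds up or down, is not shown.

```python
def minmax_scale(values, n_bits):
    """Pick a symmetric scale from the largest magnitude (assumed calibration rule)."""
    c_max = 2 ** (n_bits - 1) - 1
    c_min = -(2 ** (n_bits - 1))
    max_abs = max(abs(v) for v in values)
    scale = max_abs / c_max if max_abs > 0 else 1.0
    return scale, c_min, c_max


def quantize(w, scale, c_min, c_max):
    """Integer level: clip(round(w / s), c_min, c_max), using nearest rounding."""
    q = round(w / scale)  # Python's round() returns an int and rounds ties to even
    return max(c_min, min(c_max, q))


def dequantize(q, scale):
    """Map an integer level back to a real value: s * q."""
    return scale * q


weights = [-0.7, -0.2, 0.0, 0.3, 0.7]
scale, c_min, c_max = minmax_scale(weights, n_bits=4)
levels = [quantize(w, scale, c_min, c_max) for w in weights]
restored = [dequantize(q, scale) for q in levels]
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(levels, restored, max_error)
```

For values inside the clipping range, the reconstruction error of nearest rounding is bounded by half the scale, which is why the choice of 𝑠, 𝑐𝑚𝑖𝑛, and 𝑐𝑚𝑎𝑥 matters so much at low bit widths.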

Earlier PTQ work, developed for classification and detection, focused on the calibration objective and on how to acquire calibration data.

Diffusion model quantization does not need the training dataset. Calibration data can be generated by sampling from the full-precision model with random inputs.

→ Feed random inputs into the pretrained model, observe the distribution of the resulting values, and calibrate from them.

However, because the noise estimation model is evaluated at many time steps, modeling the activation distribution with conventional methods becomes problematic.

PTQ4DM introduces the Normally Distributed Time-step Calibration method, which generates calibration data across all time steps according to a specific distribution. Its limitations are low resolution, 8-bit precision, floating-point matrix multiplication between attention activations, and a limited ablation study (an evaluation of a component's importance by removing it from the system) of other calibration schemes.

This paper examines the effect of calibration dataset generation comprehensively and establishes an efficient calibration objective for diffusion models. It fully quantizes act-to-act matmuls and validates the approach with experiments on large-scale datasets at 512×512 resolution.
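The data-free calibration idea can be sketched as follows: start from Gaussian noise, run the full-precision sampler, and record the inputs the noise estimation network sees. Here `toy_denoiser` and the update rule are hypothetical stand-ins for a real pretrained network and a real DDPM/DDIM step, used only to keep the example runnable.

```python
import random


def toy_denoiser(x, t, num_steps):
    """Hypothetical stand-in for a full-precision noise estimation network."""
    return [v * (t / num_steps) for v in x]


def collect_calibration_data(num_samples, dim, num_steps, rng):
    """Run the full-precision sampler from pure noise and record (x_t, t) inputs."""
    records = []
    for _ in range(num_samples):
        x = [rng.gauss(0.0, 1.0) for _ in range(dim)]  # x_T ~ N(0, I), no training data
        for t in range(num_steps, 0, -1):
            records.append((list(x), t))
            eps = toy_denoiser(x, t, num_steps)
            # Toy update for illustration only, not a real DDPM/DDIM step.
            x = [v - 0.1 * e for v, e in zip(x, eps)]
    return records


def activation_range(records):
    """Min and max over all recorded inputs, the raw material for calibration."""
    values = [v for x, _ in records for v in x]
    return min(values), max(values)


rng = random.Random(0)
records = collect_calibration_data(num_samples=4, dim=8, num_steps=20, rng=rng)
low, high = activation_range(records)
print(len(records), low, high)
```

Each record keeps its time step `t`, so the same data can later be grouped or subsampled per time step when the activation statistics turn out to depend on `t`.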

Method

Challenges under the Multi-step De-noising

Quantization errors accumulate across time steps
Activation distributions vary across time steps

Calibrating with samples from the middle 50 time steps caused smaller performance drops than calibrating with samples taken from only the first or the last n time steps.
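The calibration subsets compared here can be written down as index selections over the denoising trajectory. The sketch below also includes a fixed-interval selection that spreads samples over all time steps, which is one natural reading of "time-step aware calibration"; the interval value and the helper names are assumptions for illustration. Time steps are indexed in denoising order from 0 to num_steps − 1.

```python
def first_n(num_steps, n):
    """Indices of the first n denoising steps."""
    return list(range(n))


def last_n(num_steps, n):
    """Indices of the last n denoising steps."""
    return list(range(num_steps - n, num_steps))


def middle_n(num_steps, n):
    """Indices of n consecutive steps centered in the trajectory."""
    start = (num_steps - n) // 2
    return list(range(start, start + n))


def uniform_interval(num_steps, interval):
    """Every `interval`-th step, so samples are spread over the whole trajectory."""
    return list(range(0, num_steps, interval))


def coverage(indices, num_steps):
    """Fraction of the trajectory spanned between the smallest and largest index."""
    return (max(indices) - min(indices) + 1) / num_steps


num_steps = 100
subsets = {
    "first 50": first_n(num_steps, 50),
    "last 50": last_n(num_steps, 50),
    "middle 50": middle_n(num_steps, 50),
    "every 2nd": uniform_interval(num_steps, 2),
}
for name, idx in subsets.items():
    print(name, len(idx), coverage(idx, num_steps))
```

All four subsets contain 50 time steps, but only the fixed-interval one spans nearly the entire trajectory, so its calibration data reflects how the activation distribution shifts from early to late steps.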

Challenges on Noise Estimation Model Quantization

Most diffusion models use UNet as the backbone for down-sampling and up-sampling latent features. Recent work also explores transformers, but UNet is still the most widely used.

UNet uses shortcut layers to merge deep and shallow features and passes the result to subsequent layers.

As the table below shows, the input activations of the shortcut layers have abnormal value ranges.

For DDIM, the input activations of the shortcut layers are about 200 times larger than those of neighboring layers.

The deep feature channels (x1) and the shallow feature channels (x2), which have different activation ranges, are concatenated, and this produces a bimodal distribution. Quantizing the entire weight and activation distribution with a single quantizer therefore incurs a large quantization error.
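The effect of the concatenation can be illustrated with a toy experiment that compares one shared quantizer against separate quantizers for the two halves, which is the idea behind split shortcut quantization. The synthetic Gaussian activations, the roughly 200× range gap, the 8-bit width, and the min-max scale are all assumptions for this sketch. Which of x1 and x2 has the wider range is also assumed here, and only activations are shown.

```python
import random


def quantize_dequantize(values, n_bits):
    """Symmetric uniform fake-quantization with a min-max scale over `values`."""
    c_max = 2 ** (n_bits - 1) - 1
    c_min = -(2 ** (n_bits - 1))
    max_abs = max(abs(v) for v in values)
    scale = max_abs / c_max if max_abs > 0 else 1.0
    return [scale * max(c_min, min(c_max, round(v / scale))) for v in values]


def mse(a, b):
    """Mean squared error between two equally long lists."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


rng = random.Random(0)
n = 512
x1 = [rng.gauss(0.0, 0.05) for _ in range(n)]  # narrow-range channels (assumed)
x2 = [rng.gauss(0.0, 10.0) for _ in range(n)]  # wide-range channels (assumed)
concat = x1 + x2

# One quantizer for the whole concatenated tensor.
joint = quantize_dequantize(concat, n_bits=8)
# Separate quantizers for each half, concatenated afterwards.
split = quantize_dequantize(x1, n_bits=8) + quantize_dequantize(x2, n_bits=8)

joint_err_x1 = mse(x1, joint[:n])
split_err_x1 = mse(x1, split[:n])
print(f"x1 error with shared scale: {joint_err_x1:.2e}")
print(f"x1 error with split scales: {split_err_x1:.2e}")
```

With a shared scale, the step size is dictated by the wide-range half, so most of the narrow-range values collapse onto a handful of levels. Giving each half its own scale removes that penalty while leaving the wide-range half unchanged.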