
2.1 Quantizing a PyTorch Model to ONNX with PPQ

Preface

Load a pretrained model from torchvision, quantize it with PPQ, convert it to ONNX format, and export the quantized graph.
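As background before the full script, here is a minimal, illustrative sketch of symmetric per-tensor INT8 quantization in pure Python. This is only a toy model of the idea; PPQ's own quantizers and passes are far more involved.

```python
# Toy sketch of symmetric per-tensor INT8 quantization (background only,
# not PPQ's implementation).

def int8_quantize(values, scale):
    """Map float values onto the INT8 grid [-128, 127]."""
    return [max(-128, min(127, round(v / scale))) for v in values]

def int8_dequantize(qvalues, scale):
    """Map INT8 codes back to (approximate) float values."""
    return [q * scale for q in qvalues]

# Pick the scale so the largest magnitude lands near the edge of the grid.
data = [0.5, -1.0, 0.25, 2.0]
scale = max(abs(v) for v in data) / 127

quantized = int8_quantize(data, scale)
restored = int8_dequantize(quantized, scale)
# restored is close to data but not exact: that gap is the quantization
# error which the analysis tools later in this article measure.
```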

```python
from typing import Iterable

import torch
import torchvision
from torch.utils.data import DataLoader

from ppq import BaseGraph, QuantizationSettingFactory, TargetPlatform
from ppq.api import export_ppq_graph, quantize_torch_model
from ppq.quantization.analyse import graphwise_error_analyse, layerwise_error_analyse

BATCHSIZE = 32
INPUT_SHAPE = [3, 224, 224]
DEVICE = 'cuda'  # only cuda is fully tested :( For other executing devices there might be bugs.
PLATFORM = TargetPlatform.PPL_CUDA_INT8  # identify a target platform for your network.

def load_calibration_dataset() -> Iterable:
    return [torch.rand(size=INPUT_SHAPE) for _ in range(32)]

def collate_fn(batch: torch.Tensor) -> torch.Tensor:
    return batch.to(DEVICE)

# Load a pretrained MobileNet v2 model.
model = torchvision.models.mobilenet.mobilenet_v2(pretrained=True)
model = model.to(DEVICE)

# Create a setting for quantizing the network with PPL CUDA.
quant_setting = QuantizationSettingFactory.pplcuda_setting()
quant_setting.equalization = True          # use the layerwise equalization algorithm.
quant_setting.dispatcher = 'conservative'  # dispatch this network in a conservative way.

# Build a calibration dataloader.
calibration_dataset = load_calibration_dataset()
calibration_dataloader = DataLoader(
    dataset=calibration_dataset, batch_size=BATCHSIZE, shuffle=True)

# Quantize the model.
quantized = quantize_torch_model(
    model=model, calib_dataloader=calibration_dataloader,
    calib_steps=32, input_shape=[BATCHSIZE] + INPUT_SHAPE,
    setting=quant_setting, collate_fn=collate_fn, platform=PLATFORM,
    onnx_export_file='./onnx.model', device=DEVICE, verbose=0)

# The quantization result is a PPQ BaseGraph instance.
assert isinstance(quantized, BaseGraph)

# Export the quantized graph.
export_ppq_graph(
    graph=quantized, platform=PLATFORM,
    graph_save_to='./quantized(onnx).onnx',
    config_save_to='./quantized(onnx).json')

# Analyse the quantization error introduced by every layer.
graphwise_error_analyse(
    graph=quantized,        # ppq ir graph
    running_device=DEVICE,  # cpu or cuda
    method='snr',           # signal-noise ratio by default; set 'cosine' if that's desired
    steps=32,               # how many batches of data are used for error analysis
    dataloader=calibration_dataloader,
    collate_fn=lambda x: x.to(DEVICE))

layerwise_error_analyse(
    graph=quantized, running_device=DEVICE,
    method='snr', steps=32,
    dataloader=calibration_dataloader,
    collate_fn=lambda x: x.to(DEVICE))
```

Results

Running the script on the pretrained MobileNet v2 model prints the log below and writes three files: ./onnx.model, ./quantized(onnx).onnx, and ./quantized(onnx).json.
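The run log below includes a Layerwise Equalization Pass (enabled by `quant_setting.equalization = True`). The idea is that scaling one layer's weights up and the next layer's weights down by the same factor leaves the network's output unchanged, so the factor can be chosen to balance the two weight ranges for INT8. A toy scalar sketch of that invariant (illustrative only; PPQ's pass works per channel on real conv/gemm pairs):

```python
import math

# Toy cross-layer equalization for y = w2 * (w1 * x):
# scaling w1 by s and w2 by 1/s preserves the composed function,
# so we can pick s to balance the two magnitudes for quantization.
# Illustrative only, not PPQ's LayerwiseEqualizationPass.

def equalize_pair(w1, w2):
    s = math.sqrt(abs(w2) / abs(w1))
    return w1 * s, w2 / s

w1, w2 = 0.01, 4.0            # badly mismatched ranges
e1, e2 = equalize_pair(w1, w2)
assert math.isclose(e1 * e2, w1 * w2)  # composed function preserved
# |e1| == |e2| now, so both layers use the INT8 grid well
```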

```
# python example.py
(PPQ ASCII-art banner)
[07:04:00] PPQ Layerwise Equalization Pass Running ...
2 equalization pair(s) was found, ready to run optimization.
Layerwise Equalization: 100%|██████████| 10/10 [00:00<00:00, 1274.01it/s]
Finished.
[07:04:00] PPQ Quantization Config Refine Pass Running ... Finished.
[07:04:00] PPQ Quantization Fusion Pass Running ... Finished.
[07:04:00] PPQ Quantize Point Reduce Pass Running ... Finished.
[07:04:00] PPQ Parameter Quantization Pass Running ... Finished.
Calibration Progress(Phase 1): 100%|██████████| 32/32 [01:59<00:00, 3.74s/it]
[07:04:00] PPQ Runtime Calibration Pass Running ... Finished.
[07:06:00] PPQ Quantization Alignment Pass Running ... Finished.
[07:06:00] PPQ Passive Parameter Quantization Running ... Finished.
[07:06:00] PPQ Parameter Baking Pass Running ... Finished.
--------- Network Snapshot ---------
Num of Op:                    [100]
Num of Quantized Op:          [100]
Num of Variable:              [277]
Num of Quantized Var:         [277]
------- Quantization Snapshot ------
Num of Quant Config:          [386]
BAKED:                        [53]
OVERLAPPED:                   [125]
SLAVE:                        [20]
ACTIVATED:                    [65]
PASSIVE_BAKED:                [53]
FP32:                         [70]
Network Quantization Finished.
```
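The Runtime Calibration Pass in the log observes activations on the calibration batches to choose quantization ranges. A hypothetical min-max sketch of the idea in pure Python (PPQ's actual pass supports several observers and is more sophisticated):

```python
# Hypothetical min-max calibration sketch: track the activation range
# over calibration batches, then derive a symmetric INT8 scale from it.
# Illustrative only, not PPQ's RuntimeCalibrationPass.

def calibrate_minmax(batches):
    """Running min/max over all calibration batches."""
    lo, hi = float('inf'), float('-inf')
    for batch in batches:
        lo = min(lo, min(batch))
        hi = max(hi, max(batch))
    return lo, hi

def scale_from_range(lo, hi, n_levels=255):
    """Symmetric scale covering [-max(|lo|,|hi|), +max(|lo|,|hi|)]."""
    return max(abs(lo), abs(hi)) / ((n_levels - 1) / 2)

batches = [[0.1, 0.9, -0.3], [1.2, -0.7, 0.4]]
lo, hi = calibrate_minmax(batches)  # observed range: (-0.7, 1.2)
scale = scale_from_range(lo, hi)    # 1.2 / 127
```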
```
Analysing Graphwise Quantization Error(Phrase 1):: 100%|██████████| 1/1 [00:00<00:00, 8.19it/s]
Analysing Graphwise Quantization Error(Phrase 2):: 100%|██████████| 1/1 [00:00<00:00, 6.84it/s]
Layer     | NOISE:SIGNAL POWER RATIO
Conv_8:   | ████████████████████ | 1.678653
Conv_26:  | ████████████████ | 1.313450
Conv_9:   | █████████████ | 1.087763
Conv_13:  | █████████████ | 1.074564
Conv_55:  | ████████████ | 0.991271
Conv_28:  | ██████████ | 0.857988
Conv_17:  | ████████ | 0.730895
Conv_154: | ████████ | 0.676669
Conv_22:  | ████████ | 0.659212
Conv_152: | ███████ | 0.618322
Conv_142: | ███████ | 0.582268
Conv_133: | ██████ | 0.534554
Conv_45:  | ██████ | 0.520580
Conv_51:  | █████ | 0.464549
Conv_144: | █████ | 0.441523
Conv_41:  | █████ | 0.414612
Conv_36:  | █████ | 0.411636
Conv_57:  | ████ | 0.387930
Conv_113: | ████ | 0.368550
Conv_148: | ████ | 0.351853
Conv_123: | ████ | 0.333928
Conv_104: | ████ | 0.331407
Conv_134: | ███ | 0.319796
Conv_4:   | ███ | 0.309758
Conv_138: | ███ | 0.274523
Conv_125: | ███ | 0.272312
Conv_32:  | ███ | 0.269519
Conv_94:  | ███ | 0.255700
Conv_47:  | ███ | 0.255035
Conv_129: | ███ | 0.246983
Conv_18:  | ██ | 0.222586
Conv_65:  | ██ | 0.205310
Conv_162: | ██ | 0.190181
Conv_84:  | ██ | 0.189721
Conv_90:  | ██ | 0.183772
Conv_96:  | ██ | 0.181663
Conv_70:  | ██ | 0.174435
Conv_163: | ██ | 0.167765
Conv_115: | ██ | 0.164749
Conv_100: | █ | 0.152931
Conv_86:  | █ | 0.150768
Conv_105: | █ | 0.148656
Conv_80:  | █ | 0.134689
Conv_109: | █ | 0.131509
Conv_37:  | █ | 0.124499
Conv_119: | █ | 0.122543
Conv_74:  | █ | 0.096819
Conv_76:  |  | 0.072862
Conv_61:  |  | 0.071023
Conv_0:   |  | 0.067830
Conv_66:  |  | 0.064776
Gemm_169: |  | 0.035677
Conv_158: |  | 0.032427
Analysing Layerwise quantization error:: 100%|██████████| 53/53 [00:01<00:00, 34.86it/s]
Layer     | NOISE:SIGNAL POWER RATIO
Conv_4:   | ████████████████████ | 0.007448
Conv_22:  | ███ | 0.001254
Conv_133: | ███ | 0.000973
Conv_142: | █ | 0.000488
Conv_152: | █ | 0.000487
Conv_162: | █ | 0.000420
Conv_104: | █ | 0.000372
Conv_8:   | █ | 0.000269
Conv_65:  | █ | 0.000214
Conv_113: | █ | 0.000204
Conv_123: | █ | 0.000190
Conv_13:  |  | 0.000183
Conv_41:  |  | 0.000142
Conv_17:  |  | 0.000136
Conv_26:  |  | 0.000113
Conv_36:  |  | 0.000108
Conv_70:  |  | 0.000096
Conv_94:  |  | 0.000092
Conv_109: |  | 0.000078
Conv_100: |  | 0.000068
Conv_125: |  | 0.000066
Conv_119: |  | 0.000065
Gemm_169: |  | 0.000064
Conv_55:  |  | 0.000060
Conv_84:  |  | 0.000058
Conv_138: |  | 0.000053
Conv_80:  |  | 0.000051
Conv_28:  |  | 0.000050
Conv_45:  |  | 0.000043
Conv_57:  |  | 0.000036
Conv_90:  |  | 0.000034
Conv_105: |  | 0.000032
Conv_32:  |  | 0.000032
Conv_144: |  | 0.000031
Conv_74:  |  | 0.000027
Conv_96:  |  | 0.000027
Conv_115: |  | 0.000026
Conv_154: |  | 0.000026
Conv_51:  |  | 0.000024
Conv_0:   |  | 0.000023
Conv_148: |  | 0.000022
Conv_86:  |  | 0.000021
Conv_134: |  | 0.000021
Conv_66:  |  | 0.000018
Conv_18:  |  | 0.000015
Conv_76:  |  | 0.000012
Conv_129: |  | 0.000010
Conv_47:  |  | 0.000009
Conv_61:  |  | 0.000008
Conv_9:   |  | 0.000006
Conv_163: |  | 0.000005
Conv_37:  |  | 0.000004
Conv_158: |  | 0.000002
```
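The NOISE:SIGNAL POWER RATIO reported by the 'snr' method is, roughly, the power of the quantization error (fp32 output minus quantized output) divided by the power of the fp32 output, so smaller is better. A minimal sketch of that ratio (hypothetical helper, not the internals of ppq.quantization.analyse):

```python
# Sketch of the noise:signal power ratio behind method='snr'.
# Hypothetical helper for illustration only.

def noise_signal_power_ratio(fp_out, quant_out):
    """Power of (fp32 - quantized) error divided by fp32 signal power."""
    noise_power = sum((f - q) ** 2 for f, q in zip(fp_out, quant_out))
    signal_power = sum(f ** 2 for f in fp_out)
    return noise_power / signal_power

fp_out = [1.0, -2.0, 0.5]      # a layer's fp32 outputs
quant_out = [1.1, -1.9, 0.45]  # the same layer's quantized outputs
ratio = noise_signal_power_ratio(fp_out, quant_out)
# a small ratio means the quantized layer closely tracks the fp32 layer
```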

This article was published in the 讯客互联 mobile column; you are welcome to share it, but please credit the original source when reposting.