ICML 2026
Semantic Granularity Navigation in Image Editing
Abstract
Despite the generative capabilities of diffusion and flow models, real-image editing remains constrained by a persistent trade-off between semantic editability and structural fidelity. We trace a primary cause of this limitation to the implicit coupling of edit progress with model scale in existing paradigms. Under this coupling, stronger edits typically require visiting noisier states, which spends computation on destabilizing layout before the semantic change is well localized. We introduce NaviEdit, a training-free inference-time controller that decouples edit progress from model scale traversal through a strict self-consistency contract. NaviEdit operates at the rollout level and leaves the underlying pretrained model unchanged. It treats scale as a control input and reallocates a fixed step budget toward semantically responsive intermediate scales instead of destructive high-noise regimes. Experiments show consistent gains across compatible editors and flow backbones, supporting decoupling as a portable inference-time control principle.
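A minimal Python/NumPy sketch can make the budget reallocation concrete. It assumes a flow-style scale t in [0, 1], with t = 1 as pure noise. It compares a coupled, SDEdit-style schedule, where edit strength sets the starting scale, with a hypothetical decoupled schedule, where a separate edit-progress clock drives the scale as a control input. The function names, the window (0.3, 0.7), the 0.8 starting scale, and the 75% in-window share are illustrative assumptions, not values or rules from the paper.

```python
import numpy as np

def coupled_schedule(num_steps, strength=1.0):
    """Coupled baseline (SDEdit-style): edit strength sets the starting scale and
    progress advances linearly in scale, so stronger edits start noisier."""
    return np.linspace(strength, 0.0, num_steps + 1)

def decoupled_schedule(num_steps, t_start=0.8, window=(0.3, 0.7), frac_in_window=0.75):
    """Hypothetical decoupled schedule: progress p in [0, 1] is its own axis, and the
    scale t(p) is a control input that spends `frac_in_window` of the budget in `window`."""
    t_lo, t_hi = window
    if not (0.0 < t_lo < t_hi < t_start and 0.0 < frac_in_window < 1.0):
        raise ValueError("need 0 < t_lo < t_hi < t_start and 0 < frac_in_window < 1")
    # Split the leftover budget between the entry (t_start -> t_hi) and exit (t_lo -> 0)
    # segments in proportion to the scale distance each segment covers.
    rest = 1.0 - frac_in_window
    entry_len, exit_len = t_start - t_hi, t_lo
    entry_frac = rest * entry_len / (entry_len + exit_len)
    knots_p = [0.0, entry_frac, entry_frac + frac_in_window, 1.0]
    knots_t = [t_start, t_hi, t_lo, 0.0]
    progress = np.linspace(0.0, 1.0, num_steps + 1)  # uniform edit-progress clock
    return np.interp(progress, knots_p, knots_t)     # piecewise-linear, non-increasing scale

def steps_in_window(ts, window):
    """Count solver steps whose midpoint scale lies inside the window."""
    mids = 0.5 * (ts[:-1] + ts[1:])
    return int(np.sum((mids >= window[0]) & (mids <= window[1])))

for name, ts in [("coupled", coupled_schedule(20)), ("decoupled", decoupled_schedule(20))]:
    print(name, "max scale:", ts.max(), "steps in window:", steps_in_window(ts, (0.3, 0.7)))
```

Both schedules use the same 20 steps. The decoupled one never goes above t = 0.8 and places most of its steps inside the window. A real controller would presumably derive the window from the backbone's behavior rather than hard-coding it.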
Method
The controller is lightweight, model-agnostic at inference time, and designed to plug into compatible prompt-pair editors without retraining the backbone.
- Progress–scale decoupling introduces an explicit edit-progress axis instead of using the model's scale coordinate as an edit clock.
- Editable-information navigation concentrates rollout density inside an effective scale window where the differential field is semantically strong but still spatially anchored.
- Self-consistent updates keep mixing, querying, and actuation on the same scale coordinate, avoiding the mismatch bias that appears in coupled schedules.
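The self-consistency idea can be illustrated with a toy differential-field update. It is loosely modeled on FlowEdit-style prompt-pair editing, which is a swapped-in technique and not the paper's update rule. The "model" is a hypothetical closed-form velocity field for a point-mass data distribution, so the outcome can be checked exactly. With `query_offset = 0`, mixing, querying, and actuation all use the same scale t, and the edit lands on the target anchor. A nonzero `query_offset`, which queries the field at one scale while stepping on another, leaves a residual. This residual is only a toy analogue of the mismatch bias, and the paper may define that bias differently.

```python
import numpy as np

def toy_velocity(z, t, anchor):
    """Hypothetical stand-in for a prompt-conditioned flow model. For data concentrated
    at `anchor` under x_t = (1 - t) * anchor + t * eps, the exact velocity is (x_t - anchor) / t."""
    return (z - anchor) / t

def edit_step(z_edit, x_src, noise, t, t_next, a_src, a_tgt, query_offset=0.0):
    """One differential-field step (FlowEdit-style, not the paper's rule).
    query_offset = 0 keeps mixing, querying, and actuation on the same scale t."""
    t_query = t + query_offset
    z_src_t = (1.0 - t) * x_src + t * noise           # mixing: noisy source at scale t
    z_tgt_t = z_edit + (z_src_t - x_src)              # target branch shares the same noise
    v_delta = (toy_velocity(z_tgt_t, t_query, a_tgt)
               - toy_velocity(z_src_t, t_query, a_src))  # querying: differential field
    return z_edit + (t_next - t) * v_delta            # actuation: Euler step t -> t_next

def run_edit(schedule, x_src, a_src, a_tgt, query_offset=0.0, seed=0):
    """Integrate the edit along a decreasing scale schedule ending at 0."""
    rng = np.random.default_rng(seed)
    z_edit = x_src.copy()
    for t, t_next in zip(schedule[:-1], schedule[1:]):
        noise = rng.standard_normal(x_src.shape)      # fresh noise at every step
        z_edit = edit_step(z_edit, x_src, noise, t, t_next, a_src, a_tgt, query_offset)
    return z_edit

a_src, a_tgt = np.zeros(4), np.ones(4)                # source image sits at the source anchor
schedule = np.linspace(0.8, 0.0, 21)
consistent = run_edit(schedule, a_src, a_src, a_tgt)
lagged = run_edit(schedule, a_src, a_src, a_tgt, query_offset=0.1)
print(np.abs(consistent - a_tgt).max(), np.abs(lagged - a_tgt).max())
```

In this linear toy the noise cancels exactly from the differential field, which is why the consistent run converges. Real backbones are nonlinear, so the example shows only the bookkeeping of a shared scale coordinate, not the size of any effect.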
Visual Overview
The paper combines an empirical diagnosis of the scale axis with rollout-level control. The figures below summarize the intuition: not all scales provide the same editable information, and stable editing comes from navigating the right regimes rather than simply going noisier.
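The idea of an "effective scale window" can be made measurable in several ways, and the sketch below shows one. Both scores are hypothetical stand-ins chosen for illustration. Semantic strength is assumed to be a normalized magnitude of the differential field at each scale. Spatial anchoring is assumed to be the fraction of that field's energy that falls inside an edit mask. The thresholds and the synthetic profile in the usage lines are placeholders, not measurements from the paper.

```python
import numpy as np

def anchoring_score(delta_field, edit_mask):
    """Fraction of differential-field energy inside the edit mask.
    delta_field: (C, H, W) array; edit_mask: (H, W) boolean array."""
    energy = np.sum(delta_field ** 2, axis=0)   # per-pixel energy summed over channels
    total = energy.sum()
    return float(energy[edit_mask].sum() / total) if total > 0 else 0.0

def select_window(scales, strengths, anchors, min_strength=0.5, min_anchor=0.6):
    """Return (t_lo, t_hi) spanning the scales that are both semantically strong and
    spatially anchored, or None if none qualify. Assumes qualifying scales are contiguous."""
    scales = np.asarray(scales, dtype=float)
    ok = (np.asarray(strengths) >= min_strength) & (np.asarray(anchors) >= min_anchor)
    if not ok.any():
        return None
    return float(scales[ok].min()), float(scales[ok].max())

# Synthetic profile (illustrative only): strength grows with noise, anchoring decays.
scales = np.linspace(0.1, 0.9, 9)
strengths = np.array([0.10, 0.20, 0.35, 0.55, 0.70, 0.80, 0.90, 0.95, 1.00])
anchors = np.array([0.95, 0.90, 0.85, 0.80, 0.70, 0.65, 0.45, 0.30, 0.20])
print(select_window(scales, strengths, anchors))
```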
Results
PIE-Bench results show the controller's main preservation gains, and additional experiments indicate that the same decoupling principle remains useful across other compatible editors and flow backbones.
- Improved editability–preservation trade-off: stronger semantic changes and better protection of non-edited regions under comparable budgets.
- The same controller improves compatible editors such as InfEdit and FlowAlign without retraining or changing the underlying backbone.
- Additional studies on SD3, SD3.5, and FLUX backbones support the claim that the mechanism is not tied to a single model family.
BibTeX
Project page, code, and preprint are now public. The citation below matches the current arXiv release.
@article{lu2026naviedit,
title={Semantic granularity navigation in image editing},
author={Lu, Liangsi and Guo, Minzhe and Chen, Xuhang and Shi, Yang},
journal={arXiv preprint arXiv:2605.21190},
year={2026}
}