ICML 2026

Semantic Granularity Navigation in Image Editing

Liangsi Lu1, Minzhe Guo1, Xuhang Chen2, Yang Shi1†
1 Guangdong University of Technology, 2 Huizhou University
† Corresponding author

Abstract

Despite the generative capabilities of diffusion and flow models, real-image editing remains constrained by a persistent trade-off between semantic editability and structural fidelity. We trace a primary cause of this limitation to the implicit coupling of edit progress with model scale in existing paradigms. Under this coupling, stronger edits typically require visiting noisier states, which spends computation on destabilizing layout before the semantic change is well localized. We introduce NaviEdit, a training-free inference-time controller that decouples edit progress from model scale traversal through a strict self-consistency contract. NaviEdit operates at the rollout level and leaves the underlying pretrained model unchanged. It treats scale as a control input and reallocates a fixed step budget toward semantically responsive intermediate scales instead of destructive high-noise regimes. Experiments show consistent gains across compatible editors and flow backbones, supporting decoupling as a portable inference-time control principle.

Method

The controller is lightweight, model-agnostic at inference time, and designed to plug into compatible prompt-pair editors without retraining the backbone.

  • Progress–scale decoupling introduces an explicit edit-progress axis instead of using the model's scale coordinate as an edit clock.
  • Editable-information navigation concentrates rollout density inside an effective scale window where the differential field is semantically strong but still spatially anchored.
  • Self-consistent updates keep mixing, querying, and actuation on the same scale coordinate, avoiding the mismatch bias that appears in coupled schedules.
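A toy rollout makes the three ingredients concrete. The sketch below is a hypothetical illustration, not the NaviEdit implementation. It assumes a rectified-flow convention z_t = (1 − t)·x + t·ε. It also assumes the differential velocity used by inversion-free flow editors such as FlowEdit, where the target prompt is queried on a state shifted by the current edit and the source prompt is queried on a shared noise draw. The "model" is a toy whose data for each prompt is a single point, so its velocity is exact. The actuation rule, the window values, and every function and parameter name (`toy_velocity`, `decoupled_rollout`, `window`, `progress_end`) are illustrative assumptions. The edit-progress axis runs on its own grid, the query scales stay inside the window, and each step uses one scale t to mix the noisy state, query the model, and convert velocity into displacement.

```python
import numpy as np


def toy_velocity(z, t, mode):
    # Hypothetical stand-in for a pretrained flow model: if the data for a prompt is the
    # single point `mode` and z = (1 - t) * mode + t * eps, the exact velocity eps - mode
    # equals (z - mode) / t.
    return (z - mode) / t


def decoupled_rollout(x_src, mode_src, mode_tar, n_steps=8,
                      window=(0.35, 0.65), progress_end=1.0, seed=0):
    """Illustrative progress/scale-decoupled edit rollout (not the NaviEdit code).

    Assumes 0 < t_lo <= t_hi <= 1 and 0 < progress_end <= 1.
    """
    rng = np.random.default_rng(seed)
    x_src = np.asarray(x_src, dtype=float)
    t_lo, t_hi = window
    # Explicit edit-progress axis: how much of the edit has been applied so far.
    progress = np.linspace(0.0, progress_end, n_steps + 1)
    # Query scales stay inside the window, independent of how far the edit goes.
    scales = np.linspace(t_hi, t_lo, n_steps)
    z_edit = x_src.copy()
    for k in range(n_steps):
        t = scales[k]  # one scale coordinate for mixing, querying, and actuation
        eps = rng.standard_normal(x_src.shape)
        z_src = (1.0 - t) * x_src + t * eps           # mix the source at scale t
        z_tar = z_edit + (z_src - x_src)               # same noise, shifted by the current edit
        v_delta = (toy_velocity(z_tar, t, mode_tar)    # query both prompts at scale t
                   - toy_velocity(z_src, t, mode_src))
        # t * v_delta turns the velocity gap into a displacement at the same scale t;
        # it equals (current edit) minus (edit predicted by the model's clean estimates).
        frac = (progress[k + 1] - progress[k]) / (1.0 - progress[k])
        z_edit = z_edit - frac * t * v_delta           # close `frac` of the remaining gap
    return z_edit
```

In this toy the noise cancels exactly, so `decoupled_rollout(x, x, target)` ends on `target` and `progress_end=0.5` ends halfway between `x` and `target`, whatever window is chosen. A FlowEdit-style Euler step would instead close |Δt|/t of the gap, which ties the amount of edit to how far t travels. In this sketch the amount of edit is tied to the progress axis, and the window only decides which scales are queried.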

Visual Overview

The paper combines an empirical diagnosis of the scale axis with rollout-level control. The figures below summarize the intuition: not all scales provide the same editable information, and stable editing comes from navigating the right regimes rather than simply going noisier.

Figure: Editable-information spectrum across scales
Scale is not an edit clock. Different scales expose qualitatively different editing regimes, with a usable middle window between brittle low-scale reconstruction and drifting high-scale generation.
Figure: Rollout control (NaviEdit navigation)
Density beats range. NaviEdit keeps the rollout inside an effective window and allocates more compute there, instead of expanding range into risky high-noise states.
Figure: PIE-Bench examples (qualitative comparisons)
Qualitative preservation gains. Under comparable budgets, NaviEdit produces larger semantic edits while keeping the rest of the image more stable than coupled baselines.
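The "density beats range" point about step allocation can be checked with simple arithmetic. The sketch below assumes a uniform coupled schedule in which edit strength is expressed as a starting scale `t_start`, as in SDEdit-style editors, and an illustrative window [0.35, 0.65]. The window values and the function names (`coupled_scales`, `windowed_scales`, `budget_split`) are hypothetical and are not taken from the paper.

```python
import numpy as np


def coupled_scales(n_steps, t_start):
    # Hypothetical coupled schedule: edit strength is expressed as the starting scale
    # t_start, and the step budget is spread uniformly from t_start toward 0.
    return np.linspace(t_start, 0.0, n_steps, endpoint=False)


def windowed_scales(n_steps, window):
    # Hypothetical decoupled schedule: the whole budget sits inside [t_lo, t_hi].
    t_lo, t_hi = window
    return np.linspace(t_hi, t_lo, n_steps)


def budget_split(scales, window):
    # Count how many steps land above, inside, and below the window.
    t_lo, t_hi = window
    above = int(np.count_nonzero(scales > t_hi))
    inside = int(np.count_nonzero((scales >= t_lo) & (scales <= t_hi)))
    below = int(np.count_nonzero(scales < t_lo))
    return above, inside, below


window = (0.35, 0.65)
print(budget_split(coupled_scales(20, 0.95), window))   # (7, 6, 7)
print(budget_split(windowed_scales(20, window), window))  # (0, 20, 0)
```

With 20 steps and `t_start = 0.95`, the coupled schedule spends 7 steps above the window and 7 below it, leaving 6 inside. The windowed schedule places all 20 inside. Under a uniform coupled schedule, the fraction of the budget above the window is roughly (t_start − t_hi)/t_start, and this fraction grows as t_start is raised to strengthen the edit.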

Results

PIE-Bench shows the main preservation gains of the controller, while additional experiments indicate that the same decoupled principle remains useful across other compatible editors and flow backbones.

PIE-Bench

Improved editability–preservation trade-off, with stronger semantic changes and better protection of non-edited regions under comparable budgets.

Cross-editor

The same controller improves compatible editors such as InfEdit and FlowAlign without retraining or changing the underlying backbone.

Cross-backbone

Additional studies on SD3, SD3.5, and FLUX backbones support the claim that the mechanism is not tied to a single model family.

BibTeX

Project page, code, and preprint are now public. The citation below matches the current arXiv release.

@article{lu2026naviedit,
  title={Semantic granularity navigation in image editing},
  author={Lu, Liangsi and Guo, Minzhe and Chen, Xuhang and Shi, Yang},
  journal={arXiv preprint arXiv:2605.21190},
  year={2026}
}