ReSAM: Refine, Requery, and Reinforce — Self-Prompting Point-Supervised Segmentation for Remote Sensing Images

Independent Researcher CVPR 2026
Overview of the ReSAM Refine–Requery–Reinforce pipeline
Figure 1. Overview of ReSAM. Weak views generate pseudo-masks and self-prompts that iteratively refine SAM outputs through Refine (mask cleaning), Requery (self-prompting with box prompts), and Reinforce (Soft Semantic Alignment with LoRA), while the strong view provides training supervision.

Abstract

Interactive segmentation models such as the Segment Anything Model (SAM) have demonstrated remarkable generalization on natural images, but they perform suboptimally on remote sensing imagery (RSI) due to severe domain shifts and the scarcity of dense annotations. To address this limitation, we propose a pointsupervised, self-prompting framework that adapts SAM to RSI using only sparse point annotations. Our method employs a Refine–Requery–Reinforce loop, in which coarse pseudo-masks are generated from initial points (Refine), improved with self-constructed box prompts (Requery), and embeddings are aligned with Soft Semantic Alignment (SSA) to mitigate error propagation (Reinforce). Without relying on full-mask supervision, our approach progressively enhances SAM’s segmentation quality and domain robustness through self-guided prompt adaptation. We evaluate our proposed method on three RSI benchmark datasets, WHU, HRSID, and NWPU VHR-10, demonstrating that it consistently outperforms pretrained SAM and recent pointsupervised segmentation methods. Compared to the fully supervised model, our approach reduces the performance gap to 1.3% (WHU), 4.9% (HRSID), and 8.5% (NWPU) while relying only on 1-point annotations. Our results demonstrate that self-prompting and semantic alignment provide an efficient path towards scalable, point-level adaptation of foundation segmentation models for remote sensing applications

Contributions

Method

ReSAM adapts SAM with LoRA on the image encoder and operates on weak–strong dual views. Given sparse positive point prompts on a weakly augmented image, SAM produces initial instance masks. In the Refine stage, confident pixels are selected via entropy filtering and overlapping regions are removed so each pixel belongs to a single instance. In Requery, minimal bounding boxes from refined masks are used as self-generated prompts to obtain higher-quality pseudo labels. In Reinforce, SSA aligns instance embeddings from weak and strong views using FIFO queues and a soft cosine loss, combined with focal, Dice, and IoU losses for mask quality.

Quantitative Results

ReSAM is evaluated on three RSI benchmarks with 1-, 2-, and 3-point supervision per instance. Results below report SAM-based backbones (1-point setting highlighted in the paper). ReSAM surpasses PointSAM and other baselines across datasets, with steady gains as more points are provided.

Quantitative results on NWPU VHR-10
NWPU VHR-10. Comparison with point-supervised and adaptation baselines (SAM backbone).
Quantitative results on WHU building dataset
WHU. Building instance segmentation under sparse point supervision.
Quantitative results on HRSID inshore dataset
HRSID-Inshore. SAR ship instance segmentation in cluttered coastal scenes.

Ablation Studies

Each component of the $\mathrm{R}^3$ loop contributes to overlap suppression and stable pseudo labels. On WHU with 1-point prompts, full ReSAM reaches 75.8% mIoU (+14.8 over direct SAM). SSA with queue size 128 improves training stability and final accuracy on NWPU and HRSID compared to training without SSA.

Ablation of ReSAM components on WHU
Component ablation (WHU, 1-point). Self-training, requerying, and SSA each improve over the direct SAM baseline.
Effect of Soft Semantic Alignment across training
SSA ablation. SSA stabilizes training and improves final mIoU on NWPU VHR-10 and HRSID.
GPU memory comparison between PointSAM and ReSAM
Memory efficiency. ReSAM reduces average GPU memory during training by up to 84% compared to prototype-based PointSAM.

BibTeX

@inproceedings{subhani2026resam,
  title={ReSAM: Refine, Requery, and Reinforce: Self-Prompting Point-Supervised Segmentation for Remote Sensing Images},
  author={Subhani, Muhammad Naseer},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2026}
}

Preprint: @article{subhani2025resam, ... journal={arXiv preprint arXiv:2511.21606}, year={2025}}