Global and local modeling is essential for image super-resolution tasks. However, current efforts often lack explicit consideration of cross-scale knowledge in large-scale earth observation scenarios, resulting in suboptimal single-scale representations in global and local modeling. The key motivation of this work is inspired by two observations: 1) hierarchical features exist in the local and global regions of remote sensing images, and 2) similar ground objects exhibit scale variation (i.e., cross-scale similarity). In light of these, this paper presents an effective method to grasp the global and local image hierarchies by systematically exploring the cross-scale correlation. Specifically, we develop a Cross-scale Self-Attention (CSA) to model global features, which introduces an auxiliary token space to calculate cross-scale self-attention matrices, thus exploring global dependency from diverse token scales. To extract cross-scale localities, a Cross-scale Channel Attention (CCA) is devised, where multi-scale features are explored and progressively incorporated into an enriched feature.
Moreover, by hierarchically deploying CSA and CCA into transformer groups, the proposed Cross-scale Hierarchical Transformer (CHT) can effectively explore cross-scale representations in remote sensing images, leading to favorable reconstruction performance. Comprehensive experiments and analysis on four remote sensing datasets demonstrate the superiority of CHT in both simulated and real-world remote sensing scenes. In particular, our CHT outperforms the state-of-the-art approach (TransENet) in terms of PSNR by 0.11 dB on average, while using only 54.8% of its parameters.
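To make the cross-scale self-attention idea concrete, the following is a minimal sketch, assuming a PyTorch-style implementation: fine-scale queries attend to keys and values drawn from an auxiliary, spatially downsampled token space. It is an illustration of the general mechanism described above, not the authors' released code; the module name `CrossScaleSelfAttention`, the `scale` parameter, and the use of average pooling to form the auxiliary tokens are assumptions for exposition only.

```python
# Illustrative sketch only: not the authors' implementation of CSA.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CrossScaleSelfAttention(nn.Module):
    """Queries come from the full-resolution token space; keys/values come
    from an auxiliary, downsampled token space, so attention is computed
    across two token scales (hypothetical module name)."""
    def __init__(self, dim, num_heads=4, scale=2):
        super().__init__()
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        self.scale_factor = scale                    # downsampling ratio of the auxiliary tokens (assumed)
        self.q = nn.Linear(dim, dim, bias=False)
        self.kv = nn.Linear(dim, 2 * dim, bias=False)
        self.proj = nn.Linear(dim, dim)

    def forward(self, x, h, w):
        # x: (B, N, C) tokens of an h x w feature map, N = h * w
        b, n, c = x.shape
        q = self.q(x).reshape(b, n, self.num_heads, self.head_dim).transpose(1, 2)

        # Build the auxiliary (coarser) token space by average-pooling the feature map.
        x_2d = x.transpose(1, 2).reshape(b, c, h, w)
        x_small = F.avg_pool2d(x_2d, self.scale_factor)
        x_small = x_small.flatten(2).transpose(1, 2)          # (B, N / scale^2, C)

        kv = self.kv(x_small).reshape(b, -1, 2, self.num_heads, self.head_dim)
        k, v = kv.permute(2, 0, 3, 1, 4)                      # each: (B, heads, N_small, head_dim)

        # Cross-scale attention: fine-scale queries attend to coarse-scale keys/values.
        attn = (q @ k.transpose(-2, -1)) * self.head_dim ** -0.5
        attn = attn.softmax(dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(b, n, c)
        return self.proj(out)

# Usage: 64 channels, a 32x32 feature map flattened into 1024 tokens.
tokens = torch.randn(2, 32 * 32, 64)
csa = CrossScaleSelfAttention(dim=64, num_heads=4, scale=2)
print(csa(tokens, 32, 32).shape)  # torch.Size([2, 1024, 64])
```

Because the keys and values live in a coarser token space, the attention matrix shrinks by a factor of `scale**2`, which is one plausible way such cross-scale attention can cover larger context at lower cost than single-scale self-attention.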