DeepSpace: Super Resolution Powered Efficient and Reliable Satellite Image Data Acquisition

Large constellations of low-earth orbit (LEO) satellites enable frequent high-resolution earth imaging for numerous geospatial applications. They generate large volumes of data in space, hundreds of Terabytes per day, which must be transported to Earth through constrained, intermittent connections to ground stations. These large volumes lead to day-level delays in data download and exorbitant cloud storage costs. We propose DeepSpace, a new deep learning-based super-resolution approach that compresses satellite imagery by over two orders of magnitude while preserving image quality, using a tailored mixture of experts (MoE) super-resolution framework. DeepSpace reduces the network bandwidth requirements for space-Earth transfer and can compress images for cloud storage. DeepSpace achieves these gains with the limited computational power available on small LEO satellites. We extensively evaluate DeepSpace against a wide range of state-of-the-art baselines on multiple satellite image datasets and demonstrate the above benefits. We further demonstrate the effectiveness of DeepSpace through several distinct downstream applications (wildfire detection, land use and cropland classification, and fine-grained plastic detection in oceans).

Introduction

In recent years, hundreds of satellites were deployed in Low Earth Orbits (LEO) to capture frequent high-resolution Earth imagery [6, 20]. These satellites provide unprecedented density in spatiotemporal coverage of Earth – capturing multiple images of each location on Earth every day at high resolutions (e.g., 3m per pixel). Such imagery enables a wide range of valuable geospatial sensing and computing applications, including disaster detection and relief [15, 27, 29, 52], climate change and environment monitoring [11, 47, 69], agriculture [31, 48, 50] and map generation [23, 37, 63].

Images captured by these satellites need to be transferred to Earth and stored in the cloud before a downstream application processes them, as Figure 1 illustrates. In practice, earth imagery transfer and storage faces three key challenges:

(1) Networking bottlenecks and delays: First, each satellite generates nearly a Terabyte (TB) of data per day, leading to hundreds of TBs of data per constellation [62]. To make things worse, the low orbits of these satellites lead to intermittent connectivity with ground station receivers on Earth. Each satellite-ground station connection lasts for less than ten minutes and has limited bandwidth (around 200 Mbps), and such connections happen only a few times per day [24]. Due to such intermittent, constrained connectivity and large data volumes, satellite imagery can suffer from transfer delays of several hours to a few days [61]. Such delays are impractical for latency-sensitive applications like monitoring natural disasters. Recent work has also shown that the growth in satellite deployments continues to outpace space-earth data capacity, implying that only a small fraction of the data can be downloaded in the future [18, 24].
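A back-of-envelope budget shows why the downlink is the bottleneck. The sketch below plugs in the figures quoted above (~200 Mbps links, sub-ten-minute passes, a handful of passes per day) as rough assumptions; actual pass counts and durations vary per satellite and orbit.

```python
# Back-of-envelope downlink budget per satellite, using the figures quoted
# above (~200 Mbps links, <10-minute passes, a handful of passes per day).
# All parameter values are illustrative assumptions, not measurements.

def downloadable_fraction(generated_tb_per_day: float,
                          link_mbps: float = 200.0,
                          pass_minutes: float = 10.0,
                          passes_per_day: int = 5) -> float:
    """Fraction of a day's imagery that fits through the ground-station passes."""
    contact_seconds = pass_minutes * 60 * passes_per_day
    downlink_bytes = link_mbps * 1e6 / 8 * contact_seconds  # bytes per day
    generated_bytes = generated_tb_per_day * 1e12
    return downlink_bytes / generated_bytes

frac = downloadable_fraction(1.0)  # ~1 TB generated per day
print(f"downloadable fraction: {frac:.1%}")  # -> 7.5%
```

Even under these generous assumptions, under a tenth of a 1 TB day fits through the links, and the fraction shrinks further as constellations and image volumes grow.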

(2) High cloud storage costs: Cloud storage of earth observation data is an increasingly expensive operation. Using public datasets [21, 37, 63], we estimate that more than 2 Petabytes (PB) of new observation data covering the earth is generated each month. Storing this volume of data for the past few (say, 3) years with immediate access costs millions of dollars each month, based on the storage pricing of prominent cloud service providers (e.g., AWS [7], Azure [8]). Even cold storage is an expensive proposition. These costs keep adding up as a constellation operates for years, even more so with the increase in constellation and image sizes over time.
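The storage-cost claim can be sanity-checked with a one-liner. The $/GB-month rate below is an assumed ballpark for standard (hot) object storage, not a quote from AWS or Azure price lists; the point is only the order of magnitude.

```python
# Order-of-magnitude check of the storage-cost claim above: ~2 PB of new
# imagery per month, a 3-year rolling archive with immediate access. The
# $/GB-month rate is an assumed ballpark for standard object storage, not
# a quote from any provider's current price list.

GB_PER_PB = 1_000_000  # decimal units

def monthly_bill(new_pb_per_month: float, months_retained: int,
                 usd_per_gb_month: float = 0.02) -> float:
    stored_gb = new_pb_per_month * GB_PER_PB * months_retained  # steady state
    return stored_gb * usd_per_gb_month

print(f"${monthly_bill(2, 36):,.0f} per month")
```

At these assumed rates, the steady-state 3-year archive alone runs well over a million dollars per month, consistent with the estimate above.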

(3) Limited onboard compute resources on satellites: Satellites have limited computational resources – the state-of-the-art earth imaging satellites have small GPUs such as NVIDIA Jetson [4]. These resources are further limited by power availability which must prioritize mission-critical satellite operations. So, it is nontrivial to accommodate compute-intensive compression algorithms on the satellite.

The above challenges necessitate the design of end-to-end compression systems that meet the following goals: (a) high compression ratios (CRs) to reduce the volume of data that needs to be transferred and stored; (b) computational efficiency of onboard operations to run within the compute and power constraints on satellites; (c) reliable decompression to meet the image quality requirements of a wide array of downstream applications.

None of the existing approaches for satellite image data acquisition [9, 17, 18, 24, 28, 61] satisfy these goals, as summarized in Table 1 and discussed in §2.2. While lossless methods like “7z” [57] are unviable due to their negligible CR (1.03), all other lossy compression methods only offer a CR typically around 10 and at most 32, much smaller than the desired CR of 50 or higher [18]. As such, none of the existing methods meet goal (a). Some of the auto-encoder (AE) based methods (e.g., VQ-VAE 2 [53]) are too computationally heavy to run onboard satellites, and so do not meet goal (b). Most existing methods do not meet goal (c) in that they do not provide any guarantees on decoded/reconstructed image quality.

We propose DeepSpace, a novel satellite image data acquisition system that meets all three goals mentioned above. DeepSpace adopts the deep learning-based image super-resolution (SR) approach [76] for the first time in the context of satellite imagery. In DeepSpace, images captured by a satellite are compressed into low-resolution (LR) versions and sent over the downlink to the cloud, where the LR versions are decompressed back to high-resolution (HR) versions that mimic the original images captured by the satellite. DeepSpace employs image sampling (resizing) onboard satellites to compress raw captured images to LR versions, as it is not only a lightweight operation that runs well within onboard compute constraints but also a flexible way to adapt and control the compression level.

DeepSpace enables significantly higher CRs than existing satellite image data acquisition systems by leveraging two key insights: (i) images captured by a satellite can be compressed down to the smallest LR form (equivalently, the CR can be raised to the point) that still keeps LR images distinguishable from each other; (ii) compared to image reconstruction through SR with a single model for all regions, per-region reconstruction allows higher CRs without compromising image reconstruction quality. We realize the first insight through compression onboard satellites that selects the CR at the image tile level based on each tile's inherent redundancy as well as its closeness to reference images, the latter based on communication- and space-efficient hash codes sporadically fetched over the uplink from the cloud. The second insight is reflected in DeepSpace through a tailored mixture of experts (MoE) [14] style SR framework that employs multiple “expert” SR models, each specializing in different regions and image characteristics. Furthermore, we provide post-hoc measures to estimate the error between recovered HR images and their original counterparts and to guarantee that this error stays under a specified level. We implement the DeepSpace system considering the typical compute hardware suitable for deployment on satellites [4, 18, 61].

We perform a comprehensive evaluation of DeepSpace using satellite images from multiple different datasets [1, 20, 63] covering diverse regions across the world. We evaluate its performance relative to state-of-the-art systems in terms of compression ratio, reliability and compute efficiency. We show that DeepSpace can achieve two orders of magnitude greater compression (equivalently, 100x less traffic on the downlink) while ensuring high reliability. Moreover, this significant compression gain on downlink traffic is achieved with less overhead on the satellite: orders of magnitude (100s x) faster processing speed and 10x less onboard storage requirement. Finally, we showcase the effectiveness of DeepSpace through diverse downstream applications: (1) wildfire detection; (2) plastic detection in oceans; and (3) land use and cropland classification. For all these applications, we show that images recovered by DeepSpace, even after >100x compression, have a similar effect to using the ground truth images.

In summary, we make the following key contributions:

• We introduce the approach of deep learning based image super-resolution (SR), combined with image sampling onboard satellites, to the satellite-based earth observation setting.

• We propose the DeepSpace system that realizes this approach, featuring a tailored MoE-based SR framework design for on-demand image decompression in the cloud with control over recovered image quality, as well as innovations to adaptively select the CR at the image tile level onboard satellites in a lightweight manner.

• We make the DeepSpace implementation publicly available 1 to benefit the research and innovation on earth observation with LEO satellites.

• Through extensive evaluations considering multiple different satellite image datasets and state-of-the-art baselines, we show that DeepSpace achieves two orders of magnitude higher CRs without compromising reliability of HR image recovery and using minimal onboard compute resources.

• Through three diverse and representative downstream applications, we demonstrate that reconstructed images with DeepSpace even after 100x+ compression yield similar application performance as with the ground truth raw uncompressed images.

tl;dr

Key highlights of this paper:

  1. Combines DL-based super-resolution (SR) with onboard image sampling
  2. A tailored MoE-based SR framework
    1. Cloud side: on-demand decompression
    2. Onboard: adaptive compression-ratio (CR) selection

Motivation

2.1 Context and Constraints

Our setting of earth observation using images from low earth orbit (LEO) satellite systems has certain characteristics and constraints, as outlined below.

Satellite orbits: LEO satellites for earth observation operate in orbits around 500-1000 km above the surface of the Earth. These satellites have an orbital period of around 90 minutes and typically orbit around the Earth’s poles. As the Earth rotates underneath, each satellite scans different locations on Earth during each orbit. Modern mega-constellations for earth observation (e.g., Planet Dove [20]) have hundreds of these satellites deployed to capture multiple images of each location on Earth every day.

Communication constraints: Each satellite captures RGB or multi-spectral images at high resolutions of a few square meters per pixel. The images captured by each satellite can amount to more than a Terabyte (TB) of data per day [2], which in turn translates to hundreds of TBs of data per constellation [62]. These images must be transmitted to Earth for processing and analysis, typically in the cloud. However, the data transfer happens through intermittent contacts with ground stations. Due to a satellite’s orbital motion, these contacts last less than ten minutes and are limited to 4-6 contacts per day. Due to such intermittent connectivity and high data volumes, earth imagery data experiences hours to days of delay [62], limiting its use for time-sensitive applications like disaster monitoring as well as necessitating large onboard storage.

Power and computational constraints: Modern constellations consist of relatively small cubesats with size, weight, and power constraints. They generate power through solar panels during periods when they face the sun. This energy is stored for use throughout the orbit and is prioritized for critical satellite operations such as communication, attitude determination and control. The remaining power can be used for compute, which effectively limits the compute resources onboard. Past work [18, 61] has therefore explored the use of low-power GPUs such as NVIDIA Jetson AGX Orin in the 15W power-draw mode for operation on cubesats. This is consistent with recently launched LEO satellites with edge-computing capabilities [4].

2.2 Limitations of Existing Approaches

By way of motivation, here we discuss the different approaches that have been applied to resolve the satellite image data acquisition problem covering the three goals stated at the outset: (i) high compression ratio (CR), (ii) computational efficiency, and (iii) reliable decompression. Table 1 provides the summary.

From a CR perspective, we first start with an indication of the desired levels of CR. Prior works [18, 24] estimate that only 2% of the images captured by satellites can be downloaded currently due to the satellite-to-ground downlink bottleneck. Overcoming this bottleneck translates to a CR of at least 50. With expected increases in constellation size, downlink will become more of a bottleneck as per the analysis by Denby et al. [18] and so an even higher CR would be needed in the future.

To meet the above CR target, we note that lossless compression methods [57] like “7z” are unviable for two reasons. First, the typical CR achieved with these methods is a mere 1.03 based on our experiments using datasets from Planet Inc’s Dove constellation [20] and DynamicEarthNet (DEN) [63]; even the best case CR with these methods is 1.09 – way less than desired. Second, the image capturing process on satellites introduces some inherent distortion, as elaborated in Appendix A.1. Consequently, all existing works for satellite image compression in the literature [10, 18, 24, 25, 53, 71] (represented in Table 1) employ lossy methods. Even these lossy methods can only support CRs up to 32, as highlighted in Table 1, falling short of the desired CR. Our proposed solution, DeepSpace, overcomes this barrier, and other limitations of existing methods elaborated below, through a tailored system design taking a new “sampling + deep learning based image super-resolution (SR)” approach.

Starting with the classical methods, Lánczos interpolation [36] is a traditional method that can be applied to resize (downsample) images to reduce their size and later interpolate (upsample) to recover them at their original size. This approach can be effectively error-free at the Nyquist sampling rate, but that limits the CR. Compressive sensing (CS) [10] is another classical method that can provide a slightly better CR by exploiting the underlying sparsity in the image signal.
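The resize-then-interpolate pipeline can be sketched in a few lines. For self-containedness this uses a box filter for downsampling and nearest-neighbour repetition for upsampling as stand-ins for the Lánczos kernel; the CR accounting (an s-fold resize per axis gives CR = s²) is the same either way.

```python
import numpy as np

# Resize-then-interpolate round trip of the kind described above. A box
# filter (down) and nearest-neighbour repetition (up) stand in for the
# Lanczos kernel to keep the sketch dependency-free; the CR accounting
# (an s-fold resize per axis gives CR = s^2) is identical.

def round_trip(img: np.ndarray, s: int):
    h, w = img.shape
    lr = img.reshape(h // s, s, w // s, s).mean(axis=(1, 3))  # downsample
    hr = np.repeat(np.repeat(lr, s, axis=0), s, axis=1)       # upsample
    cr = img.size / lr.size                                   # = s * s
    mse = float(np.mean((img - hr) ** 2))
    return cr, mse

img = np.random.default_rng(0).random((256, 256))  # worst case: pure texture
cr, mse = round_trip(img, 4)
print(f"CR = {cr:.0f}, round-trip MSE = {mse:.4f}")
```

On smooth, band-limited content the round-trip error is near zero; on the random texture above it is large, previewing why a fixed-CR resize alone cannot reach high CRs reliably.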

In a different category of works, auto-encoders (AEs) are employed to enable better compression ratios (CRs). In fact, there exist custom AE designs for satellite image compression [9, 17, 28] that perform lightweight encoding to obtain compact representation of images captured by satellites in latent space which are then transmitted over the downlink. The reverse decoding process is done on the other side to reconstruct the original images. Using the state-of-the-art AEs for image compression from the computer vision domain [25, 53, 71] can yield higher CRs but these encoders are computationally heavy to run onboard satellites.

Orbital Edge Computing (OEC) [19] has emerged as a new paradigm in recent years that leverages compute onboard satellites to mitigate the satellite to ground downlink communication bottleneck. Kodan [18] is a representative work following the OEC paradigm. For a given geospatial analysis application, Kodan stays within the limits of onboard compute capability and maximizes the value of data communicated through the constrained bottleneck by filtering out low value data and adapting data precision. Serval [61] is broadly similar to Kodan in that it prioritizes transmission of image data for certain latency-sensitive applications (e.g., forest fire detection) through static/dynamic filters, whose computation is distributed between the ground stations and satellites. Umbra [62] is a previous work by the authors of Serval that takes a complementary networking perspective to schedule downlink transmissions across ground stations, considering their differences in ground-cloud segment network characteristics and load.

Earth+ [24] is the latest work under the OEC category in which each satellite employs an image compression scheme that transmits only “changes” with respect to reference images for different locations, as opposed to transmitting raw images themselves. It leverages the uplink bandwidth to equip the satellites with the latest reference images from across the constellation to enable change based compression and transmission.

Note that both Kodan and Earth+ do not provide any error bounds, as they are best-effort in terms of the quality/reliability of images recovered on the cloud side. Like the above OEC works, AE-based methods also do not provide any error bounds on image reconstruction quality. Also note that, with the exception of Kodan and Earth+, the aforementioned methods require setting a fixed CR across all transferred images, so the CR needs to be judiciously chosen to yield acceptable reconstructed image quality.

DeepSpace System

Here we present our solution, DeepSpace (schematic in Figure 2), for satellite image data acquisition that meets the three goals laid out at the outset. Through DeepSpace, we introduce the deep learning based super-resolution (SR) approach to the satellite image data collection setting for the first time. This approach inherently involves compressing captured high-resolution (HR) images to low-resolution (LR) versions onboard satellites through image sampling (resizing), a computationally lightweight operation well suited to the limited compute onboard and fast to process. On the cloud side, HR versions reflecting the original images are recovered as needed from the received LR versions through an SR framework.

3.1 Key Insights

While there exist methods for deep learning based image super-resolution from the computer vision (CV) domain [49, 56, 75, 76], as shown in Table 1 and through our evaluations (§6), their straightforward application to our satellite image data acquisition setting yields CRs that are only marginally better than existing methods designed for this setting. Furthermore, existing SR methods do not provide any guarantees or bounds on reconstructed image quality. In contrast, DeepSpace with its tailored design not only enables an order of magnitude higher CR but also provides error bounds on reconstructed image quality, and comes with fast processing for onboard compression.

To achieve higher levels of CR than what is possible with existing solutions, our design leverages two key insights:

1) Can use the maximal CR that ensures distinguishability among sampled (LR) images. In DeepSpace, compression is realized via sampling, i.e., by reducing the image resolution to a recoverable level. From Sparse Auto Encoder (SAE) theory [16, 45, 51], we know that a faithful reconstruction of encoded data requires that the encodings of distinct inputs be distinct, and obviously smaller in size than the inputs themselves. In our context, sampling is used to perform the encoding. For reliable LR-HR image reconstruction, we need the sampled versions of images to still be distinguishable from each other. In other words, with the sampling process S(·), we want to ensure that:

If \(Dis(A, B) > Dis(A, C)\), then \(Dis_L(S(A), S(B)) > Dis_L(S(A), S(C))\)  (1)

where A, B, C are image instances, \(Dis(\cdot)\) is a distance measure quantifying their similarity, and \(Dis_L(\cdot)\) is the same distance measure applied in latent space, i.e., the space of sampled (LR) images. Limiting S(·) to satisfy condition (1) guarantees that in theory it is possible to distinguish different inputs from their latent representations. As we use sampling as the encoder in DeepSpace, note that the compressed input still keeps the same representation as the corresponding original image, and so \(Dis(\cdot) = Dis_L(\cdot)\). We use the well-known SSIM [67] for \(Dis(\cdot)\) and \(Dis_L(\cdot)\).
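Condition (1) can be checked empirically for a candidate sampling operator. The snippet below uses plain MSE as a stand-in distance for SSIM, box downsampling as S(·), and synthetic images (B a near-replica of A, C an unrelated "region"); all of these are assumptions for illustration, not the paper's exact setup.

```python
import numpy as np

# Empirical check of condition (1): does the ordering of pairwise distances
# survive sampling? Plain MSE stands in for the SSIM-based distance, and box
# downsampling plays the role of the sampling operator S(.); the images are
# synthetic (B is a near-replica of A, C is an unrelated "region").

def dis(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.mean((a - b) ** 2))

def S(img: np.ndarray, s: int = 4) -> np.ndarray:
    h, w = img.shape
    return img.reshape(h // s, s, w // s, s).mean(axis=(1, 3))

rng = np.random.default_rng(1)
A = rng.random((64, 64))
B = A + 0.05 * rng.standard_normal((64, 64))  # same region, small change
C = rng.random((64, 64))                      # different region

assert dis(A, C) > dis(A, B)                  # ordering before sampling
holds = dis(S(A), S(C)) > dis(S(A), S(B))     # ordering after sampling
print("condition (1) holds at s=4:", holds)
```

At moderate sampling factors the ordering survives comfortably; pushing the factor higher eventually collapses the gap, which is exactly the limit the maximal-CR search probes.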

The implication from the above is that we can push the CR to the point that ensures distinguishability between LR images. As elaborated in §3.2, we obtain cues on the maximal CR to use for an image at the satellite end via (a) interpolation based recovery of sampled images and (b) similarity with most recent reference images for same region.
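Cue (a) can be sketched as a search over sampling factors: keep increasing the factor while interpolation-based self-recovery of the tile stays above a quality floor. The quality measure (negative MSE rather than SSIM), the threshold, and the candidate factors below are illustrative assumptions, not the paper's exact parameters.

```python
import numpy as np

# Sketch of cue (a): raise the sampling factor while interpolation-based
# self-recovery of the tile stays above a quality floor; the last factor
# that passes sets the tile's base CR. The quality measure (negative MSE
# rather than SSIM), the threshold, and the factor list are illustrative.

def down_up(img: np.ndarray, s: int) -> np.ndarray:
    h, w = img.shape
    lr = img.reshape(h // s, s, w // s, s).mean(axis=(1, 3))
    return np.repeat(np.repeat(lr, s, axis=0), s, axis=1)

def max_cr(img: np.ndarray, threshold: float = -0.01,
           factors=(2, 4, 8, 16)) -> int:
    best = 1
    for s in factors:
        if -float(np.mean((img - down_up(img, s)) ** 2)) >= threshold:
            best = s
        else:
            break
    return best * best  # CR of an s-fold resize per axis is s^2

smooth = np.outer(np.linspace(0, 1, 64), np.linspace(0, 1, 64))  # gradient
textured = np.random.default_rng(2).random((64, 64))             # busy tile
print("base CR:", max_cr(smooth), "(smooth) vs", max_cr(textured), "(textured)")
```

A smooth, cloud-like tile tolerates aggressive sampling while a highly textured tile does not: precisely the per-tile redundancy cue DeepSpace exploits onboard.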

2) Using a multitude of image SR models specializing in different contexts allows faithful recovery even at high CRs. The extent of sampling (level of CR) is limited by the need to faithfully recover images from their sampled versions when they come from a large input set spanning different regions. As an example, in Figure 3 we compare the similarity between an image of one region \(A\) and another image of the same region at a different time (in blue), as well as with images of different regions \(B\) and \(C\), with increasing CRs. We use SSIM as the similarity measure for this argument. At low CR, it is very easy to distinguish different regions, as images of the same region \(A\) are similar with SSIM around 0.9, while the SSIM to different regions \(B\) and \(C\) is lower, around 0.7.

However, as we increase CR, we find the mutual similarity of images from same region \(A\) at different time can become lower than the similarity to different regions, suggesting the difficulty of recovering HR images across different regions. If we consider region \(A\), \(B\) and \(C\) at the same time, the maximum CR to identify all regions (64) is lower than the case when only region \(A\) and \(B\) are considered (220) in Figure 3, and that is in turn lower than the case when only region \(A\) at different times is considered (> 400). In other words, generalizing to more regions makes it harder to identify different regions at high CR, and therefore limits the potential CR for achieving faithful reconstruction. The above discussion suggests that we should leverage a pool of “expert” SR models, each specializing in the reliable image recovery (SR) of different regions and image characteristics (e.g., cloud level). This allows us to raise the CR to a higher level than would otherwise be possible with a single recovery model across regions.
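The pool-of-experts idea can be pictured as a router in front of specialised models. The experts below are placeholder upsamplers (real ones would be separately trained SR networks), and the (region, cloud_level) keys and fallback rule are entirely hypothetical.

```python
from typing import Callable, Dict, Tuple
import numpy as np

# Minimal picture of the MoE-style recovery: a router in front of a pool of
# specialised SR "experts". The experts here are placeholder 4x upsamplers
# (real ones would be trained SR models), and the (region, cloud_level)
# keys and fallback rule are hypothetical.

def upsample(lr: np.ndarray, s: int = 4) -> np.ndarray:
    return np.repeat(np.repeat(lr, s, axis=0), s, axis=1)

experts: Dict[Tuple[str, str], Callable[[np.ndarray], np.ndarray]] = {
    ("forest", "clear"): lambda lr: upsample(lr),
    ("ocean", "clear"):  lambda lr: upsample(lr),
    ("any", "cloudy"):   lambda lr: upsample(lr),
}

def route(region: str, cloud_level: str) -> Callable[[np.ndarray], np.ndarray]:
    """Pick the most specific expert; fall back to a generic one."""
    return experts.get((region, cloud_level),
                       experts.get(("any", cloud_level),
                                   experts[("forest", "clear")]))

hr = route("desert", "cloudy")(np.zeros((64, 64)))  # falls back to cloudy expert
print(hr.shape)
```

Because each expert only needs to invert the sampling for a narrow slice of inputs, the distinguishability argument above applies per expert, allowing a higher CR than a single global model could sustain.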

3.2 Design Overview

Figure 2 depicts a high-level overview of the DeepSpace system, broadly composed of two components: (1) image compression onboard a satellite; and (2) on-demand reconstruction of compressed images on the cloud end through SR. Our design builds on the two insights from the previous subsection. Also note that in DeepSpace, we collect and store all images captured by each earth observation satellite in the cloud in a compressed form. Then, when a downstream application needs to access all or a subset of these images, we decompress them on demand to provide high-fidelity versions mimicking their original image counterparts.

Image compression onboard satellite. For this, we exploit two complementary opportunities in DeepSpace, both via sampling (as illustrated in Figure 5): (i) inherent redundancy (𝜙) within an image; (ii) similarity (𝜂) to known (previously seen) images, i.e., whether a newly captured image of a region is a close replica of a known reference image for the same region. We use (i) to guide the base level of compression. At the next level, we rely on (ii) to further compress the result obtained from applying (i). Across both these levels/dimensions, we pursue a fine-grained compression of images by viewing each of them as a composition of smaller non-overlapping “tiles” (with a default size of 256 pixels × 256 pixels).

More concretely, during the onboard image compression process (depicted in Figure 4a, details in §4.1), DeepSpace first estimates the inherent redundancy 𝜙 (① in Figure 4a) and the similarity to existing references 𝜂 (② in Figure 4a). A high value of 𝜙 indicates high inherent redundancy in the image (e.g., clouds or ice), which signals potential for significant compression. A high value of 𝜂 indicates high similarity with reference images (and hence, a higher potential CR). To compute 𝜂, DeepSpace obtains lightweight representations of reference images via binary locality sensitive hashing (BLSH) [38, 66] from the ground over the uplink (③ in Figure 4a). An adaptive compression ratio is applied to each image based on 𝜙 and 𝜂, and the compressed version is then sent to the ground (④ in Figure 4a). Images with low 𝜙 and 𝜂 are temporarily stored onboard for potential retransmission later with minimal or no compression (⑤ in Figure 4a), as they are potential outliers: both complex and dissimilar to known images (quadrant ③ in Figure 5). For other images, DeepSpace sends a compressed version of the onboard image to the ground.
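A minimal sketch of this per-tile decision logic follows, assuming: 𝜙 estimated as interpolation self-recoverability, 𝜂 as Hamming similarity between 64-bit random-hyperplane LSH codes (a common BLSH construction, not necessarily the paper's exact one), and an invented threshold/CR table.

```python
import numpy as np

# Per-tile compression decision sketch: phi from interpolation
# self-recoverability, eta from Hamming similarity between the tile's 64-bit
# binary code and the reference code fetched over the uplink. The
# random-hyperplane hash is a common BLSH construction (not necessarily the
# paper's exact one); thresholds and the CR table are invented for illustration.

rng = np.random.default_rng(0)
PLANES = rng.standard_normal((64, 16 * 16))  # random hyperplanes for hashing

def blsh(tile: np.ndarray) -> np.ndarray:
    """64-bit binary code of a tile, computed from its 16x16 thumbnail."""
    s = tile.shape[0] // 16
    thumb = tile.reshape(16, s, 16, s).mean(axis=(1, 3))
    return PLANES @ (thumb.ravel() - thumb.mean()) > 0

def phi(tile: np.ndarray, s: int = 4) -> float:
    h, w = tile.shape
    lr = tile.reshape(h // s, s, w // s, s).mean(axis=(1, 3))
    up = np.repeat(np.repeat(lr, s, axis=0), s, axis=1)
    return 1.0 - float(np.mean((tile - up) ** 2))  # self-recoverability

def eta(tile: np.ndarray, ref_code: np.ndarray) -> float:
    return float(np.mean(blsh(tile) == ref_code))  # Hamming similarity

def choose_cr(p: float, e: float) -> int:
    if p < 0.9 and e < 0.6:
        return 1  # outlier tile: keep onboard, retransmit (nearly) raw later
    return 16 if p < 0.99 else (64 if e < 0.9 else 256)

tile = np.outer(np.linspace(0, 1, 256), np.ones(256))      # smooth gradient tile
ref = blsh(tile + 0.01 * rng.standard_normal(tile.shape))  # near-identical ref
print("chosen CR:", choose_cr(phi(tile), eta(tile, ref)))
```

Shipping 64-bit codes instead of reference images keeps the uplink cost tiny: one bit of hash replaces thousands of reference pixels.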

On-demand image decompression at cloud. For this, DeepSpace features a tailored deep SR framework (illustrated in Figure 4b and detailed in §4.2) with a pool of expert SR models à la mixture of experts (MoE) models [14], guided by insight 2) above. Each expert is realized with a separately trained wavelet diffusion model [49], following the reasoning in §4.2. Overall, there are three major functions in the decompression process, namely high resolution data reconstruction (① in Figure 4b), triggering retransmission when the estimated fidelity is lower than expected (② in Figure 4b, details in §4.3), and updating the reference hash codes (③ in Figure 4b). Furthermore, we provide a way to guarantee a specified reconstruction quality (error bound) as well as post-hoc measures for estimating the quality of recovered images on the cloud side with respect to their original counterparts on the satellite end (details in §4.2 and §4.3).

Example: Figure 6 shows an example to illustrate the end-to-end operation of DeepSpace, including the combined effect of applying the two levels of onboard compression at the tile granularity.

DeepSpace first compresses (samples) each of the image tiles separately by a factor of 𝐾 based on their inherent redundancy and then further increases the CR by a factor 𝛼 (still at the tile level) if the tile is a close replica of a recent known image tile for the same location. Resulting compressed images are transmitted over the downlink and are recovered with high fidelity through a MoE based SR framework and known information at the cloud.
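The two-level tile sampling just described can be sketched as follows. This is a minimal illustration; interpreting "sampling by a factor of 𝐾" as keeping every K-th pixel along each axis, and the default 𝛼, are assumptions rather than the paper's exact scheme.

```python
import numpy as np

def compress_tile(tile, K, near_replicate, alpha=4):
    # Two-level tile compression: sample by K (chosen from inherent redundancy),
    # then by a further factor alpha if the tile nearly replicates a known
    # reference for the same location. Per-axis striding is an assumption.
    step = K * alpha if near_replicate else K
    return tile[::step, ::step]
```

The cloud side then reverses this per tile: a tile sampled only by 𝐾 is recovered by the appropriate SR expert, while an 𝛼𝐾-sampled tile is additionally conditioned on the known reference information.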

For comparison, we show Kodan [18], a current state-of-the-art approach. In OEC based methods like Kodan [18, 61], onboard processing identifies whether an image tile is valuable via ML-based classification, i.e., non-cloudy and required by a certain downstream application. For a fair comparison that supports all applications, in Figure 6 we let Kodan discard only cloudy tiles onboard and replace them with the nearest non-cloudy observation on the ground.

Detailed Design

Here we elaborate on the design details of key components underlying DeepSpace. Implementation aspects of DeepSpace are provided in Appendix A.2.

warning

This section merely demonstrates that §3 above is not "oversimplified".

Not really essential reading ...

4.1 Image Compression Onboard Satellite

Inherent redundancy quantification and CR selection: For inherent redundancy quantification, we first sample each tile of the original (captured) image evenly by K times, and then use Lanczos interpolation [36] to reconstruct the tile (see further discussion and visualization in §A.3). The redundancy is then defined as

\(\phi(I, K) = \text{SSIM}(I, \text{LI}(S(I, K)))\) (2)

where \(I\) is the input image tile, LI(·) is the Lanczos interpolation, and S(·, K) represents even sampling by K times. We use SSIM [67] as the similarity measure because it is a commonly used image quality metric and is normalized (between 0 and 1), making it easy to align across different scenarios. Furthermore, it is computationally lightweight and known to be a more robust metric than PSNR for most computer vision applications [46, 67]. Although learned metrics like LPIPS [74] are more accurate for DL-based downstream tasks, they involve model inference and so incur high computational overhead for use onboard satellites.

To attain maximal compression while also allowing reliable recovery, we choose the largest \(K\) such that \(\phi(I, K) > \tau_\phi\) (by default, \(\tau_\phi = 0.85\)). Note that \(K\) (the CR value obtained in the above manner) can differ across the tiles of an image (see Figure 6, for example).
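Under the stated definitions, the redundancy estimate and CR selection can be sketched as below. This is a minimal sketch: it uses a global single-window SSIM and nearest-neighbor upsampling in place of the windowed SSIM [67] and Lanczos interpolation [36] used by DeepSpace, and the candidate-K set is illustrative.

```python
import numpy as np

def ssim_global(a, b, L=255.0):
    # Global (single-window) SSIM; DeepSpace uses the standard windowed SSIM [67].
    C1, C2 = (0.01 * L) ** 2, (0.03 * L) ** 2
    mu_a, mu_b = a.mean(), b.mean()
    va, vb = a.var(), b.var()
    cov = ((a - mu_a) * (b - mu_b)).mean()
    return ((2 * mu_a * mu_b + C1) * (2 * cov + C2)) / \
           ((mu_a ** 2 + mu_b ** 2 + C1) * (va + vb + C2))

def phi(tile, K):
    # Eq. (2): sample every K-th pixel, reconstruct to the original size, and
    # compare. Nearest-neighbor upsampling stands in for Lanczos interpolation.
    small = tile[::K, ::K]
    recon = np.repeat(np.repeat(small, K, axis=0), K, axis=1)
    recon = recon[:tile.shape[0], :tile.shape[1]]
    return ssim_global(tile, recon)

def select_cr(tile, tau_phi=0.85, candidates=(2, 4, 8, 16)):
    # Largest K whose reconstruction still satisfies phi(I, K) > tau_phi.
    best = 1
    for K in candidates:
        if phi(tile, K) > tau_phi:
            best = K
    return best
```

A nearly uniform tile (e.g., cloud or ice) passes the threshold at every K and earns the maximum CR, while a noisy, detail-rich tile fails even at K = 2 and is kept at full resolution.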

Near replicate detection: We use binary locality sensitive hashing (BLSH) [38, 66] to help detect if a newly captured image tile of a region nearly replicates the tile of a recent image of the same region, where the recent reference is updated weekly by default. We do this as follows (at grey scale):

\(\eta(I) = \arg\max_{X \in X_{\text{ref}}} 1 - \frac{\text{HD}(\text{BLSH}(I), X)}{w \times l}\) (3)

where \(I\) is the input image tile, \(X\) is an instance from the reference BLSH code set \(X_{\text{ref}}\), and HD is the Hamming distance of the hash codes, normalized by the image size \(w \times l\). The satellite receives the tile-level BLSH codes of the reference images for each region in its orbit from the cloud via the uplink. These reference BLSH codes on the satellite are updated infrequently, once a week in our setup. We choose BLSH for its fast computation and ultra-small hash code size, enabling real-time onboard processing and efficient uploading of reference codes from the cloud – around 4 orders of magnitude smaller in size compared to sending even a LR version of a reference image over the uplink. This approach reduces storage requirements to a negligible size, even when saving codes covering the entire Earth.
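A signed-random-projection variant of binary LSH suffices to illustrate Equation (3). The hash family here is an assumption, a common stand-in for the BLSH of [38, 66], and this sketch normalizes the Hamming distance by the number of hash bits rather than by the image size \(w \times l\).

```python
import numpy as np

def blsh(img, n_bits=256, seed=0):
    # Binary hash via signed random projections: one bit per random hyperplane.
    # (A common LSH family, used as a stand-in for the BLSH of [38, 66].)
    rng = np.random.default_rng(seed)
    planes = rng.standard_normal((n_bits, img.size))
    x = img.ravel() - img.mean()
    return (planes @ x > 0).astype(np.uint8)

def eta(img, ref_codes, n_bits=256):
    # Eq. (3): similarity to the closest reference, i.e. 1 minus the minimum
    # normalized Hamming distance. Normalized here by n_bits, not w*l.
    code = blsh(img, n_bits)
    dists = [np.count_nonzero(code != ref) / n_bits for ref in ref_codes]
    return 1.0 - min(dists)
```

A tile hashed against its own code gives 𝜂 = 1, while an unrelated tile lands near 0.5 under random projections, comfortably below the replicate threshold 𝜏_𝜂 = 0.9.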

The BLSH based replicate detection serves two purposes in DEEPSPACE: (1) It guides the compression process by giving tiles with near-replicate references a higher CR than that chosen via Equation (2), i.e., increasing the CR for a tile from K to \(\alpha\)K if \(\eta(I) > \tau_\eta\) (by default set to 0.9); (2) It provides a simple yet effective cloud detection mechanism as part of the MoE based SR framework used for decompression (see Figure 7 and §4.2). The rationale for the latter is that when an image tile has high \(\phi(·)\) and low \(\eta(·)\) (quadrant ② in Figure 5), it has a higher probability of being cloudy (due to cloud, smoke, volcanic ash, etc.).

Temporary outlier storage for retransmission: If an image is complex and not similar to any reference image, i.e., an outlier, then it does not allow high compression with faithful decompression. Accurate Out-of-Distribution (OOD) detection [70] is complex and hence infeasible onboard, given the compute limitations on satellites. So, we flag potential outliers in a lightweight manner when \(\phi(·)\) and \(\eta(·)\) are both low (quadrant ③ in Figure 5). In such cases, we transmit a LR version of the image with minimal CR as per Equation (2). We also temporarily store the image onboard so that it can be retransmitted as-is, on request, when the transmitted version cannot be recovered faithfully on the cloud side.
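Putting the two signals together, the quadrant logic of Figure 5 reduces to a small decision rule. The thresholds and the factor 𝛼 below are illustrative defaults, not values mandated by the paper:

```python
def onboard_decision(phi_val, eta_val, K, alpha=4,
                     tau_phi=0.85, tau_eta=0.9):
    # Quadrant logic of Figure 5 (alpha and thresholds are illustrative).
    if phi_val <= tau_phi and eta_val <= tau_eta:
        # Potential outlier: send a minimally compressed LR version and keep
        # the raw image onboard for possible retransmission.
        return {"cr": K, "store_onboard": True}
    # Near-replicate tiles earn the extra compression factor alpha.
    cr = K * alpha if eta_val > tau_eta else K
    return {"cr": cr, "store_onboard": False}
```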

4.2 On-Demand Decompression at Cloud

Mixture of Experts (MoE) based SR Framework (Path ① in Figure 4b) Following the reasoning in §3.1, in particular insight 2), to enable high CRs while maintaining reliable image recovery, rather than having one single SR model, we employ a collection of SR models in DEEPSPACE to recover HR images from received LR images, each focusing on a specific case (characterized by region type and image characteristics). In essence, each of these models can be viewed as an “expert”, with the overall collection resembling a Mixture of Experts (MoE) model [14, 54, 58, 73].

Figure 7 illustrates our MoE based SR framework. Training conventional MoE models [54, 58] is very resource-intensive, as both classifiers and experts are dynamically generated during the training process. We instead take a tailored approach in DEEPSPACE. For satellite HR image reconstruction from LR input, as we know the key factors that influence reconstruction quality, we predefine classifier and expert configurations before training. We also limit the selection mechanism to direct each input tile to a single expert, rather than multiple experts in conventional MoE models, thereby further reducing computational overhead.

As shown in Figure 7, we allocate different models (i.e., experts) to different tiles according to a four-step classification process spanning classifiers at different levels. As a result, different tiles (sub-regions) are reconstructed simultaneously by appropriate experts, ensuring both high reconstruction quality and efficient inference. We consider four levels of classifiers as follows:

  • Level 1 (Region Identification): This stage classifies the input based on the regions it covers. Note that images of water bodies are directed to a standalone category (“Region 0” in Figure 7) using a pretrained binary classifier that takes in the location and LR image. All other images are assigned to their corresponding region types (forests, cities, etc.) based on their location.
  • Level 2: This level directs the input to a specific redundancy level in terms of SSIM after interpolation (defined as in Equation 4), which is computed onboard and included as metadata from the satellite. Note that this differs from the input size (of the transmitted LR image), because we apply a higher CR to images with near-replicate reference images; this does not change the inherent redundancy, only the size.
  • Level 3: This level of classifiers steer the input tile depending on its size.
  • Level 4: This level computes the cloudy level with a pretrained classifier, which takes in the LR image and outputs a cloudy level score after softmax. The number of cloudy levels varies by dataset, as Table 17 shows.

After four levels of classification, the input tile is directed to an expert that is best suited to generate the high quality HR version of the tile in question.
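The four-level routing can be sketched as constructing a lookup key into the expert pool. The classifier stand-ins and bin boundaries below are hypothetical placeholders for the pretrained models and the dataset-specific levels:

```python
def route_to_expert(tile, meta, classify_region, classify_cloud):
    # Route a received LR tile through the four classifier levels to pick one
    # expert. classify_region / classify_cloud stand in for the pretrained
    # classifiers; the bin boundaries below are illustrative, not the paper's.
    region = classify_region(meta["location"], tile)       # Level 1: region type
    red_bin = int(meta["redundancy"] * 4)                  # Level 2: onboard SSIM
    size_bin = 0 if meta["size"] <= 128 else 1             # Level 3: tile size
    cloud_lvl = classify_cloud(tile)                       # Level 4: cloudiness
    return (region, red_bin, size_bin, cloud_lvl)          # key into expert pool
```

Unlike conventional MoE gating, the key selects exactly one expert, so only a single diffusion model runs per tile.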

Expert Modeling with Wavelet Diffusion. We now consider how to model each expert to perform SR for a given region; the same approach applies to other regions. Reliable SR in this context can be formulated as an optimization problem that minimizes the error with respect to the ground truth image. This optimization can be viewed considering both the result of naive SR via interpolation and the use of reference images, as shown geometrically with a 2D projection in Figure 8. The red point in Figure 8 is the result of naive SR with interpolation, and the red circle shows the constraint for this approach, defined by Equation (4) using SSIM as the quality measure:

\(T = \text{SSIM}(\text{LI}(S(I)), I),\) (4)

where I is the original input image. Note that \(T\) can be measured directly onboard. Figure 8 also illustrates the ground truth’s proximity to its nearest references using BLSH, depicted as blue dashed circles.

The optimization problem can be formulated as:

\(\arg\min_{\hat{\Delta}} \sum_i |\text{D}(I' + \hat{\Delta}, I_l) - \delta_l|, \forall I_l \in \mathbb{I}_{\text{ref}}\) (5)

s.t. \(\text{SSIM}(I' + \hat{\Delta}, I') = T\)

where \(I' = \text{LI}(S(I))\) is the result of naive SR, \(\delta_l = \text{D}(I_{\text{real}}, I_l), \forall I_l \in \mathbb{I}_{\text{ref}}\) are the distances to the reference images, computed onboard from their corresponding BLSH codes, and the function D(·) denotes the BLSH difference:

\(\text{D}(a, b) = \text{NHD}(\text{BLSH}(a), \text{BLSH}(b))\), (6)

NHD(·) is the HD normalized by the image size.

This optimization problem targets minimizing the difference between ‘the total distance of the reconstructed image to all reference images’ and ‘the total distance of the original image to all reference images’. It may have multiple solutions, as neither SSIM differences nor hash codes can uniquely identify an image. Note that the constraint means the reconstructed image has the same SSIM with respect to the naive interpolation result \(I'\) as the ground truth does (see Equation 4).

Solving this problem optimally is challenging due to the large size and high dimensionality of satellite images, so we instead seek a data-driven heuristic solution. We represent the change from the naive reconstruction \(\hat{\Delta}\) as a neural network \(\hat{\Delta} = F(I')\) and optimize it through training, with the objective function serving as the loss. The heuristic solution can be interpreted as a super resolution task, for which we find that Wavelet Diffusion [49] outperforms other methods. Qualitatively too, wavelet diffusion is particularly well suited to our purpose, as it focuses on reconstructing high-frequency details without compromising the low-frequency information crucial in super-resolution tasks. Moreover, wavelet diffusion is superior to other methods in terms of both output fidelity and inference speed: by leveraging the spectral sparsity of images, it enables the generative model to focus on the most relevant frequency bands in the training data. This characteristic aligns well with our MoE based SR framework, where each expert is trained for a specific case. In satellite images, spectral sparsity is particularly pronounced, as different types of regions (e.g., forests, buildings) exhibit details at distinct spatial scales, corresponding to specific frequency bands. Given its advantages in both efficiency and reconstruction quality, all experts in DEEPSPACE are based on wavelet diffusion.

Outlier Retransmission (Path ② in Figure 4b). When an image is temporarily stored onboard, we fetch the raw image itself if it cannot be reliably recovered from its minimally compressed LR version. For this assessment, we use two measures – Low Resolution SSIM (LRS) in Equation (7) and Hash Similarity (HSIM) in Equation (8) to ground truth images, as described below.

\(LRS = \text{SSIM}(I_{\text{in}}, S(I_{\text{recon}}))\) (7)

\(HSIM(I_{\text{real}}, I_{\text{recon}}) = 1 - \frac{\text{HD}(\text{BLSH}(I_{\text{real}}), \text{BLSH}(I_{\text{recon}}))}{w \times l}\) (8)

Here \(I_{\text{recon}}\) refers to the image reconstructed with DEEPSPACE, whereas \(S(I_{\text{recon}})\) is its LR version, downsampled to match the size of the input for reconstruction, i.e., \(I_{\text{in}}\). Note that LRS assesses whether the model preserves the essential low-frequency details of the ground truth during reconstruction. HSIM, on the other hand, offers a direct bit-level comparison between the reconstructed output and the ground truth, providing a more granular measure of similarity. Used together, LRS and HSIM provide a robust and comprehensive evaluation of DEEPSPACE reconstruction performance, capturing both broad structural fidelity and fine-grained accuracy. Note that both LRS and HSIM are post-hoc measures that can be computed on the cloud side with the aid of the LR image input and a small amount of associated metadata from the satellite; concretely, the metadata here is BLSH(\(I_{\text{real}}\)) along with the image size \(w \times l\). Retransmission is triggered if LRS or HSIM is lower than a predefined threshold. In other words, such retransmission based on reconstruction fidelity analysis ensures the quality of the recovered/received images at all times.
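The post-hoc check of Equations (7) and (8) amounts to the following sketch. LRS would be computed analogously with any SSIM implementation over \(I_{\text{in}}\) and the downsampled reconstruction; the threshold values here are illustrative, as the paper only calls them predefined.

```python
import numpy as np

def hsim(code_real, code_recon, w, l):
    # Eq. (8): 1 minus the Hamming distance of the BLSH codes, normalized by
    # the image size w*l (here the toy codes have exactly w*l bits).
    return 1.0 - np.count_nonzero(code_real != code_recon) / (w * l)

def needs_retransmission(lrs_val, hsim_val, tau_lrs=0.9, tau_hsim=0.9):
    # Retransmit the raw onboard copy if either post-hoc measure falls below
    # its threshold (illustrative thresholds).
    return lrs_val < tau_lrs or hsim_val < tau_hsim
```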

4.3 Configurable Reconstruction Quality

Here we show how we ensure that the reconstruction quality is better than a desired, configurable error bound. The SR output from an expert has a reconstruction error bounded by Equation (9), which can be obtained via the geometry in Figure 8. The reconstruction must lie strictly on the red circle to maintain the SSIM after sampling while also preserving similarity to the references. For an image with arbitrary dimensions, we have:

\(Err_{\text{max}} = \min \left( T', \min_{\textbf{b}, \textbf{c} \in \mathbb{I}_{\text{ref}}} T' \cdot \cos^{-1} \left( \frac{\vec{\textbf{ba}}^T \cdot \vec{\textbf{ca}}}{|\vec{\textbf{ba}}| |\vec{\textbf{ca}}|} \right) \right)\) (9)

where vectors a, b, c are vector representations of images: a is the result of naive SR with interpolation, \(T'\) is the Euclidean distance to the ground truth after interpolation, and b and c are drawn from the reference set \(\mathbb{I}_{\text{ref}}\). A proper solution of the optimization problem must significantly outperform naive interpolation when \(\exists \textbf{b}, \textbf{c} \in \mathbb{I}_{\text{ref}}\) with \(\cos^{-1} \left( \frac{\vec{\textbf{ba}}^T \cdot \vec{\textbf{ca}}}{|\vec{\textbf{ba}}| |\vec{\textbf{ca}}|} \right) \ll 1\), which can easily be verified by checking two conditions: (i) whether b and c are similar to the ground truth; (ii) whether both b and c are significantly different from a.

Calculating the exact error bound, however, is computationally expensive, as it involves pixel-wise comparisons of all potential vectors b and c across all reference images. A more efficient approach is to verify that the error bound is small by jointly checking conditions (i) and (ii). Onboard operations ensure high similarity to the reference before compression; if this does not hold, a lower compression ratio is used and retransmission is prepared, thereby satisfying condition (i) by design. Simple images that remain stable through sampling and interpolation are excluded when assessing (i), as their reconstruction is trivial. Condition (ii) is straightforward, as compression significantly alters an image that was initially similar to the reference. Therefore, we can reasonably expect the reconstruction quality to surpass interpolation, using the SSIM after interpolation as a stringent error bound. Note that the SSIM after interpolation can easily be computed onboard, taking the fidelity configuration (in SSIM) as an input to decide the CR.
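For intuition, the geometric bound of Equation (9) can be evaluated directly on small vector representations. This is a literal sketch of the formula, taking the inner minimization over reference pairs; it is an interpretation for illustration, not the paper's implementation.

```python
import numpy as np

def angle_at_a(a, b, c):
    # The arccos term of Eq. (9): angle at point a between the directions to
    # references b and c (all images flattened to vectors).
    ba, ca = b - a, c - a
    cosang = float(ba @ ca / (np.linalg.norm(ba) * np.linalg.norm(ca)))
    return float(np.arccos(np.clip(cosang, -1.0, 1.0)))

def err_bound(a, refs, T_prime):
    # Eq. (9): T' scaled by the smallest angle over reference pairs, capped
    # at T' itself.
    best = np.pi
    for i in range(len(refs)):
        for j in range(i + 1, len(refs)):
            best = min(best, angle_at_a(a, refs[i], refs[j]))
    return min(T_prime, T_prime * best)
```

When two references lie in nearly the same direction from a (a small angle), the bound shrinks far below \(T'\), matching condition (i)/(ii) above; widely separated references leave the bound at the interpolation error \(T'\).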

Evaluation Methodology

5.1 Baselines

We compare against four classes of state-of-the-art baselines shown in Table 1:

  • Conventional compression schemes such as Lanczos interpolation [36] and compressive sensing (CS) [10]. For CS, we consider three representative methods based on the comparative study in [30]: ADMM [13], gOMP [65], and CoSaMP [43].
  • Autoencoders such as DSCN [28] and VQ-VAE-2 [53] where the encoder runs on the satellite and communicates the latent space representation while the decoder on the cloud side reconstructs the image.
  • Orbital edge computing (OEC) schemes that leverage compute onboard satellites: Kodan [18] and Earth+ [24]. Note that we compare DEEPSPACE with the OEC schemes that employ image filtering primarily to gauge bandwidth/storage savings with respect to each other. But in fact both DEEPSPACE and OEC can be applied simultaneously to achieve even higher efficiency.
  • Deep super-resolution techniques from the CV domain applied to satellite image data acquisition. We use the state-of-the-art wavelet diffusion (WaveDiff) [49] as the representative method from this category.

5.2 Datasets

We compare the above baselines on a mix of custom and public datasets described in Table 2. For our custom dataset, we collect RGB images from Planet Inc's Dove constellation [20] for California and Hong Kong. In addition, we use publicly available data from Sentinel satellites [6]. These datasets include both RGB and multi-spectral images. The RGB images in DynamicEarthNet (DEN) [63] were pre-processed by the authors to remove clouds before the dataset release. For the OEC methods that compress an image by removing clouds, we refer to the cloud coverage for the same region from other climate datasets [42, 59] and apply an ideal cloud removal at the tile level. DEN-3 refers to the RGB images and DEN-12 to the 12-channel Sentinel-2 multi-spectral images in the dataset [63].

5.3 Performance Metrics

We consider three sets of metrics for our evaluation:

(i) Compression ratio: Compression ratio (CR) is defined as \(CR = \frac{\text{Original Size}}{\text{Compressed Size}}\). High compression ratio reduces bandwidth consumption and storage costs.

(ii) Reliable reconstruction: We capture the similarity between reconstructed and ground truth images using two standard image quality/fidelity metrics: structural similarity (SSIM) and peak signal-to-noise ratio (PSNR)². We report both average and worst-case numbers for these metrics. Besides SSIM and PSNR, we also consider learned fidelity metrics in the context of several deep learning based downstream applications in §7. We believe such metrics on domain specific and pixel-level real world applications are more representative than general learned metrics like LPIPS [74].

(iii) Compute and storage overheads: We measure the onboard computing speed, decompression speed, and storage requirements for different methods.

Evaluation Results

TL; DR

Case Studies

Skim through this part quickly, a glance is enough.

7.1 Wildfire Detection

We compare the accuracy of wildfire detection on reconstructed imagery and on ground truth images. We use the classical U-net [55] structure for the classification task, a common choice for this task [52]. We train the model with the Planet-CAL dataset between July and August 2021, where the wildfire locations are well labeled. Note that the cloud detection used in Kodan [18] and Earth+ [24] can confuse clouds with smoke and has significant false detection rates (i.e., it can reject images with smoke). We also report end-to-end delivery time using the simulation framework proposed in Serval [61].

The quantitative results on response time and accuracy are shown in Table 11, where we use Median Response Time (Med. Resp. Time) to measure response speed, Relative Size (Rel. Size) accuracy to capture the quality of fire coverage measurement, and classification (Class.) accuracy to show whether we can identify areas containing wildfire. DeepSpace achieves the best performance on both response time and accuracy (size measurement in column 3 and label accuracy in column 4). The improvement in response time is due to the high CR achieved by DeepSpace, which avoids being limited by the constrained satellite-cloud network path. This result demonstrates that DeepSpace reduces the disaster warning latency from a few days to under 30 minutes, enabling rapid and reliable detection.

7.2 Plastic Detection in Oceans

Next, we evaluate DeepSpace on an extremely fine-grained task: plastic detection in oceans. This requires sub-pixel level analysis and very high reconstruction fidelity. We adopt the data and detection methods introduced in [11, 12], aiming to detect objects whose size is close to, or even smaller than, the pixel size of multi-channel satellite images. We evaluate detection performance with pixel-level binary labels, where each label can be plastic or no-plastic. To conduct the experiment, we simply replace the original images with images reconstructed by the different methods, leaving the rest of the pipeline exactly the same as in [11, 12].

We test two configurations of DeepSpace on this task with different values of \(\eta(\cdot)\). A higher \(\eta(\cdot)\) increases the sensitivity of DeepSpace and is better suited for this task. Results in Table 12 show that DeepSpace achieves reliable detection at the cost of a lower compression ratio, which is still 50× higher than that of the state-of-the-art methods that maintain high accuracy. The other methods fail on this task as they cannot preserve or recover pixel-level information.

7.3 Land Use and Cropland Classification

Imagery based remote sensing has been extensively utilized for land-use classification in agriculture and other sectors [20, 23, 29, 31, 50]. Unlike a use case like wildfire detection, land use measurement prioritizes accuracy and cost over response speed.

We first verify whether the reconstructed images can be recognized by other ML models by applying DeepSpace reconstruction to the image segmentation tasks in [33, 63]. The output is illustrated in Figure 15. Segmentation tests performance on more general detection and measurement tasks. To quantify performance, we count the number of objects detected by Segment Anything (SA) [33], denoted "#SA". From the results in Table 13, we see that DeepSpace, Kodan, and Earth+ all show very high fidelity on the cropland and water body detection tasks³. But DeepSpace is two orders of magnitude more efficient than these other OEC methods, while providing an experience akin to using real data. For tasks requiring detailed classification and precise size measurement with pixel-level accuracy, both DeepSpace and conventional OECs deliver equally reliable results. This is because such tasks are less time-sensitive, and in some cases even older image copies can produce accurate outcomes.
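The "#SA" metric above counts segmented objects with Segment Anything [33], which requires model weights to run. As a dependency-free stand-in, the sketch below counts 4-connected components in a binary segmentation mask; the comparison logic is the same as in the paper: an object count on the reconstructed image close to the count on the original indicates high task-level fidelity.

```python
import numpy as np
from collections import deque

def count_objects(mask: np.ndarray) -> int:
    """Count 4-connected foreground components via BFS flood fill."""
    seen = np.zeros_like(mask, dtype=bool)
    h, w = mask.shape
    count = 0
    for sy in range(h):
        for sx in range(w):
            if mask[sy, sx] and not seen[sy, sx]:
                count += 1                    # new object found
                q = deque([(sy, sx)])
                seen[sy, sx] = True
                while q:                      # absorb the whole component
                    y, x = q.popleft()
                    for ny, nx in ((y-1, x), (y+1, x), (y, x-1), (y, x+1)):
                        if 0 <= ny < h and 0 <= nx < w \
                                and mask[ny, nx] and not seen[ny, nx]:
                            seen[ny, nx] = True
                            q.append((ny, nx))
    return count

mask = np.array([[1, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 1, 0, 1]])
print(count_objects(mask))  # 3
```

In the actual pipeline, `count_objects` would be replaced by the number of masks SA returns on each image; the per-method comparison of that count is what Table 13 reports.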

We now zoom in to the particular task of cropland classification, a classical application for satellite image data, to further showcase the ability of DeepSpace to seamlessly and efficiently support such a real-world application. Our cropland labels come from the official dataset of California [3]. We use satellite images from the Planet dataset captured in the same year as this official dataset. We label these satellite images by first segmenting them with [33] and then labeling the segments based on [3]. We use the pretrained GFM model [40] for the classification task and treat this model's performance on the original images as the "ground truth". With respect to this ground truth, we evaluate the classification performance of images reconstructed with different methods, including DeepSpace.

Results in Table 14 show that DeepSpace not only achieves a high CR of 146.4 but also cropland classification performance similar to the ground truth. While Kodan and Earth+ also achieve high classification performance, they yield significantly lower CRs (around 2-6). The other CS- and ML-based methods sacrifice classification performance to achieve as high a CR as DeepSpace, making them unsuitable for this task.
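The evaluation protocol above reduces to a label-agreement score: the classifier's labels on the original images act as "ground truth", and each compression method is judged by how often its reconstructed images yield the same label, alongside its CR. The sketch below illustrates only that scoring step; the class names and values are made up, not the paper's Table 14 numbers.

```python
def agreement(labels_recon, labels_orig):
    """Fraction of images whose label on the reconstruction matches
    the label the same classifier produced on the original image."""
    matches = sum(a == b for a, b in zip(labels_recon, labels_orig))
    return matches / len(labels_orig)

# Hypothetical per-segment crop labels from the same classifier.
orig  = ["corn", "grape", "water", "corn", "fallow"]
recon = ["corn", "grape", "water", "corn", "corn"]   # one flip after reconstruction
print(f"agreement = {agreement(recon, orig):.2f}")   # agreement = 0.80
```

A method is only useful for this task if it keeps this agreement near 1.0 *and* delivers a high CR; the text above notes that the baselines achieve one or the other, but not both.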

Discussion

Here we discuss limitations of DeepSpace and some potential future research directions. The approach taken in DeepSpace is inspired by prior efforts in multimedia communications [68, 72], where video frames are sent at a low bit-rate or low resolution and then recovered on the receiver side with custom DL methods (frame prediction [68] or super resolution [72]). More recently, NetGSR [60] applied the idea of super resolution in the temporal domain, in the network monitoring context, to enable efficient telemetry data collection. DeepSpace is similar in spirit to these earlier works, but its design is tailored to the unique characteristics and constraints of the satellite-based Earth observation setting through novel techniques like its MoE-based recovery system. As with these prior systems, training DeepSpace has a cost in GPU time, especially to cover all regions on Earth. However, this cost is outweighed by the value it creates across many application areas. A potential approach to tackling the training cost is to build a pre-trained global-scale foundation model along the lines of [32], and then fine-tune it for different types of satellites and images.

Beyond the single-image SR focus of DeepSpace, other lines of satellite image processing work exist. In particular, multi-image SR methods [35, 41] leverage LR images from different sources to obtain HR images. This motivates us to extend DeepSpace to cover different data sources (i.e., types of satellites) to achieve even higher compression gains with good generalization while maintaining reconstruction quality.

Recent works also leverage multi-modal language models for satellite image processing [32]. One avenue for future work along these lines is to reconstruct images with customized features as prompts. Another direction is to exploit higher onboard compute resources [26] and uplink bandwidth [64], where available, to unlock even higher compression gains.

Conclusions

Efficient high-quality satellite image data acquisition is challenging due to large data volumes, downlink bottlenecks, and high cloud storage costs. Earth observation satellites are also limited in their compute capabilities. We introduce DeepSpace, a deep learning-based system that compresses satellite images by hundreds of times in real time onboard, while ensuring faithful reconstruction with fidelity guarantees. DeepSpace performs lightweight computation onboard the satellite and shifts the compute burden to the cloud, where images are decompressed on demand.
