
SatPipe: Deterministic TCP Adaptation for Highly Dynamic LEO Satellite Networks

Abstract

Low-Earth-Orbit (LEO) satellite networks (satnets) hold significant potential for providing global internet access with high throughput and low latency. However, the high mobility of the satellites and the associated handovers cause high dynamics at the link layer, which in turn degrade end-to-end network throughput and stability. In this work, we first present a measurement profiling of the handover behaviors of the Starlink satnet, so as to better understand its dynamics and accurately determine handover timings with millisecond precision. We then introduce SATPIPE, a new mechanism that enhances TCP and makes it robust against satnet dynamics. SATPIPE exposes the link-layer handover schedule to the TCP sender, which then adapts to link interruptions in a deterministic rather than trial-and-error manner. Our implementation and experiments over Starlink indicate that SATPIPE delivers an average throughput gain of 9.4% to 38%, achieves up to a 127.8% improvement in the 10th-percentile throughput, and reduces the retransmission ratio by 24.7%, compared to the state-of-the-art TCP BBR. This advantage further propagates to the application layer, yielding a 10.8% increase in bitrate and a 33.5% reduction in rebuffering time for video streaming applications.


Introduction

Low Earth Orbit satellite networks (satnets) are emerging as a transformative technology for internet connectivity, offering low-latency and high-throughput access at planet scale. Recent LEO mega-constellations such as Starlink [1], Kuiper [2] and OneWeb [3] are poised to enable last-mile network connectivity by integrating with existing terrestrial network infrastructures. Among these, Starlink has taken the lead with over 5000 operational satellites [4], achieving 100 Mbps to 200 Mbps throughput with latencies as low as 20 ms [1].


Unlike conventional geostationary (GEO) satellites, LEO satnet constellations orbit the Earth at high velocities to counteract gravitational forces. This intrinsic characteristic imposes an unprecedented challenge for LEO satnets, i.e., link dynamics. To ensure continuous connectivity to ground-based user terminals, the serving satellite needs to perform frequent handovers. Although each handover only lasts a few tens of ms, this short “freeze” of the satnet link often leads to a burst of packet losses and RTT escalation [5]–[8]. The performance degradation tends to be amplified at the higher layers. In particular, transport layer protocols (e.g., TCP CUBIC [9] and BBR [10]) rely on loss/delay estimation as indicators of network congestion. BBR reacts slowly since it relies on smoothed measurements across multiple RTTs; and when BBR actually responds, it tends to overreact, deeming the burst of delay/loss equivalent to severe network congestion. Consequently, TCP tends to underutilize the satnet capacity even in an ideal situation without any congestion. Although such handover-induced instability also occurs in other mobile networks such as cellular networks, the impact is less pronounced due to shorter RTTs and hence faster recovery times [11].


In this paper, we first conduct a cross-layer measurement study to unravel the impact of satnet handovers on TCP performance. Our measurement begins by examining physical layer behaviors, including signal strength patterns under satellite mobility, using Starlink as a reference system. Then, we investigate the link-level behaviors at a fine-grained millisecond timescale, from which we derive the precise handover starting/ending times, frequency and duration. Our measurement results corroborate the hypothesis in prior work [12] and Starlink’s FCC filing [13], which stipulate that a new satellite has to be scheduled for the user terminal around every 15 s. Such periodicity will likely prevail in other LEO satnets because the satellites are usually uniformly spread out within the constellation. Furthermore, our measurement reveals that each handover lasts around 50 ms, yet TCP’s response lags behind by a few hundred ms, and TCP’s performance degradation often lasts a few seconds before it restores the congestion window to near the true network capacity.


Based on the measurement observations, we propose a new mechanism called SATPIPE to enhance TCP performance over dynamic LEO satnets. SATPIPE follows two key design principles to tailor TCP congestion control: (i) Visibility: SATPIPE exposes the handover-induced link-level dynamics to the transport layer, rather than relying on trial-and-error adaptation based on end-to-end delay/loss statistics. (ii) Determinism: SATPIPE leverages knowledge of the periodic handover events to plan its reactions in a deterministic manner. Ideally, these two principles should together bring the TCP sending rate close to the “ground-truth” capacity of the satnet link. However, a practical implementation of SATPIPE raises two non-trivial questions.

(i) How does SATPIPE know the handover time? Our observations of practical satnets reveal that simply using short-term packet loss or RTT statistics cannot discriminate satnet handover from ordinary network capacity variations (e.g., due to congestion in the backbone network). We thus propose two solution mechanisms. The first employs the Network Time Protocol (NTP) to synchronize the TCP sender to the global handover schedule, which is known for a given satnet system through the “two-line element” [12]. This solution requires the network nodes to periodically probe the NTP server. Our second solution further evades this overhead by using a time-domain periodicity detection algorithm, which accumulates loss/RTT evidence across multiple handover periods in order to pinpoint the periodic timestamps when the network is “interrupted”.
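The second mechanism can be sketched as follows (a minimal illustration, not the authors' implementation; the folding period, bin width, and function name are our assumptions based on the 15 s handover cycle reported in the measurements):

```python
# Hypothetical sketch of time-domain periodicity detection: fold the
# timestamps of loss bursts / RTT spikes modulo the 15 s handover period
# and find the phase bin where the evidence concentrates.
PERIOD = 15.0   # handover period in seconds
BIN = 0.05      # 50 ms phase bins

def estimate_handover_phase(spike_times):
    """spike_times: timestamps (in seconds) of observed loss/RTT spikes."""
    counts = {}
    for t in spike_times:
        b = int((t % PERIOD) / BIN)
        counts[b] = counts.get(b, 0) + 1
    # the most populated bin marks the periodic "interruption" instant
    return max(counts, key=counts.get) * BIN

# Example: spikes clustered around phase 12.03 s of each period
spikes = [k * PERIOD + 12.03 for k in range(40)]
print(round(estimate_handover_phase(spikes), 2))
```

Because evidence is accumulated across many periods, isolated losses from backbone congestion fall into random bins and do not shift the detected phase.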

(ii) How should SATPIPE react to a handover event, so as to maximize utilization of the end-to-end network capacity while maintaining fairness? To match the congestion window to the available network capacity, SATPIPE leverages BBR [10], a state-of-the-art rate-based congestion control protocol that employs delivery rate and delay together to estimate the bandwidth-delay product. To overcome BBR’s slow reactions, SATPIPE extends its state machine with a queue maintenance state that explicitly flushes the data build-up caused by the short-term network freeze during handover, forcing the end-to-end network path to accurately estimate the RTT by isolating the interference from queuing delay. Unlike generic TCP protocols, SATPIPE responds explicitly and deterministically to the short-term capacity drops caused by handovers. Unlike the recently proposed link-layer aware TCP adaptation [12], which directly inhibits congestion control during handover, SATPIPE is essentially still a reactive protocol. It approximates the optimal congestion window through accelerated estimation rather than blind inhibition, thus maintaining both efficiency and fairness to competing flows.
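The deterministic reaction might look roughly like the following (a simplified sketch under stated assumptions, not the paper's code; the state names, `bdp_estimate` variable, and the drain factor are ours):

```python
# Simplified, hypothetical sketch of a deterministic handover reaction in a
# BBR-like sender. During the scheduled handover window the sender enters a
# queue-maintenance state and caps inflight data below the estimated BDP,
# so the queue built up while the link was frozen gets flushed and later
# RTT samples are not polluted by residual queuing delay.
HANDOVER_SECS = (12, 27, 42, 57)   # schedule observed for Starlink
D1, D2 = 0.015, 0.100              # conservative window bounds from the measurements

def in_handover_window(sec_past_minute):
    return any(s + D1 <= sec_past_minute <= s + D2 for s in HANDOVER_SECS)

def control_step(state, sec_past_minute, cwnd, bdp_estimate):
    """Return the next (state, cwnd) pair for one control interval."""
    if in_handover_window(sec_past_minute):
        # drain: cap the window well below the BDP to flush the backlog
        return "QUEUE_MAINTENANCE", min(cwnd, bdp_estimate // 2)
    if state == "QUEUE_MAINTENANCE":
        # handover over: jump back toward the pre-handover operating point
        return "PROBE_BW", bdp_estimate
    return state, cwnd
```

The key design point the sketch illustrates: instead of freezing for seconds, the sender drains only during the known millisecond-scale window and immediately restores the window afterwards, which preserves utilization and fairness.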




We have implemented SATPIPE as a lightweight driver patch to state-of-the-art TCP BBR. Furthermore, we have conducted comprehensive experiments covering diverse network conditions, including different path lengths, background traffic levels, etc. We found that SATPIPE delivers an average throughput gain of 9.4% to 38% compared with BBR. Zooming in on the 10th percentile, SATPIPE achieves a 127.8% throughput enhancement. We have also evaluated SATPIPE in a Dynamic Adaptive Streaming over HTTP (DASH) video streaming system, which demonstrates a 10.8% improvement in average bit-rate and a 33.5% reduction in rebuffering time.


In summary, our work makes the following contributions:

• We conduct fine-grained measurements to characterize link handover and outage timing of real-world LEO satnets.

• We design SATPIPE, the first TCP adaptation algorithm that reacts to handover interruptions in an explicit and deterministic manner.

• We implement SATPIPE as a patch to the standard TCP module and validate its performance in the Starlink satnet.


Starlink PHY layer signal identification and localization.

Recent measurement studies have investigated the physical layer signal characteristics of Starlink. Researchers [14], [15] have developed blind signal identification techniques to understand detailed signal structure and to simulate received signals for positioning applications. Based on the signal structure and prior knowledge, the beamforming strategies of Starlink satellites were studied in [16]. A Ku-band dish antenna was used to capture the Starlink signals, combined with a low-noise block (LNB) for converting the signals to lower frequencies and a USRP software radio for baseband processing. A similar setup has been used to repurpose Starlink downlink tones for opportunistic positioning [17], [18].

Starlink upper layer characterization and evaluation.

Existing research has shown that Starlink can achieve much lower latency and higher throughput compared to conventional satellite communications using GEO satellites [5]. However, significant packet losses occur during satellite handovers, as highlighted in [6]. These packet losses lead to substantial throughput reduction at the transport layer. To gain insights into the timing details of the Starlink system, multi-timescale measurements were conducted at the network layer [19]. These measurements identify a coarse-grained handover period of 15 s at medium timescales and observe decreased link utilization during handovers. The scheduling mechanism of Starlink satellites is comprehensively studied in [7], which confirms the allocation period and reveals that local on-satellite controllers manage flow scheduling from user terminals. In this paper, we use inter-packet delay (IPD) to estimate the exact time of handovers, rather than as a metric for network stability.


TCP adaptation in dynamic networks.

Cross-layer congestion control has been widely explored in cellular networks to address mobility and resource allocation problems. In particular, leveraging physical and link layer statistics to inform TCP adaptation shows promise in dynamic cellular networks. For instance, CLAW [11] and piStream [20] use signal strength information to predict the available bandwidth of the cellular last-hop link. This prediction is in turn used to drive precise rate adaptation at the transport layer or application layer. SATCP [12] is a link-layer informed TCP adaptation algorithm for LEO satnets, which shares a similar spirit with SATPIPE, and was evaluated using the LeoEM emulator. SATCP assumes the ground station can provide explicit handover timestamps to the user terminal to assist end-to-end TCP rate adaptation. Due to the lack of information about the handover duration, SATCP conservatively freezes the congestion window (CWND) for 2.3 seconds, which results in suboptimal use of available network capacity and potential fairness issues. In contrast, SATPIPE is an end-to-end mechanism running entirely on the TCP hosts. Our experiments within the operational Starlink satnet verify its superior performance over SATCP.


In this section, we present comprehensive measurements of Starlink handover behaviors at both the physical layer and network layer, uncovering the fundamental mechanisms of the Starlink global controller.

A. PHY Layer Measurements

The Starlink downlink signal features 8 channels, each occupying a bandwidth of 240 MHz within the Ku-band (10.7 GHz to 12.7 GHz) [14], [21], with a 10 MHz guard band between adjacent channels. Such a large bandwidth exceeds the capabilities of most commercial software-defined radios (SDRs). However, in the middle of each active channel there are multiple tones occupying several MHz of bandwidth, which can be treated as indicators of the Starlink downlink signals.

We have established a testbed setup to capture these downlink tones following recent work [18], as illustrated in Fig. 1. Using a universal Ku-band LNB [22], we capture and downconvert the Starlink signals from the Ku-band to lower frequencies. While a parabolic dish could enhance received signal strength, we opted for the LNB-only approach to widen the field of view, facilitating signal reception from multiple satellites. The LNB connects to a USRP N310 [23] for baseband signal processing. The N310 samples at 2.5 Msps and is centered at 11.325 GHz – the carrier frequency of one of the active Starlink downlink channels. To ensure the downlink transmission is active, we continuously download high-definition videos through the Starlink User Terminal (UT), which is co-located with the LNB during measurements.
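As a back-of-envelope check on the chosen carrier (our own reading of the channel layout, under the assumption that channels are packed from the 10.7 GHz band edge), the 240 MHz channels plus 10 MHz guard bands imply a 250 MHz grid:

```python
# Assumed channel grid: 240 MHz channels + 10 MHz guard bands = 250 MHz
# spacing, packed from the lower Ku-band edge at 10.7 GHz.
def channel_center_ghz(i):
    """Center frequency (GHz) of the i-th (1-based) downlink channel."""
    return 10.7 + 0.25 * (i - 0.5)

print([channel_center_ghz(i) for i in range(1, 9)])
# under this assumption, channel 3 falls at 11.325 GHz,
# the carrier frequency monitored by the N310 in our setup
```

This is only a consistency check on the numbers in the text, not an authoritative frequency plan.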

Fig. 2 shows the frequency map of different satellites, including broadcast GEO satellites and Starlink LEO satellites. We can observe two different Starlink satellites based on Doppler shifts. Satellite 1 exhibited the strongest signal strength for 15 seconds before experiencing a sharp drop, indicating a handover event. Subsequently, Satellite 1 switched to a different beam (beam 2), possibly covering a different region, so the leaked signal strength captured by our receiver became much weaker. Meanwhile, Satellite 2 showed the strongest signal strength. After 15 seconds, Satellite 2 also switched beams, but no further handovers were detected. The reason for this could be that the satellite used an alternate active channel for data transmission, while our signal detection system is set up to monitor only a single channel. We conducted multiple independent experiments and the measurement results consistently support our findings and speculations. Based on the physical layer observations, we can conclude that each satellite indeed serves a target region for approximately 15 seconds before it hands over to the next.


B. Fine-grained Handover Profiling Using Network Statistics

For a more fine-grained understanding of Starlink handovers, we measure packet-level RTT and TCP throughput variations. We generate TCP traffic from an AWS EC2 server using iPerf3 3.14, and receive the data using a Starlink UT. The UT is located in California, whereas the server is in Ohio by default. A 1 Gbps Ethernet cable connects the UT with a PC that acts as the TCP client.


We repeat several iperf3 measurements, each running a single TCP connection using CUBIC. We then calculate the throughput within each 100-millisecond window. The results in Fig. 3 reveal significant TCP throughput drops around every 15 seconds. Fig. 4 further shows the periodic pattern of RTT, which is likely caused by routing path changes upon handover from one satellite to the next. We also notice that while RTT variance can indicate periodic handovers, there are times when changes in RTT are not significant, which makes it difficult to pinpoint the exact timing of the handover directly from RTT.


Meanwhile, after analyzing the absolute timing of each received packet, we found the RTT change tends to occur deterministically, at the 12th, 27th, 42nd, and 57th second past every minute. This observation matches recent measurement studies [7].


To obtain a precise estimation of the handover duration, we establish multiple independent TCP connections from the client to the AWS EC2 server and use tcpdump [24] to capture the timestamps of the received packets. We record the timestamps of two consecutively received packets and obtain the corresponding IPD. The IPD should be a random variable with a negligible mean value when no satellite handover occurs between the two consecutive packets, and should have a relatively large mean value otherwise.


We perform a 10-minute iperf3 measurement and analyze received packets whose timestamps fall within the intervals [12s-∆, 12s+∆], [27s-∆, 27s+∆], [42s-∆, 42s+∆], [57s-∆, 57s+∆] past every minute. The value of ∆ is set to 100 ms, used as an estimated upper bound for the handover duration. Having a coarse-grained estimate of when the handover happens decreases the amount of data we need to process, and mitigates the interference from backbone traffic. These data points constitute the experimental group, from which the IPD values are extracted. In order to more accurately depict the variance in IPD distributions in different time slots, we use the intervals [10s-∆, 10s+∆], [25s-∆, 25s+∆], [40s-∆, 40s+∆], [55s-∆, 55s+∆] as a control group, which deviates from the ground-truth handover timing (observed in the physical layer measurement) by around 2 seconds. Since handovers occur about every 15 seconds, we expect at most 40 handovers in a 10-minute test. To minimize possible IPD outliers due to backbone network congestion, we take the largest 30 IPDs out of all the data points within the experimental and control groups, respectively. We then remap the timestamps to the [0s, 15s] period and plot the IPD values in Fig. 6. Each line segment represents an IPD, with IPDs from the experimental group serving as unbiased estimations of the handover duration. The time offset of the experimental group is relative to 12s past every minute, while that of the control group is relative to 10s.
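The extraction step described above can be sketched as follows (illustrative only; the tcpdump parsing is omitted and the function name is ours, while the interval constants follow the text):

```python
# Sketch of the IPD extraction: keep inter-packet delays whose later packet
# arrives near a scheduled handover second, then take the largest ones as
# handover-duration estimates.
DELTA = 0.100                  # +/- 100 ms around each scheduled second
SCHED = (12, 27, 42, 57)       # seconds past every minute

def top_handover_ipds(arrivals, n=30):
    """arrivals: sorted packet receive timestamps in seconds (from tcpdump)."""
    ipds = []
    for prev, cur in zip(arrivals, arrivals[1:]):
        sec = cur % 60
        if any(s - DELTA <= sec <= s + DELTA for s in SCHED):
            ipds.append(cur - prev)
    return sorted(ipds, reverse=True)[:n]

# Example: steady 1 ms arrivals with a ~50 ms outage spanning second 12
arrivals = [11.95 + 0.001 * k for k in range(50)] + \
           [12.05 + 0.001 * k for k in range(50)]
print(top_handover_ipds(arrivals, n=1))
```

With real traces, the same filtering applied to the control-group intervals yields only small, randomly placed IPDs, which is what makes the contrast in Fig. 6 meaningful.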


Unlike the randomly distributed IPDs in the control group, the top-30 IPDs in the experimental group are densely concentrated within the same interval, around [-30, 60] ms. These periodic large IPDs are caused by the link outage during the handover. They can be considered unbiased estimates of the handover outage, allowing us to deduce that the handover duration is almost always less than 100 ms.


However, while the duration of the link outage remains relatively stable, the starting and ending times vary across measurements. We conduct these measurements from different servers to different clients. The CDF for the starting and ending times of each measurement is depicted in Fig. 7. Clearly, the relative starting and ending times differ across various measurements, where the timestamps are generated by different local clocks. Therefore, time synchronization plays a crucial role in reducing the estimation error of the handovers for different servers and clients.


To verify the feasibility of truly deterministic estimation of the handover timing, we synchronize the client to a global stable clock through Network Time Protocol (NTP) [25]. Fig. 8 illustrates the CDF for both starting and ending times, which are highly consistent across different measurement tests.

In summary, we can conclude that the handover process in Starlink follows a periodic and deterministic pattern. All handover events are observed to take place within specific time intervals: [12s + ∆1, 12s + ∆2], [27s + ∆1, 27s + ∆2], [42s + ∆1, 42s + ∆2], [57s + ∆1, 57s + ∆2]. Using a conservative setting of ∆1 = 15 ms and ∆2 = 100 ms, we observe no outliers.
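For concreteness, an NTP-synchronized sender could derive its next scheduled outage window like this (a hypothetical helper we wrote from the schedule and the conservative ∆ values above; it is not code from the paper):

```python
import math

# Schedule: handovers at 12/27/42/57 s past every minute; the conservative
# window [s + D1, s + D2] bounds the observed outage.
D1, D2 = 0.015, 0.100
SCHED = (12, 27, 42, 57)

def next_handover_window(now):
    """now: NTP-synchronized absolute time in seconds; returns (start, end)."""
    minute = math.floor(now / 60) * 60
    for base in (minute, minute + 60):       # this minute, then the next
        for s in SCHED:
            start, end = base + s + D1, base + s + D2
            if end >= now:                   # first window not yet over
                return (start, end)

print(next_handover_window(130.0))
```

Because the schedule repeats every 15 s within the minute, the lookahead never has to scan more than two minutes of candidates.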


TL;DR

Taking this opportunity to study Google BBR systematically (Princeton Seminar, 2024)


But I digress. Back to summarizing the paper's highlights and core mechanism:

SatPipe is not a brand-new transport protocol, but a lightweight kernel patch to an existing TCP protocol (specifically BBR)

The core change is a new phase called "Queue Maintenance" added to TCP BBR's original state machine

(1) Core mechanism:

I didn't fully follow this part; congestion control plus buffer management is uniquely headache-inducing...

(2) Contributions:

  1. Unlike the prior state-of-the-art SATCP, which relies on ground stations to report explicit handover timestamps, SATPIPE is an end-to-end mechanism running entirely on the two TCP hosts
    1. It senses handovers autonomously
    2. It removes the dependence on signaling from the underlying network infrastructure
  2. Whereas SATCP conservatively freezes the congestion window for up to 2.3 seconds at a handover, SATPIPE drops the CWND to zero only within the millisecond-scale handover window