
On the resilience of cellular networks: how can national roaming help?

Abstract

Cellular networks have become a critical infrastructure, as many services depend increasingly on wireless connectivity. Therefore, it is important to quantify the resilience of existing cellular network infrastructures against potential risks, ranging from natural disasters to security attacks, that might occur with a low probability but can lead to severe disruption of services. In this paper, we combine models with public data from national bodies on mobile network operator (MNO) infrastructures, population distribution, and urbanity level to assess the coverage and capacity of a cellular network at a country scale. Our analysis offers insights into the potential weak points that need improvement to ensure a low fraction of disconnected population (FDP) and a high fraction of satisfied population (FSP). As a resilience improvement approach, we investigate in which regions and to what extent each MNO can benefit from infrastructure sharing or national roaming, i.e., all MNOs acting as a single national operator. As our case study, we focus on the Dutch cellular infrastructure and model risks as random failures and correlated failures in a geographic region. Our analysis shows that there is a wide performance difference across MNOs and geographic regions in terms of FDP and FSP. However, national roaming consistently offers significant benefits in some regions, e.g., up to 13% improvement in FDP and up to 55% in FSP when the networks function without any failures. We then show that a similar performance improvement can be obtained by a partial implementation of national roaming.


Tip
  • FDP: fraction of disconnected population
  • FSP: fraction of satisfied population
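
As a rough illustration of the two metrics, the sketch below computes FDP and FSP for a toy set of users. The text above only names the metrics, so the definitions used here are assumptions: a user counts as disconnected when no base station serves them, and as satisfied when their achieved rate meets a minimum required rate. The names `User`, `fdp`, and `fsp`, and all numbers, are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class User:
    # Achieved data rate in Mbps; None means no base station serves the user.
    rate_mbps: Optional[float]
    # Minimum rate the user needs to count as satisfied (assumed requirement).
    required_mbps: float


def fdp(users: Sequence[User]) -> float:
    """Fraction of disconnected population: share of users with no connection."""
    return sum(u.rate_mbps is None for u in users) / len(users)


def fsp(users: Sequence[User]) -> float:
    """Fraction of satisfied population: share of users whose rate meets the requirement."""
    return sum(
        u.rate_mbps is not None and u.rate_mbps >= u.required_mbps for u in users
    ) / len(users)


# Hypothetical population of four users.
users = [
    User(rate_mbps=20.0, required_mbps=8.0),  # connected and satisfied
    User(rate_mbps=3.0, required_mbps=8.0),   # connected but not satisfied
    User(rate_mbps=None, required_mbps=8.0),  # disconnected
    User(rate_mbps=8.0, required_mbps=8.0),   # exactly meets the requirement
]
print(f"FDP = {fdp(users):.2f}, FSP = {fsp(users):.2f}")
```

Under these assumed definitions a lower FDP and a higher FSP are better, and a disconnected user is never counted as satisfied.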

Introduction

Cellular networks play a key role in today’s communications, as many services depend on the proper functioning of these infrastructures. However, they can be vulnerable to failures resulting from various sources such as large-scale natural disasters including earthquakes [45] and wildfires [20], cyberattacks on the network infrastructure [10], [47], or regional power shortages [14]. These events either affect certain regions, such as earthquake areas [45], or are randomly spread (e.g., hardware-related failures). Indeed, the functioning of cellular networks becomes even more important during such failure events, e.g., for rescue and recovery in the aftermath of disasters. The key question then is: what coverage and capacity can a mobile network operator (MNO) provide, given that some links or network nodes do not function?


While the resilience literature is broad in other areas of critical infrastructures, to the best of our knowledge, there are only a few studies on quantifying a cellular network’s resilience at a national scale, such as [34] and [61]. The former defines resilience as “the maximum number of sites that can fail before the performance metric of interest falls below a minimum acceptable threshold”, and the latter uses the number of served users as the resilience metric. Since both ensuring coverage and satisfying quality of service (QoS) are important, we use the fraction of disconnected population (FDP) and the fraction of satisfied population (FSP), considering data services, to quantify the resilience of an MNO. Combining cell tower data with population density statistics as well as urbanity levels in the Netherlands, we investigate the current state of the Dutch MNOs and then study their resilience to (i) random failures, which could occur due to human errors, and (ii) failures confined to a certain geographical region, occurring due to disasters. The insights from our analysis can help to improve the MNO infrastructures to absorb crises or to recover quickly from their effects.
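
The two failure scenarios named above can be sketched as follows. This is a toy model under stated assumptions, not the paper’s simulation: base stations are points on a plane, a random failure removes each station independently with probability `p`, and a correlated failure removes every station within `radius_km` of a hypothetical disaster center. All function names and numbers are illustrative.

```python
import math
import random
from typing import List, Tuple

Point = Tuple[float, float]  # (x_km, y_km)


def random_failures(stations: List[Point], p: float, rng: random.Random) -> List[Point]:
    """Return the survivors when each station fails independently with probability p."""
    return [s for s in stations if rng.random() >= p]


def regional_failure(stations: List[Point], center: Point, radius_km: float) -> List[Point]:
    """Return the survivors when every station within radius_km of center fails."""
    return [s for s in stations if math.dist(s, center) > radius_km]


# Hypothetical deployment: a 5 x 5 grid of base stations with 1 km spacing.
stations = [(float(x), float(y)) for x in range(5) for y in range(5)]

rng = random.Random(0)  # fixed seed so the random scenario is repeatable
after_random = random_failures(stations, p=0.2, rng=rng)
after_regional = regional_failure(stations, center=(2.0, 2.0), radius_km=1.0)

print(f"{len(after_random)} of {len(stations)} stations survive random failures")
print(f"{len(after_regional)} of {len(stations)} stations survive the regional failure")
```

In the regional case the failed stations are all neighbors, so users in that area lose every nearby alternative at once, whereas random failures leave surviving stations scattered among the failed ones.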


While there are various ways to improve the resilience of a cellular network, one approach is national roaming, which allows MNOs to use each other’s infrastructure when needed, e.g., when an MNO’s own infrastructure is not functional or does not suffice to provide the required service levels. For increasing data coverage, national roaming has already been widely adopted in countries such as India [12], where MNOs often operate only in certain parts of the country or where infrastructure is not as ubiquitous. In 2022, the Federal Communications Commission (FCC) in the US introduced roaming under disasters (RuD), based on bilateral agreements between MNOs, which requires MNOs to serve each other’s users in case of service disruptions due to infrastructure damage during disasters and emergencies [29]. Moreover, in 2022, after national infrastructures were damaged considerably due to the Russian invasion, MNOs in Ukraine implemented national roaming [15]. Prior studies such as [27] advocate national roaming for more resilient cellular infrastructures, and studies such as [52] investigate different modes of MNO network sharing. However, to the best of our knowledge, there is no study quantifying the gains in coverage, capacity, and resilience facilitated by national roaming. Toward filling this gap, we quantify in this paper the gains when MNOs work together to serve all citizens as a single national operator. Furthermore, to give MNOs and national telecommunication regulatory bodies some insight into the areas and technologies for which they could prioritize such roaming agreements, we analyze more restricted implementations of roaming, such as roaming only in urban areas or only in rural areas, or only for a certain technology such as 4G or 5G.
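
To make the idea of national roaming concrete, the toy sketch below compares the share of disconnected users with and without roaming. It is an illustration only and assumes a one-dimensional deployment, a fixed coverage radius, and coverage as the sole criterion (capacity is ignored). The operator names, positions, and the helper `disconnected_fraction` are hypothetical.

```python
from typing import Dict, List, Tuple

COVERAGE_KM = 2.0  # assumed coverage radius of every base station

# Hypothetical base station positions (km along a line) per operator.
stations: Dict[str, List[float]] = {"A": [0.0, 4.0], "B": [8.0]}

# Hypothetical users: (position in km, home operator).
users: List[Tuple[float, str]] = [
    (1.0, "A"),
    (7.0, "A"),
    (9.0, "B"),
    (3.0, "B"),
    (12.0, "B"),
]


def covered(position: float, sites: List[float]) -> bool:
    """A user is covered if at least one site lies within the coverage radius."""
    return any(abs(position - site) <= COVERAGE_KM for site in sites)


def disconnected_fraction(roaming: bool) -> float:
    """Share of users with no coverage, with or without national roaming."""
    all_sites = [site for sites in stations.values() for site in sites]
    lost = 0
    for position, home in users:
        # With roaming, a user may attach to any operator's base station.
        usable = all_sites if roaming else stations[home]
        if not covered(position, usable):
            lost += 1
    return lost / len(users)


print(f"Disconnected without roaming: {disconnected_fraction(False):.1f}")
print(f"Disconnected with roaming:    {disconnected_fraction(True):.1f}")
```

In this toy setup roaming helps the users who sit near another operator’s station, while the user at 12 km stays disconnected because no operator covers that location.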


To summarize, our goal in this paper is to address the following questions:

• What are the current coverage and capacity of MNOs in the Netherlands? Are there regional differences (e.g., across cities) or differences among the MNOs?

• What would be the performance gain if all MNOs act as a single national operator and in which regions will this provide the highest gains?

• How resilient are MNOs to various types of failures and how much can national roaming help to ensure resilience of the MNOs against failures?

• Which technologies (3G, 4G, 5G) or which areas (urban vs. rural) should MNOs consider first when implementing such roaming agreements to ensure the highest benefits in terms of coverage and user satisfaction for data services?

In a nutshell, our contributions are as follows:

• We provide an approach to assess the resilience of a cellular network using public data and models on coverage and capacity. To reflect the effect of disruptions on the citizens, we use FDP and FSP as our metrics.

• Using publicly available data from national bodies in the Netherlands, we assess the current state of the Dutch cellular networks at both the province and municipality levels.

• We show that national roaming leads to significant benefits in some regions for both FSP and FDP, while there is a large performance difference across MNOs and geographic regions. MNOs could consider the areas with high performance gains from national roaming first when entering such roaming agreements. Moreover, we provide an analysis of roaming implemented only for certain cellular technologies, e.g., 3G and 5G, and in urban vs. rural areas, in case MNOs prefer a more limited form of national roaming over sharing all of their network infrastructure.

• We model two risk scenarios to investigate the resilience of the Dutch MNOs with and without national roaming. Our analysis suggests that MNOs are resilient against isolated failures owing to high base station (BS) density. That is, FDP remains roughly the same. In contrast, FSP decreases due to the increased number of users served by the surviving BSs. Meanwhile, the impact of correlated failures is more drastic, because BSs in the same region become dysfunctional simultaneously. To ensure resilience in such cases, alternative approaches are paramount, e.g., aerial connectivity or cells on wheels.

The rest of the paper is organized as follows. First, Section II provides an overview of the related work on resilience metrics and analysis of communication networks and national roaming. Then, Section III introduces the considered system model, followed by Section IV, which presents the definition of the metrics used in our analysis. Next, Section V presents a case study of Dutch cellular networks and publicly available data, i.e., on MNOs, population statistics, and urbanity levels of different areas. Section VII also considers the failures and how they will affect the MNOs. Finally, Section VIII discusses the limitations of our work and Section IX draws conclusions.
