For radio astronomy data engineers, SKA infrastructure teams, VLBI correlator architects, and anyone building distributed signal synthesis pipelines at exabyte scale.


The Number You Already Know

If you work in radio interferometry, you already know this number:

Baselines = N(N-1)/2


For the Event Horizon Telescope's 8 stations: 28 baselines. For SKA-Mid's 197 dishes: 19,306 baselines. For SKA-Low's 512 stations: 130,816 baselines.
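The arithmetic is worth making concrete; a minimal Python check of the pairing formula against the station counts above:

```python
def baselines(n: int) -> int:
    """Number of unique antenna pairs (baselines) for n stations."""
    return n * (n - 1) // 2

for name, n in [("EHT", 8), ("SKA-Mid", 197), ("SKA-Low", 512)]:
    print(f"{name}: {n} stations -> {baselines(n)} baselines")
# EHT: 8 stations -> 28 baselines
# SKA-Mid: 197 stations -> 19306 baselines
# SKA-Low: 512 stations -> 130816 baselines
```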

Every baseline is a pair of telescopes whose signals must be cross-correlated to synthesize an image. The computational cost of correlation scales as O(N²) in the number of antennas. This is the fundamental scaling law of interferometry — and it is identical to a scaling law discovered in a completely different domain.

On June 16, 2025, Christopher Thomas Trevethan discovered the Quadratic Intelligence Swarm (QIS) protocol while working on distributed health data routing. The core insight: when N nodes each generate validated outcomes, the number of unique pairwise synthesis opportunities is N(N-1)/2 — and there exists a complete architecture that activates those synthesis paths at O(log N) communication cost per node, not O(N²).

The math is the same. The architectural problem is structurally identical. The domains are different.


What Radio Astronomy Routes Today

VLBI works by recording radio signals independently at each telescope, timestamped by hydrogen maser atomic clocks to sub-nanosecond precision. The signals are then correlated — either by physically shipping hard drives to a central correlator facility, or via e-VLBI real-time streaming over dedicated fiber networks.

The DiFX software correlator at the Max Planck Institute for Radio Astronomy in Bonn (one of two primary EHT correlation facilities, alongside MIT Haystack Observatory) processes these signals in an FX architecture: Fourier-transform each station's signal first, then cross-multiply every pair in frequency space. A master node (FxManager) coordinates data management nodes (Datastream) that feed time-sliced baseband data to processing nodes (Core).

During the EHT's April 2017 observing campaign that yielded the M87* black hole image, each of 8 telescopes recorded at 64 Gbps, producing approximately 350 terabytes per day per station — 3.5 petabytes total across the campaign. The data was physically shipped on banks of helium-filled hard drives (half a metric ton of storage hardware). Data from the South Pole Telescope could not be retrieved until the Antarctic winter ended — an 8-month latency for one station's contribution.

The SKA operates at a different scale entirely. SKA-Low's 131,072 antennas across 512 stations in Western Australia generate approximately 8 Tbit/s into the Central Signal Processor. SKA-Mid's 197 dishes in South Africa generate approximately 20 Tbit/s. After digitization, the combined raw signal rate approaches 2 petabytes per second. The observatory will archive over 700 petabytes per year of processed science data products.

The SKA's architecture reflects the impossibility of centralizing this data:

Tier 1: On-site Central Signal Processors (CSP) — real-time correlation
Tier 2: Science Data Processors (SDP) — ~135 PFlops each, Perth + Cape Town
Tier 3: SKA Regional Centres (SRCNet) — federated global data centres


The SRCNet is explicitly federated: computation moves to where the data resides, not the reverse. Data flows between SRC nodes via dedicated 100 Gbit/s intercontinental links, managed by Rucio (inherited from CERN/WLCG). Nodes are being established across Australia, South Africa, the UK, Canada, India, and other member states.

This three-tier architecture — on-site processing, central supercomputer, federated regional centres — is structurally identical to a federated health data architecture of hospital-edge processing, national data processor, and federated international health data network.


The Architectural Parallel

| Dimension | Radio Astronomy (SKA/VLBI) | Distributed Health Data (QIS) |
| --- | --- | --- |
| Data generation | Distributed antennas generate signals independently | Distributed clinical sites generate patient outcomes independently |
| Pairwise synthesis | N(N-1)/2 baseline correlations between antenna pairs | N(N-1)/2 outcome synthesis paths between site pairs |
| Transport constraint | Physical shipping or dedicated 100 Gbit/s fiber | HIPAA/GDPR regulatory boundaries, network capacity |
| Central vs. distributed | SKA moved from central correlator to federated SRCNet | QIS: peer-to-peer synthesis, no central aggregator |
| Real-time decisions | SDP decides what data to keep vs. discard in real-time | Clinical decision support acts on streaming patient data |
| Raw data movement | Raw radio signals = white noise, incompressible, must be reduced before transport | Raw patient records = identifiable, regulated, must be distilled before routing |
| Governance | IGO with member states, data access policies per SRC | HIPAA/GDPR, institutional ethics boards, national health authorities |
| The fundamental problem | Synthesize a coherent picture from distributed, independent, heterogeneous signals | Synthesize coherent intelligence from distributed, independent, heterogeneous outcomes |

The deepest parallel is in the routing constraint. VLBI cannot transmit raw baseband data from every telescope to every other telescope — the bandwidth is physically impossible. Instead, it routes signals to a correlator that computes pairwise products. SKA's SRCNet goes further: it routes processed data products to federated centres where computation meets data.

QIS does the same thing at the outcome level. Raw patient data cannot leave each institution — regulation prohibits it. Instead, each node distills its validated outcome into a compact packet (~512 bytes) and routes it to peer nodes via semantic addressing. The routing cost is O(log N) or better per node. The intelligence scales as N(N-1)/2 pairwise synthesis paths. No raw data moves. Only distilled outcomes route.
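A minimal sketch of what a distilled packet and its deterministic semantic address could look like — the field names, hash choice, and packing layout here are illustrative assumptions, not taken from the QIS specification:

```python
import hashlib
import struct

def semantic_address(condition: str, intervention: str, cohort: str) -> str:
    """Deterministic address: any two nodes describing the same clinical
    context derive the same address with no coordination between them.
    (The field choice is illustrative, not from the QIS spec.)"""
    key = f"{condition}|{intervention}|{cohort}".lower()
    return hashlib.sha256(key.encode()).hexdigest()

def outcome_packet(effect: float, n: int, variance: float) -> bytes:
    """Fixed-size distilled outcome: summary statistics only, no raw records."""
    return struct.pack("<dId", effect, n, variance).ljust(512, b"\0")

addr = semantic_address("type-2-diabetes", "metformin", "adults-18-65")
pkt = outcome_packet(effect=-0.42, n=1280, variance=0.03)
assert len(pkt) == 512  # fits the ~512-byte packet budget
```

The key property is determinism: the address is derived from the semantics of the outcome, so matching peers find each other without any directory service.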


Where the Analogy Holds and Where It Breaks

Where it holds:

The N(N-1)/2 scaling is real in both domains. In interferometry, every pair of telescopes produces a unique baseline measurement that contributes to the synthesized image. In QIS, every pair of nodes that shares a semantic address can synthesize each other's validated outcomes. The combinatorial structure is identical.

The federated architecture is real in both domains. SKA's SRCNet and QIS both route processed results between distributed nodes, with computation happening locally. Neither moves raw data to a central facility.

The transport-agnosticism is real in both domains. SKA uses physical shipping, e-VLBI, dedicated fiber, and Rucio-managed transfers depending on the scenario. QIS is explicitly transport-agnostic — DHT routing at O(log N), database index at O(1), pub/sub, REST APIs, or any mechanism that can deposit and query packets at deterministic addresses.
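That transport-agnosticism can be captured as a small interface; the names below (`PacketTransport`, `deposit`, `query`) are hypothetical, chosen only to illustrate the contract any conforming transport must satisfy:

```python
from typing import Protocol

class PacketTransport(Protocol):
    """Any mechanism that can deposit and query packets at deterministic
    addresses conforms — DHT, database index, pub/sub, or REST API."""
    def deposit(self, address: str, packet: bytes) -> None: ...
    def query(self, address: str) -> list[bytes]: ...

class DictTransport:
    """An O(1) in-memory index — the simplest conforming transport."""
    def __init__(self) -> None:
        self._store: dict[str, list[bytes]] = {}

    def deposit(self, address: str, packet: bytes) -> None:
        self._store.setdefault(address, []).append(packet)

    def query(self, address: str) -> list[bytes]:
        return self._store.get(address, [])
```

Swapping `DictTransport` for a DHT client or a Rucio-backed store changes the cost model, not the protocol.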

Where it breaks:

Radio interferometry cross-correlates raw signals to produce visibility measurements — a physics operation that requires the actual waveform data from both stations. The correlation workload is genuinely O(N²) in compute, because every pair must be cross-multiplied, and every station's full baseband stream must be transported to the correlator.

QIS does not cross-correlate raw data. It routes pre-distilled outcome packets — fixed-size summaries that were computed locally. The "synthesis" in QIS is aggregation over validated statistics, not signal cross-correlation. This is why QIS achieves O(log N) communication per node where interferometry pays an O(N²) price at the correlator: the distillation step compresses each node's contribution to a fixed-size packet before routing.
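To see how small the routed traffic is, assume one 512-byte packet per node and roughly log₂(N) routing hops (a DHT-style assumption, not a figure from the QIS spec); for an SKA-Low-sized swarm of 512 nodes:

```python
import math

N = 512        # stations / nodes
PACKET = 512   # bytes per distilled outcome packet

# Distilled routing: each node deposits one packet via ~log2(N) hops
routed_bytes = N * PACKET * math.ceil(math.log2(N))
print(f"Distilled routing: {routed_bytes / 1e6:.1f} MB total")  # 2.4 MB

# Synthesis paths activated by those deposits
paths = N * (N - 1) // 2
print(f"Synthesis paths: {paths}")  # 130816
```

A few megabytes of routed packets activate all 130,816 pairwise synthesis paths — contrast that with the petabytes per second of raw signal that the correlator tier must ingest.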

The analogy is architectural, not operational. Both domains face the same structural challenge — N(N-1)/2 pairwise synthesis across distributed, independently operating nodes — but the mechanisms differ because the data types differ. Radio signals require waveform-level correlation. Clinical outcomes require statistical aggregation. Both produce quadratic synthesis from distributed inputs.


What QIS Offers Radio Astronomy

QIS was discovered in the context of health data routing, but the architecture is domain-agnostic. The protocol specification describes no healthcare-specific mechanisms — it describes a loop: distill locally, fingerprint semantically, route to matching addresses, synthesize locally, repeat.

For radio astronomy data infrastructure, this loop applies at the science product layer — above the signal processing tier, where correlated and calibrated results exist as structured data products:

  1. Distill: After a VLBI observation is correlated and calibrated, the validated result (source flux, position, spectral index, variability metric) is distilled into a compact outcome packet.
  2. Fingerprint: The packet is addressed using astronomical coordinates (RA/Dec), frequency band, observation epoch, and source identifier — a deterministic semantic address. Every observatory working on the same source at the same frequency produces the same fingerprint.
  3. Route: The packet is deposited at its semantic address in a shared address space (potentially integrated with SRCNet's Rucio-managed infrastructure). Any SRC node or research group working on the same source queries the address and retrieves validated results from peer observatories.
  4. Synthesize locally: Each node aggregates incoming results with its own observations. Multi-epoch variability studies, spectral energy distributions across bands, and transient follow-up coordination all benefit from continuous cross-observatory result routing.
  5. Loop: The synthesis produces a new, enriched result that can itself be routed — creating a continuously compounding intelligence network across the global observatory infrastructure.
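The fingerprinting step above can be sketched in code; the field set, grid size, and hash function are illustrative choices for this article, not taken from the QIS specification:

```python
import hashlib

def source_fingerprint(ra_deg: float, dec_deg: float, band: str,
                       epoch: str, source_id: str,
                       grid: float = 0.1) -> str:
    """Deterministic semantic address for a validated science product.
    Coordinates are snapped to a grid so nearby pointings at the same
    source resolve to the same address. (Field set and grid size are
    illustrative assumptions, not from the QIS specification.)"""
    ra = round(ra_deg / grid) * grid
    dec = round(dec_deg / grid) * grid
    key = f"{ra:.1f}|{dec:.1f}|{band}|{epoch}|{source_id}"
    return hashlib.sha256(key.encode()).hexdigest()

# Two observatories pointing at M87 in the same band and epoch derive
# the same address, with no coordination between them
a = source_fingerprint(187.7059, 12.3911, "230GHz", "2025A", "M87")
b = source_fingerprint(187.7042, 12.3908, "230GHz", "2025A", "M87")
assert a == b
```

Determinism is the point: any facility that later observes the same source at the same frequency computes the same address and retrieves every peer's validated result waiting there.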

This is not a replacement for the signal processing pipeline. It is a layer above it — routing the validated science products that the pipeline produces, so that the next observation at any facility starts with the accumulated intelligence from every previous observation of the same source.


The B3D Opportunity

The Big Bang to Big Data (B3D) cluster — led by MPIfR and comprising 8 institutions across North Rhine-Westphalia, including the Argelander-Institut für Astronomie at the University of Bonn — was established specifically to pool distributed expertise in radio astronomy and data science. Its mission: develop automated data processing using Big Data and AI methods for radio astronomical data streams.

B3D's architecture already distributes computation across institutional boundaries. QIS outcome routing is the protocol that connects the outputs of those distributed computations — so that a machine learning model trained at Forschungszentrum Jülich and a spectral analysis pipeline at Ruhr-Universität Bochum can route their validated results to each other through semantic addressing, without either institution centralizing its data.

The N(N-1)/2 math applies directly: 8 B3D institutions create 28 pairwise synthesis paths. Each path is an opportunity for one institution's validated result to enrich another institution's next analysis. Without outcome routing, each institution's results exist in publications and internal databases — discoverable by humans reading papers, but not by infrastructure routing signals.


The Scaling Law Is Domain-Agnostic

QIS was discovered in healthcare. Its first 200+ published articles focus on clinical data routing. But the architectural claim has never been domain-specific:

N nodes generating validated outcomes create N(N-1)/2 pairwise synthesis opportunities. A protocol that routes distilled outcome packets via semantic addressing activates those synthesis paths at O(log N) or better communication cost per node.

This applies wherever distributed nodes independently generate outcomes that could inform each other:

  • Hospitals generating treatment outcomes → health data routing
  • Telescopes generating observational results → astronomical data routing
  • Climate sensors generating measurements → environmental intelligence
  • Industrial facilities generating quality metrics → manufacturing intelligence

The math does not change. The distillation step adapts to the domain. The routing protocol is the same.


The Discovery

Christopher Thomas Trevethan discovered the Quadratic Intelligence Swarm protocol on June 16, 2025. The breakthrough is not any single component but the complete architecture — the loop that enables real-time quadratic intelligence scaling without compute explosion. 39 provisional patents filed. Humanitarian licensing ensures the protocol is free forever for nonprofits, research institutions, and educational use.

For distributed data architects: the QIS Protocol specification, the Yao communication complexity rebuttal, and the 20 most common technical questions are published.


This is part of an ongoing series on QIS — the Quadratic Intelligence Swarm protocol — documenting every domain where distributed outcome routing closes a synthesis gap that existing infrastructure cannot close.