Hackers Stole 10 Petabytes from China’s Supercomputer – Here’s How to Stop It From Happening to You

Introduction

In what could be one of the largest data thefts ever recorded in China, a hacker using the alias “FlamingChina” claimed responsibility for stealing more than 10 petabytes of sensitive information from the National Supercomputing Center in Tianjin (NSCC). The breach reportedly went undetected for about six months, and the stolen data allegedly includes classified defense documents, missile schematics, aerospace engineering research, bioinformatics data, and fusion simulation records. Because the facility serves more than 6,000 clients across China, including advanced scientific institutions and defense-related organizations, the incident is a critical case study in large-scale data exfiltration and in high-performance computing (HPC) security failures.

Learning Objectives

  • Understand the attack vector and techniques reportedly used to exfiltrate 10+ petabytes of data over six months without detection
  • Implement detection mechanisms for “low and slow” data exfiltration patterns in HPC and cloud environments
  • Apply NIST HPC security frameworks, Munge vulnerability mitigations, and zero-trust principles to protect large-scale computing infrastructure

You Should Know

  1. How the 10-Petabyte Heist Was Executed – The “Low and Slow” Method

According to cybersecurity researchers who communicated with the alleged attacker, the breach did not rely on sophisticated zero-day exploits. Instead, the hacker reportedly gained initial access through a compromised VPN domain, a surprisingly simple entry point that bypassed perimeter defenses. Once inside, the attacker deployed a botnet of automated programs to extract data systematically over roughly six months.

The key evasion technique was distributing extraction across multiple servers and pulling small amounts of data simultaneously. This “low and slow” approach prevented traditional traffic volume alerts from triggering, as individual data transfers remained below detection thresholds. “Somebody on the defensive side is less likely to notice small amounts of data leaving the system,” explained Dakota Cary, a consultant at SentinelOne.
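The arithmetic behind this evasion is easy to see in a small simulation. The Python sketch below is a minimal illustration, not a reconstruction of the attacker's tooling: the flow records, the 50 MB per-transfer alert, and the 1 GB per-destination budget are hypothetical values. It contrasts a static per-flow rule with a tally that sums traffic per destination across all sources.

```python
from collections import defaultdict
from typing import Iterable, NamedTuple


class Flow(NamedTuple):
    src: str        # internal host sending the data
    dst: str        # external destination
    bytes_out: int  # bytes sent in this single transfer


PER_FLOW_LIMIT = 50_000_000        # hypothetical static alert: 50 MB per transfer
CUMULATIVE_LIMIT = 1_000_000_000   # hypothetical budget: 1 GB per destination per window


def per_flow_alerts(flows: Iterable[Flow]) -> list[Flow]:
    """Classic volume rule: alert only when a single transfer is large."""
    return [f for f in flows if f.bytes_out > PER_FLOW_LIMIT]


def cumulative_alerts(flows: Iterable[Flow]) -> dict[str, int]:
    """Sum bytes per destination across all sources and transfers in the window."""
    totals: dict[str, int] = defaultdict(int)
    for f in flows:
        totals[f.dst] += f.bytes_out
    return {dst: total for dst, total in totals.items() if total > CUMULATIVE_LIMIT}


# Simulated "low and slow" pattern: 20 servers each send 100 transfers of 10 MB
# to the same external host, so no single transfer crosses the per-flow limit.
flows = [Flow(f"10.0.0.{i}", "203.0.113.7", 10_000_000) for i in range(20) for _ in range(100)]

print("per-flow alerts:", len(per_flow_alerts(flows)))  # 0
print("cumulative alerts:", cumulative_alerts(flows))   # {'203.0.113.7': 20000000000}
```

With these numbers the per-flow rule stays silent while the cumulative tally flags 20 GB sent to one address. In practice the window, the grouping key (destination IP, ASN, or domain), and the budget would be tuned to the environment's own baseline.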

How to detect and prevent “low and slow” exfiltration:

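Linux commands to monitor unusual outbound data patterns:

# Monitor per-process network traffic (refreshing every 5 seconds) to spot processes that keep sending data
sudo nethogs -d 5

# Sum TCP/UDP payload bytes per destination IP over a 10,000-packet sample (rerun periodically to build a picture over time)
sudo tcpdump -i eth0 -nn -q -c 10000 'ip and (tcp or udp)' | awk '{dst=$5; sub(/:$/, "", dst); sub(/\.[0-9]+$/, "", dst); bytes[dst] += $NF} END {for (d in bytes) print bytes[d], d}' | sort -nr | head -20

# Use auditd to log reads of a sensitive data directory (example path; adjust it, and expect high volume on busy paths)
sudo auditctl -w /data/sensitive -p r -k exfil_detection
sudo ausearch -k exfil_detection --format raw | aureport -f -i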

Windows PowerShell commands to map connections to processes and snapshot adapter traffic:

# List established TCP connections with their owning process (shows who talks to whom, not byte counts)
Get-NetTCPConnection -State Established | ForEach-Object {
    $proc = Get-Process -Id $_.OwningProcess -ErrorAction SilentlyContinue
    [PSCustomObject]@{
        Process       = $proc.Name
        LocalPort     = $_.LocalPort
        RemoteAddress = $_.RemoteAddress
        RemotePort    = $_.RemotePort
    }
}

# Snapshot cumulative bytes sent/received per network adapter; compare snapshots taken over time
Get-NetAdapterStatistics | Select-Object Name, ReceivedBytes, SentBytes

SIEM detection rule example (Splunk query that flags source/destination pairs staying under 1 MB per hour yet recurring for at least a day):

index=network_traffic sourcetype=firewall
| bin _time span=1h
| stats sum(bytes_out) AS hourly_out, count AS connections BY _time, src_ip, dest_ip
| where hourly_out < 1000000 AND connections > 100
| stats count AS active_hours, sum(hourly_out) AS cumulative_out, sum(connections) AS total_connections BY src_ip, dest_ip
| where active_hours >= 24
| sort - cumulative_out

The attacker’s claim of accessing the system with “comparative ease” underscores a critical lesson: VPN compromise remains one of the most under-defended attack surfaces in enterprise and government networks. Organizations must implement multi-factor authentication (MFA) for all VPN access, conduct regular credential rotation, and deploy continuous session monitoring.
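Continuous session monitoring can start with simple per-user baselines. The Python sketch below is a hypothetical example that assumes VPN logs have already been parsed into user, source IP, and start/end times (real VPN concentrators export different formats). It flags sessions from a source network the user has never used before and sessions far longer than the user's longest historical one; the /24 grouping and the 3x factor are illustrative assumptions.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from ipaddress import ip_network


@dataclass
class VpnSession:
    user: str
    source_ip: str      # IPv4 address the session came from (IPv6 not handled here)
    start: datetime
    end: datetime


def source_prefix(ip: str) -> str:
    """Collapse an IPv4 address to its /24 so ordinary address churn is not flagged."""
    return str(ip_network(f"{ip}/24", strict=False))


def review_sessions(history: list[VpnSession], new: list[VpnSession],
                    duration_factor: float = 3.0) -> list[tuple[str, str]]:
    """Return (user, reason) pairs for new sessions that deviate from each user's history."""
    seen: dict[str, set[str]] = {}
    longest: dict[str, timedelta] = {}
    for s in history:
        seen.setdefault(s.user, set()).add(source_prefix(s.source_ip))
        longest[s.user] = max(longest.get(s.user, timedelta(0)), s.end - s.start)

    findings: list[tuple[str, str]] = []
    for s in new:
        prefix = source_prefix(s.source_ip)
        if prefix not in seen.get(s.user, set()):
            findings.append((s.user, f"new source network {prefix}"))
        if s.user in longest and (s.end - s.start) > duration_factor * longest[s.user]:
            findings.append((s.user, "session far longer than historical maximum"))
    return findings


t0 = datetime(2025, 1, 6, 9, 0)
history = [VpnSession("alice", "198.51.100.20", t0, t0 + timedelta(hours=8))]
new = [
    VpnSession("alice", "198.51.100.45", t0 + timedelta(days=1), t0 + timedelta(days=1, hours=7)),
    VpnSession("alice", "203.0.113.9", t0 + timedelta(days=2), t0 + timedelta(days=4)),
]
for user, reason in review_sessions(history, new):
    print(user, "->", reason)
```

Running it prints two findings for the second session (a new network, 203.0.113.0/24, and an excessive duration) and none for the first, which comes from the user's usual /24. Production monitoring would add further signals such as geolocation and time of day.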

  2. The Scale of the Breach – Understanding 10 Petabytes in Context

To grasp the magnitude: one petabyte equals 1,000 terabytes, while a typical high-performance laptop holds about one terabyte. The allegedly stolen 10 petabytes therefore equal roughly 10 million gigabytes, the storage of about 10,000 such laptops, or enough to fill more than 2 million single-layer (4.7 GB) DVDs.
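The same arithmetic can be checked in code and extended to the bandwidth such a theft implies. This Python sketch assumes decimal units, a 4.7 GB single-layer DVD, six months approximated as 182.5 days, and extraction spread evenly over that window; the split across 100 servers is a hypothetical figure, since the source does not say how many machines were used.

```python
import math

# Assumptions: decimal (SI) units, a 4.7 GB single-layer DVD, six months taken
# as 182.5 days, and extraction spread evenly over the whole window.
PETABYTE = 10**15                 # bytes
GIGABYTE = 10**9                  # bytes
DVD_BYTES = 4.7 * GIGABYTE        # single-layer DVD capacity
SIX_MONTHS_S = 182.5 * 24 * 3600  # seconds


def breach_scale(total_bytes: float, window_s: float, servers: int) -> dict[str, float]:
    """Express an exfiltration volume in familiar units and the bandwidth it implies."""
    aggregate_bps = total_bytes * 8 / window_s  # bits per second across all servers
    return {
        "gigabytes": total_bytes / GIGABYTE,
        "dvds": math.ceil(total_bytes / DVD_BYTES),
        "aggregate_gbit_s": aggregate_bps / 1e9,
        "per_server_mbit_s": aggregate_bps / servers / 1e6,
    }


# servers=100 is hypothetical: the source does not say how many machines were used.
scale = breach_scale(10 * PETABYTE, SIX_MONTHS_S, servers=100)
for name, value in scale.items():
    print(f"{name}: {value:,.2f}")
```

The sketch prints roughly 10,000,000 GB, 2,127,660 DVDs, about 5.07 Gbit/s in aggregate, and about 50.74 Mbit/s per server. The aggregate rate is not small; what keeps the pattern "low and slow" is that each individual flow stays modest, which is why per-destination and per-host totals over long windows matter more than per-transfer alerts.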

The compromised dataset allegedly includes:

  • Classified defense documents and detailed missile schematics
  • Aerospace engineering research and fighter jet test data
  • Bioinformatics and fusion simulation records
  • Internal technical manuals and weapon system schematics
  • Administrative login credentials and internal folder structures

The attacker is reportedly selling limited previews for thousands of dollars and full dataset access for hundreds of thousands of dollars, with payment requested in cryptocurrency. Cybersecurity experts who reviewed sample data noted the presence of documents marked “secret” in Chinese, alongside animated simulations and renderings of defense equipment including bombs and missiles.

Practical exercise: building a data inventory and classifying sensitive files

Linux commands to inventory sensitive file locations:

# List the 20 largest files over 100 MB (candidate high-value targets); du copes with spaces in file names
find /data -type f -size +100M -exec du -h {} + | sort -hr | head -20

# Identify files containing classification markers (-E enables the | alternation, -I skips binary files; customize for your environment)
grep -rlIE "CONFIDENTIAL|CLASSIFIED|SECRET|RESTRICTED" /data 2>/dev/null

# Generate a data inventory report of the 100 largest files (size in bytes, modification date, path)
find /data -type f -printf "%s\t%TY-%Tm-%Td\t%p\n" | sort -nr | head -100 > data_inventory.txt
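grep only matches the literal markers it is given, and the sample data in this case was reportedly marked “secret” in Chinese. The Python sketch below extends the same idea with a few example Chinese classification terms. The marker list, the /data root, and the 1,000,000-character read limit are assumptions to adapt, and it only inspects text that decodes as UTF-8.

```python
import os
import re
from pathlib import Path

# Example marker list: the English labels from the grep command plus the Chinese terms
# 绝密, 机密 and 秘密 (commonly translated as top secret, secret and confidential).
MARKERS = re.compile(r"CONFIDENTIAL|CLASSIFIED|SECRET|RESTRICTED|绝密|机密|秘密")


def find_marked_files(root: str, max_chars: int = 1_000_000) -> dict[str, set[str]]:
    """Map each readable file under `root` to the markers found in its first `max_chars` characters."""
    hits: dict[str, set[str]] = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            try:
                # errors="ignore" lets binary or mis-encoded files be skimmed instead of raising
                with open(path, "r", encoding="utf-8", errors="ignore") as fh:
                    text = fh.read(max_chars)
            except OSError:
                continue  # unreadable file: skip it rather than abort the whole scan
            found = set(MARKERS.findall(text))
            if found:
                hits[str(path)] = found
    return hits


if __name__ == "__main__":
    for path, markers in sorted(find_marked_files("/data").items()):
        print(path, ",".join(sorted(markers)))
```

Reading only the beginning of each file keeps a pass over very large datasets fast, at the cost of missing markers buried deep inside a file.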

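Windows PowerShell commands to map data repositories:

# Inventory files larger than 100 MB across a network share (the UNC path is an example)
Get-ChildItem -Path \\server\share -Recurse -File | Where-Object {$_.Length -gt 100MB} |
Select-Object FullName, Length, LastWriteTime | Export-Csv -Path large_files.csv -NoTypeInformation

# List file-system shares and flag any that grant access to Everyone
Get-SmbShare | Where-Object {$_.ShareType -eq "FileSystemDirectory"} |
Get-SmbShareAccess | Where-Object {$_.AccountName -eq "Everyone"} |
Select-Object Name, AccountName, AccessControlType, AccessRight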

  3. HPC Security Vulnerabilities – Why Supercomputers Are Prime Targets

High-Performance Computing environments face unique security challenges that traditional enterprise defenses often fail to address. NIST Special Publication 800-223, released in February 2024, explicitly addresses HPC security architecture, threat analysis, and best-practice recommendations. Key vulnerabilities include:

The Munge Authentication Flaw (CVE-2026-25506): A high-severity vulnerability in the Munge authentication service, which is ubiquitous in Slurm-managed HPC clusters, allows local adversaries to exfiltrate secret cryptographic keys and impersonate any cluster user. The flaw persisted in the source code for nearly two decades and carries a CVSS score of 7.7. Because every node in a Munge-authenticated cluster shares an identical secret key, compromising that key lets attackers forge credentials and execute tasks under any identity across the entire infrastructure.
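The consequence of a shared cluster key can be shown with a toy model. The Python sketch below is explicitly not Munge's credential format or cryptography: it uses a hypothetical HMAC-signed token to show that, when every node verifies with the same symmetric key, anyone who obtains that key can mint credentials for any user, including root, and the verifier cannot distinguish a forgery from a genuine credential.

```python
import hashlib
import hmac
import json
import time
from typing import Optional

# Toy model only: this is NOT Munge's real credential format or cryptography.
# It illustrates why one key shared by every node is a single point of failure.
CLUSTER_KEY = b"example-key-copied-to-every-node"  # hypothetical shared secret


def mint_token(key: bytes, uid: int, gid: int) -> str:
    """Anyone holding `key` can mint a token asserting ANY uid/gid."""
    payload = json.dumps({"uid": uid, "gid": gid, "ts": int(time.time())}, sort_keys=True)
    tag = hmac.new(key, payload.encode(), hashlib.sha256).hexdigest()
    return f"{payload}|{tag}"


def verify_token(key: bytes, token: str) -> Optional[dict]:
    """Return the claims if the MAC checks out; the verifier cannot tell WHO minted the token."""
    payload, _, tag = token.rpartition("|")
    expected = hmac.new(key, payload.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(tag, expected):
        return None
    return json.loads(payload)


# An attacker who copied CLUSTER_KEY from any one node can claim to be root (uid 0) anywhere.
forged = mint_token(CLUSTER_KEY, uid=0, gid=0)
print("accepted with stolen key:", verify_token(CLUSTER_KEY, forged) is not None)               # True
print("accepted after key rotation:", verify_token(b"new-rotated-key", forged) is not None)    # False
```

The second check fails because a token minted with the old key no longer verifies once the key changes, which is why regenerating the shared secret belongs alongside patching.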

Remediation commands for Munge vulnerability:

# Check the current Munge version
munge -V

# Update to the patched version (0.5.18 or later)
sudo apt-get update && sudo apt-get install munge   # Debian/Ubuntu
sudo yum update munge                                # RHEL/CentOS

# Verify that every node runs the updated version (ClusterShell; -a targets all nodes)
clush -a "munge -V"
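On a large cluster, reading `munge -V` output by eye is error-prone. The Python sketch below parses the collected output and lists nodes below 0.5.18. It assumes each line looks like `node02: munge-0.5.15 ...` (the node-name prefix clush adds, followed by the version string), so check the pattern against your own output before relying on it.

```python
import re

MIN_VERSION = (0, 5, 18)  # patched release named in this article
# Assumed line shape: '<node>: munge-<x.y.z> ...'
LINE = re.compile(r"^(?P<node>[^:\s]+):\s*munge-(?P<ver>\d+(?:\.\d+)+)")


def outdated_nodes(clush_output: str, minimum: tuple = MIN_VERSION) -> dict[str, str]:
    """Return {node: version} for nodes below `minimum`; unparseable lines map to 'unknown'."""
    result: dict[str, str] = {}
    for raw in clush_output.splitlines():
        raw = raw.strip()
        if not raw:
            continue
        m = LINE.match(raw)
        if m is None:
            # e.g. 'node03: bash: munge: command not found': review by hand, never pass silently
            result[raw.split(":", 1)[0].strip()] = "unknown"
            continue
        version = tuple(int(part) for part in m["ver"].split("."))
        if version < minimum:
            result[m["node"]] = m["ver"]
    return result


sample = "node01: munge-0.5.18\nnode02: munge-0.5.15\nnode03: bash: munge: command not found"
print(outdated_nodes(sample))  # {'node02': '0.5.15', 'node03': 'unknown'}
```

Nodes whose output cannot be parsed, such as one where the binary is missing, are reported as unknown rather than silently passing the check.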

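# After updating, regenerate the shared key on one node (mungekey: -c creates a key, -f overwrites an existing one)
sudo mungekey -c -f
# The key must be identical cluster-wide: copy /etc/munge/munge.key to every node (owner munge, mode 0400)
# through your usual secure channel, then restart munge everywhere
sudo systemctl restart munge
clush -a "sudo systemctl restart munge"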
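Because Munge credentials only validate between nodes holding the identical key, it is worth confirming that the regenerated key reached every node. The Python sketch below compares SHA-256 digests rather than key contents, so the secret itself never has to leave the nodes. It assumes output gathered with something like `clush -a "sudo sha256sum /etc/munge/munge.key"`, one `node: digest  path` line per node; the digest in the demo is a placeholder.

```python
import re
from collections import defaultdict

SHA256_HEX = re.compile(r"[0-9a-f]{64}")


def key_digest_groups(clush_output: str) -> dict[str, list[str]]:
    """Group node names by the SHA-256 digest each reports for the Munge key.

    Assumes lines like 'node01: <digest>  /etc/munge/munge.key', i.e. sha256sum
    output prefixed with the node name; anything else is grouped under 'error'.
    """
    groups: dict[str, list[str]] = defaultdict(list)
    for raw in clush_output.splitlines():
        if ":" not in raw:
            continue
        node, rest = raw.split(":", 1)
        fields = rest.split()
        digest = fields[0] if fields and SHA256_HEX.fullmatch(fields[0]) else "error"
        groups[digest].append(node.strip())
    return dict(groups)


def key_is_consistent(clush_output: str) -> bool:
    """True only if every node reported one and the same valid digest."""
    groups = key_digest_groups(clush_output)
    return len(groups) == 1 and "error" not in groups


digest = "ab" * 32  # placeholder digest for the demo
sample = (
    f"node01: {digest}  /etc/munge/munge.key\n"
    f"node02: {digest}  /etc/munge/munge.key\n"
    "node03: sha256sum: /etc/munge/munge.key: No such file or directory\n"
)
print(key_digest_groups(sample))
print("consistent:", key_is_consistent(sample))  # False
```

Any second digest group, or any node under "error", means some nodes will reject credentials from the others until the key is redistributed.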

NIST-recommended HPC security controls:

  1. Network segmentation – Separate different services within the cluster using VLANs or dedicated network interfaces
  2. Compute node sanitization – Clean compute nodes after job completion to prevent data leakage between users
  3. Data encryption – Implement file-level or block-level encryption for HPC storage systems
  4. Specialized authentication – Deploy authentication software specifically designed for HPC multi-user environments

  4. Cloud Infrastructure Abuse – How Attackers Store and Exfiltrate Massive Datasets

A revealing question raised in a LinkedIn discussion of the breach asked, “Where did the hackers store all of that data?” One plausible answer is distributed cloud infrastructure. MITRE ATT&CK technique T1567 (Exfiltration Over Web Service) describes adversaries sending data to legitimate external web services rather than over their primary command-and-control (C2) channel; sub-technique T1567.002 covers exfiltration to cloud storage.

In cloud environments, attackers increasingly use legitimate cloud management APIs and compromised credentials to create copies of storage volumes, databases, and backups – all without deploying malware that might trigger detection. The Otelier breach in 2024 demonstrated this pattern: attackers used compromised employee credentials to access Atlassian servers, obtained credentials for Amazon S3 storage, and exfiltrated 7.8 terabytes of customer data.

Detecting cloud data exfiltration with AWS CLI:

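# Monitor S3 bucket download volume per day (requires S3 request metrics; the filter ID "EntireBucket" is an example)
aws cloudwatch get-metric-statistics --namespace AWS/S3 --metric-name BytesDownloaded --dimensions Name=BucketName,Value=your-bucket-name Name=FilterId,Value=EntireBucket --start-time 2025-01-01T00:00:00Z --end-time 2025-01-31T00:00:00Z --period 86400 --statistics Sum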

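# Check CloudTrail for management events that can expose data, such as bucket policy changes.
# Object reads (GetObject) are data events: lookup-events does not return them, so query the trail's logs or CloudTrail Lake instead
aws cloudtrail lookup-events --lookup-attributes AttributeKey=EventName,AttributeValue=PutBucketPolicy --start-time "2025-01-01T00:00:00Z"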

# Identify newly created S3 buckets (potential staging areas)
aws s3api list-buckets --query "Buckets[?CreationDate>='2025-01-01']"

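# Audit IAM: find role inline policies and customer-managed policies that grant wildcard S3 access
# (list-roles returns only trust policies, so grepping its output cannot answer this)
aws iam get-account-authorization-details --filter Role LocalManagedPolicy --output json | grep -n '"s3:\*"'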
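Object downloads (GetObject) are recorded as CloudTrail data events, which only appear in the trail's log files when data-event logging is enabled for the bucket. The Python sketch below assumes those files are available locally and that each S3 record carries `userIdentity.arn`, `eventTime`, and `additionalEventData.bytesTransferredOut`; verify these field names against your own logs. It sums downloaded bytes per principal per day and flags principals whose latest day is far above their own median, a pattern consistent with stolen credentials being used for bulk downloads as in the Otelier case. The 5x factor and 3-day minimum are arbitrary assumptions.

```python
import gzip
import json
from collections import defaultdict
from statistics import median


def load_records(path: str) -> list[dict]:
    """CloudTrail log files hold a JSON object with a top-level 'Records' array (often gzipped)."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt", encoding="utf-8") as fh:
        return json.load(fh)["Records"]


def daily_get_bytes(records: list[dict]) -> dict[str, dict[str, float]]:
    """Sum GetObject bytes per principal ARN per UTC day (S3 data events only)."""
    totals: dict[str, dict[str, float]] = defaultdict(lambda: defaultdict(float))
    for r in records:
        if r.get("eventName") != "GetObject":
            continue
        arn = r.get("userIdentity", {}).get("arn", "unknown")
        day = r.get("eventTime", "")[:10]  # 'YYYY-MM-DD' prefix of the ISO timestamp
        sent = r.get("additionalEventData", {}).get("bytesTransferredOut", 0)
        totals[arn][day] += float(sent)
    return {arn: dict(days) for arn, days in totals.items()}


def flag_principals(per_day: dict[str, dict[str, float]], factor: float = 5.0,
                    min_days: int = 3) -> list[str]:
    """Flag principals whose most recent day exceeds `factor` times their median earlier day."""
    flagged = []
    for arn, days in per_day.items():
        if len(days) < min_days:
            continue  # not enough history to form a baseline
        volumes = [days[d] for d in sorted(days)]
        baseline = median(volumes[:-1])
        if baseline > 0 and volumes[-1] > factor * baseline:
            flagged.append(arn)
    return flagged
```

A typical run would call `load_records` on each downloaded log file, concatenate the results, and pass them through `daily_get_bytes` and `flag_principals`. Comparing each principal against its own history avoids a single global threshold that an attacker could learn.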

Azure CLI detection commands:

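# Monitor storage account egress (hourly totals over the last 7 days)
az monitor metrics list --resource "$(az storage account show -n yourstorageaccount --query id -o tsv)" --metric Egress --interval PT1H --aggregation Total --offset 7d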

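# Check the Activity Log for storage account creations or changes (last 30 days)
az monitor activity-log list --offset 30d --query "[?operationName.value=='Microsoft.Storage/storageAccounts/write'].{time:eventTimestamp, caller:caller, resource:resourceId}"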

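# Check whether classic blob access logging (Storage Analytics) is enabled; if it is off, reads are not recorded
az storage logging show --services b --account-name yourstorageaccount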

  5. AI-Powered Anomaly Detection – Catching the “Low and Slow” Exfiltration

Traditional security monitoring relies on static thresholds that attackers can learn and stay beneath. The Tianjin breach reportedly succeeded precisely because the attacker understood these thresholds and operated below them. AI-driven network observability platforms address this limitation by establishing behavioral baselines and identifying subtle deviations.

cPacket’s AI-driven approach, demonstrated at Security Field Day 13 in May 2025, uses machine learning and unsupervised anomaly detection to analyze trillions of packets and billions of sessions. The system can identify both burst and slow-drift data transfers by monitoring session lengths and data volumes – precisely the technique used in the Tianjin heist.
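cPacket's models are proprietary, so the Python sketch below swaps in a classic and much simpler technique to illustrate slow-drift detection: a one-sided CUSUM (cumulative sum) change detector. It standardizes each day's outbound volume against a two-week baseline and accumulates small excesses until they cross a decision threshold. The daily volumes, the slack k = 0.5, and the threshold h = 5 are illustrative assumptions.

```python
from statistics import mean, stdev
from typing import Optional


def static_alarm(series: list[float], baseline_len: int = 14, n_sigma: float = 3.0) -> Optional[int]:
    """Static rule: first index after the baseline whose value exceeds mean + n_sigma * stdev."""
    base = series[:baseline_len]
    limit = mean(base) + n_sigma * stdev(base)
    for i in range(baseline_len, len(series)):
        if series[i] > limit:
            return i
    return None


def cusum_alarm(series: list[float], baseline_len: int = 14, k: float = 0.5,
                h: float = 5.0) -> Optional[int]:
    """One-sided (upper) CUSUM: accumulate standardized excess over the baseline mean.

    k is the per-step slack in standard deviations and h the decision threshold;
    small but persistent increases add up until the running sum crosses h.
    """
    base = series[:baseline_len]
    mu, sigma = mean(base), stdev(base)
    if sigma == 0:
        raise ValueError("baseline has zero variance; use a longer or noisier window")
    s = 0.0
    for i in range(baseline_len, len(series)):
        s = max(0.0, s + (series[i] - mu) / sigma - k)
        if s > h:
            return i
    return None


# Hypothetical daily outbound volume (GB) for one host: two weeks of normal
# traffic, then a modest but sustained increase of the "low and slow" kind.
daily_gb = [100.0, 110.0] * 7 + [112.0] * 20
print("static threshold alarm at day:", static_alarm(daily_gb))  # None
print("CUSUM alarm at day:", cusum_alarm(daily_gb))              # 19
```

On this synthetic series the 3-sigma static threshold never fires, because 112 GB/day stays well under its limit of about 120.6 GB, while CUSUM raises an alarm on the sixth day of the increase (index 19).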

Implementing basic anomaly detection for data exfiltration:

Python script for baseline modeling:

import pandas as pd
from sklearn.ensemble import IsolationForest

# Load network flow data: one row per session with bytes_out and duration (seconds)
data = pd.read_csv('network_flows.csv')
features = data[['bytes_out', 'duration']].astype(float)

# Fit an Isolation Forest (ideally on a known-good historical baseline) and score each session;
# fit_predict returns -1 for outliers and 1 for inliers
model = IsolationForest(contamination=0.01, random_state=42)
data['anomaly'] = model.fit_predict(features)

# Flag outliers that sent little data (< 1 MB) but stayed open for more than an hour (slow exfil)
slow_exfil = data[(data['anomaly'] == -1) & (features['bytes_out'] < 1_000_000) & (features['duration'] > 3600)]
print(f"Potential slow exfiltration sessions detected: {len(slow_exfil)}")

Setting up Zeek (formerly Bro) for exfiltration detection:

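# Install Zeek for network traffic analysis (package availability varies; you may need the Zeek project's own repository)
sudo apt-get install zeek

# Write a custom script that flags long-lived connections with low originator volume
# (paths assume a source install under /usr/local/zeek; packaged builds often use /opt/zeek)
sudo tee /usr/local/zeek/share/zeek/site/slow_exfil.zeek > /dev/null << 'EOF'
event connection_state_remove(c: connection)
    {
    # num_bytes_ip is &optional, so confirm it is set before reading it
    if ( c$orig?$num_bytes_ip && c$orig$num_bytes_ip > 0 &&
         c$orig$num_bytes_ip < 1000000 && c$duration > 1 hr )
        print fmt("SLOW_EXFIL: %s -> %s : %d bytes in %.0f seconds",
                  c$id$orig_h, c$id$resp_h, c$orig$num_bytes_ip,
                  interval_to_double(c$duration));
    }
EOF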

# Load the script from local.zeek, then deploy
echo '@load site/slow_exfil.zeek' | sudo tee -a /usr/local/zeek/share/zeek/site/local.zeek
sudo zeekctl deploy
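The Zeek script above judges each connection in isolation. Because the attacker reportedly spread extraction across many small transfers, it also helps to add up traffic across connections and days. The Python sketch below reads a Zeek conn.log in the default tab-separated format (not Zeek's JSON output) and sums `orig_bytes` per originator/responder pair. It treats the originator as the sender, which matches outbound traffic only when internal hosts initiate the connections; the 3-day and 1 GB limits are assumptions.

```python
from collections import defaultdict
from datetime import datetime, timezone
from typing import Iterable


def cumulative_outbound(lines: Iterable[str], min_days: int = 3,
                        min_total: int = 1_000_000_000) -> dict[tuple[str, str], int]:
    """Sum originator payload bytes per (orig_h, resp_h) pair from a TSV conn.log.

    Returns pairs that were active on at least `min_days` distinct UTC days and sent
    at least `min_total` bytes in total: individually small flows that add up.
    """
    fields = None
    totals: dict[tuple[str, str], int] = defaultdict(int)
    active_days: dict[tuple[str, str], set] = defaultdict(set)
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("#fields"):
            fields = line.split("\t")[1:]  # column names follow the '#fields' tag
            continue
        if not line or line.startswith("#") or fields is None:
            continue
        row = dict(zip(fields, line.split("\t")))
        sent = row.get("orig_bytes", "-")
        if not sent.isdigit():
            continue  # '-' marks an unset field in Zeek's TSV logs
        pair = (row["id.orig_h"], row["id.resp_h"])
        totals[pair] += int(sent)
        active_days[pair].add(datetime.fromtimestamp(float(row["ts"]), tz=timezone.utc).date())
    return {pair: total for pair, total in totals.items()
            if len(active_days[pair]) >= min_days and total >= min_total}
```

To run it against real logs, pass an open file handle, for example `cumulative_outbound(open(path))`, where `path` is your conn.log (its location depends on how Zeek was installed and deployed).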

  6. Legal and Regulatory Framework – China’s Cybersecurity Event Reporting Requirements

This breach occurs at a critical moment in China’s cybersecurity regulatory evolution. On November 1, 2025, the Measures for the Administration of National Cybersecurity Incident Reporting took effect, establishing stringent mandatory reporting requirements. Key provisions include:

  • Critical Information Infrastructure (CII) operators must report incidents within 1 hour of discovery
  • Major and particularly serious incidents require protection work departments to report to national authorities within 30 minutes
  • Reports must include incident type, impact assessment, attack path analysis, and mitigation measures
  • Delayed, omitted, false, or concealed reporting faces severe penalties under relevant laws

The Tianjin supercomputer would very likely qualify as CII, in which case its operator and the responsible authorities were legally obligated to report this incident within these strict timeframes. The absence of official confirmation at the time of writing raises significant compliance questions.

What Undercode Says

  • The “low and slow” exfiltration technique defeated traditional perimeter defenses – The Tianjin breach succeeded not through sophisticated exploits but by operating beneath detection thresholds. Organizations must implement behavioral analytics that identify prolonged, low-volume data transfers, not just high-volume spikes.

  • VPN compromise remains one of the most under-defended attack vectors – Initial access through a compromised VPN domain highlights a critical gap in remote access security. Multi-factor authentication alone is insufficient; continuous session monitoring and anomaly detection for VPN traffic are essential.

  • HPC environments require specialized security frameworks – Traditional enterprise security controls fail to address the unique architecture of high-performance computing. NIST SP 800-223 provides a standardized reference architecture that organizations should adopt immediately.

The 10-petabyte heist represents a paradigm shift in data breach thinking: attackers no longer need to steal everything at once. By distributing extraction across months and multiple servers, they can exfiltrate nation-state-scale volumes of data without ever triggering an alert. This incident should serve as a wake-up call for any organization managing large datasets, from financial institutions to research laboratories to government agencies.

Prediction

This breach will likely trigger a global reassessment of HPC security standards. Expect accelerated adoption of NIST SP 800-223 across government and research institutions worldwide within 12-18 months. Additionally, the commercial market for AI-powered behavioral analytics – particularly solutions that detect “low and slow” exfiltration patterns – will see significant growth, potentially reaching $5-7 billion by 2028. China’s mandatory cybersecurity incident reporting framework, effective November 2025, will face its first major stress test as authorities navigate the geopolitical implications of confirming or denying this breach. Finally, anticipate increased regulatory scrutiny of VPN security postures globally, with new mandates for continuous authentication and session behavioral monitoring emerging from multiple national cybersecurity agencies.

Reported By: Cybersecuritynews