
New anti-phishing system finds scam networks by mapping domains, IPs, and certificates

by mcsvtln@gmail.com

Jun 15, 2026


Phishing sites do not always look dangerous when security tools arrive. Sometimes they show an error page. Sometimes they redirect to a real company website. Sometimes they simply refuse to respond.

That cat-and-mouse problem has helped online scams stay one step ahead of many defenses. Now a team at Tokyo Metropolitan University says it has built a system that uses that evasive behavior itself as a clue, then works outward to uncover the broader phishing campaign behind it.

The system, called PhishLumos, does not start by asking whether one suspicious web link is good or bad. Instead, it treats hidden or misleading content as a signal to inspect the website’s surrounding infrastructure, including domains, IP addresses, certificates, and related network connections. The goal is to map out the campaign, not just judge one link in isolation.

In tests on 103 real phishing campaigns, the system identified malicious activity a median of roughly 8 days before expert verification. In a separate six-month real-world study, rules generated from a set of 600 difficult starting links led to the discovery of 192,407 additional URLs, and 92% were later flagged as malicious by at least one engine in VirusTotal.

Overview of the problem setting and our approach. A phishing URL may show malicious content to victims but benign or error content to scanners due to cloaking or selective blocking. (CREDIT: IEEE Access)

When the page tells you almost nothing

Phishing remains one of the most common forms of cybercrime. The FBI’s Internet Crime Complaint Center received more than 880,000 complaints in 2023, with potential losses exceeding $12.5 billion. Phishing and spoofing were the most frequently reported complaint category.

The damage does not fall evenly. Older adults and people with limited digital literacy are often hit hardest, a pattern that can deepen the digital divide and weaken trust in banking, e-commerce, and other essential online services.

Most existing defenses still react link by link. They fetch a page, examine its text, images, or layout, and try to decide whether it is malicious. Those methods can work when a phishing page is plainly visible. But attackers increasingly rely on cloaking and selective blocking, showing one version to victims and another to automated scanners.

That creates what the researchers call content-inaccessible cases. In those situations, a scanner might see only a timeout, a benign redirect, or an HTTP 403 or 404 response. The page content, if it is visible at all, may not be representative. According to the study, 77.0% of URLs in the retrieved scan records fit that description, and 82.5% of campaigns included them.
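To make the idea concrete, here is a minimal sketch of how a scan result might be sorted into the content-inaccessible bucket. It is not the researchers' code. The `ScanRecord` fields, the `BENIGN_HOSTS` allowlist, and the function name are all assumptions made for illustration, and the three checks simply mirror the cases named above: a timeout, an HTTP 403 or 404, or a redirect to a known legitimate site.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlparse


@dataclass
class ScanRecord:
    """Hypothetical summary of one scanner visit to a URL."""
    requested_url: str
    final_url: Optional[str]  # None when the request timed out
    status: Optional[int]     # HTTP status code, None on timeout


# Assumed allowlist of legitimate hosts; a real deployment would need a
# far larger and better maintained list.
BENIGN_HOSTS = {"www.example.com"}


def is_content_inaccessible(rec: ScanRecord) -> bool:
    """Return True when the scanner saw nothing representative."""
    # Case 1: the site never answered.
    if rec.status is None or rec.final_url is None:
        return True
    # Case 2: the site answered with a blocking or missing-page status.
    if rec.status in (403, 404):
        return True
    # Case 3: the scanner was redirected to a different, known-benign host.
    requested_host = urlparse(rec.requested_url).hostname
    final_host = urlparse(rec.final_url).hostname
    return final_host != requested_host and final_host in BENIGN_HOSTS


timeout = ScanRecord("https://login-update.test/a", None, None)
blocked = ScanRecord("https://login-update.test/a", "https://login-update.test/a", 403)
bounced = ScanRecord("https://login-update.test/a", "https://www.example.com/", 200)
visible = ScanRecord("https://login-update.test/a", "https://login-update.test/a", 200)
```

In this sketch only the last record would go on to ordinary content analysis. The other three are the cases where an infrastructure-first investigation would take over.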

For PhishLumos, that is not a dead end.

PhishLumos system architecture and operational workflow. (CREDIT: IEEE Access)

Following the campaign instead of the single link

The research team, led by Associate Professor Daiki Chiba, designed PhishLumos as an adaptive multi-agent system. It begins with a single seed URL, then shifts to infrastructure evidence when content appears missing, deceptive, or blocked.

The system builds what the authors describe as a typed property graph, a knowledge base that links URLs to domains, IP addresses, autonomous system numbers, certificates, issuers, scan records, HTTP responses, and other related entities. From there, a supervising agent decides which piece of the graph to investigate next.

That matters because phishing campaigns often reuse parts of their infrastructure, even when individual sites come and go quickly. A domain may point to the same IP address as another known scam. A certificate may connect several sites. A pattern in an initial URL may keep appearing even when the landing page changes.
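A toy version of that graph shows why reuse is so useful to defenders. The sketch below is an illustration only, not the PhishLumos schema. The class, the edge names `resolves_to` and `presents`, and the sample hosts and addresses are all invented for the example. It stores typed nodes, links them, and then pivots from one seed domain to every other domain that shares an IP address or certificate with it.

```python
from collections import defaultdict
from typing import Dict, Optional, Set, Tuple

Node = Tuple[str, str]  # (node type, value), e.g. ("domain", "seed.test")


class PropertyGraph:
    """Tiny typed property graph: typed nodes, typed undirected edges."""

    def __init__(self) -> None:
        self.props: Dict[Node, dict] = {}
        self.adj: Dict[Node, Set[Tuple[str, Node]]] = defaultdict(set)

    def add_node(self, ntype: str, value: str, **props) -> Node:
        node = (ntype, value)
        self.props.setdefault(node, {}).update(props)
        return node

    def add_edge(self, src: Node, etype: str, dst: Node) -> None:
        # Stored in both directions so we can pivot either way.
        self.adj[src].add((etype, dst))
        self.adj[dst].add((etype, src))

    def neighbors(self, node: Node, ntype: Optional[str] = None) -> Set[Node]:
        return {n for _, n in self.adj.get(node, set())
                if ntype is None or n[0] == ntype}


def related_domains(g: PropertyGraph, domain: str) -> Set[str]:
    """Domains that share an IP address or certificate with `domain`."""
    start = ("domain", domain)
    found: Set[str] = set()
    for infra in g.neighbors(start):
        if infra[0] in ("ip", "certificate"):
            for other in g.neighbors(infra, "domain"):
                if other != start:
                    found.add(other[1])
    return found


g = PropertyGraph()
seed = g.add_node("domain", "seed.test")
sib_ip = g.add_node("domain", "sibling-ip.test")
sib_cert = g.add_node("domain", "sibling-cert.test")
other = g.add_node("domain", "unrelated.test")
ip = g.add_node("ip", "192.0.2.10")
other_ip = g.add_node("ip", "198.51.100.7")
cert = g.add_node("certificate", "sha256:demo-fingerprint")
g.add_edge(seed, "resolves_to", ip)
g.add_edge(sib_ip, "resolves_to", ip)
g.add_edge(seed, "presents", cert)
g.add_edge(sib_cert, "presents", cert)
g.add_edge(other, "resolves_to", other_ip)

related = related_domains(g, "seed.test")
```

Here the seed leads to two sibling domains and leaves the unrelated one alone. A real investigation would have to weigh each link, since a shared IP address on a large hosting provider says far less than a shared certificate.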

Rather than labeling one URL at a time, PhishLumos tries to characterize the campaign itself and generate detection rules that can be deployed in security systems. Those rules can then be translated into formats used by tools such as intrusion detection systems, security information and event management platforms, and threat-sharing systems.
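What a campaign-level rule looks like in practice can be sketched in a few lines. The example below assumes the simplest possible rule, a regular expression over the URL path. The rule name, the pattern, and the sample URLs are invented, and the rules the system actually produces may combine several kinds of evidence.

```python
import re
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass(frozen=True)
class UrlPatternRule:
    """Hypothetical campaign rule: a regex over the URL path."""
    name: str
    path_regex: str

    def matches(self, url: str) -> bool:
        # fullmatch requires the entire path to fit the pattern.
        return re.fullmatch(self.path_regex, urlparse(url).path) is not None


# Assumed pattern: a fixed directory, an 8-character token, a login page.
rule = UrlPatternRule("campaign-demo", r"/secure/[a-z0-9]{8}/login\.php")

candidates = [
    "https://login-update.test/secure/a1b2c3d4/login.php",
    "https://other-host.test/secure/zz99yy88/login.php",
    "https://shop.test/cart",
    "https://login-update.test/secure/short/login.php",
]

hits = [u for u in candidates if rule.matches(u)]
```

One rule catches the first two URLs even though they sit on different hosts, which is the point of describing the campaign rather than a single link. The same rule object could then be serialized into whatever format a downstream tool expects.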

The researchers argue that this changes the balance between attackers and defenders. A criminal group can spin up many phishing sites cheaply. Defenders, meanwhile, often have to inspect them one by one. A campaign-level system can look for the shared fingerprints that tie those sites together.

Knowledge Base (KB) modeled as a typed property graph schema. (CREDIT: IEEE Access)

Faster detection, broader reach

On the curated dataset of 103 analyst-verified campaigns, covering 6,020 unique URLs, PhishLumos achieved median campaign coverage of 100%. For at least half the campaigns, the rules it generated matched all ground-truth URLs.

It also found 77,391 additional URLs not present in the original curated dataset, an average of 751.4 new URLs per campaign. Of those, 62,321, or 80.5%, were later flagged as malicious by at least one VirusTotal engine. The study notes that this should be viewed as a conservative lower bound rather than a definitive label.

The system’s median detection lead time was 192.8 hours, roughly 8 days, before expert verification. Median analysis time was 60.4 seconds per seed URL.
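The headline figures are easy to check from the numbers reported above. This short calculation uses only those published values, and the variable names are ours.

```python
# Figures as reported in the article.
new_urls = 77_391          # additional URLs found beyond the curated set
campaigns = 103            # analyst-verified campaigns
flagged = 62_321           # new URLs later flagged by a VirusTotal engine
lead_time_hours = 192.8    # median detection lead time

avg_new_per_campaign = new_urls / campaigns   # about 751.4
flagged_share = flagged / new_urls            # about 0.805
lead_time_days = lead_time_hours / 24         # about 8.03

print(f"{avg_new_per_campaign:.1f} new URLs per campaign")
print(f"{flagged_share:.1%} later flagged")
print(f"{lead_time_days:.1f} days of lead time")
```

All three results agree with the figures quoted: 751.4 URLs per campaign, 80.5%, and roughly 8 days.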

Its strongest advantage appeared when conventional content-based tools were most limited. In content-inaccessible cases, PhishLumos achieved an F1 score of 0.994, with 1.000 recall and a false positive rate of 0.001. The content-based baselines fell sharply in the same setting. An infrastructure-only machine learning baseline improved recall, but did so with a much higher false positive rate.
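The article does not quote a precision figure, but F1 is the harmonic mean of precision and recall, so one can be backed out from the other two. The sketch below does that arithmetic. Because the published scores are rounded, the result should be read as an approximation and not as a number from the study.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def precision_from_f1(f1: float, recall: float) -> float:
    """Solve F1 = 2PR / (P + R) for P, giving P = F1 * R / (2R - F1)."""
    return f1 * recall / (2 * recall - f1)


reported_f1 = 0.994
reported_recall = 1.000

implied_precision = precision_from_f1(reported_f1, reported_recall)
print(f"implied precision is about {implied_precision:.3f}")
```

With recall at 1.000, an F1 of 0.994 implies precision of roughly 0.988. In other words, about 12 of every 1,000 URLs the rules flagged in that setting would have been false alarms.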

The study also found that phishing rule patterns have been changing. By early 2025, PhishLumos relied more heavily on initial URL patterns and less on IP addresses or content-derived clues, a shift the authors say is consistent with growing use of cloaking, redirects, and shared hosting.

Cost–coverage trade-off across four LLM backends. Each point represents the mean campaign coverage and mean LLM cost per campaign for a given model. (CREDIT: IEEE Access)

What the system can and cannot do

PhishLumos is not meant to inspect live internet traffic at line speed. The researchers describe it as an analyst-facing offline tool, one that helps investigate a small number of high-priority suspicious links and turn them into reusable mitigation rules.

That design helps keep the expensive reasoning step separate from lightweight enforcement. It also creates an auditable trail showing where the evidence came from and why a rule was generated.

The system still has limits. It depends on observable infrastructure reuse, historical web scans, passive DNS, and certificate logs. Campaigns that use disposable infrastructure, rotate too quickly, or hide on compromised high-reputation services may not yield rules that are both accurate and safe to deploy. In the six-month in-the-wild test, rules were generated for 322 of 600 seed URLs, or 53.7%.

The authors also describe the work as dual-use, since it analyzes phishing infrastructure and produces deployable detection rules. They say examples were sanitized and that the implementation and datasets are being provided under a responsible access model.

Practical implications of the research

For defenders, the study points to a different way of thinking about phishing. When a suspicious page hides its content, the investigation no longer has to stop there. The evasion itself can become the starting signal for a broader search.

That matters in practical terms because the system outputs rules that can feed existing security controls and threat-sharing workflows, including hunting, blocking, takedown requests, and coordination with national response teams or service providers. It also reduces the need to repeatedly analyze similar malicious links one by one.

The broader message is simple: if phishing campaigns increasingly hide the page, defenders may need to follow the infrastructure instead.

Research findings are available online in the journal IEEE Access.

The original story “New anti-phishing system finds scam networks by mapping domains, IPs, and certificates” is published in The Brighter Side of News.