IPinfo - Comprehensive IP address data, IP geolocation API and database
21 days ago by Tom Mount 4 min read

Detecting Invalid Traffic at 98.5% Accuracy with Last Digit Anonymized IPs

Detecting Invalid Traffic at 98.5% Accuracy with Last Digit Anonymized IPs

Get Unlimited Access to IPinfo Lite

Start using accurate IP data for cybersecurity, compliance, and personalization—no limits, no cost.

Sign up for free

In adtech, one core problem remains constant: platforms spend billions on impressions that never reach real users. Invalid traffic (IVT) not only drains ROI but also distorts campaign metrics, making it difficult for advertisers to gauge true performance.

Lately, this battle has become even more complex. To protect user privacy, many of the data feeds that DSPs and SSPs rely on now truncate IP addresses. This means instead of seeing a specific address like 1.2.3.4, you only see 1.2.3.0. 

When adtech companies first came to us at IPinfo asking for a way to score these anonymized IPs, I knew we had to find a way to let them know, before they bid, if they were bidding on a non-human source.

The Technical Challenges of Last Digit IP Anonymization

When the last part of an IP address is hidden, you aren't looking at one IP address anymore. You are looking at a group of 256 possible addresses.

Normally, my team and I would use our data to identify non-human traffic by looking for specific flags: Is the IP associated with a hosting data center? Is it a known VPN? Has it been recently active on a residential proxy network? The problem is that these attributes aren't always the same for every single address in that 256-address block. 

Without a specific IP to look up, most DSPs are forced to guess and simply hope for the best. They pick a few random IP addresses in that range, look them up individually, consider details like the ASNs involved, and try to guess if it's on some sort of hosting infrastructure or not. None of that will reveal anything more than surface-level data, however, like whether the IP address is affiliated with a VPN or proxy.

Spot the Fakes. Reduce Ad Waste.

Use IP data to flag VPNs, proxies, and traffic anomalies.

Download the IVT One-Pager

But we discovered a way to make a more calculated decision based on aggregated signals.

How We Reconstructed the Lost Context

To overcome this challenge, I developed a way to look at the data "coarsely" rather than individually. I decided to aggregate our privacy and residential proxy data at the /24 level (the 256-address block) to find the coverage of our privacy signals across that block.

Here is how I structured the process:

  1. Data Aggregation: I identified key fields, like is_vpn, is_proxy, and is_hosting, and counted how many IPs in that block were marked as "true."
  2. Coverage Scoring: I calculated a value based on the ratio of flagged IPs (e.g., number of flagged IPs / 256).
  3. The 50% Rule: For the pre-bid process, I established a baseline: if a coverage value is 0.50 or higher, the platform should treat the hidden IP as likely "true" for that flag.

By compiling this into a single MMDB lookup file, I was able to check these coverage values for the anonymized IPs instantly during the pre-bid stage.

Why "Coarse" Data Still Delivers High Accuracy

You might assume that losing the last digit of an IP would make detection unreliable, but my analysis of 3 million random IPs proved otherwise.

I found that hosting infrastructure is remarkably consistent across a network range. If one IP in a block belongs to a data center, it's highly likely that most, if not all, the other IPs in that same block do too. Because of this homogeneity, my consensus method catches 98.5% of hosting IPs with a false positive rate of only 5%.

Even when looking at more challenging signals like VPNs or proxies, the results remained strong. When I ran an aggregate analysis across all privacy signals, the system was still highly accurate at ranking the quality of an IP. In fact, even when including difficult-to-track residential proxy data, the methodology maintained a 93% accuracy rate.

And because I had access to the actual values for all 3 million random IPs, I was able to model the outcome at different threshold values to optimize those thresholds in different ways. Raising or lowering the thresholds allow individual companies to optimize for reducing false positives, even at the expense of increasing false negatives, or vice versa. This flexible approach lets companies manage their risk tolerance and construct a robust scoring model tuned to their specific business requirements.

Building a Better Adtech Ecosystem

By leveraging high-confidence data at the pre-bid stage, adtech companies can lower or eliminate bids on invalid traffic, leading to more accurate reporting and optimized ROI.

This methodology proves that privacy-filtered data doesn't have to mean a loss in fraud detection capability. Whether through API or database downloads, these technical insights allow platforms to know, instantly, if they are bidding on non-human traffic, regardless of whether the IP is truncated.

Estimate Your Invalid Traffic Spend

Use the Traffic Leak Audit to estimate how much of your ad spend may be reaching anonymized infrastructure.

Run the Traffic Leak Audit

Share this article

About the author

Tom Mount

Tom Mount

Tom is a Solutions Engineer at IPinfo, where he helps users integrate and apply IP data effectively. He brings deep technical expertise to customer use cases and ensures that real-world needs inform product development.