The Spamhaus Project

You can't buy data hygiene

by Laura Atkins | April 23, 2021 | 8 minutes reading time

Introduction

Thank you to our guest author, Laura Atkins, for this blog post. Laura is a founding partner of the anti-spam consultancy and software firm Word to the Wise. With years of experience in tracing Internet abuse, Laura is a recognized leader in the anti-spam arena.

Over the past few years, dozens of data hygiene services have come on the market. But are these services really delivering? Sadly, in most cases, the answer is no. For many companies, using data hygiene services is a waste of money, because good data hygiene starts at home.

What are data hygiene services?

The philosophy of data hygiene is simple. Addresses that bounce are bad for delivery. Therefore, remove all these addresses from a list, and all delivery problems will fade away.

The selling point of data hygiene services is equally simple. Customers hand over a list of email addresses. The hygiene company then removes any bad data, i.e., undeliverable (bouncing) email addresses, and returns the list to the customer before it is used in an email campaign. Many services also claim that they hold lists of spam traps and of complainers, i.e., people who continually complain about mail (often called ‘flamers’ lists).

If you reduce deliverability issues to a single simple problem, i.e., sending too much mail to non-existent addresses, you may be able to fool yourself into thinking you can apply an equally simple solution. The reality is that deliverability is not a simple problem, and a single simple solution isn’t going to fix it.

Spammers invented data hygiene

In the late 1990s and early 2000s, one way spammers harvested addresses was via ‘dictionary attacks’ on Internet Service Providers (ISPs). A dictionary attack was simple. Spammers would open a connection to a mail server and attempt to send mail to an email address. Once the mail server responded that the address was valid (or not valid), they would close the connection without ever sending mail.

The name ‘dictionary attack’ derived from the fact that the spammers often made up email addresses using words from the dictionary and tested them alphabetically. ISPs quickly developed defenses against these kinds of attacks, including blocking any IP that tried to send mail to multiple non-existent addresses.

The first data hygiene companies adopted similar technology to that of the spammers but renamed it “SMTP address verification.” However, instead of using random words from the dictionary, they used their customers’ email address lists. Many services added a second transaction with a long and likely unused address to identify servers that accepted mail to any address.

ISPs and other receivers have long treated dictionary attacks and SMTP verification as abusive. The original dictionary attacks revealed customer data and caused mail delivery failures for other senders. This type of traffic is routinely blocked, forcing many data hygiene companies to rotate IP addresses and domains to continue providing services.

Customers of these companies are typically unaware that they are not only participating in the abuse but also paying to do so.

The abuse continues

As I write this, I can hear some readers saying, “Well, most of the data hygiene companies no longer use SMTP verification. They’ve moved away from that.” There’s truth in this statement. As a result of the blocks and filters ISPs have put in place, it is increasingly difficult for SMTP verification to work. I know of one large mailbox provider who re-engineered their system to thwart SMTP verification!

As ISPs’ defenses improved, the data hygiene providers pivoted to a new way to verify addresses. This didn’t stop the abuse; it just changed the target. Now, instead of abusing ISP resources, the data hygiene companies are collecting data they shouldn’t have access to:

  • Some data hygiene companies are paying Email Service Providers (ESPs) for customer bounce, open, and click data.
  • Other data hygiene companies keep enormous consumer behavior databases (potentially violating various privacy laws, including GDPR).
  • At least one data hygiene company is run by a spammer who uses their database of harvested and spammed addresses to compare against their customer data.

Many data hygiene companies abuse third-party resources for profit. The worst part is, all this abuse is happening without customers getting the service they are paying for.

The fundamental flaw of data hygiene services

One of the measurable metrics of good deliverability is a low bounce rate. Senders with good deliverability have a low bounce rate because of their initial data collection practices. They collect addresses from people who want to receive mail from them and build email verification into their marketing processes, in addition to further data checks. Their mail is wanted, users engage with it, and thus their mail has a good reputation.

Data hygiene services assume deliverability is only the result of good metrics. If they can artificially manipulate a list into having a low bounce rate, then that list will perform as well as any carefully managed list.

This is the fundamental flaw in many data hygiene services. They manipulate the statistics around a list without addressing any of the issues that make a list perform poorly in the first place.

Remember, typos and errors happen on both sides of the @ sign. However, data hygiene companies can’t find mistakes that direct mail to an innocent third party, because a mistyped address that belongs to someone else is still a valid, deliverable address. These third parties then report the email, which contributes to a poor reputation. The sender is left confused about why, because they consider their data to be 'clean' after using one of these services.

Dishonesty at the core of the business model

Sadly, many data hygiene companies are not transparent about their product and its limitations. They've successfully convinced marketers that removing bounces via their service is the only way to maintain a list, and some senders are falling for it.

Certain senders go back every few months to have a hygiene company clean their list. This is totally unnecessary. A list is clean if it is actively being mailed and bounced email addresses are removed. There's no reason to pay a third party for an inferior service.
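As a rough illustration of how little is involved, the sketch below drops addresses that hard-bounced during a send. The function name, the shape of the bounce log, and the rule that any 5xx SMTP status counts as a permanent failure are all assumptions made for this example; they do not describe any particular ESP's bounce handling.

```python
def remove_hard_bounces(mailing_list, bounce_log):
    """Return the list without addresses that hard-bounced.

    bounce_log is an iterable of (address, smtp_status_code) pairs.
    This record shape is hypothetical. Codes 500-599 are treated as
    permanent failures; 4xx codes are temporary and are left alone.
    """
    hard_bounced = {
        address.strip().lower()
        for address, code in bounce_log
        if 500 <= code <= 599
    }
    return [a for a in mailing_list if a.strip().lower() not in hard_bounced]


subscribers = ["ann@example.com", "bob@example.com", "cat@example.org"]
bounces = [("Bob@example.com", 550), ("cat@example.org", 451)]

cleaned = remove_hard_bounces(subscribers, bounces)
print(cleaned)  # ['ann@example.com', 'cat@example.org']
```

Here the address with a 550 response is removed, while the one with a temporary 451 response stays on the list. A real program might wait for repeated failures before removing an address; that policy choice is left out of this sketch.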

One of the most misleading claims by data hygiene companies I’ve witnessed is their ability to remove spam traps. Often their claims are exaggerated. Want some examples?

  • A few years ago, a new hygiene company wanted me to endorse their product, and part of their pitch was their ability to remove spam traps. When pressed, they finally admitted they had only found approximately 100 different trap addresses.
  • A client came to me to assist with a Spamhaus Blocklist (SBL) listing. I was their second stop; their first had been a data hygiene company. The client revealed that the hygiene company said they had more Spamhaus traps than any previous customer. Meanwhile, Spamhaus explained that they observed no difference in volumes between pre- and post-cleaning.
  • One hygiene company tells potential and current customers of their "close working relationship with Spamhaus" and discloses their ability to remove Spamhaus traps. Unfortunately, this is untrue. Spamhaus doesn’t share its trap feeds with any third parties.

You can do data hygiene without expensive services

None of this is to say data hygiene is terrible. In fact, it's a vital and necessary part of database maintenance. The good news is, most senders never need to pay a service to clean their lists. Senders who mail regularly to engaged contacts and remove bounces are doing better hygiene than any third party. Any organization that uses an app or has a website with logins will have far more insight into their customers than any third party.

As previously mentioned, successful senders incorporate data checking and verification into their address collection process. In some cases, they use a data hygiene service to correct typos before the consumer hits submit. This checking can minimize mistyped addresses before any email is sent.
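A check at the point of collection could look something like the sketch below, which suggests a correction when the domain of a submitted address is a near miss for a well-known one. The domain list, the similarity cutoff, and the function name are assumptions chosen for illustration; the comparison uses Python's standard difflib module.

```python
import difflib

# Hypothetical short list of domains; a real form would use its own data.
KNOWN_DOMAINS = ["gmail.com", "yahoo.com", "outlook.com", "hotmail.com"]


def suggest_correction(address):
    """Return a suggested address if the domain looks like a typo, else None."""
    local, sep, domain = address.strip().rpartition("@")
    if not sep or not local or not domain:
        return None  # not shaped like an email address; handle elsewhere
    domain = domain.lower()
    if domain in KNOWN_DOMAINS:
        return None  # already a known domain, nothing to suggest
    matches = difflib.get_close_matches(domain, KNOWN_DOMAINS, n=1, cutoff=0.8)
    return f"{local}@{matches[0]}" if matches else None


print(suggest_correction("laura@gmial.com"))    # laura@gmail.com
print(suggest_correction("laura@example.org"))  # None
```

The suggestion is meant to be shown to the person filling in the form ("Did you mean ...?"), not applied silently. It also only looks at the right-hand side of the @ sign; a typo in the local part still produces an address that may belong to someone else.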

Confirmed opt-in is also an excellent way to ensure accurate data collection. The good news here is, consumers are used to confirming their email addresses for access to services. Billions of people have social media accounts and had to verify their address to access the full range of features.
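For readers who have not built one, a confirmed opt-in flow can be sketched in a few lines: the address is only added to the list once its owner presents a token that was sent to that address. The secret key, function names, and in-memory set below are hypothetical stand-ins; a real system would store state in a database, send the token as a link, and expire it.

```python
import hashlib
import hmac

SECRET_KEY = b"replace-with-a-real-secret"  # hypothetical; load from config


def make_token(address):
    """Derive a confirmation token tied to one normalized address."""
    normalized = address.strip().lower().encode("utf-8")
    return hmac.new(SECRET_KEY, normalized, hashlib.sha256).hexdigest()


def confirm(address, token, confirmed):
    """Add the address to the confirmed set only if the token matches."""
    if hmac.compare_digest(make_token(address), token):
        confirmed.add(address.strip().lower())
        return True
    return False


confirmed_addresses = set()
token = make_token("reader@example.com")  # would be emailed as a link

ok = confirm("reader@example.com", token, confirmed_addresses)
bad = confirm("other@example.com", token, confirmed_addresses)
print(ok, bad, confirmed_addresses)  # True False {'reader@example.com'}
```

Because the token only ever reaches the mailbox it was generated for, a mistyped or forged address never makes it onto the list unless its real owner clicks through.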

Conclusion

Data hygiene is essential for good delivery. But data hygiene is more than just removing addresses that bounce. Senders need to focus less on hitting specific metrics. Metrics don’t make the email program; happy recipients do, and happy recipients start with the address collection process.

Senders that focus on obtaining explicit permission and verify that their recipients are providing accurate data will have more successful marketing programs. There is no need to pay for expensive third-party services to clean data because the initial data is already clean.