The Spamhaus Project


The beta nature of the Threat Intel Community Portal

by The Spamhaus Team, November 01, 2023


Introduction

In case you haven't noticed, the Threat Intel Community is in beta, and to be honest, it will be for some time, probably until the end of 2024. "Why?" we hear you chorus. In a nutshell, we're all learning together: it's a process of discovering what data you want to send us and understanding which feedback mechanisms motivate you.

What data and how much?

Our detection methods are based on big data (we were doing big data before big data was even a "thing"). We apply machine learning, heuristics, and manual investigations to terabytes of data. For example, we analyze more than nine billion SMTP connections every 24 hours, and that's just the tip of the iceberg. Big data we're used to; smaller datasets, not so much, and neither are our machine learning algorithms. Despite the wonders of AI, the Spamhaus crystal ball couldn't predict what volumes and types of data you, the community, would send until the portal was in production, so we're in a discovery phase.

Thank you for your feedback

We can't thank our contributors enough for their feedback. Has it all been glorious accolades? No, nor did we expect it to be. As we said at the start, this is a learning process, and it won't happen overnight.

One of the most significant pieces of constructive feedback we've received is some users' disappointment when their contributions aren't reflected in our datasets. Yes, we hear you. Our researchers would be just as frustrated if all their rule-writing and manual investigations resulted in zero detections. We acknowledge that our internal processes and machine learning models need enhancing. As we write, our researchers are reviewing your data to work out what modifications to make.

Why can't we place your submissions directly into our datasets? There are two reasons: policy and accuracy. Every DNSBL dataset we publish has a policy associated with it. By policy, we mean the contributing factors that cause an internet identifier to be listed in the dataset. These policies have been carefully crafted in collaboration with the internet community over the past two decades to meet the needs of those who consume the data and to keep false positives to a minimum.

To list an internet identifier in our DNSBLs, we must ensure it is consistent with the individual list's policy. Therefore, your submissions are reviewed and re-processed to confirm they meet Spamhaus' policies and the individual list criteria. Most of our policies can be found on this website, for example, the one for the SBL.

However, as Spamhaus continues its journey deeper into the world of reputation, we will be able to evaluate and assign each contributor's reputation automatically, based on the accuracy of their submissions and how closely those submissions adhere to list policy and criteria.

We will reach a point where we can verify contributors as "trusted" so that their submissions go into reputation-based datasets and threat intelligence feeds more quickly. As we review the volume, quality, and type of data submitted, we will be able to make informed choices about how best to move forward with this.

For now, though, the metric you see in the Threat Intel Portal is based on the number of submissions that match our detections, and those detections are governed by policy.

Where policy isn't adhered to, false positives occur

Spamhaus' data protects users across the globe, and we're known and trusted for having reliable data. For example, for every domain detection, over 100 rules are applied to that domain, and the domain is sometimes subject to additional manual investigation.

We know that most of our submitters have invested significant time in establishing whether an internet identifier is suspicious. However, we must also ensure it won't trigger a false positive, which is why we can't immediately drop your intelligence into our datasets.

But what if submitting feels pointless?

Please be patient. Our machine-learning models, processes, and manual investigators are all learning from the data you submit. But without your data, that discovery stops.

What does the future hold?

There's a plethora of features and functions on the long-term roadmap. We plan to introduce league tables illustrating the volume of community submissions and, more importantly, their quality. We hope to introduce a private, invite-only communication channel, on a platform of the community's choosing, for those serious about increasing their threat-hunting skills. The list goes on.

But first and foremost, we need to ensure that you are getting value out of your contributions, and to do this, we need to keep receiving them. Thanks for your patience on this journey; we appreciate it. And please keep the feedback, and the data, coming.