Written on Saturday, May 05, 2007 by Gemini
Since its inception, the front page of Project Honey Pot has promised that we will "Help Stop Spammers Before They Even Get Your Email Address." Today we make good on that promise. Introducing http:BL, a service that allows you to use the data generated by Project Honey Pot in order to keep malicious web robots off your website.
For example, a web administrator could use the http:BL data to establish rules that automatically block known comment spammers, harvesters, and other suspicious visitors from accessing their site and using their bandwidth. The data is provided through the existing DNS system in order to be extremely fast, highly redundant, and very reliable.
The basic http:BL service is free to active members of Project Honey Pot. Users of the Apache 2.0 Web Server can begin taking advantage of the service today using a powerful module that is built directly into the Apache framework. The module is currently in an early beta and Project Honey Pot members can sign up to help with its testing.
In addition, we have published an API outlining the http:BL specifications. We hope and expect more software to be written to take advantage of the http:BL service. If you have an idea, please contact us to talk about how we can make it work.
1. Is http:BL a traditional DNSBL?
No. Traditionally, DNSBLs track the IP addresses of computers that send spam. Mail servers query them and reject known spam-sending servers. We considered publishing our information on spam servers as some sort of DNSBL. For a number of reasons, we decided that we would not do so at this time. If you run a mail server and are looking for a DNSBL, we would suggest those run by Spamhaus or SURBL.
2. If http:BL is not a DNSBL for mail servers, what exactly is it?
Http:BL is similar to a DNSBL but for web traffic rather than mail traffic. This data can be used to stop malicious robots from accessing your web pages. For example, the http:BL system can let you know that a visitor to your website is likely to be a comment spammer. With that information, you can choose to block the visitor from even accessing your site. If the bad guys can't get on, they can't do damage.
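To make the DNSBL comparison concrete, here is a minimal sketch of how a client might build the DNS name it looks up for a visitor. It follows the usual DNSBL convention of reversing the IPv4 octets and prefixing the access key. The zone name `dnsbl.httpbl.org`, the function name, and the example key are assumptions for illustration, so check the published API before relying on them.

```python
import ipaddress

# Assumed zone name; confirm it against the published http:BL API.
HTTPBL_ZONE = "dnsbl.httpbl.org"


def build_query_name(access_key: str, visitor_ip: str, zone: str = HTTPBL_ZONE) -> str:
    """Build a DNSBL-style query name: <key>.<reversed octets>.<zone>."""
    # IPv4Address raises ValueError for anything that is not a valid IPv4 address.
    addr = ipaddress.IPv4Address(visitor_ip)
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{access_key}.{reversed_octets}.{zone}"


if __name__ == "__main__":
    # "examplekey" is a placeholder, not a real access key.
    print(build_query_name("examplekey", "192.0.2.10"))
```

The sketch only constructs the name. Resolving it is an ordinary DNS lookup, which is what lets the service reuse existing DNS caching and redundancy.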
3. How much of a benefit does http:BL provide?
So far the results are stunning. In our internal tests, over the course of several months, email addresses on web pages protected by http:BL received nearly 70% less spam than those not protected. This is a reduction in spam sent to the mail server, not merely a reduction in spam delivered to the end user. This means a real reduction in load on your mail servers. The decrease in spam load also comes with zero risk of false positives. Over the coming months, we will conduct more extensive tests and report on what we find. Please do not hesitate to share your results with us.
4. How can I take advantage of http:BL to protect my web server?
We have authored an Apache module which automatically queries http:BL when visitors access your site. The module is extremely flexible, allowing you to set rules for different types of visitors. For example, you could block search engines from accessing images on your site, strip email addresses from the HTML returned to harvesters, and block comment spammers from accessing any pages at all. The module is still in development and we are finishing up these features, but we expect to have them ready for public consumption very soon. Until then, please consider signing up for the beta and helping us test.
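The per-visitor rules described above can be pictured as a small lookup table. The sketch below is not the Apache module's configuration syntax, which this post does not show. The category names, resource kinds, and `decide` function are hypothetical and only illustrate the idea of mapping a visitor type to an action.

```python
from enum import Enum


class Action(Enum):
    ALLOW = "allow"
    BLOCK = "block"
    STRIP_EMAILS = "strip_emails"


# Hypothetical rule table: (visitor category, resource kind) -> action.
# "*" matches any resource kind for that category.
RULES = {
    ("comment_spammer", "*"): Action.BLOCK,
    ("harvester", "page"): Action.STRIP_EMAILS,
    ("search_engine", "image"): Action.BLOCK,
}


def decide(category: str, resource: str) -> Action:
    """Return the action for a visitor category and resource kind."""
    # Prefer an exact rule, then a wildcard rule, then allow by default.
    for key in ((category, resource), (category, "*")):
        if key in RULES:
            return RULES[key]
    return Action.ALLOW


if __name__ == "__main__":
    print(decide("harvester", "page"))
    print(decide("search_engine", "page"))
```

Defaulting to `ALLOW` keeps unknown visitors unaffected, so a missing rule never locks out ordinary traffic.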
5. I don't run Apache or can't install a module on my web server. Is there some other way to take advantage of http:BL?
We have published an API that allows others to create services that rely on the http:BL data. Have an idea? Drop us a line and we're happy to brainstorm with you.
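As a rough idea of what a service built on the API might do, the sketch below decodes a lookup result. It assumes a listed visitor comes back as an address of the form 127.<days since last activity>.<threat score>.<visitor type>, with the type octet acting as bit flags. That layout, the flag values, and every name here are assumptions not stated in this post, so verify them against the published API.

```python
import socket
from dataclasses import dataclass
from typing import Optional

# Assumed bit flags for the visitor-type octet (not stated in this post).
SUSPICIOUS = 1
HARVESTER = 2
COMMENT_SPAMMER = 4


@dataclass(frozen=True)
class Verdict:
    days_since_last_activity: int
    threat_score: int
    visitor_type: int

    def has(self, flag: int) -> bool:
        """True if the given type flag is set on this visitor."""
        return bool(self.visitor_type & flag)


def parse_response(address: str) -> Optional[Verdict]:
    """Decode an assumed 127.<days>.<threat>.<type> answer, or return None."""
    parts = address.split(".")
    if len(parts) != 4 or parts[0] != "127":
        return None
    try:
        days, threat, visitor_type = (int(p) for p in parts[1:])
    except ValueError:
        return None
    return Verdict(days, threat, visitor_type)


def lookup(query_name: str) -> Optional[Verdict]:
    """Resolve a prepared query name; treat a failed lookup as 'not listed'."""
    try:
        address = socket.gethostbyname(query_name)
    except socket.gaierror:
        return None
    return parse_response(address)
```

Treating a failed lookup as "not listed" means a DNS outage lets visitors through rather than blocking them, which seems the safer default for a public site.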
6. How fast is http:BL?
Http:BL is built on top of the DNS infrastructure in order to be extremely fast, highly redundant, and very reliable. The http:BL Apache module has been running on the Project Honey Pot site for some time now. Our site receives a fairly high level of traffic. We have not seen any noticeable slowdown in the site or significant increase in processing resources. In fact, overall the decrease in traffic generated by malicious web robots seems to have modestly decreased the load on our web servers.
7. I have honey pots on my site, won't http:BL make them ineffective?
It depends on how you implement http:BL. We suggest that you not block traffic to your installed honey pots. There are many ways to accomplish this. For example, the Apache module will allow you to create a virtual honey pot and even route malicious robots directly to it when they try to visit your site.
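One of the many ways to keep honey pots reachable is to exempt their paths before applying any block decision. The sketch below is not how the Apache module does it. The path prefix and function name are hypothetical.

```python
# Hypothetical location of installed honey pot pages on this site.
HONEYPOT_PREFIXES = ("/honeypot/",)


def should_block(path: str, is_malicious: bool) -> bool:
    """Block malicious visitors everywhere except on honey pot pages."""
    # str.startswith accepts a tuple, so several prefixes can be exempted.
    if path.startswith(HONEYPOT_PREFIXES):
        return False
    return is_malicious


if __name__ == "__main__":
    print(should_block("/honeypot/trap.html", is_malicious=True))
    print(should_block("/blog/post.html", is_malicious=True))
```

With this ordering, a known bad robot is still turned away from real pages but can keep feeding data to the honey pot.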
8. Can anyone access http:BL?
In order to access http:BL you need to have an access key. Access keys are granted to active members of Project Honey Pot. The first step in getting an access key is to create an account.
9. Is there any cost to accessing http:BL?
We intend for http:BL to always be a free service for the majority of users. We may decide at a later date to charge a nominal fee for extremely high traffic users or users that require support. If you know that your site will create a large amount of traffic to http:BL, please let us know so we can make sure we best accommodate you.
10. Can I download a copy of the zone file in order to run http:BL from my local DNS server?
We support this for some high-traffic websites. Please contact us if you run a high-traffic website and are interested in doing this.
11. I work for a firewall/router/load balancer/hosting provider/ISP/or whatever company and we would like to build http:BL's protection into our product.
That's great! Stopping malicious web robots at the gateway is a perfect use of http:BL. Please contact us if you are interested in doing this.
12. I have data that I think would be useful to the users of http:BL.
There are a ton of sources of data on malicious web robots. We expect that, over time, we will incorporate as much of it as possible into the http:BL service. If you have access to data on bad bots or other suspicious IPs, please contact us so we can discuss how best we can work together.