Hi @MrT69,
May I know how you filter the blacklists and whitelists that are sent to you?
For example: if you whitelist based on a submitted list, do you whitelist the entire website (including any ads, privacy trackers, or malware within it), or only the specific webpage so that it can be visited? The same question applies to blacklisting.
Sorry for the noob question...
This has become quite a complex thing by now. I hope that I can explain it well.
This project started back in 2003. I began with hosts files and filtered out the wrong entries manually. Over time the whitelist has grown to a few gigabytes.
The first thing I do is import the list into my database server. Then I compare it against the existing entries and write the remainder into a separate database. I extract this remainder and double-check it manually. I know a few domains by heart, and I check whether there are still addresses that need to be blocked.
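As a rough illustration of the "compare and keep the rest" step (not the actual database setup described above), here is a minimal Python sketch. It assumes submitted lists are either plain domain lists or hosts-file lines such as `0.0.0.0 example.com`; the function names `normalize` and `new_entries` are made up for this example.

```python
from typing import Iterable, List, Optional, Set


def normalize(line: str) -> Optional[str]:
    """Return the bare domain from a hosts-file or plain-list line, or None."""
    # Drop trailing comments, surrounding whitespace, and upper case.
    line = line.split("#", 1)[0].strip().lower()
    if not line:
        return None
    # Hosts lines look like "0.0.0.0 example.com"; plain lists have one field.
    domain = line.split()[-1].rstrip(".")
    return domain or None


def new_entries(submitted: Iterable[str], known: Set[str]) -> List[str]:
    """Return submitted domains that are not already known, without duplicates."""
    seen = set(known)
    result = []
    for raw in submitted:
        domain = normalize(raw)
        if domain and domain not in seen:
            seen.add(domain)
            result.append(domain)
    return result
```

Only the entries returned by `new_entries` would then need the manual double-check.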
Whitelisting works the same way. I run the entries against my blacklist and check them again for errors. The problem is that I have to check them manually, because I only have limited API access to a few providers.
I need to filter out strange and unknown addresses and check them with, for example, VirusTotal. I would like to have this fully automated, but unlimited API access there costs 6,700 US$ per month. I have already devised an algorithm for a full check, but in addition I would need some high-speed Internet lines, a bunch of servers, and a license for an enterprise database. I have calculated what it would cost to clean up the entire Internet: for this toy I expect roughly 1.3 million EUR per year.
At the moment I only do a DNS check. I take the address and check it against the Google, Cloudflare and Level3 DNS servers. If the problem is broken CSS or videos that are not working, then I need to check it manually. Sometimes that is a job of a few minutes, but sometimes I need to take a deeper look, especially when site operators believe they need to use dynamic server addresses or tricky JavaScript. That is not my favorite language, and sometimes these scripts drive me really crazy.
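A DNS check against several public resolvers could look roughly like the following Python sketch. It is only an illustration under assumptions, not the tooling actually used here: the resolver IPs (8.8.8.8, 1.1.1.1, 4.2.2.2) are the commonly published addresses for Google, Cloudflare and Level3, the function names are invented, and the query is a bare-bones A-record lookup over UDP that does not verify the response ID or handle truncated replies.

```python
import random
import socket
import struct

# Assumed public resolver addresses; adjust to whatever servers you trust.
RESOLVERS = {"Google": "8.8.8.8", "Cloudflare": "1.1.1.1", "Level3": "4.2.2.2"}


def build_query(domain: str, query_id: int) -> bytes:
    """Build a minimal DNS query for an A record (ASCII/punycode names only)."""
    # Header: id, flags (recursion desired), 1 question, no other records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    labels = domain.rstrip(".").split(".")
    qname = b"".join(bytes([len(l)]) + l.encode("ascii") for l in labels)
    return header + qname + b"\x00" + struct.pack("!HH", 1, 1)  # QTYPE=A, QCLASS=IN


def parse_status(response: bytes) -> str:
    """Classify a DNS response by its RCODE and answer count."""
    if len(response) < 12:
        return "error"
    _id, flags, _qdcount, ancount = struct.unpack("!HHHH", response[:8])
    rcode = flags & 0x000F
    if rcode == 3:
        return "nxdomain"
    if rcode != 0:
        return "error"
    return "resolves" if ancount > 0 else "no-answer"


def query_udp(domain: str, server: str, timeout: float = 3.0) -> str:
    """Send one query to one resolver and return its status string."""
    query = build_query(domain, random.randint(0, 0xFFFF))
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(query, (server, 53))
            response, _ = sock.recvfrom(512)
        except OSError:  # includes timeouts
            return "error"
    return parse_status(response)


def check_domain(domain: str, lookup=query_udp) -> dict:
    """Ask every resolver in RESOLVERS about the domain."""
    return {name: lookup(domain, ip) for name, ip in RESOLVERS.items()}
```

Calling `check_domain("example.com")` returns one status per resolver. A domain that all three report as `nxdomain` is probably dead, while disagreement between resolvers is a hint that a closer manual look is needed.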
In addition, I can only use public and freely licensed sources. At the moment I have roughly 40 million addresses on the blacklist. If I could use public sources and my own development, I would be far over 100 million entries. On top of that comes my IP address list, which is roughly 6 GB when extracted.
I could do a lot more, but I am limited technically, financially, and by time, because I have a job, a family, and sometimes I need to sleep.
If I get a list, I check its entries against my DNS. If I see that they are blocked, then I remove them. If it turns out that they are still blocked, then I need to identify the load balancer. But because of the huge amount of feedback I currently receive, I cannot manually check every site. That is why it is sometimes necessary to send the domain again.
Sometimes users add a hash sign at the end of an entry followed by a comment, e.g. that a video or the CSS is not working. This is really helpful.
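To show why such comments help, here is a small Python sketch that splits each submitted line into the domain and the note after the hash sign, and puts annotated entries first for review. The line format `domain # note` and the names `parse_submission` and `triage` are assumptions for this example only.

```python
def parse_submission(line):
    """Split 'domain # note' into (domain, note); note is '' when absent."""
    entry, _, note = line.partition("#")
    return entry.strip().lower(), note.strip()


def triage(lines):
    """Group submissions into those with a note and those without."""
    with_note, without_note = [], []
    for line in lines:
        domain, note = parse_submission(line)
        if not domain:
            continue  # skip blank or comment-only lines
        (with_note if note else without_note).append((domain, note))
    return with_note, without_note
```

An entry like `example.com # Video is not working` then says right away what to look for, instead of requiring a full manual test of the site.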
Is this the answer you expected, or do you want to know more details?