The idea of having them send an e-mail to an address containing their IP is clever, however you need to authenticate that the person who sent the e-mail is either somebody who queried your site, or somebody that got the address from somebody who queried your site or else you could just figure out how to generate that base64 yourself and impersonate somebody else’s IP address which could have catastrophic results if you then fed these IPs into something like a block list and suddenly you’ve blocked Microsoft/Office 365. To be fair, I doubt anybody is going to try and reverse engineer one person’s code to then figure out how to impersonate who sent spam, but if this became a widely distributed program you could just pull off Github then it would be more concerning.
A couple ways to solve this:
Sign the information before encoding it in Base64 so you can verify it came from your site and wasn’t just spoofed. This has the upside of being stateless since you don’t need to keep a record of every e-mail you’ve generated but comes with the disadvantage of spending CPU time signing the text which could be exploited as a DDoS.
Spit out a random e-mail address and record which e-mail address was given to each IP. Presumably you wouldn’t hold on to this list forever since IPs change owners frequently and so an IP that was malicious 1 month ago could be used by a completely different person now and so you can trim this list down once a month to avoid wasting disk space. You’d probably also want to keep some amount of these requests in memory (maybe 10Mb or so) to avoid ruining your IOPS.
All this said, I think your time is better spent with the using unique e-mail aliases as the author suggested but with 2 changes: 1) use aliases which are not guessable to prevent somebody from making it look like somebody else was hacked (e.g. me+googlecom@ gets compromised, but the spammer catches on and sends from me+microsoftcom@ instead to throw off the scent) and 2) don’t use me+chickenjockey@, use chickenjockey@ or else the spammer can just strip “+chickenjockey” from the address to get the real e-mail address.
Spit out a random e-mail address and record which e-mail address was given to each IP.
The author mentions it’s a violation of GDPR to record visitors’ IP addresses. I’m not sure that’s correct, but even so, it could be possible to make a custom encoding of literally every ipv4 address through some kind of lookup table with 256 entries, and just string together 4 of those random words to represent the entire 32-bit address space, such that “correct horse battery staple” corresponds to 192.168.1.100 or whatever.
The idea of having them send an e-mail to an address containing their IP is clever, however you need to authenticate that the person who sent the e-mail is either somebody who queried your site, or somebody that got the address from somebody who queried your site or else you could just figure out how to generate that base64 yourself and impersonate somebody else’s IP address which could have catastrophic results if you then fed these IPs into something like a block list and suddenly you’ve blocked Microsoft/Office 365. To be fair, I doubt anybody is going to try and reverse engineer one person’s code to then figure out how to impersonate who sent spam, but if this became a widely distributed program you could just pull off Github then it would be more concerning.
A couple ways to solve this:
All this said, I think your time is better spent with the using unique e-mail aliases as the author suggested but with 2 changes: 1) use aliases which are not guessable to prevent somebody from making it look like somebody else was hacked (e.g. me+googlecom@ gets compromised, but the spammer catches on and sends from me+microsoftcom@ instead to throw off the scent) and 2) don’t use me+chickenjockey@, use chickenjockey@ or else the spammer can just strip “+chickenjockey” from the address to get the real e-mail address.
The author mentions it’s a violation of GDPR to record visitors’ IP addresses. I’m not sure that’s correct, but even so, it could be possible to make a custom encoding of literally every ipv4 address through some kind of lookup table with 256 entries, and just string together 4 of those random words to represent the entire 32-bit address space, such that “correct horse battery staple” corresponds to 192.168.1.100 or whatever.