Email harvesting is the automated process of collecting email addresses from the internet without the consent of the people those addresses belong to. It is one of the primary ways spam lists are built, and it runs constantly in the background of the web, invisible to most users.
How email harvesting works
Harvesters are automated programs called bots or spiders. They work similarly to search engine crawlers, systematically visiting websites and scanning their content. But instead of indexing pages, they're looking for email addresses.
Whenever they find a string of text that matches the pattern of an email address (a name, an @ symbol, and a domain name), they collect it and add it to a list. These lists are then sold or used directly for spam campaigns.
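To make the pattern matching concrete, here is a minimal Python sketch of how address-like strings can be pulled out of page text. The regular expression is a simplified approximation chosen for illustration, and the names EMAIL_PATTERN, find_addresses, and sample_page are hypothetical. Real harvesters vary, and the same check is useful for auditing your own pages for exposed addresses.

```python
import re

# Simplified pattern: a local part, "@", then a dotted domain ending in a TLD.
# This is an illustrative approximation, not a full RFC 5322 address parser.
EMAIL_PATTERN = re.compile(
    r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)*\.[A-Za-z]{2,}"
)


def find_addresses(text: str) -> list[str]:
    """Return the unique address-like strings in text, in order of first appearance."""
    seen = set()
    found = []
    for match in EMAIL_PATTERN.finditer(text):
        address = match.group(0).lower()  # normalize case so duplicates collapse
        if address not in seen:
            seen.add(address)
            found.append(address)
    return found


# Hypothetical page content using reserved example domains.
sample_page = """
<p>Contact us at Sales@example.com or support@example.org.</p>
<p>Reach me at jane [at] example [dot] com instead.</p>
"""

print(find_addresses(sample_page))
# ['sales@example.com', 'support@example.org']
```

The third address in the sample is written in a human-readable form, so this simple pattern does not match it.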
Harvesters collect addresses from a wide variety of sources: websites, social media profiles, forum posts, comment sections, online directories, WHOIS domain registration records, and anywhere else email addresses appear in plain text on the web.
Where your email address might be exposed
Many people expose their email addresses without realizing it. Common places include:
- Personal or business websites with contact pages
- Forum posts and comments where you include your email
- Social media profiles with public contact information
- Domain registration records (WHOIS data), if you own a domain
- Online job boards and professional directories
- Academic papers, public documents, and press releases
A single public mention of your email address is enough for a harvesting bot to collect it.
Common harvesting techniques
Beyond simple web crawling, harvesters use several more sophisticated techniques. Dictionary attacks generate thousands of potential email addresses by pairing likely usernames with common domains, then test which ones exist. Other harvesters rely on social engineering to trick users into revealing addresses.
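The candidate-generation step of a dictionary attack is simple, which is why short, predictable addresses are easy to guess. The sketch below only builds the combinations and does not test whether any address exists. The word lists are hypothetical and tiny, and the domains are reserved for documentation.

```python
from itertools import product

# Hypothetical word lists for illustration; real lists are far larger.
COMMON_LOCAL_PARTS = ["info", "sales", "admin", "support"]
DOMAINS = ["example.com", "example.org"]  # reserved documentation domains


def candidate_addresses(local_parts, domains):
    """Yield every local-part/domain combination as a candidate address."""
    for local, domain in product(local_parts, domains):
        yield f"{local}@{domain}"


candidates = list(candidate_addresses(COMMON_LOCAL_PARTS, DOMAINS))
print(len(candidates))   # 8 (4 local parts x 2 domains)
print(candidates[:3])    # ['info@example.com', 'info@example.org', 'sales@example.com']
```

Because the number of candidates grows as the product of the two lists, a less common local part makes an address much less likely to appear in such a list.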
More recently, harvesting has expanded to social media APIs, where publicly available profile data is scraped at scale. Even if you don't post your email publicly on social media, your username and associated data can sometimes be used to infer or locate your address.
How to protect yourself
If you need to display an email address publicly on a website, there are techniques to make it harder for bots to collect. These include displaying the address as an image rather than text, encoding it in JavaScript, or writing it in a human-readable but bot-confusing format like "name [at] domain [dot] com."
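As a rough illustration of the human-readable format, the following Python sketch rewrites an address before it is published. The function name to_readable_form is hypothetical. The substitution only defeats simple pattern matching, since a harvester that anticipates it can reverse it.

```python
def to_readable_form(address: str) -> str:
    """Rewrite 'name@domain.com' as 'name [at] domain [dot] com'.

    Only dots in the domain are replaced; the local part is left as written.
    """
    local, _, domain = address.partition("@")
    return f"{local} [at] {domain.replace('.', ' [dot] ')}"


obfuscated = to_readable_form("jane.doe@example.com")
print(obfuscated)          # jane.doe [at] example [dot] com
print("@" in obfuscated)   # False: no literal @ left for a simple pattern to match
```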
For online signups and registrations, using a disposable email address keeps your real address out of databases that could be scraped or breached. The disposable address collects any confirmation emails you need, and if it ends up harvested, the resulting spam never reaches your real inbox.
If you own a domain, consider using WHOIS privacy protection, which most domain registrars offer, to keep your contact email out of the public registration database.