Here are simple ways to extract email addresses from PDFs, Microsoft Word documents, or text files, with examples for Linux or Cygwin on Windows:
1. Extract Emails from Microsoft Word documents (*.doc):
- Install the catdoc tool with apt-get or yum, or download it for Windows. catdoc reads the legacy binary .doc format only, so .docx files need a different converter.
- Copy all the documents you want to extract emails from into a single folder.
- Run the command below inside that folder:
for i in *.doc; do catdoc "$i" | grep -i -o '[A-Z0-9._%+-]\+@[A-Z0-9.-]\+\.[A-Z]\{2,4\}'; done
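The loop above prints every match, so an address that appears in several documents is listed several times. As a minimal sketch, the pattern can be stored once and the output lowercased and de-duplicated. The names EMAIL_RE and extract_emails are made up for this example and are not part of catdoc or grep:

```shell
# Same pattern as in the command above, stored once for reuse.
EMAIL_RE='[A-Z0-9._%+-]\+@[A-Z0-9.-]\+\.[A-Z]\{2,4\}'

# Hypothetical helper: read text on stdin, print each address once, lowercased.
extract_emails() {
    grep -i -o "$EMAIL_RE" | tr '[:upper:]' '[:lower:]' | sort -u
}

# Demo on inline text, so the sketch runs even without catdoc installed.
printf 'Contact: Alice@Example.com, bob@test.org\nAgain: alice@example.com\n' | extract_emails
```

With catdoc installed, the same helper could be fed from the loop: for i in *.doc; do catdoc "$i"; done | extract_emails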
2. Extract Emails from PDFs:
- Install pdf2txt.py, which ships with PDFMiner, using apt-get or yum, or download it for Windows.
- Copy all the PDFs you want to extract emails from into a single folder.
- Run the command below inside that folder, and the emails will be printed to the console:
for i in *.pdf; do pdf2txt.py "$i" | grep -i -o '[A-Z0-9._%+-]\+@[A-Z0-9.-]\+\.[A-Z]\{2,4\}'; done
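The loop above does not say which file an address came from, and if the folder contains no PDFs the shell passes the literal string *.pdf to the converter. The sketch below is one possible way to handle both. The function name emails_by_file is hypothetical, and the demo uses cat in place of pdf2txt.py so that it runs without PDFMiner:

```shell
# Same pattern as in the commands above.
EMAIL_RE='[A-Z0-9._%+-]\+@[A-Z0-9.-]\+\.[A-Z]\{2,4\}'

# Hypothetical helper: emails_by_file CONVERTER FILE...
# Runs CONVERTER on each file and prints "file: address" for every match.
emails_by_file() {
    local converter=$1
    shift
    local f addr
    for f in "$@"; do
        # An unmatched glob arrives as a literal name such as "*.pdf"; skip it.
        [ -f "$f" ] || continue
        "$converter" "$f" | grep -i -o "$EMAIL_RE" |
            while IFS= read -r addr; do
                printf '%s: %s\n' "$f" "$addr"
            done
    done
}

# Demo with cat as the converter on a temporary text file.
tmp=$(mktemp)
printf 'Write to carol@example.net for details.\n' > "$tmp"
emails_by_file cat "$tmp"
rm -f "$tmp"
```

For real PDFs the call would look like: emails_by_file pdf2txt.py *.pdf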
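3. Extract Emails from text files:
- Use the grep command, available on Linux and Cygwin.
- Copy all the text files into a single folder.
- Run the command below inside that folder, assuming the files end in .txt, and the emails will be printed to the console (-h suppresses the file name that grep otherwise prefixes to each match when searching several files):
grep -i -o -h '[A-Z0-9._%+-]\+@[A-Z0-9.-]\+\.[A-Z]\{2,4\}' *.txt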
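One thing to keep in mind with the pattern used throughout: \{2,4\} caps the top-level domain at four letters, so an address under a longer TLD is cut short. A possible variant, shown here only as a sketch, uses extended syntax (grep -E) and drops the upper bound. The names EMAIL_RE_LONG and extract_emails_long_tld are invented for this example:

```shell
# Hypothetical variant of the pattern: ERE syntax, no upper limit on TLD length.
EMAIL_RE_LONG='[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}'

extract_emails_long_tld() {
    grep -E -i -o "$EMAIL_RE_LONG"
}

# Compare both patterns on an address with a six-letter TLD.
sample='Write to curator@example.museum today.'
# Original pattern: the match stops after four TLD letters.
printf '%s\n' "$sample" | grep -i -o '[A-Z0-9._%+-]\+@[A-Z0-9.-]\+\.[A-Z]\{2,4\}'
# Variant: the whole address is matched.
printf '%s\n' "$sample" | extract_emails_long_tld
```

The same variant works on files as well, for example: grep -E -i -o -h "$EMAIL_RE_LONG" *.txt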
Please leave a comment below if you need any further details.