How to extract emails from multiple documents…

Here are simple ways to extract email addresses from PDFs, Microsoft Word documents, or text files, with examples for Linux or Cygwin on Windows:

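1. Extract Emails from Microsoft Word documents (*.doc):

  • Install the catdoc tool using apt-get or yum, or download it on Windows. Note that catdoc reads the legacy .doc format, so .docx files need a different converter.
  • Copy all the documents you want to extract emails from into a single folder.
  • Run the command below inside that folder, and the emails will be printed to the console: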

for i in *.doc; do catdoc "$i" | grep -i -o '[A-Z0-9._%+-]\+@[A-Z0-9.-]\+\.[A-Z]\{2,4\}'; done
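The loop above prints every match, so an address that appears in several documents (or in different letter case) shows up repeatedly. As a minimal sketch, assuming GNU grep, the matching step can be wrapped in a small function that lowercases and de-duplicates the results. The name extract_emails is a hypothetical helper, not part of catdoc or grep, and the sample text is made up for illustration.

```shell
# Hypothetical helper (not from any tool above): read text on stdin and
# print each distinct email address once, lowercased and sorted.
extract_emails() {
  grep -i -o '[A-Z0-9._%+-]\+@[A-Z0-9.-]\+\.[A-Z]\{2,4\}' |
    tr '[:upper:]' '[:lower:]' |
    sort -u
}

# Example with inline sample text instead of real documents.
printf 'Contact Bob@Example.com or bob@example.com\nSales: sales@example.org\n' | extract_emails
```

The sample prints bob@example.com and sales@example.org, one per line. With real documents, the same function could be fed by the loop: for i in *.doc; do catdoc "$i"; done | extract_emails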

2. Extract Emails from PDFs:

  • Install the pdf2txt.py tool (part of the PDFMiner package) using apt-get or yum, or download it on Windows.
  • Copy all the PDFs you want to extract emails from into a single folder.
  • Run the command below inside that folder, and the emails will be printed to the console:

for i in *.pdf; do pdf2txt.py "$i" | grep -i -o '[A-Z0-9._%+-]\+@[A-Z0-9.-]\+\.[A-Z]\{2,4\}'; done
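When many files are processed at once, it can help to know which file each address came from. The following is a rough sketch, assuming GNU grep; emails_per_file and PATTERN are hypothetical names introduced here for illustration. The converter command is passed as the first argument, so the same function works with pdf2txt.py, catdoc, or plain cat.

```shell
# Same pattern as in the commands above.
PATTERN='[A-Z0-9._%+-]\+@[A-Z0-9.-]\+\.[A-Z]\{2,4\}'

# Hypothetical helper: $1 is the text converter command,
# the remaining arguments are the files to scan.
emails_per_file() {
  converter=$1
  shift
  for f in "$@"; do
    # Skip anything that is not a regular file, e.g. a literal *.pdf
    # left behind when the glob matched nothing.
    [ -f "$f" ] || continue
    "$converter" "$f" | grep -i -o "$PATTERN" | sort -u |
      while IFS= read -r addr; do
        printf '%s: %s\n' "$f" "$addr"
      done
  done
}
```

For PDFs this would be called as: emails_per_file pdf2txt.py *.pdf
Each output line then has the form "filename: address".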

3. Extract Emails from text files:

  • Use the grep command, available on Linux and in Cygwin.
  • Copy all the text files into a single folder.
  • Run the command below inside that folder, and the emails will be printed to the console:

grep -i -o -h '[A-Z0-9._%+-]\+@[A-Z0-9.-]\+\.[A-Z]\{2,4\}' *.txt
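If the text files are spread across subfolders, copying them into one place is not strictly necessary. A possible variation, assuming GNU grep (for the -r and --include options), searches a whole directory tree and prints each address once. The function name extract_from_tree is hypothetical.

```shell
# Hypothetical helper: search every *.txt file under the given directory
# (default: the current one) and print each distinct address once.
extract_from_tree() {
  grep -r -i -o -h --include='*.txt' \
    '[A-Z0-9._%+-]\+@[A-Z0-9.-]\+\.[A-Z]\{2,4\}' "${1:-.}" |
    sort -u
}
```

To save the result instead of printing it, redirect the output, for example: extract_from_tree . > emails.txt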

Please share your comments below for any further details…
