pdf_to_information

Extracts information such as author names and mentioned keywords from PDFs.

'input_pdfs' is the input folder for pdfs

'images' is the folder where page images are saved when using OCR

'main_func.py' is the main file to run (turns the extracted text strings into information)
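
As an illustration of the "string objects to information" step, here is a minimal sketch of finding which known names are mentioned in an extracted text string. The function name `find_mentions` and the keyword list are hypothetical and not taken from 'main_func.py'; the actual script may use a different technique.

```python
import re


def find_mentions(text, keywords):
    """Return the keywords that occur in text as whole words (case-insensitive).

    Hypothetical helper for illustration only; not the repository's own code.
    """
    found = []
    for kw in keywords:
        # Require a non-word character (or string boundary) on both sides
        # so that "Acme" does not match inside "Acmeville".
        pattern = r"(?<!\w)" + re.escape(kw) + r"(?!\w)"
        if re.search(pattern, text, flags=re.IGNORECASE):
            found.append(kw)
    return found
```

Matching against a fixed list is simple and predictable, but it only finds names already on the list.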

'utils/data_prep.py' is the data prep file (pdf to text)

'utils/data_prep_image.py' is the data prep file for images (pdf to text using the images contained in the pdf)

JSON files (load/dump) are also used to pass data between modules
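
A minimal sketch of this kind of JSON hand-off between modules, using only the standard library. The function names and the `{filename: text}` layout are assumptions for illustration; the repository may structure its JSON files differently.

```python
import json
from pathlib import Path


def dump_texts(texts, path):
    """Write a {filename: extracted_text} mapping to a JSON file.

    Assumed layout, for illustration only.
    """
    payload = json.dumps(texts, ensure_ascii=False, indent=2)
    Path(path).write_text(payload, encoding="utf-8")


def load_texts(path):
    """Read the mapping back in the module that consumes it."""
    return json.loads(Path(path).read_text(encoding="utf-8"))
```

With this pattern, the data prep step would call `dump_texts` once and the main script would call `load_texts`, so the two can be run separately.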

The output is a '.csv' file containing the filename, author names, institute, and companies mentioned
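
A minimal sketch of writing such a file with Python's standard `csv` module. The column names, the `write_results` function, and the choice to join multiple values with "; " are assumptions for illustration; the actual output may use different headers.

```python
import csv

# Assumed column names, based on the fields listed above.
FIELDS = ["filename", "authors", "institute", "companies"]


def write_results(rows, path):
    """Write one CSV row per PDF; list values are joined with '; '."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        for row in rows:
            writer.writerow({
                "filename": row["filename"],
                "authors": "; ".join(row.get("authors", [])),
                "institute": row.get("institute", ""),
                "companies": "; ".join(row.get("companies", [])),
            })
```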
