NLP tool that scrapes text from a corpus of PDF files, embeds the sentences in that text, and finds the sentences most semantically similar to a given search query.
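The embed-and-search step can be illustrated with a minimal sketch. It is not the repository's code. The `embed`, `cosine`, and `search` names are hypothetical, and the bag-of-words vector is a toy stand-in for whatever sentence-embedding model the tool actually uses, which this page does not name. Only the ranking idea carries over: embed the query, embed each sentence, and sort by cosine similarity.

```python
import math
import re
from collections import Counter


def embed(sentence):
    """Toy embedding (assumption): a bag-of-words count vector.

    A real tool would call a sentence-embedding model here.
    """
    return Counter(re.findall(r"[a-z0-9]+", sentence.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(query, sentences, top_k=3):
    """Return the top_k (score, sentence) pairs, most similar first."""
    q = embed(query)
    scored = [(cosine(q, embed(s)), s) for s in sentences]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]
```

Calling `search("prison population", sentences, top_k=1)` on a list of scraped sentences returns the single best match together with its similarity score. Swapping `embed` for a learned model would let the search match paraphrases rather than only shared words.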
Stars: 37
Forks: 5
Watchers: 37
Open Issues: 3
52 commits
1f8a58a Merge pull request #6 from moj-analytical-services/removing_files
db14f44 add option to copy scraped pdfs to specified folder
87cac52 Merge pull request #5 from moj-analytical-services/move_unscraped_file_to_specific_folder
6ead86a add functionality to copy unsuccessfully scraped file to specified folder
f4a47f1 add try except for dealing with occasional PDFs failing to parse
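The behaviour described in the most recent commits (catching PDFs that fail to parse and copying them to a chosen folder) might look roughly like the sketch below. It is an illustration under assumptions, not the repository's implementation: `scrape_corpus`, `extract_text`, and `failed_dir` are hypothetical names, and the PDF parser is passed in as a callable because this page does not say which parsing library the project uses.

```python
import shutil
from pathlib import Path


def scrape_corpus(pdf_dir, extract_text, failed_dir=None):
    """Extract text from every PDF in pdf_dir.

    extract_text is a caller-supplied function (hypothetical) that takes a
    Path and returns the document text, raising an exception on failure.
    Files that fail to parse are recorded and, if failed_dir is given,
    copied there for later inspection.
    """
    texts = {}
    failed = []
    for pdf_path in sorted(Path(pdf_dir).glob("*.pdf")):
        try:
            texts[pdf_path.name] = extract_text(pdf_path)
        except Exception:
            # One bad PDF should not abort the whole run.
            failed.append(pdf_path.name)
            if failed_dir is not None:
                Path(failed_dir).mkdir(parents=True, exist_ok=True)
                shutil.copy(pdf_path, Path(failed_dir) / pdf_path.name)
    return texts, failed
```

Returning the list of failed file names alongside the extracted text keeps failures visible to the caller instead of silently dropping them.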