Converts binary PDF to JSON and text for server-side PDF processing and command-line use. Zero dependencies.
Stars: 2.2k
Forks: 390
Watchers: 2.2k
Open Issues: 74
Commits
b0067d7 fix: add spatial sort in getRawTextContent to ensure reading order (#422)
48b50bf feat: add support for transparent groups, ensure endGroup would merge sub-canvas text/line/etc. back to primary output data. this completes the fix for #418 (#420)
de176e5 fix: issue #418: resolve obj ref before invoking getAll (#418)
c8b372b fix: unify error and exception handling for cli start with invalid in… (#414)
b9d5cb9 maint: prep major release with version bumps for both self and dev dependencies (#413)
b193d9f fix: #355, #361, #319: calculate text block gap and spacewidth from fontMatrix to preserve spaces in both content.txt and json output (#411)
7b05aa9 fix: #385 [3.3.0 BREAKING CHANGE] removed encodeURIComponent and ensure utf8 extraction and output (#410)
5569bf7 fix: #408: fix text block coordinates, add tests (#409)
da9e5d3 refactor: [3.2.2] separate out logger functionality from nodeUtil (#405)
36e9fe6 refactor: add TypeScript types and improve null handling in PDFParser class. Need to address nodeUtil (#403)
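
The reading-order fix in #422 is described only by its commit message. As a rough illustration of what a spatial sort can look like, here is a minimal sketch. It is not the project's implementation: the function name `sortReadingOrder`, the `{ x, y, text }` item shape, the `lineTolerance` parameter, and the assumption that `y` grows downward are all hypothetical.

```javascript
// Hypothetical sketch of a spatial sort for reading order.
// Assumes each item is { x, y, text } with y increasing down the page.
function sortReadingOrder(items, lineTolerance = 0.5) {
  // Order by vertical position first so that lines can be built top to bottom.
  const byY = [...items].sort((a, b) => a.y - b.y);

  // Group items whose y is within lineTolerance of the current line's first item.
  const lines = [];
  for (const item of byY) {
    const last = lines[lines.length - 1];
    if (last && Math.abs(item.y - last.y) <= lineTolerance) {
      last.items.push(item);
    } else {
      lines.push({ y: item.y, items: [item] });
    }
  }

  // Within each line, order left to right, then flatten back to one list.
  return lines.flatMap((line) => line.items.sort((a, b) => a.x - b.x));
}

const sample = [
  { x: 10, y: 5.1, text: "world" },
  { x: 1, y: 9, text: "Second line" },
  { x: 1, y: 5, text: "Hello" },
];
console.log(sortReadingOrder(sample).map((i) => i.text).join(" | "));
// prints "Hello | world | Second line"
```

Grouping into lines before sorting by `x` avoids a tolerance-based comparator, which would not be transitive and could give inconsistent orderings.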