A RAG agent for assisting with literature review related to bioRxiv articles.
Stars: 0
Forks: 0
Watchers: 0
Open Issues: 0
9 commits
9289b26 feat: increase pyspark session memory, add chunking logic and caching handling for improved efficiency for large scale preprocessing
92f70ba fix: add fix for semantic chunking udf keyword bug by using positional args instead of keyword
1e27e15 fix: add more nuanced newline handling (remove mid-sentence occurrences indicating arbitrary newlines, preserve structural ones for paragraphs etc.)
5036cfb fix: use proper column name for previewing output after adding chunking logic (chunk_text)
f3c83d1 feat: add text chunking logic with langchain and udf to preprocessing workflow
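
The newline handling described in commit 1e27e15 could look roughly like the sketch below. It is only an illustration, not the repository's code: the function name `normalize_newlines` is hypothetical, and it assumes that a blank line marks a structural (paragraph) break while any single newline is an arbitrary mid-sentence break.

```python
import re

# Assumption: one or more blank lines separate paragraphs (structural newlines).
_PARAGRAPH_BREAK = re.compile(r"\n\s*\n")
# Assumption: any remaining single newline is an arbitrary mid-sentence break.
_SINGLE_NEWLINE = re.compile(r"\s*\n\s*")


def normalize_newlines(text: str) -> str:
    """Join lines broken mid-sentence and keep paragraph breaks.

    Hypothetical helper for illustration; the rules used in the
    repository may differ.
    """
    # Split into paragraphs on blank lines.
    paragraphs = _PARAGRAPH_BREAK.split(text.strip())
    # Inside each paragraph, replace single newlines with one space.
    cleaned = [_SINGLE_NEWLINE.sub(" ", p).strip() for p in paragraphs]
    # Rejoin non-empty paragraphs with exactly one blank line between them.
    return "\n\n".join(p for p in cleaned if p)


if __name__ == "__main__":
    raw = "Cells were\ncultured for\n24 h.\n\nResults follow."
    print(normalize_newlines(raw))
```

A function of this shape can be applied to each document before chunking, so that chunk boundaries are less likely to fall on spurious line breaks.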