A C++ implementation of TinyLlama inference on CPU.
Stars: 14
Forks: 1
26 commits
bec061d Remove -ffast-math option.
cf1dc2b Bug fixes
c10cd3c Bug fix.
097826b Remove unportable glibc_unlikely macro.
ab0068f Add colab link in README.
ec145a0 Bug fixes.
8c8f40e
9a195c5 Add tinyllama origin links.
1f5fd38 Add AVX q4_q8 dot product implementation.
c17ff40 Add 4-Bit support.
69623e5
1084e4e Refactor and merge FP16 and Q8 ops.
4255673
1108fdc Improved Quantisation block layout structure on disk and memory
a267e16 Refactor ops.
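
The commit log mentions 4-bit support and a q4_q8 dot product, but the repository's actual block layout and AVX kernel are not shown on this page. As a rough illustration of the general idea only, the following scalar sketch multiplies 4-bit quantised blocks by 8-bit quantised blocks. The block size of 32, the struct names `BlockQ4` and `BlockQ8`, the nibble packing order, and the helper functions are all assumptions made for this example, not the project's code.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>

// Hypothetical block layout for illustration; not taken from the repository.
constexpr std::size_t kBlockSize = 32;

struct BlockQ8 {
    float scale;                  // dequantised value = scale * q[i]
    std::int8_t q[kBlockSize];    // signed 8-bit values in [-127, 127]
};

struct BlockQ4 {
    float scale;                            // dequantised value = scale * (nibble - 8)
    std::uint8_t packed[kBlockSize / 2];    // two 4-bit values per byte, low nibble first
};

// Largest absolute value in one block of kBlockSize floats.
static float abs_max(const float* x) {
    float amax = 0.0f;
    for (std::size_t i = 0; i < kBlockSize; ++i) amax = std::max(amax, std::fabs(x[i]));
    return amax;
}

// Quantise kBlockSize floats to 8 bits with one shared scale.
BlockQ8 quantize_q8(const float* x) {
    BlockQ8 b{};
    b.scale = abs_max(x) / 127.0f;
    const float inv = b.scale > 0.0f ? 1.0f / b.scale : 0.0f;
    for (std::size_t i = 0; i < kBlockSize; ++i) {
        long v = std::lround(x[i] * inv);
        b.q[i] = static_cast<std::int8_t>(std::min(127L, std::max(-127L, v)));
    }
    return b;
}

// Quantise kBlockSize floats to 4 bits (stored with an offset of 8).
BlockQ4 quantize_q4(const float* x) {
    BlockQ4 b{};
    b.scale = abs_max(x) / 7.0f;
    const float inv = b.scale > 0.0f ? 1.0f / b.scale : 0.0f;
    for (std::size_t i = 0; i < kBlockSize; i += 2) {
        long lo = std::min(7L, std::max(-8L, std::lround(x[i] * inv))) + 8;
        long hi = std::min(7L, std::max(-8L, std::lround(x[i + 1] * inv))) + 8;
        b.packed[i / 2] = static_cast<std::uint8_t>(lo | (hi << 4));
    }
    return b;
}

// Scalar reference dot product: accumulate in integers per block,
// then apply both block scales once.
float dot_q4_q8(const BlockQ4* a, const BlockQ8* b, std::size_t nblocks) {
    assert(a != nullptr && b != nullptr);
    float sum = 0.0f;
    for (std::size_t blk = 0; blk < nblocks; ++blk) {
        std::int32_t acc = 0;
        for (std::size_t j = 0; j < kBlockSize / 2; ++j) {
            const int lo = (a[blk].packed[j] & 0x0F) - 8;
            const int hi = (a[blk].packed[j] >> 4) - 8;
            acc += lo * b[blk].q[2 * j] + hi * b[blk].q[2 * j + 1];
        }
        sum += a[blk].scale * b[blk].scale * static_cast<float>(acc);
    }
    return sum;
}
```

A vectorised kernel would typically keep the same structure (integer accumulation inside a block, one floating-point scale multiply per block) and replace the inner loop with SIMD instructions. A scalar version like this one is mainly useful as a correctness reference.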