Found 4,502 repositories (showing 30)
ConardLi
A powerful tool for creating datasets for LLM fine-tuning, RAG, and evaluation
MLGroupJLU
The official GitHub page for the survey paper "A Survey on Evaluation of Large Language Models".
tjunlp-lab
The papers are organized according to our survey "Evaluating Large Language Models: A Comprehensive Survey".
onejune2018
Awesome-LLM-Eval: a curated list of tools, datasets/benchmarks, demos, leaderboards, papers, docs, and models, mainly for the evaluation of foundation LLMs, aimed at probing the technical boundaries of generative AI.
abacaj
Run evaluations of LLMs using the HumanEval benchmark
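For context on what HumanEval-style harnesses report: the standard metric is pass@k, estimated with the unbiased formula from the Codex paper, pass@k = E[1 - C(n-c, k)/C(n, k)], where n samples are generated per problem and c of them pass the tests. A minimal sketch (the function name is mine, not this repo's API):

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: n samples per problem, c of them correct."""
    if n - c < k:
        return 1.0  # every size-k subset contains at least one correct sample
    return 1.0 - comb(n - c, k) / comb(n, k)

# e.g. 200 samples per problem, 37 passing: estimate pass@10
print(pass_at_k(n=200, c=37, k=10))
```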
A curated list of Human Preference Datasets for LLM fine-tuning, RLHF, and eval.
claw-eval
Claw-Eval is an evaluation harness for evaluating LLMs as agents. All tasks are verified by humans.
redotvideo
LLM fine-tuning and eval
rajshah4
Sample notebooks and prompts for LLM evaluation
llm-jp
No description available
swordlidev
A Survey on Benchmarks of Multimodal Large Language Models
Turing-Project
Scenario-based evaluation dataset for LLMs (beta)
jayminban
This project benchmarks 41 open-source large language models across 19 evaluation tasks using the lm-evaluation-harness library.
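lm-evaluation-harness (EleutherAI) is the library this project builds on; a minimal usage sketch of its Python entry point, assuming v0.4+ of the library. The model and tasks shown here are illustrative placeholders, not the project's actual 41-model, 19-task setup:

```python
import lm_eval

# Evaluate a Hugging Face model on two standard tasks.
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=EleutherAI/pythia-160m",
    tasks=["hellaswag", "arc_easy"],
    batch_size=8,
)
print(results["results"])  # per-task metrics, e.g. accuracy
```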
gordicaleksa
Serbian LLM Eval.
Asaf-Yehudai
Top papers related to LLM-based agent evaluation
justplus
An evaluation platform for large language models, supporting multiple evaluation benchmarks, custom datasets, and performance testing. Also supports RAG evaluation based on custom datasets.
cyberark
Simple LLM evaluation using LLM-as-a-judge 👩‍⚖️
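The pattern behind judge-style tools like this is roughly: wrap the candidate answer in a rubric prompt, ask a (usually stronger) model for a score, and parse the reply. A minimal sketch, assuming a hypothetical text-in/text-out client `call_llm` (not cyberark's actual API):

```python
import re

JUDGE_PROMPT = """You are an impartial judge. Rate the answer below from 1 to 10
for correctness and helpfulness. Reply with only the number.

Question: {question}
Answer: {answer}
"""

def judge(question: str, answer: str, call_llm) -> int:
    """Score an answer with a judge model; call_llm is any prompt -> str client."""
    reply = call_llm(JUDGE_PROMPT.format(question=question, answer=answer))
    match = re.search(r"\d+", reply)
    if match is None:
        raise ValueError(f"judge returned no parsable score: {reply!r}")
    return int(match.group())
```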
grigio
llm-eval-simple is a simple LLM evaluation framework with intermediate actions and prompt pattern selection
tuhh-softsec
No description available
OSU-NLP-Group
[ACL'24] Code and data of paper "When is Tree Search Useful for LLM Planning? It Depends on the Discriminator"
mkurman
Fine-tunes a student LLM using teacher feedback for improved reasoning and answer quality. Implements GRPO with teacher-provided evaluations.
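For reference, the step that distinguishes GRPO from PPO is that advantages are computed relative to a group of sampled completions for the same prompt rather than from a learned value network: each reward is z-scored against its group. A sketch of that normalization step only, under my own naming (not mkurman's code):

```python
import statistics

def group_relative_advantages(rewards: list[float]) -> list[float]:
    """GRPO-style advantages: z-score each reward within its sampling group."""
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards) or 1.0  # guard against identical rewards
    return [(r - mean) / std for r in rewards]

# e.g. teacher-provided rewards for 4 completions of one prompt
print(group_relative_advantages([0.2, 0.9, 0.4, 0.9]))
```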
h2oai
Large language model evaluation framework with an Elo leaderboard and A/B testing
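Elo leaderboards of this kind typically update ratings pairwise after each A/B comparison using the standard update R' = R + K(S - E), with expected score E = 1/(1 + 10^((R_opp - R)/400)). A minimal sketch (the K-factor of 32 is a common default, not necessarily h2oai's choice):

```python
def elo_update(r_a: float, r_b: float, score_a: float, k: float = 32.0):
    """One pairwise Elo update; score_a is 1.0 for an A win, 0.5 tie, 0.0 loss."""
    expected_a = 1.0 / (1.0 + 10.0 ** ((r_b - r_a) / 400.0))
    r_a_new = r_a + k * (score_a - expected_a)
    r_b_new = r_b + k * ((1.0 - score_a) - (1.0 - expected_a))
    return r_a_new, r_b_new

# model A (1500) beats model B (1520) in a head-to-head comparison
print(elo_update(1500, 1520, score_a=1.0))
```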
VectorBoard
Open-source embeddings optimisation and evaluation framework for RAG/LLM applications. Documentation at https://docs.vectorboard.ai/introduction
friendliai
No description available
aws-samples
No description available
daekeun-ml
Performs benchmarking on two Korean datasets with minimal time and effort.
flexpa
Benchmarking Large Language Models for FHIR
Code space for 'Evaluation of large language models for discovery of gene set function'
percent4
Using an LLM to evaluate the MMLU dataset.
llm-jp
A lightweight framework for evaluating vision-language models.