matchyc
[The year's end is approaching!] Benchmark for evaluating LLMs on Chinese kinship term inference (中文亲属关系, "Chinese kinship relations"). Given a relation chain (e.g., "my father's elder brother"), a model must output the correct address term (e.g., 伯父). Scoring uses LLM-as-Judge; supported providers are SiliconFlow, OpenRouter, OpenAI, and Gemini.
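To make the task concrete, here is a minimal sketch of mapping a relation chain to an address term. It is not the benchmark's code or data: the names `KINSHIP_TERMS`, `parse_chain`, and `address_term` are hypothetical, the table holds only three illustrative entries, and it uses a plain dictionary lookup rather than the LLM-as-Judge scoring the description mentions.

```python
from typing import Optional, Tuple

# Hypothetical toy table for illustration only; not the benchmark's dataset.
KINSHIP_TERMS = {
    ("father", "elder brother"): "伯父",
    ("father", "younger brother"): "叔父",
    ("mother", "brother"): "舅舅",
}


def parse_chain(chain: str) -> Tuple[str, ...]:
    """Split a chain like "my father's elder brother" into its relation steps."""
    text = chain.strip().lower()
    if text.startswith("my "):
        text = text[3:]
    # Each possessive "'s " separates one relation step from the next.
    return tuple(part.strip() for part in text.split("'s ") if part.strip())


def address_term(chain: str) -> Optional[str]:
    """Return the address term for a chain, or None if the chain is not in the table."""
    return KINSHIP_TERMS.get(parse_chain(chain))


if __name__ == "__main__":
    print(address_term("my father's elder brother"))
```

A fixed table like this only covers chains listed in advance; the benchmark instead asks a model to infer the term, presumably because real chains can be long and vary by region.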