Active · Data (JSON/Parquet)
Fine-tune Datasets
A community project gathering high-quality Turkish/English dialogue datasets in a single format to adapt models to a task, language, or persona.
datasets · fine-tuning · turkish · huggingface · community
Vision
Build, with the community, the high-quality, openly licensed fine-tune datasets Turkish language models need to be genuinely useful. Each contributor publishes from their own HuggingFace profile; this repo serves as the standard and index.
Categories
Identity (✅ done); Tool Call, Conversation, Instruction, Structured Output, Math, and Coding (📋 open to contribution). Target: 100+ Turkish and 100+ English examples per category.
Contribution areas
- Produce 100+ Turkish + 100+ English examples in any category
- Extend existing datasets or run quality control
- Contribute fine-tune notebooks and scripts
- Build data quality validation tools
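As a starting point for the validation tools mentioned above, here is a minimal sketch of a checker that flags malformed records and reports how far each category is from the 100+ TR / 100+ EN target. The record layout (`category`, `lang`, `messages` with `role`/`content`) and the category slugs are assumptions made for illustration; this page does not define the project's actual schema, so adjust the field names to match the real standard.

```python
from collections import Counter

# Hypothetical record layout (NOT defined by the project page):
# {"category": str, "lang": "tr" | "en",
#  "messages": [{"role": str, "content": str}, ...]}
# Category slugs below are assumed spellings of the listed categories.
CATEGORIES = ("identity", "tool_call", "conversation", "instruction",
              "structured_output", "math", "coding")
LANGS = ("tr", "en")
TARGET_PER_LANG = 100  # from the stated goal: 100+ TR + 100+ EN per category


def record_errors(record):
    """Return a list of human-readable problems found in one record."""
    errors = []
    if record.get("category") not in CATEGORIES:
        errors.append(f"unknown category: {record.get('category')!r}")
    if record.get("lang") not in LANGS:
        errors.append(f"unknown lang: {record.get('lang')!r}")
    messages = record.get("messages")
    if not isinstance(messages, list) or not messages:
        errors.append("messages must be a non-empty list")
    else:
        for i, msg in enumerate(messages):
            if not isinstance(msg, dict):
                errors.append(f"message {i} is not an object")
                continue
            content = msg.get("content")
            if not isinstance(content, str) or not content.strip():
                errors.append(f"message {i} has empty content")
    return errors


def coverage_gaps(records, target=TARGET_PER_LANG):
    """Map (category, lang) -> number of valid examples still missing."""
    counts = Counter(
        (r["category"], r["lang"]) for r in records if not record_errors(r)
    )
    gaps = {}
    for category in CATEGORIES:
        for lang in LANGS:
            missing = target - counts[(category, lang)]
            if missing > 0:
                gaps[(category, lang)] = missing
    return gaps
```

Calling `coverage_gaps(records)` on a list of loaded records returns only the category/language pairs that are still short of the target, which could feed a contribution checklist. Only records that pass `record_errors` are counted, so malformed examples do not inflate the totals.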
Tech stack: Python · HuggingFace Datasets · Parquet
License: CC BY 4.0
Resources & links
I want to join this project
Verify your Google account, fill out the form, then pick a task from the GitHub issue list to get started.