Active · Data (JSON/Parquet)

Fine-tune Datasets

A community project gathering high-quality Turkish/English dialogue datasets in a single format to adapt models to a task, language, or persona.

datasets · fine-tuning · turkish · huggingface · community

Vision

Build, with the community, the high-quality, openly licensed fine-tune datasets Turkish language models need to be genuinely useful. Each contributor publishes from their own HuggingFace profile; this repo serves as the standard and index.

Categories

Identity (✅ done), Tool Call, Conversation, Instruction, Structured Output, Math, Coding (📋 open to contribution). Target: 100+ TR + 100+ EN examples per category.

Contribution areas

  • Produce 100+ Turkish + 100+ English examples in any category
  • Extend existing datasets or run quality control
  • Contribute fine-tune notebooks and scripts
  • Build data quality validation tools
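As a starting point for the validation tools mentioned above, here is a minimal sketch of a per-record check plus a coverage report against the 100+ TR / 100+ EN target. The field names `category`, `lang`, and `messages` are hypothetical: the project's actual schema is not specified on this page, so adjust them to the published standard.

```python
from collections import Counter

# Hypothetical record layout; the project's real schema may differ.
REQUIRED_FIELDS = ("category", "lang", "messages")
TARGET_PER_LANG = 100  # target from this page: 100+ TR and 100+ EN per category


def validate_record(record):
    """Return a list of problems found in one record (empty if none)."""
    problems = []
    for field in REQUIRED_FIELDS:
        if field not in record:
            problems.append(f"missing field: {field}")
    if record.get("lang") not in ("tr", "en"):
        problems.append("lang must be 'tr' or 'en'")
    messages = record.get("messages")
    if not isinstance(messages, list) or not messages:
        problems.append("messages must be a non-empty list")
    return problems


def coverage_gaps(records, target=TARGET_PER_LANG):
    """Return {(category, lang): missing_count} for pairs below the target.

    Only records that pass validate_record are counted.
    """
    counts = Counter(
        (r["category"], r["lang"]) for r in records if not validate_record(r)
    )
    categories = {category for category, _ in counts}
    gaps = {}
    for category in sorted(categories):
        for lang in ("tr", "en"):
            have = counts[(category, lang)]
            if have < target:
                gaps[(category, lang)] = target - have
    return gaps
```

Running `coverage_gaps` over a category's records shows how many valid examples are still missing per language, which can help when picking a task to work on.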

Tech stack: Python · HuggingFace Datasets · Parquet

License: CC BY 4.0

I want to join this project

Verify your Google account, fill out the form, then pick a task from the GitHub issue list to get started.

Enterprise pilot, API access, investment, and partnership requests require a verified Google account.
