§ EcosystemOpen Source Ecosystem

A complementary open-source chain.

Large language models produce inefficient tokens in most languages and fail on domain-specific questions. At Magibu we solve this at every link of the chain with open source: from tokenization faithful to a language's morphology, to a methodology for building embeddings in your own language; from high-quality fine-tune data, to tools that connect the model to the outside world.

Language base

Morphological Tokenizer

Split text into morphology-faithful units and recombine them.

↓→

Methodology

Language-Native Embeddings

An open method for building tokenizers + embeddings in your own language.

↓→

Data

Fine-tune Datasets

High-quality open datasets tailored to task and persona.

↓→

Inference tools

LLM Tools

The ability to call the right tool at the right time.

How do I contribute?

01Go to the repo you're interested in on GitHub and review open issues
02Comment on an issue or open a new one
03Fork the repo and create a new branch
04Make your change, test it, and document it
05Open a pull request - explain what and why
06After review, merge - join the contributor list

Open Source Projects

Turkey-focused open R&D. Pick up issues on GitHub or apply to join the team. These contributions are prioritized in future hiring.

Turkish Morphological Tokenizer

Active

A modern tokenizer that splits text into morphological units faithful to Turkish phonetics and can recombine them.

tokenizermorphologyturkishbenchmark

View project →

Language-Native Embeddings

Active

An open methodology compiling methods and steps for anyone to build efficient tokenizers and embedding models for their own language and domain.

embeddingstokenizermethodologymteb

View project →

Fine-tune Datasets

Active

A community project gathering high-quality Turkish/English dialogue datasets in a single format to adapt models to a task, language, or persona.

datasetsfine-tuningturkishhuggingface

View project →

Magibu LLM Tools

Active

Tools and auxiliary systems the model uses at inference time - overcoming model limits by calling the right tool at the right time.

toolstool-callingraginference

View project →

Community

Open science, benchmarks, and community contributions.

Open-Ended Live Stream

A Sunday live stream with no end time: code, questions, papers. Leave a request for the topics you want covered.

Request form →

Magibu AI Weekly

Open-source weekly digest: AI news, papers, models, benchmarks, and underrepresented language updates.

View archive →

GitHub

Benchmark code, eval harness, datasets, and community contributions.

github.com/magibu-ai →