Live · 2026

MAGIBU RESEARCH GROUP / İSTANBUL · TR

Building foundation models for AI sovereignty.

Data sovereignty is national sovereignty.

Magibu turns Turkish foundation and embedding models into secure, measurable AI systems that run inside your organization. We invite every organization that values its data to join this journey.

embeddingmagibu-200m · open model
TR-MMLU · 6,200+ questions
Magibu Q3 · live
dev.magibu.ai · API

Magibu Q3 · Foundation Model ✦ TR-MMLU · 6,200+ questions ✦ embeddingmagibu-200m ✦ Turkish Morphological Tokenizer ✦ TR-MTEB · 26 datasets ✦ Open source

Token efficiency

Fewer tokens mean lower cost and lower energy.

When Turkish text is represented closer to the structure of the language, the same meaning can travel through fewer pieces. Fewer pieces translate directly into lower inference cost and lower energy use, and they make secure systems that run inside the organization more practical.

Fewer tokens

A tokenizer approach that treats Turkish suffixes and meaning units more carefully.

Lower cost

Less processing for the same text and more predictable API/infrastructure cost.

Lower energy

Shorter context and less compute create a measurable energy advantage.

Data stays inside

Efficient models make on-prem and private cloud deployments more practical.
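
As a rough illustration of the chain above, the sketch below counts the pieces of one Turkish word under two segmentations and turns token counts into a cost estimate. Both segmentations, the price, and the function names are hypothetical values chosen for illustration; they are not output from the Magibu tokenizer or from any published price list.

```python
# Hypothetical segmentations of "evlerimizden" ("from our houses").
# Neither list is real tokenizer output; both are illustrative assumptions.
MORPHOLOGICAL_PIECES = ["ev", "ler", "imiz", "den"]
GENERIC_SUBWORD_PIECES = ["ev", "le", "rim", "iz", "de", "n"]


def token_savings(baseline_tokens: int, candidate_tokens: int) -> float:
    """Fraction of tokens saved by the candidate relative to the baseline."""
    if baseline_tokens <= 0:
        raise ValueError("baseline_tokens must be positive")
    return 1.0 - candidate_tokens / baseline_tokens


def estimated_cost(tokens: int, price_per_million_tokens: float) -> float:
    """Cost of processing `tokens` at a flat per-million-token price."""
    return tokens / 1_000_000 * price_per_million_tokens


if __name__ == "__main__":
    baseline = len(GENERIC_SUBWORD_PIECES)
    candidate = len(MORPHOLOGICAL_PIECES)
    print(f"baseline tokens:  {baseline}")
    print(f"candidate tokens: {candidate}")
    print(f"savings: {token_savings(baseline, candidate):.1%}")
    # The price below is a placeholder, not a real quote.
    print(f"cost of 2M tokens at 0.50 per million: {estimated_cost(2_000_000, 0.50):.2f}")
```

Under a flat per-token price, the cost saving equals the token saving, which is why the comparison is worth running on your own sentences rather than on a single example word.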

Try your own sentence

Turkish tokenizer comparison

Open demo ↗

If the demo does not load, open the comparison on Hugging Face.

Three arms, one mission

Where do you want to start?

We run Magibu across three arms: measurable products for institutions, an academy that teaches the work, and the open-source community that produces it.

Enterprise

Enterprise AI

Measure which model performs best on your own data — a measurable path from pilot to production.

Plan a Retrieval Audit →

Academy

Magibu Academy

Training in LLMs, embeddings and AI architecture — contributing to open source from week one.

See cohorts →

Community

Open-Source Community

Contribute to open benchmarks, models and tokenizer work; follow the weekly newsletter.

Join the community →

§ 01 Live Demos

Try what we have shipped.

Test our models, tokenizers, and embeddings in live demos. Every link points to an open-source release or a published product.

MODEL

Magibu Q3

Our foundation model for Turkish text generation and understanding. Try it in the live chat interface.

Start chatting

BENCHMARK

TR-MMLU

Open TR-MMLU list with 6,200+ questions across 57 domains. Compare Turkish models side by side.

Open full list

TOKENIZER

Turkish Tiktokenizer

Morphological tokenizer built for Turkish. Compare suffix handling and token efficiency live.

Open demo

DEMO

embeddingmagibu-200m

Turkish-first embedding model. Live semantic similarity demo.

Open demo

BENCHMARK

TR-MTEB Scoreboard

Open embedding leaderboard across 26 datasets and 6 tasks. Compare models side by side.

Scoreboard
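
A semantic similarity demo of this kind typically compares embedding vectors with cosine similarity. The following is a minimal sketch, assuming toy 3-dimensional vectors in place of real model output; the vectors, labels, and function names are illustrative and are not taken from embeddingmagibu-200m.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_similarity(
    query: Sequence[float], candidates: Dict[str, Sequence[float]]
) -> List[Tuple[str, float]]:
    """Return (label, score) pairs sorted from most to least similar."""
    scored = [(label, cosine_similarity(query, vec)) for label, vec in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Toy vectors standing in for sentence embeddings (assumption).
    query_vec = [1.0, 0.0, 0.0]
    candidate_vecs = {
        "close sentence": [0.9, 0.1, 0.0],
        "unrelated sentence": [0.0, 0.0, 1.0],
    }
    for label, score in rank_by_similarity(query_vec, candidate_vecs):
        print(f"{label}: {score:.3f}")
```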

§ 02 Products

Retrieval Platform: three layers.

A full stack from embedding to AI system to in-house deployment. Take one layer, or run all three together.

I Live Demo

Magibu Embed API

An OpenAI-compatible, high-performance embedding API optimized for Turkish and underrepresented languages. Superior semantic representation with long-context support.

api.magibu.ai/embeddings

API Platform ↗

II Pilot

Magibu Search Kit

Production-ready retrieval infrastructure for AI applications. Automated document ingestion, semantic chunking, vector database connectors, and query evaluation tools.

+ Qdrant · pgvector · Weaviate

III Enterprise

Magibu Private AI

An isolated AI architecture running fully on-premises or within your private cloud. Domain-adapted search, source-grounded answers, SSO/LDAP integration, and security audit logs.

Docker / Kubernetes / On-Prem

IV Live Demo

Magibu Q3 Foundation

Our foundation model optimized for Turkish text generation and comprehension. Integrated with enterprise security standards during pilot deployments.

chat.magibu.ai

Chat with Magibu Q3 ↗
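
Because the Embed API is described as OpenAI-compatible, a client would presumably send the familiar `model` and `input` fields and read vectors back from a `data` list. The sketch below only builds the request body and parses a response of that shape; it sends nothing over the network. The `https://` scheme, the use of `embeddingmagibu-200m` as the model identifier, and the helper names are assumptions, so check the API Platform documentation before relying on them.

```python
import json
from typing import Any, Dict, List, Sequence

# Assumption: the host/path shown on the page, served over HTTPS.
EMBEDDINGS_URL = "https://api.magibu.ai/embeddings"
# Assumption: the open model's name doubles as the API model identifier.
MODEL_ID = "embeddingmagibu-200m"


def build_embedding_request(texts: Sequence[str], model: str = MODEL_ID) -> Dict[str, Any]:
    """Build an OpenAI-style embeddings request body."""
    return {"model": model, "input": list(texts)}


def parse_embedding_response(payload: Dict[str, Any]) -> List[List[float]]:
    """Extract vectors from an OpenAI-style response, ordered by input index."""
    items = sorted(payload["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]


if __name__ == "__main__":
    body = build_embedding_request(["Merhaba dünya", "Veri içeride kalır"])
    print("POST", EMBEDDINGS_URL)
    print(json.dumps(body, ensure_ascii=False, indent=2))

    # Hand-written stand-in for a server response; values are placeholders.
    fake_response = {
        "data": [
            {"index": 1, "embedding": [0.3, 0.4]},
            {"index": 0, "embedding": [0.1, 0.2]},
        ]
    }
    print(parse_embedding_response(fake_response))
```

Sorting by `index` keeps each vector aligned with the text that produced it even if the response lists items out of order.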

§ 03 Entry Product

Measure first, then deploy.

Magibu Retrieval Audit is a 2-week measurement package that comes before a pilot. Together with your team, we compare models and architectures on your own data.

Magibu Retrieval Audit · 2 weeks

"Let's measure which model works better on your documents; then deploy a secure in-house AI system."

This is not a sales pitch but a way of working. You can't be sure of progress in a system you haven't measured, so we set direction with metrics like recall, precision, and MRR, not guesswork. Before you buy a large system, we measure on your own data. Most organizations skip this step and come back months later. We start here.

Audit · 2 weeks · Fixed fee · 1 department

AI Pilot · 4 weeks · 1–3 data sources · working demo

Private Deployment · 8–12 weeks · On-prem · SLA · SSO/audit

Data Sampling and Analysis

We select a representative subset of your documents together. Data stays inside your environment.

User Test Scenarios

30–100 real user questions with expert-labeled correct passages.

Model & Architecture Benchmark

Magibu, OpenAI, Cohere, Voyage, and E5 measured on the same data. Chunking strategies compared side by side.

Comprehensive Metrics Report

recall@5, precision@10, MRR, nDCG@10, and latency. Top 5 wins + 5 critical failures with case studies.

Topology Recommendation

One-page technical and financial rationale for model, chunking, reranker, vector DB, and deployment topology.

Roadmap & Decision

A clear decision: continue to a pilot or stop. The audit delivers value on its own and is not a prerequisite for a pilot.
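
The retrieval metrics named in the Comprehensive Metrics Report can be sketched as follows. This is a minimal version using the standard textbook definitions with binary relevance (a passage is either labeled correct or not); the function names and the toy ranking are illustrative assumptions, not Magibu's evaluation code.

```python
import math
from typing import Iterable, List, Sequence, Set, Tuple


def recall_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Share of relevant passages that appear in the top k results."""
    if not relevant:
        return 0.0
    return len(set(ranked[:k]) & relevant) / len(relevant)


def precision_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Share of the top k slots filled by relevant passages (divides by k)."""
    return sum(1 for doc in ranked[:k] if doc in relevant) / k


def reciprocal_rank(ranked: Sequence[str], relevant: Set[str]) -> float:
    """1 / rank of the first relevant passage, or 0.0 if none is retrieved."""
    for position, doc in enumerate(ranked, start=1):
        if doc in relevant:
            return 1.0 / position
    return 0.0


def mean_reciprocal_rank(queries: Iterable[Tuple[Sequence[str], Set[str]]]) -> float:
    """MRR: reciprocal rank averaged over (ranking, relevant set) pairs."""
    scores: List[float] = [reciprocal_rank(r, rel) for r, rel in queries]
    return sum(scores) / len(scores) if scores else 0.0


def ndcg_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """nDCG@k with binary gains and a log2 position discount."""
    dcg = sum(
        1.0 / math.log2(position + 1)
        for position, doc in enumerate(ranked[:k], start=1)
        if doc in relevant
    )
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(position + 1) for position in range(1, ideal_hits + 1))
    return dcg / idcg if idcg > 0 else 0.0


if __name__ == "__main__":
    # Toy ranking for one question; passage ids are made up.
    ranking = ["d3", "d1", "d7", "d2", "d9"]
    labeled_correct = {"d1", "d2"}
    print("recall@5    ", recall_at_k(ranking, labeled_correct, 5))
    print("precision@10", precision_at_k(ranking, labeled_correct, 10))
    print("RR          ", reciprocal_rank(ranking, labeled_correct))
    print("nDCG@10     ", round(ndcg_at_k(ranking, labeled_correct, 10), 4))
```

With 30–100 labeled questions, each metric would be averaged over all questions and reported per model and per chunking strategy so the comparisons stay like for like.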

Scientific assurance

Research output turned into product evidence.

Our technology is built on open benchmarks, doctoral research, and peer-reviewed or openly published work. Instead of naming committee members on the homepage, we surface verifiable publications and model outputs.

AI Safety · 2026

Benchmarking Prompt Injection Attacks on LLMs

Applied Sciences · accepted

Open paper ↗

Benchmark · 2024

Setting Standards in Turkish NLP: TR-MMLU for Large Language Model Evaluation

arXiv:2501.00593

Open paper ↗

Tokenization · 2025

Tokenization Standards for Linguistic Integrity: Turkish as a Benchmark

arXiv:2502.07057

Open paper ↗

Embedding · 2026

Adapting Multilingual Embedding Models to Turkish via Cross-Lingual Tokenizer Surgery and Offline Distillation

arXiv:2605.29992

Open paper ↗

Medical LLM · 2025

Healthcare-Focused Turkish Medical LLM

ACM TALLIP

Open paper ↗

§ 04 Vision

Three arms, one evidence culture.

The Company ships systems inside organizations; the Academy teaches the work; the Community grows open measurement and open science.

Company · Commercial

Products behind your firewall.

We build AI systems for security-sensitive organizations, starting with measurement and moving toward private cloud or on-prem deployment.

01 · Retrieval Audit: Model and architecture comparison on your own data
02 · Retrieval Platform: Embed API · Search Kit · Private AI
03 · Private Deployment: On-prem / private cloud · SSO · audit logs
04 · Enterprise Training: Architecture design, data security, and model evaluation

Academy · Training

Knowledge turned into practice.

We turn LLM, embedding, RAG, and evaluation expertise into production-oriented training where participants build instead of only watching.

01 · Builder-first programs: Learning through GitHub and Hugging Face outputs
02 · Enterprise training: AI architecture for technical teams and executives
03 · Certificate discipline: Verifiable achievement and portfolio, not attendance only
04 · Open-source connection: Training outputs connect back to community projects

Community · Open Source

Open measurement and open science.

Our community branch grows open model, tokenizer, dataset, and benchmark work for Turkish and low-resource languages.

01 · Transparent Development: GitHub Issues + PRs · open contribution flow
02 · Open Benchmark: TR-MTEB · TR-MMLU · domain-specific eval kits
03 · Models & Data: Open model and dataset work on Hugging Face
04 · Community Events: Meetups, hackathons, webinars, and the weekly digest

§ 05 Open Source

Built with the community.

Open projects advancing Turkish language technology. Open issues on GitHub, send a PR, or apply to join the team.

Turkish Morphological Tokenizer

Active

A modern tokenizer that splits text into morphological units faithful to Turkish phonetics and can recombine them.

tokenizer · morphology · turkish · benchmark

View project →

Language-Native Embeddings

Active

An open methodology compiling methods and steps for anyone to build efficient tokenizers and embedding models for their own language and domain.

embeddings · tokenizer · methodology · mteb

View project →

§ 06 Our Partners

Building together.

Organizations and communities we collaborate with. Click a logo to visit their site.

Beta Space Studio

Human-centered AI transformation. Our foundational partner for enterprise Work OS, KVKK-compliant AI programs, academy workshops, and applied legal-sector training — building together in the Anthropic Turkey ambassador ecosystem.

Visit ↗

Matematik ve Yapay Zeka Enstitüsü

Non-profit organization on AI engineering, research and development

Visit ↗

Vivax

Our software and technology partner in healthcare technologies

Visit ↗

Announcements

Latest announcements

All announcements →

Event · July 17, 2026

Watch the lessons first, then let's go deeper together live

Complete the Magibu Academy video series, submit your questions, then join us on Sunday, July 19 for code, papers, and hands-on exploration until the topics run out.

§ Contact

Enterprise applications and partnerships.

Apply for a Retrieval Audit, API access, investment, or research collaboration. Our team will respond as soon as possible.

"Data sovereignty is national sovereignty."

→ dev.magibu.ai · Embedding API
→ TR-MTEB · 26 datasets
→ On-prem / Private AI

✦ ✦ ✦

Magibu is a research-driven technology group building AI infrastructure for Turkish and underrepresented languages.

Global models treat Turkish as secondary. We treat it as a primary research and product language.

Measurement sits at the foundation of everything: without measuring, you can't be sure progress is heading the right way. The community produces open measurement and science; the company turns that output into secure, scalable products for organizations.

Three arms, one vision: produce evidence, teach the work, and ship evidence as product.

Data sovereignty is national sovereignty. Under this sentence sit published benchmarks, open models, and live demos.
