The European Marketplace
for Language Data
Share. Connect. Benefit.
Premium multilingual EU datasets for AI training. 24 official languages.
Full compliance with EU laws and European values.
For AI Companies
Train multilingual models on verified, legally compliant EU data. Deontic annotations tell you what the law requires, permits, or prohibits.
For Enterprises
Real-time feeds on EU legislation and procurement. Never miss a regulatory change or tender opportunity.
For Researchers
Sentence-aligned parallel corpora, 4-star verified terminology, and structured knowledge graphs for NLP research.
Data Products
Live subscriptions for real-time compliance. One-time datasets for AI training.
EUR-Lex Legal Feed
EURLEX24
EU legislation with deontic classification. Know what the law requires, permits, or prohibits.
- 160,000+ documents across 21 domains
- 24 EU official languages
- 4× daily updates
- Deontic modality: prohibition, obligation, permission, freedom
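One way to picture the four deontic categories listed above is as an enumeration attached to each provision. The sketch below is a minimal Python illustration, not the feed's actual schema: the `AnnotatedProvision` record, its field names, the `filter_by_modality` helper, and the sample sentences are all hypothetical placeholders.

```python
from dataclasses import dataclass
from enum import Enum


class DeonticModality(Enum):
    """The four categories named on this page."""
    PROHIBITION = "prohibition"
    OBLIGATION = "obligation"
    PERMISSION = "permission"
    FREEDOM = "freedom"


@dataclass(frozen=True)
class AnnotatedProvision:
    """Hypothetical record layout; the real feed format may differ."""
    document_id: str
    language: str  # language code, e.g. "en"
    text: str
    modality: DeonticModality


def filter_by_modality(provisions, modality):
    """Return only the provisions tagged with the given modality."""
    return [p for p in provisions if p.modality is modality]


# Invented placeholder sentences, not taken from any EUR-Lex document.
SAMPLE = [
    AnnotatedProvision("example-1", "en", "Operators shall keep records.",
                       DeonticModality.OBLIGATION),
    AnnotatedProvision("example-2", "en", "Operators shall not share data.",
                       DeonticModality.PROHIBITION),
    AnnotatedProvision("example-3", "en", "Member States may grant exemptions.",
                       DeonticModality.PERMISSION),
]

obligations = filter_by_modality(SAMPLE, DeonticModality.OBLIGATION)
print([p.document_id for p in obligations])
```

A compliance pipeline built along these lines could, for instance, surface only prohibitions and obligations for review while treating permissions as informational.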
TED Procurement Feed
TED24
EU public procurement tenders with opportunity scoring. €700B+ annual EU procurement market.
- 500,000+ tenders per year
- 24 EU official languages
- Real-time updates
- Opportunity score: AI-ranked relevance
IATE Terminology
IATE24
4-star verified EU terminology. Gold standard for translation AI training.
- 4.4 million term entries
- 24 EU official languages
- 4-star EU verified quality
- Essential for multilingual NMT
DGT Translation Memory
DGT24
Official EU translations from the European Commission. Highest-quality human translations.
- 10+ million aligned segments
- 24 EU official languages
- Human-translated by EU professionals
- Gold standard for translation models
JRC-Acquis Corpus
JRC22
Sentence-aligned EU legislation. The benchmark for MT research.
- 8+ million aligned segments
- 22 EU languages (compiled before Croatia's accession)
- Sentence-aligned pairs
- Research-grade quality
Wikidata EU
WIKI24
Structured knowledge graph with EU language labels. Knowledge grounding for LLMs.
- 10+ million entities
- 24 EU language labels
- Structured relations
- Entity linking for NER
Pricing
Bloomberg-style pricing: premium data for AI companies and enterprises.
Per-Domain Subscriptions
Subscribe to the EuroVoc domains that matter to your business
| Tier | Domains | Annual |
|---|---|---|
| Premium | Finance, Law, European Union | €29,990/domain |
| High Value | Energy, Environment, Trade, Competition | €19,990/domain |
| Standard | Transport, Agriculture, Employment, Industry | €9,990/domain |
| Specialized | Economics, Science, Education, Politics | €4,990/domain |
| All 21 Domains | Complete EUR-Lex coverage | €99,990 |
Bundle discounts: 3 domains (15% off) · 5 domains (25% off) · 10 domains (35% off)
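The per-domain prices and bundle discounts above can be combined into a quick estimate. The Python sketch below rests on assumptions the page does not confirm: that each discount applies to the summed list price of the selected domains, and that a threshold covers that many domains or more (so 4 domains would get the 3-domain rate). The function names are illustrative, and the flat "All 21 Domains" price is not modelled.

```python
# Annual list price per domain in EUR, copied from the tier table above.
TIER_PRICE_EUR = {
    "Premium": 29_990,
    "High Value": 19_990,
    "Standard": 9_990,
    "Specialized": 4_990,
}

# (minimum number of domains, discount in percent), largest threshold first.
# Assumption: a threshold applies to that many domains or more.
BUNDLE_DISCOUNTS = [(10, 35), (5, 25), (3, 15)]


def discount_percent(n_domains: int) -> int:
    """Return the bundle discount for a given number of domains."""
    for minimum, percent in BUNDLE_DISCOUNTS:
        if n_domains >= minimum:
            return percent
    return 0


def bundle_price_eur(tiers: list[str]) -> float:
    """Estimate the annual price for one domain per entry in `tiers`.

    Assumption: the discount applies to the summed list price.
    """
    list_price = sum(TIER_PRICE_EUR[tier] for tier in tiers)
    percent = discount_percent(len(tiers))
    return list_price * (100 - percent) / 100


print(bundle_price_eur(["Premium", "Premium", "Premium"]))
```

Under these assumptions, three Premium domains list at €89,970 and come to €76,474.50 after the 15% discount. Actual quotes may be calculated differently.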
Training Corpora
IATE, DGT, JRC-Acquis, Wikidata EU
| Tier | Content | License | Price |
|---|---|---|---|
| Evaluation | Sample subset | 30-day trial | €0 |
| Starter | Subset + 3 domains | Internal use | €4,999–14,999 |
| Complete | Full corpus | AI training | €24,999–49,999 |
| Enterprise | Full + updates | Unlimited | €49,999–99,999 |
Enterprise Bundles
Complete packages for AI companies
| Bundle | Includes | Price |
|---|---|---|
| Translation AI Starter | IATE + DGT + JRC (starter tiers) | €24,999 |
| Translation AI Complete | All 4 training corpora (complete) | €119,999 |
| Legal AI Enterprise | EURLEX training + IATE + WIKI | €299,999 |
| Complete EU Language Data | All 6 products, Enterprise tier, annual updates | €499,999 |
Free Samples
Try before you buy. Evaluate quality and format with sample datasets.
EUR-Lex: Chips Act
EU Chips Act (Regulation 2023/1781) in 24 languages with deontic classification.
Download (2 MB)
LDS Connector
Official participant in the European Language Data Space.
Secure B2B Data Exchange
Pauhu is an approved participant in the European Language Data Space. Our connector enables secure data exchange with a full audit trail, commercial transactions, and regulatory compliance.
- Endpoint: connector.pauhu.eu
- Protocol: LDS v2.0.0
- Location: Helsinki, Finland (EU jurisdiction)
- Central Catalogue: Products visible in the LDS public catalogue
Query Catalogue
    curl -X POST \
      -H "Content-Type: application/json" \
      https://connector.pauhu.eu/api/v2/catalog/request \
      -d '{
        "@context": {
          "edc": "https://w3id.org/edc/v0.0.1/ns/"
        }
      }'
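For teams that prefer to script the catalogue query, the curl request above can be mirrored in Python using only the standard library. This is a minimal sketch: the endpoint, header, and body are copied from the curl example, while the function names, the timeout, and the assumption that the connector replies with JSON are ours. Any authentication the connector may require is not shown on this page and is not included.

```python
import json
import urllib.request

# Endpoint taken from the curl example above.
CATALOG_URL = "https://connector.pauhu.eu/api/v2/catalog/request"


def build_catalog_request() -> urllib.request.Request:
    """Build the same POST request as the curl example, without sending it."""
    body = {"@context": {"edc": "https://w3id.org/edc/v0.0.1/ns/"}}
    return urllib.request.Request(
        CATALOG_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def query_catalog(timeout: float = 30.0) -> dict:
    """Send the request and parse the reply.

    Assumption: the connector answers with a JSON document.
    """
    with urllib.request.urlopen(build_catalog_request(), timeout=timeout) as resp:
        return json.load(resp)
```

Calling `query_catalog()` performs the network request. `build_catalog_request()` is kept separate so the payload can be inspected or logged before anything is sent.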
Ready to access EU language data?
Download free samples or contact us for enterprise licensing and LDS connector access.