Language Data Space Participant

The European Marketplace for Language Data

Share. Connect. Benefit.
Premium multilingual EU datasets for AI training. 24 official languages.
Full compliance with EU laws and European values.

160K+ Legal documents
4.4M Terminology entries
24 EU languages
6 Data products

For AI Companies

Train multilingual models on verified, legally compliant EU data. Deontic annotations tell you what the law requires, permits, or prohibits.

For Enterprises

Real-time feeds on EU legislation and procurement. Never miss a regulatory change or tender opportunity.

For Researchers

Sentence-aligned parallel corpora, 4-star verified terminology, and structured knowledge graphs for NLP research.

Data Products

Live subscriptions for real-time compliance. One-time datasets for AI training.

Live Subscription

EUR-Lex Legal Feed

EURLEX24

EU legislation with deontic classification. Know what the law requires, permits, or prohibits.

  • 160,000+ documents across 21 domains
  • 24 EU official languages
  • 4× daily updates
  • Deontic modality: prohibition, obligation, permission, freedom
From €4,999/mo · Enterprise: €9,999/mo
Download Sample

Live Subscription

TED Procurement Feed

TED24

EU public procurement tenders with opportunity scoring. €700B+ annual EU procurement market.

  • 500,000+ tenders per year
  • 24 EU official languages
  • Real-time updates
  • Opportunity score: AI-ranked relevance
From €2,999/mo · Enterprise: €7,999/mo
Download Sample

Training Corpus

IATE Terminology

IATE24

4-star verified EU terminology. Gold standard for translation AI training.

  • 4.4 million term entries
  • 24 EU official languages
  • 4-star EU verified quality
  • Essential for multilingual NMT
From €4,999 · Complete: €24,999
Download Sample

Training Corpus

DGT Translation Memory

DGT24

Official EU translations from the European Commission. Highest-quality human translations.

  • 10+ million aligned segments
  • 24 EU official languages
  • Human-translated by EU professionals
  • Gold standard for translation models
From €9,999 · Complete: €49,999
Download Sample

Training Corpus

JRC-Acquis Corpus

JRC22

Sentence-aligned EU legislation. The benchmark for MT research.

  • 8+ million aligned segments
  • 22 EU languages (predates Croatia's 2013 accession)
  • Sentence-aligned pairs
  • Research-grade quality
From €14,999 · Commercial: €49,999
Download Sample

Training Corpus

Wikidata EU

WIKI24

Structured knowledge graph with EU language labels. Knowledge grounding for LLMs.

  • 10+ million entities
  • 24 EU language labels
  • Structured relations
  • Entity linking for NER
From €29,999 · Enterprise: €49,999
Download Sample

Pricing

Bloomberg-style model: premium data subscriptions and licenses for AI companies and enterprises.

Per-Domain Subscriptions

Subscribe to the EuroVoc domains that matter to your business

Tier            Domains                                       Annual
Premium         Finance, Law, European Union                  €29,990/domain
High Value      Energy, Environment, Trade, Competition       €19,990/domain
Standard        Transport, Agriculture, Employment, Industry  €9,990/domain
Specialized     Economics, Science, Education, Politics       €4,990/domain
All 21 Domains  Complete EUR-Lex coverage                     €99,990

Bundle discounts: 3 domains (15% off) · 5 domains (25% off) · 10 domains (35% off)
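As a worked example of the discount arithmetic, the sketch below totals a per-domain subscription in Python. The tier prices and discount percentages come from the table and the line above; everything else is an assumption made for illustration: that a discount applies from its domain count upward (so 4 domains get the 3-domain rate), that it applies to the summed list price even across mixed tiers, and the names `TIER_PRICE`, `discount_percent`, and `annual_total` are hypothetical. Actual quotes may be calculated differently.

```python
from decimal import Decimal

# Annual list price per domain in EUR, copied from the tier table above.
TIER_PRICE = {
    "Premium": Decimal("29990"),
    "High Value": Decimal("19990"),
    "Standard": Decimal("9990"),
    "Specialized": Decimal("4990"),
}

# (minimum number of domains, percent off), largest threshold first.
# Assumption: counts between thresholds get the lower threshold's discount.
BUNDLE_DISCOUNTS = [(10, 35), (5, 25), (3, 15)]


def discount_percent(domain_count: int) -> int:
    """Return the bundle discount (in percent) for a given number of domains."""
    for minimum, percent in BUNDLE_DISCOUNTS:
        if domain_count >= minimum:
            return percent
    return 0


def annual_total(tiers: list[str]) -> Decimal:
    """Sum list prices for one tier name per chosen domain, then apply the discount.

    Assumption: the discount applies to the whole subtotal, even across mixed tiers.
    """
    subtotal = sum((TIER_PRICE[tier] for tier in tiers), Decimal("0"))
    return subtotal * (100 - discount_percent(len(tiers))) / 100


if __name__ == "__main__":
    print(annual_total(["Premium"] * 3))
```

Under these assumptions, three Premium domains come to 3 × €29,990 = €89,970, and 15% off gives €76,474.50 per year.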

Training Corpora

IATE, DGT, JRC-Acquis, Wikidata EU

Tier        Content             License       Price
Evaluation  Sample subset       30-day trial  €0
Starter     Subset + 3 domains  Internal use  €4,999–14,999
Complete    Full corpus         AI training   €24,999–49,999
Enterprise  Full + updates      Unlimited     €49,999–99,999

Enterprise Bundles

Complete packages for AI companies

Bundle                     Includes                                         Price
Translation AI Starter     IATE + DGT + JRC (starter tiers)                 €24,999
Translation AI Complete    All 4 training corpora (complete)                €119,999
Legal AI Enterprise        EURLEX training + IATE + WIKI                    €299,999
Complete EU Language Data  All 6 products, Enterprise tier, annual updates  €499,999

Free Samples

Try before you buy. Evaluate quality and format with sample datasets.

EUR-Lex: Chips Act

EU Chips Act (Regulation (EU) 2023/1781) in 24 languages with deontic classification.

Download (2 MB)

TED: Sample Tenders

100 recent public procurement tenders with opportunity scores.

Download (1 MB)

IATE: 1,000 Terms

High-frequency EU terminology with 4-star quality ratings.

Download (500 KB)

DGT: 10K Segments

Translation memory sample from European Commission translations.

Download (3 MB)

LDS Connector

Official participant in the European Language Data Space.

Secure B2B Data Exchange

Pauhu is an approved participant in the European Language Data Space. Our connector enables secure data exchange with a full audit trail, commercial transactions, and regulatory compliance.

  • Endpoint: connector.pauhu.eu
  • Protocol: LDS v2.0.0
  • Location: Helsinki, Finland (EU jurisdiction)
  • Central Catalogue: Products visible in LDS public catalogue

LDS Documentation | LDS Portal

Query Catalogue

curl -X POST \
  -H "Content-Type: application/json" \
  https://connector.pauhu.eu/api/v2/catalog/request \
  -d '{
    "@context": {
      "edc": "https://w3id.org/edc/v0.0.1/ns/"
    }
  }'
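For teams scripting against the connector rather than using the command line, the same request can be sketched in Python with only the standard library. The URL, header, and JSON body are taken from the curl command above. The function names are hypothetical, and the sketch assumes the connector answers with a JSON document and needs no extra authentication headers, neither of which is confirmed here.

```python
import json
import urllib.request

# Endpoint taken from the curl example above.
CATALOG_URL = "https://connector.pauhu.eu/api/v2/catalog/request"


def build_catalog_request(url: str = CATALOG_URL) -> urllib.request.Request:
    """Build (but do not send) the same POST request as the curl command."""
    body = {"@context": {"edc": "https://w3id.org/edc/v0.0.1/ns/"}}
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_catalog(timeout: float = 30.0) -> dict:
    """Send the request and decode the response.

    Assumption: the response body is JSON. Requires network access and,
    possibly, credentials that this sketch does not supply.
    """
    with urllib.request.urlopen(build_catalog_request(), timeout=timeout) as response:
        return json.load(response)
```

Keeping request construction separate from sending makes it easy to inspect or log the payload before any network call is made.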

Ready to access EU language data?

Download free samples or contact us for enterprise licensing and LDS connector access.