Private AI vs. Cloud AI: What Mid-Market IT Decision-Makers Need to Know
Cloud AI sends your company data to US servers. Private AI keeps it in Germany. A comparison of costs, compliance, and control.
Your legal team signed off on the OpenAI data processing agreement. Checked the box. Then someone actually read the retention policy: OpenAI holds API inputs for 30 days by default for abuse monitoring. That data — your customer queries, your internal documents, your financial summaries — sat on servers in Iowa for a month.
The private AI vs cloud AI decision comes down to three things: who controls your data, what compliance exposure you carry, and whether your infrastructure costs are predictable three years from now.
Cloud AI: The Real Picture
OpenAI, Google Vertex, Azure OpenAI — fast to start and genuinely capable. You get world-class models with no hardware to manage. For a proof of concept or an internal tool with no sensitive data, cloud AI is hard to beat.
The problems appear once your data is sensitive or your usage scales.
Data residency. Every prompt you send to the OpenAI API crosses the Atlantic. OpenAI processes it on US infrastructure. For German companies handling employee data, client files, or anything touching financial records, this creates direct exposure under GDPR's third-country transfer rules (Chapter V, Articles 44-49). Standard contractual clauses under Article 46 exist, but "we have a contract" is not the same as "the data never left Germany."
Retention you do not control. Default API retention is 30 days. Enterprise agreements can reduce this — though you are negotiating with a vendor who has their own reasons to keep data. Every retention policy change they make, and they do make them, affects you without your input.
Per-token costs that compound. GPT-4o runs roughly $5 per million input tokens. A mid-market company processing 10,000 internal documents per month, each 3,000 words, generates approximately 60 million tokens in that ingestion alone — German text tokenizes at roughly two tokens per word. That is $300 just for the initial indexing run, before a single employee asks a question. Scale that to daily use by 100 employees and the monthly bill becomes structurally unpredictable.
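That arithmetic fits in a back-of-the-envelope function. The tokens-per-word ratio of 2 is an assumption for German text (English runs closer to 1.3), and the $5-per-million figure is the input price quoted above:

```python
def ingestion_cost_usd(docs: int, words_per_doc: int,
                       tokens_per_word: float,
                       usd_per_million_tokens: float) -> float:
    """Cost of pushing a document corpus through a per-token API once."""
    total_tokens = docs * words_per_doc * tokens_per_word
    return total_tokens / 1_000_000 * usd_per_million_tokens

# 10,000 documents x 3,000 words, ~2 tokens per German word, $5/M input tokens
print(ingestion_cost_usd(10_000, 3_000, 2.0, 5.0))  # -> 300.0
```

Change any one input — say, re-indexing weekly instead of monthly — and the output scales linearly, which is exactly why per-token bills drift.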
Vendor lock-in. Your RAG architecture, your prompt engineering, your integrations — all built around one vendor's API. When OpenAI changes their pricing (they have, multiple times) or deprecates a model (they do, on 6-month cycles), you rebuild or pay whatever they charge.
On-Premise AI: The Honest Tradeoffs
Private AI on your own infrastructure solves the data control problem. The costs are real, though, and worth laying out directly.
Infrastructure. A production-ready private AI deployment on Hetzner dedicated hardware in Nuremberg or Falkenstein runs €800-2,000 per month for compute, depending on GPU configuration. Add PostgreSQL hosting, object storage, and monitoring, and total infrastructure lands at €1,200-2,500 per month. Fixed. Predictable.
Compare that to cloud AI at scale: a company with 100 active AI users can reach €3,000-5,000 per month in API costs with no ceiling. The crossover point is usually around 10 million tokens per month.
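The break-even arithmetic itself is trivial: fixed monthly cost divided by the blended per-million-token price. The hard input is the blended price — a RAG query bills for retrieved context and output tokens, not just the user's question, so the effective rate sits well above the raw model price. The figures below are hypothetical inputs for illustration, not measured values:

```python
def break_even_million_tokens(fixed_cost_per_month: float,
                              blended_price_per_million: float) -> float:
    """Monthly volume (in millions of tokens) at which fixed
    infrastructure cost matches per-token API spend.
    Both arguments must be in the same currency."""
    return fixed_cost_per_month / blended_price_per_million

# Hypothetical: EUR 1,200/month infrastructure vs an assumed effective
# rate of EUR 40 per million user-facing tokens (raw model price
# inflated by retrieval context and output-token overhead).
print(break_even_million_tokens(1200.0, 40.0))  # -> 30.0
```

Plug in your own infrastructure quote and a blended rate measured from a pilot, and the crossover for your workload falls out directly.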
Model capability. Self-hosted AI does not mean weak AI. Mistral's models — hosted in France, EU-resident by default — approach GPT-4-class performance on many enterprise tasks at the top of the range, and even the compact Mistral 7B handles document Q&A, data extraction, and structured output without sending a token outside European infrastructure. For RAG deployments, answer quality depends far more on your retrieval layer than on raw model power — a point most cloud AI vendors have little incentive to make.
Deployment complexity. Running Docker containers on Hetzner with Caddy as a reverse proxy and automatic TLS is not a weekend project. A production deployment needs HNSW vector indexes in PostgreSQL (via the pgvector extension) for fast similarity search, per-client schema isolation so one tenant's data cannot surface in another's query results, and structured JSON logging for production observability. Built from scratch, that is 4-6 weeks of engineering.
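One way to sketch the schema-plus-index layer, assuming PostgreSQL with the pgvector extension (version 0.5.0 or later for HNSW support) — the schema name, table layout, embedding dimension, and index parameters here are illustrative, not a prescribed deployment:

```python
def tenant_ddl(schema: str, embedding_dim: int = 1024) -> list[str]:
    """Generate DDL for one tenant: its own schema, a documents table
    with a pgvector column, and an HNSW index for cosine similarity."""
    if not schema.isidentifier():
        raise ValueError(f"unsafe schema name: {schema!r}")
    return [
        f"CREATE SCHEMA IF NOT EXISTS {schema};",
        f"CREATE TABLE {schema}.documents ("
        f" id bigserial PRIMARY KEY,"
        f" content text NOT NULL,"
        f" embedding vector({embedding_dim}) NOT NULL);",
        # m and ef_construction are HNSW tuning knobs (pgvector defaults shown)
        f"CREATE INDEX ON {schema}.documents"
        f" USING hnsw (embedding vector_cosine_ops)"
        f" WITH (m = 16, ef_construction = 64);",
    ]

for stmt in tenant_ddl("client_acme"):
    print(stmt)
```

Each tenant's queries then run scoped to its own schema, which is what makes cross-tenant leakage structurally impossible rather than merely discouraged.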
The Multi-Tenant Problem Most Comparisons Skip
Both cloud and private AI share a problem that rarely appears in vendor comparisons: multi-tenant contamination.
If you run multiple business units or clients on the same AI system, what prevents the HR department's query from surfacing content from the finance team's document store? With cloud AI, you depend entirely on the vendor's data separation architecture. You have no visibility into how OpenAI separates API customers at the infrastructure level.
With private AI built on separate PostgreSQL schemas per client — scoped vector search, API keys stored only as SHA-256 hashes, no cross-schema queries possible — data isolation is structural and auditable. You can show your DPA auditor the database schema. You can prove the separation exists. That is a different category of compliance confidence than "the vendor says it is isolated."
For German companies that handle client data across departments or serve multiple business entities, this is not a niche concern. It is the reason several of our customers switched away from cloud AI after their first internal data governance review.
Which Setup Fits Your Company
Cloud AI makes sense when you have no sensitive data in your AI workflows, you are validating a use case before committing to infrastructure, or your token usage is genuinely low — under 5 million per month.
Private AI on European infrastructure is the right call when you handle employee data, client files, financial records, or legal documents. When you have 20 or more employees using AI tools daily. When GDPR compliance needs to survive a DPA audit rather than just a vendor agreement review. And when your monthly token usage exceeds roughly 10 million — the point where fixed infrastructure costs consistently beat per-token pricing.
There is a middle path: Mistral AI via their EU-hosted API solves the data residency problem without requiring you to manage infrastructure. You keep the per-token model, but the data stays in France. A reasonable option for companies not yet ready to run their own stack.
See private AI running on your own data — documents indexed in Germany, queries answered in German, costs fixed at a number you can budget. Book a 30-minute demo and we will run it live on sample data from your industry.