The global narrative says everything is moving to the cloud. In enterprise projects across Latin America (LATAM), we increasingly see the opposite: on-prem and air-gapped deployments are gaining ground again.
1. Fragmented regulatory landscape
Each country has its own data-protection regime: Brazil's LGPD, Mexico's Federal Law on the Protection of Personal Data Held by Private Parties, Argentina's Law 25.326, Colombia's Law 1581, Peru's Law 29733. Each sets different criteria for international transfers, third-party processing and breach-notification obligations.
Moving data to an AI provider in another jurisdiction triggers long legal discussions. Keeping data where it lives cuts those discussions short.
2. Higher-than-expected API costs
Per-token pricing looks cheap until you multiply it by real volume. When an agent handles 50K queries a day with long RAG prompts, commercial APIs quickly become expensive.
An open-source model on your own GPUs has a cost that is mostly fixed: you pay for the hardware, power and operations whether you serve ten queries or ten thousand. Past a certain volume, on-prem wins by a wide margin.
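To make "past a certain volume" concrete, here is a minimal break-even sketch. Every number in it (token counts per query, USD-per-million-token prices, the monthly on-prem budget) is a hypothetical placeholder, not a real quote, and the names `Workload`, `ApiPricing` and `breakeven_queries_per_day` are illustrative. It assumes API spend scales linearly with tokens and that the on-prem budget is a flat monthly figure covering amortized GPUs, power and operations.

```python
from dataclasses import dataclass

# All prices and volumes here are hypothetical placeholders, not vendor quotes.


@dataclass(frozen=True)
class Workload:
    queries_per_day: int
    input_tokens: int   # per query; long RAG prompts dominate this number
    output_tokens: int  # per query


@dataclass(frozen=True)
class ApiPricing:
    usd_per_m_input: float   # USD per million input tokens
    usd_per_m_output: float  # USD per million output tokens


def api_cost_per_query(w: Workload, p: ApiPricing) -> float:
    """Variable cost of a single query on a pay-per-token API."""
    return (w.input_tokens * p.usd_per_m_input
            + w.output_tokens * p.usd_per_m_output) / 1_000_000


def monthly_api_cost(w: Workload, p: ApiPricing, days: int = 30) -> float:
    """API spend grows linearly with daily volume."""
    return api_cost_per_query(w, p) * w.queries_per_day * days


def breakeven_queries_per_day(w: Workload, p: ApiPricing,
                              onprem_monthly_usd: float, days: int = 30) -> float:
    """Daily volume above which a flat on-prem budget is cheaper than the API.

    Assumes the on-prem hardware can actually serve that volume; in practice
    capacity grows in steps (another GPU node), not continuously.
    """
    return onprem_monthly_usd / (api_cost_per_query(w, p) * days)


if __name__ == "__main__":
    w = Workload(queries_per_day=50_000, input_tokens=6_000, output_tokens=400)
    p = ApiPricing(usd_per_m_input=3.0, usd_per_m_output=15.0)
    onprem = 12_000.0  # hypothetical amortized GPUs + power + ops, per month
    print(f"API per month:     ${monthly_api_cost(w, p):,.0f}")
    print(f"On-prem per month: ${onprem:,.0f}")
    print(f"Break-even:        {breakeven_queries_per_day(w, p, onprem):,.0f} queries/day")
```

With these made-up inputs the API side comes to about $36,000 a month against a $12,000 on-prem budget, and the break-even sits near 16,700 queries a day, well below the 50K in the example. Plug in your own prices and token counts; the shape of the result (linear API cost against a flat budget) is the point, not the specific figures.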
3. Vendor risk
When your product depends on a third-party API, you absorb their price changes, their terms-of-service changes and their outages. In enterprise B2B, that dependency is hard to defend in an audit.
What it doesn’t mean
On-prem doesn't mean legacy. Modern practices such as observability, IaC, CI/CD and canary rollouts apply just the same. What changes is that the GPU sits in your own rack instead of in a public cloud.
How we mix it in practice
In most implementations we end up with an open-source model on on-prem GPUs serving most of the traffic, and an external state-of-the-art (SOTA) model reserved for the cases where the extra cost pays off (complex reasoning, long-form writing). The routing between the two is code too.
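A minimal sketch of what that routing code might look like, assuming each request already carries a task label and a flag saying whether it contains personal data. The labels, the `Request` fields, the budget check and the outage fallback are all hypothetical choices for illustration, not a description of any specific production router; they tie the rule set back to the three reasons above (regulation, cost, vendor risk).

```python
from dataclasses import dataclass
from enum import Enum


class Backend(Enum):
    ONPREM_OSS = "onprem-oss"        # open-source model on our own GPUs
    EXTERNAL_SOTA = "external-sota"  # commercial API, used sparingly


@dataclass(frozen=True)
class Request:
    task: str                     # hypothetical label set upstream, e.g. "faq"
    contains_personal_data: bool  # hypothetical flag from a PII classifier


# Task types where the external model's extra cost is assumed to pay off.
ESCALATE_TASKS = frozenset({"complex_reasoning", "long_form_writing"})


def route(req: Request, external_budget_left_usd: float,
          external_available: bool = True) -> Backend:
    """Pick a backend; default to on-prem unless escalation is justified."""
    # Regulation: personal data never leaves the premises, whatever the task.
    if req.contains_personal_data:
        return Backend.ONPREM_OSS
    # Vendor risk: during an external outage, everything stays on-prem.
    if not external_available:
        return Backend.ONPREM_OSS
    # Cost: escalate only the task types that justify it, while budget remains.
    if req.task in ESCALATE_TASKS and external_budget_left_usd > 0:
        return Backend.EXTERNAL_SOTA
    return Backend.ONPREM_OSS


if __name__ == "__main__":
    print(route(Request("faq", False), external_budget_left_usd=500.0))
    print(route(Request("complex_reasoning", False), external_budget_left_usd=500.0))
```

Keeping the rules in a plain function like this means they can be unit-tested, reviewed in a pull request and rolled out with the same CI/CD and canary practices as the rest of the stack. A real router would likely replace the hard-coded task labels with a classifier, but the decision order (compliance first, availability second, cost last) is the part worth keeping explicit.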