From 21.3 Million to 245 Million: How Wells Fargo's Multi-Model AI Architecture Drove Explosive Growth in Customer Interactions
The Multi-Model Revolution: From Wells Fargo's 245 Million Interactions to Humiris' Game-Changing Innovation
What happens when a 172-year-old bank thinks like a tech startup?
While the tech headlines buzz with the latest OpenAI release or Gemini update, something much more profound is happening behind the scenes at America's largest enterprises.
In a stunning demonstration of AI at scale, Wells Fargo's AI assistant processed 245.4 million customer interactions in 2024 alone, an 11.5x increase from the previous year. Even more remarkable? They did it without ever exposing sensitive customer data to external language models.
This isn't just incremental improvement. It's a paradigm shift in how AI systems should be architected.
The Secret? Stop Thinking About Models and Start Thinking About Orchestration
Wells Fargo CIO Chintan Mehta revealed their breakthrough approach in a recent interview: "We're poly-model and poly-cloud."
While most organizations obsess over which single AI model to adopt, Wells Fargo built something far more sophisticated: a layered intelligence system where:
Speech is transcribed locally
A small language model screens for personal information
Google's Flash 2.0 determines user intent
Internal systems handle all sensitive data processing
No customer data ever touches the external AI models
The result? An AI assistant that helps customers pay bills, transfer funds, and manage accounts, all while maintaining bank-grade security.
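To make the layering concrete, here is a minimal sketch of that kind of pipeline. It is our illustrative assumption, not Wells Fargo's actual code: the function names, PII patterns, and the keyword-based intent stub are all hypothetical, and in a real system the intent step would call a hosted model. The key property it demonstrates is that only sanitized text ever crosses the local boundary.

```python
import re

# Hypothetical layered pipeline: a local PII screen runs before anything
# reaches an external model, so only redacted text is sent out.
PII_PATTERNS = [
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "[SSN]"),      # US SSN shape
    (re.compile(r"\b\d{13,16}\b"), "[CARD]"),             # card-like number
    (re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"), "[EMAIL]"),  # email address
]

def screen_pii(text: str) -> str:
    """Layer 1 (local): redact personal information before any external call."""
    for pattern, placeholder in PII_PATTERNS:
        text = pattern.sub(placeholder, text)
    return text

def classify_intent(sanitized: str) -> str:
    """Layer 2 (external): stand-in for a hosted intent model.
    Only sanitized text is ever passed to this layer."""
    lowered = sanitized.lower()
    if "pay" in lowered:
        return "pay_bill"
    if "transfer" in lowered:
        return "transfer_funds"
    return "general_inquiry"

def handle_request(transcript: str) -> dict:
    sanitized = screen_pii(transcript)   # local screening
    intent = classify_intent(sanitized)  # external intent detection
    # Layer 3 (internal): downstream systems act on the intent using the
    # original data, which never left the local boundary.
    return {"intent": intent, "sent_externally": sanitized}

result = handle_request("Pay my bill from card 4111111111111111")
print(result["intent"])           # pay_bill
print(result["sent_externally"])  # Pay my bill from card [CARD]
```

The design point is the ordering, not the specific regexes: the screening layer sits in front of every external call, so the security guarantee holds no matter which vendor model handles intent.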
Three Lessons Every Business Should Learn From Wells Fargo's AI Success
1. The "Best Model" Debate is Missing the Point
"The performance delta between the top models is tiny," notes Mehta. The real question isn't which model is best, but how they're orchestrated into effective pipelines.
Wells Fargo uses different models for different tasks:
Gemini for customer interactions
Claude and OpenAI for coding
Llama for certain internal processes
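In code, that kind of task-based routing can be as simple as a lookup table with a fallback. The routing table below is a hypothetical sketch that mirrors the article's examples; the model identifiers and task names are our assumptions, not Wells Fargo's configuration.

```python
# Hypothetical task-to-model routing table, in the spirit of the
# assignments described above.
MODEL_ROUTES = {
    "customer_interaction": "gemini-flash",
    "coding": "claude",
    "internal_batch": "llama",
}

DEFAULT_MODEL = "gemini-flash"

def route(task_type: str) -> str:
    """Pick a model by task type, falling back to a sensible default."""
    return MODEL_ROUTES.get(task_type, DEFAULT_MODEL)

print(route("coding"))        # claude
print(route("unknown_task"))  # gemini-flash
```

Keeping routing in one declarative table is what makes the architecture flexible: swapping in a newly released model is a one-line change rather than a rewrite.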
2. Context Windows Matter More Than You Think
One area where model differences still matter: context window size. Wells Fargo leverages Gemini 2.5 Pro's massive 1M-token capacity for certain applications, with Mehta noting it "absolutely killed it" for handling unstructured data without extensive preprocessing.
3. Multi-Agent, Multi-Model Systems Are the Future
Perhaps most exciting is Wells Fargo's move toward autonomous systems. They recently deployed a network of specialized AI agents to re-underwrite 15 years of archived loan documents, a task that would have required armies of human analysts just months ago.
The Wells Fargo case study reveals that the future belongs not to those with the biggest models, but to those who master the art of AI orchestration, combining multiple specialized models into secure, scalable systems.
This multi-model approach offers several advantages:
Enhanced security through layered processing
Optimized performance by matching models to specific tasks
Scalability for handling massive transaction volumes
Flexibility to incorporate new models as they emerge
At Humiris, we're not just observing this trend; we're leading it. Our infrastructure is purpose-built to create the most effective agentic models by combining the strengths of multiple specialized AI systems:
For Code: Our mix-models understand context, generate efficient solutions, and debug with human-like reasoning, accelerating development cycles while maintaining quality.
For Finance: Like Wells Fargo, we've engineered systems that deliver intelligence without compromising security, enabling financial institutions to automate complex workflows while maintaining regulatory compliance.
For HealthTech: Our specialized orchestration approaches ensure patient data remains protected while delivering insights that improve care outcomes and operational efficiency.
We believe the future isn’t about one model that can do everything.
It’s about the right mixture of intelligence working together.
This April 15, we’re releasing Codiris, our first multimodal, post-trained multi-model engineered specifically for complex code workflows.
Built from the ground up to handle parallel and sequential tasks simultaneously, Codiris unlocks a new level of performance for teams building across AI, software, and systems.
🚀 Generate up to 10,000 lines of code
⚡ In under 100 seconds
🧠 With reasoning that spans logic, memory, and structure
This isn’t just speed.
It’s a shift in what’s possible when multimodality meets code reasoning at scale.
You can try our previous general model in Cursor or VS Code.