Databricks CustomerLake and the Next Era of Marketing Intelligence

Full disclosure: Neuralift is a Databricks “Built-On” Partner, so I am not looking at this from the outside.

Databricks’ annual DAIS in SF last week was an inflection point in Martech. Not because Databricks is entering the CDP market. In fact, I predicted this exact thing would happen 1.5 years ago. What makes CustomerLake interesting is where Databricks is placing customer intelligence. It’s where we made our bet at Neuralift AI 2 years ago.

The inflection point is the architectural shift that customer intelligence is going to live inside the data warehouse/lakehouse, not outside it.

Inside the data foundation. At the source of truth. Part of governance. Next to models. Consumable to agents. Connected to execution, activation and measurement.

That is the shift. Intelligence and the customer data in the warehouse/lakehouse are now one thing. It has always been Data + AI, as Databricks has aptly named its annual Summit.

This replaces data and intelligence scattered across too many tools. Identity in one system. Loyalty in another place. Ecommerce data somewhere else. Media data in a silo. CRM in another. Product usage somewhere. Support data, store data, transaction data, margin data, all a fragmented mess. The kind of mess AI cleans up.

The CDP emerged a decade ago to solve that problem. It tried to unify customer profiles and make them usable for marketing. That was important. In many ways the CDP was a precursor to what comes next. It recognized that customer data should not be trapped inside applications. It should be organized around the customer. But making good use of that data was still an issue.

Precisely because this data is so important and valuable – likely the most valuable asset any company has – the center of gravity kept moving. From CDPs to Snowflake, Databricks, BigQuery, Azure. This is now where enterprise data ends up. Customer data. Transactional data. Behavioral data. Product data. Marketing data. Journey data. Identity data. Enrichments. 

AI wants it all. AI does not want a narrow slice of the customer. AI wants the whole customer. It wants transactional breadth with behavioral depth. It wants data both disconnected and continuous. It wants history, lots of history. It wants the business context and goals. 

CustomerLake is important for the industry because it is the first real validation of this direction. And it’s a direction taken by the team that built the most respected CDP a decade ago. They understand the problems and know the most powerful marketing and advertising AI will come from customer data in the data warehouse/lakehouse.

So the question now becomes: what intelligence needs to be created? This is the question that will define the winners and losers in the Marketing AI era.

The missing layer

Most marketing systems still treat customer data as something to query.

A dashboard answers a question the business decided to ask. A SQL query retrieves a pattern someone defined. A rules-based segment applies boundaries to customers. A propensity model predicts an outcome the business labeled as important.

All useful. None of it is discovery. The data is not learning anything. It’s just finding what you have determined needs to be looked at.

When you look at the same things the same way over and over, as a person or as a business, it is impossible to grow.

Discovery is finding customer structures, behavioral patterns, value signals, and growth opportunities the business did not already know how to look for.

Discovery is the missing layer of intelligence that comes next.

This is where deep learning neural networks matter. They do not start with a marketer’s definition of a segment or assumption of a problem. They start with data. Purchases, searches, sessions, visits, content consumption, offer response, product affinities, loyalty activity, media interactions, churn signals, value progression, timing, sequence, frequency, intensity.

The model learns from the data. No predefined segments. No expectations about what it should find.

During training, a neural network is exposed to orders of magnitude more customer data than any team of PhD analysts could manage. It makes predictions. It gets things wrong. It adjusts its internal weights. It does this again and again until it has learned a compressed representation of the customer base.
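To make that loop concrete, here is a minimal sketch of the predict, miss, adjust cycle using a tiny linear autoencoder in NumPy. Everything in it is an assumption for illustration: the synthetic "customer" features, the function name train_autoencoder, the two-dimensional latent size, and the learning rate. It is not how any production model, Neuralift's or Databricks', is actually built.

```python
import numpy as np

def train_autoencoder(X, n_latent=2, lr=0.05, epochs=2000, seed=0):
    """Train a linear autoencoder with plain gradient descent (illustrative only)."""
    rng = np.random.default_rng(seed)
    n_features = X.shape[1]
    W_enc = rng.normal(scale=0.1, size=(n_features, n_latent))  # encoder weights
    W_dec = rng.normal(scale=0.1, size=(n_latent, n_features))  # decoder weights
    losses = []
    for _ in range(epochs):
        Z = X @ W_enc        # compressed representation of each customer
        X_hat = Z @ W_dec    # the model's prediction (reconstruction)
        err = X_hat - X      # how wrong the prediction was
        losses.append(float(np.mean(err ** 2)))
        # Gradients of the mean squared error with respect to each weight matrix
        grad_dec = Z.T @ err * (2.0 / err.size)
        grad_enc = X.T @ (err @ W_dec.T) * (2.0 / err.size)
        # Adjust the internal weights, then repeat
        W_dec -= lr * grad_dec
        W_enc -= lr * grad_enc
    return W_enc, W_dec, losses

# Synthetic stand-in for customer features: 300 customers, 6 signals
# driven by 2 hidden behaviors plus noise (made-up data, not real customers).
rng = np.random.default_rng(42)
hidden_behaviors = rng.normal(size=(300, 2))
mixing = rng.normal(size=(2, 6))
X = hidden_behaviors @ mixing + 0.1 * rng.normal(size=(300, 6))
X = (X - X.mean(axis=0)) / X.std(axis=0)  # standardize each signal

W_enc, W_dec, losses = train_autoencoder(X)
Z = X @ W_enc  # one 2-number learned representation per customer
print(f"reconstruction error: first pass {losses[0]:.3f}, last pass {losses[-1]:.3f}")
```

The matrix Z is the toy version of the compressed representation: six observed signals per customer squeezed into two learned numbers. Real systems use far deeper, nonlinear networks, but the cycle is the same.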

That representation is the important part.

It is not just a score. It is not just a segment or an audience. It is the model’s learned understanding of how customers are similar, different, valuable, vulnerable, responsive, under-monetized, over-marketed, and likely to move. It is multi-dimensional, just like your customers. Just like people. Ironically, it is more personal and has more empathy for humans than your SQL query.

It learns the things you didn’t even know to ask.

Customers who look average in a dashboard may be very different in that learned space. Customers who appear unrelated in a CRM may actually be behaviorally adjacent. Customers managed by different channel teams may actually be moving along similar paths.
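A rough sketch of what "behaviorally adjacent" can mean in practice: compare customers by the cosine similarity of their learned vectors rather than by the system that owns them. The three customer labels and their embedding values below are hypothetical toy numbers, not output from any real model, and cosine similarity is just one common choice of distance.

```python
import numpy as np

def cosine_similarity_matrix(E):
    """Pairwise cosine similarity between the rows of E."""
    unit = E / np.linalg.norm(E, axis=1, keepdims=True)
    return unit @ unit.T

# Hypothetical customers owned by different channel teams,
# with made-up 3-dimensional learned embeddings.
customers = ["store_shopper", "app_subscriber", "coupon_hunter"]
embeddings = np.array([
    [0.9, 0.1, 0.4],
    [0.8, 0.2, 0.5],
    [-0.3, 0.9, 0.1],
])

sim = cosine_similarity_matrix(embeddings)
np.fill_diagonal(sim, -np.inf)  # ignore each customer's similarity to itself

# For every customer, find the closest other customer in the learned space
nearest = {
    customers[i]: customers[int(np.argmax(sim[i]))]
    for i in range(len(customers))
}
print(nearest)
```

In this toy setup the store shopper and the app subscriber sit next to each other in the learned space even though a CRM would file them under different channels.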

That is why learning matters.

The warehouse contains the data. The neural network learns the shape of it. Those shapes, those patterns are the actual representation of your different customers. Representations you could never have known without neural networks and the NVIDIA GPUs they run on.

Why this matters now

The imbalance in modern marketing is pretty clear.

Execution has gotten very good. A brand can push audiences to platforms, trigger journeys, personalize messages, suppress customers, run tests, measure outcomes, and optimize campaigns faster than ever.

But before any of that happens, someone still has to decide what is worth doing. Where is the money? Which audience? Which message? Which channel? Which KPI? Which opportunity? How do we prioritize everything we could do?

These needs repeat for every use case. They are not trivial and thus are very costly to fulfill. Today this is largely guesswork.

The execution stack cannot answer those questions on its own. It acts on the instructions it is given by people whose job it is to hit a number. Usually a short-term number. The business can activate customers faster than it can understand which customers, behaviors, and opportunities deserve activation, let alone priority.

Faster isn’t smarter. The era of failing fast is over. We have entered the era of knowing what to do ahead of time.

The first thing one usually learns is that growth is not sitting in obvious segments. It is hidden in combinations of signals. A slightly different purchase sequence. A timing pattern before churn. A content affinity that predicts retention. A loyalty behavior that predicts cross-sell. A group that responds to marketing but erodes margin. A dormant audience that is more recoverable than other lapsed customers. A moderate-value group that behaves like an earlier-stage version of future high-value customers.

No marketer, analyst, agency strategist, or platform algorithm can manually test every possible combination of signals across the full customer base.

Deep learning exists for exactly this kind of problem. It finds the shape before the business names it.

The AI shift is from execution-led marketing to discovery-led growth.

CustomerLake becomes obvious

This is where CustomerLake becomes relevant to the bigger picture.

Databricks is effectively saying customer data, AI models, agents, identity, audiences, activation, and personalization should operate from the lakehouse connected to the warehouse. That now feels inevitable.

For years, marketing lived downstream from enterprise data. Data teams managed the warehouse. Marketing teams worked in CDPs, campaign tools, media platforms, journey systems, and point solutions. Data was copied, synced, transformed, activated, measured, reconciled, and governed across too many systems.

The old Martech stack was built around movement. Move the data. Move the audience. Move the profile. Move the campaign. Move the measurement.

The AI-native stack will be built around learning from the data.

If agents are going to analyze behavior, recommend audiences, build campaigns, personalize experiences, and optimize outcomes, they need access to governed customer context. They need access to the full behavioral field of view. They need models, data, identity, audiences, and activation in a more unified environment that is managed and governed. This is not data that should be handed over to CDP vendors that don’t have public trust centers. 

Agents also create another important distinction. Agents automate work. Deep learning discovery helps decide what work is worth doing.

That difference matters.

It is not enough to make campaign execution more agentic. It is not enough to let marketers create audiences in natural language. It is not enough to automate the movement from audience to activation. These are all workflows. 

The more important question is still upstream.

Which customers, behaviors, segments, and opportunities are actually worth acting on? What is going to move the needle?

CustomerLake validates the infrastructure shift. The next step is the intelligence layer that shares that infrastructure and shows where growth is hiding.

Advertising is where this gets expensive

Advertising is probably the clearest place to see why this matters. For years brands were able to rent a lot of intelligence from media platforms. Cookies, device IDs, third-party data, retargeting pools, lookalikes, black-box optimization. The platforms did a lot of the audience work.

That world is weaker now, so the advantage is moving back into the brand’s own first-party data. But this is where I think people get a little too simplistic. Having first-party data is not the same thing as knowing what to do with it.

A brand can upload a customer list to a platform. That does not mean it is the right audience. It can build a lookalike from purchasers. That does not mean it is the highest-value acquisition seed. It can retarget site visitors. That does not mean the spend is incremental. It can suppress recent buyers. That does not mean it is protecting margin.
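The incrementality point is easy to see with a little arithmetic. The sketch below assumes the brand ran a randomized holdout (a group that was eligible for retargeting but deliberately not shown ads), which is one common way to measure this. The function names and every number are made up for illustration.

```python
def incremental_conversions(treated_conv, treated_n, holdout_conv, holdout_n):
    """Conversions in the treated group beyond what the holdout rate predicts."""
    treated_rate = treated_conv / treated_n
    holdout_rate = holdout_conv / holdout_n
    return (treated_rate - holdout_rate) * treated_n

def cost_per_conversion(spend, conversions):
    """Spend divided by conversions; infinite if there were none to credit."""
    if conversions <= 0:
        return float("inf")
    return spend / conversions

# Hypothetical retargeting campaign: 10,000 visitors saw ads, 10,000 were held out.
spend = 5000.0
treated_conv, treated_n = 500, 10_000   # 5.0% converted with ads
holdout_conv, holdout_n = 450, 10_000   # 4.5% converted with no ads at all

naive_cpa = cost_per_conversion(spend, treated_conv)
extra = incremental_conversions(treated_conv, treated_n, holdout_conv, holdout_n)
incremental_cpa = cost_per_conversion(spend, extra)

print(f"naive cost per conversion: ${naive_cpa:.2f}")
print(f"incremental conversions: {extra:.0f}")
print(f"incremental cost per conversion: ${incremental_cpa:.2f}")
```

With these toy numbers, most of the reported conversions would have happened anyway, so the cost of each conversion the ads actually caused is ten times what the platform report suggests.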

This is no small matter. A media platform may know who clicked. The brand knows who bought, stayed, upgraded, churned, redeemed, visited, watched, subscribed, renewed, complained, returned, referred, and became more valuable over time. That broader customer relationship is the intelligence that should be going into paid media.

The most valuable audience is not always the one most likely to click. It may be the audience most connected to future value. Or the one most worth suppressing. Or the dormant group that is actually recoverable. Or the high-response group that is terrible for margin. We are moving into a world where those unknowns become known. 

The flip side is what makes this both scary and really interesting. Much of what is “known” now will soon be useless.

The next layer is learning

The customer data warehouse is the system of record, and that record includes both the starting point of identity and the end point of activation. When people refer to Databricks “collapsing the Marketing stack,” this is what they mean.

The next marketing advantage sits between those two points. It is a new layer of intelligence. Accessible, extensible, and built on data insights and discovered opportunities.

Not more dashboards. Not more manual segments. Not another API. A system that can learn from the customer record and discover where growth is hiding before the business decides what to do.

Real intelligence. Actual learning. Actions that will solve your use case and lift your KPIs.

That is why CustomerLake is both so important and inevitable. AI applications will be connected to the data warehouse. But where the data lives is only part of the solution. The real advantage comes from what the business can learn from it now that it’s there. And that part is what’s next. 
