
Data Analytics Chat
🎧 Welcome to Data Analytics Chat – the podcast where data meets real careers.
Data isn’t just numbers; it’s a journey. Each episode, we explore a key topic shaping the world of data analytics while also discussing the career paths of our guests.
This podcast brings together top experts to share:
- Insights on today’s biggest data trends
- The challenges they’ve faced (and how they overcame them)
- Their career journeys, lessons learned, and advice for the next generation of data professionals
This isn’t just a podcast, it’s a community for anyone passionate about data and the people behind it.
👉 Hit subscribe and join us on the journey.
Connect with host - https://www.linkedin.com/in/ben---parker/
How a Measured Data Stack Can Impact Business Decisions
In this episode of Data Analytics Chat, host Ben welcomes Matthew Paruthickal, Global Head of Data and AI Architecture Engineering at Sanofi, to discuss the evolution of data technology from single-box systems to agentic AI.
They explore Matthew's career journey, the importance of solving business problems with technology, and the significant moments that shaped his approach to data. Topics include the layered data stack, overcoming technical and organisational challenges, the role of trust and speed in data, and how businesses can leverage AI for better decision-making.
00:00 The Evolution of Data and AI
00:48 Introduction to the Podcast and Guest
01:48 Matthew's Career Journey
02:17 Technological Shifts in Data Management
03:46 The Importance of Trust and Speed in Data
07:40 Defining Moments and Career Setbacks
09:46 Mentorship and Business Value
16:51 Building a Layered Data Stack
19:09 Understanding the Layered Data Stack
20:12 Challenges in Executing a Layered Data Stack
21:14 Top-Down and Bottom-Up Approach
22:07 The Importance of Business Focus
23:53 Aligning Data Stack with Strategic Decision Making
26:30 E-commerce Data Challenges and Solutions
29:14 Common Difficulties in Data Tool Investments
31:17 Evaluating New Data Tools
35:27 Balancing Automation with Real-Time Insights
37:12 Conclusion and Final Thoughts
Thank you for listening!
I would say the evolution has been quite an arc, right? From a single box to agentic AI. Use tech to solve business problems, such that the tech itself disappears. In data, it is not an if, it's always a when. When was the first setback, or when was the second, or the third one? The numbers can be right by our logic, but if it's still wrong for the business, it is still wrong. I don't think in today's age and time technology should be a challenge. Like I told you, we have the tech right now such that the tech can disappear away and you're just solving business problems. We are solving business problems such that the tech disappears away, and that has always been my mantra, and democratization is absolutely key for us. The early days taught me that trust beats speed sometimes, right? The big data era proved that speed, when you're doing things at speed, creates new questions.
ben parker:Welcome to Data Analytics Chat, the podcast where we discuss the world of data, AI and the careers shaping it. Today I'm excited to welcome Matthew Paruthickal, Global Head of Data and AI Architecture Engineering at Sanofi. In this episode we'll explore his exciting career journey and discuss how a layered and measured data stack can impact business decisions. Matthew, welcome to the podcast.
mathew paruthickal:Hello, Ben. Thank you so much. Absolutely delighted to be here and continue the conversation.
ben parker:Yeah, likewise. And obviously the data topic is really important, especially in today's environment, so I'm looking forward to discussing that. Do you wanna start off by just sharing your career journey with the listeners?
mathew paruthickal:Absolutely, I'm more than delighted to take it forward. I started my career in the 2000s, when storage and compute lived in one box. I was with retail clients like Metro and Dunkin' Donuts, and you're talking about classical data transformation and business intelligence, which lived on SQL Server and Oracle: star schemas, materialized views, brutal overnight batches and those kinds of things. Performance back then was all about indexing and partitions. Fast forward to the 2010s, when everything flips: we're talking about cheap distributed storage, with Spark gaining a lot of traction, so you're pushing compute to the data. You standardized on Spark, blended in Kafka streams, and streaming became very prominent in the late 2010s. Then fast forward to the 2020s, when I was working across retail planning and CPG, and everything is cloud based now, right? You start with a cloud warehouse, elastic concurrency, and then we have concepts like the lakehouse: you have one copy which serves your data transformation, your BI, your machine learning. So we are talking about how things have evolved, from everything living in one box to, in the last few years, everything being gen AI based, right? You're talking about vector search and agentic "talk to your data" on top of your data platform. So I would say the evolution has been quite an arc, right? From a single box to agentic AI. The mission stays the same: it's the time taken to deliver trusted decisions to the business. It's just how we've done it that has evolved over the years, and that's about it. Yeah.
ben parker:Brilliant. And obviously technology's progressed quickly. How have you managed your career progression, and also, I guess, keeping up to date with the latest tech trends? 'Cause that's a big challenge for a lot of people.
mathew paruthickal:It is, actually, because if you look into it, there were two big things that struck me in my career journey. The first was the whole shift to distributed, columnar data. When you have a business problem and you're working with single machines, there's so much pipeline tuning that you have to do; when you're interacting with the business, it's mostly about performance. But then, when distributed computing started clicking, you're talking about billions of records being pushed down to the business, you are making an impact directly, at their hands. Jobs which took hours on one machine could return in seconds on a cluster. And that's when it first hit me that you're not limited by compute alone; you're talking about billions of records and interactive, sub-second analysis. And the real win wasn't the benchmark itself, it was the meeting itself, right? Your marketing teams or your sales teams are talking about cohort lift or path to purchase, and you can get an answer right there. That was the first defining moment for me. The second part of the journey was democratization done right, which means that it's not just about speed, it's also about trusted answers for the business. Especially with gen AI now, when you're talking about talking to your data, we're democratizing data access to everyone in the organization. So what does that mean? Right now, trust is the single most important thing. We've already solved for speed, but now comes trust, right? You don't want tickets and queues and those kinds of things; you want to get a result in minutes and not days, and a finance team can still trace it back to the books. Behind all this, it's the same discipline: you have your data contracts, your quality checks, but at the same time, how do you design a product right now? So speed only matters if the number is right and if it is traceable. That's a principle I've held; the theme of my entire career has been this: use tech to solve business problems, such that the tech itself disappears. And I believe we are in the golden age right now, and we are living in it right now, is what I feel completely.
ben parker:Yeah, no, I completely agree. Actually, a lot of people on the podcast have said, look, we're in a blessed, golden time with what's happening; so much can happen, and it's gonna be fascinating times ahead. So I guess for yourself, why did you fall into this field? Was it something like skills at school, or was it something that interested you?
mathew paruthickal:When I started my career it was all about tech. I was very much fascinated by data and data processing. And over time, you start to learn about the domain itself, you start learning the business terms. What do you mean by promotion mechanics? I've been mostly in the retail space, so you heard terminologies like promotion mechanics, price pack architecture, demand latency, margin math, and how do you encode all of them into a data schema? How do you encode that into constraints, policies and tests? What are the KPIs and measurements, and how does all this fit together? That is what actually sparked me, because when I started mixing in both domain knowledge and engineering, then you can start creating repeatable playbooks, right? Years at a national scale taught me about throughput, correctness and cost controls. And then you try to create those reusable patterns, when it comes to pricing decisions, when it comes to promo decisions, and teams get the outcomes without having to reinvent the plumbing. And that's what, like I mentioned before, the tech has to disappear, because you're just applying the latest and greatest technologies to solve your business problems, such that the tech itself is not there anymore. And that's what I love about this entire chaotic world that we live in right now, and how we've been able to see value from this engineering discipline.
ben parker:Brilliant. And throughout your career, then, could you pinpoint one or two defining moments that changed your career trajectory?
mathew paruthickal:Two things, right? The distributed data, when I was able to answer back to a business without having to say technology's a bottleneck here. That kind of went away in, I would say, the mid-2010s, around 2015 or so, when GPUs were gaining a lot of traction. You have this whole distributed computing, and when you can ask a question and get an answer, the big data problem was actually solved. When you had that distributed data being served by GPUs and amazing CPU architectures, you could wire up just a four-node cluster and then, without even doing any aggregates, just on the raw transactional data, you could create solutions, which means that the users can ask anything about the data, because the moment you aggregate, you're losing data fidelity. That was the first defining moment, when, in front of a large town hall, we were able to demo this product to the end-user community, show that we understood the business and, without doing any aggregation, do instant analysis on billions of records in real time, sub-second analysis. That was an eye-opener to the business, because it opened up a lot more avenues for the data team itself to get more business plans baked in. We could create more data products for the business so that they can add value to the business. So that's when that whole 360-degree loop started happening: the business started to trust us, we started to understand more, and there was no friction. It was like the perfect bridge between tech and core business. I would say that was, for me, a core defining moment.
ben parker:Brilliant. And then, with yourself, as you've progressed in your career, how have you gone about this? Have you had a mentor, have you had training? What's been the most impactful addition to your skillset in that regard?
mathew paruthickal:I would say, when you talk about mentorship, there have been a lot of mentors, right? One set is from the technology side and the other from the domain side, and domain engineering takes time, to understand what exactly is going on in that particular space, like I mentioned before about understanding the business. So you talk to a lot of business leaders: what is the business problem? If I do this initiative, how is it going to move the business? Is it gonna increase the top line? Is it gonna increase the bottom line? For me, putting a business value on every single thing that we are doing was an early learning: why are we doing this? We should not be doing something for the sake of doing it. There has to be a value, right? And value can come in terms of efficiency gains, it can come in terms of EBIT gains for the company, where you're directly affecting the top line. Of course, from a tech angle you talk a lot about productivity and efficiency gains: I can do this with X percent less, it took me X hours and now I can do it in half the time. So that was how you started, but then eventually you start moving on: how is any solution that I make moving my business? And when you start making that connection, that business value, with every single thing that we are doing, be it bottom line, be it top line, be it affecting the strategic levers of the company or the business leaders of the company, that's when you start making the connection. For me, that was massive; it changed my outlook on how to approach problems too.
ben parker:Yeah, brilliant. And I think it's important; getting that domain knowledge is becoming more important nowadays, I think, 'cause there's so many tools out there that can do the heavy lifting for you. If you can understand the business problems to a really high level, obviously you can add more value where you are.
mathew paruthickal:Correct. Exactly. Exactly.
ben parker:So, obviously you've had a great career and worked for some great companies. Have you ever faced a major career setback?
mathew paruthickal:Oh, I would be lying if I said no. In data, it is not an if, it's always a when. When was the first setback, or when was the second, or the third one? You always have them, right? I think mine was early on. When you're living in the data world, part of your primary responsibility is to eventually bring out numbers: you are showcasing insights and some kind of results back to your executives. And I still remember one of my toughest ones. An executive had to restate, in a town hall, the results from a dashboard that we built. It was a finance dashboard, and finance people are not very forgiving; they had to flag a lot of mismatches and issue a lot of corrections after the town hall. We were aiming for speed at that time: our pipelines were running fast, everything was green, everything was successful, but we did not fully match the business process. We thought what we had done was right. Especially when you talk about finance numbers, like gross and net: what makes up the net, what are you actually subtracting from the gross to get to the net, and the timing of returns? There's a whole lot of things, especially when it comes to fiscal calendars and cutoffs. That setback is what eventually led me to understand more about the business: let me try to understand what they're looking for. You need to make sure that the left hand and the right hand are talking the same language. The numbers can be right by our logic, but if it's still wrong for the business, it is still wrong. And that's what it is. We had to own it, we had to apologize for it, because there were some numbers going out to the markets. That fundamentally changed how we shipped data products too: instituting what we call data go/no-go processes, designing for variety, not just volume, so that we handle multiple edge cases. That led to a lot of the things we are seeing in the last few years: automating data contracts, putting in data observability and data lineage which is visible even to the finance ops team, and one semantic definition per KPI. A lot of the good practices that we put in back then are actually helping us in AI right now. If you look at what AI needs: AI needs trust, AI needs semantic definitions, and it's the setbacks that are leading us to the world of today, giving us that edge to play in the AI world.
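For readers who want to see what a "data go/no-go" gate like the one Matthew describes might look like, here is a minimal Python sketch; the reconciliation rule, figures and tolerance are hypothetical, not Sanofi's actual process.

```python
# Minimal sketch of a "data go/no-go" gate: before a finance dashboard is
# published, reconcile the pipeline's net-sales figure against the finance
# team's book of record. The figures and tolerance are illustrative only.
from dataclasses import dataclass

@dataclass
class GoNoGoResult:
    period: str
    pipeline_net_sales: float
    finance_net_sales: float
    within_tolerance: bool

def go_no_go(pipeline_net_sales: float,
             finance_net_sales: float,
             period: str,
             tolerance_pct: float = 0.5) -> GoNoGoResult:
    """Block publication if the pipeline's number drifts from the books."""
    drift_pct = abs(pipeline_net_sales - finance_net_sales) / max(abs(finance_net_sales), 1e-9) * 100
    return GoNoGoResult(
        period=period,
        pipeline_net_sales=pipeline_net_sales,
        finance_net_sales=finance_net_sales,
        within_tolerance=drift_pct <= tolerance_pct,
    )

if __name__ == "__main__":
    result = go_no_go(pipeline_net_sales=10_214_500.0,
                      finance_net_sales=10_230_000.0,
                      period="FY24-P03")
    # A human still signs off; the gate only stops an obviously wrong number
    # from reaching a town hall or an external market release.
    print("GO" if result.within_tolerance else "NO-GO", result)
```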
ben parker:Brilliant. And how do you deal with a setback? 'Cause obviously, at the time, it can be painful. How do you get yourself back into the game, as it were?
mathew paruthickal:It's all about the business process, right? We had a technical process, but what we did not have was that business process. What are the steps you are putting in as you move from left to right? You may have a well-defined process, but if it's not approved across the board, you have that tunnel vision; you're just looking at the way you are doing things. But if you start going broader, you bring in all the stakeholders early in the game. The key here is to move left in the process: go early in the process, make them aware of what exactly is going on, make them understand what your processes are, check that you've understood what they're doing and they've understood what we are doing. And like I told you, it's about going left, inserting some of those business processes, changing things so that they understand. You catch it early; that's the mantra of it. You catch your problems early, before they get deep into the system and become incurable and untraceable.
ben parker:Brilliant. So then, what do you think has given you a genuine edge to be successful in your career?
mathew paruthickal:It's wearing both hats. Of late, some of these setbacks, like I was mentioning, have led me to understand more about the domain itself. Building repeatable playbooks has been very key for me. It's about how you build that whole factory model: you are doing something and you want to keep replicating it at cost. So repeatable models of growth are, what I call for my teams, absolutely essential. I have this old mantra: whatever you're building, you should think global but build local. Something that is built for a local market should be replicable across global markets, especially when you are working in a global organization. So that whole think-global, build-local mantra, by creating repeatable playbooks. And the North Star is still the engineering discipline for me, be it data, be it machine learning, be it AI, or be it BI. It's all aimed at one thing: the time that it takes to get to your trusted decision. It's that one single metric, time to trusted decision. Correctness and traceability have to happen first; speed and usability come next. And cost, of course; cost is something that you can always defend in the grand scheme of things. When the tech fades, the decision gets better, and for me that's a win. That's how I've always tried to play my part: the tech has to fade such that the decision gets better.
ben parker:Okay, cool. Good to hear. Let's move on to the data topic. Obviously, in my opinion, the data stack is getting more and more important, as is understanding data. So what does a layered and measured data stack actually look like in practice?
mathew paruthickal:Okay, so a data stack for me has to be layered, because data will change shape and form. From the time that you're capturing data to decision time, data is changing shape; it goes through multiple layers, because every stage makes a promise to the next layer, so that you can actually trust the next zone. So you have a landing zone, where the promise could be about freshness: I need to make sure my data is very fresh, I keep getting streaming, fast data, I have to make sure that I can process it, that the data is complete, that the data is valid. That's what I call the landing zone, and it has its own promise. Then comes a trust layer, which is about your fixes, your data reconciliation and your remediation, because you don't want garbage in, garbage out. So I have a landing zone, and my next layer is a trust layer, where I'm doing all those data fixes so that all the data that gets into the system, I can make usable. Then comes the semantic layer, and the semantic layer is absolutely, very important, because you want to turn your metrics into named metrics with lineage, so that you can use them for multiple applications. Once you've built all those layers, then comes the serving layer. And who is it going to serve? You're going to be serving machine learning use cases, reporting use cases, external consumption, AI use cases. So anything and everything that you're building on top, be it your analytics; typically data platforms are meant for read-only analytics: you perform analytics, you deliver insights to the business, you deliver forecasts to the business, KPIs that can load in seconds. So for us there's this outcomes layer. When you're building, going up, think of it like a pyramid: you're starting with the landing zone, your trust layer, your semantic layer, and then comes your outcomes layer, and you can start layering any kind of analytics on top of it. It's not just descriptive analytics; you're talking about a diagnostic layer, a predictive layer, a prescriptive layer. And running across everything is the governance and observability layer: the data contracts that I was mentioning, your standard RBAC controls, your data lineage controls, any SLOs that you have, so that your handoffs are measurable across the stack. That's what I mean by the layered stack. And finally, when you have gen AI, which sits on top, say your "talk to your data" initiative, which is a common buzzword these days, it only comes after all these promises exist, so that you know your answers are grounded every time you ask a question. Show me my sales last week; how did I do this year versus last year? Answers are always grounded in the semantic layer, answers are always grounded in truth, and it respects every decision that you built to reach that layer. That's how my mantra of a layered stack looks today.
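To make the landing / trust / semantic / serving layering concrete, here is a minimal sketch assuming pandas and a toy sales table; the layer names follow Matthew's description, but the checks and the net-sales metric are invented for illustration.

```python
# Toy walk-through of the landing -> trust -> semantic -> serving layers.
# The dataset, checks, and metric definition are illustrative only.
import pandas as pd

# Landing zone: raw events land as-is; the only promise is freshness/completeness.
landing = pd.DataFrame({
    "order_id": [1, 2, 2, 3],
    "sold_at": pd.to_datetime(["2024-06-01", "2024-06-01", "2024-06-01", "2024-06-02"]),
    "gross_sales": [120.0, 80.0, 80.0, None],
    "returns": [0.0, 10.0, 10.0, 5.0],
})

# Trust layer: reconciliation and remediation (dedupe, drop unusable rows).
trust = landing.drop_duplicates(subset="order_id").dropna(subset=["gross_sales"])

# Semantic layer: one named metric with a single definition (and, in a real
# stack, lineage back to the trust-layer tables).
def net_sales(df: pd.DataFrame) -> pd.Series:
    """Net sales = gross sales minus returns; defined once, reused everywhere."""
    return df["gross_sales"] - df["returns"]

semantic = trust.assign(net_sales=net_sales(trust))

# Serving / outcomes layer: the same governed metric feeds BI, ML, or an agent.
serving = semantic.groupby(semantic["sold_at"].dt.date)["net_sales"].sum()
print(serving)
```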
ben parker:Brilliant. The concept seems simple, quite easy to understand, but I guess executing it is complex, 'cause every business is different: you've got different legacy systems, technical debt, strategy. What are the main reasons businesses struggle to execute this?
mathew paruthickal:Again, it's whether the left hand meets the right hand. Certain things at the bottom of the stack are the engineering principles that you have to layer in, but beyond that, anything from the top down, you always have to put the business case wrapper on first. Start with decisions. You're not thinking about tables and data models; that's the last thing you want to be touching. You want to ask: what is the business looking to do? What are the three to five broad questions for this quarter? For example, what markdowns do we take this week without eroding margins? And then you start asking, okay, what are the metrics I need to take care of for that: is it about revenue lift, is it about stockout reduction? So you're learning in purely business terms. You're putting the business case wrapper on from the top down of the pyramid, and the bottom up is your pure engineering principles, and at some place they have to match. And that matching area is: what are you trying to solve here? Because once you have that picture in your mind, what you're trying to solve for the business, then you can define what your metric formula is, what the target, timeframe and owner are, and how you will start measuring it, and then get everyone across the board aligned. That's how, in our minds, you go up the pyramid and come down the pyramid. So it's a top-down, bottom-up approach to solving problems.
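Here is a small sketch of what capturing that "business case wrapper" might look like as a metric spec, with formula, owner, target and timeframe; the field values are hypothetical.

```python
# Minimal sketch of a metric spec tying a business decision to an engineering
# definition: decision, formula, owner, target and timeframe.
# Field values are invented for illustration.
from dataclasses import dataclass

@dataclass
class MetricSpec:
    decision: str      # the business question this metric supports
    name: str
    formula: str       # expressed against semantic-layer fields
    owner: str
    target: float
    timeframe: str

markdown_metric = MetricSpec(
    decision="What markdowns do we take this week without eroding margin?",
    name="markdown_margin_erosion_pct",
    formula="(baseline_margin - post_markdown_margin) / baseline_margin * 100",
    owner="commercial_finance",
    target=2.0,          # keep erosion under 2% for the quarter
    timeframe="Q3",
)

print(markdown_metric)
```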
ben parker:So that's where the challenge lies: it's more of an organizational challenge as opposed to a technical one.
mathew paruthickal:It is, actually. I don't think in today's age and time technology should be a challenge. Like I told you, we have the tech right now, such that the tech can disappear away and you're just solving business problems.
ben parker:Okay, interesting. And how does this more layered approach differ from the traditional data architecture that many companies still use?
mathew paruthickal:I think it's the lack of focus on the business side that has led to many tools being used. Even within one area, say data transformation, you're often not using the right tool for the purpose. It's that engineering discipline that's missing in a lot of companies I've seen, and it's led to tool sprawl with no operating model. A target operating model is a very core component: how do you organize your teams, and why has there been a tool sprawl for doing the same thing when you know what works? So it's about the decision-making process, certain standards you put in place to decide which tool for what purpose, and going for an open stack these days. Ever since the advent of distributed compute, some things haven't changed: SQL and Python for transformation haven't changed. You have separation of storage and compute, and you have amazing data storage capabilities like Parquet, efficient storage mechanisms, which have evolved into the lakehouse these days. So how do you ensure that the data platform itself can evolve with the changing needs of the business? How do you stay relevant? You don't want to be beholden to one particular technology. But if you invest in this kind of separation of storage and compute and in open standards, things which have stood the test of time, then I'd say you are golden in the data age: you can migrate to the latest and greatest, you can move to the AI world much faster than some companies that have been stuck with legacy technologies. That's how I see it.
ben parker:Okay, cool. And so how can businesses ensure that each layer of their data stack is aligned with strategic decision making?
mathew paruthickal:ROI for us is absolutely important. I mentioned attaching business value to everything we are doing here, and for us that's absolutely key. It could be bottom line, productivity, efficiency gains; by virtue of digitizing the entire process, you are already layering that in. And with AI, definitely; you look at any number of research reports and they always talk about productivity gains. But the business is starting to look beyond productivity now: how can I improve my business, how can I improve my top line, how can I increase my revenues, how can I increase my profit? This is where the real value of AI is happening today, and why that layered data stack we are building matters. It's not just the descriptive layer; you can actually predict what is going to happen, and you can also do prescriptive analytics, in the sense of: if I want to increase my revenue, what should I be doing? Just imagine the power of that question: what should I be doing to improve my sales? If you build a system in such a way that it can do descriptive, predictive and prescriptive analytics, that itself is a win for the business. And the more they adopt and use your systems, the more they directly affect the ROI of the company.
ben parker:So are a lot of businesses more tech focused first, as opposed to, I guess, business and strategic thinking first?
mathew paruthickal:It is both. It's the engineering discipline that you put into a centralized target operating model, so that you can create repeatable playbooks for the business. For me, when I mention terms like time to trusted decisions, how do you get there faster? We are solving business problems such that the tech disappears away; that has always been my mantra, and democratization is absolutely key for us. How do you do all this? By ensuring that you've created that repeatable playbook. And the playbook could exist as, say, an insight-creation pipeline engine, using all the principles of data engineering that you've learned to solve that need, and then replicating it across any market and every market.
ben parker:Okay, brilliant. And can you share any examples where a well-structured data stack has directly led to a better business decision?
mathew paruthickal:I can definitely think of many, but one thing which comes to me from last year I'll tell you about. In today's world, especially in retail, e-commerce is gaining a lot of traction, and almost everyone is trying to solve problems with e-commerce. When you're looking at your traditional brick-and-mortar sales and comparing that with e-commerce, e-commerce has got a different problem. It has some data engineering challenges in itself, because everyone's buying then and there; the store is open 24/7. So you're talking about data ingestion problems: how do you ensure that your data mart is refreshed with 15-minute accuracy? You're talking about re-engineering some of the pipelines so that when the e-commerce team looks at the data, they see exactly what happened a few minutes ago, so they can make decisions. And things like ratings and reviews, because every single thing from a social click is important, to show the data then and there. So there is an engineering problem: how do you layer in your streaming plus your batch data, and what is important for that? That's how we first started; it was a speed problem. Once that was solved, then came the question: what's the new set of data that e-commerce has which you don't typically get from a traditional brick-and-mortar store? What we did is start to look into concepts like page-one responsiveness and your brand discoverability index, because when you go to e-commerce you're just doing a search; all you are given is a text box, you're typing in something and searching. When we started to analyze it more, we told the business there are more data sets that are going to augment the standard sales and ratings and reviews: it's your search functionality. Where do you rank in search? Your search rank is absolutely important, your Amazon search frequency rank, your brand discoverability when you search for generic keywords; where does your brand come in? And that led to new data contracts that we had to have with data partners to bring all that data in, and of course there was a cost involved. But by layering in that extra module of brand discoverability and page-one rank, the ROI, just for this one particular brand, increased by $1.5 million in one year, which was huge. This has led to us making sure that the new data set we've learned about is now part of our stack. So it's not just the traditional ways of looking at it; you're looking at new data to augment every kind of insight we can give the business, to make better decisions for the company.
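As a small illustration of the 15-minute freshness promise mentioned above, here is a sketch of a check a pipeline might run; the lag source, threshold and alert behaviour are assumptions for illustration.

```python
# Minimal sketch of a freshness check for an e-commerce data mart that is
# supposed to stay within 15 minutes of real time. The event timestamps and
# the alert behaviour are illustrative only.
from datetime import datetime, timedelta, timezone
from typing import Optional, Tuple

FRESHNESS_SLO = timedelta(minutes=15)

def check_freshness(latest_event_time: datetime,
                    now: Optional[datetime] = None) -> Tuple[bool, timedelta]:
    """Return (within_slo, lag) for the most recent event in the mart."""
    now = now or datetime.now(timezone.utc)
    lag = now - latest_event_time
    return lag <= FRESHNESS_SLO, lag

if __name__ == "__main__":
    # In a real pipeline this would come from MAX(event_ts) on the mart table.
    latest = datetime.now(timezone.utc) - timedelta(minutes=22)
    ok, lag = check_freshness(latest)
    if not ok:
        # Breaching the SLO would page the owning team rather than silently
        # serving stale numbers to the e-commerce dashboards.
        print(f"Freshness SLO breached: mart is {lag} behind (SLO {FRESHNESS_SLO}).")
    else:
        print(f"Mart is fresh: {lag} behind.")
```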
ben parker:Yeah, that's a fantastic example, so thank you for that. Obviously there's a lot that goes into a layered stack. What would be some common difficulties when companies invest in data tools?
mathew paruthickal:Tool sprawl, right? I think I was mentioning that before, about multiple ETL tools. We ourselves had three ETL tools, two different scheduler tools, multiple BI tools. Some people get comfortable with the top one out there, with Power BI; another team gets comfortable with Tableau, and none of them share definitions. It's a big problem, actually. Take the net sales example I was giving for finance before: it could mean multiple things. Then you're talking about SLA mismatches: dashboards like the e-commerce one I mentioned need 15-minute freshness, but upstream you have jobs which run hourly. There's a whole lot of places where data can go wrong. There's no semantic layer, teams can query raw tables, all of those kinds of things. These are some of the biggest difficulties I've seen over the years, and again, the operating model itself. So how do you bring all these things together? For me, it's that layered data stack with federated delivery: having centralized standards but doing federated delivery. Think of a hub-and-spoke model: a central data platform which owns the paved road. We own the identity and access, the catalog and lineage, the metric and semantic registry, the data contracts, the observability, the cost guardrails. They're all set by the central team, and they also publish reference architectures which other teams can follow, with, of course, a short approved tool list. And if a team needs to make an exception, we do an RFC to review it. This way, the federated domain teams, be it sales, supply, finance or marketing, can still build data products faster, but they're all still within the guardrails of the company. This is what we've seen: when you have centralized standards but federated delivery, you're keeping innovation at the edges, with the teams which know the business best, but the standards stay in the middle. So every new dashboard, every new model, or every new agent these days all plug into the exact same truth, the same controls, the same cost model. And that's worked amazingly for us.
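To illustrate the "paved road" idea, here is a hypothetical sketch of central standards that a domain team's data product spec could be validated against; the tool names and required controls are examples, not an actual approved list.

```python
# Hypothetical sketch of hub-and-spoke guardrails: the central platform team
# publishes a short approved tool list and required controls; a domain team's
# data product spec is checked against them before it ships.
APPROVED_TOOLS = {
    "transformation": {"dbt", "spark"},
    "orchestration": {"airflow"},
    "bi": {"power_bi"},
}
REQUIRED_CONTROLS = {"data_contract", "lineage", "cost_guardrail", "semantic_metrics"}

def validate_data_product(spec: dict) -> list:
    """Return a list of violations; an empty list means the product is on the paved road."""
    violations = []
    for purpose, tool in spec.get("tools", {}).items():
        if tool not in APPROVED_TOOLS.get(purpose, set()):
            violations.append(f"{purpose}: '{tool}' is not on the approved list (raise an RFC).")
    missing = REQUIRED_CONTROLS - set(spec.get("controls", []))
    if missing:
        violations.append(f"missing controls: {sorted(missing)}")
    return violations

# Example: a marketing domain team proposes a new data product.
spec = {
    "domain": "marketing",
    "tools": {"transformation": "dbt", "orchestration": "airflow", "bi": "tableau"},
    "controls": ["data_contract", "lineage", "semantic_metrics"],
}
for issue in validate_data_product(spec):
    print(issue)
```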
ben parker:Okay, cool. And obviously tech's moving at a fast pace today. What's your approach to looking at ways to improve? You don't want to keep changing tools every so often, but how do you go about seeing whether there are better tools for your problem?
mathew paruthickal:Can you repeat the question again? Sorry.
ben parker:So basically, obviously this tech's moving quickly; every day there are new tools coming out. What's your approach to evaluating these new tools for your problems?
mathew paruthickal:Yeah. So one thing is the openness of the tools. If you look into the data landscape, there are two fundamental principles. One is separating storage and compute; you don't want to keep everything in one box. What I mean by that is having an engine where you're not tied to one particular commodity, be it a cloud lakehouse or a cloud warehouse: how can I switch from one to another, so that you're not beholden to one technology and stuck with it for a long time? And these days, in today's agentic world, especially when you have distributed compute and storage, you have AI baked in and you have a lot of AI frameworks: how can you deliver insights faster while keeping trust the same? So certain principles, like separation of storage and compute, are an absolute must-have. The other is open standards: what has not changed in the last so many years, ever since data's been there? SQL has not changed. You're talking about SQL and distributed compute engines, interactive analytics on billions of records. Then on the storage side you have concepts like the lakehouse, where you can read both structured and unstructured data. When you talk about insights to the business, what's an insight? It doesn't just come from your structured data, it comes from your unstructured data too; you have to mix and match data coming from multiple PDFs and documents with the data that you have. So a cloud lakehouse approach is ideal for enabling that vision for the business. Think of the human brain: think of the left-hand side as your structured data and the right-hand side as your unstructured data; when you make a decision, you have all that information at your fingertips. So we want to make sure of all the principles: separation of storage and compute, an open data processing standard, which could be SQL or PySpark, a distributed framework so you can process data faster, while at the same time you do governance and cost controls. If you take in all these parameters, then you're not beholden to one tool; you know exactly what the tool does and its purpose, and then you can switch between multiple different engines. You're not beholden to one cloud either, because if you keep an open lakehouse format which is accepted across the board, then you have an open strategy. So for me, it's always about the open strategy that you can maintain in today's world.
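A minimal sketch of the open-stack principle, assuming PySpark is available: open-format Parquet in object storage on one side, plain SQL compute on the other, so either side can be swapped out. The storage path and schema are hypothetical.

```python
# Minimal sketch of the "open stack" idea: data stored as Parquet in object
# storage (storage), queried with Spark SQL (compute), so either side can be
# swapped out. The path and schema are hypothetical.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("open-stack-sketch").getOrCreate()

# Storage: open-format Parquet files, readable by many engines, not just Spark.
sales = spark.read.parquet("s3a://example-lakehouse/trust/sales/")  # hypothetical path

# Compute: plain SQL on top, the part that has "stood the test of time".
sales.createOrReplaceTempView("sales")
weekly = spark.sql("""
    SELECT date_trunc('week', sold_at) AS week,
           SUM(gross_sales - returns)  AS net_sales
    FROM sales
    GROUP BY 1
    ORDER BY 1
""")
weekly.show()
```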
ben parker:Okay. I guess teams must get obsessed with technical capability, as opposed to why it's needed, if you get what I mean.
mathew paruthickal:When you look into teams, the early days taught me that trust beats speed sometimes. The big data era proved that speed, when you're doing things at speed, creates new questions. Our business started asking new questions because they figured out, okay, we don't have a technical drawback anymore; we can open up more avenues for it to answer questions. Now, in the gen AI moment, access is instant, and so the guardrails decide whether it's a magic moment or a messy moment: you ask a question and you get an answer straight back. So it's about balancing the entire thing. Automation is absolutely key, but automate the pipeline, not the judgment itself. You create things like automated data contracts at ingest time so that things are not changing, business acceptance tests, building that data go/no-go process into the trust layer, and that one semantic layer. For me, all of these things, knowing the different layers of the stack and what promise we need at the different tiers, is what eventually helps us get further. That's what I see in the grand scheme of things.
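Here is one possible shape of the automated data contract check at ingest time that Matthew mentions; the expected schema and the sample batch are invented for illustration.

```python
# Minimal sketch of an ingest-time data contract: incoming batches must match
# an agreed schema and nullability before they enter the trust layer.
# The contract fields and the sample batch are illustrative only.
EXPECTED_SCHEMA = {          # column -> (type, nullable)
    "order_id": (int, False),
    "sold_at": (str, False),
    "gross_sales": (float, False),
    "returns": (float, True),
}

def check_contract(rows: list) -> list:
    """Return contract violations for a batch of ingested rows."""
    violations = []
    for i, row in enumerate(rows):
        for column, (expected_type, nullable) in EXPECTED_SCHEMA.items():
            value = row.get(column)
            if value is None:
                if not nullable:
                    violations.append(f"row {i}: '{column}' must not be null")
            elif not isinstance(value, expected_type):
                violations.append(f"row {i}: '{column}' expected {expected_type.__name__}")
        extra = set(row) - set(EXPECTED_SCHEMA)
        if extra:
            violations.append(f"row {i}: unexpected columns {sorted(extra)}")
    return violations

batch = [
    {"order_id": 1, "sold_at": "2024-06-01", "gross_sales": 120.0, "returns": 0.0},
    {"order_id": 2, "sold_at": "2024-06-01", "gross_sales": "80", "returns": None},  # type drift
]
print(check_contract(batch))  # a non-empty list blocks the batch from the trust layer
```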
ben parker:Okay, cool, fascinating. And then, as data maturity grows, how can leaders balance automation with real-time insights?
mathew paruthickal:Today I would say: use gen AI only on the grounded semantic layer, your lineage should be visible, and you still need a human in the loop for high-impact calls. And start tracking what your value log is going to look like when you ask a question: why am I asking the question? Is it because today I have many systems to ask that question and it takes me X amount of time, and now with AI I just go to one single UI where I can ask and get answers back? And what kind of answers are we delivering? That value log becomes important, because you'll start off with productivity gains and everything, but eventually you don't want to be left creating a lot of vanity dashboards. When every person can create their own custom dashboard just by asking a question and pinning that question to what you call a pinboard, that itself is a win for the users, because then, when I have, say, a hundred-person team, I'm not having one dashboard that solves all needs; every person has their own custom set of questions which they can refresh on a pinboard. So for me, we are at the golden age right now: on the backend we have elastic compute, separation of compute and storage, we have lakehouse concepts with ACID properties, so you can track every single change, and of course agentic AI, which means that you can have real time. And how you do real time with governance is what matters in today's agentic world.
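Finally, a minimal sketch of the "grounded on the semantic layer, human in the loop for high-impact calls" pattern; the metric registry, keyword matching and impact routing are all hypothetical.

```python
# Minimal sketch: a natural-language question is only answered from metrics
# registered in the semantic layer, and high-impact requests are routed to a
# human before any action is taken. Registry and threshold are hypothetical.
SEMANTIC_REGISTRY = {
    "net_sales": {"formula": "gross_sales - returns", "lineage": "trust.sales", "impact": "low"},
    "markdown_plan": {"formula": "recommended_markdown_pct", "lineage": "outcomes.pricing", "impact": "high"},
}

def answer(question: str) -> str:
    """Ground the question in a registered metric; escalate high-impact calls."""
    matched = [name for name in SEMANTIC_REGISTRY if name.replace("_", " ") in question.lower()]
    if not matched:
        return "No grounded metric found; refusing to guess."
    metric = SEMANTIC_REGISTRY[matched[0]]
    if metric["impact"] == "high":
        return f"'{matched[0]}' is a high-impact call: routing to a human reviewer (lineage: {metric['lineage']})."
    return f"Answering from '{matched[0]}' = {metric['formula']} (lineage: {metric['lineage']})."

print(answer("Show me my net sales last week"))
print(answer("Apply the markdown plan for this week"))
```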
ben parker:Brilliant, cool. You've provided so much insight on how businesses can succeed with their data transformations, so thank you for joining the podcast. It's been a great conversation, and obviously I wish you all the best in your future career.
mathew paruthickal:Thank you so much, Ben. It was an absolute pleasure talking about this. It's a topic that is near and dear to me, and I love talking about data and AI and how things have changed from decades ago to where we are right now. And like I said before, we are living in the golden age right now.