Blog

Building Self Serve Business Intelligence With AI And Semantic Modeling At Zenlytic

Tobias Macey interviews Paul Blankley and Ryan Janssen about their no-code business intelligence tool, Zenlytic, which aims to enable self-serve BI through conversational technology. They explain that the intersection of large language models and semantic layers is what makes effective self-serve BI possible. To achieve this goal, they focus on asking clarifying questions to help end users articulate what they actually want from the data.

Product
January 31, 2024
Data Engineering Podcast

Listen to the full podcast here: https://www.dataengineeringpodcast.com/zenlytic-self-serve-business-intelligence-episode-371

Timestamps and full transcript below:

00:11
Speaker 1
Hello and welcome to the data engineering podcast, the show about modern data management. Legacy CDPs charge you a premium to keep your data in a black box. RudderStack builds your CDP on top of your data warehouse, giving you a more secure and cost effective solution. Plus, it gives you more technical controls so you can fully unlock the power of your customer data. Visit dataengineeringpodcast.com/rudderstack today to take control of your customer data.


00:36

Speaker 2
Your host is Tobias Macey, and today I'm interviewing Paul Blankley and Ryan Janssen about Zenlytic, a no-code business intelligence tool focused on emerging commerce brands. So, Paul, can you start by introducing yourself?


00:49

Speaker 3
I'm Paul, I'm the co-founder and CTO of Zenlytic. I got started doing math and computer science in undergrad, worked for Roche in their math department, then went on to grad school at Harvard, also in machine learning, which is where Ryan and I met.


01:08

Speaker 2
And Ryan. How about yourself?


01:10

Speaker 4
Yeah. Thanks, Tobias. So glad to be here. My background is I was an engineer in undergrad, and I was an engineer just out of undergrad, but I kind of quickly jumped across to the end-user side, I guess. I moved to the UK, became a VC there, and spent a bunch of time in different kinds of analytical roles on the other side of the BI equation; I was a data-consumer type. When I went back to the States, that's when I decided to cross the table. I knew I wanted to be a data practitioner again, and I started that by going back to grad school, which is where Paul and I met, and we've worked on everything together since then. That was the start of a beautiful friendship.


01:51

Speaker 2
Going back to you, Paul, do you remember how you first got started working in data?


01:56

Speaker 3
Yeah, so Ryan and I actually started doing data consulting in grad school. That was everything from due diligence on acquisitions to setting up machine learning fraud models. A big part of what that business became was setting up analytical stacks: going into companies, everything from seed-stage startups to the Fortune 100, and helping them set up what's effectively called the modern data stack. That was part of the genesis of Zenlytic, where we realized that the tools were just not keeping up with the rate of change we saw in AI and the underlying improvements in the data warehouses. I mean, the performance of Snowflake and BigQuery as they've improved over time is just mind-boggling.


02:46

Speaker 4
Yeah, it was quite an exciting place to be, actually. When we were doing that work, there was a genesis of several different things. I mean, first we were seeing the formation of the modern data stack, right? We were seeing those tools start to form up. We were seeing the capabilities of warehouses get much faster. And we were seeing what were, at the time, the great-grandparents of modern large language models; Paul and I were studying computational data science and were working with what were, I guess, the small language models of the day. We were watching all those things evolve at a pretty fast rate, and I guess that ultimately led us to building Zenlytic. Watching that pace was just such an exciting place to be.


03:26

Speaker 2
In terms of what you are building at Zenlytic, can you give a bit of an overview about what it is and some of the story behind how you came to this particular problem as the thing that was worth consuming your time and energy and sleep?


03:41

Speaker 3
Absolutely. I would say the main driver was seeing the changes we were seeing in AI, coupled with the changes and improvements in the modern data stack, but also the abysmal lack of adoption inside the companies we were helping on a consulting basis. It was just really hard for end users of these products to use them. They just wouldn't go in. The usage on all these dashboards and all these BI tools is really bad. It's because no matter how much work you do to configure these tools, set them up, and keep things clean, it's just hard to poke around the interface and actually make self-serve work. Our whole goal as a company is to make self-serve truly possible.


04:28

Speaker 4
We've all been there, banging your head against the wall as an analytics engineer trying to drive adoption of the tools. Right? It's like, why won't people just use these cool data consumption tools? In the end, that's one of the biggest challenges of actually working in the field. I guess our belief, or what we learned in our time doing that, is that a lot of the tools for end users are really built by data nerds, for data nerds. Right? And they're actually fairly sophisticated. A lot of them start with a SQL query, for instance. It's just beyond the capabilities of someone who is more focused on the domain of their job. They don't want to make understanding and using these tools a full-time profession. We just think there's a gap there, and we think part of that has been limitations of the tech.


05:17

Speaker 4
Right. I think that a lot of tools have been as self-serve as they could be given the limitations of the tech. We saw static dashboards at the start because that's as fast as a warehouse could run. The tech has slowly advanced, but, and we're almost talking in weeks now, it's really the capabilities being unlocked by the sophistication of what we're seeing with large language models. As we know, the end state for a lot of BI, when driving that adoption, usually ends with an email to the data team: hey, can you pull this data for me? We know that the end state is actually very conversational; it just happens to be with a data analyst or analytics engineer. Right now we're finally starting to see capabilities with the underlying tech.


06:08

Speaker 4
That interaction can be handled a lot of the time with a large language model. So that gets us really excited.


06:17

Speaker 2
You've mentioned being a consumer of business intelligence tools for various portions of your career. Business intelligence is obviously a very mature and crowded market that has gone through several generational shifts with different areas of focus. There are business intelligence products for every industry vertical as well as horizontals. I'm wondering, what was your process for identifying the specific problem you were focused on solving, and some of the ways you could build a new entrant into this market that would actually stand out and be compelling to the audience you're focused on?


06:57

Speaker 3
It's a great question. I think the big thing we're focusing on is self-serve: that end users are actually able to answer their own questions as opposed to, again, like Ryan said, emailing the data team. The main driver, in my opinion, of what changes the BI market is when the underlying infrastructure changes. You had OLAP cubes way back in the day, when you couldn't really do much else computationally besides OLAP cubes. As technology advances, you have Tableau come in with indexes, and all of a sudden you're able to index this data in memory in Tableau and have a level of interactivity with it, a higher level of self-serve than


07:39

Speaker 4
was possible with OLAP cubes.


07:41

Speaker 3
Then you have Looker and the data warehouses like Snowflake and BigQuery that Looker sits on top of. All of a sudden you're able to explore around, explore from here on a dashboard, and actually slice something by something that's maybe not on the dashboard: a level of interactivity and a level of self-serve that wasn't previously possible with Tableau. The main thing we saw was that the rate of change in AI meant that was going to be the next wave, the next big underlying change that drives capabilities that were previously not possible in the BI stack. That's exactly what we're building and what we're taking advantage of to bring that next level of self-serve.


08:27

Speaker 4
The really important thing is that the goalposts have always moved as to what self-serve actually means, right? The framework we think about is: what can be achieved by someone using data tools who doesn't know SQL or Python, someone who wouldn't be called very technical? As Paul says, we moved from basic dashboards originally to something more dimensional, some exposure to dimensions and metrics that can be accessed in an easy way. Yeah, the next step for us is self-evident with this conversational technology. I think there's an interesting discussion to be had as well around what needs to happen in the data stack to enable the LLMs. That's something we also feel very passionate about, because these LLMs are very capable of hallucinating and making mistakes. My wife and I were just fooling around with ChatGPT the other night, and we were asking it for our bios.


09:25

Speaker 4
It gave a great biography of my wife, and it talked for a whole paragraph about her time at Goldman Sachs and what she did there and her progression, all very articulately. My wife has never worked for Goldman Sachs. That's really funny when you're sitting at home playing around with ChatGPT; it's catastrophic when you're relying on it for mission-critical business reporting. You can't just make stuff up. We actually believe that while the large language models are great, this text-to-SQL approach is not the right approach. We think it's necessary to also really think about the semantic layer as an essential tool for enabling this. It's really the intersection of these models plus the semantic layer that enables this self-serve paradigm.


10:07

Speaker 2
As you've mentioned, what self-serve actually constitutes, and who that "self" happens to be, has been a very progressive exploration of what's possible, what's practical, what's pragmatic, and how much effort a given team is willing to put into making that self-serve aspect possible. That comes from both directions: the engineering effort to constitute the data stack and set up all of the semantic elements so that data exploration is possible, understandable, and intuitive for people who don't necessarily have a technical background, as well as the amount of effort that business users will put in to understand those technical and data semantics, versus just saying, as we've been discussing from a conversational perspective, tell me how my sales are doing. Because that's not specific enough: which sales, where, why, how? Even in this conversational mode, there's a level of nuance and background that's necessary for somebody to make effective use of it.


11:10

Speaker 2
I'm wondering, before we get too much into the semantic modeling and the conversational UI, what are the elements coming together now that make you think you're actually going to be able to deliver on self-serve as everybody thinks it's supposed to mean, versus what we've mutated it to mean because of the limitations of what we could actually do?


11:31

Speaker 3
Absolutely. I think the biggest thing there is actually the conversational component itself, because of exactly what you said. Right? It's like, how are my sales doing? The right response there isn't actually a plot with some arbitrarily chosen revenue metric. It's: what do you mean? You've got gross sales, net sales, and then net profit. Which sales are you talking about? The person might say, oh, well, I mean net sales, and then, okay, now you can show them the plot. It's being able to ask those follow-up questions and push the end user toward what they're actually trying to get to. An example that we would see is someone asking, what's my best marketing channel? That's sort of a loaded question. You've got to ask clarifying questions and help the end user who's asking the question articulate what it is they actually want.


12:23

Speaker 3
Because as a person who's received a lot of these emails asking about data, that is part of what you're doing. Someone's asking about churn, and you're able to say, do you mean customers we've completely lost, or just subscriber churn? Being able to ask those clarifying questions is a really big part of making self-serve possible, because there's never a one-shot answer.
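Paul's clarifying-question flow can be sketched mechanically: when a user's term maps to more than one defined metric, the system should ask rather than guess. Here is a minimal sketch in Python; the metric names and the `resolve` helper are invented for illustration and are not Zenlytic's actual implementation.

```python
# Toy disambiguation step: if a user's term matches several defined
# metrics, return a clarifying question instead of guessing.
# All metric names here are invented for illustration.
METRICS = {
    "gross_sales": "sales",
    "net_sales": "sales",
    "net_profit": "sales",
    "customer_churn": "churn",
    "subscriber_churn": "churn",
    "order_count": "orders",
}

def resolve(term: str) -> dict:
    """Map a colloquial term to a metric, or to a clarifying question."""
    matches = [m for m, topic in METRICS.items() if topic == term]
    if len(matches) == 1:
        return {"metric": matches[0]}       # unambiguous: answer directly
    if matches:                             # ambiguous: ask, don't guess
        return {"clarify": f"Which do you mean: {', '.join(matches)}?"}
    return {"clarify": f"I don't have a metric for '{term}'."}

print(resolve("churn"))
# → {'clarify': 'Which do you mean: customer_churn, subscriber_churn?'}
```

In a real system the LLM would phrase the question conversationally, but the decision to ask at all comes from the ambiguity recorded in the semantic layer.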


12:45

Speaker 4
And both sides exist for that too. Right? Even if you give it the right definition of churn, maybe you didn't specify you wanted it weekly instead of monthly, or whatever. Right? Some of the earliest versions of Zenlytic, when we were still experimenting with the tech, before ChatGPT was a thing, before GPT was a thing at all, started off with a single question, which is a good start. But as we all know as data practitioners, when is the last time someone asked you for a quick data pull and it ended in a single turn? Almost never. That's why we're so bullish on chat, actually: the ability to both understand that nuance and context and also prompt and clarify on both sides to get to the right answer is very powerful.


13:29

Speaker 2
Digging more into that semantic modeling and the upfront investment required to actually enable something like a large language model to operate on top of the business context and the domain objects needed for that data exploration: what are the upfront investments that have to be made before Zenlytic can actually be used effectively?


13:57

Speaker 3
The setup is going to be easier than Looker, but similar in nature. Right now, you basically set up views that sit on top of tables, and you can define metrics. One of the key differences between us and Looker is that we don't have Explores. You just define primary and foreign keys, and we handle all of the joins. That makes the setup easier. The really exciting thing is that we have a recent beta feature that lets you have this semantic layer setup also done by GPT, where you basically say, hey, these are the tables I want to include, go and give me a first pass. It's a superpowered copilot for setting up these semantic layers, because everyone who's done this before knows they're a pain to set up and a pain to maintain. Most of that's just because you're writing this boilerplate LookML or dbt metrics or metrics in YAML definitions or whatever it is, and it's a pain.
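The setup Paul describes, views plus declared primary and foreign keys instead of hand-built Explores, can be illustrated with a toy join derivation. All view, table, and column names below are invented for illustration; Zenlytic's real definition format and join logic will differ.

```python
# A toy semantic layer: each view declares its table, metrics, and keys.
# Because keys are declared once, a join can be derived instead of
# hand-written per Explore.
views = {
    "orders": {
        "table": "analytics.orders",
        "primary_key": "order_id",
        "foreign_keys": {"customer_id": "customers"},  # column -> referenced view
        "metrics": {"net_revenue": "SUM(orders.revenue - orders.discounts)"},
    },
    "customers": {
        "table": "analytics.customers",
        "primary_key": "customer_id",
        "foreign_keys": {},
        "metrics": {},
    },
}

def join_clause(from_view: str, to_view: str) -> str:
    """Derive a JOIN from the declared keys."""
    for column, target in views[from_view]["foreign_keys"].items():
        if target == to_view:
            pk = views[to_view]["primary_key"]
            return (f"LEFT JOIN {views[to_view]['table']} AS {to_view} "
                    f"ON {from_view}.{column} = {to_view}.{pk}")
    raise ValueError(f"No declared key path from {from_view} to {to_view}")

print(join_clause("orders", "customers"))
# → LEFT JOIN analytics.customers AS customers ON orders.customer_id = customers.customer_id
```

The point of the pattern is that the join logic lives in one place, keyed off the declared relationships, rather than being duplicated across consumption-layer definitions.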


14:55

Speaker 3
We have some AI tooling around making that less of a pain.


15:00

Speaker 2
Digging more into Zenlytic itself, can you talk a bit about some of the implementation details, the architecture you've had to build, and some of the internal data modeling you've had to do to flexibly map these different domain objects into the large language model space? And that translation layer to move between the very structured, hierarchical, data-driven aspect of what we're actually trying to dig into and the very messy, confused, adaptable, constantly evolving human language space?


15:36

Speaker 3
Totally. It's a great question. The main thing there is having an effective way to serve the context of the semantic layer to the model. Think about what each side brings to the equation: the model brings comprehension; it understands what the user is asking about very well. The semantic layer brings correctness. If you ask for net revenue by marketing channel, it will never give you a wrong answer. The intersection of those is actually giving the model the right context, where the model has the ability to look at the semantic layer, and if someone asks for total net revenue over time, it knows about the metric net revenue. It also knows the right date field to choose to trend net revenue over time, and it knows how to apply the filter "last month" to that date field to make sure the query actually gets executed correctly.


16:27

Speaker 3
Part of what the semantic layer needs to encode is those core concepts that people speak about colloquially but that have to mean explicit things in the data warehouse. Time is one of the most complicated ones, and one we've invested a lot in making sure we handle correctly.
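As one small example of pinning down colloquial time, resolving the filter "last month" to explicit date bounds might look like the following. This is a generic sketch, not Zenlytic's code; the `last_month_range` helper is invented for illustration.

```python
# Resolving the colloquial filter "last month" to explicit, unambiguous
# date bounds: the kind of mapping a semantic layer has to make exact
# before a query can run against the warehouse.
import datetime as dt

def last_month_range(today: dt.date) -> tuple[dt.date, dt.date]:
    first_of_this_month = today.replace(day=1)
    last_of_prev = first_of_this_month - dt.timedelta(days=1)  # end of last month
    first_of_prev = last_of_prev.replace(day=1)                # start of last month
    return first_of_prev, last_of_prev

start, end = last_month_range(dt.date(2024, 1, 31))
print(start, end)  # → 2023-12-01 2023-12-31
```

In a warehouse query these bounds would become a `WHERE date_field BETWEEN start AND end` filter on whichever date field the semantic layer names as the trending field for that metric.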


16:44

Speaker 4
One important thing to discuss here, too, is that the important part of the semantic layer design is getting to the right primitives, the primitives for actual end-user consumption, right? Essentially, those semantic layers are a translation between data terms and user-understandable business terms. If you get those primitives right, then it becomes much easier for the language models to understand them as well. Those are the stakes in the ground where a lot of the magic happens, I think. Collectively, as data folks, we've gotten pretty close to discovering what those right primitives look like, and they're fairly consistent across tools: metrics and dimensions and filters. Those primitives are flexible enough to handle the necessary cases but also understandable to the end user, and powerful. So it's a good combination.


17:38

Speaker 4
I guess one area where our approach differs slightly is that we tend to avoid the use of an Explore, or a data mart, or some end-consumption table. In those cases, if there's a separate sales metric in three different data marts, that can be confusing. Right? Five data marts that have sales in them is confusing even if you're a human trying to use the semantic layer. One thing we endeavor to do is abstract that part out of the consumption layer. You don't need to choose a data mart or some set of tables or whatever; you just need to deal with your own metrics, basically. I think that's one of the things we as data people have been exploring and flirting with for a while.


18:25

Speaker 4
It's like we're finally starting to put that behind the scenes, basically, so that the end users don't need to focus on it.


18:33

Speaker 2
As far as the evolution of your work and the corresponding evolution of the supporting technologies, I'm wondering how your understanding and implementation of semantic layers has evolved, and some of the necessary substrate you've had to put in to have an effective representation of those domain semantics, so that users can create their own mappings and their own business objects, but in a way that is maintainable from your side without being a spaghetti mess.


19:10

Speaker 3
I think the first part of that is not having a concept like Explores, because having a concept that inherently makes you duplicate joins and such leads to duplicated code and a semantic layer that's just hard to maintain overall. Not having that concept is super helpful. On the iterative side, there have for sure been things we've realized as we've progressed, like, hey, there's a real need for people to be able to do X. A good example: a lot of the time you have metrics that span multiple tables that cannot be joined together. You have to basically run those queries, merge the results at the end, and then create some metric based on the merging of those results. That's something we built into the semantic layer, the ability to do that.


19:59

Speaker 3
From the end user's perspective, they can just look at number of sessions and inventory over time, even though there's no way to join those tables together. You basically just have to aggregate them up by date and merge the results at the end. A feature like that is something we initially weren't sure we would need to include in the semantic layer, but there's a huge amount of demand for that ability, and for making it as easy as a click instead of figuring out a whole merge-results kind of interface.
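The merge-results pattern Paul describes, aggregating two unjoinable tables to a common grain and combining them by date, can be sketched in plain Python. The table contents below are invented for illustration; a real implementation would push the two aggregations down to the warehouse as separate queries.

```python
# Two tables with no shared join key (e.g. web sessions and inventory)
# rolled up to the same date grain, then merged on date.
from collections import defaultdict

sessions = [
    {"date": "2024-01-01", "sessions": 120},
    {"date": "2024-01-01", "sessions": 80},
    {"date": "2024-01-02", "sessions": 150},
]
inventory = [
    {"date": "2024-01-01", "units": 40},
    {"date": "2024-01-02", "units": 35},
    {"date": "2024-01-02", "units": 10},
]

def aggregate_by_date(rows, field):
    """Roll a table up to one summed value per date."""
    totals = defaultdict(int)
    for row in rows:
        totals[row["date"]] += row[field]
    return dict(totals)

# Run each "query" independently, then merge the results on date.
sess = aggregate_by_date(sessions, "sessions")
inv = aggregate_by_date(inventory, "units")
merged = {d: {"sessions": sess[d], "units": inv.get(d, 0)} for d in sess}
print(merged)
# → {'2024-01-01': {'sessions': 200, 'units': 40},
#    '2024-01-02': {'sessions': 150, 'units': 45}}
```

Once merged, a derived metric (say, sessions per unit of inventory) can be computed on the combined rows even though no SQL join between the source tables exists.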


20:32

Speaker 4
The other big philosophy, and it's obvious, but I don't think we do it all the time as data folks, is just DRYness. We endeavor to use software engineering best principles, and in most cases we do, but spaghetti DAGs all have one thing in common: they generally get pretty WET, right? There's lots of repeated code, and there are things inheriting in multiple places. That's a hard problem for data folks, because it's a lot harder to make a DAG DRY than an application. I think the tools are still evolving in that direction. That's something we've always kept a pin in: we want this to be as DRY as possible. That's been a North Star for us in the design of everything, and I think it simplifies things a lot.


21:23

Speaker 4
I think there's still a long way to go for us and for all tools to really achieve that, but as a guiding principle for simplifying the spaghetti, it's probably a good first step.


21:32

Speaker 2
The other interesting element of this space is the rapid evolution of large language models. As you mentioned, when you first started this, GPT was either not a thing yet or just barely becoming a thing. Now we're on to ChatGPT 4.5, I think, and obviously there have been exponential leaps and bounds in terms of their capabilities, as well as their capacity to, as you said, hallucinate in quite detailed fashion, and sometimes quite convincingly. I'm wondering how that has impacted your overall product approach, and some of the additional testing and validation you've had to do as these language models have become more sophisticated and potentially more problematic.


22:17

Speaker 3
Totally. I think that's one of the biggest advantages we get from the semantic layer, actually, because I would be so worried if we had GPT-4 generating SQL statements; it could be wrong in such sophisticated, impossible-to-detect ways. Since we're referencing a semantic layer, if the model just comes up with a metric that doesn't exist, the semantic layer can say, hey, that doesn't exist; error, you asked about something that doesn't exist. That hallucination just ends up being a simple error, as opposed to something that's catastrophically bad for the business.
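The point about hallucinations degrading into simple errors comes down to validating model output against the semantic layer before any SQL runs. A minimal sketch, with invented metric names and a hypothetical `validate_request` helper:

```python
# Validate a model-proposed query against the semantic layer, so a
# hallucinated metric becomes a visible error rather than plausible
# but wrong SQL. Metric names are invented for illustration.
KNOWN_METRICS = {"net_revenue", "gross_revenue", "order_count"}

def validate_request(requested_metrics):
    """Reject any metric the semantic layer doesn't define."""
    unknown = [m for m in requested_metrics if m not in KNOWN_METRICS]
    if unknown:
        raise ValueError(f"Unknown metric(s): {', '.join(unknown)}")
    return requested_metrics

validate_request(["net_revenue"])          # defined, so this passes
try:
    validate_request(["net_revenue_v2"])   # hallucinated name
except ValueError as e:
    print(e)                               # → Unknown metric(s): net_revenue_v2
```

The failure mode is the key design choice: a raw text-to-SQL system would happily generate a query for `net_revenue_v2` against some plausible-looking column, while this check turns the same hallucination into an error the system can surface or retry on.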


22:52

Speaker 4
Yeah. I would say that our approach here, especially since things are changing so fast and no one knows where they're going to land, and we're talking day by day, is plan for the worst and hope for the best. I will say that we're building for current tech, so nothing in Zenlytic requires an increase in GPT's comprehension capabilities. GPT-5 would be nice, but it's built to work with GPT-4 and stuff that's already been published. The thing that gets me excited is: even if they turned off the development of all large language model technology today, if they did more than a six-month pause, a six-year pause, even then, as a software organization, I feel like Copilot and GPT have made us as a team... what would you say, Paul?


23:43

Speaker 4
At least 25% more productive, right? And that's just tools that are in the market. There are real-world use cases happening right now, and we're building real use cases on the real-world tech. If things get better, faster, smarter, that's really icing on the cake for us.


24:01

Speaker 3
The other thing I'd add is that it's enabled features that I personally would not have even dreamed of being possible. Like being able to just point at a bunch of tables, with no actual keys, and figure out: okay, what are reasonable joins, what are reasonable metrics, what makes sense here? Just being able to have a really good first pass. I mean, it's never perfect, but it's like having developer tools that do most of your work, and you just tweak a few things. I never would have even thought that was possible. The capabilities of the underlying LLMs are just so strong that we're able to build features that I wouldn't have even dreamed of before ChatGPT came out.


24:47

Speaker 4
There's obviously a lot of hype right now, but I think people are still underappreciating the second-order effects of this tech. In traditional software, for instance, you see people tweeting all the time: yeah, I'm an indie hacker, I've always wanted to build these different tools, but I couldn't quite do it with my capabilities. With GPT, what would have been impossible for me before, now I can do in a morning. It's unlocking all sorts of opportunities for innovation elsewhere, just by being such a good coder, for instance. I think we're going to see a lot of similar second-order effects when you can make that really fast build-measure-learn loop possible. I'm just excited about that in general.


25:31

Speaker 2
In terms of actually onboarding onto Zenlytic, I'm wondering if you can talk through the expectations you have on the customer side. Do they have a data warehouse? Are they brand new with no idea what they're doing? What are the technical and system capabilities you're expecting, and what is the process of getting somebody onboarded and integrated with Zenlytic so they can start actually exploring their data and asking questions?


26:00

Speaker 3
All you basically need are credentials for a data warehouse. We do rely on one, and we support all the major data warehouses. If you come with credentials for a data warehouse, you can basically put those credentials in, click this AI deploy button we've been talking about, make a few tweaks to make sure things are customized how you need them to be, and you're off to the races. So the implementation is comparatively quite easy. You just come with database credentials.


26:29

Speaker 2
In terms of database credentials, are you largely assuming that somebody has just a single application, this is their line-of-business product, and you just want to tap into their application database, which is all denormalized and transactional? Or are you expecting they've got everything in their data warehouse? Or is it just, yes?


26:50

Speaker 3
We're definitely expecting a data warehouse. You can have data from all these sources loaded in there, but anything that's not in that one data warehouse is not going to be joinable with the other stuff that's in a different area. If you wanted to do something like that, you'd have to use some distributed query system, like a Starburst or something like that, which is not something we are building.


27:14

Speaker 4
I will say it's cool, though. One of the things we haven't really talked about, and I keep coming back to LLMs because I'm so excited about them: one of the neat things is that with a lot of the text-to-SQL stuff we're seeing, if you read between the lines, it's a toy example, right? It's always a couple of tables, usually one, sometimes a couple, that are very small both in rows and columns. The semantic layer makes it possible to actually make queries across an entire data warehouse of arbitrary size. It doesn't matter how complex the tables are, how much data is in there, or how many metrics you have; those are all defined in advance, and you can actually have bulletproof confidence in joining and manipulating that data using the semantic layer. That's why we're warehouse-centric, so we don't do any


27:57

Speaker 4
sort of application-layer direct connectivity.


28:00

Speaker 2
Once somebody is set up and using Zenlytic for data exploration, I'm wondering if you can talk through some of the typical workflows, and in particular some of the collaboration aspects of working from data teams to operational teams, across operational teams, and within operational teams, and some of the ways that data exploration can be made visible across those different roles.


28:26

Speaker 3
I think the biggest collaboration opportunity actually isn't in our interface, or in anyone's interface at all. It's where the team already is. For a lot of teams, that's Slack, so we have a deep integration with Slack where you can just be in a channel, mention Zenlytic, and ask, how has this campaign been doing in the last week? Boom, you get an answer in the thread right there. One of the things we've focused on a lot in collaboration is bringing the data to where people already are, because a person who's in the flow, trying to run a new campaign or send out a new email newsletter or something, doesn't want to go log into some other tool, no matter how easy it is to use. They want to stay right where they are and just get the answer they need.


29:13

Speaker 3
That's one of the big things we focus on collaboration wise, taking data to where people already are.


29:19

Speaker 4
This is a marketing story. I remember a friend of mine who was a marketer said that a lot of bad marketers assume people wake up and say, oh man, I wonder what Coca-Cola is doing today. In a way, we're kind of collectively guilty of that with BI tools as well. We assume that people are enthusiastic about logging in, configuring, and finding the right stuff in the tool, when in fact they're incredibly distracted with the day-to-day and they just want to get their data. Right? A big part of that collaboration is going to where they are, and where they are happens to be the place where they all are, so they can collaborate together there.


29:55

Speaker 2
In terms of your engineering effort building Zenlytic, I'm wondering if you can talk through the proportional effort of dealing with large language models and how they get deployed into your infrastructure, versus managing the semantic layers, exposing them to users, and dealing with some of the edge cases around that, versus the UI and chat interface. What are the areas that have taken the most time versus your expectations going into this?


30:26

Speaker 3
I was going to say the most surprising one was how fast we were able to deploy a lot of the large language model stuff, especially once you figure out the tooling and become comfortable with writing code in a way that works well with large language models, which is a different skill than writing more deterministic programs. Once you become familiar with that, you can push new features and improvements really fast. That's been surprising to me, just how fast that process has been.


30:56

Speaker 4
Yeah, I would say the green light is deploying language tech. The yellow light is the semantic layer tech, which is led by Paul and our team. A big part of that is Paul going mad scientist and locking himself in a room for a week and making a big adjustment to this very sophisticated semantic layering tech. But that's the yellow light. I think the hardest part, the red light, is probably the user interface design for the BI tool. We haven't really talked about the fact that Zenlytic is also a fully featured BI tool outside of the chat. Just building a BI tool in general takes a tremendous amount of thought and iteration on the UI side. You're taking a very complex thing with a ton of boundary conditions, and you're taking a very murky use case with a ton of boundary conditions, and you're kind of the glue that holds those together.


31:45

Speaker 4
You have to be simple enough so that the end user will understand, as per our self-serve pitch. So UI design is difficult in general. We also have the added difficulty of how that meshes with how people will use BI next, how that meshes with the chat, and how you make it easy to jump back and forth between the chat and the rest of the tool. It's kind of challenges on challenges. I'd say in the development of Zenlytic, we've probably spent more time on making sure that is really elegant and easy to use than just about anything else. Is that fair, Paul?


32:17

Speaker 3
Yeah. It's just so hard to make an interface take something that's really complex, like a large data warehouse with a bunch of complicated metrics, and make that as palatable and as easy as possible, where, if you're a non technical user, you can just click a few things and it works how you expect it to. It's shockingly hard to make it just work, basically.


32:40

Speaker 4
Actually, when I was investing as a VC, I come from an era when mobile was the big thing and companies were winning and losing based primarily on how usable their interface was. Right. It was the same thing: we were inventing new designs for mobile, it had to be very simple and straightforward, and that was actually an axis of competition. We've always tried to carry that design ethos and keep things simple and understandable at Zenlytic. It's also just a million times harder because you're dealing with such an inherently complex thing like data. It's been an interesting but very fun challenge.


33:12

Speaker 2
In terms of that kind of business intelligence user interaction design, being able to have that chat interface so that it's exploratory, but also the more structured, here's a dashboard, here's a set of charts, here's a way to actually dig deeper once you've gotten to a starting point from the conversational aspect. Particularly given the focus on self serve as the default operational mode, what are some of the context clues and guardrails that you've built in to be able to guide people into the pit of success, as it were, in their process of exploring data? Where it's, tell me all about my sales in North Dakota, and then saying, okay, well, what do you mean by sales? Do you mean gross revenue? Do you mean adjusted revenue? Do you mean total units sold? Once you get to some visualization, giving them the context clues, and then, particularly given that conversational lead in, are there ways that you're able to pull out the useful semantic elements?


34:15

Speaker 2
From that conversation to then add those as labels or context clues within the graph to be able to say, okay, based on what you're asking, these are the axes that we're going to present to you. These are the kind of grains that we're going to use for you to be able to dig deeper.


34:28

Speaker 3
Yeah, I think that's one of the big things, and one feature we have is the summarization. As you go on asking questions, you'll see both the plot and a summary of what's actually going on, focused on the thing you asked about specifically. If you ask about a spike over the holidays in traffic or something like that, then that might show you the line with the spike, and it would also draw your attention: hey, this is the spike, this is 75% higher than the previous day, or something like that. And it helps actually make that understandable. The other thing it does is carry that information in the conversation. As you keep asking questions, like you were mentioning, if you reference something that came up earlier, it's still able to be aware of that and to use it as it answers your subsequent questions.
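The summarization feature described here, surfacing a spike and quantifying it against the previous day, can be sketched in a few lines. This is a hypothetical illustration, not Zenlytic's implementation; the function name and data shape are invented for the example.

```python
# Hypothetical sketch of "summarize the spike": given a daily series,
# find the largest day-over-day increase and phrase it as a callout.
def summarize_spike(series):
    """series: list of (label, value) pairs in chronological order.

    Returns a sentence describing the biggest day-over-day percentage jump.
    """
    best = None
    for (_, prev), (label, curr) in zip(series, series[1:]):
        if prev <= 0:
            continue  # skip zero/negative baselines to avoid division errors
        pct = (curr - prev) / prev * 100
        if best is None or pct > best[0]:
            best = (pct, label)
    if best is None:
        return "No comparable days found."
    pct, label = best
    return f"{label} is {pct:.0f}% higher than the previous day."

daily_traffic = [("Dec 23", 400), ("Dec 24", 420), ("Dec 25", 735)]
print(summarize_spike(daily_traffic))  # Dec 25 is 75% higher than the previous day.
```

A real system would also carry the detected spike forward as conversational context, so follow-up questions can refer back to it.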


35:23

Speaker 4
A couple of neat things, two quick things that came out of our user testing throughout the design process. What Paul's alluding to, I think, is that the right approach is: concise in the inputs, verbose in the outputs. That's the way I think about it. We found that people like to see more on the way out. You don't want people to be lost, basically, right? Over communicating is generally good on the feedback side, and on the input side, people tend to be, again, busy and distracted, so you want to make that as simple and as clean as possible. One great example: early in our development, we realized people love to pore over their visualizations. It's funny, because this is one of those cases where the limitations of tools have shaped the interaction. Every BI tool has a visualization library with its own way of handling clicks and such, and that rarely gets translated into very dynamic, interactive plots.


36:22

Speaker 4
We found, just watching people, that they click on stuff way more than you'd expect. Our way to address that was with lots and lots of context menus, very smart context menus, where people click on a bar and, of course, there's a drill option, or things like filter here. We've even gotten to the point where we've seen people take a line chart and drag over it to see what happens. We pop up a special context menu when that happens, saying, do you want to zoom in here? Do you want to explain this particular change? We make sure all of the steps there are highly contextual. Whether you're doing it in a GUI or in a chat, it's treated like a conversation that guides you to the next most logical question. You try to anticipate the next steps in that conversation.


37:04

Speaker 2
In terms of your experience of building Zenlytic, working with your customers, using it internally to build Zenlytic itself, what are some of the most interesting or innovative or unexpected ways that you've seen it used?


37:18

Speaker 3
I think one of the most unexpected ones for me was when one of our customers was using the Explain the Change functionality. Usually we've seen people use it like: oh, there's a spike, why did that thing spike? There's a dip, why did that thing dip? They were actually using it on a flat, no real change, week over week sales chart. The reason was that they wanted to see the breakout inside of even a week or a month where there's no real major change in the top line number. There are reps that did really well, reps that did poorly, reps that didn't do much, all these changes within the reps. They were able to zoom in and see, okay, these customers for this rep aren't actually that healthy, despite the top line number not looking like it moved very much.


38:04

Speaker 3
That was offset by this other new customer we acquired that went to this rep that did really well. They were able to find these drivers of how they can actually improve sales performance and sales efficiency for their reps even when there's not a major change month to month in their sales numbers. So that was surprising for me. I did not anticipate people using it on flat charts.
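The idea above, that a flat top line can hide large offsetting movements per rep, comes down to decomposing a total change into per-key contributions. Here is a toy sketch of that decomposition; the rep names and function signature are made up for illustration.

```python
# Toy "explain change" decomposition: the top-line delta can be near zero
# while individual components move a lot in offsetting directions.
def explain_change(before, after):
    """Break a top-line change into per-key contributions, largest first.

    before/after: dicts mapping a key (e.g. rep name) to a metric value.
    """
    keys = set(before) | set(after)
    deltas = {k: after.get(k, 0) - before.get(k, 0) for k in keys}
    total = sum(deltas.values())
    breakdown = sorted(deltas.items(), key=lambda kv: abs(kv[1]), reverse=True)
    return total, breakdown

last_week = {"rep_a": 100, "rep_b": 80, "rep_c": 60}
this_week = {"rep_a": 55, "rep_b": 85, "rep_c": 100}

total, breakdown = explain_change(last_week, this_week)
print(total)      # 0 -- the top line looks flat...
print(breakdown)  # ...but rep_a dropped 45 while rep_c gained 40
```

Sorting by absolute delta is what surfaces the "stuff happening under the surface" even when the aggregate barely moves.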


38:27

Speaker 2
Anything else to add?


38:30

Speaker 4
I remember that example. They used the word zooming in. It's like in Biology 201 or whatever, when they give you a microscope and you look at the little slide of water and there are all these little paramecia, and you're like, whoa, there's all sorts of stuff happening just under the surface. Even though we call it Explain Change, it's funny to see it used in a flat context.


38:50

Speaker 2
Given the fact that you're using these large language models as an operational component, what are some of the most interesting or unexpected kind of edge cases or weird behaviors that you've seen or had to kind of retune around?


39:07

Speaker 3
Yeah, I'd say one of the big things is just how much the direction matters when you're actually doing the prompt engineering to get these things to work, and then how much work you have to spend to handle the hallucinations, to handle the weird scenarios where it goes off the rails or doesn't quite do what you expect it to do. Going through and being able to handle all of those is a different type of programming, kind of. That's been really interesting, because it will go off the rails sometimes and just make stuff up, format things in ways where you're like, where did you get this from? So handling those is tricky.
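One common guardrail for the hallucination problem described here is to never execute what the model proposes directly, but first validate every field it references against the semantic layer. This is a hypothetical sketch under assumed names (the layer contents, the query dict shape, and `validate_query` are all invented), not Zenlytic's actual code.

```python
# A minimal hallucination guard: check an LLM-proposed query against the
# semantic layer's known metrics and dimensions before running anything.
SEMANTIC_LAYER = {
    "metrics": {"gross_revenue", "units_sold"},
    "dimensions": {"order_date", "region", "channel"},
}

def validate_query(proposed):
    """Return (ok, unknown_fields) for an LLM-proposed query dict."""
    unknown = [m for m in proposed.get("metrics", []) if m not in SEMANTIC_LAYER["metrics"]]
    unknown += [d for d in proposed.get("dimensions", []) if d not in SEMANTIC_LAYER["dimensions"]]
    return (not unknown, unknown)

# The model hallucinated a metric that does not exist in the layer:
ok, unknown = validate_query({"metrics": ["net_profit"], "dimensions": ["region"]})
print(ok, unknown)  # False ['net_profit']
```

When validation fails, the unknown field names can be fed back into the prompt as a correction, or used to trigger a clarifying question to the user.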


39:54

Speaker 4
One of the most interesting things about working with LLMs is you have to throw away a lot of what you know about conventional programming. I'll give you one great example of that. Obviously a big part of BI is search-like functionality. In our early prototypes and experiments, the search worked okay until we realized you have to move away from literal search to semantic search. Instead of searching for a dimension name, you have to search for "sliced by this dimension." Doing that actually gives you a meaningful representation of the way that the LLM thinks. There's a whole new paradigm in dealing with these things that kind of defies a lot of conventional logic when it comes to programming.
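The indexing trick described here, storing "sliced by &lt;dimension&gt;" rather than the bare dimension name so the index matches how the model phrases requests, can be shown with a toy example. A real system would use an embedding model for similarity; this sketch substitutes a bag-of-words cosine similarity purely to illustrate the idea, and all names are invented.

```python
# Toy semantic-search sketch: index the phrase "sliced by <dimension>"
# instead of the raw dimension name, then match queries by similarity.
import math
from collections import Counter

def vectorize(text):
    # Stand-in for a real embedding model: a simple bag-of-words vector.
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

dimensions = ["region", "channel", "order date"]
# Index the phrase, not the bare name:
index = {d: vectorize(f"sliced by {d}") for d in dimensions}

def best_dimension(query):
    q = vectorize(query)
    return max(index, key=lambda d: cosine(q, index[d]))

print(best_dimension("show revenue sliced by region"))  # region
```

Swapping `vectorize` for real embeddings is what makes this robust to paraphrases like "split this up by sales territory."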


40:41

Speaker 2
In your own experience of building up Zenlytic, growing the business, working with customers, exploring the problem space, and kind of understanding the art of the possible versus what is actually never going to happen, what are some of the most interesting or unexpected or challenging lessons you've learned?


40:58

Speaker 3
Personally, I would say it's a slog to build a BI tool. There are so many small features, permissioning, all the roles, just so many features to build. So it's a lot. It's a slog.


41:15

Speaker 4
There's an old joke in VC: don't build an ERP tool. It's like building a second railroad next to an existing railroad. There's so much that goes into it. I would actually extend BI to that category as well. Some days I get jealous of people who are just building CRUD apps where they have a single, well structured table that they're reading and writing to. A BI tool is a whole other beast in that regard. There are a lot of different edge cases and a lot of considerations, and they're all very closely related. Again, same thing: you want to change a feature for reasons of interface design, that's good, but how is that going to impact governance? How does that impact the composability of the semantic layer? Everything links together. It's a challenging build, but it's really rewarding when it comes together.


42:02

Speaker 2
For people who are looking for that Holy Grail of self serve business intelligence, what are the cases where Zenlytic is the wrong choice?


42:12

Speaker 3
I feel like the first one is if you're fine with the off the shelf tools, if those get the job done, if you're not feeling that much pain. For instance, if you have a Shopify dashboard and Google Analytics, and that's getting the job done for you or close enough, then there's no reason to pay for extra tools. There's no reason to use us, basically. The other one I'd add is if you're writing Python, if you're training ML models, if you're using Pandas for this kind of stuff, we're also not the tool for that. You should use Jupyter, Hex, or one of the notebook solutions out there.


42:50

Speaker 2
As you continue to iterate on, grow, and evolve Zenlytic, what are some of the things you have planned for the near to medium term, particular projects that you're excited to dig into, or evolutions in the LLM and semantic layer space that you're keeping an eye on?


43:09

Speaker 3
I think one of the most exciting ones for me is more and more tooling around making management of the semantic layer super easy. I think that's one of the biggest gaps right now. That's one area where we'll be pushing a lot of features, and one thing I'm super excited about.


43:26

Speaker 4
When I have chats with other data practitioners, I often find myself going back to what I'd say is the biggest problem in the modern data stack today, which is that it's legitimately more work to maintain and build these tools. The rise of analytics engineering as a profession is because of that added complexity. There are tremendous benefits, right? You get powerful customization and flexibility, you get everything exactly the way the business needs it. But there are also trade offs, and with those trade offs there are actually rationales for those off the shelf tools that Paul alluded to. There is a time and place for those. The modern data stack does not dominate them, because they're deployable in a few clicks and you can have decent analytics with less investment. I have a hunch about where this goes: over the past few years we've built up all this incredible power and complexity.


44:24

Speaker 4
I think the next step for the modern data stack is that we're going to start to invert that pyramid and find ways to streamline this, to make it easier to maintain, easier to deploy, easier to build. I don't exactly know what that looks like. Maybe it's more use of templates, maybe it's better use of AI to automate what can be automated in the construction of a data configuration. There are lots of ways we can go with it, but going forward, the watershed moment is when we increase the accessibility of these tools as well.


44:55

Speaker 2
Are there any other aspects of the work that you're doing at Zenlytic? This overall space of self service, business intelligence, the impact and opportunities for semantic layer improvements, the use of large language models, and kind of AI more generally in the path of data generation and consumption that we didn't discuss yet that you'd like to cover before we close out the show?


45:19

Speaker 3
The only thing I'd say is that it's really the intersection of LLMs and the semantic layer that makes this possible. It doesn't get done with text to SQL. The hallucination problem is just too much. Even if it's not hallucinating, these definitions vary from company to company. And the semantic layer by itself is just too complicated for end users, for business users, to navigate. You really need the combination of LLMs and the semantic layer to make self serve truly possible.


45:49

Speaker 2
All right, well, for anybody who wants to get in touch with you and follow along with the work that you're doing, I'll have you add your preferred contact information to the show notes. As the final question, I'd like to get your perspectives on what you see as being the biggest gap in the tooling or technology that's available for data management today.


46:05

Speaker 3
I would wholeheartedly agree with what Ryan was saying about just how complicated it is to manage a semantic layer. I think we're still in the early innings of finding out what good looks like there. But I completely agree with Ryan. It's one of the biggest problems overall in managing data.


46:24

Speaker 4
Yeah, I'll revisit that and say I'd call that the most important problem in the modern data stack. I think there are a lot of really talented people working to address it right now as well, so I think it'll be solved quickly. The second thing I think is important goes back to usability and self serve. We all know Looker coined the term "data breadlines." That's still a very real pain. We know that end users want to be more data driven in the way that they think and act, and I feel like they have not been given due consideration in the modern data stack yet. A lot of tools have been built by data people for data people. That's also why we have all sorts of observability tools, and that's a very important problem. But for some reason we haven't really focused on the needs of the people who are going to use this data to improve the way that they do their jobs.


47:16

Speaker 4
So I think addressing self serve is important. Right now, if you're a non technical end user, your choices are to spend a bunch of time digging around in a spreadsheet, which is time consuming, very brittle, and very inconsistent; or to ask a data team, which often means a bunch of iterations that take so long your question is in the rearview mirror and no longer actionable; or the third option, which is probably the most common one: you just don't use data. You go finger in the air and say, yeah, I think it's probably this. If you're doing that, you're missing out on tremendous opportunities to be better at your job. I'd say that's the second most important problem: making sure that self serve users, the end users, actually get access to the data and analytics they need.


48:07

Speaker 2
Well, thank you both very much for taking the time today to join me and share the work that you're doing at Zenlytic. It's definitely a very interesting product and problem space. It's great to see some of the ways that AI is coming full circle, where it used to be the thing that you did at the end, and now it's actually part of the beginning of the work. It's exciting to see how that evolves. I appreciate the time and energy you're putting into it, and I hope you enjoy the rest of your day.


48:30

Speaker 3
Awesome. Thanks for that.


48:38

Speaker 1
Thank you for listening. Don't forget to check out our other shows: Podcast.__init__, which covers the Python language, its community, and the innovative ways it is being used, and The Machine Learning Podcast, which helps you go from idea to production with machine learning. Visit the site at dataengineeringpodcast.com to subscribe to the show, sign up for the mailing list, and read the show notes. If you've learned something or tried out a project from the show, then tell us about it. Email hosts@dataengineeringpodcast.com with your story, and to help other people find the show, please leave a review on Apple Podcasts and tell your friends and coworkers.

Want to see how Zenlytic can make sense of all of your data?

Sign up below for a demo.

get a demo
