speaker-0 (00:07) Welcome back to Adventures in DevOps. Every episode is a deep dive into a specific topic with an expert guest. Normally this isn't a show where we ask for feedback about using specific tools. However, I think having at least a few different in-depth perspectives is critical to understanding if and how the software industry is changing. And from our episode where we tore the DORA 2025 report apart and gave it a new one, we know how poorly using LLMs can go. So for the expert this week, we brought in longtime UX director, principal designer, and now UX consultant, Matt Edmonds. Welcome to the show. speaker-1 (00:44) Thanks, nice to be here. Thanks for having me. speaker-0 (00:47) Yeah, we really had to narrow it down. Guest titles can be quite challenging sometimes. speaker-1 (00:52) I am a very varied person. So I've had a lot of different titles over my career, to be fair. It's not an easy challenge. speaker-0 (01:02) That's a new title and role I haven't heard before. A LinkedIn profile with it just right at the top there, your headline: varied person. speaker-1 (01:10) Actually, maybe I'll change it. I think there's a lot of people that, what is it, puff their chests, so to speak, and I've never been that kind of person anyway. So I think that goes back to my hiring, when I was managing a bunch of teams back in the day. It's always about diversity. It's always about, how do we get the most varied group of people that I can? Because I think that's where you build the best teams. And it wasn't about, I want to have 12 experts. No, I want to have people that have different interests, that have different experiences. So to be fair, a varied person to me is way more interesting than a non-varied person.
You know, in the changing landscape of how we treat people and handle people, I think it's challenging, because it's easy from a hiring profile to typecast somebody, because then you can put them in a bucket. And the reality is people aren't bucketable in that sense, right? We try, because we're like, how do I do this? How do I do that? And I think it's a real challenge. Honestly, I think hiring is one of the biggest challenges that businesses have, and hiring well means having the rubric. And I think that's where, from an AI perspective, the first space where I started seeing this, even three or four years ago, was: how do I parse someone's resume? Rather than having a human do it, because humans are just, you know, fallible, and I can't trust a human to evaluate someone's performance. Exactly. I might as well just give it to a machine, because the machine obviously is just going to do it perfectly. And you're like, well, I don't know if that's true. And I think in a lot of cases, it also allows the human in that process to not clearly articulate what it is they're looking for, because they're like, oh, I'll find it when I see it, right? The amount of times that I did that when I was working with other groups trying to hire people, right? And you're trying to figure out, okay, how do you build out a team? How does a startup decide, okay, what's the first UX person, or what's the first kind of design-thinking person you have? Is it a CPO? Is it just a UX person that's moonlighting? What are you doing? And in reality, most businesses don't necessarily know what they need yet. And it's this kind of, oh well, I'll know it when I see it. That doesn't work with AI.
That doesn't work, because the AI has now got this varied set of directives, and all the people that are probably the diamonds in the rough, that are the varied person, that are the person you probably actually want in that situation, are going to get culled out, because they don't have that Einstein-da-Vinci perfect 10 out of 10, you know, and they cost no money. And you're like, okay, there's always going to be some sort of trade-off. The AI is always trying to find a perfect solution. speaker-0 (03:58) No, that's a good point. speaker-1 (03:59) And what you end up with is just something that's just not. speaker-0 (04:03) Yeah, so obviously it came into play with the applicant tracking systems, the ATS, where you didn't want to look at the paper resumes, or whatever was uploaded as a PDF, but rather extract that data. And then that very quickly got into not only can we automatically extract it. And of course the OCR process, or whatever you were using to pull out the labels, was always atrocious to begin with. But you bring up a really good point about the people that you want to hire. It's almost like if you use a system that's computationally bound, where you're very deterministically figuring out which candidates make sense based off of just what you already believe, you're only ever going to get people that exactly match your template, which means you're losing out on the unknown quality in these systems. And that's especially the area where LLMs just completely fail in a lot of ways: there's no creativity there. And I think the hiring question is really interesting, and maybe we'll get back to this later in the episode. But one of the reasons I really wanted to get you in here is because you don't have a lot of hands-on software engineering background. If I understand, maybe one internship 20 years ago or something.
speaker-1 (05:14) I did. Back in the jQuery days, I did JavaScript a little bit, and had some jQuery experience back when jQuery was the cool hotness, right? And then I did some other kind of template-based language stuff, building out some things from a SaaS provider perspective, basically just to make changes to the configuration and visual style of some things we were driving on. But that's really my development background, beyond CSS and building websites in the 90s and things like that, when you're, you know, moving up in the world. As far as actual programming languages, beyond playing with some open source projects like Drupal, which I did for a long time, and kind of learning some PHP from that, it's mostly been hobbyist, and just generally being technically aware, which has allowed me to work with development teams, because I know kind of what's going on. But if you gave me a blank piece of paper and said, hey, go do this, or go do this code challenge, I'd be like, yeah, good luck. That's not my strong suit. That's not what I'm trying to play. Very clearly, that's not where I want to be. speaker-0 (06:20) So one of the things that really interested me in getting you on for this episode is you had spent a recent period of time actually invested in, I'll call it, vibe coding. And you said, mm-hmm, that's exactly how you would describe your activity there. And one of the things that keeps coming to my mind is: who is best at utilizing LLMs to generate code? Because so far what I hear, especially from senior, staff-plus, and principal engineers, is that it's not software developers that are getting the most value out of this. And there's a question of, okay, if it's not them, why is that? And who would benefit the most? Who does get the most value out of it? And one of the things that comes to my mind is, is it someone in more of the product space?
Or in the UX space? And so I really would like to get your perspective, your first thoughts about it, or what you've been doing and how that's turned out so far. speaker-1 (07:19) Yeah, I think so. So I started doing this, granted, to be fair, I was playing around with generative AI stuff and some LLM stuff several years ago, right? You know, OpenAI and that kind of thing. And then some LM Studio, like some local stuff, back when it was like three tokens a day, and you're like, okay, this is not moving at a speed that I have any interest in trying to cultivate. And it was, generally speaking, playing around with different stuff from a generative AI perspective, just to kind of see, okay, what other things can I create? I'm an artist anyway, but I've always taken the approach that these things are tools to me: how do I ideate? How do I come up with something that's interesting to me? It's not gonna take away the joy I get from making my own art, right? That's never how I've seen it. I've never seen it as stealing somebody else's work, you know. I'm doing these things for my own purposes. If somebody else wants to get joy out of doing it some other way, I'm not gonna take that from them. That's fine. And in a lot of ways, from an AI, kind of LLM, vibe coding perspective, the new world in the last 18 months has been this idea that you've got the Spider-Man meme, right? Everybody's pointing at everybody else, like, it does their job better. And I think most of the time, when people say, hey, there are other people getting value out of this other than software engineers, it's because software engineers can look at the code that's being generated and it doesn't rise to their level of standards, right? But at the same time, a software engineer can use it to generate a marketing website, right? And it's good enough for them.
But the reality is, for a marketing person generating a marketing website, it's not going to be good enough for them, because their standards are different, right? So you end up with this world where everybody's like, yeah, I don't like it to do this, I don't like it to do that. A good example to me: I want to say a couple of months ago, I'm trying to remember Matt's full name, but the guy that runs The Oatmeal, it's a web comic, he's done a bunch of different stuff. And he came up with a big post that, you know, went around forever, which, to be fair, has really solid points in it, but is basically saying, I don't ever want to use AI for anything I do. I don't like it for art, I don't like it for all these other things. And then towards the end of the post, he kind of says, yeah, but I use AI every day to do X, Y, and Z. And I'm like, that feels hypocritical to me, because you're saying it's not good enough for your role, which is fine, and you know more about your role and what you're trying to accomplish. But if you're saying it's not good enough for somebody else's role, well, you're not in that role. You don't know that person. You don't know what that person's doing. You don't know what those roles are. So to me, from a vibe coding perspective, I started doing this before even knowing what the term vibe coding was, and then kind of fell into it last summer, like, wait, this is what people are talking about when they do vibe coding, which is basically just trying to one-shot from a single prompt that's literally a paragraph long, and thinking they're going to get the perfect desired result. The reality is it doesn't work like that. But what I've learned has been fascinating, because I've gone through this process of trying to figure out, okay, which of the big foundational models work in certain ways, right?
Like, how does Codex work differently from a coding perspective, or how does OpenAI work differently from a coding perspective in ChatGPT 5.1, 5.2, 5.3 now, it keeps going. Versus what is Anthropic doing, right? Versus what's Gemini doing? How the different context models work, how the different reasoning models work, and what you can do differently with them. And what I've landed on is, I like the way that Anthropic does some things, as far as how the agentic model reasons, because I'm a plan person, right? And I think that comes back to how I've always built software with teams. I'm never just, go do a thing, unless I'm trying to explore, in which case I think that's where all the AI models actually, frankly, do a really nice job. It's like, I have a general idea, what can I do with it? And is this even accomplishable? Is this even doable? And I think it opens up knowledge to people. And that's the thing that's most interesting to me. A week ago, I had an issue with my mic monitoring situation, and I was talking to Claude, and was like, hey, can we build a low-level virtual audio driver? Because I can't get one for my Apple silicon Mac, and I want to be able to solve this one problem, and I don't want to spend a hundred bucks on some software that does way more than I need. In 30 minutes, I had a backend daemon running that uses five megs of RAM, an on-demand virtual driver that does exactly what I need it to do. I never would have written a hardware driver in my life, right? I still frankly didn't, but I solved my problem, right? speaker-0 (11:48) Let's walk through that. I think that would be interesting. Like, what model were you using? How did you prompt it? How were you actually testing and validating it? speaker-1 (11:57) Yeah.
So the problem that I had, and the problem I have right now, is that back during COVID, I decided to buy a nice mic, and decided to go with an XLR situation. And back in the day, when you're buying stuff off of the COVID fire sale, you get what you can get. At the time I bought a Focusrite Scarlett, which is kind of like a regular mic interface with an XLR input, but the one that I bought had two inputs, right? And it just so happens that on the Mac, I think also on the PC, the way this one handles it, the first input goes to the left channel and the second input goes to the right channel. Most Zoom chats, Google Meets, or whatever will duplicate that input and know that, okay, this left input really should go to both, so you hear the audio from both sides of whatever's being recorded. Now, if you're doing local recording, the local recording just says, oh, all right, I've only got audio coming out of the left input, I'm just going to record the left input. And this audio interface doesn't have the ability to set this to mono, to duplicate that channel, right? And you can get some virtual software; there's a long story there. There's some other stuff that goes back to Intel-based Macs, that existed from an open source perspective, that just doesn't exist anymore. Somebody decided, hey, I'm not going to put the effort into recompiling this for Apple silicon, I don't really want to deal with it. And there are a couple of other software solutions out there, and a couple of other kind of higher-level systems that run in the background. But I was like, why am I going to go figure out scripting stuff, or spend a hundred bucks on this?
If all I need to do is solve this one problem, so that I can get local recording when I'm working with clients, for example. When I'm recording something, I don't want to have to go into Adobe Premiere, or whatever my editing software is, and duplicate the audio. I don't want to have to run OBS or some other streaming software just to do this thing. I just want to be able to natively record it. One-shot the recording, re-record it if it screws up, or just keep going, right? Flow of thought, flow state stuff. And I was like, wait a second, why don't I just ask Claude if I can do this? And here's my pitch. The prompt was basically, and I always start most of my prompts with any sort of AI this way, I take a question model, right? I don't tell it exactly what I want to do. I kind of ask it what it thinks is possible and don't necessarily show my hand, because I don't want to influence the model into agreeing with me, because they're all overconfident. If you ask it if it can create anything, it'll be like, yeah, totally. I'm like, how long will it take? Well, it'll take 752 days. Also, most of the models' concept of time is ridiculous to me. It's hilarious, right? This one phase will take four days, and you say, okay, let's go do that, and then it takes 25 minutes, and it's done those four days of work, because it has no idea how to handle that. But it comes back with a driver that I can run, and it looks good. My biggest issue was, all right, is there a memory leak, right? How long can I run this thing? Is it going to continue to stay at this five megs, or all of a sudden in a day, if it's just sitting here, is it like 42 gigs? And I'm like, wait a second, this is a problem, right? It's written in C, which I can read a little bit, but not well. And to be fair, it works. And so that was enough for me. It's running locally.
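The essence of the fix that driver performs, mirroring the left channel so a mono mic shows up on both channels, can be sketched in a few lines. This is purely an illustration of the idea in Python, not Matt's actual C driver, and the function name is made up for the example:

```python
def duplicate_left_channel(frames):
    """Given interleaved stereo frames as [left, right] sample pairs,
    where only the left channel carries the mic signal, copy left into
    right so a downstream recorder hears the mic on both channels."""
    return [[left, left] for left, _right in frames]

# A buffer where the mic only feeds the left channel:
buf = [[0.5, 0.0], [-0.25, 0.0], [0.1, 0.0]]
fixed = duplicate_left_channel(buf)
# Every frame in `fixed` now has identical left/right samples.
```

A real virtual device does this continuously inside an audio callback (on macOS, via an AudioServerPlugIn), but the per-buffer transform is exactly this copy.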
I'm not worried about security from that perspective. Most of the things that I've been vibe coding have been things where I can at least check the result and read it. So most of it's been in web-based languages. I've been doing a lot of things with Electron or Tauri wrappers, basically taking web-based code and then running it locally, because one of the things I don't want to deal with is the security side of stuff. I don't want to be responsible for someone else's auth, or handling their PII, or any of that kind of stuff, or whatever other personal data they have. And a lot of the little things I've been working on have mostly been just solving my own problems, along with trying to build some things that I think are interesting, that solve some of my problems, but that other people might also have some interest in. But it's mostly been local models or builds. speaker-0 (16:09) No, if it's local stuff, that seems like really the turnaround here. This goes back to the expert perspective. If it's not your job, then you for sure don't care about the level of quality, in a way. You have a very specific problem that needs to be solved. And I think this is where the fallacy comes in, where I, as an expert in area A, think it can be used to solve area B, because you don't understand the critical nature of what those other roles in, say, your company or another company are doing. But when it comes to personal software, just solve whatever problem you have. But it sounds like in this scenario, did you manage to get something that Claude basically one-shotted? speaker-1 (16:52) Yeah. So what I will say is, this was Opus 4.6, the one that was released about a week and a half ago, or maybe two weeks ago. I don't know when this airs; a couple weeks ago. And that model is very agentic.
It's just trying to load a bunch of different things. It's a little... I love and hate it at the same time. I think Opus 4.5 was the sweet spot for me. It was smart enough to handle some things, but also kind of checked back with you more, and at least gave you some more updates. But yeah, this was one thing where I found that if I give it a small enough structured problem, and then allow it to ask me questions, and this is what I've always done. I started doing that with ChatGPT about two years ago. Somebody said, hey, you know what you should really be doing? Rather than telling it to do something, you should give it some context, and then you should basically say, hey, ask me whatever questions you have to gain more context. And what I've always found is that's been a really successful way for me to work with any of these models, because it allows me to gauge what their knowledge is and what their understanding of the problem is. And I can correct it and say, hey, no, this is not at all what I'm talking about, I want to talk about this. Or, wow, it's kind of getting it. I know this is not actual reasoning, it doesn't actually get it, but it's putting the pieces together from a pattern perspective to understand what it should be outputting. And to me, the way I've approached this, versus vibe coding, which I look at as never looking at the code, right, never understanding what it is you're actually trying to do, or what the technical reasons for doing something are, I've always tried to take a different approach: hey, I want to be able to look at the code. I want to be able to understand it. I've learned more about how to do certain things and how to code certain things as a way to start the process, rather than following a tutorial or somebody else's video. I'm an experiential learner that way.
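The "give it context, then have it ask you questions" pattern Matt describes can be captured in a small prompt-builder. Everything here, including the function name and wording, is an illustrative sketch rather than any vendor's API:

```python
def question_first_prompt(context, goal_hint):
    """Build a prompt that shares context and an open-ended goal, then
    asks the model to interrogate *you* before proposing anything.
    The point is to avoid leading an overconfident model into
    agreeing with a solution you've already picked."""
    return (
        "Here is my situation:\n"
        f"{context}\n\n"
        f"I'm exploring whether something like this is possible: {goal_hint}\n\n"
        "Before proposing any solution or writing any code, ask me whatever "
        "questions you need to fully understand the problem. Don't assume "
        "the answer is yes; tell me honestly what the trade-offs are."
    )

prompt = question_first_prompt(
    context="Apple silicon Mac; USB audio interface that only feeds the left channel.",
    goal_hint="a small virtual audio driver that mirrors left into a mono device",
)
```

The replies to the model's questions then become the corrections Matt mentions: they expose what the model has and hasn't understood before any code exists.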
And I've always kind of done that before, but I've learned that I'm picking up way more by saying, hey, show me what this looks like. And it comes up with the result. Now, it might not be perfect, to be fair, right? But it's getting the right output. It's getting the thing that I want, or the outcome that I want, right? Which is: does this driver work? Does it duplicate the audio? Does it do it in a lean and mean way, so that I can have this thing run and not worry about it? Or is it taking up a billion resources and not quite working, right? So to me, that's where, if you give it a really scoped problem, and then you have it ask you questions, and you correct it as you go, I think you can get a pretty good result. And I think the challenge is getting it the rest of the way, right? Like, if I were to put this out there, this would be free for somebody else: hey, you've got a Focusrite Scarlett, do you want to use this? Do you want to run something some other way? Go do this, and here you go. Versus people that are, I think, frankly trying to sell a lot of these little things that I don't think are necessarily worth what somebody thinks they are. Because a lot of the things that are out there, from an AI-slop-app perspective, that are taking over in a bunch of places, are solving individual problems. The users, the developers, the pseudo-developers, let's call them, that have vibe coded an idea, are fixing a specific problem. Like, I'm fixing a specific issue. I'm not looking to have a complex interface that does all these other things. No: find this specific audio interface, and if it's this specific model, duplicate this channel. It's not even handling the right channel right now; if you plug something into the second channel, it will not work the way that it's supposed to. And that's a gap.
And I don't care, because plugging it in here solves my problem. It doesn't matter. speaker-0 (20:23) So what I'm hearing actually is, and one of the things that I feel keeps coming up, is that business SaaS is dead. And I feel like that's the wrong statement, because there is a reliability necessary to run the business there. What I think I'm hearing actually is: small little apps made by someone, that are open sourced or even put online, or mobile apps that charge a few bucks to install. Those are dead, because realistically, when you're going at those, you're trying to solve a very specific problem. And now you never need... just, I think it was last month or something, I don't know if it was on a yearly or monthly basis, that the number of hits Stack Overflow was getting was down to like 3800 or something. speaker-1 (21:07) Yeah, it's dropped. It's dropped significantly. I saw something on this, I want to say November, December, discussing it, and the amount of drop-off it's had from creation. I think it's still getting hits from an LLM perspective, right? It's still being sourced for a lot of content. And to be fair, they're doing the same thing with Reddit. Reddit's selling, effectively, access to Reddit for LLMs. That's one of the ways Reddit's making money right now, because they're like, hey, we've got a lot of information here, we've got a lot of content, and this is useful for context. It may not be useful from a coding perspective to say, hey, this is exactly the right way you solve this problem, but it at least describes what problems people might be having, or how they may have tried to approach the problem, and what success rate they might've had, from upvotes or whatever, to kind of gauge that interest. To me, the challenge I think you see is, there are dark patterns in any of this stuff, right?
And I think the challenge with AI is that there's a new dark pattern, which is: I'm taking AI's word for something, and that's validated as me doing the research. And that's not the same as sitting in a room and saying, hey, we've got a bunch of assumptions. Okay, well, what data do we have? Well, I've got no data on that. Okay, let's put in a metric. Let's do some observability. Let's see what quantitative data we have that can back this up, so we can ask good qualitative questions, right? Replacing both of those things with, well, the AI probably looked at Stack Overflow from 18 years ago and told me this is the right way to do it: that's not rising to the right level of what we need for something shippable. speaker-0 (22:43) Yeah, totally agree. I think one of the problems here, though, to put this in a philosophical discussion: we've been trusting the pixels on our screens for a very long time now. And even before LLMs, there was a trained behavior where we would see something and we would just trust it. Now, maybe you trusted it because it showed up on a, quote-unquote, reputable site, and for a while that was said to be Wikipedia, because they got their sources from real places. But then media started producing content which referenced Wikipedia, or the references in Wikipedia, as the source, and those were other sources that had referenced Wikipedia before. And we lost that even before LLMs existed. And then there's this problem now with even Reddit. I feel like Stack Overflow was better, because the content was being curated, and I think that was a critical component. Whereas with Reddit, the curation was not "is this accurate?" but rather "is this good content?" speaker-1 (23:42) Yeah, there's a difference between moderation and curation.
Like, you know, Reddit, they're moderating to make sure that nobody's being improperly unkind to somebody, or yelling at them, or whatever, right? speaker-0 (23:44) Yeah, yeah, for sure, right. Be careful, depending on which sub you're talking about. speaker-1 (23:59) Yeah, well, all of Reddit, I'll put that out there as a statement. I think all of Reddit and all of Twitter, as a general statement, are pretty divisive places. That's not to say I don't use them, right? To be fair, I use Reddit more than anything, because I think there's a lot of interesting conversations happening, and I can get a pulse on what people feel about stuff. So for example, one of the iOS dev subreddits, r/iOSdev I think is what I was looking at recently, and there's a lot of developers in there doing smaller apps who are perceiving that the amount of time it takes for their app to be approved has gone up significantly. And a bunch of other people being like, it ebbs and flows. I don't think it's an AI slop thing. Sometimes it takes a week to get my app approved, sometimes it takes a day; it's just what people are doing. And then other people are like, yeah, but this is changing X, Y, and Z. And I'm like, yeah, it's going to change that stuff. But as people were pointing out, there were slop apps in both app stores well before AI came along, because it was so cheap to make an app from other geographies that you could spin up a billion people and just push out a bunch of crappy games, load them with ads, and do some dark patterns to get people, who may or may not be children, to click on stuff, make a bunch of money, and go away, right? Someone's always going to try to find a shortcut, right? And I think that's the challenge: AI is being seen as the ultimate shortcut right now. Right? I don't have to do the work.
I can just have it figured out for me. It's not going to figure out everything. speaker-0 (25:33) Yeah, no, for sure. And I think that goes back to sort of the question of, when you're engaging with the models, my concern would be poisoning the context. I fear that with every single word I let a model generate, there's a risk for it to say something that will immediately ruin the conversation, where I then have to change its response so it's not included, because it's not accurate in some way. And it's very difficult for me to figure out how to get it out. There's no way to remove stuff from the context, I feel like, once it's in there. So there must be something that can be done, in a way. And if you're going with the question-based approach, do you ever feel like the model gets stuck? speaker-1 (26:17) All the models get stuck. To be fair, you have to handle them differently, right? So one of the things that I learned early on when I was playing around with this stuff, and this is Gemini, to be fair, this is Gemini 2.5 Pro, not 3; I haven't played with 3 as much as I've played with the others. But 2.5 Pro, for example, had a really big context window comparatively, before Anthropic had pushed out a 1 million token context window. Codex was still, I think, around a hundred thousand. Similarly, I think Anthropic was around 250,000; most of their base models' context windows are in that range. And what that allows for is, you can have a decent conversation. But the deeper you go into changing something or adjusting something, the more it's going to be like, okay, wait, I can't figure out what's going on anymore. I don't know where we started this conversation. Am I a balloon? And you're like, whoa, okay, hold on.
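The context-window ceiling described here is why long sessions degrade, and a common workaround is trimming the oldest turns to fit a token budget. Here is a minimal sketch, assuming a crude words-times-1.3 estimate as a stand-in for real tokenization (actual tokenizers differ per model):

```python
def trim_history(messages, budget_tokens, tokens_per_word=1.3):
    """Keep the most recent messages whose estimated token cost fits
    within budget_tokens, dropping the oldest turns first. The
    word-count * 1.3 heuristic is only a rough illustrative proxy."""
    kept = []
    used = 0.0
    for msg in reversed(messages):  # walk newest-to-oldest
        cost = len(msg.split()) * tokens_per_word
        if used + cost > budget_tokens:
            break  # this and anything older gets dropped
        kept.append(msg)
        used += cost
    return list(reversed(kept))  # restore chronological order

history = ["a a a", "b b", "c"]
recent = trim_history(history, budget_tokens=5)  # drops the oldest turn
```

This is also why starting a fresh conversation, as Matt describes next, often beats fighting a polluted one: there is no reliable way to surgically remove a bad generation, so you either truncate from the oldest end or restart.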
But Gemini would get into situations where it would just go off the rails. And I would start a new conversation with the exact same prompt, the exact same information, and sometimes it would just hallucinate values. It would hallucinate things because it decided, and what I started calling this was guess-driven development, it decided that it had a better name for something that it had already named. So rather than go look that up, it decides this is now going to be this function. And it's like, well, no, that's not what the function's called. You already named the function something else earlier. And I could tell, literally six lines into it generating some code, because it would keep changing that specific thing when it screwed up, and I'd just stop it. And in order to fix that, I would load a new context window. And early on, I want to say I had like a four out of ten hit rate. So four out of every ten windows would do it correctly; six of them would just go off the rails. And when I'd get one of the four, I would ride that horse as far as it would let me, because it was doing the right things. But because every single time it spins up, it's in a different kind of mindset, so to speak, I was burning time. I was getting so frustrated: you'd get a certain way in, and then it would just go off the rails, and it's like, okay, now I've got to play the roulette game again, to see if I can get this model back on track to finishing this one feature that I need to finish, because I want it to work a certain way. speaker-0 (28:49) Yeah, I was actually comparing... I think Gemini 2.5 is interesting, because I was comparing, early on, the free version, but with different accounts.
And I noticed that different accounts would for sure get different flavors of the model. You've got to imagine they are training the model selection on how users are engaging with it. There is a lot of A/B testing happening there, and you have no idea what you're going to get. speaker-1 (29:20) The models shift and change. To be fair, there's some speculation on a lot of people's parts about what exactly is happening, because nobody actually knows. For a lot of people it's become a bit of a meme, even as those same people still believe it: that these models get dumber, so to speak. And it's because they're A/B testing things, because they're adjusting things, and because people on different levels of different plans are, frankly, getting throttled in different places. Because if you're paying for a subscription plan versus an API plan, most of these businesses are taking a huge loss on the amount of tokens you could generate. It's the old shared-hosting model; I used to describe it that way. You had a whole bunch of cheap budget shared-hosting providers that popped up around the dot-com bubble, before it burst, in a similar way. They were like, hey, wait a second, I can put 3,000 static websites on this one server that somebody else was putting a hundred on before, because the reality is that 2,999 of them aren't going to use it as much as this one other person, and I'm going to make a bunch of money. And then that started falling over when more and more people were using it and you squeezed them more; the equation just got off, right? They're doing the same thing with this. You're allowing free models to exist as part of your cost to acquire a customer, right?
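The shared-hosting math works roughly like this (all numbers are invented for illustration; this is a toy model of the oversubscription argument, not actual hosting or LLM economics):

```javascript
// Toy model of oversubscription: a host (or an LLM subscription service)
// stays profitable as long as average utilization stays far below the
// capacity each customer could theoretically consume.
function oversubscriptionLoad(customers, capacityShare, avgUtilization) {
  // Fraction of total server capacity actually consumed (1.0 = saturated).
  return customers * capacityShare * avgUtilization;
}

// 3,000 static sites, each sold 1/100 of the box, but averaging 1% usage:
const load = oversubscriptionLoad(3000, 1 / 100, 0.01); // ≈ 0.3
// The box is only ~30% busy despite being oversold 30x. The model breaks,
// exactly as described, when average utilization creeps up.
```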
Because someone's going to be like, oh, I had a really good experience with OpenAI doing X, Y, and Z, I should try it, and you get the word-of-mouth thing. But somebody paying 20 bucks a month to OpenAI is getting a very different experience, to me, than someone paying 200 or using the API plan. I think the same thing exists for Anthropic, and also for Google now. Google has been way more open about how much usage they allow, and they've started to slowly throttle it down in the last three months or so. I've had a business plan with them for years, and it includes some Gemini stuff, so that's honestly how I started playing with it. It was like, okay, if I'm getting this for free, so to speak, I might as well see what it's doing. speaker-0 (31:22) Sorry, I'm laughing because we had the same perspective too, but now Google keeps increasing the price, and now I feel like, no, it's not free. Actually, the whole thing you're paying Google is basically for Gemini, so you need to revisit how you consider the tools you're getting and what you're really paying for. Now your money is just going through a bunch of subsidiaries or shells before it gets to the actual model provider, and you're not really getting anything out of the box there. And if it's for personal benefit, maybe consider using a local model or something to develop and get to the actual solution you're trying to achieve. speaker-1 (31:59) Yeah, I think the most performant approach, from a codebase perspective, speaking engineer to engineer, is: which models are you using, and how are you using them efficiently? And then, where are you getting the value? Because to me, in that kind of testing, documentation, and planning, you build the context and then do the code. All the models are capable of doing that.
They're going to do it differently. They're going to have different training data they're working off of. A lot of them are going to want to default to things they've seen before: React, unless you tell it something different; Next.js, unless you tell it something different. They're going to want to use Tailwind, because Tailwind's everywhere, and that's a whole other conversation, Tailwind versus other stuff, right? speaker-0 (32:39) That's really critical, and I think it's worth calling out. One of the experiments I did early on was: don't tell the model what programming language or technology stack you want it to use to solve the issue. And I think this is where vibe coders have an advantage, because they don't know enough about picking the "right tool," quote unquote, to drive the conversation in that way. The flip side is that if you're not constraining the model's context by technology, it's going to use the option that is, say, most correct for it, or that it understands best. So if you ask it a question and it automatically spits out Python, well, first of all, I'm so sorry for you, but second, it's giving you the best solution in that language, rather than you forcing it to switch to another language where it will likely hallucinate. The more you constrain a model, the more likely it is to hallucinate, in a way; there's no other option available to it. Arguably, everything it's doing is a hallucination in some regard. When you're doing something nuanced and for the first time, you want it to hallucinate. You don't want it to be like, you know what, I know what you asked for and I could give you that, but instead I'm just going to repeat this code someone else wrote to solve a completely different problem, because that's what I have. speaker-1 (33:56) And that's why I started calling it guess-driven development.
Because guess-driven development, to me, is the early stage of LLM AI coding, when it doesn't have enough data. You've got two problems, right? One: you have a blank GitHub repo, and you say, hey, I want to do something. The LLM says, okay, there's nothing here; the user hasn't specified what they actually want to build this in. Let me figure out the best way to do it. I've got a lot of examples that do it this way, so I'm going to do it that way, because the user is telling me they don't particularly care. They haven't specifically determined that, or at least I've determined they don't specifically want to decide. And you end up with something that is pulling in npm packages that already have CVEs on them, you know what I mean? Or it pulls in a typo-squatted version, which has happened a couple of times now. The flip side is you have a giant codebase that already does things a certain way. And I think the AI does a nice job, from that perspective, of saying, hey, I've got a bunch of examples that, if you can document and systematize them, it can work off of. The problem there is the context is so big that it can't read whole files. It's grepping things, and it's basically doing what I call persistence of vision. It's the same reason that you and I perceive motion. We don't perceive motion; we perceive stills, in a very specific way that your brain is processing a billion times a second, and that creates motion. Your visual cortex also isn't seeing everything at the same time. It's not taking in every single possible thing, even though it feels like it, because you're looking around and you're like, I can see everything. It's replacing those things all the time. In a lot of ways, an LLM works similarly: I can't read this 10,000-line monolithic file, so I'm going to guess at what I think things are called, see if I can grep for those, and find those little pieces.
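The grepping behavior he describes, pulling out just the lines around a symbol instead of reading the whole file, can be sketched like this (the function name and return shape are my own for illustration, not a real agent API):

```javascript
// Grep-style retrieval over a file too big to read whole: return only the
// lines matching a symbol, plus a little surrounding context, the way an
// agent pieces together a large codebase a snippet at a time.
function grepWithContext(source, symbol, context = 2) {
  const lines = source.split("\n");
  const hits = [];
  lines.forEach((line, i) => {
    if (line.includes(symbol)) {
      const start = Math.max(0, i - context);
      const end = Math.min(lines.length, i + context + 1);
      hits.push({ line: i + 1, snippet: lines.slice(start, end).join("\n") });
    }
  });
  return hits;
}
```

If the model guesses the wrong symbol name, this search comes back empty, which is exactly where the hallucinated function names he mentions come from.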
And then I'm going to grab some more stuff and see if I can piece it together, specifically along a train of thought, a chain of functions, that I think is what I need to work on. The challenge is that you can get to such big context windows that it falls over, unless it's documenting those things, unless it knows where to find something, unless you've been able to document it structurally. That's one of the things I found early on. I didn't give it a specific framework to follow. I literally started out saying, hey, I have an idea for something, let's do this. And it came back with something, and I was like, this is actually better than I thought I was going to get. Okay, what if we do this? And it's like, oh yeah, let me set that up, let me do this, let me do that. And I was like, this is actually not bad. At one point it was just vanilla JS, right? I didn't have a linter. I didn't have any sort of CSS preprocessing. I wasn't running Headless UI, wasn't running Radix, wasn't running React or anything like that. I was doing that on purpose, because I was like, let me just see what this is capable of. And I was having a lot of fun with that. Then it got to a point where I'm like, this has gotten so big. I've added so many things to it, and the idea has frankly expanded because the capabilities have expanded. Now I'm like, okay, wait a second, I do need X, Y, and Z. I do need to run a different build tool; I want to go to Vite for these other reasons. I want to do this, I want to do that. And now I have a build process, and I've got it hooked up with GitHub Actions, and GitHub Actions is doing all the stuff and signing the code, and I'm hooked up to Azure to do all of that.
I've figured out all these other pieces that I never would have experienced or had questions about, and it's helped move me along in that direction. Then I'm like, okay, now let me figure out the new world of CSS, because, and I'm going to date myself here, 20 years ago I was just writing CSS, right? Then you had the SCSSes and Sasses of the world and all these other preprocessors. But when I started doing more consultancy work about three years ago, I stopped doing as much front-end stuff. Frankly, even when I was building teams, I wasn't doing as much actual UI design; I hadn't been in CSS much in the last 15 years of my life. So I started digging in: okay, wait a second, what are all the pieces now? I need to educate myself so I can figure out where I want to take some of the things I'm working on. And that's been really fun, because it's allowed me to figure out, one, that I started from a perspective that this is the way I want to go, for those exact reasons. Two, maybe it does make sense for me to run, say, TanStack Table for my tables, versus having my own table setup. What was cool is we swapped it out, and all the different hooks and things I was previously using, and the UI on the renderer side, are all exactly the same. All my Playwright tests, which I had the AI write for me to do actual in-browser testing on every single possible thing, of which I have like a thousand, plus a whole bunch of unit tests; this is the most well-tested piece of code I've ever been involved with, much less written myself. It's running 10 concurrent runners, because that's just what I've decided I'm going to do.
And it still takes 20 minutes to run through every single permutation of everything you can think of. So I can be like, okay, this is working, or, wait, we've got a couple of new failures, we've got a regression, go fix it, go figure it out. I wouldn't be able to move at the speed I'm trying to move to get these things done if I had to go look at every single possible thing myself. And I think some of that's a trade-off. So in some respects, I'm vibe coding more now than I was six months ago, because some of these projects are way bigger. speaker-0 (39:32) Are you using Codex or Claude Code or Antigravity? speaker-1 (39:37) I'm using Claude Code for most of this. And how I started doing it is interesting: I was originally going to run Claude Code directly on my computer, but I couldn't get it to install properly. I'd done the Homebrew stuff, I'd done all the rest, I was looking at the docs, and nothing was working. A couple of people had said, I think on Reddit in a couple of places, that there are some weird edge cases that, based on this, this, and this, you might run into. So for a while I was basically running it via the web, just copying and pasting code over and doing my own kind of copy-and-replace. It was slow, but I ended up knowing a lot of the code, because I was like, okay, this is what I want. Oh wait, what I'm pasting in here doesn't look right. And I'd go back and be like, hey, we missed something, or what else is going on? Then, I want to say in the September-October timeframe, they finally fixed some of the install stuff for Claude Code. I ran it in the terminal, and I've run it there ever since.
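For reference, a ten-concurrent-runner setup like the one mentioned above could be expressed in a Playwright config along these lines. This is a hedged sketch assuming `@playwright/test` is installed; the file paths and option values are illustrative, not taken from the project discussed:

```javascript
// playwright.config.js -- illustrative sketch only; the real project's
// config may differ. `workers: 10` is the ten concurrent runners mentioned.
const { defineConfig } = require("@playwright/test");

module.exports = defineConfig({
  testDir: "./tests/e2e",    // hypothetical test location
  workers: 10,               // ten parallel runners
  retries: 1,                // one retry to separate flakes from regressions
  reporter: "list",
  use: {
    trace: "on-first-retry", // keep a trace only when a test has failed once
  },
});
```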
At that point it was: okay, I need to make sure I have a couple of proper things in place, because I don't want a situation where it's just assuming stuff and goes and overrides something. And to be fair, I was running everything in git anyway. But I had a couple of incidents where it had decided to make a change, and I said, wait, why did we make that change? And it tries to revert a commit that doesn't exist and basically wipes out all the changes it just made, because it was like, oops, my bad, I didn't use the right stash command. And I'm like, cool, that's great, because now we just lost about six hours of work. I hadn't made a commit either, to be fair; that was on me. But since then I've been very purposely making commits at points I think are good checkpoints, so I know what's changed in the code, and I know the code is at least stable and as error-free as I can make it, so that I'm not just introducing more random crap and forgetting about it as I go, because I am a one-man band as I go through this. A lot of it was coming up with: how do I do a test harness? How do I make sure the backend is working? How do I make sure the database is functioning correctly based on what we built? How do I make sure the UI, the renderer side of this app, is working and clicking in correctly with all the other stuff? And I realized pretty quickly that to manually smoke test all the things we're adding, I'd have to sit here for 12 hours clicking on buttons. No. I'm like, can we do this a better way? So I started researching: okay, what test harnesses are out there that I could do this with?
What do I feel reasonably familiar with, enough that I could ask which one the AI feels the most competent in, and then test a couple of them? I ended up with Playwright. Even though Playwright's Electron support is still somewhat experimental, it handles a Chrome browser, which is basically what Electron is, just fine, and it has a good way to focus on some of that stuff. What I've also learned is that the more good examples you have for any of these models to attach onto, the better off you're going to be. The more mistakes you have in the context windows of the project, the more it gets itself tripped up: I did it this way, or that way, or I was thinking about doing it this way before. So I started documenting more. I started keeping documentation in the repo of how we're doing things, not just what we're planning to do, but how things are being constructed, how to build out other functions, how to do the component work I was doing, how to build the leveled structure I wanted to make. That gave me the ability to say, hey, there are more good examples in this repo than bad examples. And then it started to hallucinate less, because the good information outweighed the bad information, if that makes sense. And a lot of that was because I had set up test harnesses for the database. speaker-0 (43:28) What does that mean in practice? I get the whole adding-documentation part: docs that can be pulled into every context whenever you're generating new code, which it somehow has to reference, to make sure the code style makes sense or whatever else, or hooks that execute automatically after every single prompt, to maybe make a commit or whatnot.
But I think what you alluded to here is that there's a risk, especially when you remove the loop of copying and pasting the code from the GUI, of losing sight of what the code is doing. Yet somehow you're testing it. How are you building up those tests? What is the process to make sure those are included? speaker-1 (44:08) So one of the issues I had early on was with Claude. Well, with OpenAI, actually: I tried throwing OpenAI at my database, and OpenAI decided that every single equals sign, completely legitimate SQLite syntax, was a code error. So it decided there were like 60,000 code errors, and I was like, okay, this isn't going to work. For some reason ChatGPT just wasn't playing nice with some of the SQLite syntax we were using. It was happy to make a Postgres database; it did not really want to make a SQLite database. It was fascinating. Anyway, what I realized was that Claude Code, for example, kept hallucinating field names. It kept saying, I think it's called inventory-blah-blah-blah, and it's like, no, that's not what it is. So what I did was: okay, I provide documentation of the schema definition in another file, and then I provide a test file that's specific to the schema documentation. It started referencing the test file as the source of truth, over what the database file actually had for the schema. It would be like, wait, the schema test actually has this; I am wrong; I need to go change this. By having it documented in a couple of different places, all of a sudden I had three places where the right thing was. So if it got one thing wrong, it could look at the other two places it saw as a source of truth and say, wait a second, I'm wrong, and correct itself. Whereas with only one place where the actual information existed, it would sometimes not believe that information was accurate. It wouldn't believe itself,
so to speak, even though that's the production code. What are you doing? It's like, oh yeah, I guess it is the production code. Whereas if you put it in a test file, it's like, oh, this is obviously what we're testing against; this has to be correct. So the test was treated as more correct than the actual code it was testing against. speaker-0 (46:08) It's not even TDD, test-driven development. It's just using the test as context that it somehow treats as... speaker-1 (46:14) So I started doing test-driven development in that sense, but once I had it working, it started treating that as not just test-driven development but, to your point, as the context. It was like: this code that is the test is the way it's supposed to work, and it's accurate. It's not assuming the test is wrong; it's assuming the test is correct. So I extended that, and I have a bunch of integration tests, a bunch of database tests. I do a bunch of CRUD operations and a bunch of different things behind the scenes, including some important export stuff, to make sure everything is working and the calculations are running, before I ever get into the Electron renderer, the Chrome side, the React side, whatever you use for your rendering side. speaker-0 (46:55) Coming back to something: there's a whole movement right now around spec-driven development. I don't think I'll ever be on that train, but it actually does sound like we're very close to saying Agile does not work for LLM vibe-coded development. We have to prefer documentation and tests over working software, because the only way we're going to end up with working software is to have the documentation and the tests. speaker-1 (47:28) Yeah.
And that's what I've kind of learned, and that's why I keep joking about guess-driven development. Stage one, to me, was: it's guessing through all these problems. Then it gets to a point where it's like, okay, I'm hallucinating too much. I'm coming up with different variable names because I think these are the names it should be, even though I've picked other names in the past, even though I've chosen other things for a specific reason. It's deciding in its little head that this is a better name for this thing, and it's like, well, but that's not what we named it, you know? So then you get into this context-window conversation: how do you get it to make better fundamental decisions? If you have it document itself, and document in a way that it trusts the information, it will follow that, and it basically stops hallucinating. It still won't do certain things a hundred percent of the way; I have to poke it, like, okay, we still need to do this, or this doesn't quite work. But here's one of the things I've noticed with the later Opus releases, 4.5 and 4.6, especially 4.6. I had mentioned that one phase of this project wasn't fully completed, because I knew there were a couple of bugs in it. I hadn't traced them down yet. I couldn't fully describe what was happening, but I was watching these test files run, and I'm like, this isn't working the way it should be, given our test data and what should be showing up there. Something's not quite right. But these tests are flying by my face, and I don't know exactly what's happening. I couldn't describe it to Claude, and I couldn't figure it out myself. So that was a known issue I had for like six weeks.
And I happened to mention it to Claude when it wanted to move these plan files to completed. I'm like, well, they're not completed yet; there are still some issues with them. And it was like, okay, I'm going to take that as a directive to look through these different pieces. I didn't tell it what files they were in. It went and figured out what files that feature was about, and it figured out the six different things that were wrong. That was not something 4.1 or 4.5 was capable of doing. 4.6 was like, let me just run some agents at it and poke at it a bunch. Then I came to find out that two of those things were basically false positives: yes, it was doing that, but there was a reason it was doing that, and that wasn't really the cause of the error. Which is fine, because I'd rather have it come up with false positives than not find the issue. And by doing that, it fixed the problem. And that only happened because I had enough test coverage to be able to. speaker-0 (49:52) Do you feel like what you have now, what you're building up in one of these projects, is production quality, production ready for other people to consume? speaker-1 (50:03) Yeah. You know, I've shipped a couple of website things, a couple of side-project fun things, whether it's hobbyist projects I'm part of or other things where I'm like, hey, how do I want to do this? It was interesting to me because, again, I don't have that much experience actually doing it. I know how to do it, I know what it is, but having something walk me through it and being like, okay, I've done this for the first time, this is actually pretty cool. I can do this again. I can repeat this process. I've documented it for myself. speaker-0 (50:33) I have a perspective here.
I think maybe it's: don't let a software engineering background stand in your way of getting development done. speaker-1 (50:42) Yeah, I mean, to be fair, the way I've always approached anything, whether it's been UX or art and design or anything else, has been driven by curiosity. I think that's what's always made me, frankly, a successful UX person, but it's also just allowed me to work with other people, because I genuinely do care about somebody else's challenges, whether it's in development, what the actual issue is, what the tech debt is, or why we can't do something, and about unblocking them as much as unblocking the users. Because the more you can unblock that stuff, the faster you can get to what everybody really wants, and the happier everybody is. Then it's just a win: the business people are happy, the people building the software are happy, the people designing the software are happy, the users are happy. What's the downside? speaker-0 (51:33) I think maybe the downside is, and you can obviously correct me if I'm wrong, that you're paying lots of money to the LLM providers to provide you speaker-1 (51:42) I'm not speaking about just the positives of the LLMs, sorry. But I think that's a challenge, right? Think about the amount of things we've said and taken from this perspective: rather than using this as a tool for ideation, a tool for exploration, a tool to allow people to explore different things, which is frankly what I think it's most capable of, we're using it to help write an amicus brief for a legal case without validating the case law or what it's linking to. We're connecting it to code, or pulling in an npm package we never had any intention of looking at before, so we don't really know what's there. We're making decisions that I don't think we're cognizant of, and we're paying somebody else for that privilege.
The challenge is that, for as much as people are paying, and even though people are paying a lot on the API side, these companies are still losing a crapload of money. speaker-0 (52:35) Yeah, that's the part that just doesn't make any sense to me. speaker-1 (52:38) It's never going to work out at that level, and I think everybody's aware of that, right? speaker-0 (52:44) I don't think a lot of people actually are aware. There are only two possible directions. A company loses a lot of money to gain market share, and the question is: at some point in the future, will they have a monopoly so that they can jack up the prices to recoup the losses? speaker-1 (53:02) Yeah, I think, again going back to the Spider-Man meme, there are a lot of people pointing at each other expecting the other to come up with the answer. In a lot of ways, OpenAI and Anthropic, from an efficiency perspective, are also expecting the hardware to improve to the point where these models get more efficient because they can run more tokens; it's not necessarily that the models get more efficient, but that the hardware they run on does. That puts a lot of pressure on the fabs, whether it's the NVIDIAs, the AMDs, the Intels of the world, to continue to make innovative progress. And now all of a sudden we're having conversations where they're all just running out of capacity for the year. Western Digital, for example, has already said, we've used up our platter and all our storage; basically, we've already sold our allotment at this point for the rest of 2026. So the RAM shortage is going to come for storage, for physical storage, is a better way to put it. speaker-0 (53:52) Amazing. speaker-1 (53:59) There's going to continue to be more pressure on those places.
The bet is: well, Microsoft has Azure, Amazon has AWS, Google has Google Cloud. But they're all relying on these other providers. To be fair, Amazon has some of its own silicon, Apple has some of its own silicon, Google potentially has some of its own stuff, but who's making it? None of them have their own fabs, and by fabs I mean fabrication facilities, to make the wafers to make the chips. So there's always a bottleneck somewhere, and if the bottleneck is saying, hey, we're tapped out for the next two years, well, I don't know how you magically spin up a non-bottleneck if your bottleneck is bottlenecked, right? speaker-0 (54:42) With that, I guess I'll have to ask: Matt, what did you bring for us, the audience, today as your pick? speaker-1 (54:49) So, as I said to you before, when you ask an ADHD-type person who likes a bunch of different things, you end up with a whole bunch of different lists: here's my number one thing, plus all these other things. But I promised you I was going to have one thing, and I do have one thing. One of the books I keep going back to, which I've read a couple of times, and which I realized last night actually has an extended edition now, which I'm going to go buy after this because there are more chapters, is a book by Ed Catmull called Creativity, Inc. It's about the early days of Pixar. Ed Catmull, who started as an engineer, basically became one of the co-founders of Pixar, and it's an interesting telling of the early days of Pixar and how they created movies. As an engineer, he became a studio head, effectively, and started making movies, and he worked alongside Steve Jobs; he was what I'd call a Steve Jobs whisperer, because Steve Jobs would come in and out with the classic, what I call, swoop-and-poop, right?
Coming in, coming up with an idea, and then leaving and going off to his day job, because Pixar wasn't his day job. It's a great book that articulates what the design process is like, and frankly how everybody can be part of the design process. What I particularly like about Pixar being 3D animation is that I have a proclivity for, and just love, 3D animation; I went to school originally for that stuff anyway. It's an amazingly good book at breaking down how people think, and it uses good stories from real-life examples. speaker-0 (56:08) Okay. Yeah, I've only heard a few of the stories about Pixar, and I think that's super interesting, especially with the role changes and how people need to redefine what their job should be. For my pick, I actually brought something similar. I was thinking about what the context for this episode was going to be, and I also pulled a book: Start with Why, by Simon Sinek. He's a great author, a LinkedIn personality all over the place, so you might have seen some of his stuff. The book, which I think was his first, is pretty great, especially with people coming on saying coding is no longer the bottleneck. I think anyone who says that has never done any engineering professionally, because if you had, you'd know it's not about being limited by the amount of code getting produced. And I think the biggest problem, and you sort of brought this up, is: what is it you're actually trying to achieve? The book really gets at the why. What is the whole point? Where are you going with this? And I think the shift, potentially back to test-first and waterfall-based approaches, is really about having to define the why of what you're trying to achieve, and not fixating on the software language or tech stack that should be utilized.
I think especially as we start to unravel what it means to be a software engineer in this world, it's going to come back more and more to: what is the purpose of your job? Thank you so much, Matt, for coming on and sharing what you've achieved with vibe coding, and the perspectives behind it, as an expert who isn't a hands-on software developer first. speaker-1 (57:47) Thanks, this was a pleasure. This was awesome, and I appreciate the time, and just being able to talk about it, because I've found the whole thing fascinating so far. speaker-0 (57:54) Great. And you know what? You reminded me that I'm supposed to do a better job getting people to follow the podcast. So, OpenClaw, if you're listening: like and subscribe to the Adventures in DevOps podcast. And I hope to see everyone back again next week.