Speaker 1 (00:07) Welcome back to another episode of Adventures in DevOps. And today we have a special episode where we're going to finally review the 2025 DORA Report findings. And I wasn't really sure what the best way to do that is. So I just figured I would bring in international speaker, tech entrepreneur, and longtime CEO of Authress, Dorota Parod. Speaker 2 (00:31) Hi, that's me. So, DORA Report, huh? Finally, huh? Speaker 1 (00:39) It just came out, so I don't think we're too far behind, actually. Speaker 2 (00:41) It's been months, I feel like. That's how it feels, at least to me. Speaker 1 (00:45) Well, it took some time to actually get through it, because it was not the shortest thing I've ever read. Speaker 2 (00:50) Yeah, I mean, it's like what, 120 pages or something? Speaker 1 (00:54) I think the actual release was 142. You already forgot the number. Speaker 2 (00:58) Yeah, but there's a lot of acknowledgements and sponsors and all of that. I ignore that. Speaker 1 (01:04) Well, yeah, there was one interesting section that I didn't pay too much attention to at the end of the report. They actually talk about the programmatic way in which they parse and manage the data. Speaker 2 (01:12) The methodology, yeah, which I'm actually happy that they included, because... well, should we start from the beginning? Speaker 1 (01:20) Yeah, let's just jump in. Speaker 2 (01:22) No, no, no, not jump in. I'm not ready. I'm not ready to jump in yet. No. All I want to say, I just want to get it off my chest: I'm really disappointed. I'm really disappointed with this year's DORA Report, because it feels like a lot of fluff, a lot of text, a lot of narrative, not that much data. Speaker 1 (01:42) I mean, can you really blame them, though? Look at everything else that's been printed online recently, and it is all about the fluff. Speaker 2 (01:50) Is it? I mean, okay, I feel like it didn't have to be 140 pages. That's true. It could be condensed easily and it would be of the same value, at least to me. Speaker 1 (02:00) Interesting. What would the best format of the report be, do you think? Like, if you could pick what information you actually wanted included. Speaker 2 (02:07) Well, I mean, show more of the data. I would actually even maybe show more than they did. And I understand that maybe not all of it made sense or not all of it fit the narrative. And also maybe shrink the narrative a little bit. I prefer to look at the data and figure out my own story. Speaker 1 (02:26) So most of it seemed like it was pretty much dedicated to "AI is the best thing ever." Speaker 2 (02:32) Yeah, yeah, yeah. Okay, you have some data and then you write a book, maybe even publish them at the same time. That's fine, if it's clear that the book was made based on the data, based on your findings. But what happened here, I feel like the dude wrote a book and now we need to support the narrative. Speaker 1 (03:19) Maybe let's get to what the report actually says, and then we can potentially pick it apart at the end. Speaker 2 (03:25) What the report says is, you know, AI has revolutionized... No! That's how I felt. I genuinely thought, at some point when reading that executive summary, my God, they fired all the researchers, all the people who work on the report, and they just replaced them with an intern sitting in front of an LLM chatbot, whichever one they use, and the prompt was...
Speaker 1 (03:31) That's not what it said. Speaker 2 (03:55) please generate a DORA-style report showcasing how AI revolutionizes the software industry. And this is what we got. So it was really hard for me to go through that executive summary. And I usually like executive summaries. I mean, who doesn't love a good TLDR? Speaker 1 (04:12) Well, when I'm thinking about the data as it applies to our business and other businesses out there, if you're a technology company, that's really what DORA is for. It's for software-engineering-based organizations within larger companies. I think, at least for me personally, I'm all about not just what the high-level conclusion is. I do really care about the why behind it. Speaker 2 (04:35) Yes, but I would like to see both. So as I said, I generally appreciate executive summaries, but this time I just felt like, oh, it put me off from reading the whole report. You know, if any of you are trying to actually go through the report and you see, okay, 140 pages, oh, can I get a TLDR? I mean, the executive summary is not it; it will get you stuck. Speaker 1 (04:57) If you haven't read the report and you're watching this podcast, this is your TLDR. So hopefully you don't need a TLDR for your TLDR, which would be that the report says AI is the best thing ever. Speaker 2 (05:09) But that's not even what the report says, so that's weird. Speaker 1 (05:13) That's true. Like, if you go down to the actual sections, the thing to know about the report is that it doesn't actually review the measurements of the DORA metrics from organizations. There is very little of that. Most of the report and the AI section is focused on how people feel about how AI is helping them. And I feel like there's a big disconnect between how people think about it and the actual impact on organizations. And that's the one thing that I actually took from the report. Speaker 2 (05:40) Yeah, I mean, that's what they basically say. They call them stubborn results, because they saw some of that last year, where increased AI adoption causes more friction, more instability, and that sort of doesn't fit the rest of the picture, because people also report that it makes them more efficient. So something does not add up. But there was this other research, actually, METR, I don't know how you pronounce it. MITRE, something like that. Speaker 1 (06:05) I'd say it's MITRE. Speaker 2 (06:08) But they basically reviewed a bunch of senior contributors to open source software. And those engineers basically said AI is making me faster, 30%, 40%, whatever the numbers were, I don't remember. But then they actually measured the task completion, and it turned out that AI made them slower, almost the same amount, like 20 or 30% slower. So I do think there is a discrepancy between what people perceive and what is actually happening. Speaker 1 (06:39) Yeah, I think this was the same thing. So last year the report, if you didn't read it, focused on two core areas. One was the impact of AI on the industry, and the other one was platform engineering. And both of them actually had the same result, which was people think it's great, but when you look at the impact it actually has on your organization, it makes everything worse in a way. Speaker 2 (06:57) That's not what I read. They frame it as an amplifier. So if it's good, it's great. If it's bad, it's really bad.
Speaker 1 (07:07) I was actually just reading an article that was suggesting how that's almost a nonsensical response though, because it seems very specifically like, oh, you have a tool. If you use the tool correctly, things get better. If you use the tool wrong, everything gets worse. But that's like a tautology. It's not a tautology, but it's sort of like one, in which, of course that's true: if you use it correctly, then everything gets better. Well, not everyone agrees with that. Speaker 2 (07:29) It's the same with a monolith, right? If you do it correctly, then it's great. If you do it wrong, then it's really wrong. No, but you know, there's this other sort of secret part to it. I mean, it's the secret I've learned: some tools make it really easy to use them correctly, and other tools just make it super hard. Yes, you can still use it. Speaker 1 (07:37) You did that to trigger me on purpose. Speaker 2 (07:54) correctly. You can do a monolith correctly if you're smart, if you're disciplined, if you really apply modular architecture, and maybe you don't have a gigantic team. Speaker 1 (08:05) I don't think it's such a long jump to actually compare that to AI, realistically, because I think you're onto something. With the AI, it really is a challenging tool to interact with that's very temperamental. One day it maybe works a little bit, and other days it screams and stomps its feet on the ground. Speaker 2 (08:27) And then apologizes, because of course you're right, you're totally right, I got that wrong. Like me, trying. So, you know, reading some of the actual metrics in the report, I did have that thought of, you know what, LLMs actually make people collaborate less. It encourages more individual activity, which I'm sure feels great if you're an introverted software engineer, because you don't have to go to meetings, you don't have to talk to anyone. But I've worked in software long enough to know that you get much better results if you have multiple people collaborating on solving that problem. So maybe that explains some of the weirdnesses. Anyway, should we? Speaker 1 (09:13) Well, before we get to that, I think we keep on saying AI. I want to be clear that when we talk about it, we don't have AI for sure, but even if we did, that's not what's being discussed in the report. What's really being discussed when they say AI is the set of LLMs that we have out there, the ChatGPTs, Gemini. I know some people say Gemini. I don't know what the correct pronunciation is, whether it's British English or... Speaker 2 (09:34) The Googlers call it Gemini, and it comes from the United States of America, so I have to say it the American way. I'm sorry. Why not call it AI? Don't you think it's really intelligent? I mean, sometimes you talk to it and it appears like it has consciousness. Speaker 1 (09:42) I think we should cut that part out of the episode. I think the appearance is sort of the problematic thing. We know that it's just statistical probabilities, and that is pretty much derived from the technology, the architecture, that we're utilizing today. And that's been the case for the last five or six years, and we haven't seen any changes to that. Speaker 2 (10:12) Yeah. So, you know, I almost feel like the label AI is misleading, is false advertising, and it may be a little bit harmful even, because it's like with climate change.
Now we call it climate change, but it used to be called global warming, which was really bad, because it doesn't necessarily mean warming. Some areas will get colder. And AI is also a label like that. People think intelligence. No, it's not actually intelligence. It's nothing intelligent. It's false advertising. What it does is manipulate language. It makes things sound plausible, sound like something else. Speaker 1 (10:49) I mean, I think the biggest problem with using that terminology is for people who aren't in the technical domain, who have basically been led to believe that that's what we have, when truly we aren't even anywhere close to that. But maybe that's for a different potential episode where we shit on the existence of AI. For now we'll focus on the DORA report. Speaker 2 (11:07) Certainly. So in the beginning we have the core results, which I appreciate. However, my pet peeve is that DORA seems to be inventing a new way to talk about the same things every year. I would love to see a comparison of those core statistics, how they change over time. And I mean, they have the four metrics. Speaker 1 (11:30) There's actually five now. So it's the mean time to resolution, the deployment frequency... Speaker 2 (11:34) Hold on, I believe they have four. So if there were five, they got rid of one. Yeah, so: lead time for changes distribution, deployment frequency, failed deployment recovery time... you're right, there are five. I cannot count. Change failure rate distribution, and rework rate. Speaker 1 (11:47) Mean time to resolution. They added the rework rate starting, I think, last year, to really encapsulate this aspect of what your organization is doing. And I think it works as sort of a counter metric. Speaker 2 (12:05) That makes sense. I think it makes sense to add, because, I mean, why not? If you release a bunch of things and then you have to rework every single one of them, then... Speaker 1 (12:15) Well, so I think there is still the question: why are you reworking it, though? Because was there a bug in production? Well, then there's already the mean time to resolution or the change failure rate, which captures that. Well, I think there actually is a different reason here that they never... Speaker 2 (12:25) You just did it wrong. Built it to learn. I'm sorry. Speaker 1 (12:32) So the thing is that this rework rate actually starts to include potentially non-technical reasons why you may need to do work over again. Maybe there was a mistake in the assumptions that you had while you were building your product, or in what your customers wanted or your users were expecting. And so the change failure... I know, the change failure rate, right. The rework rate actually now includes that aspect, which I think is valuable to add, because before, the DORA metrics were very technical in nature. They only applied to a single team or an organization that was engineering, and didn't really encompass the whole business. And now we have the business included, and so I think it's valuable to have this metric. Speaker 2 (13:06) Yeah, no, I agree. So, I mean, this section I found a little interesting, and as I said, they didn't really show this data in that format last year. So I have no idea how it compares, but I would expect those metrics to be a slightly different shape if AI is truly revolutionizing our software development process. Speaker 1 (13:31) I think the problem is that they're just not collecting it, right?
They go out and they survey people about how they feel about their organization rather than measuring the actual DORA metrics themselves. So while we know which metrics are better, and they do capture some of the survey responses which ask you, well, which of these categories do you fit in? Are you the highest performing? How frequently do you do deployments? And that could be accurate. There are areas where they're just not able to really get the answer, because it is based off of how people feel. And so I do agree with you, though. It would be really interesting to see those specific metrics and how they change over time from year to year, and where that is going. Speaker 2 (14:10) So, the one weird thing... wait, should I actually say what the data is? Okay. Let me just read it out, because I obviously don't remember the numbers. Who remembers numbers? So, the lead time for changes: basically very few companies out there, only about 2%, have more than six months between when they commit code and when it's live in production. Then between one month and six months, that's 13%, and the bulk of the responses were between one week and one month, that's 28%, and 30% between one day and one week. Then less than a day is only 15%, less than one hour is 9%. Speaker 1 (14:57) Then my question is going to be, less than one hour from when to when? Speaker 2 (15:01) They specifically say from code committed to code successfully running in production. So here's the thing, if AI was really revolutionizing our work, I would see way more frequent, or shorter, lead times for changes. I would also expect to see more frequent deployments and more rework, but that's not what we see here. I mean, I would assume that LLMs would just supercharge it, right? There's more code churn, I would expect that. Speaker 1 (15:22) You mean if it was helping us. What everyone is seeing is that how it impacts the software development lifecycle isn't about shortening the whole feedback loop. It's about shortening this first part where you're actually doing the code creation. Speaker 2 (15:43) But that's the thing, what's the point, right? It's not like code creation is ever the bottleneck. I've been working in software for over 20 years. Not a single time have I thought, oh, things would be very different if we could only write our code faster. It would make everything better. We'd make so much more money if only our developers could write that code faster. I mean, that's just nonsense. Preach. Obviously, what matters is when it's live in production. And if you have your deployments automated, your tests automated, then I see no reason why, you know, if you are able to generate that code faster, why doesn't that result in more deployments? I don't know. Maybe people aren't really using the LLMs the way it's advertised or the way they say they do, because that doesn't really fall in line with the rest of the report. Speaker 1 (16:40) Here's something, right? As you pointed out with those statistics, which are sort of hard to process all at once, there's such a small number of organizations or teams or engineers that are actually in the top tier that a majority of those organizations are spending so much time in the part before, in the time it takes to actually do the deployment, that the organization may lend itself to having a separation between a team that does development and another team that's responsible for release engineering.
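For anyone who wants to pin down the two metrics being contrasted here, below is a minimal sketch, in Python, of how lead time for changes and deployment frequency could be derived from deployment records. The record fields, sample timestamps, and bucket boundaries are invented for illustration; the DORA survey asks respondents to self-select a bucket rather than computing these from data.

```python
from datetime import datetime, timedelta

# Hypothetical deployment records: commit timestamp paired with the timestamp
# the change was running successfully in production. All values are invented.
deployments = [
    {"committed": datetime(2025, 3, 3, 9, 0),   "live": datetime(2025, 3, 3, 15, 30)},
    {"committed": datetime(2025, 3, 4, 11, 0),  "live": datetime(2025, 3, 10, 10, 0)},
    {"committed": datetime(2025, 3, 12, 14, 0), "live": datetime(2025, 3, 20, 9, 0)},
]

def lead_time_bucket(delta: timedelta) -> str:
    """Map a commit-to-production duration onto survey-style buckets."""
    if delta < timedelta(hours=1):
        return "less than one hour"
    if delta < timedelta(days=1):
        return "less than one day"
    if delta < timedelta(weeks=1):
        return "one day to one week"
    if delta < timedelta(days=30):
        return "one week to one month"
    if delta < timedelta(days=180):
        return "one month to six months"
    return "more than six months"

# Lead time for changes: commit -> successfully live in production.
for d in deployments:
    print(lead_time_bucket(d["live"] - d["committed"]))

# Deployment frequency: deployments per week over the observed window.
window = max(d["live"] for d in deployments) - min(d["live"] for d in deployments)
print(f"~{len(deployments) / max(window.days / 7, 1):.1f} deployments per week")
```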
Speaker 2 (17:11) So like DevOps engineers? Am I going to antagonize your audience if I say that? Speaker 1 (17:17) Before you get to that, one second. If there are different groups, and you only focus on one of those groups, and that group is, quote unquote, developers, and all they're doing is producing code, then they have no bottlenecks that are related to deployment and testing. And so it's very easy for that whole organization to say, yes, we are successful in using AI to do development, quote unquote, because that's what they see happening. They only need to generate their code, and they're doing that. Speaker 2 (17:44) I don't know. Like, here's the thing, there are other parts of this that sort of make me think. First of all, there seems to be a gap between lead time for changes and deployment frequency. In mature organizations, I would imagine them to be in lockstep, right? So between one month and six months, when it comes to lead time, only 13% of teams seem to have that lead time. Whereas... Speaker 1 (18:02) Okay. Speaker 2 (18:11) A whopping 20% say that this is their deployment frequency. So it seems that there is this delay, or, you know, basically those two stats don't align. I would expect them to be perfectly aligned, because if you have less frequent deployments, then you can't say that your lead time is shorter. But what we see is that the reported lead times look shorter than the deployment frequencies would suggest. So, what? Maybe it's just really self-reported and people don't really know, so they just use their gut feel to... Speaker 1 (18:53) Well, I think this maybe points to the fact that there are a lot of organizations out there with processes that are backwards in unexpected ways. Like ones where you maybe have some sort of scrum or sprint planning, where you're doing the planning and it feels like there's a very short time from when you learn about a feature, to when you do the software development, to when you get it released. Speaker 2 (19:15) But the lead time as it's defined here... I mean, I would love it to be from when the feature enters the backlog until it's live in production. No, no, no. It's from code commit. So from when you commit that code until it is live in production successfully. Speaker 1 (19:29) So what you're saying is that there are organizations which have a long lead time, that maybe do have a lot of steps in their process, but still deploy frequently. Speaker 2 (19:39) Here's the thing: 15% say their lead time is less than one day. How many people say that their deployments are between once per hour and once per day? 6%. How does that make sense? Feature flags? No, I don't think this would capture feature flags, right? People lie about feature flags. People say it's behind a feature flag, so it's live in production. Successfully? Do we know? No. Is any customer actually using it? No. Speaker 1 (20:04) I think you're onto something there. And this has been my personal issue with feature flags. Well, I think in theory they work great: different customers, different users are exposed to different functionality separately, when they should be, or through testing of... Speaker 2 (20:09) Personal problems. Speaker 1 (20:22) whether or not that feature is actually usable.
But I think what ends up happening is that those feature flags, that software, that technology, really is being utilized to gate turning on untested code in production, because... Speaker 2 (20:33) What do you mean untested? We have a whole QA team testing this stuff in their staging environment all the time. Speaker 1 (20:43) I mean, assuming that's happening, then you could say that it is tested. But I know that as soon as you have a separate organization, or a separate tool that allows you to get to the next stage of the software development lifecycle more easily, putting it behind a flag and then getting it to production, people will utilize that and use it as a crutch rather than validating that their code works 100% of the time, as much as they can. They say, well, you know, I feel confident and comfortable with where it's at right now, and then push it out behind the flag. And then when you turn the flag on, it breaks. And the most critical problem I see with this is that it's not reflected well in the lead time for delivering features. People often see, if it's behind a feature flag, it's in production, that counts. Whereas unless you're also counting... well, it shouldn't, right? But it's very difficult to also include a metric. I feel like it would be interesting in a 2026 report to see what the lead time is for removing a flag. Speaker 2 (21:30) That does not count. Do you think they'll still have the same metrics in the 2026 report? I really feel like they're trying to rebrand away from DevOps, because, you know, DORA started as the DevOps report. Now they are AI reports. So there, I think we're looking at a full rebranding. We'll see. Now, what is interesting is that there seem to be a lot of teams that are still running complex legacy software. At least that's what I'm reading from this, because... we said what, 28%? It takes over a day to deploy? Speaker 1 (22:16) Yeah, I mean, that's a long time. Like, you do the commit, but what if you include the pull request reviews? I see. Sometimes I think, well, you do a commit, there is a process there, and I don't necessarily see going as fast as possible as super valuable. I think once everyone's agreed that this code is the right code, that moment to getting it out is important. I think the moment from when your business or your team decides to do a feature to when the software development is complete and ready for review, that's also valuable. And then also the metric of how long it takes to test stuff. Although here's the thing, though, a longer time to test could mean that you are testing slowly or you're testing more. And I think these sorts of nuances don't really make their way into this part of the report. Speaker 2 (23:06) That's a good point, that's a good point. I mean, but this was really the one section that I thought was the most interesting in the whole 140-page report, which says something about me or the report, I don't know. And then after this, there's this whole section about the... they call it AI, I'm going to call it LLMs, some of which is interesting. I mean, what we can see is that the adoption appears to be universal, because 90% of people say they use AI in some capacity. Now, what does that mean? We don't know, because the question is a little bit vague, maybe on purpose.
Speaker 1 (23:45) Well, it's interesting, because I think by the same token, when they asked that, they also asked where people were utilizing it, and something like 66 or 68% were saying they use it for things like image content generation or summarization. Speaker 2 (23:58) Those memes are not going to make themselves. Speaker 1 (24:02) Do you think engineers included meme generation? So... Speaker 2 (24:05) I would. If I get a survey, what do you use AI for? Image processing, of course. Like, what would make those silly images, either for your PowerPoint presentations or just... I don't know, what do people do with that? Speaker 1 (24:19) So I'm definitely pessimistic here. I think the improvement that people feel with using AI in the last couple of years has come from finding places where AI should not be used and eliminating it from there. So, you know, by definition, you're left with fewer places, which means proportionally the value you're getting out increases, even if there's a net detriment. Speaker 2 (24:39) I mean, I do see... it really depends on what you're trying to achieve. I find that those tools, the LLMs, are good at sort of raising the floor. So they get you an average or slightly below average result really fast. So you don't even have to know anything about the domain or about the area. You put in no effort and you get the average instantly. That's fantastic. That's really awesome. And it's great. However, if you need something above average, or something where the outcomes really matter, that's where you start seeing the shortcomings. Speaker 1 (25:17) Well, then I could take that to a natural conclusion, though, and automatically suggest that you can never use an LLM anywhere you want it to be your competitive advantage. Speaker 2 (25:27) Agreed. But you know, you can use it to summarize your boss's lengthy emails, or, you know, when you write performance reviews for your coworkers, you can just ask an LLM to completely make everything up and make it sound plausible and get that guy you don't like fired. I'm not offering any advice, but... Speaker 1 (25:50) That's a point. So basically you're saying, if it's not a cornerstone aspect of your job, there's a lot of opportunity for uses of LLMs in a way that can actually help you be effective. And maybe that's what we're actually seeing in the report, what they're jumping on: there are places where it's valuable, but it's not the cornerstone, critical aspects that we're necessarily being hired for. Speaker 2 (26:08) So you see, the interesting part is when they break it down into where or how people are using the LLMs, like what for, basically. Oh, there's another thing: how many hours per day people use it. The mean seems to be two hours per workday, which is a long time interacting with AI. But that may include, if your IDE has automated suggestions by LLMs, that may be included. Speaker 1 (26:35) Yeah, I think it's difficult. Like, if you are using a Copilot or one of the LLM IDEs, then every time you type something, you immediately get a suggestion to code complete. Well, yeah, but then how do you evaluate that versus the number of times you're using it, versus how long that took? Right? If you count the tab time, like it took me, you know, 0.1 seconds to hit the tab key. Did I use the LLM for 0.1 seconds, or... Speaker 2 (26:46) and you just used AI. Or all the thinking time. I'm curious. Like, okay, I would like to see those questions.
When I filled in the survey, I don't remember the AI section. Maybe because I got it and then I didn't have a lot to say, so there were a lot of N/As. I don't know. But yeah, so what do people use the LLMs for? It seems like, obviously, a significant part is writing new code. I think 70% of people use LLMs to generate new code, which is fair. Literature reviews, so: summarize this wall of text for me. Images, 66%, as you said. Then there's a lot around 60%. I think it was a multiple-choice question. Proofreading, writing documentation, creating test cases. Speaker 1 (27:47) Well, I think there's something interesting here, which is, I think what we're getting at is, for a report that's supposed to be focused on the engineering metrics and how they're improving over time with the impact of AI, it doesn't do a great job of that. And on the flip side, the interesting things that we could be talking about related to AI feel like more of an AI-specific report, and it doesn't include any of those. Speaker 2 (28:06) I agree. I mean, that's why, you know, I was disappointed with this report. I felt totally unsatisfied at the end of it. I felt like there was a lot of narrative, a lot of sort of spinning of how we should think about the data. And the conclusions that they came to are not the same conclusions I came to just looking at the same data. So that's interesting. Speaker 1 (28:29) When they actually decide to share, I think the number one thing that comes out is that people's perception is, specifically, LLMs or AI are fantastic in every possible way. But if you actually look at the measurements of what they call product or software instability, which is really the measured quality of your product or your tool, your architecture, that goes down. Speaker 2 (28:51) Yeah, so basically individual productivity supposedly goes up, according to the results. Team productivity goes up a little bit, but not that much. And yeah, instability increases. So we have stuff that's less stable, and the product success, or how was that framed? These charts, I struggle to understand. Sometimes I feel like they decide to present data in a confusing way. Speaker 1 (28:57) According to the individual. Speaker 2 (29:19) Can we quickly just finish the... like, how people are using AI, because there are some interesting things that I wanted to talk about. Like, people use it for code reviews. 56% of people say they use AI for code reviews. And now I have to wonder, how do people define code review these days? Like, if I use a linter, did I just do a code review? Speaker 1 (29:40) That's a good point. I think what it means is, is there any tool being used in your software development pipeline, your delivery pipeline, that does something in an automated way? And I think a lot of those tools now claim that they include some sort of AI. So if you use Semgrep or a linter or Dependabot, those all have LLM in their name now. Speaker 2 (29:58) Who knew I've been using AI in my code reviews for many years now? Linters are AI, right? They are artificial, right? Right? They seem like they're intelligent, right? They know what to do. They know how many, you know, tabs or spaces. They know. Speaker 1 (30:03) Well, I think when you were doing... Well, I think it's sort of like the thing where Microsoft says this number of organizations have multi-factor authentication or are using passkeys.
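To make the coverage joke that comes up shortly concrete: a test can execute every line of a function, and therefore count toward 100% coverage, without checking anything at all. A small hypothetical sketch of the difference follows; the function and values are invented for illustration.

```python
def apply_discount(price: float, percent: float) -> float:
    # Hypothetical function under test.
    return round(price * (1 - percent / 100), 2)

def test_apply_discount_vacuous():
    # Runs the code, so every line counts as "covered",
    # but this passes even if the arithmetic is completely wrong.
    apply_discount(100.0, 10)

def test_apply_discount_real():
    # Pins down the behaviour a "do not change" comment was trying to protect.
    assert apply_discount(100.0, 10) == 90.0
```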
They force it on people, and organizations may not even be getting the benefit of that security, because they may not be able to use it effectively. Speaker 2 (30:31) So you think a lot of it is basically just enabled by default, so they have to say yes. Speaker 1 (30:35) It's like if you're using GitLab or GitHub and you're getting automated scanning tools in place. Yeah, I think so. Speaker 2 (30:41) Do you actually, genuinely think 56% of software engineers actively use AI for code reviews? I mean, if you think about it, code reviews tend to suck. At least that's what I hear. My code reviews are fantastic. Every time I was part of a code review, I had a great time. So I can't imagine why people would say code reviews suck, but that's the story that I hear again and again. People hate it. People hate doing code reviews. They also don't like when their code is reviewed, because people are nitpicking. So I can totally see people just opting for an LLM chat: hey, tell me what's wrong with my code, or tell me how great it is. Speaker 1 (31:17) Do you think that during the software development process, whoever is the engineer that actually developed the code, they're directly asking an LLM for feedback and they're counting that as a... Speaker 2 (31:25) I would not be surprised if that is a significant percentage of people. Speaker 1 (31:30) That's interesting. I think 56 would be very high for that. I also think 56 is very high if, as a reviewer, I saw the code and then passed it to an LLM and specifically asked, hey, what's wrong with this code? Speaker 2 (31:40) I don't know how to do a code review. Please, chat, help me. Is that how people do it? Maybe. I don't know. So another interesting thing is that 59% of people apparently use it for debugging, which I always thought LLMs would be bad at. Speaker 1 (31:58) I'm not even sure what that means, honestly. Speaker 2 (32:00) Well, you have a problem, you don't know what's wrong, so... Speaker 1 (32:02) I mean, I know what debugging is. What I mean is I don't understand how you would... Okay, I don't spend that much time doing software engineering anymore, but when I did, debugging was always my favorite thing to do. I struggle to see good opportunities to pull an LLM in there. I do notice that, because you just, what, run the code and that's it? You already know there's a problem. The LLM could be used for resolving the issue, but as far as finding it... Speaker 2 (32:06) What even is debugging now? What if you have all those dependencies, you know, on other parts of the code, those libraries that could be at fault? Or, you know, how do you know what's actually going on if you haven't really seen this code before, because it was written by that dude who's retired now, and, you know, no one has touched it. It even has a comment: do not touch, do not change. That's the... Speaker 1 (32:53) You know what I'm talking about. That's still better, I think. I mean, better than having a comment that says do not change is a unit test that ensures that that thing has not changed, with the reason why. You know, the one step... Speaker 2 (33:04) That's just a basic test that returns pass. Yeah, I've seen those tests. And you know what? A lot of those tests have actually been written by LLMs. I have seen that too. It just writes a lot of unit tests, you have a hundred percent coverage, and most of them just pass. That's how you get great results, great productivity. Speaker 1 (33:22) And you're assured that all your code is actually, in fact... Speaker 2 (33:25) High code quality.
Speaker 1 (33:28) Yeah. Speaker 2 (33:29) Anyway, I mean, another interesting part was when people say which mode of interaction with LLMs they use. A lot of people obviously use chatbots. Some people use their IDE. I expected a lot of people to rely on those agents, because that's the, you know, huge breakthrough that was advertised, but apparently 61% say they never use AI in agentic mode. Speaker 1 (33:55) Well, I think that's probably the result of it being a fairly new and recent identification. Not that new! Well, I think the term is new. And so they may not have been realizing that what they were doing could have been categorized that way. Speaker 2 (34:06) Oh, they would be realizing what they're doing. They would know very well exactly what they're doing. Speaker 1 (34:12) I'm just saying, if I said, you know, have you been using agentic mode, and I didn't first explain what I meant by that in the question or the survey... Speaker 2 (34:19) I think they would know, because all of those modes... you actually have to pay extra to get that. That's what the feature is called. The companies offering the tools will let you know that it is their latest invention, everything will change from now on. Speaker 1 (34:34) Yeah, but when you put it like that, it's really simple. Like the IDEs, for instance, I think it was only recently that GitHub Copilot in VS Code supported a slash command for agents, for controlling agents, for configuring agents, for specific LLMs to go off and asynchronously build stuff. So that's actually also really recent. So the terminology and the access to the functionality from a provider would be, I think, the main limiting things there. But yeah, I agree. I think fundamentally, it's like someone finally figured out a way to make an LLM usable for real, which is: I don't want an answer right now, I want you to go off and think about the problem and come back with a solution. But the way that I've seen it from Microsoft, and we went into this a lot in the VS Code episode of Adventures in DevOps, so go and check that out, but realistically, it's like you can go and have it solve all the tickets in your backlog and come up with... Speaker 2 (35:30) And then all your tests are passing. However, the production database is missing. Speaker 1 (35:36) So I think at this point, there are just so many examples of that. The most recent one is, I think, Google Antigravity just failed on that approach. What happened? I didn't really look into it, but I think someone was on Windows and it accidentally ran a command to just delete one of their entire hard drives. Speaker 2 (35:55) I thought those things only happen on Linux. Speaker 1 (35:58) I think there's something on Linux called permissions, which usually prevents this. Speaker 2 (36:02) Yeah, but then you just basically do sudo, you know, isn't that the joke? Speaker 1 (36:08) I have one-touch sudo access on my laptop and desktop; I actually have to physically press, you know, my... Speaker 2 (36:15) You know, if you want access to his computer, you actually have to be physically touching that key. Anyway, I mean, this whole agentic thing, it reminds me of... So I have a friend. Yes, I do have a friend. I have friends. True story. Anyway... Right. So we met like a bunch of months ago, once the agents were just becoming the new thing. I don't know how long ago. I think it was just a bunch of months.
Speaker 1 (36:32) I'm gonna cut that all out of the episode. Speaker 2 (36:45) And he obviously tried them. He's a very seasoned senior software developer, a very, very good one, like staff-plus. So he's like, oh yeah, I will have my army of little minions to do my bidding. And so he tried using those agents and he was so excited. Like, this is great. It's fantastic. It's actually really decent code. The reasoning makes sense. So I was like, okay, nice. And then not so long ago we met again. And I asked him, how are your agents doing? And he's like, no, I'm not doing that anymore. Why? Well, it's just not that good. I'm like, hmm, what's changed? Just, you know, it's good when you don't have a clear vision of what you want, or clear, precise requirements. But as soon as you have a concrete thing that needs doing, then this just falls apart, because there will be all those little discrepancies that you then have to go and fix yourself, pretty much, and at that point it's faster to just do it yourself from the beginning. So maybe that's why. I don't know. Speaker 1 (37:51) Okay, so what else was in there? Speaker 2 (37:54) What else was in there? There was this whole effectiveness section, like, does AI make us faster? My conclusion from that section was that the data was not very clear. It's either too subjective or people don't really know what they're talking about. The DORA team really tried to get something out of that. I mean, obviously, if you're going to publish a report, you have to write something. So there was a lot of narrative, but not that much meat in that whole thing. Their conclusion was that at a lot of companies, the process gets in the way of truly embracing the AI, or of truly realizing the benefits, because they do see a lot of increase in individual productivity, what you were saying, but that doesn't translate to the overall output, and they blame the process. They say, okay, we need to learn to work differently to make use of that. Now, I get a slightly different picture. I do see that making individual engineers more productive will not mean that we'll get better code or better products, because fundamentally I'm not hiring engineers to write code. I'm hiring engineers to solve problems, and for that they need to collaborate. So, does LLM or AI help us with collaboration? I think the answer is no. Speaker 1 (39:22) I think there was actually a conflicting perspective here that we just realized. It's actually worse than just solving a problem that's not your bottleneck. If you have a bottleneck in your system, your software development lifecycle, and let's assume that's the pull request... From my own personal experience, and I'm sure this is different for others, for one hour of software development, it was like two to eight times that amount in testing and the code review process, which makes sense. You have to teach someone else about the change, they have to understand what's there, then actually review it for real, and then there's some back and forth on how to refine it before getting it released. So I can see a discrepancy there. If you don't improve the bottleneck, which is the pull request review, and you only generate more code, you actually create a bigger burden on the other parts of the process, where the bottleneck is. Speaker 2 (40:07) But then you just use an LLM to do the code review, right? 56% of people do that. Speaker 1 (40:13) "You're solving the whole code review, though,"
I think, is sort of the lie that's included. Speaker 2 (40:17) Something must be not quite right. I mean, there's this other thing: the LLMs have been around for what, three years, four years, five? Three years, say. So let's say two years since they were really available, truly usable to developers. And, you know, this whole vibe coding thing has existed for quite a while now. And so I would imagine, if this worked, I would imagine a deluge of new software appearing everywhere. We would see a lot of new little startups, like indie hacker projects. We would see a lot of... even if you look at your phone in the app store, we would see hundreds and hundreds of Tetris clones or, you know, Candy Crush clones. Where is that? Even when you look at the domains that are registered, you would see a lot of those, because people would want to at least register the domains for the new vibe-coded project. That's not happening. So we don't actually see more software being produced, which is like, why not? Speaker 1 (41:24) Well, I think you sort of alluded to this before, and I read your blog article on the topic, about raising the floor. And so I think that there is a lot more software distinctly being created by people who haven't had the capabilities of doing it before. So before, they would never have been able to build their own Tetris clone, and now they absolutely can, but they don't go and release that software anywhere. So that deluge of software being created isn't by companies that are actually trying to make a... Speaker 2 (41:54) Yeah, but why not? I mean, we actually live in the middle of a hustle culture. It's actually really cool to have your own company and say, I'm an entrepreneur. So those people who created Tetris clones for their own use, they're happy with it, but they would absolutely try to monetize. We would see that. We would see at least a blip, and we don't even see that. So I don't know. Something just does not add up. Speaker 1 (42:22) I think it's the bottleneck. I think that just answers a lot of the questions there. Whereas, like, the problem is not actually doing the software development. And that's the thing that's tough. Speaker 2 (42:30) Right. I mean, that's my thinking, because to solve the problem, coding is actually a very small part of it. Like, you have to understand, what is that? Why are we doing this? Is this really the right thing to do? Is that what the users want? And I mean, one thing: if the users say they want this, very often they have no idea. Even if they genuinely want this, will they pay for it? How do we make sure that they pay? How do we make sure that they actually know that it exists? You know, all that boring marketing stuff. And yes, LLMs can absolutely generate you marketing copy. Speaker 1 (43:05) So there's a corollary to that, which I think you identified early on, which is that if it were the case that we were getting a lot of value out of this, we wouldn't see the large providers out there pushing those features onto us. They would be charging specifically for that value. If you think of the... Speaker 2 (43:21) They are absolutely charging. They increased prices; even if you didn't ask for it, they increased the price because now it comes with AI. Speaker 1 (43:29) Well, that's my point. My point is that it would be able to be separated out by the value specifically being offered there to end users.
So email summarization, et cetera, as a core feature, rather than something that was thrown into the package, and then the price is increased when no one asked for that. It doesn't necessarily help me to autocomplete my sentences in my email. It doesn't really offer me that. Speaker 2 (43:51) Yeah, so this is the thing. I do think LLMs are great. They're fantastic tools. However, they are good for sort of low-value work, the work where the stakes aren't very high and maybe the output won't necessarily make you money. And as such, it's fine if they're cheap, but, I mean, I think right now they're still offered sort of below cost, because everyone's trying to grab market share. But the underlying cost of running this technology, I feel like it's a little bit too expensive right now for the value that they offer. Speaker 1 (44:25) It's interesting you bring that up. I think there are a couple of different levels here. There's the fundamental level: the companies creating foundational models, and they are for sure subsidizing everyone's use of this. Every time you make an API request or enter a prompt, the company that you're using is actually losing money. But then there are these companies sitting on top of them which are getting the benefit of that price reduction and are potentially able to make a sustainable business off of it for their customers. And of course, they're subsidizing it as well, because who's actually subsidizing it is the VCs, not the company itself. But their hope is that the foundational model companies actually go out of business, because they can't recoup the losses that they have. And realistically, new companies spin up that create cheaper foundational models that they can utilize. Speaker 2 (45:12) I mean, this is the standard from a Wardley map. I don't know if this is the right audience to bring this up, but Wardley maps basically talk about product evolution. At some point you end up with a commodity. I think it's definitely in the interest of people providing more complex AI solutions. It's in their interest that the foundational models become a commodity. They become worthless, almost. Speaker 1 (45:37) So do you think that we'll get to a point where LLMs are being provided by the state, by governments, as a critical resource that everyone should have access to? Speaker 2 (45:47) We all are working in the mines to support the... Speaker 1 (45:52) the power necessary to drive them. Speaker 2 (45:54) We mean the coal mines, right? Because it will all be powered by... I'm sorry. I didn't mean to... This took a really dark turn. Anyway, another fun statistic, not from the DORA report, one that was a little bit confusing to me. Apparently, GitHub Copilot says that only 30% of their LLM-based code suggestions are accepted. So if you get a suggestion from the LLM, only 30% of those are accepted. Speaker 1 (46:20) That actually competes with, what was it, 56% of the... no, no, not that one. Speaker 2 (46:23) The code review one. 66% use it for gen... no, 70% use it for code generation. Speaker 1 (46:30) Yeah, right. So there's a little bit of a disconnect here. 70% use it for code generation, 30% of the time they were using the LLM while doing software development, and then 30% of the code suggestions are actually being accepted. This tells us, yes, people are using LLMs all the time, but they're not actually accepting the output specifically.
Speaker 2 (46:50) I mean, from what I've heard, also very often people take the code into a different... into a chat window, and that's where they work. Right. So, however... Speaker 1 (46:59) I wouldn't trust it running anywhere, even in a container on my machine. Speaker 2 (47:03) Wow. Anyway, right. I mean, right. Anyway, what I found interesting is that GitHub Copilot wrote a whole article about it, and they seem to think that 30% is fantastic. They shout about it from the rooftops as if it was a great result. I'm like, what world do we live in? 30%? That's not a lot. I would be embarrassed, Speaker 1 (47:06) I don't want to... Speaker 2 (47:32) quite frankly, but apparently that's labeled as a success. I mean, I think it says more about me than the technology. Like, I just don't get it. Why is 30% good enough? I would want, you know, at least 50% of those suggestions to be helpful, and I would want to accept them. Otherwise this is just distracting and annoying. Happy? Anyone? Speaker 1 (47:50) I think there's another problem here, though, which isn't just the percentage of code suggestions that are being accepted. It doesn't really tell us which code suggestions are being accepted. For instance, I find a very high success rate for the first suggestion that an LLM makes on a single line of code. The second suggestion that comes up right after that, for the second line to generate, is then increasingly wrong, and so on and so forth. By the third line or the fourth line, it's nowhere close to anything that you wanted at all. So the 30%, you know, maybe it's 30% across all the suggestions, where the first suggestion could be like 80%. And so where's the value? The value isn't in that next line. That next line is something that everyone knows what it should be, right? It's a log statement, it's an if condition, et cetera. The real problem is the meat of what you're putting in there. And those are the ones that I bet are way less than 30%. Speaker 2 (48:49) I mean, but that does again corroborate that it really gets you to that mediocre point much faster. So it speeds up the initial part of creating code, but it doesn't necessarily get you all the way there. And then, okay, you still have to figure that out. And that's the part that usually takes the longest and the most effort, at least from my experience. I don't know. I'm probably a really bad software developer. So what do I know? Speaker 1 (49:16) Yeah, you're definitely a really bad software developer. Speaker 2 (49:18) Don't let me anywhere near code. Speaker 1 (49:20) I've seen your code. Speaker 2 (49:22) What was seen cannot be unseen. Let's not talk about that. Anyway, the other part of the report... I feel like really there wasn't that much. It was all AI, AI, AI. I mean, the report itself is no longer called DevOps. I don't know if I said that already. It's called the State of AI-assisted Software Development. So if you try to download the report and you're looking for DORA anywhere... no, that's not there. Speaker 1 (49:25) It was Ruby. Almost makes me not want to read next year's. Speaker 2 (49:51) I'm actually going to read it, because I'm curious how it will compare. And I hope some of the statistics will be carried over. Like, they did carry over some of the effectiveness questions. They say, okay, yeah, we do see an improvement.
But then when you look at the data, it's like, that's not that much of an improvement, but they still use it to craft that narrative that, okay, the organizations are learning how to, you know, deal with the technology, which I think is more wishful thinking. Anyway, they do speak a little bit about platform engineering. It seems like that is the trend that they didn't manage to really pick up on or benefit from. Speaker 1 (50:29) The results were stubborn there as well, like with the AI, because last year, like I said, it had a negative impact on the organization, which, if you really understand the value that's being added... these teams often are behind the curve. They're reactive to what's actually happening, and they tend not to think about platform engineering as a product. And we see the same this year as well. People think that, like with AI, there's a positive impact on the organization when you have a good understanding of what your platforms can be, what your internal development or tooling can be. But when you actually look at the impact, the software instability increases as well. Speaker 2 (51:06) Yeah, I mean, that's... But that's like every tool, right? What you said: if you use it correctly, it does what it's supposed to do. If you use it incorrectly, you can actually hurt yourself. Speaker 1 (51:17) Well, I think the solidification or the codification of those practices is a problem, though, because a lot of organizations aren't doing the best possible thing at all times. And then they go and take the step of solidifying that process when there are mistakes in it. And so the question is, right now, you know, if you're thinking about this, does your organization have a process which you can guarantee with 100% accuracy is exactly what that... Speaker 2 (51:42) 100 percent at all times? Speaker 1 (51:44) Not every process, just one process that you have that is perfect. And if you don't have a perfect process, the way you're doing code... the way we're doing code... Speaker 2 (51:49) Code reviews. No, just do code reviews. And also do testing. Speaker 1 (51:55) You have to test it. Well, there's an argument that sometimes maybe you don't need to test something. Speaker 2 (51:59) Because the users will test it and they will happily pay for it with games. Anyway, let's not derail the conversation. I mean, honestly, I would love there to be more of that in the DORA report, but that's pretty much it. I mean, you've heard it all. So there were some interesting statistics, but overall a lot of narrative. And you know, when I was reading it, I didn't fully read every word of the narrative, because at some point I'm just like, I can't. But I got my spring poop moment. You don't know what I'm talking about. Speaker 1 (52:30) Your what? Not particularly. Speaker 2 (52:35) So we live in Switzerland, and there's this thing that happens here a few times a year where you go outside and there is this smell of manure everywhere. And that happens no matter where you live; you can live in a city, I live in a city, it's still there. And it's most prevalent at the beginning of spring, like the first good day after the winter. And that's basically farmers spraying the cow poop over the fields, because it accumulated over the winter. And so I associate that smell with spring.
And so, you know, reading that narrative in the DORA Report is basically that experience, when you go outside, hopeful, excited, with anticipation, and things are great, but there's this thing in the background that's not exactly pleasant and it's everywhere. So, yeah, that was the experience. Speaker 1 (53:33) So now that you've read this, there must be something, though, that you still feel like you could apply to our organization, to our company. There must have been some insight, which could be... Speaker 2 (53:43) I mean, it's not really a result of the report, because this is something that I've always said. I do treat LLMs like tools, like IDEs or your operating system. I don't care if you use it or not. If you're an engineer and it helps you, use it. I mean, figure out how to make it useful; I'll help you figure out how to make it useful, but it is really your own thing. I don't expect that it will make us faster or better, or will produce better code. I mean, maybe it will, if it makes the engineer happier or more efficient individually, but that's a marginal change. So that's basically what I've gathered from this report. And I always thought that. It's for the comfort of individual engineers; sure, if it makes your life easier. Speaker 1 (54:31) Okay, with that, I wonder if we should close out this episode and move over to picks. Like, you know, I asked you to bring something for this. Speaker 2 (54:37) Picks, like... mine picks? Sorry, pickaxes. Yes, bring, bring, bring. Of course. No, I actually didn't bring anything. However, I have more of a concept of what you could do, because I've done it, I just have nothing to show for it. Mushrooms. I mean, before you go to a weird place: I went and bought a mushroom kit from a grocery store. Pearl oyster mushrooms, delicious. So it was basically a box. You just prepare it a little bit, open it up, and mushrooms just emerge, and it's fantastic. I love watching mushrooms grow. They grow so fast, it's almost in front of your eyes. It was fantastic. Great. I have tons of pictures, and I ate them. So yeah, I'm sure in your area, because mushrooms tend to be very local, I'm sure in your area someone sells mushroom kits for your local varieties. Oyster mushrooms, fantastic. Get yourself a mushroom kit, it's fun. Speaker 1 (55:41) I think once you start seeing mushrooms... I never saw them when I was in the US. I never paid close attention, and now, since I've moved to Europe, I see them everywhere during this season. Speaker 2 (55:50) Mushrooms are everywhere. Fungus, fungi, they're everywhere. I love mushrooms. Speaker 1 (55:56) I see. So a particular grow kit, you know, go out and get one and try it. And they're cheap too. Speaker 2 (56:02) Yeah, they were actually cheap. I mean, I didn't really do it for monetary reasons. The experience of watching the mushrooms grow, that was worth a lot. But yeah, I mean, if we just ask how much I would have paid for the mushrooms, obviously I got more out of that kit than... Speaker 1 (56:10) Wow. Okay. I like it. Okay. What did I bring? So my pick for this week is an article that talks about the maximum effective context window for LLMs. So I think it's maybe a little bit relevant, and it's interesting, because I find that a lot of people keep on saying that any problems we see with LLMs will eventually get solved by increasing the context window. And this article really points to the fact that that may not actually be true.
So the article is "Context is what you need", the maximum effective context window. And it really points out that if you increase the context window and put more tokens into the prompt, the LLM will struggle to identify what's actually relevant, what's the most important piece of information. I think the same thing is true for humans, so this result is not surprising. Speaker 2 (57:06) For example, right now I no longer know what you were talking about. Speaker 1 (57:09) Okay, so just to reiterate: increasing the context window is actually problematic, and we're hitting a fundamental limit, which means that larger sizes, 1 million, 2 million tokens, aren't actually going to be valuable for us. We need to find a way to pass the right amount of information in, and we're already at that point, which means all the innovation that comes isn't going to be around increasing the context window or getting the memory right or anything like that. I think we're fundamentally stuck as far as this technology goes. So it's an interesting read about how they evaluate the context window, and they call it a needle-in-a-haystack kind of problem, where you have a technology that you're utilizing and you want it to figure out which parts of the prompt are actually useful for handling the task. I think, I don't know, there's something interesting about that. Okay, so with that, thank you, Dorota, so much for coming on today's episode. Speaker 2 (57:51) Hmm, I agree. Thank you for having me. I hope I didn't offend anyone. Maybe the DORA authors? I'm sorry. I really appreciate that they're doing this report. It's just that this year was a little bit disappointing. Speaker 1 (58:01) Well, we always get some angry emails, so you can throw them on the complaint pile and we'll promise not to go through them. So thank you so much, Dorota, and thanks to all the listeners for listening to this episode, and we'll see, hopefully, everyone back again next week.