2
00:00:07,790 --> 00:00:10,750
Welcome back to another episode of Adventures in DevOps.

3
00:00:10,750 --> 00:00:18,150
And today we have a special episode where we're going to finally review the 2025 DORA
Report findings.

4
00:00:18,150 --> 00:00:21,050
And I wasn't really sure what the best way to do that is.

5
00:00:21,050 --> 00:00:31,770
So I just figured I would bring in international speaker, tech entrepreneur, and longtime
CEO of Authress, Dorota Parod.

6
00:00:31,874 --> 00:00:33,275
Hi, that's me.

7
00:00:34,398 --> 00:00:36,561
So, DORA Report, huh?

8
00:00:36,903 --> 00:00:38,484
Finally, huh?

9
00:00:39,060 --> 00:00:41,858
It just came out, so I don't think we're too far behind actually.

10
00:00:41,858 --> 00:00:43,327
It's been months, I feel like.

11
00:00:43,327 --> 00:00:45,438
Oh, that's how it feels, at least to me.

13
00:00:46,186 --> 00:00:50,734
It took some time to actually get through it, because it was not the shortest thing I've
ever read.

14
00:00:50,734 --> 00:00:54,286
Yeah, I mean it's like what, 120 pages or something?

15
00:00:54,286 --> 00:00:57,406
I think the actual like release was 142.

16
00:00:57,406 --> 00:00:58,412
You already forgot the number.

17
00:00:58,412 --> 00:01:02,711
Yeah, but there's a lot of like acknowledgements and the sponsors and all of that.

18
00:01:02,711 --> 00:01:04,399
I ignore that.

19
00:01:04,399 --> 00:01:08,213
Yeah, there was one interesting section that I didn't pay too much attention to at the end
of the report.

20
00:01:08,213 --> 00:01:12,180
They actually talk about their like programmatic way in which they parse and manage the
data.

21
00:01:12,180 --> 00:01:16,878
The methodology, yeah, which... I'm actually happy that they included that because...

22
00:01:17,761 --> 00:01:20,488
Well, should we start from the beginning?

23
00:01:20,488 --> 00:01:22,526
Yeah, let's just jump in.

24
00:01:22,644 --> 00:01:24,096
No, no, no, not jump in.

25
00:01:24,096 --> 00:01:24,897
I'm not ready.

26
00:01:24,897 --> 00:01:26,558
I'm not ready to jump in yet.

27
00:01:26,558 --> 00:01:27,378
No.

28
00:01:27,639 --> 00:01:30,422
All I want to say, I just want to get it off my chest.

29
00:01:30,643 --> 00:01:31,984
I'm really disappointed.

30
00:01:31,984 --> 00:01:42,094
I'm really disappointed with this year's Dora report because it feels like a lot of fluff,
a lot of text, a lot of narrative, not that much data.

31
00:01:42,806 --> 00:01:44,880
I mean, can you really blame them though?

32
00:01:44,880 --> 00:01:49,497
I mean, look at everything else that's been printed online recently, and it is all
about the fluff.

33
00:01:50,062 --> 00:01:51,182
Is it?

34
00:01:51,182 --> 00:01:55,482
I mean, okay, I feel like it didn't have to be 140 pages.

35
00:01:55,482 --> 00:01:56,142
That's true.

36
00:01:56,142 --> 00:02:00,398
It could be condensed easily and it would be of the same value, at least to me.

37
00:02:00,398 --> 00:02:00,889
Interesting.

38
00:02:00,889 --> 00:02:03,453
What would the best format for the report be, do you think?

39
00:02:03,453 --> 00:02:07,126
Like if you could pick what information you actually wanted included.

40
00:02:07,126 --> 00:02:09,078
Well, I mean, show more of the data.

41
00:02:09,078 --> 00:02:12,992
I would actually even maybe show more than they did.

42
00:02:12,992 --> 00:02:18,388
And I understand that maybe not all of it made sense or not all of it fit the narrative.

43
00:02:18,388 --> 00:02:22,441
And also maybe shrink the narrative a little bit.

44
00:02:22,442 --> 00:02:26,285
I prefer to look at the data and figure out my own story.

45
00:02:26,422 --> 00:02:32,352
So, like, most of it seemed like it was pretty much dedicated to AI being the best thing
ever.

46
00:02:32,736 --> 00:03:02,310
Yeah. Yeah, yeah, uh...

47
00:03:02,444 --> 00:03:06,990
You have some data and then you write a book, maybe even publish them at the same time.

48
00:03:06,990 --> 00:03:07,802
That's fine.

49
00:03:07,802 --> 00:03:12,288
If it's clear that the book was made based on the data, based on your findings.

50
00:03:12,288 --> 00:03:19,597
But what happened here, I feel like, is the dude wrote a book and now the data needs to
support the narrative.

51
00:03:19,628 --> 00:03:25,845
Maybe let's get to actually what the report says and then we can potentially pick it apart
at the end.

52
00:03:26,294 --> 00:03:31,068
What the report says is, uh, you know, AI is revolutionizing everything.

53
00:03:31,068 --> 00:03:34,101
No!

54
00:03:34,101 --> 00:03:35,334
That's how I felt.

55
00:03:35,334 --> 00:03:46,511
I genuinely thought, at some point when reading that executive summary... at
some point I had this thought: my God, they fired all the researchers, all the people who

56
00:03:46,511 --> 00:03:54,798
worked on the report, and they just replaced them with an intern sitting in front of an LLM
chatbot, whichever one they use, and the prompt was...

57
00:03:55,106 --> 00:04:02,211
Please generate a DORA-style report showcasing how AI revolutionizes the software industry.

58
00:04:02,211 --> 00:04:03,772
And this is what we got.

59
00:04:03,953 --> 00:04:06,965
So it was really hard for me to go through that executive summary.

60
00:04:06,965 --> 00:04:08,856
And I usually like executive summaries.

61
00:04:08,856 --> 00:04:11,278
I mean, who doesn't love a good TL;DR?

62
00:04:12,186 --> 00:04:21,232
Well, when I'm thinking about the data and how it applies to our business and other
businesses out there, if you're a technology company, I mean, that's really what DORA is for.

63
00:04:21,232 --> 00:04:26,416
It's for software engineering based organizations within larger companies.

64
00:04:26,416 --> 00:04:32,100
I think, at least for me personally, I'm all about not just what the high-level conclusion is.

65
00:04:32,100 --> 00:04:35,246
I do really care about the why behind it.

66
00:04:35,246 --> 00:04:37,586
Yes, but I would like to see both.

67
00:04:37,686 --> 00:04:46,466
So as I said, I generally appreciate executive summaries, but this time I just felt like,
oh, it put me off from reading the whole report.

68
00:04:46,506 --> 00:04:53,646
You know, if any of you are trying to actually go through the report and you see, okay,
140 pages, oh, can I get a TL;DR?

69
00:04:53,646 --> 00:04:57,920
I mean, the executive summary is not it. I mean, it will get you stuck.

70
00:04:57,920 --> 00:05:01,962
If you haven't read the report and you're watching this podcast, this is your TL;DR.

71
00:05:01,962 --> 00:05:08,517
So hopefully you don't need a TL;DR for your TL;DR, which would be that the report says AI
is the best thing ever.

72
00:05:09,048 --> 00:05:13,128
But that's not what the report says even, so that's weird.

73
00:05:13,400 --> 00:05:14,370
That's true.

74
00:05:14,370 --> 00:05:24,193
Like, if you go down to the actual sections, the thing to know about the report is that it
doesn't actually review the measurements of the DORA metrics from organizations.

75
00:05:24,193 --> 00:05:26,214
There is very little of that.

76
00:05:26,214 --> 00:05:32,015
Most of the report and the AI section is focused on how people feel about how AI is
helping them.

77
00:05:32,015 --> 00:05:38,017
And I feel like there's a big disconnect between how people think about it and the actual
impact on organizations.

78
00:05:38,017 --> 00:05:40,878
And that's, like, the one thing that I actually took away from the report.

79
00:05:40,878 --> 00:05:42,609
Yeah, I mean, that's what they basically say.

80
00:05:42,609 --> 00:05:53,797
They call them stubborn results because they saw some of that like last year where
increased AI adoption causes more friction, more instability, and that sort of doesn't fit

81
00:05:53,797 --> 00:05:58,851
the rest of the picture because people also report that it makes them more efficient.

82
00:05:58,851 --> 00:06:00,872
So something does not add up.

83
00:06:01,093 --> 00:06:06,187
But there was this other research actually, METR, I don't know how you pronounce it.

84
00:06:06,187 --> 00:06:08,098
Miter, something like that.

85
00:06:08,098 --> 00:06:14,082
But they basically reviewed a bunch of senior contributors to open source software.

86
00:06:14,322 --> 00:06:22,247
And uh those engineers basically said AI is making me faster, 30%, 40%, whatever the
numbers were, I don't remember.

87
00:06:22,568 --> 00:06:32,995
But then they actually measured the task completion, and it turned out that AI made them
slower, almost the same amount, like 20 or 30% slower.

88
00:06:33,215 --> 00:06:37,216
So I do think there is a discrepancy between what people perceive.

89
00:06:37,216 --> 00:06:39,273
and what actually is happening.

90
00:06:39,670 --> 00:06:40,770
Yeah, I think this was the same thing.

91
00:06:40,770 --> 00:06:44,136
So last year the report, if you didn't read it, focused on two core areas.

92
00:06:44,136 --> 00:06:48,842
One was the impact of AI on the industry, and the other one was platform
engineering.

93
00:06:48,842 --> 00:06:57,013
And both of them actually had the same finding, which was: people think it's great, but when
you look at the impact it actually has on your organization, it makes everything worse in

94
00:06:57,013 --> 00:06:57,793
a way.

95
00:06:58,399 --> 00:06:59,841
That's not what I read.

96
00:06:59,841 --> 00:07:01,683
They frame it as an amplifier.

97
00:07:01,683 --> 00:07:04,897
So uh if it's good, it's great.

98
00:07:04,897 --> 00:07:07,212
If it's bad, it's really bad.

99
00:07:07,212 --> 00:07:14,970
I was actually just reading an article that was suggesting how that's almost like a
nonsensical response though, because it seems very specifically like, oh, you have a tool.

100
00:07:14,970 --> 00:07:17,153
If you use the tool correctly, things get better.

101
00:07:17,153 --> 00:07:18,764
If you use the tool wrong, everything gets worse.

102
00:07:18,764 --> 00:07:20,126
But that's like a tautology.

103
00:07:20,126 --> 00:07:24,099
It's not a tautology, but it's sort of like one, in which, of course that's true.

104
00:07:25,401 --> 00:07:28,465
If you use it correctly, then everything gets better.

105
00:07:28,465 --> 00:07:29,686
Well, not everyone agrees with that.

106
00:07:29,686 --> 00:07:31,368
It's the same with a monolith, right?

107
00:07:31,368 --> 00:07:33,860
If you do this correctly, then it's great.

108
00:07:33,860 --> 00:07:38,034
If you do it wrong, then it's really wrong.

109
00:07:39,596 --> 00:07:44,320
No, but you know, there's this other sort of secret part to it.

110
00:07:44,320 --> 00:07:52,368
I mean, it's the secret I've learned that some tools make it really easy to use them
correctly and other tools just make it super hard.

111
00:07:52,368 --> 00:07:54,616
Yes, you can still use it.

112
00:07:54,616 --> 00:07:55,157
correctly.

113
00:07:55,157 --> 00:08:05,629
You can do a monolith correctly if you're smart, if you're disciplined, if you really
apply modular architecture and maybe you don't have a gigantic team.

114
00:08:05,870 --> 00:08:14,210
I don't think it's such a long jump to actually compare that to AI realistically, because
I think you're onto something.

115
00:08:14,270 --> 00:08:20,830
With AI, it is really a challenging tool to interact with, one that's very temperamental.

116
00:08:20,910 --> 00:08:27,022
One day it maybe works a little bit, and other days it screams and stomps its feet on the
ground.

117
00:08:27,022 --> 00:08:29,724
And then apologizes because of course you're right.

118
00:08:29,724 --> 00:08:30,544
You're totally right.

119
00:08:30,544 --> 00:08:31,905
I didn't... I got that wrong.

120
00:08:31,905 --> 00:08:33,185
Like me trying.

121
00:08:34,427 --> 00:08:48,395
So, you know, reading some of the actual metrics in the report, I did have that
thought of, you know what, LLMs actually make people more likely to collaborate less.

122
00:08:48,655 --> 00:08:53,078
It encourages more, like, individual activity, which I'm sure feels great

123
00:08:53,078 --> 00:08:58,783
if you're an introverted software engineer, because you don't have to go to meetings, you
don't have to talk to anyone.

124
00:08:59,104 --> 00:09:08,713
But, I mean, I've worked in software long enough to know that you get much better
results if you have multiple people collaborating on solving a problem.

125
00:09:08,774 --> 00:09:12,397
So maybe that explains some of the weirdnesses.

126
00:09:12,397 --> 00:09:13,730
Anyway, should we?

127
00:09:13,730 --> 00:09:16,071
Well, before we get to that, I think we keep on saying AI.

128
00:09:16,071 --> 00:09:22,445
I want to be clear that when we talk about it, we don't have AI for sure, but even if we did,
that's not what's being discussed in the report.

129
00:09:22,445 --> 00:09:29,399
What's really being discussed when they say AI is the set of LLMs that we have
out there, the ChatGPTs, Gemini.

130
00:09:29,399 --> 00:09:30,569
I know some people say Gemini.

131
00:09:30,569 --> 00:09:35,388
I don't know what the correct pronunciation is, whether it's a British English or...

132
00:09:35,388 --> 00:09:39,670
Googlers call it Gemini and it comes from the United States of America.

133
00:09:39,670 --> 00:09:54,266
So I have to say it the American way, I'm sorry. Why not call it AI? Don't you think it's
really intelligent? I mean, sometimes you talk to it and then it appears like

134
00:09:54,266 --> 00:09:55,566
it has consciousness.

135
00:09:55,566 --> 00:09:59,786
I think the appearance is sort of the problematic thing.

136
00:09:59,786 --> 00:10:08,466
We know that it's just statistical probabilities, and that is pretty much derived
from the technology, the architecture, that we're utilizing today.

137
00:10:08,466 --> 00:10:13,308
And that's been the case for the last five or six years and we haven't seen any changes to
that.

138
00:10:13,308 --> 00:10:23,367
So, you know, I almost feel like the label AI is misleading, is false advertising, and it
may be a little bit harmful even, because it's like with climate change.

139
00:10:23,367 --> 00:10:29,151
Now we call it climate change, but it used to be called global warming, which was really
bad because it doesn't necessarily mean warming.

140
00:10:29,151 --> 00:10:30,493
Some areas will get colder.

141
00:10:30,493 --> 00:10:34,216
And AI is also a label like that.

142
00:10:34,216 --> 00:10:35,537
People think intelligence.

143
00:10:35,537 --> 00:10:37,729
No, it's not intelligence actually.

144
00:10:37,729 --> 00:10:41,186
And it's nothing intelligent.

145
00:10:41,186 --> 00:10:42,428
It's false advertising.

146
00:10:42,428 --> 00:10:44,412
What it does, it manipulates language.

147
00:10:44,412 --> 00:10:49,421
It makes things sound plausible, sound like something else.

148
00:10:49,844 --> 00:10:57,132
I mean, I think the biggest problem with using that terminology is for people who aren't
in the technical domain, who have basically been led to believe that that's what we have

149
00:10:57,132 --> 00:11:00,226
when truly we aren't even anywhere close to that.

150
00:11:00,226 --> 00:11:07,782
But maybe that's for a different potential episode, where we shit on the existence of AI.
For now, we'll focus on the DORA report.

151
00:11:07,782 --> 00:11:08,903
Certainly.

152
00:11:10,004 --> 00:11:13,467
So in the beginning we have the core results, which I appreciate.

153
00:11:13,467 --> 00:11:20,854
However, my pet peeve is that DORA seems to be inventing a new way to talk about the same
things every year.

154
00:11:20,854 --> 00:11:27,519
Well, I would love to see like a comparison of those core statistics, how they change over
time.

155
00:11:28,280 --> 00:11:30,202
And I mean, they have the four metrics.

157
00:11:30,520 --> 00:11:31,351
There's actually five now.

158
00:11:31,351 --> 00:11:34,844
So it's the mean time to resolution, the deployment frequency.

159
00:11:34,844 --> 00:11:36,885
Hang on, I believe they have four.

160
00:11:36,885 --> 00:11:39,957
So if there were five, they got rid of one.

161
00:11:40,377 --> 00:11:48,301
Yeah, so: lead time for changes distribution, deployment frequency, failed deployment
recovery time.

162
00:11:49,292 --> 00:11:50,602
You're right, there are five.

163
00:11:50,602 --> 00:11:52,604
Oh, I cannot count.

164
00:11:52,604 --> 00:11:55,916
Change failure rate distribution, and rework rate.

165
00:11:55,916 --> 00:12:03,525
They added the rework rate starting, I think, last year, to really encapsulate this aspect
of what your organization is doing.

166
00:12:03,525 --> 00:12:05,600
And I think it works as sort of a counter metric.

167
00:12:05,600 --> 00:12:06,310
That makes sense.

168
00:12:06,310 --> 00:12:09,183
I think it makes sense to add because I mean, why not?

169
00:12:09,183 --> 00:12:15,118
If you release a bunch of things and then you have to rework every single one
of them, then...

170
00:12:15,148 --> 00:12:19,314
Well, so I think there is still like, well, why are you reworking it though?

171
00:12:19,314 --> 00:12:20,716
Because was there a bug in production?

172
00:12:20,716 --> 00:12:26,433
Well, then there's already the mean time to resolution or the change failure rate, which
encapsulates that.

173
00:12:26,834 --> 00:12:29,518
Well, I think there actually is a different reason here that they never

174
00:12:29,518 --> 00:12:30,834
to learn.

175
00:12:30,921 --> 00:12:32,064
I'm sorry.

176
00:12:32,254 --> 00:12:40,418
So the thing is that this rework rate actually starts to include potentially non-technical
reasons why you may need to do work over again.

177
00:12:40,418 --> 00:12:46,801
ah was a mistake in the assumptions that you had while you were building your product or
what your customers wanted or your users were expecting.

178
00:12:46,801 --> 00:12:49,054
so the change failure, I know the change failure, right.

180
00:12:49,960 --> 00:12:56,879
The rework rate actually now includes that aspect, which I think is valuable to add, because
before, the DORA metrics were very technical in nature.

181
00:12:56,879 --> 00:13:02,666
They only applied to a single team or an organization that was engineering, and didn't
really encompass the whole business.

182
00:13:02,666 --> 00:13:06,074
And now we have the business included and so I think it's valuable to have this metric.

183
00:13:06,074 --> 00:13:07,215
Yeah, no, I agree.

184
00:13:07,215 --> 00:13:16,404
So, I mean, this section I found a little interesting, and as I said, they didn't
really show this data in that format last year.

185
00:13:16,404 --> 00:13:30,147
So I have no idea how it compares, but I would expect those metrics
to be a slightly different shape if AI were truly revolutionizing our software development

186
00:13:30,147 --> 00:13:31,032
process.

187
00:13:31,032 --> 00:13:33,214
I think the problem is that they're just not collecting it, right?

188
00:13:33,214 --> 00:13:38,850
They go out and they survey people about how they feel about their organization, rather
than measuring the actual DORA metrics themselves.

189
00:13:38,850 --> 00:13:47,682
So while we know which metrics are better, they do capture some of the
survey responses, which ask you,

190
00:13:47,682 --> 00:13:49,533
well, which of these categories do you fit in?

191
00:13:49,533 --> 00:13:51,774
Are you the highest performing?

192
00:13:51,774 --> 00:13:53,175
How frequently do you do deployments?

193
00:13:53,175 --> 00:13:54,846
And that could be accurate.

194
00:13:54,846 --> 00:14:00,379
There are areas where they're just not able to really get the answer because it is based
off of how people feel.

195
00:14:00,379 --> 00:14:02,550
And so I do agree with you, though.

196
00:14:02,550 --> 00:14:10,194
It would be really interesting to see those specific metrics, how they change over time
from year to year, and where that is going.

197
00:14:10,626 --> 00:14:17,270
So, um, the one weird thing... wait, should I actually say what the data is?

198
00:14:17,430 --> 00:14:17,820
Okay.

199
00:14:17,820 --> 00:14:23,294
Let me just read it out, because I obviously don't remember the numbers. Who
remembers numbers?

200
00:14:23,294 --> 00:14:33,940
So, the lead time for changes: basically very few companies out there, only about 2%,
have more than six months between when they commit

201
00:14:33,940 --> 00:14:37,772
code and when it's live in production.

202
00:14:38,030 --> 00:14:51,410
Then between one month and six months, that's 13%. The bulk of the responses were
between one week and one month, that's 28%, and 30% between one day and one week.

203
00:14:51,570 --> 00:14:57,100
Then less than a day is only 15%, less than one hour is 9%.

204
00:14:57,100 --> 00:15:01,678
Then my question is going to be, like, less than one hour from when to when?

205
00:15:01,678 --> 00:15:06,659
They specifically say from code committed to code successfully running in production.

206
00:15:06,659 --> 00:15:17,722
So here's the thing: if AI were really revolutionizing our work, I would see way more
frequent, or shorter, lead times for changes.

207
00:15:17,722 --> 00:15:23,024
I would also expect to see more frequent deployments and more rework, but that's not what
we see here.

208
00:15:24,384 --> 00:15:27,945
I mean, I would assume that LLMs would just supercharge that, right?

209
00:15:27,945 --> 00:15:30,758
There's more code churn, I would expect that.

210
00:15:30,758 --> 00:15:37,254
What everyone is seeing is that how it impacts the software development lifecycle isn't
about shortening the whole feedback loop.

211
00:15:37,254 --> 00:15:43,426
It's about shortening this first part where you're actually doing the code creation part.

212
00:15:43,426 --> 00:15:45,827
But that's the thing, what's the point, right?

213
00:15:45,827 --> 00:15:50,009
It's not like the code creation is ever the bottleneck.

214
00:15:50,009 --> 00:15:52,950
I've been working in software for over 20 years.

215
00:15:52,950 --> 00:16:00,133
Not a single time have I thought, oh, things would be very different if we could only
write our code faster.

216
00:16:00,133 --> 00:16:01,154
It would make everything better.

217
00:16:01,154 --> 00:16:06,466
We'd make so much more money if only our developers could write that code faster.

218
00:16:06,466 --> 00:16:08,157
I mean, that's just nonsense.

219
00:16:08,157 --> 00:16:10,458
Preach.

220
00:16:10,612 --> 00:16:14,643
Obviously, what matters is when it's live in production.

221
00:16:14,643 --> 00:16:25,276
And if you have your deployments automated, your tests automated, then I see no reason
why, you know, if you are able to generate that code faster, why doesn't that result

222
00:16:25,276 --> 00:16:26,946
in more deployments?

223
00:16:27,867 --> 00:16:29,027
I don't know.

224
00:16:29,387 --> 00:16:40,270
Maybe people aren't really using the LLMs the way it's advertised or the way they say they
do, because that doesn't really fall in line with the rest of the report.

225
00:16:40,270 --> 00:16:41,331
But here's something, right?

226
00:16:41,331 --> 00:16:50,455
If, as you pointed out with those statistics, which are sort of hard to process all at
once, there's such a small number of organizations or teams or engineers that are actually

227
00:16:50,455 --> 00:17:03,316
in the top tier, then a majority of those organizations are spending so much time on
the deployment itself that

228
00:17:03,316 --> 00:17:11,310
the organization may lend itself to having separation of a team that does development and
another team that's responsible for release engineering.

229
00:17:11,310 --> 00:17:17,431
So like DevOps engineers, am I going to antagonize your audience if I say that?

230
00:17:17,964 --> 00:17:20,195
Before you get to that, one second.

231
00:17:20,195 --> 00:17:29,208
If there are different groups, and you only focus on one of those groups, and that group
is, quote unquote, developers, and all they're doing is producing code, then they have no

232
00:17:29,208 --> 00:17:32,369
bottlenecks that are related to deployment and testing.

233
00:17:32,369 --> 00:17:41,271
And so it's very easy for that whole organization to say, yes, we are successful in using
AI to do deployment, quote unquote, because that's what they see that's happening.

234
00:17:41,271 --> 00:17:43,742
They only need to generate their code, and they're doing that.

235
00:17:44,290 --> 00:17:44,810
I don't know.

236
00:17:44,810 --> 00:17:49,423
Like, here's the thing: there are other parts of this that sort of make me think.

237
00:17:49,423 --> 00:17:54,516
Like, first of all, there seems to be a shift between like lead time for changes and
deployment frequency.

238
00:17:54,516 --> 00:18:00,679
In mature organizations, I would imagine them to be in lockstep, right?

239
00:18:00,820 --> 00:18:10,366
So between one month and six months, when it comes to lead time, only 13% of teams seem
to have that lead time.

240
00:18:10,366 --> 00:18:11,406
Whereas,

241
00:18:11,968 --> 00:18:16,600
a whopping 20% say that this is their deployment frequency.

242
00:18:18,361 --> 00:18:26,875
More people. It seems that there is this delay, or, you know... basically, those two stats
don't align.

243
00:18:26,875 --> 00:18:37,330
I would expect them to be perfectly aligned, because, like, if you have less
frequent deployments, then you can't say that your lead time is

244
00:18:37,330 --> 00:18:38,250
shorter.

245
00:18:38,410 --> 00:18:43,677
But what we see is actually the lead time seems to be longer than the deployment
frequency.

246
00:18:44,238 --> 00:18:47,402
So... what?

247
00:18:47,402 --> 00:18:53,320
Maybe it's just really self-reported and people don't really know, so they just use their
gut feel to...

248
00:18:53,320 --> 00:19:03,817
Well, I think this maybe points to the fact that there are a lot of organizations out there
with processes that are backwards in unexpected ways, like ones where you

249
00:19:03,817 --> 00:19:11,906
maybe have some sort of scrum or sprint planning, where you're doing the planning, and it
feels like there's a very short time from when you learn about a feature, to

250
00:19:11,906 --> 00:19:15,599
doing the development, to the time at which you get it released.

251
00:19:15,599 --> 00:19:22,520
The lead time as it's defined here... I would love it to be from when the feature
enters the backlog until it's live in production.

252
00:19:22,520 --> 00:19:22,851
No, no, no.

253
00:19:22,851 --> 00:19:24,654
It's from code commit.

254
00:19:24,654 --> 00:19:29,742
So when you commit that code until it is live in production successfully.

255
00:19:29,742 --> 00:19:39,330
So what you're saying is that there are organizations which have a long lead time that
maybe do have a lot of steps in their process, but still deploy frequently.

256
00:19:39,330 --> 00:19:42,883
Here's the thing: 15% say their lead time is less than one day.

257
00:19:42,883 --> 00:19:47,978
How many people say that their deployments are between once per hour and once per day?

258
00:19:49,030 --> 00:19:49,860
6%.

259
00:19:49,860 --> 00:19:51,081
How does that make sense?

260
00:19:51,081 --> 00:19:52,402
Feature flags?

261
00:19:52,443 --> 00:19:55,615
No, I don't think this would capture feature flags, right?

262
00:19:55,615 --> 00:19:57,107
People lie about feature flags.

263
00:19:57,107 --> 00:19:59,179
People say, it's behind a feature flag.

264
00:19:59,179 --> 00:20:01,571
Is it live in production successfully? Do we know?

265
00:20:01,571 --> 00:20:02,112
No.

266
00:20:02,112 --> 00:20:03,954
Is any customer actually using them?

267
00:20:03,954 --> 00:20:04,648
No.

268
00:20:04,648 --> 00:20:05,819
I think you're onto something there.

269
00:20:05,819 --> 00:20:10,004
And I think this has been my personal issue with feature flags, which is that...

270
00:20:11,927 --> 00:20:21,427
Well, I think in theory they work great: different customers, different users are
exposed to different functionality separately, when they should be, or through testing of

273
00:20:22,266 --> 00:20:24,370
whether or not that feature is actually usable.

274
00:20:24,370 --> 00:20:33,966
But I think what ends up happening is those feature flags, that technology, really are
being utilized to gate turning on untested code in production, because...

275
00:20:33,966 --> 00:20:34,976
What do you mean, untested?

276
00:20:34,976 --> 00:20:42,077
We have a whole QA team testing this stuff in their staging environment all the time.

277
00:20:43,318 --> 00:20:46,652
I mean, assuming that's happening, then you could say that it is tested.

278
00:20:46,652 --> 00:20:56,953
But I know that as soon as you have a separate organization or a separate tool that
allows you to get to the next stage of the software development lifecycle

279
00:20:56,953 --> 00:20:57,953
more easily,

280
00:20:58,306 --> 00:21:05,651
putting it behind a flag and then getting it to production, people will utilize that, and
then use it as a crutch rather than validating that their code works 100% of the time, as

281
00:21:05,651 --> 00:21:06,712
much as they can.

282
00:21:06,712 --> 00:21:13,506
They say, well, you know, I feel confident and comfortable with where it's at right now
and then push it out behind the flag.

283
00:21:13,506 --> 00:21:15,227
And then when you turn the flag on, it breaks.

284
00:21:15,227 --> 00:21:20,541
And the most critical problem I see with this is that it's not reflected well in the...

286
00:21:22,434 --> 00:21:24,656
the lead time for delivering features.

287
00:21:24,656 --> 00:21:28,280
People often say, if it's behind a feature flag, it's in production, that counts.

288
00:21:28,280 --> 00:21:32,724
Whereas unless you're also counting, well, it shouldn't, right?

289
00:21:32,724 --> 00:21:35,267
But it's very difficult to also include a metric.

290
00:21:35,267 --> 00:21:42,452
I feel like it would be interesting in a 2026 report to see what the lead time is for
removing a flag.

291
00:21:42,452 --> 00:21:46,444
Will they still have the same metrics in the 2026 report?

292
00:21:47,444 --> 00:21:54,247
I really feel like they're trying to rebrand away from the DevOps thing, because, you know,
DORA started as the DevOps report.

293
00:21:54,247 --> 00:21:56,228
Now they are AI reports.

294
00:21:56,228 --> 00:21:59,199
So there, I think we're looking at a full rebranding.

295
00:21:59,199 --> 00:22:00,830
So we'll see.

296
00:22:00,830 --> 00:22:08,193
Now, what is interesting is that there seem to be a lot of teams that are still running
complex legacy software.

297
00:22:08,193 --> 00:22:10,574
At least, that's what I'm reading from this, because...

298
00:22:10,574 --> 00:22:13,214
We said what, 28%?

299
00:22:13,214 --> 00:22:16,194
Takes over a day to deploy?

300
00:22:16,270 --> 00:22:17,801
Yeah, I mean, that's a long time.

301
00:22:17,801 --> 00:22:21,872
Like you do the commit, but what if you include the pull request reviews?

302
00:22:22,213 --> 00:22:25,774
I see. Sometimes I think, well, you do a commit.

303
00:22:25,774 --> 00:22:32,147
There is a process there, and I don't necessarily see going as fast as possible as
super valuable.

304
00:22:32,147 --> 00:22:38,230
I think once everyone's agreed that this code is the right code, that moment to getting
it out is important.

305
00:22:38,230 --> 00:22:46,274
I think the moment, for your business or your team, once you've decided to do a feature, to
getting the software development, you know, complete

306
00:22:46,274 --> 00:22:48,806
and ready for review, that's also valuable.

307
00:22:48,806 --> 00:22:52,149
And then also the metric of how long does it take to test stuff.

308
00:22:52,149 --> 00:22:59,575
Although here's the thing, though: a longer time to test could mean that you are testing
slowly, or you're testing more.

309
00:23:00,677 --> 00:23:06,548
And I think these sort of nuances don't really make their way into this part of the
report.

310
00:23:06,548 --> 00:23:08,029
That's a good point, that's a good point.

311
00:23:08,029 --> 00:23:20,039
I mean, but this was really the one section that I thought was the most interesting in the
whole 140-page report, which says something about me or the report, I don't know.

312
00:23:20,039 --> 00:23:25,718
And then there's the whole, after this, there's this whole section about the...

313
00:23:25,718 --> 00:23:30,304
They call it AI, I'm going to call it LLMs, which some of it is interesting.

314
00:23:30,304 --> 00:23:39,415
I mean, what we can see is that the adoption appears to be universal, because 90% of
people say they use AI in some capacity.

315
00:23:39,415 --> 00:23:40,827
Now, what does that mean?

316
00:23:40,827 --> 00:23:45,514
We don't know because the question is a little bit vague, maybe on purpose.

317
00:23:45,514 --> 00:23:55,454
It's interesting, because I think by the same token, when they asked that, they also asked
where people were utilizing it, and something like 66 or 68% were saying they use it

318
00:23:55,454 --> 00:23:58,864
for like image content generation or summarization.

319
00:23:58,864 --> 00:24:01,837
The memes are not going to make themselves.

320
00:24:02,247 --> 00:24:05,822
Do you think engineers included meme generation?

321
00:24:06,127 --> 00:24:06,928
So.

322
00:24:07,406 --> 00:24:10,660
If I get a survey, what do you use AI for?

323
00:24:10,660 --> 00:24:19,388
Image processing, of course. Like, you want to make those silly images, either for your
PowerPoint presentations or just, I don't know, what do people do with that?

324
00:24:19,388 --> 00:24:20,429
I'm definitely pessimistic here.

325
00:24:20,429 --> 00:24:30,767
I think the improvement that people feel with using AI in the last couple of years has
come from finding places where AI should not be used and eliminating it from there.

326
00:24:30,767 --> 00:24:39,916
So, you know, by definition, you're left with fewer places, which means proportionally the
value you're getting out increases, even if there's a net detriment.

327
00:24:39,916 --> 00:24:46,091
I mean, I do see... it really depends on what you're trying to achieve.

328
00:24:46,091 --> 00:24:51,316
I find that those tools, the LLMs, are good at sort of raising the floor.

329
00:24:51,316 --> 00:24:56,840
So they get you an average, or slightly below average, result really fast.

330
00:24:56,840 --> 00:25:00,703
So you don't have to even know anything about the domain or about the area.

331
00:25:00,703 --> 00:25:06,208
So you put in no effort and you get the average instantly.

332
00:25:06,208 --> 00:25:07,259
That's fantastic.

333
00:25:07,259 --> 00:25:08,079
That's really awesome.

334
00:25:08,079 --> 00:25:09,430
And it's great.

335
00:25:09,506 --> 00:25:17,099
However, if you need something above average or something where the outcomes really
matter, that's where you start seeing the shortcomings.

336
00:25:17,314 --> 00:25:27,325
Well, then I could take that to its natural conclusion, though, and automatically suggest that
you can never use an LLM anywhere where you want it to be your competitive advantage.

337
00:25:27,872 --> 00:25:28,322
Agreed.

338
00:25:28,322 --> 00:25:39,726
But you know, you can use it to summarize your boss's lengthy emails, or, you know, when
you write performance reviews for your coworkers, you can just ask an LLM to completely

339
00:25:39,726 --> 00:25:47,289
make everything up and make it sound plausible and get that guy who you don't like fired.

340
00:25:47,489 --> 00:25:49,230
I'm not offering any advice, but...

342
00:25:50,414 --> 00:25:50,875
That is a point.

343
00:25:50,875 --> 00:25:59,125
So basically you're saying if it's not a cornerstone aspect of your job, there's a lot of
opportunity for the uses of LLMs in a way that can actually help you be effective.

344
00:25:59,125 --> 00:26:07,205
And maybe that's what we're actually seeing in the report, what they're jumping on: that
there are places where it's valuable, but it's not the cornerstone, critical aspects that

345
00:26:07,205 --> 00:26:09,026
we're necessarily being hired for.

346
00:26:09,026 --> 00:26:17,726
You see, the interesting part is when they break it down into where or how people are
using the LLMs, like what for, basically.

347
00:26:17,726 --> 00:26:19,346
Oh, there's another thing.

348
00:26:19,346 --> 00:26:22,146
How long, how many hours per day people use it.

349
00:26:22,146 --> 00:26:27,986
The mean time seems to be two hours per workday, which is a long time interacting with AI.

350
00:26:27,986 --> 00:26:35,830
But that may include if your IDE has like automated suggestions by LLMs, that may be
included.

351
00:26:35,830 --> 00:26:36,650
I think it's difficult.

352
00:26:36,650 --> 00:26:46,410
Like, if you are using a Copilot or one of the LLM IDEs, then every time you type
something, you immediately get a code-complete suggestion.

353
00:26:48,010 --> 00:26:54,850
Well, yeah, but then how do you evaluate that versus, like, the number of times
you're using it, versus how long that took?

354
00:26:54,850 --> 00:26:55,190
Right.

355
00:26:55,190 --> 00:27:00,470
If you count the tab time, like it took me, you know, 0.1 seconds to hit the tab key.

356
00:27:00,470 --> 00:27:03,454
Did I use the LLM for 0.1 seconds, or...?

357
00:27:03,454 --> 00:27:04,955
Or the thinking time.

358
00:27:04,955 --> 00:27:07,376
I'm curious, like, okay, I would like to see those questions.

359
00:27:07,376 --> 00:27:11,779
When I filled out the survey, I don't remember the AI section.

360
00:27:11,779 --> 00:27:16,682
Maybe because I got it and then I didn't have a lot to say, so there were a lot of N/As.

361
00:27:16,682 --> 00:27:17,622
I don't know.

362
00:27:17,662 --> 00:27:21,385
But yeah, so what do people use the LLMs for?

363
00:27:21,385 --> 00:27:25,887
Like it seems like obviously there's a significant part of writing new code.

364
00:27:25,887 --> 00:27:31,118
I think that 70% of people use LLMs to generate new code, which is fair.

365
00:27:31,118 --> 00:27:34,958
Literature reviews, so summarize this wall of text for me.

366
00:27:35,198 --> 00:27:37,598
Images, 66%, as you said.

367
00:27:37,598 --> 00:27:38,858
Then there's a lot in the 60s.

368
00:27:38,858 --> 00:27:41,358
I think it was a multiple-choice question.

369
00:27:41,818 --> 00:27:46,938
Proofreading, writing documentation, creating test cases.

370
00:27:47,214 --> 00:27:54,990
Well, I think there's something interesting here, which is I think what we're getting at
is for a report that's supposed to be focused on the engineering metrics and how they're

371
00:27:54,990 --> 00:27:58,643
improving over time with the impact of AI, it doesn't do a great job doing that.

372
00:27:58,643 --> 00:28:06,926
And on the flip side, the interesting things that we could be talking about related to AI
feels like more of an AI-specific report, and it doesn't include any of those.

373
00:28:06,926 --> 00:28:07,706
I agree.

374
00:28:07,706 --> 00:28:10,985
I mean, that's why, you know, I was disappointed with this report.

375
00:28:10,985 --> 00:28:15,606
I felt like totally unsatisfied at the end of it.

376
00:28:15,606 --> 00:28:22,246
I felt like there was a lot of narrative, a lot of sort of spin on how we should
think about the data.

377
00:28:22,506 --> 00:28:28,406
Uh, and the conclusions that they came to are not the same conclusions I came to just
looking at the same data.

378
00:28:28,406 --> 00:28:29,378
So that's interesting.

379
00:28:29,378 --> 00:28:39,647
When they actually decide to share, I think the number one thing that comes out
is that people's perceptions are, specifically, that LLMs or AI are fantastic in every possible

380
00:28:39,647 --> 00:28:40,228
way.

381
00:28:40,228 --> 00:28:48,946
But if you actually look at the measurements of what they call the product or software
instability, which is really the measured quality of your product or your tool, your

382
00:28:48,946 --> 00:28:51,212
architecture: that goes down.

383
00:28:51,212 --> 00:28:57,186
Yeah, so like basically individual productivity supposedly goes up according to the
results.

384
00:28:58,687 --> 00:29:03,550
Team productivity goes up a little bit, but not that much.

385
00:29:04,211 --> 00:29:06,633
And yeah, instability increases.

386
00:29:06,633 --> 00:29:13,337
So we have stuff that's less stable, and the product success, or... how was that framed?

387
00:29:13,578 --> 00:29:15,749
These charts, I struggle to understand.

388
00:29:15,749 --> 00:29:19,872
Sometimes I feel like they decide to present data in a confusing way.

389
00:29:19,886 --> 00:29:25,432
Can we quickly just finish the part about how people are using AI? Because there are some
interesting things that I wanted to talk about.

390
00:29:25,432 --> 00:29:28,346
Like, uh, people use it for code reviews.

391
00:29:28,346 --> 00:29:31,338
56% of people say they use AI for code reviews.

392
00:29:31,338 --> 00:29:35,952
And now I have to wonder, how do people define code review these days?

393
00:29:35,952 --> 00:29:39,926
Like if I use a linter, did I just do a code review?

394
00:29:40,450 --> 00:29:41,251
That's a good point.

395
00:29:41,251 --> 00:29:48,901
I think what it means is, like, is there any tool being used in your software development
pipeline, your delivery pipeline, that does something in an automated way?

396
00:29:48,901 --> 00:29:52,225
And I think a lot of those tools now claim that they include some sort of AI.

397
00:29:52,225 --> 00:29:58,424
So if you use a Semgrep or a linter or Dependabot, those all have LLM in their name.

398
00:29:58,424 --> 00:30:02,449
Who knew, I've been using AI in my code reviews for many years now.

399
00:30:02,449 --> 00:30:05,012
Linters are AI, right?

400
00:30:05,012 --> 00:30:07,125
They are artificial, right?

401
00:30:07,125 --> 00:30:08,125
Right?

402
00:30:08,246 --> 00:30:10,188
They seem like they're intelligent, right?

403
00:30:10,188 --> 00:30:11,259
They know what to do.

404
00:30:11,259 --> 00:30:14,443
They know how many, you know, tabs or spaces.

405
00:30:14,443 --> 00:30:15,438
They know.

406
00:30:15,438 --> 00:30:23,648
Well, I think it's sort of like the thing where Microsoft says this number of
organizations have multifactor authentication or are using passkeys.

407
00:30:23,648 --> 00:30:31,534
They force it out on people and organizations may not even be getting the benefit of that
security because they may not be able to use it effectively.

408
00:30:31,534 --> 00:30:35,630
So you think a lot of it is basically just enabled by default, so they have to say yes.

409
00:30:35,630 --> 00:30:39,706
It's like if you're using GitLab or GitHub and you're getting automated scanning tools in
place.

410
00:30:39,706 --> 00:30:41,097
Yeah, I think.

411
00:30:41,550 --> 00:30:47,503
I actually genuinely think 56% of software engineers actively use AI for code reviews.

412
00:30:47,503 --> 00:30:50,505
I mean, if you think about it, code reviews tend to suck.

413
00:30:50,505 --> 00:30:52,056
At least that's what I hear.

414
00:30:52,716 --> 00:30:54,277
My code reviews are fantastic.

415
00:30:54,277 --> 00:30:57,799
Every time I was part of a code review, I had a great time.

416
00:30:57,799 --> 00:31:03,232
So I can't imagine why people would say code reviews suck, but that's the story that I
hear again and again.

417
00:31:03,232 --> 00:31:04,096
People hate it.

418
00:31:04,096 --> 00:31:05,824
People hate doing code reviews.

419
00:31:05,824 --> 00:31:10,348
They also don't like when their code is reviewed because people are nitpicking.

420
00:31:10,348 --> 00:31:17,026
So I can totally see people just opting for an LLM chat: hey, tell me what's wrong
with my code, or tell me how great it is.

421
00:31:17,026 --> 00:31:24,456
Do you think that during the software development process, whoever is the engineer that
actually developed the code, they're directly asking an LLM for feedback and they're

422
00:31:24,456 --> 00:31:26,371
counting that as a...

423
00:31:26,371 --> 00:31:30,934
I would not be surprised if that is a significant percentage of people.

424
00:31:30,934 --> 00:31:31,715
That's interesting.

425
00:31:31,715 --> 00:31:34,118
I think 56 would be very high for that.

426
00:31:34,118 --> 00:31:40,894
I also think 56 is very high if, as a reviewer, I saw the code and then passed it to an LLM
and specifically asked, hey, what's wrong with this code?

427
00:31:40,894 --> 00:31:42,735
I don't know how to do a code review.

428
00:31:42,735 --> 00:31:45,416
Please, chat, help me.

429
00:31:46,777 --> 00:31:47,677
Is that how people do it?

430
00:31:47,677 --> 00:31:48,137
Maybe.

431
00:31:48,137 --> 00:31:48,878
I don't know.

432
00:31:48,878 --> 00:31:58,572
So another interesting thing is, like, 59% of people apparently use it for debugging, which...
I always thought LLMs would be bad at debugging.

433
00:31:58,572 --> 00:32:00,600
I'm not even sure what that means, honestly.

434
00:32:00,600 --> 00:32:03,126
Well, you have a problem, you don't know what's wrong, so...

435
00:32:03,126 --> 00:32:04,357
I mean, I know what debugging is.

436
00:32:04,357 --> 00:32:06,986
What I mean is, I don't understand how you would.

437
00:32:09,111 --> 00:32:16,435
Okay, I don't spend that much time doing software engineering anymore, but when I did,
debugging was always my favorite thing to do.

438
00:32:16,456 --> 00:32:20,559
I struggled to see good opportunities to pull an LLM in there.

439
00:32:20,559 --> 00:32:24,021
You do notice that, because you just, what, run the code and that's it.

440
00:32:24,021 --> 00:32:25,782
You already know there's a problem.

441
00:32:25,843 --> 00:32:30,501
The LLM could be used for resolving the issue, but as far as finding it,

442
00:32:30,501 --> 00:32:41,731
if you have all those dependencies, you know, on other parts of the code, those libraries
that could be at fault, or, you know, how do you know what's actually going on if you

443
00:32:41,731 --> 00:32:49,098
haven't really seen this code before because it was written by that dude who's retired now
and, you know, no one has touched it.

444
00:32:49,098 --> 00:32:51,790
It even has a comment, do not touch, do not change.

445
00:32:51,790 --> 00:32:52,680
that's the

446
00:32:53,694 --> 00:32:55,476
You know what I'm talking about.

447
00:32:55,476 --> 00:32:56,758
That's still better, I think.

448
00:32:56,758 --> 00:33:03,585
I mean, the thing better than having a comment that says do not change is a unit test that
ensures that that thing has not changed, with the reason why.

449
00:33:03,585 --> 00:33:04,570
You know, the one step...

450
00:33:04,570 --> 00:33:06,212
Just a basic "it returned, pass."

451
00:33:06,212 --> 00:33:08,476
Yeah, I've seen those tests.

452
00:33:08,476 --> 00:33:09,036
you know what?

453
00:33:09,036 --> 00:33:11,780
A lot of those tests actually have been written by LLMs.

454
00:33:11,780 --> 00:33:13,102
I have seen that too.

455
00:33:13,102 --> 00:33:15,394
It just writes a lot of unit tests.

456
00:33:15,394 --> 00:33:17,127
You have a hundred percent coverage.

457
00:33:17,127 --> 00:33:19,730
Most of them return pass.

458
00:33:20,151 --> 00:33:22,685
That's how you get great results, great productivity.

459
00:33:22,685 --> 00:33:25,804
And you're assured that all your code is actually, in fact,

460
00:33:25,866 --> 00:33:27,637
high code quality.

461
00:33:28,419 --> 00:33:29,244
Yeah.

462
00:33:29,610 --> 00:33:39,548
Anyway, I mean, another interesting part was, when people say which mode of interaction
with LLMs they use: a lot of people obviously use

463
00:33:39,548 --> 00:33:40,138
chatbots.

464
00:33:40,138 --> 00:33:42,730
Some people use their IDE.

465
00:33:42,730 --> 00:33:54,950
Uh, I expected a lot of people to rely on those agents, because that's, you know, the huge
breakthrough that was advertised, but apparently 61% say they never use AI in agentic

466
00:33:54,950 --> 00:33:55,490
mode.

467
00:33:55,490 --> 00:34:00,615
Well, I think that's probably the result of it being a fairly new and recent
identification.

468
00:34:00,615 --> 00:34:01,065
It's not that new!

469
00:34:01,065 --> 00:34:03,306
Well, I think the term is new.

470
00:34:03,306 --> 00:34:07,403
And so they may not have realized that what they were doing could have been categorized that way.

471
00:34:07,403 --> 00:34:09,556
They would be realizing what they're doing.

472
00:34:09,556 --> 00:34:12,280
They would know very well exactly what they're doing.

473
00:34:12,280 --> 00:34:19,661
I'm just saying, if I said, you know, have you been using agentic mode, and I didn't first
explain to you what I meant by that in the question or the survey...

474
00:34:19,661 --> 00:34:22,983
I think they would know, because all of those modes are actually...

475
00:34:22,983 --> 00:34:25,664
you have to pay extra to get that.

476
00:34:25,664 --> 00:34:27,105
That's what the feature is called.

477
00:34:27,105 --> 00:34:34,496
The companies offering the tools will let you know that it is their latest invention and
everything will change from now on.

478
00:34:34,496 --> 00:34:36,337
Yeah, but when you put it like that, it's really simple.

479
00:34:36,337 --> 00:34:48,092
Like the IDEs, for instance: I think it was only recently that GitHub, in their Copilot
in VS Code, supported a slash command for agents, for controlling agents, for configuring

480
00:34:48,092 --> 00:34:51,424
agents, for specific LLMs to go off and asynchronously build stuff.

481
00:34:51,424 --> 00:34:53,735
So that's actually also really recent.

482
00:34:53,735 --> 00:34:59,978
So the terminology and the access to the functionality from a provider, I think, would be the
two

483
00:35:00,556 --> 00:35:01,687
main limiting things there.

484
00:35:01,687 --> 00:35:03,589
But yeah, I agree.

485
00:35:03,589 --> 00:35:11,427
I think, fundamentally, it's like someone finally figured out a way to make an LLM
usable for real, which is like: I don't want an answer right now.

486
00:35:11,427 --> 00:35:14,920
I want you to go off and think about the problem and come back with a solution.

487
00:35:14,920 --> 00:35:20,905
But the way that I've seen it from Microsoft... and we went into this a lot in the VS Code

488
00:35:21,546 --> 00:35:24,991
episode of Adventures in DevOps, so go and check that out.

489
00:35:24,991 --> 00:35:30,804
But realistically, it's like, you can go and have it solve all your tickets in your backlog
and come up with...

490
00:35:30,804 --> 00:35:32,723
And then all your tests are passing.

491
00:35:32,723 --> 00:35:35,412
However, the production database is missing.

492
00:35:36,918 --> 00:35:40,249
So I think at this point there are just, like, so many examples of that.

493
00:35:40,249 --> 00:35:45,282
The most recent one is, I think, Google Antigravity just failed on that approach.

494
00:35:45,282 --> 00:35:47,698
I was making- What happened?

495
00:35:47,698 --> 00:35:55,366
I didn't really look into it, but I think fundamentally, I think someone was on Windows
and it accidentally ran a command to just delete one of their entire hard drives.

496
00:35:55,449 --> 00:35:57,983
I thought those things only happened on Linux.

497
00:35:58,683 --> 00:36:02,407
I think there's something on Linux called permissions, which usually prevents this.

498
00:36:02,407 --> 00:36:07,586
Yeah, but then you just basically do sudo, you know, isn't that the joke?

499
00:36:08,000 --> 00:36:15,082
I have one-touch sudo access on my laptop and desktop. I actually have to physically
press, you know, my...

500
00:36:15,082 --> 00:36:20,264
You know, if you want access to his computer, you actually have to be physically touching
that key.

501
00:36:20,584 --> 00:36:24,926
Anyway, I mean, this whole agentic thing, it reminds me of...

502
00:36:24,926 --> 00:36:26,207
So I have a friend.

503
00:36:26,207 --> 00:36:27,977
Yes, I do have a friend.

504
00:36:27,977 --> 00:36:28,768
I have friends.

505
00:36:28,768 --> 00:36:30,847
True story.

506
00:36:30,847 --> 00:36:31,389
Anyway...

507
00:36:31,389 --> 00:36:36,451
Right.

508
00:36:36,451 --> 00:36:41,543
So we met, like, a bunch of months ago, when the agents were just, like, the new thing.

509
00:36:41,543 --> 00:36:42,873
I don't know how long.

510
00:36:42,873 --> 00:36:44,994
I think it was just a bunch of months.

511
00:36:45,186 --> 00:36:47,107
And he obviously tried them.

512
00:36:47,107 --> 00:36:53,190
He's a very seasoned senior software developer, a very, very good one, like staff-plus.

513
00:36:53,231 --> 00:36:58,233
So he's like, oh yeah, I will have my army of little minions to do my bidding.

514
00:36:58,233 --> 00:37:01,560
And so he tried using those agents and he was so excited.

515
00:37:01,560 --> 00:37:02,731
He's like, this is great.

516
00:37:02,731 --> 00:37:03,557
It's fantastic.

517
00:37:03,557 --> 00:37:05,498
It's actually really decent code.

518
00:37:05,498 --> 00:37:08,099
Like, the reasoning makes sense.

519
00:37:08,099 --> 00:37:10,280
So I was like, okay, nice.

520
00:37:10,541 --> 00:37:13,612
And then not so long ago we met again.

521
00:37:13,878 --> 00:37:16,730
And I asked him like, how is your, you know, how are your agents doing?

522
00:37:16,730 --> 00:37:19,542
And he's like, no, I'm not doing that anymore.

523
00:37:20,043 --> 00:37:20,644
Why?

524
00:37:20,644 --> 00:37:22,525
Well, it's just not that good.

525
00:37:22,525 --> 00:37:24,407
I'm like, hmm, what's changed?

526
00:37:24,407 --> 00:37:32,234
It's just, you know, it's good when you don't have, like, a clear vision of what you want, or
clear, precise requirements.

527
00:37:32,234 --> 00:37:42,626
But as soon as you have a concrete thing that needs doing, then this just falls apart
because there will be all those little discrepancies that you then have to go and

528
00:37:42,626 --> 00:37:48,213
do yourself pretty much, and at that point it's faster to just do it yourself from the
beginning.

529
00:37:48,655 --> 00:37:50,736
So maybe that's why, I don't know.

530
00:37:51,456 --> 00:37:53,892
Okay, so what else was in there?

531
00:37:53,892 --> 00:37:55,879
ah

532
00:37:55,879 --> 00:37:57,240
What was in there?

533
00:37:57,240 --> 00:38:02,244
There was this whole effectiveness, like does AI make us faster?

534
00:38:02,244 --> 00:38:13,433
um that was the whole, my conclusion from that section was uh the data was not very clear.

535
00:38:14,655 --> 00:38:18,157
It's either too subjective or people don't really know what they're talking about.

536
00:38:18,530 --> 00:38:20,991
The Dora team really tried to get something out of that.

537
00:38:20,991 --> 00:38:24,032
I mean, obviously if you're going to publish a report, you have to write something.

538
00:38:24,032 --> 00:38:28,274
So there was a lot of narrative, but not that much meat in that whole thing.

539
00:38:28,674 --> 00:38:43,450
Their conclusion was that em at a lot of companies, their process gets in the way of truly
embracing the AI or of truly realizing the benefits because they do see a lot of increase

540
00:38:43,450 --> 00:38:48,526
of individual productivity, what you were saying, but that doesn't translate to the
overall.

541
00:38:48,526 --> 00:38:51,547
uh output and they blame the process.

542
00:38:51,547 --> 00:38:55,908
They say, okay, we need to learn to work differently to make use of that.

543
00:38:55,908 --> 00:38:58,829
Now I get a slightly different picture.

544
00:38:59,109 --> 00:39:11,973
I do see like making individual engineers more productive will not mean that we'll get
better code or better products because fundamentally I'm not hiring engineers to write

545
00:39:11,973 --> 00:39:12,263
code.

546
00:39:12,263 --> 00:39:16,614
I'm hiring engineers to solve problems and for that they need to collaborate.

547
00:39:16,614 --> 00:39:17,356
So.

548
00:39:17,356 --> 00:39:20,951
Does uh LLM or AI help us with collaboration?

549
00:39:20,951 --> 00:39:22,060
I think the answer is no.

550
00:39:22,060 --> 00:39:27,628
I think there was actually a conflicting perspective here that we just realized.

551
00:39:27,628 --> 00:39:28,206
uh

552
00:39:28,206 --> 00:39:32,446
Solving a problem that's not your bottleneck is actually worse than doing nothing.

553
00:39:32,446 --> 00:39:37,466
If you have a bottleneck in your system, your software development life cycle, and let's
assume that's the pull request.

554
00:39:37,466 --> 00:39:46,046
And from my own personal experience, and I'm sure this is different for others, for one
hour of software development, it was like two to eight times that amount in the testing and code

555
00:39:46,046 --> 00:39:47,466
review process, which makes sense.

556
00:39:47,466 --> 00:39:51,846
You have to teach someone else about the process, understand what's there, then actually
review it for real.

557
00:39:51,846 --> 00:39:55,286
And then some back and forth on how to actually refine it before getting released.

558
00:39:55,286 --> 00:39:57,146
So I can see a discrepancy there.

559
00:39:58,160 --> 00:40:00,925
If you don't improve the bottleneck, which is the pull request review,

560
00:40:00,925 --> 00:40:02,949
and you only produce more code,

561
00:40:02,949 --> 00:40:07,610
You actually create a bigger burden on other parts of the process where the bottleneck is.
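
To make that concrete, here is a minimal sketch with made-up numbers, assuming one hour of development feeds about four hours of testing and review, in line with the ratio mentioned above:

# Illustrative numbers only: 1h dev per change, ~4h of testing and review,
# per the 1:2-to-8 ratio above.
HOURS_PER_DAY = 8.0
DEV_HOURS_PER_CHANGE = 1.0
REVIEW_HOURS_PER_CHANGE = 4.0

def daily_throughput(dev_speedup: float) -> float:
    """Changes shipped per day, limited by the slowest stage in the pipeline."""
    dev_capacity = HOURS_PER_DAY / (DEV_HOURS_PER_CHANGE / dev_speedup)
    review_capacity = HOURS_PER_DAY / REVIEW_HOURS_PER_CHANGE
    return min(dev_capacity, review_capacity)

print(daily_throughput(1.0))  # 2.0 changes/day: review is already the bottleneck
print(daily_throughput(2.0))  # still 2.0: faster coding only grows the review queue

The min() is the whole point: output is set by the slowest stage, so speeding up development alone just piles more work onto the review queue.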

562
00:40:07,610 --> 00:40:11,110
Then you just use an LLM to do the code review, right?

563
00:40:11,110 --> 00:40:13,491
56 % of people do that.

564
00:40:13,491 --> 00:40:15,080
That you're solving the whole code review, though,

565
00:40:15,080 --> 00:40:17,555
I think, is sort of the lie that's included.

566
00:40:17,555 --> 00:40:20,122
Something must be not quite right.

567
00:40:20,122 --> 00:40:27,188
I mean, there's this other thing, the LLMs have been around for what, three years, four
years, five?

568
00:40:32,130 --> 00:40:33,351
Three years, say.

569
00:40:33,351 --> 00:40:39,553
So let's say two years since it was really available in a truly usable way to developers.

570
00:40:39,553 --> 00:40:44,015
um And, you know, this whole vibe coding thing has existed for quite a while now.

571
00:40:44,015 --> 00:40:52,918
um And so I would imagine if this worked, I would imagine like a deluge of new software
appearing everywhere.

572
00:40:52,918 --> 00:40:57,390
We would see a lot of new little startups, like indie hacker projects.

573
00:40:57,390 --> 00:41:02,292
We would see a lot of, even if you look at your phone in the app store, we would see like

574
00:41:02,700 --> 00:41:07,044
hundreds and hundreds of Tetris clones or, you know, Candy Crush clones.

575
00:41:07,044 --> 00:41:08,735
Like where is that?

576
00:41:09,476 --> 00:41:17,504
Even when you look at the domains that are registered, you would see a lot of those
because people would want to at least register the domains for their new vibe-coded project.

577
00:41:17,504 --> 00:41:18,664
That's not happening.

578
00:41:18,664 --> 00:41:24,782
So we don't actually see more software being produced, which is like, why not?

579
00:41:24,782 --> 00:41:32,745
Well, I think you sort of alluded to this before and I read your blog article on the topic
about raising the floor up.

580
00:41:32,745 --> 00:41:41,599
And so I think that there is a lot more software distinctly being created by people who
haven't had the capabilities of doing it before.

581
00:41:41,599 --> 00:41:44,730
So before, they would never have been able to build their own Tetris clone.

582
00:41:44,730 --> 00:41:48,311
And now they absolutely can, but they don't go and release that software anywhere.

583
00:41:48,311 --> 00:41:54,538
So that deluge of software being created isn't uh from companies that are actually trying
to make a

584
00:41:54,538 --> 00:41:55,299
But why not?

585
00:41:55,299 --> 00:41:58,991
I mean, we actually live in the middle of a hustle culture.

586
00:41:58,991 --> 00:42:04,624
It's actually really cool to have your own company and say, I'm an entrepreneur.

587
00:42:04,624 --> 00:42:10,087
So those people who created Tetris clones for their own use, they're happy with it.

588
00:42:10,087 --> 00:42:13,329
They would absolutely try to monetize.

589
00:42:13,349 --> 00:42:14,129
We would see that.

590
00:42:14,129 --> 00:42:18,051
We would see at least a blip and we don't even see that.

591
00:42:18,252 --> 00:42:20,393
So I don't know.

592
00:42:20,613 --> 00:42:22,104
Something just does not add up.

593
00:42:22,104 --> 00:42:23,097
I think it's the bottleneck.

594
00:42:23,097 --> 00:42:25,032
I think that just answers a lot of the questions there.

595
00:42:25,032 --> 00:42:29,093
Whereas, like, the problem is not actually doing the software development.

596
00:42:29,093 --> 00:42:30,446
And that's the thing that's tough.

597
00:42:30,446 --> 00:42:37,020
I mean, that's my thinking, because to solve the problem, coding is actually a very
small part of that.

598
00:42:37,020 --> 00:42:39,361
Like, you have to understand: what is that?

599
00:42:39,361 --> 00:42:40,792
Why are we doing this?

600
00:42:40,792 --> 00:42:42,393
Is this really the right thing to do?

601
00:42:42,393 --> 00:42:43,823
Like, is that what the users want?

602
00:42:43,823 --> 00:42:49,376
And I mean, one thing: even if the users say they want this, very often they have no idea.

603
00:42:49,376 --> 00:42:52,938
Even if they genuinely want this, will they pay for it?

604
00:42:52,938 --> 00:42:54,519
How do we make sure that they pay?

605
00:42:54,519 --> 00:42:57,653
How do we make sure that they actually know that it exists?

606
00:42:57,653 --> 00:43:00,054
You know, all that, you know, boring marketing stuff.

607
00:43:00,054 --> 00:43:05,433
And yes, LLMs can absolutely generate marketing copy for you.

608
00:43:05,486 --> 00:43:15,541
So this is a corollary to that, which I think you identified early on: if it were
the case that we were getting a lot of value out of this, we wouldn't see the large

609
00:43:15,541 --> 00:43:18,102
providers out there pushing those features on us.

610
00:43:18,102 --> 00:43:21,043
They would be charging specifically for that value.

611
00:43:21,043 --> 00:43:21,836
If you think of the-

612
00:43:21,836 --> 00:43:29,710
They are charging, absolutely. Like, they increased prices even if you didn't ask for
that; they increase the price because now it comes with AI.

613
00:43:29,710 --> 00:43:30,460
Well, that's my point.

614
00:43:30,460 --> 00:43:36,953
My point is that you would be able to separate it out by the value specifically being
offered there to end users.

615
00:43:36,953 --> 00:43:42,985
So email summarization, et cetera, as a core feature rather than something that was thrown
into the package.

616
00:43:42,985 --> 00:43:45,493
And then the price is increased when no one asked for that.

617
00:43:45,493 --> 00:43:49,890
It doesn't necessarily help me to auto-complete my sentences in my email.

618
00:43:49,890 --> 00:43:51,598
It doesn't really offer me that.

619
00:43:51,598 --> 00:43:52,448
Yeah, so this is the thing.

620
00:43:52,448 --> 00:43:54,819
I do think LLMs are great.

621
00:43:54,819 --> 00:43:55,990
They're fantastic tools.

622
00:43:55,990 --> 00:43:59,541
However, they are good for sort of low value work.

623
00:44:00,142 --> 00:44:06,545
The work where the stakes aren't very high and where, maybe, the output won't
necessarily make you money.

624
00:44:06,645 --> 00:44:18,060
And as such, it's fine if they're cheap, but I mean, I think right now they're still
offered sort of below cost because everyone's trying to grab market share.

625
00:44:18,060 --> 00:44:25,312
But the underlying cost of running this technology, I feel like it's a little bit too
expensive right now for the value that they offer.

626
00:44:25,742 --> 00:44:27,463
It's interesting you bring that up.

627
00:44:27,463 --> 00:44:30,276
I think that there are a couple of different levels here.

628
00:44:30,276 --> 00:44:31,767
There's the fundamental level; that's good.

629
00:44:31,767 --> 00:44:36,931
The companies are creating foundational models and they are for sure subsidizing
everyone's use of this.

630
00:44:36,931 --> 00:44:42,636
Every time you make an API request or enter a prompt, the company that you're
using actually loses money.

631
00:44:42,636 --> 00:44:52,324
But then there are these companies that are sitting on top of them which are getting the
benefit of that price reduction and are potentially able to make a sustainable business

632
00:44:52,324 --> 00:44:54,345
off of it for their customers.

633
00:44:55,190 --> 00:45:00,295
And of course, they're subsidizing it as well, because who's actually subsidizing is the
VCs, not the company itself.

634
00:45:00,295 --> 00:45:07,441
But their hope is that the foundational model companies actually go out of business
because they can't recoup the losses that they have.

635
00:45:07,441 --> 00:45:12,905
uh And realistically, new companies spin up that create cheaper foundational models that
they can utilize.

636
00:45:12,942 --> 00:45:16,122
I mean, this is the standard Wardley map.

637
00:45:16,122 --> 00:45:22,382
I don't know if this is the right audience to bring this up, but Wardley maps
basically talk about product evolution.

638
00:45:22,382 --> 00:45:24,882
At some point you end up with a commodity.

639
00:45:24,882 --> 00:45:31,882
I think it's definitely in the interest of people providing more complex AI solutions.

640
00:45:31,882 --> 00:45:34,862
It's in their interest that the foundational models become a commodity.

641
00:45:34,862 --> 00:45:36,962
They almost become worthless.

642
00:45:37,218 --> 00:45:47,160
So do you think that we'll get to a point where LLMs are being provided by the state, by
governments, as a critical resource that everyone should have access to?

643
00:45:47,160 --> 00:45:52,004
We all are working in the mines to support the...

644
00:45:52,802 --> 00:45:55,206
the power necessary to drive them.

645
00:45:55,206 --> 00:45:56,066
The coal mines, right?

646
00:45:56,066 --> 00:45:57,446
Because it will all be powered by...

647
00:45:57,446 --> 00:45:58,206
I'm sorry.

648
00:45:58,206 --> 00:45:59,406
I didn't mean to...

649
00:45:59,406 --> 00:46:01,406
This took a really dark turn.

650
00:46:01,406 --> 00:46:07,266
Anyway, another fun statistic, not from the Dora report, one that was a little bit
confusing to me.

651
00:46:07,266 --> 00:46:16,326
Apparently, GitHub Copilot says that only 30 % of their LLM-based code suggestions are
accepted.

652
00:46:16,326 --> 00:46:20,654
So if you get a suggestion from the LLM, only 30% of those are accepted.

653
00:46:20,654 --> 00:46:26,670
That actually competes with, what was it, 56% of the... no, no, not that one.

654
00:46:26,670 --> 00:46:30,606
66% use it for gen... no, 70% use it for code generation.

655
00:46:30,606 --> 00:46:31,306
Yeah, right.

656
00:46:31,306 --> 00:46:32,806
So there's a little bit of a disconnect here.

657
00:46:32,806 --> 00:46:40,206
70% use it for code generation, 30% of the time they were using the LLM while doing
software development.

658
00:46:40,206 --> 00:46:43,866
And then 30 % of the code suggestions are being actually accepted.

659
00:46:43,866 --> 00:46:49,826
This tells us, yes, people are using LLMs all the time, but they're not actually accepting
the output specifically.

660
00:46:50,214 --> 00:46:57,791
I mean, from what I've heard, also very often people take the code into a different
window, into a chat window, and that's where they work.

661
00:46:57,791 --> 00:46:58,150
Right.

662
00:46:58,150 --> 00:46:59,374
So, however,

663
00:46:59,374 --> 00:47:03,236
there's no trust in how it's running anywhere, even in a container, on my machine.

664
00:47:03,236 --> 00:47:04,326
wow.

665
00:47:05,687 --> 00:47:09,829
Anyway, right.

666
00:47:10,129 --> 00:47:11,989
I mean, right.

667
00:47:12,630 --> 00:47:20,293
Anyway, like what I found interesting is that GitHub Copilot wrote a whole article about
it and they seem to think that 30% is fantastic.

668
00:47:20,293 --> 00:47:23,504
They shout about it from the rooftops as if it was a great result.

669
00:47:23,504 --> 00:47:28,316
I'm like, what world do we live in?

670
00:47:28,316 --> 00:47:28,691
30 %?

671
00:47:28,691 --> 00:47:30,197
That's not a lot.

672
00:47:30,197 --> 00:47:32,418
I would, I would be embarrassed.

673
00:47:32,546 --> 00:47:35,946
quite frankly, but apparently that's labeled as a success.

674
00:47:35,946 --> 00:47:39,862
I mean, I think it tells more about me than about the technology.

675
00:47:39,862 --> 00:47:41,493
Like I just don't get it.

676
00:47:41,493 --> 00:47:42,864
Why is 30 % good enough?

677
00:47:42,864 --> 00:47:48,458
Like I would want, you know, at least 50 % of those suggestions to be helpful and I would
want to accept them.

678
00:47:48,458 --> 00:47:50,450
Otherwise this is just distracting and annoying.

679
00:47:50,450 --> 00:47:52,462
Happy?

680
00:47:52,462 --> 00:47:53,462
Anyone?

681
00:47:53,762 --> 00:48:01,155
I think there's another problem here though, which isn't just the percentage of code
suggestions that are being accepted.

682
00:48:01,155 --> 00:48:04,817
It doesn't really tell us about which code suggestions are being accepted.

683
00:48:04,817 --> 00:48:12,171
instance, I find a very high success rate for the first suggestion that an LLM makes on a
single line of code.

684
00:48:12,171 --> 00:48:20,054
The second suggestion that comes up right after that, for the second line it generates, is
then increasingly wrong, and so on and so forth.

685
00:48:20,054 --> 00:48:23,696
By the third line or the fourth line, it's now nowhere close to anything

686
00:48:23,696 --> 00:48:25,067
that you wanted at all.

687
00:48:25,067 --> 00:48:31,163
So the 30% is, you know, maybe 30% across all the suggestions, where the first suggestion
could be like 80%.
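
As a back-of-the-envelope sketch, with all the per-line rates being hypothetical, this is how an 80% first-line acceptance can still blend down to a headline number near 30%:

# Hypothetical per-line acceptance rates: strong first suggestion, rapid decay.
rates = [0.80, 0.35, 0.15, 0.05]  # acceptance for lines 1-4 of a completion

# If each line is suggested equally often, the blended average is what a
# headline "30% of suggestions accepted" style metric would report.
blended = sum(rates) / len(rates)
print(f"{blended:.0%}")  # ~34%, even though the first line alone sits at 80%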

688
00:48:31,163 --> 00:48:32,594
And so where's the value?

689
00:48:32,594 --> 00:48:33,985
The value isn't on that next line.

690
00:48:33,985 --> 00:48:37,598
That next line is something that everyone knows what it should be, right?

691
00:48:37,598 --> 00:48:39,050
It's like a log statement.

692
00:48:39,050 --> 00:48:40,751
It's an if condition, et cetera.

693
00:48:40,751 --> 00:48:43,814
The real problem is like what the meat of what you're putting in there.

694
00:48:43,814 --> 00:48:48,316
And if those are the ones that I bet are way less than 30%.

695
00:48:48,316 --> 00:48:49,250
oh

696
00:48:49,250 --> 00:48:57,776
I mean, but that does again corroborate that it really gets you to that
mediocre point much faster.

697
00:48:57,776 --> 00:49:04,601
So it speeds up the initial part of creating code, but it doesn't necessarily get you
all the way there.

698
00:49:04,601 --> 00:49:07,743
And then, okay, you still have to figure that out.

699
00:49:07,743 --> 00:49:12,296
And that's the part that usually takes the longest and the most effort, at least from my
experience.

700
00:49:12,296 --> 00:49:12,616
I don't know.

701
00:49:12,616 --> 00:49:14,827
I'm probably a really bad software developer.

702
00:49:14,827 --> 00:49:16,529
So what do I know?

703
00:49:16,529 --> 00:49:19,110
m

704
00:49:20,281 --> 00:49:21,312
I've seen your code.

705
00:49:21,312 --> 00:49:22,703
m

706
00:49:22,703 --> 00:49:25,344
What was seen cannot be unseen.

707
00:49:26,384 --> 00:49:28,264
Let's not talk about that.

708
00:49:28,605 --> 00:49:33,616
Anyway, for the other part of the report, I feel like really there wasn't that much.

709
00:49:33,616 --> 00:49:35,044
It was all AI, AI, AI.

710
00:49:35,044 --> 00:49:38,037
I mean, the report itself is no longer called the State of DevOps.

711
00:49:38,037 --> 00:49:39,488
I don't know if I said that already.

712
00:49:39,488 --> 00:49:43,389
It's called the State of AI-assisted Software Development.

713
00:49:43,389 --> 00:49:47,030
So if you try to download the report and you're looking for Dora anywhere...

714
00:49:47,030 --> 00:49:48,990
No, that's not there.

715
00:49:49,103 --> 00:49:51,508
Almost makes me not want to read next year's.

716
00:49:51,722 --> 00:49:54,193
I'm actually going to read it because I'm curious how it will compare.

717
00:49:54,193 --> 00:49:57,655
And I hope some of the statistics will be carried over.

718
00:49:57,655 --> 00:50:01,667
Like they did carry over some of the effectiveness questions.

719
00:50:01,667 --> 00:50:03,317
They say, okay, yeah, we do see an improvement.

720
00:50:03,317 --> 00:50:12,922
But then when you look at the data, it's like, that's not that much of an improvement, but
they still use it to craft that narrative that, oh, okay, the organizations are learning

721
00:50:12,922 --> 00:50:18,644
how to, you know, deal with the technology, which I think is more of a wishful thinking.

722
00:50:19,279 --> 00:50:22,565
Anyway, they do speak a little bit about platform engineering.

723
00:50:22,565 --> 00:50:29,230
em It seems like that is the trend that they didn't manage to really pick up on or benefit
from.

724
00:50:29,230 --> 00:50:38,764
They were stubborn there as well, like with the AI, because in the last year, like I said,
they had a negative impact on the organization. If you actually really understand

725
00:50:38,764 --> 00:50:43,977
what value is being added, these teams are often behind the curve.

726
00:50:43,977 --> 00:50:49,740
They're reactive to what's actually happening and they tend not to think about platform
engineering as a product.

727
00:50:49,740 --> 00:50:51,680
We see the same this year as well.

728
00:50:51,680 --> 00:50:59,224
People think that, like with AI, there's a positive impact on the organization when you have a
good understanding of what your platforms can be,

729
00:50:59,224 --> 00:51:01,309
what your internal development or tooling can be.

730
00:51:01,309 --> 00:51:05,977
But when you actually look at the impact, the software instability increases as well.

731
00:51:06,134 --> 00:51:07,615
Yeah, I mean, that's...

732
00:51:08,336 --> 00:51:10,337
But that's like every tool, right?

733
00:51:10,337 --> 00:51:15,720
What you said, that if you use it correctly, it does what it's supposed to do.

734
00:51:15,720 --> 00:51:17,966
If you use it incorrectly, you can actually hurt yourself.

735
00:51:17,966 --> 00:51:27,699
Well, I think the solidification or the codification of those practices is a problem,
though, because a lot of organizations aren't doing the best possible thing at all times.

736
00:51:27,699 --> 00:51:33,031
And then they go and they take the step of solidifying that process when there are
mistakes in it.

737
00:51:33,031 --> 00:51:41,714
And so the question is, right now, you know, if you're thinking about this: does your
organization have a process which you can guarantee with 100% accuracy is exactly

738
00:51:41,714 --> 00:51:42,856
what it should be, a hundred

739
00:51:42,856 --> 00:51:44,456
percent at all times?

740
00:51:44,456 --> 00:51:47,951
Not every process, just one process that you have that is perfect.

741
00:51:47,951 --> 00:51:52,326
And if you don't have a perfect process, the way you're doing code, the way we're doing
code...

742
00:51:52,326 --> 00:51:53,797
We just do code reviews.

743
00:51:53,879 --> 00:51:55,534
And also do testing.

744
00:51:55,534 --> 00:51:56,357
You have to test it.

745
00:51:56,357 --> 00:51:59,456
Well, there's an argument like sometimes maybe you don't need to test something.

746
00:51:59,970 --> 00:52:04,591
Because the users will test it and they will happily pay for it, like with games.

747
00:52:04,591 --> 00:52:07,212
Anyway, that's derailing the conversation.

748
00:52:07,392 --> 00:52:12,254
I mean, honestly, I would love for there to be more in the Dora report, but that's pretty much
it.

749
00:52:12,254 --> 00:52:13,294
I mean, you've heard it all.

750
00:52:13,294 --> 00:52:19,456
So there were some interesting statistics, but overall a lot of narrative.

751
00:52:19,456 --> 00:52:26,678
And you know, like when I was reading, I didn't fully read every word of the narrative
because at some point I'm just like, I can't.

752
00:52:26,939 --> 00:52:30,369
But I got my spring poop moment.

753
00:52:31,918 --> 00:52:33,662
You don't know what I'm talking about.

754
00:52:33,878 --> 00:52:35,363
Not particularly.

755
00:52:35,542 --> 00:52:47,639
So we live in Switzerland, and there's this thing that happens here a few times a
year where you go outside and there is this smell of manure everywhere.

756
00:52:47,639 --> 00:52:49,990
And that happens like no matter where you live, you can live in a city.

757
00:52:49,990 --> 00:52:51,031
I live in a city.

758
00:52:51,031 --> 00:52:57,764
It's still there and it's most prevalent like in the beginning of the spring, like the
first good day after the winter.

759
00:52:57,764 --> 00:53:05,206
And that's basically farmers spraying the cow poop over the fields because it accumulated
over the...

760
00:53:05,206 --> 00:53:05,536
winter.

761
00:53:05,536 --> 00:53:09,427
And so I associate that smell with spring.

762
00:53:09,827 --> 00:53:23,181
And so, you know, when reading that narrative of the Dora report, it's just basically that
experience when you go outside: hopeful, excited, with anticipation, and things are

763
00:53:23,181 --> 00:53:29,533
great, but there's this thing in the background that's not exactly pleasant and it's like
everywhere.

764
00:53:29,846 --> 00:53:33,688
So, yeah, that was the experience.

765
00:53:33,688 --> 00:53:40,723
So now that you've read this, there must be something though that you still feel like you
could apply to our organization, to our company.

766
00:53:40,723 --> 00:53:43,805
There must have been some insight, which could be...

767
00:53:44,210 --> 00:53:48,970
I mean, it's not really a result of the report because this is something that I've always
said.

768
00:53:48,970 --> 00:53:55,458
I mean, I do treat LLMs like tools, like IDEs or your operating system.

769
00:53:55,999 --> 00:53:58,040
I don't care if you use it or not.

770
00:53:58,040 --> 00:54:01,062
If you're an engineer and it helps you, use it.

771
00:54:01,062 --> 00:54:07,026
I mean, I'll help you figure out how to make it useful, but it is
really your own thing.

772
00:54:07,026 --> 00:54:12,170
eh I don't expect that it will make us faster or better, or that it will produce better code.

773
00:54:12,170 --> 00:54:12,800
I mean,

774
00:54:12,800 --> 00:54:20,641
Maybe it will if it makes the engineer happier or more efficient individually, but uh
that's a marginal change.

775
00:54:20,641 --> 00:54:23,434
So that's basically what I've gathered from this report.

776
00:54:23,434 --> 00:54:24,676
And I always thought about that.

777
00:54:24,676 --> 00:54:30,924
It's like for the comfort of individual engineers, sure, if it makes your life easier.

778
00:54:31,054 --> 00:54:36,900
Okay, with that, I wonder if we should close out this episode and move over to picks.

779
00:54:38,543 --> 00:54:41,545
Like, you know, I asked you to bring something for this.

780
00:54:41,910 --> 00:54:44,571
picks, sorry, pickaxes.

781
00:54:44,632 --> 00:54:46,772
Yes, bring, bring, bring.

782
00:54:46,813 --> 00:54:47,413
Commerce.

783
00:54:47,413 --> 00:54:50,044
uh No, I actually didn't bring anything.

784
00:54:50,044 --> 00:54:55,256
However, I have uh more of a concept of what you could do because I've done that.

785
00:54:55,256 --> 00:54:56,757
I just have nothing to show for it.

786
00:54:56,757 --> 00:54:58,918
uh Mushrooms.

787
00:54:58,918 --> 00:55:05,440
uh I mean, before you go to a weird place, I went and bought a mushroom kit from a grocery
store.

788
00:55:05,440 --> 00:55:09,186
uh Pearl oyster mushrooms, delicious.

789
00:55:09,186 --> 00:55:10,567
So it was basically a box.

790
00:55:10,567 --> 00:55:17,209
You just prepare it a little bit, open it up, and mushrooms just emerge, and it's
fantastic.

791
00:55:17,209 --> 00:55:19,290
I love watching mushrooms grow.

792
00:55:19,290 --> 00:55:20,471
They grow so fast.

793
00:55:20,471 --> 00:55:22,031
It's like almost in front of your eyes.

794
00:55:22,031 --> 00:55:22,731
It was fantastic.

795
00:55:22,731 --> 00:55:23,272
Great.

796
00:55:23,272 --> 00:55:26,813
I have tons of pictures and I ate them.

797
00:55:28,034 --> 00:55:37,097
So yeah, I'm sure like in your area, because like mushrooms tend to be very local, I'm
sure in your area someone sells mushroom kits for your local varieties.

798
00:55:37,097 --> 00:55:39,158
Like oyster mushrooms, fantastic.

799
00:55:39,374 --> 00:55:41,932
uh Get yourself a mushroom kit, it's fun.

800
00:55:41,932 --> 00:55:45,505
I think once you start seeing mushrooms, I never saw them when I was in the US.

801
00:55:45,505 --> 00:55:51,249
I never paid close attention, and now, since I've moved to Europe, I see them
everywhere during this...

802
00:55:51,350 --> 00:55:52,350
They are everywhere.

803
00:55:52,350 --> 00:55:55,970
Fungus, fungi, they're everywhere.

804
00:55:56,030 --> 00:55:56,904
I love mushrooms.

805
00:55:56,904 --> 00:55:57,266
I see.

806
00:55:57,266 --> 00:56:00,500
So a particular grow kit, you know, go out and get one and try it.

807
00:56:00,500 --> 00:56:02,014
And they're cheap too.

808
00:56:02,014 --> 00:56:02,935
Yeah, they were actually cheap.

809
00:56:02,935 --> 00:56:06,920
I mean, I didn't really do it for monetary reasons.

810
00:56:06,920 --> 00:56:10,943
The experience of watching the mushrooms grow, that was worth a lot.

811
00:56:11,745 --> 00:56:19,612
But yeah, I mean, if we just ask how much I would have paid for the mushrooms, obviously I
got more out of that kit than...

812
00:56:19,670 --> 00:56:20,770
Okay.

813
00:56:20,951 --> 00:56:21,671
I like it.

814
00:56:21,671 --> 00:56:23,072
Okay.

815
00:56:23,072 --> 00:56:23,933
What did I bring?

816
00:56:23,933 --> 00:56:29,086
So my pick for this week is an article that talks about the maximum effective context
window for LLMs.

817
00:56:29,086 --> 00:56:38,381
So I think it's maybe a little bit relevant, and it's interesting because I find that a lot
of people keep on saying how we're almost there, like any problems that we see with LLMs

818
00:56:38,381 --> 00:56:41,203
will eventually get solved by increasing the context window.

819
00:56:41,203 --> 00:56:44,895
And this article really points to the fact that that may not actually be true.

820
00:56:44,895 --> 00:56:45,966
So the article is

821
00:56:45,966 --> 00:56:49,867
uh context is what you need, the maximum effective context window.

822
00:56:49,928 --> 00:57:00,332
And it really points out that if you increase the context window and put more tokens into
the prompt, the LLM will struggle to identify what's actually relevant, what's the

823
00:57:00,332 --> 00:57:01,622
most important piece of information.

824
00:57:01,622 --> 00:57:03,823
I think the same thing is true for humans.

825
00:57:03,823 --> 00:57:06,112
So this result is not surprising.

826
00:57:06,112 --> 00:57:09,622
For example, right now I no longer know what you were talking about.

827
00:57:09,622 --> 00:57:18,626
Okay, so just to reiterate, uh increasing the context window is actually problematic and
we're hitting a fundamental limit, which means that larger sizes, 1 million, 2 million

828
00:57:18,626 --> 00:57:21,348
tokens aren't actually going to be valuable for us.

829
00:57:21,348 --> 00:57:29,181
We need to find a way to pass the right amount of information in and we're already at that
point, which means all the innovation that would come isn't going to be around increasing

830
00:57:29,181 --> 00:57:31,932
the context window or getting the memory right or anything like that.

831
00:57:31,932 --> 00:57:34,894
I think we're fundamentally stuck as far as this technology goes.

832
00:57:34,894 --> 00:57:39,586
So it's an interesting read about how they evaluate the context window, and they call it like a
needle-in-a-haystack kind

833
00:57:39,586 --> 00:57:49,007
of problem, where you have a technology that you're utilizing and you want to have it
figure out which parts of the prompt are actually useful to process.

834
00:57:49,007 --> 00:57:51,019
I think, I don't know, there's something interesting about that.
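
For a rough idea of how such a needle-in-a-haystack evaluation is built, here is a minimal sketch; ask_model is a hypothetical stand-in for whatever LLM API is under test, not a real library call, and the needle text is made up:

import random

NEEDLE = "The vault code is 4417."      # hypothetical fact to retrieve
FILLER = "Lorem ipsum dolor sit amet."  # padding that plays the haystack

def build_haystack(num_sentences: int, depth: float) -> str:
    """Bury the needle at a relative depth (0.0 = start, 1.0 = end) in filler."""
    sentences = [FILLER] * num_sentences
    sentences.insert(int(depth * num_sentences), NEEDLE)
    return " ".join(sentences)

def score(ask_model, sizes=(100, 1000, 10000), trials=20):
    """Retrieval accuracy per context size: did the model repeat the code?"""
    results = {}
    for size in sizes:
        hits = 0
        for _ in range(trials):
            prompt = build_haystack(size, random.random())
            prompt += "\nWhat is the vault code?"
            hits += "4417" in ask_model(prompt)
        results[size] = hits / trials
    return results

Plotting accuracy against context size (and needle depth) is what shows the effective window topping out well before the advertised token limit.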

835
00:57:51,019 --> 00:57:58,218
Okay, so with that, thank you Dorota so much for coming on today's episode.

836
00:57:58,218 --> 00:57:59,070
Thank you for having me.

837
00:57:59,070 --> 00:58:00,614
I hope I didn't offend anyone.

838
00:58:00,614 --> 00:58:03,570
Maybe the Dora authors?

839
00:58:03,570 --> 00:58:04,392
I'm sorry.

840
00:58:04,392 --> 00:58:06,477
I really appreciate that they're doing this report.

841
00:58:06,477 --> 00:58:09,141
It's just this year was a little bit disappointing.

842
00:58:09,658 --> 00:58:17,461
Well, we always get some angry emails, so you can throw them on the complaint pile uh and
we'll promise not to go through them.

843
00:58:17,481 --> 00:58:26,505
So thank you so much, Dorota, and thanks to all the listeners for listening to this
episode, and we'll see, hopefully, everyone back again next week.

