1
00:00:00,151 --> 00:00:04,874
Hello and welcome back to another episode of Adventures in DevOps.

2
00:00:04,874 --> 00:00:07,978
Today I brought on the co-founder and CTO of Moderne.

3
00:00:07,978 --> 00:00:14,546
previously Senior Product Manager at Pivotal and, before that, Principal Software Engineer
at Dell, uh, Olga Kundzich.

4
00:00:14,546 --> 00:00:15,108
Welcome.

5
00:00:15,108 --> 00:00:16,559
Thank you, pleasure to be here.

6
00:00:16,559 --> 00:00:23,824
You know, I was looking at your profile on LinkedIn and I noticed that you were at Dell
for 12 years.

7
00:00:23,824 --> 00:00:28,547
And one way or another, I feel like I've been using Dell products for as long as I can
remember.

8
00:00:28,547 --> 00:00:32,530
I absolutely love the Dell XPS line of laptops.

9
00:00:32,530 --> 00:00:37,854
And I never really considered that there was a huge software engineering department at
Dell.

10
00:00:37,854 --> 00:00:40,657
I'm sort of curious, what was going on during...

11
00:00:40,657 --> 00:00:44,588
So I came to Dell through EMC acquisition.

12
00:00:44,588 --> 00:00:48,949
That's where I spent most of that time.

13
00:00:48,949 --> 00:01:00,413
I've worked in enterprise data protection, kind of working with the large enterprises and
with everything that they had in their environments from Oracle, Postgres, DB2, storage

14
00:01:00,413 --> 00:01:03,206
arrays, networking, moving data to the cloud.

15
00:01:03,206 --> 00:01:13,699
Yeah, I think there are lots of different product lines and lots of different integrations
and it was super interesting kind of how that whole ecosystem evolved together.

16
00:01:13,699 --> 00:01:17,291
And you were, like, on the technology side for quite a long time.

17
00:01:17,291 --> 00:01:25,910
And then, if I got this right, you've moved over to the company that's pretty much turned
Spinnaker into a real product for the community.

18
00:01:25,910 --> 00:01:30,711
So there was a large community of people around Spinnaker when we worked with it.

19
00:01:30,711 --> 00:01:36,670
There was Google and Netflix kind of ran that community for a time with contributions from
Armory.

20
00:01:36,670 --> 00:01:42,337
Pivotal was the third largest contributor to Spinnaker at the time when we worked on it.

21
00:01:42,337 --> 00:01:46,500
We did put a lot of enterprise features, like authentication and authorization,

22
00:01:46,500 --> 00:01:47,881
things like that into it.

23
00:01:47,881 --> 00:01:58,964
But what we've heard from customers and community users was very often that people were
very interested in advanced capabilities of Spinnaker, like canary analysis and

24
00:01:58,964 --> 00:01:59,624
everything.

25
00:01:59,624 --> 00:02:09,960
But very often we heard, time and time again, like, talk to me in a year when I'm done
migrating Spring Boot 1 to 2, or fixing this Log4Shell vulnerability, and

26
00:02:09,960 --> 00:02:17,087
then I will finally find the time to. And you know, the time never comes, because next
it's Spring Boot 2 to 3, then 3 to 4.

27
00:02:17,087 --> 00:02:24,434
Every year it repeats. And we heard exactly the same thing from Pivotal customers at the
same time.

28
00:02:24,434 --> 00:02:30,541
It kind of dawned on us that, like we used to have technical debt that was our own.

29
00:02:30,541 --> 00:02:32,302
It was in our own applications.

30
00:02:32,302 --> 00:02:34,205
It was a developer making a mistake,

31
00:02:34,205 --> 00:02:38,958
uh, choosing a wrong pattern, and now you have an application that struggles with technical
debt.
debt.

32
00:02:38,958 --> 00:02:45,215
But a lot of what we have called technical debt is actually a system of software moving
from under you.

33
00:02:45,215 --> 00:02:55,094
A developer that makes a perfect choice today, the latest framework, the best libraries,
the best architectural patterns, the fashion of the day, builds an application, and six

34
00:02:55,094 --> 00:02:58,025
months later it's struggling with technical debt.

35
00:02:58,025 --> 00:03:06,839
We had a meetup at one point and we polled developers: how long are your applications
going to continue to function if you're not allowed to touch their source

36
00:03:06,839 --> 00:03:07,439
code.

37
00:03:07,439 --> 00:03:11,001
And the answer that came back was six months, which is shocking.

38
00:03:11,001 --> 00:03:15,712
In six months, your perfect application is accruing technical debt to the point of
stopping.

39
00:03:15,712 --> 00:03:19,884
But then you think about it, it's like, it's the real cadence that we see in the industry.

40
00:03:19,884 --> 00:03:24,686
The Kubernetes community, they make deprecations, they make releases every quarter.

41
00:03:24,686 --> 00:03:28,269
They deprecate something in one release, they remove it in the follow-up release.

42
00:03:28,269 --> 00:03:30,141
Spring Boot, the same thing.

43
00:03:30,141 --> 00:03:34,174
So it's really if you don't touch the application for six months, that's what happens.

44
00:03:34,174 --> 00:03:35,895
You know, it's really interesting that you asked that.

45
00:03:35,895 --> 00:03:43,518
I'm surprised at the six month answer almost from the other perspective because I feel
like a lot of companies today are trying to move so quickly.

46
00:03:43,518 --> 00:03:48,700
They're not actually considering the cost of the technology or software that they're
putting out there.

47
00:03:48,700 --> 00:03:49,332
And

48
00:03:49,332 --> 00:03:56,470
we got an alert the other day that our Azure Functions stopped functioning and couldn't
renew some sort of certificate.

49
00:03:56,470 --> 00:04:03,068
And we looked at it: we developed this function like nine months ago, it was a perfect
Azure Function, and now it no longer works.

50
00:04:03,068 --> 00:04:08,414
And I think it's like the second time in the history of this company that we have Azure
Functions stop working on us.

51
00:04:08,414 --> 00:04:11,196
yeah, no, I can totally see that.

52
00:04:11,196 --> 00:04:16,211
I'm sort of curious, so what is the right timeframe then that you would expect

53
00:04:16,211 --> 00:04:19,553
I think it depends on what kind of software we are talking about.

54
00:04:19,553 --> 00:04:27,148
Think about applications that we developed, I don't know, 10, 20 years ago, the one that
we deployed on prem and they ran on our own servers.

55
00:04:27,148 --> 00:04:29,919
Those were self-contained and isolated.

56
00:04:29,919 --> 00:04:35,753
And maybe if you don't touch their code, they don't magically accrue any features, but
they don't stop working.

57
00:04:35,753 --> 00:04:39,305
And then there are the cloud-native applications that are really impacted.

58
00:04:39,305 --> 00:04:47,831
I think the applications that you mentioned like satellites and more like applications
that we deployed on-prem, they are self-contained, whereas the cloud native applications

59
00:04:47,831 --> 00:04:49,962
are the ones that are the most impacted.

60
00:04:49,962 --> 00:04:55,586
Like 80, 90% of their code is actually open source and third-party
dependencies.

61
00:04:55,586 --> 00:05:01,252
This glue code that constantly needs to be restitched, otherwise it fails to function.

62
00:05:01,252 --> 00:05:04,955
What's special about the cloud native environments that causes

63
00:05:04,955 --> 00:05:06,655
it's third party dependencies.

64
00:05:06,655 --> 00:05:15,946
We know that cloud native applications are 80 to 90 % third party dependencies, open
source frameworks, vendor APIs that are changing from underneath you.

65
00:05:15,946 --> 00:05:21,682
If you don't keep up, Kubernetes infrastructure is making breaking changes every six months.

66
00:05:21,682 --> 00:05:23,546
So that's, that's the key.

67
00:05:23,546 --> 00:05:32,905
What happens if organizations just stay on that first version of Spring Boot or Node.js
version 14 or 16 and never upgrade?

68
00:05:32,905 --> 00:05:36,085
So there are two sides to this.

69
00:05:36,085 --> 00:05:37,767
One is vulnerabilities, right?

70
00:05:37,767 --> 00:05:42,965
Like Spring Boot right now no longer supports even version, I don't know, 3.3 is the
latest.

71
00:05:42,965 --> 00:05:51,939
If you fall behind and you have a vulnerability in a library version that they no longer
support in open source, you need a patch for that.

72
00:05:51,939 --> 00:05:55,071
You're going to pay millions of dollars as an enterprise to this vendor.

73
00:05:55,071 --> 00:05:59,105
And the second part, I think, is businesses want modern applications.

74
00:05:59,105 --> 00:06:06,320
Applications that they built on frameworks that were popular a decade ago look dated, and
developers don't want to work with them.

75
00:06:06,320 --> 00:06:16,125
And so you get into a state where you have legacy applications that are the most valuable
applications that you have in your portfolio that created your legacy as a business, but

76
00:06:16,125 --> 00:06:18,916
you cannot add anything to them because they are old.

77
00:06:18,916 --> 00:06:20,325
No one wants to work with them,

78
00:06:20,325 --> 00:06:21,962
I think the other one is performance.

79
00:06:21,962 --> 00:06:31,510
Like, we know the cloud cost savings from moving from Java version 8 to 25 are in the
30 to 40% range.

80
00:06:31,510 --> 00:06:35,962
Just because more optimizations went into the Java runtime; it's more efficient.

81
00:06:35,962 --> 00:06:38,202
You're able to evolve your business applications

82
00:06:38,202 --> 00:06:43,070
faster and bring user experiences online, plus the security vulnerabilities.

83
00:06:43,070 --> 00:06:43,740
That's interesting.

84
00:06:43,740 --> 00:06:45,272
I hadn't heard that perspective before.

85
00:06:45,272 --> 00:06:55,222
The reason to keep upgrading the technology in your stack relies on the fact that the
engineers you have actually only want to work on the latest technologies that you

86
00:06:55,222 --> 00:07:03,938
have available to you. Or, maybe more realistically, if you are growing and you need to
hire outside of your company, what are you going to put on those job applications?

87
00:07:03,938 --> 00:07:10,215
And are you going to put on React version 4 or, as you said, Spring Boot version 1 or Java
version 8?

88
00:07:10,215 --> 00:07:14,203
If you do, who is going to want to come work on those things?

89
00:07:14,203 --> 00:07:15,493
That's an interesting perspective.

90
00:07:15,493 --> 00:07:21,914
I know from working in healthcare and aerospace and e-commerce, honestly, I don't think that

91
00:07:21,914 --> 00:07:29,985
the type of company that you are working with or even the technology has a huge impact on
the generation of tech debt.

92
00:07:29,985 --> 00:07:35,078
We would be working primarily with business-critical applications.

93
00:07:35,078 --> 00:07:42,063
I think there are different ones, embedded applications, hardware, operating systems, and
those I believe would be different.

94
00:07:42,063 --> 00:07:46,835
But my experience primarily was with this type of software.

95
00:07:46,835 --> 00:07:55,509
I think there is this aspect where is it this elusive hypothetical problem that people
just point to when they can't describe an actual scenario?

96
00:07:55,509 --> 00:07:55,960
Or

97
00:07:55,960 --> 00:08:02,064
I don't think I'm passionate about technical debt; I'm really passionate about being able
to develop software faster.

98
00:08:02,064 --> 00:08:08,139
I think, on the business side, we are really constrained by how fast developers can
create software.

99
00:08:08,139 --> 00:08:12,872
I think we're failing to update, like, the infrastructure of our society.

100
00:08:12,872 --> 00:08:18,787
It still runs on COBOL, a lot of vulnerabilities in the stack, et cetera.

101
00:08:21,135 --> 00:08:29,582
When we talked about the ROI of OpenRewrite and Moderne to our customers, it's not like
you save this much effort by using automation to remediate vulnerabilities.

102
00:08:29,582 --> 00:08:33,218
It's: we return engineering capacity back to the business.

103
00:08:33,218 --> 00:08:38,843
Right now we know engineers spend 30 to 40 % of their time on technical debt

104
00:08:38,843 --> 00:08:44,094
I think this is a mistake that a lot of inexperienced engineers make, where they just
point to the words "tech debt."

105
00:08:44,094 --> 00:08:52,069
One of the challenges that I've been wrestling with is: as you create more, there is more
you have to deal with.

106
00:08:52,602 --> 00:08:55,401
I don't think that you can just always stack more on top.

107
00:08:57,500 --> 00:09:03,120
I have to wonder, is there some maximum amount? As a business, if we create more software,
are we fundamentally always going to get to some

108
00:09:03,214 --> 00:09:08,465
to kind of visualize the problem that we have right now, just how much source code we
have.

109
00:09:08,465 --> 00:09:13,376
And we know one of our customers at the time had 500 million lines of code.

110
00:09:13,376 --> 00:09:20,707
So we said, if you take these 500 million lines of code and print them in books and put
these books, like, side by side.

111
00:09:20,707 --> 00:09:25,492
Like, not in a tall bookcase, but just in one line, how long will this line stretch?

112
00:09:25,492 --> 00:09:28,065
And this line will stretch from Miami to Montreal.

113
00:09:28,065 --> 00:09:31,779
This is just how much code they have under management right now.
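
The Miami-to-Montreal figure holds up as rough arithmetic. A back-of-the-envelope sketch
in Python, where the lines-per-page and page-height numbers are my own assumptions rather
than anything stated in the conversation:

```python
# Back-of-the-envelope check of the "Miami to Montreal" claim.
# The per-page figures are illustrative assumptions.
lines_of_code = 500_000_000
lines_per_page = 50        # typical printed code density (assumption)
page_height_m = 0.28       # roughly one letter/A4 page (assumption)

pages = lines_of_code / lines_per_page      # 10 million pages
stretch_km = pages * page_height_m / 1000   # pages laid end to end, in km

# Miami to Montreal is on the order of 2,800 km by road.
assert 2500 < stretch_km < 3000
```

Under those assumptions, the pages stretch roughly 2,800 km, which is indeed about the
distance from Miami to Montreal.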

114
00:09:31,779 --> 00:09:36,733
And all the work with code maintenance right now is very manual.

115
00:09:36,733 --> 00:09:42,868
It's a developer pulling these repositories into their IDE, doing something to them,
checking them back into GitHub.

116
00:09:42,868 --> 00:09:47,122
Like the amount of code we have, it does not fit into this workflow.

117
00:09:47,122 --> 00:09:51,797
I want to ask you about that, but first I want to get some, like my bearings set first.

118
00:09:51,797 --> 00:09:55,911
How common is 500 million lines of code versus 5 billion lines?

119
00:09:55,911 --> 00:09:57,353
Like is that a lot?

120
00:09:57,353 --> 00:09:58,754
Is that average?

121
00:09:58,754 --> 00:10:05,136
What do you normally expect when you look at say the comparison of a company that just
came out of say, series A

122
00:10:05,136 --> 00:10:14,436
It was interesting you bring this up, because our guest from last week, John Papa from
Developer Relations at Microsoft, was actually sharing that, saying: when you're at the

123
00:10:14,436 --> 00:10:19,596
dinner table or you're meeting some new colleagues for the first time, what do you say you
do?

124
00:10:19,596 --> 00:10:21,476
A lot of people say, oh yeah, I write code.

125
00:10:21,476 --> 00:10:23,716
And he's like, no, no, we read code.

126
00:10:23,716 --> 00:10:24,847
That's our job.

127
00:10:24,847 --> 00:10:25,264
Yeah.

128
00:10:25,264 --> 00:10:29,027
I feel like now that's something that can't even be done effectively.

129
00:10:29,027 --> 00:10:36,788
So, you know, my concern is that more and more companies will start producing this
unreadable amount of code

130
00:10:36,788 --> 00:10:39,040
I think we should have less code, not more.

131
00:10:39,040 --> 00:10:39,720
Right.

132
00:10:39,720 --> 00:10:44,803
But I think what we have is what we have, and no one understands what's in it or what to
do with it.

133
00:10:44,803 --> 00:10:47,475
And think about the trends with AI.

134
00:10:47,475 --> 00:10:50,517
AI is not good at optimizing, at refactoring.

135
00:10:50,517 --> 00:10:53,614
It's just good at creating more similar looking stuff.

136
00:10:53,614 --> 00:10:58,372
now with AI, the developers create more code, but refactor less.

137
00:10:58,372 --> 00:10:59,004
Interesting.

138
00:10:59,004 --> 00:11:00,139
I can believe that.

139
00:11:00,139 --> 00:11:00,905
go ahead.

140
00:11:00,905 --> 00:11:06,015
The duplication of similar code is worse with the AI than it used to be before.

141
00:11:06,015 --> 00:11:07,408
I want to come back to that.

142
00:11:07,408 --> 00:11:14,302
First I want to ask about this, because this is what has caused you to basically create
the cornerstone of your business.

143
00:11:14,302 --> 00:11:21,599
Uh, OpenRewrite, you mentioned. What exactly is that doing? As I understand it, it's sort
of what Roslyn is for C#.

144
00:11:21,599 --> 00:11:26,502
It evaluates your source code and converts it to some ASTs.

145
00:11:26,502 --> 00:11:35,040
in order to make, like, not regex-related changes, but changes that actually understand
what the source code is doing from a structural standpoint.

146
00:11:35,040 --> 00:11:38,130
Maybe we talk about the history of OpenRewrite, how it came about.

147
00:11:38,130 --> 00:11:42,738
So my co-founder founded OpenRewrite when he worked in Netflix Engineering Tools.

148
00:11:42,738 --> 00:11:53,200
And in that organization, there was freedom and responsibility, and the central team
couldn't break the build and say, at this date you have to migrate X or Y, or remove this
logging library.

149
00:11:53,200 --> 00:11:55,474
And so people kept telling him.

150
00:11:55,474 --> 00:12:05,123
"If you do it for me, I'll accept the change, but otherwise I have other things to do."
And he heard it enough times that he said, I'm going to try to do it for them. And like,

151
00:12:05,123 --> 00:12:06,505
almost immediately.

152
00:12:06,505 --> 00:12:12,650
He looked at the tools around and all of the tools around are based on abstract syntax
trees, which just the syntax.

153
00:12:12,650 --> 00:12:21,500
And that was already not sufficient. One of the first migrations that they wanted to do
was to replace a homegrown library with a standard

154
00:12:21,500 --> 00:12:25,872
library for logging, which they regretted the mistake of starting their own.

155
00:12:25,872 --> 00:12:33,616
They wanted to standardize and they couldn't like which Netflix engineer would want to
come to work and replace log in statements one for one.

156
00:12:33,616 --> 00:12:44,795
And as we talked to a lot of enterprises at Pivotal and heard time and time again, like I
need to migrate Spring Boot 1 to 2, talk to me in a year, we felt like this technology was

157
00:12:44,795 --> 00:12:48,509
ripe for, like, repositioning for these types of migrations, right?

158
00:12:48,509 --> 00:12:52,544
And kind of building the catalog; it's highly repeatable across enterprises.

159
00:12:52,544 --> 00:13:00,606
And so unlike Roslyn, which, uh, kind of works on a single repository in the IDE,

160
00:13:00,606 --> 00:13:09,477
OpenRewrite was developed to run outside of IDEs, to accumulate different transformation
steps for migrations.

161
00:13:09,477 --> 00:13:23,257
With Moderne, we actually have a technology that serializes these LSTs that we produce for
repositories, so we can study them and work with them in a horizontally scalable manner.

162
00:13:23,257 --> 00:13:28,469
OpenRewrite allows the developer to consume a framework migration, or a Java 8 to

163
00:13:28,469 --> 00:13:31,100
25 migration, on a single repository.

164
00:13:31,100 --> 00:13:39,171
They can polish it, work with it, understand whether maybe some architectural changes are
necessary alongside it.

165
00:13:39,171 --> 00:13:49,951
But with Moderne we can study the codebases at scale with this catalog of recipes, which
are units of transformation, which could be as small as a method-name change or as large as a

166
00:13:49,951 --> 00:13:50,942
Spring Boot migration.
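
For context, OpenRewrite recipes of the composite kind described here are typically
declared in YAML. A hedged sketch, in which the recipe name and method pattern are
invented for illustration, though `ChangeMethodName` is a real built-in recipe:

```yaml
# Illustrative declarative recipe: the com.example names are hypothetical.
type: specs.openrewrite.org/v1beta/recipe
name: com.example.StandardizeLogging
displayName: Standardize logging calls
description: Composing a small transformation unit into a named recipe.
recipeList:
  - org.openrewrite.java.ChangeMethodName:
      methodPattern: com.example.HomegrownLogger log(String)
      newMethodName: info
```

Larger migrations, like a Spring Boot upgrade, are built the same way: a `recipeList`
composing many such small units.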

167
00:13:50,942 --> 00:14:00,298
I think you're on a really interesting path here, because it seems like a foregone
conclusion, which is: we already have too much source code in the world that's riddled
with changes

168
00:14:00,298 --> 00:14:01,219
that need to be made.

169
00:14:01,219 --> 00:14:10,305
I'll not use the word tech debt so I don't get any angry letters, but if we just assume
there are changes that we want to make to our services, upgrades, patch changes, remove

170
00:14:10,305 --> 00:14:15,109
vulnerabilities, change versions, or swap out libraries, we have a whole list of things we
want to do.

171
00:14:15,109 --> 00:14:15,590
And.

172
00:14:15,590 --> 00:14:24,946
At the same time, now we're using LLMs which are generating an immense amount of garbage
code, duplication, unnecessary, wrong in some way.

173
00:14:24,946 --> 00:14:33,470
The ability to even consume that and understand what's going on is problematic and yet we
know that there is a concrete value associated with making these changes.

174
00:14:33,470 --> 00:14:34,651
How can we even do that?

175
00:14:34,651 --> 00:14:42,482
Which brings us to the conclusion of there must be improvements to our tool chain in order
to actually automatically make those changes.

176
00:14:42,482 --> 00:14:51,743
Like, why would anyone go into their IDE and open up and do even a regex search to replace
a string, when what you need to do is so much more complicated than what you can write

177
00:14:51,743 --> 00:14:52,724
programmatically

178
00:14:52,724 --> 00:14:56,817
Replacement very quickly failed to be done with regex.

179
00:14:56,817 --> 00:15:01,211
Just like you mentioned: you see logger dot something, but which logger are you looking at?

180
00:15:01,211 --> 00:15:02,191
You don't know.
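
The "which logger" problem is easy to see with a toy example. A minimal Python sketch
(the receiver names are invented) of why a pure text/regex rewrite over-matches when two
different types happen to share a method name:

```python
import re

# Two call sites whose receivers have different (imagined) types; plain text
# can't tell them apart, which is why type-attributed trees are needed.
source = (
    'audit.info("user logged in")\n'   # pretend: a homegrown audit trail
    'logger.info("request served")\n'  # pretend: the standard logger
)

# A naive regex rename of the logging call hits BOTH receivers:
rewritten = re.sub(r"\.info\(", ".logInfo(", source)
assert rewritten.count(".logInfo(") == 2  # over-matched: both lines changed
```

A semantic tree, by contrast, knows the type of each receiver and can restrict the rename
to the one logger that was actually meant.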

181
00:15:02,191 --> 00:15:04,390
This is, like, your second open source tool.

182
00:15:04,390 --> 00:15:10,255
There was leadership at Pivotal with Spinnaker, and now on to OpenRewrite at Moderne.

183
00:15:10,255 --> 00:15:18,177
It seems like you absolutely are in, like, you prefer the open source community. Has it
been all sunshine and rainbows, or...

184
00:15:18,177 --> 00:15:27,674
So with open source, we knew that the underlying framework has to be open source, just
because we have so many third-party and open source libraries and dependencies.

185
00:15:27,674 --> 00:15:38,606
In order to scale this as an ecosystem, we need engagement from a lot of framework
authors to help create refactoring recipes to move their consumers forward.

186
00:15:38,606 --> 00:15:40,609
so the core framework is open source.

187
00:15:40,609 --> 00:15:44,333
We work with a number of framework authors, Quarkus,

188
00:15:44,333 --> 00:15:53,763
Micronaut and many others, contributing recipes. Whenever they make a breaking change to
their library, they create a recipe that migrates their consumers. And sort of the unit

189
00:15:53,763 --> 00:15:54,904
economics of change.

190
00:15:54,904 --> 00:15:56,546
It's kind of the best of both worlds.

191
00:15:56,546 --> 00:16:01,572
The framework authors can make changes to their framework and adopt the best patterns.

192
00:16:01,572 --> 00:16:08,861
So if they change their mind between versions of frameworks, they can make the change and
not lose all of their consumers at the same time.

193
00:16:08,861 --> 00:16:13,325
and then consumers can be upgraded at the time they get the new best library.

194
00:16:13,325 --> 00:16:18,110
Unfortunately, not all software framework authors made such changes.

195
00:16:18,110 --> 00:16:26,055
Some went the path of, I'm going to be backpatching and charging millions of dollars from
my consumers for private fixes.

196
00:16:26,055 --> 00:16:35,695
And then also what happened is two years ago, Amazon Q Code Transformer announced the
Migration Assistant, which was based on OpenRewrite.

197
00:16:35,695 --> 00:16:46,270
There was the IBM Assistant for Migrations that also was based on OpenRewrite, Microsoft
Copilot, Broadcom Application Advisor, also based on OpenRewrite.

198
00:16:46,270 --> 00:16:58,436
It's interesting you bring up that other open source maintainers would actually vie for
the opportunity to create a recipe to allow their users, their dependents, to migrate

199
00:16:58,436 --> 00:16:59,373
between versions.

200
00:16:59,373 --> 00:17:05,912
Usually there's a changelog or even a migration document, but it's very high level and
doesn't really help you in any way.

201
00:17:05,912 --> 00:17:10,485
I think it depends on the people and how popular their framework is.

202
00:17:10,485 --> 00:17:13,758
It's just, it's such a benefit to their consumers.

203
00:17:13,758 --> 00:17:18,911
For example, we now have a recipe that migrates from Spring to Quarkus as well.

204
00:17:18,911 --> 00:17:24,615
So you not only can migrate between the versions of one framework, you can move people
from one framework to another.

205
00:17:24,615 --> 00:17:30,632
And Quarkus is contributing, providing their consumers with recipes for migrations, both
between

206
00:17:30,632 --> 00:17:33,765
Spring and Quarkus and between versions of Quarkus.

207
00:17:33,765 --> 00:17:37,390
We've seen very different behavior from different people.

208
00:17:37,390 --> 00:17:43,058
Honestly, that's, I think, the unit economics of, like, a framework author making one API
change in one place.

209
00:17:43,058 --> 00:17:51,232
so one of the challenges I want to ask you about is sort of the trust you put in your own
tool that you've created here, especially the Open Rewrite ecosystem.

210
00:17:51,232 --> 00:17:57,731
I think as someone who in the past has done extensive software development, my concern is
always, can I trust the

211
00:17:57,731 --> 00:18:02,795
Yeah, it's a very common question that we get as people try to adopt the tool.

212
00:18:02,795 --> 00:18:05,777
And I think we distinguish between two types of changes.

213
00:18:05,777 --> 00:18:08,779
And we start people with very small, simple changes.

214
00:18:08,779 --> 00:18:10,770
For example, Log4Shell remediation.

215
00:18:10,770 --> 00:18:14,531
It's not optional whether to remediate it or not.

216
00:18:14,531 --> 00:18:17,614
And the timelines are very tight, like, you do it now.

217
00:18:17,614 --> 00:18:21,956
And the fix is like two lines of code inserted surgically into the application.
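
As a sketch of what such a surgical, rule-based fix can look like in this ecosystem, a
dependency-version bump can itself be expressed as a declarative recipe. The recipe name
below is invented, and the built-in recipe identifier and its parameters are from memory,
so they should be verified against the OpenRewrite docs:

```yaml
# Illustrative: move log4j-core past the Log4Shell CVE via a declarative recipe.
type: specs.openrewrite.org/v1beta/recipe
name: com.example.RemediateLog4Shell
displayName: Upgrade log4j-core past the Log4Shell CVE
recipeList:
  - org.openrewrite.java.dependencies.UpgradeDependencyVersion:
      groupId: org.apache.logging.log4j
      artifactId: log4j-core
      newVersion: 2.17.x
```

Because the rule is declarative, the same two-line change lands identically in every
repository it is run against.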

218
00:18:21,956 --> 00:18:30,878
With a rule-based system, unlike with an AI assistant, if it does the right thing in one
place, you know that it will make the same change across the code base.

219
00:18:30,878 --> 00:18:35,920
With a manual change, or an AI-assisted change, you have to review every single occurrence
of it.

220
00:18:35,920 --> 00:18:41,633
With the rule-based system, at some point you test it, you prove it to be right, you now
know.

221
00:18:41,633 --> 00:18:47,096
Like things like Gradle wrapper upgrades or like...

222
00:18:47,096 --> 00:18:49,037
minor or patch version upgrades.

223
00:18:49,037 --> 00:18:52,800
Like, we do it with automation and just mass-PR it out.

224
00:18:52,800 --> 00:19:02,805
But then there are, like you said, the difficult framework migrations, where the recipe
may be, like, because the open source ecosystem is so deep and it so depends on what you

225
00:19:02,805 --> 00:19:05,126
use from that open source ecosystem.

226
00:19:05,126 --> 00:19:11,610
Like you may find that the open source recipe makes only 80 % of the changes that you need
to make.

227
00:19:11,610 --> 00:19:13,831
Then you look at what changes are left over.

228
00:19:13,831 --> 00:19:17,454
You may decide to write more recipes to cover that.

229
00:19:17,454 --> 00:19:19,055
Or maybe you make changes.

230
00:19:19,055 --> 00:19:21,295
And then you do need to test it.

231
00:19:21,295 --> 00:19:27,218
So those are kind of the pull-based changes that developers need to pull onto their
workstations and test out.

232
00:19:27,218 --> 00:19:35,892
And these are also not optional, because Spring Boot 1 has vulnerabilities and you don't
want to pay millions of dollars to the vendor.

233
00:19:35,892 --> 00:19:38,724
So, but the timelines are different than...

234
00:19:38,724 --> 00:19:45,991
You do it maybe in between sprints, or you plan with your business owner or product
manager, saying, like, we will do this here.

235
00:19:45,991 --> 00:19:53,457
It's like, we'll make the application look modern and we'll build more features faster,
but I need this downtime in this period.

236
00:19:53,457 --> 00:19:55,422
And we kind of align on where it is.

237
00:19:55,422 --> 00:20:00,117
So right now we support Java and infrastructure as code.

238
00:20:00,117 --> 00:20:07,106
So Kubernetes manifest remediation, Terraform, CI/CD pipelines, Docker images, things like
that.

239
00:20:07,106 --> 00:20:16,494
Because infrastructure as code is copy-paste drift, it's about being able to see across
the repositories what you have there and being able to uplift it all together.

240
00:20:16,494 --> 00:20:18,627
We just announced JavaScript support.

241
00:20:18,627 --> 00:20:21,410
And Python and C# are under development.

242
00:20:21,410 --> 00:20:28,248
We will become a sort of universal platform for code maintenance and modernization and
evolution.

243
00:20:28,300 --> 00:20:29,430
Wow.

244
00:20:29,430 --> 00:20:32,342
so, sorry, I have to think about that for a moment.

245
00:20:32,342 --> 00:20:42,496
My question is then the biggest challenge must be not only understanding how one would
write code in those languages, but what is idiomatic and more than that, what is the

246
00:20:42,496 --> 00:20:45,957
actual structure of the language in order to correctly parse it?

247
00:20:45,957 --> 00:20:54,443
You're writing a language parser, but getting into the LSTs that you mentioned seems like
a huge challenge for some languages more so than others.

248
00:20:54,443 --> 00:21:04,254
So it's interesting: we discovered what is called in academia the C language family, which
is like C, C#, Java, JavaScript, Python.

249
00:21:04,254 --> 00:21:16,034
They look very different as source code, but their abstract syntax trees are very similar:
the for loops, the if-else, you know, the method calls, et cetera.

250
00:21:16,034 --> 00:21:16,685
And so.

251
00:21:16,685 --> 00:21:22,948
We actually are able to reuse the underlying Java-based implementation and extend it for
additional languages.

252
00:21:22,948 --> 00:21:25,791
So we have a reuse of the Recipe Catalog on day one.

253
00:21:25,791 --> 00:21:27,973
We build the LST.

254
00:21:27,973 --> 00:21:29,653
To build an LST is very hard.

255
00:21:29,653 --> 00:21:40,041
We actually invoke compilers for each language and we guide the compiler through the first
two stages, where it does the abstract syntax tree and the semantic information about the
code.

256
00:21:40,041 --> 00:21:42,432
And then we extract it out of the compiler.

257
00:21:42,432 --> 00:21:43,554
The compiler doesn't care

258
00:21:43,554 --> 00:21:45,486
about this data representation.

259
00:21:45,486 --> 00:21:57,218
It wants to start writing machine code, but we stop it there and we create this lossless
semantic tree that is serializable and so on to be able to operate on it for refactoring

260
00:21:57,218 --> 00:21:58,089
the source code.
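
A minimal sketch of that "stop the compiler and serialize the tree" idea, again using Python's standard-library front-end as a stand-in (OpenRewrite's actual LSTs carry more, such as type attribution and formatting):

```python
import ast
import json

src = "total = price * quantity\n"

# Run only the parsing stage; we never ask for bytecode or machine code.
tree = ast.parse(src)

# Map the compiler's internal nodes into a plain, serializable structure,
# loosely analogous to extracting a lossless tree that refactoring tools
# can store and operate on later.
def to_dict(node):
    if isinstance(node, ast.AST):
        return {"kind": type(node).__name__,
                **{f: to_dict(v) for f, v in ast.iter_fields(node)}}
    if isinstance(node, list):
        return [to_dict(n) for n in node]
    return node  # identifiers, constants, None

blob = json.dumps(to_dict(tree))   # serializable...
roundtrip = json.loads(blob)       # ...and recoverable
print(roundtrip["kind"])           # Module
```

The point of the exercise is that once the tree lives outside the compiler, refactoring becomes tree manipulation plus printing, rather than re-running a build.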

261
00:21:58,089 --> 00:22:00,841
So very, very deep IP

262
00:22:00,841 --> 00:22:03,670
How do you do the work to integrate with those compilers?

263
00:22:03,670 --> 00:22:10,576
You need to figure out how to invoke the compiler and look at the compiler internal data
structures and extract the data.

264
00:22:10,576 --> 00:22:22,002
I think the interesting part is that language parser development is very tedious for
developers and highly repetitive, because you take a look at those data structures, you

265
00:22:22,002 --> 00:22:25,395
obviously need to make decisions how you do it, but once you decide...

266
00:22:25,395 --> 00:22:30,529
You need to move from those data structures to OpenRewrite data structures and map
things.

267
00:22:30,529 --> 00:22:33,892
Recipe development is also kind of very similar.

268
00:22:33,892 --> 00:22:40,057
And the interesting part is that the coding assistants are very capable of doing this
repetitive work.

269
00:22:40,057 --> 00:22:48,074
And that's where we see a lot of acceleration, in parser development for all the
languages as well as in recipe catalog growth,

270
00:22:48,074 --> 00:22:49,806
very quickly these days.

271
00:22:49,806 --> 00:22:56,343
So the cost of custom recipe development went close to zero and parser development
significantly accelerated.

272
00:22:56,343 --> 00:23:02,142
But every open source library out there would potentially still need to write their own
recipes, right?

273
00:23:02,142 --> 00:23:06,536
So you could like bootstrap the recipe development for this library.

274
00:23:06,536 --> 00:23:14,543
If it has a good description of what it is that they're changing, you give it to Claude
Code and it starts developing recipes.

275
00:23:14,543 --> 00:23:26,636
So OpenRewrite has a very declarative test framework where, you know, you insert that unit
test's before and after, and it's hard for the model to cheat on tests.

276
00:23:26,636 --> 00:23:29,957
So that was a great investment uh that we've had.

277
00:23:29,957 --> 00:23:38,388
Is the model that you're expecting the open source maintainers to be utilizing a
foundation model that you've developed, something that you have fine-tuned, or

278
00:23:38,388 --> 00:23:43,028
at this point is it just a matter of the available LLMs from providers out there?

279
00:23:43,028 --> 00:23:53,899
We actually worked with all of them, like Anthropic, Gemini, OpenAI. We work with all
of them via the API and tested a variety of different models.

280
00:23:53,899 --> 00:23:59,332
And they all worked similarly, with no significant differentiation between them.

281
00:23:59,332 --> 00:24:04,058
So we feel like this space is good enough, we don't need to fine-tune or

282
00:24:04,058 --> 00:24:07,917
train our own models; we just allow customers to bring their own model.

283
00:24:07,917 --> 00:24:14,558
In a way it avoids owning that concern yourself and allows you to push that down to where
it's actually necessary,

284
00:24:14,558 --> 00:24:14,818
Yeah.

285
00:24:14,818 --> 00:24:24,964
And customers very quickly start writing their own recipes internally because, in
addition to consuming open source, they usually also have some sort of internal

286
00:24:24,964 --> 00:24:28,517
framework on top of which a lot of business applications are made.

287
00:24:28,517 --> 00:24:33,351
So they need to create the coverage for this part of the stack that they have.

288
00:24:33,351 --> 00:24:40,226
And they can bring whatever tools developers already use for writing more recipes.

289
00:24:40,226 --> 00:24:40,676
Yes.

290
00:24:40,676 --> 00:24:51,660
So all of our experiences with the models today, and we do write a lot of recipes and a
lot of other code with coding assistants of various kinds, point us in the direction that

291
00:24:51,660 --> 00:24:53,742
these agents are really not autonomous.

292
00:24:53,742 --> 00:24:54,912
They are amazing.

293
00:24:54,912 --> 00:25:04,188
They do great stuff for us, but the developers need to be closely involved in what they're
doing to make them go in the right direction.

294
00:25:04,188 --> 00:25:06,439
I think a funny thing happened yesterday.

295
00:25:06,439 --> 00:25:07,260
I saw it in

296
00:25:07,260 --> 00:25:12,153
our fun Slack channel: a developer screenshotted what the model told him.

297
00:25:12,153 --> 00:25:14,533
And it said, I'm getting confused here.

298
00:25:14,533 --> 00:25:18,069
Would you like me to continue or would you like to debug this for me?

299
00:25:18,069 --> 00:25:20,222
They don't have access to debugger right now, right?

300
00:25:20,222 --> 00:25:21,874
It's not one of the tools that they have.

301
00:25:21,874 --> 00:25:27,143
And he said, this is the first attempt of AI using humans as MCP tools.

302
00:25:27,143 --> 00:25:35,520
It's interesting you bring that up, because when we were talking with incident.io, they had
brought up the challenge that when there is a production incident, they actually want to

303
00:25:35,520 --> 00:25:37,645
suggest a pull request to fix the problem.

304
00:25:37,645 --> 00:25:42,489
something simple like a null reference exception or something else to actually generate
that pull request.

305
00:25:42,489 --> 00:25:45,472
And to do that, they need to understand the source code.

306
00:25:45,472 --> 00:25:54,142
And the strategy has been that they need to run the customer source code in a protected,
secure virtual machine to actually do that debugging.

307
00:25:54,142 --> 00:26:05,144
I feel like in a way you have quite an interesting alternative here, which is if you are
generating LSTs, you actually in a way don't need to do a runtime debugging session

308
00:26:05,144 --> 00:26:10,640
because you can fully understand what the source code is supposed to be doing
intentionally.

309
00:26:10,640 --> 00:26:16,907
So relying on an MCP out to a real tool or MCP out to a human to perform stuff.

310
00:26:16,907 --> 00:26:17,368
I think

311
00:26:17,368 --> 00:26:23,167
I think we should be careful, because I totally see a bunch of companies jumping on
that bandwagon, especially for asynchronous work.

312
00:26:23,167 --> 00:26:32,277
I think what you've built here is actually really clever because it provides a very
technical, well, it provides a very deep technical solution to understanding the

313
00:26:32,277 --> 00:26:42,335
complexity of a code base at scale. Historically, especially when we look at things like
LLMs, they're always going to be limited by small context windows.

314
00:26:42,335 --> 00:26:46,188
We know from the research, large context windows don't solve problems.

315
00:26:46,188 --> 00:26:51,033
Small context windows, which means utilizing tools that are actually able to consume a
whole

316
00:26:51,033 --> 00:26:57,876
repository or realistically understand what the code is doing at a technical level without
actually having to read every individual piece.

317
00:26:57,876 --> 00:27:09,465
It seems like one of the critical components for actually allowing LLMs or our usage of
LLMs through agents or some other complex asynchronous processing to function at a higher

318
00:27:09,465 --> 00:27:09,856
level.

319
00:27:09,856 --> 00:27:15,568
Yeah, I think LLMs are data hungry and they want right-sized data as well.

320
00:27:15,568 --> 00:27:21,653
Like if you give them the whole repository as text to read, they, like you said, lose
attention and cannot find it.

321
00:27:21,653 --> 00:27:30,847
It's like giving a human a book versus giving a human a paragraph: where can they find the
context of what they need better? It's in the paragraph.

322
00:27:30,919 --> 00:27:38,896
It's interesting you bring up that analogy because I think that it's something that we're
going to continue to see over and over again that the constructs that we've created to

323
00:27:38,896 --> 00:27:49,883
help our human societies advance in both technology industries and non-tech alike are
being rediscovered through the creation and the improvements of LLMs.

324
00:27:49,883 --> 00:27:52,767
Every single time I feel like a company jumps up and down and says,

325
00:27:52,767 --> 00:27:55,349
look, we figured out a really important thing.

326
00:27:55,349 --> 00:27:57,690
And then we can point to like five other examples.

327
00:27:57,690 --> 00:28:05,766
I think the one that had come up recently for me was, you know, if we use the example of a
book, there's usually a table of contents and in the back, some sort of index.

328
00:28:05,766 --> 00:28:12,194
And it's like, we should have an llms.txt file or an AGENTS.md file that, you know,
explains the different things that could happen.

329
00:28:12,194 --> 00:28:13,773
And I'm like, yes, of course.

330
00:28:13,773 --> 00:28:15,428
We always knew that was the case.

331
00:28:15,428 --> 00:28:24,947
That's why books have these things, terms and definitions at the back or references,
and at the front a good overview, because we know that for very intelligent entities and

332
00:28:24,947 --> 00:28:26,629
organisms, we need those things.

333
00:28:26,629 --> 00:28:33,276
So there's no way that an LLM would be able to make progress without also having those
exact same things.

334
00:28:33,276 --> 00:28:42,714
And where we've discovered complicated technical processes or tools that we've developed
for ourselves, we've seen in a lot of tools the idea of

335
00:28:42,714 --> 00:28:52,519
attribute-based programming or reflection be a real thing, there's no reason why that
interface should be excluded from LLMs, and you provided the capability to actually make

336
00:28:52,519 --> 00:28:54,432
that happen by exposing LSTs

337
00:28:54,432 --> 00:28:59,217
I think the LLM paradigm of tool calling was like really game changing.
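
The tool-calling loop mentioned here can be sketched schematically (entirely illustrative, with a hard-coded stand-in for the model and no real LLM API): the model's turn is a structured tool call, and the runtime dispatches it and feeds the result back.

```python
import json

# The runtime's registry of tools the model is allowed to call.
TOOLS = {
    "grep": lambda pattern, text: [line for line in text.splitlines()
                                   if pattern in line],
}

def fake_model(history):
    # Stand-in for a real LLM: always asks to grep for TODOs.
    return json.dumps({"tool": "grep",
                       "args": {"pattern": "TODO", "text": "done\nTODO: fix\n"}})

def run_turn(history):
    call = json.loads(fake_model(history))
    result = TOOLS[call["tool"]](**call["args"])       # dispatch the tool call
    history.append({"call": call, "result": result})   # observation for the next turn
    return result

print(run_turn([]))  # ['TODO: fix']
```

The joke about "humans as MCP tools" fits this shape exactly: from the model's side, a debugger, a grep, or a person answering a question are all just entries in that registry.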

338
00:28:59,217 --> 00:28:59,797
I agree with you.

339
00:28:59,797 --> 00:29:06,081
I think the interesting thing here is that a lot of companies stand up and say, like, no,
this is the new best thing ever.

340
00:29:06,081 --> 00:29:07,722
This is the only thing we need.

341
00:29:07,722 --> 00:29:13,306
I think what we keep seeing actually in practice is all of the tools together are
important.

342
00:29:13,306 --> 00:29:22,171
Like if you have a toolbox with a bunch of tools in it, you likely still need the
instruction manuals for how to use those tools or a list of what those tools are or how

343
00:29:22,171 --> 00:29:23,042
they're being utilized.

344
00:29:23,042 --> 00:29:26,605
But then there's a whole bunch of other things and other scenarios where you do want the

345
00:29:26,605 --> 00:29:31,318
like literal recipes for say cooking a particular dish in your kitchen, right?

346
00:29:31,318 --> 00:29:34,010
That's not listed on any of the tools that are available.

347
00:29:34,010 --> 00:29:38,964
Your tools are your blender or your stand mixer or spoons.

348
00:29:38,964 --> 00:29:43,527
Yes, you need all those to actually work effectively, but where is the recipe still?

349
00:29:43,527 --> 00:29:47,660
And your catalog of recipes, like, those things still need to exist.

350
00:29:47,660 --> 00:29:54,175
I think the only mistake here is either assuming we've all, like, solved everything, or
that there is, like, one new

351
00:29:54,175 --> 00:29:55,984
innovation just around the corner that

352
00:29:55,984 --> 00:30:02,931
Yeah, historically we just built higher-level abstractions, and not just in software but
everywhere else as well.

353
00:30:02,931 --> 00:30:05,553
I think that leads us in a lot of different possible directions.

354
00:30:05,553 --> 00:30:09,241
And I think some of those topics we've explored on other episodes of the show,

355
00:30:09,241 --> 00:30:12,716
Yeah, so at this point we will move on to picks.

356
00:30:12,716 --> 00:30:15,330
So Olga, what did you bring for us today?

357
00:30:15,330 --> 00:30:24,387
I bring an observation and sort of like we talked about LLMs and AI where it's going and
what's going to be next.

358
00:30:24,387 --> 00:30:36,088
And my observation from the last few weeks is, like I mentioned, at Moderne we work a lot
with Claude Code, and we noticed that first there is the Sonnet group of models and then

359
00:30:36,088 --> 00:30:40,972
there is an Opus group of models, and we've seen Opus 4.1

360
00:30:40,972 --> 00:30:45,967
marked as legacy, which was the most capable reasoning model.

361
00:30:45,967 --> 00:30:49,531
And I just wonder why is this happening?

362
00:30:49,531 --> 00:30:52,624
Is this the cost of running this model for Anthropic?

363
00:30:52,624 --> 00:30:57,328
Is the cost going to go down eventually and they'll return these capabilities?

364
00:30:57,328 --> 00:30:58,470
I don't know.

365
00:30:58,470 --> 00:31:00,652
We're living in very much an uncertain time.

366
00:31:00,652 --> 00:31:03,513
Every morning I wake up, I look: what's going to be

367
00:31:03,513 --> 00:31:11,626
new and exciting in this space. And I just hope we get to a point where we have the
return of these capabilities.

368
00:31:11,626 --> 00:31:14,437
Yeah, I think that's what everyone's waiting for.

369
00:31:14,437 --> 00:31:17,174
there is this expectation that they get better in a particular direction.

370
00:31:17,174 --> 00:31:19,537
And I find that we don't have that.

371
00:31:19,537 --> 00:31:22,809
I think the technology is amazing, but it's still too expensive.

372
00:31:22,809 --> 00:31:33,940
And so, at what point does it drop down in cost? We see a lot of build-out of data center
capacity, as well as the energy needed to power it.

373
00:31:33,940 --> 00:31:35,043
And we'll see.

374
00:31:35,043 --> 00:31:40,574
Oh yeah, we've gone extensively into the energy costs associated with that in previous
episodes.

375
00:31:40,574 --> 00:31:43,065
So we'll leave that out of this episode.

376
00:31:43,065 --> 00:31:47,196
And I guess I'll share my pick for today; I had something different planned.

377
00:31:47,196 --> 00:31:57,719
But since you reminded me that you worked at Dell in the past, I brought in my favorite
computer, which is the XPS, over

378
00:31:57,719 --> 00:32:08,254
years old I think, but I absolutely love this laptop. It is fantastic. I'm a little bit
disappointed that Dell decided to stop their XPS line, so I've been recommending it

379
00:32:08,254 --> 00:32:12,778
to everyone that's, well, that's thinking about getting a new laptop. I don't know what it
is, I just

380
00:32:12,778 --> 00:32:16,335
it's unfortunate when the things we love get discontinued.

381
00:32:16,335 --> 00:32:17,886
Yeah, and I actually don't fully understand.

382
00:32:17,886 --> 00:32:21,366
I don't know if it was a matter of them merging the lines together and they're

383
00:32:21,366 --> 00:32:31,146
I'm still on this old laptop, which is just not quite up to date anymore, but still, I can
open two instances of my IDE on it and that's as much as I need.

384
00:32:31,146 --> 00:32:36,209
So thank you, Olga, so much for coming and sharing with us all about OpenRewrite.

385
00:32:36,209 --> 00:32:38,591
It was a pleasure, I really enjoyed the conversation.

386
00:32:38,591 --> 00:32:39,773
I'm glad to hear it.

387
00:32:39,773 --> 00:32:46,909
Thanks again to all our listeners for showing up for today's episode. We'll see you all
again, hopefully next week.

