speaker-0 (00:07.896) Welcome back to Adventures in DevOps. Every episode is a deep dive with an expert guest. Today's adventure focuses on automation and AI, and hopefully a combination of the two. The expert is a longtime architecture consultant on everything web and now principal of DevRel, JavaScript, AI and cloud at Microsoft. Dan Wallin, welcome to the show. speaker-1 (00:27.714) Hey, great to be here, Warren, and I look forward to the conversation with you. speaker-0 (00:31.118) Yeah, you know, I couldn't help but notice on your LinkedIn profile, you were getting out of consulting at the exact moment that I feel like many people were getting in. The start of the pandemic must have been quite the transformation for you. speaker-1 (00:42.146) You know, I ran a consulting company. I did a lot of architecture and then coding, and then also a lot of training as well, across pretty much the US, but we also did some international. But anyway, long story short, I did that for about 20 years and I traveled a lot, and I mean sometimes three out of four weeks a month. And, you know, when you have kids and a family and all that, and after 20 years, when that time period hit, I was kind of like, okay, it's kind of nice staying at home. Where everybody else was like, I want to get out, I'm like, I'm okay. Like, I can stay at home. This is great. And yeah, so that's when an opportunity at Microsoft came up, and I'm like, you know, this feels like a good time to change things up. And yeah, so here we are, it's almost six years later now. speaker-0 (01:31.884) Yeah, it almost seems like it's been forever or it didn't exist. I don't really know how time works since the pandemic. It's always been a little bit confusing for me. speaker-1 (01:39.116) Isn't it just the strangest? Yeah, because it feels like it was forever. But at the same time it wasn't that long ago.
But anyway, yeah. speaker-0 (01:47.214) Well, I still see stuff happening from 2015 or 2016, and that's been a decade now, and it really doesn't feel that long. speaker-1 (01:56.216) Yeah, as I get older... my mom always used to be like, I'm telling you, enjoy it when you're young, because time moves faster when you get older. I'm like, nah. And now I'm like, yeah, mom was right. Mom was right. speaker-0 (02:10.146) So as the Microsoft cloud advocate here, I think we need to hear a pitch of what Azure is doing better than everyone else these days. speaker-1 (02:17.454) I think the big area that Azure excels in, in general, is we're like the ultimate Lego building block cloud. Anything you want is there. I think the challenge with that, of course, is when you have all the security features and, you know, some of the DevOps stuff, of course, the different services, and the list goes on and on and on. I think the challenge we always face is how do you make it so that there's a really good Lego manual. speaker-0 (02:51.074) I like the comparison to LEGOs. I like the idea that, you know, they're individual pieces that you can stick together with security modules, et cetera. The one thought that does come to my mind is stepping on LEGO pieces. speaker-1 (03:02.114) Yeah, well, I've done that before. It's very painful. The only thing worse than that, and I live in the desert in Arizona, is scorpions. Those are even worse. But anyway, Legos are right up there with scorpions. Yeah, and I think that's the challenge, you know, especially from an enterprise standpoint, having worked with enterprises, big, big enterprises, for 20-plus years in the consulting world. They absolutely need that flexibility. But at the same time, say you're a startup or something who's brand new to the cloud. Like, you have this fantastic idea. You used AI to maybe even vibe code the idea, just to get a prototype out there to experiment with.
Because it seems like that's what we're doing these days, Warren. That's where I think it can be a little challenging, actually: you're just kind of overwhelmed, because there's so much power there that you can take advantage of. And that's kind of part of my job, actually, that I do now. I work on end-to-end solutions with AI, all the way from "I have an idea" to "I need to get it up there somehow." You know, what does that process look like? And it's super fun. But there's also just a lot to know. And I know you, working in this world, could probably sympathize, because it's not unique to Azure. AWS, GCP, there's just a lot of power available. speaker-0 (04:26.99) I really like this analogy, and I'm tempted to take it even further. You mentioned basically having the guides or the construction manuals for whatever the end product is. And I think this is something that historically used to be concrete building blocks, where you would actually go out and have a repository that was literally the code: the infrastructure that should be deployed, the individual services that need to be turned on, and the configuration that has to be there in order to get the end service or product, whatever it is you're building. Let's say it's a recommendation engine or some sort of website. And the corollary that I want to draw is that a lot of times, and especially now, we don't have the book, the construction manual, for whatever the thing is we're building anymore. I feel like it's gotten a lot leaner, especially since LLMs have come around. I feel like we get those a lot less frequently, or, on the flip side, get a lot more that are somehow like, "you could do this weird thing." Like, I look at AWS, so I see this all the time: someone vibe coded a service with specific functionality to work on AWS, and I'm just like, what was the point of that? Is that actually helping anyone?
speaker-1 (05:35.616) Yeah, and you bring up a really good point, though, with LLMs and AI in general. Just literally last week... I'm on a project right now that helps with basically the deployment of popular open source projects to Azure. And I would argue that if you didn't know AWS or Azure or GCP or whatever really well, it could actually be kind of challenging, because some of these OSS projects, especially some of the bigger, more popular ones, there are quite a few moving parts to them. So it's not just as simple as, you know, I'm deploying a web app service and I'm good, or a serverless function or something like that. There's a bit more to the story. And I've been pretty amazed, I'll have to admit, with skills. Skills are just a really big thing now in the AI world. And it turns out they work really, really well for these deployment scenarios. So I'll give you an example. One of the ones I'm working with is N8N. I don't know if you've ever... speaker-0 (06:38.068) Yeah, sure, it's the workflow orchestrator. speaker-1 (06:41.1) Yeah, the workflow orchestrator, and it's actually not too bad to deploy. There are some moving parts, but it's not ridiculous compared to some of the others. And it's just a matter of setting up a couple of core skills that know what to do. I think in this case I deployed to Azure Container Apps. Actually, I'm a huge container fan, by the way, in general. But without the skill... well, you've probably heard of Terraform, I'm sure, and then we have Bicep as well on the Microsoft side too. And with this command line tool called the Azure Developer CLI, you can pretty easily provision and then deploy your app. But you have to have your infrastructure as code in place. And that's the part, like, for me, that's just not my expertise, Warren. If you're like, hey, Dan, whip this up from memory on the fly, I'd be like, do I get my AI tools?
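[For context, the Azure Developer CLI (azd) flow Dan describes is driven by a small project config plus infrastructure-as-code. A minimal, hypothetical azure.yaml for a containerized app might look like this; all names are illustrative, and the Bicep or Terraform files under infra/ would still need to exist:]

```yaml
# Hypothetical azure.yaml for the Azure Developer CLI (azd).
# Names here are illustrative, not from the episode.
name: n8n-on-azure
services:
  app:
    project: ./src          # path to the app's source
    host: containerapp      # target Azure Container Apps
    docker:
      path: ./Dockerfile    # image built from this Dockerfile
```

[With something like this in place, `azd provision` stands up the resources defined in the IaC and `azd deploy` pushes the app (`azd up` does both), which is the provision-then-deploy flow Dan mentions.]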
Because on my own, it's just not what I do. You know, it's not my specialty. These skills have really changed the game, though. Like, I can't remember if it was The Matrix or Terminator or whatever the movie was, but they've got to go fly a helicopter and they just, you know, plug in. speaker-0 (07:45.038) For sure. speaker-0 (07:56.898) How do you not know the reference? That's just... It's The Matrix, because Trinity and Neo need to rescue Morpheus. speaker-1 (08:07.806) There's one in Terminator 2, isn't there? I don't know. I haven't watched Terminator in so long. I was thinking it was The Matrix and all... Hey, I'm gonna defend myself here. Yeah, you're right, though, because I love The Matrix, but I haven't seen it in a while. speaker-0 (08:21.238) And I would recommend not going back and watching it. There are some parts that are so great, but watching some old movies, I just don't think they hold up. One thing I do have to ask you is, you said you're a huge container fan, and I worry that the alternative is being like, I love on-prem hardware-based architecture where I just go and stand everything up from the basement. Is that the alternative world that you see people utilizing? You know... speaker-1 (08:24.62) You... speaker-0 (08:48.77) Physical machines or virtual machines, or are you comparing it against a set of something more than that? speaker-1 (08:55.018) No, virtual machines is definitely the world I've lived in, especially back in the consulting days. It was almost all virtual machines. Now, there was plenty of on-prem. I used to work with a lot of financial companies, big, big financial companies, and one was a big credit card company. In fact, if we want stories, I've got stories that go way back. speaker-0 (09:15.084) Let's go into it. I want to hear about VMs and credit card companies. speaker-1 (09:19.112) Not fun stories. Yeah, well, this goes way back to the days of COM and COM+. So this kind of dates me.
But anyway, we can get to that later if you want. That was quite the way to break into this world. But anyway, yeah, normally VMs, and, you know, just deploying the containers to the VM at a minimum. Yeah, I mean, I've seen some companies go as simple as, we're going to use Docker Compose and we're going to get the containers running. And then you're like, what happens if the container goes down? And how are you going to scale that? Because it's not really designed for that, unless you're going to Kubernetes or something like that. But I love containers just because of the portability. It just makes it so much easier to work with. speaker-0 (10:02.028) Well, you're bringing up sort of the complexity with open source projects, especially ones... I want to say just projects, but really products or services that are built in or distributed as a container to be run. And I think a lot of people who aren't in this space, or aren't used to deploying pods on Kubernetes or running Docker Compose scripts, don't see how much complexity is really in the management and maintenance of an ecosystem built off of open source services. I remember, even not too long ago, thinking about standing up Bitwarden during the heyday of all of the password managers being atrocious in some way. And they're like, we offer an open source solution, but it's 20 containers. And I'm like, Bitwarden, 20 containers? Like, first of all, I'm a tech person and I could do it. I don't want to do that, but I could. How do you expect someone to just do that, though? In the majority of scenarios, most people who have experience with containers aren't going to just spin up 20 of them. That just seems so ridiculous for maintaining a product. speaker-1 (11:01.646) You almost wonder if that's on purpose, to make the free aspect a little bit less appealing. I don't know on that one. I've never used it. speaker-0 (11:09.646) I know, I'm with you there.
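[The single-host limits Dan calls out, a container dying or needing to scale, can be sketched in a Docker Compose file; the service and image names below are made up for illustration:]

```yaml
# Sketch of what plain Docker Compose can and can't do here.
services:
  app:
    image: example/myapp:latest
    restart: unless-stopped   # Compose will restart a crashed container...
    # ...but scaling stays single-host: `docker compose up --scale app=3`
    # runs three copies on ONE machine. Spreading replicas across machines
    # and rescheduling on node failure is what Kubernetes (or Swarm,
    # Nomad) adds on top.
```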
Actually, I do believe that for a lot of companies that offer an open source solution, it's just a gateway or a runway, a funnel into their paid version. And so, yes, they leave it open source, and we joke that it's not really open source, it's source available. You can look at it, but if you actually try to build and run it, well, good luck. You're never getting that off the ground. Like, I remember a long time ago, I was integrating with GitLab and I wanted to add a permission. The permissions there were, like, full admin access or nothing. And I'm like, well, it would be nice, as a third party writing a plugin, to be able to get access to people's repositories with just read-only. So I actually went through and tried to do the development, and they wrote an engine to check out repositories from GitLab to build it on your machine. Like, that's how complex GitLab open source is. So I just gave up. I'm like, I'm gonna write this code and I'm gonna push it directly to the repository and have their build system build and test it for me. And it was the longest feedback loop I ever had in my entire engineering career, you know, like multiple days even to get this test run. But it was the only way I could figure out how to actually do the development. So, like, I'm totally with you on the container world and the complexity of this. So, you brought up that the solution here is basically skills. And I have to admit, my experience with skills is very little. So maybe we can jump into that and talk a little bit more about what skills are, how they work for LLMs and specifically agents, and how they're being utilized. speaker-1 (12:41.538) Yeah, and this is one where, like you said, I won't claim... I think anyone who claims to be an expert in any of this stuff these days, I would question that statement, because it's just so new.
You know, it's like, how do you know all the best practices, and what's your experience with this long term? And there is no long term. It's too new. So I'll pitch it from that regard: yeah, I've been using them a lot, but we're talking over the last, like, three months. So, for those that aren't familiar with skills: Anthropic, through Claude Code and things like that, introduced them first. That's where it came out of. And now Codex, GitHub Copilot CLI, GitHub Copilot... on GitHub, there's a Copilot that runs up in GitHub, and there's also an agent you can run, and they can all use these skills. And you can think of a skill this way. If your house needed to be painted, or you need an electrician, or you need a plumber, to me, that's a specialist. So that's like an agent to me, right? Because you have a plumber who specializes in the plumbing, and an electrician for the electrical. But say you were doing something really unique with, let's just use electrical as an example. I just had to put in, in US terms, a 240-volt line for something here at the house, literally last week. And let's say that most electricians I call just didn't do that. They don't have that skill. They're an electrician, but they just don't know that angle. Well, a skill would be like the super-specialty information that you could use either on its own, or you could use it with an agent, potentially. So, you know, let's talk about the complexity here of deploying these. Let's go back to your 20-container thing. I would argue back in the day that was a big deal, right? Because if you didn't know what you were doing... you, I think, probably did know what you were doing, it's just that it was really complex. At least I'm gonna assume you knew what you were doing. speaker-1 (14:46.232) Yeah, we're going to go with the benefit of the doubt here.
So these days, if I want to deploy N8N, it's pretty manageable, actually. I'm trying to remember... I think it's, like, two containers or something like that. I think I did Postgres, and I used the Postgres flexible server, it's called, and then ACA for the container. Well, if you have a skill that specializes in knowing, here's how you deploy containers to Azure Container Apps, or whatever your cloud is, right, it doesn't have to be Azure, that skill would have all the details needed. Like, if you're not a container expert, it wouldn't matter, because that skill has those details. So now you hook that skill up, and just think of it as the ultimate knowledge source on whatever the topic is that it specializes in, we'll say ACA in this case. Now you plug that into your coding agent, or your agent in the cloud, it doesn't really matter, any agent, really. But, like, GitHub Copilot CLI is one that is pretty new, and I'm just gonna tell you, I'm a huge Claude Code fan, and I'm also now a huge GitHub Copilot CLI fan. It's freaking amazing. It's the best deal in town, I'm telling you. And I would not say that, even though I know I work for Microsoft, Warren, I would not say that if I didn't believe it, because I'm the type of person who's pretty transparent. But anyway, you now plug that skill in, and now I can say, hey, I need to deploy X to Y. And as long as it knows about that skill, it kicks in, and behind the scenes it's going, okay, I know what to do. And again, it's like The Matrix, I guess, where you jack in and boom, you can fly the helicopter or whatever it is. speaker-0 (16:32.344) What's the context of what's in the skill? Are we talking about basically a written document that explains all the critical aspects of what the service does and how it's deployed and whatnot? So, canonically, what the readme was supposed to do for open source repositories in the past.
speaker-1 (16:47.328) That's actually a pretty good analogy. Yeah, there's a SKILL.md, it's all markdown. You can have other assets, artifacts, resources, whatever you want to call them, that are associated with that. So you could even have scripts, for example. I don't know, there could be, like, SSH calls that are made, and there's a script that makes the call, and that could be part of it, technically. But at a minimum, yeah, it's the SKILL.md file. And it's kind of like you said, it's what probably should have been somewhere in a readme or whatever. But now it's reusable. And so now, if I give you that skill, within literally a minute or two you could also do the same deployment. speaker-0 (17:25.954) And we really need this, because historically there has been documentation on how to do the deployment correctly for open source projects, but I feel like over time that started to degrade, even to some point of, like: what exact distribution are you on? Which version of Linux or which OS are you on, and what dependencies do you have? Are you using containers or Kubernetes, and which version of Docker Swarm are you using, or are you using Nomad? All of that is just such a huge burden from a maintainer standpoint, to answer all of those questions. And I feel like some part of that just goes away, because you're assuming that it's now a dependency or responsibility of the installer, but they don't have that expertise either. So it's now hopefully contained in the agent sphere, but there still have to be the instructions that explain the critical interaction points, or maybe where the main function is, or the number of containers or URLs, et cetera, that aren't going to be easily exposed or understood, that an LLM wouldn't have picked up and trained on, and that information doesn't exist anywhere else. speaker-1 (18:27.982) Yeah, exactly.
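[A skill of the shape Dan describes is just a folder with a SKILL.md, frontmatter plus instructions, and optional supporting files. A minimal, made-up example for the container-deployment case might look like this; the name, steps, and referenced script are all illustrative, not an actual published skill:]

```markdown
---
name: deploy-to-container-apps
description: How to provision and deploy a containerized app to Azure
  Container Apps, including the IaC layout and common pitfalls. Use when
  the user asks to deploy a container to Azure.
---

# Deploying to Azure Container Apps

1. Confirm a Dockerfile exists and the app listens on the port it declares.
2. Generate the infrastructure-as-code (Bicep or Terraform) for the
   container app, registry, and any data services (e.g. Postgres).
3. Provision, then deploy, and surface the resulting URL to the user.

See `scripts/check-prereqs.sh` for preflight checks.
```

[The description in the frontmatter is what lets the agent decide when the skill "kicks in"; the body and any bundled scripts are only loaded once it does.]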
Going back to your readme comment earlier, there's a lot of stuff that either we just haven't had time to document, or it's too detailed, or whatever. Because nowadays you kind of have the standards. You have your AGENTS.md file, which can be at the root of the repo, of course. But I'd call that more of a generalist view of the project that your agents can learn from. When it gets really specialized, that's where the skills come into play. And I've just been pleasantly surprised. So I'll give you an example on this one. We're working on three right now: N8N, Superset, and Grafana. Some of those are more complex than others to deploy. And originally I had one agent, it was OSS Deployer or something, was my agent. And then I had, like, seven skills, because I had one for Postgres and one for ACA and one for security and one for whatever. And then I had a call, probably about three weeks ago now, with a buddy of mine that works at Microsoft, his name's Shane. And he's like, why are you doing it that way? We already have these skills. And I'm like, because I didn't know. And so now I've got it down to, I think, three. I have one that just knows about N8N, one that knows about Superset, and one that knows about Grafana, and then all the other skills for going to Azure, and again, this would be the same for other clouds, they're already pre-built. I just have to know how to plug them in, which was easy. And boom, I'm ready to go. So I could see your complex scenario from earlier, I literally can see that now being down to almost a single prompt with a set of skills under it, and boom. If they wanted to make it easy, which, as we talked about, I'm not sure all companies want to do. speaker-0 (20:11.854) I think the thing that really connected for me is that the information...
and how to do that may already be contained within the training set that the LLM has utilized, for how to do a deployment in any of those technologies. But realistically, one challenge is: are you able to construct the correct prompt to expose that information into the context, so that the LLM, the agent, whatever is utilizing it, can actually do the thing correctly? And can you get around any potential, say, poisoning or injection that was set up in the training data, to have you not just install the open source technology, but also leak your API keys or credentials all over the internet? This gives you a canonical best strategy for dealing with it. You can review the skills, see what's in there, and rather than having to learn that information yourself, you are in a way teaching the LLM to utilize it specifically. And so you don't have to figure out what the best system prompt is to actually do that, or what magic keywords have to be in the user prompt in order to have the right thing happen. Right? I mean, I think the fear that I have, and I'm sure this is already happening: where is the best, you know, trusted canonical list of skills out there? I look at the Linux distro package systems and their issues, and just the ones for every single source... you know, I think everyone hates npm, but... speaker-1 (21:14.114) That's exactly right. speaker-0 (21:32.492) Honestly, it's still the best one I've ever seen as far as package managers go. And there used to be a fight, you know, npm or NuGet, one was owned by Microsoft and the other one not, but I guess now technically both of them are owned by Microsoft. So it's like, there's a list of package managers, and npm is the worst, except for all the other ones. speaker-1 (21:53.93) That's pretty good. I think a lot of people feel that way. speaker-0 (21:59.726) I do think it's getting better over time.
But one thing that I've learned to trust when it comes to canonical package managers is their ability and desire to weed out malicious packages. And I feel like this is one thing that pnpm has gotten right, with this idea of dwell time, or wait time, on newly published packages before utilizing them, in case there is a vulnerability. But we really leave it up to the package repositories to take care of this for us and find those security vulnerabilities, and potentially remove those packages, whether they're doing it directly or through some sort of consensus-based algorithm or user reporting, et cetera. But I see the same problem spilling into skills, and in a way where there are a lot more people who maybe have less expertise in the technical understanding of the complexities and the security issues around them. So are we already starting to see trusted skill repositories set up? Does Azure have one of them? Are there ones you point to? Or are you just going with the old Microsoft strategy, the Windows strategy, really: go to the internet, download some EXE, and run it on your computer completely untrusted. speaker-1 (23:07.118) No, I mean, you hit the nail on the head there. This is absolutely an area where people need to be concerned about just grabbing something off the shelf, whether it's a skill or an agent, because these are just markdown files, of course, but there's a lot of prompt injection things you can do, and more. So, to get to your other question: yes. Like, one we have for GitHub Copilot in general is called Awesome Copilot. It's on GitHub, and it is one that's vetted very heavily. It's just a repo, and people submit PRs and they're reviewed, all that. So not to the point where it's like npm, or pip installs for Python or whatever, where you just have millions of packages. That's a whole other scale. There's skills.sh.
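[The dwell-time idea Warren credits pnpm with is, if I have it right, its minimumReleaseAge setting (added around pnpm 10.16), set in pnpm-workspace.yaml; a sketch, with the excluded scope made up for illustration:]

```yaml
# pnpm-workspace.yaml — supply-chain "dwell time" sketch.
# Refuse to install any package version published less than
# 4320 minutes (3 days) ago, giving the registry time to yank
# a compromised release before you ever pull it.
minimumReleaseAge: 4320
# Illustrative escape hatch for packages you publish yourself:
minimumReleaseAgeExclude:
  - "@my-org/*"
```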
That's another site where you can go find tons of skills. Any skill I don't know about, though, even if I got it from a trusted site, I have my, you know, Copilot CLI or Claude Code or whatever folks use, I have them do a security review on it before I use it. I'll literally say, go look at this markdown. You know, I don't want you to run it, I just want you to scan it. I do the same thing with code repos. Before I try one: go scan this repo. speaker-0 (24:26.398) I think validating untrusted code is one of the unsolved problems in computer science, though, to the degree that we barely even get solutions. The most well-known one out there, at least from my standpoint, which is just a small iota of experience, is AWS's Firecracker, which is what they're using to power Lambda. And it's basically this thing where, well, you look at the cloud providers, they clearly have some sort of security walls around one customer's source code not executing on someone else's memory or storage space. And there are some open source projects that do this, but, like, how much would you trust one of those out there for untrusted code to run in a trusted environment? I'm like, I wouldn't do that. And that's why I want to ask about validating some of these skills, because I wouldn't trust sending that to Claude Code and hoping that it doesn't do the wrong thing accidentally. Because even if you say, don't run this skill no matter what you do, I just want you to evaluate it, I think there are still ways to get around that, just using the English language, or whatever language you're using, to attempt to suggest to the LLM not to execute a potentially dangerous instruction. It's a real thing. speaker-1 (25:42.806) That's it, prompt injection, for example, is just one of many techniques that people can use. And there are actually cases where this has happened. You know, OpenClaw is very popular right now.
And I don't know if you've heard of it, and I actually use it. I have a very locked-down VM I use it on. Nobody can get to it except for me, through my Tailscale. But that's another topic. But I've heard all these stories now of people who just started installing all kinds of skills or agents or whatever it was, and the next thing they know, it was doing really malicious-type things. Yeah, because of what you just said. First off, they didn't vet it. They took it off the internet. What could go wrong, you know? speaker-0 (26:26.488) The sad part is, especially with things like OpenClaw, the agent loop is doing it itself. You're not even telling it to use this particular skill that you then didn't vet. You were just telling it, I want to do this thing. And it says, okay, I tried to do it, I couldn't do it, I found this random skill on the internet that tells me I can do it, I went and installed the skill, and then I executed it. It's like, oops, also, did I tell you that all your API keys for all your cloud providers, I accidentally published those to a public location? And so now there are hundreds and probably thousands of forks of OpenClaw out there that promise to be secure. And I think this is my biggest concern with any of those: the way in which they promise security is through the front door. It used to be like, oh, the gateway is insecure. Like, anyone can fake sending you a Telegram message or a Slack message, and all of a sudden there's a prompt injection attack. I'm like, yes, those are all the standard problems with software that we've had for, I'm going to say, over 50 years now, realistically, through the API. Those aren't the attacks I'm actually concerned with with the LLMs. It's the prompt injection from the data that it's getting itself, that it's choosing to go out to the internet and grab and pull down.
And so it's just so ridiculous to me that we're already at this point where people who don't have the technical capabilities of securing their stuff are trusting what they're getting to be valid and not have any sort of vulnerabilities in it. So much so that companies like Cloudflare and AWS released OpenClaw runners, basically, for you to run your OpenClaw, and I'm like, yeah, sure, you've closed the front door, you've secured that part, but they're still vulnerable to prompt injection attacks. I feel like it's a bit irresponsible to even go down that path. speaker-1 (28:06.094) And you know, the front door is a great analogy, because there's a back door, and there's windows, and there's the attic. There's all kinds of ways you can get in. And anyone who's been around for a long time... I liked your analogy a little while ago. You said, you know, I'm just gonna download this executable off the internet and just run it. Because I remember those days. I'm trying to recall... Tucows, I think it was called, way back in the day. It was Tucows, and it was this website you could go to to just download all kinds of cool apps. And I kind of feel like that's where we are with some of the AI stuff. And we can circle back to the Lego analogy with this as well, because, honestly, I think that's where, if you're an enterprise and, like you're saying, you're trusting just the front door, there's so much more to the story than just the front door. I think that's where having the cloud, you know, going back to the Lego blocks, really matters. Because in addition to all that, you're going to get back a response from the LLM at some point. What happens if it has something in it that's malicious, that was put in by the skills? Like, the skill itself did nothing wrong, but it triggered something that's going to run, that you would not normally look for when you get the response back.
Whatever you're doing with that. And that's where, you know, we could get into responsible AI and the security aspect and all those things. Because, like you said, you have to have this 360 view of it, not just the front door. And to me, that's where... I think we actually do a pretty good job of this, to be honest, where I think the cloud providers are kind of essential these days, because to do this on your own is just not possible. There's no way anyone has the expertise to do it all on their own. I think we're going to have a proliferation... like, there's a huge opportunity for security companies focused on the AI angle, not just regular security. All this new, unknown stuff is going to be just a massive opportunity, I think. So we'll see what happens there, and if I'm proven right or wrong, but I'm going to predict that's probably going to be a thing for sure. speaker-0 (30:13.902) I want to take the pessimistic side. I think security has always been this cost center, and companies are trying to use LLMs to write code, and the ones that are doing it are doing it because they're trying to save money, and so they're less likely to then pay it out. So I think maybe the major players in the game, the ones that are generating code, will just have to promise to make more secure stuff. But realistically, I think one of the biggest challenges that has come about, if we call this a revolution, is that we've pushed the liability, financial and legal, back onto the consumer in a lot of ways. Whereas it used to be like, if you're running a service or a product and you're buying a service from a third party, you hold them liable for it. But now, instead of doing that, they're offering you an agent or an LLM that generates code, and you're taking the liability to make sure that that code works. And so, yeah, you can trust them that the code being generated is more secure in some way.
But I think the reality is that it may not be, and you're gonna be stuck holding the, I don't know what the analogy is here, holding the bag. speaker-1 (31:19.894) Bag, or whatever, I mean, like we could say. speaker-0 (31:23.896) I see. You know, my mind went to like a bag of groceries. Like, I want to hold the bag, right? Because that's where the rewards are. I actually don't know where this comes from, and maybe that's on me. I don't know what the solution is, honestly, but it doesn't feel great. And I don't want to spoil my pick for this episode, but I do feel like one of the problems here is that the ability to do software development is getting more and more complicated, not less. Because you have to know not just everything about software engineering to build the right solution, because you have to review it. You also have to know what the agents are doing. You have to understand that skills can be malicious, for instance, and how to evaluate them. Where do you even get good skills? Understanding the different attack vectors on OpenClaw or whatever agent you're running. So it's not like you can just get away with understanding your own area. Like, I have a lot of security knowledge, and I have very little security knowledge around the AI space. That means I don't have the same capability to protect systems as I would if I didn't introduce an LLM, and that means I think we're getting further away from secure systems. In some ways, maybe that's fine. We're just speed running, you know, every software being insecure. And all those movies that came out, especially in like the eighties, where someone's just mashing on a keyboard and suddenly, "I'm in." Now it feels so much more accurate. I'm just typing, like... speaker-1 (32:39.79) Because it's like... funny, you're right!
speaker-0 (32:42.988) You know, some sort of fork bomb, like, I'm going to send a fork bomb to this process and I'll be in. It's like, well, now actually that's true, because you tell the LLM, hey, run this code, and then, you know, it does let you in. speaker-1 (32:53.998) It just took like 20, 30 years to catch up. But yeah, like that really old movie, WarGames, you know, they're typing all this stuff and it just flows, and you're like, that's not how it works. And now, I just released a video last night, Warren, no joke, on the Copilot CLI. And there's so much there as the LLMs process. I'm like, people are going to watch every little line, right? So I'm chopping, chopping, chopping, chopping to speed it up. speaker-0 (33:01.707) Yeah, of course. speaker-1 (33:23.82) Because it's exactly what you just said. It's just like boom, boom, boom, boom, all this data. You know, I wanted to circle back, though, to what you said. I don't know if you saw, there was a blog post, I'm trying to remember the title, it was something along the lines of AI is making me tired. That's not the title, but how the promise was that it would make me more productive. Which, honestly, once you understand how to leverage the features and know what's good and what's bad, I actually think is a true statement. Like, I'm way more productive in general, because I can get started faster, even if I end up having to personally push it over the finish line, which I typically do. I'm still in that camp of, hey, the human matters in this loop, right? speaker-0 (33:55.16) Yeah. speaker-0 (34:11.886) That's good. Thumbs up. I think that's a very important thing that we see a lot of the leaders saying throughout the industry, that the human still matters here. I feel like it's sort of this meme curve where on one side, it's like, those who don't know anything say the human still matters.
Then there's the middle where it's like, the human doesn't matter. And then on the far side, they realize, actually, the human still does matter. We actually have a whole episode on this podcast where we were talking about productivity. We did that, I think it was only a couple months ago, and we went into it really deep. One of the things I wanna bring up is the agent loop, though, because I think you may have some unique insights here. I find that we're right now in this time where what was old is new again, and I'm sure that xkcd comic already came back, the "what are you waiting for?" "My code is compiling" one, which used to be MSBuild taking forever, or Java 1.5 or 6 on Eclipse, and now we're back to, like, we're waiting. And one piece of advice that I see going around the industry a lot is, well, aren't you running like seven agents and like a hundred projects all at the same time? And I'm like, no, actually, I can't handle that, because the context-switching cost is too high. What's your retort, I guess, would be my question. speaker-1 (35:28.526) Well, going back to the running multiple agents thing, it frightens me, and I'm going to tell you why. I do run multiple agents, but they're only creating enough that I can actually keep up with. Because, you could argue, I suppose, that I'm going to have all these agents run in the background and they're going to build feature XYZ, you know, ABC, whatever. Who's going to go through and validate all that? I'm playing with a project right now, it's a personal project, I won't go into details, but I would say every four hours I check in with what I'm doing. Because, you know, the loop will wrap up, and then I've got to evaluate it, do my review. I'll run it and I'll go, what the heck? How did you miss this? Like, this should have been so obvious, and it totally missed it, right? And I think that's where, and we'll go back to the point I made earlier, the human in the loop.
Like, knowing what you're doing, I think, now matters more than ever. Because if you don't, what kind of security issues are going to crop up when you're just like, yeah, I totally trust everything that's being spit out here, I'm just going to go with it as is, not even going to do a review on it, it's great, it's fantastic? And I'll be honest, when Opus 4.6 came out, and the latest GPT just recently came out, they're like a whole level above what we had. I just saw an internal comment earlier where somebody was using, yeah, it was Opus 4.6, and it literally made a comment, something along the lines of, I'll summarize it in my terms, like, who wrote this crap? And the "who wrote this crap" was Opus 4.5. And that's how far we've come just in that one iteration. So, getting back to the agents and the loop and running all these agents, here are a couple scenarios I've been thinking through. I think you're going to have leadership in companies who realize that, yes, we're way more productive, we can ship way more, which means we still need these people to evaluate that. Like, they have to constantly be going through and doing good reviews and all that on it, which takes time. speaker-1 (37:43.884) And then I think you're going to have the leaders who are like, no, AI is so good that I think we could just cut back our workforce huge and it'll be fine. Any predictions, Warren, on how that's going to go? speaker-0 (37:54.966) Unfortunately, I'm not an optimist here. Those companies will probably still last about 10 years before crumbling. speaker-1 (38:01.708) I think they'll last for a while until they're bit by something huge. Some are going to get lucky, sure, and it'll be fine. And then there's going to be those who just went, oh my gosh, I had no idea. speaker-0 (38:07.022) Yeah, for sure. speaker-0 (38:13.25) Yeah, I mean, I like your optimism.
We already see today, from the security domain, that lots of companies are just winging it when it comes to building stuff, especially in the startup loop, with money coming from the different flavors of venture capitalists to go and deliver something that is as half-baked as possible and basically scam users out of money until they get billions of dollars, or in this case, I guess, trillions. And at no point is it a requirement that their solution actually be secure, let alone even be good for humanity. I don't think something is magically going to change now. Hypothetically, not that I agree, but if I were to agree that the cost of doing software development is less now, that isn't going to all of a sudden make them take that extra money and put it toward actually reviewing what they have and reducing those potential security vulnerabilities. There is something to be said here, though, and that's if it's easier to do the software development. If turning out a product is easier, it's possible that, as a humanity, all the source code we generate is more secure on average than what people were turning out before, and then security does actually increase a little bit. And I know it's something that bothers me, because I see any issue as a hugely problematic one. We look at the OWASP Top 10, the Top 10 for APIs, the Top 10 for authorization and authentication, the Top 10 for AI-based stuff, and they're really bad things. But maybe the solutions we've had all along were already hugely problematic, with, like, S3 buckets being completely exposed to the internet for most companies out there, or, from a Microsoft standpoint, publishing your API keys publicly in your .env file, because people do that, and then having some sort of issue when it comes to payment time for your cloud provider. I think there's a couple of different angles there, and I keep seeing, well, this is really bad because, you know, insert all these other things.
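The exposed-API-keys problem Warren mentions is one of the few that is cheap to catch mechanically, which is roughly what secret scanners do before code is committed or published. A minimal sketch of the idea, with made-up rules; real tools like gitleaks or GitHub secret scanning ship hundreds of vetted patterns:

```python
import re

# Hypothetical key shapes for illustration only; production scanners maintain
# large, provider-specific rule sets and entropy checks.
SECRET_RULES = {
    "aws_access_key": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "generic_api_key": re.compile(r"(?i)\bapi[_-]?key\s*=\s*['\"]?[A-Za-z0-9]{20,}"),
}

def scan_for_secrets(text: str) -> list[str]:
    """Return the names of rules that match, e.g. to block a commit or publish."""
    return [name for name, rule in SECRET_RULES.items() if rule.search(text)]
```

Run over a .env file or a diff before it leaves the machine, a check like this catches exactly the "published my keys publicly" failure mode, with no AI required.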
It's hard for me to really effectively evaluate where the 50% mark is. You know, where is the average? Is it going up over time, or is it coming down? And I think, from my standpoint, my challenge is that when I'm looking at the code, I feel like I'm paying attention to it less, right? And in order to actually review it, I need to pay attention speaker-0 (40:29.474) to it more than I was doing before. And while I feel like I'm in the right position for that, I think we're going to start to see a lot of engineers out there who are being pushed even more by their organizations, you know, these bad leaders that you're talking about, to not review what they have. Because we have this sort of saying in security: if the solution to the security problem is training, that's not a real security answer. And I feel like when it comes to LLM-generated code, it's very similar. Yeah, for sure. And people still make mistakes, right? speaker-1 (40:56.316) Yeah. speaker-0 (40:59.274) Even if you review it, you could be missing something. And so I worry that that's just gonna keep happening more and more. speaker-1 (41:05.144) Well, to kind of put a positive spin on what we're talking about, here's one thing I will say. When it comes to reviewing code, one thing that I use a lot, and all the CLI-based AI coding assistants have something like this, is, in Copilot CLI, I'll do /review and then give it my prompt and what to do. And I am almost always pleasantly surprised how it will catch things that I honestly wouldn't have thought of. And then security things have also come up, where I'll say, do a review, but I want you to look for X, Y, Z, whatever. And I will say, because you have different skill sets, of course, with developers, you've probably worked with the folks who are just expert in this one area, but they don't know this other area at all.
And that's where I think AI really can help out. It kind of, I don't want to say levels the playing field totally, because I think the people that have, for instance, architecture skills and know how to deploy to clouds and things like that are always going to be the go-to skill set. Because without that knowledge, how do you even know what to look for? It'd be like being a doctor: if somebody comes in with a heart issue and you have no heart experience at all, like, I don't know, you're an orthopedist or something who just... speaker-0 (42:25.166) A cardiologist, yeah, for sure. My CEO, Dorota, keeps bringing the same idea to the story, which is that LLMs raise the floor and not the ceiling, usually. And so it does give people the capability of delivering the same level of stuff, but it doesn't say anything about delivering it to a satisfactory level, or what's necessary for, say, a real enterprise organization, or to release a product that's going to be safe for both the company to manage and for their users. speaker-1 (42:59.384) Totally agree, totally agree. That's a good analogy, I like that: raises the floor. speaker-0 (43:02.862) I have not loved using LLMs and agents to generate code, but in forcing myself to do it, I sort of realized one valuable aspect. I had to ask the question: okay, I'm going to go out on a limb here and say that I am not the worst software engineer that ever existed, which means that I have some good points. And the things that I can utilize and pass on to an LLM to do the right thing, and the things I get mad about when the LLM doesn't do them, I need to be better at articulating, because that actually is maybe the value that I can bring. And so if I want to transfer that to, say, another person that I'm mentoring, someone else on my team, or to an LLM, in order to do that realistically, I need to be able to articulate it.
And so thinking about that is something that I've had to do explicitly. Like, why do I like this pattern over that pattern? Why is there a special case here versus a special case there? It's been a very long time since I've actually thought about what those things are. And I think this brings it back to the story, and the more you think about that, I think, the more interesting an area it is. As software engineers, as technical people, one of the things we love to do is be non-logical and have philosophical discussions about principles rather than actually doing the real work, because that's where the enjoyment is, right? And I do feel like I'm a little bit back there now. I mean, I'm doing that instead of doing the real work, but it is interesting. And I think this is where the AGENTS.md or CLAUDE.md, or whatever you're utilizing to drive the agents, or the skills that you pull in, are so relevant. But I think people misunderstand what the purpose is. It isn't just, go out on the internet and copy something someone else says is the best thing for Rust or C# or JavaScript. It should be something that you're generating, because realistically, the whole goal is to automate your activity, rather than just have something generated that someone else did. And I think that's why reviewing the skills is so important. The other reason I sort of come to this, and I want to get your take on it: right now, I actually like that I run out of tokens doing work. It forces me to stop. I'm actually happier at that moment, because I no longer have to work, and I have to go do something else and come back later. And I fear that we're gonna keep going down the path of longer agent loops rather than shorter ones, and more token consumption, which I feel like will make software engineering even harder. I don't know if I have this fully thought out, but it's something like that.
speaker-1 (45:24.696) I don't disagree at all, actually, because the longer the loops run, and, you know, we're talking they can go for days now... Okay, that's fantastic, you could generate even more features, right? But I'm going to go back to: who's doing the security checks, who's doing the code reviews, do the features actually work as advertised? I mean, yeah, you could have a good, you know, PRD document or whatever you're doing for this. Because one of my big, big go-to things, which I'm sure you do as well, is that I have by far the biggest success when I switch into plan mode first, plan it out, and then work with it to get to the final state, where I'm like, okay, that looks pretty good. Because otherwise it's just too vague sometimes. But my whole point is, if the agent loop runs even longer, to me that calls for a couple things. I already talked about how you still have to do the security checks, the code reviews, all that, but you need even more guardrails. Because if you're going to let it run for days or whatever and you don't have any guardrails, next thing you know, you just crashed off the canyon wall. And yeah, you have a feature, and it's a horrible feature that, you know, who knows what it's doing. But that's where, to go back to what you were saying about sharing your knowledge with colleagues and things like that, that's where I think skills, and I know this will sound weird, well, maybe it won't sound weird, but, like, critical thinking skills matter: are you able to question yourself and how you look at things without being like, no, I'm always right? You know what I mean? Because in the AI world, I feel like we have to move into this question-everything mode, for the first time in a long time, maybe ever in human history. If you have ideas, you have superpowers now with these coding agents, which is fantastic, except for all those other things we talked about.
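The guardrails Dan calls for can be pictured as hard budgets wrapped around the agent loop, so that a days-long run stops at a known boundary instead of "crashing off the canyon wall." This is a minimal sketch; `step` is a stand-in for whatever one iteration of a real harness does, and the names and numbers are invented:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class LoopGuard:
    """Hard stops for a long-running agent loop: iteration and token budgets."""
    max_iterations: int
    max_tokens: int
    iterations: int = 0
    tokens_used: int = 0

    def allow(self) -> bool:
        return self.iterations < self.max_iterations and self.tokens_used < self.max_tokens

    def record(self, tokens: int) -> None:
        self.iterations += 1
        self.tokens_used += tokens

def run_agent_loop(step: Callable[[], tuple[bool, int]], guard: LoopGuard) -> str:
    # step() performs one harness iteration and returns (done, tokens_spent).
    while guard.allow():
        done, tokens = step()
        guard.record(tokens)
        if done:
            return "finished"
    return "stopped by guardrail"  # a human reviews before anyone restarts it
```

The design choice is that hitting the guardrail is a normal outcome that hands control back to a person, which is the human-in-the-loop point from earlier expressed as code.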
speaker-0 (47:13.038) Maybe that's a good place to leave off on the episode and switch over to picks. So Dan, what did you bring for us today? speaker-1 (47:21.058) Yeah, I talked quite a bit about skills earlier, and one of the challenges I've had is, if I'm on, let's just say Codex, for example, and I'm coding, or Claude Code, or Copilot CLI, whatever it is, they all have skills. The problem is, you'll get a skill for one of those and then you'll switch, because I use multiple agents at the same time a lot of the time. I will do code reviews, for example, and I'll run all three of those. I actually use all three, and the reason is I'll get consensus across the three and then identify the biggest issues, you know, and it actually works out really well to do that versus just one. And so the problem is, you start getting these skills installed, and what ends up happening is they get out of sync. Because, I don't know about you, but I have a VM I work with sometimes directly, which is a little more sandboxed, so I can just let things run more freely on that VM. But then I also work on a Mac, and I have, you know, Copilot CLI, for example, on there, and now everything's out of sync on the skills. So there's this tool called Skillshare, and it's a GitHub repo, if you just search Skillshare, one word. And I've seen some others pop up lately, by the way, that do this. But what it does is, you can install this Skillshare, it's kind of like a skill syncer, and it'll actually sync them all in one place up to GitHub. And then on any machine, I can say skillshare pull, push, or sync, and what sync will do is automatically sync them across all my agent harnesses that use skills. So it's pretty cool. Anyway, yeah, it's a technical one, so sorry, Warren. speaker-0 (49:00.576) No, I actually totally get it. I think it will be interesting, especially if we look at the sorts of problems that the listeners are dealing with.
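The syncing Dan describes boils down to mirroring one canonical skills directory into each harness's own skills location. This sketch is not Skillshare's actual implementation, and the harness paths are made up; it just shows the shape of the operation:

```python
import shutil
from pathlib import Path

# Hypothetical per-harness skill locations; a real tool discovers these per agent.
HARNESS_DIRS = ["agents/claude/skills", "agents/codex/skills", "agents/copilot/skills"]

def sync_skills(canonical: Path, root: Path, harness_dirs=HARNESS_DIRS) -> list[Path]:
    """Mirror the canonical skills directory into every harness directory."""
    synced = []
    for rel in harness_dirs:
        target = root / rel
        if target.exists():
            shutil.rmtree(target)          # replace stale copies wholesale
        shutil.copytree(canonical, target)  # creates missing parent directories
        synced.append(target)
    return synced
```

Treating one directory as the source of truth and overwriting the rest is the same "pin everything to one known state" discipline as the build-server versioning problem that comes up next.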
If you have, especially in a professional environment, agents running both on the development side, for every single engineer or team member you have, as well as agents running in some sort of cluster on behalf of users, you may actually want to provide customers their own sort of thing, or even sync across what you're deploying. I feel like we're back in the same world as, you know, the early 2000s, where it's, how do I make sure all my build servers have the exact same versions of all of the dependencies, so that when I build, it doesn't say, yeah, you know, this doesn't work, segfault or whatever, because it doesn't actually have the right version of the Microsoft C++ redistributable libraries on it. speaker-1 (49:50.67) I think we're going to see that with agents, with skills, with all these things. And yeah, it's funny how things are just a big circle sometimes, aren't they? speaker-0 (49:58.734) What is old is new again. Like everyone going to spec-based development. I'm already thinking about the world where we're back in agile software development with LLMs, and I'm just putting my head down, ignoring spec-based for now, praying that we don't stay there forever. But I like the pick, I think it's super relevant and topical too. Okay, so I'll share what I brought, which is maybe a little relevant, hits too close to the heart maybe. There's a paper called the Ironies of Automation, from 1983, by Lisanne Bainbridge. It's only five pages long, but I think it's absolutely great. The tasks left after automation end up being the ones that are still manual. And they're the parts that maybe an automation designer, or in this case, in 2026, an LLM company, couldn't figure out how to automate. So they're actually the most complicated parts of the tasks that are left over. And the other aspect is that what has been automated still needs to be monitored.
And when something goes wrong... like, you can't just... Look at a nuclear reactor. If you automate every part of it, how do you make sure the system is doing the right thing? Well, I suppose you could have a monitor that pops up and says everything's green. But how do you know everything is green? Do you just trust that the same system doing the automation also gets the monitoring and the alerts it generates correct? Well, you probably need to dig into that. It's like, well, why is it green? Okay, these stats are also green. Does that mean all of the detectors are operating correctly? I guess we need to check that the input signals are all there, that it's actually detecting those things correctly, and that those things are still in operating bands. All of that extra work on top is work you wouldn't have had to do if you were just doing the manual labor. And this is why we can see it at companies: there was actually a paper not too long ago, released after McDonald's and Wendy's installed their automated cashiers, finding that the level of training actually increased, and the number of technical staff had to increase overall in the organization. Adding self-service ordering machines cost the companies money, it didn't necessarily make them money. Now, you could argue that the whole point of automation is not to improve the process, but to make it scale. And I think this is where a sort of interesting duality comes in with the LLMs, where speaker-0 (52:13.548) I think you have to really look at what your bottlenecks are. And if an LLM can solve your bottleneck, then it's a good thing to actually introduce into your process. But if you're just introducing it because, why not, and you're not solving a bottleneck, you could actually be requiring a more complex understanding of your system to do the software development lifecycle, or wherever you're sticking it in.
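The "how do you know the monitor itself works" problem Warren raises has a classic mechanical answer: periodically feed the detector a known-bad reading and verify it fires, a canary for the monitor. A minimal sketch with invented names; a trivial threshold rule stands in for a real alerting pipeline:

```python
from typing import Callable

def canary_check(detector: Callable[[float], bool],
                 normal_value: float, fault_value: float) -> bool:
    """Verify a detector stays quiet on a normal reading and fires on an
    injected known-bad reading, i.e. monitor the monitor."""
    quiet_on_normal = not detector(normal_value)
    fires_on_fault = detector(fault_value)
    return quiet_on_normal and fires_on_fault

# A trivial threshold rule standing in for a real alerting check.
def overheat_detector(temperature_c: float) -> bool:
    return temperature_c > 90.0
```

A dead detector, one that never fires, fails this check even though its dashboard would stay green forever, which is exactly Bainbridge's point: the automation added a new layer that itself needs verifying.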
So it may not necessarily be the right thing at the forefront. And as you pointed out, it could be even decades before companies that go down this route actually see the impact of their decisions. So, I don't know, I really liked this paper. It's from 1983, and it's like, wow, what was old is new again. So I think this has been a great episode, Dan. Thank you for coming on the show and gracing us with your presence and great stories for the audience. Thank you so much. speaker-1 (52:49.934) Yeah. speaker-1 (53:02.542) Well, thank you. I appreciate you having me. speaker-0 (53:04.544) And thanks to all the listeners for tuning in to this episode of Adventures in DevOps, and I hope to see everyone back next week.