speaker-0 (00:07) Welcome back to Adventures in DevOps, where we try to solve the collective existential crises we all have in software engineering. Crises? Crises? Can you have more than one at a time? Until we resolve the English grammar question, we'll have to switch to our backup for this week, which is a deep dive into control planes, continuous integration, and build systems. And to answer all our impossibly tough questions, we have a former principal engineer, early CircleCI denizen, and Haskeller, currently a staff engineer on the stability team at Mercury, with us: Ian Duncan. Welcome to the show. Is it Haskeller or Haskellar? speaker-1 (00:42) Thanks for having me. I would say Haskell-er is probably the more likely pronunciation you're going to hear. Yeah, I'd say Haskell is definitely one of my passions as far as programming goes, because it's a language where a lot of the research that's come out of it has informed a lot of design decisions for subsequent languages. speaker-0 (01:06) The reason I actually sourced you for this episode is because of one of your blog posts on the challenges of GitHub CI. And I've actually gone through quite a bit of your blog, and it seems like every article is, honestly, a gold mine. They're really well done. It's not just the words. I saw some of them; they've got clickable dynamic examples. You must have spent no small amount of effort putting these together, and they're not short articles either. speaker-1 (01:31) Yeah, I'll say, in the interest of total honesty, a lot of the little visualizations have been somewhat vibe-coded to try to get a point across. It's not like I actually wrote every single one of those by hand. speaker-0 (01:44) Well, so there is a great book called, blah, blah, blah, by Dan Roam, which really talks about the importance of conveying information, both in a written way and using visuals.
And I feel like the moving visuals are a whole other step on top of that. I think I've only ever seen one other blog out there that tries to include that. And, you know, I put myself in that group; I don't have any moving visuals in my articles. speaker-1 (02:08) Well, yeah, you may be referencing the same individual. There's a guy out there who does some really fantastic infographic-type blogs. He's written one about load balancing algorithms and a few other things. And that was sort of the inspiration I wanted to go for: to try to help people visualize as they go, like, what is the point? Because there's a lot of nuance to a lot of these concepts, and helping people have something to fiddle around with until it clicks can be quite helpful. speaker-0 (02:38) Yeah, no, I totally agree. It's very hard to get feedback on how well your posts land, so doing everything you can to convey them effectively is definitely worthwhile. speaker-1 (02:49) Yeah, what's interesting, I think, is I've been having a little bit more of an uptick in people reading my blog lately. But by and large, it's a mixture of stuff that I've written for myself, or else propaganda for people at work to convince them to do something that I want them to do. So in particular, the GitHub Actions rant that you referenced is largely because I'm trying to convince work to switch to Buildkite. You know, I was just working one day, and GitHub Actions glitched out on me, and it was just the final straw. I kind of crashed out a little bit, but here we are. A guy that I used to work with, his name's Mark Wotton, wrote a really nice article about what he calls the novelty budget. And it's a term that's really stuck with me. I'm sure maybe other people have used it as well, but it's the idea that every business needs a competitive advantage.
And so if you're picking tooling, whether it's databases or languages or SaaS products that you're adopting to achieve your goals, you have a certain amount of tokens, and you can invest those into, "I think this technology is really going to move the needle for me." But there are a lot of things that are like Postgres, right? Everybody uses it, it's a really well understood tool, it's probably going to be here 30 years from now in some capacity. Do you really need to reinvent the database? You know what I mean? So I'd say this ties into that concept of picking your battles, or limiting the number of bets you make. And I'm saying this as a guy who's been doing Haskell for 20 years. I know, right? Believe me. speaker-0 (04:33) I mean, there is definitely this aspect of being a hypocrite, but at the same time, I think as you spend more time in engineering, you understand that there are the business needs, and you accept that everything is risky in some regard. There is no such thing as the perfect software service, except for the one that just doesn't exist. And so since it can't be perfect (and if you believe it can be perfect, I want to invite you onto the show so we can have a conversation about that), it will fail in some way. There is no way to have fully reliable stuff, so you're making a trade-off somewhere. And I love the post that you're talking about, actually, which is about innovation tokens and using boring technology, basically. I don't know why I would have called Postgres boring that long ago, but at this point it has so many extra angles for solving business problems that it's hard to justify picking up a particular other database technology as a solution, because it's so natively supported in Postgres today.
That being said, I sort of want to ask your perspective, especially being on the stability team at Mercury: where have you used your innovation tokens, versus where you felt like boring technology was better? speaker-1 (05:42) Oh yeah, I mean Haskell is easily our biggest innovation token, I would say. There's no doubt. But I think we kind of hit the functional programming nerd peak on a few other fronts too. We use Nix quite a lot for reproducible builds and for setting up development environments, and that's gotten a little bit more mainstream traction. But in terms of how we do our operations at Mercury, we are pretty ruthlessly simple about it. I would say up until a year, year and a half ago, we were still basically just deploying to straight-up EC2 instances using a bash script, and operating a massive system with really simple deployment techniques. There was really nothing fancy there. It was GitHub Actions building sort of a Nix... package isn't really the right word, but I'm going to use it just to keep the conversation simple here. Build a package of our system, ship it to the boxes, switch over to the new package, restart the services, Bob's your uncle, right? And we got super far with that, with like three instances, five instances on EC2. You don't have to have a lot of real fancy stuff until you get pretty far along as a company. So yeah, I would say it doesn't take much. And in fact, we were really quite ruthless about trying to avoid adopting additional databases. Basically we had Postgres. Up until very recently, we didn't even use Redis for anything. And we still don't even use that for caching; we just use it for some tertiary things. It was basically Postgres, Cloudflare, and some EC2 instances. And we're servicing a huge amount of customers and moving a lot of money around. That's all it takes. The one thing I'll say is that it can be pushed too far. So we were also using Postgres as our queuing mechanism.
You'll see this, I feel like, about once every six months to a year: somebody will come onto Hacker News or Lobsters or these other tech fora and say, you can just use Postgres as a queue, and it's fine, and it's actually not a big deal. But it turns out that it's remarkably hard to do that properly. It does scale for a good long amount of time, but once you get to the point where it doesn't scale, then it's kind of mission critical that you get off of it, because if you're running a huge amount of stuff through it, then it's going to be using a lot of your database resources, it's going to affect replication to your read replicas, it's going to affect how long it takes to restore backups, all these things. There are lots of stories here. What I'll say on that front, before I get into any stories, is that I'm really a strong believer that you should be looking nine months to a year into the future on a pretty regular basis and forecasting what happens at that point. Because it's just one of those problems where, if you get to the point where it's a problem, you don't necessarily have enough time to solve it. And so you either have to spend a huge amount of money to scale vertically, or it becomes an all-hands-on-deck situation and product development screeches to a halt, things like that. speaker-0 (08:46) Well, on that, don't you think having those predicted problems in your backlog would allow engineering teams to remember what challenges are there, and then just prioritize according to whatever the business initiatives are that are aligned with the vision that goes out maybe to a year? What's the problem with relying on that as the strategy? speaker-1 (09:06) You mean like filing a Linear ticket and... prioritizing that? Well, I think that's a cultural thing. It depends on the context of the company that you're working at.
But as we know with a lot of the startup world, there's a huge amount of pressure to be faster than your competitors, to ship features faster. You know, the rise of AI is, if anything, accelerating that sense of pressure: innovate or... speaker-0 (09:12) Yeah, for sure. speaker-1 (09:36) ...be left behind. And so I think there's a huge amount of pressure to deprioritize maintenance tasks or preventative measures in favor of just going full steam ahead on beating your competitors, I guess. speaker-0 (09:50) It's like this thing, and I think you have to drive a car to sort of understand this analogy, but it's like if you see two cars headed to an intersection way ahead of time, where they're going to collide at some point. They're not trying to veer off course, and it's going to happen. It's just like, in slow motion, you can see it, and it's like, don't you want to do something about it? And I just find that a lot of people have shifted their focus to the short term, to not even seeing the accident coming head-on. They're just not thinking about it, they're not paying attention, where it's clearly going to happen. And I feel like even now, it's even more of a struggle. speaker-1 (10:29) Well, I guess what I'd say on this front is that the problem you always have to be looking to solve is making sure that incentives are aligned. If a person on a product team, like a PM or an engineer, is not prioritizing stability or reliability issues, it's because they've got an OKR, or they've got some upper-level manager breathing down their neck about a particular problem. And if they say, well, I don't have time to meet our quarterly objectives that you've established for me, I have to work on this thing that's blah, blah... it's like the Charlie Brown voice, wah, wah, wah, wah.
That engineer knows they're not doing their career any favors in that moment by pissing off a person who's got other priorities. So it's kind of a self-preservation mechanism. Unless you can really work with your upper-level engineering management to make sure that those incentives are aligned, it's a remarkably hard thing for a single engineer in a company to accomplish on their own. speaker-0 (11:28) Yeah, absolutely. And now I'm going to grill you on what you were using, if it wasn't SQS on AWS, for your queue management. speaker-1 (11:38) Oh yeah, we had a homegrown... well, we still have it to a degree. It's called Postgres-queue, which is not super imaginative. It's basically a table, and it is really nice because it follows the transactional outbox pattern, where you can ensure that within a particular database transaction, you've got some stuff that you want to do, updates to rows, but you also need to ensure that some side effect happens. And so having the queue within Postgres itself is really nice, because you can just say, all right, let's put this job into this table, and then it'll be worked asynchronously eventually. And you don't have to worry about making sure that there's any sort of delivery coordination or anything like that. And there are a lot of ways you can do this, by the way. This is just how it was solved early on at Mercury, and it just sort of persisted. But that in and of itself is kind of okay. The problem is that our version of the queue kept growing all these new features. So first it was just a bunch of insertions into a table: here's the job name, here's the payload. But then it was like, OK, well, some things can't run until some other thing has run, so then we have to implement FIFO keys. So you have FIFO keys, all right? Anything that's got this key gets processed from front to back.
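The transactional outbox pattern Ian describes can be sketched like this; it's a minimal illustration using SQLite in place of Postgres, with a hypothetical table layout and job names, not Mercury's actual schema:

```python
import sqlite3

# In-memory database standing in for Postgres; the heart of the pattern is
# that the job insert shares one transaction with the business-data update.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER);
    CREATE TABLE jobs (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        job_name TEXT,
        payload TEXT,
        fifo_key TEXT,          -- jobs sharing a key run front to back
        status TEXT DEFAULT 'pending'
    );
    INSERT INTO accounts VALUES (1, 100);
""")

def transfer_and_enqueue(conn, account_id, delta, payload):
    """Update a row and enqueue a follow-up job in ONE transaction,
    so the side effect is recorded if and only if the update commits."""
    with conn:  # sqlite3 connection as context manager = one transaction
        conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?",
                     (delta, account_id))
        conn.execute("INSERT INTO jobs (job_name, payload, fifo_key) VALUES (?, ?, ?)",
                     ("send_notification", payload, f"account-{account_id}"))

transfer_and_enqueue(conn, 1, -25, "you spent $25")
balance = conn.execute("SELECT balance FROM accounts WHERE id = 1").fetchone()[0]
pending = conn.execute("SELECT COUNT(*) FROM jobs WHERE status = 'pending'").fetchone()[0]
print(balance, pending)  # 75 1
```

If the update fails, the rollback discards the job too, so a worker polling the `jobs` table never sees a side effect for a change that never happened.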
And then it was like, well, sometimes we can't run these things until, because of banking requirements or whatever, we literally can't run this until next Monday. So then we added scheduling. And then, what about the service going down? I guess we need to add retries. OK, but now it's retried a bunch and it's still not working, so now we need to see what the error was. And there's this gradual progression from what seems like a very simple system into all these exceptional cases. And each of those things does become kind of expensive for the database to maintain. You end up having to implement partial indexes, and if you need to do some runtime debugging of what's going on, and the index isn't right, and you've got a billion things in this table, well, suddenly you've got a very slow query that times out before you can actually get meaningful information out. So yeah, it kind of turned into this thing where what we needed at some point really could have been better serviced otherwise, but every single step along the way seemed reasonable to the person who was doing it. speaker-0 (13:57) Yeah, for sure. It's the same way the state actors get you, right? It's one small micro-crime on top of another, until you're fully indebted and your whole life will get torn to shreds if you decide otherwise. And it does seem reasonable at the time, for what you know. And I think one of the struggles I've seen in my experience is when the conversion to a different technology (in our case, we're using SQS) doesn't perfectly support every feature that you're currently implementing. And this is a flawed engineering instinct, I think, where you always try to have every feature in the new thing. And so you're like, well, if we can't migrate all of our queue dependencies from this table to a real service, then we're not going to do it at all.
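The feature accretion Ian lists (scheduling, retries, error capture) tends to end up as worker-side policy like the following; this is a hedged sketch with made-up field names and backoff numbers, purely to illustrate how each "reasonable next step" becomes queue logic:

```python
import datetime as dt

def next_attempt(job, now, base_delay_s=30, max_attempts=5):
    """Decide what a worker should do with a failed job: retry later with
    exponential backoff, or give up and park it for a human to inspect."""
    if job["attempts"] >= max_attempts:
        # Retried a bunch, still failing: surface the captured error.
        return {"action": "dead_letter", "last_error": job["last_error"]}
    # Exponential backoff: 30s, 60s, 120s, ... doubling per prior attempt.
    delay = base_delay_s * (2 ** job["attempts"])
    return {"action": "retry",
            "run_at": now + dt.timedelta(seconds=delay)}

# "Can't run until next Monday" is just a run_at in the future; a failed job
# gets a new run_at computed from its attempt count.
now = dt.datetime(2025, 1, 6, 9, 0)
job = {"attempts": 2, "last_error": "upstream bank API timed out"}
decision = next_attempt(job, now)
print(decision["action"], decision["run_at"])  # retry 2025-01-06 09:02:00
```

Each rule is trivial on its own; the cost shows up in the database, where every new dimension (run_at, attempts, status) wants another partial index on a billion-row table.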
speaker-1 (14:48) Yeah, well, I think for us, a big part of it was really people who had been burned by earlier experiences with microservices and the like, or systems that just had a huge amount of stuff that you needed to run locally. There's a real heavy emphasis at Mercury on developer experience. So we've got front-end developer experience and back-end developer experience as part of, well, we call it SIDE-UX. It's stability, infrastructure, developer experience, basically. That's sort of all the non-product things. And yeah, there's really something to be said for: all right, you're a growing company, you've hired a new engineer, you've shipped them a laptop. Can they just have a running environment in 20 minutes once they get the laptop? And that was really heavily prioritized, but it was perhaps overweighted in terms of the considerations at hand. speaker-0 (15:39) No, I totally agree. I think the reality is that neither extreme is right. The single monolith with a whole database cluster running to support all of the tables, forcing an engineer to run basically the production system in order to do software development, that feels wrong. But it also feels wrong to run a thousand microservices in order to do development. And there's a perfect middle ground here that's different per company, where somehow every single company gets it wrong and makes this a challenge. speaker-1 (16:12) Yeah, there's an interesting initiative on this front that we've got going on at Mercury at the moment, which is adopting Buck2, if you're familiar with it. Maybe you've heard of Bazel. speaker-0 (16:22) That's the monorepo build system. speaker-1 (16:25) It's a monorepo build system that comes out of Google. Well, Buck, and subsequently Buck2, are basically Facebook's version of that. I see.
So yeah, we've kind of entered the stage of the company where we're trying to stick with a monolithic approach to things for the most part, but basically, we've outgrown the standard build system tools. I think that sort of gets to what you're talking about here, which is that you don't want to have to build stuff that you don't care about, or operate stuff that you don't care about. But you still kind of need global access to everything that your company could be doing or would be doing, depending on the day. So having a monolith, but having the ability to build out subsets of it instead of building the whole world, or running all of the systems that it could possibly need to run, that's what we're angling for. And it's a very hard thing to do. I think our developer experience team has been working on this for two-plus years. Oh, wow. We're just now rounding the corner where it's really starting to be, okay, this is actually usable. speaker-0 (17:30) We're just two years away from the build system, for the developer experience, for building on the machine. It's going to be solved in just two more years. speaker-1 (17:39) I mean, that's what it's felt like for some time. But actually, it does work now, so that's exciting. It's actually really remarkable once you have it. Because what's really great about it is that with Buck2, all of the build targets are hermetically sealed. They're reproducible in the same way that Nix tries to achieve reproducibility, but on a much more granular level. So you can spin up a really beefy cluster of machines and fire off your build jobs, and it can do massive parallelism. And because they're hermetic, they're reproducible. If one person builds a particular artifact and all the inputs are the same, then if you try to build that same artifact, it can just download it for you. You don't have to rebuild it.
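The cache-by-input-hash idea behind that "someone already built it, just download it" behavior can be sketched in a few lines; this is a toy content-addressed cache in the spirit of Buck2 and Nix, not their actual implementation:

```python
import hashlib

# Toy content-addressed build cache: the key for a target is a hash of its
# inputs, so identical inputs mean a cache hit instead of a rebuild.
cache = {}
build_count = 0  # counts how many "expensive" builds actually ran

def build(target, inputs):
    """'Build' a target, skipping the work when the input hash is cached."""
    global build_count
    key = hashlib.sha256(
        (target + "\0" + "\0".join(sorted(inputs))).encode()
    ).hexdigest()
    if key in cache:
        return cache[key]          # another machine or engineer built it already
    build_count += 1               # stand-in for the expensive compile step
    artifact = f"artifact({target})"
    cache[key] = artifact
    return artifact

build("email-templates", ["Button.tsx", "Welcome.tsx"])
build("email-templates", ["Button.tsx", "Welcome.tsx"])     # hit: no rebuild
build("email-templates", ["Button.tsx", "Welcome-v2.tsx"])  # changed input: rebuild
print(build_count)  # 2
```

Hermeticity is what makes this sound: only because a target's output is fully determined by its declared inputs can the hash of those inputs safely stand in for the build itself.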
And those sorts of improvements in speed are ultimately going to be incredibly powerful for any organization that's not willing to descend into the architecture madness that comes from microservices. speaker-0 (18:35) So, you know, a long time ago, I was working at a company that had a similar process in their custom build system to achieve this. It was running in Jenkins, and actually was running something else before that, which was also a mistake, but it was running in Jenkins at the time. And one of the challenges was that you had jobs on top of jobs that would trigger each other on builds and store the results in Artifactory, to pull them back down later. And rather than ending up with a nice, easily managed solution, where individual components could be cached without having to be rebuilt, we ended up with essentially a distributed monolith: all the downsides of the monolith and all of the disadvantages of microservices wrapped into one. You couldn't really rebuild just the part you wanted, and you had to connect to Artifactory to pull down individual components at runtime. And what you ended up with was a lot of individual functions basically being wrapped and published. So this one function in this one class would be its own sort of package, which would get pushed up and pulled down later, and it just wasn't the right unit of granularity for what we were building. It also encouraged this aspect where you would define micro pieces of functionality, sort of like what you have today in any mature package management system, where you end up with all these packages that depend on each other. But rather than having development teams really focus on package management and dependency trees and what makes sense, it was so easy to hook up to the system that you published a lot of these micro packages without thinking about what makes sense as a library.
speaker-1 (20:17) Yeah, that's a really interesting point. So in terms of Buck2, it's sort of like if you write a Makefile, right? Think about a traditional C library or executable that you're going to build. You write a Makefile, and it's got these targets, and you're effectively defining a dependency graph, where in order to get a .o file you have to compile a .c file, and yada yada. And it figures out how to resolve all that stuff. That's the same thing Buck2 does, but it also provides the hermetically sealed bit, so you're guaranteed that, basically, it's hashing the inputs and the outputs, and it's making sure that everything that needs to be rebuilt is rebuilt. And this has some really nice knock-on effects. So for example, if you have a module that has a bunch of tests in it for CI, and you change anything that causes that module to be rebuilt, then it reruns the tests. But if the targeted change that you're making doesn't actually affect that other part of the tree at all, then you already know those tests pass, effectively, as long as your tests are sufficiently isolated. So there are some nice knock-on benefits there. In terms of packaging, that's actually the issue that we ran into: we have a bunch of local packages that we currently have to manage with the traditional package manager for Haskell. So let's say you've got a utility package that's way up in the dependency tree, and you slightly tweak it, and you've got 30 other packages that depend on that package. Then those all have to rebuild, and everything underneath them. So there's this huge cascade of rebuilds, and that's kind of the problem that you have to solve. Whereas if you're using something like Buck2, it's doing literally module-level rebuilds.
So if you change one file, it's only having to rebuild everything that's transitively dependent on that file. Then you don't have these huge two-hour coffee breaks because you're waiting for your local system to rebuild all these packages. It's able to do a targeted traversal through what actually needs to be rebuilt. So breaking apart the concept of packages and going entirely granular gives you massive improvements in terms of build parallelism. And that's a thing that you really have to solve now. speaker-0 (22:29) I'm just having some trauma revisit me from the days of old, where a similar thing had been attempted but really just ended up in a more complex diamond dependency problem, which is where you have two different packages that depend on different versions of a thing, and then at runtime there is no way to resolve the right thing. What we actually ended up with in practice was this scenario where you would have some shared dependency between two different services or components that would change, as you mentioned, and it would get rebuilt. But we would determine that that function was only used in one place, because statically it was determined it was only used in one place. It turned out that dynamically, at runtime, it was used in the other place, and that thing wouldn't get rebuilt. So it was targeting a different binary compatibility than the one that was actually found at runtime. And unlike some dependency management systems, which allow you to solve this diamond dependency problem by including both versions of the dependency at runtime, some languages, like C, C++, C#, et cetera, realistically only let you get one version of the binary dependency. And so you would just end up with a runtime crash. Now I think that sort of thing is impossible with Haskell, but it definitely is possible with dynamically linked libraries in, say, Java and C#. Yeah, no, for sure.
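The module-level rebuild traversal described a moment ago, "rebuild everything transitively dependent on the changed file," is essentially a reverse-dependency graph walk. A minimal sketch, with a hypothetical module layout:

```python
# Module-level rebuild sketch: given "module -> modules it imports" edges,
# a change to one file should mark only its transitive dependents dirty,
# not the whole package tree.
deps = {                       # hypothetical layout, not a real codebase
    "App":       ["Payments", "Email"],
    "Payments":  ["Util"],
    "Email":     ["Util", "Templates"],
    "Util":      [],
    "Templates": [],
}

def dirty_set(changed, deps):
    """Everything transitively depending on `changed`, plus itself."""
    rdeps = {}                 # invert the edges: module -> its dependents
    for mod, imports in deps.items():
        for imp in imports:
            rdeps.setdefault(imp, set()).add(mod)
    dirty, stack = {changed}, [changed]
    while stack:               # breadth through reverse edges
        for dependent in rdeps.get(stack.pop(), ()):
            if dependent not in dirty:
                dirty.add(dependent)
                stack.append(dependent)
    return dirty

print(sorted(dirty_set("Templates", deps)))  # ['App', 'Email', 'Templates']
```

Tweaking `Templates` leaves `Payments` untouched, which is exactly the difference between a package-level cascade and a module-level one: the smaller the node in the graph, the less of the tree a change dirties.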
I can tell you this is a thing that did happen. Whether or not it's solved now is sort of a different question, but I found that this was the result of the primary build tool, whatever that script or executable is for that language, not offering this functionality by default, and trying to build it on top, at a second level where it doesn't fully understand how the builds work, and then trying to compensate for that. And I think, I don't know, my clue would be... potentially we should pick a different technology or whatnot. But maybe it is a solved problem in this case. I don't know. Could be. speaker-1 (24:33) I think one of the interesting things, at least the challenge that I found, is that every ecosystem ends up having to build its own build system, right? Yeah. It's a remarkably challenging thing if you want to have a polyglot system, because you have that many package managers on top of it, with their own quirks, their own dependency resolution algorithms, and yada yada. And so doing any sort of serious integration across multiple languages can become remarkably painful. So there is a compelling aspect to saying: all right, in a monolith, we're using this one build system, and then letting teams write some rules, plugins so to speak, that map their particular linguistic concepts onto a single build system. Like, all right, for example: we send emails. Most companies send emails. speaker-0 (25:30) Shots. speaker-1 (25:31) They're templated, right? And we want them to match our design system. Consequently, because we write our front end in React and engineers know React, we're using this React email templating thing. I love where this is going. Yeah, but the thing is, sending emails happens on the back end, which is where all our Haskell developers live. And our Haskell developers really don't want to fucking deal with NPM, right? That's just a whole can of worms.
They don't want to have to mess with that. Yeah, this ability to provide a common interface across a monolith, I think, is quite compelling, so you can just say, "build me the email templates," and that's that. You don't have to think so hard about it. But I want to say this before we move on. I mentioned we use Nix, and Nix is remarkably powerful. And now we're using Buck2, and Buck2 is remarkably powerful. They both provide incredible benefits to an organization. I still hate using them. The actual user experience of these tools is really brutal. So I want to say, as a counterpoint to myself, that if you're not a company with 200-plus engineers, or if you don't have existing experience in this stuff like Nix, there's a real question about whether you should bite the bullet and use it. It's incredibly powerful, but it still sucks to use. Any day where I have to mess with these tools for more than about five minutes is a day poorly spent for me. Yeah. I guess on the subject of the novelty budget, sometimes you have these things that have really great payoffs once you actually have it all set up, but they're still painful. speaker-0 (27:09) A lot of wasted time in your past is what I'm hearing here. I can appreciate that. speaker-1 (27:16) I mean, it's the eat-your-Brussels-sprouts thing, right? No kid wants to eat their Brussels sprouts. It's good for you, but maybe not enjoyable. speaker-0 (27:27) You know, I've tried to cook Brussels sprouts so many ways, and at one point I thought I liked them, but having migrated to so many other vegetables, root vegetables or leaf vegetables, in the last 30 years or so, I have to say I still can't make Brussels sprouts taste good. So maybe there's an argument for not eating them. speaker-1 (27:50) Yeah, you know, wow, we're gonna get really off topic here, but cooking is my other passion, so. speaker-0 (27:57) Maybe we can hold off on the cooking one. We can come back to it.
I mean, you can make a cooking comment, but I need to hear it have some connection to the conversation. speaker-1 (28:10) It has no connection, so I guess that's one on me. speaker-0 (28:12) You can always try to connect this back at the end. Yeah, right. There's always a point here. You can make them taste good, maybe. I do want to get to one last thing, I think, before we close out the episode, and that's your take on what I would call control planes. speaker-1 (28:17) Do my assignment on root vegetables here? speaker-0 (28:33) I think you brought this up in your blog articles, a couple of them realistically, and it's one of those phrases that, in my experience, I didn't really pay a lot of attention to historically. It's similar to software-defined networking in practice: once I got to the cloud, I felt like actually understanding networking was now critical. I need to understand how DNS works, I need to understand how load balancers work, and packet switching, et cetera. I may not love it, and software may make it easier, but... Control planes are the other one, and it's that thing where, if you're in a cloud or running in some way, it suddenly becomes a useful phrase for communication. speaker-1 (29:09) Yeah, so I think I said this in my blog posts as well, but the way that I use it is probably... some purists are probably going to say that this is a really twisted definition. But the way that I think of it is: basically, you've got your software, your main product, your main system that's running. And it is easy and tempting to bake all of the management that you need to run your software into the software itself. So, for example, a lot of businesses have admin pages where you log in and do your admin things, and it's some portion of the executable that you built, a subsystem of this thing that you ship.
The problem with that is that when you have incidents, when you have outages that are taking down your system, you run into a circular problem, where the tools that you need to fix your system are not available because they're baked into your system. And so there are a lot of systems that you're probably using on a daily basis that have control planes. A lot of people use Kubernetes, for example, and Kubernetes, I think, is probably the canonical example that everybody's familiar with at this point. There are some services that you're running for your product, but then you've got this thing that's in charge of resource allocation. It's in charge of making sure that the things that are supposed to talk to each other can, and it's in charge of how they connect to the rest of the world, all that. That is a control plane. But I also think about having a control plane for your product itself. So as I mentioned, we've got this thing called Postgres-queue, and here's where I can talk a little bit more about Postgres-queue. For convenience, we ended up building a management interface for retrying Postgres-queue jobs that failed, or seeing what the failures were. Problem is that we got to a point on Sundays where Postgres-queue was taking up 80% of our database resources, right? And that would cause login on our site to fail, because it took too long to log in. I know where this is going. So you can't manage your system if you can't log into the system. So really, what we should have been doing... I mean, there are a number of lessons to take from this, and I'm not going to try to enumerate all of them. Every system has its growing pains.
But what we should have done is had, behind a VPN, a control plane for the job system whose sole job was to make sure that you're an employee, make sure you can get through Okta or whatever, and then give you the things that you need, and have it not be tied to all of the other domain-level concerns of the primary system. And so that's my pitch for control planes in a nutshell: the things that you need to do to operate a system cannot live within the system itself, or you are going to hate yourself at some point. speaker-0 (31:51) See, I feel like you're a secret advocate for microservices, just like me. You want your login and access control to be a separate system, so that issues with another part of the system can't impact it negatively. Have separate databases, separate compute running there, so that containers being brought down, or virtual machines having an issue in a particular region, don't cause a cascading failure for your entire system. Now, I think you're going to say you're not, but it's still going to be the model, that's the strategy. But honestly, I think the access control is a good example there. speaker-1 (32:25) Well, what I'll say is I'm an advocate for the concept of bulkheading, which is to say that if a particular portion of your system fails, it shouldn't be cascading into the rest of your system. And microservices are... Okay, here's what I'll say. I say this at my company all the time. I'm not opposed to services. I'm opposed to microservices. What I think we should have is appropriately sized services. Maybe they're not broken down to a hypergranular level, but... one thing that I'll say is that you can achieve a service-oriented architecture with a monolith at the same time. What that means, for example, in a really easy world, is if you're behind an AWS Application Load Balancer or something, allocate some EC2 instances or some containers to individual teams or individual product domains.
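The bulkheading idea described here — giving each product domain its own resource budget so one team exhausting it can't starve everyone else — can be sketched in a few lines. This is a toy illustration with made-up domain names, not anything from Mercury's actual setup:

```python
import threading

class DomainPool:
    """Toy bulkhead: each product domain gets its own bounded
    'connection' budget. One noisy domain can only exhaust its
    own budget, never the whole pool."""

    def __init__(self, limits):
        # limits: domain name -> max concurrent connections
        self._sems = {d: threading.BoundedSemaphore(n) for d, n in limits.items()}

    def acquire(self, domain):
        # Non-blocking: fail fast rather than queue behind a noisy neighbor.
        return self._sems[domain].acquire(blocking=False)

    def release(self, domain):
        self._sems[domain].release()

# Hypothetical domains, tiny budgets to make the behavior visible.
pool = DomainPool({"payments": 2, "onboarding": 2})

assert pool.acquire("payments")
assert pool.acquire("payments")
assert not pool.acquire("payments")   # payments has used up its own budget...
assert pool.acquire("onboarding")     # ...but onboarding is unaffected
```

In a real deployment this partitioning would live in the load balancer and database-pool configuration rather than in application code, but the failure-isolation property is the same.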
And it can still be running the whole monolithic build. It can still be like a single coherent view of the world, but the resources that it's using, like database connections or what have you, are scoped to a particular product domain. So that way, if some team comes along and accidentally uses up your entire connection pool, you know, on a per-instance basis, that's their problem. It's not the whole company's problem. speaker-0 (33:39) We should have definitely had this be a fight about microservices versus monoliths. I think we could have gotten really far on that one. speaker-1 (33:46) Well, you can have me back. That's fine. I do want to mention, I mentioned briefly that we were kind of moving off of Postgres. And most of what we're moving onto from that is a system called Temporal, which is a durable execution framework. And I talked about that in some of my blog posts. But the concept of durable execution frameworks is a thing that I think is starting to gain a little bit of steam. We're starting to see a few competing products in the space. And it's really nice, because basically what it does is: you have workflows, or particular pieces of business logic, that need to definitely execute from start to finish. And you want to be resilient to failures. You want to deal with what happens if your process crashes halfway through. Typically you have to call out to APIs, or do multiple database transactions, and you don't want to leave the system in an inconsistent state. And basically, the concept of durable execution frameworks is that they record the steps that you're doing along the way. And if the process crashes, or there's a logic bug and you ship a fix for it later, then it's able to use the serialized results of what it's done so far to kind of time travel back to where you were, pick up where you left off, and execute to completion. And I don't know, it just really scratches an itch for me.
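The journal-and-replay idea at the heart of durable execution can be sketched as a toy loop: record each step's result, and on restart replay recorded results instead of re-executing them. This is an illustrative sketch only — real frameworks like Temporal also handle serialization, deterministic replay, timers, retries, and far more:

```python
def run_workflow(journal, steps):
    """Toy durable execution: completed steps are replayed from the
    journal; only steps past the crash point actually execute."""
    results = []
    for i, step in enumerate(steps):
        if i < len(journal):
            results.append(journal[i])   # replay a recorded result
        else:
            out = step(results)          # execute, then persist the result
            journal.append(out)
            results.append(out)
    return results

calls = []
def charge(prev):
    calls.append("charge"); return "charged"
def email(prev):
    calls.append("email"); raise RuntimeError("crash!")  # buggy step

journal = []
try:
    run_workflow(journal, [charge, email])
except RuntimeError:
    pass  # process "crashed" mid-workflow; journal survived with ["charged"]

def email_fixed(prev):
    calls.append("email"); return "emailed"  # the shipped fix

# On retry, charge is NOT re-run -- its result is replayed from the journal.
run_workflow(journal, [charge, email_fixed])
assert calls.count("charge") == 1
assert journal == ["charged", "emailed"]
```

This is also why idempotency pressure drops with these frameworks: side-effecting steps run once and have their results replayed, rather than the whole workflow re-executing from the top.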
Obviously we can't talk about it for very long, but I want to encourage listeners to go read about it. speaker-0 (35:16) Well, actually, it's really interesting that you brought this up, for two reasons. I think AWS has been full on, in the last few months since re:Invent, introducing durable functions into Lambda. It's the sort of thing where we have Step Functions and workflows existing in all the cloud providers, but I think Azure was first, and I don't think there's an option in GCP. But we do have sort of native implementations now that allow you to execute some code and give hints to the orchestration layer on how to retry that function when it fails, and not just retry from the beginning and have to implement idempotency everywhere, et cetera, and how to store state and check for that. I think the poor man's version of durable functions is: throw a message in SQS and have it execute your code, where every single step is idempotent, and it just executes a step and then goes on to the next one, and maybe there's some logic to skip steps. But it'd be so nice if there was a framework, and I think, as you said, we're starting to see companies spin up. It's interesting you mentioned Temporal though, because in a couple of weeks we will have the VP of Engineering from Temporal on the podcast to talk about this. So you can spoil yourself if you want and read about it, or you can wait for that episode and see what shenanigans we get up to on there. speaker-1 (36:27) I had the privilege of talking at their conference last year, a little bit about our learnings adopting it at Mercury. So I'm a fanboy to a certain extent. speaker-0 (36:38) Okay, well then with that, we'll skip the tangents and move on to picks for the episode. So, Ian, what did you bring for us today? speaker-1 (36:45) Yes, my pick for the episode is I want to recommend a band called Gloryhammer, which is a power metal band. I'm a relatively recent power metal enjoyer.
I'll say a lot of it I don't like, but I think they're really fun as a band, because they lean into absurdity. Like, they know that they're in a goofy genre. And, how do I say it? I find their lyrical choices to be really interesting. They do a lot of things that are incoherent with their lyrics. So for example, they'll say they cast the evil wizard into a tomb of liquid ice. And you sit there and you're like, all right, awesome lyric. And then you're like, wait, liquid ice? Like water? Is it really? You know what I mean? Is the evil wizard actually trapped in there? So I don't know. They just have a lot of fun with it. And I think it kind of captures that feeling of a really good fantasy show, or a really good video game, in terms of just turning epicness as a concept up to 11. And it's just a fun listen. speaker-0 (37:54) I am not the biggest music buff. Back when it was still a thing for me, honestly, it was like Nightwish, or Dethklok from Metalocalypse. You know, the weird songs that definitely hit the fantasy angle. I don't know what power metal means though. What is that as a genre? speaker-1 (38:18) That's... I'm sure somebody else could say it better than me, but... speaker-0 (38:23) We'll get a link to it then in the episode description. speaker-1 (38:27) I guess what I would say is it tends to introduce a lot of symphonic elements into it. They'll have organs or choirs. They'll have, obviously, the traditional guitars and all that jazz. But a lot of the artists, I think, really lean very heavily into fantasy or sci-fi. And as compared to normal metal, where there's a bit of a focus on just how it sounds, the power metal artists that I've really gotten into kind of have themes to them, I guess you could say.
So for example, there's another band that I've come to enjoy called Wind Rose. You may have heard them or heard some of their songs, but basically imagine Lord of the Rings dwarves playing metal music about being dwarves. And there's another band called Powerwolf. They kind of do the same thing, not for vampires, but for werewolves. And so they just lean really hard into these kind of thematic elements. It's fun. I don't really go in for the death metal or anything, but something that channels that fantasy vibe when I'm working is really nice. speaker-0 (39:37) Okay, so Ian's recommendation will be in the podcast description for the episode. speaker-1 (39:41) Check out the description. speaker-0 (39:43) Okay, I think this is a great pick. So thanks for that. I guess mine's lame in comparison. A couple of weeks ago, I had a pick about why archers don't fire volleys. It's not a real thing, never happened, never will happen. Every reenactment in any movie or television show is absolutely wrong. But on the same collection website, there is an article on battle logistics and how armies actually did it. I love this collection of articles about the Roman age, specifically warfare. Going to war meant literally moving your city into combat. That is the way that I would describe it. If you go to combat, you pretty much have to take all the people, whatever they're doing, not just the combatants, into combat and move them. And how would you manage traveling as a city together? There are lots of non-combatants, and everything people need to survive. It's still true in any sort of war zone. Nothing really changes. And it's just really interesting to see how the author talks about it, and what realistically actually happens in these scenarios. It's this area in a lot of popular culture that you just don't see movies or television shows really ever get into at all.
speaker-1 (40:57) Yeah, I suppose it's kind of the, what makes for good cinema, right? Nobody wants to see a bunch of people cooking and repairing things. speaker-0 (41:07) I mean, maybe they do sort of get at this a little bit. Like, I'm not a huge Game of Thrones fan, but there are a lot of scenes of the armies moving, and you do see what happens at night within the guard camps or whatever, and stuff that's going on there. And you can sort of extrapolate: well, if this is happening, these other things are probably happening. But it makes a lot of good points. Like, where do you get food from? And how do you transport it from where they are to where they're going, if you're going on a three-month journey? How much food can you carry on your person? And then you have this issue of, well, I guess you can have some animals pull a cart with food on it, but how much food do the animals need to consume in order to move the cart? And so there's a maximum limit that you can actually have. Same thing goes for, you know, what happens when your armor breaks? If you're wearing some sort of armor, who fixes that? Do you bring blacksmiths with you? But what does a blacksmith need to survive? Well, they also need giant forges for their profession. And obviously they need food, too, and working materials. And where do they get that from? And what do they do when everyone's in combat? Do they just sit there waiting? Do they just work all the time? They probably need rest and relaxation. So I know this feels a bit pedantic, but the collection is actually called A Collection of Unmitigated Pedantry. And the thing that I really like about it, as anyone who knows me will attest, is that I find pedantry exceptionally interesting, especially when gone into in extreme detail. It's where the nuance lives.
speaker-1 (42:37) It makes me curious, now that you mention it, because whenever you see sieges portrayed in fiction, it's kind of taken for granted that the people trying to trap everyone inside a city, or inside a castle, or what have you, have all the time in the world. But I suppose what you're mentioning here does sort of raise the question of how they sustain that, right? speaker-0 (42:58) So this is where you can imagine realistically, and I don't know if he talks about this one, but for a siege, you need to keep the people performing the siege supplied. And so you can imagine a supply line is a real thing that has to be managed. How far away are they from their city when they came there, and how long can they actually lay siege? You can imagine if you want to take a walled or defended city, you don't need to actually bring the walls down. You can just destroy all of the supply lines that go into the city, and eventually, this is a war of attrition, you'll win. And so you sort of have this strategy. And this is where alliances are born, because, if we're being attacked, where is our army? Where are we getting goods from to sustain the population of that city? But yeah, it goes the other way too, right? Like, where do you get the arrows, and whatever sledgehammers or battering rams, et cetera, in order to even do the attack in the first place? So you need to sustain that part of the army as well. If you're very far away from your source location, well, you know, good luck there. And I think that's where the term living off the land comes from. speaker-1 (44:04) Well, it sounds like you've got a really good candidate for a Steam game, right? For the kind of people that like Farming Simulator or that sort of thing. I think Supply Line Manager 1800 might be a big hit. speaker-0 (44:19) You know what? At this point, I'm sure it is already a game.
Maybe there is something out there, and I can just go and find it. But I think it'd be interesting to play, and more interesting to sort of understand the challenges. Now, maybe someone else is more into games than me. I know I'm a self-proclaimed gamer, but I think in reality, I only play a very few puzzle games. Well, thank you, Ian, for coming on for this episode of Adventures in DevOps. I absolutely have loved the discussion on everything CI/CD, and thanks to the listeners for tuning in to this episode. Hopefully we'll see everyone back next week.