Warren Parad (00:00) Welcome back to Adventures in DevOps. Every episode is a deep dive with an expert guest. Today's adventure focuses on heterogeneous compute and auto scaling for data pipelines. The expert had a huge impact on Android's early days at Google and is currently the head of engineering at Anyscale. Welcome to the show, Jaikumar Ganesh, JK.

Jaikumar (JK) (00:18) So excited to have this conversation. Thanks for the warm intro, Warren.

Warren Parad (00:21) You have a different paradigm for how you need to execute a particular program or code. It's not like, historically, we can say containers are terrible as the unit of a worker or job that needs to be done, which is realistically what Kubernetes uses. But especially in the ML space, it may not map one-to-one with what should be optimized. So we need a different paradigm.

Jaikumar (JK) (00:35) Yeah. You have fast iteration loops, training loops, reinforcement learning loops; you need much finer-grained control. And that's how Ray came to be.

Warren Parad (00:51) One interesting aspect is it does seem similar to the idea of having serverless running on top of container orchestrators, just for the ML world. And realistically the interface isn't the terrible OCI standard we have for deciding how to interact, specifying how much virtual memory and how big the containers are, et cetera, and having to depend on everything and be specific about it. But really, if you look at AWS's Lambda or GCP's, you know, Cloud Run functions, you are defining memory usage and size, essentially. And I can understand that there are some things that you would have to configure for Ray, but the end goal realistically is to remove that complexity from someone who's gonna be running an ML job.

Jaikumar (JK) (01:22) Yeah, yeah, yeah. Yes, you're spot on. There's also some complexity as to where the data lives. So there are some people who are fine with serverless, and there are some who say, no, no, no, I have my own AWS or Azure contract, so I want you to run in my environment. We have seen a lot of customers do that too.

Warren Parad (01:46) Yeah. Well, I think Lambda was, I'm going to get this number wrong, but I think it was 2014. And now, more than a decade later, they're finally potentially releasing improvements on top of that that may actually give you fine-grained control. So I think one of the aspects here is, I think you're totally right, that at the end of the day, you need to not only control the underlying layer wherever you're running, and it doesn't matter if you're running Kubernetes on-prem or on a cloud provider, it's not the right interface for distributing control over that to individual workloads that are being run. And so you're installing a platform on top of that.

Jaikumar (JK) (02:23) At the end of the day, all developers honestly care about time to market, how quickly they can get their solution, and what is the...

Warren Parad (02:31) Well, I don't actually know if ML developers care about those things. That's true. Yeah, no, I mean, I totally agree that it's definitely 100% a business concern, and it's better when ML developers, or really engineers in general, understand what their constraints are for what they're creating, because they're the ones at the end of the day who have to make the determination of what technology to pull into their stack to solve their specific problems and handle the business needs. I think in practice we...
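For readers who want a concrete picture of the interface difference Warren is gesturing at: with Ray, the unit of work is a Python function that carries its own resource request with it, closer in spirit to a Lambda memory setting than to an OCI container spec. A minimal sketch; the function, data, and resource numbers are illustrative, not from the episode:

```python
import ray

ray.init()  # attaches to an existing cluster, or starts a local one

# The resource request travels with the function itself; Ray's scheduler
# finds a node with enough free CPUs and memory to place each invocation.
@ray.remote(num_cpus=2, memory=4 * 1024**3)  # 4 GiB, illustrative numbers
def preprocess(shard):
    # Placeholder for real preprocessing work.
    return sum(shard)

shards = [[1, 2, 3], [4, 5, 6]]
futures = [preprocess.remote(s) for s in shards]  # schedule tasks
print(ray.get(futures))  # [6, 15]
```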
Jaikumar (JK) (02:33) Well, the business definitely cares about it, and so it passes on to the developers.

Warren Parad (02:58) We've seen a lot of companies historically just give an unlimited budget to quote-unquote data science teams and let them go wild. And the result is a complete mess of Python packages and code that no software engineer would ever approve of.

Jaikumar (JK) (03:01) Yeah. Yeah. Yeah, true. That's true. That used to happen a lot more maybe 7, 8 years back. And then software engineering practices came to the ML world, and many software engineers have become ML engineers. So you've actually seen that change a lot too, with models being checked into source control. I remember 7, 8 years back, people were not checking in models. They were like, this is an iteration, here's my Google Doc. And no one else could reproduce the results, because the features used were not documented well. So that has changed quite a bit.

Warren Parad (03:41) I have so many horrifying stories from my past. Like when I worked at one aerospace company, they were, for sure, taking whatever the files were, wrapping them up into a zip archive, and uploading it to Confluence with the version number there, as their source control. And that wasn't even that long ago. And that was pure software engineering, but I know where those engineers came from, so it makes a lot of sense. I do have to ask, who is building models today still? Like, I do see there is this aspect of

Jaikumar (JK) (04:00) Mm-hmm.

Warren Parad (04:09) a fixed small number of companies that are making the state-of-the-art models, we'll call them foundational models. And it seems like a small set. And there was a little bit of time where this idea of fine-tuning could be considered. But realistically, I feel like that's been eliminated, unfortunately, in a way, because fine-tuning is expensive, and it also couldn't keep up with the improvements to models that were being created by larger companies. So I think my question really is, what specific problems

Jaikumar (JK) (04:12) Yeah. Yep. Yeah, yeah.

Warren Parad (04:35) do these companies have that they're turning to Ray and Anyscale for?

Jaikumar (JK) (04:39) Yeah. You have your Anthropics and OpenAIs and your Googles creating the foundation models. And then there are certain other companies which create the next tier of foundation models for their specific vertical, whether it's health, whether it's finance, et cetera. And yes, fine-tuning and RAG-based systems were popular. RAG is still popular; fine-tuning has gone down a bit. And, you know, now with agents, context engineering is more key: that you actually provide the right context from your internal systems to the models. Where we are seeing a lot of users is, you know, many times it's easy to get started with your OpenAI model when you're a startup. But as your business scales, or if you're already a scaled digital-native business, you don't want to be fully dependent on Anthropic and OpenAI; you have your own data sets there. So what we are seeing is that people are reading large amounts of data and then creating embeddings out of it. And then they want to serve those embeddings.

Warren Parad (05:31) These are scenarios where, I think in a lot of them, it really depends on having a lot of data. What is the go-to mechanism here for the capacity for storage and then interacting with the platform?
Do you see that a lot of these companies are running Kubernetes, and they're running large, I don't know, Postgres instances directly on their Kubernetes clusters, and you're sourcing the data from there when they're executing? Or are they using some sort of cloud provider and the data is being stored in that mechanism? Are we seeing on-prem versus cloud instances? Where are most of the companies building their models today?

Jaikumar (JK) (06:02) Yeah. S3 storage, some with Databricks, and some where we read the data from wherever the data is stored, so the data does not leave their environment. On-prem, yes, there are some on-prem customers. But then they're on-prem for a reason, and they want a lot more...

Warren Parad (06:22) One thing I do struggle with a bit here is understanding specifically when to use a CPU versus switching off to a GPU. And I think you're the right person to answer this question for me.

Jaikumar (JK) (06:30) Yeah, yeah, yeah. Yeah, so let's take an example. Suppose you have... podcasts, a bunch of podcasts and newsletters. And then you want to have a simple search interface which says, I am interested in a topic, say, growth hacking. And it should go pick the right newsletter, the right transcript, and take you exactly to that spot. So you have this corpus of data, and you now want to read that data. Usually the reading happens on the CPU. Now you need to create chunks of this data, and then you need to run an embedding model over this data. That is a GPU-efficient process; you use GPUs for that. And then you want to write the results somewhere, which is, again, a CPU process. So your standard pipeline is like CPU, then GPUs, maybe two GPUs, one for, say, chunking, another GPU for segmentation or an embedding model, and then another CPU for actually writing it. And once you have written these results, you need to serve them. And now for serving the results, again, you need to read them back. If it's a two-stage pipeline, then you read it back from disk. You can obviously stream it too, but let's keep it simple. And you're reading it back on the CPU. And now you need to use an LLM inference provider; that's also another GPU. So you need to have a GPU for the LLM inference provider so that you can serve it on a page, and people can type queries and you can stream in the results. Usually what happens in such pipelines is that when you're reading images, your GPU is just sitting there waiting for all these videos, images, and newsletters to actually be read. And if you have a massive corpus, that's a good amount of time it's actually waiting. And you need to keep running this pipeline again and again.

Warren Parad (08:08) Yeah.

Jaikumar (JK) (08:11) So as the first image is read, it is sent to the GPU. And once that GPU has done the chunking, it's sent to the second GPU for the embedding model, and then it's written out by the CPU. And meanwhile you can keep reading the next set of data, the next set of data, kind of stuff. So the overall utilization of this pipeline is much more efficient: the GPUs' utilization is higher, and the time it takes for the entire processing is much reduced.

Warren Parad (08:33) How do you know that some code that's being executed is the inference, or some code that's executing is going to be image generation, so that scheduling it on a GPU-specific container that has access to the GPUs from the underlying Kubernetes cluster is being used versus one that's optimized for compute?
And maybe the underlying containers are all the same, but the real trouble is, what you're doing is figuring out what capacity is still available and saying, let's schedule there. But you still have to know how the underlying program decides what is actually necessary.

Jaikumar (JK) (09:00) Yeah, yeah. This is where a little bit of the ML developer's work comes in, right? They say, hey, yes, you can specify some compute; say, use GPUs. So I don't care about the numbers; I know this requires GPUs, use GPUs, right? And the rest is taken care of. Now, if the system only provides CPUs, then Ray can't do anything, right? So we do need some amount of hint

Warren Parad (09:19) Right, yeah.

Jaikumar (JK) (09:28) from the user saying, use GPUs, kind of stuff.

Warren Parad (09:31) It makes sense. The only alternative I can think of is that somehow you would be collecting the actual code that would be executed, hashing it, storing that hash, and seeing how it dynamically performs under different utilization curves, like how many GPUs.

Jaikumar (JK) (09:43) So we can figure out, hey, the CPU is really busy right now, so we should add a GPU to the system. We can actually do all those interesting things. Some of that is in the works, and some of that is in our plans to continue doing.

Warren Parad (09:56) Yeah, for sure. So one question I have here is that my knowledge from 20 years ago said that if I needed serialized work, use a CPU, and if it's very parallelizable, use a GPU. But obviously with hyper-threading and multiple cores, that statement went out the window. And I actually haven't followed GPU architecture in a long time. So is this still an accurate statement, or are there specific things that GPUs have been optimized for to actually be able to handle in some specific way?

Jaikumar (JK) (10:22) Yeah, I think in the ML world, GPUs have been optimized a lot for the transformer models. And you know, you keep getting new generations of GPUs: GPU-to-GPU transfer speed has increased, and your interconnects are becoming a lot more efficient, so transfers between nodes don't have to go through the CPU. So in the last three, four years, a lot of the GPU architecture has focused on making weight transfer efficient. The ML world, the post-training world, et cetera, are much more efficient, which was probably not the case back in the graphics days when GPUs started. Then they started going for the crypto world, and now it's the ML world.

Warren Parad (10:59) Well, there was this strategy with graphics where it's, you know, maybe you can just render this part of the screen, and so it's very easy to break that down. And for the crypto world, of course, it doesn't matter if you have any sort of alignment on what's being processed, because realistically it was all random in the proof-of-work world. Pull a random number from who cares, then calculate and see if it's a useful result, and if so, you know, great. You don't have any coordination required there.

Jaikumar (JK) (11:18) Yeah, yeah. Yeah.

Warren Parad (11:25) But now we're definitely at the point where it is required. So it is an interesting insight, basically, that one of the things that has been significantly improved in the last few years is the ability to scale up the coordination between individual GPUs. And I'm totally with you, who needs more than a couple of them max for personal usage; but for commercial strategies, stacking them in parallel, they need to collaborate in some way.
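JK's podcast-search walkthrough above maps fairly directly onto a streaming pipeline API such as Ray Data's, where each stage declares its own resource needs and batches flow through all stages concurrently, keeping the GPUs busy while the CPUs read. A minimal sketch, assuming the sentence-transformers package and Ray 2.9+; the bucket paths, model name, chunk size, and worker counts are all illustrative, not details from the episode:

```python
import ray

# Stage 1 (CPU): read the raw corpus. Paths are placeholders.
ds = ray.data.read_text("s3://my-bucket/newsletters/")

# Stage 2 (CPU): split each document into fixed-size chunks.
# A trivial chunker stands in for a real one.
def chunk(batch):
    chunks = []
    for doc in batch["text"]:
        chunks.extend(doc[i:i + 512] for i in range(0, len(doc), 512))
    return {"chunk": chunks}

ds = ds.map_batches(chunk, batch_size=64)

# Stage 3 (GPU): embed each chunk. The class is instantiated once per
# actor, so the model is loaded onto its GPU a single time.
class Embedder:
    def __init__(self):
        from sentence_transformers import SentenceTransformer
        self.model = SentenceTransformer("all-MiniLM-L6-v2", device="cuda")

    def __call__(self, batch):
        batch["embedding"] = self.model.encode(list(batch["chunk"]))
        return batch

# concurrency=2 runs two GPU actors; num_gpus=1 reserves a GPU for each.
ds = ds.map_batches(Embedder, concurrency=2, num_gpus=1, batch_size=32)

# Stage 4 (CPU): persist the embeddings. Batches stream through all four
# stages, so the GPUs stay busy while new files are still being read.
ds.write_parquet("s3://my-bucket/embeddings/")
```

Note the `num_gpus=1` on the embedding stage: that is exactly the kind of user-supplied hint JK describes, since the scheduler cannot infer on its own that a given stage wants a GPU.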
So the old style, on the CPU-and-memory side, was something like a Beowulf cluster. Now, obviously, we need something much more complex, and having the companies who make the technology actually care about this use case means that they are investing and actually trying to improve it. So that's an interesting insight; I wouldn't have guessed that's how they're actually improving it. One thing that comes to mind with the CPU/GPU breakdown is I feel like there has been this...

Jaikumar (JK) (11:57) Yep. Yeah.

Warren Parad (12:11) hypothetical that will break through this unnecessary aspect where we have two completely separate pieces of technology which sort of do similar things and are both useful in computing. I remember there being lots of releases like, no, you will only ever need a GPU going forward. Are we missing a hardware primitive that can do everything? I know there's a lot of talk of these hypothetical AI chips, but I don't think it's actually a thing that

Jaikumar (JK) (12:36) Yeah.

Warren Parad (12:39) really exists, so much as we understand different requirements for how the hardware needs to process things and have dedicated logical units for actually executing those specific areas. Maybe, for instance, in the cryptography world, performing a particular hash or signature function is optimized all the way down at the CPU level. So my question is, where do you see this going? Is it that we are going to keep on living in this world where some companies produce CPUs and some companies produce GPUs,

Jaikumar (JK) (12:59) Yeah.

Warren Parad (13:05) and everyone has to pay a ton of money for both of these things? Or is some company gonna come out there and be like, you only need one of these things, and it's not just a secret CPU-plus-GPU combination, but really something that is innovation when it comes to the hardware in computers?

Jaikumar (JK) (13:19) I think Nvidia just announced something recently called the Vera chips, where the CPU and GPU are in a single chip, kind of stuff. I see it more in that frame of reference, where CPU and GPU get integrated. I see some parallels to what used to happen, say, in the Bluetooth and Wi-Fi world on mobile phones. Bluetooth was a separate chip, Wi-Fi was a separate chip, and GPS was a separate chip. And then Broadcom and Qualcomm started integrating it all in a single chip. But each one had their own use case.

Warren Parad (13:37) Hmm. Okay.

Jaikumar (JK) (13:49) That started causing problems, because Bluetooth and Wi-Fi were on the same frequency. And so many times, just getting out of your car as your phone connects to the home Wi-Fi, the audio in your car will drop off for a second because of the interference. So

Warren Parad (13:53) Yeah, for sure.

Jaikumar (JK) (14:03) there is chip-to-chip connectivity in these newer systems, the integrated CPU-and-GPU chips, kind of stuff, right? So you will actually get much higher performance.

Warren Parad (14:06) Yeah. That's good.

Jaikumar (JK) (14:15) Though I do think the bigger problem that's coming up is just the shortage of electricity for these data centers.

Warren Parad (14:20) I hope this is a turnaround for green energy production strategies: fission reactors getting reinvigorated use, building up fusion reactors, given we know how bad wind and solar are. I actually think they're just gonna keep on burning more gas. We talked with a couple of data center owners in the past, and they told us there are lots of energy reservations still available.
I think the...

Jaikumar (JK) (14:25) Mm-hmm. Mm-hmm, mm-hmm.

Warren Parad (14:45) The goal of getting cheaper energy is just a ploy, like, we're gonna run out, you have to make it cheaper for us. Those things aren't connected for me. I can totally believe at some point it's going to be an issue, but the way I see it, and maybe it's super pessimistic, us lowly individual humans still have lights on in our homes and electricity working for our refrigerators and dishwashers, assuming you have those. And until that electricity has been commandeered by the hyperscalers, there's still available capacity left.

Jaikumar (JK) (15:14) Yeah, yeah, yeah, that's true.

Warren Parad (15:15) I don't understand, and maybe this is my lack of knowledge here: GPUs, why are they so much more expensive than CPUs? And maybe that's not even a true statement, but that's my understanding. And like, I've seen the insides of the clean rooms for manufacturing CPUs, and the technology always seems quite amazing. It's very, you know, precision manufacturing and everything. And I don't remember the last time I saw a video of

Jaikumar (JK) (15:29) Mm-hmm. Mm-hmm, mm-hmm.

Warren Parad (15:40) the actual manufacturing of GPUs. And maybe it's because it's a closely guarded secret that, you know, ATI and Nvidia have been keeping and just haven't shared, or maybe it is quite a bit more spectacular.

Jaikumar (JK) (15:46) Yeah. Yeah, yeah, yeah, yeah. It's a good question. So one of the reasons they are more expensive is just that the dies,

Warren Parad (15:58) Silicon dies,

Jaikumar (JK) (15:59) silicon dies are much larger for a GPU than a CPU. I think it's at least 10x larger. And that makes it harder to manufacture; if there is a single defect, then there is a problem. And there are also the integration components, right? Like, if you have a GPU machine, as a result of which...

Warren Parad (16:02) Interesting. Okay. Yeah, for sure. Okay.

Jaikumar (JK) (16:21) there is specialized VRAM, you need cooling for it. So you've got a bunch of these components which are very specific to the GPUs. But this would be a question for someone who actually does the manufacturing, to see, are the...

Warren Parad (16:33) Hint, hint, wink, wink for anyone who has the answer to this question.

Jaikumar (JK) (16:37) Yeah, so

Warren Parad (16:38) Okay, no, it's just, I've always seen the die manufacturing on the wafers, and, you know, the dies are cut specifically and they're in a small form factor. And I think one of the challenges here, and I think it's a similar fundamental challenge in the quantum computing space, is you can't just make it fundamentally bigger. An innovation there won't help a lot, because you still have to get data from one point of the chip to a different point of the chip, and size matters. So there are things that prevent it from being bigger.

Jaikumar (JK) (16:54) Mm-hmm. Yep, yep.

Warren Parad (17:05) And maybe, talking about challenges, just to focus back on the space where you're more of the expert: I am sort of curious. You've built up this platform and the open source libraries. There have got to be some fundamental challenges that you and your team have faced in actually spinning up

Jaikumar (JK) (17:20) It's actually a real thing, because many open source packages are libraries, and we have multiple libraries plus the core of the platform.
And so the core of the platform has to work on different compute paradigms. It has to work on Nvidia chips, ARM chips, Intel chips, Kubernetes, VMs, Azure's AKS, Google's GKE, Amazon's EKS, and now CoreWeave and all those things. And we talked about GPUs; each of the GPUs has different characteristics, kind of stuff, right? And that's just the core infrastructure itself. Then you've got your Python libraries, and you have to actually make sure everything works on different CUDA versions, works on different Python versions, works with the right NumPy library, the right TensorFlow library, the right PyTorch library, right? So just making this whole ecosystem of Python packaging work is really a lot of work in open source, more than I think people realize. It's easy to release a package; just making sure it works in every single case, and keeps working across different versions in different environments, is the hard part. There'll be some developer who has some custom environment, will change something, and it'll not work for them, and they'll say, hey, this is broken. And we're like, we don't know why. And then we spend time investigating it. And there's Conda, conda-forge, and all these packages. So yes, the open source requires investment. And honestly, as a startup, it's hard.

Warren Parad (18:38) And it's not just that it doesn't work, right? Because that's almost an easy problem to identify. Someone's like, I'm using version one of this, two of that, three of that, and it doesn't work. You're like, okay, I can reproduce that problem. But then I think the real challenge is, we're offering an optimization solution for capacity utilization. So it's not just, it doesn't work on this set. It works, but it's not as big of a reduction as we want it to be. And then you're like,

Jaikumar (JK) (18:48) Yeah. Yeah, yeah, yeah. Yeah. Yeah, yeah, yeah.

Warren Parad (19:05) how big should it actually be here? So it sounds like you have to basically be continually performing benchmarks on the combinatorial nature, permutations really, of all the sets of different things that you could be utilizing.

Jaikumar (JK) (19:06) Yeah, yeah, yeah. I mean, from an operational process perspective, there's a bunch of regular release testing on various cloud providers, various combinations that are actually out there. And purely from an operational perspective, the release team becomes extremely important, because that's the team which is responsible for making sure these packages keep working, all the time, in all these combinations. So we've actually got a very strong release team, percentage-wise a higher proportion than usual. And that is one part of the puzzle. The second part of the puzzle is, we are still a startup, to be honest, so we need to make sure we are focused. And so we work closely with the customers. And customers also have their own unique environments, saying, hey, this networking setup is my layer-seven stuff, I need to connect over here. Or, my DNS is having this issue, so I need to go talk to my security team. My security team will say, what is this, Ray? I don't care about you wanting to do XYZ; this is my policy, adhere to it. And so

Warren Parad (20:12) No, I totally get it.
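The combinatorial release testing JK describes is commonly automated as a parametrized test matrix. A minimal sketch using nox, one plausible tool choice rather than anything Anyscale has confirmed using; the version pins and the test path are invented for illustration:

```python
# noxfile.py -- run `nox` to test each Python x NumPy x PyTorch combination
import nox

PYTHONS = ["3.10", "3.11", "3.12"]   # illustrative version pins
NUMPYS = ["1.26.4", "2.1.0"]
TORCHES = ["2.3.1", "2.4.0"]

@nox.session(python=PYTHONS)
@nox.parametrize("numpy_version", NUMPYS)
@nox.parametrize("torch_version", TORCHES)
def tests(session, numpy_version, torch_version):
    # Each session gets a fresh virtualenv, so a bad pin in one
    # combination can't mask a failure in another.
    session.install(f"numpy=={numpy_version}", f"torch=={torch_version}")
    session.install("-e", ".")  # the package under test
    session.run("pytest", "tests/")
```

Even this small matrix expands to 12 isolated environments, which is one way to see why a dedicated release team, and the eye for automation JK mentions next, matters.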
I think something that is often left unsaid is that the company you want to build, and specifically the product you want to make, is very dependent on the people you hire. If you have a particular mindset you're going after, and it is about, say, reliability or consistency in what you're pushing out, it sounds like you pay very close attention to how your release team is built. And so my question for you, maybe: any special tricks, or anything specific you're looking for when you're hiring into the release team?

Jaikumar (JK) (20:21) That's true. Yep, yeah. Yeah, honestly, it just depends upon someone who has a true passion for this work. It's very hard to find people who love this kind of work; you need at least one or two such people. To be fair, it's actually a lot of grunt work. And so you need to have an eye for automation. And especially now with all the coding tools,

Warren Parad (20:49) Yeah.

Jaikumar (JK) (20:57) some parts of it become easier, but you have to understand the full space and say, what are the ways we can actually automate these things? How should we actually release these things? All that kind of stuff. Because people who are not closely in the details don't see it. They don't see the complexity that's actually required. They don't see the one developer who is complaining. And they don't see, like, hey, this particular test is failing, or we cannot release. They don't see the actual grunt work that's happening, right? So I think this is where

Warren Parad (21:12) Yeah. Hmm.

Jaikumar (JK) (21:28) it becomes the work of the leader to make sure their work is highlighted.

Warren Parad (21:32) My CEO frequently says, especially for these sorts of teams, when everything is going right and the teams are working effectively, they're hiding everything from the business; basically, from the outside, it seems like nothing's happening. And so you're tempted to make a change, to make it more splashy, or...

Jaikumar (JK) (21:47) Yeah. Yeah.

Warren Parad (21:50) It almost would be great if things failed once in a while, so that the team could get recognition for the work they're doing. But I definitely agree, it is a huge challenge to find those people who want to do that work. I mean, you called it grunt work, or maybe thankless, but I'll say, with 8 billion people in the world, statistically there are definitely some people who absolutely love that work.

Jaikumar (JK) (21:52) Yeah. There definitely are such people. Without them, the companies and software packages don't exist.

Warren Parad (22:13) So maybe I'll ask you about that, and you can feel free to say no comment. Every company needs to go through hiring at some point. And in today's world, I think that is an insurmountable challenge for many organizations, given the sheer volume of candidate submissions you get. How have you been tackling that, not only to find qualified candidates, but given the nuance required to operate in these sorts of special teams that have such a huge impact on the business? That sort of thing doesn't just show up straight away on a resume that can be easily filtered.

Jaikumar (JK) (22:45) I think it all depends on the stage of the company. Like two years back, when we were in a real scaling stage, we hired a lot. Now we are not hiring that many. And say, suppose you are on core Ray, then the expertise that is required is different. So we look for people who have deep systems knowledge.
A lot of our hiring is based on referrals. People kind of know the popular open source packages. This is where open source is really beneficial, because many companies use Ray. So they come and apply saying, hey, I've already used Ray, I met you at your conference, and stuff like that. But then there are also some gems which come in cold: outbound, recruiting, et cetera. You just sometimes have to think creatively. So, we were doing international growth at Uber, growing Uber's business outside the US, in Latin America, Southeast Asia, India, China, et cetera. And when we were growing the China and India growth teams here in San Francisco, we were looking for people who had lived in those areas, because then they understood the problems on the ground. Uber is very much a physical thing; you need to understand the local place, right? We had operations teams. So we were targeting that. And the way we did it is, we actually put an ad for hiring engineers before an Indian-language and a Chinese-language movie at the AMC theater. We went to a festival and we put up a hiring booth. All our nearby booths were food stalls, we were the only hiring booth, and we actually got good candidates.

Warren Parad (23:45) Yeah. Wow.

Jaikumar (JK) (24:04) So what I meant is, sometimes you just have to be creative in that sense.

Warren Parad (24:08) No, I mean, that's genius, honestly. I think it's the same research that explains why hiring diverse teams actually gets you more talented people. I mean, it's still an aspect of diversity, but if none of your competitors, or even anyone in the whole industry, is hiring from a particular venue, you being there gives you a statistically outsized chance, not just of finding people, but of finding exceptionally good people who would not have gotten picked up, because they're not using whatever,

Jaikumar (JK) (24:17) Mm-hmm.

Warren Parad (24:36) LinkedIn, social media, or other mechanisms. Because if they had, they would have seen jobs from your competitors and taken those. So I love that example. I mean, I don't know if I would go through the process of buying an ad before a movie, but if that's you, if there's an adjacency there, I totally see the overlap. Seems like a genius idea, honestly.

Jaikumar (JK) (24:41) Mm-hmm. Mm-hmm. Yep. Yep. Yeah, exactly.

Warren Parad (24:57) I would be remiss if I didn't harass you a little bit on something related to Android, given your historical experience there. And my question is going to be, I think: did it have to be Java?

Jaikumar (JK) (25:03) Mm-hmm. Did it have to be Java? I mean, it did not. I think, if I remember right, initially it was JavaScript, and then we moved to a Java-based VM at that point in time. Now even Kotlin is supported. Do you have some kind of a deep dislike for Java? It looks like it.

Warren Parad (25:25) Since university, a long time ago. If I had to pick between submitting myself to the Oracle-derived mindset and language of the world, or Microsoft's, I think unfortunately I'd pick Microsoft. Still not my preference for coding languages, for sure, but I do like C# way more than Java.

Jaikumar (JK) (25:40) Yeah. Yeah. Yeah. Yes. I mean, there were a lot of controversies between Google and Oracle over that, you know, because Android had created its own VM. I was not part of the VM team, but my teammates were deeply involved in the lawsuit, et cetera. Yeah.

Warren Parad (25:56) Wow.
Yeah, so I love this lawsuit, because the outcome was so ridiculous: the API interface is copyrightable, but Google didn't violate the copyright, because its use was ruled fair. That is just so ridiculous to me. So yeah, I think... so does that mean that you're a Java fan? You love the JVM?

Jaikumar (JK) (26:05) Mm-hmm. Mm-hmm. I think every language has a purpose for its use case. So for example, Go was used in large-scale distributed systems for a reason. And there is a reason why Python is used in the machine learning world: the number of library packages, data scientists and engineers being comfortable with Python, and it's just easy to get started with. I mean, if you were to ask me my favorite language, I would actually pick C, because that's what I grew up learning, and I still love it.

Warren Parad (26:26) Okay, okay, good answer. Mm-hmm. Yeah. I don't know what that is.

Jaikumar (JK) (26:47) But you know, maybe that's a diplomatic answer. I'm also not one of those language zealots who says, everything else is terrible, my language is the best, kind of stuff. Honestly, I think all the programming language conversations will just go away. With AI coding agents, natural language is the key.

Warren Parad (26:58) Yeah. Yeah. We should have a fight.

Jaikumar (JK) (27:09) The long pole has moved from pure engineering to what to build.

Warren Parad (27:15) See, I don't think the long pole was ever in engineering. I think it's always been in what to build. I just think historically we sort of lied to ourselves that we knew what to build and pushed it on engineering. And then, since the cycle time was so long for engineering to build stuff, we could wait to verify our hypothesis that we didn't spend any time thinking about. But now that excuse has gone away.

Jaikumar (JK) (27:38) Yeah, you're spot on there. You're spot on there. I don't disagree with that.

Warren Parad (27:41) Yeah. Now it's like, crap, we built that? I guess I have to take the next step now. So...

Jaikumar (JK) (27:45) Yeah, it's true. I mean, yes, yes, I don't disagree.

Warren Parad (27:48) I like your diplomatic answer, because I think there are a couple of different aspects to it. One of them is that fundamentally each language has its deficiencies, and maybe also its benefits, so matching it up with the use case is required. I disagree with the train of thought that all the languages are completely isomorphic or interchangeable, because I do see, like you said, there's no alternative to Python for machine learning, because you needed something on top of R to do all of the quantitative analysis, and so you got the interface with Python, and that's grown over time. It's really only in the last couple of years, since ChatGPT, that we started to see other interfaces pop up for other languages. But no, I'm totally with you there. So I like that flavor. I think that's

Jaikumar (JK) (28:36) Engineering leaders need to think beyond engineering, too. I think it becomes a little bit of the onus on the engineering leaders to make sure your sales team, your customer support team, your other teams are also using AI and getting the benefits out of AI, and how that is actually set up. Many times you have to hold training sessions for them.
Many times groups can get siloed, and especially with AI and agents and the state of the company, leaders can actually play a bigger role than just being stuck in their bubble.

Warren Parad (29:02) I think you said something interesting here, which is, it sounds like we're shifting the responsibility of building back to where the decision can be made, where the knowledge is. So if it's the sales team or the marketing team that wants to build something, they now not only have the responsibility but sort of the obligation to make that happen. And so my question is going to be, how do you actually train them to build software in the reliable way that we believe has been the lifeblood of engineering for such a long time? I mean, I know there are so many engineers out there still who believe, no, no, no one else can do it exactly as I can; I know how to do the special thing myself, and no one else can really make that happen.

Jaikumar (JK) (29:32) Yeah. Yeah, yeah, yeah, yeah, yeah. Yeah, it's a very good conversation, because just two days back, my security engineering leader came in and said, hey, this person in sales used $15,000 worth of tokens. And I sat down with them, and they just didn't have to send the whole context; they could have done the whole thing for like $15, kind of stuff. Building maintainable systems is actually an art, and this experience matters, and agents will not necessarily give you that. So sales, marketing, they can have agents to improve their workflow, make it much more efficient, kind of stuff. But then you also have to be very, very clear: if there is something they're building in production, who is responsible for it? Many times it's, I built it, it's broken, please, can you help me? And then you're like,

Warren Parad (30:00) I like that framing.

Jaikumar (JK) (30:18) why did you build it without talking to us, and all of that stuff? But there are also platforms which are created to help solve these problems, which make it easy to develop agent software without having to worry about the infrastructure, et cetera, kind of thing.

Warren Parad (30:34) I think I'm gonna skip ahead a little bit, and I really like your framing. I think the worry is that we'll eventually lose the engineering team with this mentality; there will be no one to double-check what is happening elsewhere. Because if everyone can build things, what is engineering really doing for us? Maybe we'll change the name; maybe we'll just lose that reliability that we have in our organizations. But the idea that comes to mind, really, based on what you're saying, is it sounds like we need to take the original idea of what DevOps meant,

Jaikumar (JK) (30:36) Mm-hmm. Mm-hmm.

Warren Parad (31:00) breaking down the silo between engineering and release basically, and have everyone in the organization understand really fundamentally what DevOps is: that if you write this thing, you will run it. One struggle I could imagine is that the mindset of people who haven't historically built software wasn't necessarily on reliability, and teaching them reliability,

Jaikumar (JK) (31:07) Yeah, yeah, yeah, yeah, yeah.

Warren Parad (31:21) is it easy, or could it be a challenge? Do you even have the right people in those positions to do both the job they have been doing, sales and marketing or whatever, design, et cetera, literally anything other than engineering, and also be responsible for the reliability work?

Jaikumar (JK) (31:25) Yeah. Yeah.
I don't think we should teach them reliability, to be honest, because with reliability there's a teaching part and there's experience. Teaching is not enough; you need to have the battle scars. Even with agents, even with agents, you need to have the battle scars. Agents help you a bit, but you know, agents can cause other kinds of outages without observability. So you need to have a strong observability team, a strong production infrastructure team, a release engineering team. I think sales folks should create workflow automation tools that they run themselves, but not production software.

Warren Parad (31:45) That's a good point. Yeah.

Jaikumar (JK) (32:04) They should have a lot of automation tools that make it easier for them to do research on a customer and generate the right insights for the customer. Like all the examples we talked about: is data processing the problem for them, is serving the problem for them, is training the problem for them? That kind of research, Claude or Cursor can easily do for them. And that workflow automation is what they should be focusing on. I would strongly resist the urge for an engineering team to build a sales-specific tool and then have to make it reliable, especially at a startup. It's the buy-versus-build question sometimes. In these cases, spending some money buying the right software is much more useful.

Warren Parad (32:38) It's so hard to convince startups, or those companies with very little money, not to just do the thing themselves, even when they're at the pinnacle of ignorance, you know, building that thing and handing it off. But I mean, I think you make two really good points: you have to think about the job functions that are required, or really the roles and responsibilities in your company, and look at the tools they're utilizing. And if you find people who feel like they need to build something but don't have the skills to really do that,

Jaikumar (JK) (32:45) Yeah.

Warren Parad (33:03) then look at what tool they're using. Maybe the wrong tools are being evaluated and handed to them. And you also shouldn't build those yourself; you know, maybe switch off of the Salesforces and SAPs to the little startup companies that are doing the exact thing that you want your people actually utilizing. And the second thing that really spoke to me was that in order to understand how to build reliable software, you need to have some sort of PTSD in your past,

Jaikumar (JK) (33:08) Yeah. Yeah, yeah, yeah. Yeah, that's true. That's true. Yeah, yeah, yeah.

Warren Parad (33:28) of the trauma that you've seen building something and struggling and being on call. I

Jaikumar (JK) (33:29) Yeah, yeah, yeah, yeah, yeah.

Warren Parad (33:34) think there are those that learn through others' experience and those that learn through books and knowledge. And some of these things, I think, are very difficult to teach without having to deal with it yourself.

Jaikumar (JK) (33:42) Mm-hmm. Mm-hmm. Mm-hmm. Yeah, no, it's true. It's

Warren Parad (33:45) And now's the time to switch over to picks. So JK, what did you bring for us today?

Jaikumar (JK) (33:51) All right, so there's an interesting book I'm actually reading called The Explorer's Gene by Alex Hutchinson. It goes into when you should explore and when you should exploit. Why did humans venture out, even in the early days, to newer lands? Why did some do it and some not? And many times, after adults reach a certain age, they stop exploring.
So Warren, how do you navigate? You're sitting in a car, maybe you're using Google Maps or Apple Maps, and you have to go from point A to point B. How do you navigate?

Warren Parad (34:20) I'm so glad I haven't had to drive in almost a decade, for real. That's one of the benefits of living in Switzerland. But you're still planning a path, I totally get you. I mean, my strategy now is to try to memorize the directions beforehand, like look at it visually. And then while I'm in the car driving, I'm constantly second-guessing myself: wait, I was supposed to turn already, right? Shouldn't I have turned already? So,

Jaikumar (JK) (34:27) Mm-hmm. Yeah. Yeah.

Warren Parad (34:44) you know, if I'm already in that position, I'm sunk. I'm the worst at that. So now you just pull out your phone, you have it attached, you have GPS going with the maps or whatever else you're using, and it tells you the turn-by-turn directions. And it's still not good enough for me, because I want to know the thing that happens afterwards so I can already prepare myself mentally. So I guess I'm just complaining about the state of map driving today.

Jaikumar (JK) (34:53) Yeah, yeah. Yeah. Sure. Yeah. Yeah. Do you zoom in and zoom out before starting your car?

Warren Parad (35:14) So many times. While I'm driving, I want to zoom in and try to understand what's there. How many lanes are there? Am I going to have to get in the right lane? Because there are so many times, especially in the US, where you're driving and it's like, yeah, get in the right lane to turn right. But then the next instruction, which they don't tell you, is, and then get in the left lane to turn left. And I'm like, I wish I knew that, because there are five lanes; I wouldn't have picked the rightmost lane to get into in heavy traffic.

Jaikumar (JK) (35:19) All right. Mm-hmm. Yeah. Yeah, yeah. Yeah. Why do some people do it and some don't?

Warren Parad (35:40) Paranoia, I guess.

Jaikumar (JK) (35:42) So here's an interesting thing. Since you said Switzerland, suppose you're going on a trip from Bern to Berlin, right? This is my personal habit: OK, Google Maps has given me directions, and I'll zoom in and zoom out and say, oh, these are the freeways; oh, maybe there's an interesting route over here; oh, OK. Even though I would not take it, mentally I'm like, oh, here's the thing; OK, now I'm going close to this thing, before I even start driving. And my wife has a different style.

Warren Parad (35:48) Yeah.

Jaikumar (JK) (36:04) When I'm doing that, it'll annoy her. And she's like, hey, here's the route, we are going point A to point B, kind of stuff. And I was like, OK, maybe we just have different styles. Until I read this book, and it actually says different parts of your brain are activated. In the first style, it's the hippocampus which is doing the job. And in the second style, it's the caudate nucleus, if I'm pronouncing these words right; for any of the neuroscientists in the audience, it's that part of the brain that's actually doing the work, kind of thing. And so one is the exploration part and the other is the exploitation part, kind of stuff. So there's a fascinating section about this. And I was like, interesting. And so that kind of explained why some people pick one style versus the other. And when I was a kid, I used to draw maps by hand.

Warren Parad (36:24) Neuroscientists, yeah.
Jaikumar (JK) (36:47) Here's the route, this is how I'm going to plan my city, and stuff like that. So yeah, I highly recommend the book.

Warren Parad (36:53) Maps are such an interesting topic as well. I just feel like unless you actually went out and tried orienteering, you probably would not survive in the wilderness today, especially given the level of technology. Imagine if it all went away. It sounds like a really fascinating book; I actually now want to add it to my reading list. So thank you for that.

Jaikumar (JK) (37:03) Yeah. Yep. Yeah.

Warren Parad (37:11) Yeah, so my pick, maybe it's just not as inspired as that, honestly. There's one particular post in this whole collection called Archers Don't Fire Volleys; I guess that's sort of my pick. The collection is called A Collection of Unmitigated Pedantry. It's an article series online by, I think, a Greek and Roman scholar, basically. And the interesting thing is that

Jaikumar (JK) (37:32) Mm-hmm.

Warren Parad (37:34) it talks about how basically everything in popular culture, when it comes to medieval and Roman references, specifically regarding combat, is just so totally wrong. Archers don't fire volleys. There's no "ready, aim, fire," because firing arrows is actually incredibly taxing. And if you've ever done it, you don't sit there with the arrow drawn back, holding it for minutes waiting for the perfect opportunity. The other thing is that arrows actually weren't used to kill people; I mean, you wouldn't expect that there'd be a lot of deaths as a result of firing as an archer. It was mostly disorientation. Maybe you hit people in their heads or legs while they're wearing armor and trying to guard themselves. So it's delay tactics. And after that, now when I watch stuff, there are just more of those things where I'm like... it's always like, oh, they're typing on the keyboard, that's not how you hack stuff. Now I have to be annoyed whenever I see archers volleying. I love this collection, though. There are so many things in there, and I'm sure one of these other articles will be my pick later. And they're not short little posts; they're 20-to-40-minute reads, basically, that explain every aspect: the technology that they had at the time, why it was used, how the battles were actually waged, and why those portrayals were wrong. There's almost too much there to go through, but it's written in such a great style that it almost

Jaikumar (JK) (38:32) Yeah. Mm-hmm. Mm-hmm. Mm-hmm. Mm-hmm. Yeah.

Warren Parad (38:55) makes you want to keep on going and become an expert in that topic. And I just think back, like, if my teachers had given us reading material on different topics like this when I was in middle school or high school, I probably wouldn't have been a software engineer.

Jaikumar (JK) (38:59) All right, I have a different question for you. What's this painting behind you? I'm guessing it's a painting.

Warren Parad (39:14) It is a painting. It's acrylic with molding of different kinds in it; it's abstract art. You are actually the first person, I think, on this podcast in years to ask me what this is. These pieces behind me were all done by my wife, and a painting is what it is. If it could be conveyed in words, you wouldn't need to paint it; you could just write the words there. So it is what it is, and that's it.
I mean, you could say there's like a dark side and a light side, filled in with whatever; you know, it evokes a feeling or an emotion, and that's all there is.

Jaikumar (JK) (39:45) Yeah. Yeah, I think the colors are fascinating, at least the way I see it on the video. I'd love to see it in real life. Yeah.

Warren Parad (39:54) You know, maybe I've got to take a picture of this and put it up on the podcast so people can actually see what's on the wall behind me. To be fair, I went through a couple of different ones; we talked about which ones I was comfortable having on the wall behind me for video calls. And this was... oh, she'll be mortified that I brought this up. No, I actually love this painting. It's one of hers that is my absolute favorite.

Jaikumar (JK) (40:06) Your wife is going to be happy listening to this section.

Warren Parad (40:21) Well, thank you, JK, for coming on to this episode. It's been absolutely great having you. Thank you so much.

Jaikumar (JK) (40:27) Same here. It was fun talking to you about everything happening in the AI landscape, release processes, a bunch of things.

Warren Parad (40:31) I'm glad to hear it. Thanks to all the listeners for tuning in to today's episode of Adventures in DevOps. Hopefully we'll see everyone back again next week.