speaker-0 (00:07.778)
Welcome back to Adventures in DevOps, where we frequently question if a spreadsheet can be a microservice. Today's guest speculated over twenty years ago that the web was being designed suboptimally, and it only took four years of maintaining an Oracle ERP system for him to burn out, found a new company on those speculations, and become successful: the CEO of Ragic, Jeff Kuo. Welcome to the show. Hello everyone. Yeah, so I gotta say, what was old is new again. You were advocating for the semantic web twenty years ago or so, and now we're basically back to it.

speaker-1 (00:39.438)
Okay, yeah. The semantic web was the area of study of my master's thesis, like twenty-five years ago. Wow, okay, my master's thesis was twenty-five years ago. It started out as some studies in AI and the semantic web, and it just kind of evolved into this kind of prototype: a system that can be semantically integrated with others while being really fast to build.

You can build simple database systems really quickly using this kind of technology. So that's kind of the background of what in later years turned into Ragic. In my master's thesis defense I even demoed a prototype, the very earliest version of Ragic. Yeah.

speaker-0 (01:29.826)
It's really interesting, because it reminded me a lot of what has been used over the last decade or so, like JSON-LD in metadata headers within HTML pages. I think it's this real idea where programmatic systems do not want to have to scrape HTML and figure out what the important parts of that document are. And I feel like it took until a bunch of AI companies were harassing every single website out there to understand the semantics (what's the point of the company? What does the product do? All the documentation pages) for someone to stand up and say, we need to do something better.
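To make the JSON-LD point concrete, here is a minimal sketch of how a program might read the embedded metadata instead of scraping the visible markup. The sample page, company name, and the `extract_jsonld` helper are made up for illustration; only the Python standard library is used.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collects the parsed contents of <script type="application/ld+json"> blocks."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.items = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        # Script bodies arrive as raw text, possibly in several chunks.
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            self.items.append(json.loads("".join(self._buffer)))


# Hypothetical page: the names below are invented for this example.
SAMPLE_HTML = """
<html><head>
<title>Example Co</title>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "SoftwareApplication",
 "name": "Example Builder", "applicationCategory": "BusinessApplication"}
</script>
</head><body><div class="hero"><h1>Build databases fast</h1></div></body></html>
"""


def extract_jsonld(html):
    """Return every JSON-LD object found in the page, ignoring the layout."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    return parser.items
```

Calling `extract_jsonld(SAMPLE_HTML)` yields the structured description of the product without the consumer having to guess which `div` carries the important content.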

speaker-1 (02:06.626)
But with LLMs and AI coming, it just kind of, in a way, erased the purpose of the semantic web, because, you know, agents can understand the semantics and they can aggregate all of that. They don't really need the ontology to categorize things. They can just read it and categorize it in their models. So yeah, that's something we never really expected.

speaker-0 (02:35.118)
Can we fight on this point? Sure. I want to take the opposite perspective. I've seen in a bunch of the IETF working groups a bunch of companies that are, quote unquote, AI companies struggling to parse HTML effectively enough to truly understand it. So correct parsing is an issue. I think we end up with a lot of hallucinations because there's no correct answer that can be extrapolated from it. And the other one is token costs: companies are trying to reduce what they're consuming as much as they can.

And so there's been a desire to shift from human visual language, like a UI or GUI, to using something like llms.txt, or something that is basically Markdown, for processing, because it's more optimized for (I hate using the word understanding, but I guess I'll use it here) LLMs understanding what the page is actually doing.

So I agree that there is a capability to completely parse what's there. There's no reason to necessarily change. But I feel like there is quite an opportunity to really go back and challenge everything we've built up since the dot-com bust about how we're doing it, and instead go back to those roots that you sort of identified and really ask, from that perspective, what should actually be contained in a web page in order to make it usable for the entities that interact with it.

speaker-1 (03:59.394)
Yeah. Well, I haven't really been in the semantic web field for many, many years. But I do agree that in the details it's still different, you know, having the correct ontology set versus an LLM just parsing it. But I do kind of believe that LLMs can be immensely helpful in automating this process of generating these graphs. Of course, that will cost a lot of tokens. So yeah, it's kind of like half the answer. Yeah.

speaker-0 (04:31.576)
So you took the goal of the semantic web, which I don't know was entirely realized in public, and you converted it to, and forgive me for my lack of understanding here, what seems like a much better version of Airtable. That's what you've built at Ragic.

speaker-1 (04:48.354)
Yeah, that's kind of the intended effect, and we're not really happy about this: we actually started before Airtable, and they got so successful. But anyway, yeah, we are under the same assumption, that spreadsheets are kind of the de facto interface that business people like to use to interact with data. In the very beginning, Ragic had nothing to do with spreadsheets.

In my master's thesis, there's nothing about spreadsheets. We adopted the spreadsheet interface like two or three years in, when we started to really commercialize the product. So the spreadsheet interface was added later to make the whole database-building process easier to understand for business people.

Because when we were demoing our product, they always said: can you do this just like in Excel? Could you drag this, could you click that, and use hotkeys to change focus between fields? Okay, so you're basically asking me to build a spreadsheet interface. So why don't I just build a spreadsheet interface, and everybody could be happy? So when we reached the market, we gradually realized that, you know, this is kind of what people want. And we're just lucky that underneath it we have the data structure and the flexibility to support something like this.

speaker-0 (06:16.898)
Yeah, no, I could totally understand that. I think this is one thing that a lot of products get wrong when they are coming up with a new model: they are envisioning what the user experience should be, and users aren't necessarily comfortable with it, or would have to go through the process of learning it. So it's sort of an unfortunate conclusion, where you're like, we have the perfect user interface that would be beneficial here, but they're stuck on a legacy understanding of how it would work.

So in your conversion from whatever interface you had to more of a spreadsheet, did you ever consider having multiple different layers where you could optimize for particular workflows and let users decide if they want the sort of first-level spreadsheet interface or a more workflow-centric view, and let them switch back and forth? Or did you just go all in on, well, no, spreadsheets may actually be the right answer?

speaker-1 (07:07.362)
Yeah, we basically went all in, saying that, no, spreadsheets are the answer. Most traditional database builders, say Microsoft Access or FileMaker, usually have a two-layer architecture. They have a database underneath, and you design the tables, and you write the SQL or they generate the SQL for you, and then you design the interface pages.

So it's usually two-layered, and we feel like this is kind of an unnecessary complication for non-technical people, because for them it's the same thing. So we would like to have them manipulate the data model and the interface at the same time, because in their mental model it's the same thing. We want to match how they think about the data.

Especially in the beginning, it's kind of interesting that people with a technical background, who have written database applications, get a little bit confused when they first use Ragic. So, okay, where do I create the tables? How do I add the fields? How do I add an index? So it's a little bit confusing for technical people, but business people just get it really quickly, and they don't worry about things like changing from one-to-one to one-to-many relationships.

But database people will say, no, you can't do this, that's not right. And then they're going to be like, whoa, how the hell do you do that? It's just because it's structured differently.

speaker-0 (08:51.05)
I think this is the challenge for engineers, or even technical business people, who really understand how data works, because you know that there is some sort of lie being propagated that's unavoidable. Most people don't need to know about that, but they get stuck on, wait, how can I do this with my data? Are you doing something weird? And it's like, yeah, actually, we're doing something weird. I mean, it's clever and weird, to deal with the business problem.

So you must have had quite the opportunity to learn about those weird oddities that you'd have to put into your product over time, especially since building spreadsheet services is not simple. I mean, the UI looks simple, right? That's the sort of lie we tell ourselves, that the implementation must also be simple. But since you've had such a long go at this, have there been some cornerstone cases, or challenging or controversial things that have happened in the last two decades or so? Good learnings or challenges that you've found with the architecture that you've built up?

speaker-1 (09:47.468)
Yeah, I think the biggest difference between Ragic and traditional database builders like Microsoft Access or FileMaker is that it's not based on SQL. It's not based on database tables. It's not relational. Okay. It's basically based on a graph, because it came from the semantic web. I was designing it as a graph. So it has a lot more flexibility: you don't really need to change the data structure when you go from one-to-one to one-to-many, or change it to many-to-many. That's one of the greatest strengths. But it's just really hard to implement. And in the beginning I was just a master's student, a graduate student, so it took me quite a long time to get it right.

Remember, this is before spreadsheets. We were still trying to build this super flexible, graph-based data model that can work on any kind of business data. In the very beginning, I tried to use Hibernate, because at that time it was kind of the standard: if you have data, you should probably use Hibernate and ORM to do the mapping and do everything. So I tried to implement it with Hibernate, and it didn't really work.

Because, as you can imagine, if you try to build a flexible, customizable database builder, it's not going to be that easy to make all those kinds of changes in Hibernate. So I thought, okay, I have to write raw SQL. So I changed to HSQL, Hypersonic SQL. It was a tiny database that I could embed in the application.

I began to write really large SQL with a lot of joins, and to write code to generate a lot of almost recursive joins to search through the data. That kind of worked, but it created a lot of performance problems. So there was kind of a big performance barrier that we were trying to break. Then one day I was just talking to one of my tech friends, and he was talking

speaker-1 (12:11.734)
about how great this thing is, Berkeley DB, this DB that a lot of people have never heard of. He kept saying how great it is, and I was like, okay, I'll try this out. I thought it could solve a lot of my problems when doing these queries on the graph. Because at that time there was no tool like Neo4j, a graph database. We didn't have that; it came quite a few years after my initial development.

So I began trying out Berkeley DB and slowly came to see that this could actually solve our problem. What we need is a lot of very, very small, quick queries to the database. We need to quickly query a lot of nodes on the graph, and query what links they have and whether two nodes are linked.

Every single one of those queries is very fast and very simple, but we need to do a lot of them. What we ended up doing relies on the fact that Berkeley DB, the Java Edition, can be embedded into the process. So all those queries are just API calls within the same process, and it's super fast. After we adopted Berkeley DB, the performance was like ten- to a hundredfold better than the original one using SQL.

It made a huge difference. But Berkeley DB is not exactly a full database. It's like half a database. So the problem is that you basically have to implement half the database yourself.
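As a rough illustration of the idea described here (a graph stored as many tiny entries in an in-process, ordered key-value store), the sketch below uses an in-memory sorted list as a stand-in for the embedded database. It is not Ragic's implementation and does not use the Berkeley DB API; the class names, the `|` key separator, and the sample node names are all assumptions made for the example.

```python
import bisect


class SortedKV:
    """In-memory stand-in for an embedded, ordered key-value store."""

    def __init__(self):
        self._keys = []   # kept sorted so prefix scans are cheap
        self._vals = {}

    def put(self, key, value=b""):
        if key not in self._vals:
            bisect.insort(self._keys, key)
        self._vals[key] = value

    def has(self, key):
        return key in self._vals

    def scan_prefix(self, prefix):
        """Yield keys that start with prefix, in sorted order."""
        i = bisect.bisect_left(self._keys, prefix)
        while i < len(self._keys) and self._keys[i].startswith(prefix):
            yield self._keys[i]
            i += 1


class Graph:
    """Stores each link as its own key: 'source|label|target'.

    Assumes node names and labels never contain the '|' separator.
    """

    def __init__(self, kv):
        self.kv = kv

    def link(self, src, label, dst):
        self.kv.put(f"{src}|{label}|{dst}")

    def linked(self, src, label, dst):
        # "Are these two nodes linked?" is a single point lookup.
        return self.kv.has(f"{src}|{label}|{dst}")

    def targets(self, src, label):
        # "What links does this node have?" is a short prefix scan.
        prefix = f"{src}|{label}|"
        return [key[len(prefix):] for key in self.kv.scan_prefix(prefix)]


graph = Graph(SortedKV())
graph.link("customer:1", "address", "address:10")   # starts as one-to-one
graph.link("customer:1", "address", "address:11")   # now one-to-many, no schema change
```

Because every link is just another key, going from one address to many is one more `put`, and each question about the graph is a small lookup that never leaves the process.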

speaker-0 (13:57.908)
Well, I think today there still are no good graph database options available. So have you switched off of Berkeley DB, or are you still using it to power most of your technology?

speaker-1 (14:09.336)
We're still using that. Yeah. If it ain't broke, don't fix it. There are still lots of weird stories from problems we faced while using Berkeley DB and trying to fix them. There are still lots of problems because, you know, when you're building half a database, there are just tons of unforeseen issues. And I was just a grad student, so what did I know? I thought, wow, this works. It's like ten times, a hundred times faster. I'm dedicated to that.

I just felt that this is the greatest thing, this will actually make our product commercially viable. So we stuck to that, and we've been sticking with Berkeley DB until today. Yeah.

speaker-0 (14:50.836)
If you were designing the company and the product from the ground up starting today, would you go for that again, or would you consider one of the others: maybe a relational data model, or even a key-value store for storing the graph nodes in, or an actual first-class graph database provider?

speaker-1 (15:09.016)
Well, Berkeley DB is actually basically a name-value store. And honestly, yeah, I would probably still use that. Cool. Because even if I used something like Neo4j, which is already graph-based... because we write our own indexes, we design our own index, we can do all sorts of cool, weird things with it. Because we have so much low-level control of the database, it just gives us a lot more flexibility.

But it takes time to work through all those issues and oddities of writing your own index, writing your own half a database. We've worked through that, and it turned out pretty well. So we do think it's kind of worth it. Yeah.

speaker-0 (15:59.662)
It's really amazing. For every story of, you know, we successfully built basically a database, there are a hundred more of, that was the worst mistake we ever made as a company, thinking that we could just spin up a file system, write some blobs or blocks to it, and we'd be totally good and not run into problems. I mean, it sounds like in your scenario, given the nuances in how the product actually works, and the specialty and the experience, this is one of those areas where it was much more valuable to actually gain that knowledge and insight and go from there.

And it's interesting you bring this up, for a couple of reasons. The first one is that my company has a product where there are relationships between nodes that we want to capture. We evaluated all of the graph DB products and all of the relational databases and key-value stores and so on to figure out how to store things. And we had a very complex model to start out with.

When we were evaluating based off of, like, the P99s of what customers needed, or what we even needed to support queries and whatnot, most graph databases would be slower by an order of magnitude than performing repeated queries or joins on a relational database, or multiple queries to a key-value store. If things are simple, you can maybe get away with that one clever query, or three clever queries, or a clever join.

And while every engineer who looks at it is like, that's disgusting, I can't believe you would have that in your source code, the reality is that graph databases were never designed to be speed-optimal. So you end up with this huge challenge: if you want your solution to be fast, you do an ugly thing in your database. And I feel like you're met with those two options: it's going to be ugly if you do this with a relational database or a key-value store, but it's going to be faster, versus it's going to be slow, but maybe slightly better.

But we can make it even better if we write our own database. You know, I'd be really curious to know. I mean, you basically have the experience now to potentially go out and just write the whole database from scratch. Have you ever considered making that final leap, using Berkeley DB as sort of a starting point and reconfiguring it in a way that would be optimal for the business?

speaker-1 (18:11.128)
You know, with today's AI-assisted coding, I don't think that's that far off. The honest take is that we don't really see a huge need for that. Well, we see a little bit of demand for changing something inside Berkeley DB that we cannot change. But for now we still feel it's kind of a big effort, and we don't really see the benefit there yet. But maybe someday we will.

Really, a few years ago, when we were tackling all those problems in Berkeley DB, we were thinking, come on, why don't we just write our own database? Because there were some odd bugs, some pretty seriously weird bugs, especially in earlier versions of Berkeley DB Java Edition.

speaker-0 (19:03.67)
I mean, please share. Yeah, no, please share anything that comes to mind.

speaker-1 (19:08.088)
Yeah. Okay. Well, one of the most serious ones in the earlier versions of Berkeley DB is that when the database gets large, like hundreds of gigabytes, and it's under a lot of load (we cannot reproduce this, but we know it happens under high load with a lot of data), it deletes the wrong data file. Like, yeah.

speaker-0 (19:32.5)
No, no, you don't want to hear that.

speaker-1 (19:36.362)
It deletes the wrong data file. And that is the biggest problem that we faced. This was back on version four. Right now we're using version seven, and we've never seen this on version seven, but on version four it would hit us like once every few months, and it was just horrible, because we had to use automated backups to restore, or try to, you know, fix the data.

We even tried to write our own fixer programs to fix this, but it's just so difficult because the wrong data has been deleted. You're just helpless once it actually deletes it. So in the end we came up with a solution: on those version four databases, there is a flag to not actually delete data files. It just marks them with a suffix, .DEL. Okay. And you write your own program to delete them.

So we ended up using this flag, and Berkeley DB never deletes any data file. We just periodically scan the whole database data file folder and move the files with the .DEL suffix onto another disk, a larger, cheaper disk, for a few days, and we make sure that the database is still working all right. And if we find any of those issues, it's: aha, I know this file is missing, so I can go back to that disk and find the missing file. This mechanism has been a lifesaver. It's just a very wacky solution to a bug that we cannot fix for them. So yeah, that saved us a lot of time.
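A minimal sketch of that kind of sweeper, under stated assumptions: the database is configured to rename files it wants gone with a `.del` suffix rather than delete them, and a separate job moves those files to a cheaper disk, keeps them for a few days, and can put one back. The function names, the seven-day default, and the file names used when calling it are hypothetical and are not taken from Berkeley DB or Ragic.

```python
import os
import shutil
import time
from pathlib import Path


def quarantine_marked_files(data_dir, quarantine_dir, suffix=".del"):
    """Move files marked for deletion out of data_dir instead of deleting them."""
    quarantine = Path(quarantine_dir)
    quarantine.mkdir(parents=True, exist_ok=True)
    moved = []
    for path in sorted(Path(data_dir).iterdir()):
        if path.is_file() and path.name.lower().endswith(suffix):
            dest = quarantine / path.name
            shutil.move(str(path), str(dest))
            os.utime(dest)  # stamp the file with the time it was quarantined
            moved.append(path.name)
    return moved


def purge_quarantine(quarantine_dir, keep_days=7, now=None):
    """Permanently delete files that have sat in quarantine longer than keep_days."""
    now = time.time() if now is None else now
    cutoff = now - keep_days * 86400
    purged = []
    for path in sorted(Path(quarantine_dir).iterdir()):
        if path.is_file() and path.stat().st_mtime < cutoff:
            path.unlink()
            purged.append(path.name)
    return purged


def restore(quarantine_dir, data_dir, marked_name, restored_name):
    """Put a wrongly marked file back into the data directory under restored_name."""
    dest = Path(data_dir) / restored_name
    shutil.move(str(Path(quarantine_dir) / marked_name), str(dest))
    return dest
```

The design choice is that nothing is destroyed at the moment the database decides a file is obsolete; destruction only happens after a grace period during which a missing file would have been noticed.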

speaker-0 (21:25.442)
I think it's ingenious, honestly, as a solution. And I think anyone who's questioning this has to realize that there is no technology that is 100% reliable, and for databases, 100% durable. You're going to get a failure in some way. Usually it's around writing to the journal, or in the replication process. And then if you have a failover event or a critical failure in your main DB cluster, say you have a main writer node and the rest are readers, when you fail over, there tends to be stuff that is in the journal but isn't written, or isn't even in the journal because it's still in memory and in process.

How do you deal with that? And so the fact that you ran into that issue and thought about it consistently, about how to solve it in a way which doesn't cause any sort of data corruption... I mean, it's not great that you were forced into it by what seems like a bug in the software.

speaker-1 (22:18.222)
Yeah. Yeah. It doesn't only happen at a checkpoint. It doesn't only happen on replication. It just randomly, seldom, deletes some data file by mistake. It's just very, very painful.

speaker-0 (22:36.632)
I read all these posts today that, for one reason or another that we're not going to get into, decide that moving off the cloud to an on-prem solution is the right thing to do. And we can get into this in this episode. But what I want to point out is that they often say, yeah, the capital expenditure, rather than the opex, for buying data center resources is cheaper in the long run if you know what you're doing.

And I question: do you know what you're doing? Because are you prepared to deal with not just hard drives failing, but cosmic rays flying through the air and flipping bits in your non-ECC RAM or on your hard drive for your database cluster? These things will happen. And you probably aren't designing software with that in mind. Most people that are doing this haven't been running data centers.

It's one thing if you're at a cloud provider and you're like, you know what, I'm going to quit and start my own company that's its own data center, data center as a service in a particular region, and I have the experience of building and running that, not just the hardware part but the interface layer, the software, so I know what to expect. But most companies, and most people who work with them, for sure don't. So what I will ask is: are you running in a cloud, or is all your technology on-prem somewhere?

speaker-1 (23:53.28)
No, we're running on cloud. Yeah. But even though we're running on cloud, we also have the on-prem version for our customers to use. I do feel like the cloud is the safer solution for them, because, you know, when they actually see a problem like the system magically deleting a data file, on the cloud I'm there to help them. I went through like two or three almost sleepless days trying to salvage their data, but they don't even have the ability to do that if they wanted to.

So it's still nice to have someone to help you out with that. And for infrastructure, we have also experienced some pretty incredible failures. So yeah, on each layer there are just some difficulties. Yeah.

speaker-0 (24:47.97)
Who wouldn't want the CEO of their vendor being on call to respond to every critical incident? You know that there's a problem and that you're waking up an executive to go debug that issue for you. That's just another level.

speaker-1 (25:01.474)
Yeah, but I really can't go to sleep knowing that they have data corruption and they can't access their data. It's just not right.

speaker-0 (25:14.646)
I know, I totally understand. I can totally commiserate with you. While I may not be able to do anything to help, I feel like I have some responsibility to be awake and field support calls, or even translate or communicate to that customer what's actually happening at that moment. I know some people who are better at this than me, who have calm composure and do not panic or anything like that. But honestly I'm just like, God, we need to have a solution for this when it happens. And it's not a fun time, for sure.

You mentioned that you're running on the cloud and it's easier to debug. And I'm totally with you, because you just have access to a bunch of tools where you can extract data, store it, or even investigate in a really strategic way. Whereas with on-prem stuff you have to be asking: well, are they running it on some weird on-prem cluster? What are they even using for a cluster? How about their hardware or connection, network cables, et cetera?

Maybe there's some RJ45 jack that's misbehaving, or power fluctuations causing not enough power to be delivered to the hard drive. Those are just incredibly challenging things to debug. I remember at one company I was at previously, there was an issue in one of the data centers with the Wi-Fi signal. People would be moving from one area to another, and that would cause an issue with some of the data, which wasn't being validated correctly because they were using YAML, and with YAML you don't always know where the end of the stream is, because an end marker isn't required.
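One common way to catch a cut-off message, sketched here as an assumption rather than as what that company did, is to wrap the payload in a small frame that carries its length and a checksum. The header layout and the `frame`/`unframe` names are invented for this example, and the payload is treated as opaque bytes, so the same approach applies whether the content is YAML or anything else.

```python
import hashlib
import struct

# Assumed header layout: 4-byte big-endian payload length + 32-byte SHA-256 digest.
HEADER = struct.Struct(">I32s")


def frame(payload: bytes) -> bytes:
    """Prefix the payload with its length and digest so truncation is detectable."""
    return HEADER.pack(len(payload), hashlib.sha256(payload).digest()) + payload


def unframe(message: bytes) -> bytes:
    """Return the payload, or raise ValueError if the message is cut off or altered."""
    if len(message) < HEADER.size:
        raise ValueError("truncated header")
    length, digest = HEADER.unpack_from(message)
    payload = message[HEADER.size:]
    if len(payload) != length:
        raise ValueError(f"expected {length} payload bytes, got {len(payload)}")
    if hashlib.sha256(payload).digest() != digest:
        raise ValueError("checksum mismatch")
    return payload


# A config cut off at a line boundary still looks like a plausible document,
# which is why the receiver needs something other than the text itself to check.
CONFIG = b"replicas: 3\nregions:\n  - us-east\n  - eu-west\n"
TRUNCATED_TEXT = CONFIG[: CONFIG.index(b"  - eu-west")]
```

With the frame in place, `unframe(frame(CONFIG))` returns the original bytes, while a message that lost its tail in transit raises `ValueError` instead of being accepted as a shorter but well-formed document.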

So it can be dangerous, you know, if something gets cut off. And yeah, there are for sure some issues that are very difficult to debug when they're not in a data center that you control. I totally understand the reasons for running on-prem, though. There are some customers that absolutely have certain concerns, whether security-related or, usually, regulatory, for doing that. Are you building the same product, basically? Are you just shipping them what you're running in the cloud, or are there fundamental differences between these two versions?

speaker-1 (27:10.872)
They're off the same code trunk, so they're basically the same thing. Yeah. We try to keep everything in one main version, so the differences are by configuration. It behaves a little bit differently, but by configuration.

speaker-0 (27:25.294)
Do you have challenges there? There are some scenarios where you may want to optimize for the cloud provider. Can I ask, are you using AWS or GCP, or something specific to Taiwan? Both, okay, cool. I feel like if you're just using one cloud provider, there's an opportunity for optimizations utilizing the primitives that are available from that cloud provider.

speaker-1 (27:36.36)
GCP and AWS.

speaker-0 (27:50.754)
But as soon as you go to two, you start losing that capability. You're not able to make those same changes. So it's a real trade-off to go multi-cloud. So I guess what I'll ask you is: what went wrong that caused you to have to go down this path?

speaker-1 (28:04.192)
Okay, yeah. Early in the days of Ragic, we were actually hosted on a service called Linode. I think they're still around. Yeah. So it was hosted on Linode, and I think it was around 2014. If you Google it, there was a whole series of DDoS attacks around Christmas time, I think in 2014.

And basically the whole data center went down, so we were not able to access any of our data. I was on vacation, so that was just hell. I was trying to calm our customers down and trying to keep them informed, but we had so little visibility from Linode, because Linode said, we're under very, very heavy DDoS attacks, so we're doing everything we can, but they could not give us any visibility.

And it's not like it was a matter of hours, it was kind of like a couple of days. I think it went down for almost twenty-four hours. Oh, wow. Yeah, and we just thought, we really, really can't take this anymore. Because they were kind of remedying it on and off, we would get about thirty minutes when we could access the servers, and then we'd get an hour, and then it's down again.

So I just decided, that's it. We have to get out of there. So we decided to migrate, and migrated all our services from Linode to GCP over that one weekend. I was on vacation, so I was not even in the office. I was kind of in a museum, sitting there trying to use the museum Wi-Fi while my kids were off visiting the museum. I was just in the lobby, trying to move everything from Linode to GCP. Gladly, or kind of nicely, we didn't have that much data back then, just a couple of servers that we needed to move. So we just moved everything to GCP

speaker-1 (30:27.626)
over one weekend. And later on we just felt that we need the flexibility to switch services whenever we have to. We don't want to be bound to one vendor. That's just too dangerous. We want to have the ability to move anywhere. I just told myself, we're not going to be bound to one single vendor.

And yeah, I know that most people would leverage tools on platforms like AWS and GCP to scale their services and automate things. But basically we just rolled our own service instance management with custom code. We write custom code that can SSH to different instances and do the routine management, do whatever we want. So we basically wrote up our own service management system, you know, connecting by SSH.
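A minimal sketch of what provider-neutral management over SSH can look like, assuming the standard OpenSSH client is installed and key-based login is already set up. The host names, the `ops` user, and both function names are hypothetical; this is not Ragic's tooling.

```python
import subprocess


def build_ssh_command(host, remote_command, user="ops"):
    """Return the argv list for running remote_command on host via the ssh client."""
    # BatchMode=yes makes ssh fail instead of prompting for a password.
    return ["ssh", "-o", "BatchMode=yes", f"{user}@{host}", remote_command]


def run_on_fleet(hosts, remote_command, runner=subprocess.run):
    """Run the same command on every host and collect (exit code, stdout) per host.

    The runner is injectable so the loop can be exercised without a network.
    """
    results = {}
    for host in hosts:
        argv = build_ssh_command(host, remote_command)
        completed = runner(argv, capture_output=True, text=True, timeout=60)
        results[host] = (completed.returncode, completed.stdout.strip())
    return results
```

Because the only dependency is SSH access, the same loop works whether a host lives in GCP, AWS, or a customer's own rack, which is the portability being traded against each provider's native automation tools.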

speaker-0 (31:27.692)
I mean, once you've suffered those particular traumas, there's no going back at that point.

speaker-1 (31:32.662)
Yeah, yeah, that was very, very stressful.

speaker-0 (31:37.666)
That's really early days for GCP too. I didn't even know they were out at that point. So that must have been a huge risk, even to decide, okay, not only are we going to go to a cloud provider, but we don't have a lot of options and they haven't been around that long. Are we even ready to trust them in that regard? But it was enough for you to actually make that switch. I mean, Linode definitely is focused more on the VM side of the house, and it's definitely a huge challenge to get more reliability if you want, say, a database with backup or reliability set up.

Are you using those primitives in the cloud providers, though? Or are you managing the data yourself? I mean, you're running your own database that isn't offered as a primary option. There is no Berkeley DB offered by GCP or AWS today. If they stood up tomorrow and said, hey, we're going to offer a managed version of that, would you switch to it, or would you still run it within the service management mesh that you've created for that particular provider?

speaker-1 (32:41.794)
I don't think it's possible with today's technical architecture, because the database runs within the same process. I would think that anything outside of the process would be a lot slower. So we don't even put the database on a different server. They run in the same process, and they use a shared memory pool inside the JVM for multi-tenancy.

So yeah, it's kind of different from other database architectures because we're based on Berkeley DB. There are actually quite a few advantages to that, because Berkeley DB is a very nice infrastructure for building multi-tenant applications.

There is a shared memory pool for all these different database instances, and the database instances are physically separated. They're not just logically separate, they're physically separated, which makes it easy to do hot backup and restore. So it's just quite suitable.
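
A minimal sketch of the "physically separated, easy to hot-backup" idea, under a loud assumption: it swaps in Python's built-in `sqlite3` module as a stand-in for Berkeley DB, and it does not model the shared memory pool at all. The function names and the one-file-per-tenant layout are hypothetical.

```python
import os
import sqlite3
import tempfile


def tenant_db_path(root, tenant_id):
    """Each tenant gets its own database file (physical separation)."""
    return os.path.join(root, f"{tenant_id}.db")


def open_tenant(root, tenant_id):
    """Open (or create) the embedded database for one tenant."""
    conn = sqlite3.connect(tenant_db_path(root, tenant_id))
    conn.execute(
        "CREATE TABLE IF NOT EXISTS records (id INTEGER PRIMARY KEY, body TEXT)"
    )
    return conn


def hot_backup(conn, backup_path):
    """Copy one tenant's live database without closing the connection."""
    dest = sqlite3.connect(backup_path)
    try:
        conn.backup(dest)  # sqlite3's online backup API (Python 3.7+)
    finally:
        dest.close()


# Usage: back up tenant_a while its connection stays open.
root = tempfile.mkdtemp()
tenant_a = open_tenant(root, "tenant_a")
tenant_a.execute("INSERT INTO records (body) VALUES ('hello')")
tenant_a.commit()
hot_backup(tenant_a, os.path.join(root, "tenant_a.bak"))
```

Because every tenant lives in its own file, backing up or restoring one customer never touches another customer's data, which is the property being described.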

speaker-0 (33:51.148)
Yeah, I suppose the other perspective is that the needs of something that stores data could be fundamentally different from the I/O, or the HTTP ports or sockets, required by an application server. And so combining those needs falls into one of two buckets. The first one is that we perfectly match all the resources available to the VM: some part of the resources is dedicated to the application and the other part is dedicated to the database, and so we have full utilization.

The competing argument is that the way in which they scale is different. Some queries or requests require more scaling on the database side, in the resources dedicated to the database. And so separating them is valuable because it's not a one-to-one match. More requests or more complicated requests may not translate to exactly the same thing in the database.

Have you found that you're at a sweet spot where the utilization is still incredibly high? Or are you sort of wasting some capacity or monetary value by having it on one machine, but making it up with the simplicity of having it all in one place?

speaker-1 (35:03.466)
I do feel that having it on the same machine improves the utilization because, you know, when you spread it out, you're more likely to have unused resources. Basically, the application and the database are using the same memory in the same JVM. So we can actually configure the percentage of memory to be allocated to the database or to the application.

That's something we can configure or even change dynamically. That's a pretty good thing, because we can decide how much memory to allocate to each database instance when it starts, because it's multi-tenant. We look at how large the data set is, how much usage they have, and how many users they have, and we can decide how much memory to allocate to them according to the current memory use.

So I think it actually gives us a lot of flexibility to move resources between what would usually be the database server and the application server. That part is actually pretty nice. We're able to easily shift resources between the DB and the app.
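
A minimal sketch of the memory split described above, assuming a fixed heap budget in megabytes. The function names, the per-tenant floor, and the weighting formula (data size plus user count) are all illustrative assumptions; the conversation does not say how Ragic actually weighs these inputs.

```python
def split_heap(total_mb, db_fraction):
    """Split one process's memory budget between database cache and application."""
    if not 0.0 <= db_fraction <= 1.0:
        raise ValueError("db_fraction must be between 0 and 1")
    db_mb = int(total_mb * db_fraction)
    return db_mb, total_mb - db_mb


def tenant_cache_budgets(db_mb, tenants, floor_mb=4):
    """Divide the database share among tenants.

    `tenants` maps a tenant name to (data_size_mb, active_users).
    Every tenant gets `floor_mb`; the rest is shared in proportion
    to a made-up weight of data size plus user count.
    """
    spare = db_mb - floor_mb * len(tenants)
    if spare < 0:
        raise ValueError("not enough cache for the per-tenant floor")
    weights = {name: size + users for name, (size, users) in tenants.items()}
    total_weight = sum(weights.values()) or 1
    return {
        name: floor_mb + spare * weight // total_weight
        for name, weight in weights.items()
    }


# Usage: half of a 4 GB budget goes to the database side.
db_mb, app_mb = split_heap(4096, 0.5)
budgets = tenant_cache_budgets(608, {"small": (10, 2), "large": (500, 88)})
```

Since both sides live in one process, changing `db_fraction` is just a configuration change rather than resizing two separate servers.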

speaker-0 (36:18.35)
I mean, if you're already in the JVM, have you ever thought about taking it to the next level and shifting your architecture specifically onto a container-native platform instead?

speaker-1 (36:28.044)
Yeah, it has been deployed on Docker and containers. Some of our on-prem users do deploy on Docker. But for us, the whole installation is really, really simple: for the application server we use Jetty, which we embed, and for the database we use Berkeley DB, which is also embedded. So basically, to set up a server you just install Java, run Ragic, and be done with it.

So there's just not a lot of benefit from Docker for us. But some of our on-prem users always use Docker, so they can still do that.

speaker-0 (37:12.14)
One thing that always comes up as a question for me, in scenarios where people are basically running all the infrastructure themselves on a virtual machine or even bare metal, is: how do you scale testing effectively? From my standpoint, it's always been that if you have containers, it's easier to deploy them, make some configuration changes, understand memory utilization, CPU utilization, et cetera, and actually watch where they fail, potentially, if you're doing some sort of load test.

Do you have different deployment modes to be able to capture that well? Or have you found particular sweet spots in how you actually do load testing in the model that you're running today?

speaker-1 (37:47.532)
I don't think there's anything special that we're doing with load testing. We just have some test instances and test servers on GCP and AWS, and we also have some local machines in our office that we can use to do load testing. We don't really scale those tests. Because it's basically a monolith, it's kind of easier to just test it.

speaker-0 (38:19.926)
Yeah. No, I totally get it. One thing you hinted at a little bit was your multi-tenancy model. And so I am curious about your product, especially if you have on-prem customers, or customers that are deploying it themselves. If you have basically the same source running for those models as well as in your own, say, cloud environment, then rather than a multi-tenancy model, I've found a lot of companies just deploy a different set of instances for every single customer. Are you doing that, or are you doing something different?

speaker-1 (38:51.04)
Yeah, on one instance we are running thousands of tenants. That's one of the good things that I talked about: they can use a shared memory pool in Berkeley DB, so we're running thousands of customers on the same server. Of course, that's still kind of a technical challenge that has caused some problems. The biggest would be the noisy neighbor problem: some accounts take up a lot more resources than the others, affecting the service quality of other accounts. So we've learned from experience and added a lot of quotas and limits to everything that might take up extra resources and might starve neighboring accounts, neighboring databases.

Another thing that we do more and more, based on this, is that we try to do most of the processing using an asynchronous queue, like a blocking queue. So when a certain account is trying to execute a hundred or a thousand things at the same time, we just put that in the queue, so that it's not going to hog all the CPU cores or all the memory.

In these kinds of multi-tenant environments, everything that might take too many resources probably has to go into an asynchronous queue so that it's not all going to fire at the same time. We didn't start that way.
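
A minimal sketch of a per-tenant quota of the kind mentioned above, assuming a simple fixed-window counter. The `TenantQuota` class, its limits, and the injectable `clock` are hypothetical; the conversation does not describe which quotas Ragic enforces or how.

```python
import time


class TenantQuota:
    """Allow each tenant at most `limit` operations per time window."""

    def __init__(self, limit, window_seconds, clock=time.monotonic):
        self.limit = limit
        self.window = window_seconds
        self.clock = clock  # injectable so the window can be tested
        self._windows = {}  # tenant -> (window_start, count)

    def allow(self, tenant):
        """Return True and count the operation, or False if over quota."""
        now = self.clock()
        start, count = self._windows.get(tenant, (now, 0))
        if now - start >= self.window:
            start, count = now, 0  # a new window begins
        if count >= self.limit:
            self._windows[tenant] = (start, count)
            return False
        self._windows[tenant] = (start, count + 1)
        return True


# Usage: at most 100 expensive operations per tenant per minute.
quota = TenantQuota(limit=100, window_seconds=60)
```

Counters are kept per tenant, so one account exhausting its quota has no effect on what its neighbors are allowed to do.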

speaker-0 (40:32.814)
I'm guessing you didn't start out that way. So what was the straw that broke the camel's back, so to speak?

speaker-1 (40:43.858)
Well, I can't even remember, but that back broke long ago, because we hit that problem pretty early on. Especially since Ragic's model is that people can build anything on Ragic, any type of ERP application. So we saw all sorts of weird uses of Ragic pretty early on.

So I started writing my own blocking queue to try to fix this, and we had our first blocking queue for this purpose quite early on. But of course a lot of things didn't live on the queue in the beginning. Then we'd see people starting to misuse something or accidentally use too many resources, and it was just: we need to use the blocking queue for this too. So we slowly began to move almost everything that takes resources to the queue.
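
A minimal sketch of the blocking-queue idea, using Python's standard `queue.Queue` and a small fixed pool of worker threads. This is not Ragic's queue (which runs on the JVM); the `BoundedWorkQueue` class and its default sizes are hypothetical.

```python
import queue
import threading


class BoundedWorkQueue:
    """Run submitted jobs on a fixed number of workers, never more."""

    def __init__(self, workers=2, capacity=100):
        # A bounded queue: submit() blocks when it is full, which
        # pushes back on a tenant that submits too much at once.
        self._jobs = queue.Queue(maxsize=capacity)
        for _ in range(workers):
            threading.Thread(target=self._run, daemon=True).start()

    def submit(self, job, timeout=None):
        """Enqueue a zero-argument callable; raises queue.Full on timeout."""
        self._jobs.put(job, timeout=timeout)

    def wait(self):
        """Block until every submitted job has finished."""
        self._jobs.join()

    def _run(self):
        while True:
            job = self._jobs.get()
            try:
                job()
            except Exception:
                pass  # a real system would log the failure
            finally:
                self._jobs.task_done()
```

If one account submits a thousand jobs at once, at most `workers` of them run at any moment and the rest wait in line, so the burst cannot take every CPU core.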

speaker-0 (41:43.738)
I mean, it makes sense. It's certainly one of those things. Early on, when we were designing our solution, we knew that this was going to be a problem in some regard, and you have to come to the table already believing that; you have to have the tools in front of you in order to be able to do something about it. You don't necessarily need to implement it at the beginning, but you do need a database that exposes hooks that allow you to build your own technology on top of it to relieve back pressure or prevent abuse of the database or the resources that are available.

There's actually a whole episode where we went into some of the differences between single-tenant and multi-tenant architecture. That episode will be linked in the description, so maybe we won't focus too much on that for now. But one of the questions I want to get into: in the last few years we've seen a lot of people standing up and saying, "I rebuilt MS Excel in one weekend and it's great," or "I rebuilt Airtable in one weekend."

Me personally, I'm rebuilding Gmail as a product right now, and I can tell you it's taking a lot more than one weekend. How is the current ecosystem and product climate fundamentally impacting your organization, or the technical challenges you're seeing today versus the ones you would have seen in, say, the prior fifteen years?

speaker-1 (43:01.144)
Yeah. What you're saying about those "I built something in a weekend" claims: I was really curious about that, so I tested it a few months ago. I was like, wow, really? With no coding experience you can build that over a weekend? I have two kids, twelve and fourteen, and they always ask me to teach them coding. So I said, okay, now we have these LLMs, so I'm going to teach you how to not code, and let's build a game.

We started a small project to build an RPG, sort of like Dragon Quest. I decided I'm not going to do any coding. I'm not even going to look at the code. So we began trying to create the RPG, and in like half an hour there was a working RPG that we could play around with. It's just amazing. And I thought, this is why people say, "I built this over the weekend," or "I built this in just three hours." So that's kind of a magical feeling.

But over the next day and the next week, as the code got bigger and bigger, a lot of problems began to arise. It took longer and longer for the AI to fix a problem, and it often said that it had fixed a problem when it hadn't really fixed it. So I began to look into the code, and it was just really bad spaghetti code with some really bad designs. In the beginning I never told it what kind of design we wanted; I just had my kids telling me, "I want this, I want that." So the design is really bad. But the amazing thing is that the game still runs.

speaker-0 (44:43.822)
So what I want to ask is... I feel like that is often the response in these areas. And the feedback that I get in those moments is, "Well, Warren, it's a skill issue. If you were better at prompting the LLM, then you wouldn't have spaghetti code in the first place." And I don't agree. I have tried so hard to make it so the code can be maintainable over a long period of time, and I just haven't found that to be the case. And I'm sure with your experience... I mean, there's the "I'm not looking at it," but at the same time I'm like, I tried to get it to fix it, right?

speaker-1 (45:18.572)
So this is a fun side project I do with my kids, but at work I find the LLM has never been able to do the system design correctly. It's probably because our product is kind of different. If you're building an e-commerce site or an ERP system, I would tend to believe that they can build something pretty standard and pretty much right.

But I don't think it has enough background understanding of how to build what I want in a database builder. It knows the concept, but the design will be really bad. Every day in my work I still use LLMs to help with all those designs, but I always have to tell them, "This is wrong because of this and that, so you have to change that."

I don't know. I still work with them, although I end up coming up with all the designs myself. But talking to them helps me organize my thoughts, I guess. And they can write up all the documentation about our design plans really quickly, so that's also a plus. But in the end, the designs are still made by me. Their design is usually pretty off.

speaker-0 (46:42.412)
Yeah, no, that's sort of what I've experienced. And I've been trying to figure out if different tools or different harnesses change the approach here. But I think the fallacy is sort of the same one that you've run into, which is: in my area, whatever "my area" is, it can't be used. But if you look at other areas, you know, it's no problem for it to be used there.

And I think that's where there's a devil in the details. If you actually go and try to implement an ERP system, I will tell you how complicated it is and how much the LLM will get it wrong. I think this is the thing: the area where we're an expert feels like it can't work, or doesn't work very well without a lot of micromanagement. And when we push it to other areas that are outside of our expertise, it feels like it does a much better job.

But when you ask experts from those areas, they give you the opposite feedback: "Building a spreadsheet tool? I can do that with an LLM." As a matter of fact, I just saw 10 products on Product Hunt yesterday that were all selling the Airtable replacement. So that's an interesting perspective. But with that, maybe this would be a good moment, before going further down a tangent, to switch over to picks for the episode. So Jeff, what did you bring for the audience today?

speaker-1 (48:00.672)
Okay, yeah. I saw in your intro that it could be one of the adventures, and actually, in about two weeks I'm going on a backpacking and trekking trip to the mountains in Taiwan. One of the mountains is called Nanhu Mountain. Taiwan is where I live and where I'm from, and it's kind of a small island, but it has a lot of mountains. Something like sixty percent of Taiwan is mountainous.

I've been to the Rockies, I've been to Alaska, and to Switzerland to see the Alps, and the mountains in Taiwan are just very different. The scenery is very different but equally beautiful. I'm really excited about the backpacking trip that I'm embarking on in a couple of weeks with my college friends.

And I would just love to bring up that if you love hiking, trekking, backpacking, and mountaineering, Taiwan is actually a really, really nice place to visit, because there is just a lot of amazing scenery in the mountain areas.

speaker-0 (49:19.52)
Is there one particular peak or trail that you would recommend above all the others?

speaker-1 (49:24.798)
One of the most popular areas is Taroko, the Taroko National Park. It's sort of like Yosemite, but in a different flavor. You can expect the same kind of valleys, but in a Taiwanese flavor. It's a really, really nice place. There are a lot of hiking trails there that you can take, and a lot of mountains you can climb.

speaker-0 (50:00.204)
Wow, you compared it to my two favorite places, Switzerland and Yosemite National Park. That's a hard sell there. I'm maybe now going to have to take the transit out there to see the mountain range, which is really interesting because I just finished the Jet Lag: The Game season where they're in Taiwan, going around, and it's interesting because they only go around the edge of the map on the rail. You can't get through the center of the country because it's just a whole mountain range, which is quite inspiring.

speaker-1 (50:31.266)
Yeah, it's just really beloved.

speaker-0 (50:34.254)
Okay. Well then, I love the pick. Mine is maybe a little bit less inspired. I'm going to pick the DevOps Days conferences. I just got back from the one in Zurich and honestly, they're the best conferences I've been to anywhere. They're non-commercial, they're run by volunteers, they're very well done, and the people are very committed to it. I've been to ones in a lot of different European countries, and honestly, if you can go, you absolutely should.

And more importantly, if you can sponsor one of them, I also highly recommend it, because out of all the conferences, they don't just focus on technical aspects. They find opportunities for cultural improvement and higher-level conversations than you normally get at, say, a Java or C# or JavaScript conference, and way better than what you get at an AI conference where you're just discussing the best new skills to throw into all of your agents.

I don't know if there is one in Taiwan, though, but maybe I'll look it up after this call and see if they're offering one. Basically it's up to volunteers to start and run it. There's a global organization, but it's all volunteer-based. And it's one of the few conferences that I go to not just as a speaker; I actually pay to attend. I don't know what it is, whether it's the branding or the mentality or the culture of the global organization, but individually they always seem way more in tune with what people actually want and what should go on, and they're very careful about who can sponsor and what sort of talks can show up.

speaker-1 (52:10.336)
Okay. Yeah. I'll search to see if there's one in Taiwan.

speaker-0 (52:14.144)
Yeah. Well, thank you, Jeff, for being the guest on today's episode. I didn't know where we were going initially, but I absolutely love talking about all the technical things that an executive can still get down and do. So thank you so much for coming on for the episode.

speaker-1 (52:30.584)
Thank you, Warren. It's a pleasure talking to you.

speaker-0 (52:33.058)
Well, it's been great. And thanks to the audience for tuning in for this week's Adventures in DevOps, and hopefully we'll see everyone back again next week.