speaker-0 (00:07) Welcome back to Adventures in DevOps. Every episode is a deep dive into a specific topic with an expert guest. Maybe you've heard: it's not DNS. There's no way it's DNS. It was DNS. So for this week's expert, we're going behind the scenes with DNS with the expert, 13-year veteran and CTO at DNSimple and, according to his LinkedIn, a very passionate programmer, Simone Carletti. Welcome to the show. Hi. speaker-1 (00:30) For sure, it's DNS. Thanks for reminding me. speaker-0 (00:32) Last week, we had a deep review of the current state of IPv6 for everything commercial, business, and consumer. I thought it was only fitting to jump over and have a DNS-specific episode, and I think right on schedule. speaker-1 (00:46) Awesome, awesome. I'm so excited to be here and share and talk about the wonderful world of DNS. speaker-0 (00:52) So, you know, I think it's one of these areas that we often joke about never being down. And I feel like, from my own experience, it's not really the first place you go to investigate when there is a problem. Any thoughts on why that may be? speaker-1 (01:05) You know, one of the main problems is the inner complexity: the number of layers in the DNS protocol and the DNS infrastructure. It's actually an issue that we have sometimes inside DNSimple when we need to investigate a problem. The number of layers and the complexity of the system mean that sometimes you don't even know where to start, right? Because there are so many variables. There are so many layers in between, caches and configurations, that sometimes we think the problem starts somewhere, but it's somewhere else. Or you think the problem is not related to DNS, because in your brain DNS is maybe just that particular provider, but you don't realize that your ISP is in fact a DNS provider. Your computer is a DNS provider.
And so maybe you have some configuration locally that is changing that, and in your brain that is not DNS, that is, I don't know, a local configuration. So it is complex. It's an amazing environment where you have so many players working together. But when you need to figure out where things are going south, it may take a little bit. speaker-0 (02:19) Yeah, I think it's really one of those areas that is also heavily abstracted away. If there's a first-level application problem, you're going to get something straight in your logs where the stack trace makes a lot of sense. But then you start going down the layers: is it part of the HTTP request, or the TCP or UDP traffic, that's problematic? In which case you're getting a little bit into protocol land. But when it comes to DNS, you're really talking about resolving the domain in the URL. And at this point, if you're using containers or some other technology stack, serverless or even Kubernetes or Docker Swarm, that may be completely abstracted away from what your application is even doing. So I can totally understand it not only being complicated and convoluted, and maybe there are lots of problems with it, but also being hidden a lot of the time. And that means the way it exposes those problems to your application becomes unexpected. speaker-1 (03:10) Absolutely, absolutely. 100%. I come from the software engineering environment, and I've been working a lot on programming and developing systems, more than operating them, honestly, or operating at the network level. I mean, I've been involved at the network level for many years, but I started from the software engineering side. And if I compare the two worlds, I definitely see how the network stack got abstraction earlier than software development. I mean, think about web frameworks, like now we use Rails, for example. Rails is super popular.
And then there were many others in the past. You can go Django in Python. You can go Symfony in PHP. But really, that is what, 10 years, 15 years? Sure, it's a long time. But if you go back to the history of the DNS protocol and the network stack, we're talking about a third or a fourth of the length of that technology, right? And so in fact, today we're starting to see issues in software development similar to what we're talking about with the abstraction of the DNS protocol, where people are maybe building a web app, throwing it out, and then when the bug comes in, it's like: where is this happening? And you don't realize that you built a very thin layer on top of a framework. And when you start narrowing down into the framework, you get confused, because you don't know what the framework is actually doing, right? So every time there is an abstraction, the deeper the abstraction goes, the more complex it becomes. It's just that software developers and engineers have not been used to having abstractions on the software side, I think, for as long as we've had them in the network stack. speaker-0 (04:57) Yeah, I actually think, as far as innovations go, in a lot of ways everything related to networking got automated around the time of the cloud providers. And before that, it was hardware-related. And so when there was a problem in this space, there would always be a hardware concern. And I feel like a lot of software engineering comes around to the fact that, if there's a problem, you really pray that it's not a hardware problem, because good luck trying to investigate what's going on there. And historically, for something that was part of the network stack, things like switches and routers, which are almost pure hardware components, it's just like: please let it not be there.
speaker-1 (05:33) Well, these days we still have these kinds of issues, but now it's more about the time required to provision a piece of hardware. We also got used to having things in the cloud, where you can spin up a new environment in a few seconds and get up and running. But in fact, I'm bringing this topic up because there's an interesting story about part of the architecture of DNSimple. For all our authoritative name servers, we're not using cloud providers. We are using bare-metal machines, and so we configured everything from scratch. And historically, this is exactly because of abstraction. We run our own authoritative name server software; it's a name server built in Erlang and released open source, called erldns. And so when we built that software, we didn't use things like, for example, PowerDNS or BIND. And that software, running in a virtual machine or in an early containerized environment, would simply not work. We tried that in the very early days. I mean, it was many, many years ago, I think about 10 or 14 years ago. And the level of abstraction that containers had back in the day simply didn't work for us, because we needed control over the network. Just think about it: we are literally provisioning a network service. And so being able to have that level of control over the whole network stack was absolutely essential. And we couldn't do that with any of the containers that we had back in the day. So we still run this kind of environment for the name servers, also for performance reasons, obviously, for fine tuning and all of it. And these days, the problem with hardware is really that if it's a problem with the hardware, we hope it's not, just because if we have to replace a hard drive, then it physically takes someone to go there and change it; it's not something we can do remotely.
And so the time to recover is longer than the time it would take to recover from some kind of software issue. speaker-0 (07:38) I think there's something to be said here. There are a lot of companies that are now gaining steam, or repeatedly post on LinkedIn, about how great it is to reclaim so much capital expenditure by moving off of the cloud and buying hardware for bare metal, or even paying a bare-metal-as-a-service provider rather than paying one of the hyperscalers, because it's quote-unquote cheaper. And a lot of those companies are offering apps or SaaSes where that doesn't really make sense. But when you're really providing a fundamental backbone or infrastructure at the level that you're providing with DNS, that infrastructure is actually a competitive advantage. And utilizing a third party to sit on top of it creates a sort of circular dependency. I remember early on, there was always a question for me: how can ISPs run their infrastructure on top of AWS? Isn't there a circular dependency failure mode there, where if part of AWS goes down, it will crash the ISP, and then the whole network availability zone will go down for AWS because of a small little change? That's something you would want to avoid. speaker-1 (08:44) You're totally right, and it's not just because of competitive advantage. It's also about requirements. I shared some of the requirements that were about performance and configuration, but there's actually another requirement: the ability to control the BGP protocol and the BGP configuration, which no cloud provider will ever accept. Because just imagine going to AWS and saying, by the way, I want to be able to control the BGP routing for your whole data center because I need to inject my own routes. No, it's not going to happen, because you have the potential and the possibility to take down an entire region if the configuration is misplaced.
So on the other side, I totally agree also with the fact that we have seen cases like, for example, a few infrastructure providers, those services that sit on top of AWS or GCP. Do you mind if I stop for a second, because the dog entered? I don't know how the dog opened the door. speaker-0 (09:50) We'll cut that out unless you want us to leave it in. I totally get it. It's twofold right there, because realistically, and this is an aspect of DNS that I still don't understand to this day: I understand ASNs, I understand that if you want to issue IP addresses, you have to go through the global registries and get it approved. But BGP routing is the aspect of networking that I just never fully understood. Maybe, long story short: there was a time a few years ago when Facebook locked themselves out of their own data centers. And the way they did this is that Facebook data centers had digital ID verification to get access. But in order to do that, the physical devices at the data centers had to go to the internet and verify that the device, the badge, actually did have access. And in order to do that, they had to resolve the DNS for the domain. But Facebook owned the DNS resolvers; they were in the data center. And so when they broke the resolvers, when they published the wrong BGP routing, then access to the data centers was broken as well. So they couldn't actually fix it, and they couldn't get into the data center to actually reset the system. So basically, the story is that they were breaking down the wall with heavy machinery in order to get into the data center, to bypass all their security procedures, to deal with this problem. So this is not a small issue. speaker-1 (11:17) Absolutely. Although, as a company, you actually don't need someone hacking on BGP to have these kinds of problems. I mean, this is truly a common problem for most of the infrastructure companies today. We had a similar incident a few years back.
It was actually 2014. It was one of the largest incidents that we ever had. We had a massive DDoS attack, back in the days when we didn't have the infrastructure that we have today. We didn't have the DDoS mitigation layers; in fact, all the investment and all the development and research that we've done afterwards originated from that incident. And so what happened is that back then we were using a service for collecting all the logs. It was actually very early, 2014; we didn't have services like Datadog or monitoring services like that. We used a service called Papertrail, and Papertrail was a customer of ours. So during that incident, we were trying to log into the log aggregation system to figure out what was going on. And we couldn't, because we couldn't access that system, because that system was relying on DNSimple, and DNSimple was completely down. So we tried to access our machines, and we were using an internal name under a dnsimple domain to connect to our machines. But because that name was resolved through DNSimple, and DNSimple was down, we couldn't even log into the machines. We had to start figuring out what the IPs of the individual servers were, all over. That was actually very interesting. You know, there's a lot to learn from an incident like that. And I'm sure that all the people listening to this podcast, and all the people that have had to deal with operations and incident management, would probably agree with that. They would agree how frustrating it is to deal with an incident, but how insightful it can be if afterwards you evaluate and assess what happened, and how important it is to learn from that. And so we had so many chicken-and-egg problems, where we were trying to use certain services and they were not available.
And after that, it became one of the critical policies inside DNSimple that certain pieces of the infrastructure need to rely on a backup plan: ideally you would use a service that doesn't use DNSimple directly, or you need a backup plan for that. And lately, a few months ago, no, actually last year, we decided to re-engineer the overlay network. And we started doing due diligence on the various providers that are available right now. I don't want to name them, but we actually had to rule out two that were pretty competitive options, because they were using DNSimple. This is actually something that, when you are an infrastructure company, you have to start realizing when you are making decisions there. It's a nice problem to have. But you have seen problems where Cloudflare is offline, and I absolutely don't want to blame them, they're just one of the largest providers, right? Or AWS is down, and in cascade a bunch of other services, as you were pointing out, are down. This is absolutely something that an infrastructure provider must account for. speaker-0 (14:46) I want to ask about how you ended up actually identifying what the IP addresses were that should have been at the end of the resolvers, if you weren't able to access the systems that told you that information. speaker-1 (14:58) So the solution to this problem is actually one of the main reasons why you and I got to connect, because it ties back to the topic of infrastructure as code. DNSimple has been using infrastructure as code to provision the whole architecture, the whole system, every single component, from the web app to the whole infrastructure, since day one. We started with that. We have never had a single machine or a single component of our infrastructure that was not provisioned through some sort of infrastructure as code. We started with Chef. We are still using Chef these days.
We're also using way more than Chef, because eventually we had to deal with other types of complexity, and also Chef is no longer going in the direction where we want to go. speaker-0 (15:48) You had real problems that you had to deal with. speaker-1 (15:52) Exactly. And at that point, everything is there. That's the beauty of it: the moment you need to pull down some information, it is codified somewhere. Even more so because we pair infrastructure as code with Git repositories, which allows us to have offline access, where basically any of the team members can have the whole repository copied locally and work on a local copy. And if you think about it, it's just insane thinking that you have an entire infrastructure on your own local computer. But that's the way it is. And so the beauty of it is that you have codified all that kind of information there. And sure, maybe you are unable to get it through the normal tooling, because the normal tooling may connect with the central orchestration tool, but you can still grab the code base and figure out where the IPs are, pretty much. And that's actually what we did. speaker-0 (16:51) Interesting. I feel like for me, though, early on I stopped worrying about DNS and learned to love it. Even in the enterprise world, historically, I think I was in companies that wanted to use something like HashiCorp Consul, which I always considered a nightmare for doing things like registration for leader election. And there is this weird aspect where DNS feels like this public thing. But from my standpoint, it's always been the most reliable piece of the picture. And so if there's an incident there, you know, ridiculous things are going down, and of course there are people working to get it back up. But I feel like there's been this mentality that it's not the appropriate solution for everything.
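The recovery approach described here, pulling server IPs out of a local infrastructure-as-code checkout when DNS-based tooling is unreachable, can be sketched roughly in Ruby. The node data and method names below are invented for illustration; a real Chef repository would hold similar attributes in node and role files.

```ruby
# Hypothetical sketch: when DNS (and therefore hostname-based tooling) is down,
# the IPs are still codified in the local infrastructure-as-code checkout.
# NODES stands in for checked-in Chef-style node definitions.
require 'json'

NODES = {
  "ns1.example.internal"  => { "ipaddress" => "203.0.113.10", "run_list" => ["role[nameserver]"] },
  "ns2.example.internal"  => { "ipaddress" => "203.0.113.11", "run_list" => ["role[nameserver]"] },
  "app1.example.internal" => { "ipaddress" => "198.51.100.5", "run_list" => ["role[webapp]"] }
}.freeze

# Grep-style pass over the codified data: collect every IPv4 literal, the way
# you might `git grep` a repo when the orchestration tooling is unreachable.
def extract_ips(data)
  data.to_json.scan(/\b(?:\d{1,3}\.){3}\d{1,3}\b/)
end
```

The point is not the regex but the property it relies on: because the whole topology is codified and cloned locally, the answer survives even when every DNS-dependent service is down.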
But when I look at the alternative solutions out there for handling things like, we need to do leader election for databases, or we need to do round-robin routing for requests coming in, DNS feels so much more natural, even if you're running stuff privately. And I don't know if it's just me and my experiences where I see this disconnect. Any thoughts? Have you ever seen that companies are more hesitant to use the public domain name system infrastructure out there, which really is designed to be super reliable, and instead want to use an off-the-shelf product that they can plug into their ecosystem? speaker-1 (18:05) Yeah, I've seen examples of companies trying to stretch the boundaries of what you can do with DNS to levels that sometimes just drive me nuts. I recall, in the first few years when we built the system, it was very complex to squeeze into our product, especially because we built the name server, we built the validation system and all of that, trying to deal with all the RFCs and all the possible validations, sometimes conflicting specifications. And so, you know, eventually some incorrect, malformed record would go through the platform and start producing odd results. And occasionally I've seen certain records where I'm like: why are you doing that? Why are you trying to do that? That didn't even cross my mind. I recall there was a point, actually, to be fair, it was research, but there was a point where someone started trying to use the TXT records as a database. speaker-0 (19:14) Oh, that's Corey Quinn 101, right? Like, Route 53 is a database. speaker-1 (19:20) Exactly. So there were some weird things going on. Admittedly, it's an interesting experiment. The reality is that you're totally right.
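The TXT-records-as-a-database experiment mentioned above runs into a concrete protocol constraint: a single TXT character-string is capped at 255 bytes, so any longer value has to be split into multiple strings that the client reassembles. A minimal Ruby sketch, with invented method names:

```ruby
# Hedged sketch of the "TXT record as a key-value store" abuse discussed above.
# One TXT character-string holds at most 255 bytes; a record can carry several
# strings, and readers concatenate them back together.

MAX_TXT_STRING = 255

# Split an arbitrary value into DNS-legal TXT character-strings.
def to_txt_strings(value)
  value.bytes.each_slice(MAX_TXT_STRING).map { |slice| slice.pack('C*') }
end

# Reassemble the value on the reading side.
def from_txt_strings(strings)
  strings.join
end
```

For example, a 600-byte payload becomes three strings of 255, 255, and 90 bytes, and joining them round-trips the original value, which is exactly the chunking dance that makes TXT an awkward database.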
DNS has so often been pointed at as the problem, because it's in so many places; statistically speaking, it's everywhere. But the truth is that it's an incredibly stable protocol. It's been around for years, really. So many years that in the IT field, it's almost centuries, right? And, I mean, it has changed a little bit in the last few years, but for many, many years it went unchanged. Not a lot of new record types, not a lot of changes. It actually started to change a little bit with the rise of the various SaaS services. For example, when we started to see a number of services where they were giving you a CNAME that you needed to point to, because they didn't want you to point to an IP; they wanted you to point to a name that could change. Then we started to see needs that didn't exist before. speaker-0 (20:31) You mean like the ALIAS record at the apex domain. speaker-1 (20:34) Exactly. Exactly. That was probably one of the first innovative use cases I could think of that really brought the protocol to a different level in terms of utilization. For many, many years, the protocol had been used in pretty standard, and I would say to an extent boring, ways. But it's there. It's very reliable. When you talk about reliability, if you think about it, I don't know if you know this, but normally no one will give you a 100% SLA. But actually, if you start looking online, you will see that this is, I wouldn't say common, but this is normal in the DNS space. Because generally, especially if you have a distributed system that uses, for example, anycast with BGP and not unicast, being completely offline means pretty much a catastrophic event that brings down everything across multiple data centers, across multiple regions. And generally speaking, that is not a DNS problem.
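The ALIAS/ANAME record discussed above is commonly implemented by "flattening": a CNAME is not allowed at the zone apex next to the SOA and NS records, so the provider resolves the target name itself and synthesizes plain A records in the answer. A rough Ruby sketch of that idea, with a stubbed resolver so it runs offline; names, IPs, and the TTL are invented:

```ruby
# Rough sketch of ALIAS/ANAME flattening: resolve the target at query time and
# answer with synthesized A records at the apex ("@"). The resolver lambda is a
# stand-in for a real lookup (e.g. via Ruby's stdlib Resolv).

def flatten_alias(target, resolver, ttl: 60)
  resolver.call(target).map { |ip| { name: "@", type: "A", ttl: ttl, data: ip } }
end

# Stubbed resolution so the sketch runs without network access.
STUB_RESOLVER = lambda do |name|
  { "lb.saas-provider.example" => ["192.0.2.1", "192.0.2.2"] }.fetch(name, [])
end
```

Because the target's addresses can change, real implementations re-resolve periodically and keep the synthesized answers' TTL short, which is what lets a SaaS vendor move its load balancer without customers touching their zones.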
All the catastrophic or semi-catastrophic events that we had were about provisioning, configurations, or software. Not about the protocol, right? And so it's about the whole orchestration around it. It's about the services that you use to provide it. It's not about the protocol per se, because there are so many layers, there's so much redundancy baked into the protocol itself. speaker-0 (22:05) Yeah, it's interesting that you bring up pushing DNS even further in some regard, because of the quote about using Route 53, or DNS name servers, as a database. You know, there is something here, because in the AWS incident with DynamoDB, their solution actually required storing records with the IP addresses for DynamoDB resolution somewhere. And inside AWS, the normal place to store key-value pairs is in DynamoDB itself. And obviously, you don't want the service to depend on itself for resolution, because you're more likely to have a catastrophic failure; so they were already utilizing their Route 53 solution, their DNS solution, for storing data. So actually, DynamoDB uses their DNS server as a database for its own needs, for locking and unlocking records to determine what the canonical correct resolution is to hit the database. So, you know, it's interesting where people are like, no, you should never do that, and then you actually see real-world implementations where it is the right answer. And I sort of want to flip this upside down, because you clearly have experience going through all of the RFCs related to DNS, probably multiple times. My question is going to be: are there any parts of the DNS structure, maybe specific kinds of records, where you're just like, PTR records? I hate them. They should never have been invented in the first place. What's your pick here? speaker-1 (23:29) I have an answer. It's definitely the newest one that we implemented a few weeks ago, which is the SVCB slash HTTPS record.
It's two records, but the RFCs go together. And they are the follow-up to the ALIAS records. So the ALIAS, or the apex record, or the ANAME: it comes with different names and different flavors exactly because it was never formalized. It's all very complicated. I mean, just reading through that and implementing all the different variants and the parameters and the settings, I had to go through it multiple, multiple times. One of the reasons is that we use different languages for the different components. We actually have three main components talking together: the web application with the storage, then a distribution system that is in Go, and the name servers that are in Erlang. So I had to go through that implementation in three different languages, three times, multiplied by the number of times I had to go through those sections again and again and again, especially the part about parsing the various parameters: the fact that you can write them as key-value pairs, and then for parameters from one to eight you also have convenience aliases. And every time I read "convenience" attached to "alias", I'm like, man, when you have three ways to do the same thing, my brain just cannot compute. I'm a simple human. I've been working for years in Ruby, right? And Ruby has a great focus on readability, on the beauty of the code. And so, especially in the very early days, there was an overuse of the fact that you could alias method names and have a single function be callable in many different ways. And you can still see that: there are libraries where you would call .length, which is the equivalent of .size, which is the equivalent of .count. Unless it's not. In some libraries, .size may read a cached value, and .count may always trigger the database. So in my brain it's like... why do we need three ways to do the same thing?
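The key-value SvcParams that make the SVCB/HTTPS record hard to digest can be illustrated with a minimal parser for the presentation format. This is only a sketch: a real implementation must also handle the numeric keyNNNNN forms, quoting and escaping, value-less keys like no-default-alpn, and the mandatory-parameter rules.

```ruby
# Minimal sketch of parsing the SvcParams portion of an SVCB/HTTPS record in
# presentation format (RFC 9460), e.g. "alpn=h2,h3 port=8443".
# Only the simple alpn/port/ipv4hint cases are covered here.

def parse_svc_params(text)
  text.split(/\s+/).each_with_object({}) do |token, params|
    key, value = token.split('=', 2)
    params[key] =
      case key
      when 'alpn', 'ipv4hint' then value.split(',') # comma-separated lists
      when 'port'             then Integer(value)   # numeric parameter
      else value                                    # unknown keys kept verbatim
      end
  end
end
```

Even this toy version shows why the record is painful to implement three times over: every parameter has its own value syntax, so each language's parser has to encode the same case-by-case rules.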
Especially because then, when you have a large code base and you need to search for something, you need to make a change, or you have a bug and you want to see how many other instances of that bug exist, if you have three ways of calling the same method, it's just very complex to deal with. So, admittedly, we went back and forth multiple times trying to figure out the best way to implement that, but also the best way to present it to the customers. Because we have a lot of very highly technical customers, but there are certain topics where, just because you're technical, it doesn't mean I can take it for granted. That was also the story with SSL/TLS certificates. I recall how complex it was in the very early days trying to figure out a way to serve the certificate chain. speaker-0 (26:29) Yeah, absolutely. speaker-1 (26:46) The server certificate, the root certificate, in a bundle, in a way that was convenient for the customer. Because many of them did not realize that if you were using NGINX, you had to sort them from server to parent. If you had Apache, you had to go the other way around. No. speaker-0 (27:05) I don't want to know that. Yeah. speaker-1 (27:07) And so, you know, these are the kinds of things where, when you implement something and you spend maybe hours, days, sometimes even months on a particular topic, you become an expert there, but you don't realize how easy it is to then build something that works but is unusable, because you expect other people to understand it at the level at which you implemented it. And so the SVCB and the HTTPS, I think, is a super powerful record. It's designed to be probably the replacement for the ALIAS and many other configurations. It's just very tough to digest. speaker-0 (27:43) No, I totally get it. I mean, the need is clearly there.
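The certificate-chain ordering problem recalled above can be shown with a tiny sketch: for an nginx-style `ssl_certificate` bundle, the server (leaf) certificate comes first, followed by the intermediates toward the root, and concatenating in the wrong order is a classic source of TLS handshake errors. The PEM bodies below are placeholders, not real certificates, and the helper names are invented.

```ruby
# Illustrative sketch of chain ordering for an nginx-style bundle:
# leaf certificate first, then intermediates up toward the root.

def pem(label)
  "-----BEGIN CERTIFICATE-----\n<#{label}>\n-----END CERTIFICATE-----\n"
end

def nginx_bundle(leaf:, intermediates:)
  ([leaf] + intermediates).join
end

LEAF         = pem("server")
INTERMEDIATE = pem("intermediate")
```

The awkward part for a provider is exactly what the conversation describes: the "right" concatenation depends on which server software the customer runs, so a single downloadable bundle can never be correct for everyone.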
There's a question here: obviously your company has some capability of going in and making good suggestions, given that you are a huge player in the DNS space. But I know how the IETF groups work, I know how the global groups work: even if you're sitting there saying, you know, we have customers, we know how they're going to use it, it's hard to convince some of the people who are out there designing these things that their use case is not necessarily one that speaks for everyone in every single scenario. And we can see this by just looking at what the cloud providers offer, not just in the DNS space, but across the board when it comes to protocols and standards: they're not always up to date with the things that have been released. This is my opportunity to plug: I absolutely love the QUERY verb that just showed up recently in HTTP as a valid method. And I think it's going to be still years before we see the cloud providers implement caching based on QUERY. I think the same thing goes with IPv6, which has been around forever, but realistically, cloud providers still don't provide a strategy that works across the board. We already know about IPv4 address space exhaustion. We know in the DNS space that most of the cloud providers don't support some of the records that exist. The one that comes to mind that I always sort of want to use is DNAME. If you don't know: you know what a CNAME is; a DNAME is for a whole subdomain, not just a single record. And I mean, it seems like the thing that is a footgun in a lot of ways, so I understand why they haven't implemented it, but at the same time, it seems like something that would be super useful. speaker-1 (29:20) As someone who has been working in the open source community for many, many years, I know how hard it is to fall into the trap of saying yes to everything.
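The DNAME record mentioned above redirects an entire subtree rather than a single name (RFC 6672): a query for anything below the DNAME owner is answered by substituting the target for the owner suffix. A hedged Ruby sketch of that rewrite rule, with invented domain names:

```ruby
# Sketch of DNAME substitution semantics: unlike a CNAME, which aliases one
# name, a DNAME rewrites every name under its owner into the target subtree.

def dname_rewrite(qname, owner, target)
  # Only names strictly below the owner are rewritten.
  return nil unless qname.end_with?(".#{owner}")
  qname.sub(/#{Regexp.escape(owner)}\z/, target)
end
```

So with a DNAME of legacy.example.com pointing at new.example.net, a query for www.legacy.example.com is rewritten to www.new.example.net, which also hints at why it is a footgun: one record silently redirects every name in the subtree.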
And in fact, I have huge respect for all the people working on RFCs, you know, the IETF working groups, because of the complexity of designing something that potentially will take a long time to adopt, but where any mistake will also be paid for for years, right? So you really need to think upfront and try to minimize the risk of making a choice that you will regret later. Which is, by the way, something that all of us who have leadership positions experience in some shape or form when building software or implementing an infrastructure: you pick that database because it seems a great idea in that moment, and then a couple of years later: man, why did I pick this NoSQL thing? speaker-0 (30:23) Yeah, I mean, you're spot on. It's like when you're in a company that you're running for multiple years: it's the sort of thing where you have that long-term opportunity to both see forward but, unfortunately, also get to regret all your past mistakes. And some of them are very difficult to see without having many, many cycles to iterate. speaker-1 (30:41) It's a very complex task. And as I said, I really think that there are certain people who are better at that; it's not for everyone. You really need to have a forward-thinking mind, and that extends also to a very open mind, to be able to understand the needs of an industry, for example. But on the other side, you need to be ready to say no and push back, because we have all seen how easily a system can become overly complex. I often say that to the team as well: everything that we build today is a liability for the future. And you often don't realize until later, after you create that contract, after you sign that contract, that you became liable for supporting it. You just can't remove an API endpoint after you build it. You can't remove a piece of infrastructure after you've built it, even if you think that nobody's using it, or that just a couple of people are using it. Because guess what? Those are going to be the two people who are going to be pissed off and speak very vocally about how dare you take down that service they were really passionate about. So it's a very complex job, the one where you have to decide and think long-term about solving a problem now, but in a way which is not going to become your problem, or another type of problem, tomorrow. speaker-0 (32:08) I mean, I'm totally with you. We see this all the time in our own product for identity and access management, where someone wants to add unnecessary claims into a JWT to be used for authentication, or another API to handle attribute-based access control, or something like that. And if you dive into the use case and you really try to have a customer focus and really solve their business problem, you realize they're not even interested in doing it in a reasonable way; they just have it in their mind that there's an approach, and if you try to support it, it's going to cause a long-term problem. And I think this happens more and more when you're in the infrastructure space and you're providing such a critical piece of infra for customers that an issue here can very rapidly get out of control. You know, at the lower layers everything balloons out. So what I really want to ask you is: how does one run a DNS business in the first place, and why, I guess? speaker-1 (33:02) It's a great question. When I find the answer, I'm going to tell you. No, kidding. I have to say that if someone had asked me, you know, 20 years ago whether I'd start a DNS business, that would never have crossed my mind, right? It just happened, to be fair. I actually joined from the domain industry. When DNSimple was born, DNSimple acquired my company. I had a business related to domain management.
So really the two faces of DNSimple are the DNS side and the domain side. As I say, when I joined, it was the very early days. I actually joined as the first official employee, back in the days when DNSimple acquired my company, when it was still a pet project. So in fact, I helped shape and build all that DNSimple is today, but I was coming from the domain space. To me, DNS was to an extent what it is for many other people: a commodity on top of the domain space, the domain industry. The interesting piece is that DNS connects with the whole domain industry very closely. So it was an effortless transition for me to enter that space. And I somehow found the space very compelling. That's also why I was so excited to help build DNSimple and evolve the product, because you never get bored. Really. There are so many challenges, so many R&D opportunities, so many new things. I don't have enough life to live in order to learn everything about this topic and this industry. And so... speaker-0 (34:46) Yeah speaker-1 (34:57) At the same time, for someone that is really passionate about performance and algorithms, this is a field, an industry, that really lets you work and play with those, because you are working at a level of scale in terms of requests and data that is not comparable, because you are literally at the backbone of the vast majority of services. At minimum, you need to have the same, well, that's not strictly true because you have caches, but just to simplify it, at minimum you have a one-to-one relationship between the number of web requests and DNS requests, right? It means that if you pick any website and you are dealing with the challenge of the amount of traffic hitting that page, then certainly you have an equivalent problem in terms of DNS queries, if not more, right? Because DNS is not just about web traffic, it's about mail traffic, it's about everything else.
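The fan-out Simone describes, where a single web request triggers at least one and often several DNS queries, can be sketched at the wire level. This is a minimal, illustrative packet builder, not production code: `example.com` and the transaction ID are arbitrary choices for the sketch, and real resolvers also add EDNS options and use name compression (per the RFC 1035 message format).

```python
import struct

# Record type codes from the DNS wire format (RFC 1035 / RFC 3596).
QTYPE = {"A": 1, "NS": 2, "MX": 15, "AAAA": 28}

def build_query(name: str, qtype: str, txid: int = 0x1234) -> bytes:
    """Build a minimal DNS query packet for `name` with the given record type."""
    # Header: id, flags (RD bit set), 1 question, 0 answer/authority/additional.
    header = struct.pack(">HHHHHH", txid, 0x0100, 1, 0, 0, 0)
    # Question name: each label length-prefixed, terminated by a zero byte.
    qname = b"".join(bytes([len(l)]) + l.encode() for l in name.split(".")) + b"\x00"
    question = qname + struct.pack(">HH", QTYPE[qtype], 1)  # class IN = 1
    return header + question

# A single page visit can fan out into several distinct queries:
packets = [build_query("example.com", t) for t in ("A", "AAAA", "MX", "NS")]
```

Each of these packets could be sent to a resolver over UDP port 53; the point is simply that one page load becomes multiple independent queries before any caching is considered.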
And sometimes for a single web request, there are a dozen DNS requests, because maybe you are checking for other types of records. You're checking for all the other services. You're checking for A, quad-A (AAAA), NS, glue records, all of that, right? So it's a very fascinating industry. And as I said, you are never short on challenges. And by the way, people ask, what's new about DNS? And that's fair. I mean, as I was saying before, DNS has been the same for many, many years. What is constantly changing, though, is not necessarily the underlying protocol; it's the way people, systems, and services interact with it. And that is constantly evolving, because the needs of the ecosystem are constantly evolving. DNS over HTTPS, DNS over TLS, all the various transports, DNSSEC, which is a whole different beast. All of those are coming from needs that the DNS protocol didn't have years, decades ago. And so, yes, the fundamental topic, the underlying protocol, is the same, but the way that people and services interact with it keeps changing. And so that is also part of the never-get-bored challenge that I was talking about. There's always a new way to use or abuse the protocol. speaker-0 (37:17) You hit one of the items on my bingo card, which was DNSSEC versus DNS over HTTPS. And I'm wondering if you have a strong opinion on a direction here, like whether or not companies should be implementing one or the other or both. I think we've seen a non-trivial number of times where the Salesforces or the Slacks of the world tried to roll out, especially, DNSSEC. And that sort of works, but then they decide to roll back, and that causes a huge problem where they're not signing things correctly.
If you look at the DNSSEC side, then you're really talking about signing and guaranteeing that the data itself is unchanged. And so it's more at the low level. It's more about the backbone, the baseline for building things on top of it. For example, records like TLSA, DANE, all those kinds of implementations rely on making sure that you have a way to guarantee that the record you're receiving is the record you're expecting to receive. So DNSSEC really exists to solve a certain type of use case. On the other side, if you look at the various ways to transport DNS, such as DNS over HTTPS, I consider that more on the consumer side of the use case, right? Different users will have different needs. Maybe they need to encrypt that data, not sign it, but encrypt it, because they are in a region of the world where even the payload of DNS queries, which for the vast majority is in plain text, is visible in the wild, right? Even with DNSSEC, you're not really encrypting the data itself. The data is still visible. It just provides a way to guarantee that the data has not been altered. Those are, in my opinion, serving different purposes. They are not mutually exclusive. They do have a problem in common, which is that adoption has been extremely slow, because there hasn't been enough push from the major providers or the industry as a whole, same as IPv6, right? Because it's an industry that is very scared, rightfully so, of breaking things. So it's moving very, very slowly. And so there hasn't been huge adoption, or the adoption generally goes with whichever larger provider comes first and draws a line or clears the path, right? The adoption of certain ways of resolving DNS through alternative protocols has increased thanks to browsers implementing them. But it's also such a technical topic that, what do you expect? The normal consumer to just go there and select which protocol to use?
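The distinction Simone draws here, that DNSSEC signs while DoH/DoT encrypt, can be illustrated with a toy example. This sketch uses an HMAC as a stand-in signature purely for illustration; real DNSSEC uses public-key cryptography (DNSKEY/RRSIG records at the zone level), and the key and record below are hypothetical:

```python
import hmac
import hashlib

# Toy illustration (NOT real DNSSEC, which uses public-key RRSIG/DNSKEY records):
# signing proves the record was not altered, but the data stays readable.
KEY = b"zone-signing-key"  # hypothetical key, for the sketch only

def sign_record(rr: str) -> str:
    """Return a hex signature over the record text."""
    return hmac.new(KEY, rr.encode(), hashlib.sha256).hexdigest()

record = "www.example.com. 3600 IN A 192.0.2.1"
sig = sign_record(record)

# The record itself is still plain text on the wire; anyone can read it...
assert "192.0.2.1" in record
# ...but tampering is detectable: the signature no longer verifies.
tampered = record.replace("192.0.2.1", "203.0.113.9")
assert not hmac.compare_digest(sig, sign_record(tampered))
```

Encryption (the DoH/DoT side) would instead hide the record from observers entirely, which is an orthogonal property; that is why the two mechanisms complement rather than replace each other.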
It's not the kind of thing that an end user would care about. speaker-0 (40:21) And even if they do, the problem is that they're not going to be technical enough to drive forward the right sort of implementation. Where does "we need privacy in DNS" come from? It usually comes from the consumer side. But the implementation is all on the other side, and there are all these layers in between that just don't care enough. And so I think you're stuck in this loop where it would be good for everyone, but we just don't get there because there's not enough, say, money behind it. speaker-1 (40:47) Right. In my opinion, this is a place where the industry should step in and the industry leaders should set some requirements. Just think about it. Do you think the customers or the consumers would have cared about having HTTPS in the browser? Why did that happen? It happened because the browsers got there and said, you know what? I'm going to give you one year, I'm going to give you three years. Three years from now, we're not going to open speaker-0 (41:03) You know, I still- speaker-1 (41:17) by default, any URL that is not HTTPS powered, period. The same story is happening now with the decrease in the duration, the lifespan, of SSL certificates. That's where Let's Encrypt came in and really disrupted the market, among other things, by saying, by default, I'm going to give you 90 days when, if you recall, the whole industry was at about three years. I'm going to give you 90 days, that's it. No extension, no excuse, nothing else. You have to automate. You have to stop having individual users logging into a system and manually replacing the SSL certificate. Right? So it was three years, then it became one. And now, over the next few years, it's going to go all the way down to 47 days. I don't recall why it's one of those odd numbers. Why would you pick 47? I actually have no idea. speaker-0 (42:09) I'm sure there is a reason. Logic, right?
I don't know what it is. speaker-1 (42:14) I'm sure there is. But you see the point. My point is people should not have to care about turning on DNSSEC for their domain. If, as an industry, we think that this is an improvement to the ecosystem, then let's make it required. And let's have the industry players work to figure out how to provide the right infrastructure for consumers. And in fact, there are a few TLDs that actually did it. I don't know if you know it, but there are a couple of gTLDs, like .bank and .insurance, that enforce the use of DNSSEC for every user. So at the registry level, if you are serving your DNS without DNSSEC, that is, if you are not signing your zone, the domain is disabled. At the registry level, they turn down the delegation if they see that you're not using DNSSEC, so if the zone is not signed. In order to turn it back on, you actually need to sign the zone and publish a DS key. There are also a couple of other TLDs, sorry, those were gTLDs; there are a few ccTLDs, if I recall correctly, in the Nordic European area, I believe it's .nl and .ac, and a couple of others, that are not enforcing it in the sense that if you don't have it, you're off. But they're doing some kind of campaign, some kind of promotion or communication that really pushes it, to the point that, I don't have the stats in front of me, but if you look at the stats, I believe that for those TLDs the level of DNSSEC adoption is above 90% of the domain space. It's super high. speaker-0 (44:03) Interesting. That's really... speaker-1 (44:07) Yeah. For the others, sometimes it's not even close to 5%. speaker-0 (44:13) For sure. That seems high to me even. speaker-1 (44:16) That's the point. If we need to get that out there, then industry leaders must make it required. We cannot expect the consumer to just come and say, you know what? I really want DNSSEC today.
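The registry policy Simone describes for .bank and .insurance can be sketched as a simple predicate. This is a hypothetical model, not any registry's actual implementation: it just captures the rule that, under an enforcing TLD, delegation stays up only when the zone is signed and a DS record is published at the parent.

```python
def delegation_allowed(zone_rrtypes: set, policy_requires_dnssec: bool) -> bool:
    """Hypothetical registry policy check, modeled on what Simone describes:
    delegation is suspended unless the zone is signed (DNSKEY/RRSIG present)
    and a DS record is published at the parent."""
    if not policy_requires_dnssec:
        return True  # non-enforcing TLDs delegate regardless
    signed = {"DNSKEY", "RRSIG"} <= zone_rrtypes
    ds_published = "DS" in zone_rrtypes
    return signed and ds_published

# An unsigned zone under an enforcing TLD gets its delegation turned down:
assert delegation_allowed({"A", "NS"}, policy_requires_dnssec=True) is False
# Sign the zone and publish a DS key, and it comes back:
assert delegation_allowed({"A", "NS", "DNSKEY", "RRSIG", "DS"}, True) is True
```

The design point is that the enforcement lives at the registry, not with the end user, which is exactly the "leaders set requirements" argument being made here.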
If we were going to invent DNS today, it would probably be signed. DNS was invented in an epoch where we were all hippies and fair people, and we didn't imagine that the world was not all flowers and roses. Therefore, if we were going to do it today, most likely many choices would be different. So as we transition towards that, let's just make it super easy for someone to adopt the technology, as if it were the default. speaker-0 (44:53) For sure, for sure. You know, it's still surprising to me that we live in a world where something like GPS isn't signed, where spoofable GPS signals are still something that can happen. But instead of going down that tangent, I think maybe this would be a good opportunity to start to close out the episode and switch over to picks. Should we do it? Absolutely. Yes, let's go. What did you bring for us? speaker-1 (45:21) Absolutely. Should I start, or do you want to? All right. So I'm switching away from technology, because we have talked about technology for a while. I'm a scuba diver, an avid scuba diver. I really enjoy scuba diving. I think it's one of those things that to me was absolutely fantastic, because it was a way for me to disconnect from everything around. Like literally, because I couldn't bring my phone, at least back in the days, I couldn't bring my phone anywhere. Now you can, you also have waterproof cases, but there's no signal in the water. And so I just want to throw the idea out to anyone out there: take a break, do something out in the wild. And if you have never tried scuba diving, go and try it. There are many places, there are options, there's an opportunity for everyone. If you like rocks, if you like ancient history, if you like fish, if you like nature, if you like whales or sharks. I do love sharks. There's an opportunity for any of that. And I live in Italy, and off the coast of Italy, we have plenty of that.
Unfortunately, we have a lot of wrecks here because of the Second World War and the First World War, which I love to explore. I can share some links. There's also ancient history: we have seen a lot of places where there are amphoras and artifacts from the Roman epoch. We don't have as many fish as the Maldives or the Red Sea, at least big ones. But that's the point: just take a break, go out, and try scuba diving if you haven't. speaker-0 (47:03) I actually do it myself. I haven't gone since the pandemic, unfortunately, but it's actually really easy to just get a certification in a lot of places to start the process. There are shops everywhere where they'll train you. It's like a week of written stuff and then a week in the water in a pool. Then you go on vacation somewhere and finish the diving certification wherever you go, with whatever they have there. Unfortunately, I'm not a fan of wrecks so much; natural reefs are my thing. But they're everywhere too. And honestly, there's just nothing that compares to it. Snorkeling is okay, it's just not the same as going diving. speaker-1 (47:41) Absolutely, 100%. Snorkeling is an effort. It feels like an effort, especially, as I keep saying, if you enjoy it, you want to spend as much time as you can underwater. Snorkeling, you are limited by your breath, which for most people is very short, or by the waves, or by the effort of swimming on the surface. If you can go down with a tank and just spend an hour underwater, it's going to change your day, really. speaker-0 (48:09) You get 40 minutes underwater and then you're usually forced to resurface. But yeah, no, I'm totally with you. Snorkeling is like swimming, whereas diving is sort of like meditation underwater; that's really the analogy that I'll make. It is great, honestly. I love it too. I love that pick.
Maybe as far as something for the audience, if there's a particular link to a dive site or location that you would recommend, I think we'll have that link in the description. speaker-1 (48:35) Absolutely, absolutely. I will make sure to share something with the audience. speaker-0 (48:39) Great, great. I love it. Okay, then I guess I'll share mine. I had a different pick, but your analogy that DNS infrastructure is sort of like digital electricity made me remember a book that Will actually shared a lot of episodes ago, called One Second After. I'll spare you the details, but it's really about a cataclysmic, post-apocalyptic event that happens in the United States, where you basically lose electricity, and how people survive. The book lists out the things that immediately cause problems if you lose electricity, and one of the things it brings up that never really occurred to me is that a lot of medicine needs refrigeration. And so there's just a huge impact on humanity. Without electricity, you think you can get by, by, say, riding a bike to power a fridge temporarily, but realistically, there are a lot of things that actually require it in order for us to keep going forward. I found it to take a really interesting and in-depth approach to understanding what those sorts of connected complexities are. And I think there are a lot of similarities to running the digital infrastructure of the internet. speaker-1 (49:53) Well, hopefully we're not going to end up in a position where electricity is unavailable for a very long time, and you're going to have to, I don't know, ride a bike in order to distribute DNS packets around. speaker-0 (50:05) I think at that point we may sort of graduate back to the avian carrier protocol. speaker-1 (50:12) I don't think I've heard about it. speaker-0 (50:15) The RFC for sending digital packets over avian carriers.
speaker-1 (50:20) Yeah, the Easter egg, the April Fools' joke, wasn't it? speaker-0 (50:25) Yeah, actually, there's a whole bunch of April Fools' jokes written into RFCs. There are like 10 or so of them that you can go through and find randomly. speaker-1 (50:33) The one about HTTP is the teapot status code, isn't it? speaker-0 (50:37) Well, yeah, there's the one about HTTP status code 418, the "I'm a teapot" error response. So I know reading RFCs and long articles isn't everyone's idea of a good time or bedtime reading, but there are some good jokes in there. And honestly, if you're in an industry where you're implementing anything that has a standard around it, it really helps to know what that standard was, and even the motivation that's encapsulated in it, so you know the direction that you can go with your technology. speaker-1 (51:09) Absolutely. As much as I was joking about it, I think it's an exercise, honestly: anyone that has been in this industry for more than probably a year at some point needs to read an RFC. It's really part of your experience, learning and growing. Reading one is sometimes challenging, but at the same time it is a learning experience. So go out and read one. speaker-0 (51:36) Honestly, I'd say that they're actually pretty simple. If you've ever read a legal contract, an RFC is so much simpler to read compared to that. speaker-1 (51:45) Absolutely, absolutely. speaker-0 (51:46) Well, I'll say thank you, Simone, for coming on and talking through everything related to DNS. It's been absolutely great. speaker-1 (51:55) Thanks for having me, Warren. It was really a pleasure to be here. speaker-0 (51:59) Of course. And thanks to the audience for showing up for this episode, and hopefully we'll see everyone back next week.