In this post we'll take a look under the covers at what happens when we look up hosts.
If you're here, you'll have an idea that the answer is basically "DNS" - but that's not the whole story. Let's make things concrete; we're going to answer the question: how does the following snippet of (pseudo-)code figure out what host to connect to?
conn = socket::connect("www.jameselford.com", 443);
DNS is just one part of the answer.
tl;dr
Most programs on a normal linux system will look in /etc/nsswitch.conf to
figure out how to resolve hostnames. But lots of programs won't. Pretty
much everything will find your DNS servers in /etc/resolv.conf. Generally,
this process will involve a call to getaddrinfo, provided by your system's
standard C library.
If that's what you came to find out, then you can go back to the internet now; thanks for your time. Stick around if you want to get into altogether too much detail about how this all happens.
what's coming up
We'll see:
- How our program ends up calling into libc (by diving into the source)
- How libc resolves our hostname (with a little help from strace)
- and along the way, some opportunities to tweak how our system does host lookup
We'll start in Rust land, because that's kind of my jam, but
you could do a similar journey for any high-level programming language. Maybe
you hate jam. Maybe you're more of a Python / marmalade / Ruby sort of person.
That's fine too; ultimately it doesn't matter how you do it, so long as there's
a thick layer of sugar on your breakfast. In C, you'd just skip the first step
and go straight to libc. I guess in this analogy, C is the toast.
check the source - peeking into the (Rust) standard library
In Rust, the pseudo-code above looks like this:
let c = TcpStream::connect("www.jameselford.com:443");
... so TcpStream::connect is our jumping-off point. connect takes any
argument with a corresponding ToSocketAddrs implementation - and the
standard library comes with implementors for a whole range of sensible types.
Here's the implementation for str:
// accepts strings like 'localhost:12345'
#[stable(feature = "rust1", since = "1.0.0")]
impl ToSocketAddrs for str {
    type Iter = vec::IntoIter<SocketAddr>;
    fn to_socket_addrs(&self) -> io::Result<vec::IntoIter<SocketAddr>> {
        // try to parse as a regular SocketAddr first
        if let Ok(addr) = self.parse() {
            return Ok(vec![addr].into_iter());
        }
        resolve_socket_addr(self.try_into()?)
    }
}
The call to parse near the top brings some extra conversion magic into the mix,
but all that's doing is leaning on SocketAddr's FromStr implementation,
which I won't list here, because it's just dealing with the case that the str
is already a straightforward SocketAddr (e.g. 192.168.1.1:80). That's not
our case.
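To make that fast path concrete, here's a minimal sketch using only the standard library. The helper name is_literal_socket_addr is my own invention for illustration, not something std provides; it just checks whether a string already parses as a SocketAddr, which is the case where no lookup is needed:

```rust
use std::net::SocketAddr;

/// Hypothetical helper: true if `s` is already an "ip:port" literal,
/// i.e. the case that `to_socket_addrs` handles with a plain parse.
fn is_literal_socket_addr(s: &str) -> bool {
    s.parse::<SocketAddr>().is_ok()
}

fn main() {
    for candidate in ["192.168.1.1:80", "[::1]:443", "www.jameselford.com:443"] {
        println!("{candidate}: literal = {}", is_literal_socket_addr(candidate));
    }
}
```

Only the last candidate fails to parse, so it's the only one that would fall through to the lookup path.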
Next comes resolve_socket_addr.
"Resolve" - that's a familiar word from the world of DNS; sounds like it could
be what we're looking for. Let's dig in:
fn resolve_socket_addr(lh: LookupHost) -> io::Result<vec::IntoIter<SocketAddr>> {
    let p = lh.port();
    let v: Vec<_> = lh
        .map(|mut a| {
            a.set_port(p);
            a
        })
        .collect();
    Ok(v.into_iter())
}
Uh... huh... looks like a simple transform of the lh input argument
into the result. Also, I can't help noticing, this function returns a Result
(which is what we expect! ... we're looking for something that can reach out
and hit remote DNS servers over the network, after all), but this function is
infallible: it can only return Ok(...) at the end.
Just one thing: our str has become lh: LookupHost, and it appears to have all
the answers. Where did that come from? Let's back up to our to_socket_addrs
function - notice the last line:
resolve_socket_addr(self.try_into()?)
That try_into is where the magic's happening. The name try_... and the
Result return type (indicated by the ?) are good hints that something's
going on here. On first pass, I assumed it was "just" a straightforward
conversion into another more convenient type for the lookup, but in fact,
try_into is the lookup, as we'll see. Let's skip over to the
implementation... but... how exactly do we find that? No mention of LookupHost
on doc.rust-lang.org's normally trusty search.
further into the source - standard library internals
So we need to look deeper than what's publicly exposed by the standard library, into the implementation details.
This is good news: it means we're getting to the heart of it. "Where do host names come from?" is exactly the sort of thing that we want the standard library to take care of for us, so we knew we'd have to look behind the curtain at some point.
Okay, enough narrative, let's see some code.
We'll shift over to the Rust sources: here's the TryFrom implementation for
LookupHost we were after (this is what allows us to use try_into in
to_socket_addrs above). We'll skip over that as it just splits the str up
into a (host, port) pair, and calls try_into() again, which is defined
immediately below in the same file:
impl<'a> TryFrom<(&'a str, u16)> for LookupHost {
    type Error = io::Error;
    fn try_from((host, port): (&'a str, u16)) -> io::Result<LookupHost> {
        init();
        let c_host = CString::new(host)?;
        let mut hints: c::addrinfo = unsafe { mem::zeroed() };
        hints.ai_socktype = c::SOCK_STREAM;
        let mut res = ptr::null_mut();
        unsafe {
            cvt_gai(c::getaddrinfo(c_host.as_ptr(), ptr::null(), &hints, &mut res))
                .map(|_| LookupHost { original: res, cur: res, port })
        }
    }
}
Ah hah! Unsafe code! With a strong whiff of FFI about it! We're getting to the
good stuff now. That call to getaddrinfo is the final step that takes us into
libc - which is, ultimately, where the sausage gets made.
There is one more hop here:
c::getaddrinfo(...)
It's natural to read that as "call getaddrinfo from a C library", but c is
just a Rust namespace like any other, so this function is being imported from
somewhere. Scroll up to the top, and you'll see it comes from:
use crate::sys::net::netc as c;
sys is where the Rust standard library keeps its platform-specific code, so
the implementation will depend on the current platform. On unix platforms,
sys::net::netc is defined as backing on to libc:
pub extern crate libc as netc;
... but that needn't be the case everywhere - for example on the wasi platform
netc is defined as a native Rust module, with no libc in sight.
Okay, that's enough technicalities: for our purposes, this c::getaddrinfo is
a call through to libc.
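Putting the chain together: anything that goes through ToSocketAddrs with a hostname ends up in this code path. Here's a small sketch that triggers it using only the public API. The lookup function is a name I've made up for illustration, and localhost is used so the example doesn't depend on external DNS:

```rust
use std::io;
use std::net::{SocketAddr, ToSocketAddrs};

/// Resolve a "host:port" string the same way `TcpStream::connect` would.
/// On unix targets, the hostname case ends up in libc's `getaddrinfo`.
fn lookup(target: &str) -> io::Result<Vec<SocketAddr>> {
    Ok(target.to_socket_addrs()?.collect())
}

fn main() {
    match lookup("localhost:443") {
        Ok(addrs) => addrs.iter().for_each(|a| println!("{a}")),
        Err(e) => eprintln!("lookup failed: {e}"),
    }
}
```

Note that the iterator you get back is already fully populated: by the time to_socket_addrs returns, the lookup has happened.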
quick aside on FFI and libc
FFI is short for "foreign function interface".
Rust leans on existing C libraries for a whole bunch of functionality, and in
this case, for functionality built into libc. When languages call into other
languages that exist outside their own ecosystem (in this case, Rust to C),
that's FFI.
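If you've not seen FFI in Rust before, here's a minimal sketch of the idea, using strlen rather than getaddrinfo to keep the types simple. The c_strlen wrapper is my own illustrative name; the extern block is the part that says "this function lives in a C library". I'm assuming a unix-like target where the standard library already links libc, and Rust 1.82 or later for the unsafe extern syntax:

```rust
use std::ffi::CString;
use std::os::raw::c_char;

// Declare a function that lives in the C library. There's no Rust body;
// the linker resolves `strlen` against libc.
// (`unsafe extern` needs Rust 1.82 or later.)
unsafe extern "C" {
    fn strlen(s: *const c_char) -> usize;
}

/// Hypothetical safe wrapper: builds a NUL-terminated C string and asks
/// libc for its length.
fn c_strlen(s: &str) -> Option<usize> {
    let c_string = CString::new(s).ok()?; // fails if `s` contains a NUL byte
    // SAFETY: `c_string` is a valid, NUL-terminated buffer that outlives the call.
    Some(unsafe { strlen(c_string.as_ptr()) })
}

fn main() {
    println!("{:?}", c_strlen("www.jameselford.com"));
}
```

The shape is the same as the standard library's getaddrinfo call above: convert Rust values into C-friendly ones (CString), then make the call inside an unsafe block.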
libc is a widely-scoped C library that's provided as part of POSIX. It
handles all sorts of common functionality - hostname resolution is one area,
but in fact Rust delegates to libc for pretty much anything that touches the
network, opens files, spawns new processes, ... in short: anything that
requires making System Calls. Rust isn't alone in leaning on a libc for
this; most programming languages eventually call through to libc (Python,
Ruby, and Java all do, for example). That's good news; from this point on, what
we learn translates well across languages. Go is a notable exception here -
but more on that later.
POSIX only specifies the interface that libc has to provide, and
there are several implementations. On Linux, the most common implementation of
libc is glibc, so that's what we'll talk about next, but others do exist.
Again, more on that later.
rephrasing the question
Now that we've established that we call through to libc's getaddrinfo, and
that libc is commonly implemented by glibc, we can rephrase the question as:
how does glibc implement getaddrinfo?
Before we dig into that, let's take stock. We've arrived at the conclusion that
we call through to the commonly used libc. We've said that pretty much every
language does the same thing. That means that the answers we're looking for
will be applicable pretty much everywhere, including in C programs. There's
going to be some prior art on this.
what does the internet say?
What do we know already about how our programs find hosts?
- Eventually we know that we'll connect to a DNS server and make a request
- We probably know that at some stage the hosts file comes into it
- We know that DNS configuration is system-wide. When we connect to a network, our computers automatically figure out what DNS to use (or we configure it ourselves).
- Maybe some caching?!
- localhost is always the loopback interface - presumably we don't ask the DNS servers for that?
- There's something special about the .local domain?
Let's start with the global configuration of DNS servers. If we search for "dns linux" we get a few interesting hits:
- There's this article pointing us to resolv.conf, which contains DNS configuration.
- The Arch Wiki entry on domain name resolution tells us that resolv.conf is generally overwritten by network managers like GNOME's aptly-named NetworkManager. That's re-assuring, as it joins the dots from the familiar configuration options we get when we go to our environment's Network Settings screen, to something that libc and friends can find.
- The same page also tells us that glibc's implementation of getaddrinfo is backed by NSS, which reads from nsswitch.conf. Hopefully we'll be able to see where that happens.
Let's have a look in those files and see what we can see. Here's the contents
of my resolv.conf:
j@.. ~/s/dns-experiment> cat /etc/resolv.conf
# Generated by NetworkManager
nameserver 192.168.1.2
nameserver 8.8.8.8
nameserver 8.8.4.4
It checks out:
- 192.168.1.2 is the address of my local DNS resolver (a pihole, if you're interested)
- 8.8.8.8 and 8.8.4.4 are Google's public DNS, which I have configured as my fallback DNS.
- The file mentions that it's generated by NetworkManager, which tallies up with what we saw above.
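The part of this format we care about is simple enough to pick apart by hand. Here's a rough sketch that pulls the nameserver entries out of resolv.conf-style text; parse_nameservers is a name I've made up, and it deliberately ignores every other directive (search, options, and so on), so it's nowhere near what a real resolver does:

```rust
use std::net::IpAddr;

/// Collect the addresses from `nameserver` lines, ignoring comments,
/// other directives, and anything that doesn't parse as an IP address.
fn parse_nameservers(contents: &str) -> Vec<IpAddr> {
    contents
        .lines()
        .filter_map(|line| {
            let mut fields = line.split_whitespace();
            match (fields.next(), fields.next()) {
                (Some("nameserver"), Some(addr)) => addr.parse().ok(),
                _ => None,
            }
        })
        .collect()
}

fn main() {
    // Falls back to an empty string if the file isn't readable.
    let contents = std::fs::read_to_string("/etc/resolv.conf").unwrap_or_default();
    for server in parse_nameservers(&contents) {
        println!("nameserver {server}");
    }
}
```

Fed the file above, this yields the three addresses in the order they're listed.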
And here's the relevant section of my local nsswitch.conf (used by glibc):
# Generated by authselect on Thu Jun 18 09:08:01 2020
# Do not modify this file manually.
--- snip ---
hosts: files mdns4_minimal [NOTFOUND=return] dns myhostname
authselect is something new - looks like on my Fedora system there's one more
layer of machinery involved in generating this file, but let's not worry about
that for now.
The interesting part is what it has to say about host resolution: it mentions
dns, but also three sources of naming information that we haven't discussed
before: files, mdns4_minimal, and myhostname.
The hosts line is read in order.
- files means what it sounds like: it tells glibc to look in local files. For host lookup, that means /etc/hosts.
- mdns4_minimal doesn't have a special meaning. The man page says this about how these names are translated into meaning:

  Libraries called /lib/libnss_SERVICE.so.X will provide the named SERVICE.

  So, mdns4_minimal will be implemented by a separate dynamic library called libnss_mdns4_minimal.so.2 (the 2 depends on the glibc version; for 2.1 and above it's 2). A small note here; for me these files are in /lib64, not /lib.

  So, what is mdns4_minimal? Well, it comes from the nss-mdns project, which provides Multicast DNS, a system used for host discovery on local networks. This is what allows us to resolve .local hostnames.
- dns gets a shout out as explicitly allowed in the docs, but I notice the presence of libnss_dns.so.2 in my /lib64 directory. Now that I look, there's a libnss_files.so.2 in there too. I guess we'll see how that fits together when we dig into glibc.
- myhostname is a systemd module (docs) that will resolve the local system's hostname.
Something that tripped me up: [NOTFOUND=return].
It says that if mdns4_minimal returns NOTFOUND, then we can stop early and not
bother with querying dns or myhostname. My question was: since
mdns4_minimal is only going to be able to find .local hosts, doesn't this
action prevent us from moving on to use real dns for everything else? In fact
NOTFOUND is only returned if mdns4_minimal both believes itself responsible for
looking up the name and then fails to do so. Otherwise, mdns4_minimal will
return an UNAVAIL status, and resolution will continue.
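Here's how I picture that walk along the hosts line, as a toy model. Everything in it (Status, Source, parse_hosts_line, walk) is made up for illustration and isn't glibc's actual code: it only understands [NOTFOUND=return], leaves out the TRYAGAIN status entirely, and the ask closure stands in for calling into the relevant libnss_*.so module:

```rust
/// Result of asking one NSS source about a name (names follow nsswitch.conf(5)).
#[derive(Clone, Copy, Debug, PartialEq)]
enum Status { Success, NotFound, Unavail }

/// One entry on the `hosts:` line, plus whether `[NOTFOUND=return]` follows it.
#[derive(Debug, PartialEq)]
struct Source { name: String, notfound_returns: bool }

/// Parse the value of a `hosts:` line. Only `[NOTFOUND=return]` is understood;
/// any other bracketed action is ignored in this sketch.
fn parse_hosts_line(value: &str) -> Vec<Source> {
    let mut sources: Vec<Source> = Vec::new();
    for token in value.split_whitespace() {
        if token.eq_ignore_ascii_case("[NOTFOUND=return]") {
            if let Some(last) = sources.last_mut() {
                last.notfound_returns = true;
            }
        } else if !token.starts_with('[') {
            sources.push(Source { name: token.to_string(), notfound_returns: false });
        }
    }
    sources
}

/// Walk the sources in order and report where the walk stopped, and why.
/// Returns None if every source was tried without a definitive answer.
fn walk<'a>(sources: &'a [Source], ask: impl Fn(&str) -> Status) -> Option<(&'a str, Status)> {
    for source in sources {
        match ask(&source.name) {
            Status::Success => return Some((source.name.as_str(), Status::Success)),
            Status::NotFound if source.notfound_returns => {
                return Some((source.name.as_str(), Status::NotFound));
            }
            _ => continue, // default action: move on to the next source
        }
    }
    None
}

fn main() {
    let sources = parse_hosts_line("files mdns4_minimal [NOTFOUND=return] dns myhostname");
    // A non-.local name: mdns4_minimal declines with UNAVAIL, so dns gets its turn.
    let outcome = walk(&sources, |name| match name {
        "files" => Status::NotFound,
        "mdns4_minimal" => Status::Unavail,
        _ => Status::Success,
    });
    println!("{outcome:?}");
}
```

The distinction that tripped me up shows up in the match: NOTFOUND from mdns4_minimal stops the walk, while UNAVAIL falls through to dns.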
That gives us a clearer picture of what we're expecting to find in glibc. Let's
finally proceed.
what does glibc do?
read the code
In this section, we'll take a brief look at the sources of glibc, but we
won't dig through all the details.
Here's glibc's implementation of getaddrinfo. It's about 350 lines long, and
you're welcome to pick through, but it delegates the real work of lookup to
another function, gaih_inet (further up in the same file). That function's
just shy of 800 lines, and we're along the right lines, since references to
some of the concepts we were looking at above are creeping in now:
- a reference to /etc/nsswitch.conf in the comments
- calls through to functions with nss in the name, like __nss_database_lookup2 and __nss_lookup_function.
Notice that __nss_lookup_function takes the name of another function as an
argument. If we take a peek at the sources for __nss_lookup_function, we'll
see that it's responsible for dynamically loading in further libraries -
presumably this is what links us up to the libnss_SERVICE.so.X files we saw
above, eventually resolving the required function with __libc_dlsym.
I don't know about you, but I find these sources a bit hard going; alongside the business logic of looking up the various resolver functions, there's a lot of housekeeping.
Time for a change of tack.
strace to the rescue
There is another way we can get to the bottom of what's going on inside the
loveable but inscrutable yak that is glibc: strace. If you haven't seen
strace before, then I promise you're going to love it: strace monitors all
System calls from and signals to a given process.
We've mentioned System Calls in passing already when talking about the role
of libc in Rust - I said libc was used for:
anything that touches the network, opens files, spawns new processes ...in short - anything that requires making System Calls
So what exactly is a System Call? Well it's anything that you need to ask the Kernel to do for you. Anything that touches the Real World.
So, strace provides us with a straightforward way to answer the question of
what a given program does - really does - without looking at the source
code. So let's put together a simple program that will call getaddrinfo,
and strace it.
You can find the sources here,
but it boils down to a call to getaddrinfo, then printing out the results.
We can run it through strace with:
target=x86_64-unknown-linux-gnu
cargo build -q --target ${target}
strace -e desc,network,file -o "strace-${target}.log" -f "./target/${target}/debug/dns-experiment" www.jameselford.com
# ipv4 185.199.108.153: jelford.github.io
# ipv4 185.199.110.153: <unknown canonical addr>
# ipv4 185.199.109.153: <unknown canonical addr>
# ipv4 185.199.111.153: <unknown canonical addr>
The strace incantation is:
- -e desc,network,file: report any system calls relating to file descriptors, the network, and files.
- -o: put the trace into a separate file, rather than interweaving with the output of the process being traced
- -f: follow any process forks.
The program's output is telling us that www.jameselford.com resolves to
185.199.108.153, which has a canonical name of jelford.github.io (so, now
you know that this blog is hosted on GitHub Pages).
But we could have got that from dig; we came here for the strace output!
You can see it here. It makes for pretty dense reading, so I'll surface the interesting stuff, snipping out most of the calls I don't think are relevant.
Let's start at the top:
139893 execve("./target/x86_64-unknown-linux-gnu/debug/dns-experiment", ["./target/x86_64-unknown-linux-gn"..., "www.jameselford.com"], 0x7ffd292c7e10 /* 46 vars */) = 0
139893 access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory)
139893 openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
139893 openat(AT_FDCWD, "/lib64/libdl.so.2", O_RDONLY|O_CLOEXEC) = 3
139893 read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0p\"\0\0\0\0\0\0"..., 832) = 832
139893 fstat(3, {st_mode=S_IFREG|0755, st_size=36800, ...}) = 0
139893 mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fda8e374000
139893 mmap(NULL, 24688, PROT_READ, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7fda8e36d000
139893 mmap(0x7fda8e36f000, 8192, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x2000) = 0x7fda8e36f000
139893 mmap(0x7fda8e371000, 4096, PROT_READ, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x4000) = 0x7fda8e371000
139893 mmap(0x7fda8e372000, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x4000) = 0x7fda8e372000
139893 mmap(0x7fda8e373000, 112, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7fda8e373000
139893 close(3) = 0
I did mention that it was going to be dense.
A note on how to read this:
- the leading number is the ID of the process making the call (it's there because we passed -f)
- next comes the name of the system call
- following are all the arguments, in brackets
- then the return value at the end, after an =
strace tries hard to convert arguments and return values into the
corresponding names like you'd find in the C header files, and even gives us
the text description for errors. Helpful, right?
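If you'd rather slice up a trace with code than by eye, the layout described in the note above is regular enough for a few string splits. This is a rough sketch (TraceLine and parse_trace_line are names I've made up), and it only handles the simple one-line form: it assumes each line starts with a process ID, and it doesn't attempt things like unfinished or resumed calls:

```rust
/// The pieces of one line of strace output, as described in the note above.
#[derive(Debug, PartialEq)]
struct TraceLine<'a> {
    pid: u32,
    syscall: &'a str,
    args: &'a str,
    result: &'a str,
}

/// Split a line such as `139893 close(3) = 0`. Returns None for lines that
/// don't have this shape (for example `139893 +++ exited with 0 +++`).
fn parse_trace_line(line: &str) -> Option<TraceLine<'_>> {
    let (pid, rest) = line.split_once(' ')?;
    let pid = pid.parse().ok()?;
    let (syscall, rest) = rest.split_once('(')?;
    // The return value follows the last " = "; strace may pad before it.
    let (call, result) = rest.rsplit_once(" = ")?;
    let args = call.trim_end().strip_suffix(')')?;
    Some(TraceLine { pid, syscall, args, result: result.trim() })
}

fn main() {
    let line = r#"139893 access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory)"#;
    println!("{:?}", parse_trace_line(line));
}
```

From there it's a short hop to, say, listing every file the process tried to open by filtering on the openat syscall.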
So, what's happening in this snippet is that a shared library (libdl in this
case) is opened, read, mapped into memory (notice PROT_EXEC - which maps the
library into executable memory, which is what we need if we're going to run
code from it!), then closed. This is the first of many shared libs that get
loaded at the start of the process. The whole process is easy to recognize in
the trace once you know what's going on, so I'll spare you the housekeeping
for the rest:
139893 openat(AT_FDCWD, "/lib64/libpthread.so.0", O_RDONLY|O_CLOEXEC) = 3
139893 openat(AT_FDCWD, "/lib64/libgcc_s.so.1", O_RDONLY|O_CLOEXEC) = 3
139893 openat(AT_FDCWD, "/lib64/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
At this point we've got the following libraries loaded:
- libdl: will be used later for dynamically loading more shared libraries.
- libpthread: threading support.
- libgcc: GCC runtime support. Rust doesn't compile with GCC, but glibc does, so our executable must load libgcc. Even C has some runtime.
- libc: this is glibc, on my system, and by now we were expecting to see it
Let's move on. We're looking to see what glibc does next:
139893 socket(AF_UNIX, SOCK_STREAM|SOCK_CLOEXEC|SOCK_NONBLOCK, 0) = 3
139893 connect(3, {sa_family=AF_UNIX, sun_path="/var/run/nscd/socket"}, 110) = -1 ENOENT (No such file or directory)
In this section, we're attempting to connect to nscd, the "Name Service Cache
Daemon". It doesn't appear to be running, and systemctl denies all knowledge
too. I've got local man pages for nscd but there's no sign of it on my $PATH
and a quick rg on a copy of the POSIX spec doesn't show anything up.
locate '*ncsd*' turns up a few hits in man pages and embedded in flatpaks, but
nothing in the main system. I guess this isn't important anymore (just
maintained in glibc for backwards compatibility). I'd love to hear about it if
you know differently.
139893 openat(AT_FDCWD, "/etc/nsswitch.conf", O_RDONLY|O_CLOEXEC) = 3
139893 read(3, "# Generated by authselect on Thu"..., 4096) = 2556
Okay, so this is loading nsswitch.conf, the first of the config files we
mentioned earlier. Looks like glibc looks at that first, to decide what
it'll do next. This ties up with what we're expecting; so far so good. We can
see the read call getting the familiar "generated by authselect..." header
that we saw before.
Where does it go from there? As a quick reminder, nsswitch.conf listed:
hosts: files mdns4_minimal [NOTFOUND=return] dns myhostname
So, we're expecting it to consult the files source first. Back to the trace:
139893 openat(AT_FDCWD, "/etc/host.conf", O_RDONLY|O_CLOEXEC) = 3
139893 read(3, "multi on\n", 4096) = 9
Not quite what we were expecting. Here's the contents of my host.conf:
j@.. ~/s/dns-experiment> cat /etc/host.conf
multi on
man host.conf tells us that this line controls one detail of how the resolver
interprets /etc/hosts: with multi on, it returns every address listed there
for a host, rather than just the first. Okay. So we're still expecting to
openat("/etc/hosts") soon...
139893 openat(AT_FDCWD, "/etc/resolv.conf", O_RDONLY|O_CLOEXEC) = 3
139893 read(3, "# Generated by NetworkManager\nna"..., 4096) = 91
We saw earlier that this file specifies our DNS nameservers. We shouldn't need
those yet as we haven't even gotten to /etc/hosts - but another look at
man resolv.conf reveals other options that may be relevant to the resolution
process. Onwards...
139893 openat(AT_FDCWD, "/lib64/libnss_files.so.2", O_RDONLY|O_CLOEXEC) = 3
139893 openat(AT_FDCWD, "/etc/hosts", O_RDONLY|O_CLOEXEC) = 3
Here we are - the hosts file. files, along with dns, got its own special
mention in the nsswitch.conf man page, but it looks like this source is read
using the same general mechanism as the other sources; load an appropriately
named .so and hook into that. I won't drag you through the sources, but
glibc does ship a module called nss_files, and sure enough, it knows
where to look for the hosts file.
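For a feel of what the files source has to do once it has /etc/hosts open, here's a sketch of a lookup over hosts-format text. It isn't nss_files' code, and lookup_in_hosts is a made-up name; it returns every matching address, which is roughly what the multi on setting in host.conf asks for:

```rust
use std::net::IpAddr;

/// Find every address listed for `name` in hosts-file formatted text.
/// Each line is `ADDRESS canonical-name [aliases...]`; `#` starts a comment.
fn lookup_in_hosts(contents: &str, name: &str) -> Vec<IpAddr> {
    contents
        .lines()
        .filter_map(|line| {
            // Drop anything after a comment marker.
            let line = line.split('#').next().unwrap_or("");
            let mut fields = line.split_whitespace();
            let addr: IpAddr = fields.next()?.parse().ok()?;
            // Remaining fields are the canonical name followed by aliases.
            fields.any(|field| field.eq_ignore_ascii_case(name)).then_some(addr)
        })
        .collect()
}

fn main() {
    // Falls back to an empty string if the file isn't readable.
    let contents = std::fs::read_to_string("/etc/hosts").unwrap_or_default();
    println!("{:?}", lookup_in_hosts(&contents, "localhost"));
}
```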
My hosts file is empty, so we won't find anything interesting in there. Now
we're expecting to see our application move on to mdns4_minimal, and dns.
It should never get to myhostname since - assuming you're reading this -
we're going to find an entry for www.jameselford.com in DNS.
Here's mdns4_minimal:
139893 openat(AT_FDCWD, "/lib64/libnss_mdns4_minimal.so.2", O_RDONLY|O_CLOEXEC) = 3
Nothing to see here. We're not looking for a .local
domain, so that's just
going to return without further ado. Next is DNS:
139893 openat(AT_FDCWD, "/lib64/libresolv.so.2", O_RDONLY|O_CLOEXEC) = 3
139893 openat(AT_FDCWD, "/lib64/libnss_dns.so.2", O_RDONLY|O_CLOEXEC) = 3
139893 socket(AF_INET, SOCK_DGRAM|SOCK_CLOEXEC|SOCK_NONBLOCK, IPPROTO_IP) = 3
139893 setsockopt(3, SOL_IP, IP_RECVERR, [1], 4) = 0
139893 connect(3, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("192.168.1.2")}, 16) = 0
139893 poll([{fd=3, events=POLLOUT}], 1, 0) = 1 ([{fd=3, revents=POLLOUT}])
139893 sendto(3, "\271\253\1\0\0\1\0\0\0\0\0\0\3www\vjameselford\3com"..., 37, MSG_NOSIGNAL, NULL, 0) = 37
139893 poll([{fd=3, events=POLLIN}], 1, 5000) = 1 ([{fd=3, revents=POLLIN}])
...
139893 recvfrom(3, "\271\253\201\200\0\1\0\5\0\0\0\0\3www\vjameselford\3com"..., 1024, 0, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("192.168.1.2")}, [28->16]) = 132
...
139893 openat(AT_FDCWD, "/etc/gai.conf", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
That's more interesting! libresolv is also part of the wider glibc package
(description here).
We can use ldd to see that libresolv is brought in as a dependency of
libnss_dns:
j@.. ~/s/dns-experiment> ldd /lib64/libnss_dns.so.2
linux-vdso.so.1 (0x00007ffda434b000)
libresolv.so.2 => /lib64/libresolv.so.2 (0x00007f50fc674000)
libc.so.6 => /lib64/libc.so.6 (0x00007f50fc4aa000)
/lib64/ld-linux-x86-64.so.2 (0x00007f50fc6b3000)
We can also see that, finally, we're making a DNS call:
- connect(... sin_addr=inet_addr("192.168.1.2")...) - that's a call out to my DNS server (that we saw configured in /etc/resolv.conf).
- poll is used to wait for the socket to be "ready" - whether that's to finish the business of connecting, sending data, or reading data. poll is used in asynchronous IO - which is a big topic, so I'll just draw attention to the SOCK_NONBLOCK option passed when the socket is initialized. This is what configures the socket to work in non-blocking (asynchronous) mode.
- sendto and recvfrom - these are the calls that actually send our DNS requests out to the world, then read the results back.
The final attempt is to read gai.conf, a file that allows some customization
of the order in which results are returned from getaddrinfo.
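Those bytes in the sendto call aren't as opaque as they look: it's a standard DNS query in the RFC 1035 wire format, with each part of the hostname prefixed by its length (\3www, \v which is 11 for jameselford, \3com). Here's a sketch that rebuilds a packet of the same shape. build_query is my own name, and the query type of 1 (an A record) is an assumption on my part, since the tail of the packet is cut off in the trace:

```rust
/// Build a DNS query for `name` (RFC 1035 wire format).
/// Returns None if a label is empty or longer than 63 bytes.
fn build_query(id: u16, name: &str, qtype: u16) -> Option<Vec<u8>> {
    let mut packet = Vec::new();
    packet.extend_from_slice(&id.to_be_bytes()); // transaction id
    packet.extend_from_slice(&[0x01, 0x00]); // flags: recursion desired
    packet.extend_from_slice(&[0, 1]); // QDCOUNT: one question
    packet.extend_from_slice(&[0; 6]); // ANCOUNT, NSCOUNT, ARCOUNT: all zero
    for label in name.trim_end_matches('.').split('.') {
        if label.is_empty() || label.len() > 63 {
            return None;
        }
        packet.push(label.len() as u8); // each label is length-prefixed
        packet.extend_from_slice(label.as_bytes());
    }
    packet.push(0); // the empty root label ends the name
    packet.extend_from_slice(&qtype.to_be_bytes()); // QTYPE (1 = A, 28 = AAAA)
    packet.extend_from_slice(&[0, 1]); // QCLASS: IN
    Some(packet)
}

fn main() {
    // 0xb9ab is the id from the trace (\271\253 in strace's octal escapes).
    let query = build_query(0xb9ab, "www.jameselford.com", 1).expect("valid name");
    println!("{} bytes: {:02x?}", query.len(), query);
}
```

The packet comes out at 37 bytes, which matches the length strace reports for the sendto call.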
Great, we're done! Well... not quite.
Before our application can print out the results we get... this...:
139893 socket(AF_NETLINK, SOCK_RAW|SOCK_CLOEXEC, NETLINK_ROUTE) = 3
139893 getsockname(3, {sa_family=AF_NETLINK, nl_pid=139893, nl_groups=00000000}, [12]) = 0
# Params truncated
139893 sendto(3, { {len=20, type=RTM_GETADDR, flags=NLM_F_REQUEST|NLM_F_DUMP, seq=1594996449, pid=0}, {ifa_family=AF_UNSPEC, ...} }, 20, 0, {sa_family=AF_NETLINK, nl_pid=0, nl_groups=00000000}, 12) = 20
139893 recvmsg(3, ...) = 252
139893 recvmsg(3, ...) = 72
139893 recvmsg(3, ...) = 20
...
139893 socket(AF_INET, SOCK_DGRAM|SOCK_CLOEXEC, IPPROTO_IP) = 3
139893 connect(3, {sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("185.199.108.153")}, 16) = 0
139893 getsockname(3, {sa_family=AF_INET, sin_port=htons(39670), sin_addr=inet_addr("10.150.1.131")}, [28->16]) = 0
139893 connect(3, {sa_family=AF_UNSPEC, sa_data="\0\0\0\0\0\0\0\0\0\0\0\0\0\0"}, 16) = 0
139893 connect(3, {sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("185.199.110.153")}, 16) = 0
139893 getsockname(3, {sa_family=AF_INET, sin_port=htons(41427), sin_addr=inet_addr("10.150.1.131")}, [28->16]) = 0
139893 connect(3, {sa_family=AF_UNSPEC, sa_data="\0\0\0\0\0\0\0\0\0\0\0\0\0\0"}, 16) = 0
139893 connect(3, {sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("185.199.109.153")}, 16) = 0
139893 getsockname(3, {sa_family=AF_INET, sin_port=htons(46916), sin_addr=inet_addr("10.150.1.131")}, [28->16]) = 0
139893 connect(3, {sa_family=AF_UNSPEC, sa_data="\0\0\0\0\0\0\0\0\0\0\0\0\0\0"}, 16) = 0
139893 connect(3, {sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("185.199.111.153")}, 16) = 0
139893 getsockname(3, {sa_family=AF_INET, sin_port=htons(54403), sin_addr=inet_addr("10.150.1.131")}, [28->16]) = 0
...
AF_NETLINK is a socket family that provides socket-based communication
between the Kernel and Userspace (man pages), and the NETLINK_ROUTE
family being connected to is responsible for routing.
The truncated recvmsg calls after we establish the AF_NETLINK connection
contain a lot of info about my local network interfaces - interface names, IP
addresses, that sort of thing. I've truncated them just because they don't play
well with the blog's markdown parser. All of this happens after the DNS lookup
has finished, here:
/* Now we definitely need the interface information. */
if (! check_pf_called)
__check_pf (&seen_ipv4, &seen_ipv6, &in6ai, &in6ailen);
Having gotten information about the various network interfaces, glibc goes
ahead and "connects" to the IP addresses it got back from the host lookup
procedure. There's no real connecting taking place here though; the socket is
set up with type SOCK_DGRAM, which means it's connectionless. The source
has this comment:
/* We overwrite the type with SOCK_DGRAM since we do not
want connect() to connect to the other side. If we
cannot determine the source address remember this
fact. */
The game here seems to be to do just enough to call getsockname, and use the
information it gets back to sort the final output list of getaddrinfo. Why
does it go to so much effort to sort the output list? There aren't really too
many hints right in the code, but let's rewind to the man page for
getaddrinfo, which mentions:
Normally, the application should try using the addresses in the order in which they are returned. The sorting function used within getaddrinfo() is defined in RFC 3484
Here's that RFC, which I don't intend
to get into, but I will note that it covers sorting both on destination
address (i.e. the host we intend to connect to), and the source address
(the interface that we connect from). I don't see that source address is
included in the return value of getaddrinfo, but it looks like it is used
as part of determining the order of results.
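The connect-then-getsockname trick from the trace is easy to reproduce. Here's a sketch of it in Rust; source_addr_for is my own name, and it differs from what glibc does in a couple of small ways (Rust's UdpSocket has to be bound before it can connect, and I've used port 53 where the trace shows port 0). The idea is the same though: "connecting" a UDP socket sends nothing, it just makes the kernel pick a route, and with it a source address:

```rust
use std::io;
use std::net::{IpAddr, SocketAddr, UdpSocket};

/// Ask the kernel which local address it would use to reach `dest`.
/// "Connecting" a UDP socket only records the peer; no packet is sent.
fn source_addr_for(dest: SocketAddr) -> io::Result<IpAddr> {
    // Bind to the wildcard address so the kernel is free to choose.
    let bind_addr = if dest.is_ipv4() { "0.0.0.0:0" } else { "[::]:0" };
    let socket = UdpSocket::bind(bind_addr)?;
    socket.connect(dest)?;
    // After connect, the local address reflects the chosen route.
    Ok(socket.local_addr()?.ip())
}

fn main() {
    for dest in ["127.0.0.1:53", "185.199.108.153:53"] {
        let dest: SocketAddr = dest.parse().expect("valid literal");
        match source_addr_for(dest) {
            Ok(src) => println!("{dest} would be reached from {src}"),
            Err(e) => println!("{dest}: no route ({e})"),
        }
    }
}
```

On my machine I'd expect the second line to report the same 10.150.1.131 address that shows up in the getsockname calls above.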
Okay, finally we can print out the results that we saw at the top of the
section - a series of write calls on file descriptor 1 (which is stdout):
139893 write(1, "ipv4 185.199.108.153: jelford.gi"..., 40) = 40
139893 write(1, "ipv4 185.199.110.153: <unknown c"..., 47) = 47
139893 write(1, "ipv4 185.199.109.153: <unknown c"..., 47) = 47
139893 write(1, "ipv4 185.199.111.153: <unknown c"..., 47) = 47
139893 +++ exited with 0 +++
Phew! That was quite a journey.
summary so far
Let's wrap up what we've seen so far:
- Rust's standard library calls out (after jumping through a few hoops of its own) to libc's getaddrinfo.
- Assuming that libc is glibc, it then:
  - checks for nscd, a caching daemon that doesn't seem to exist on my host
  - reads nsswitch.conf to determine what to do next. Based on that it...
  - checks in host.conf and resolv.conf to determine how to interpret what it finds when it...
  - reads the hosts file (in case of statically configured naming info),
  - tries mdns4_minimal (in case of .local domains),
  - queries DNS (using the DNS server it found earlier in resolv.conf),
  - sorts the results according to RFC-3484, using information about local interfaces, and...
  - finally returns a list of host lookup results
That gives us a pretty clear picture of where to look if we want to configure or understand how hostname lookup is working on our system:
- /etc/nsswitch.conf is used to determine how to look up hosts (maintained by authselect)
- /etc/hosts allows us to configure a static set of names
- .local domains have their own special mdns4 thing going on,
- and finally, /etc/resolv.conf lists our name servers (maintained by NetworkManager)
Simple. This feels like a good place to stop, so... let's just look at one more thing...
what if we're not using glibc?
Now... getaddrinfo is specified by POSIX, but the rest of the details are not.
nss is a glibc (well... inspired by Sun) invention, so everything from that
point on is implementation dependent.
This matters when:
- We're not using glibc as our libc.
  - Other 'nixes, like BSD, come with their own libc's.
  - The normal alternative on linux is musl, which, apart from being the default choice when producing static binaries in Rust, is the system-wide libc for some Linux distributions - in particular alpine, a popular choice for container base images.
- We're not even using libc. Go makes its syscalls directly (well, on some platforms - including Linux) - what about other fundamental libc functions like getaddrinfo?
Let's round this out with a quick look at an strace generated by the same Rust
program as above, but this time targeting musl:
target=x86_64-unknown-linux-musl
cargo build -q --target ${target}
strace -e desc,network,file -o "strace-${target}.log" -f "./target/${target}/debug/dns-experiment" www.jameselford.com
# ipv4 185.199.110.153: jelford.github.io
# ipv4 185.199.109.153: jelford.github.io
# ipv4 185.199.108.153: jelford.github.io
# ipv4 185.199.111.153: jelford.github.io
Right away we notice that we get different results with musl vs glibc!
musl has come back with canonical names for all results, whereas glibc
didn't. The ordering is also different, though we'd have to look into RFC-3484
to know what to make of that.
You can find the full log of the strace here.
The first thing you'll notice is that it's a lot shorter than the
glibc-based trace; 25 lines for the musl binary vs. 170 for the glibc one.
The first thing that's gone is all the shared library loading. That's about 50
lines from the start of the glibc trace. The next thing to note is that we're
not straceing everything here - just syscalls related to file descriptors,
network, and file operations. So, we shouldn't read too much into a line-count
comparison.
So, what does the musl version do? I'll summarize here, but encourage you to
look through the trace - it's only 25 lines!
- First up, we go straight to reading the hosts file
- Nothing there, so next we'll read resolv.conf.
- We use the nameservers we just found to issue DNS requests (sendto calls)
- Get data back (recvfrom)
- ... and we're done!
No nsswitch.conf, no lookup of information about local interfaces,
no checking host.conf for what we should do with multiple results from hosts
(perhaps that would get read if there was something in there?).
Overall, it seems to just do a lot less... but there is one thing that stands
out - whereas glibc only issued one DNS request (to my local resolver),
musl goes ahead and fires off requests to all three configured nameservers.
That's a bit of a surprise to me. I use the local resolver for blocking trackers
and ads - what would happen if the remote nameservers returned first? Would
musl happily come back with the results and effectively ignore my blocker?
A whistlestop tour of the source:
- getaddrinfo calls into...
- __lookup_name which calls into...
- name_from_dns_search, and name_from_dns in the same file, which finally calls out to...
- __res_msend_rc, which issues DNS requests to all the nameservers in parallel.
My reading of name_from_dns and __res_msend_rc is that the first reply
from any server will win, but to be confident in that, I'd want to test it out;
one way would be to introduce a delay on my local resolver's responses...
Something for another post perhaps.
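To show what "first reply wins" would mean in practice, here's a toy model that runs entirely on the loopback interface. None of this is musl's code: first_reply and spawn_fake_server are names I've invented, and the "nameservers" are just threads that echo a datagram back after a delay. It models the behaviour I think I'm reading in the source, rather than proving anything about musl:

```rust
use std::io;
use std::net::{SocketAddr, UdpSocket};
use std::thread;
use std::time::Duration;

/// Send `query` to every server from a single socket and return the first
/// reply to arrive, together with the address it came from.
fn first_reply(
    servers: &[SocketAddr],
    query: &[u8],
    timeout: Duration,
) -> io::Result<(Vec<u8>, SocketAddr)> {
    // Bound to loopback because this demo only talks to local fake servers.
    let socket = UdpSocket::bind("127.0.0.1:0")?;
    socket.set_read_timeout(Some(timeout))?;
    for server in servers {
        socket.send_to(query, server)?;
    }
    let mut buf = [0u8; 512];
    let (len, from) = socket.recv_from(&mut buf)?;
    Ok((buf[..len].to_vec(), from))
}

/// Hypothetical stand-in for a nameserver: waits for one datagram,
/// sleeps for `delay`, then echoes the datagram back to the sender.
fn spawn_fake_server(delay: Duration) -> io::Result<SocketAddr> {
    let socket = UdpSocket::bind("127.0.0.1:0")?;
    let addr = socket.local_addr()?;
    thread::spawn(move || {
        let mut buf = [0u8; 512];
        if let Ok((len, from)) = socket.recv_from(&mut buf) {
            thread::sleep(delay);
            let _ = socket.send_to(&buf[..len], from);
        }
    });
    Ok(addr)
}

fn main() -> io::Result<()> {
    let slow = spawn_fake_server(Duration::from_millis(300))?;
    let fast = spawn_fake_server(Duration::from_millis(10))?;
    // The slow server is listed first, like my local resolver in resolv.conf.
    let (_, winner) = first_reply(&[slow, fast], b"query", Duration::from_secs(2))?;
    println!("first reply came from {winner} (the fast server is {fast})");
    Ok(())
}
```

If musl really does behave like this, then the order of entries in resolv.conf wouldn't protect my blocker: whichever server answers first would be the one that counts.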
just one more thing...
Oh hey, go does its own thing! I mentioned before that it doesn't use libc
for its system calls. But what about using getaddrinfo, and the whole nss
thing? Well it turns out... it depends!
The net package documentation has some detail on this. I'll leave it there
though, as I'm all straced out.
copyright
One final note: all the source code snippets above (that don't contain my name) are taken from either:
- the Rust project's official sources, where the original authors retain copyright; or
- the glibc sources (copyright notice here)