As discussed above, NAT should be a layer that encapsulates the kernel. All other network functions in the kernel, and with them all user-level programs, therefore never see the real IP world but only the addresses NAT makes visible. Whether these are real or translated IPs does not matter; they all come from the NAT layer. The kernel and all programs can thus live in a virtual address space that does not exist in the real world, and only NAT knows the reality.
One setup this implementation supports, and which a "regular" NAT that translates inside the kernel cannot handle, is a pair of networks that use the same IP address space and want to communicate with one another. I do not know of any such setup and have never heard of one, but one can imagine two independently administered networks that were each built on the same RFC 1597 address space back when internetworking was not even known as a word. We can connect each of them to a different interface of the NAT router and set up rules that translate the IPs depending on the interface on which a packet arrived, and the two networks can then exchange IP packets.
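The case of two networks sharing one address space can be illustrated with a small sketch. It is not taken from the implementation; the structure `if_map`, the function `translate_src` and all addresses are assumptions made for the example, and addresses are held in host byte order for readability. The point it shows is that the arrival interface, not the address alone, selects the translation.

```c
#include <stddef.h>
#include <stdint.h>

/* One mapping per interface: real_net/mask is what appears on the wire,
 * virt_net is what the kernel gets to see (all values hypothetical). */
struct if_map {
    int      ifindex;
    uint32_t real_net;
    uint32_t virt_net;
    uint32_t mask;
};

/* Return the source address the kernel should see for a packet that
 * arrived on ifindex, or the unchanged address if no mapping applies. */
uint32_t translate_src(const struct if_map *maps, size_t n,
                       int ifindex, uint32_t src)
{
    for (size_t i = 0; i < n; i++) {
        const struct if_map *m = &maps[i];
        if (m->ifindex == ifindex && (src & m->mask) == m->real_net)
            return m->virt_net | (src & ~m->mask);  /* keep the host part */
    }
    return src;
}
```

With two mappings for 192.168.1.0/24, one per interface, the same host address 192.168.1.5 would be presented to the kernel as two different virtual addresses, so the kernel can route between the two networks.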
To achieve independence and flexibility, the changes made to the kernel itself are marginal. Most of them do some initialization at kernel startup or are hooks where the real NAT module can register itself with the kernel. The only real NAT code in the kernel consists of calls to a function in the module that examines each IP packet. These calls take place right after an IP packet comes in and right before an IP packet is transmitted, and they are made only if the NAT module has been inserted into the kernel; without the module the system behaves just as if it had been compiled without NAT support.
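The hook arrangement can be pictured roughly as follows. This is a user-space sketch under stated assumptions, not the actual kernel patch: the names `nat_hook`, `nat_register`, `nat_unregister`, `ip_input`, `ip_output` and `demo_nat`, as well as the reduced `struct packet`, are hypothetical. It shows only the idea that the kernel holds a function pointer which stays empty until a module registers itself.

```c
#include <stddef.h>
#include <stdint.h>

/* Reduced stand-in for a packet; the kernel uses its own buffer type. */
struct packet {
    uint32_t saddr;
    uint32_t daddr;
};

enum nat_dir { NAT_IN, NAT_OUT };

typedef void (*nat_hook_fn)(struct packet *pkt, enum nat_dir dir);

/* NULL as long as no NAT module has been inserted. */
static nat_hook_fn nat_hook = NULL;

/* Called by the module on insertion; fails if a hook is already set. */
int nat_register(nat_hook_fn fn)
{
    if (fn == NULL || nat_hook != NULL)
        return -1;
    nat_hook = fn;
    return 0;
}

/* Called by the module on removal. */
void nat_unregister(void)
{
    nat_hook = NULL;
}

/* Receive path: NAT sees the packet before anything else does. */
void ip_input(struct packet *pkt)
{
    if (nat_hook != NULL)
        nat_hook(pkt, NAT_IN);
    /* normal processing (routing, local delivery) would follow here */
}

/* Transmit path: NAT sees the packet last, just before transmission. */
void ip_output(struct packet *pkt)
{
    /* normal processing would come first */
    if (nat_hook != NULL)
        nat_hook(pkt, NAT_OUT);
}

/* Stand-in for the module: maps 192.0.2.1 to 10.0.0.1 and back. */
void demo_nat(struct packet *pkt, enum nat_dir dir)
{
    if (dir == NAT_IN && pkt->daddr == 0xc0000201u)
        pkt->daddr = 0x0a000001u;
    else if (dir == NAT_OUT && pkt->saddr == 0x0a000001u)
        pkt->saddr = 0xc0000201u;
}
```

As long as nothing has called `nat_register`, both paths leave packets untouched, which corresponds to the behaviour without the module described above.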
NAT rules can be bound to the inbound or the outbound direction. If that were not possible, NAT could not be laid around the entire kernel and we could translate only incoming or only outgoing packets, losing flexibility.
I reused large portions of the packet matching code of the Linux firewall. The earlier discussion of why NAT was implemented the way it is, rather than integrated with other parts of the kernel, could be continued here, because one might rightly argue that packet matching is the same for firewalling and for NAT. Again, to cite myself, I wanted to implement and test NAT in the time I had for the project, and finding a good and general solution for integrating parts of the kernel was too big a task, especially since
Above I mentioned that we need to keep information about fragments if we want to do port-dependent NAT. Linux has the ability to defragment all packets it routes. This is even more than we need, since defragmentation is only necessary when we need the port information for NAT. This way we do not need to keep fragment information. However, it does not work completely: since NAT has been laid around the kernel, encircling everything, it also encircles the defragmenting code, i.e. NAT for incoming packets is called before defragmentation can be done. That is why port-dependent translation can only be done reliably for outgoing packets. Changing the defragmenting code so that it gets incoming packets before NAT does should be easy, but I did not bother to do it.
The data structure used to store the various address translation rules
is a list. Each rule specifies criteria a packet has to match in order
to be translated using the rule's NAT IPs. These criteria are source
and destination IP and mask, source and destination port, protocol
(UDP, ICMP, TCP) and the interface on which the packet arrived or
through which it is going to be sent. The packet is also checked
against the reverse of the current rule in order to enable
bidirectional rules. If any field used for packet matching is left
empty, every packet matches that criterion. Because order is important
there was no choice but to use a linear list. There are also
skip-rules which, when matched by the packet's IP header, cause the
NAT function to skip this packet. Each rule has a unique number so
that new rules can be inserted anywhere into the chain. This is
another example where NAT is more flexible than the current
firewalling code that could have been used. To keep this flexibility
when integrating NAT into existing code I would have had to rewrite
the other code as well, which would have been too much work.
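The matching just described might look roughly like the sketch below. The structures `nat_rule` and `pkt_info` and the functions `rule_matches`, `find_rule` and `insert_rule` are illustrative assumptions, not the module's real definitions. A zero field stands for "left empty", addresses are in host byte order, and the reverse-direction check for bidirectional rules is omitted to keep the example short.

```c
#include <stddef.h>
#include <stdint.h>

/* The fields of a packet that matter for matching. */
struct pkt_info {
    uint32_t src, dst;
    uint16_t sport, dport;
    uint8_t  proto;
    int      ifindex;
};

/* One rule; a zero mask, port, protocol or interface matches anything. */
struct nat_rule {
    unsigned int id;            /* unique number, list is ordered by it */
    uint32_t src, src_mask;
    uint32_t dst, dst_mask;
    uint16_t sport, dport;
    uint8_t  proto;
    int      ifindex;
    int      skip;              /* nonzero: leave matching packets alone */
    struct nat_rule *next;
};

static int rule_matches(const struct nat_rule *r, const struct pkt_info *p)
{
    if ((p->src & r->src_mask) != (r->src & r->src_mask)) return 0;
    if ((p->dst & r->dst_mask) != (r->dst & r->dst_mask)) return 0;
    if (r->sport && r->sport != p->sport) return 0;
    if (r->dport && r->dport != p->dport) return 0;
    if (r->proto && r->proto != p->proto) return 0;
    if (r->ifindex && r->ifindex != p->ifindex) return 0;
    return 1;
}

/* Walk the list in order and return the first matching rule, or NULL.
 * If the result is a skip-rule the caller does not translate. */
const struct nat_rule *find_rule(const struct nat_rule *head,
                                 const struct pkt_info *p)
{
    for (; head != NULL; head = head->next)
        if (rule_matches(head, p))
            return head;
    return NULL;
}

/* Insert a rule at the position given by its number. */
void insert_rule(struct nat_rule **head, struct nat_rule *r)
{
    while (*head != NULL && (*head)->id < r->id)
        head = &(*head)->next;
    r->next = *head;
    *head = r;
}
```

Because the first match wins, a skip-rule for 10.1.0.0/16 with a lower number than a general rule for 10.0.0.0/8 exempts that subnet from translation, which is why the order of the list matters.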
Since I wanted the implementation to be as flexible as possible I
had to find a way to store the information needed for such different
kinds of NAT as static NAT, dynamic NAT and virtual servers all in
the same structure, i.e. the NAT rules should look the same whatever
kind of NAT they represent. Therefore the structure that stores
exactly one rule contains all the information needed for packet
matching and some additional information like flags, packet and byte
counters and the rule's identification number. The information that
differs in data type and amount for each kind of NAT is stored in
dynamically allocated memory, and the rule only holds a pointer to
the start of the area where this information lives.
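A uniform rule with a pointer to kind-specific data could be laid out along these lines. The names `rule`, `nat_kind`, `vserver_priv`, `rule_new` and `rule_free` are assumptions made for this sketch, and the matching fields are left out; the real structure in the module is not reproduced here.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

enum nat_kind { NAT_STATIC, NAT_DYNAMIC, NAT_VSERVER };

/* The common part looks the same for every kind of NAT. */
struct rule {
    unsigned int  id;
    enum nat_kind kind;
    unsigned int  flags;
    unsigned long packets, bytes;
    void         *priv;       /* kind-specific data, NULL if none */
    size_t        priv_len;
};

/* Example of kind-specific data: the real servers of a virtual server. */
struct vserver_priv {
    unsigned int n_servers;
    uint32_t     servers[4];
};

/* Allocate a rule and copy the kind-specific data into its own area. */
struct rule *rule_new(unsigned int id, enum nat_kind kind,
                      const void *priv, size_t priv_len)
{
    struct rule *r = calloc(1, sizeof(*r));
    if (r == NULL)
        return NULL;
    r->id = id;
    r->kind = kind;
    if (priv != NULL && priv_len > 0) {
        r->priv = malloc(priv_len);
        if (r->priv == NULL) {
            free(r);
            return NULL;
        }
        memcpy(r->priv, priv, priv_len);
        r->priv_len = priv_len;
    }
    return r;
}

void rule_free(struct rule *r)
{
    if (r == NULL)
        return;
    free(r->priv);
    free(r);
}
```

Code that walks the rule list only touches the common part, while the handler for a particular kind casts `priv` to the type it expects.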
The following example shows some static NAT rules, where all the necessary information is included in the rules themselves, and a virtual server rule (the second rule). The virtual server rule needs additional dynamic data: a list of all the IPs to which requests to the virtual server should be redirected, and a (more dynamic) list of clients and the server each of them has been connected to. The list of clients may become quite large, so an appropriate data structure is necessary to keep the overhead of searching it to a minimum. A hash or a (balanced) binary tree would work. I have used a linear list, in contradiction to my own proposal. The reason is that this implementation served for experiments, and I thought a linear list would be easier to trace in case something went wrong and I had to debug the code. It can easily be replaced by a more sophisticated data structure that allows much faster searches.
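For illustration, a hash-based replacement for the client list could look like the following sketch. This is not what the implementation does (it uses a linear list, as stated); the names `client_table`, `client_lookup`, `client_assign` and `client_table_clear`, the bucket count and the hash function are all assumptions.

```c
#include <stdint.h>
#include <stdlib.h>

#define CLIENT_BUCKETS 256u

/* One client and the real server it has been connected to. */
struct client_entry {
    uint32_t client;
    uint32_t server;
    struct client_entry *next;
};

/* Chained hash table; must start out zeroed. */
struct client_table {
    struct client_entry *bucket[CLIENT_BUCKETS];
};

/* Fold the address so that all four bytes influence the bucket. */
static unsigned int client_hash(uint32_t ip)
{
    ip ^= ip >> 16;
    ip ^= ip >> 8;
    return ip % CLIENT_BUCKETS;
}

/* Returns 1 and stores the server if the client is known, else 0. */
int client_lookup(const struct client_table *t, uint32_t client,
                  uint32_t *server)
{
    const struct client_entry *e = t->bucket[client_hash(client)];
    for (; e != NULL; e = e->next) {
        if (e->client == client) {
            *server = e->server;
            return 1;
        }
    }
    return 0;
}

/* Adds or updates a client; returns 0 on success, -1 if out of memory. */
int client_assign(struct client_table *t, uint32_t client, uint32_t server)
{
    unsigned int h = client_hash(client);
    struct client_entry *e;

    for (e = t->bucket[h]; e != NULL; e = e->next) {
        if (e->client == client) {
            e->server = server;
            return 0;
        }
    }
    e = malloc(sizeof(*e));
    if (e == NULL)
        return -1;
    e->client = client;
    e->server = server;
    e->next = t->bucket[h];
    t->bucket[h] = e;
    return 0;
}

/* Frees all entries and leaves an empty table. */
void client_table_clear(struct client_table *t)
{
    for (unsigned int i = 0; i < CLIENT_BUCKETS; i++) {
        struct client_entry *e = t->bucket[i];
        while (e != NULL) {
            struct client_entry *n = e->next;
            free(e);
            e = n;
        }
        t->bucket[i] = NULL;
    }
}
```

A lookup then inspects only the entries of one bucket instead of the whole client list, at the price of a structure that is harder to dump and follow while debugging.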
Another puzzle to solve was how to get the NAT rules specified by
the user into the module, which is part of the kernel. Linux has several
interfaces for user-to-kernel communication. One that is special
to networking is the call to the function setsockopt(). The
Linux firewalling administration tool ipfwadm, written by Jos
Vos (jos@xos.nl), uses this interface for sending data to the firewall
code. Since I have used ipfwadm as a basis for my NAT administration
tool ipnatadm, I have reused the idea and modified it only
slightly in order to allow for changes in the structure and size of
the data exchanged without having to recompile the kernel each time:
all checks for validity of the data are done by the module,
and the kernel just passes on all data received via setsockopt()
without looking at it.
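One way to let the module alone judge the data is to prefix every message with a small header, as in the sketch below. The header layout, the names `nat_msg_hdr`, `nat_msg_pack` and `nat_msg_check`, and the version number are assumptions for illustration; the actual format exchanged between ipnatadm and the module is not shown in this text.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NAT_MSG_VERSION 1u

/* Hypothetical header in front of every message to the module. */
struct nat_msg_hdr {
    uint16_t version;       /* layout version understood by the module */
    uint16_t cmd;           /* e.g. add rule, delete rule */
    uint32_t payload_len;   /* number of bytes following the header */
};

/* User side: build a message; returns its length, or 0 if it does not fit. */
size_t nat_msg_pack(void *buf, size_t buflen, uint16_t cmd,
                    const void *payload, uint32_t payload_len)
{
    struct nat_msg_hdr hdr = { NAT_MSG_VERSION, cmd, payload_len };

    if (buflen < sizeof(hdr) || buflen - sizeof(hdr) < payload_len)
        return 0;
    memcpy(buf, &hdr, sizeof(hdr));
    if (payload_len > 0)
        memcpy((unsigned char *)buf + sizeof(hdr), payload, payload_len);
    return sizeof(hdr) + payload_len;
}

/* Module side: the kernel hands over buf and len unchecked, so version
 * and length are verified here. Returns 0 if acceptable, -1 otherwise. */
int nat_msg_check(const void *buf, size_t len)
{
    struct nat_msg_hdr hdr;

    if (len < sizeof(hdr))
        return -1;
    memcpy(&hdr, buf, sizeof(hdr));
    if (hdr.version != NAT_MSG_VERSION)
        return -1;
    if (hdr.payload_len != len - sizeof(hdr))
        return -1;
    return 0;
}
```

The administration tool would hand the packed buffer and its length to setsockopt() on an IP socket, using whatever option number the kernel patch reserves for NAT. Since the kernel forwards the bytes untouched, a changed payload layout only requires a new module and a new tool, not a new kernel.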
The function setsockopt() provides one-way communication only.
We want to get some data back, however, since we are a curious species
who always want to know what is going on behind the scenes. The Linux
kernel implements a great feature for this, the proc-filesystem. This
is not really a filesystem although it looks just like that to the
user. You will notice that all files under /proc have zero size, but
if you try, for example, a cat /proc/cpuinfo you will get some
output. What happens is that when you access a file in the /proc
hierarchy, a function inside the kernel is called (a different one
for each file under /proc) which produces some output that is given
to the user space program as the contents of the file. It is also
possible to write to some of these files, thereby sending data to
a kernel level function. I have not used this feature and chose
setsockopt() instead, for the sole reason that the example program
I used did so and I did not want to spend time writing completely
different code. Currently there are two files starting with ip_nat_*
in /proc/net/, showing information for core NAT and for virtual servers,
when the module has been inserted into the running kernel. They list
the contents of the dynamic data structures for the NAT service they
represent, such as the NAT rules or what real servers belong to a
virtual server rule.
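The principle of a function that produces the file contents on demand can be sketched in user space as follows. The kernel's real interface for /proc handlers is different and is not reproduced here; the function `nat_proc_render`, the structure `rule_stat` and the line format are assumptions chosen for the example.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* The per-rule data that is to be shown (hypothetical selection). */
struct rule_stat {
    unsigned int  id;
    uint32_t      addr;             /* host byte order */
    unsigned long packets, bytes;
};

/* Write one text line per rule into buf and return the number of bytes
 * written, not counting the terminating NUL. A line that no longer fits
 * is dropped completely so the output never ends in a partial line. */
size_t nat_proc_render(char *buf, size_t buflen,
                       const struct rule_stat *rules, size_t n)
{
    size_t used = 0;

    if (buflen == 0)
        return 0;
    buf[0] = '\0';
    for (size_t i = 0; i < n; i++) {
        uint32_t a = rules[i].addr;
        int w = snprintf(buf + used, buflen - used,
                         "%u %u.%u.%u.%u %lu %lu\n",
                         rules[i].id,
                         (unsigned int)(a >> 24),
                         (unsigned int)((a >> 16) & 0xffu),
                         (unsigned int)((a >> 8) & 0xffu),
                         (unsigned int)(a & 0xffu),
                         rules[i].packets, rules[i].bytes);
        if (w < 0 || (size_t)w >= buflen - used) {
            buf[used] = '\0';       /* cut off the partial line */
            break;
        }
        used += (size_t)w;
    }
    return used;
}
```

Nothing is stored between reads: the text is generated from the current state of the data structures each time the file is opened and read, which is why the files appear to have zero size.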
Below is a graph that shows how the module interacts with the kernel and the user:
When an IP packet arrives (1) the kernel calls the NAT module (2)
giving it a pointer to where the packet has been stored in kernel
memory. The NAT module examines the packet and does address translation
if it matches any NAT rule. It then returns (3) the packet to the
kernel which in turn continues as usual (4), doing routing or delivering
it locally to a process. The same happens to all outgoing packets
(5/6), which are packets we route and locally generated packets, just
before they are transmitted and just before any ARP is done. The packet
is then given back to the kernel for further processing, which means
the device driver sends it out on the wire (7).
The user can influence the process by using ipnatadm to send
instructions and data to the module via setsockopt() calls,
such as new NAT rules or instructions for deleting a rule.
They can view the contents of the dynamic data structures, where the
module stores the NAT rules and the dynamic information collected
while running, either directly by viewing the contents of the NAT
files in the /proc filesystem or by using ipnatadm, which also uses
these files but rewrites the lines into a more human-readable format.