trekhopd v1.50 - Netrek firewall hop daemon
By Andy McFadden (fadden@uts.amdahl.com)
Updated: 03-Sep-93


OVERVIEW:

"trekhopd" (Trek Hop Dee) is a daemon process intended to run on an Internet
firewall machine.  It will pass TCP and UDP packets from a modified netrek
client from a host on the inside to a server on the outside.

This program was developed as an alternative to "gw", a general-purpose
firewall bridge.  trekhopd has several advantages:

- works with RSA v2.0 servers.

- the server host and local UDP port are specified by the Netrek client at
  execution time, and may optionally be checked against a list of "approved"
  server hosts & ports.  "gw" had one port for every possible server, and
  listened to them all; the client was required to attach to the right port
  on the server machine.  This is no longer necessary.

- local networks (or local hosts) can be specified.  By setting some variables
  in the configuration file, you can block connections from machines outside
  the network.  Connections TO machines on the listed local networks are
  always forbidden unless the host is specifically listed on the "approved"
  list.

- the UDP ports aren't opened until they're needed; "gw" used two file
  descriptors per UDP port whether they were in use or not.  Also, the
  UDP ports on the firewall machine are allocated dynamically, instead of
  being specified in the configuration file (they're passed back to the
  client after the connection is established).

- since this is a specific-purpose tool, the risk to the company from users
  doing stupid things is minimized.  It was possible to route data from
  outside the company back in under "gw".  With trekhopd, the configuration
  is done only once, by the sysadmin (or at least someone with a clue).

- because it's a daemon, it runs all the time, so you don't need somebody
  to start it up (ask your sysadmin to add it to the rc scripts).  In general,
  the people using it don't need accounts on the firewall machine.

- trekhopd uses non-blocking connect() calls, so you won't get stalled
  whenever someone else makes a new connection (especially important for
  connections to machines which are down).

- it's easier to adapt MS-II client programs like XNetrekM to work with
  trekhopd.

The primary disadvantage of trekhopd is that it requires a slightly modified
client (it needs a tweak in socket.c and, for ease of use, some major
additions to main.c).  The server host, server port, and the client's UDP
port must be sent immediately after opening the TCP connection; if the data
is invalid, the connection is dropped before any outside connections are made.


In general, trekhopd uses resources more efficiently and should be more
palatable to firewall sysadmins.  (Mine asked me to write this after hearing
about gw.  He wasn't worried about deliberate actions, but rather about silly
people doing silly things.)


COMPILING, INSTALLATION, AND CONFIGURATION:

usage: trekhopd [-v] [-t] [-p listen_port]

    -v sets verbose mode; some extra info is sent to stdout
    -t truncates the log file before starting up
    -p sets the port which trekhopd listens to (overrides the thdrc value)


There's a simple makefile to build trekhopd.  Before you do, there are some
things in thd.h which you may want to change.

#define LOGFILE		"thdlog"

This is where logging information will be placed.  It should be writeable by
whoever owns the trekhopd process.

#define CONFIGFILE	"thdrc"

This is the trekhopd configuration file.  It should be writeable ONLY by the
system administrator.

#define USERFILE	"/tmp/thd.users"

When a SIGUSR2 is received, the number of active connections is placed in the
log file, USERFILE is created (or truncated if it exists), and a list
of active connections is placed there.

Note that, unlike "gw", there is no FASCIST mode.  For trekhopd we always
assume that we are concerned about security.

Configure these defines, and then check the Makefile for system dependencies,
in particular UTS2.1 libraries.  Run "make" to make the program.  Where to
install the program is up to the system administrator.

trekhopd should be started up after the other network daemons during the
boot sequence.  It should not use any CPU when inactive; if it does, there's
a bug in the code.


The format of thdrc looks like this (note that the boolean and integer
variables are shown with their default values):

-----
#
# trekhopd configuration file
#

# where to listen for connections from Netrek clients
listen_port:		6592

# set this "on" to use a log file, "off" to not do logging
do_logging:		on

# only allow connections from machines in "local network" list
local_origin_only:	on

# identify our local network(s) (0 is a wildcard)
localnet:		129.212.0.0

# metaserver connection (special "insulated" TCP connection)
meta_listen_port:	2520
metaserver:		sickdog.cs.berkeley.edu 2520

# set this "on" to require a match with list of "approved" hosts
approved_only:		on

# add a host/port pair for every server you wish to allow contact with
approved: bezier.berkeley.edu		2592
approved: bronco.ece.cmu.edu		2592
approved: rwd4.mach.cs.cmu.edu		2592
approved: calvin.usc.edu		2592

-----


This example should be largely self-explanatory.  If "approved_only" is ON,
then only those hosts/ports specified on an "approved:" line can be used as
destinations.  If it's OFF, then any host that does not match an address
specified on a "localnet:" line can be used, and any port greater than or
equal to 1024 is valid.

It should be noted that the approved: entries are NOT validated.  You CAN
specify host/port combinations that would otherwise be rejected (local hosts,
ports below 1024), and they WILL succeed whether approved_only is ON or OFF.
For this reason, the thdrc file should be writeable only by the
administrator.  Note that this is the only way to get at machines with netrek
hooked up to inetd (i.e. through port 592) or local to the company.
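
The destination rules above boil down to a short decision function.  What
follows is only a sketch of the policy as this file describes it, not the
code in trekhopd; destination_ok() and its arguments are made-up names, and
the caller is assumed to have already worked out whether the request is on
the approved list and whether the host matches a localnet: line.

```c
#include <assert.h>

/*
 * Hypothetical helper: returns 1 if the requested destination may be
 * used, 0 if not.  All four inputs are computed elsewhere.
 */
int destination_ok(int approved_only,     /* thdrc "approved_only" setting  */
                   int on_approved_list,  /* host/port is on an approved: line */
                   int dest_is_local,     /* host matches a localnet: line   */
                   long port)             /* requested server port           */
{
    assert(port >= 0);

    /* approved: entries are not validated, so they always succeed */
    if (on_approved_list)
        return 1;

    /* with approved_only ON, nothing else gets through */
    if (approved_only)
        return 0;

    /* non-approved local machines are always forbidden */
    if (dest_is_local)
        return 0;

    /* otherwise only non-reserved ports are valid */
    return port >= 1024;
}
```

For example, a request for port 592 on an outside host fails unless that
host/port pair appears on an approved: line.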

When local_origin_only is ON, connections will only be accepted from machines
which match one of the patterns specified on a localnet: line.  If it is OFF,
then connections will be accepted from any machine.  For obvious reasons,
the default setting is ON.
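
To make the wildcard idea concrete, here is a rough sketch of matching an
address against a localnet: pattern.  It is not the code in trekhopd.  It
assumes that a 0 matches any value in that octet, and it refuses an all-zero
pattern since 0.0.0.0 is rejected; the function names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Parse a dotted quad into four octets; returns 1 on success, 0 on failure. */
static int parse_quad(const char *s, unsigned oct[4])
{
    char extra;
    int i;

    /* exactly four numbers and nothing after them */
    if (sscanf(s, "%u.%u.%u.%u%c",
               &oct[0], &oct[1], &oct[2], &oct[3], &extra) != 4)
        return 0;
    for (i = 0; i < 4; i++)
        if (oct[i] > 255)
            return 0;
    return 1;
}

/* Returns 1 if addr matches pattern, treating a 0 octet as a wildcard. */
int localnet_match(const char *pattern, const char *addr)
{
    unsigned p[4], a[4];
    int i;

    assert(pattern != NULL && addr != NULL);
    if (!parse_quad(pattern, p) || !parse_quad(addr, a))
        return 0;

    /* an all-wildcard pattern is refused outright */
    if (p[0] == 0 && p[1] == 0 && p[2] == 0 && p[3] == 0)
        return 0;

    for (i = 0; i < 4; i++)
        if (p[i] != 0 && p[i] != a[i])
            return 0;
    return 1;
}
```

With the example thdrc, localnet_match("129.212.0.0", "129.212.7.9") is 1
and localnet_match("129.212.0.0", "128.32.150.109") is 0.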

Feel free to add as many approved: entries as you like; they don't use any
resources (other than the memory needed to hold them).  Same with the
localnet: lines.  Keep in mind that, while they aren't intended to be used for
excluding a specific machine, they can be.  I didn't bother adding the code
to convert domain names to IP addresses in the localnet stuff, so you'll have
to use the numeric address if you do.

The metaserver connection is special.  When you contact the metaserver port
(usually "telnet gateway_machine meta_listen_port"), trekhopd opens a
connection to the metaserver, reads up to 4K of data, closes the connection,
and then returns the data to the client.  This is NOT a TCP pass-thru
connection.  At no time can the client send information to the metaserver; it
just gets whatever the metaserver volunteers.  You should not, however, point
the metaserver address at a local host (I don't bother with validation;
consider it like an "approved" line).
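
The "read up to 4K, then close" step might look something like the sketch
below.  This is an illustration only: meta_slurp() and META_MAX are made-up
names, and the loop is a simple blocking one, which is not necessarily how
the daemon does it.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

#define META_MAX 4096   /* the 4K limit mentioned above */

/*
 * Read from fd until end-of-file or until cap bytes have arrived,
 * whichever comes first.  Returns the byte count, or -1 on error.
 * Nothing is ever written to fd, so the far end only gets to talk.
 */
ssize_t meta_slurp(int fd, char *buf, size_t cap)
{
    size_t have = 0;

    assert(buf != NULL);
    while (have < cap) {
        ssize_t n = read(fd, buf + have, cap - have);

        if (n == 0)             /* far end closed: we have it all */
            break;
        if (n < 0) {
            if (errno == EINTR) /* interrupted by a signal, retry */
                continue;
            return -1;
        }
        have += (size_t)n;
    }
    return (ssize_t)have;
}
```

The caller would close the metaserver socket after this returns and only
then write the buffer to the client.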

There can be only one metaserver specified.


----- quick aside for the security-minded -----
To allow a massive breach of security, you would have to:
(1) not specify any localnet lines (or specify only the intruder nets;
    note that 0.0.0.0 will be rejected, and wouldn't work anyway, since we
    ALWAYS forbid connections to non-approved local machines)
(2) set "local_origin_only" to OFF (if it's ON and there are no localnet lines,
    the program will exit)
(3) set "approved_only" to OFF

However, even with all of this done, outsiders will still be unable to
connect to a port < 1024.  To get free access to reserved ports, the actual
internal destination would have to be given on an approved: line.

And, of course, the caller will have to run through a front end which handles
the initial calling sequence...
----- end of aside -----


LOGGING

If logging is enabled, the log file will contain stuff like:

2a50d199: 
2a50d199: ** trekhopd v1.1 started on Tue Jun 30 14:26:49 1992
2a50d1ad: 06 connection from tde.uts.amdahl.com (4396) Tue Jun 30 14:27:09
2a50d1ad: 06 fadden --> bronco.ece.cmu.edu 2592 (client UDP:5110)
2a50d1ae: 06 +client 6   server 7   udp client 9   udp server 8 
2a50d1ae: 06 connected
2a50d1b8: 06 closed (7 9 8)  total tcp: 1000  udp: 0
2a50d1f1: 06 connection from tde.uts.amdahl.com (4411) Tue Jun 30 14:28:17
2a50d1f1: 06 fadden --> bigmax.ulowell.edu 2592 (client UDP:5110)
2a50d1f2: 06 +client 6   server 7   udp client 9   udp server 8 
2a50d1f2: 06 connected
2a50d201: 08 UDP from server bigmax.ulowell.edu
2a50d201: 08 UDP connection complete to server port 3908
2a50d20e: 06 closed (7 9 8)  total tcp: 10844  udp: 3116


The column on the left is the output of time(0) printed in hex format (it's
more compact that way, and easy enough to convert.  I include a plaintext
date on the initial connection for convenience).  In this example, the user
(fadden) started up Netrek on his host (tde.uts.amdahl.com), and connected to
bronco.  Seeing the huge wait queue, he closed the connection and went to
bigmax.ulowell.edu.  We can see that he got in the game because he opened a
UDP connection to the server.
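
If you don't feel like converting the hex by hand, a few lines of C will do
it.  This is just a convenience sketch (log_stamp() and stamp_to_text() are
made-up names, not part of trekhopd), and it prints UTC rather than local
time.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/*
 * Pull the leading hex field off a log line.  Returns (time_t)-1 if the
 * line does not start with hex digits followed by a colon.
 */
time_t log_stamp(const char *line)
{
    char *end;
    unsigned long v;

    assert(line != NULL);
    v = strtoul(line, &end, 16);
    if (end == line || *end != ':')
        return (time_t)-1;
    return (time_t)v;
}

/* Format t as UTC text in buf; returns buf, or NULL on failure. */
const char *stamp_to_text(time_t t, char *buf, size_t len)
{
    struct tm *tm = gmtime(&t);

    if (tm == NULL || strftime(buf, len, "%Y-%m-%d %H:%M:%S UTC", tm) == 0)
        return NULL;
    return buf;
}
```

For the first line of the sample log, 2a50d199 is 709939609, which comes out
as 1992-06-30 21:26:49 UTC.  That is consistent with the 14:26:49 in the
startup line if the firewall's clock was seven hours behind UTC.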

The "total tcp" is the total amount of traffic which went over the TCP line
(with a similar entry for UDP).  This is for traffic in BOTH directions.
This can give you some idea of how much of a strain this is placing on the
network.  Keep in mind that the connect time is NOT a good indication of when
the game began; use the UDP connect time instead (you have to be in the game
to open UDP) if you want to get a bytes/second rating.  The user could sit on
a wait queue for an hour and use very little net traffic.
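
As a worked example of the bytes/second idea, the sketch below takes the UDP
connect time and the close time (both as hex strings from the log) plus the
two totals from the "closed" line.  The function name is hypothetical.  Note
that the totals include whatever TCP traffic happened before UDP opened, so
the result is only a rough figure.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Rough traffic rate between two hex timestamps from the log.
 * Returns bytes per second, or -1.0 if the interval is not positive.
 */
double bytes_per_second(const char *start_hex, const char *end_hex,
                        unsigned long tcp_bytes, unsigned long udp_bytes)
{
    unsigned long start, end;

    assert(start_hex != NULL && end_hex != NULL);
    start = strtoul(start_hex, NULL, 16);
    end = strtoul(end_hex, NULL, 16);
    if (end <= start)
        return -1.0;
    return (double)(tcp_bytes + udp_bytes) / (double)(end - start);
}
```

For the bigmax session in the sample log, UDP came up at 2a50d201 and the
connection closed at 2a50d20e, which is 13 seconds.  The totals were 10844
and 3116 bytes, so the rate works out to 13960 / 13, or about 1074
bytes/second.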

UDP through trekhopd will work just fine with calvin.usc.edu (home of weird
network problems).  trekhopd will print a warning message in the log file
because the UDP packets come from a different address than was expected.

The log file is mostly for identifying problems; I don't really do much
accounting beyond the total traffic values.  There's plenty of info on what
hosts are connected to what, which is what I figure most firewall admins will
be most concerned with.


CHANGES TO THE NETREK CLIENT

The changes needed to make trekhopd work are not much different from those
needed to make "gw" work.  You need a replacement netrek/main.c, and you need
to add a couple dozen lines to netrek/socket.c.

The "client" subdirectory (which should be in the same archive as this file)
has all the stuff you'll need, including replacements for main.c and
rsa_key.c, plus the changes you need to add to socket.c.  The "main.c" will
work with "gw" if you just specify -DGATEWAY instead of both -DGATEWAY and
-DTREKHOPD in the Makefile.

main.c now contains some code to restrict the distribution of trekhopd
clients (trekhopd-capable clients have altered copies of reserved.c, and
should NOT be distributed widely).  Basically, you specify which host or
hosts the client is allowed to run on, and it will refuse to run anywhere
else.  See the comments in main.c.

The new main.c accepts a new argument, -H <server-abbreviation>.  When you
give a -H argument (say "-H bezier"), it looks for "bezier" in a file called
.trekgwrc.  It converts that to a full address (bezier.berkeley.edu) and an IP
address (128.32.150.109).  The IP address can be used with a modified
reserved.c to run a blessed client through a firewall (it needs to be in the
table because machines inside a firewall will most likely not be able to
resolve external domain addresses, and hence can't derive the IP address from
the domain address).  The full domain name is what gets passed to trekhopd,
along with the destination port.

The chunk of code in socket.c sends the request to trekhopd, and waits for
a reply.  While the request is a standard Netrek message packet, its
appearance immediately after the connection is opened may cause the remote
ntserv process to crash.  For this reason, the special packet is ONLY sent
when you specify -H.  This allows you to use the same copy of the client on
a local server.

If you specify -p (port) *after* the -H flag, you will change the port where
netrek looks for trekhopd.  Thus, if you change listen_port, you don't need
to do anything fancy, you can just use the client's old -p flag.  (It's a
feature, not a bug. :-) )


TECHNICAL INFO - CONNECTIONS


The initial packet trekhopd expects is a client->server message packet,
which looks like:

struct mesg_cpacket {
    char type;          /* CP_MESSAGE */
    char group;
    char indiv;
    char pad1;
    char mesg[80];
};

The packet is interpreted as if it were:

struct mesg_cpacket {
    char magic;
    char pad1, pad2, pad3;
    long port_request;
    long gw_local_port;		/* client's UDP port */
    char userid[8];
    char host_request[64];
};

The response packet is a mesg_spacket (which is structurally equivalent to
the mesg_cpacket).  It is interpreted as:

struct mesg_cpacket {
    char type;
    char pad1, pad2, pad3;
    long gw_serv_port;		/* these are the UDP ports on the firewall */
    long gw_port;		/* (the client tells the server about first) */
    long serv_port;		/* actual port on server host, for reserved.c */
    char unused[68];
};

 
After the connection to the client is established and has been validated
against localnet (unless local_origin_only is OFF), trekhopd fills a buffer
with incoming data until the entire mesg_cpacket has arrived (it doesn't
block on the connection, so you can't jam the daemon by sending a few bytes
and sleeping).
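
The buffering described above can be sketched as follows.  This is an
illustration of the technique, not the daemon's code; struct pending,
pending_fill() and set_nonblocking() are hypothetical names.  The idea is to
read whatever is available, remember how much has arrived, and go back to
the main loop if the packet is still incomplete.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

#define THD_PACKET_SIZE 84      /* size of a mesg_cpacket */

struct pending {
    char   buf[THD_PACKET_SIZE];
    size_t have;                /* bytes received so far */
};

/* Put fd into non-blocking mode; returns 0 on success, -1 on error. */
int set_nonblocking(int fd)
{
    int fl = fcntl(fd, F_GETFL, 0);

    if (fl < 0)
        return -1;
    return fcntl(fd, F_SETFL, fl | O_NONBLOCK) < 0 ? -1 : 0;
}

/*
 * Pull in whatever has arrived on a non-blocking fd.
 * Returns 1 when the whole packet is in p->buf, 0 if more data is
 * still needed, and -1 if the peer closed or an error occurred.
 */
int pending_fill(struct pending *p, int fd)
{
    assert(p != NULL);
    while (p->have < sizeof p->buf) {
        ssize_t n = read(fd, p->buf + p->have, sizeof p->buf - p->have);

        if (n > 0) {
            p->have += (size_t)n;
            continue;
        }
        if (n == 0)
            return -1;          /* closed before the packet was complete */
        if (errno == EINTR)
            continue;
        if (errno == EAGAIN || errno == EWOULDBLOCK)
            return 0;           /* nothing more right now; try again later */
        return -1;
    }
    return 1;
}
```

A client that sends five bytes and then sleeps just leaves its entry sitting
at have == 5; the daemon is free to service everyone else in the meantime.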

The user's request is then validated against the approved list and (if
approved_only isn't set) localnet.  After we have decided that the request
is valid, the TCP connection to the server is established, and ports are
opened for UDP.  The response packet is returned at this time.

Note that the *client* UDP port is connect()ed, but the *server* port is left
hanging.  This is done because the UDP connection can close and reopen on
a different server port during the life of a TCP connection.
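
Here is a sketch of that arrangement using ordinary BSD socket calls.  The
helper names are made up and this is not lifted from the source.  Each UDP
socket is bound to port 0 so the system picks the port; the client-side
socket then gets a connect() to pin its peer, while the server-side socket
is left unconnected so it can accept datagrams from whichever port the
server uses.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Open a UDP socket on a system-chosen port.  On success returns the
 * descriptor and stores the port (host byte order) in *port.
 */
int udp_open_dynamic(unsigned short *port)
{
    struct sockaddr_in sa;
    socklen_t len = sizeof sa;
    int fd;

    assert(port != NULL);
    fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0)
        return -1;

    memset(&sa, 0, sizeof sa);
    sa.sin_family = AF_INET;
    sa.sin_addr.s_addr = htonl(INADDR_ANY);
    sa.sin_port = 0;            /* 0 means "pick one for me" */

    if (bind(fd, (struct sockaddr *)&sa, sizeof sa) < 0 ||
        getsockname(fd, (struct sockaddr *)&sa, &len) < 0) {
        close(fd);
        return -1;
    }
    *port = ntohs(sa.sin_port);
    return fd;
}

/*
 * Pin the peer of a UDP socket (used for the client side only).
 * ip is a numeric IPv4 address.  Returns 0 on success, -1 on error.
 */
int udp_fix_peer(int fd, const char *ip, unsigned short port)
{
    struct sockaddr_in peer;

    memset(&peer, 0, sizeof peer);
    peer.sin_family = AF_INET;
    peer.sin_port = htons(port);
    if (inet_pton(AF_INET, ip, &peer.sin_addr) != 1)
        return -1;
    return connect(fd, (struct sockaddr *)&peer, sizeof peer);
}
```

The two port numbers returned by udp_open_dynamic() are the sort of values
that would go back to the client in the response packet.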

The serv_port value is the port we have connected to on the server host.
This is used by the RSA authentication stuff, and will likely be used by
as-yet unwritten schemes.


TECHNICAL INFO - SPECIAL SIGNALS

trekhopd will respond to several different signals.  SIGINT, SIGQUIT, and
SIGTERM will cause trekhopd to exit after writing a message to the log file.
When SIGHUP is received the configuration file will be reread (a good way to
add new servers upon request, without hosing existing connections).  SIGUSR1
will cause all existing connections to be dropped, but the daemon will
continue to run and accept new connections.  SIGUSR2 will cause the number
of active connections to be printed to the log file, without affecting the
connections themselves (useful when you want to see if anybody is on).  A
list of active connections, showing userid, client host, and server host,
will be placed in USERFILE (default "/tmp/thd.users").
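
One common way to wire up signals like these is to have the handler set a
flag and let the main loop do the real work later.  The sketch below shows
that pattern with sigaction(); it is an assumption about structure, not a
description of how trekhopd is written, and all of the names are made up.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>

/* Flags set by the handler and polled by the main loop. */
volatile sig_atomic_t want_exit;     /* SIGINT, SIGQUIT, SIGTERM */
volatile sig_atomic_t want_reread;   /* SIGHUP: reread the config file */
volatile sig_atomic_t want_drop;     /* SIGUSR1: drop all connections */
volatile sig_atomic_t want_report;   /* SIGUSR2: log count, write USERFILE */

/* Only sets a flag; logging and file writes happen outside the handler. */
static void on_signal(int sig)
{
    switch (sig) {
    case SIGINT:
    case SIGQUIT:
    case SIGTERM:
        want_exit = 1;
        break;
    case SIGHUP:
        want_reread = 1;
        break;
    case SIGUSR1:
        want_drop = 1;
        break;
    case SIGUSR2:
        want_report = 1;
        break;
    }
}

/* Install the handler for every signal listed above; 0 on success. */
int install_handlers(void)
{
    static const int sigs[] = {
        SIGINT, SIGQUIT, SIGTERM, SIGHUP, SIGUSR1, SIGUSR2
    };
    struct sigaction sa;
    size_t i;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_signal;
    sigemptyset(&sa.sa_mask);
    for (i = 0; i < sizeof sigs / sizeof sigs[0]; i++)
        if (sigaction(sigs[i], &sa, NULL) < 0)
            return -1;
    return 0;
}
```

From the shell, "kill -USR2 <pid>" would then cause the report to be written
the next time the main loop checks want_report.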

Rereading the config file will change all of the stored values, but some
things will not take effect until trekhopd is stopped and restarted.  For
example, trekhopd will continue to listen on the same port for new
connections, and existing connections to (now illegal) hosts will not be
dropped.


MISCELLANEOUS

If you want an RSA key to be accepted for a trekhopd-capable client, you MUST
state that it can be run through trekhopd, and should NOT make it widely
available.  If it gets loose, or we find out that it has the trekhopd stuff
in it when you didn't announce it, your key will be deactivated.  (Sounds
ominous, doesn't it?)

Currently only one user can connect to the metaserver at a time.  Since
metaserver connections are fast, this should not be a problem.  However, if
the metaserver is down, the first call will take a little while to time
out.  During this interval, a second metaserver request will end up polling,
which is not all that healthy for the CPU load.  This isn't serious enough
to be worth fixing.

Bug: running trekhopd -t will truncate the log even if the program is unable
to start up (perhaps because of an existing process).  A feature...?

To Do:
- Figure out why the process stalls when a client machine dies (sometimes).
  (May be related to "ghost" players who aren't connected but appear to be.)
  [now solved with v1.50?]
- Add an optional "hours of play" to the config file.
- Get rid of all the "if (verbose)" crud in the source.
- Stop assuming that ints are 32 bits.
- The "reconfigure with SIGHUP" feature seems to have some problems...
  sometimes it forgets about hosts.  Probably best to fully kill and restart
  it when you have the opportunity.
- Return some sort of meaningful error result when unable to establish
  a connection.
- Make sure all the byte ordering problems are gone.

Send all bug reports to fadden@uts.amdahl.com.

