Questions and comments regarding the approach outlined in this article should be directed to the author at rbala@i-2000.com.
Today, several million users access the Internet and its vast ocean of resources daily. To the layman, the Internet's most visible aspects are:
Many users of the Internet spend hours uploading and downloading software and data from FTP sites. This is made possible by the FTP application-level protocol of the TCP/IP suite as described by RFC 959 (144K text file). Although FTP has been around for almost two decades in various forms, few implementations of the protocol provide mechanisms for recovery from system failures. Until now, this has not been a major concern because the transferred files were relatively small (less than 1 MB) in most cases.
However, with multimedia ranging from audio to full-motion video being incorporated into entertainment, education and business software, average file sizes are increasing. For instance, a minute-long full-motion video clip could run to a megabyte or more. With technologies such as video-on-demand looming on the horizon, much more data transfer activity involving large files is anticipated.
One of the common problems that many Internet users can relate to is a system error during a file transfer. File transfer sessions get aborted as a result of:
The above reasons mainly indicate hardware failures. However, there are a number of other reasons not directly related to hardware that can abort a file transfer, including:
System failures during file transfers are tolerable when the file being transferred is small. However, a failure becomes annoying when it occurs in the midst of transferring a large file, especially when most of the transfer has already taken place.
For example, let us assume that you are downloading a four-megabyte file and that a system failure occurs after three megabytes have been transferred. The only recourse offered by most implementations of FTP today is for you to begin the download operation from scratch. This is an extremely painful reality, but it need not be so. In this article, I'll shed some light on the little-known error recovery and restart aspects of the File Transfer Protocol.
The TCP/IP protocol suite forms the basis for the Internet. TCP/IP is made up of four layers:
For more in-depth information on the TCP/IP Protocol Suite, refer to Reference 1.
FTP is an application-layer protocol in the TCP/IP suite, and it uses TCP as its transport-layer protocol. The primary objectives of FTP include:
FTP follows the client-server model, as many other TCP/IP applications do. This figure shows how this model is set up for FTP:
The client half of the equation is made up of three pieces, namely, the user interface (also known as the FTP client), the user-protocol interpreter, and the user-data transfer function. When a user accesses a character-mode FTP client interactively, the user enters commands such as "get" and "put". Newer user interfaces are graphical, replacing these commands with graphical buttons. The commands that the user issues are interpreted by the user-protocol interpreter, which translates each request into commands understood by the FTP server. For a list of commands, refer to Reference 1. On the server end, there is an FTP server listener process (also known as a daemon) that interprets the request from the client. The connection between the user-protocol interpreter and the server-protocol interpreter is known as a control connection. When a file needs to be transferred from the server to the client, a separate data connection is established between the two data transfer functions. Once data transfer is complete, the data connection is terminated. For more details, readers should refer to the References.
Users don't need to access FTP functionality with a dedicated client. Instead, other application software can access FTP servers transparently. For example, most Web browsers, such as Netscape's Navigator, use FTP "under the hood" to download files.
The way in which files are transferred and stored is determined by the following factors:
For more information on data representation issues, please refer to the References.
RFC 959 describes error recovery and restart only vaguely and gives no implementation details. The primary mechanism is a restart marker, which is available only in block or compressed transmission mode. With block transfers, a file is transferred in chunks, each made up of a header portion followed by a data portion. The header portion holds a one-byte descriptor and a byte count for the data portion. The descriptor describes the data block, with individual bits carrying special meanings. For instance, if the most significant bit is set to one, the data block marks the end of a record. In that vein, if the fourth most significant bit is set, the data block holds a restart marker.
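The block header described above can be decoded in a few lines of C. The following is only a sketch: the descriptor values are the codes listed in RFC 959, but the `BlockHeader` type and the function names are hypothetical, and the 16-bit byte count is assumed to arrive high-order byte first.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Descriptor codes for block-mode headers, as listed in RFC 959. */
enum {
    BLOCK_EOR     = 0x80, /* end of data block is end of record */
    BLOCK_EOF     = 0x40, /* end of data block is end of file */
    BLOCK_ERRORS  = 0x20, /* suspected errors in data block */
    BLOCK_RESTART = 0x10  /* data block is a restart marker */
};

/* Hypothetical holder for a decoded header (name not taken from the RFC). */
typedef struct {
    uint8_t  descriptor; /* bit flags from the enum above */
    uint16_t byte_count; /* length of the data portion that follows */
} BlockHeader;

/* Decode a 3-byte header: one descriptor byte, then a 16-bit byte count.
 * Assumption: the count is sent high-order byte first.
 * Returns 0 on success, -1 if the input is missing or too short. */
int parse_block_header(const unsigned char *buf, size_t len, BlockHeader *out)
{
    if (buf == NULL || out == NULL || len < 3)
        return -1;
    out->descriptor = buf[0];
    out->byte_count = (uint16_t)((buf[1] << 8) | buf[2]);
    return 0;
}

/* Nonzero when the fourth most significant bit of the descriptor is set. */
int is_restart_marker(const BlockHeader *h)
{
    assert(h != NULL);
    return (h->descriptor & BLOCK_RESTART) != 0;
}
```

A receiver would call `parse_block_header` on the first three bytes of each block and, when `is_restart_marker` reports true, treat the following `byte_count` bytes as marker data rather than file data.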
In compressed-mode transfers, restart markers are preceded by a two-byte escape sequence. The first byte is all zeroes and the second is a descriptor byte similar to that used in block-transfer mode.
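A check for the compressed-mode escape sequence might look like the sketch below. The function and macro names are hypothetical, and the check is only meaningful at the start of a compressed-mode unit, since a zero byte inside file data is not an escape.

```c
#include <assert.h>
#include <stddef.h>

#define ESCAPE_BYTE       0x00 /* first byte of the escape sequence: all zeroes */
#define DESC_RESTART_FLAG 0x10 /* restart-marker descriptor code, as in block mode */

/* Returns 1 if the two bytes at `unit` form an escape sequence whose
 * descriptor byte flags a restart marker, and 0 otherwise.
 * Assumption: `unit` points at the start of a compressed-mode unit. */
int is_restart_escape(const unsigned char *unit, size_t len)
{
    assert(unit != NULL);
    if (len < 2)
        return 0;
    return unit[0] == ESCAPE_BYTE && (unit[1] & DESC_RESTART_FLAG) != 0;
}
```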
What is a restart marker and how is it going to help us in recovering from a system failure? Restart markers (also known as checkpoints) are milestones during a file transfer process. Should a failure occur, the file transfer need not be restarted from the beginning, and instead could proceed from the last recorded milestone.
Readers should note that error recovery as specified by RFC 959 can be implemented effectively only if all implementors of FTP client and server programs agree on a common format for restart markers.
Let us assume that an FTP client and an FTP server support a common recovery and restart scheme. Now, suppose the FTP client wants to download a four-megabyte file from the server. The server may decide to embed a restart marker every 100K bytes. Then, if a system failure occurs after 3,213,517 bytes have been transferred, the file transfer process could be rolled back and restarted from the 3,200,000-byte mark. Is this good enough? In most cases the answer would be "yes". But what if the file being transferred is modified before the FTP client decides to roll back and download the remainder of the file? In this case, there is no guarantee that the transferred file would be coherent, because it would essentially be a mish-mash of two files.
Hence, let me now propose a standardized restart marker that would solve this problem. A simple solution would be to store the size of the file to be downloaded in the restart marker, together with a byte count indicating the cumulative number of bytes downloaded thus far. When a failure occurs, the file size from the restart marker can be compared with the file size at the time of error recovery to see if they match. If they match, the file transfer can proceed. Otherwise, the FTP client is notified that the file has been modified and that recovery is not possible.
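The rollback arithmetic in this example is a single integer division. The sketch below assumes that "100K" means 100,000 bytes, which is what the 3,200,000-byte figure implies; the function name is hypothetical.

```c
#include <assert.h>

/* Offset of the most recent restart marker at or before `bytes_received`,
 * assuming the server embeds a marker every `interval` bytes, counted
 * from the start of the file. */
long last_restart_offset(long bytes_received, long interval)
{
    assert(bytes_received >= 0 && interval > 0);
    return (bytes_received / interval) * interval;
}
```

With `bytes_received` set to 3213517 and `interval` set to 100000, the function returns 3200000, so only the last 13,517 bytes have to be transferred again.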
There is an inherent flaw in the above solution. Files can change without file sizes having to change! So, file size is not a reliable gauge for determining whether a file has been modified. A better measure would be a time stamp giving the date and time when the file was last modified. Our proposal for a restart marker therefore combines a time stamp with a byte count:
The proposed restart marker consists of N bytes, where N is an integer greater than or equal to nine. The first eight bytes store the time stamp for the last-modified time of the file being transferred. The ninth to the Nth bytes store the file size. The value assigned to N is based on the number of bytes required to store the file size. For example, if the file size is 50 bytes long, then N would be
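To make the layout concrete, here is one possible encoding of such a marker in C. It is a sketch under assumptions the article does not state: the time stamp is treated as a 64-bit integer, both fields are written high-order byte first, and the size field uses the smallest number of bytes that can hold its value. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TS_BYTES 8 /* the first eight bytes hold the time stamp */

/* Smallest number of bytes that can hold `value` (at least one). */
static size_t bytes_needed(uint64_t value)
{
    size_t n = 1;
    while (value >>= 8)
        n++;
    return n;
}

/* Write an N-byte marker into `out`. Returns N, or 0 if `cap` is too small. */
size_t encode_marker(uint64_t timestamp, uint64_t size,
                     unsigned char *out, size_t cap)
{
    size_t size_bytes = bytes_needed(size);
    size_t n = TS_BYTES + size_bytes;
    size_t i;

    assert(out != NULL);
    if (cap < n)
        return 0;
    for (i = 0; i < TS_BYTES; i++)
        out[i] = (unsigned char)(timestamp >> (8 * (TS_BYTES - 1 - i)));
    for (i = 0; i < size_bytes; i++)
        out[TS_BYTES + i] = (unsigned char)(size >> (8 * (size_bytes - 1 - i)));
    return n;
}

/* Read an n-byte marker back. Returns 0 on success, -1 on a bad length. */
int decode_marker(const unsigned char *in, size_t n,
                  uint64_t *timestamp, uint64_t *size)
{
    size_t i;

    if (in == NULL || timestamp == NULL || size == NULL)
        return -1;
    if (n < TS_BYTES + 1 || n > TS_BYTES + 8)
        return -1;
    *timestamp = 0;
    *size = 0;
    for (i = 0; i < TS_BYTES; i++)
        *timestamp = (*timestamp << 8) | in[i];
    for (i = TS_BYTES; i < n; i++)
        *size = (*size << 8) | in[i];
    return 0;
}
```

Under these assumptions, a size value of 4,000,000 needs three bytes, so the marker would be eleven bytes long. A different agreed encoding, such as ASCII digits, would give different lengths.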
In this section, I shall go through the time line for an FTP download procedure which has a system failure and subsequent recovery. This figure shows a time line:
The events that take place during the file transfer process are in the following chronological order:

1. The user issues the command get abc.doc.
2. The FTP server begins sending abc.doc. Every 100K bytes, it inserts a restart marker with a byte count and time stamp.
3. The FTP client receives abc.doc. Whenever it comes across a restart marker, it updates a transfer log with the number of bytes transferred and the remote file's time stamp. In addition, the transfer log contains the local file's time stamp. Assuming the FTP server does not have an exclusive lock on abc.doc, it is possible that abc.doc is modified even when no system failure takes place. Hence, the FTP client can compare two successive time stamps to ensure that there is no loss of data integrity during the file transfer. If the time stamps don't match, the client aborts the transfer and informs the FTP server. Otherwise, it continues.
4. After a system failure, the FTP client compares the local time stamp in the transfer log with the current time stamp of abc.doc locally. If there is a mismatch, it does not proceed with error recovery.
5. The FTP client reissues the request with the logged byte count and time stamp: get abc.doc 3213517 013196 / 142301
6. The FTP server compares the time stamp in the request with the current time stamp of abc.doc. If the time stamps match, it moves the file pointer to an offset equivalent to the byte count and continues the download from that point.

Note that a transfer log is maintained on the client end in the scheme shown above. This transfer log may be implemented as a simple file whose records have the following structure:
    typedef struct {
        char *filename;         // should include path (if any)
        long bytestransferred;  // bytes transferred
        TIMESTAMP rt;           // last server file modification time stamp
        TIMESTAMP ct;           // last client file modification time stamp
    } LOGSTRUCT;
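The client-side bookkeeping in the time line could be built on this record roughly as follows. This is a sketch, not the article's listing: `TIMESTAMP` is assumed to be `time_t`, the record is repeated so the block stands alone, and `on_restart_marker` and `can_recover` are hypothetical names. The caller is assumed to supply the local file's modification time, for example from `stat()`.

```c
#include <assert.h>
#include <time.h>

typedef time_t TIMESTAMP; /* assumption: the article does not define TIMESTAMP */

typedef struct {
    char *filename;        /* should include path (if any) */
    long bytestransferred; /* bytes transferred */
    TIMESTAMP rt;          /* last server file modification time stamp */
    TIMESTAMP ct;          /* last client file modification time stamp */
} LOGSTRUCT;

/* Called each time a restart marker arrives during the download.
 * Returns 1 to continue, or 0 if the remote time stamp differs from the
 * one in the previous marker, meaning the transfer should be aborted. */
int on_restart_marker(LOGSTRUCT *log, long marker_bytes,
                      TIMESTAMP marker_rt, TIMESTAMP local_mtime)
{
    assert(log != NULL);
    /* bytestransferred == 0 means no earlier marker has been logged. */
    if (log->bytestransferred > 0 && log->rt != marker_rt)
        return 0;
    log->bytestransferred = marker_bytes;
    log->rt = marker_rt;
    log->ct = local_mtime;
    return 1;
}

/* Called after a failure, before asking the server to restart.
 * Returns 1 if the local file still carries the logged time stamp. */
int can_recover(const LOGSTRUCT *log, TIMESTAMP current_local_mtime)
{
    assert(log != NULL);
    return log->ct == current_local_mtime;
}
```

Keeping the comparison in one place means the same log record drives both the integrity check during the transfer and the decision to attempt recovery afterwards.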
Listing 1A presents some pseudo-code for implementing the FTP protocol discussed above in the client, and Listing 1B does the same for the server. These algorithms are presented at a high level, and interested readers should refer to Reference 4 for more details. All functions starting with the prefix "svr" are server functions and would be called from the client via RPCs, although I have omitted the RPC details here.
It is apparent that error recovery and restart are essential in implementations of the File Transfer Protocol. However, software vendors and the industry in general must cooperate to reach a consensus on the format of a restart marker. In this article, I have proposed a restart marker format that I believe furthers the cause of improving FTP.
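As a rough illustration of the server half, independent of the listings, the sketch below parses a restart request of the form shown in the time line and positions the file for the remainder of the transfer. The `RestartRequest` type and both function names are hypothetical. The two six-digit fields are treated as opaque date and time strings, and the file's current time stamp is assumed to be supplied by the caller in the same form.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical holder for a parsed restart request. */
typedef struct {
    char filename[256];
    long offset;   /* byte count at which to resume */
    char date[7];  /* six-digit date field, kept as text */
    char time[7];  /* six-digit time field, kept as text */
} RestartRequest;

/* Parse a line such as "get abc.doc 3213517 013196 / 142301".
 * Returns 0 on success, -1 if the line does not have that shape. */
int parse_restart_request(const char *line, RestartRequest *req)
{
    if (line == NULL || req == NULL)
        return -1;
    if (sscanf(line, "get %255s %ld %6s / %6s",
               req->filename, &req->offset, req->date, req->time) != 4)
        return -1;
    if (req->offset < 0 || strlen(req->date) != 6 || strlen(req->time) != 6)
        return -1;
    return 0;
}

/* Open the file positioned at the requested offset, but only if the time
 * stamp in the request matches the file's current one. Returns NULL when
 * the file has changed or cannot be opened or positioned. */
FILE *svr_open_for_restart(const RestartRequest *req,
                           const char *current_date, const char *current_time)
{
    FILE *fp;

    assert(req != NULL && current_date != NULL && current_time != NULL);
    if (strcmp(req->date, current_date) != 0 ||
        strcmp(req->time, current_time) != 0)
        return NULL; /* file modified since the marker: refuse recovery */
    fp = fopen(req->filename, "rb");
    if (fp == NULL)
        return NULL;
    if (fseek(fp, req->offset, SEEK_SET) != 0) {
        fclose(fp);
        return NULL;
    }
    return fp;
}
```

A server loop would call `parse_restart_request` on the incoming command, look up the file's current time stamp, and stream from the returned `FILE *` if it is not NULL.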