Data corruption caused by noise is a common problem when transferring files from one computer to another over telephone lines. A major source of noise for data transmission is impulse noise. These pulses or spikes on the line typically have a duration of 10 msec. To the human ear, such pulses sound like little clicks. To a 1200 baud modem, they are the death knell for 12 bits of information.

To combat the problem of data corruption during file transmission, file transfer programs which use error checking protocols have been devised. Typically, a program on the sending end examines a block of data being sent and adds various error checking characters to the data according to the error checking protocol. A similar program on the receiving end examines the received data, determines what the error checking characters should be, and compares the expected values with the received values. If the two agree, the block of data is accepted as valid. If the two don't agree, the block of data is deemed corrupted and retransmission of the data is requested.

The XMODEM file transfer program currently available on the GRC-VAX cluster supports two types of standard XMODEM error checking protocols. These are checksum (also known as longitudinal parity), and cyclic redundancy code (also known as polynomial code). The type of error checking used is determined by the receiving program when it first tells the sending program that it is ready to receive a file. If the receiving program transmits a NAK character (ASCII 21), it is requesting checksum error checking. If the receiving program transmits a "C" (ASCII 67), it is requesting cyclic redundancy code error checking.

In order to understand the two different error checking protocols, it is first necessary to understand how XMODEM transfers files. XMODEM transfers files in blocks consisting of 128 data characters each. Added to each block are block identification characters and error checking characters. The total number of characters in a block total 132 if checksum error checking is used, and 133 if CRC error checking is used.

The actual placement of characters within a block is as follows:

     1    SOH (start of header, ASCII 1)
     2    block number
     3    complement of block number
     4    DATA (1st character)
     .    .
     .    .
     .    .
   131    DATA (128th character)
   132    error checking byte  (Checksum and CRC mode)
   133    error checking byte  (CRC mode only)

The determination of the error checking bytes depends upon which error checking protocol is being used. In checksum mode, the ascii values of each of the 128 data characters are added up and the low byte of the resulting sum is used as the error checking character. This method will detect most errors, but it is possible that two or more of the data characters may be corrupted in such a way that their checksum remains the same. For example, the characters "BB" will add up to the same value if they are inadvertently changed to "AC". Because of this, a more sophisticated method of error checking is usually preferred.

In CRC mode, the 128 data characters are treated as a continuous stream of 1028 bits. Two error checking bytes (16 more bits) are appended to the end of this stream, and the resulting set of 1040 bits is thought of as one very large binary number. The last 16 error checking bits are chosen in such a way that the resulting binary number will be evenly divisible by a previously agreed upon binary constant (sometimes called a generating polynomial). If any one of the 1040 bits of information become corrupted during transmission, the large binary number they represent will no longer be evenly divisible by the previously agreed upon binary constant.

By carefully choosing the binary constant, it is possible to drastically minimize the number of errors which go undetected. XMODEM uses a binary constant value of 10001000000100001 (equivalent to a generating polynomial of X^16 + X^12 + X^5 + 1), which is a standard value recommended for use by CCITT (Comite Consultatif Internationale de Telegraphique et Telephonique). By using this value it is possible to detect all single and double bit errors, all errors involving an odd number of bits, all burst errors of length 16 or less, 99.997% of 17-bit error bursts, and 99.998% of 18-bit and longer bursts.

The mathematical theories of polynomial arithmetic lie behind the overwhelming success of this type of error checking. Although the calculation required to compute the last 16 error checking bits may seem complicated, it turns out that there is a simple method using shift registers which makes the method easily applicable and highly attractive for computer use.

-David Deley (1984)


Belonis, J.James II. XMODEM version 5.53 (FORTRAN source code) DECUS tape VAX-96, June 1984.

Housley, Trevor. _Data Communications and Teleprocessing Systems_. Prentice-Hall, 1979.

Peterson, W.W., and D.T. Brown.: "Cyclic Codes for Error Detection". _Proc. IRE_, vol. 49, pp. 228-235, Jan. 1961.

Puzman, Josef., and Radoslav Porizek. _Communication Control in Computer Networks_. John Wiley & Sons, 1980.

Tanenbaum, Andrew S. _Computer Networks_. Prentice-Hall, 1981.