PAGE FAULTS AND ARRAY ADDRESSING

Understanding what a page fault is can sometimes be crucial to writing efficient code. Below are two examples of a FORTRAN nested Do loop which sets every element of an array A to the value X. Both programs achieve exactly the same result; the only difference between the two is the order in which the array elements are referenced. The example on the left loops with the first index I changing most often, while the example on the right loops with the last index K changing most often. While the two examples appear to be nearly identical, a test run of the two programs reveals a startling difference between the two -- the example on the left took 45 seconds to run using 30 seconds of CPU time, while the example on the right took 1 1/2 hours to run using 17 minutes of CPU time!

       DIMENSION A(100, 100, 100)          DIMENSION A(100, 100, 100)
       X = 1.0                             X = 1.0
       DO 10, K=1, 100                     DO 10, I=1, 100
       DO 10, J=1, 100                     DO 10, J=1, 100
       DO 10, I=1, 100                     DO 10, K=1, 100
10     A(I,J,K) = X                  10    A(I,J,K) = X
       END                                 END

In order to understand the reason for this difference, it is necessary to briefly look at the internal memory structure of the VAX computer.

VAX/VMS stands for Virtual Address Extension / Virtual Memory System. Virtual memory is the method used by VAX computers to extend their addressable range of memory. Simply put, the computer "pretends" it has more memory than it actually does. This greatly increases the capacity and usefulness of the computer.

The difference between virtual memory and real memory is a simple but important concept to understand. Virtual memory is the memory a computer "appears" to have to a program running on the computer, whereas real memory is the memory the computer actually has. Real memory resides inside the computer and is the only memory directly accessible by the computer's central processing unit. Program code must reside in real memory in order to become executable, and program data must reside in real memory in order to be read or modified.

Since a VAX computer usually has only a few megabytes of real memory, while a program running on the computer might use up to four gigabytes of memory, we are presented with what looks like an impossible task. How does one squeeze a four gigabyte program into a few megabytes of memory? The observation which makes it possible is this: not all of the program code and data needs to reside in real memory at the same time; only that portion of code and data that is actually being used at the moment needs to reside in real memory. The rest of the program code and data can be stored somewhere else until it is needed.

In order to optimize the use of real memory, a program and its data are broken up into segments called pages. Each page is 512 bytes long, which is equivalent to 128 storage locations for a data array. The pages of code and data that are actually being used at the time reside in real memory, while the rest of the program and data is stored on disk in a special place called the virtual memory page file.

As the program continues to execute, it eventually comes to a place where it attempts to reference a part of the program or data that isn't in real memory. When that happens, program execution stops, and the operating system issues a "page fault." The page of code or data that is needed is copied from the virtual memory page file into real memory, the page fault is cleared, and program execution resumes. Page faults are transparent to the user, but they slow down program execution.

With this in mind we are now ready to examine why one method of array addressing is considerably more efficient than another. It has to do with the way FORTRAN stores arrays in memory. FORTRAN always stores data arrays in memory with the leftmost subscripts varying most rapidly. The array is broken up into pages, with each page containing 128 elements of the array. Thus, the array used in both programs above is stored as follows:

  ----------
  A(1,1,1)
  A(2,1,1)     PAGE 1 of data array
  A(3,1,1)
  A(4,1,1)
   .
   .
  A(28,2,1)
  ----------
  A(29,2,1)
   .           PAGE 2 of data array
  A(56,3,1)
  ----------
  A(57,3,1)
   .           PAGE 3 of data array
  A(84,4,1)
  ----------
  A(85,4,1)
   .           PAGE 4 of data array
  A(12,6,1)
  ----------
  A(13,6,1)
   .           PAGE 5 of data array
   .

The array used in both programs above takes 7813 pages to store. At the beginning of program execution, each page of data is stored in the virtual memory page file ready to be read into real memory when data on that page is needed. We now examine in detail how each program above references the array A.

The example FORTRAN program above on the left references each element of the array A in the same order they are actually stored in memory. First the element A(1,1,1) is referenced, and then the element A(2,1,1) is referenced. When a page of the array is read into real memory, each element of the array contained on that page is referenced until the end of the page is reached, at which point the next page of the array is read in. This program is efficient, since each array element that is referenced is always near the previous array element that was referenced. Only 8051 page faults occurred when running this program, a relatively low number.

In contrast, the FORTRAN program above on the right does not reference elements of the array in a natural order. First the element A(1,1,1) is referenced, and then the element A(1,1,2) is referenced, which is 10,000 memory locations away. In general, whenever a page of the array is read into real memory, only one array element contained in that page is referenced. The next element referenced is always a considerable distance away from the previous element, and hence contained in a different page of data. Unless that new page of data already exists in real memory, program execution will be forced to pause while the page of the array containing the next element is read in from the virtual memory page file. This takes time, and therefore this program is slow and inefficient. Whereas the efficient program caused only 8051 page faults, this program caused 312631 page faults; nearly 40 times as many.

Thus we can now see the real difference between the two programs. The above example on the right caused a large number of page faults due to its inefficient method of addressing the array, whereas the example on the left caused a small number of page faults because it addressed the array in a natural order.

One last word of warning. While FORTRAN stores arrays with the first subscript changing most often, other programming languages may not follow the same convention. For instance, PASCAL stores arrays with the last subscript changing most often, the opposite convention to that of FORTRAN.

-David Deley (1984)

Update (2012)

This is still true today. Try the following C# code:

using System;
using System.Text;
using System.Diagnostics;

namespace SpeedTest
{
  class Program
  {
    static void Main(string[] args)
    {
      int[, ,] a = new int[600, 600, 600];

      Stopwatch stopwatch = new Stopwatch();
      Console.WriteLine("Begin");
      stopwatch.Start();

      for (int i = 0; i < 600; i++)
        for (int j = 0; j < 600; j++)
          for (int k = 0; k < 600; k++)
            //a[i, j, k] = 1;    //natural memory order
              a[k, j, i] = 1;    //jumping around a lot

      stopwatch.Stop();
      TimeSpan ts = stopwatch.Elapsed;
      string elapsedTime = String.Format("{0:00}:{1:00}:{2:00}.{3:00}",
    ts.Hours, ts.Minutes, ts.Seconds,
    ts.Milliseconds / 10);
      Console.WriteLine(elapsedTime);
      Console.ReadLine();
    }
  }

Running the above using "Natural Memory Order" took 2 seconds.

Running the above using "Jumping Around A Lot" took 15 seconds.

An article in the January 2012 IEEE Computer magazine compares randomly inserting and deleting elements in a 500,000 element array vs. a linked list. To insert or delete into an array you have to shift all the elements past the insertion point all the way to the end of the list, which sounds like a whole lot of intense time consuming work, but it turns out this is 20 times faster than inserting and deleting elements in a linked list!

— Bjarne Stroustrup, "Software Development for Infrastructure," Computer, vol. 45, no. 1, pp. 47-58, Jan. 2012, doi:10.1109/MC.2011.353

(Bjarne Stroustrup was the inventor of the C++ Programming Language.)

Back