GPU : A Giga@lobal Processing Unit ?

D escription of the virtual machine inside the node

Our application effects a virtual machine for interpreting computational commands. Once a command is typed and the ‘Compute locally’ button is clicked or once an incoming job is caught from network, the virtual machine interprets it. (See the screenshots section to get an idea)

As an example, let us look at the simple "1,3,add" command. The virtual machine reads arguments each separated by a comma. If the argument is a number (like "1" and "3"), it is loaded on the stack. If the argument is a command (like "add"), the virtual machine searches in all dynamic link libraries in the directory plugins for a method called “add”. Then the virtual machine calls that function, the argumets for the functions are popped and the result is pushed onto the stack. Normally most of the functions just crunch some parameters and give one parameter back. Look at another more complicated example:
Example: 3,1,add,7,mul
Here is the behaviour of the virtual machine, step by step :

  1. Stack : empty
    String : 3,1,add,7,mul
  2. Stack : 3
    String : 1,add,7,mul
    Here the number 3 is loaded on the stack
  3. Stack : 3,1
    String : add,7,mul
    One is loaded on the stack, too
  4. Stack : 4
    String : 7,mul
    The command add first pops the two parameters from the stack. Then it computes 3+1, the sum of the two parameters. The result, 4, is pushed on the stack
  5. Stack : 4,7
    String : mul
    7 is pushed on stack
  6. Stack : 28
    String : empty
    4 and 7 are pooped. The final multiplication is computed and the result 28 is pushed. The empty string signalises that we are finished.

Computation using interpreted strings is slow way. In fact the virtual machine is about 500 times slower than compiled code. Why 500 times? Because about 500 assembler commands are used to find out where the function with that command name lies in the plugins. The virtual machine is more intended for small adjustments to the parameters. Big tasks should be done by plugins that implement the whole algorithm. These algorithms are compiled in the dynamic link library. They run maximum speed, the speed of machine compiled code.

A nice example is the following :
compare the speed of :
with the speed of

Both commands do the same: they compute pi using the Monte Carlo method (described in Appendix A). However, the first one leaves the whole computation to the virtual machine, while the second one only uses the virtual machine to find the appropriate compiled code among the plugins (in this case, the source code for the function pi is in greekpi.dll).

On a 1GHz Pentium III the first command takes about 40 seconds. The second command takes about 8 seconds. However the second command uses a sample of 100’000’000 while the first only one million. So, 40 seconds for 1’000’000 throws and 8 second for 100 millions throws shows that if we rely on the virtual machine, we are 500 times slower.

Earlier versions of the application presented here were only 20 times slower when using the virtual machine before the plugin mechanism was implemented.

Another limit of the virtual machine can be seen from the example : the virtual machine has a sample limit of 1’000’000 because the command "rpt" recursively calls the procedure used in its computation; every time a sample datum is executed more space is reserved in memory. Once the memory stack is full, Out of Memory errors occur.

Implementing a new plugin in the application framework

As stated in the previous section, the virtual machine is not suitable for fast computations. However our application offers a framework where plugins can be installed. The plugins are compiled code; so there are no performance losses caused by the virtual machine, except the initial overhead to search for requested functions among the plugins.

Physically, plugins are dynamic link libraries in the subdirectory /plugins of the application. They are loaded when the application is created.

Dynamic link libraries are a collection of procedures and functions (methods in C-terminology). So a plugin, as a collection of functions, can implement more than one command which can be directly typed in as commands in the "Computing" page of the GPU window (GPU is the name of our application); e.g the plugin basic.dll exports the functions add, mul, sqrt and many more...

The source code of a plugin is split in two files. One is a main file specifying which functions have to be exported (only these functions can be used from the main application GPU, all others stay hidden). The other file contains the source code of the functions.

As example of the main file, here the main file basic.dpr, which is part of the plugin basic.dll :

library basic;

uses basicdll in 'basicdll.pas';

{$R *.RES}
exports add;
exports sub;
exports mul;
exports dvd;

exports sqrt;

{... nine other functions are exported here}


As we said, the main file specifies the functions to be exported. The file where the source code of these functions is defined is specified in the "uses" clause. In the example the file for the plugin basic is called basicdll, and is found in the same directory of basic.dpr with name basicdll.pas
Now take a closer look at the file basicdll.pas :

unit basicdll;

  uses CommonConst;

  {here we define the signature of the functions}
  function add(var stk : TStack):Boolean;stdcall;
  function sub(var stk : TStack):Boolean;stdcall;
  function mul(var stk : TStack):Boolean;stdcall;
  function dvd(var stk : TStack):Boolean;stdcall;

  function sqrt(var stk : TStack):Boolean;stdcall;
  {here nine other functions are defined}


{here the source code for the functions is declared}

Between the keywords interface and implementation, we define the signature of the functions. To avoid nasty runtime errors, it is important that exported functions use this signature :

function tobeexported(var stk : TStack) : Boolean;stdcall;

The function name in this case is "tobeexported". A strange parameter of the type Tstack is passed to the function (this is explained below) and the function result is a boolean type (true or false). By definition, a function returns true if the computation could be done without any problems, and false is something went wrong: for example if the parameter Tstack was incorrect.

The keyword "stdcall;" means the function is a "standard call" and informs Delphi [14,15] that the function is a function in a library, which will be loaded at runtime. For these functions the address cannot be known a priori in the linking process.

Here a translation in C of the signature:

bool tobeexported(Tstack stk);

Here are some important things about the strange parameter of type Tstack :
Parameters are passed to the function with a record of type Tstack (a record is a struct in C-terminology). The record fields of TStack are defined in the file commonconst.pas. Here the record definition, as it is in commonconst.pas :

  type TStack = Record
    Stack : Array [1..MAXSTACK] of Extended;
    StIdx : LongInt; {Index on Stack where Operations take place}
                      StIdx cannot be more than MAXSTACK}

Translated to C, this would look like :   struct stack {
    float Stack[];
    int StIdx;
  } TStack

The field Stack is an array of floating point numbers, with size MAXSTACK, which is a constant defined in commonconst.pas.
The field StIdx is actual index in the array Stack, and can be viewed as an array pointer.

Imagine now that someone calls the function add, using the GPU command

The main GPU application will first create and fill a record of type Tstack like this :

Parameters, record of type Tstack, containing the following :
    Stack = [3,7, not defined, not defined, not defined, not defined, ...]
    StIdx = 2

Note that the StIdx points always to the last defined value in the array and that arrays in Delphi begin with index 1 and not 0, unlike C.
GPU will search in all plugins in the /plugins directory for a function called add. GPU will find it in the dynamic link library basic.dll. GPU will call add and pass the record Parameters by reference (in Delphi values passed by reference are marked by the keyword var). By reference (or "call by address") means that the function works with the same record as the main application, changes are visible for both. This is the opposite behaviour to "call by value": there the function works with a copy of the record, and changes will be lost when the function returns.

Let us take a closer look at the internal details of the simple function add.
The definition is still in the file basicdll.pas and follows the keyword implementation.

function add(var stk : TStack):Boolean;stdcall;
      Idx : Integer;
  Result  := False;
  Idx     := stk.StIdx; {this should make the code more readable}

  {check if enough parameters}
  if Idx < 2 then Exit;

      Stk.Stack[Idx-1] := Stk.Stack[Idx-1] + Stk.Stack[Idx];

  {never forget to set the StIdx right at the end}
  stk.StIdx := Idx-1;
  Result := True;

Here we comment line by line what happens 1) Result := False;
Here we set the boolean function result as False. If something goes wrong, we will not reach the end of the function and the result will stay False. This will display an error in the GPU computing window.

2) Idx := stk.StIdx;
For better readability of the code we load the stack pointer in a local variable (which is 2 in the example)

3) if Idx < 2 then Exit;
Here we check that there are enough parameters. We can only add two numbers, the function needs at least two parameters. Notice that if there are not enough parameters, we just exit the function, and the result of the function stays false.

4) Stk.Stack[Idx-1] := Stk.Stack[Idx-1] + Stk.Stack[Idx];
Finally we sum up the values in the array and we store the result at the position Idx-1
The value at the position Idx is not used anymore.
Notice also that eventually there may be exceptions, e.g. in the rare case that numbers are too big. Again, we won’t reach the end of the function and the result stays False.
5) stk.StIdx := Idx-1;
  Result    := True;

These are the clean-up steps, we move down the stack pointer, and we set the result of the function to true, because all is well which ends well.

The control goes now back to GPU and the record Parameters looks like this:
Parameters, contains at the end following:
Stack = [10,7, not defined, not defined, not defined, not defined, ...]
StIdx = 1

It is important to know that the second parameter (7) was not erased. It is still there, but it is not used anymore, because by definition functions accept parameters only between 1 and StIdx. We "erased" the second parameter at step 5) when we decreased StIdx by one.

GPU displays then all values between 1 and StIdx in the computing window, separated by comma.

The logo was designed by Mark Grady. Graphics are by David A. Lucas.
© 2002-2003 the GPU Development team

Page generated in 1.07 milliseconds by PHP 4.4.3-dev on