Adding a new function to Simple Parallel R INTerface (SPRINT)
=============================================================

SPRINT is a framework for making parallel algorithms available to R users.
It is designed to be relatively easy to extend.

SPRINT is made up of two components: the R<->SPRINT interface and the
compute cluster itself. The two communicate via MPI.

         Processor 0                   Processors 1-n
    +-------------------+        +-------------------------+
    | +---------------+ |        | +---------------------+ |
    | |       R       | |        | |          R          | |
    | |               | |        | |                     | |
    | +---------------+ |        | +---------------------+ |
    | | SPRINT-R stub | |        | |  Wait for cmd code  | |
    | +--------------------------------------------------+ |
    | |                      SPRINT                      | |
    | |   +------------------------------------------+   | |
    | |   |                  ptest                   |   | |
    | |   |                   pcor                   |   | |
    | |   +------------------------------------------+   | |
    | +--------------------------------------------------+ |
    +-------------------+        +-------------------------+

When the user starts R with SPRINT loaded, the compute cluster goes into
a wait state. When R reaches a function that is in SPRINT, the SPRINT-R stub
sends a command to SPRINT via MPI. The MPI message contains an enumeration
code that identifies a function; receiving it wakes SPRINT up and makes it
execute that function.

The idea behind SPRINT is to allow parallel processing of data from within R
without being restricted by R. Functions created in SPRINT should have
interfaces similar to those of their serial R equivalents.

Data required by the parallelised function is also passed via MPI. The
creator of the function is responsible for implementing that data flow.

Afterwards, the function is also responsible for passing data back to R.
This does not have to be the result of the processing; it could be a file
handle or a simple error code.
However, bear in mind that the parallelised function should match the
functionality of the original R function as closely as possible.

This document develops an example SPRINT function, which requires:
  o creating an R stub (so that R can call the function)
  o the C equivalent of the R stub
  o the function to run in the compute cluster
  o finally, connecting the different parts

SPRINT is organised into the following directory structure:

    /  -- root dir      Contains configure scripts, etc.
    |
    |- exec
    |- inst
    |- man
    |- po               Contains translations (not used).
    |- R                Contains the R stubs (see 1. below).
    |- src              Function header files, the source code for the
    |   |               R<->SPRINT interface (also called sprint!) and
    |   |               sprint itself.
    |   |- sprint       The source code of the compute farm executable, sprint.
    |   |- algorithms   Where you place your new functions, one directory each.
    |       |- pcor
    |       |   |- implementation
    |       |   |- interface
    |       |- ptest
    |       |   |- implementation
    |       |   |- interface
    |       |- pFunction
    |           |- implementation
    |           |- interface
    |- tests

1. Create a Stub in R
---------------------

Add a file in the "R" directory that performs any appropriate actions in the
R domain and then calls the underlying C. Appropriate actions may include
parameter sanity checking and other housekeeping.

The example starts with a single R function that calls the C back end
through .Call:

--- R/pexample.R --------------------------------------------------------------
pexample <- function() {
  .Call("pexample")
}
-------------------------------------------------------------------------------

2. Add the interface function
-----------------------------

These are the C functions called by the R stubs. Like the R stubs they
mirror, they are likely to perform argument checking and general
housekeeping.
For each new function, create a directory in the "algorithms" directory and
add the subdirectories "implementation" and "interface". The interface
function lives in the "interface" directory; most interface functions need
little editing beyond the command code.

--- src/algorithms/pexample/interface/pexample.c ------------------------------
#include <mpi.h>
#include <Rdefines.h>
#include "../sprint.h"
#include "pexample.h"

// Note that all data from R is of type SEXP
SEXP pexample(void) {
  int response;
  enum commandCodes commandCode;
  int message = 10;

  MPI_Initialized(&response);
  if (response) {
    DEBUG("MPI is init'ed in pexample\n");
  } else {
    DEBUG("MPI is NOT init'ed in pexample\n");
  }

  // Broadcast the command code to the other processors; EXAMPLE must be
  // the name used in enum commandCodes (src/functions.h)
  commandCode = EXAMPLE;
  MPI_Bcast(&commandCode, 1, MPI_INT, 0, MPI_COMM_WORLD);

  // Run this processor's share of the work
  response = example(1, message);

  // Return the C int to R as a length-1 integer vector
  return ScalarInteger(response);
}
-------------------------------------------------------------------------------

--- pexample.h ----------------------------------------------------------------
#ifndef _INTERFACE_PEXAMPLE_H
#define _INTERFACE_PEXAMPLE_H

// anything you want

#endif
-------------------------------------------------------------------------------

3. Implement the main function
------------------------------

These functions, again written in C, implement the actual parallel algorithm
you are interested in performing. They use MPI for communication and do the
useful work. They run in the compute cluster.

This file can be saved in the "implementation" directory for your function.
--- src/algorithms/pexample/implementation/example.c --------------------------
#include <stdarg.h>
#include <stdio.h>
#include "mpi.h"

/* Called as example(n, ...) to match the look-up table; on processor 0
 * the interface passes n = 1 followed by the message. */
int example(int n, ...) {
  int pool, rank;
  int message = 0;
  va_list ap;

  // Unpack the variadic argument, if one was passed
  va_start(ap, n);
  if (n > 0)
    message = va_arg(ap, int);
  va_end(ap);

  MPI_Comm_size(MPI_COMM_WORLD, &pool);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);

  // Only processor 0 received the message from R, so share it with the rest
  MPI_Bcast(&message, 1, MPI_INT, 0, MPI_COMM_WORLD);

  // LOG is SPRINT's logging macro, not part of MPI or the C library
  LOG(stdout, "Process %i of %i says '%d'\n", rank, pool, message);

  return 0;
}
-------------------------------------------------------------------------------

4. Connecting the R/C stubs to the compute cluster
--------------------------------------------------

Declare the new functions and include them in the command code list.

--- src/functions.h -----------------------------------------------------------
/** Lists all the functions available, ensure that TERMINATE is first and
 *  LAST is last **/
enum commandCodes {TERMINATE = 0, TEST, EXAMPLE, LAST};
-------------------------------------------------------------------------------

Then add them to the look-up table in functions.c. These are extern functions
with a variable number of arguments, and they must be added to the look-up
table in the *same order* as the enumeration.

--- src/common/algorithms/functions.c -----------------------------------------
#include <stdio.h>
#include "../functions.h"

/**
 * Declare the various command functions as external
 **/
extern int test(int n, ...);
extern int example(int n, ...);

/**
 * This is a dummy operation which can be used where a command code exists
 * but does not represent a useful function. It takes the same variadic
 * signature as the other entries so it can be stored in the table.
 **/
int voidCommand(int n, ...) {
  printf("Void command called, I would not expect this.\n");
  return 1;
}

/**
 * This array of function pointers ties up with the commandCode enumeration.
 **/
commandFunction commandLUT[] = {voidCommand,  /* TERMINATE */
                                test,         /* TEST      */
                                example,      /* EXAMPLE   */
                                voidCommand}; /* LAST      */
-------------------------------------------------------------------------------

Update the NAMESPACE file so R can find the new functions:

--- NAMESPACE -----------------------------------------------------------------
# Namespace file for sprint
useDynLib(sprint)
export(ptest)
export(pexample)
-------------------------------------------------------------------------------

Finally, add the new object files to the Makefile, as is done for the
existing functions.