Part II - GPGPU Applied

⇤ Intro | ← Basics | You Are Here | → Improvements | → Reflections | ⇥ Resources

__Part II - Practical Applications:__ In the sections that follow, I start with basic computer setup, followed by emulation and debugging information. Included is a basic application of programming in CUDA.

Computer Setup:
One compiles .cu files via NVIDIA's compiler (nvcc) with the following command:

code format="c"
nvcc <filename>.cu -o <executable>
code

Make sure to set up the bash shell profile (.bash_profile or simply .profile, depending on the version) to include the correct paths:

code format="c"
export PATH=/usr/local/cuda/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda/lib   # or lib64 if using a 64-bit machine
code

I updated the NVCC compiler profile (nvcc.profile) as well, adding the lib64 folder as a library location.
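For reference, the change amounted to something like the following. This is a hypothetical excerpt: the variable names follow the CUDA 3.x nvcc.profile layout, and the exact contents vary by toolkit version.

```
# nvcc.profile excerpt (hypothetical; $(TOP) is the toolkit root)
LD_LIBRARY_PATH += $(TOP)/lib64:$(TOP)/lib:
LIBRARIES =+ $(_SPACE_) "-L$(TOP)/lib64" "-L$(TOP)/lib" -lcudart
```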

A CUDA Version of Hello World? Not Yet:
Of course, it would be imprudent and practically social taboo to start coding without a "Hello World." This proved to be much more difficult than I thought. While I successfully compiled a program that stored arrays of characters, I did NOT successfully print "Hello World!" from the device.

code format="c"
#include <stdio.h>
#include <stdlib.h>
#include <assert.h>
#include <cuda.h>

// Main method
int main(void)
{
    // set up string on host
    char desired[32] = "Hello World!";
    char *desPtr;               // pointer to host memory
    desPtr = desired;
    printf("Desired string is %s\n", desired);

    // allocate memory on host
    int N = 8;
    char *z_h;                  // pointer to host memory
    z_h = (char *)malloc(sizeof(char)*N);

    // initialize host data
    int i;
    for (i = 0; i < N; i++) {
        if (desPtr[i] != '\0') {    // index the string; *desPtr alone never advances
            z_h[i] = desired[i];
        }
    }

    // allocate memory on device
    char *z_d;                  // pointer to device memory
    cudaMalloc((void **) &z_d, sizeof(char)*N);

    // send data from host to device: z_h to z_d
    cudaMemcpy(z_d, z_h, sizeof(char)*N, cudaMemcpyHostToDevice);

    // retrieve data from device: z_d to z_h
    cudaMemcpy(z_h, z_d, sizeof(char)*N, cudaMemcpyDeviceToHost);

    // check result against the source string; a device pointer such as
    // z_d must NOT be dereferenced on the host
    for (i = 0; i < N; i++) {
        assert(z_h[i] == desired[i]);
    }

    // cleanup
    free(z_h);
    cudaFree(z_d);
    return 0;
}
code

Emulation:
Due to my current lack of a supported driver, I intended to run the debugger in emulation mode, which supposedly can be run by anybody, with or without a CUDA-enabled device. According to the CUDA programming guide, this is done by using the -deviceemu option during compilation. In fact, the CUDA programming guide specifically calls out the emulation mode as an excellent debugger for several reasons:
 * By using the host's native debugging support, programmers can use all features that the debugger supports, like setting breakpoints and inspecting data.
 * Since device code is compiled to run on the host, the code can be augmented with code that cannot run on the device, like input and output operations to files or to the screen (printf, etc.).
 * Since all data resides on the host, any device- or host-specific data can be read from either device or host code; similarly, any device or host function can be called from either device or host code.
 * In case of incorrect usage of the synchronization intrinsic function, the runtime detects deadlock situations.

Beware, there are also a few caveats if one is to run in emulation mode. According to the guide, programmers must ensure:
 * "The host is able to run up to the maximum number of threads per block, plus one for the master thread.
 * There is enough memory available to run all threads, knowing that each thread gets 256 KB of stack."

In addition, because the device is emulated (NOT simulated), the following errors are difficult to find:
 * race conditions
 * failure from dereferencing a pointer on the wrong platform (i.e., on host or device when it should be the other one)
 * variations in floating point computations (may differ dramatically)
 * any error that would be created when warp size is greater than one (in emulation, warp size = 1)
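The second pitfall above is easy to reproduce. A minimal sketch (hypothetical code): under emulation all pointers live in host memory, so this "works" there, but on real hardware it crashes or reads garbage.

```
#include <cuda.h>

int main(void)
{
    int *z_d;                                 // pointer to device memory
    cudaMalloc((void **) &z_d, sizeof(int));
    int bad = *z_d;   // INVALID: dereferencing a device pointer on the host
    cudaFree(z_d);
    return bad;
}
```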

In order to use emulation mode, I used the following commands:

code format="c"
~$ cd ./workspace/CUDAHello/
~$ nvcc helloWorld.cu -o helloWorld -deviceemu
code

...but was met with the following (!):

code format="c"
NOTE: device emulation mode is deprecated in this release and will be removed in a future release.
code

Well, at least emulation in a previous release sounded cool... Apparently, the updated version of the CUDA SDK handles emulation somehow, but this is not documented in the programming guide. As it turns out, even without the right drivers I can compile all the same, with no specification of emulation mode, and voila: a binary is created.

Debugging with CUDA-GDB:
Since debugging did not work as advertised with emulation, I then used the debugging-specific tool named CUDA-GDB. Luckily, the CUDA Toolkit includes the CUDA-GDB user manual, which has a very convenient chapter called "CUDA-GDB Walkthrough." Of course, I went straight to that chapter.

Of course, I had to compile the code correctly with nvcc before the debugger would touch it, so I used the following commands:

code format="c"
$ nvcc -g -G basicHello.cu -o basicHello
$ cuda-gdb basicHello
NVIDIA (R) CUDA Debugger
3.0 release
Portions Copyright (C) 2008,2009,2010 NVIDIA Corporation
GNU gdb 6.6
Copyright (C) 2006 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "x86_64-unknown-linux-gnu"...
Using host libthread_db library "/lib/libthread_db.so.1".
(cuda-gdb)
code

Thankfully, within cuda-gdb a programmer can call the help command, which is good to know if the user manual is not nearby:

code format="c"
(cuda-gdb) help
List of classes of commands:

aliases -- Aliases of other commands
breakpoints -- Making program stop at certain points
data -- Examining data
files -- Specifying and examining files
internals -- Maintenance commands
obscure -- Obscure features
running -- Running the program
stack -- Examining the stack
status -- Status inquiries
support -- Support facilities
tracepoints -- Tracing of program execution without stopping the program
user-defined -- User-defined commands

Type "help" followed by a class name for a list of commands in that class.
Type "help all" for the list of all commands.
Type "help" followed by command name for full documentation.
Type "apropos word" to search for commands related to "word".
Command name abbreviations are allowed if unambiguous.
(cuda-gdb) help running
Running the program.

List of commands:

advance -- Continue the program up to the given location (same form as args for break command)
attach -- Attach to a process or file outside of GDB
continue -- Continue program being debugged
detach -- Detach a process or file previously attached
detach checkpoint -- Detach from a fork/checkpoint (experimental)
disconnect -- Disconnect from a target
finish -- Execute until selected stack frame returns
handle -- Specify how to handle a signal
interrupt -- Interrupt the execution of the debugged program
jump -- Continue program being debugged at specified line or address
kill -- Kill execution of program being debugged
next -- Step program
nexti -- Step one instruction
run -- Start debugged program
signal -- Continue program giving it signal specified by the argument
start -- Run the debugged program until the beginning of the main procedure
step -- Step program until it reaches a different source line
stepi -- Step one instruction exactly
---Type <return> to continue, or q to quit---
code

With the CUDA-GDB debugger, one can follow this basic outline of steps:
 * 1) Set breakpoints
 * 2) Run the program to the first breakpoint
 * 3) Continue
 * 4) View thread information, if desired
 * 5) Print blockIdx, blockDim, or gridDim, if desired
 * 6) Delete breakpoints
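As a sketch, the outline above maps onto cuda-gdb commands roughly as follows; the file name and line number are hypothetical, and command availability varies by cuda-gdb version.

```
(cuda-gdb) break basicHello.cu:10     # 1) set a breakpoint (hypothetical location)
(cuda-gdb) run                        # 2) run the program to the first breakpoint
(cuda-gdb) continue                   # 3) continue
(cuda-gdb) info cuda threads          # 4) view thread information
(cuda-gdb) print blockIdx             # 5) print built-in variables (device frame only)
(cuda-gdb) delete breakpoints         # 6) delete breakpoints
```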

For now, I have reached a temporary setback using the debugger tool:

code format="c"
(cuda-gdb) run
Starting program: /home/chelsea/workspace/CUDAHello/basicHello
error: The CUDA driver failed initialization.
code

