But I need that!

posted Mar 30, 2010, 9:36 AM by Robbie McMahon

I'm sure you have heard about Sony's new 3.21 firmware update. Geohot pretty much says everything I would say here. It is just really disappointing, especially when Sony says it is "due to security concerns". After almost 4 years, they only now decide it is a security concern? Absurd.

Geohot and the PS3

posted Feb 3, 2010, 12:04 PM by Robbie McMahon

I am a little slow to post this, but Geohot seems to have made it past the hypervisor. While it is interesting and exciting, it isn't particularly useful at this time. It doesn't really help the project, and he is still trying to make the exploit useful.

I plan to read the full writeup at some point, but really only for my own interest. Too many people got overexcited at the announcement. However, I will keep tabs on any updates he makes.

Progress, my friend

posted Jan 7, 2010, 2:29 AM by Robbie McMahon   [ updated Jan 7, 2010, 3:48 AM ]

I successfully split the monster source file into many baby source files. This isn't particularly impressive, but it will make things more manageable. Commits will be smaller and more precise. Functions will be more encapsulated and less dependent. The only drawback so far is having multiple files open and searching for function definitions. On one or both of the wikis, I may make a list of the functions and their file locations.

I also made some changes to the kernel parser. First, it now handles new lines. I somehow missed testing that before. Second, it now returns a list of the kernel's argument types. I can now say, "Oh, this kernel is expecting four floats and an integer." I also need it to return whether the argument is a pointer, but I'll get around to that later. I have not done any work on the kernel transformer. The task was a bit intimidating, so I moved to other things.
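To make the parser's output concrete, here is a minimal sketch in C of the same idea: pull the kernel name and the argument types out of a signature, including whether each argument is a pointer. This is not the project's kernel_parser.lua or kernel_parser.c. The names kernel_arg, kernel_sig and parse_kernel_signature are made up for illustration, and the sketch assumes single-word types (so "unsigned int" would be reported as "unsigned").

```c
#include <ctype.h>
#include <string.h>

#define MAX_ARGS 16
#define MAX_NAME 64

/* Hypothetical result types, not taken from the project. */
typedef struct {
    char type[MAX_NAME];   /* base type, e.g. "float4" */
    int  is_pointer;       /* 1 if the declaration contains '*' */
} kernel_arg;

typedef struct {
    char       name[MAX_NAME];
    int        num_args;
    kernel_arg args[MAX_ARGS];
} kernel_sig;

/* Address space and const qualifiers are skipped when looking for the type. */
static int is_qualifier(const char *tok)
{
    static const char *quals[] = { "__global", "__local", "__constant",
                                   "__private", "const" };
    for (size_t i = 0; i < sizeof quals / sizeof quals[0]; i++)
        if (strcmp(tok, quals[i]) == 0)
            return 1;
    return 0;
}

/* Returns 0 on success, -1 if the text does not look like a signature. */
int parse_kernel_signature(const char *src, kernel_sig *sig)
{
    const char *open = strchr(src, '(');
    const char *close = open ? strchr(open, ')') : NULL;
    if (!open || !close)
        return -1;

    /* The kernel name is the identifier immediately before '('. */
    const char *end = open;
    while (end > src && isspace((unsigned char)end[-1]))
        end--;
    const char *start = end;
    while (start > src &&
           (isalnum((unsigned char)start[-1]) || start[-1] == '_'))
        start--;
    size_t len = (size_t)(end - start);
    if (len == 0 || len >= MAX_NAME)
        return -1;
    memcpy(sig->name, start, len);
    sig->name[len] = '\0';

    sig->num_args = 0;
    const char *p = open + 1;
    while (p < close) {
        /* One argument runs up to the next ',' or the closing ')'. */
        const char *comma = memchr(p, ',', (size_t)(close - p));
        const char *arg_end = comma ? comma : close;
        kernel_arg arg = { "", 0 };

        while (p < arg_end) {
            if (*p == '*') { arg.is_pointer = 1; p++; continue; }
            /* isspace also matches '\n', so signatures may span lines. */
            if (isspace((unsigned char)*p)) { p++; continue; }

            char tok[MAX_NAME];
            size_t n = 0;
            while (p < arg_end && *p != '*' && !isspace((unsigned char)*p)) {
                if (n < MAX_NAME - 1)
                    tok[n++] = *p;
                p++;
            }
            tok[n] = '\0';

            /* The first token that is not a qualifier is taken as the type. */
            if (arg.type[0] == '\0' && !is_qualifier(tok))
                strcpy(arg.type, tok);
        }

        if (arg.type[0] != '\0') {
            if (sig->num_args == MAX_ARGS)
                return -1;
            sig->args[sig->num_args++] = arg;
        }
        p = arg_end + 1;
    }
    return 0;
}
```

Calling parse_kernel_signature on "__kernel void doSomething(__global float4 *in, __global float4 *out)" fills in the name doSomething and two arguments of type float4, both flagged as pointers. A real parser would also have to cope with comments, multi-word types and more than one kernel per file.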

I fixed the makefile to handle the new split files. After running the new makefile, I also fixed a handful of compiler warnings. Some warnings still remain (see attached file), but they aren't critical. Although everything compiles, I have not yet tried to run it. I am especially wary of the changes I made to kernel_parser.lua and kernel_parser.c. The Lua file tested fine, but I couldn't check the new C code.

Not being able to compile everything on demand makes me far more cautious and diligent, but it certainly does slow me down. I've gotten used to git, but I am still not totally won over. Altogether, it has been a very good start.

I've got a fever...

posted Jan 3, 2010, 4:23 PM by Robbie McMahon

And the only prescription is more OpenCL programming.

After a rather extended break, I feel the need to get back into some serious programming. It will take some time to get back into the swing of the project, but my first priority will expedite that process. I've wanted to split up the primary header file for a long time. 1855 lines of code is a bit much, especially when most of it can be grouped together into separate files. I plan to make a new branch in git, split everything up, and then merge it all back.

I got the fever when I came across another OpenCL project on GitHub. It is quite similar to mine except he is trying to make a general implementation for gcc. I was extremely happy to see my exact same mistake in his project -- all of his kernels are prebuilt binaries! I am quite relieved I am not the only person to go down that road. Both of our projects are GPL and I may steal/borrow ideas from his project.

I've been doing some "cleaning up" in the git and svn repos, but I think I just made things worse. I accidentally added the .git folder to the svn repo, and there are some .svn folders in the git repo. Those need to get trashed. I can leave that alone for now.

In addition to the file splitting, I want to get a prototype "kernel transformer" up and running. It will transform a basic kernel like

__kernel void doSomething(__global float4 *in, __global float4 *out)
{
  for(int i=0; i<4; i++)
    out[i] = in[i];
}

into an actual SPE runnable function like test_kernel.c.

int main(unsigned long long spe, unsigned long long argp, unsigned long long envp)
{
  // argp is an array of void pointers (void **p)
  // compiler knows how many and what kind

#define NUM_OBJECTS 2

  int tag = 1;
  int i;

  void **data_list = malloc(sizeof(void *) * NUM_OBJECTS);

  // get the list of addresses to all the data
  spu_mfcdma64(data_list, mfc_ea2h(argp), mfc_ea2l(argp),
               sizeof(void *) * NUM_OBJECTS, tag, MFC_GET_CMD);
  spu_writech(MFC_WrTagMask, 1 << tag);

  float4 in, out;
  // get the input data
  spu_mfcdma64(&in, mfc_ea2h(data_list[0]), mfc_ea2l(data_list[0]),
               sizeof(float4), tag, MFC_GET_CMD);
  spu_writech(MFC_WrTagMask, 1 << tag);

  // run the function
  doSomething(&in, &out);

  // send the output data
  spu_mfcdma64(&out, mfc_ea2h(data_list[1]), mfc_ea2l(data_list[1]),
               sizeof(float4), tag, MFC_PUT_CMD);
  spu_writech(MFC_WrTagMask, 1 << tag);

  return 0;
}

It will be a big pain to get that working, but it is the only major hurdle remaining. With the kernel transformer, a user can write a kernel without any knowledge of the Cell SDK. That is the whole point of OpenCL: write general kernel, load kernel, compile kernel for device, and run kernel on the device.
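One way to picture the transformer is as a small code generator: given the kernel name and the number of arguments, it writes out the text of the SPE entry point. The sketch below is only a guess at the shape of such a tool, not the project's code. emit_wrapper is a hypothetical name, and the DMA transfers are left as placeholder comments in the generated text rather than filled in.

```c
#include <stdio.h>

/*
 * Hypothetical generator: writes the source text of an SPE entry point that
 * calls `kernel_name` with `num_args` data objects. Returns what snprintf
 * returns: the number of characters the full text needs, not counting the
 * terminating '\0'.
 */
int emit_wrapper(char *buf, size_t size, const char *kernel_name, int num_args)
{
    return snprintf(buf, size,
        "int main(unsigned long long spe, unsigned long long argp,\n"
        "         unsigned long long envp)\n"
        "{\n"
        "#define NUM_OBJECTS %d\n"
        "  void **data_list = malloc(sizeof(void *) * NUM_OBJECTS);\n"
        "  /* ... DMA the address list and the inputs in ... */\n"
        "  %s(/* ... local copies of the arguments ... */);\n"
        "  /* ... DMA the outputs back out ... */\n"
        "  return 0;\n"
        "}\n",
        num_args, kernel_name);
}
```

Calling emit_wrapper(buf, sizeof buf, "doSomething", 2) produces a skeleton with NUM_OBJECTS set to 2 and a call to doSomething in the middle. The hard part, which this sketch skips, is generating the right transfers for each argument from its type and direction.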

I should have plenty of time tonight to work on this. I should be able to at least get started on splitting up that giant file.

OpenCL on Mac and Cell

posted Dec 12, 2009, 8:41 PM by Robbie McMahon

My previous post was a little light on details. I have been asked to expand on it. As a side note, please see the new Disclaimer section.

As I mentioned in the previous post, I was able to get the same source code compiled and running on both my Mac and the CellBuzz cluster. There are only two minor changes that need to be made. 
  1. Download the source code. The folder contains an Xcode project. You can launch the project and look at it, but that won't really help us in a shell on the cluster. 
  2. The two files we are interested in are "main.c" and "". Copy those two files out and put them together in another folder. On the Mac, you don't need to change anything.
  3. To compile: gcc -o main main.c -m32 -lm -lstdc++ -framework OpenCL -std=c99

The first part, "gcc -o main main.c", is standard compiling. "-m32" compiles the code in 32-bit mode. You can try "-m64", but it doesn't seem to make much difference. "-lm" is the math library. "-lstdc++" links the standard C++ library. "-framework OpenCL" is the important part. It links against the OpenCL framework. "-std=c99" is also important because lines 174 and 186 use the C99 standard. If all the options bother you, you can instead run "g++ -o main main.c -framework OpenCL". Same results; I just prefer gcc for whatever reason.

Important Note: This only works on Snow Leopard (10.6.x and higher). Earlier versions of Mac OS do not have OpenCL. Also, I believe users with ATI video cards require 10.6.2 or later.

Did you get the code to run on your Macintosh? Good. Now we move on to CellBuzz. Two changes to the source code are in order. Line 8 reads "#include <OpenCL/opencl.h>"; it should read
#ifdef __APPLE__
#include <OpenCL/opencl.h>
#else
#include <CL/cl.h>
#endif
It is just a simple preprocessor conditional. The other change, which isn't necessary but helps, is line 57, which reads
err = clGetDeviceIDs(NULL, CL_DEVICE_TYPE_GPU, 1, &device, NULL);

Change it to read 
#ifdef __APPLE__
err = clGetDeviceIDs(NULL, CL_DEVICE_TYPE_GPU, 1, &device, NULL);
#else
err = clGetDeviceIDs(NULL, CL_DEVICE_TYPE_ACCELERATOR, 1, &device, NULL);
#endif

The SPU is considered an accelerator in the IBM implementation, which I think is a pretty good idea. Compiling this code is almost identical: "gcc -o main main.c -m32 -lm -lstdc++ -lCL -std=c99". Only the OpenCL link option differs, "-lCL" instead of "-framework OpenCL". Everything else is the same. You could also use the XLC compiler ppu-xlcl, but I couldn't get that to work. It gives me a bunch of library errors:
[user@cellbuzz]$ ppu-xlcl -o main main.c -lm -lstdc++ -lCL -pedantic
"/usr/include/sys/select.h", line 110.49: 1506-215 (E) Too many arguments specified for macro select.
"/usr/include/sys/select.h", line 109.12: 1506-275 (S) Unexpected text '__tg_builtin_gentype' encountered.
"/usr/include/sys/select.h", line 109.12: 1506-046 (S) Syntax error.
"/usr/include/stdlib.h", line 699.12: 1506-275 (S) Unexpected text '__tg_builtin_gentype' encountered.
"/usr/include/stdlib.h", line 699.12: 1506-046 (S) Syntax error.
"/usr/include/stdlib.h", line 699.21: 1506-963 (W) The attribute "const" is not a valid variable attribute and is ignored.

After the library errors, it complains about the pragmas, but that doesn't matter. I don't have a great yearning to use the XLC compiler. Maybe someone else can figure it out.

I've attached my modified source code that compiles and runs on both systems. Credit goes to MacResearch for the bulk of the source code. I plan to make a benchmarking program, but I may just use an existing one.

Georgia Tech is OpenCL capable

posted Dec 2, 2009, 9:12 PM by Robbie McMahon

After submitting a request, Georgia Tech was kind enough to install the IBM OpenCL dev kit on their CellBuzz cluster. You will need to compile your code on the compute nodes as the compiler and libraries are PPC only.

I was able to compile and run the blackscholes sample code, but I did not understand the output. I have been following a tutorial series for OpenCL on Macintosh. I was able to compile and run the sample code from episode 3 with two minor modifications.
  1. #include <OpenCL/OpenCL.h> became #include <CL/cl.h>
I have not added any performance metrics to the code yet, so I do not know how the two implementations/hardware compare. But this shows that OpenCL is cross-platform. A simple preprocessor check for Apple or IBM would take care of change #1.

I noticed something cool, though. I had been treating the entire Cell processor as a compute device with type CL_DEVICE_TYPE_CPU. IBM instead treats the PPU as a CPU and the SPUs as accelerators. That makes much more sense. I'll see if I can merge that change into my code.

IBM Releases OpenCL Implementation

posted Oct 29, 2009, 7:01 PM by Robbie McMahon

A member of the mailing list pointed out that IBM has released an official version of OpenCL for Power hardware running Linux. They say it has been tested with the IBM BladeCenter QS22 which uses the newer version of the Cell Processor. When I get some free time, I will definitely download the development kit and give it a try.

Despite this news, an open source version of the implementation is still desirable. This project has not reached its end. I will certainly work on the project as much as I can, either for my own interest or just to maintain an open source version. The IBM implementation also seems to be in an early stage. Maybe I still have time to catch up. :-p

But I don't feel too bad being beaten to the first waypoint by "teams in IBM consisting of the HPC Multicore Software Development team ..., the XL C Compiler team ..., with assistance from the Compiler Research team."

Still Going

posted Sep 22, 2009, 12:08 PM by Robbie McMahon

The project is still going, just at a much slower rate. Distractions and obligations keep me from working on the project. I still think about it and read up on OpenCL, but I haven't done much programming recently.

As I have said before, the project will always exist, but progress may be fairly slow at times. I just have to keep at it.

OpenCL is Official

posted Aug 30, 2009, 11:20 PM by Robbie McMahon

With the release of Snow Leopard (10.6), OpenCL now has an official, full implementation. AMD recently announced an x86 implementation, but Apple made it work with GPUs as well.

If you want to try it out, a user over at the MacRumors forums found a small application that will test your computer's OpenCL capabilities. I can definitely use Apple's implementation to test the functionality of my project.

Kernel Parser and Lua Libraries

posted Aug 16, 2009, 9:42 AM by Robbie McMahon

I wrote a simple kernel parser in Lua. It returns the kernel function name and the number of arguments. Right now it only works if all the information is on one line, but I can update that later. The parser is in Lua for two reasons. First, it is much easier to write a parser in Lua than in C. Second, I just wanted some practice with Lua. The only issue was compiling the Lua libraries for the Cell architecture. I spent most of the evening trying to compile the library and its dependencies. It took much longer than expected, primarily because of my own stupidity and inattentiveness. Luckily, the dependency list was fairly short. I needed readline and ncurses. I think I will add these libraries to the distribution just to make it easier for others.

I also did some refactoring. I moved the source code into subdirectories for better organization and separation. I also created a new makefile that is more flexible than the old auto-generated one. I tried to use autotools, but I couldn't find an easy way to compile different batches of source code with different compilers. I can worry about distribution mechanisms later. The project isn't even close to being completed.
