Work Journal

  • 7/17/07, 3:30: So, I've been working on the atmospheric code for about 4 days now. The first few were mostly just looking through the code trying to figure out what in the world it did. I also discovered I hadn't learned as much Fortran 90 as I thought (there were a lot of weird constructs in the code I didn't recognize at all), so I did some Googling of various pieces of it as well. Now, though, I'm actually comfortable enough with it to start some basic timings of the code to see where the bottlenecks are. Right now the compiler (we have to use a trial Intel compiler to get it to work correctly) is a little weird and it will only compile on 32-bit machines, but we're getting there. But, at least the code RUNS! I'm amazed we got it even running this quickly.

  • 7/11/07, 11:50: It's done! I have all the data collected and the final report done for the toy problem (check it out here)! So, I think I can start on the *real* atmospheric science project now :).

  • 7/10/07, 5:30: Heh heh....all those weird issues with the timing? They're ALL from one wrong type, in the initial subroutine, at that. The initial input is NOT a real(kind=8), it's just a real (which is only 4 bytes). However, in the subroutine, I'd typed it as an 8-byte real, and somehow that screwed everything up. I can see it eating the values going in (they all became 0s) but WHY did it make everything run about 10x slower? How would it even affect that? And why, in the serial version, did it *only* affect it if I printed out the variable? Whatever, it seems to work now, so all I have to do is finish up the final report, and I'm done with the toy project!
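    To see how a 4-byte real read as an 8-byte one can turn into something that looks like 0, here is a minimal sketch in Python (standing in for the Fortran). The helper name `read_as_real8` is made up for illustration, and the four zero bytes assumed to sit next to the value in memory are purely an assumption; in the real program those neighboring bytes could have been anything.

```python
import struct

def read_as_real8(value, neighbor_bytes=b"\x00\x00\x00\x00"):
    """Store `value` as a 4-byte real, then read 8 bytes back as an 8-byte real.

    `neighbor_bytes` is a hypothetical stand-in for whatever happens to sit
    in memory right after the 4-byte value (assumed to be zeros here).
    """
    four_bytes = struct.pack("<f", value)       # the caller's 4-byte real
    eight_bytes = four_bytes + neighbor_bytes   # what an 8-byte read would see
    return struct.unpack("<d", eight_bytes)[0]  # reinterpret as an 8-byte real

misread = read_as_real8(1.0)
print(misread)  # a tiny positive number, nowhere near 1.0
```

    With those assumed zero bytes, the 1.0 that went in comes back as a number so small that it would print as 0 in most output formats, which is at least consistent with the values "becoming 0s".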

  • 7/9/07, 3:00: So, it turns out that a lot of the times aren't coming out at all like I'd expect. The latest set of data has the parallel version on a single 64-bit computer taking almost 400 seconds, and all three of the latest versions take a lot longer than they did before. All I did was change the timing mechanism from gettimeofday (or its Fortran binding) to MPI's WTIME, which theoretically should NOT change how long it actually takes to run (quotations about the act of measuring something changing it aside). Also, it's odd that my dual-core machine is *so* much faster than the single-cores, but Michelle says that's probably a difference in cache sizes, and I know my machine has more memory than the single-cores, so it having a bigger cache wouldn't surprise me. The other issue, though, is that the parallel times are incredibly short (in the range of the *old* times I was getting, though, so they in and of themselves are quite reasonable). Michelle brought up the point that sometimes compilers will optimize out the vast majority of your program if it's not relevant to the output (!) but that doesn't *seem* to be happening here....I'm not really sure what's going on, but to me, it seems like the *new* times are wrong; they just don't fit everything else.

  • 7/6/07, 2:00: CoGrid finally works!! It turns out after all that, that the issue was the types of variables I was using (real vs. real(KIND=8), vs. double precision). Once I got that all straightened out, it was happy; it really WAS dividing by 0 (I *think* the dt and dx values were getting corrupted by other variables having the wrong datatype, and then were being treated as 0). So, finally, I have real CoGrid times!!

  • 7/5/07, 12:30: I finally have the report done! (or at least a first draft of it). Check it out here! Once I got the data recollected with the *correct* timing, it was pretty easy to finish (I had most of the writing done already). So now, I just have to figure out how to get CoGrid working, then try NCAR, and then I get to start working on the *real* problem!

  • 7/3/07, 2:15: UPDATE: It turns out that there wasn't actually anything wrong with the data from the Linux computer runs....I just was measuring the wrong thing! I was measuring CPU time, and really I needed total execution time (wall clock). That's because if you only measure CPU time, then as you add more processes, by definition each one will take less CPU time...but since the CPU itself isn't really parallel, they have to be swapped in and out to all run in "parallel", which slows it down quite a bit. I was actually effectively measuring speedup/CPU MHz, rather than speedup/number of processes! Oh well, at least now I know!

    For the last several days I've been fighting with several different problems. One is that CoGrid still won't work...the latest is some kind of strange socket error when the last node tries to receive from the second-to-last node, but then dies for some reason. At least it compiles and tries to run now....but I have no idea why it won't connect to itself correctly when it can on the Linux machines in the department with no problem. *sigh*...

    The other odd problem is with my 7-processor and 1-processor sets of data. They're virtually identical. And they really shouldn't be. The single-processor run should be a lot slower than the 7-processor run, and a little slower than the 2-processor run, but instead they're both faster than the 2-processor one. As far as I can tell, everything is running where it's supposed to be, so we don't have a clue why the times are so close- within 100ths of a second or so. The problem is, it's hard to analyze data when the data you're getting is fairly obviously flawed somehow....oh well, other than that, my report's basically done.
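    The CPU-time versus wall-clock mix-up from the update at the top of this entry is easy to reproduce. This is only a small Python sketch, not the Fortran timing code I actually used: `measure` and `waiting_job` are made-up names, and the `time.sleep` call is an assumed stand-in for a process sitting idle while it is swapped out or waiting on another process.

```python
import time

def measure(work):
    """Run `work` once and return (wall_seconds, cpu_seconds)."""
    wall_start = time.perf_counter()  # wall-clock timer
    cpu_start = time.process_time()   # CPU time used by this process only
    work()
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    return wall, cpu

def waiting_job():
    # Hypothetical workload: mostly waiting, plus a little real computation.
    time.sleep(0.2)                      # idle time: counts toward wall clock only
    sum(i * i for i in range(100_000))   # actual CPU work

wall, cpu = measure(waiting_job)
print(f"wall clock: {wall:.3f} s, CPU time: {cpu:.3f} s")
```

    The waiting shows up in the wall-clock number but not in the CPU number, which is why CPU time alone made the runs look better than they really were.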

  • 6/27/07, 11:00: I have been trying for the last day to run my stuff on CoGrid, but every time I try, it refuses to compile. I'd been using .f90 files, and it didn't create any errors, but it never made the object files either, so it wouldn't compile. Michelle suggested using the older (?) .f extension, so I did, but then it is incompatible with Fortran90 even if I select the Fortran90 compiler. So, basically CoGrid is just a pain, and it looks like I'm not going to be able to run jobs on it using Fortran. Oh well....at least it's not the only computer I have to work with.

  • 6/26/07, 11:05: I have data from real sets of computers now! There's still something weird with trying to run on 32-bit machines- Michelle thinks it's something with it not compiling correctly, but since the stuff I'm going to be working with is mostly 64-bit, we're kind of ignoring it for now. I spent the morning making another set of scripts so that I can run jobs on CoGrid, since obviously one master script that runs everything can't deal with running jobs on CoGrid in the middle. This one basically takes the original set of scripts and breaks them up- the first part sets up all the files, then I run the actual experiments on CoGrid with a second script, then I use the sql parsing script to make a sql table and a Gnuplot graph with the results. Now to see if it will actually work on CoGrid and not just on my machine!
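    The last step of that pipeline (timing output into a SQL table, ready to be plotted) might look roughly like the sketch below. This is not the actual parsing script: the "procs seconds" line format, the table and column names, the function names, and the sample numbers are all invented for illustration, and it uses an in-memory SQLite database only to keep the example self-contained.

```python
import sqlite3

def load_timings(lines):
    """Parse hypothetical 'procs seconds' lines into an in-memory SQL table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE timings (procs INTEGER, seconds REAL)")
    for line in lines:
        procs, seconds = line.split()
        conn.execute(
            "INSERT INTO timings VALUES (?, ?)", (int(procs), float(seconds))
        )
    return conn

def mean_by_procs(conn):
    """Return (procs, average seconds) rows, one per process count."""
    rows = conn.execute(
        "SELECT procs, AVG(seconds) FROM timings GROUP BY procs ORDER BY procs"
    )
    return rows.fetchall()

# Made-up sample data, not real measurements.
sample = ["1 10.0", "1 12.0", "2 6.0"]
result = mean_by_procs(load_timings(sample))
print(result)
```

    Rows in that shape (one process count and one averaged time per row) are the kind of thing that can be written out as two columns for Gnuplot to graph.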

  • 6/25/07, 11:15: I've gotten the whole setup to work correctly for the toy problem, but only on 64-bit machines. If I run on any 32-bit machines it hangs (in the "main" node) indefinitely. I'm really not sure what's going on with that...maybe I have some kind of infinite loop (or more likely deadlock) that only occurs some of the time, and it shows up when there are more machines being run? Maybe it's something that happens if I wind up breaking the bar up into more pieces than there are points? But I'm specifying the number of MPI processes constantly (right now at 4) so I don't know how adding more MACHINES would break that, when the number it's using is constant (and why would it break if I have 3 processors on 3 machines but not 4 processors on 2, or 1-2 processors on 1?).

    I also fixed the webpages so that they'll display correctly on different window sizes, not just the one I'm using.
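    The "more pieces than there are points" worry is easy to check on paper. Here is a minimal Python sketch of one common way to split the points of the bar among processes; `split_points` is a made-up helper, not the decomposition my Fortran code actually uses.

```python
def split_points(n_points, n_procs):
    """Split n_points grid points into n_procs contiguous (start, end) ranges.

    Ranges are half-open: a process owns points start .. end-1.
    The first (n_points % n_procs) processes get one extra point.
    """
    base, extra = divmod(n_points, n_procs)
    ranges = []
    start = 0
    for rank in range(n_procs):
        count = base + (1 if rank < extra else 0)
        ranges.append((start, start + count))
        start += count
    return ranges

print(split_points(10, 4))  # every process gets at least two points
print(split_points(3, 4))   # the last process gets an empty range
```

    With 3 points and 4 processes the last range is empty, so a process in that position would have nothing to compute and no boundary values to trade with its neighbor. Whether that is what actually causes the hang is a separate question, especially since the process count here is fixed at 4.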

  • 6/14/07, 17:00: Today I got everything for running the tests automated (using a mess of shell scripts calling other shell scripts calling python scripts calling Gnuplot, but hey, it works...), meaning that to generate my graphs, all I should have to do is run the main shell script with the desired input parameters, and a little while later my set of graphs *should* pop out in the correct subdirectories. I really want to see it work on a real run, not just my two-machine tests I've been doing today!

  • 6/13/07, 13:30: I finally got the parallel implementation of my practice problem done (after writing about three different implementations of it and eventually coming up [with Michelle's help :)] with a solution that was way easier than what I had been trying)! I had a lot of trouble with the MPI_RECV command not getting the correct ID of the process it was receiving from, and neither Dave nor I could figure out why. It turns out that there's something weird in Fortran that if you declare an array using 'Dimension', it seems to index it backwards from how MPI is expecting it to be indexed. So it wasn't actually an MPI problem at all...
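    One well-known way Fortran arrays can look "backwards" is storage order: Fortran lays out multi-dimensional arrays column by column, while C lays them out row by row. I can't say for sure that this is exactly what bit me with MPI_RECV, so take the Python sketch below as an illustration of that general difference only. The two function names are made up, and the indices are 0-based to keep the arithmetic simple (Fortran's default is 1-based).

```python
def row_major_offset(i, j, n_rows, n_cols):
    """Position of element (i, j) in memory with C-style (row-major) storage."""
    return i * n_cols + j

def column_major_offset(i, j, n_rows, n_cols):
    """Position of element (i, j) in memory with Fortran-style (column-major) storage."""
    return j * n_rows + i

# The same element of a 2 x 3 array lands in different places.
n_rows, n_cols = 2, 3
print(row_major_offset(0, 1, n_rows, n_cols))     # 1
print(column_major_offset(0, 1, n_rows, n_cols))  # 2
```

    Code that assumes one layout while the data is stored in the other ends up reading the wrong element, which from the outside looks a lot like the array being indexed backwards.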

  • 6/12/07, 13:20: This morning I went to an interesting presentation on Qt. This summer, there's a whole series of these presentations on various useful tools, and this is the first one I'd been to. I'd used Qt a little for a project last semester (nothing I really had to write), but it sounds like there's a lot more to it than I thought. I thought it was just a GUI-maker, sort of like Java's AWT and Swing, but it seems almost like an entire language sort of based on top of C++, that lets you do all kinds of things like threads, sockets, etc. I'll have to mess around with it sometime.

  • 6/11/07, 10:00: So, I get to start writing a parallel version of the temperature-of-a-bar problem now...I'm going to be using MPI to do it, but I really have no clue how to use it. It doesn't look incredibly complicated on the surface, but it seems like one of those things that has a lot of random little details that cause you a lot of problems until you understand them.

  • 6/8/07, 14:15: I finally got the printf-creating python code running, once I figured out what I needed to print, and how to do it in Fortran. For some reason, writing a program to make a print statement is much harder than it seems like it should be, but it works now. I also now have a working machine again, as this morning it decided that I had no permissions to do anything at all. According to the sysadmin, it was probably because of some confusion with switching my machine for a new one, but it's all better now and likes me again :).

  • 6/7/07, 15:15: Yay! I got my *new* computer today (turns out the one I had been assigned before was really somebody else's), so no more swapping of monitors, computers, etc. I also have a shiny working version of the temperature-of-a-bar problem, written in Fortran. Oddly, a lot of the problems near the end were with trying to get the output to work the way I wanted- Fortran is surprisingly awkward when it comes to formatting things, I think because it goes back to the days when you printed things from printers and read them in from punch cards.

  • 6/6/07, 15:00: Today I'm pretty much just learning how all the tools and stuff work, and figuring out how to actually write a program in Fortran90. It seems to me like it's kind of an awkward language (I miss my 'for' statements, and being able to declare temporary variables close to where I need them), and there are a lot of things about it that just seem like they made sense when the language was originally developed, but really should have been changed since then. The output formatting comes to mind- I'd really like to be able to declare print statements without a newline, rather than having to do complicated things with control characters to get it back up to the previous line! But I suppose it makes more sense if your output device is a printer, rather than a screen. Overall, it's a pretty straightforward language, but I feel sort of limited in what I can do with it.

  • 6/5/07, 16:20: Today I actually have a computer and a desk in the grad area, and keys to get INTO the grad area :). Pretty much today was learning the sorts of tools we'll be working with, and sort of the overall plan of how the summer's going to go. We had a group meeting this morning to discuss the first reading "assignment" and just kind of discuss who's doing what (and introduce me to the people I'll be working with, since they've been here a week already). It seems like it should be a really interesting summer, and I think I'll learn a lot.

  • 6/4/07, 16:00: My first day at work! Unfortunately, I don't really have a desk right now, because while there are two computers there (one of which doesn't seem to be a department one, and nobody knows who it belongs to), neither is the one that's supposed to be assigned to my desk, since the new one hasn't come in yet. I should get it tomorrow, if I'm lucky. So, most of today was me learning Fortran90 (Michelle gave me a couple of books to read on it) and getting things like the NCAR logon stuff set up. Overall, there was a lot of information all at once, but it seems like it should be an interesting and fun summer.