CS575 Parallel Processing - Campus/Online/GS511 - Home
On-campus MapReduce / Hadoop project presentations are in final exam week: Wednesday May 14, 9:10 -- 11:10.
Final test on campus: Monday May 5, in class; distance students take it in finals week. Questions will be about the working and the performance of parallel algorithms on parallel machine models, e.g. "Discuss the execution of the PRIM Parallel Minimal Spanning Tree algorithm on a four processor distributed memory machine for a particular given graph. How are the data structures distributed? Show the stages in the parallel execution."

MAP REDUCE
Please limit the number of mappers to 100. Some of you are running with 1000 to 2000 mappers. This will not give good performance; on the contrary, starting each mapper carries an overhead. At the same time, some people use 1 reducer, so the mappers take seconds while the reducer takes hours. Try to have a more balanced approach.

Google/IBM Data Center Tokens are now available. Please e-mail Wim Bohm to request a token. You will get a token and instructions on how to use it. System support for the cluster is available at ibmcloud@us.ibm.com.

Message from Tim Renner: Using Hadoop with more than just Java

For all of you who haven't talked to me yet, if you're planning on using Hadoop with a language other than Java and want to test it out on our machines, make a directory and "cp ~trenner/pub/hadoop* ." into it. Read hadoop-howto.txt to get it up and running... I've given C++ and Python examples of WordCount in the hadoop/tim directory. I haven't given this a try yet on the actual cluster, but Hadoop does the job of distributing data out of the distributed FS and I'm using the same version they are... but just as a warning ;) If I get it up and running on the cluster and it's different I'll write up an update and see if I can get Wim to distribute it too... Hopefully this gets you jump started if you haven't gotten anything working yet...

Some notes: The way this seems to work is that the Java wrapper pulls data out of the DFS and distributes it to many mappers via STDIN.
The mapper then dumps its output to STDOUT, which is fed to one or more reducers via STDIN. The reducer dumps its output to STDOUT and that is written back into the DFS as output files. As far as I can tell, your mapper and reducer are running as separate processes on the local machine, so you likely have local file access, but not distributed...

Also, there is *no* g++ or gcc on either the gateway or job submission machine, which leads me to believe that other libraries are likely missing, so if you're using C++, compile your code statically so that everything is linked in and you won't be relying on any .so files that may or may not be there on the machine...

-Tim

Prerequisite
CS 475: Parallel Programming, or an upper division parallel programming course.

Description
This is a graduate level course on parallel computing with the objective of familiarizing students with the fundamental concepts, techniques and tools of parallel computing. Participation in this course will enable you to better use parallel computing in your application area, and will prepare you to take advanced courses in more specific areas of parallel computing. The schedule page contains the weekly schedule, links to lecture notes, quizzes, homework etc.

This semester we are trying to unite the campus and online versions of CS575 as well as general studies GS511. Moreover, we will record the lectures synchronized with PowerPoint lecture notes. We will do a project using a Google/IBM data center, where we will learn how to use 1000s of machines to operate on 100s of terabytes of data.

NOTICE ONE: The lecture notes are brief. They are not meant to be complete study material, but initial pointers to what needs to be studied. For more complete material, see the resources page.

NOTICE TWO: The quizzes are worth very few points, and we DROP your worst three quiz results. They are there for you to check that you are getting the material and that you are studying enough.
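For the MapReduce project, the streaming model from Tim's notes (records arrive on STDIN, results leave on STDOUT, and the reducer reads what the mappers wrote) can be sketched as a small WordCount in Python. This is only an illustrative sketch, not the example in the hadoop/tim directory: the function names, the tab-separated "word, count" record format, and the "map"/"reduce" command-line argument are assumptions made here. It also assumes the reducer receives its records sorted by key, which the local helper imitates with sorted().

```python
import sys
from itertools import groupby


def map_lines(lines):
    """Mapper: emit one 'word<TAB>1' record per whitespace-separated word."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reduce_lines(lines):
    """Reducer: sum the counts per word; assumes records arrive sorted by key."""
    records = (line.rstrip("\n").split("\t", 1) for line in lines)
    for word, group in groupby(records, key=lambda rec: rec[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


def run_local(lines):
    """Imitate map -> sort -> reduce in one process, for testing off the cluster."""
    return list(reduce_lines(sorted(map_lines(lines))))


if __name__ == "__main__" and len(sys.argv) > 1:
    # Assumed convention: the first argument selects the role, "map" or "reduce".
    step = map_lines if sys.argv[1] == "map" else reduce_lines
    for record in step(sys.stdin):
        print(record)
```

Saved under a hypothetical name such as wordcount.py, the same flow can be checked locally with an ordinary shell pipeline before submitting anything: `cat input.txt | python wordcount.py map | sort | python wordcount.py reduce`. Keeping both roles in one file makes it easy to test the logic with run_local() first.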
Upon completion of this course you will
News Flash
Copyright © 2002-2005: Colorado State University. All rights reserved.