CS Colloquium Schedule

Link to Colorado State University Home Page

CS Colloquium Series Spring 2013: Abstracts

Innovations In Teaching Software Development Skills

William Pugh
University of Maryland

Educators have been looking at various ways to improve the classroom experience, incorporating ideas such as active learning, on-line lectures, and various communications technologies. However, the way in which programming and software development is taught has not changed much at many schools.

I'll be talking about various approaches being developed at Maryland to improve the way we teach programming and software development. Much of this has evolved through the Marmoset project, which is a web based framework for handling student project submission and evaluation. Marmoset, also known as the submit server, accepts student project submissions and provides students with limited access to test results before the project deadline. It provides various incentives for students to start work on projects early and practice test driven development. It also provides students with access to tools such as static analysis and code coverage data, and supports web-based code reviews. Code reviews include instructional code reviews (where TA's or faculty review student code), peer code reviews (where each student reviews code by two other students), and canonical code reviews (where all students are asked to review one specific code example, perhaps something from a standard library). Marmoset is open source, and used in most CS programming courses at UMD and by several other universities.

Speaker biography: Bill Pugh received a Ph.D. in Computer Science (with a minor in Acting) from Cornell University. He was a professor at the University of Maryland for 23.5 years, and in January 2012 became professor emeritus to start new adventure somewhere at the crossroads of software development and entrepreneurship.

Bill Pugh is a Packard Fellow, and invented Skip Lists, a randomized data structure that is widely taught in undergraduate data structure courses. He has also made research contributions in in techniques for analyzing and transforming scientific codes for execution on supercomputers, and in a number of issues related to the Java programming language, including the development of JSR 133 - Java Memory Model and Thread Specification Revision. Professor Pugh's current research focus is on developing tools to improve software productivity, reliability and education. Current research projects include FindBugs, a static analysis tool for Java, and Marmoset, an innovative framework for improving the learning and feedback cycle for student programming projects.

Professor Pugh has spoken at numerous developer conferences, including JavaOne, Goto/Jaoo in Aarhus, the Devoxx conference in Antwerp, and CodeMash. At JavaOne, he received six JavaOne RockStar awards, given to the speakers that receive the highest evaluations from attendees.

Professor Pugh spent the 2008-2009 school year on sabbatical at Google, where, among other activities, he learned how to eat fire.

Research and Engineering Challenges in FindBugs

William Pugh
University of Maryland

I'll talk about some of the research and engineering issues in FindBugs, a static analysis tool for finding errors in Java programs (and other languages that compile to Java byte code). FindBugs has been downloaded more than a million times, incorporated into all major commercial static analysis tools, and is used tens of thousands of time a day worldwide. After a brief review of FindBugs, I'll talk about the design of the type qualifier analysis built into FindBugs, the challenges of annotation-driven frameworks, the null dereference analysis used by FindBugs, and new ideas about how FindBugs could be made user extensible.

Bio-Inspired Supercomputing

Hamid R. Arabnia
University of Georgia

The two major issues in the formulation and design of parallel multiprocessor systems are algorithm design and architecture design. The parallel multiprocessor systems should be so designed so as to facilitate the design and implementation of the efficient parallel algorithms that exploit optimally the capabilities of the system. From an architectural point of view, the system should have low hardware complexity, be capable of being built of components that can be easily replicated, should exhibit desirable cost-performance characteristics, be cost effective and exhibit good scalability in terms of hardware complexity and cost with increasing problem size. In distributed memory systems, the processing elements can be considered to be nodes that are connected together via an interconnection network. In order to facilitate algorithm and architecture design, we require that the interconnection network have a low diameter, the system be symmetric and each node in the system have low degree of connectivity. For most symmetric network topologies, however, the requirements of low degree of connectivity for each node and low network diameter are often conflicting. Low network diameter often entails that each node in the network have a high degree of connectivity resulting in a drastic increase in the number of inter-processor connection links. A low degree of connectivity on the other hand, results in a high network diameter which in turn results in high inter-processor communication overhead and reduced efficiency of parallelism. Reconfigurable networks attempt to address this tradeoff. In this presentation, we discuss our design of a reconfigurable network topology that is targeted at medical applications; however, others have found a number of interesting properties about the network that makes it ideal for applications in computational biology as well as information engineering. The design that will be presented in this talk is a bio-inspired reconfigurable interconnection topology (the work is based on an ongoing project).

Hamid R. Arabnia received a Ph.D. degree in Computer Science from the University of Kent (Canterbury, England) in 1987. Arabnia is currently a Professor of Computer Science at University of Georgia (Georgia, USA), where he has been since October 1987. His research interests include Parallel and distributed processing techniques and algorithms, supercomputing, interconnection networks, and applications (in particular, in image processing, medical imaging, bioinformatics, knowledge engineering, and other computational intensive problems). Dr. Arabnia is Editor-in-Chief of The Journal of Supercomputing published by Springer; Co-Editor of Journal of Computational Science published by Elsevier; and serves on advisory/editorial boards of 35 other journals.

How we teach impacts students learning, performance, and persistence: Results from three recent studies of Peer Instruction in Computer Science

Beth Simon
University of California, San Diego

What a course “is” and “does” can be viewed through the lens of instructional design. Any course should be based around the learning goals we have for students taking the course – what it is we want them to know and be able to do when they finish the course. Describing how we go about supporting students in achieving those goals can be broken into two parts: a) the content/materials we choose to cover and b) the methods/pedagogical approaches we employ. In this talk I review the results of three about to be published studies looking at the impact of method or pedagogical approach in computing courses. Specifically, I’ll review our experience using the Peer Instruction method (aka “clickers”) in computer science courses at UC, San Diego and discuss the following:

a) an observed 50% reduction in fail rate in four computing courses adopting PI,

b) an in-situ comparison study showing Peer Instruction students to perform 6% better than students in a standard “lecture” setting, and

c) a 30% increase in retention of majors after adopting a trio of best practices in our introductory programming course (Peer Instruction, Media Computation, and Pair Programming).

Randomized Motion Planning: From Intelligent CAD to Computer Animation to Protein Folding

Nancy Amato
Texas A and M

Motion planning arises in many application domains such as computer animation (digital actors), mixed reality systems and intelligent CAD (virtual prototyping and training), and even computational biology and chemistry (protein folding and drug design). Surprisingly, a single class of planners, called probabilistic roadmap methods (PRMs), have proven effective on problems from all these domains. Strengths of PRMs, in addition to versatility, are simplicity and efficiency, even in high-dimensional configuration spaces.

In this talk, we describe the PRM framework and give an overview of several PRM variants developed in our group. We describe in more detail our work related to virtual prototyping, computer animation, and protein folding. For virtual prototyping, we show that in some cases a hybrid system incorporating both an automatic planner and haptic user input leads to superior results. For computation animation, we describe new PRM-based techniques for planning sophisticated group behaviors such as flocking and herding. Finally, we describe our application of PRMs to simulate molecular motions, such as protein and RNA folding. More information regarding our work, including movies, can be found at http://parasol.tamu.edu/~amato/ Speaker Biography

Whats wrong with GPGPU and how to fix it

Sanjay Rajopadhye
Colorado State University

GPUs (Graphics Processing Units) are currently the hottest topic in computing. They were originally developed as special purpose accelerators for graphics rendering, since the game industry needed very high performance on this niche domain. Over the past decade or so, many smart people realized that these special chips could be used for much more than graphics. Initially this effort was very difficult, and led by "heroic/masochistic programmers" who were willing to sweat blood to get the last ounce of performance. In 2006 NVIDIA, one of leading manufacturers of GPUs, released CUDA, a programming API to expose GPU computing to the "merely macho" programmers, but this was sufficient. There is now a thriving ecosystem around CUDA and OpenCL (a similar programming standard put out by another industry consortium). On the hardware side, many of the top-500 supercomputers in the world today have GPU-based co-processors (called accelerators), and are delivering unprecedented performance.

However, using GPUs for general purpose computing is doomed to failure. As we move the next, exascale generation, the dominating cost metric will be energy. I will demonstrate, using arguments from the IPDPS 2011 keynote talk by Bill Dally (CTO, NVIDIA) why GPUs are terribly energy inefficient when it comes to computations that have dependences. I will argue that fixes to these problems proposed in the literature are making GPUs more and more like CPUs. This is not necessarily a bad idea in and of itself, but for niche, special-purpose application domains, it is "throwing the baby away with the bathwater." I will propose, for an application class called dense stencil computations, a new, simple extension to GPU architectures, called SPU. I will discuss the impact of the new architecture on the programming API, the run-time system, and ultimately domain specific compilation to this new target.

Multiple Testing under Dependence with Applications to Genome-wide Association Studies

Jie Liu
University of Wisconsin

Large-scale multiple testing tasks often exhibit dependence, and leveraging the dependence between individual tests is still one challenging and important problem in statistics. With recent advances in graphical models, it is feasible to use them to perform multiple testing under dependence. We propose a multiple testing procedure which is based on a Markov-random-field-coupled mixture model. The ground truth of hypotheses is represented by a latent binary Markov random field, and the observed test statistics appear as the coupled mixture variables. The parameters in our model can be automatically learned by a novel EM algorithm. We use an MCMC algorithm to infer the posterior probability that each hypothesis is null, and the false discovery rate can be controlled accordingly. Simulations show that the numerical performance of multiple testing can be improved substantially by using our procedure. We apply the procedure to a real-world genome-wide association study on breast cancer, and we identify several SNPs with strong association evidence.

The Effect of Contact Lenses on Iris Recognition

Kevin Bowyer
University of Notre Dame

This talk will (a) introduce iris recognition as a means of verifying a person’s identity, (b) discuss how wearing contact lenses affects the accuracy of iris recognition, and (c) present results of algorithms developed to automatically detect whether or not the eye in an iris image is wearing a contact lens. We consider clear, prescription contact lenses, and “cosmetic” or “patterned” lenses. It has long been believed that wearing clear prescription contact lenses does not affect the accuracy of iris recognition. However, recent results show that wearing clear prescription contacts does have a (smaller) effect on recognition accuracy.

This talk should be understandable to persons not working in biometrics or computer vision.

Biography

Privacy Preserving Record Linkage (PPRL): Opportunities & Challenges

Ramki Thurimella
University of Denver

PPRL is an area within computer science that has many practical applications. It is at the intersection of databases, algorithms, security and privacy, and cryptography. In this talk, PPRL is motivated and introduced. After giving some simple background material, the central problem is introduced and current approaches are sketched. Along the way, several important research questions that pertain to private record linkage from different perspectives are identified. Other than some undergraduate computer science knowledge, no other background is assumed.

Biography: Dr. Ramki Thurimella has twenty five years of experience teaching and doing research in computer science. He has a B.E. in Electrical Engineering from Osmania, M.Tech. and PhD in Computer Science from IIT Chennai and the University of Texas at Austin respectively. After spending 2 years as a research associate at the University of Maryland, College Park, he joined the University of Denver (DU). He has been with DU since 1991 where he is currently a professor and the chair of the computer science (CS) department.

His research interests are in algorithm design and information security iand privacy. Some recent projects in information security he has worked on include location-based privacy (joint with Dr. Dewri) which appeared in the February issue of IEEE Computer) and the application of data mining techniques to intrusion detection (as part of PhD supervision of Dr. Treinen). At DU, Dr. Thurimella regularly teaches courses on computer security, different flavors of algorithm design, programming languages, and algorithms.

Advances in High-Performance Computing: The Race to the Top

Tarek El-Ghazawi
George Washington University

In recent years, the world has seen an unprecedented international race for the leadership in supercomputing. Over the past decade or so, the top ranking supercomputer has been moving in circles every several months between the US, Japan and China. In addition, over almost one decade, the speed of the top supercomputer has risen four orders of magnitude and is moving rapidly to be measured in exaflops, or million trillion calculations per second, units that are not used today by industry. The secret behind this international out-of-control race is perhaps because leadership in supercomputing means technological and eventually economic leadership. This talk will provide a quick in-depth characterization to the advances in High-Performance Computing in the past 20 years. In the light of this, it will also provide some projections, future trends, and identify some of the open research issues in High-Performance Computing.

Speaker biography: Dr. Tarek El-Ghazawi (http://hpcl.seas.gwu.edu/tarek/) is director of the GW Institute for Massively Parallel Applications and Computing Technologies (IMPACT) and the NSF Industry University Center for High-Performance Reconfigurable Computing (CHREC), and oversees the GWU program in HPC. El-Ghazawi’s research interests include high-performance computing, computer architectures, reconfigurable, embedded computing and computer vision. He is one of the principal co-authors of the UPC parallel programming language and the first author of the UPC book from John Wiley and Sons. El-Ghazawi has published well over 200 refereed research publications and his research has been frequently supported by Federal agencies and industry, including NSF, DARPA, DoD, and NASA to name a few. He served in many editorial roles including an Associate Editor for the IEEE Transactions on Computers, and chaired many international technical symposia. He also serves on a number of advisory boards. Professor El-Ghazawi is a Fellow of the IEEE and a Research Faculty Fellow of the IBM Center for Advanced Studies, Toronto. He is a member of the Phi Kappa Phi national honor society and an elected member of the IFIP WG10.3 and a Fulbright Senior Scholar. Professor El-Ghazawi is a recipient of the 2011 Alexander Schwarzkopf prize for technical innovation.

Cameras as Computational Devices

Hank Dietz
University of Kentucky

As electronic sensors replaced film in cameras, not much appeared to change. However, modern digital cameras contain computers. Instead of merely simulating film, the camera can use the sensor and optics to intelligently capture data for more sophisticated processing -- doing things no film camera could. This talk will introduce two very different computational photography concepts that we've been developing. The first is a method by which a commodity camera can be used to capture scene depth data in a single shot. Combining a better understanding of optics with appropriate processing produces images for "3D" viewing, allows refocus after capture, etc. The second concept involves a completely new way of thinking about camera sensors -- in which the sensor itself is a massively-parallel computer constructed using millions of nanocontrollers.

Speaker biography: Upon completing his Ph.D. at Polytechnic University (now NYU-Poly) in 1986, Henry G. (Hank) Dietz joined the Computer Engineering faculty at Purdue University's School of Electrical and Computer Engineering. In 1999, he moved to the University of Kentucky, where he is a Professor of Electrical and Computer Engineering and the James F. Hardymon Chair in Networking. Despite authoring approximately 200 scholarly publications mostly in the fields of compilers and parallel processing, his group is best known for the open source research products it has created: PCCTS/Antlr compiler tools, Linux PC cluster supercomputing, SWAR (SIMD Within a Register), FNNs (computer-evolved Flat Neighborhood Networks), MOG (MIMD On GPU), etc. Most of his work is freely distributed via Aggregate.Org. Dietz also is an active teacher, and was one of the founders of the EPICS (Engineering Projects In Community Service) program. He is a member of ACM, IEEE, and SPIE.

Highly Collaborative Bioinformatics in Partnership with Post-Genome Medicine

Thomas L. Casavant
University of Iowa

Since 1995, large-scale gene discovery and mapping focused on disease gene mutation discovery and understanding, have been at the heart of research efforts in the CLCG and CBCB at the University of Iowa. Together with faculty in the UI Carver College of Medicine, advanced computational, mathematical and high-performance networking techniques have been developed and deployed to help isolate genes that carry normal and diseased functions of interest in humans and model organisms. To date, these methods have been used to identify hundreds of disease-causing gene mutations, as well as disease-associated genetic loci and predictive and/or diagnostic genomic markers of human disease. After presenting a broad overview of bioinformatics and computational genomics work at Iowa, some specific software solutions will be presented, including a recent phenotype-driven analysis of next-generation clinical sequencing data. Then, I will present a practical solution to the computationally demanding problem of identifying a class of genes known as Xenologs – genes found in a species, which arose from horizontal gene transfer. A multi-granularity parallel solution is presented which reduces the computation time of a comprehensive Xenolog search from years to hours of computational time. The talk will conclude with a discussion of unique challenges involved in bioinformatics training programs, and efforts at Iowa aimed at addressing this growing need.

Speaker Biography: Dr. Casavant has a B.S. in Computer Science (1982), and M.S. (1983) and Ph.D. (1986) in Electrical and Computer Engineering and has conducted extensive research and development in the design and analysis of parallel/distributed computing systems, algorithms, and tools. He is Professor of Electrical and Computer Engineering, Biomedical Engineering, Ophthalmology, and Genetics. He holds the Roy J. Carver, Jr. Chair in Bioinformatics and Computational Biology and has been Director of the UI Center for Bioinformatics and Computational Biology (CBCB) since its founding in 2002. Since 1994, he has been involved in research in computational methods and tools for Gene Discovery, Mapping, Transcriptome Analysis and Disease Gene Identification/Isolation. His research interests span the areas of computational aspects of genomics, molecular biology, and human genetics, as well as software engineering, high performance computing systems and networks.

Research Challenges in Green Computing, and Some Solutions

Ishfaq Ahmad
University of Texas at Arlington

Energy is one of the most valuable and scarce resources available to humanity, a significant portion of which is now being consumed to power up computers and their accessories. In particular, high-performance parallel and distributed computing systems, including data centers, supercomputers, clusters, real-time systems, embedded architectures, and grids not only consume considerable amounts of power but also require extensive air-conditioning. The explosive growth in computing is leading to rapidly increasing consumption of precious natural resources such as oil and coal, strengthening the looming danger of an energy shortage. The conversion of these resources to electricity results in carbon emissions that can negatively affect the environment, a threat that is continually escalating. However, energy saving usually comes at the expense of performance. Power-aware “green” computing requires a comprehensive and multi-disciplinary approach that involves myriad research challenges. In this talk, we give an overview of various research challenges encountered in green or energy-efficient computing. We also present our research activities pertaining to various aspect of energy and performance optimization in parallel and distributed computing environments. We propose multi-objective optimized scheduling algorithms and tools that can save energy while ensuring performance.

Combinatorial Algorithms for Switching Classes

Nathan Lindzey
Colorado State University

Given a graph G, a vertex switch on a vertex v results in a new graph where neighbors of v become non-neighbors of v, and the non-neighbors of v become neighbors of v. This operation gives rise to an equivalence relation over the set of all directed labeled graphs on n vertices. The equivalence class of G with respect to the switching operation is commonly referred to as G's switching class.

We show that switching classes can be used to asymptotically speed up several super-linear graph algorithms. The current techniques for speeding up graph algorithms are all somewhat involved insofar that they employ sophisticated pre-processing, data-structures, and/or assume the RAM model to achieve at most an O(log(n)) speedup for sufficiently dense graphs. Our methods are elementary and can result in super-polylogarithmic speed ups. In particular, we achieve better bounds for diameter, transitive closure, bipartite maximum matching, general maximum matching. The talk is intended for a general CS audience and will conclude with a few open problems in the area.