Title: Can My Program Be Fast, Energy-efficient, and Simple to Write? Abstract: The past decade saw dramatic and disruptive changes to the computing landscape, with the widespread development of handheld devices, and with power density and energy considerations becoming the primary constraints driving technology directions for embedded, mainstream, and high-end peta/exascale computing. Non-homogeneous CPU cores and increasingly complex Systems-on-Chip are on the roadmap of most manufacturers. In a word, computing platforms are now heterogeneous, after decades of mass marketing homogeneous single-core x86 processors. These changes have fundamentally impacted the application development cycle. First, the effort required to efficiently map an algorithm to a particular hardware platform has skyrocketed, due to the increasing complexity of both the target hardware and the software stack to be used. As a consequence, when facing sub-par performance, determining whether the algorithm or its implementation is at fault is a considerably time-consuming task, which in turn prevents scientists from quickly evaluating new algorithmic ideas and significantly increases the development cost of new consumer applications. Second, the number of application domains that effectively need to harness massive computing capabilities keeps increasing: medicine alone drives a high-impact demand, as automated and computer-assisted diagnosis, individual tumor-specific cancer treatment, and large-scale exploratory genome data mining have all become computing challenges. Our research must provide usable, practical solutions to address these challenges, eventually making scientific advances possible via high-performance computing. In this talk I will present some current and upcoming research to dramatically improve productivity and performance portability on modern computing devices, spanning application modeling, compiler optimization, and architecture design. 
We will see how a multidisciplinary approach is needed to reach the level of efficiency demanded to address a variety of computing challenges, with illustrations from several key applications such as in-situ automated lung tumor detection, energy-efficient camera image processing for handheld devices, and large-scale physics simulations using distributed computing.