Sangmi Pallickara: research

My research interests are in the area of Big Data for the sciences, including issues related to storage, retrieval, analytics, metadata, provenance, and visualization.

For a complete list of publications, please visit the publications page.

Galileo
Galileo is a geospatial data storage system designed to provide efficient access to time-varying geospatial datasets collected by observational instruments. It is designed to support large-scale visualization and processing of these datasets. I designed the algorithms responsible for organizing and dispersing the data and for evaluating queries. Visit the project page.
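This page does not describe how Galileo organizes and disperses its data. As a minimal sketch of one common approach in geospatial storage, the code below encodes each observation's latitude and longitude as a geohash and assigns it to a storage node by hashing a fixed-length geohash prefix, so that observations from the same spatial cell land on the same node. The functions `geohash` and `node_for`, the SHA-1 placement hash, and the default prefix length are illustrative assumptions, not Galileo's actual algorithms.

```python
import hashlib

# Standard geohash alphabet (base32 without a, i, l, o).
_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash(lat: float, lon: float, precision: int = 6) -> str:
    """Encode a point as a geohash by alternately bisecting longitude and latitude."""
    lat_range = [-90.0, 90.0]
    lon_range = [-180.0, 180.0]
    chars, bits, n_bits = [], 0, 0
    use_lon = True  # geohash interleaves bits starting with longitude
    while len(chars) < precision:
        rng, value = (lon_range, lon) if use_lon else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if value >= mid:
            bits = (bits << 1) | 1
            rng[0] = mid  # keep the upper half of the interval
        else:
            bits = bits << 1
            rng[1] = mid  # keep the lower half of the interval
        use_lon = not use_lon
        n_bits += 1
        if n_bits == 5:  # every 5 bits become one base32 character
            chars.append(_BASE32[bits])
            bits, n_bits = 0, 0
    return "".join(chars)


def node_for(lat: float, lon: float, num_nodes: int, prefix_len: int = 4) -> int:
    """Hypothetical placement: hash the geohash prefix so one spatial cell maps to one node."""
    cell = geohash(lat, lon, prefix_len)
    digest = hashlib.sha1(cell.encode("ascii")).digest()
    return int.from_bytes(digest[:8], "big") % num_nodes
```

In this kind of scheme, a shorter prefix places larger regions on each node, which keeps spatial range queries local but can concentrate load, while a longer prefix spreads data more evenly at the cost of touching more nodes per query.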

ADDS
The Atmospheric Data Discovery Network Service (ADDS) enables discovery of observational binary datasets managed by multiple data hosting services. Organizations package and publish these datasets at regular intervals. ADDS parses the binary datasets to generate metadata, which allows random access to specific portions of the published data. ADDS provides programmable query interfaces for automating discovery, and it supports both BUFR, the binary format that is the World Meteorological Organization's standard for observational data, and netCDF, a format often used to encode the outputs of simulation models. This research is a collaborative effort with CIRA at Colorado State University and UCAR (University Corporation for Atmospheric Research) in Boulder.
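The metadata format ADDS generates is not described here. As a minimal sketch of the general idea of enabling random access into a packaged binary file, the code below scans a byte buffer for BUFR messages and records each message's offset, length, and edition, so a later query can read a single message without parsing the whole file. It assumes BUFR edition 3 or 4 messages, whose Section 0 holds the four octets "BUFR", a three-octet total message length, and a one-octet edition number, and which end with the "7777" end section. The names `BufrEntry`, `index_bufr`, and `read_message` are hypothetical, and a real indexer would also decode later sections to record time and location metadata.

```python
from typing import List, NamedTuple


class BufrEntry(NamedTuple):
    offset: int   # byte offset of the "BUFR" marker within the buffer
    length: int   # total message length in octets, read from Section 0
    edition: int  # BUFR edition number, read from Section 0


def index_bufr(data: bytes) -> List[BufrEntry]:
    """Record where each well-formed BUFR message starts and how long it is."""
    entries = []
    pos = data.find(b"BUFR")
    while pos != -1 and pos + 8 <= len(data):
        length = int.from_bytes(data[pos + 4:pos + 7], "big")
        edition = data[pos + 7]
        end = pos + length
        # Accept the candidate only if it ends with the "7777" end section.
        if length >= 8 and end <= len(data) and data[end - 4:end] == b"7777":
            entries.append(BufrEntry(pos, length, edition))
            pos = data.find(b"BUFR", end)
        else:
            # False match (e.g. "BUFR" inside payload bytes): keep scanning.
            pos = data.find(b"BUFR", pos + 1)
    return entries


def read_message(data: bytes, entry: BufrEntry) -> bytes:
    """Random access: slice out one message using the recorded offset and length."""
    return data[entry.offset:entry.offset + entry.length]
```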

Past research projects
SWARM
Swarm was a meta-scheduling framework designed to alleviate inefficiencies in the batch queues typically used by high-throughput computing systems in Grid environments. Swarm interoperated with both the Condor and Globus frameworks commonly found in these environments and could manage the execution of millions of jobs with uneven workloads. Swarm was used successfully to manage large-scale genome sequencing tasks, some of which had very long running times. Depending on their expected workloads, jobs were scheduled either on smaller local clusters or on highly parallelized Grid clusters.
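The page does not say how Swarm estimated expected workloads or where it drew the line between local and Grid execution. As a hypothetical illustration of workload-based routing, the sketch below sends each job to a local cluster or to a Grid cluster by comparing its expected runtime against a threshold. `Job`, `route_jobs`, the runtime estimate, and the one-hour default threshold are assumptions for illustration, not Swarm's actual policy or interface.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Job:
    job_id: str
    expected_runtime_s: float  # assumed to be estimated from earlier runs


def route_jobs(jobs: List[Job], threshold_s: float = 3600.0) -> Dict[str, List[str]]:
    """Hypothetical policy: short jobs run locally, long jobs go to the Grid."""
    plan: Dict[str, List[str]] = {"local": [], "grid": []}
    for job in jobs:
        target = "grid" if job.expected_runtime_s > threshold_s else "local"
        plan[target].append(job.job_id)
    return plan
```

One rationale for such a split is that short jobs can spend longer waiting in Grid batch queues than actually running, so keeping them local avoids that overhead, while long-running jobs benefit from the Grid's parallelism.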

LEAD (Linked Environments for Atmospheric Discovery)
The NSF-funded LEAD project for tornado prediction makes research resources available to researchers and students, including atmospheric data from observational devices, forecasting models, and analyses. As part of this effort, I designed the MyLead data cataloging system, which provides programmable cataloging of the input, output, and intermediate data used during the execution of large scientific workflows for tornado prediction. Because these workflows span multiple organizations, data accesses often cross administrative boundaries, and trust issues must be resolved. I devised the TrustCell model, which established end-to-end trust relationships before data was accessed. It relied on hierarchical trust relationships, constructed from local and global trust associations, to provide a measure of the trustworthiness of each data access.

CAROUSEL
The Carousel project focused on developing an environment that supports ubiquitous access to real-time collaborative applications in Grid settings. Supported devices included portable devices, such as 3G smartphones and 802.11b-equipped PDAs, as well as conventional desktop PCs. As part of this project, I designed a data pipelining architecture that was formally verified using Petri nets. I also developed a protocol for reliable communication among these pervasive devices in wireless settings.
