Recent Research and Publication Summary
- Purna Gamage, Souparno Ghosh, Philip Gipson, Greg Pavur. “A Model Based Data Fusion Algorithm using Bayesian Hierarchal Modeling for Density Estimation of Rare Species,” Thesis Publication.
Estimating the relative abundance of a species is one of the most important problems in ecology. Traditionally, such estimates are obtained using capture-mark-recapture methodologies. Non-invasive procedures, such as camera trap surveys, have also been used extensively. However, such methodologies are not efficient when the focal species is relatively rare and exhibits cryptic behavior. When species are endangered, non-invasive techniques may be essential to avoid further threats to the survival of individuals. Over the past decade, scent detection dogs have been extensively trained to identify the scats of focal species, and they have been used in scat surveys to assess the occurrence of those species in particular geographical regions. Besides detecting presence, relative abundance can also be estimated from DNA analysis of the collected scats. However, a scat detection-dog survey followed by DNA analysis is very expensive.
Camera traps can be used to cover a large area, so the dogs can then be deployed in the high-density areas identified by the camera trap survey. In this study, a data fusion technique is developed to combine camera trap and scat surveys to draw inference on the relative abundance of the target species. The major challenge lies in developing a coherent model that can handle the discrete sampling protocol induced by camera traps and the continuous search paths of scat surveys. To combine these two types of data sources, a Bayesian hierarchical extension of the spatial capture-recapture (SCR) method was used, one specifically developed to perform inference on the abundance of unmarked or partially marked populations.
This extension is needed because the standard form of the spatially explicit capture-recapture (SECR) model cannot be used on camera-trapping data: it requires unique identification of all individuals, which is unattainable. Such identification is not possible in many situations, as physically capturing individuals is too intrusive for a vulnerable population and the poor quality of the photographs taken in a camera-trapping survey prevents reliable identification.
This non-invasive Bayesian approach produces density estimates of a population with relatively high accuracy when individuals cannot be uniquely identified or detected with certainty. Since this methodology completely avoids the need to capture, mark, and recapture members of such populations, it suits studies of rare and fragile populations. The standard SECR method was used on the scat detection-dog survey data to estimate density, since DNA analysis of the scats identified individuals uniquely. The model is applied to estimate the density of swift foxes in a west Texas study conducted during seasons or stages of the life cycle associated with breeding. The study was carried out by the Swift Fox Team in the Natural Resource Management Department in collaboration with the Mathematics and Statistics Department at Texas Tech University.
Publications In Preparation
- Anderson Monken1, Flora Haberkorn1, Uma Krishnaswamy1, Purna Gamage2, Feras Batarseh3 (1- Federal Reserve Board of Governors; 2- Georgetown University; 3- Virginia Polytechnic Institute and State University), “Harnessing AI Methods to Improve Multi-Country Macroeconomic Forecasting.”
We will harness the power of neural networks to explore the interdependency of country-level macroeconomic indicators and improve forecasting. Our recent work on graph neural networks for international trade demonstrates that network-based analysis of economic data can yield strong results. We plan to use a variety of macroeconomic indicators to predict GDP/CPI using a spatiotemporal graph neural network that applies neighborhood effects, so that weak GDP in one country can affect its close neighbors. This prediction will be performed jointly, so that we optimize a single model that predicts macroeconomic indicators for all countries in the dataset. Comparisons with the performance of dynamic factor models (DFMs) and traditional forecasting techniques will show the benefits of this AI-based approach.
- Purna Gamage1, Adam Imran1, Patrick Miguel Aquino1, Miao Wang1, Beixuan Jia1, Yunfei Zhang1 (1- Georgetown University), “Predicting the risk levels of diseases caused by heavy metal exposure in the US. (Using NHANES data from multiple years).”
As heavy metal usage becomes increasingly prevalent and unavoidable in every industry, it is paramount to understand exactly what health consequences are associated with it. The wide distribution of heavy metals in the environment, due to industrialization and large-scale farming, is a serious concern because of their toxicity to humans and other forms of life. These toxic metals affect the environment by circulating through the food chain and eventually accumulating in it. Diseases such as anemia, chronic kidney disease (CKD), and cardiovascular disease (CVD) are major health concerns worldwide, and environmental exposure is a contributory factor for them. Heavy metals (e.g. arsenic, lead, mercury) in pesticides are often thought of as contributors to the development of these diseases. Exposure to heavy metals can occur through diet, environment, medication, or as a result of work or play, and the metals can enter the body through inhalation, ingestion, or the skin.
The framework laid out in this paper shows which metals are highly correlated with certain health conditions, specifically anemia, chronic kidney disease (CKD), and cardiovascular disease (CVD), and implements a prediction model that can predict the risk levels of developing these diseases according to the amount of heavy metals in urine or blood. Poisoning by heavy metals can cause a variety of hematological disorders. Anemia is a hematological disorder most commonly seen in children and young females. Common symptoms of anemia are tiredness, lethargy, and pale skin. The heavy metals most commonly associated with hematologic toxicity are arsenic and its derivative arsine, copper, gold, lead, and zinc. Exposure to arsenic generally causes megaloblastic anemia, whereas lead and cadmium cause hypochromic microcytic anemia. Iron deficiency anemia is a common form of anemia around the world that increases the absorption of other elements such as lead (Pb) and cadmium (Cd). Therefore, in patients with hypochromic microcytic anemia, the serum levels of these elements may increase, causing the anemia to deteriorate.
CVD is the leading cause of death in the world. Risk factors for CVD are age, male gender, positive family history, smoking, and lack of physical activity. In addition to these traditional risk factors, environmental exposure plays a major role. The potential association between CVD and chronic exposure to heavy metals such as arsenic, lead, cadmium, and mercury has been less well defined. Heavy metal exposure impairs antioxidant metabolism and causes oxidative stress, which could be a potential mechanism leading to increased CVD.
CKD is a common progressive disease that is typically characterized by the permanent loss of functional nephrons. As CKD progresses, the glomerular filtration rate decreases, and the remaining nephrons are unable to effectively eliminate metabolic wastes and environmental toxicants from the body. Common causes of CKD are diabetes and hypertension. However, nephrotoxicity from environmental toxins such as heavy metals and agrochemicals, leading to CKD, is a major health concern in certain parts of the world. Chronic exposure to arsenic, cadmium, and mercury is linked to the development of CKD.
We propose a series of statistical models (logistic regression, LDA, LASSO/ridge regression, and support vector machines) and will also use other machine learning tools. All of the methods use NHANES laboratory data cleaned through missing-value imputation, synthetic minority oversampling, and collinearity analysis, with the aim of predicting the likelihood that someone has a disease based on the amount of time they have been exposed to certain metals. Cross-validation and the bootstrap are used to validate the model and its standard errors. Prior research indicates that arsenic, cadmium, lead, copper, and mercury are strongly correlated with CKD and CVD. Since these metals are widely spread in the environment and exposure to one or more of them is unavoidable, it is important to understand the way in which they affect these diseases. Hence, it would be highly beneficial to implement a prediction model that can predict elevated risk of developing these diseases according to heavy metal exposure.
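As a rough illustration of the bootstrap step mentioned above, the following minimal sketch estimates the standard error of a sample mean by resampling with replacement and compares it with the textbook formula. The data are synthetic stand-ins for a single laboratory measurement, and the names `bootstrap_se` and `blood_lead` are hypothetical; this is not the pipeline used in the paper, only one simple instance of the technique.

```python
import random
import statistics


def bootstrap_se(values, statistic, n_boot=1000, seed=0):
    """Estimate the standard error of `statistic` by the nonparametric bootstrap.

    Each replicate draws len(values) observations with replacement and
    recomputes the statistic; the spread of the replicates is the estimate.
    """
    rng = random.Random(seed)
    n = len(values)
    replicates = []
    for _ in range(n_boot):
        resample = [values[rng.randrange(n)] for _ in range(n)]
        replicates.append(statistic(resample))
    return statistics.stdev(replicates)


# Synthetic, right-skewed values standing in for a metal concentration
# (assumption: illustrative only, not NHANES data).
data_rng = random.Random(42)
blood_lead = [data_rng.lognormvariate(0.0, 0.5) for _ in range(200)]

se_boot = bootstrap_se(blood_lead, statistics.mean, n_boot=2000, seed=1)
se_formula = statistics.stdev(blood_lead) / len(blood_lead) ** 0.5

print(f"bootstrap SE of the mean: {se_boot:.4f}")
print(f"formula   SE of the mean: {se_formula:.4f}")
```

For the mean the two numbers should agree closely; the value of the bootstrap is that the same function can be passed a statistic with no simple formula, such as a median or a model coefficient.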
- Geethanjalee Mudunkotuwa1, Dr. Leif Ellingson1, Dr. Purna Gamage2, Dr. Dushani Palliyaguru3, 1- Texas Tech University, 2 – Georgetown University, 3 – National Institute on Aging – NIH, “Predicting Biological Age using Biomarker data from publicly available human health data sets by using Machine Learning techniques.”
The average life span of people has increased over time. At present, average life expectancy is 60 years or more. By 2050, the world’s population aged 60 years and older is expected to total 2 billion, up from 900 million in 2015. The rate of population aging has increased significantly all over the world. Population aging, the shift in the distribution of a country’s population toward older ages, started in high-income countries, and low- and middle-income countries are now experiencing a noticeable change. By the middle of the century, countries such as China, Chile, the Islamic Republic of Iran, and the Russian Federation are expected to have a proportion of older people similar to that of Japan. The increase in life expectancy will pave the way for many opportunities, such as further education, new careers, and new ways to contribute to families. However, the extent and contribution of these opportunities will depend heavily on the health of the older population.
“Aging can be defined as progressive physiological changes in an organism that lead to senescence (old age), or a decline of biological functions and of the organism’s ability to adapt to metabolic stress” (Source: Encyclopedia Britannica). According to the National Institute on Aging (NIH), aging goes along with gradual changes in most systems of the body. Understanding the cellular and molecular processes underlying these changes, as well as those accompanying age-related diseases, is the focus of research on the biology of aging.
At present, the world is experiencing the effects of increasing human life expectancy and a growing aged population. Aging is defined as the gradual functional and structural decline of an organism, which results in an increasing risk of disease, impairment, and mortality over the life span. To better assess an individual’s rate of aging, new approaches should be developed that provide more power than chronological age.
The idea of age-related biological changes was first proposed by Alex Comfort in 1969. It is believed that age-related biological changes can be quantified through the identification and measurement of biomarkers of aging. A biomarker is a measurable substance in an organism whose presence is indicative of some phenomenon such as disease, infection, or environmental exposure. So far, a noticeable amount of work has been carried out on identifying and measuring biomarkers related to aging in humans as well as animals. However, the literature suggests that no single biomarker measures the rate of aging accurately. According to the literature, using multiple biomarkers, or merging multiple biomarkers into a single latent variable, may explain the complex aging process better.
Even though many studies have been carried out on measuring biological aging using biomarkers, there is no general agreement on the methods or their validity. Past studies suggest many mathematical and statistical methods, such as Multiple Linear Regression (MLR), Principal Component Analysis (PCA), and the Klemera and Doubal Method (KDM). However, only a limited number of studies have used machine learning techniques such as regression trees, random forests, and artificial neural networks (ANN) to predict biological aging using biomarkers. Therefore, this study focuses specifically on using machine learning techniques to predict biological age.
According to the literature, comparing and validating the above methods is difficult because the true value of biological age cannot be measured directly. Therefore, common criteria should be used to evaluate the reliability and validity of these methods. The biomarkers that describe biological age (BA) should satisfy the following criteria.
a) A biomarker should be able to predict biological age better than chronological age does.
b) Biomarkers should be able to predict remaining life span and disease-specific mortality in a population in which 90% of individuals are still alive.
c) The method of measurement should not affect life expectancy or any future age-related measurements.
Considering the above criteria, this study focuses on two areas.
1. Predicting biological age using traditional statistical methods such as MLR.
2. Predicting biological age using machine learning techniques such as regression trees and artificial neural networks (ANN).
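To make the first area concrete, here is a minimal sketch of one common MLR formulation: chronological age is regressed on a set of biomarkers, and the fitted value for each person is read as an estimate of biological age. The two biomarkers, their simulated relationships to age, and all function names are hypothetical assumptions for illustration; the study's actual biomarkers and model specification are not stated here.

```python
import random


def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_mlr(rows, targets):
    """Ordinary least squares via the normal equations; intercept comes first."""
    design = [[1.0] + list(r) for r in rows]
    p = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)] for i in range(p)]
    xty = [sum(d[i] * t for d, t in zip(design, targets)) for i in range(p)]
    return solve_linear_system(xtx, xty)


def predict(coefs, row):
    return coefs[0] + sum(c * v for c, v in zip(coefs[1:], row))


# Synthetic cohort (assumption): two made-up biomarkers that drift with age.
rng = random.Random(7)
ages = [rng.uniform(20, 80) for _ in range(300)]
biomarkers = [
    (90 + 0.5 * age + rng.gauss(0, 8),       # pressure-like marker, rises with age
     4.8 - 0.01 * age + rng.gauss(0, 0.2))   # protein-like marker, falls with age
    for age in ages
]

coefs = fit_mlr(biomarkers, ages)
biological_age = [predict(coefs, row) for row in biomarkers]

print("coefficients (intercept, marker 1, marker 2):", [round(c, 3) for c in coefs])
print("first person: chronological", round(ages[0], 1),
      "estimated biological", round(biological_age[0], 1))
```

A person whose fitted value exceeds their chronological age would be read as biologically older than their peers under this formulation. The machine learning methods in the second area would replace `fit_mlr` with a more flexible model.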
- Purna Gamage, Souparno Ghosh, Philip Gipson, Greg Pavur. “Comparing precision of population density estimates using non-invasive survey methodology,” Computational and Mathematical Methods – Wiley Online Library.
- Greg Pavur, Philip Gipson, John Baccus, Purna Gamage, Souparno Ghosh, Colton Laws, Manuel Deleon. “Status of swift foxes (Vulpes velox) in Texas,” The Southwestern Naturalist.
The Massive Data Institute (MDI) Grant:
- Submitted grant proposals at the McCourt School of Public Policy for research seed grants and data training grants (Each proposal is attached herewith).
- Won the Data Training Grant but the Research Seed Grant was rejected.
Research Seed Grant:
Analysis of heavy metal exposure in the human body for the US population, and prediction of the association between diseases and heavy metals (using NHANES data from multiple years). Even though this grant proposal was rejected, the problem is important and interesting and needs further investigation. Therefore, I am writing a paper on this research problem.
Data Training Grants:
The MDI Steering Committee has approved my proposal for an MDI training grant of $3,300 to attend the Strata Data Conference. The committee believes my efforts can help propel Georgetown forward in the advanced use of cutting-edge tools in quantitative methods and data science. The intent is that these funds will put me in a position to apply for grants in the future, and the committee looks forward to hearing more details on where I apply and how that goes.