ABSTRACT
PRINCIPAL INVESTIGATOR: Mingzhou Song
PROJECT TITLE: Genome-wide nonparametric functional dependency studies across ethnic populations
PROGRESS TOWARDS GOAL: Aim 1: To capture complex genotype-phenotype interaction patterns, we developed nonparametric methods, which impose no predefined mathematical form on interactions. We tested the working hypothesis that functional dependencies can reduce false positives relative to alternatives such as the non-directional Pearson's chi-square test. To validate our approach, we used data from the Personal Genome Project and tested genes carrying biologically validated, disease-specific DNA variants. We have also developed an exact functional test for small-sample data to overcome the limitations of asymptotic tests. We will apply the method to the American Indian diabetes data, and we will analyze the BioVU data to derive significant SNP-to-cancer dependencies. For each population group and each disease type, our method selects the top-ranking SNPs that have potential causal effects on the disease. Results will be compared with the literature and with resources such as GWASdb. Novel conserved SNPs will be identified when they are consistently present across populations.
Aim 2: Epistasis refers to dependent interactions among SNPs, genes, or loci that exert non-additive control over the disease phenotype. Most approaches greedily select one factor at a time to identify epistatic effects. These approaches are computationally efficient but overlook certain combinatorial interaction patterns such as exclusive-or, in which neither factor alone appears to affect the phenotype, yet the two factors together determine it precisely. Meanwhile, non-greedy approaches often make strong assumptions about the interaction model, e.g., multiple linear regression. We are testing the working hypothesis that a SNP that is by itself a weak indicator of cancer may still act in concert with another SNP to predict disease phenotype in a specific environmental context.
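The exclusive-or case can be illustrated with a short Python sketch. It assumes that FUNCHISQ scores a parent X and a child Y with the functional chi-square statistic, taken here to be the sum over parent values x of the chi-square departure of Y from uniformity within X = x, minus the chi-square departure of the marginal of Y from uniformity. The helper names (`nonuniformity`, `functional_chisq`, `contingency`) and the simulated genotypes are hypothetical and are not the project's implementation.

```python
from itertools import product


def nonuniformity(counts):
    """Chi-square departure of a count vector from the uniform distribution."""
    total = sum(counts)
    if total == 0:
        return 0.0
    expected = total / len(counts)
    return sum((c - expected) ** 2 / expected for c in counts)


def functional_chisq(table):
    """Assumed functional chi-square for parent (rows) -> child (columns):
    sum over parent values of chi2(child | parent = x), minus chi2(child).
    This definition is an assumption, not taken from the project's code."""
    col_sums = [sum(col) for col in zip(*table)]
    return sum(nonuniformity(row) for row in table) - nonuniformity(col_sums)


def contingency(parent, child):
    """Cross-tabulate two equal-length lists of discrete values."""
    rows, cols = sorted(set(parent)), sorted(set(child))
    table = [[0] * len(cols) for _ in rows]
    for p, c in zip(parent, child):
        table[rows.index(p)][cols.index(c)] += 1
    return table


# Hypothetical data: 25 subjects per genotype combination, phenotype = snp_a XOR snp_b
snp_a, snp_b, phenotype = [], [], []
for a, b in product([0, 1], repeat=2):
    for _ in range(25):
        snp_a.append(a)
        snp_b.append(b)
        phenotype.append(a ^ b)

# One SNP at a time (what a greedy screen sees) versus the joint genotype of the pair
single_snp_stat = functional_chisq(contingency(snp_a, phenotype))
snp_pair_stat = functional_chisq(contingency(list(zip(snp_a, snp_b)), phenotype))
print(single_snp_stat)  # 0.0
print(snp_pair_stat)    # 100.0
```

Each SNP alone scores 0 and would be dropped by a greedy one-factor-at-a-time screen, whereas the joint genotype of the pair (four levels, so (4 - 1)(2 - 1) = 3 degrees of freedom) determines the phenotype exactly. Screening all pairs of m SNPs requires m(m - 1)/2 such tables, which is why the pairwise search dominates the computational cost.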
Studying the combinatorial effects of multiple factors requires large sample sizes, which the recent Personal Genome Project makes possible; it also records environmental factors for each subject. We will evaluate the capacity of FUNCHISQ (the functional chi-square test) on published epistatic studies. We will then first apply the method to identify epistatic effects of SNPs in the American Indian diabetes data set. Next, we will screen the BioVU data set for significant dependencies of cancer on epistatic SNPs and on environmental factors acting through modification of the epigenome. We will select top-ranking SNP pairs that jointly have potential causal effects on phenotypes, and novel epistatic interactions will be identified by comparison with the literature. Because of the huge number of SNP pairs being screened, the computational cost is expected to be much higher than for single-SNP interactions. To cope with the exponentially growing search space for multiple SNPs, we will use parallel computing to expedite the massive number of combinatorial FUNCHISQ calculations.
Aim 3: This aim determines whether the same type of cancer can be caused by different genetic factors in different ethnic groups. We will test the working hypothesis that some cancer-specific SNPs differ between ethnic populations. We have developed comparative functional dependency methods to reveal both ethnic- and cancer-specific genotypic signals that might be lost if data were pooled for cancer SNP identification. Here we extended CPχ2 (the comparative chi-square test) to account for interaction directionality: when computing interaction heterogeneity, we use FUNCHISQ statistics instead of Pearson's chi-square statistics. In CPχ2, the parents selected for an interaction may have no causal effect on the child, which makes the subsequent comparison meaningless. Hence we expect that requiring functional dependencies may reduce type I errors. The BioVU project has relatively large sample sizes for different ethnic groups, enough to carry out this comparison.
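The heterogeneity computation can be sketched as follows, assuming that the extended CPχ2 keeps the classical heterogeneity decomposition (the sum of per-population statistics minus the statistic of the pooled table) and simply substitutes the functional chi-square, defined as in the code comments, for Pearson's chi-square. The degrees of freedom shown, (K - 1)(r - 1)(s - 1) for K populations with r genotypes and s phenotype levels, follow from that assumption. Function names and counts are hypothetical.

```python
def nonuniformity(counts):
    """Chi-square departure of a count vector from the uniform distribution."""
    total = sum(counts)
    if total == 0:
        return 0.0
    expected = total / len(counts)
    return sum((c - expected) ** 2 / expected for c in counts)


def functional_chisq(table):
    """Assumed functional chi-square for rows (parent) -> columns (child):
    sum of per-row departures from uniform minus the child's marginal departure."""
    col_sums = [sum(col) for col in zip(*table)]
    return sum(nonuniformity(row) for row in table) - nonuniformity(col_sums)


def functional_heterogeneity(tables):
    """Hypothetical heterogeneity statistic: per-population statistics summed,
    minus the statistic of the pooled (cell-wise summed) table."""
    rows, cols = len(tables[0]), len(tables[0][0])
    pooled = [[sum(t[i][j] for t in tables) for j in range(cols)] for i in range(rows)]
    total = sum(functional_chisq(t) for t in tables)
    dof = (len(tables) - 1) * (rows - 1) * (cols - 1)
    return total - functional_chisq(pooled), dof


# Hypothetical counts. Rows: three genotypes; columns: control, case.
pattern = [[30, 0], [0, 30], [0, 30]]
reversed_pattern = [[0, 30], [30, 0], [30, 0]]

conserved_h, conserved_df = functional_heterogeneity([pattern, pattern])
specific_h, specific_df = functional_heterogeneity([pattern, reversed_pattern])
print(conserved_h, conserved_df)  # 0.0 2
print(specific_h, specific_df)    # 160.0 2
```

When both populations share the same genotype-phenotype pattern, the heterogeneity is 0. When the pattern reverses between populations, each population alone has a functional chi-square of 80, yet the pooled table scores 0; this is the kind of population-specific signal that pooling would hide.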
The comparative strength of SNPs will be evaluated between European American and African American samples for all eight cancer types, and between European American and Hispanic samples and between European American and Asian samples for breast and colorectal cancers. Both conserved and population-specific SNP-phenotype interactions will be reported. Sample sizes for minority groups are small because of smaller population sizes and cultural attitudes toward medical research.
List of significant results:
1. Developed nonparametric methods that impose no predefined mathematical form on interactions.
2. Tested the working hypothesis that functional dependencies can reduce false positives relative to alternatives such as the non-directional Pearson's chi-square test.
3. Tested genes with biologically validated, disease-specific DNA variants, using data from the Personal Genome Project, to validate our approach.
4. Developed an exact functional test for small-sample data.
5. Developed comparative functional dependency methods to reveal both ethnic- and cancer-specific genotypic signals.
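For the exact functional test listed in item 4, one way to obtain an exact p-value for small samples is to condition on both row and column sums, enumerate every table with those margins, weight each by its multivariate hypergeometric probability (as in the Fisher-Freeman-Halton exact test), and sum the probabilities of tables whose functional chi-square is at least the observed value. Whether the project's exact functional test conditions on the same margins is an assumption here, as is the functional chi-square definition in the code comments; all function names are hypothetical.

```python
from fractions import Fraction
from math import factorial, prod


def nonuniformity(counts):
    """Chi-square departure of a count vector from the uniform distribution."""
    total = sum(counts)
    if total == 0:
        return 0.0
    expected = total / len(counts)
    return sum((c - expected) ** 2 / expected for c in counts)


def functional_chisq(table):
    """Assumed functional chi-square for rows (parent) -> columns (child)."""
    col_sums = [sum(col) for col in zip(*table)]
    return sum(nonuniformity(row) for row in table) - nonuniformity(col_sums)


def row_fillings(total, capacities):
    """Yield every way to split `total` counts across columns without exceeding capacities."""
    if len(capacities) == 1:
        if total <= capacities[0]:
            yield [total]
        return
    for v in range(min(total, capacities[0]) + 1):
        for rest in row_fillings(total - v, capacities[1:]):
            yield [v] + rest


def tables_with_margins(row_sums, col_sums):
    """Yield every non-negative integer table with the given row and column sums."""
    if len(row_sums) == 1:
        # The last row is forced to equal the remaining column capacities
        yield [list(col_sums)]
        return
    for row in row_fillings(row_sums[0], col_sums):
        remaining = [c - v for c, v in zip(col_sums, row)]
        for rest in tables_with_margins(row_sums[1:], remaining):
            yield [row] + rest


def table_probability(table):
    """Multivariate hypergeometric probability of a table given its margins (exact fraction)."""
    row_sums = [sum(r) for r in table]
    col_sums = [sum(c) for c in zip(*table)]
    numerator = prod(factorial(x) for x in row_sums + col_sums)
    denominator = factorial(sum(row_sums)) * prod(factorial(x) for r in table for x in r)
    return Fraction(numerator, denominator)


def exact_functional_test(table):
    """Return the observed statistic and its exact p-value under fixed margins (brute force)."""
    row_sums = [sum(r) for r in table]
    col_sums = [sum(c) for c in zip(*table)]
    observed = functional_chisq(table)
    p_value = Fraction(0)
    for candidate in tables_with_margins(row_sums, col_sums):
        # Small tolerance guards against floating-point ties
        if functional_chisq(candidate) >= observed - 1e-9:
            p_value += table_probability(candidate)
    return observed, float(p_value)


stat, p = exact_functional_test([[3, 0], [0, 3]])
print(stat, p)  # 6.0 0.1
```

Because both margins are fixed, the marginal term of the child is the same for every enumerated table, so tables are effectively ranked by their within-row departures from uniformity. The number of tables grows rapidly with sample size and table dimensions, so this brute-force enumeration is only practical as a reference for small samples.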