High-dimensional genomic studies and multiplicity
Multivariate, high-dimensional data allow scientists to ask many questions. To appropriately evaluate the significance of the patterns that emerge, one has to account for the look-everywhere effect or for the data-driven selections.
The false discovery rate appears to be an appropriate criterion for global error, and we are interested in its adaptation to genomic settings. Multivariate linear models are often a powerful first step in understanding the dependence structure of multiple variables: one of the problems we tackle is how to carry out model selection in this context in the presence of a large number of genomic explanatory variables.
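As a point of reference for the false discovery rate criterion mentioned above, here is a minimal sketch of the classical Benjamini-Hochberg step-up procedure. It illustrates the generic, textbook procedure only, not the genomic adaptations we work on; the function name `benjamini_hochberg` and the parameter `q` (the target FDR level) are chosen for this example.

```python
def benjamini_hochberg(pvalues, q=0.05):
    """Classical Benjamini-Hochberg step-up procedure.

    Returns a list of booleans, one per input p-value, that is True
    where the corresponding hypothesis is rejected at FDR level q.
    """
    m = len(pvalues)
    if m == 0:
        return []

    # Indices of the p-values, ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: pvalues[i])

    # Find the largest rank k (1-based) such that p_(k) <= k * q / m.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank * q / m:
            k_max = rank

    # Reject the k_max hypotheses with the smallest p-values.
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected
```

Because the procedure is step-up, a hypothesis can be rejected even when its own p-value exceeds its rank-specific threshold, as long as a larger p-value further down the ordering meets its threshold.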
Multivariate phenotypes - GTEx
Current genomics data often contain information on a large number of phenotypes: how can we capitalize on this? We are involved in a large study of endophenotypes for bipolar disorder, which has motivated methods development.
We are also receiving funding from GTEx to develop statistical methods to identify eQTL with high sensitivity and at a low false positive rate, across multiple tissues. This is a collaboration with the group of Eleazar Eskin at UCLA.
Resequencing studies and identification of functional variants
Thanks to the decrease in sequencing costs, we can acquire a comprehensive picture of genomic variation. How can statistical methods help to identify variants that are more likely to be functional?
DNA copy number variants reconstruction
Raw data from high-density genotyping arrays and resequencing can be used to
reconstruct DNA copy number. We have been involved both in method
development and data analysis projects.
Association mapping
We are generally interested in association mapping.
We have contributed to the development of
a Bayesian method for haplotype mapping. We have been quite
interested in the problems of multiple comparisons in association genome scans, and have worked on approaches to account for hidden population structure.
Linkage disequilibrium
I have been interested for a long time in how to measure linkage
disequilibrium and in the variations of LD across the genome and
across populations.
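For readers unfamiliar with how linkage disequilibrium is quantified, the sketch below computes three standard pairwise measures (D, D', and r^2) for two biallelic loci. These are the textbook definitions, given for orientation only; the function name `ld_measures` and its arguments (the haplotype frequency `p_ab` of alleles A and B occurring together, and the allele frequencies `p_a` and `p_b`) are assumptions of this example.

```python
def ld_measures(p_ab, p_a, p_b):
    """Standard pairwise LD measures for two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A and allele B
    p_a:  frequency of allele A at the first locus
    p_b:  frequency of allele B at the second locus
    Returns the tuple (D, D_prime, r_squared).
    """
    # D is the deviation of the haplotype frequency from independence.
    d = p_ab - p_a * p_b

    # D' rescales D by its maximum attainable magnitude given the
    # allele frequencies; the bound depends on the sign of D.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = d / d_max if d_max > 0 else 0.0

    # r^2 is the squared correlation between the two allele indicators.
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r_squared = d * d / denom if denom > 0 else 0.0

    return d, d_prime, r_squared
```

With these definitions D' ranges over [-1, 1] and r^2 over [0, 1]; both equal 0 when the loci are independent, and the different measures can rank the same pair of loci quite differently, which is part of what makes the choice of measure interesting.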
High density SNP genotyping
We developed models for intensity values of the Affymetrix and
Illumina genotyping arrays to be used in genotype calls, linkage studies, and loss of heterozygosity studies. In general, we are interested in understanding the measurement error associated with novel technologies.
Gene regulation networks
To recover the dynamic behavior of regulatory proteins and their
pathway of influence on cell behavior, we have combined
sequence analysis with results from gene expression arrays.
We developed a sparse hidden component model to link transcription
factor activity to gene expression.
Most recently, we have been interested in incorporating measurements of methylation levels in our models.
Gene expression array denoising
Gene expression arrays represent a formidable tool, as they allow
investigation of thousands of genes at the same time. However, in order
to best exploit their potential, one has to be able to deal successfully with the statistical issues involved in their analysis.
We have suggested a de-noising approach based on thresholding.
Using a Bayesian hierarchical model and an approach to multiple
comparisons that is inspired by the False Discovery Rate, we denoise the signal coming from multiple array experiments with the specific goal of identifying the genes that are up-regulated or down-regulated in a given condition.
Genomic scale identification of promoter binding sites
One of the best understood mechanisms of transcription regulation is the action of regulatory proteins: by binding to the upstream region of a gene, they act as either promoters or suppressors.
We have developed a stochastic dictionary model to identify the position of known binding sites on a genome-wide scale. We use this information to improve the clustering of array experiments and to reconstruct the regulatory network.
Our model organism for these investigations has been E. coli.
High Throughput Screens
In collaboration with Koppany Visnyei and Harley Kornblum we developed methods for the analysis of high-throughput screen
data. Denise Ferrari has put together an R
software package that implements our suggested pre-processing.
