|
High dimensional genomic studies and multiplicity
|
Multivariate, high-dimensional data allow scientists to ask many questions. To appropriately evaluate the significance of the patterns that emerge, one has to account for the look-everywhere effect or for the data-driven selections.
The false discovery rate appears to be an appropriate criterion for global error, and we are interested in its adaptation to genomic settings. Multivariate linear models are often a powerful first step in understanding the dependence structure of multiple variables: one of the problems we tackle is how to carry out model selection in this context in the presence of a large number of genomic explanatory variables.
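As background on the false discovery rate criterion mentioned above, here is a minimal sketch of the standard Benjamini-Hochberg step-up procedure. It is a generic textbook illustration, not the genomic adaptations developed in the papers listed here; the function name `benjamini_hochberg` and the level `q` are illustrative choices.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return the indices of hypotheses rejected by the BH step-up rule at level q.

    Generic illustration only; assumes the p-values are independent
    (or positively dependent) for the FDR guarantee to hold.
    """
    m = len(p_values)
    if m == 0:
        return []
    # Order the hypotheses by p-value, smallest first.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k (1-based) such that p_(k) <= k * q / m.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * q / m:
            k_max = rank
    # Reject the k_max hypotheses with the smallest p-values.
    return sorted(order[:k_max])
```

Because the rule is step-up, a hypothesis can be rejected even if its own p-value exceeds its rank threshold, as long as some larger p-value falls below the threshold for its rank.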
Sabatti, C., S. Service, and N. Freimer (2003) "False discovery rates in linkage and association linkage genome screens for complex disorders," Genetics 164: 829-833. PMID: 12807801
Brodsky, J. (2011) "Block, Pass, Score: A Multivariate Methodology for Genome-wide Association Studies", UCLA dissertation
Sabatti, C. (2013) "Multivariate linear models for GWAS," in Advances in Statistical Bioinformatics, K. Do, S. Qin, M. Vannucci, eds., Cambridge University Press. Preprint
Bogdan, M., E. van den Berg, C. Sabatti, W. Su, E. Candes (2014) "SLOPE -- Adaptive Variable Selection via Convex Optimization," arXiv:1407.3824
Peterson, C., M. Bogomolov, Y. Benjamini and C. Sabatti (2015) "Many Phenotypes without Many False Discoveries: Error Controlling Strategies for Multi-Traits Association Studies," arXiv:1504.00701
Stell, L. and C. Sabatti (2015) "Genetic variant selection: learning across traits and sites," arXiv:1504.00946
|
Multivariate phenotypes - GTEX
|
Current genomic data often contain information on a large number of phenotypes: how can we capitalize on this? We are involved in a large study of endophenotypes for bipolar disorder, which has motivated methods development.
We are also receiving funding from GTEx to develop statistical methods to identify eQTL with high sensitivity and at a low false positive rate, across multiple tissues. This is a collaboration with the group of Eleazar Eskin at UCLA.
Fears, S., S. Service, T. Teshiba, C. Araya, X. Araya, J. Bejarano, J. Gomez-Franco, B. Kremeyer, Z. Abaryan, I. Aldana, M. Ericson, M. Jalbrzkowski, J. Luykx, L. Navarro, N. Sharif, L. Altshuler, G. Bartzokis, J. Escobar, D. Glahn, J. Ospina-Duque, N. Risch, A. Ruiz-Linares, R. Cantor, C. Lopez-Jaramillo, G. Macaya, J. Molina, V. Reus, C. Sabatti, N. Freimer, and C. Bearden (2014) "Multi-system Component Phenotypes of Bipolar Disorder for Genetic Investigations of Extended Pedigrees" JAMA Psychiatry 71 : 375-87. PMID: 24522887
Peterson, C., M. Bogomolov, Y. Benjamini and C. Sabatti (2015) "Many Phenotypes without Many False Discoveries: Error Controlling Strategies for Multi-Traits Association Studies," arXiv:1504.00701
Peterson, C., M. Bogomolov, Y. Benjamini, C. Sabatti (2015) "TreeQTL: hierarchical error control for eQTL findings"
|
Resequencing studies and identification of functional variants
|
Thanks to the decrease in sequencing costs, we can acquire a comprehensive picture of genomic variation. How can statistical methods help identify variants that are more likely to be functional?
Service, S., T. Teslovich, C. Fuchsberger, V. Ramensky, P. Yajnik, D. Koboldt, D. Larson, Q. Zhang, L. Lin, R. Welch, L. Ding, M. McLellan, M. O'Laughlin, C. Fronick, L. Fulton, V. Magrini, P. Elliott, M. Jarvelin, M. Kaakinen, M. McCarthy, L. Peltonen, A. Pouta, L. Bonnycastle, F. Collins, N. Narisu, H. Stringham, J. Tuomilehto, S. Ripatti, R. Fulton, C. Sabatti, R. Wilson, M. Boehnke, and N. Freimer (2014) "Re-sequencing Expands Our Understanding of the Phenotypic Impact of Variants at GWAS Loci," PLoS Genetics 10: e1004147. PMID: 24497850
Stell, L. and C. Sabatti (2015) "Genetic variant selection: learning across traits and sites," arXiv:1504.00946
|
DNA copy number variants reconstruction
|
Raw data from high-density genotyping arrays and resequencing can be used to
reconstruct DNA copy number. We have been involved both in method
development and data analysis projects.
Wang, H., Y. Lee, S. Nelson, and C. Sabatti (2005) "Inferring genomic loss and location of tumor suppressor genes from high density genotypes," UCLA Stat preprint 423, Journal of the French Statistical Society, 146: 153-171.
Wang, H., J. Veldink, R. Ophoff, and C. Sabatti (2008) "Markov models for inferring Copy Number Variations from genotype data on Illumina platforms," UCLA Statistics Preprint #533 and Human Heredity, 68: 1-22.
Stefansson, H. et al. (2008) "Large recurrent microdeletions associated with schizophrenia," Nature 455: 232-6.
Vrijenhoek, T., J. Buizer-Voskamp, I. van der Stelt, E. Strengman, Genetic Risk and Outcome in Psychosis (GROUP) Consortium, C. Sabatti, A. van Kessel, H. Brunner, R. Ophoff, J. Veltman (2008) "Recurrent CNVs Disrupt Three Candidate Genes in Schizophrenia Patients," The American Journal of Human Genetics, 83: 504-510.
Zhang, Z., K. Lange, R. Ophoff, C. Sabatti (2010) "Reconstructing DNA copy number by penalized estimation and imputation," The Annals of Applied Statistics , 4: 1749-1773
Buizer-Voskamp JE, Muntjewerff JW; Genetic Risk and Outcome in Psychosis (GROUP) Consortium, Strengman E, Sabatti C, Stefansson H, Vorstman JA, Ophoff RA. (2011)
"Genome-Wide Analysis Shows Increased Frequency of Copy Number Variation Deletions in Dutch Schizophrenia Patients," Biol Psychiatry 70:655-62.
Zhang, Z., K. Lange and C. Sabatti (2012) "Reconstructing DNA copy number by joint segmentation of multiple sequences" Stanford Technical Report, Biostatistics series BIO 261
|
Association mapping
|
We are generally interested in association mapping. We have contributed to the development of a Bayesian method for haplotype mapping. We have also been interested in the problems of multiple comparisons in association genome scans, and have worked on approaches to account for hidden population structure.
Liu, J., C. Sabatti, J. Teng, B. Keats, and N. Risch (2001) "Bayesian analysis of haplotypes for linkage disequilibrium mapping," Genome Research 11: 1716-24. Preprint
Sabatti, C., S. Service, and N. Freimer (2003) "False discovery rates in linkage and association linkage genome screens for complex disorders," Genetics 164: 829-833. Reprint
Freimer, N. and C. Sabatti (2003) "The human phenome project," Nature Genetics 34: 15-21. Reprint
Freimer, N. and C. Sabatti (2004) "Pedigree, sib-pair, and association studies of common diseases; genetic mapping and epidemiology," Nature Genetics 36: 1045-1051. Reprint
Sabatti, C. (2006) "Comment on the 'Likelihood-Based Inference on haplotype effects in genetic association studies' by Lin and Zeng," Journal of the American Statistical Association 101: 104-106. (Invited contribution.)
Service, S., The international collaborative group on isolated
populations, C. Sabatti, N. Freimer (2007)
"Tag SNPs chosen from HapMap perform well in several population
isolates," Genetic Epidemiology, Epub ahead of print.
Freimer, N. and C. Sabatti (2007) "Human genetics: variants
in common diseases." Nature 445: 828-30. (Invited contribution.)
Ayers, K., C. Sabatti and K. Lange (2007) "A dictionary model for
haplotyping, genotype calling, and association mapping"
Genetic Epidemiology 31 : 672-683.
Sabatti, C., S. Service, A. Hartikainen, A. Pouta, S. Ripatti,
J. Brodsky, C. Jones, N. Zaitlen, T. Varilo, M. Kaakinen, U. Sovio,
A. Ruokonen, J. Laitinen, E. Jakkula, C. Lachlan, C. Hoggart,
P. Elliott, A. Collins, H. Turunen, S. Gabriel, M. McCarthy, M. Daly,
M-R. Jarvelin, N. Freimer, L. Peltonen (2009) "Genomewide association
analysis of metabolic phenotypes in a birth cohort from a founder
population," Nature Genetics, 41: 35-46.
Kang, H., J-H. Sul, S. Service, N. Zaitlen, S. Kong, N. Freimer, C. Sabatti*, E. Eskin* (2010) "Variance component model to account for sample structure in genome-wide association studies," Nature Genetics, 42: 348-354.
Teslovich TM et al. (2010) "Biological, clinical and population relevance of 95 loci for blood lipids,"
Nature 466:707-713.
|
Linkage disequilibrium
|
I have been interested for a long time in how to measure linkage
disequilibrium and in the variations of LD across the genome and
across populations.
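For readers less familiar with how linkage disequilibrium is usually quantified, the sketch below computes three standard pairwise measures (D, |D'|, and r^2) for two biallelic loci. These are common textbook quantities, shown only as background; they are not the homozygosity-based or volume-based measures studied in the papers listed here, and the function name `ld_measures` and its arguments are illustrative.

```python
def ld_measures(p_ab, p_a, p_b):
    """Standard pairwise LD measures for two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A and allele B
    p_a, p_b: marginal frequencies of allele A and allele B
    Returns (D, |D'|, r^2). Illustrative sketch, not the authors' method.
    """
    # D is the deviation of the haplotype frequency from independence.
    d = p_ab - p_a * p_b
    # The largest magnitude D can attain depends on its sign.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    # |D'| rescales D to lie in [0, 1]; define it as 0 for monomorphic loci.
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    # r^2 is the squared correlation between the two allele indicators.
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else 0.0
    return d, d_prime, r2
```

With complete association (for example `ld_measures(0.5, 0.5, 0.5)`) both |D'| and r^2 equal 1, while under independence (`p_ab == p_a * p_b`) all three measures are 0.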
Sabatti, C. and N. Risch (2002) "Homozygosity and linkage
disequilibrium," Genetics 160: 1707-1719. Preprint
Sabatti, C. (2002) "Measuring dependence with volume tests," The American Statistician
50: 191-195. Preprint
Ayers, K., C. Sabatti, and K. Lange (2006)
"Reconstructing ancestral haplotypes with a dictionary model,"
Journal of Computational Biology, 3, 3: 767-785.
Wang, H., C. Lin, S. Service, The international collaborative group on isolated populations, Y. Chen, N. Freimer, C. Sabatti (2006) "Linkage disequilibrium and haplotype homozygosity in population samples genotyped at a high marker density," Human Heredity, 62: 175-189.
Chen, Y., C. Lin, C. Sabatti (2006) "Volume measures for linkage disequilibrium," BMC Genetics 7:54
|
High density SNP genotyping
|
We developed models for intensity values of the Affymetrix and
Illumina genotyping arrays to be used in genotype calls, linkage studies, and loss of heterozygosity studies. In general, we are interested in understanding the measurement error associated with novel technologies.
Sabatti, C. and K. Lange (2005) "Bayesian Gaussian mixture models for high density genotyping arrays," UCLA Stat preprint 421, to appear in JASA.
Wang, H., Y. Lee, S. Nelson, and C. Sabatti (2005) "Inferring genomic loss and location of tumor suppressor genes from high density genotypes," UCLA Stat preprint 423, Journal of the French Statistical Society, 146: 153-171.
Wang, H., C. Lin, S. Service, The international collaborative group on isolated populations, Y. Chen, N. Freimer, C. Sabatti (2006) "Linkage disequilibrium and haplotype homozygosity in population samples genotyped at a high marker density," Human Heredity, 62: 175-189.
|
Gene regulation networks
|
To recover the dynamic behavior of regulatory proteins and their
pathway of influence on cell behavior, we have combined
sequence analysis with results of gene expression array
experiments.
We developed a sparse hidden component model to link transcription
factor activity to gene expression.
Most recently, we have become interested in incorporating measurements of methylation levels in our models.
Sabatti, C., L. Rohlin, M. Oh, and J. Liao (2002) "Co-expression pattern from DNA microarray experiments as a tool for operon prediction," Nucleic Acids Research 30: 2886-2893. Reprint
Liao, J., R. Boscolo, Y. Yang, L. Tran, C. Sabatti, and V. Roychowdhury (2003) "Network component analysis: reconstruction of regulatory signals in biological systems," Proceedings of the National Academy of Sciences 100: 15522-15527. Reprint
Kao, K., Y. Yang, R. Boscolo, C. Sabatti, V. Roychowdhury, and J. Liao (2004) "Determination of multiple transcription regulator activities in Escherichia coli using network component analysis," Proceedings of the National Academy of Sciences 101: 641-646. Reprint
Sabatti, C. and G. James (2006)
"Bayesian sparse hidden components analysis for transcription regulation networks,"
Bioinformatics, 22: 739-746.
James, G., Sabatti, C., Zhou, N. and Zhu, J. (2010) "Sparse Regulatory Networks," The Annals of Applied Statistics , 4: 663-686.
|
Gene expression array denoising
|
Gene expression arrays represent a formidable tool, as they allow
investigation of thousands of genes at the same time. However, to best
exploit their potential, one has to deal successfully with the statistical issues involved in their analysis.
We have suggested a de-noising approach based on thresholding.
Using a Bayesian hierarchical model and an approach to multiple
comparisons that is inspired by the False Discovery Rate, we denoise the signal coming from multiple array experiments with the specific goal of identifying the genes that are up-regulated or down-regulated in a given condition.
Sabatti, C., S. Karsten, and D. Geschwind (2002) "Thresholding rules for recovering a sparse signal from microarray experiments," Mathematical Biosciences 176: 17-34. Preprint
Erickson, S. and C. Sabatti (2005) "Empirical Bayes estimation of a sparse vector of gene expression," Statistical Applications in Genetics and Molecular Biology, 4: 22.
|
Genomic scale identification of promoter binding sites
|
One of the best understood mechanisms of transcription regulation is the action of regulatory proteins, which bind to the upstream region of a gene and act as either promoters or suppressors.
We have developed a stochastic dictionary model to identify the position of known binding sites on a genome-wide scale. We use this information to improve the clustering of array experiments and to reconstruct the regulatory network.
Our model organism for these investigations has been E. coli.
Sabatti, C. and K. Lange (2002) "Genomewide motif identification using a dictionary model," IEEE Proceedings 90: 1803-1810. Preprint
Sabatti, C., L. Rohlin, K. Lange, and J. Liao (2005) "Vocabulon: a dictionary model approach for reconstruction and localization of transcription factor binding sites," Bioinformatics 21: 922-931. Preprint
|
High Throughput Screens
|
In collaboration with Koppany Visnyei and Harley Kornblum we developed methods for the analysis of high-throughput screen
data. Denise Ferrari has put together an R
software package that implements our suggested pre-processing.
Sabatti, C., K. Visnyei, H. Kornblum (2008) "Statistical challenges in High-throughput Screens," UCLA Stat Preprint 532
Visnyei, K., H. Onodera, R. Damoiseaux, K. Saigusa, S. Petrosyan, D. De Vries, D. Ferrari, J. Saxe, E. Panosyan, M. Masterman-Smith, J. Mottahedeh, K. Bradley, J. Huang, C. Sabatti, I. Nakano, H. Kornblum (2011) "A molecular screening approach to identify and characterize inhibitors of glioblastoma multiforme stem cells," Molecular Cancer Therapeutics, to appear.
|
|