Monday, 28 February 2011 15:42
School on High Dimensional Design and Data Modelling
Directors: Phil Brown and Irene Poli

Monday 14th February - Henry Wynn
The London School of Economics and Political Science
Use algebraic methods to study complex experimental designs

Friday 18th March - Tom Fearn
University College London
Chemometrics and Calibration in Near Infrared Spectroscopy

Monday 21st March - Mark Girolami
University College London
Efficient Sampling from High-Dimensional Distributions: MCMC on Riemann Manifolds
The requirement to sample efficiently from high-dimensional densities arises in a vast number of application areas in statistics - ranging from spatial statistics to clinical proteomics. The challenges which have to be met include complex correlation structure as well as near degenerate densities along dimensions. A recent development in Markov chain Monte Carlo (MCMC) methodology appears to address a number of these issues in a systematic manner where the underlying geometric structure of statistical models is exploited in the design of transition operators for MCMC. This talk will provide a tutorial introduction to MCMC on Riemann manifolds and then study a range of high-dimensional problems considering the strengths and weaknesses of this methodology for High-D sampling.

Friday 1st April, Peter Winker
Justus-Liebig-Universität Gießen
Threshold Accepting in Statistics
Applied statistical research depends to a large extent on optimization techniques. Classical examples comprise parameter estimation and model selection. There is no guarantee that standard optimization tools, e.g. generalized gradient methods, are able to solve these problems efficiently. In the presence of multiple local optima or flat regions of the objective function, suboptimal results might challenge the quality of the statistical analysis based on such methods. Some examples will be discussed. During the last few years, optimization heuristics have increasingly been considered as a potential alternative to overcome the shortcomings of classical procedures in highly complex problem settings. After an attempt to classify the growing number of such methods, a specific local search heuristic, threshold accepting, is introduced and discussed in more detail with references to several applications in statistics. Threshold accepting is particularly well suited for problems on discrete search spaces. Like most other optimization heuristics, threshold accepting contains stochastic components. Thus, not only is the application of such methods to statistics of interest, but also the application of statistics to the analysis of the stochastic properties of the results produced with such tools. Some approaches are presented.

Friday 15th April, Eric Schultes
Hedgehog Research LLC
Probing the folded conformations of random-sequence RNA and proteins
Genomic sequences code for RNAs and proteins that have precise structural properties mediating specific biochemical functions. Since the early days of molecular biology, it has been assumed that sequences chosen at random from sequence space (and therefore without the benefit of natural selection) would be disordered and without biochemical function.

Friday 20th May - Jim Griffin
University of Kent
Bayesian High Dimensional Analysis

University of Oxford
Bayesian nonparametric mixture modelling of sparse signals with cluster specific variable selection
We discuss a hierarchical Bayesian nonparametric mixture model for clustering when some of the covariates are assumed to be of varying relevance to the clustering problem. This can be thought of as an issue in variable selection for unsupervised learning. We demonstrate that by defining a hierarchical, population-based nonparametric prior on the cluster locations, scaled by the inverse covariance matrices of the likelihood, we arrive at a ‘sparsity prior’ representation which admits a conditionally conjugate prior. This allows us to perform full Gibbs sampling to obtain posterior distributions over parameters of interest, including an explicit measure of each covariate’s relevance and a distribution over the number of potential clusters present in the data. This also allows for individual cluster-specific variable selection. We demonstrate improved inference on a number of canonical problems, including analysis of patterns of copy-number variation present in colon cancer genomes.

Last Updated on Wednesday, 11 May 2011 10:30