Monday, 28 February 2011 15:42

School on High Dimensional Design and Data Modelling

Directors: Phil Brown and Irene Poli

 

Monday 14th February, Henry Wynn

The London School of Economics and Political Science


Use algebraic methods to study complex experimental designs
The "algebraic method" in experimental design has proved useful in understanding complex aliasing structures. For any design with multiple factors and multiple levels, one can obtain one or more saturated estimable polynomial models from which significant sub-models may be fitted. In a way similar to the Nyquist theory in time series, the more design points one has (e.g. in sequential situations), the more terms one can have in the model, but the situation quickly becomes complicated. Some progress can be made if models are restricted to having lower "average degree" in a well-defined sense. Another advantage of the algebraic method is a better understanding of interactions in complex situations.

 

Friday 18th March - Tom Fearn

University College London


Chemometrics and Calibration in Near Infrared Spectroscopy
Quantitative near infrared spectroscopy (NIR) has made extensive use of high-dimensional data since the early 1980s, and this application has been closely associated with the development of so-called chemometric methods such as partial least squares. The talk will describe some of the methodology used for NIR calibration, highlighting both successes and pitfalls.



Monday 21st March, Mark Girolami

University College London


Efficient Sampling from High-Dimensional Distributions : MCMC on Riemann Manifolds

The requirement to sample efficiently from high-dimensional densities arises in a vast number of application areas in statistics, ranging from spatial statistics to clinical proteomics. The challenges which have to be met include complex correlation structure as well as near-degenerate densities along some dimensions. A recent development in Markov chain Monte Carlo (MCMC) methodology appears to address a number of these issues in a systematic manner, exploiting the underlying geometric structure of statistical models in the design of transition operators for MCMC. This talk will provide a tutorial introduction to MCMC on Riemann manifolds and then study a range of high-dimensional problems, considering the strengths and weaknesses of this methodology for high-dimensional sampling.

 

Friday 1st April, Peter Winker

Justus-Liebig-Universität Gießen


Threshold Accepting in Statistics

 

Applied statistical research depends to a large extent on optimization techniques. Classical examples comprise parameter estimation and model selection. There is no guarantee that standard optimization tools, e.g. generalized gradient methods, are able to solve these problems efficiently. In the presence of multiple local optima or flat regions of the objective function, suboptimal results might challenge the quality of the statistical analysis based on such methods. Some examples will be discussed.

During the last few years, optimization heuristics have increasingly been considered as a potential alternative to overcome the shortcomings of classical procedures in highly complex problem settings. After an attempt to classify the growing number of such methods, a specific local search heuristic, threshold accepting, is introduced and discussed in more detail with references to several applications in statistics. Threshold accepting is particularly well suited for problems on discrete search spaces.

Threshold accepting, like most other optimization heuristics, contains stochastic components. Thus, not only is the application of such methods to statistics of interest, but also the application of statistics to the analysis of the stochastic properties of the results produced with such tools. Some approaches are presented.
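
For readers unfamiliar with the heuristic, the following is a minimal sketch of threshold accepting on a discrete search space, not the speaker's implementation. It is a local search that accepts a candidate move whenever it worsens the objective by no more than the current threshold, with the thresholds following a fixed decreasing sequence. The toy problem (splitting a list of weights into two groups with nearly equal sums), the threshold sequence, and all function and variable names are assumptions chosen purely for illustration.

```python
import random


def threshold_accepting(objective, neighbour, start, thresholds, steps, rng):
    """Minimise `objective` by local search with a decreasing threshold sequence.

    A candidate is accepted if it is not worse than the current state by more
    than the current threshold; the best state seen so far is tracked separately.
    """
    current, current_val = start, objective(start)
    best, best_val = current, current_val
    for tau in thresholds:
        for _ in range(steps):
            candidate = neighbour(current, rng)
            candidate_val = objective(candidate)
            if candidate_val - current_val <= tau:
                current, current_val = candidate, candidate_val
                if current_val < best_val:
                    best, best_val = current, current_val
    return best, best_val


# Hypothetical toy problem: assign each weight to group 0 or group 1 so that
# the two group sums are as close as possible.
WEIGHTS = (8, 7, 6, 5, 4)


def imbalance(assignment):
    """Absolute difference between the two group sums."""
    group_one = sum(w for w, a in zip(WEIGHTS, assignment) if a == 1)
    return abs(sum(WEIGHTS) - 2 * group_one)


def flip_one(assignment, rng):
    """Neighbour move: move one randomly chosen weight to the other group."""
    i = rng.randrange(len(assignment))
    return assignment[:i] + (1 - assignment[i],) + assignment[i + 1:]


rng = random.Random(0)  # seeded so the stochastic search is reproducible
start = (0,) * len(WEIGHTS)
best, best_val = threshold_accepting(
    imbalance, flip_one, start, thresholds=[4, 2, 1, 0], steps=50, rng=rng
)
print(best, best_val)
```

Because the moves are random, repeated runs with different seeds produce a distribution of results, which is the kind of stochastic property the abstract refers to.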



Friday 15th April, Eric Schultes

Hedgehog Research LLC


Probing the folded conformations of random-sequence RNA and proteins

Genomic sequences code for RNAs and proteins that have precise structural properties mediating specific biochemical functions. Since the early days of molecular biology, it has been assumed that sequences chosen at random from sequence space (and therefore without having the benefit of natural selection) would be disordered and without biochemical function.
Yet the observed sequence diversity in biology, and experiments in which new functional sequences are isolated from synthetic combinatorial libraries, suggest the opposite: sequence space must be densely packed with well-structured and potentially useful sequences. I will discuss results from our in vitro experiments that explicitly probe the conformations of 20 individual random-sequence polyribonucleotides (85mers) and 30 individual random-sequence polypeptides (up to 70mers) using standard physical and chemical methods. Our results clearly demonstrate that structural properties usually assumed to be evolutionarily derived (such as native folding) are actually very common among randomly generated sequences.



Friday 20th May, Jim Griffin

University of Kent


Bayesian High Dimensional Analysis


Monday 23rd May, Chris Holmes

University of Oxford


Bayesian nonparametric mixture modelling of sparse signals with cluster specific variable selection

We discuss a hierarchical Bayesian nonparametric mixture model for clustering when some of the covariates are assumed to be of varying relevance to the clustering problem. This can be thought of as an issue in variable selection for unsupervised learning. We demonstrate that by defining a hierarchical population-based nonparametric prior on the cluster locations, scaled by the inverse covariance matrices of the likelihood, we arrive at a ‘sparsity prior’ representation which admits a conditionally conjugate prior. This allows us to perform full Gibbs sampling to obtain posterior distributions over parameters of interest, including an explicit measure of each covariate’s relevance and a distribution over the number of potential clusters present in the data. This also allows for individual cluster-specific variable selection. We demonstrate improved inference on a number of canonical problems, including analysis of patterns of copy-number variation present in colon cancer genomes.


Last Updated on Wednesday, 11 May 2011 10:30