Title: New methodologies for model selection and estimation of probabilistic graphical models
Authors: Pircalabelu, Eugen
Issue Date: 28-May-2015
Abstract:

The main topic of this doctoral thesis is learning the structure of a graphical model from data. A researcher can construct a graphical model for a given problem in one of two ways. The first is to seek external advice from expert panels about plausible edges between nodes, which relies on theory and educated guesses. The second is to ‘let the data speak’, that is, to use the data to select a graph that fits well. All techniques presented here take the latter approach: starting from the data, we estimate plausible graphs informed by the relations inherent in the data.

Probabilistic graphical models are increasingly present in the statistical and machine learning literature. This is owing to three features. First, they can summarize relevant relations between variables in the form of a graph, in which nodes are linked by edges. Second, they have sound theoretical properties that represent a ‘marriage’ between graph theory and probability theory. Third, they apply directly to problems arising in image analysis, engineering, and the biomedical and computer sciences.

Two of the most popular graphical models, both extensively studied in the literature, are Bayesian networks and Markov networks. Bayesian networks represent the relations between nodes by a directed acyclic graph: all edges are directed and no directed cycles (loops) are allowed. The directionality of the edges is central to this model, because an edge i → j denotes an asymmetric relation between the two nodes, with i the parent node and j the child node.
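The parent/child structure of a Bayesian network determines how the joint distribution factorizes: each node is conditioned only on its parents. As a minimal sketch, the Python snippet below uses a hypothetical three-node chain A → B → C with binary nodes. The conditional probabilities are made up for illustration and are not taken from the thesis. The names `PARENTS`, `P_ONE` and `joint_probability` are likewise hypothetical. For this chain the factorization reads p(a, b, c) = p(a) p(b | a) p(c | b).

```python
from itertools import product

# Hypothetical chain A -> B -> C with binary (0/1) nodes; numbers are made up.
PARENTS = {"A": (), "B": ("A",), "C": ("B",)}
# P(node = 1 | parent values), keyed by the tuple of parent values.
P_ONE = {
    "A": {(): 0.3},
    "B": {(0,): 0.2, (1,): 0.9},
    "C": {(0,): 0.1, (1,): 0.7},
}


def joint_probability(assignment):
    """Return p(x) = prod_j p(x_j | x_pa(j)) for a full 0/1 assignment."""
    prob = 1.0
    for node, parents in PARENTS.items():
        p_one = P_ONE[node][tuple(assignment[q] for q in parents)]
        prob *= p_one if assignment[node] == 1 else 1.0 - p_one
    return prob


# The factorized probabilities form a valid distribution over all 2**3 states.
total = sum(
    joint_probability(dict(zip("ABC", values)))
    for values in product((0, 1), repeat=3)
)
print(round(total, 10))  # 1.0
# p(A=1, B=1, C=0) = 0.3 * 0.9 * (1 - 0.7) = 0.081
print(round(joint_probability({"A": 1, "B": 1, "C": 0}), 10))  # 0.081
```

Each factor involves only a node and its parents. As a result, the size of each conditional table grows with the node's number of parents rather than with the total number of nodes.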
Information flows from i towards j. This represents a form of dependence in which the quantity of interest is the probability density function of node j conditional on node i (or the probability mass function, for discrete variables).

Assuming such directed relations between nodes is sometimes undesirable. In that case one can instead use Markov networks, in which nodes are connected by undirected edges. These encode a weaker form of association between nodes than the more stringent pseudo-causal relations implied by directed graphs.

We concentrate on estimating graphical models in several situations:
- low-dimensional settings, where the number of nodes is smaller than the available sample size;
- high-dimensional settings, where the number of nodes is larger than the sample size;
- settings where the data follow a Gaussian distribution, or where this assumption can be relaxed.

Using an fMRI dataset containing measurements of brain activity for a set of subjects, we also develop a procedure for the simultaneous estimation of high-dimensional undirected graphical models.

Chapter 1 gives a brief introduction to probabilistic graphical models.

In Chapter 2 we start by estimating graphs in a low-dimensional setting using the focused information criterion (FIC). We employ a hill-climbing algorithm that modifies a graph one edge at a time until it obtains a graph that minimizes the graph FIC score. We explore two types of graphs, directed and undirected, and also propose an extension towards ancestral graphs.

Chapter 3 starts from an fMRI dataset in which the number of nodes in the graph is larger than the available sample size. For this setting we develop a chain graph estimation procedure based on penalized nodewise models, in which the neighbors of each node in the graph are estimated separately. At each node, the FIC is used to select the appropriate model.
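The edge-by-edge hill-climbing search described for Chapter 2 can be sketched as follows. This is an illustrative sketch only. It searches over Gaussian directed acyclic graphs. Because the FIC requires a focus parameter and is beyond a short example, the sketch swaps in the ordinary BIC as the score to minimize. The helper names (`node_bic`, `is_acyclic`, `copy_graph`, `hill_climb`) and the toy data are hypothetical. At each step the search tries every single-edge addition, deletion or reversal that keeps the graph acyclic. It applies the move that lowers the total score the most and stops when no move lowers it.

```python
import numpy as np


def node_bic(data, node, parents):
    """Gaussian BIC of one node given its parents (lower is better).

    Stand-in score for illustration only; the thesis uses the FIC instead.
    Fits intercept + parents by least squares and returns
    n * log(RSS / n) + k * log(n), with k the number of coefficients.
    """
    n = data.shape[0]
    columns = [np.ones(n)] + [data[:, p] for p in sorted(parents)]
    design = np.column_stack(columns)
    coef, *_ = np.linalg.lstsq(design, data[:, node], rcond=None)
    rss = float(np.sum((data[:, node] - design @ coef) ** 2))
    return n * np.log(rss / n) + design.shape[1] * np.log(n)


def is_acyclic(parents):
    """Return True if the graph, given as {node: set of parents}, has no directed cycle."""
    remaining = {v: set(ps) for v, ps in parents.items()}
    while remaining:
        roots = [v for v, ps in remaining.items() if not ps]
        if not roots:
            return False  # every remaining node still has a parent: a cycle
        for r in roots:
            del remaining[r]
        for ps in remaining.values():
            ps.difference_update(roots)
    return True


def copy_graph(parents):
    return {v: set(ps) for v, ps in parents.items()}


def hill_climb(data):
    """Greedy DAG search: apply the best single-edge move until none helps."""
    p = data.shape[1]
    parents = {v: set() for v in range(p)}
    scores = {v: node_bic(data, v, parents[v]) for v in range(p)}
    while True:
        best_gain, best = 1e-9, None  # require a strict improvement
        for i in range(p):
            for j in range(p):
                if i == j:
                    continue
                candidates = []
                if i in parents[j]:
                    deleted = copy_graph(parents)
                    deleted[j].discard(i)          # delete i -> j
                    reversed_ = copy_graph(deleted)
                    reversed_[i].add(j)            # reverse to j -> i
                    candidates += [deleted, reversed_]
                elif j not in parents[i]:
                    added = copy_graph(parents)
                    added[j].add(i)                # add i -> j
                    candidates.append(added)
                for graph in candidates:
                    if not is_acyclic(graph):
                        continue
                    new_scores = dict(scores)
                    for v in (i, j):               # only these parent sets changed
                        new_scores[v] = node_bic(data, v, graph[v])
                    gain = sum(scores.values()) - sum(new_scores.values())
                    if gain > best_gain:
                        best_gain, best = gain, (graph, new_scores)
        if best is None:
            return parents
        parents, scores = best


# Toy data: columns 0 and 1 are linearly related; column 2 is orthogonal to both.
h1 = np.array([1, -1, 1, -1, 1, -1, 1, -1], dtype=float)
h2 = np.array([1, 1, -1, -1, 1, 1, -1, -1], dtype=float)
h3 = np.array([1, 1, 1, 1, -1, -1, -1, -1], dtype=float)
data = np.column_stack([h1, 2 * h1 + 0.5 * h2, h3])

graph = hill_climb(data)
skeleton = sorted(tuple(sorted((u, v))) for v, ps in graph.items() for u in ps)
print(skeleton)  # [(0, 1)]
```

The score decomposes over nodes, so only the two nodes whose parent sets change need rescoring for each candidate move. On the toy data the search keeps a single edge between variables 0 and 1. Its direction is not identified by the BIC, because both orientations give the same score.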
A chain graph allows both directed and undirected edges.

In Chapter 4 we step outside the Gaussian assumption. We present the FIC-based estimation of undirected graphical models when the data at each node are either counts or binary (0/1) data. The procedure is based on a nodewise scoring algorithm in which a penalized generalized linear model is fitted at each node.

In Chapter 5 we present the estimation of non-Gaussian directed graphical models using copula models. The strategy is a hill-climbing procedure in which bivariate copulas model pairs of nodes coupled by an edge.

In Chapter 6 we return to the fMRI dataset presented in Chapter 3 and present the simultaneous estimation of high-dimensional undirected graphical models. We work in a mixed-scale framework, where different brain regions can be measured on a fine or a coarser scale. Within this framework we present an algorithm for estimating graphical models that takes the different levels of coarseness into account.

In Chapter 7 we introduce possible applications of the FIC to a different type of graph, known in the literature as a social network.

Chapter 8 concludes and comments on future challenges and open questions.

Publication status: published
KU Leuven publication type: TH
Appears in Collections: Research Center for Operations Research and Business Statistics (ORSTAT), Leuven; Statistics Section

Files in This Item:
_PhDThesis_EugenPircalabelu.pdf (status: Published; size: 2948 Kb; format: Adobe PDF)

These files are only available to some KU Leuven Association staff members.