Title: New methodologies for model selection and estimation of probabilistic graphical models.
Authors: Pircalabelu, Eugen
Issue Date: 28-May-2015
Abstract: The main topic of the doctoral thesis revolves around learning the structure of a graphical model
from data. The way a researcher constructs a graphical model for a given problem is either by seek-
ing external advice from expert panels about plausible edges between nodes (thus using theory and
educated guesses) or by ‘letting the data speak’ in the sense of using the data to select an appropriate
graph that fits well. All presented techniques subscribe to the latter, and starting from the data we
estimate plausible graphs informed by the inherent relations in the data.
Probabilistic graphical models are increasingly present in the statistical and machine learning
literature, due to (i) their capability of summarizing relevant relations that exist between variables
in the form of a graph where nodes are linked together by edges, (ii) their sound theoretical proper-
ties that represent a ‘marriage’ between graph theory and probability theory and to (iii) their direct
applicability to problems arising in image analysis, engineering, biomedical and computer sciences.
Two of the most popular graphical models are the Bayesian networks and the Markov networks
which have been extensively studied in the literature. The Bayesian networks represent the relations
between nodes based on a ‘directed acyclic’ graph, where all edges between nodes are directed and
no loops are allowed in the graph. The directionality of the edges takes center stage for this graphical
model as an edge i → j would denote an asymmetric relation between the two nodes, where i is a
parent node and j is a child node. The flow of information goes from i towards j and represents a
form of dependence where the probability density function (or probability mass function in the case
of discrete variables) of node j conditional on node i is of interest.
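As a brief illustration of the conditional densities mentioned above, the standard factorization of a Bayesian network can be written as follows. The notation (p nodes, pa(j) for the parent set of node j) is introduced here for illustration and is not taken from the thesis.

```latex
% Joint density of a Bayesian network over nodes 1, ..., p,
% where pa(j) denotes the set of parents of node j in the directed acyclic graph.
f(x_1, \dots, x_p) = \prod_{j=1}^{p} f\bigl(x_j \mid x_{\mathrm{pa}(j)}\bigr)

% Special case with two nodes and a single edge i -> j:
f(x_i, x_j) = f(x_i)\, f(x_j \mid x_i)
```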
Since assuming such directed relations from some nodes to the others is sometimes undesired,
one can use Markov networks, where nodes in the graph are connected by undirected edges.
This suggests a weaker form of association between nodes rather than the more stringent pseudo-
causal relations implied by the directed graphs.
We concentrate on estimating graphical models in several settings: low-dimensional settings where
the number of nodes is smaller than the sample size, high-dimensional settings where the number of
nodes is larger than the sample size, and settings where the data follow a Gaussian distribution or
where this assumption can be relaxed. Using an fMRI dataset that contains measurements of brain
activity for certain subjects, we also develop a procedure for the simultaneous estimation of
high-dimensional undirected graphical models.
Chapter 1 gives a brief introduction into probabilistic graphical models.
In Chapter 2 we start by estimating graphs in a low dimensional setting using the focused infor-
mation criterion (FIC). We employ a hill-climbing algorithm which modifies a graph edge by edge
until one obtains a graph that minimizes the graph FIC score. We explore here two different types of
graphs, directed and undirected, and also propose an extension towards ancestral graphs.
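The edge-by-edge search described for Chapter 2 can be sketched roughly as below, for the undirected case only. This is a minimal illustration, not the thesis implementation: the abstract does not define the FIC score, so `toy_score` and the `STRENGTH` table are hypothetical placeholders, and any function that maps an edge set to a number to be minimized could be plugged in instead.

```python
from itertools import combinations


def hill_climb(nodes, score, max_iter=1000):
    """Greedy edge-by-edge search for an undirected graph with minimal score."""
    current = frozenset()                      # start from the empty graph
    best = score(current)
    candidates = [frozenset(pair) for pair in combinations(nodes, 2)]
    for _ in range(max_iter):
        # Each neighbor differs from the current graph by exactly one edge:
        # the symmetric difference adds the edge if absent, removes it if present.
        neighbors = [current ^ {edge} for edge in candidates]
        if not neighbors:
            break
        challenger = min(neighbors, key=score)
        challenger_score = score(challenger)
        if challenger_score >= best:
            break                              # no single-edge change improves the score
        current, best = challenger, challenger_score
    return current, best


# Hypothetical association strengths between three nodes (illustration only).
STRENGTH = {
    frozenset({"a", "b"}): 0.8,
    frozenset({"b", "c"}): 0.5,
    frozenset({"a", "c"}): 0.1,
}


def toy_score(edges, penalty=0.3):
    # Placeholder score, NOT the FIC: every edge pays a fixed penalty and is
    # rewarded by the association strength it accounts for.
    return sum(penalty - STRENGTH[edge] for edge in edges)


graph, value = hill_climb(["a", "b", "c"], toy_score)
print(sorted(sorted(edge) for edge in graph))
```

The search stops at a local minimum of the score, which is the usual limitation of hill climbing; restarts or other safeguards are not shown here.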
In Chapter 3, starting from an fMRI dataset where the number of nodes in the graph is larger
than the available sample size, we develop a chain graph estimation procedure that uses penalized
nodewise models (that is, for each node in a graph we estimate its neighbors separately) where at
each node, the FIC is used to select the appropriate model. In a chain graphical model one
allows for the inclusion of both directed and undirected edges.
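The nodewise idea (estimating the neighbors of each node separately) can be sketched as follows. This is a simplified illustration under stated assumptions: it uses a lasso regression with a fixed penalty `lam` at every node instead of FIC-based selection, it returns undirected edges only rather than a chain graph, and the function names and the toy data are hypothetical.

```python
def soft_threshold(z, lam):
    """Soft-thresholding operator used by the lasso update."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_cd(columns, y, lam, n_sweeps=100):
    """Coordinate descent for (1/2n)*||y - X b||^2 + lam*||b||_1.

    `columns` is a list of predictor columns, each a list of n numbers.
    """
    n, p = len(y), len(columns)
    beta = [0.0] * p
    resid = list(y)                                    # residual y - X beta
    col_sq = [sum(v * v for v in col) / n for col in columns]
    for _ in range(n_sweeps):
        for k in range(p):
            if col_sq[k] == 0:
                continue
            # Covariance of column k with the partial residual (column k added back).
            rho = sum(columns[k][i] * resid[i] for i in range(n)) / n
            rho += col_sq[k] * beta[k]
            new = soft_threshold(rho, lam) / col_sq[k]
            delta = new - beta[k]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= delta * columns[k][i]
                beta[k] = new
    return beta


def center(col):
    mean = sum(col) / len(col)
    return [v - mean for v in col]


def nodewise_graph(columns, lam, rule="and"):
    """Regress every node on all others; keep an edge by the AND or OR rule."""
    p = len(columns)
    centered = [center(c) for c in columns]
    selected = []
    for j in range(p):
        others = [k for k in range(p) if k != j]
        beta = lasso_cd([centered[k] for k in others], centered[j], lam)
        selected.append({k for k, b in zip(others, beta) if b != 0.0})
    edges = set()
    for j in range(p):
        for k in range(j + 1, p):
            in_j, in_k = k in selected[j], j in selected[k]
            if (in_j and in_k) if rule == "and" else (in_j or in_k):
                edges.add((j, k))
    return edges


# Toy data (hypothetical): node 1 depends on node 0, node 2 is unrelated to both.
x0 = [1, -1, 1, -1, 1, -1, 1, -1]
x1 = [1.5, -0.5, 1.5, -0.5, 0.5, -1.5, 0.5, -1.5]
x2 = [1, 1, -1, -1, 1, 1, -1, -1]
print(nodewise_graph([x0, x1, x2], lam=0.2))
```

The `rule` argument controls how the p separate neighborhood estimates are combined: "and" keeps an edge only if both endpoints select each other, while "or" keeps it if either does.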
In Chapter 4 we step outside the Gaussian assumption and present the estimation of undirected
graphical models based on the FIC score when the data at each node are either count or 0/1 data.
The procedure is based on a nodewise scoring algorithm where a penalized generalized linear model
is used at each node.
In Chapter 5 we present the estimation of non-Gaussian directed graphical models by using copula
models. The strategy is to use a hill-climbing procedure where bivariate copulas are used to
model nodes coupled by an edge.
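For orientation, the way a bivariate copula links two nodes joined by an edge follows from Sklar's theorem and can be written as below. The symbols (marginal densities f_i, f_j, marginal distribution functions F_i, F_j, copula density c) are introduced here for illustration; the abstract does not say which copula families the thesis uses.

```latex
% Joint density of the two nodes on an edge i -> j, expressed through the
% marginals and a bivariate copula density c on the unit square:
f(x_i, x_j) = c\bigl(F_i(x_i), F_j(x_j)\bigr)\, f_i(x_i)\, f_j(x_j)

% Hence the conditional density of the child given the parent:
f(x_j \mid x_i) = c\bigl(F_i(x_i), F_j(x_j)\bigr)\, f_j(x_j)
```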
In Chapter 6 using the fMRI dataset presented in Chapter 3, the simultaneous estimation of
high-dimensional undirected graphical models is presented. Working in a mixed scale framework,
where different brain regions can be measured on a fine or coarser scale, we present an algorithm for
estimating graphical models that takes the different coarseness levels into account.
In Chapter 7 we introduce possible applications of the FIC, concentrating on a different type of
graph, known in the literature as a social network.
Chapter 8 concludes and comments upon future challenges and open questions.
Publication status: published
KU Leuven publication type: TH
Appears in Collections: Research Center for Operations Research and Business Statistics (ORSTAT), Leuven
Statistics Section

Files in This Item:
File                             Status     Size     Format
_PhDThesis_EugenPircalabelu.pdf  Published  2948 Kb  Adobe PDF

These files are only available to some KU Leuven Association staff members

