The main topic of this doctoral thesis is learning the structure of a graphical model from data. A researcher can construct a graphical model for a given problem in two ways. One is to seek external advice from expert panels about plausible edges between nodes, relying on theory and educated guesses. The other is ‘letting the data speak’, that is, using the data to select a graph that fits them well. All techniques presented here follow the latter approach: starting from the data, we estimate plausible graphs informed by the relations inherent in the data. Probabilistic graphical models are increasingly present in the statistical and machine learning literature, owing to (i) their capability of summarizing the relevant relations between variables in the form of a graph, where nodes are linked by edges, (ii) their sound theoretical properties, which represent a ‘marriage’ between graph theory and probability theory, and (iii) their direct applicability to problems arising in image analysis, engineering, and the biomedical and computer sciences. Two of the most popular graphical models are Bayesian networks and Markov networks, both of which have been studied extensively in the literature. Bayesian networks represent the relations between nodes by a ‘directed acyclic’ graph, in which all edges are directed and no directed cycles are allowed. The directionality of the edges takes center stage in this model: an edge i → j denotes an asymmetric relation between the two nodes, where i is a parent node and j is a child node. Information flows from i towards j, and the edge represents a form of dependence in which the probability density function (or probability mass function, for discrete variables) of node j conditional on node i is of interest.
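For reference, the directed-edge reading above corresponds to the standard factorization of a Bayesian network. This is a textbook property, not a result specific to this thesis. For a directed acyclic graph G on nodes 1, …, p, with pa(j) denoting the set of parents of node j, the joint density factorizes into one conditional density per node:

$$
p(x_1, \dots, x_p) \;=\; \prod_{j=1}^{p} p\bigl(x_j \mid x_{\mathrm{pa}(j)}\bigr),
\qquad
\text{e.g. for } 1 \to 2 \to 3:\quad
p(x_1, x_2, x_3) = p(x_1)\, p(x_2 \mid x_1)\, p(x_3 \mid x_2).
$$

Each edge i → j therefore enters the model only through the conditional density of the child j given its parents, which is why the direction of an edge matters for this class of models.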
Since assuming such asymmetric relations from some nodes to others is sometimes undesirable, one can instead use Markov networks, where nodes in the graph are connected by undirected edges. These suggest a weaker form of association between nodes than the more stringent pseudo-causal relations implied by directed graphs. We estimate graphical models in several situations: in low-dimensional settings, where the number of nodes is smaller than the available sample size; in high-dimensional settings, where the number of nodes is larger than the sample size; and both when the data follow a Gaussian distribution and when this assumption can be relaxed. Using an fMRI dataset that contains measurements of brain activity for a number of subjects, we also develop a procedure for the simultaneous estimation of high-dimensional undirected graphical models. Chapter 1 gives a brief introduction to probabilistic graphical models. In Chapter 2 we start by estimating graphs in a low-dimensional setting using the focused information criterion (FIC). We employ a hill-climbing algorithm that modifies a graph edge by edge until it reaches a graph that minimizes the graph FIC score. We explore two different types of graphs here, directed and undirected, and also propose an extension towards ancestral graphs. In Chapter 3, starting from an fMRI dataset where the number of nodes in the graph is larger than the available sample size, we develop a chain graph estimation procedure that uses penalized nodewise models (that is, for each node in the graph we estimate its neighbors separately), where the FIC is used at each node to select the appropriate model. A chain graphical model allows both directed and undirected edges. In Chapter 4 we step outside the Gaussian assumption and present the estimation of undirected graphical models based on the FIC score when the data at each node are either counts or 0/1 data.
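The edge-by-edge hill-climbing idea can be illustrated with a small sketch. It is not the thesis's FIC procedure. As a stated assumption, it scores directed acyclic graphs on discrete data with the BIC, computed from contingency-table counts, in place of the FIC. Note the sign convention: here a higher score is better, whereas the FIC score is minimized. All function names (`local_bic`, `is_acyclic`, `single_edge_moves`, `hill_climb`) are hypothetical. The search starts from the empty graph, tries every single-edge addition, deletion or reversal that keeps the graph acyclic, and moves to the best improving neighbour until no move helps.

```python
import math
import random
from collections import Counter


def local_bic(data, child, parents):
    """BIC term of one discrete node given a parent set (assumed score; higher is better)."""
    n = len(data)
    parents = sorted(parents)
    # Counts of (parent configuration, child value) and of parent configurations alone.
    joint = Counter((tuple(row[p] for p in parents), row[child]) for row in data)
    marg = Counter(tuple(row[p] for p in parents) for row in data)
    loglik = sum(c * math.log(c / marg[cfg]) for (cfg, _), c in joint.items())
    n_cfg = 1
    for p in parents:
        n_cfg *= len({row[p] for row in data})
    n_params = n_cfg * (len({row[child] for row in data}) - 1)
    return loglik - 0.5 * math.log(n) * n_params


def is_acyclic(parents):
    """Kahn-style check: repeatedly strip nodes that have no remaining parents."""
    remaining = {v: set(ps) for v, ps in parents.items()}
    while remaining:
        roots = [v for v, ps in remaining.items() if not ps]
        if not roots:
            return False  # every remaining node has a parent, so a cycle exists
        for r in roots:
            del remaining[r]
        for ps in remaining.values():
            ps.difference_update(roots)
    return True


def _copy(graph):
    return {v: set(ps) for v, ps in graph.items()}


def single_edge_moves(parents):
    """Yield every graph that differs by one edge addition, deletion or reversal."""
    for i in parents:
        for j in parents:
            if i == j:
                continue
            if i in parents[j]:  # edge i -> j exists: delete it, or reverse it
                deleted = _copy(parents)
                deleted[j].discard(i)
                yield deleted
                flipped = _copy(deleted)
                flipped[i].add(j)  # now j -> i
                yield flipped
            elif j not in parents[i]:  # no edge between i and j yet: add i -> j
                added = _copy(parents)
                added[j].add(i)
                yield added


def hill_climb(data, n_nodes, max_iter=100):
    """Greedy search from the empty graph; stops at a local optimum of the score."""
    def score(graph):
        # Recomputed from scratch for clarity; practical code caches local terms.
        return sum(local_bic(data, v, graph[v]) for v in graph)

    parents = {v: set() for v in range(n_nodes)}
    current = score(parents)
    for _ in range(max_iter):
        best, best_score = None, current
        for cand in single_edge_moves(parents):
            if is_acyclic(cand):
                s = score(cand)
                if s > best_score + 1e-9:
                    best, best_score = cand, s
        if best is None:
            break  # no single-edge change improves the score
        parents, current = best, best_score
    return parents, current


# Toy data: X1 is a noisy copy of X0, X2 is independent noise.
rng = random.Random(0)
data = []
for _ in range(500):
    x0 = rng.randint(0, 1)
    x1 = x0 if rng.random() < 0.9 else 1 - x0
    data.append((x0, x1, rng.randint(0, 1)))

graph, final_score = hill_climb(data, 3)
print(graph, round(final_score, 2))
```

On this toy data the search should link X0 and X1. Because the BIC is score-equivalent, the direction of that edge is arbitrary. Swapping in a different criterion only requires replacing `local_bic`. A focus-driven criterion such as the FIC would also need the focus parameter as an input.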
The Chapter 4 procedure is based on a nodewise scoring algorithm in which a penalized generalized linear model is fitted at each node. In Chapter 5 we present the estimation of non-Gaussian directed graphical models using copula models. The strategy is to use a hill-climbing procedure in which bivariate copulas model the nodes coupled by an edge. In Chapter 6, using the fMRI dataset presented in Chapter 3, we present the simultaneous estimation of high-dimensional undirected graphical models. Working in a mixed-scale framework, where different brain regions can be measured on a finer or coarser scale, we present an algorithm for estimating graphical models that takes the different coarseness levels into account. In Chapter 7 we introduce possible applications of the FIC, concentrating on a different type of graph, known in the literature as a social network. Chapter 8 concludes and comments on future challenges and open questions.
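The nodewise strategy (choose a neighbour set for each node separately, then combine the choices into one undirected graph) can be sketched for 0/1 data as follows. This is an illustration under explicit assumptions, not the thesis's method. In place of a penalized generalized linear model scored by the FIC, each node is scored with the BIC of a contingency-table model given a candidate neighbour set. The search is exhaustive over small subsets (`max_size`), and an edge is kept only if both endpoints select each other (the "and" rule; "or" is the common alternative). The function names `node_score` and `nodewise_graph` are hypothetical.

```python
import math
import random
from collections import Counter
from itertools import combinations


def node_score(data, node, neighbours):
    """BIC of one discrete node given a candidate neighbour set (assumed score; higher is better)."""
    n = len(data)
    joint = Counter((tuple(row[u] for u in neighbours), row[node]) for row in data)
    marg = Counter(tuple(row[u] for u in neighbours) for row in data)
    loglik = sum(c * math.log(c / marg[cfg]) for (cfg, _), c in joint.items())
    n_cfg = 1
    for u in neighbours:
        n_cfg *= len({row[u] for row in data})
    n_params = n_cfg * (len({row[node] for row in data}) - 1)
    return loglik - 0.5 * math.log(n) * n_params


def nodewise_graph(data, n_nodes, max_size=2, rule="and"):
    """Select a neighbour set for each node separately, then symmetrise the choices."""
    chosen = {}
    for v in range(n_nodes):
        others = [u for u in range(n_nodes) if u != v]
        best_set, best = (), node_score(data, v, ())
        for k in range(1, max_size + 1):
            for cand in combinations(others, k):
                s = node_score(data, v, cand)
                if s > best:
                    best_set, best = cand, s
        chosen[v] = set(best_set)
    edges = set()
    for i in range(n_nodes):
        for j in range(i + 1, n_nodes):
            a, b = j in chosen[i], i in chosen[j]
            keep = (a and b) if rule == "and" else (a or b)
            if keep:
                edges.add((i, j))
    return edges, chosen


# Toy binary data: chain X0 - X1 - X2 with 10% flips, X3 independent.
rng = random.Random(1)


def noisy(x):
    return x if rng.random() < 0.9 else 1 - x


data = []
for _ in range(500):
    x0 = rng.randint(0, 1)
    x1 = noisy(x0)
    x2 = noisy(x1)
    data.append((x0, x1, x2, rng.randint(0, 1)))

edges, chosen = nodewise_graph(data, 4)
print(sorted(edges))
```

Exhaustive subset search only scales to a handful of nodes. This is why, in high-dimensional settings, a penalized nodewise regression is used instead to produce the candidate neighbour sets at each node.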