From Monomorphic to Polymorphic Well-Typings and Beyond

Type information has many applications; it can be used for optimized compilation, termination analysis, error detection, and more. However, logic programs are typically untyped. A well-typed program has the property that it behaves identically with or without type checking. Hence the automatic inference of a well-typing is worthwhile. Existing inferences are either cheap and inaccurate, or accurate and expensive. By giving up the concept that all calls to a predicate have types that are instances of a unique polymorphic type, and instead allowing multiple polymorphic typings for the same predicate, we obtain a novel strongly-connected-component-based analysis that provides a good compromise between accuracy and computational cost.


Introduction
While type information has many useful applications, e.g., in optimized compilation, termination analysis, documentation and debugging, most logic programming languages are untyped. In [3], Mycroft and O'Keefe propose a polymorphic type schema for Prolog which makes static type checking possible and guarantees that well-typed programs behave identically with or without type checking, i.e., the types do not affect the execution. While there is plenty of work on automatic type inference for logic programs, it was, to the best of our knowledge, not until [1] that a method was introduced to automatically infer a well-typing for logic programs. That paper describes how to infer a so-called monomorphic well-typing, which derives a type signature for every predicate. The well-typing has the property that the type signature of each call is identical to the predicate signature. Below is a code fragment, followed by the results of the inference. Note that the list type is not the standard one. The reason is that app/3 is called once with lists of a's and b's and once with lists whose elements are the former lists. The well-typing constraint, stating that both calls must have the same signature as the predicate app/3, enforces the above unnatural solution. Hence, the monomorphic type inference is of limited interest for large programs, as they are likely to use many different type instances of base predicates.
In a language with polymorphic types such as Mercury [6], one typically declares app/3 as having type app(list(T),list(T),list(T)). The first call instantiates the type parameter T with a type elem defined as elem ---> a ; b, while the second call instantiates T with list(elem).
The sets of terms denoted by these polymorphic types list(elem) and list(list(elem)) are proper subsets of the monomorphic type list. For instance, the term [a|b] is of type list, but not of type list(elem). Hence, polymorphic types allow for a more accurate characterization of program terms.
The work in [1] also sketches the inference of a polymorphic well-typing. However, the rules presented there are incomplete. We refer to [4] for a comprehensive set. In this paper, we revisit the problem of inferring a polymorphic typing. However, we impose the restriction that calls to a predicate that occur inside the strongly connected component (SCC) that defines the predicate (for simplicity we refer to them as recursive calls) have the same type signature as the predicate. Other calls, appearing higher in the call graph of the program, have a type signature that is a polymorphic instance of the definition's type. The motivation for the restriction is that inference can be computationally very demanding when a recursive call is allowed to have a type that is a true instance of the definition's type. Henglein [2] showed that type checking in such a setting is undecidable, and Schrijvers and Bruynooghe [4] have strong indications of similar undecidability for type inference. Applied to the above program fragment, one obtains the following well-typing:

    :-app(list1(T),list2(T),list2(T)).
    :-p(list2(list2(elem))).
    :-call app1(list1(elem), list2(elem), list2(elem)).
    :-call app2(list1(list2(elem)), list2(list2(elem)), list2(list2(elem))).

Both list1 and list2 are renamings of the standard list type, hence this well-typing is equivalent to what one would declare in a language such as Mercury.
Note that the erroneous call spoils the type of app/3. Indeed, the type list2(T) has an extra case with the functor b. Moreover, it is not clear from the type information which call is at the origin of the spoiled type. This, together with the complexity of the polymorphic analysis, motivated us to consider yet another setting where we derive types SCC by SCC. For the lowest SCC, defining app/3, we obtain:
Note the stream(T) type for the second and third argument. This is a well-typing. Nothing in the definition enforces that the list structure is terminated by an empty list, hence this case is absent in the type for the second and third argument. Note that neither of the two types is an instance of the other.
This reveals that in the SCC of p/1, app/3 is called with types that are instances of lists. These instances are represented as monomorphic types; with a small extra computational effort, one could separate them into the underlying polymorphic type and the parameter instances.
This reveals that eblist2 is not a standard list and that it is the first call to app/3 that employs this type.
This example shows that the SCC-based polymorphic analysis provides more useful information than the true polymorphic one. It is interesting in another respect. It gives up the usual concept underlying polymorphic typing that each predicate should have a unique principal typing and that all calls should have a type that is an instance of it. Indeed, the types of the first and second call are equivalent to instances of the type signatures app(list1(T),blist2(T),blist2(T)) and app(list1(T),list2(T),list2(T)) respectively, where the types list1(T) and list2(T) are the standard polymorphic list types but blist2(T) is defined
Our contributions are the following:
- We propose a new and efficient polymorphic type analysis based on an SCC by SCC traversal of the program.
- We compare our approach with two other analyses, a cheap but inaccurate monomorphic analysis and an accurate but expensive polymorphic analysis.
- Our small evaluation shows the respective merits of the different analyses.
In the rest of this abstract, we describe the different well-typings and their inference in more detail and we end with a small evaluation of their merits.

Problem Statement and Background Knowledge
Logic Programs Our syntax of logic programs is defined as follows: Pred, Functor and Var refer to sets of predicate symbols, function symbols and variables respectively. Elements of the first two sets are denoted by strings starting with a lower-case letter, whereas elements of Var start with an upper-case letter.
Types We adopt the terminology of Mercury [6] for our types. They are built from a number of type constructors $t_0, t_1, \ldots$ and type variables $\phi_0, \phi_1, \ldots$:

$$\tau ::= t(\bar{\tau}) \mid \phi$$

where $\bar{\tau}$ stands for $\tau_1, \ldots, \tau_n$ and the type constructors $t$ are defined by a type definition, which is a finite set of type rules of the form:

$$t(\bar{\phi}) \longrightarrow f_1(\bar{\tau}_1) \,;\, \ldots \,;\, f_k(\bar{\tau}_k)$$

where the $f_i$ are distinct function symbols and all type variables in $\bar{\tau}_i$ also appear in $\bar{\phi}$. No two type rules have the same type constructor on the left-hand side.

Typing Judgements A predicate signature is of the form $p(\bar{\tau})$ and declares a type $\tau_i$ for every argument of predicate $p$. A type environment $E$ for a program $P$ consists of a set of typings $X : \tau$, one for every variable $X$ in $P$, a set of predicate signatures $p(\bar{\tau})$, one for every predicate $p$ in $P$, and a type definition.
A typing judgement $E \vdash e : \tau$ asserts that $e$ has type $\tau$ for the type environment $E$, and $E \vdash e : \diamond$ asserts that $e$ is well-typed.
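To make the notions of type rule and type membership concrete, the following is a minimal Python sketch, not the paper's formalization. The class names (`TVar`, `TCon`, `Term`, `TypeRule`), the function `has_type`, and the encoding of list cells with the functor `"."` are all assumptions introduced for illustration. It checks whether a ground term belongs to the set of terms denoted by a ground type, using the `elem` and `list(T)` types from the introduction.

```python
from dataclasses import dataclass
from typing import Dict, Tuple, Union


@dataclass(frozen=True)
class TVar:
    """A type variable (hypothetical representation)."""
    name: str


@dataclass(frozen=True)
class TCon:
    """A type constructor applied to argument types, e.g. list(elem)."""
    name: str
    args: Tuple["Type", ...] = ()


Type = Union[TVar, TCon]


@dataclass(frozen=True)
class Term:
    """A ground term f(t1, ..., tn); constants have no arguments."""
    functor: str
    args: Tuple["Term", ...] = ()


@dataclass
class TypeRule:
    """A type rule t(params) --> f1(...) ; ... ; fk(...)."""
    params: Tuple[str, ...]
    cases: Dict[str, Tuple[Type, ...]]


def substitute(ty: Type, subst: Dict[str, Type]) -> Type:
    """Replace type variables in ty according to subst."""
    if isinstance(ty, TVar):
        return subst.get(ty.name, ty)
    return TCon(ty.name, tuple(substitute(a, subst) for a in ty.args))


def has_type(term: Term, ty: Type, rules: Dict[str, TypeRule]) -> bool:
    """True if the ground term is in the set denoted by the ground type ty."""
    if isinstance(ty, TVar):
        raise ValueError("this sketch only handles ground types")
    rule = rules[ty.name]
    subst = dict(zip(rule.params, ty.args))
    case = rule.cases.get(term.functor)
    if case is None or len(case) != len(term.args):
        return False
    return all(has_type(a, substitute(t, subst), rules)
               for a, t in zip(term.args, case))


# elem ---> a ; b        list(T) ---> [] ; [T | list(T)]
RULES = {
    "elem": TypeRule((), {"a": (), "b": ()}),
    "list": TypeRule(("T",), {"[]": (),
                              ".": (TVar("T"), TCon("list", (TVar("T"),)))}),
}

a, b, nil = Term("a"), Term("b"), Term("[]")


def cons(head: Term, tail: Term) -> Term:
    return Term(".", (head, tail))


list_elem = TCon("list", (TCon("elem"),))
ok = has_type(cons(a, cons(b, nil)), list_elem, RULES)   # [a, b]
bad = has_type(cons(a, b), list_elem, RULES)             # [a|b]
```

Under these assumptions, `[a, b]` is accepted as a `list(elem)` while `[a|b]` is rejected, because `b` is not a case of the `list` rule.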
A typing judgement is valid if it respects the typing rules of the type system. We will consider three different type systems, but they differ only in one place, namely in the typing of predicate calls in rule bodies. Figure 1 shows the typing rules for all the other language constructs, common to all type systems. The Var rule states that a variable is typed as given in the type environment. The Term rule constructs the type of a compound term; the other rules state the well-typing of atoms and that a program is well-typed when all its parts are. The different ways of well-typing a call are given in Figure 2. For the monomorphic analysis, the well-typing of a call is identical to that of the predicate in the environment (MonoCall rule). For the other two analyses, this only holds for the recursive calls (RecCall rule). The polymorphic analysis requires that the type of a non-recursive call is an instance (under a type substitution $\theta$) of the type of the predicate (PolyCall rule). The SCC-based analysis (SCCCall rule) requires that the well-typing of the call in $\Gamma$, which is $p(\tau_1, \ldots, \tau_n)$, is such that there exists a typing environment $\Gamma'$ (which can be different from $\Gamma$) with the following properties: the subprogram defining the predicate, subprog(p/n), is well-typed in $\Gamma'$, and the predicate signature of p/n in $\Gamma'$ is $p(\tau_1, \ldots, \tau_n)$ itself. Note that this implies that there exists a polymorphic type signature for p/n such that $p(\tau_1, \ldots, \tau_n)$ is an instance of it; however, that polymorphic type can be different for different calls.
In all three analyses, we are interested in minimal solutions. Informally: fewer cases in a type rule is better, and one type is better than another when the latter is equivalent to an instance of the former.
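The instance condition of the PolyCall rule can be illustrated with one-way matching. The sketch below is an assumption-laden illustration, not the inference from [4]: types are encoded as Python values (a string is a type variable, a tuple `(name, arg1, ...)` is a constructed type), and the helper names `match` and `instance_of` are hypothetical. It only tests syntactic instances, whereas the paper also speaks of types that are equivalent to an instance.

```python
from typing import Dict, Optional, Tuple, Union

# Assumed encoding: a str is a type variable,
# a tuple (name, arg1, ..., argn) is a constructed type.
Type = Union[str, Tuple]
Subst = Dict[str, Type]


def match(pattern: Type, target: Type, theta: Subst) -> Optional[Subst]:
    """Extend theta so that theta(pattern) == target, or return None."""
    if isinstance(pattern, str):                      # type variable
        if pattern in theta:
            return theta if theta[pattern] == target else None
        return {**theta, pattern: target}
    if (isinstance(target, str) or pattern[0] != target[0]
            or len(pattern) != len(target)):
        return None
    for p, t in zip(pattern[1:], target[1:]):
        theta = match(p, t, theta)
        if theta is None:
            return None
    return theta


def instance_of(call_sig, pred_sig) -> Optional[Subst]:
    """Return theta with theta(pred_sig) == call_sig, or None if none exists."""
    if len(call_sig) != len(pred_sig):
        return None
    theta: Optional[Subst] = {}
    for p, t in zip(pred_sig, call_sig):
        theta = match(p, t, theta)
        if theta is None:
            return None
    return theta


elem = ("elem",)
# Predicate signature app(list1(T), list2(T), list2(T)).
pred_sig = (("list1", "T"), ("list2", "T"), ("list2", "T"))
# Call signatures taken from the well-typing shown in the introduction.
call1 = (("list1", elem), ("list2", elem), ("list2", elem))
call2 = (("list1", ("list2", elem)),
         ("list2", ("list2", elem)),
         ("list2", ("list2", elem)))
# A hypothetical call that mixes both instantiations of T.
mixed = (("list1", elem), ("list2", ("list2", elem)), ("list2", elem))

theta1 = instance_of(call1, pred_sig)
theta2 = instance_of(call2, pred_sig)
theta_mixed = instance_of(mixed, pred_sig)
```

Here `theta1` binds `T` to `elem` and `theta2` binds `T` to `list2(elem)`, while the mixed call has no consistent substitution.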

The Monomorphic Type Analysis
The monomorphic type system is simple. It requires that all calls to a predicate have exactly the same typing as the signature (rule MonoCall in Figure 2).
The monomorphic type inference (first described in [1]) consists of three phases: (1) Derive type constraints from the program text. (2) Normalize (or solve) the type constraints. (3) Extract type definitions and type signatures from the normalized constraints. A practical implementation may interleave these phases. In particular, (1) may be interleaved with (2) via incremental constraint solving. We discuss the three phases in more detail below.
Phase 1: Type Constraint Derivation For the purpose of constraint derivation we assume that a distinct type $\tau$ is associated with every occurrence of a term. In addition, every defined predicate $p$ has an associated type signature $p(\bar{\tau})$ and every variable $X$ an associated type $\tau$; these are respectively denoted as $pred(p(\bar{\tau}))$ and $var(X) : \tau$. The associated type information serves as the initial assumption for the type environment $E$; initially all types are unconstrained. Now we impose constraints on these types based on the program text. For the monomorphic system, we only need two kinds of type constraint: $\tau_1 = \tau_2$, stating that the two types are (syntactically) equal, and $\tau \supseteq f(\bar{\tau})$, stating that the type definition of type $\tau$ contains a case $f(\bar{\tau})$. Figure 3 shows what constraints are derived from the various language constructs. The unlisted language constructs do not impose any constraints.

Figure 3 (excerpt): the Call and Head constraint derivation rules.

$$\text{(Call)}\quad \frac{t_i : \tau'_i \qquad pred(p(\tau_1, \ldots, \tau_n)) \qquad (a \text{ :- } g) \in P \qquad p(t_1, \ldots, t_n) \in g}{\tau'_i = \tau_i}$$

$$\text{(Head)}\quad \frac{t_1 : \tau'_1 \;\ldots\; t_n : \tau'_n \qquad pred(p(\tau_1, \ldots, \tau_n)) \qquad (p(t_1, \ldots, t_n) \text{ :- } g) \in P}{\tau'_i = \tau_i}$$

Phase 2: Type Constraint Normalization In the constraint normalization phase we rewrite the bag of derived constraints (the constraint store) to propagate all available information. The normalized form is obtained as the fixed point of just three rewrite steps. The first rewrite step drops trivial equalities.
Phase 3: Type Information Extraction Type definitions and type expressions are derived simultaneously from the normal form of the constraint store. We infer the type expressions as follows:
- A type $\alpha$ that does not appear as the first argument in a $\supseteq$ constraint gets as its type expression a unique type variable $A$.
- A type $\tau$ that appears on the left-hand side of a $\supseteq$ constraint is assigned a unique type name $t$. This type name has as its arguments the type variables $A_i$ of the corresponding types $\alpha_i$ that are reachable from $\tau$.
The type definitions follow from the type expressions in a straightforward manner. For each type τ with type expression t(A) we get a type definition t(A) −→ . . .. This definition contains one case f (. . .) for each constraint τ ⊇ f (τ ), where the argument expressions are the type expressions of the types τ .
Properties It is fairly easy to see that by the above approach we get a sound, complete and terminating algorithm.
Of particular interest is the time complexity of normalization: Theorem 1 (Time Complexity). The normalization algorithm has a near-linear time complexity $O(n \cdot \alpha(n))$, where $n$ is the program size and $\alpha$ is the inverse Ackermann function.
The α(n) factor follows from the Unif step, if implemented with the optimal union-find algorithm.
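The role of union-find in normalization can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the paper's algorithm: the text names only the step that drops trivial equalities and the Unif step, so the rule used here for two cases with the same functor in one class (equate their arguments pairwise) is an assumption, as are the names `UnionFind`, `normalize` and the hand-derived constraints for the usual two-clause definition of app/3.

```python
class UnionFind:
    """Union-find with path compression and union by size."""

    def __init__(self):
        self.parent, self.size = {}, {}

    def find(self, x):
        self.parent.setdefault(x, x)
        self.size.setdefault(x, 1)
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        while self.parent[x] != root:          # path compression
            nxt = self.parent[x]
            self.parent[x] = root
            x = nxt
        return root

    def union(self, x, y):
        """Merge two distinct classes; return (kept_root, absorbed_root)."""
        rx, ry = self.find(x), self.find(y)
        if self.size[rx] < self.size[ry]:
            rx, ry = ry, rx
        self.parent[ry] = rx
        self.size[rx] += self.size[ry]
        return rx, ry


def normalize(equalities, cases):
    """equalities: pairs (t1, t2); cases: triples (t, functor, arg_types).

    Returns the union-find structure and a map from class representative
    to {(functor, arity): argument types}.
    """
    uf = UnionFind()
    defs = {}
    work = list(equalities)

    def add_case(t, key, args):
        table = defs.setdefault(uf.find(t), {})
        if key in table:
            # Assumed rule: same functor twice in one type, so the
            # corresponding argument types must be equal.
            work.extend(zip(table[key], args))
        else:
            table[key] = args

    for t, functor, args in cases:
        add_case(t, (functor, len(args)), tuple(args))

    while work:
        x, y = work.pop()
        rx, ry = uf.find(x), uf.find(y)
        if rx == ry:
            continue                            # trivial equality: drop it
        keep, gone = uf.union(rx, ry)
        for key, args in defs.pop(gone, {}).items():
            add_case(keep, key, args)
    return uf, defs


# Hand-derived constraints (an assumption) for
#   app([], L, L).
#   app([X|Xs], L, [X|Zs]) :- app(Xs, L, Zs).
# with signature app(A1, A2, A3).
equalities = [("TL1", "A2"), ("TL1", "A3"),
              ("TXs", "A1"), ("TL2", "A2"), ("TZs", "A3")]
cases = [("A1", "[]", ()),
         ("A1", ".", ("TX", "TXs")),
         ("A3", ".", ("TX", "TZs"))]
uf, defs = normalize(equalities, cases)
```

In this example the second and third argument types end up in one class whose only case is the list cell, with no empty-list case, which matches the stream-like type discussed in the introduction.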

The Polymorphic Type Analysis
The polymorphic type system relaxes the monomorphic one: the type signature of a non-recursive predicate call is an instance of the predicate signature, rather than being identical to it. In [4], a type inference is described that allows this also for recursive calls; here its constraint derivation is adapted to the current setting. It has a complex set of rules because there is propagation of type information between predicate signature and call signature in both directions. A new case in a type rule from a type in the signature of a call is propagated to the corresponding type in the signature of the definition and in turn propagated to the corresponding types in the signatures of all other calls. There, it can potentially interact with other constraints, leading to yet another case and triggering a new round of propagation. Experiments indicate a cubic time complexity.

The SCC Type Analysis
We briefly summarize the procedure for the SCC-based type inference. The strongly connected components (SCCs) of a program are sets of predicates, where each component is either a singleton containing a non-recursive predicate or a maximal set of mutually recursive predicates. There is a partial order on the SCCs; for components $s_1$ and $s_2$, $s_1 \succeq s_2$ iff some predicate in $s_1$ depends (possibly indirectly) on a predicate in $s_2$.
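One standard way to obtain the SCCs in bottom-up order is Tarjan's algorithm, which emits each component only after all components it depends on. The sketch below is an illustration, not necessarily what the authors' implementation does; the function name `sccs_bottom_up` and both call graphs are assumptions (the first mirrors the introduction's description of p/1 calling app/3, the second is a hypothetical mutually recursive program).

```python
def sccs_bottom_up(calls):
    """Tarjan's algorithm on a call graph.

    calls maps each predicate to the predicates called in its clause bodies.
    Components are returned callees first, so every component appears after
    all components it depends on.
    """
    index, low = {}, {}
    stack, on_stack, order = [], set(), []

    def visit(p):
        index[p] = low[p] = len(index)
        stack.append(p)
        on_stack.add(p)
        for q in calls.get(p, ()):
            if q not in index:
                visit(q)
                low[p] = min(low[p], low[q])
            elif q in on_stack:
                low[p] = min(low[p], index[q])
        if low[p] == index[p]:                 # p is the root of a component
            component = set()
            while True:
                q = stack.pop()
                on_stack.discard(q)
                component.add(q)
                if q == p:
                    break
            order.append(frozenset(component))

    for p in calls:
        if p not in index:
            visit(p)
    return order


# Assumed call graph: p/1 calls app/3, and app/3 is directly recursive.
example = sccs_bottom_up({"p/1": ["app/3"], "app/3": ["app/3"]})

# Hypothetical program with mutual recursion.
mutual = sccs_bottom_up({"main/0": ["even/1"],
                         "even/1": ["odd/1"],
                         "odd/1": ["even/1"]})
```

The recursion depth grows with the length of call chains, so an iterative variant would be preferable for very large programs.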
Basic Analysis In the SCC type analysis we first compute the SCCs and topologically sort them in ascending order with respect to $\succeq$, yielding say $s_0, \ldots, s_m$. Type constraints for the clauses of $s_i$ are then generated using the same rules as in Figure 3, except that in the Call rule applied to non-recursive calls we use a renamed copy of the signature of the called predicate, and extend the solved type constraints for $s_0, \ldots, s_{i-1}$ by a renamed version of the type rules for the types in the renamed signature. Thus each call to a predicate in a lower SCC has its own type, which does not interfere with calls to the same predicate elsewhere (interference which is unavoidable in the polymorphic analysis).
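The renaming step can be pictured as copying, under fresh names, the signature of the called predicate together with the type rules reachable from it. The following Python sketch is only an illustration: the representation of solved constraints as a dictionary from type names to cases, the `#suffix` naming scheme, and the function `renamed_copy` are assumptions, and the solved rules for app/3 are written by hand.

```python
def renamed_copy(signature, rules, suffix):
    """Copy a signature and the type rules reachable from it under fresh names.

    signature: tuple of type names, one per argument.
    rules: maps a type name to {functor: argument type names}; a type
           without an entry is unconstrained (a type variable).
    suffix: makes the fresh names unique for one call site.
    """
    def rename(t):
        return f"{t}#{suffix}"

    new_rules, todo, seen = {}, list(signature), set()
    while todo:
        t = todo.pop()
        if t in seen:
            continue
        seen.add(t)
        if t in rules:
            new_rules[rename(t)] = {
                functor: tuple(rename(a) for a in args)
                for functor, args in rules[t].items()
            }
            for args in rules[t].values():
                todo.extend(args)
    return tuple(rename(t) for t in signature), new_rules


# Hand-written solved constraints for the SCC of app/3 (an assumption):
# A1 is a list of E, A2 is a stream-like type without an empty-list case.
RULES = {
    "A1": {"[]": (), ".": ("E", "A1")},
    "A2": {".": ("E", "A2")},
    "Other": {"c": ()},          # unrelated type, not reachable from app/3
}
SIG = ("A1", "A2", "A2")

sig1, rules1 = renamed_copy(SIG, RULES, 1)   # first call to app/3
sig2, rules2 = renamed_copy(SIG, RULES, 2)   # second call to app/3

# A case added at one call site stays local to that copy.
rules1["A2#1"]["b"] = ()
```

Because only types reachable from the signature are copied, the unrelated type `Other` is left out, and the extra case `b` added to the first copy affects neither the definition nor the second copy.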
The complexity of the SCC type generation and solution is now related to the number of generated constraints including the renamed copies. This appears to be roughly quadratic in the size of the program, since the number of copies is proportional to m * k * (k + 1)/2 where k is the number of SCCs and m is the number of calls.
The above description requires that all (solved) type constraints of the lower SCCs s 0 , . . . , s i−1 are copied for a non-recursive call in SCC s i . In practice, it is not necessary to copy all these constraints, only those constraints relevant for the predicate definition. In other words, we may project all constraints on the variables in the predicate signature, before copying. Usually projection yields a much smaller set of constraints.

Evaluation
We evaluated the three algorithms on a suite of small programs (see Appendix A of [5]), also used in [1]. The monomorphic analysis finishes quickly, in less than 1 ms on a 2.00 GHz Pentium 4. The SCC analysis provides a more accurate analysis in 1 to 3 times the time of the monomorphic analysis. The complex polymorphic analysis lags far behind; it is easily 10 to 100 times slower.
A contrived scalable benchmark based on the first example in this paper shows that the monomorphic and SCC analyses can scale linearly, while the polymorphic analysis exhibits cubic behavior. The scalable program varies with a parameter n. A further benchmark (see [5]) provokes the worst-case quadratic behavior of the SCC analysis, which is still much better than the cubic behavior and constant factors of the polymorphic analysis.

Conclusion and Future Work
Within the framework of polymorphic well-typings of programs, it is customary to have a unique principal type signature for predicate definitions and type signatures of calls (from outside the SCC defining the predicate) that are instances of the principal type. We have presented a novel SCC-based type analysis that gives up the concept of a unique principal type and instead allows different calls to have type signatures that are instances of different well-typings of the predicate definition. This offers two advantages. Firstly, it is much more efficient than a true polymorphic analysis and is only slightly more expensive than a monomorphic one. In practice, it scales linearly with program size. Secondly, when an unexpected case appears in a type rule (which may hint at a program error), it is easy to figure out whether it is due to the predicate definition or to a particular call. This information cannot be reconstructed from the inferred types in the polymorphic and the monomorphic analyses.
In future work we plan to investigate the quality of the new analysis by performing type inference on Mercury programs where all type information has been removed and comparing the inferred types with the original ones.