Annals Rev Engineering Dynamic Networks
B. STIGLER,a A. JARRAH,b M. STILLMAN,c AND R. LAUBENBACHERb
a Mathematical Biosciences Institute, The Ohio State University, Columbus,
Ohio, USA
b Virginia Bioinformatics Institute, Virginia Tech, Blacksburg, Virginia, USA
c Mathematics Department, Cornell University, Ithaca, New York, USA
INTRODUCTION
Address for correspondence: A. Jarrah, Virginia Tech, Virginia Bioinformatics Institute, Washington
Street (0477), Blacksburg VA 24061. Voice: 540-231-9456; fax: 540-231-2606.
ajarrah@vbi.vt.edu
doi: 10.1196/annals.1407.012
STIGLER et al. 169
Boolean algebra exploits the fact that the set {0, 1} is endowed with an
algebraic structure. Namely, we have an addition (1 + 1 = 0) and an obvious
multiplication, both of which satisfy the usual rules of arithmetic. It is natural
to consider more general finite number systems, such as the integers Z modulo
a prime integer, allowing for data discretization that is finer than the binary
ON/OFF discretization.
In this paper we assume that the set $k$ of possible variable states forms a
finite number system called a finite field. It is well-known that every function
$f : k^n \to k$ on a finite field $k$ can be represented by a polynomial function.
Therefore we call the vector-valued function $f = (f_1, \ldots, f_n) : k^n \to k^n$
a polynomial dynamical system (PDS), where each $f_i : k^n \to k$ is called a
transition function. Note that each $f_i \in k[x_1, \ldots, x_n]$ is a polynomial in $n$
variables. This construction allows us to apply a rich collection of algorithms
from computational algebra and algebraic geometry.
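The polynomial representation can be made concrete with a short sketch. It assumes $k = \mathbb{Z}/p$ for a prime $p$ and uses the standard indicator construction based on Fermat's little theorem; the function name `interpolate` and the example table are illustrative and do not come from the paper.

```python
from itertools import product

def interpolate(table, p):
    """Return a function k^n -> k, with k = Z/p, that agrees with `table`.

    `table` maps input tuples to output values. The returned function
    evaluates sum_s table[s] * prod_i (1 - (x_i - s_i)^(p-1)) mod p,
    which is a polynomial expression in x_1, ..., x_n.
    """
    def f(x):
        total = 0
        for s, t in table.items():
            indicator = 1
            for xi, si in zip(x, s):
                # (x_i - s_i)^(p-1) is 0 if x_i == s_i and 1 otherwise (mod p).
                indicator *= 1 - pow(xi - si, p - 1, p)
            total += t * indicator
        return total % p
    return f

# Illustrative example: max(x_1, x_2) on Z/3 is not given as a polynomial,
# yet the construction above reproduces it on every input.
p, n = 3, 2
table = {s: max(s) for s in product(range(p), repeat=n)}
f = interpolate(table, p)
agrees = all(f(s) == table[s] for s in table)
```

The same construction works for a table that lists only some of the inputs, in which case the polynomial is one of many that fit the listed values.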
Suppose we are given a data set $D$ of state transition pairs $(s_i, t_i)$ for a network
on $n$ nodes. Here, the inputs $s_i$ and the outputs $t_i$ are $n$-tuples of elements in
the finite field $k$, obtained by discretizing the real-valued measurements into
finitely many states (Dimitrova et al., manuscript submitted for publication). In
applications, $k$ typically has fewer than 20 elements, depending on the dynamic
range of the data. We consider the following problem: find a PDS $f : k^n \to k^n$
such that $f(s_i) = t_i$ for every pair in $D$. Note that as a consequence one also obtains a wiring
diagram for the network. In fact, the proposed algorithm consists of two parts,
the first of which infers one or a small family of most likely wiring diagrams
for the network. The second part infers a most likely dynamic model for each
of these wiring diagrams. While the second step in the algorithm could be used
as a stand-alone inference algorithm for dynamic modeling, coupling it with
the first part to provide constraints significantly improves its performance.
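To illustrate the problem statement, the following sketch checks whether a candidate PDS fits a data set and reads off the wiring diagram it implies. The sparse representation (a dictionary from exponent tuples to coefficients), the helper names, and the two-node Boolean example are assumptions made for illustration only.

```python
def evaluate(poly, state, p):
    """Evaluate a polynomial {exponent tuple: coefficient} at `state` over Z/p."""
    total = 0
    for exps, coeff in poly.items():
        term = coeff
        for x, e in zip(state, exps):
            term *= pow(x, e, p)
        total += term
    return total % p

def fits(pds, data, p):
    """True if f(s) == t for every transition pair (s, t) in `data`."""
    return all(tuple(evaluate(f, s, p) for f in pds) == t for s, t in data)

def wiring_diagram(pds):
    """Edges (j, i), 1-based, meaning that x_j appears in f_i."""
    return {(j + 1, i + 1)
            for i, f in enumerate(pds)
            for exps in f
            for j, e in enumerate(exps) if e > 0}

# Hypothetical two-node Boolean network: f_1 = x_2, f_2 = x_1*x_2 + 1.
pds = [{(0, 1): 1}, {(1, 1): 1, (0, 0): 1}]
data = [((0, 0), (0, 1)), ((1, 1), (1, 0)), ((1, 0), (0, 1))]
ok = fits(pds, data, 2)
edges = wiring_diagram(pds)
```

Here `edges` contains one pair for every variable dependency, so the wiring diagram is a by-product of the model rather than a separate object.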
We present an algorithm, previously described in Jarrah et al.,1 which com-
putes all possible minimal wiring diagrams for a given data set of measure-
ments from a biochemical network and scores the diagrams. The algorithm
uses computational algebra, namely primary decomposition of monomial ide-
als, as the principal tool. By assigning a probability distribution on the set of
all minimal diagrams, we identify a single wiring diagram or a small fam-
ily of most likely diagrams. We have extended the algorithm to construct all
possible dynamic models on that wiring diagram and a most likely one is
identified by assigning a probability distribution on the set of all dynamic
models. The algorithm to compute a most likely dynamic model involves a
parameter choice, namely a total ordering on all terms of all appearing poly-
nomials. There are infinitely many such orderings. A geometric construct,
called the Gröbner fan, divides these infinitely many choices into finitely
many segments, each of which gives rise to a dynamic model. The probability
distribution is constructed on these finitely many models. We validated the
method on a Boolean data set from a published segment polarity network in
Drosophila melanogaster.
170 ANNALS OF THE NEW YORK ACADEMY OF SCIENCES
NETWORK INFERENCE
Let $D_l = \{(s_1, t_1), \ldots, (s_m, t_m)\}$ be the data set for gene $x_l$, where $s_i \in k^n$ and
$t_i \in k$. We first consider the problem of identifying all sets of variables $V =
\{x_{e_1}, \ldots, x_{e_h}\}$ such that there exists a polynomial function $f \in k[x_{e_1}, \ldots, x_{e_h}]$
where $f(s_i) = t_i$ for all $i = 1, \ldots, m$ and $V$ is minimal in the sense that if we
remove any variable $x$ from $V$, there is no such function on $V \setminus \{x\}$. Any set
that satisfies these criteria we call a minimal set.
We briefly describe the minimal-sets algorithm that finds and scores all such
minimal sets, and identifies one as the most likely by assigning a probability
distribution on the set of minimal sets. While a detailed description is available
in Jarrah et al.,1 we include it here for completeness.
We first partition the inputs $\{s_1, \ldots, s_m\}$ as follows. For $a \in k$, let
$X_a = \{s_i \in k^n : t_i = a\}$.
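The partition suggests a direct way to enumerate minimal sets on small examples. A set $V$ admits a function fitting the data exactly when any two inputs lying in different classes $X_a$ and $X_b$ differ in at least one coordinate of $V$. The sketch below finds all minimal such sets by brute-force search over subsets. This is a substitute chosen for readability and is not the primary-decomposition method used by the algorithm. The function name and the data are hypothetical, and scoring is not included.

```python
from itertools import combinations

def minimal_sets(data, n):
    """All minimal variable sets (0-based indices) that can explain `data`.

    `data` is a list of pairs (s, t) with s an n-tuple and t the output.
    """
    # Partition the inputs by output value: X[a] = {s_i : t_i = a}.
    X = {}
    for s, t in data:
        X.setdefault(t, set()).add(s)
    # For every pair of inputs with different outputs, record where they differ.
    diffs = [{i for i in range(n) if u[i] != v[i]}
             for a, b in combinations(X, 2)
             for u in X[a] for v in X[b]]
    found = []
    for size in range(n + 1):
        for V in map(set, combinations(range(n), size)):
            if any(M <= V for M in found):
                continue  # V contains a smaller valid set, so it is not minimal
            if all(V & d for d in diffs):
                found.append(V)
    return found

# Hypothetical Boolean data for one gene on n = 3 variables.
data = [((0, 0, 0), 0), ((1, 1, 0), 1), ((0, 1, 1), 1)]
result = minimal_sets(data, 3)
```

For this data the search returns the sets with indices {1} and {0, 2}, that is, the output can be explained either by $x_2$ alone or by $x_1$ and $x_3$ together. The search is exponential in $n$, which is one reason an algebraic method is preferable for realistic networks.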
The minimal sets with the highest score are returned by the algorithm. By
repeating this process for each gene, we obtain a family of minimal wiring
diagrams for the biochemical network.
Heuristically speaking, the minimal-sets algorithm captures minimal sets of
variable dependencies that allow for an explanation of the data as a functional
relationship, without actually computing that functional relationship. This is
left to the second part of the algorithm. If more than one possible minimal
set is returned, the family of them can be viewed as a set of experimental
hypotheses to be tested. A strength of the algorithm is that it does in fact find
ALL possible minimal wiring diagrams by completely surveying the space
For each of the minimal wiring diagrams returned by the first part of the
algorithm we now infer a unique most likely dynamic model which has this
wiring diagram as the collection of variable dependencies. As explained ear-
lier, this can be done a variable at a time, by inferring the most likely transition
function for this variable. Suppose, for a fixed gene $x_l$ in the network, the minimal
set for $x_l$ is $V_l = \{x_{e_1}, \ldots, x_{e_h}\} \subset \{x_1, \ldots, x_n\}$. Then we are interested in
the set of all transition functions on $V_l$ given $D_l$, that is, the set of polynomial
functions
$F_l = \{ f \in k[x_{e_1}, \ldots, x_{e_h}] : f(p_{e_1}, \ldots, p_{e_h}) = a \text{ for all } p \in X_a,\ a \in k \}$.
The set $F_l$ can be computed similarly to how one solves a linear system
of equations.3 First, a particular solution $f$ is computed, typically using
the Lagrange interpolation formula. Then, by means of computational
algebra, the set of homogeneous solutions is characterized as the ideal
$I = \{ g \in k[x_{e_1}, \ldots, x_{e_h}] : g(p_{e_1}, \ldots, p_{e_h}) = 0 \text{ for all } p \in X_a,\ a \in k \}$.
The set $f + I$ represents the
model space, from which we can select a minimal model as follows. Compute
the normal form of $f$, denoted by $f \% I$, by taking the remainder of $f$ upon
division by the elements of $I$. This is accomplished by first computing
a Gröbner basis of $I$, which is required for multivariate polynomial division
to yield a well-defined remainder.
However, the construction of a Gröbner basis requires fixing a term order, that
is, a total ordering on all possible monomials. Selecting a different term order
possibly results in a different normal form, thereby changing the model.
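The dependence of the normal form on the term order can be seen in a very small case. The sketch below implements multivariate division over $\mathbb{Z}/p$ and applies it to the ideal of polynomials vanishing on the two Boolean points $(0,0)$ and $(1,1)$. The Gröbner bases are written out by hand for this toy ideal rather than computed, and the representation and names are illustrative assumptions.

```python
def normal_form(f, G, p, key):
    """Remainder of f on division by the polynomials in G over Z/p.

    Polynomials are dicts {exponent tuple: nonzero coefficient}.
    `key` ranks monomials and so encodes the term order.
    """
    f = {m: c % p for m, c in f.items() if c % p}
    remainder = {}
    while f:
        lead = max(f, key=key)
        coeff = f[lead]
        for g in G:
            g_lead = max(g, key=key)
            if all(a >= b for a, b in zip(lead, g_lead)):
                # Cancel the leading term of f with a multiple of g.
                shift = tuple(a - b for a, b in zip(lead, g_lead))
                factor = coeff * pow(g[g_lead], -1, p) % p
                for m, c in g.items():
                    target = tuple(a + b for a, b in zip(m, shift))
                    value = (f.get(target, 0) - factor * c) % p
                    if value:
                        f[target] = value
                    else:
                        f.pop(target, None)
                break
        else:
            # No leading term in G divides it: the term goes to the remainder.
            remainder[lead] = coeff
            del f[lead]
    return remainder

# Exponent tuples are (deg x, deg y); all arithmetic is over Z/2.
x = {(1, 0): 1}
# Hand-checked Groebner bases of I = <x + y, x^2 + x, y^2 + y>,
# the polynomials vanishing on the points (0,0) and (1,1).
G_x_first = [{(1, 0): 1, (0, 1): 1}, {(0, 2): 1, (0, 1): 1}]  # lex, x > y
G_y_first = [{(0, 1): 1, (1, 0): 1}, {(2, 0): 1, (1, 0): 1}]  # lex, y > x

nf_x_first = normal_form(x, G_x_first, 2, key=lambda m: m)
nf_y_first = normal_form(x, G_y_first, 2, key=lambda m: m[::-1])
```

With $x > y$ the normal form of $x$ is $y$, and with $y > x$ it is $x$ itself. Both polynomials agree on the two data points, but they name different variables as the input, so the choice of term order changes which dependency the model reports.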
There is a geometric construction, called the Gröbner fan, which captures
the number of different normal forms by partitioning the set of all term orders into
cones, where two term orders in the same cone give rise to the same normal
form. (For more information about the Gröbner fan, see Sturmfels.4) Thus it is
enough to compute the normal form of $f$ with respect to only one term order
from each cone. Then we pick the normal form that shows up most often as
the transition function for $x_l$. To be precise, suppose the Gröbner fan of $I$ has
$q$ cones and the distinct normal forms are $\{f_1, \ldots, f_s\}$, where $f_i$ is the normal
form of $f$ with respect to exactly $h_i$ of the $q$ cones. Then the score of the normal
form $f_i$ is $h_i/q$. This gives a probability distribution on the set of normal forms.
The normal form with the maximum score is chosen as the transition function
for $x_l$. Repeating this for each gene in the network we obtain a dynamic model
of the network.
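The scoring step itself is a simple frequency count once one normal form per cone is available. The sketch below assumes that list has already been produced by other means (computing the Gröbner fan is not shown), and the normal forms in the example are made-up strings.

```python
from collections import Counter
from fractions import Fraction

def score_normal_forms(forms_per_cone):
    """Score each distinct normal form by the fraction of cones producing it.

    `forms_per_cone` holds one normal form per cone of the Groebner fan,
    in any hashable representation. Returns (best form, all scores).
    """
    q = len(forms_per_cone)
    counts = Counter(forms_per_cone)
    scores = {form: Fraction(h, q) for form, h in counts.items()}
    best = max(scores, key=scores.get)
    return best, scores

# Hypothetical fan with q = 4 cones and two distinct normal forms.
forms = ["x1*x2 + x3", "x1*x2 + x3", "x1 + x3", "x1*x2 + x3"]
best, scores = score_normal_forms(forms)
```

The scores sum to 1, which is what makes them a probability distribution on the set of normal forms. If two normal forms tie for the maximum, this sketch returns the one encountered first.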
In essence, the Gröbner fan approach reduces the infinitely many possible
choices of term order to finitely many choices of a normal form and makes the
We generated 168 state transitions from the Boolean model in Albert and
Othmer7 and applied Algorithm-1 to these data. We note that the data make
up < 0.01% of all possible state transitions.
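The stated percentage can be checked with a one-line calculation. The count of 21 nodes is an assumption here, taken from the largest variable index ($x_{21}$) appearing in the transition functions. For a deterministic Boolean network each state has exactly one successor, so the number of state transitions equals the number of states.

```python
n_nodes = 21        # assumption: inferred from the largest index x21
n_transitions = 168  # number of transitions generated, as stated in the text

# One transition per state, and 2**n_nodes Boolean states in total.
fraction = n_transitions / 2 ** n_nodes
percent = 100 * fraction
print(f"{percent:.4f}%")  # prints 0.0080%
```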
FIGURE 1. The reverse-engineered wiring diagram for the Boolean network. Solid and
dashed edges make up the wiring diagram returned from the minimal-sets algorithm. Solid
edges represent true positives; dotted edges, true negatives; and boldface, false positives
(x11). Dashed edges arise from multiple minimal sets. For example, x8 has two highest-
scoring minimal sets: one involving x4, the other x16.
Each node had a unique highest-scoring minimal set, with the exception of
nodes 8 and 10: the data for x8 and x10 admitted two highest-scoring minimal
sets each. FIGURE 1 shows the reconstructed wiring diagram. We identified 38
edges, all but one of which are correct; the exception is the self-loop on x11.
We missed 9 edges, all of which have been identified
as nonidentifiable interactions. For more details, see Jarrah et al.1
For each node, the highest-scoring normal forms associated with a choice
of minimal set were returned. In this example, each minimal set admitted a
unique normal form, producing four distinct PDSs. The transition functions
are given below. Notice that each of the nodes 8 and 10 could take one of two
local transitions, since each of x8 and x10 has two highest-scoring minimal
sets. Underlined terms represent false positives and terms in brackets are true
negatives.
$f_1 = x_1$
$f_2 = x_1 x_{14} + x_2 x_{14} + x_2 x_{15} + x_2 + [\text{12 terms}]$
$f_3 = x_2$
$f_4 = x_{16} + [\text{5 terms}]$
$f_5 = x_4$
$f_6 = x_5 + [\text{1 term}]$
$f_7 = x_6$
$f_8 = x_{11} x_{13} x_{20} + x_{11} x_{13} x_{21} + x_4 x_{13} + x_{11} x_{13} + x_{13} x_{20} + x_{13} x_{21} + [\text{37 terms}]$
$f_8' = x_{11} x_{13} x_{20} + x_{11} x_{13} x_{21} + x_{11} x_{13} + x_{13} x_{16} + x_{13} x_{20} + x_{13} x_{21} + [\text{37 terms}]$
$f_9 = x_8 x_9 + x_8 x_{18} + x_8 x_{19} + x_8 + x_9 + x_{18} + x_{19} + [\text{6 terms}]$
$f_{10} = x_8 x_9 x_{20} + x_8 x_9 x_{21} + x_8 x_{20} + x_9 x_{20} + x_8 x_{21} + x_9 x_{21} + [\text{21 terms}]$
$f_{10}' = x_8 x_{10} x_{20} + x_8 x_{10} x_{21} + x_8 x_{20} + x_{10} x_{20} + x_8 x_{21} + x_{10} x_{21} + [\text{25 terms}]$
$f_{11} = x_8 x_9 x_{11} + x_8 x_9 + x_9 x_{11} + x_8 x_{20} + x_8 x_{21} + x_8 + x_9 + 1 + [\text{31 terms}]$
$f_{12} = x_5 + 1$
$f_{13} = x_{12}$
$f_{14} = x_{11} x_{13} x_{20} + x_{11} x_{13} x_{21} + x_{11} x_{13} + x_{13} x_{20} + x_{13} x_{21} + [\text{2 terms}]$
$f_{15} = x_{11} x_{13} x_{20} + x_{11} x_{13} x_{21} + x_{11} x_{13} + x_{13} x_{20} + x_{13} x_{21} + x_{13} + [\text{2 terms}]$
TABLE 1. A summary of the four PDSs for the Boolean network. Each value represents an
average over the terms in the transition functions

Model            TP     FP     TN
M1 (f8, f10)     0.90   0.10   0.39
M2 (f8', f10)    0.90   0.10   0.39
M3 (f8, f10')    0.86   0.14   0.40
M4 (f8', f10')   0.86   0.14   0.40

TP = true positives; FP = false positives; TN = true negatives.
DISCUSSION
ACKNOWLEDGMENTS
B.S. was supported by NSF Grant 0112050 and NIH Grant RO1 GM068947-
01. A.J. was partially supported by NSF Grant DMS-0511441. R.L. was par-
tially supported by NIH Grant RO1 GM068947-01 and NSF Grant DMS-
0511441. M.S. was partially supported by NSF Grant DMS-0311806.
REFERENCES