the loss of the semantic constraints among the different abduced inputs, and the learning process becomes vulnerable to random supervision signals on those parts of the single abduced input that are forced to take values when they should have semantically been treated as irrelevant.

The second difference concerns the training process itself. Prior art uses an ad-hoc training procedure which requires training of the neural module multiple times for the same training sample. That training approach is not only computationally expensive, but it is also difficult to customize on different scenarios. Instead, our framework provides the means to control the training process in a customized manner by delegating to the symbolic module the encoding of any domain-specific training choices. In particular, there exist cases where one would wish to have the neural predictions guide the choice of abduced inputs — presumably the problem that also motivates prior art. We show that such neural-guided abduction can be done easily as an extension of our basic framework, by encoding in the symbolic module the knowledge of which abduced inputs are to be used for training, using declarative or procedural techniques to resolve any inconsistencies and to rank the abduced inputs in terms of compatibility with the current neural predictions.

Beyond the plugging in of theories with any semantics and syntax, and beyond the already-mentioned support for neural-guided abduction, the clean take of our proposed compositional architecture easily extends to support other features found in past works, including program induction and domain-wide constraints. To our knowledge, a uniform handling of all these features is not present in past works.

We empirically evaluate — in what we believe to be a more comprehensive manner than typically found in the relevant literature — the performance of our framework against three frameworks that share the same goals with ours: DeepProbLog (Manhaeve et al. 2018), NeurASP (Yang, Ishay, and Lee 2020), and ABL (Dai et al. 2019). We demonstrate the superior performance of our framework both in terms of training efficiency and accuracy over a wide range of scenarios showing the features described above.

Preliminaries

For concreteness of exposition, and without excluding other syntax and semantics, we assume that the symbolic component encodes a logic theory using the standard syntax found in the abductive logic programming literature (Kakas 2017). As typical in logic programming, the language comprises a set of relational predicates that hold over variables or constants. An atom is a predicate with its arguments. A formula is defined as a logical expression over atoms, using the logical connectors of Prolog, e.g., conjunction, disjunction, negation. A theory is a collection of such formulas. Figure 1 shows a theory for determining the status of the game of a certain variant of chess played on a 3 × 3 board with three pieces: a black king, and two white pieces of different types.

    safe :- placed(Z1), movable(Z1).
    draw :- placed(Z1), \+attacked(Z1), \+movable(Z1).
    mate :- placed(Z1), attacked(Z1), \+movable(Z1).

    placed(Z1) :- pos(Z1), at(b(k),Z1), pos(Z2), pos(Z3), Z2\=Z3,
                  piece(w(P2)), at(w(P2),Z2), piece(w(P3)), at(w(P3),Z3).
    movable(Z1) :- pos(Z2), reached(Z2,k,Z1), \+attacked(Z2).
    attacked(Z2) :- pos(Z3), piece(w(P)), at(w(P),Z3), reached(Z2,P,Z3).
    reached((X,Y),k,(PX,PY)) :- abs(X,PX,DX), 1>=DX, abs(Y,PY,DY), 1>=DY,
                                sum(DX,DY,S), 0<S.
    reached((X,Y),q,(PX,PY)) :- reached((X,Y),r,(PX,PY)).
    reached((X,Y),q,(PX,PY)) :- reached((X,Y),b,(PX,PY)).
    ...

    ic :- piece(P), at(P,Z1), at(P,Z2), Z1\=Z2.
    ic :- piece(P1), piece(P2), at(P1,Z), at(P2,Z), P1\=P2.
    ic :- at(b(k),Z1), at(w(k),Z2), reached(Z1,k,Z2).
    ic :- piece(b(P1)), at(b(P1),Z1), piece(b(P2)), at(b(P2),Z2), Z1\=Z2.
    ic :- piece(w(P1)), at(w(P1),Z1), piece(w(P2)), at(w(P2),Z2),
          piece(w(P3)), at(w(P3),Z3), Z1\=Z2, Z2\=Z3, Z3\=Z1.

Figure 1: Snippet of a theory for an example chess domain being used to train a neural module through abduction. [The figure also depicts the training pipeline around the theory: the neural-module predictions ω for a training sample (x, f(x)), the abduction of the feedback ϕ for the label (e.g., 'safe'), the loss function, and its differentiation ∇ℓ.]

As far as our proposed architecture is concerned, the precise syntax and semantics of the theory are inconsequential. We will, therefore, not delve into a detailed analysis of the aforementioned theory T, except as needed to highlight certain features. What is of importance is only that T is accompanied by an entailment operator |= that allows exposing: a deduction method deduce that takes as input a set of atoms A and produces a set of atoms O = deduce(T, A) such that T ∪ A |= O; an abduction method abduce that takes as input a set of atoms O and produces a set (out of possibly many) of atoms A ∈ abduce(T, O) such that T ∪ A |= O. As part of exposing a method one needs to define its input and output spaces. We will assume that A ⊆ 𝒜 and call 𝒜 the set of symbolic inputs or abducibles; and that O ⊆ 𝒪 and call 𝒪 the set of symbolic outputs or outcomes. We will also assume that atoms in 𝒜 and 𝒪 are grounded and disjoint. When convenient, we will represent a subset of atoms as a formula: the conjunction of the subset's members. An abductive proof for a given outcome O ⊆ 𝒪 is any formula A ∈ abduce(T, O). Observe that for any fixed outcome there might exist zero, one, or multiple abductive proofs.
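To make the exposed methods concrete, the following minimal Python sketch (ours, for illustration only; the framework prescribes no particular implementation) shows the interface that a symbolic module is assumed to offer. The names SymbolicModule, Atom, and AtomSet, and the encoding of grounded atoms as strings, are our own assumptions.

    # Illustrative sketch: any component exposing these two methods over a
    # fixed theory T can serve as the symbolic module of the framework.
    from typing import FrozenSet, List, Protocol

    Atom = str                      # a grounded atom, e.g., "at(w(q),(1,1))"
    AtomSet = FrozenSet[Atom]

    class SymbolicModule(Protocol):
        def deduce(self, A: AtomSet) -> AtomSet:
            """Return O such that T ∪ A |= O, or {"⊥"} if A violates the
            integrity constraints of T."""
            ...

        def abduce(self, O: AtomSet) -> List[AtomSet]:
            """Return every abductive proof A (possibly none) with T ∪ A |= O."""
            ...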
Example 1 In our example chess domain, the set 𝒜 of abducibles comprises all atoms of the form at(P, (X, Y)), corresponding to the concept of a chess piece of type P being on the chess board at coordinates (X, Y); P takes one of the values in {b(k), w(k), w(q), w(r), w(b), w(n), w(p)}, where w(·) and b(·) stand for white or black pieces, and k, q, r, b, n, and p denote the king, queen, rook, bishop, knight and pawn, respectively; each of X and Y takes one of the values in {1, 2, 3}. The set 𝒪 of outcomes is equal to {safe, draw, mate}, corresponding to the concepts that the black king has a valid move, is stalemated, or is mated.
Deduction receives as input a subset of 𝒜 that describes the state of the chess board, and produces as output a (singleton) subset of 𝒪 on the status of the black king. Conversely, abduction receives as input a (singleton) subset of 𝒪 that describes the desired status of the black king, and produces as output subsets of 𝒜, each describing a state of the chess board where the black king has the desired status.

A theory may be extended with integrity constraints, special formulas that restrict the possible inferences that can be drawn when applying the methods of deduction and abduction, by constraining which subsets of 𝒜 are considered acceptable. A subset A ⊆ 𝒜 violates the integrity constraints if and only if deduce(T, A) is a special symbol ⊥ ∉ 𝒪. Analogously, a subset A ⊆ 𝒜 violates the integrity constraints if and only if A ∉ abduce(T, O) for each subset O ⊆ 𝒪. Thus, integrity constraints in a theory need to be respected by every abductive proof for each outcome O ⊆ 𝒪.

Example 2 In our example chess domain, the integrity constraints are encoded as rules with an ic head. The five integrity constraints in Figure 1 capture, in order, the following requirements: the same piece type is not at more than one position; no two pieces are at the same position; the black and white kings are not attacking each other; there is at most one black piece on the chess board; there are at most two white pieces on the chess board. The requirement for the existence of at least one black king and at least two white pieces is captured through the rule with the placed(Z1) head. If the set 𝒜 of abducibles is extended to include all atoms of the form empty((X, Y)) to denote explicitly the coordinates of the board cells that are empty, then additional integrity constraints and rules can be added in the theory to ensure that no piece can be placed at an empty cell, and that every non-empty cell should hold some piece.

Framework

We consider a neural-symbolic system built by composing a neural module feeding into a symbolic module.

Module Compositionality

We let 𝒳 and Ω = [0, 1]^k be, respectively, the space of possible inputs and the space of possible outputs of the neural module. At any given training iteration t, the neural module effectively implements a function n_t : 𝒳 → Ω. For notational simplicity, we will overload the use of the symbol n_t to denote both the function and the underlying neural network itself. We assume that there is a translator function r that maps each ω ∈ Ω to a set of abducibles r(ω) ⊆ 𝒜.

Given a symbolic module with a theory T, the end-to-end reasoning of the neural-symbolic system at iteration t is the process that maps an input in 𝒳 to an outcome subset of 𝒪 as follows: the system receives an input x ∈ 𝒳; the neural module computes the vector ω = n_t(x); the translator maps ω to the abducibles A = r(ω) ⊆ 𝒜; the symbolic module computes the outcome O = deduce(T, A) ⊆ 𝒪 ∪ {⊥}. Thus, inference in our framework proceeds by running the inference mechanism of the symbolic module over the inferences of the neural module on a given neural input. To simplify our notation, and when there is no confusion, we will write h_t^T(x) to mean deduce(T, r(n_t(x))) for x ∈ 𝒳.

Example 3 In our example chess domain, consider a neural module n_t that receives as input x ∈ 𝒳 a 3 × 3 grid of images representing a chess board. The neural module outputs a vector ω = n_t(x) ∈ Ω that corresponds to what the neural module predicts. One possible implementation is for the neural module to have eight output nodes for each cell at coordinates (X, Y) of the chess board (hence, k = 8 × 9). These eight output nodes represent, respectively, whether their associated cell includes no piece, the black king, the white king, the white queen, the white rook, the white bishop, the white knight, or the white pawn. ω assigns, effectively, confidence values on each of these predictions for each cell.
The translator function r could simply turn ω into a set of abducibles A by considering for each cell the most confident prediction and including the corresponding atom in A. Thus, if the first eight components of ω, which correspond to predictions for cell (1, 1), were such that the third value was the maximum one, then A would include at(w(k), (1, 1)).
A is provided as input to the symbolic component, which deduces whether the chess board is in a safe, draw, or mate state (or in ⊥ in case A violates the integrity constraints).
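As a concrete illustration of such a translator, the sketch below (our own; the specific cell ordering, class ordering, and the emission of empty atoms per Example 2's extension are assumptions, not part of the framework) maps the 8 × 9 confidence values of Example 3 to a set of abducibles by taking the most confident class per cell.

    # Illustrative translator r for the chess grid of Example 3.
    PIECES = [None, "b(k)", "w(k)", "w(q)", "w(r)", "w(b)", "w(n)", "w(p)"]

    def translate(omega):
        """omega: 72 confidences, 8 per cell, cells assumed in row-major order."""
        abducibles = set()
        for cell in range(9):
            x, y = cell % 3 + 1, cell // 3 + 1
            scores = omega[8 * cell : 8 * (cell + 1)]
            best = max(range(8), key=lambda i: scores[i])
            if PIECES[best] is None:
                abducibles.add(f"empty(({x},{y}))")   # cf. the extension in Example 2
            else:
                abducibles.add(f"at({PIECES[best]},({x},{y}))")
        return abducibles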
In certain cases, the input x ∈ 𝒳 to a neural-symbolic system might be associated with explicit input-specific knowledge provided in the form of a symbolic formula x. This side-information does not go through the usual pipeline as x, but can be readily accommodated by extending the theory T to include it, and by computing deduce(T ∪ {x}, r(n_t(x))) instead. Our compositional perspective allows us to remain agnostic, at the architecture level, on how side-information will be dealt with by the symbolic module (e.g., as integrity constraints or as weak preferences), and puts the burden on the theory itself to make this domain-specific determination.

Neural-Module Learning

As in standard supervised learning, consider a set of labeled samples of the form {⟨x_j, f(x_j)⟩}_j, with f being the target function that we wish to learn, x_j corresponding to the features of the sample, and f(x_j) being the label of the sample. In the context of our neural-symbolic architecture, learning seeks to identify, after t iterations over a training subset of labeled samples, a hypothesis function h_t^T(·) that sufficiently approximates the target function f(·) on a testing subset of labeled samples. Given a fixed theory T for the symbolic module, the only part of the hypothesis function h_t^T(·) = deduce(T, r(n_t(·))) that remains to be learned is the function n_t implemented by the neural module.

We put forward Algorithm 1 to achieve this goal. In line with our compositional treatment, the algorithm does not delve into the internals of the neural and the symbolic module, but accesses them only through the methods that they expose: inference and backpropagation for the neural module; deduction and abduction for the symbolic module.
Algorithm 1 TRAIN(x, f(x), n_t) → n_{t+1}

    1: ω := n_t(x)
    2: ϕ := ∨ abduce(T, f(x))               ▷ basic form, or
       ϕ := ∨ abduce(T ∪ r(ω), f(x))        ▷ NGA form
    3: ℓ := loss(ϕ, r, ω)                   ▷ using WMC
    4: n_{t+1} := backpropagate(n_t, ∇ℓ)
    5: return n_{t+1}

The algorithm considers the label f(x) of a given sample, viewed as a (typically singleton) subset of 𝒪, and abduces all abductive proofs A ∈ abduce(T, f(x)) ⊆ 𝒜. Taking the disjunction of all abductive proofs, the algorithm computes the abductive feedback formula ϕ that captures all the acceptable outputs of the neural module that would lead, through the theory T, the system to correctly infer f(x).

The abductive feedback acts as a supervision signal for the neural module. Combining that signal with the actual output ω of the neural module (through the use of the translator function r), we can compute the loss of the neural module. Critically, the resulting loss function is differentiable, even if the theory T of the symbolic module is not! By differentiating the loss function we can use backpropagation to update the neural module to implement function n_{t+1}.

Rather than requiring the theory to be differentiable, as done in certain past works (Donadello, Serafini, and d'Avila Garcez 2017; Marra et al. 2019; Serafini and d'Avila Garcez 2016; Sourek et al. 2015; van Krieken, Acar, and van Harmelen 2019; Manhaeve et al. 2018), the use of abduction for neural-symbolic integration poses no a priori constraints on the form of the theory, but proceeds to extract its "essence" in a differentiable form, albeit in an outcome-specific manner. Fortuitously, the space of possible outcomes is usually considerably restricted, which readily allows the caching of the abductive proofs, or even their precomputation prior to the training phase. Put differently, the use of abduction allows replacing any arbitrary theory T by the set of its abductive feedbacks {ϕ_O | ϕ_O = ∨ abduce(T, O), O ⊆ 𝒪}.
Example 4 In our example chess domain, consider a training sample (x, f(x)), where x is a 3 × 3 grid of images representing a chess board with a white queen at cell (1, 1), a white bishop at cell (3, 1), and a black king at cell (2, 3), and f(x) labels the chess board as being in a safe state. Starting from the label, we compute the abductive feedback . . . ∨ [at(w(q), (1, 1)) ∧ at(w(b), (3, 1)) ∧ at(b(k), (2, 3)) ∧ . . . ∧ empty((3, 3))] ∨ [at(w(r), (1, 1)) ∧ at(w(n), (3, 1)) ∧ at(b(k), (2, 3)) ∧ . . . ∧ empty((3, 3))] ∨ [at(b(k), (1, 1)) ∧ at(w(p), (3, 1)) ∧ at(w(r), (2, 3)) ∧ . . . ∧ empty((3, 3))] ∨ [at(w(p), (1, 1)) ∧ at(w(n), (2, 2)) ∧ at(b(k), (2, 3)) ∧ . . . ∧ empty((3, 3))] ∨ . . .. Among the shown disjuncts, the first one represents the input chess board, the next two represent chess boards that are safe and have pieces only at cells (1, 1), (3, 1) and (2, 3), and the last represents a chess board that is safe, but has pieces at cells (1, 1), (2, 2) and (2, 3).

Neural-Guided Abduction

Although computing the entire abductive feedback is generally the appropriate choice of action, there might exist circumstances where it might be beneficial to prune some of its parts. Caution should, however, be exercised, as pruning might end up removing the part of the abductive feedback that corresponds to the true state of affairs (cf. Example 4), and might, thus or otherwise, misdirect the learning process.

One case worth considering is neural-guided abduction (NGA), where the prediction of the neural module is used as a focus point, and only abductive proofs that are proximal perturbations of that point find their way into the abductive feedback. What counts as a perturbation, how proximity is determined, and other such considerations are ultimately domain-specific, and are not specified by the framework.

Example 5 In our example chess domain, consider a neural module that is highly confident in distinguishing empty from non-empty cells, but less confident in determining the exact types of the pieces in the non-empty cells. Consider, further, a particular training sample ⟨x, f(x)⟩ on which the neural component identifies the non-empty cells as being (1, 1), (3, 1), and (2, 3). It is then natural for the symbolic module to attempt to utilize the predictions of the neural module to prune and focus the abductive feedback that it will provide for the further training of the neural module.
If, for example, f(x) labels the chess board as being in a safe state, then the abductive feedback will exclude the last disjunct from Example 4, since it represents a chess board with pieces at cells other than (1, 1), (3, 1), and (2, 3), and will maintain the first three disjuncts as they respect the neural predictions in terms of the positions of the three pieces.

To support neural-guided abduction, we must, first, establish a communication channel between the neural module and the abduction mechanism, in order for the neural module to provide its predictions to the abduction mechanism. Our proposed architecture can seamlessly implement this communication channel by treating the communicated information as input-specific knowledge. Given, therefore, a training sample (x, f(x)), we can simply call the abduction method not by providing only the theory T and the outcome f(x) as inputs, but by first extending the theory T with the neural predictions ω = n_t(x) as translated by the translator function r. Thus, the abductive feedback in Algorithm 1 is now computed as ϕ := ∨ abduce(T ∪ r(ω), f(x)).

As we have already mentioned, the treatment of this side-information is not determined by the framework, but is left to the theory itself. Although the side-information might, in some domains, provide confident predictions that could act as hard constraints for the theory (cf. Example 5), our treatment allows also the handling of domains where the side-information might be noisy, incorrect, or even in direct violation of the existing integrity constraints of the theory. Such neural predictions might still offer some useful guidance to the abduction process. Depending on the syntactic and semantic expressivity of the symbolic module, the theory can provide a declarative or a procedural way to resolve the inconsistencies that arise in a domain-specific manner.

Example 6 In our example chess domain, consider a particular training sample ⟨x, f(x)⟩ on which the prediction of the neural module, as translated by the translator into symbolic inputs, corresponds to the subset {at(w(q), (1, 1)), at(w(b), (3, 1)), at(b(k), (2, 3)), . . ., empty((3, 3))}.
Assume, first, that f(x) labels the chess board as being in a safe state. Then, there exists exactly one abductive proof that matches the neural prediction perfectly. As this corresponds to a zero-cost perturbation of the neural prediction, only it ends up in the abductive feedback. As a result, the neural module ends up reinforcing exactly what it predicted.
Assume, now, that f(x) labels the chess board as being in a draw state. Then, there is no abductive proof that matches the neural prediction perfectly. Rather, there is an abductive proof [at(w(q), (1, 1)) ∧ at(w(r), (3, 1)) ∧ at(b(k), (2, 3)) ∧ . . . ∧ empty((3, 3))] that differs from the neural prediction only in changing the type of an already predicted white piece, while maintaining its position, and also maintaining the types and positions of the other two pieces. This abductive proof could be evaluated to have a minimal cost among the perturbations of the neural prediction, and only it ends up in the abductive feedback. As a result, the neural module ends up reinforcing parts of what it sees, while helping revise locally one of its mistakes (perhaps because it is still unable to fully differentiate between rooks and bishops).
Assume, finally, that f(x) labels the chess board as being in a mate state. Then, there is no abductive proof that matches the neural prediction perfectly. In fact, there are no abductive proofs that respect the positions of the pieces as predicted by the neural module. Abduction will then seek to identify perturbations that, if possible, move a single piece with respect to the predicted ones, or move and change the type of a single piece, etc., that would respect the label f(x). Depending on how one costs the various perturbations, one or more abductive proofs can be evaluated to have minimal cost, and all those will end up in the abductive feedback.
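One simple way to realize such a cost, sketched below in Python (our own illustration; the framework deliberately leaves the choice of metric domain-specific), is to score each abductive proof by the number of atoms in which it differs from the translated neural prediction, and to keep only the minimal-cost proofs.

    # Illustrative proximity-based selection of abductive proofs (cf. Example 6).
    def perturbation_cost(proof, predicted):
        """Both arguments are sets of grounded atoms (abducibles)."""
        return len(set(proof) ^ set(predicted))     # size of the symmetric difference

    def select_feedback(proofs, predicted):
        """Keep only the proofs that are minimal-cost perturbations of the prediction."""
        if not proofs:
            return []
        costs = [perturbation_cost(p, predicted) for p in proofs]
        best = min(costs)
        return [p for p, c in zip(proofs, costs) if c == best]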
Evaluation

We have empirically assessed the training time and test accuracy of our proposed compositional framework, hereafter abbreviated as NeuroLog, against three prior approaches that share the same goals with us: DeepProbLog (Manhaeve et al. 2018), NeurASP (Yang, Ishay, and Lee 2020) and ABL (Dai et al. 2019). Comparing with other architectures, such as (Gaunt et al. 2017), which are concerned not only with neural-module learning, but also with symbolic-module learning, is beyond the scope of the current paper. The code and data to reproduce the experiments are available at: https://bitbucket.org/tsamoura/neurolog/src/master/.

Implementation

Abductive feedback in NeuroLog was computed using the A-system (Nuffelen and Kakas 2001) running over SICStus Prolog 4.5.1. Each abductive feedback ϕ was grounded (and, hence, effectively propositional) by construction, which facilitated the use of semantic loss (Xu et al. 2018) for training the neural module. The semantic loss of ϕ was computed by treating each atom in ϕ as a Boolean variable, weighted by the activation value of the corresponding output neuron of the neural module, and by taking the negative logarithm of its weighted model count (WMC) (Chavira and Darwiche 2008). For the purposes of computing WMC, ϕ was first compiled into an arithmetic circuit (Darwiche 2011).
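For illustration only, the sketch below computes the same quantity by brute-force enumeration instead of circuit compilation (it is exponential in the number of atoms of ϕ, so it is only suitable for toy formulas). It assumes ϕ is given as a list of abductive proofs (its disjuncts), each a set of atoms, and that neuron_index maps each atom to the index of its output neuron; PyTorch is used so that the result remains differentiable.

    # Brute-force semantic loss: -log of the weighted model count of phi.
    from itertools import product
    import torch

    def semantic_loss(proofs, neuron_index, omega):
        """proofs: disjuncts of phi as sets of atoms; omega: activations in [0, 1]."""
        atoms = sorted(set().union(*proofs))          # the variables of phi
        wmc = omega.new_zeros(())                     # differentiable accumulator
        for values in product([False, True], repeat=len(atoms)):
            assignment = dict(zip(atoms, values))
            if not any(all(assignment[a] for a in p) for p in proofs):
                continue                              # this assignment falsifies phi
            weight = omega.new_ones(())
            for atom, value in assignment.items():
                prob = omega[neuron_index[atom]]
                weight = weight * (prob if value else 1 - prob)
            wmc = wmc + weight
        return -torch.log(wmc)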
In order to avoid recomputing the same models or the same abductive feedbacks during training, we used caching across all the systems that we evaluated. Furthermore, we encoded the theories of the symbolic modules with an eye towards minimizing the time to perform abduction, grounding, or inference. Experiments were run on an Ubuntu 16.04 Linux PC with an Intel i7 64-bit CPU and 94.1 GiB RAM.

Scenarios

Benchmark datasets have been used to provide inputs to the neural module as follows: MNIST (LeCun et al. 1998) for images of digits; HASY (Thoma 2017) for images of math operators; GTSRB (Stallkamp et al. 2011) for images of road signs. Below we describe each experimental scenario:

ADD2x2 (Gaunt et al. 2017): The input is a 2 × 2 grid of images of digits. The output is the four sums of the pairs of digits in each row / column. The symbolic module computes the sum of pairs of digits.

OPERATOR2x2 (new; ADD2x2 with program induction): The input is a 2 × 2 grid of images of digits. The output is the four results of applying the math operator op on the pairs of digits in each row / column. The math operator op in {+, −, ×} is fixed for each row / column but unknown. The symbolic module computes the sum, difference, and product of pairs of digits. The neural module seeks to induce the unknown operator and to recognize the digits.

APPLY2x2 (Gaunt et al. 2017): The input is three digits d1, d2, d3 and a 2 × 2 grid of images of math operators op_{i,j}. The output is the four results of applying the math operators in each row / column on the three digits (e.g., d1 op_{11} d2 op_{12} d3). The symbolic module computes results of applying pairs of math operators on three digits.

DBA(n) (Dai et al. 2019): The input is a mathematical expression comprising n images of {0, 1} digits and math operators (including the equality operator). The output is a truth value indicating whether the mathematical expression is a valid equation. The symbolic module evaluates the validity of an equation. Our DBA scenario extends that from (Dai et al. 2019) by allowing math operators to appear on both sides of the equality sign.

MATH(n) (Gaunt et al. 2017): The input is a mathematical expression comprising n images of digits and math operators. The output is the result of evaluating the mathematical expression. The symbolic module computes results of math operators on integers.

PATH(n) (Gaunt et al. 2017): The input is an n × n grid of images of road signs and two symbolically-represented grid coordinates. The output is a truth value indicating whether there exists a path from the first to the second coordinate. The symbolic module determines valid paths between coordinates given as facts.

MEMBER(n) (new): The input is a set of n images of digits and a single symbolically-represented digit. The output is a truth value indicating whether the single digit appears in the set of digits. The symbolic module determines set membership of an element given as a fact.

We have also used the chess domain from our running example to highlight certain (new) features of our framework: a richer class of theories, non-declarative theories, and neural-guided abduction. We denote by CHESS-BSV(n) and CHESS-NGA(n) the scenarios corresponding, respectively, to Example 4 and Example 6: in the former scenario, the full abductive feedback is used to train the neural module, and in the latter scenario a non-declarative theory is used to enumerate and evaluate, against the neural predictions, the various abductive proofs to select which parts of the abductive feedback to retain. We also consider a third variant that sits between the former two, called CHESS-ISK(n), which roughly corresponds to Example 5, but rather than receiving the positions of the three pieces from a confident neural module, it receives them as externally-provided (and noiseless) information. In all scenarios, the chess pieces are represented by images of digits.

Results and Analysis

Results of our empirical evaluation are shown in Table 1, and in Figures 2, 3, and 4. Each system was trained on a training set of 3000 samples, and was run independently 10 times per scenario to account for the random initialization of the neural module or other system stochasticity. Training was performed over 3 epochs for NeuroLog, DeepProbLog and NeurASP, while the training loop of ABL was invoked 3000 times. Note that there is no one-to-one correspondence between the training loop of ABL and that of the other three systems: in each iteration, ABL considers multiple training samples and based on them it trains the neural component multiple times. In all systems, the neural module was trained using the Adam algorithm with a learning rate of 0.001.

Testing accuracy (%):
          ADD2x2       OPERATOR2x2  APPLY2x2     DBA(5)       MATH(3)      MATH(5)
  NLog    91.7 ± 0.7   90.8 ± 0.8   100 ± 0      95.0 ± 0.2   95.0 ± 1.2   92.2 ± 0.9
  DLog    88.4 ± 2.5   86.9 ± 1.0   100 ± 0      95.6 ± 1.8   93.4 ± 1.4   timeout
  ABL     75.5 ± 34    timeout      88.9 ± 13.1  79 ± 12.8    69.7 ± 6.2   6.1 ± 2.8
  NASP    89.5 ± 1.8   timeout      76.5 ± 0.1   94.8 ± 1.8   27.5 ± 34    18.2 ± 33.5
Total training time (s):
  NLog    531 ± 12     565 ± 36     228 ± 11     307 ± 51     472 ± 15     900 ± 71
  DLog    1035 ± 71    8982 ± 69    586 ± 9      4203 ± 8     1649 ± 301   timeout
  ABL     1524 ± 100   timeout      1668 ± 30    1904 ± 92    1903 ± 17    2440 ± 13
  NASP    356 ± 4      timeout      454 ± 652    193 ± 2      125 ± 6      217 ± 3

Testing accuracy (%):
          PATH(4)     PATH(6)     MEMBER(3)   MEMBER(5)   CHESS-BSV(3)  CHESS-ISK(3)  CHESS-NGA(3)
  NLog    97.4 ± 1.4  97.2 ± 1.1  96.9 ± 0.4  95.4 ± 1.2  94.1 ± 0.8    93.9 ± 1.0    92.7 ± 1.6
  DLog    timeout     timeout     96.3 ± 0.3  timeout     n/a           n/a           n/a
  ABL     timeout     timeout     55.3 ± 3.9  49.0 ± 0.1  0.3 ± 0.2     44.3 ± 7.1    n/a
  NASP    timeout     timeout     94.8 ± 1.3  timeout     timeout       19.7 ± 6.3    n/a
Total training time (s):
  NLog    958 ± 89    2576 ± 14   333 ± 23    408 ± 18    3576 ± 28     964 ± 15      2189 ± 86
  DLog    timeout     timeout     2218 ± 211  timeout     n/a           n/a           n/a
  ABL     timeout     timeout     1392 ± 8    1862 ± 28   9436 ± 169    7527 ± 322    n/a
  NASP    timeout     timeout     325 ± 3     timeout     timeout       787 ± 307     n/a

Table 1: Empirical results. NLog stands for NeuroLog, DLog for DeepProbLog, and NASP for NeurASP. The first four rows in each table show the % testing accuracy, while the last four rows show the total training time in seconds.

Figure 2: Empirical results for NeuroLog, DeepProbLog, and NeurASP. Solid lines and lightly-colored areas show, respectively, the average behavior and the variability of the behavior across different repetitions. [Panels: ADD2x2, OPERATOR2x2, APPLY2x2, DBA(n); axes: accuracy % over training iterations.]

Results on running DeepProbLog on the CHESS-?(n) suite of scenarios are not available, since DeepProbLog's syntax does not readily support the integrity constraints (nor the procedural constructs for the CHESS-NGA(n) scenario) in the symbolic module. Since neural-guided abduction is not supported by any of the other three systems, we report results on CHESS-NGA only for NeuroLog.

The results offer support for the following conclusions:

(C1) The average accuracy of NeuroLog is comparable to, or better than, that of the other systems. NeuroLog performs similarly to DeepProbLog on those scenarios that are supported by the latter and in which DeepProbLog does not time out, while it may perform considerably better than NeurASP and ABL. For example, the average accuracy of NeuroLog is up to 70% higher than that of NeurASP in the MATH scenarios, and up to 40% higher than that of NeurASP in the MEMBER scenarios.

NeurASP and ABL are vulnerable to weak supervision signals, as their performance decreases when the number of abductive proofs per training sample increases. For example, the average accuracy of ABL drops from 69.7% in MATH(3) to 6.1% in MATH(5), while it drops from 44% in CHESS-ISK(n), where each training sample is provided with the coordinates of the non-empty cells, to less than 1% in CHESS-BSV(n) where no such information is provided.

With regards to ABL, this phenomenon may be attributed to the consideration of a single abductive proof per training sample instead of considering all the relevant abductive proofs as NeuroLog does. Considering a single abductive proof may result in excluding the correct one; i.e., the one corresponding to the true state of the sample input. Notice that when the number of abductive proofs per training sample increases, the probability of excluding the right abductive proof from consideration increases as well, resulting in very weak supervision signals, as seen in CHESS-BSV.

(C2) Compared to NeurASP and ABL, NeuroLog is less sensitive to the initialization of the neural module. For example, the accuracy of NeurASP spans 15%–94% in MATH(3), and that of ABL spans 57%–94% in APPLY2x2. With regards to ABL, this sensitivity may be, again, attributed to the consideration of a single abductive proof per training sample. The learning process of ABL obscures and abduces part of the neural predictions, so that the modified predictions are consistent with the theory and also lead to the entailment of the sample label (see the "Related Work" section). Considering a single abductive proof has high chances of missing the right one, and hence the training process ends up being biased on the obscuring process, which, in turn, depends upon the initial weights of the neural module.

(C3) The average training time of NeuroLog may be significantly less than that of the other systems. For example, the average total training time is: 16m47s for NeuroLog in MATH(5) versus 22m48s for DeepProbLog in the simpler MATH(3) scenario; 42m93s for NeuroLog in PATH(6) versus DeepProbLog and NeurASP timing out in the simpler PATH(4) scenario; 16m for NeuroLog in CHESS-ISK(3) versus 125m for ABL in the same scenario.

With regards to ABL, its high training time may be attributed to its trial-and-error use of abduction. At each training iteration, an optimization process obscures and performs abduction multiple times over different subsets of the training samples. It holds, in particular, that although ABL computes a single abductive proof per training sample, it may perform abduction multiple times for the same sample.

With regards to NeurASP, its high training time may be attributed to the grounding that NeurASP applies on the theory; i.e., computing all the consequences that are semantically entailed. Instead of computing all such forward-reasoning consequences, abduction is driven by the sample label, and evaluates (and grounds) only the relevant part of the theory. It is worth noting, however, that NeurASP achieves comparable accuracy to NeuroLog in less training time in the ADD2x2 and DBA scenarios. Its training time is also lower than that of NeuroLog in the two MATH(n) scenarios; however, for these cases its accuracy is very poor.

(C4) When compared to CHESS-BSV(3), the use of side-information in CHESS-ISK(3) and CHESS-NGA(3) leads to asymptotically faster training. The higher training time during the earlier iterations, which is particularly pronounced in the CHESS-NGA(3) scenario (see Figure 4), corresponds to the phase where new abductive proofs are still being computed. Recall that in CHESS-BSV(3) an abductive proof is distinct for each label (i.e., mate, draw, safe), whereas in CHESS-ISK(3) and CHESS-NGA(3) an abductive proof is distinct for each combination of label and side-information. Once the bulk of the distinct abductive proofs is computed and cached, the training time per iteration drops. Unsurprisingly, this initial phase is longer for the CHESS-NGA(3) scenario, where the distinct abductive proofs are more numerous, as they depend on a more variable space of side-information.

The average end accuracy for the CHESS-?(3) scenarios is comparable; see Table 1. The average interim accuracy of CHESS-NGA(3) is, however, relatively lower during early training, where the neural module predictions are still highly noisy / random. Specifically, the average accuracy at 1000 iterations is: 73.9 ± 1.5 for CHESS-BSV(3), 73.4 ± 5.2 for CHESS-ISK(3), 51.1 ± 7.9 for CHESS-NGA(3).

Scalability: Computing abductive proofs is intractable (NP-hard to decide their existence; #P-hard to enumerate/count them). Neural-guided abduction reduces this cost in practice by excluding irrelevant proofs, but the problem remains worst-case hard. However, since abductive proofs are a function of only the sample label and side-information, NeuroLog can cache and reuse them across different training samples, showing that in practice our approach can be more computationally efficient than prior art, e.g., ABL.
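The sketch below (ours; the abduce_fn callable and the representation of side-information as a frozenset of atoms are assumptions about the surrounding code, not part of NeuroLog's published interface) illustrates this kind of reuse: abduction is memoized on exactly the pair that determines its result, so repeated (label, side-information) combinations across training samples trigger no further abduction calls.

    # Illustrative memoization of abductive feedback across training samples.
    from functools import lru_cache

    def cached_feedback(abduce_fn):
        """Memoize an assumed abduce_fn(label, side_info) -> tuple of abductive proofs.
        Both arguments must be hashable; represent side_info as a frozenset of atoms
        (translated neural predictions for NGA, external knowledge for ISK, or empty)."""
        return lru_cache(maxsize=None)(abduce_fn)

    # Example usage (hypothetical):
    #   cached = cached_feedback(my_abduce)
    #   cached("mate", frozenset({"at(b(k),(2,3))"}))   # computed once, then reused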
Figure 3: Empirical results for ABL. Solid lines and lightly-colored areas show, respectively, the average behavior and the variability of the behavior across different repetitions. Dashed lines show the final average accuracy of NeuroLog as reported in Figure 2. Due to the different training regimes of ABL and NeuroLog, an iteration-by-iteration comparison is not meaningful. [Panels: ADD2x2, APPLY2x2, DBA(n), MATH(3); axes: accuracy % over training iterations.]
Neural Theorem Prover (Rocktäschel and Riedel 2017) is an alternative to Prolog's QA engine to support noisy theories. It proceeds by embedding predicates and constants into a vector space and uses vector distance measures to compare them. Neural Logic Machines (Dong et al. 2019) implements rules inside a tensor network, thus providing the ability to reason uniformly over neural modules and logical theories. However, its semantics is not connected to any logic semantics (e.g., Tarski, Sato, or fuzzy) and no soft or hard constraints are imposed at inference time.

Conclusion

We have introduced a compositional framework for neural-symbolic integration that utilizes abduction to support a uniform treatment of symbolic modules with theories beyond any specific logic, or a declarative representation altogether. Our empirical results have demonstrated not only the practical feasibility of this perspective, but also its superior performance over state-of-the-art approaches in terms of cross-domain applicability, testing accuracy, and training speed. There are two key directions for future work: (i) further consideration of the use of non-logic or non-declarative theories for the symbolic module; (ii) explicit treatment of symbolic-module learning, which, unlike program induction, will not delegate the burden of learning to the neural module. With respect to the latter direction, in particular, the consideration of human-in-the-loop learning paradigms (such as the Machine Coaching paradigm (Michael 2019), for example) would present an interesting challenge for neural-symbolic integration systems, bringing into focus the issue of learning in a manner that is cognitively-compatible with humans.

Acknowledgements

This work was supported by funding from the EU's Horizon 2020 Research and Innovation Programme under grant agreements no. 739578 and no. 823783, and from the Government of the Republic of Cyprus through the Directorate General for European Programmes, Coordination, and Development. The authors would like to thank Antonis Kakas for help with the abduction system used in this work.

References

Balog, M.; Gaunt, A. L.; Brockschmidt, M.; Nowozin, S.; and Tarlow, D. 2017. DeepCoder: Learning to Write Programs. In ICLR.
Bošnjak, M.; Rocktäschel, T.; Naradowsky, J.; and Riedel, S. 2017. Programming with a Differentiable Forth Interpreter. In ICML, 547–556.
Chavira, M.; and Darwiche, A. 2008. On probabilistic inference by weighted model counting. Artificial Intelligence 172(6): 772–799.
Dai, W.-Z.; Xu, Q.; Yu, Y.; and Zhou, Z.-H. 2019. Bridging Machine Learning and Logical Reasoning by Abductive Learning. In NeurIPS, 2815–2826.
Darwiche, A. 2011. SDD: A New Canonical Representation of Propositional Knowledge Bases. In IJCAI, 819–826.
d'Avila Garcez, A. S.; Broda, K.; and Gabbay, D. M. 2002. Neural-symbolic learning systems: foundations and applications. Perspectives in neural computing. Springer.
Donadello, I.; Serafini, L.; and d'Avila Garcez, A. S. 2017. Logic Tensor Networks for Semantic Image Interpretation. CoRR abs/1705.08968.
Dong, H.; Mao, J.; Lin, T.; Wang, C.; Li, L.; and Zhou, D. 2019. Neural Logic Machines. In ICLR.
Evans, R.; and Grefenstette, E. 2018. Learning Explanatory Rules from Noisy Data. Journal of Artificial Intelligence Research 61: 1–64.
Gaunt, A. L.; Brockschmidt, M.; Kushman, N.; and Tarlow, D. 2017. Differentiable Programs with Neural Libraries. In ICML, 1213–1222.
Hölldobler, S.; Störr, H.-P.; and Kalinke, Y. 1999. Approximating the Semantics of Logic Programs by Recurrent Neural Networks. Applied Intelligence 11: 45–58.
Kakas, A. C. 2017. Abduction. In Sammut, C.; and Webb, G. I., eds., Encyclopedia of Machine Learning and Data Mining, 1–8. Boston, MA: Springer US.
Kalyan, A.; Mohta, A.; Polozov, O.; Batra, D.; Jain, P.; and Gulwani, S. 2018. Neural-Guided Deductive Search for Real-Time Program Synthesis from Examples. In ICLR.
LeCun, Y.; Bottou, L.; Bengio, Y.; and Haffner, P. 1998. Gradient-Based Learning Applied to Document Recognition. Proceedings of the IEEE 86(11): 2278–2324.
Manhaeve, R.; Dumancic, S.; Kimmig, A.; Demeester, T.; and De Raedt, L. 2018. DeepProbLog: Neural Probabilistic Logic Programming. In NeurIPS, 3749–3759.
Marra, G.; Giannini, F.; Diligenti, M.; and Gori, M. 2019. Integrating Learning and Reasoning with Deep Logic Models. CoRR abs/1901.04195.
Michael, L. 2019. Machine Coaching. In IJCAI 2019 Workshop on Explainable Artificial Intelligence (XAI), 80–86.
Nuffelen, B. V.; and Kakas, A. 2001. A-system: Declarative Programming with Abduction. In Logic Programming and Nonmonotonic Reasoning, 393–397.
Parisotto, E.; Mohamed, A.; Singh, R.; Li, L.; Zhou, D.; and Kohli, P. 2017. Neuro-Symbolic Program Synthesis. In ICLR.
Rocktäschel, T.; and Riedel, S. 2017. End-to-end Differentiable Proving. In NIPS.
Sadeghian, A.; Armandpour, M.; Ding, P.; and Wang, D. Z. 2019. DRUM: End-To-End Differentiable Rule Mining On Knowledge Graphs. In NeurIPS, 15321–15331.
Serafini, L.; and d'Avila Garcez, A. S. 2016. Logic Tensor Networks: Deep Learning and Logical Reasoning from Data and Knowledge. CoRR abs/1606.04422.
Sourek, G.; Aschenbrenner, V.; Zelezný, F.; and Kuzelka, O. 2015. Lifted Relational Neural Networks. CoRR abs/1508.05128.
Stallkamp, J.; Schlipsing, M.; Salmen, J.; and Igel, C. 2011.
The German Traffic Sign Recognition Benchmark: A multi-
class classification competition. In IEEE International Joint
Conference on Neural Networks, 1453–1460.
Sun, H.; Dhingra, B.; Zaheer, M.; Mazaitis, K.; Salakhutdi-
nov, R.; and Cohen, W. 2018. Open Domain Question An-
swering Using Early Fusion of Knowledge Bases and Text.
In EMNLP, 4231–4242.
Thoma, M. 2017. The HASYv2 dataset. CoRR
abs/1701.08380.
Towell, G. G.; and Shavlik, J. W. 1994. Knowledge-based
artificial neural networks. Artificial Intelligence 70(1): 119
– 165.
van Krieken, E.; Acar, E.; and van Harmelen, F. 2019. Semi-
Supervised Learning using Differentiable Reasoning. IF-
CoLog Journal of Logic and its Applications 6(4): 633–653.
Wang, P.; Donti, P. L.; Wilder, B.; and Kolter, J. Z. 2019.
SATNet: Bridging deep learning and logical reasoning using
a differentiable satisfiability solver. In ICML.
Xu, J.; Zhang, Z.; Friedman, T.; Liang, Y.; and Van den
Broeck, G. 2018. A Semantic Loss Function for Deep
Learning with Symbolic Knowledge. In ICML, 5502–5511.
Yang, F.; Yang, Z.; and Cohen, W. W. 2017. Differentiable
Learning of Logical Rules for Knowledge Base Reasoning.
In NeurIPS, 2319–2328.
Yang, Z.; Ishay, A.; and Lee, J. 2020. NeurASP: Embracing
Neural Networks into Answer Set Programming. In IJCAI,
1755–1762.