4.3 Iterative Minimization

Iterative minimization involves the notions of gradient and Hessian. A vector function $f(x)$ can be expressed around a vector $a$ as

$f(x) = f(a) + (x - a)^T \nabla f(a) + \tfrac{1}{2}(x - a)^T \nabla^2 f(a)(x - a) + \cdots$ (31)

A learning rule can be shown to converge to an optimum if it diminishes the value of the error function at each iteration. When the gradient of the error function can be evaluated, the gradient technique (or steepest descent) adjusts the weight vector by moving it in the direction opposite to the gradient of the error function. Formally, the correction for the (n+1)-th iteration is

$w_{[n+1]} = w_{[n]} + \Delta w = w_{[n]} - \eta \nabla f(w)$ (32)

(where $\nabla f(w)$ is computed for $w_{[n]}$ and $\eta$ is a small positive learning constant).

As an example, let us show that for a linear heteroassociator the Widrow-Hoff learning rule iteratively minimizes the squared error between target and output. The error function is

$e^2 = (t - o)^2 = t^2 + o^2 - 2to = t^2 + x^T w w^T x - 2t w^T x$ (33)

The gradient of the error function is

$\frac{\partial e^2}{\partial w} = 2(w^T x)x - 2tx = -2(t - w^T x)x$ (34)

The weight vector is corrected by moving it in the direction opposite to the gradient. This is obtained by adding a small vector, denoted $\Delta w$, opposite to the gradient. This gives the following correction for iteration n+1:

$w_{[n+1]} = w_{[n]} + \Delta w = w_{[n]} - \eta \frac{\partial e^2}{\partial w} = w_{[n]} + \eta (t - w^T x) x = w_{[n]} + \eta (t - o) x$ (35)

This gives the rule defined by Eqn. (9). The gradient method works because the gradient of $w_{[n]}$ is a first-order Taylor approximation of the gradient of the optimal weight vector $w$. It is a favorite technique in neural networks because the popular error backpropagation is a gradient technique. Newton's method is a second-order Taylor approximation: it uses the inverse of the Hessian of $w$ (supposing it exists). It gives a better numerical approximation but necessitates more computation. Here the correction for iteration n+1 is

$w_{[n+1]} = w_{[n]} + \Delta w = w_{[n]} - \eta H^{-1} \nabla f(w)$ (36)

(where $\nabla f(w)$ is computed for $w_{[n]}$).
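To make Eqns. (32)-(36) concrete, here is a minimal numerical sketch of the two update rules for a linear heteroassociator. The input pattern, target, learning constant, and the use of a pseudo-inverse for the (here singular) Hessian are illustrative choices of mine, not part of the original rules.

```python
import numpy as np

def widrow_hoff_step(w, x, t, eta=0.1):
    """Gradient step of Eqn. (35): w <- w + eta * (t - o) * x, with o = w'x."""
    o = w @ x                        # output of the linear heteroassociator
    return w + eta * (t - o) * x     # move opposite to the gradient of e^2

def newton_step(w, x, t):
    """Newton step of Eqn. (36) with eta = 1: w <- w - H^{-1} grad(e^2).

    For the squared error (33) the Hessian is H = 2 x x', which is singular
    for a single input pattern, so a pseudo-inverse stands in for H^{-1}
    (an illustrative choice, not part of the original rule)."""
    grad = -2 * (t - w @ x) * x      # gradient of e^2, Eqn. (34)
    H = 2 * np.outer(x, x)
    return w - np.linalg.pinv(H) @ grad

x = np.array([1.0, 0.5, -0.3, 0.8, 0.2])   # illustrative input pattern
t = 1.5                                    # illustrative target
w = np.zeros(5)
for _ in range(50):                        # Eqn. (35), repeated
    w = widrow_hoff_step(w, x, t)
print(abs(t - w @ x))                      # error driven near 0, gradually
print(abs(t - newton_step(np.zeros(5), x, t) @ x))  # one Newton step: ~0
```

As the last two lines show, the gradient rule shrinks the error geometrically over many cheap iterations, while a single (more expensive) Newton step solves this quadratic error exactly.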
See also: Artificial Neural Networks: Neurocomputation; Hebb, Donald Olding (1904-85); Neural Networks, Statistical Physics of; Perceptrons; Statistical Pattern Recognition
H. Abdi
Linear and nonlinear programming are the backbone of the theoretical side of the area. On the practical side, the ability of these methods to solve very large problems (i.e., with a large number of variables) has allowed the modeling of highly realistic and detailed real-life situations, so that nowadays these methods are routinely applied to the day-to-day execution of complex tasks in a wide range of activities, like oil refineries, power stations, airlines, and many others.

At that time, the mathematization of economics was at a rather initial stage, and most mathematical results offered only new qualitative insights, rather than effective procedures for finding numerical solutions of economic problems, because of the complicated, in general nonlinear, nature of the modeling tools. LP and SIMPLEX allowed actual computation of solutions for any problem where an economic agent (a firm, for instance) has to choose the most efficient among many possible courses of action, provided that all the relevant variables can be connected through linear relations, such as the input-output matrices proposed in the 1930s by V. Leontief, which enter the LP formulation as the matrix A of Eqn. (5) below. This situation gave rise to a new area of economic research, called activity analysis (Koopmans 1951, Gale 1960), which involved many of the most distinguished economists of the time, such as T. Koopmans, K. Arrow, D. Gale, and others.

The influence of LP on economics was reinforced by the introduction of the duality theory of LP. Shortly after Dantzig obtained his first results on LP, he showed them to one of the leading mathematicians of the time, von Neumann, who was then developing another new branch of applied mathematics, namely game theory (Von Neumann and Morgenstern 1944). Observing immediately that a certain class of games fitted the LP format, von Neumann suggested the extension to LP of a game-theoretic result, giving birth to LP duality, which, given an LP problem (e.g., Eqns. (2)-(3) below), associates with it another problem, called the dual, which shares exactly the same data c, b, and A, but arranged in a different way. Duality plays a very significant role in economic models, where the variables of the dual problem, called shadow prices, have an interesting economic interpretation. The Dantzig-Wolfe decomposition method (Dantzig and Wolfe 1960), a SIMPLEX-based method for solving certain special instances of LP problems which could also be seen as a model of a central planning procedure combined with autonomous decisions by lower-level agents, reinforced this impact of LP on economics, which produced perhaps, as a negative side effect, exaggerated attempts to force the modeling of nonlinear economic phenomena as linear ones.

The impact of LP on mathematics was also significant, and contributed to reinforcing the status of the new computational mathematics vis-a-vis the classical branches (e.g., analysis, geometry, algebra). Its success resulted from its ability to offer SIMPLEX as an efficient computational procedure and the LP duality theory as a theoretical counterpart. It should be mentioned that the mathematical tools required by LP theory are rather elementary, and the theory could easily have been developed more than a century before it appeared. In fact, for the cases in which no inequalities are present (e.g., Eqns. (4)-(5) without Eqn. (6), but with a nonlinear f instead of cx), the theory for the resulting equality-constrained
minimization problem had been started by Lagrange at the beginning of the nineteenth century, and thousands of papers had been written on it, while only a handful of them had been devoted to the subject of linear inequalities, like Eqn. (3) or Eqn. (6), before 1945 (Dantzig 1963). A very likely reason for this oversight lies in the fact that the motivation for the study of minimization problems had been, for centuries, mainly research in physics, which studies systems that move along deterministic trajectories, usually expressed by equalities. Inequalities, on the other hand, are typical of economic situations, where agents are constantly confronting options for possible action rather than following fixed trajectories determined by initial conditions. The low level of formalization of economics up to the 1940s, combined with the impossibility of developing numerical methods at that time, in the absence of computers, kept the field of linear inequalities outside mainstream mathematics for a century and a half.

The LP problem consists of finding an n-vector $x^* = (x_1^*, \ldots, x_n^*)$, belonging to $F$, which minimizes $f(x) = c_1 x_1 + c_2 x_2 + \cdots + c_n x_n$ in $F$, where $F$ is the set of n-vectors $x = (x_1, \ldots, x_n)$ satisfying the m linear inequalities

$a_{i1} x_1 + a_{i2} x_2 + \cdots + a_{in} x_n \le b_i \quad (1 \le i \le m)$ (1)

This means that the m inequalities, Eqn. (1), are satisfied with $x_i^*$ in place of $x_i$ ($1 \le i \le n$), and that $c_1 x_1^* + c_2 x_2^* + \cdots + c_n x_n^* \le c_1 x_1 + c_2 x_2 + \cdots + c_n x_n$ for every n-vector $x = (x_1, \ldots, x_n)$ which also satisfies the m inequalities in Eqn. (1). In short mathematical notation, this is written as

$\min\ cx$ (2)

$\text{s.t. } Ax \le b$ (3)
An LP problem presented as Eqns. (2)-(3) is said to be in canonical form. There are other forms which are equivalent to it, in the sense that a solution of the canonical form allows one to obtain, in an immediate way, a solution of the equivalent form, and conversely. One such form consists of inverting all the inequalities in Eqn. (1) ($\ge$ instead of $\le$). The most important equivalent form is the standard one, where the unknowns are required to be nonnegative ($x_1 \ge 0, \ldots, x_n \ge 0$) and the inequalities in Eqn. (1) are changed into equalities, which is denoted in short notation as

$\min\ cx$ (4)

$\text{s.t. } Ax = b$ (5)

$x \ge 0$ (6)
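The equivalence can be made concrete in a few lines. The sketch below (a minimal illustration, with made-up data and variable names of my own) converts a canonical-form problem into standard form: each unrestricted variable is split into a difference of two nonnegative ones, and each inequality row receives a nonnegative slack variable.

```python
import numpy as np

# Hypothetical canonical-form data: min cx s.t. Ax <= b (x unrestricted).
A = np.array([[1.0, 2.0],
              [3.0, 1.0]])
b = np.array([4.0, 5.0])
c = np.array([1.0, 1.0])

m, n = A.shape
# Write x = u - v with u, v >= 0 and add one slack s_i >= 0 per row, so
# that Ax <= b becomes [A  -A  I][u; v; s] = b with all variables >= 0.
A_std = np.hstack([A, -A, np.eye(m)])
c_std = np.concatenate([c, -c, np.zeros(m)])   # slacks have zero cost
# Standard form: min c_std @ z  s.t.  A_std @ z = b,  z >= 0.
# From any solution z = (u, v, s), x = u - v solves the canonical problem.
print(A_std.shape, c_std.shape)
```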
We present next as an example a particular instance of the so-called diet problem, which is a case of the LP in canonical form with reversed inequalities. The manager of a poultry ranch has three alternative chicken food products, with prices $c_1$, $c_2$, and $c_3$ per pound. Let $a_{11}$, $a_{12}$, and $a_{13}$ be the protein contents of a pound of products 1, 2, and 3, respectively, and $a_{21}$, $a_{22}$, and $a_{23}$ the corresponding mineral contents per pound. If $b_1$ and $b_2$ are the minimal amounts of protein and minerals required for every chicken in the ranch, then the inequalities

$a_{11} x_1 + a_{12} x_2 + a_{13} x_3 \ge b_1$

$a_{21} x_1 + a_{22} x_2 + a_{23} x_3 \ge b_2$

mean that an amount of $x_1$ pounds of product 1, $x_2$ pounds of product 2, and $x_3$ pounds of product 3 satisfies such minimal nutritional requirements, and a solution $x^* = (x_1^*, x_2^*, x_3^*)$ of the diet problem

$\min\ cx$

$\text{s.t. } Ax \ge b$
indicates the amounts of each product which satisfy the nutritional requirements at the lowest possible cost.
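As an illustration of how such a problem is solved in practice, the following sketch feeds a small diet problem to SciPy's linprog routine. The prices, nutrient contents, and requirements are made-up numbers (the article specifies none), and since linprog minimizes subject to $\le$ constraints, the reversed inequalities $Ax \ge b$ are passed in negated.

```python
from scipy.optimize import linprog

# Hypothetical data: prices c, nutrient contents A, requirements b.
c = [0.30, 0.25, 0.40]        # dollars per pound of products 1, 2, 3
A = [[2.0, 1.0, 3.0],         # protein per pound of each product
     [1.0, 2.0, 1.0]]         # minerals per pound of each product
b = [8.0, 6.0]                # minimal protein and mineral amounts

# A x >= b is equivalent to (-A) x <= (-b), the form linprog expects.
res = linprog(c,
              A_ub=[[-a for a in row] for row in A],
              b_ub=[-bi for bi in b],
              bounds=[(0, None)] * 3)   # amounts cannot be negative
print(res.x, res.fun)   # cheapest amounts x* and the minimal cost cx*
```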
Suppose, for instance, that the constraints of an LP problem are just

$x_i \ge l_i$ (7)

$x_i \le u_i \quad (1 \le i \le n)$ (8)

meaning just that each variable $x_i$ must be between a lower bound $l_i$ and an upper one $u_i$; the resulting polyhedron $F$ has $2^n$ vertices, resulting from all the possible choices of $x_i = l_i$ or $x_i = u_i$ for each of the n variables. Now, $2^n$ grows very fast: for n = 100, checking all the vertices exceeds the capabilities of any existing computer. The SIMPLEX method, instead of checking all the vertices, proceeds as follows. It constructs one vertex, say $\bar{x}$, and checks whether or not it is a solution (which is a rather easy task). If it is not, it does not proceed to just any other vertex, but to one which is adjacent to $\bar{x}$, which mathematically means that it shares with $\bar{x}$ all but one of its $n - m$ zero components. There are at most n adjacent vertices (corresponding to all possible choices of the nonshared column), and at least one of them, say $\hat{x}$, has an objective value $f(\hat{x}) \le f(\bar{x})$. Then $\hat{x}$ is taken as the new candidate and, if it is not a solution, the process is repeated with the vertices adjacent to $\hat{x}$, and so on, until a solution is
found, which is guaranteed to happen. How long does this procedure take? More than 50 years of practice show that, almost always, the number of visited vertices is not too high: generally less than $m \log n$, which is negligible as compared to, for example, $2^n$ (rigorous versions of this statement on the average performance of SIMPLEX can be found in Borgwardt 1987 and Smale 1983). However, Klee and Minty (1972) found a highly artificial example, with a carefully chosen objective function f and a feasible set F obtained as a slight variation of the one given by Eqns. (7)-(8), for which SIMPLEX goes through all the $2^n$ vertices before finding the solution, which is just the last one.

Despite Klee and Minty's example, SIMPLEX was absolutely superior to any other known procedure for solving LP problems in terms of computational performance. Thus, the following theoretical issue was posed: does there exist a method for LP which is never too slow, for instance one such that the number of arithmetic operations required to solve any particular LP problem does not exceed a fixed power of the size of the problem (understood as some measure of the size of c, b, and A)? SIMPLEX is not up to this requirement, because it needs more than $2^n$ operations to solve Klee and Minty's example, and $2^n$ exceeds any fixed power of n, which can be taken as a measure of the size of this specific LP problem. Methods with this property were called polynomial. The fact that no polynomial method for LP had been found in the first 30 years after the introduction of SIMPLEX made most mathematicians think that no such method could ever be devised, and therefore it was a surprise when Khachiyan presented such a procedure in 1979 (Khachiyan 1979). Unfortunately, Khachiyan's method, called the ellipsoid method, turned out to be far worse than SIMPLEX in almost all problems (though, of course, much better in the very rare cases in which SIMPLEX takes almost forever, as in Klee and Minty's example). Thus, the ellipsoid method aroused only theoretical interest, and did not replace SIMPLEX in any real-life application. Nevertheless, the existence of a polynomial method (albeit an inefficient one on average) encouraged research attempting to find a procedure which would be both polynomial and efficient in practice (i.e., good both in the average case and in the worst one). Such a method was discovered by Karmarkar in 1984 (Karmarkar 1984), and extensive progress has been achieved in this class of methods for LP (called interior point methods) since the mid-1980s (Roos et al. 1997). They differ from SIMPLEX in that, instead of considering only vertices, which lie on the boundary of the feasible polyhedron F, all the intermediate vectors tested as candidate solutions are taken in the interior of this polyhedron. Thus, these methods fully neglect the combinatorial nature of LP problems (i.e., the facts that for an LP problem with solutions there exists a vertex which is a solution, and that the number of vertices is finite).
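The combinatorial obstacle is easy to exhibit directly. The brute-force sketch below (the data and function name are my own illustration) checks every vertex of the box of Eqns. (7)-(8); it works for toy sizes, but the same loop would have to visit $2^{100}$ vertices for n = 100, which is exactly the enumeration that SIMPLEX's walk along adjacent vertices avoids.

```python
from itertools import product

def best_vertex(c, lower, upper):
    # Minimize c.x over the box of Eqns. (7)-(8) by checking all 2**n
    # vertices: exponential work, feasible only for very small n.
    best, best_val = None, float("inf")
    for v in product(*zip(lower, upper)):   # every choice x_i in {l_i, u_i}
        val = sum(ci * vi for ci, vi in zip(c, v))
        if val < best_val:
            best, best_val = v, val
    return best, best_val

# n = 3: only 8 vertices. The minimizer pushes each x_i to the bound
# favored by the sign of c_i, here (0, 1, 0) with value -2.0.
print(best_vertex([1.0, -2.0, 3.0], [0, 0, 0], [1, 1, 1]))
```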
Interior point methods are quite efficient, but none of the many which have been proposed has turned out to be universally superior to all the others (nor to SIMPLEX, for that matter). Currently, the situation in LP regarding computational methods is similar to that in most other areas of applied mathematics: instead of a method universally superior to all others, as was the case for LP between 1947 and 1984, a variety of methods is available, some of which are clearly superior, in terms of computational performance, for some specific classes of LP problems, but inferior to at least one other method for other types of LP problems.

In nonlinear programming (NLP) the linear data of LP are replaced by general functions: the problem, stated as Eqns. (9)-(10), consists of minimizing a possibly nonlinear function $f(x)$ subject to the m constraints $g_i(x) \ge 0$ ($1 \le i \le m$), which may be nonlinear as well. Nonlinearity has a price: even in quadratic programming, where f is quadratic, with a matrix D of second-order coefficients, and the constraints remain linear, no known method is able to provide exact solutions for an arbitrary quadratic programming problem.

Another very important theoretical issue refers to necessary and/or sufficient conditions for an n-vector x* to be a solution of Eqns. (9)-(10). For the case of equality constraints (that is, when all the inequalities in Eqn. (10) are replaced by equalities) such a condition was known in the early nineteenth century: if we denote by $\nabla f(x)$ the n-vector whose j-th component is the derivative of f with respect to its j-th variable at the n-vector x, and we use the same definition for $\nabla g_i(x)$ ($1 \le i \le m$), then the so-called Lagrangian condition states that

$\nabla f(x) = u_1 \nabla g_1(x) + \cdots + u_m \nabla g_m(x)$ (12)

where the real numbers $u_1, \ldots, u_m$ are called Lagrange multipliers. In fact, if x solves Eqns. (9)-(10), then Lagrange multipliers do exist. Conversely, the existence of Lagrange multipliers for a given n-vector x*, together with an additional condition involving the second-order derivatives of f and the $g_i$'s at x*, guarantees that x* is a local solution of the problem, meaning that $f(x^*) \le f(x)$ for all x satisfying Eqn. (10) and close enough to x* (i.e., such that all the differences $|x_i^* - x_i|$ are small enough).

The extension of the Lagrangian condition to the case of inequality constraints was the first theoretical achievement of NLP. The resulting conditions, called the Karush-Kuhn-Tucker conditions, were published by Kuhn and Tucker (1951), and had been anticipated in Karush's MSc dissertation. They state that the Lagrange multipliers, besides satisfying the Lagrangian condition, Eqn. (12), must be nonnegative, and also that $u_i = 0$ if $g_i(x^*) \ne 0$. Also, an additional condition, called a constraint qualification, must be imposed on the feasible region, that is, on the set of n-vectors which satisfy Eqn. (10). The 150 years between the establishment of the conditions for the equality- and the inequality-constrained cases are again just a consequence of classical mathematicians' lack of interest in inequalities, as discussed above.
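The Karush-Kuhn-Tucker conditions can be checked numerically on a toy instance (my own illustration, not an example from the article): at a solution where a constraint is active, the gradient of f is a nonnegative combination of the gradients of the active constraints.

```python
import numpy as np

# Toy NLP in the form (9)-(10):
#   min x1**2 + x2**2   s.t.   g(x) = x1 + x2 - 2 >= 0.
# The solution is x* = (1, 1), where the single constraint is active.
def grad_f(x): return np.array([2.0 * x[0], 2.0 * x[1]])
def grad_g(x): return np.array([1.0, 1.0])
def g(x):      return x[0] + x[1] - 2.0

x_star = np.array([1.0, 1.0])

# Lagrangian condition (12): grad f(x*) = u * grad g(x*); least squares
# recovers the multiplier u.
u, *_ = np.linalg.lstsq(grad_g(x_star).reshape(-1, 1),
                        grad_f(x_star), rcond=None)
print(u)          # [2.] -- nonnegative, as the KKT conditions require
print(g(x_star))  # 0.0 -- the constraint is active, so u need not vanish
```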
A third important theoretical issue was the extension to NLP of the duality theory of LP. This was achieved only in the convex case (Rockafellar 1970), that is, when both f and all the $g_i$'s are convex. We recall that a function h is convex when

$h(\lambda x + (1 - \lambda) y) \le \lambda h(x) + (1 - \lambda) h(y)$

for all n-vectors x and y and all real numbers $\lambda$ between 0 and 1.

Though NLP is used extensively in many real-life applications, it is much less widespread than LP. This is due to the facts that NLP lacks a universal method, like SIMPLEX for LP, so that the procedure to be used must be chosen according to the nature of the specific problem, and that NLP methods in general do not work as black boxes, fed with the problem data and handing over the solution, but rather must be
fine-tuned, choosing appropriate parameters, properly interpreting the output of the method, etc. Thus, an expert in Operations Research, unavailable in most moderate-sized institutions, is required for the successful implementation and solution of NLP problems. On the other hand, the impact of NLP on mathematics itself has been much deeper than that of LP. Important results in several classical areas, like convex analysis and variational analysis, have been drastically improved as a consequence of the development of NLP theory (see Rockafellar 1970, Rockafellar and Wets 1998).

Methods for NLP typically generate n-vectors which approximate a solution of the problem and approximate Lagrange multipliers, as defined by Eqn. (12). Efficient implementations of any of these methods usually require knowledge of the first and second derivatives of f and the $g_i$'s. Also, except in the convex case, the generated n-vectors in general approximate just local solutions, as discussed in Section 5. Recent research in NLP has attempted to overcome these two limitations. Nonsmooth optimization is devoted to devising methods which work for problems whose data functions do not have derivatives (Clarke 1983), while global optimization addresses the issue of finding global solutions, that is, n-vectors x* satisfying Eqn. (10) and such that $f(x^*) \le f(x)$ for all other n-vectors x which also satisfy Eqn. (10), and not only for n-vectors close to x* (Horst and Tuy 1993).
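The local/global distinction can be made concrete with a small sketch. The objective function and the multistart strategy below are my own illustrative choices; multistart is a simple heuristic, not one of the deterministic strategies surveyed by Horst and Tuy.

```python
import numpy as np
from scipy.optimize import minimize

# Toy multimodal objective: local minimum near x = 2.1, global near x = -2.35.
f = lambda x: 0.1 * x[0]**4 - x[0]**2 + 0.5 * x[0]

local = minimize(f, x0=[2.0])    # a local method stalls at the nearby minimum
best = min((minimize(f, x0=[s]) for s in np.linspace(-3.0, 3.0, 7)),
           key=lambda r: r.fun)  # crude multistart heuristic
print(local.x, local.fun)        # approx.  2.1, -1.4  (a local solution only)
print(best.x, best.fun)          # approx. -2.35, -3.65 (the global solution)
```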
See also: Linear Algebra for Neural Networks; Linear Hypothesis

Bibliography
Avriel M 1976 Nonlinear Programming: Analysis and Methods. Prentice Hall, Upper Saddle River, NJ
Bertsekas D P 1982 Constrained Optimization and Lagrange Multiplier Methods. Academic Press, New York
Boot J C G 1964 Quadratic Programming. North Holland, Amsterdam
Borgwardt K H 1987 The Simplex Method: A Probabilistic Analysis. Springer, Berlin
Clarke F H 1983 Optimization and Nonsmooth Analysis. Wiley, New York
Dantzig G B 1949 Programming of interdependent activities: II. Mathematical model. Econometrica 17: 200-211
Dantzig G B 1963 Linear Programming and Extensions. Princeton University Press, Princeton, NJ
Dantzig G B, Wolfe P 1960 Decomposition principle for linear programs. Operations Research 8: 101-111
Dennis J E, Moré J J 1977 Quasi-Newton methods: motivation and theory. SIAM Review 19: 46-89
Dennis J E, Schnabel R B 1983 Numerical Methods for Unconstrained Optimization and Nonlinear Equations. Prentice Hall, Upper Saddle River, NJ
Gale D 1960 The Theory of Linear Economic Models. McGraw-Hill, New York
Gomes F M, Maciel M C, Martínez J M 1999 Nonlinear programming algorithms using trust regions and augmented Lagrangians with nonmonotone penalty parameters. Mathematical Programming 84: 161-200
Hillier F S, Lieberman G J 1980 Introduction to Operations Research. Holden-Day, San Francisco
Horst R, Tuy H 1993 Global Optimization: Deterministic Approaches. Springer, Berlin
Kantorovich L V 1960 Mathematical methods in the organization and planning of production. Management Science 6: 366-422
Karmarkar N 1984 A new polynomial-time algorithm for linear programming. Combinatorica 4: 373-395
Khachiyan L G 1979 A polynomial algorithm in linear programming. Doklady Akademii Nauk SSSR 244: 1093-1096
Klee V, Minty G J 1972 How good is the simplex algorithm? In: Shisha O (ed.) Inequalities III. Academic Press, New York, pp. 159-175
A. N. Iusem
Linear Hypothesis
1. Introduction
The term linear hypothesis is often used interchangeably with the term linear model. Statistical methods using linear models are widely used in the behavioral and social sciences, e.g., regression analysis, analysis of variance, analysis of covariance, multivariate analysis, time series analysis, and spatial data analysis. Linear models provide a flexible tool for data analysis and useful approximations for more complex models. A common object of linear modeling is to find the most precise linear model that explains the data, to use that model to predict future observations, and to interpret that model in the context of the data collection. Traditionally, analysis of variance models have been used to analyze data from designed experiments, while regression analysis has been used to analyze data from observational studies, but the techniques of both analysis methods apply to both kinds of data. See also Experimental Design: Overview and Observational Studies: Overview.
2. Definition
The linear hypothesis is that the mean (average) of a random observation can be written as a linear combination of some observed predictor variables. For example, Coleman et al. (1966) provides observations on various schools. The dependent variable y consists of the average verbal test score for sixth-grade students. The report also presents predictor variables. A composite measure of socioeconomic status $x_1$ is based on fathers' and mothers' education, family size and intactness, home items, and percent of fathers who are white collar. Staff salaries per pupil is $x_2$. The average score on a verbal test given to the school's teachers is $x_3$. Denoting different schools using the subscript i and the mean of $y_i$ by $m_i$, a linear hypothesis states that for some unknown numbers (parameters) $\beta_0$, $\beta_1$, $\beta_2$, and $\beta_3$,

$m_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \beta_3 x_{i3}$

Mosteller and Tukey (1977, pp. 326, 566) and Mosteller et al. (1983, pp. 408-20) give excerpts and analysis of the data.

In other applications, the predictors only identify whether an observation is in some group. For example, in 1978 observations $y_{ij}$ were collected on the age at which people in Albuquerque committed suicide; see Koopmans (1987, p. 409). Here i is used to identify the person's group membership (Hispanic, Native American, non-Hispanic Caucasian), and j identifies individuals within a group. The three categories are taken to be mutually exclusive for the present discussion (although the US government now allows individuals to identify themselves with multiple races in various surveys and the decennial census). We can define group-identifier predictor variables. Let $\delta_{i1}$ take the value 1 if an individual belongs to group 1 (Hispanic) and 0 otherwise, with similar predictors to identify the other groups, say $\delta_{i2}$ and $\delta_{i3}$ for Native Americans and non-Hispanic Caucasians. Note that the predictor variables do not depend on the value of j identifying individuals within a group. Denoting the mean of $y_{ij}$ by $m_{ij}$, a linear hypothesis states that for some unknown parameters $\mu$, $\alpha_1$, $\alpha_2$, and $\alpha_3$,

$m_{ij} = \mu + \alpha_1 \delta_{i1} + \alpha_2 \delta_{i2} + \alpha_3 \delta_{i3}$

Since two of the $\delta$'s are always zero, this model is often written more succinctly as

$m_{ij} = \mu + \alpha_i$

A linear hypothesis is usually combined with other assumptions about the observations y. Most commonly, the assumptions are that the observations are independent, have the same (unknown) variance $\sigma^2$, and have normal (Gaussian) distributions. For the two examples, these assumptions are written

$y_i$ indep. $N(m_i, \sigma^2)$ and $y_{ij}$ indep. $N(m_{ij}, \sigma^2)$

where, for example, $N(m_i, \sigma^2)$ indicates a normal distribution with mean $m_i$ and variance $\sigma^2$.
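Both hypotheses are least-squares problems once a design matrix is set up, as the sketch below shows. The numbers are fabricated stand-ins (the school and Albuquerque data are not reproduced here), and the variable names are mine.

```python
import numpy as np

rng = np.random.default_rng(1)

# Regression form: m_i = beta0 + beta1*x_i1 + beta2*x_i2 + beta3*x_i3.
X = np.column_stack([np.ones(20), rng.normal(size=(20, 3))])  # 1, x1, x2, x3
y = X @ np.array([30.0, 5.0, 1.0, 2.0]) + rng.normal(scale=0.5, size=20)
beta, *_ = np.linalg.lstsq(X, y, rcond=None)   # estimates of the betas
print(beta)

# Group form: m_ij = mu + alpha_i, written with 0/1 dummy predictors delta.
groups = np.array([0, 0, 1, 1, 1, 2, 2])       # group of each observation
ages = np.array([52.0, 57.0, 45.0, 41.0, 48.0, 60.0, 63.0])
D = np.column_stack([np.ones(7),
                     (groups[:, None] == np.arange(3)).astype(float)])
# D carries the intercept (mu) plus one delta column per group; the delta
# columns sum to the intercept column, so D is rank deficient and lstsq
# returns one of many equivalent parameterizations of the same fit.
coef, *_ = np.linalg.lstsq(D, ages, rcond=None)
print(D @ coef)   # fitted means m_ij: the group averages, repeated
```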
Incorporating these additional assumptions, the linear