Find LU Factorization of A and Complete Solution to Ax = B
Linear Equations and Eigensystems
George Lindfield , John Penny , in Numerical Methods (Fourth Edition), 2019
2.7 LU Decomposition
LU decomposition (or factorization) is a similar process to Gaussian elimination and is equivalent in terms of elementary row operations. The matrix A can be decomposed so that
(2.14)
A = LU
where L is a lower triangular matrix with a leading diagonal of ones and U is an upper triangular matrix. Matrix A may be real or complex. Compared with Gaussian elimination, LU decomposition has a particular advantage when the equation system we wish to solve, Ax = b, has more than one right side or when the right sides are not known in advance. This is because the factors L and U are obtained explicitly and they can be used for any right sides as they arise, without recalculating L and U. Gaussian elimination does not determine L explicitly but rather forms L⁻¹b, so that all right sides must be known when the equation is solved.
The major steps required to solve an equation system by LU decomposition are as follows. Since A = LU, the system Ax = b becomes
LUx = b
where b is not restricted to a single column. Letting Ux = y leads to
Ly = b
Because L is a lower triangular matrix this equation is solved efficiently for y by forward substitution. To find x we then solve
Ux = y
Because U is an upper triangular matrix, this equation can also be solved efficiently by back substitution.
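The two triangular solves can be carried out directly in Matlab. The following short sketch is illustrative only: it uses the matrix that appears below in (2.15), but the right-hand sides in B are invented values, not taken from the text.
% Solve A*X = B for two right-hand sides using a single LU factorization.
A = [3 6 9; 2 5 2; -3 -4 -11];   % the matrix of (2.15)
B = [12 3; 9 1; -18 -5];         % two illustrative right-hand sides
[L, U, P] = lu(A);               % P*A = L*U, with L truly lower triangular
Y = L \ (P*B);                   % forward substitution (L is lower triangular)
X = U \ Y;                       % back substitution (U is upper triangular)
norm(A*X - B)                    % should be of the order of machine precision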
We now illustrate the LU decomposition process by decomposing the coefficient matrix of (2.10). We are not concerned with b and we do not form an augmented matrix. We proceed exactly as with Gaussian elimination, see Table 2.1, except that we keep a record of the elementary row operations performed at each stage and place the results of these operations in a new matrix rather than over-writing A.
We begin with the matrix
(2.15)
A = [  3   6    9
       2   5    2
      -3  -4  -11 ]
Following the same operations as used in Table 2.1, we will create a matrix with zeros below the leading diagonal in the first column using the following elementary row operations:
(2.16)
row 2 of A₁ = row 2 of A − (2/3) × row 1 of A
and
(2.17)
row 3 of A₁ = row 3 of A + row 1 of A
Now A can be expressed as the product A = L₁A₁ as follows:
[  3   6    9 ]   [  1    0   0 ] [ 3   6   9 ]
[  2   5    2 ] = [ 2/3   1   0 ] [ 0   1  -4 ]
[ -3  -4  -11 ]   [ -1    0   1 ] [ 0   2  -2 ]
Note that row 1 of A and row 1 of A₁ are identical. Thus row 1 of L₁ has a unit entry in column 1 and zero elsewhere. The remaining rows of L₁ are determined from (2.16) and (2.17). For example, row 2 of L₁ is derived by rearranging (2.16); thus:
(2.18)
row 2 of A = row 2 of A₁ + (2/3) × row 1 of A
or
(2.19)
row 2 of A = (2/3) × row 1 of A₁ + row 2 of A₁
since row 1 of A₁ is identical to row 1 of A. Hence row 2 of L₁ is [2/3 1 0].
We now move to the next stage of the decomposition process. In order to bring the largest element of column 2 in A₁ onto the leading diagonal we must interchange rows 2 and 3. Thus A₁ becomes the product L₂A₂ as follows:
[ 3   6   9 ]   [ 1   0   0 ] [ 3   6   9 ]
[ 0   1  -4 ] = [ 0   0   1 ] [ 0   2  -2 ]
[ 0   2  -2 ]   [ 0   1   0 ] [ 0   1  -4 ]
Finally, to complete the process of obtaining an upper triangular matrix we make
row 3 of U = row 3 of A₂ − (1/2) × row 2 of A₂
Hence, A₂ becomes the product L₃U as follows:
[ 3   6   9 ]   [ 1    0   0 ] [ 3   6   9 ]
[ 0   2  -2 ] = [ 0    1   0 ] [ 0   2  -2 ]
[ 0   1  -4 ]   [ 0   1/2  1 ] [ 0   0  -3 ]
Thus A = L₁L₂L₃U, implying that L = L₁L₂L₃ as follows:
L = [   1     0    0
      2/3   1/2    1
       -1     1    0 ]
Note that owing to the row interchanges L is not strictly a lower triangular matrix but it can be made so by interchanging rows.
Matlab implements LU factorization by using the function lu and may produce a matrix that is not strictly a lower triangular matrix. However, a permutation matrix P may be produced, if required, such that PA = LU with L lower triangular.
We now show how the Matlab function lu solves the example based on the matrix given in (2.15):
>> A = [3 6 9;2 5 2;-3 -4 -11]
A =
3 6 9
2 5 2
-3 -4 -11
To obtain the L and U matrices, we must use the Matlab facility of assigning two parameters simultaneously as follows:
>> [L1 U] = lu(A)
L1 =
1.0000 0 0
0.6667 0.5000 1.0000
-1.0000 1.0000 0
U =
3.0000 6.0000 9.0000
0 2.0000 -2.0000
0 0 -3.0000
Note that the L1 matrix is not in lower triangular form, although its true form can easily be deduced by interchanging rows 2 and 3 to make it triangular. To obtain a true lower triangular matrix we must assign three parameters as follows:
>> [L U P] = lu(A)
L =
1.0000 0 0
-1.0000 1.0000 0
0.6667 0.5000 1.0000
U =
3.0000 6.0000 9.0000
0 2.0000 -2.0000
0 0 -3.0000
P =
1 0 0
0 0 1
0 1 0
In the preceding output, P is the permutation matrix such that L*U = P*A or P'*L*U = A. Thus P'*L is equal to L1.
The Matlab operator \ determines the solution of Ax = b using LU factorization. As an example of an equation system with multiple right sides we solve AX = B, where B contains two right-hand-side columns. Performing LU decomposition such that LU = A gives the factors L and U. Thus, letting UX = Y, the matrix Y is found from LY = B. We note that implicitly we have two systems of equations which, when separated, can be written Ly₁ = b₁ and Ly₂ = b₂, where y₁, y₂ and b₁, b₂ are the columns of Y and B. In this example, L is not strictly a lower triangular matrix owing to the reordering of the rows. However, the solution of these equations is still found by forward substitution: taking the equations in a suitable order, each involves only one new unknown, so the elements of Y are found in sequence. Finally, solving UX = Y by back substitution gives the solution X.
The Matlab function det determines the determinant of a matrix using LU factorization as follows. Since A = LU, det(A) = det(L)det(U). The elements of the leading diagonal of L are all ones so that det(L) = 1. Since U is upper triangular, its determinant is the product of the elements of its leading diagonal. Thus, taking account of row interchanges, the appropriately signed product of the diagonal elements of U gives the determinant.
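As a small Matlab sketch of this determinant calculation (illustrative; the built-in function det performs the equivalent steps internally):
[L, U, P] = lu(A);               % P*A = L*U
d = det(P)*prod(diag(U))         % det(P) = +1 or -1 accounts for the row interchanges
% For the matrix A of (2.15), both d and det(A) evaluate to 18.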
URL:
https://www.sciencedirect.com/science/article/pii/B9780128122563000117
Computer Solution of Large Linear Systems
In Studies in Mathematics and Its Applications, 1999
Proof
Suppose that there exists an LU factorization. From the proof of proposition 2.6, we have
Therefore,
and we have also
In block form, this is written as:
and from proposition 2.6,
where all the matrices in block position (1, 1) are square of order k. By equating blocks, we have
Therefore, L_{1,1}U_{1,1} is the LU factorization of the leading principal submatrix of order k of A, and L_{2,2}U_{2,2} is the factorization of the matrix W_{2,2} in the bottom right-hand corner of A_{k+1}, as we have W_{2,2} = L_{2,2}U_{2,2}.
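A minimal sketch of the block identity being equated, written in a standard 2-by-2 block partition (the notation is assumed here and need not match that of proposition 2.6):
\begin{pmatrix} A_{1,1} & A_{1,2} \\ A_{2,1} & A_{2,2} \end{pmatrix}
=
\begin{pmatrix} L_{1,1} & 0 \\ L_{2,1} & L_{2,2} \end{pmatrix}
\begin{pmatrix} U_{1,1} & U_{1,2} \\ 0 & U_{2,2} \end{pmatrix}
\quad\Longrightarrow\quad
A_{1,1} = L_{1,1} U_{1,1},
\qquad
\det(A_{1,1}) = \det(U_{1,1}) = \prod_{i=1}^{k} (U_{1,1})_{ii}.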
Note that det(A) = det(A_{k+1}). We have det(L_{1,1}) = 1 and det(A_{1,1}) = det(U_{1,1}). Since the matrix U_{1,1} is upper triangular, its determinant is equal to the product of its diagonal elements. Therefore, for all k, the leading principal minor of order k of A equals the product of the first k diagonal elements of U.
This shows that the principal minors are non-zero. The converse of the proof is easily derived by induction.
URL:
https://www.sciencedirect.com/science/article/pii/S0168202499800034
Factorization of the Discretized Problem
Jacques Henry , Angel M. Ramos , in Factorization of Boundary Value Problems Using the Invariant Embedding Method, 2016
5.5 Row permutation
We now wish to consider the effect of block row permutation and show the connection with the continuous calculation in section 3.5. We adopt the conditions of the calculation in section 5.2. Consider the matrix A_h of the discretized problem. It is a well-known fact that the Gaussian method with row permutation is theoretically equivalent to a method that first performs all permutations and then performs LU factorization on the permuted matrix. In general, this is only relevant in a theoretical setting because at the k-th step the decision to perform a permutation is made based on the state of the factorization calculation at that step. However, in our case, we have block tridiagonal matrices and there is therefore only one single possible block row permutation at each step: the permutation that moves the sub-diagonal block into the diagonal position. Suppose we decide to perform this permutation systematically. In this case, the first row block ends up in last position, whereas all other blocks "move up" by one step:
[5.29]
The problem that we must solve then becomes:
[5.30]
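As a generic illustration of this permutation (assumed 3-by-3 block notation, not the book's equation [5.29]): for a block tridiagonal matrix, the systematic block row permutation puts every sub-diagonal block on the diagonal while the first block row drops to the last position,
P \begin{pmatrix} D_1 & C_1 & 0 \\ B_2 & D_2 & C_2 \\ 0 & B_3 & D_3 \end{pmatrix}
=
\begin{pmatrix} B_2 & D_2 & C_2 \\ 0 & B_3 & D_3 \\ D_1 & C_1 & 0 \end{pmatrix}.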
We apply the method of invariant embedding as in section 5.2. At step i_0 ∈ {2, …, N}, we have:
The solution has an affine dependency on the data. As the system is upper triangular with −I on the diagonal, this solution can be obtained by simple successive substitution. However, as we want the solution u_h to satisfy the required condition, the data cannot be independent but must satisfy an affine relation. We write this relation in the form:
[5.31]
We check that P_i and w_{i+1/2} are, respectively, solutions of [5.11] and [5.9]. For i_0 = 0, regardless of the value of the data γ_1, we want the condition to be satisfied. We therefore deduce that:
[5.32]
Consider the solution of [5.30] corresponding to these data. We have:
[5.33]
because it is a solution of the same problem of dimension (i_0 − 2)(p − 1) for the corresponding data. By subtracting [5.33] from [5.31]:
[5.34]
Applying [5.31] reduces this to:
[5.35]
By substituting this into [5.34] and using the fact that the data is arbitrary, we have that:
[5.36]
and for the independent term, we have:
[5.37]
These equations are satisfied for i_0 ranging from 2 to N − 1. From the initial conditions [5.32], we also recover [5.11] and [5.9]. However, the "upward" phase for calculating u_i is now accomplished by solving [5.30] by substitution, which we can write as:
[5.38]
adding the following equation to find u_{N−1}:
[5.39]
This calculation shows that the block LU factorization with systematic block row permutation may be interpreted as a discretization of the method of invariant embedding in the form of a family of Cauchy problems that was presented in section 3.5.
Remark 5.4
It is a well-known fact that the stability of the Gaussian method is improved by performing the row permutation that brings the sub-diagonal element with the highest absolute value into the pivotal position. When discretizing the second derivatives, as is the case here, the blocks with the highest spectral radius naturally end up in the diagonal position: indeed, in our case, if λ_{h,i} are the eigenvalues of −∇²_{h,2}, the diagonal term of A_h presented in [5.1] has the eigenvalues 2(1 + hλ_{h,i}), whereas the sub-diagonal blocks have the eigenvalue −1.
There is therefore no benefit in performing block row permutations from the point of view of stability. The above calculation even shows that, in our case, the problem with row permutations leads to an ill-posed problem in the limit h → 0: the Cauchy problem for the Laplacian. This shows the connection between the destabilization of the Gaussian factorization with pivoting and the instability of the Cauchy problem. However, as commented in section 3.5, in the formulation corresponding to the Cauchy problem, storing the matrices P_i is not necessary to calculate u_i.
URL:
https://www.sciencedirect.com/science/article/pii/B9781785481437500050
Computational and Numerical Challenges in Environmental Modeling
Giorgio Semenza , in Studies in Computational Mathematics, 2006
6.5.3 Using preconditioning
As mentioned in the previous subsection, dropping small non-zero elements can be useful in the attempt to better preserve the sparsity of the original matrix and, thus, to improve the performance of the selected sparse algorithm. While the performance is often improved when dropping of small non-zero elements is applied during the Gaussian elimination, the accuracy of the calculated triangular matrices L and U will often become very poor. This is why preconditioning is necessary when this approach is used.
Ten preconditioned methods
Ten well-known numerical methods for solving systems of linear algebraic equations with general sparse matrices, based either on a direct method (Gaussian elimination) or on preconditioned iterative methods, are discussed below. The methods are listed in Table 6.1, together with their abbreviations and references where more details about the ten methods can be found.
Table 6.1. Solvers for systems of linear algebraic equations with general sparse coefficient matrices, which are used in this section.
No. | Method | Abbreviation | Reference |
---|---|---|---|
1 | Direct solution | DIR | [272], [281] |
2 | Iterative Refinement | IR | [274], [281] |
3 | Pure Modified Orthomin | PURE | [262], [281] |
4 | Preconditioned Modified Orthomin | ORTH | [262], [281] |
5 | Conjugate Gradients Squared | CGS | [230] |
6 | Bi-Conjugate Gradients STAB | BiCGSTAB | [255] |
7 | Transpose-Free Quasi Min. Residual | TFQMR | [107] |
8 | Generalized Minimum Residual | GMRES | [207] |
9 | Eirola-Nevanlinna Method | EN | [86], [263] |
10 | Block Eirola-Nevanlinna Method | BEN | [270] |
The method for calculating directly the solution (DIR) is based on the use of a sparse matrix technique for calculation of an LU factorization of the coefficient matrix of the system of linear algebraic equations without dropping small non-zero elements, as in (Zlatev [281]).
The same method is also used to calculate approximate LU factorizations, which are used as preconditioners in the iterative methods. Let us reiterate here that a special parameter RELTOL (relative drop-tolerance), 0 ≤ RELTOL < 1, is used to calculate an approximate LU factorization. The use of this parameter to drop small non-zero elements during the Gaussian elimination was described in one of the previous subsections.
It should be added here that the relative drop-tolerance can also be used to drop all small non-zero elements in the original matrix before the beginning of the Gaussian elimination. The situation is very similar to that described in the previous subsection. If the absolute value of an element a_ij of the original coefficient matrix is smaller than the product of RELTOL and the largest element in absolute value in row i, then a_ij is dropped (replaced by zero). This means that a_ij is dropped when the following inequality
(6.7)
|a_ij| < RELTOL · max_k |a_ik|
is satisfied. Note that (6.7) is very similar to (6.4), but while (6.7) is applied to the elements of the original matrix, (6.4) is applied to the elements calculated at some stage s + 1 of the Gaussian elimination.
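A minimal Matlab-style sketch of this pre-factorization dropping rule is given below; it is illustrative only (the solvers of Table 6.1 work on their own sparse storage schemes), and it assumes A is a sparse matrix already held in memory.
RELTOL = 1.0e-2;                        % illustrative value, 0 <= RELTOL < 1
rowmax = full(max(abs(A), [], 2));      % largest absolute value in each row
[i, j, v] = find(A);                    % nonzero entries of A
keep = abs(v) >= RELTOL*rowmax(i);      % entries violating (6.7) are dropped
Adrop = sparse(i(keep), j(keep), v(keep), size(A,1), size(A,2));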
If RELTOL is set equal to zero, then the direct solution can be calculated (see also the previous paragraph). However, if 0 < RELTOL < 1 then the LU factorization obtained by using the selected positive value of the relative drop-tolerance RELTOL must be used as a preconditioner in one of the iterative methods.
If RELTOL becomes larger, then in general the obtained LU factorization is becoming less accurate, but more sparse. This means that the number of iterations needed to obtain the required accuracy of the solution of the system of linear algebraic equations will tend to become greater for larger values of RELTOL, but the computational work per iteration will, in general, become smaller. If RELTOL is very large, then the rate of convergence can be very slow or the preconditioned iterative method may even not converge (the calculations must be repeated by using a smaller value of the relative drop-tolerance RELTOL when this happens; it might be necessary sometimes to reduce the value of RELTOL several times). This shows that it is very important to select robust and reliable stopping criteria when preconditioned iterative methods are used. The proper implementation of such stopping criteria will allow us to decide
• whether the required accuracy is achieved or not (and to stop the iterative process when the required accuracy is achieved), and
• whether the iterative process is slowly convergent or does not converge at all and, when there are problems with the convergence,
  − to reduce the relative drop-tolerance, and
  − to recalculate the LU factorization with the reduced drop-tolerance.
It should be emphasized here that robust and reliable stopping criteria are also needed when pure iterative methods are applied.
Preconditioning is always used when the system of linear algebraic equations is solved by the last seven methods in Table 6.1. The Modified Orthomin Method (Zlatev [281]) is used both as a pure iterative method (PURE) and as a preconditioned iterative method (ORTH). If an approximate LU factorization is used in IR, then this method can be considered as a preconditioned iterative method.
It must be emphasized here that if the preconditioned iterative method either does not converge or converges too slowly, then the preconditioner can automatically be improved, as described above, by reducing the relative drop-tolerance RELTOL (see more details in Zlatev [281]).
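A Matlab-style sketch of this automatic improvement is shown below. It is illustrative only: Matlab's ilu uses its own drop rule (droptol), which is analogous to but not identical with RELTOL, the matrix A must be sparse, and the tolerance values and GMRES parameters are assumed.
reltol = 1.0e-2;                               % initial drop-tolerance
for attempt = 1:5
    setup = struct('type', 'ilutp', 'droptol', reltol);
    [L, U] = ilu(A, setup);                    % approximate LU factorization
    [x, flag] = gmres(A, b, 20, 1.0e-10, 200, L, U);
    if flag == 0, break; end                   % converged: accept x
    reltol = reltol/10;                        % otherwise tighten the drop-tolerance
end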
URL:
https://www.sciencedirect.com/science/article/pii/S1570579X06800212
Numerical Methods for Roots of Polynomials - Part II
J.M. McNamee , V.Y. Pan , in Studies in Computational Mathematics, 2013
10.4.3 Generalizations of the Quotient-Difference Method
Fox and Hayes (1968) associate a band matrix with the polynomial and perform an LU factorization, where L and U are triangular band matrices. They restrict their attention mainly to a quartic polynomial, but Kershaw (1987) extends their work to the general polynomial of degree N. Kershaw associates with it a semi-infinite band matrix
(10.117)
The first n rows are arbitrary, and from row n + 1 onwards the element in the main diagonal is . We factorize A into LU where L is lower triangular with unit diagonals and band-width n + 1, while U is upper triangular with band-width m + 1. That is,
(10.118)
while
(10.119)
We associate polynomials with the matrices A (top or arbitrary part), L, U as follows:
(10.120)
(10.121)
(10.122)
The part of A below the nth row is associated with a further polynomial. Kershaw shows that usually the polynomials associated with L and U tend to fixed polynomials in such a way that
(10.123)
Of course the case n = 1 gives a linear factor, and n = 2 a quadratic factor. Or (although Kershaw does not mention this) we may divide out the factor found and repeat the process recursively until we have only linear and/or quadratic factors.
Jones and Magnus (1980) discuss what they call the F–G relations, which are very close to the Q–D scheme. Assuming that the zeros of the polynomial satisfy
(10.124)
Let
(10.125)
(10.126)
(10.127)
Then for m = 0,1,… let
(10.128)
and
(10.129)
(10.130)
Note: we assume that
(10.131)
The authors prove that if
(10.132)
Then
(10.133)
(but we suspect that the above contains a misprint on the right of (10.133)). In some numerical tests this method converged somewhat faster than the Q–D method.
URL:
https://www.sciencedirect.com/science/article/pii/B9780444527301000047
Factorization by the QR Method
Jacques Henry , Angel M. Ramos , in Factorization of Boundary Value Problems Using the Invariant Embedding Method, 2016
Abstract:
In Chapter 5, we saw that in the context of a method of finite differences, one particular discretization of the factorization of the problem corresponds exactly to the block Gaussian LU factorization of the matrix of the discretized problem. In other words, this factorization may be viewed as an infinite-dimensional generalization of the Gaussian factorization. It is natural to ask whether there also exists a generalization of the QR method, where Q is an orthogonal matrix and R is an upper triangular matrix. We know that the QR method applies not only to square matrices, but also to rectangular matrices with more rows than columns, and so corresponds to overdetermined problems that we can solve in the sense of least squares. This generalization to overdetermined problems is also possible for the factorization of the boundary value problems studied here (it is discussed, together with an additional boundary condition, in the doctoral thesis of Maria Orey), but we will restrict our attention in this chapter to well-posed problems.
URL:
https://www.sciencedirect.com/science/article/pii/B9781785481437500086
The Passive Dendritic Tree
FABRIZIO GABBIANI , STEVEN J. COX , in Mathematics for Neuroscientists, 2010
8.1 THE DISCRETE PASSIVE TREE
We work in the concrete context of Figure 8.2 on the way to a more general understanding. We have indexed the compartments, following an observation of Hines, in a manner that leads to minimal fill-in in the LU factorization (Exercise 5.2) of the resulting linear system associated with the backward Euler and trapezoid schemes. The physical lengths and radii of the three fibers are
respectively, while the length of each compartment, except the soma, is dx. The soma is presumed to have surface area A_s and is not typically further compartmentalized.
If we inject I_stim at the soma then Kirchhoff's Current Law, at the node with potential v_{3,4}, requires
(8.1)
where, as in §6.1, v_{i,j} = V_{i,j} − V_Cl, while at the branch point (v_{3,1}) we find
Current balance at the remaining nodes proceeds exactly as before, recall Eq. (6.5). In particular, with the squared space constant of branch j, we find
(8.2)
where A_3 = 2πa_3 dx. We write this collection of equations as the linear system
(8.3)
and H is the Hines matrix
where
The genius of H is at least double – it factors easily and is similar to a symmetric matrix. Regarding the former, we note that, as in the tridiagonal S of Chapter 6, Gaussian Elimination applied to this matrix requires only one elimination per column.
URL:
https://www.sciencedirect.com/science/article/pii/B9780123748829000083
Further Topics in Linear Programming
Bernard Kolman , Robert E. Beck , in Elementary Linear Programming with Applications (Second Edition), 1995
Algorithm
Virtually all LP codes designed for production, rather than teaching, use the revised simplex method. This method has several desirable features, including the ability to handle a large number of variables. The real limit on the size of a problem is the number of constraints (see Section 3.5). Other features will be described when we discuss error detection and correction.
Most of the large LP codes provide an option for computing B⁻¹ that is based upon a procedure from numerical linear algebra called LU factorization. That is, B is written as LU, the product of a lower triangular matrix L and an upper triangular matrix U. The inverses of upper and lower triangular matrices are easily calculated. Then B⁻¹ = U⁻¹L⁻¹. Many large linear programming models have sparse matrices (ones with few nonzero entries). The matrix representations can then be highly compressed and L⁻¹ and U⁻¹ can be calculated in RAM, with special routines for sparse matrices, resulting in significant time savings. For this reason, more and more codes will provide an LU-factorization option.
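A small Matlab-style sketch of this idea is given below (illustrative only: production LP codes exploit sparsity and usually avoid forming the explicit inverse, and B and b are assumed to be the current basis matrix and right-hand side).
[L, U, P] = lu(B);           % P*B = L*U (with partial pivoting)
y = U \ (L \ (P*b));         % solves B*y = b, i.e. y = B^-1 * b
Binv = U \ (L \ P);          % explicit B^-1 = U^-1 * L^-1 * P, if it is really needed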
The revised simplex algorithm with iterative B⁻¹ calculation is usually programmed to check itself at specified intervals. Between checks it follows the description we gave in Section 3.4. The check involves computing the next B⁻¹ in a manner different from the one we described. The matrix B can be constructed from the list of basic variables and the original problem as it was read in and stored. Then a very good method of numerically inverting B, such as the LU-factorization method described above, is used. This procedure of occasionally recomputing B⁻¹ from the given problem serves to produce a more accurate basic feasible solution. However, in general the procedure is expensive in terms of computation time and must be used sparingly.
As was indicated in Section 2.2 most LP codes provide several options for handling degeneracy when it occurs.
URL:
https://www.sciencedirect.com/science/article/pii/B9780124179103500061
Inherently Parallel Algorithms in Feasibility and Optimization and their Applications
Y. Saad , in Studies in Computational Mathematics, 2001
4 PRECONDITIONERS
We now turn our attention to preconditioning techniques for distributed sparse matrices. Many of the ideas used in preconditioning distributed sparse matrices are borrowed from the Domain Decomposition literature; see, for example, [15,36,57]. The simplest of these use block preconditionings based on the domains, and are termed Schwarz alternating procedures in the Partial Differential Equations literature. Let R_i be a restriction operator which projects a global vector onto its restriction in subdomain i. Algebraically, this is represented by an n_i × n matrix, consisting of zeros and ones, where n_i is the size of subdomain i. The transpose of R_i represents a prolongation operator from subdomain i to the whole space. Let A_i be the local submatrix associated with subdomain i. In the Block Jacobi (or additive Schwarz) procedure, the global solution is updated as
x := x + Σ_i R_iᵀ A_i⁻¹ R_i r,
where r is the residual vector b − Ax. Each of the solves with the matrices A_i is done independently. The method does not specify which technique is used for solving these local systems. It is clear that a direct solution method could be used in case each subproblem is small enough.
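A minimal Matlab-style sketch of one Block Jacobi (additive Schwarz) update follows; the variable names are assumed (idx is a cell array holding the index set of each subdomain, playing the role of R_i).
r  = b - A*x;                              % global residual
dx = zeros(size(x));
for i = 1:numel(idx)
    Ii = idx{i};                           % indices of subdomain i
    dx(Ii) = dx(Ii) + A(Ii, Ii) \ r(Ii);   % local solve: A_i * d_i = R_i * r
end
x = x + dx;                                % all corrections added at the same time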
However, it is common to use iterative solvers for the local systems since these are often fairly large. An iterative solver consists of an accelerator such as the conjugate gradient algorithm, and a preconditioner such as an Incomplete LU (ILU) factorization. For details on such techniques see, for example, [2,5,18,49]. In terms of software, a publicly available package called SPARSKIT [46] includes a fair number of standard accelerators and preconditioners. Among the preconditioners available in SPARSKIT, the ILU with threshold (ILUT) reaches a good compromise between simplicity and robustness and is widely used.
In the context of domain decomposition methods, however, the accelerator for the global iteration must take into account the fact that the preconditioner may not be a constant operator (i.e., that it may change at each step of the outer iteration). Many variations are possible. In general, we found that iterating to solve each of the subproblems accurately is not cost-effective. Instead, a simple forward-backward sweep with ILU factors obtained from an ILUT preconditioner often yields the fastest combination. Subdomain partitions may be allowed to overlap. This technique works reasonably well for a small number of overlapping subdomains, as was shown in experiments using a purely algebraic form in [44].
This can be extended to the block Multiplicative Schwarz procedure, which is a block Gauss-Seidel iteration in which, likewise, a block is associated with a domain. Here, the algorithm can be described by a simple loop which updates the local solutions for subdomains 1, 2, …, s in turn, taking into account the latest value reached for the current solution each time. The only difference from the block Jacobi iteration is that the solution and residual are updated immediately after each local correction δ_i. In the block Jacobi case, these local corrections are all computed using the same residual r and then they are added at the same time to the current solution x.
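Using the same assumed notation as the sketch above, one multiplicative Schwarz (block Gauss-Seidel) sweep applies each local correction, and refreshes the residual, before moving to the next subdomain:
for i = 1:numel(idx)
    Ii = idx{i};
    r  = b - A*x;                          % residual reflects all corrections so far
    x(Ii) = x(Ii) + A(Ii, Ii) \ r(Ii);     % immediate update of the solution
end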
A technique of labeling the domains, known as "multicoloring", has been exploited to extract parallelism from the Gauss-Seidel procedure, see e.g., [57]. In this procedure, the subdomains are colored such that subdomains that are coupled by an equation are always assigned a different color. One such coloring is illustrated in Figure 5 where the numbers refer to a color. With this coloring, the Gauss-Seidel loop can now proceed differently. Since couplings between vertices of different domains are only possible when these domains have different colors, the Gauss-Seidel sweep can proceed by color. Thus, in Figure 5, all 4 domains of color number 1 can update their variables first at the same time. Then the two domains with color number 2 can simultaneously do their update, and so on. For this example, this color-based sweep will require 4 steps instead of s = 12 steps for the domain-based sweep. The color-based loop is as follows.
For color = 1,…, numcolors Do:
If mycolor = color Do:
x := x + δ
r := r - Aδ
EndIf
EndDo
Here, mycolor represents the color of the node to which the subdomain is assigned. Though this procedure represents an improvement over the standard Gauss-Seidel procedure, it is still sequential with respect to the colors since when one color is active (one of the steps of the color loop), all processors of a different color will be idle. This may lead to substantial loss of efficiency. Several remedies have been proposed, the most common one of which is to further partition each subdomain in each processor and then find a global coloring of the new, finer, partitioning. The goal here is to have all colors of this new partitioning represented in each processor. An example of a second level partitioning of the domain in Figure 5 is shown in Figure 6.
This two-level coloring now presents a different disadvantage, namely that the rate of convergence of the new procedure is usually smaller than that of the original scheme, due to the fact that the subdomains are smaller. Other remedies have been proposed, see e.g., [52] for a discussion. Our own experience is that many of the remedies do not come for free in that they tend to increase the cost of the scheme either due to the increased number of iterations or other overhead.
Distributed versions of the Incomplete LU factorization ILU(0) have been developed and tested on the Connection Machine 5 as early as 1993 [38]. More recently, the idea was generalized to higher order ILU(k) factorizations by Karypis and Kumar [32] and Hysom and Pothen [26]. While the original ILU factorization is sequential in nature, it is fairly easy to devise parallel versions by what amounts to reordering on the global system. The effect of such reorderings can be detrimental to convergence and this has been the subject of recent research, see, e.g., [6].
The strategy for unraveling parallelism in ILU is illustrated in Figure 7 in the simplest case of ILU(0). The unknowns associated with the interior nodes do not depend on the other variables. As a result they can be eliminated independently and in parallel in the ILU(0) process. The rows associated with the nodes on the interface of the subdomain will require more attention. Indeed, now the order in which to eliminate these rows is important, since this order will essentially determine the amount of parallelism. At one extreme we can have the rows eliminated one by one in the natural order 1, 2, …, n, which results in no parallelism. We can also try to reorder the equations in such a way as to maximize parallelism, as is done in sparse Gaussian elimination, but this would be too costly for incomplete LU factorizations. A somewhat intermediate solution is to eliminate the equations using a simple "schedule", i.e., a global order in which to eliminate the rows, set by a priority rule among processors. As an example, we could impose a global order based on the labels of the processors, meaning that between two adjacent processors, the one with a lower number has a higher priority in processing rows than the other. Thus, in our illustration of Figure 7, Processor 10 processes the rows associated with its interior nodes first; it then waits for the interface rows belonging to Processors 2, 4, and 6. Once these are received by Processor 10, it eliminates all its rows associated with (all) its interface nodes. Finally, once this is done, the interface rows in Processor 10 that are awaited by Processors 13 and 14 are sent to them. This sets a global ordering for the elimination process. The local ordering of the rows, i.e., the order in which we process the interface rows within the same processor (e.g., Processor 10), may not be as important. The global order based on PE-number defines a natural priority graph, and parallelism can easily be exploited in a data-driven implementation.
However, it seems rather artificial to use the label of a processor to determine the global order of the elimination. In fact, we can also define a proper order in which to perform the elimination by replacing the PE-numbers by any labels provided any two neighboring processors have a different label. The most natural way of doing this is to perform a multicoloring of the subdomains, and use the colors in exactly the same way as above to define an order of the tasks. The colors can be defined to be positive integers as in the illustration of Figure 5.
It is possible to eliminate the interior variables u_i from each of the local systems (3). Doing so will leave us with a system which involves only the interface variables (denoted by y_i in (3)), referred to as a Schur complement system. This system can be solved and then the other variables can be computed. One of the main difficulties with this approach is to precondition the Schur complement system. Indeed, depending on the formulation, this system may be dense or have diagonal blocks that are dense [49]. It is possible to define approximations for this Schur complement from which preconditioners can then be derived [8,7,13]. An alternative is to define preconditioners for the global system based on some approximate solution of the Schur complement system [53,51]. This approach does not require that the Schur complement system be solved exactly since the solution is only used to define a preconditioner. Generally speaking, Schur complement preconditioners are often harder to implement but of better quality than Schwarz-based preconditioners.
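In generic 2-by-2 block notation (an assumption made here; the local systems (3) are not reproduced in this excerpt), eliminating the interior unknowns u from a system in interior/interface variables (u, y) leaves the Schur complement system:
\begin{pmatrix} B & E \\ F & C \end{pmatrix}
\begin{pmatrix} u \\ y \end{pmatrix}
=
\begin{pmatrix} f \\ g \end{pmatrix}
\quad\Longrightarrow\quad
\left( C - F B^{-1} E \right) y = g - F B^{-1} f .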
URL:
https://www.sciencedirect.com/science/article/pii/S1570579X01800252
Mesh Generation
Marshall Bern , Paul Plassmann , in Handbook of Computational Geometry, 2000
2.2 Solution methods
The solution of the sparse linear system is usually the most computationally demanding of the three steps. Solution methods include direct factorization and preconditioned iterative methods, methods which can vary dramatically in required storage and computational cost for different problems. Moreover, the discrete formulation and mesh generation steps greatly influence the efficacy of a solution method. For example, if one chooses to use higher-order basis functions in the finite element method, one can use a coarser mesh, and a smaller but denser linear system.
Direct factorization methods, such as sparse Cholesky or LU factorization, tend to be slower but more robust than iterative methods. Luckily the extra computational cost can be amortized when the factorization is reused to solve for more than one right-hand side. An important issue in direct methods is ordering the vertices to minimize the "fill", the number of intermediate nonzeros. Nested dissection [72] uses graph separators to save an asymptotic factor of O(n^(3/d)) in the fill. Any planar graph admits separators of size O(n^(1/2)); reasonable three-dimensional meshes admit separators of size O(n^(2/3)) [89].
Iterative methods [17] have proved effective in solving the linear systems arising in physical modeling. Most large problems cannot be effectively solved without the use of preconditioning. A popular preconditioner involves an incomplete factorization. Rather than computing the exact factors for the matrix A = LU, approximate factors L̃ and Ũ are computed such that L̃Ũ ≈ A and the preconditioned system
(L̃Ũ)⁻¹A x = (L̃Ũ)⁻¹ b
is solved iteratively. Ideally, the incomplete factors should be easy to compute and require a modest amount of storage, and the condition number of the preconditioned system should be much better than that of the original system.
Multigrid methods [140] can achieve the ultimate goal of iterative methods, convergence in O(1) iterations, for certain classes of problems. These methods use a sequence of meshes, graded from fine to coarse. The iterative solution of the fine linear system is accelerated by solving the coarser systems. The cycle repeatedly projects the current residual from a finer mesh onto the next coarser mesh and interpolates the solution from the coarser mesh onto the finer. One drawback of multigrid is that computing a sequence of meshes may be difficult for complicated geometries.
Domain decomposition [122] represents something of a hybrid between iterative and direct approaches. This approach divides the domain into possibly overlapping small domains; it typically solves subproblems on the small domains directly, but iterates to the global solution in which neighboring subproblem solutions agree. This approach enjoys some of the superior convergence properties of multigrid methods, while imposing less of a burden on the mesh generator. In fact, the domain is often partitioned so that subproblems admit structured meshes.
URL:
https://www.sciencedirect.com/science/article/pii/B9780444825377500073
Find LU Factorization of A and Complete Solution to Ax = B
Source: https://www.sciencedirect.com/topics/mathematics/lu-factorization