I understand that L needs to be square, but all the examples I could find of decomposing non-square matrices cover the case where the number of columns is greater than the number of rows. How should L be made square when the number of rows is greater than the number of columns?
CUDA's LU decomposition function works by overwriting the matrix A with its factors. This causes both L and U to have the dimensions of the original matrix, and I need to figure out how to make the L factor square when A is non-square.
As a guess, I am setting the diagonal of the L matrix returned by the CUDA library's LU decomposition function to ones and the rest to zeros, but that does not seem to be the correct choice.
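To make the question concrete, here is a minimal host-side sketch of the unpacking I have in mind. It assumes the factorization follows the LAPACK getrf convention: the packed m x n result is column-major with leading dimension m, the strictly lower part holds L (whose unit diagonal is not stored), and the upper part holds U. The names `Factors` and `unpack_lu` are hypothetical and not part of any CUDA library, and row pivoting is ignored here, so the product only reproduces the row-permuted A.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical container for the two unpacked factors (both column-major).
struct Factors {
    std::vector<double> L;  // m x m, unit lower triangular
    std::vector<double> U;  // m x n, upper trapezoidal (zero rows below row n when m > n)
};

// Split a packed m x n LU result into a square L and an m x n U.
// Assumption: packed[i + j * m] is entry (i, j), i.e. leading dimension m.
Factors unpack_lu(const std::vector<double>& packed, std::size_t m, std::size_t n) {
    assert(packed.size() == m * n);

    Factors f;
    f.L.assign(m * m, 0.0);
    f.U.assign(m * n, 0.0);

    // Unit diagonal for every column of L, including the padded columns n..m-1.
    for (std::size_t j = 0; j < m; ++j) {
        f.L[j + j * m] = 1.0;
    }

    for (std::size_t j = 0; j < n; ++j) {
        for (std::size_t i = 0; i < m; ++i) {
            const double v = packed[i + j * m];
            if (i > j) {
                f.L[i + j * m] = v;  // strictly below the diagonal: multiplier of L
            } else {
                f.U[i + j * m] = v;  // on or above the diagonal: entry of U
            }
        }
    }
    return f;
}
```

With m > n, the first n columns of L come from the packed result and the remaining m - n columns are identity columns, while U gets m - n zero rows at the bottom, so the product of the square L and the padded U equals the product of the thin m x n L and the n x n U.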