[Rcpp-devel] RcppArmadillo inv() depends on Lapack, zgetri_ not available on CRAN / R-forge?

baptiste auguie baptiste.auguie at googlemail.com
Tue Jun 7 23:20:26 CEST 2011


On 8 June 2011 09:04, Douglas Bates <bates at stat.wisc.edu> wrote:
> On Tue, Jun 7, 2011 at 3:47 PM, baptiste auguie
> <baptiste.auguie at googlemail.com> wrote:
>> On 8 June 2011 08:03, Douglas Bates <bates at stat.wisc.edu> wrote:
>>> On Tue, Jun 7, 2011 at 2:34 PM, baptiste auguie
>>> <baptiste.auguie at googlemail.com> wrote:
>>>> Hi Doug,
>>>>
>>>> On 8 June 2011 03:43, Douglas Bates <bates at stat.wisc.edu> wrote:
>>>>>
>>>>> On Jun 6, 2011 4:17 AM, "baptiste auguie" <baptiste.auguie at googlemail.com>
>>>>> wrote:
>>>>>>
>>>>>> Thank you for the explanations below.
>>>>>>
>>>>>> On 5 June 2011 10:40, Dirk Eddelbuettel <edd at debian.org> wrote:
>>>>>> >
>>>>>> > On 5 June 2011 at 10:12, baptiste auguie wrote:
>>>>>> > | Hi Dirk and all,
>>>>>> > |
>>>>>> > | On 4 June 2011 12:04, Dirk Eddelbuettel <edd at debian.org> wrote:
>>>>>> > | >
>>>>>> > | > Baptiste,
>>>>>> > | >
>>>>>> > | > On 4 June 2011 at 11:45, baptiste auguie wrote:
>>>>>> > | > | Dear list,
>>>>>> > | > |
>>>>>> > | > | My package cda, which I was hoping to release on CRAN, fails to
>>>>>> > | > | compile on R-forge with error,
>>>>>> > | > |
>>>>>> > | > | ** testing if installed package can be loaded
>>>>>> > | > | Error in dyn.load(file, DLLpath = DLLpath, ...) :
>>>>>> > | > |   unable to load shared object
>>>>>> > '/tmp/RtmpbztUMm/Rinst1829c04c/cda/libs/cda.so':
>>>>>> > | > |   /tmp/RtmpbztUMm/Rinst1829c04c/cda/libs/cda.so: undefined symbol:
>>>>>> > zgetri_
>>>>>> > | > |
>>>>>> > | > | It builds fine on my local machines (Mac OS 10.5, 10.6).
>>>>>> > | > |
>>>>>> > | > | >From an older discussion on this list <
>>>>>> > | > |
>>>>>> > http://www.mail-archive.com/rcpp-devel@lists.r-forge.r-project.org/msg00678.html>
>>>>>> > | > | the issue seems to be that Armadillo's inv() relies on a function
>>>>>> > that
>>>>>> > | > | is not provided by R, only by LAPACK. I have just replaced inv()
>>>>>> > by
>>>>>> > | > | pinv() and solve() in my code; merely to see what happens, but
>>>>>> > chances
>>>>>> > | > | are they also require a full LAPACK.
>>>>>> > |
>>>>>> > | Indeed, the error on R-forge is now with zgels_, required to solve
>>>>>> > | linear systems. It seems one cannot solve Armadillo linear systems
>>>>>> > | without LAPACK in the current situation.
>>>>>> >
>>>>>> > Yes. Doug, Romain and myself should address that, or at least make it
>>>>>> > clear
>>>>>> > what feature of the full Armadillo are lacking in RcppArmadillo.
>>>>>> >
>>>>>> > | > Sometime relatively early in the RcppArmadillo development process,
>>>>>> > Doug
>>>>>> > | > convinced Romain and myself to go for a pure template solution with
>>>>>> > Armadillo
>>>>>> > | > as all / most things found during the configure (or in this case,
>>>>>> > cmake)
>>>>>> > | > stage can be assumed 'found' given that we have around us by design.
>>>>>> >  So no
>>>>>> > | > testing, no local library and full reliance and what R gives us.
>>>>>> > | >
>>>>>> > | > That was a brilliant idea, and has freed us from having to rely on
>>>>>> > building
>>>>>> > | > and shipping a library, having to tell users how to set PKG_LIBS etc
>>>>>> > pp and I
>>>>>> > | > firmly believe that this helped tremendously in getting
>>>>>> > RcppArmadillo more
>>>>>> > | > widely used. So I would not want to revert this.
>>>>>> > |
>>>>>> > | It sounds like a good choice, I agree. Yet solving a linear system is
>>>>>> > | quite a crucial task in linear algebra; it would be nice if we could
>>>>>> > | come up with a tutorial recipe for dummies like me.
>>>>>> > |
>>>>>> > | >
>>>>>> > | > In any event, it seems that you need more LAPACK than R has for you.
>>>>>> >  That is
>>>>>> > | > likely to be a dicey situation as you per se do not know whether R
>>>>>> > was built
>>>>>> > | > and linked with its own (subset) copy of LAPACK, or whether it uses
>>>>>> > system
>>>>>> > | > LAPACK libraries (as e.g. the Debian / Ubuntu systems do).  So you
>>>>>> > may be in
>>>>>> > | > a spot bother and I not sure what I can recommend --- other than
>>>>>> > trying your
>>>>>> > | > luck at some short configure snippets that will run at package build
>>>>>> > time to
>>>>>> > | > determine whether the system you want to build cda on it 'rich'
>>>>>> > enough to
>>>>>> > | > support it.  I can help you off list with some configure snippets as
>>>>>> > some of
>>>>>> > | > my packages have configure code; adding a test for zgetri should be
>>>>>> > feasible.
>>>>>> > | >
>>>>>> > | > | Does anybody have any experience
>>>>>> > | > | dealing with such issues w.r.t releasing a package on R-forge /
>>>>>> > CRAN?
>>>>>> > | > | Is there any chance they would consider installing LAPACK on those
>>>>>> > | > | servers, or is there no point in asking such things?
>>>>>> > | >
>>>>>> > | > I don't think it is a matter of fixing the R-Forge server. I think
>>>>>> > it is a
>>>>>> > | > matter of making your package installable on the largest number of
>>>>>> > user
>>>>>> > | > systems.  Also try win-builder.r-project.org to see how it fares on
>>>>>> > that
>>>>>> > | > platform.
>>>>>>
>>>>>> Unsurprisingly, it fails, with the same complaint as R-forge.
>>>>>>
>>>>>> > |
>>>>>> > | I was under the impression that R-forge or CRAN, if it had LAPACK
>>>>>> > | installed, could produce binaries for the relevant platforms, and
>>>>>> > | users would not have to build the package themselves and would not be
>>>>>> > | required of having LAPACK on their machine. That's probably a
>>>>>> > | misconception, isn't it?
>>>>>> >
>>>>>> > If and only statically linked binaries or libraries where produced,
>>>>>> > which is
>>>>>> > generally not the case. Many OSs (Linux incl) ship source only and
>>>>>> > otherwise
>>>>>> > link dynamically, others (Windoze) use dynamic linking and OS X is for
>>>>>> > all I
>>>>>> > know somewhere in the middle (as you can get prebuild packages with
>>>>>> > dynamic
>>>>>> > linking or build from source).
>>>>>>
>>>>>> I see; so basically the user will always need to have a full LAPACK
>>>>>> installed. Here's one question then, if R-core didn't consider
>>>>>> necessary to include those particular functions from LAPACK,
>>>>>> presumably that means that R defines its own routines to solve linear
>>>>>> systems and invert matrices. Is there any possibility to use those
>>>>>> routines with Armadillo?
>>>>>
>>>>> One important point has been missed in this discussion, which is that the
>>>>> particular Lapack subroutine that is not found in the subset of Lapack
>>>>> shipped with R is for general, complex-valued matrices (just google the name
>>>>> zgetri).  The way that Armadillo is structured it is either all-in or
>>>>> all-out with respect to complex-valued matrices  If you allow for complex
>>>>> matrices then you must have all the supporting subroutines for whatever you
>>>>> are calling.  If you call inv() you need to allow for all the [sdcz]getri
>>>>> routines to be available.
>>>>
>>>> The matrix I need to invert is definitely always complex; in fact,
>>>> convenient complex algebra is the main attraction of Armadillo for me.
>>>
>>> Argh.  Of course, I was being foolish.  Because Armadillo is a
>>> header-only library it does not access any Lapack subroutine until the
>>> call is instantiated.  I still haven't quite gotten used to thinking
>>> only having the headers.
>>>
>>>>>
>>>>> So one possibility is to check the Armadillo sources to see if you can
>>>>> suppress the use of complex-valued matrices when compiling your package.  A
>>>>> quick glance indicates that this might now be easy.
>>>>>
>>>>> Otherwise, remember the first rule of numerical linear algebra which is, "If
>>>>> your algorithm involves computing the inverse of a matrix you should rethink
>>>>> the algorithm because there is a better way of doing it".  So why do you
>>>>> need inv()?
>>>>> If the answer is to solve a linear system of equations then you
>>>>> use solve(), you do not use inv().  If, for some truly unusual reason you
>>>>> actually need the inverse (and, remember in 99.99% of the cases where people
>>>>> think they need the inverse, they don't) then you use a combination of
>>>>> solve() and eye().
>>>>
>>>> I tried this, and in fact I do use solve() elsewhere in my code, which
>>>> calls for another LAPACK routine (zgels.f) not present in R. The
>>>> reason I still need inv() is that I have to solve about 300 times a
>>>> linear system AX=B with the same matrix A but varying B. A quick
>>>> timing last week revealed that using solve() 300 times was typically
>>>> slower by a factor of two  in my use case than using inv() once (or
>>>> pinv() for that matter, it makes not appreciable difference). I'm
>>>> happy to be shown otherwise though.
>>>
>>> I forgot that Armadillo doesn't provide a convenient way of using the
>>> LU decomposition (that is one of the things that I find frustrating
>>> about Armadillo).  Did you try a single solve call in which the
>>> right-hand side is an identity matrix?  On looking at the Armadillo
>>> sources it seems that it should call zgesv which is included in R's
>>> subset of the Lapack routines.
>>
>> It seems curious, looking at the bestiary of LAPACK functions, that so
>> many of them seem to perform a similar task. I wonder what are the
>> practical downsides of using a code that solves AX=I rather than one
>> inverting A. Also, when I replaced inv() by solve() and pinv(),
>> R-forge still failed to build complaining that zgesv_ was not present.
>> I assumed it was used by solve(), but perhaps it was pinv(). I'll give
>> it a try anyway, but I'd rather hope we can figure out a less ad hoc
>> solution.
>
> Once upon a time we actually counted the number of floating point
> operations that were needed to perform a particular calculation, which
> is the reason for all the special-case code in Lapack.  The difference
> between using the LU decomposition of a matrix to solve a system in
> which the right hand side is an identity, versus a special-purpose
> piece of code like zgetri is that zgetri can take advantage of the
> fact that the identity matrix is diagonal, thereby saving a relatively
> small number of operations.

That makes sense, thanks.

>
> I'm not sure why there should be a complaint about zgesv_ not being
> available.  It's there in the R sources (src/modules/lapack/cmplx.f
> starting at line 3944).  That routine is trivial because it just
> checks its arguments then calls zgetrf and zgetrs.
>>

Sorry, I meant zgels_ not zgesv_ (who came up with those names?!). You
can see the error here:
https://r-forge.r-project.org/R/?group_id=160&log=build_src&pkg=cda&flavor=patched

Best,

baptiste

>> Thanks,
>>
>> baptiste
>>
>>>
>>>
>>>> One option that I'd like to consider is whether the appropriate LAPACK
>>>> routines could be wrapped and shipped in a separate package
>>>> (discontinued rblas could provide a good starting point). Sadly, I
>>>> know nothing about static/dynamic libraries and all this business..
>>>>
>>>> Thanks,
>>>>
>>>> baptiste
>>>>
>>>>>
>>>>>> > | Sorry for being dense, I don't know anything about linking R to
>>>>>
>>>>>> > | external dependencies.
>>>>>> >
>>>>>> > It can be done; there are many examples -- for example every package
>>>>>> > using
>>>>>> > the GSL.
>>>>>>
>>>>>> I just checked how RcppGSL does it, and well, this configure magic is
>>>>>> way above my head.
>>>>>>
>>>>>> Best,
>>>>>>
>>>>>> baptiste
>>>>>> _______________________________________________
>>>>>> Rcpp-devel mailing list
>>>>>> Rcpp-devel at lists.r-forge.r-project.org
>>>>>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-devel
>>>>>
>>>>
>>>
>>
>


More information about the Rcpp-devel mailing list