[Basta-users] Kullback-Leibler divergence interpretation
Jelle Boonekamp
jjboonekamp at gmail.com
Fri Feb 15 15:14:40 CET 2013
Hi Fernando,
Thank you very much for your answer, it is indeed very helpful to me. In the R script attached you can see how I calculated the amount of overlap (for two randomly generated distributions in this example). When the means and standard deviations are identical, the calculated overlap is about 1, which gives me confidence in this method. I made an error yesterday in calculating the amount of overlap for the posterior distributions of b0 and b1 with KLDCs of 0.88 and 0.99: in fact the overlaps were p = 0.40 and p = 0.15, respectively. I must be doing something wrong, because clearly a KLDC of 0.99 cannot describe the divergence of two distributions that still overlap by 15%, right?
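In short, the calculation goes roughly like this (a simplified sketch of the attached script, assuming both posteriors can be summarized by their means and standard deviations):

# Overlap measured as the "overlapping coefficient": the integral of the
# pointwise minimum of the two normal densities (1 = identical, 0 = disjoint).
overlap_normals <- function(mu1, sd1, mu2, sd2) {
  f <- function(x) pmin(dnorm(x, mu1, sd1), dnorm(x, mu2, sd2))
  integrate(f, lower = -Inf, upper = Inf)$value
}

overlap_normals(0, 1, 0, 1)   # identical distributions: overlap of about 1
overlap_normals(0, 1, 3, 1)   # well-separated distributions: overlap close to 0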
Actually I am not so interested in a p-value or any hard boundary as such, but I do want to describe how likely it is that the distributions with KLDC = 0.99 differ, and similarly for the other two. Can I interpret these KLDCs as actual probabilities? For my data this is important because my treatment effect on b0 and b1 goes in opposite directions, i.e. baseline mortality is lower in the group where the age dependence of mortality is stronger. This raises another question: when fitting the Gompertz model, b0 and b1 are not independent. I am interested in whether the treatment effect on b1 (the age dependence of mortality) is still there when I keep the b0 parameter fixed, let's say at the average of both groups. Can I do this in BaSTA?
You talked about the possibility of calculating Bayesian p-values. How would I do this?
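Would it simply be the proportion of posterior samples in which the parameter of one group exceeds that of the other? Something like the sketch below, assuming I have extracted the MCMC chains for b1 of both groups as numeric vectors (the object names are just placeholders):

# Posterior probability that b1 is larger in group 1 than in group 2,
# computed from the two MCMC chains (placeholder names b1.grp1 and b1.grp2).
p.b1 <- mean(b1.grp1 > b1.grp2)
p.b1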
Best, Jelle
On Friday, 15 February 2013 at 09:50, Fernando Colchero wrote:
> Hi Jelle,
>
> Well, you could calculate Bayesian p-values, and their interpretation is, roughly, what you calculated. The KLDC values we report in BaSTA are the mean (calibrated) Kullback-Leibler discrepancies (KL) between the two posterior distributions. The issue is that KL values are not symmetric. For example, if you want to measure the discrepancy between a distribution P and a distribution D, then unless they have the same variance, KL(P, D) is not equal to KL(D, P). I am not sure how you calculated the percentage of overlap, but it is likely that you computed one side of the discrepancy, say P with respect to D, and the value may well be different the other way around.
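> As an illustration, here is a small sketch in R of that asymmetry for two normal distributions, using the closed-form KL divergence between normals (this is only an illustration, not the exact calculation BaSTA performs internally):
>
> # Closed-form KL divergence KL(P || Q) for P = N(mu1, sd1^2) and Q = N(mu2, sd2^2).
> kl_norm <- function(mu1, sd1, mu2, sd2) {
>   log(sd2 / sd1) + (sd1^2 + (mu1 - mu2)^2) / (2 * sd2^2) - 0.5
> }
>
> # Same mean but different variances: the two directions give different values.
> kl_norm(0, 1, 0, 2)   # KL(P, D)
> kl_norm(0, 2, 0, 1)   # KL(D, P)
>
> # One common calibration (McCulloch 1989) maps a KL value onto [0.5, 1],
> # where 0.5 means identical distributions and values near 1 mean almost no overlap.
> kl_calibrate <- function(kl) (1 + sqrt(1 - exp(-2 * kl))) / 2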
>
> I think it is safe to state that your 0.88 and 0.99 values indicate that there is little overlap in the first case and almost none in the second, which suggests that both parameters differ between the two groups. Still, I understand that it would be handy to have a clear threshold, as with traditional p-values. However, if you think about it, such thresholds are somewhat arbitrary and may not apply to a specific system. That is why some Bayesian and even non-Bayesian statisticians prefer to state how likely or how probable it is that x and y are related, instead of relying on hard boundaries.
>
> I hope that this is helpful. Best,
>
> Fernando
>
>
>
> Fernando Colchero
> Assistant Professor
> Department of Mathematics and Computer Science
> Max Planck Odense Center on the Biodemography of Aging
>
> Tlf. +45 65 50 46 35
> Email colchero at imada.sdu.dk
> Web www.sdu.dk/staff/colchero
> Pers. web www.colchero.com
> Adr. Campusvej 55, 5230, Odense, Dk
>
> University of Southern Denmark
>
>
>
>
>
> On Feb 14, 2013, at 4:36 PM, Jelle Boonekamp <jjboonekamp at gmail.com> wrote:
> > Dear BaSTA users,
> >
> > I am wrestling a bit with the interpretation of the Kullback-Leibler metric describing the posterior distributions of model parameters. In my example I get KLDC values of 0.88 and 0.99 for the b0 and b1 Gompertz parameters, respectively, when comparing two groups of individuals. If I understood correctly, a value of 1 for this calibrated KLD indicates that there is no overlap between the distributions, and a value of 0.5 indicates that they are identical. However, when I calculate by hand the percentage of overlap between the two distributions (which I think can be interpreted as a measure of significance, since these posteriors are normally distributed), I get 0.24 and 0.066, respectively (for KLDC = 0.88 and 0.99). I would have expected at least the distributions with KLDC = 0.99 to show less overlap than what I calculated by hand (0.066).
> >
> > Can someone shed some light on this?
> >
> > Best, Jelle
> >
> > --
> > Jelle Boonekamp
> > Behavioural Biology
> > University of Groningen
> > P.O. Box 11103
> > 9700 CC Groningen
> > The Netherlands
> >
> > tel: +31.50.363 7853
> >
>
[Attachment: distribution divergence testing.R (scrubbed from the archive)]