From kidi3back at gmail.com  Fri Apr  1 10:33:39 2016
From: kidi3back at gmail.com (215)
Date: Fri, 1 Apr 2016 01:33:39 -0700 (PDT)
Subject: [datatable-help] prediction.strength: test prediction strength with a separate test set?
Message-ID: <1459499619171-4719188.post@n4.nabble.com>

I am currently using prediction.strength{fpc} to test how well my kmeans-clustered data classifies using knn. I do that by calling this function:

prediction.strength(training, Gmax = 10, M = 5, classification = "knn", count = TRUE, nnk = 20)

Is it possible to provide a separate test set to compute the mean correct prediction (mean.pred), instead of using the training set itself? Or to compute it from some form of cross-validation?

--
View this message in context: http://r.789695.n4.nabble.com/prediction-strength-test-prediction-strength-with-an-separate-test-set-tp4719188.html
Sent from the datatable-help mailing list archive at Nabble.com.
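For what it's worth, prediction.strength() does its own internal data splitting, so as far as I can tell an external test set cannot simply be passed in. A rough, hand-rolled out-of-sample check along the same lines might look like the sketch below; the objects training and testing (numeric matrices or data frames) and the cluster count k are assumptions, and this measures agreement between two partitions rather than the prediction strength statistic itself.

library(fpc)
library(class)

k <- 4                                    # candidate number of clusters (assumed)
km_train <- kmeans(training, centers = k)

# assign held-out rows to the training clusters with knn,
# mirroring classification = "knn" in prediction.strength()
test_assign <- class::knn(train = training, test = testing,
                          cl = factor(km_train$cluster), k = 20)

# cluster the test set directly and compare the two partitions
km_test <- kmeans(testing, centers = k)
table(test_assign, km_test$cluster)

# a single agreement number (adjusted Rand index):
cluster.stats(d = dist(testing), clustering = km_test$cluster,
              alt.clustering = as.integer(test_assign))$corrected.rand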
From timvanderstap89 at gmail.com  Sat Apr  2 01:05:31 2016
From: timvanderstap89 at gmail.com (TimvdStap)
Date: Fri, 1 Apr 2016 16:05:31 -0700 (PDT)
Subject: [datatable-help] Correcting for transience in mark-recapture data with R
Message-ID: <1459551931821-4719206.post@n4.nabble.com>

Hi everyone,

I'm working on estimating the population size of Risso's dolphins in the Azores using mark-recapture data, taking transient individuals into account (i.e., I estimate a transience-corrected population size). For this, I use an R script, as used in Madon et al. (2013) (see the link to the paper at the bottom of this post). Though I have managed to get the script running, the transience-corrected population size for my final year is far *higher* than the original population size. This is obviously not supposed to happen, but I am unsure whether the transience-corrected population size for my final year is wrong, or the initial population size.

Attached are a sample of my data as a .csv file and a Word document with the R script (as it's too long to post here). I have made some minor changes to the script to fit my data. If anyone could have a look at the script and tell me where I am going wrong or what I'm missing, that would be great! I have been staring at the script for so long that I feel I'd miss even the most obvious of mistakes at the moment.

Any help is greatly appreciated!

Kind regards,

Tim

http://onlinelibrary.wiley.com/doi/10.1111/j.1748-7692.2012.00610.x/abstract?userIsAuthenticated=false&deniedAccessCustomisedMessage=

Risso_example.csv
Rscript_TvdS.docx

--
View this message in context: http://r.789695.n4.nabble.com/Correcting-for-transience-in-mark-recapture-data-with-R-tp4719206.html
Sent from the datatable-help mailing list archive at Nabble.com.

From kaheil at gmail.com  Sat Apr  2 02:05:11 2016
From: kaheil at gmail.com (Yasir Kaheil)
Date: Sat, 02 Apr 2016 00:05:11 +0000
Subject: [datatable-help] Correcting for transience in mark-recapture data with R
In-Reply-To: <1459551931821-4719206.post@n4.nabble.com>
References: <1459551931821-4719206.post@n4.nabble.com>
Message-ID:

Why don't you simply read the CSV with read.csv instead of that function at the beginning?

From timvanderstap89 at gmail.com  Sat Apr  2 01:48:21 2016
From: timvanderstap89 at gmail.com (TimvdStap)
Date: Fri, 1 Apr 2016 16:48:21 -0700 (PDT)
Subject: [datatable-help] Correcting for transience in mark-recapture data with R
In-Reply-To:
References: <1459551931821-4719206.post@n4.nabble.com>
Message-ID:

Hi Yasir,

For now, I am trying to run the script as close as possible to the way Madon and her colleagues ran it in their paper. Removing the function(data=data) wrapper did not change the results, so I have left it as it is.

Cheers,
Tim

--
View this message in context: http://r.789695.n4.nabble.com/Correcting-for-transience-in-mark-recapture-data-with-R-tp4719206p4719209.html
Sent from the datatable-help mailing list archive at Nabble.com.

From suttoncarl at ymail.com  Sat Apr  2 05:38:11 2016
From: suttoncarl at ymail.com (carlsutton)
Date: Fri, 1 Apr 2016 20:38:11 -0700 (PDT)
Subject: [datatable-help] := and ':=' throwing errors
In-Reply-To:
References: <1458539732478-4718825.post@n4.nabble.com>
Message-ID: <1459568291870-4719212.post@n4.nabble.com>

Thank you, I did not even think of wrapping in ( ) for the ':=' scenario. Feeling a bit silly about that.

Just fooling around I discovered that

dt[, "new.var" := fun(var), by = something]

does work. For some reason I had been wrapping that (go figure), i.e.

dt[, .("new.var" := fun(var)), by = something]

and that just gave error messages. Two ways to do it and I chose the wrong one both times.

Sincerely appreciate the help.

Carl Sutton

-----
Carl Sutton
--
View this message in context: http://r.789695.n4.nabble.com/and-throwing-errors-tp4718825p4719212.html
Sent from the datatable-help mailing list archive at Nabble.com.
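A minimal, self-contained illustration of the two := forms discussed above; the table, column and grouping names are made up, and mean()/sum() stand in for fun():

library(data.table)
dt <- data.table(something = c("a", "a", "b", "b"), var = 1:4)   # toy data

# quoted LHS works:
dt[, "new.var" := mean(var), by = something]

# parenthesised LHS also works, and is handy when the name is stored in a variable:
col <- "new.var2"
dt[, (col) := sum(var), by = something]

# wrapping := inside .( ) is not valid and is what produced the errors:
# dt[, .("new.var" := mean(var)), by = something]   # error

dt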
From slcox417 at gmail.com  Tue Apr  5 16:54:10 2016
From: slcox417 at gmail.com (SamLC)
Date: Tue, 5 Apr 2016 07:54:10 -0700 (PDT)
Subject: [datatable-help] Fitting interaction term in GAMM with random effect
Message-ID: <1459868050740-4719322.post@n4.nabble.com>

Hi,

I am trying to fit a model with a random effect of DeploymentID and a nested AR1 autoregressive correlation structure. For the fixed component I am fitting a smooth of tide. I have two sets of models, fitted to different data sets. For the smooth of tide, I want a separate smooth to be fitted per SiteID. In one set of models this is fine (each SiteID contains multiple DeploymentIDs). In the other, SiteID and DeploymentID are identical. I am wondering how to code this.

I am not interested in the intercept of SiteID, which is why it has previously been a random effect. I am interested in how the smooths vary between SiteIDs, which is why this is a fixed effect.

Example data structure, first data set:

SiteID  DeploymentID
1       1
1       1
1       1
1       1
1       1
1       2
1       2
1       2
1       3
2       4
2       4
2       4
2       4
2       4
2       5
2       5
2       5
3       6
3       7
3       8
etc     etc

Example data structure, second data set:

SiteID  DeploymentID
1       1
2       2
3       3
4       4

My problem is that I understand that to fit an interaction term, one must use

gamm(Y ~ s(tide, k=5, bs="cc", by=SiteID) + SiteID, knots=list(tide=c(0,1)), correlation=corAR1(form=~1|DeploymentID).....

- If I include +SiteID, then I should NOT also include DeploymentID as a random effect (for the second model, where SiteID and DeploymentID are identical - but this is OK for the first model)?
- The problem is that when I want to compare nested models, I run into issues if the smooth term is dropped, as I then have neither a random nor a smooth term in the model.

Can I code it as

gamm(Y ~ s(tide, k=5, bs="cc", by=SiteID), knots=list(tide=c(0,1)), random=list(DeploymentID=~1), correlation=corAR1(form=~1|DeploymentID).....

Any help on this is appreciated.

Cheers

--
View this message in context: http://r.789695.n4.nabble.com/Fitting-interaction-term-in-GAMM-with-random-effect-tp4719322.html
Sent from the datatable-help mailing list archive at Nabble.com.
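A sketch of the two variants being asked about, built from the formula pieces in the post; it assumes a data frame dat with columns Y, tide, SiteID and DeploymentID, and that SiteID has been converted to a factor (needed for per-level smooths via by=):

library(mgcv)   # gamm(); nlme is attached with it, providing corAR1()

dat$SiteID <- factor(dat$SiteID)

# First data set (several DeploymentIDs per SiteID): site-specific smooths and a
# SiteID fixed effect, DeploymentID as a random intercept, AR1 within deployment.
m1 <- gamm(Y ~ s(tide, k = 5, bs = "cc", by = SiteID) + SiteID,
           knots = list(tide = c(0, 1)),
           random = list(DeploymentID = ~1),
           correlation = corAR1(form = ~ 1 | DeploymentID),
           data = dat)

# Second data set (SiteID and DeploymentID identical): as noted in the post, a
# SiteID fixed effect plus a DeploymentID random intercept would be confounded,
# so the random term is dropped here.
m2 <- gamm(Y ~ s(tide, k = 5, bs = "cc", by = SiteID) + SiteID,
           knots = list(tide = c(0, 1)),
           correlation = corAR1(form = ~ 1 | DeploymentID),
           data = dat)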
From sarahhas95 at gmail.com  Wed Apr  6 09:03:56 2016
From: sarahhas95 at gmail.com (bananawhy)
Date: Wed, 6 Apr 2016 00:03:56 -0700 (PDT)
Subject: [datatable-help] Rearranging Dataframe for Stripchart
Message-ID: <1459926236544-4719351.post@n4.nabble.com>

Hello,

I'm only a beginner in R and am trying to rearrange my data set in order to plot 2 groups of numerical data in a strip chart. Currently, the table of my data looks like:

> table(dengue)
        logTiter
strain   0  2.42  4.18  4.43  4.68  5.9  6.29  6.45  8.04  9.37  10.85  11.31
  WB1    3     1     0     2     0    1     0     0     1     0      0      0
  wild   0     0     1     0     1    1     1     1     0     1      1      1

I am trying to extract the "logTiter" values for each group (WB1 and wild type) so that I can plot the data in a strip chart. Any advice?

Thanks!

--
View this message in context: http://r.789695.n4.nabble.com/Rearranging-Dataframe-for-Stripchart-tp4719351.html
Sent from the datatable-help mailing list archive at Nabble.com.
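If the underlying dengue object is a data frame with one row per sample and columns logTiter and strain (which is what the table() output suggests), no rearranging is needed - the formula interface of stripchart() groups the values directly. The column names are taken from the post:

dengue$strain <- factor(dengue$strain)

stripchart(logTiter ~ strain, data = dengue,
           method = "jitter", vertical = TRUE, pch = 19,
           xlab = "strain", ylab = "log titer")

# the same values as two separate vectors, if they are needed elsewhere:
split(dengue$logTiter, dengue$strain)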
From carlos.falcao.correia at gmail.com  Wed Apr  6 22:22:44 2016
From: carlos.falcao.correia at gmail.com (karl69)
Date: Wed, 6 Apr 2016 13:22:44 -0700 (PDT)
Subject: [datatable-help] RELSURV - difficulty interpreting RSADD and RSMUL coefficients
Message-ID: <1459974164138-4719386.post@n4.nabble.com>

Hi, all

Using the relsurv package - rsadd and rsmul - I got the following coefficients for a colorectal cancer study:

              Additive - Estève                        Multiplicative - Andersen
              Coef (SE)           Exp(coef)  p-value      Coef (SE)           Exp(coef)  p-value
sex female    -0.06957 (0.0646)   0.9328     0.28145      0.51641 (0.0559)    1.6760     <2e-16
Age 45-54      0.15426 (0.1668)   1.1668     0.35513     -0.67519 (0.1635)    0.5091     3.63e-05
Age 55-64      0.27521 (0.1535)   1.3168     0.07300     -1.28001 (0.1497)    0.2780     <2e-16
Age 65-74      0.39229 (0.1515)   1.4804     0.00963     -2.05659 (0.1279)    0.1279     <2e-16
Age 75+        0.84512 (0.1569)   2.3283     7.22e-08    -2.63199 (0.1488)    0.0719     <2e-16

I can see that the variables that are used in the life tables, like sex and age, changed sign. I noticed, too, that for other covariates, like cancer stage, the coefficients are bigger in the additive model than in the multiplicative one, but keep the same sign. I am not able to interpret these differences. Can someone help?

Thanks,
Carlos Falcao

--
View this message in context: http://r.789695.n4.nabble.com/RELSURV-difficulty-interpreting-RSADD-and-RSMUL-coefficients-tp4719386.html
Sent from the datatable-help mailing list archive at Nabble.com.

From carlos.falcao.correia at gmail.com  Wed Apr  6 22:30:11 2016
From: carlos.falcao.correia at gmail.com (karl69)
Date: Wed, 6 Apr 2016 13:30:11 -0700 (PDT)
Subject: [datatable-help] How to use FLEXSURV or RSTPM2 with relative survival?
Message-ID: <1459974610868-4719388.post@n4.nabble.com>

Hi, all

I have worked with the relsurv package without great problems, but, intending to use flexible parametric models in relative survival, I found the flexsurv and rstpm2 packages. With relsurv I use ratetables built from life-table data, but after reading the flexsurv and rstpm2 documentation I cannot work out how to incorporate life-table data into these two packages, because I found nothing similar to relsurv's ratetables. Has anyone used those packages with relative survival data? Can someone help me with how to do that?

Thanks,
Carlos Falcao

--
View this message in context: http://r.789695.n4.nabble.com/How-to-use-FLEXSURV-or-RSTPM2-with-relative-survival-tp4719388.html
Sent from the datatable-help mailing list archive at Nabble.com.

From frederik at ofb.net  Thu Apr  7 22:13:53 2016
From: frederik at ofb.net (frederik at ofb.net)
Date: Thu, 7 Apr 2016 13:13:53 -0700
Subject: [datatable-help] numeric rounding for 'order'
Message-ID: <20160407201353.GF7159@ofb.net>

Sorry, I forgot to Cc the list for this.

Arunkumar, do you have an answer? You said:

> If you've a better idea, please let us know and we would definitely be
> willing to implement that.

and I said

> My "better idea" at this point is, if speed is not an issue, then
> 'order' could use a numeric rounding of zero.

(see below)

Thank you,

Frederick

----- Forwarded message from frederik at ofb.net -----

Date: Wed, 27 Jan 2016 15:52:25 -0800
From: frederik at ofb.net
To: Arunkumar Srinivasan
Subject: Re: [datatable-help] sorting on a floating point column

Thanks Arun for your reply. The '?order' page says:

    Columns of 'numeric' types (i.e., 'double') have their last two
    bytes rounded off while computing order, by default, to avoid any
    unexpected behaviour due to limitations in representing floating
    point numbers precisely. Have a look at 'setNumericRounding' to
    learn more.

But I'm not sure what unexpected behavior this avoids. It seems like it *causes* unexpected behavior (even if I'm the first to comment in two years)... And '?setNumericRounding' says:

    Computers cannot represent some floating point numbers (such as
    0.6) precisely, using base 2. This leads to unexpected behaviour
    when joining or grouping columns of type 'numeric';

So it sounds like the cases where you benefit from numeric rounding are "joining or grouping", not sorting. My "better idea" at this point is, if speed is not an issue, then 'order' could use a numeric rounding of zero. Alternatively, I would expand upon the '?order' documentation to clarify that the reason for rounding is, for example, speed - and not the elimination of "unexpected behavior".
Thank you, Frederick On Thu, Jan 28, 2016 at 12:10:37AM +0100, Arunkumar Srinivasan wrote: > Why do you want a minimal test case, when setNumericRounding explains? > that the behavior I reported is intentional?? > Because you refer to a post that?s quite a few years old, and data.table has moved along from ?tolerance? quite some time ago. And therefore it wasn?t clear to me what the exact issue is ? whether you?re using an older version or a newer one, but you dint know that it wasn?t due to tolerance issue. > > I now see that this is also documented in the data.table::order page.? > So I guess it is already "documented visibly".? > Glad you got to read that. > > And setNumericRounding explains that it is slightly faster to ignore? > the last two bytes, requiring fewer radix sort passes.? > That?s not the reason for the function though, as it?s explained in `?setNumericRounding` with examples at the bottom of that page.? > > I wanted to share my experience that this behavior is confusing. > With floating point numbers, there?s always limitations. I find the examples under ?setNumericRounding confusing cases as well (which would return wrong results if we did not round). We try to reduce confusion by managing most obvious cases, or so we think. If you?ve a better idea, please let us know and we would definitely be willing to implement that. > --? > Arun > > On 28 January 2016 at 00:03:19, frederik at ofb.net (frederik at ofb.net) wrote: > > data.table 1.9.6 > > What's surprising is that sorting a list of floats wouldn't do the > obvious thing, and sort them exactly. Is it surprising that this would > be surprising? > > Why do you want a minimal test case, when setNumericRounding explains > that the behavior I reported is intentional? > > I now see that this is also documented in the data.table::order page. > So I guess it is already "documented visibly". > > And setNumericRounding explains that it is slightly faster to ignore > the last two bytes, requiring fewer radix sort passes. > > I wanted to share my experience that this behavior is confusing. Thank > you at least for pointing me to your documentation. > > Frederick > > On Wed, Jan 27, 2016 at 10:13:44PM +0100, Arunkumar Srinivasan wrote: > > This is following up on a thread from a couple years ago:? > > http://lists.r-forge.r-project.org/pipermail/datatable-help/2013-May/001689.html? > > Things have changed A LOT! I suggest you keep up-to-date by reading the README about bug fixes and features from the github project page:?https://github.com/Rdatatable/data.table > > > > I ran into this problem myself, it took a bit of time to debug because?it is so surprising.? > > What?s surprising? Reproducible example please. data.table package version, R version as well please.? > > Without that my best guess is for you to look at `?setNumericRounding`. > > > > --? > > Arun > > > > On 27 January 2016 at 21:40:23, frederik at ofb.net (frederik at ofb.net) wrote: > > > > This is following up on a thread from a couple years ago: > > > > http://lists.r-forge.r-project.org/pipermail/datatable-help/2013-May/001689.html > > > > I ran into this problem myself, it took a bit of time to debug because > > it is so surprising. > > > > In my case, I was using order() to sort a list of floats. > > > > I expected the result to be monotonic but it wasn't! > > > > Then I found out that the problem was due to 'order' being part of the > > data.table library. By using base::order, I was able to get correct > > behavior. 
> > > > I don't understand why improperly ordering floating point data helps > > the data.table library accomplish anything, whether it is looking up > > keys or what. > > > > Also, it must be much slower to compare floats with a tolerance, than > > to just compare them. I seem to recall that floats were designed so > > that normal comparison is quite fast. > > > > Please fix this bug, or at least document it more visibly. > > > > Thank you, > > > > Frederick Eaton > > _______________________________________________ > > datatable-help mailing list > > datatable-help at lists.r-forge.r-project.org > > https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help

----- End forwarded message -----

From timvanderstap89 at gmail.com  Fri Apr  8 21:00:41 2016
From: timvanderstap89 at gmail.com (TimvdStap)
Date: Fri, 8 Apr 2016 12:00:41 -0700 (PDT)
Subject: [datatable-help] Correcting for transience in mark-recapture data with R
In-Reply-To:
References: <1459551931821-4719206.post@n4.nabble.com>
Message-ID: <1460142041949-4719498.post@n4.nabble.com>

Hi everyone,

Small update: I think the (main) problem is with the estimation of M (= total number of marked individuals). By definition, M in year /t/ should always be equal to or greater than M in year /t-1/, and for the final year M should be the /total number of rows in the dataset/, right? However, the M in my final year is *lower* than the M in previous years, most likely because both /z/ and /r/ are 0 for the final year.

Is my reasoning correct and, if so, any suggestions on how I can correct for this?

Again, many thanks in advance!

~Tim

--
View this message in context: http://r.789695.n4.nabble.com/Correcting-for-transience-in-mark-recapture-data-with-R-tp4719206p4719498.html
Sent from the datatable-help mailing list archive at Nabble.com.

From aragorn168b at gmail.com  Mon Apr 11 12:42:08 2016
From: aragorn168b at gmail.com (Arunkumar Srinivasan)
Date: Mon, 11 Apr 2016 12:42:08 +0200
Subject: [datatable-help] numeric rounding for 'order'
In-Reply-To: <20160407201353.GF7159@ofb.net>
References: <20160407201353.GF7159@ofb.net>
Message-ID:

Hi Frederik, the reason this was implemented is to avoid issues like this (copied from ?setNumericRounding), which IIRC I pointed you to before:

DT = data.table(a=seq(0,1,by=0.2),b=1:2, key="a")
DT
setNumericRounding(0) # turn off rounding
DT[.(0.4)] # works
DT[.(0.6)] # no match, confusing since 0.6 is clearly there in DT

So while a numeric rounding of 0 solves your issue, the problem still persists in other cases (like the one shown above). Also, you seem to be suggesting to use this *only* for order(). Why? Why not setorder() or setkey()? FYI, speed is/was never really an issue and is just a (positive) side-effect.

I see two options:

1. Identify the problematic cases, if possible, and set the rounding appropriately so that we run into this issue very rarely, i.e., ad-hoc numeric rounding.

2. If that is not possible, then rounding the last two bytes really doesn't solve *most* issues w.r.t. rounding (which was its original purpose), as opposed to no rounding at all.. in which case there's no need for setNumericRounding, and we can attribute the inconsistencies to floating point representation inaccuracies.

Having had my share of experiences with floating point issues, my guess would be the latter. Perhaps better to continue on the github project page (if you could please file an issue there with a minimal example of *your* problem).

--
Arun

_______________________________________________
datatable-help mailing list
datatable-help at lists.r-forge.r-project.org
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
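For concreteness, a small base-R illustration of the representation issue behind this thread: 0.6 and 0.1 * 6 are two different doubles even though they print identically by default, and that is exactly the kind of near-tie the last-two-byte rounding is meant to absorb.

x <- 0.6
y <- 0.1 * 6
x == y                        # FALSE: the two doubles differ in their last bits
format(c(x, y), digits = 20)  # roughly "0.59999999999999997780" and "0.60000000000000008882"
base::order(c(y, x))          # 2 1: the literal 0.6 is the (very slightly) smaller value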
From thomas_g at gmx.ch  Mon Apr 11 19:04:38 2016
From: thomas_g at gmx.ch (ThomasG)
Date: Mon, 11 Apr 2016 10:04:38 -0700 (PDT)
Subject: [datatable-help] Extract residual standard error from many lms
Message-ID: <1460394278125-4719613.post@n4.nabble.com>

Hi all,

I am struggling to summarise all the residual standard errors from my list of lm models, which contains many fits. I have already posted my question, with a code example, here:

http://stackoverflow.com/questions/36490215/creating-a-customised-results-table-from-many-linear-models

It seems that the broom package should provide a solution, but I could not find it myself.

Thank you for your help!
--
View this message in context: http://r.789695.n4.nabble.com/Extract-residual-standard-error-from-many-lms-tp4719613.html
Sent from the datatable-help mailing list archive at Nabble.com.

From kaheil at gmail.com  Mon Apr 11 19:37:26 2016
From: kaheil at gmail.com (Yasir Kaheil)
Date: Mon, 11 Apr 2016 17:37:26 +0000
Subject: [datatable-help] Extract residual standard error from many lms
In-Reply-To: <1460394278125-4719613.post@n4.nabble.com>
References: <1460394278125-4719613.post@n4.nabble.com>
Message-ID:

Use summary() on each model object.
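Building on that suggestion - assuming the fits are collected in a list (as in the linked Stack Overflow question), the residual standard error can be pulled from each summary(), or gathered with broom. The models object below is a made-up example:

# a made-up list of lm fits, one per group
models <- lapply(split(mtcars, mtcars$cyl), function(d) lm(mpg ~ wt, data = d))

# residual standard error of each fit via summary()
sapply(models, function(m) summary(m)$sigma)

# or with broom: glance() returns sigma (plus r.squared, AIC, ...) for each model
library(broom)
do.call(rbind, lapply(models, glance))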
From frederik at ofb.net  Mon Apr 11 22:37:55 2016
From: frederik at ofb.net (frederik at ofb.net)
Date: Mon, 11 Apr 2016 13:37:55 -0700
Subject: [datatable-help] numeric rounding for 'order'
In-Reply-To:
References: <20160407201353.GF7159@ofb.net>
Message-ID: <20160411203755.GN7159@ofb.net>

Hi Arun,

I wrote up a github issue here:

https://github.com/Rdatatable/data.table/issues/1642

Thanks,

Frederick
From borodegadega at yahoo.com  Mon Apr 18 06:36:18 2016
From: borodegadega at yahoo.com (dkkim)
Date: Sun, 17 Apr 2016 21:36:18 -0700 (PDT)
Subject: [datatable-help] Homework Help R Beginner
In-Reply-To: <1460050446869-4719436.post@n4.nabble.com>
References: <1460050446869-4719436.post@n4.nabble.com>
Message-ID: <1460954178238-4719824.post@n4.nabble.com>

You could do the following:

1. Open your txt file using the read.table or read.csv command.
2. Open your dat file (I am not sure which function you need for that in R).
3. Combine the columns from both files into one, for example:
   MyCombineDataSet <- cbind(txt$column1, dat$column1)  # and so on for further columns
4. Save your data with:
   write.csv(MyCombineDataSet, file = "MyCombineDataset.csv")
5. The file will be in your working directory.

Hope this helps

--
View this message in context: http://r.789695.n4.nabble.com/Homework-Help-R-Beginner-tp4719436p4719824.html
Sent from the datatable-help mailing list archive at Nabble.com.

From danilo.malara83 at gmail.com  Wed Apr 20 05:21:59 2016
From: danilo.malara83 at gmail.com (Danilo83)
Date: Tue, 19 Apr 2016 20:21:59 -0700 (PDT)
Subject: [datatable-help] Multiple comparison of treatments against 2 controls
Message-ID: <1461122519725-4719918.post@n4.nabble.com>

Dear all,

I'm new to R and to statistics in general. I have some data from a time-course experiment using different concentrations of drugs. In total I have 2 controls and 4 treatments, and I would like to compare each treatment to each control. The data are in triplicate (3 independent samples), taken every 30 min up to 6 hours and at 12 and 24 hours. I need to plot the means and then run a multiple-comparison test against the controls (Dunnett). I think that the multcomp package should do what I need; however, I'm not very familiar with the syntax. Can anyone give me some reference or an example script that I can modify for my data?

Thanks,
Danilo

--
View this message in context: http://r.789695.n4.nabble.com/Multiple-camparison-treatments-gainst-2-controls-tp4719918.html
Sent from the datatable-help mailing list archive at Nabble.com.
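A minimal sketch of a Dunnett-type comparison with multcomp; the data frame, column names and factor levels below are made up, and the control of interest is set as the reference level of the treatment factor (with two controls, one simple option is to rerun the test with the other control as the reference):

library(multcomp)

# toy long-format data: one control and two drug concentrations
dat <- data.frame(
  treatment = factor(rep(c("control1", "drugA", "drugB"), each = 6)),
  response  = c(rnorm(6, 10), rnorm(6, 12), rnorm(6, 9))
)

# Dunnett contrasts compare every level against the reference level
dat$treatment <- relevel(dat$treatment, ref = "control1")

fit <- aov(response ~ treatment, data = dat)
dunnett <- glht(fit, linfct = mcp(treatment = "Dunnett"))
summary(dunnett)
confint(dunnett)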