From dd.brigalla at gmail.com  Mon May  1 11:03:16 2017
From: dd.brigalla at gmail.com (dandrea)
Date: Mon, 1 May 2017 02:03:16 -0700 (PDT)
Subject: [datatable-help] Competing Risk Nomogram
Message-ID: <1493629396511-4733256.post@n4.nabble.com>

Dear R users,

I have been using STATA for all my biostatistical analyses. For my new
project I needed a competing risk nomogram and switched to R. Sadly I am
not able to produce the nomogram.

I have a dataset of bladder cancer, and I want to build the nomogram for
the prediction of progression after 2 and 5 years.

I first run a Cox regression:

>library(plyr)
>VH$SurvObj <- with(VH, Surv(TimetoProg, Progression == 1))
>res.cox1 <- coxph(SurvObj ~ ConcomitantCIS + Tumorsize + Multifocal + LVI + VH, data = VH)
>res.cox1
Call:
coxph(formula = SurvObj ~ ConcomitantCIS + Tumorsize + Multifocal + LVI + VH, data = VH)

                   coef exp(coef) se(coef)     z      p
ConcomitantCIS1  -0.115     0.891    0.276 -0.42 0.6769
Tumorsize1        0.403     1.496    0.158  2.54 0.0110
Multifocal1       0.417     1.518    0.160  2.61 0.0091
LVI1              1.196     3.306    0.172  6.94  4e-12
VH1               1.921     6.826    0.160 12.00 <2e-16

First question: why do I not get a reasonable p value for LVI and VH?

>library(rms)
> mynom <- svycox.nomogram(.design = SurvObj, .model = Surv(TimetoProg,
> Progression==1) ~ ConcomitantCIS + Tumorsize + Multifocal + LVI + VH,
> .data = VHtrainset, pred.at = 24, fun.lab = "2yr Prob")

Error: $ operator is invalid for atomic vectors

And now I am really stuck! I would really appreciate any help!

David

--
View this message in context: http://r.789695.n4.nabble.com/Competing-Risk-Nomogram-tp4733256.html
Sent from the datatable-help mailing list archive at Nabble.com.

From yarmi1224 at hotmail.com  Thu May  4 08:37:23 2017
From: yarmi1224 at hotmail.com (Eva Chiou)
Date: Wed, 3 May 2017 23:37:23 -0700 (PDT)
Subject: [datatable-help] How do I use R to build a dictionary of proper nouns?
Message-ID: <1493879843940-4733354.post@n4.nabble.com>

I want to do patent text mining in R.

I need to use the proper nouns of a domain ontology to build a dictionary,
and then use the dictionary to analyze my corpus of patent files. I want to
count the proper nouns and get the frequency with which each one appears in
each file.

I have already preprocessed the corpus and extracted the proper nouns from
the domain ontology. But I have no idea how to build a proper-noun
dictionary and use it to analyze my corpus.

The following are my texts, corpus preprocessing steps and proper nouns.

my patent text
corpus preprocesses
proper nouns from domain ontology

--
View this message in context: http://r.789695.n4.nabble.com/How-do-I-use-R-to-build-a-dictionary-of-proper-nouns-tp4733354.html

From emily1858 at gmail.com  Fri May  5 14:08:55 2017
From: emily1858 at gmail.com (eg1858)
Date: Fri, 5 May 2017 05:08:55 -0700 (PDT)
Subject: [datatable-help] Linear Regression problem
Message-ID: <1493986135744-4733405.post@n4.nabble.com>

Hello,

I was assigned a problem for a math class that involves coding in R. I have
very little experience and can't make this work.

Question: The code below produces a dataset of size n = 20 containing a
random variable X from a uniform distribution and a random variable Y from
a normal distribution. Clearly, X and Y are independently generated.

x <- runif(20, 0, 1)
y <- rnorm(20, 2, 2)

1. Generate 100 different datasets using the above code, each of size
n = 20. You get to observe only the generated datasets (and assume the
variance is unknown).

My attempt:

model <- NULL
LM <-list()
x <- runif(20,0,1)
y <- rnorm(20,2,2)
#Generate 100 different datasets with n=20
for(i in 1:100) {
  model<- lm(y~x)
  LM[[i]] <- model
  print(summary(LM[[i]])$coefficient)[2,1]
}
summary(LM[[i]])

Any help? Thanks

--
View this message in context: http://r.789695.n4.nabble.com/Linear-Regression-problem-tp4733405.html
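
In the attempt above, x and y are drawn once before the loop, so all 100 fits use the same dataset. A minimal sketch that redraws the data inside the loop instead, assuming the goal is to collect the slope estimate and its p-value from each fit. The names n_sets, slopes and pvals and the fixed seed are illustrative choices, not part of the assignment.

```r
set.seed(1)        # assumption: fixed seed so the run is reproducible
n_sets <- 100      # number of simulated datasets
n      <- 20       # observations per dataset

slopes <- numeric(n_sets)
pvals  <- numeric(n_sets)

for (i in seq_len(n_sets)) {
  # Draw a fresh dataset on every iteration
  x <- runif(n, 0, 1)
  y <- rnorm(n, 2, 2)

  fit   <- lm(y ~ x)
  coefs <- summary(fit)$coefficients

  slopes[i] <- coefs[2, 1]   # row 2 = x, column 1 = estimate
  pvals[i]  <- coefs[2, 4]   # row 2 = x, column 4 = p-value
}

mean(slopes)          # average estimated slope across the datasets
mean(pvals < 0.05)    # share of datasets where the slope looks "significant"
```

Because X and Y are generated independently, the true slope is zero, so the last line estimates how often a 5% test rejects by chance.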
From s3tochri at uni-bayreuth.de  Thu May 11 08:21:20 2017
From: s3tochri at uni-bayreuth.de (Tobic89)
Date: Wed, 10 May 2017 23:21:20 -0700 (PDT)
Subject: [datatable-help] Add specific trend to regression - Fixed Effects Regression
Message-ID: <1494483680651-4733658.post@n4.nabble.com>

Hey guys,

I am currently trying to add an object-specific trend to a fixed effects
regression. With STATA it is easy, but not with R. For the regression I am
using the plm package. Is it possible to do this with plm, or is another
package better suited?

My underlying data are panel data for different cities, so the time trend
has to be city-specific.

Hopefully you can help me.

All the best,
Tobi

--
View this message in context: http://r.789695.n4.nabble.com/Add-specific-trend-to-regression-Fixed-Effects-Regression-tp4733658.html

From s3tochri at uni-bayreuth.de  Thu May 11 11:58:06 2017
From: s3tochri at uni-bayreuth.de (Tobic89)
Date: Thu, 11 May 2017 02:58:06 -0700 (PDT)
Subject: [datatable-help] Error: invalid type (closure) for the variable 'time'
Message-ID: <1494496686944-4733663.post@n4.nabble.com>

Hey,

I am having trouble running a FE regression with the plm package. I receive
the following error:

"Error in model.frame.default(terms(formula, lhs = lhs, rhs = rhs, data =
data, : invalid type for the variable 'time' "

Do you have an idea how to fix it?

I used the formula:

--
View this message in context: http://r.789695.n4.nabble.com/Error-invalid-type-closure-for-the-variable-time-tp4733663.html
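
The formula is not shown in the message, so the cause can only be guessed. One common way to get an "invalid type (closure)" error is to name a variable in the formula that is not a column of the data: R then finds the base function time(), which is a closure. A sketch of that situation and of a city-specific linear trend, using base lm() with city dummies as a stand-in for plm. The data frame panel and the columns city, year, x, y and trend are all hypothetical.

```r
set.seed(42)

# Hypothetical panel: 3 cities observed over 10 years
panel <- expand.grid(city = c("A", "B", "C"), year = 2001:2010)
panel$x <- rnorm(nrow(panel))
panel$y <- 1 + 0.5 * panel$x + rnorm(nrow(panel))

# 'time' is not a column of 'panel', so the formula picks up the function
# time() from the search path and model.frame() rejects it.
err <- tryCatch(
  lm(y ~ x + time, data = panel),
  error = function(e) conditionMessage(e)
)
print(err)

# Storing the trend as a numeric column of the data removes the ambiguity.
panel$trend <- panel$year - min(panel$year)

# City fixed effects (dummies) plus a separate linear trend for each city.
fit <- lm(y ~ x + factor(city) + factor(city):trend, data = panel)
coef(fit)
```

The dummy-variable form gives the same slope estimates as a within transformation; whether the interaction term can be passed to plm unchanged is something to check against the plm documentation.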
From vedahung1116 at gmail.com  Fri May 12 17:16:04 2017
From: vedahung1116 at gmail.com (Veda)
Date: Fri, 12 May 2017 08:16:04 -0700 (PDT)
Subject: [datatable-help] inconsistency between loadings and coefficient in plsr
Message-ID: <1494602164703-4733715.post@n4.nabble.com>

Hello experts,

My experiment had 13 experimental variables and 1 dependent variable, and
the data were collected from 30 participants. Because the 13 experimental
variables are highly correlated with each other, I use PLS to extract
important factors from those variables to account for the dependent
variable.

This is my model:

# determine the number of components
ncomp=selectNcomp(plsr(data=input,trimRT~var1+var2+var3+....var13,5,validation='CV',scale=TRUE), "randomization",alpha=0.05)
# feed the number of components to the function to calculate loadings and coefficients
plsr(data=input,trimRT~var1+var2+var3+....var13,ncomp,validation='CV',scale=TRUE)

I have three questions regarding plsr():

1. Should I let the model know that the dependent (predicted) variable was
collected from different people? If so, how should I code the subject ID in
the function above?

2. Some variables are repeated measures and some are not. How should I code
this information in the function?

3. The loadings of the predictors in one factor did not match the
coefficients (please see attached figures). For instance, given that
variable 10 had a higher loading than the other variables in component 5, I
expected the coefficient of variable 10 to be larger in magnitude
(regardless of sign) than those of the other variables in component 5. Why
is this not the case?

Your input is appreciated. Thanks.

Best,
Veda

--
View this message in context: http://r.789695.n4.nabble.com/inconsistency-between-loadings-and-coefficient-in-plsr-tp4733715.html
From krzysztof.czauderna at coi.pl  Sat May 13 21:07:53 2017
From: krzysztof.czauderna at coi.pl (repidemiologist)
Date: Sat, 13 May 2017 12:07:53 -0700 (PDT)
Subject: [datatable-help] Select all districts within a 100 km radius of a district
Message-ID: <1494702473058-4733799.post@n4.nabble.com>

Dear R Users!

I need to limit my dataset to all districts within a 100 km radius of a
selected district (geographical area of interest). I have many variables in
the dataset, among them the geographical coordinates of the centroids, e.g.:

  Longitude Latitude
1 -61.68667 17.02444
2 -61.88722 17.10527
3 -61.79445 17.16333
4 -61.68667 17.02444
5 -61.72917 17.60861
...

Now I need to keep only those observations (districts) that lie within a
radius of 100 km around the first one (-61.68667 17.02444). How can I do
this? Can you help me?

Many thanks...

--
View this message in context: http://r.789695.n4.nabble.com/Select-all-districts-within-a-100-km-radius-of-a-district-tp4733799.html

From bioglp at gmail.com  Sat May 13 22:35:14 2017
From: bioglp at gmail.com (glaporta)
Date: Sat, 13 May 2017 13:35:14 -0700 (PDT)
Subject: [datatable-help] Select all districts within a 100 km radius of a district
In-Reply-To: <1494702473058-4733799.post@n4.nabble.com>
References: <1494702473058-4733799.post@n4.nabble.com>
Message-ID: <1494707714547-4733801.post@n4.nabble.com>

Hi,

you can add a new distance column and apply a filter to it.
I hope this helps,
Gianandrea

coord <- 'your data frame'
library(geosphere)
dist <- vector()
# distance in metres from the first district to each of the five districts
for(i in 1:5){
  dist.tmp <- (distm(coord[1,],coord[i,],fun = distHaversine))
  dist <- c(dist,dist.tmp)
}
coord$dist <- dist
# keep the districts within 100 km (100000 m) of the first one
coord[coord$dist<=100000,]

--
View this message in context: http://r.789695.n4.nabble.com/Select-all-districts-within-a-100-km-radius-of-a-district-tp4733799p4733801.html
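
The distance-column idea in the reply above can also be written without a loop. A sketch in base R, assuming the coordinate columns are named Longitude and Latitude as in the question. haversine_m is a hypothetical helper written for this example; geosphere::distHaversine computes the same great-circle distance. Keeping the districts inside the radius means filtering on dist <= 100000, since the distances are in metres.

```r
# Great-circle distance in metres (haversine formula).
# r = 6378137 m is the equatorial radius of the WGS84 ellipsoid.
haversine_m <- function(lon1, lat1, lon2, lat2, r = 6378137) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * r * asin(pmin(1, sqrt(a)))
}

# The five centroids listed in the question.
coord <- data.frame(
  Longitude = c(-61.68667, -61.88722, -61.79445, -61.68667, -61.72917),
  Latitude  = c(17.02444, 17.10527, 17.16333, 17.02444, 17.60861)
)

# Distance of every district from the first one, in one vectorized call.
coord$dist <- haversine_m(coord$Longitude[1], coord$Latitude[1],
                          coord$Longitude, coord$Latitude)

# Keep only districts within 100 km (100000 m) of the reference district.
nearby <- coord[coord$dist <= 100000, ]
nearby
```

With a larger dataset the same two lines apply unchanged, because the distance is computed for all rows at once rather than for a fixed range such as 1:5.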
From panugu at umc.edu  Thu May 18 23:05:19 2017
From: panugu at umc.edu (panugu)
Date: Thu, 18 May 2017 14:05:19 -0700 (PDT)
Subject: [datatable-help] had non-zero exit status
Message-ID: <1495141519962-4734085.post@n4.nabble.com>

I tried to install the tidyr package on R version 3.1.0 and am getting the
error message below. Please advise.

The downloaded source packages are in
        '/tmp/RtmpmKd1bj/downloaded_packages'
Updating HTML index of packages in '.Library'
Making 'packages.html' ... done
Warning messages:
1: In install.packages("tidyr") :
  installation of package 'rlang' had non-zero exit status
2: In install.packages("tidyr") :
  installation of package 'tibble' had non-zero exit status
3: In install.packages("tidyr") :
  installation of package 'tidyr' had non-zero exit status

--
View this message in context: http://r.789695.n4.nabble.com/had-non-zero-exit-status-tp4734085.html

From panugu at umc.edu  Fri May 19 14:30:11 2017
From: panugu at umc.edu (panugu)
Date: Fri, 19 May 2017 05:30:11 -0700 (PDT)
Subject: [datatable-help] had non-zero exit status
In-Reply-To: <1495141519962-4734085.post@n4.nabble.com>
References: <1495141519962-4734085.post@n4.nabble.com>
Message-ID: <1495197011540-4734109.post@n4.nabble.com>

Ran
install.packages("tidyr", dependencies=TRUE)

and am getting the error message below. Please advise. R version 3.1.0

pic -g -O2 -c splice.c -o splice.o
In file included from splice.c:2:
vector.h: In function 'namespace_rlang_sym':
vector.h:94: error: 'R_DoubleColonSymbol' undeclared (first use in this function)
vector.h:94: error: (Each undeclared identifier is reported only once
vector.h:94: error: for each function it appears in.)
make: *** [splice.o] Error 1
ERROR: compilation failed for package 'rlang'
* removing '/usr/local/lib64/R/library/rlang'
ERROR: dependency 'rlang' is not available for package 'tibble'
* removing '/usr/local/lib64/R/library/tibble'
ERROR: dependencies 'tibble', 'dplyr' are not available for package 'tidyr'
* removing '/usr/local/lib64/R/library/tidyr'

--
View this message in context: http://r.789695.n4.nabble.com/had-non-zero-exit-status-tp4734085p4734109.html

From fperickson at wisc.edu  Fri May 19 16:17:13 2017
From: fperickson at wisc.edu (Frank Erickson)
Date: Fri, 19 May 2017 10:17:13 -0400
Subject: [datatable-help] had non-zero exit status
In-Reply-To: <1495197011540-4734109.post@n4.nabble.com>
References: <1495141519962-4734085.post@n4.nabble.com>
 <1495197011540-4734109.post@n4.nabble.com>
Message-ID:

This is a mailing list for the data.table package. Have a look at other
resources: https://www.r-project.org/help.html

On Fri, May 19, 2017 at 8:30 AM, panugu wrote:

> Ran
> install.packages("tidyr", dependencies=TRUE)
>
> getting below error message. Please advise. R version 3.1.0
> pic -g -O2 -c splice.c -o splice.o
> In file included from splice.c:2:
> vector.h: In function 'namespace_rlang_sym':
> vector.h:94: error: 'R_DoubleColonSymbol' undeclared (first use in this
> function)
> vector.h:94: error: (Each undeclared identifier is reported only once
> vector.h:94: error: for each function it appears in.)
> make: *** [splice.o] Error 1
> ERROR: compilation failed for package 'rlang'
> * removing '/usr/local/lib64/R/library/rlang'
> ERROR: dependency 'rlang' is not available for package 'tibble'
> * removing '/usr/local/lib64/R/library/tibble'
> ERROR: dependencies 'tibble', 'dplyr' are not available for package 'tidyr'
> * removing '/usr/local/lib64/R/library/tidyr'
>
> --
> View this message in context: http://r.789695.n4.nabble.com/
> had-non-zero-exit-status-tp4734085p4734109.html
> Sent from the datatable-help mailing list archive at Nabble.com.
> _______________________________________________
> datatable-help mailing list
> datatable-help at lists.r-forge.r-project.org
> https://lists.r-forge.r-project.org/cgi-bin/mailman/
> listinfo/datatable-help
>

From crosspide at hotmail.com  Tue May 23 13:42:35 2017
From: crosspide at hotmail.com (agent dunham)
Date: Tue, 23 May 2017 04:42:35 -0700 (PDT)
Subject: [datatable-help] cluster - daisy - date variable
Message-ID: <1495539755260-4734302.post@n4.nabble.com>

Dear community,

I want to perform a cluster analysis. My variables are of mixed types, and
one of them is a date. I've thought of using the Gower distance and the
function daisy to compute the dissimilarities.

What should I specify for the date variable in the type argument of daisy?

Thanks in advance,

--
View this message in context: http://r.789695.n4.nabble.com/cluster-daisy-date-variable-tp4734302.html

From super_jak1985 at gmx.de  Thu May 25 16:35:51 2017
From: super_jak1985 at gmx.de (Fabian Werner)
Date: Thu, 25 May 2017 16:35:51 +0200
Subject: [datatable-help] data.table global and local scope mixture
Message-ID:

An HTML attachment was scrubbed...
URL:

From crosspide at hotmail.com  Wed May 31 15:15:55 2017
From: crosspide at hotmail.com (agent dunham)
Date: Wed, 31 May 2017 06:15:55 -0700 (PDT)
Subject: [datatable-help] kproto - clustMixType - optimal number of clusters
Message-ID: <1496236555945-4735538.post@n4.nabble.com>

Dear community,

I have a dataset of 430000 rows and 6 columns (1 continuous, 4 nominal,
1 ordinal). I'm trying to cluster these data via kproto.

How can I estimate the optimal number of clusters? I haven't found anything
in clustMixType. Is there anything in any other package?

Thanks in advance,

--
View this message in context: http://r.789695.n4.nabble.com/kproto-clustMixType-optimal-number-of-clusters-tp4735538.html