From dd.brigalla at gmail.com  Mon May  1 11:03:16 2017
From: dd.brigalla at gmail.com (dandrea)
Date: Mon, 1 May 2017 02:03:16 -0700 (PDT)
Subject: [datatable-help] Competing Risk Nomogram
Message-ID: <1493629396511-4733256.post@n4.nabble.com>

Dear R users,
I have been using Stata for all my biostatistical analyses. For my new
project I needed a competing risk nomogram and switched to R. Sadly, I am not
able to produce the nomogram.
I have a bladder cancer dataset, and I want to build a nomogram for
predicting progression after 2 and 5 years. I first ran a Cox regression:

> library(plyr)
> library(survival)  # provides Surv() and coxph()
> VH$SurvObj <- with(VH, Surv(TimetoProg, Progression == 1))
> res.cox1 <- coxph(SurvObj ~ ConcomitantCIS + Tumorsize + Multifocal + LVI +
+     VH, data = VH)
> res.cox1
Call:
coxph(formula = SurvObj ~ ConcomitantCIS + Tumorsize + Multifocal + 
    LVI + VH, data = VH)

                  coef exp(coef) se(coef)     z      p
ConcomitantCIS1 -0.115     0.891    0.276 -0.42 0.6769
Tumorsize1       0.403     1.496    0.158  2.54 0.0110
Multifocal1      0.417     1.518    0.160  2.61 0.0091
LVI1             1.196     3.306    0.172  6.94  4e-12
VH1              1.921     6.826    0.160 12.00 <2e-16

First question: why do I not get a reasonable p value for LVI and VH?
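
A note on reading that table, offered as a sketch rather than a full answer: the last column already holds ordinary p-values. 4e-12 is scientific notation for a very small number, and <2e-16 means "smaller than the smallest value R prints by default". The base-R lines below illustrate both. The value 1e-20 is a made-up stand-in for a tiny p-value, and it is an assumption here that the Cox table is printed through format.pval().

```r
# p-value for LVI, copied from the coxph output above
p_lvi <- 4e-12

# the same number written out in fixed notation
p_lvi_fixed <- format(p_lvi, scientific = FALSE)
print(p_lvi_fixed)

# format.pval() reports anything below 'eps' as "<eps";
# its default eps is .Machine$double.eps (about 2.2e-16)
p_tiny <- 1e-20  # hypothetical value, for illustration only
p_tiny_label <- format.pval(p_tiny, digits = 1)
print(p_tiny_label)
```

Both LVI and VH are therefore highly significant in this fit; the display is not an error.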

>library(rms)
> mynom <- svycox.nomogram(.design = SurvObj, .model = Surv(TimetoProg,
> Progression==1) ~ ConcomitantCIS + Tumorsize + Multifocal + LVI + VH,
> .data = VHtrainset, pred.at = 24, fun.lab = "2yr Prob")
Error: $ operator is invalid for atomic vectors

and now I am really stuck!
I would really appreciate any help!

David


--
View this message in context: http://r.789695.n4.nabble.com/Competing-Risk-Nomogram-tp4733256.html
Sent from the datatable-help mailing list archive at Nabble.com.

From yarmi1224 at hotmail.com  Thu May  4 08:37:23 2017
From: yarmi1224 at hotmail.com (Eva Chiou)
Date: Wed, 3 May 2017 23:37:23 -0700 (PDT)
Subject: [datatable-help] How do I use R to build a dictionary of proper
	nouns?
Message-ID: <1493879843940-4733354.post@n4.nabble.com>

I want to do text mining on patents in R.
I need to build a dictionary from the proper nouns of a domain ontology,
then use the dictionary to analyze my corpus of patent files.
I want to count the proper nouns and get the frequency with which each one
appears in each file.

So far I have preprocessed the corpus and extracted the proper nouns
from the domain ontology.
But I have no idea how to build a proper-noun dictionary and use the
dictionary to analyze my corpus.

Below are my texts, corpus preprocessing steps and proper nouns.

my patent text
<http://r.789695.n4.nabble.com/file/n4733354/1.png> 

corpus preprocesses
<http://r.789695.n4.nabble.com/file/n4733354/2.png> 

proper nouns from domain ontology
<http://r.789695.n4.nabble.com/file/n4733354/3ontology_proper_nouns_keywords.png> 
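
A minimal base-R sketch of the counting step described above, under stated assumptions: the two documents and the three dictionary terms are invented for illustration (the real texts are only available as images), and count_term() is a hypothetical helper, not a function from any text-mining package.

```r
# hypothetical preprocessed documents (lower-cased, punctuation removed)
docs <- c(doc1 = "the lithium battery uses a lithium anode",
          doc2 = "a solar cell and a battery pack")

# hypothetical dictionary of domain terms; multi-word terms are allowed
dictionary <- c("lithium", "battery", "solar cell")

# count non-overlapping literal occurrences of one term in one text
# (fixed = TRUE matches substrings, so "battery" also matches "batteryless")
count_term <- function(text, term) {
  hits <- gregexpr(term, text, fixed = TRUE)[[1]]
  if (hits[1] == -1) 0L else length(hits)
}

# rows = documents, columns = dictionary terms
freq <- sapply(dictionary, function(term) {
  sapply(docs, count_term, term = term)
})
print(freq)
```

The result is a document-by-term frequency matrix restricted to the dictionary, which is the shape most downstream analyses expect.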


--
View this message in context: http://r.789695.n4.nabble.com/How-do-I-use-R-to-build-a-dictionary-of-proper-nouns-tp4733354.html

From emily1858 at gmail.com  Fri May  5 14:08:55 2017
From: emily1858 at gmail.com (eg1858)
Date: Fri, 5 May 2017 05:08:55 -0700 (PDT)
Subject: [datatable-help] Linear Regression problem
Message-ID: <1493986135744-4733405.post@n4.nabble.com>

Hello, 

I was assigned a problem for a math class that involves coding in R. I have
very little experience and can't make this work.


Question:
The code below produces a dataset of size n = 20 containing a random
variable X from a uniform distribution and a random variable Y from a
normal distribution. Clearly, X and Y are independently generated.
x <- runif(20, 0, 1)
y <- rnorm(20, 2, 2)
1. Generate 100 different datasets using the above code, each of size n = 20.
You get to observe only the generated datasets (and assume variance is
unknown).


My attempt

model <- NULL
LM <-list()
	x <- runif(20,0,1)
	y <- rnorm(20,2,2)

#Generate 100 different datasets with n=20
for(i in 1:100) {

 	model<- lm(y~x)
 	LM[[i]] <- model
 	print(summary(LM[[i]])$coefficient)[2,1]
 	
}

summary(LM[[i]])
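
One possible restructuring of the attempt above, as a sketch only. It differs in two ways: x and y are drawn inside the loop, so each pass fits a new dataset instead of refitting the same one 100 times, and the [2, 1] index is applied to the coefficient table rather than to the value returned by print(). The names n_datasets, slopes and models are illustrative, and set.seed() is there only to make the sketch reproducible.

```r
set.seed(42)  # reproducibility of this sketch only; not part of the assignment

n_datasets <- 100
slopes <- numeric(n_datasets)           # estimated slope from each dataset
models <- vector("list", n_datasets)    # fitted lm objects

for (i in seq_len(n_datasets)) {
  # draw a NEW dataset of size 20 on every pass
  x <- runif(20, 0, 1)
  y <- rnorm(20, 2, 2)

  models[[i]] <- lm(y ~ x)

  # row 2, column 1 of the coefficient table is the slope estimate
  slopes[i] <- summary(models[[i]])$coefficients[2, 1]
}

# the slopes scatter around 0 because X and Y are independent
summary(slopes)
```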


Any help?
Thanks


--
View this message in context: http://r.789695.n4.nabble.com/Linear-Regression-problem-tp4733405.html

From s3tochri at uni-bayreuth.de  Thu May 11 08:21:20 2017
From: s3tochri at uni-bayreuth.de (Tobic89)
Date: Wed, 10 May 2017 23:21:20 -0700 (PDT)
Subject: [datatable-help] Add specific trend to regression - Fixed Effects
	Regression
Message-ID: <1494483680651-4733658.post@n4.nabble.com>

Hey guys,

I am currently trying to add an object-specific trend to a fixed effects
regression. With Stata it is easy, but not with R. For the regression I am
using the plm package.
Is this possible with plm, or would another package be better?

My underlying data are panel data for different cities, so the time trend
has to be city-specific.
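
A sketch of the idea in base R, not a plm recipe: with city dummies in an ordinary lm() fit (the least-squares dummy-variable form of the fixed effects estimator), a city-specific linear trend is the interaction of the city factor with a time variable. The panel below is simulated, and every name in it (panel, city, t, x, y) is hypothetical. Whether the same interaction term behaves identically inside a plm formula is not verified here.

```r
set.seed(1)  # reproducibility of the simulated data only

# hypothetical panel: 3 cities observed over 10 years
panel <- expand.grid(city = c("A", "B", "C"), year = 2001:2010,
                     stringsAsFactors = FALSE)
panel$t <- panel$year - 2000      # trend variable 1, 2, ..., 10
panel$x <- rnorm(nrow(panel))     # one ordinary regressor

true_level <- c(A = 1, B = 3, C = 5)          # city fixed effects
true_trend <- c(A = 0.5, B = -0.2, C = 1.0)   # city-specific trend slopes
panel$y <- unname(true_level[panel$city] + true_trend[panel$city] * panel$t) +
  2 * panel$x + rnorm(nrow(panel), sd = 0.1)

# factor(city) gives the fixed effects; factor(city):t gives one slope per city
fit <- lm(y ~ x + factor(city) + factor(city):t, data = panel)

trend_hat <- coef(fit)[grep(":t$", names(coef(fit)))]
print(round(trend_hat, 2))
```

Because no common t term is included, the three interaction coefficients are the city slopes themselves rather than deviations from a reference city.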

Hopefully you can help me.

All the best,

Tobi


--
View this message in context: http://r.789695.n4.nabble.com/Add-specific-trend-to-regression-Fixed-Effects-Regression-tp4733658.html

From s3tochri at uni-bayreuth.de  Thu May 11 11:58:06 2017
From: s3tochri at uni-bayreuth.de (Tobic89)
Date: Thu, 11 May 2017 02:58:06 -0700 (PDT)
Subject: [datatable-help] Error: invalid type (closure) for the variable
	'time'
Message-ID: <1494496686944-4733663.post@n4.nabble.com>

Hey,

I am having trouble running a FE regression with the plm package. I receive
the following error:
"Error in model.frame.default(terms(formula, lhs = lhs, rhs = rhs, data =
data,  : invalid type for the variable 'time' "

Do you have an idea how to fix it?
I used the formula:
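
The formula itself is missing from the archived message, so the following is only a sketch of the most common cause of this error, not a diagnosis of the original call. If a formula mentions time but the data frame has no column with exactly that name, R falls back to the function time() from the stats package, and a function (a "closure") is not a valid model variable. The data frame d and its columns are hypothetical, and lm() stands in for plm() because both build their model frame the same basic way.

```r
# hypothetical data; note there is NO column called 'time'
d <- data.frame(y = c(1.2, 2.3, 3.1, 4.8), period = 1:4)

# 'time' is not found in d, so R picks up the function stats::time instead
msg <- tryCatch(
  {
    lm(y ~ time, data = d)
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(msg)

# using the name the column actually has avoids the problem
fit <- lm(y ~ period, data = d)
print(coef(fit))
```

Checking names() of the data frame for spelling and capitalisation of the time column is therefore a reasonable first step.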


--
View this message in context: http://r.789695.n4.nabble.com/Error-invalid-type-closure-for-the-variable-time-tp4733663.html

From vedahung1116 at gmail.com  Fri May 12 17:16:04 2017
From: vedahung1116 at gmail.com (Veda)
Date: Fri, 12 May 2017 08:16:04 -0700 (PDT)
Subject: [datatable-help] inconsistency between loadings and coefficient in
	plsr
Message-ID: <1494602164703-4733715.post@n4.nabble.com>

Hello experts,

My experiment had 13 experimental variables and 1 dependent variable and the
data were collected from 30 participants. Because the 13 experimental
variables are highly correlated with each other, I use PLS to extract
important factors from those variables to account for the dependent
variable. 
This is my model:
# determine the number of components
ncomp=selectNcomp(plsr(data=input,trimRT~var1+var2+var3+....var13,5,validation='CV',scale=TRUE),
"randomization",alpha=0.05)

# feed the number of components to the function to calculate loadings and
# coefficients
plsr(data=input,trimRT~var1+var2+var3+....var13,ncomp,validation='CV',scale=TRUE)

I have three questions regarding plsr():
1. Should I let the model know that the dependent (predicted) variable was
collected from different people? If so, how should I code the subject ID
in the function above?
2. Some variables are repeated measures and some are not. How should I code
this information in the function?
3. The loadings of the predictors on a given component did not match the
coefficients (please see attached figures). For instance, given
that variable 10 had a higher loading than the other variables in component 5,
I expected the coefficient of variable 10 to be larger in
magnitude (regardless of sign) than those of the other variables in component 5.
Why is this not the case?

<http://r.789695.n4.nabble.com/file/n4733715/1.png> 
<http://r.789695.n4.nabble.com/file/n4733715/2.png> 

Your inputs are appreciated. Thanks.

Best,
Veda


--
View this message in context: http://r.789695.n4.nabble.com/inconsistency-between-loadings-and-coefficient-in-plsr-tp4733715.html

From krzysztof.czauderna at coi.pl  Sat May 13 21:07:53 2017
From: krzysztof.czauderna at coi.pl (repidemiologist)
Date: Sat, 13 May 2017 12:07:53 -0700 (PDT)
Subject: [datatable-help] Select all districts within a 100 km radius of a
	district
Message-ID: <1494702473058-4733799.post@n4.nabble.com>

Dear R Users! I need to limit my dataset to all districts within a 100 km
radius of a selected district (geographical area of interest).

I have many variables in the dataset and among them geographical coordinates
of centroids, e.g.:
Longitude Latitude
1  -61.68667 17.02444
2  -61.88722 17.10527
3  -61.79445 17.16333
4  -61.68667 17.02444
5  -61.72917 17.60861
...


Now, I need to keep only those observations (districts) that lie within
a radius of 100 km around the first one (-61.68667 17.02444). How can I do
this? Can you help me?

Many thanks... 


--
View this message in context: http://r.789695.n4.nabble.com/Select-all-districts-within-a-100-km-radius-of-a-district-tp4733799.html

From bioglp at gmail.com  Sat May 13 22:35:14 2017
From: bioglp at gmail.com (glaporta)
Date: Sat, 13 May 2017 13:35:14 -0700 (PDT)
Subject: [datatable-help] Select all districts within a 100 km radius of
	a district
In-Reply-To: <1494702473058-4733799.post@n4.nabble.com>
References: <1494702473058-4733799.post@n4.nabble.com>
Message-ID: <1494707714547-4733801.post@n4.nabble.com>

Hi, you can add a new distance column and apply a filter to it.
I hope this helps,
Gianandrea

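# coord is your data frame; it must contain the columns Longitude and Latitude

library(geosphere)
lonlat <- c("Longitude", "Latitude")
dist <- vector()
for(i in seq_len(nrow(coord))){
  # Haversine distance in metres from the first district to district i
  dist.tmp <- distm(coord[1, lonlat], coord[i, lonlat], fun = distHaversine)
  dist <- c(dist, dist.tmp)
}
coord$dist <- dist
# keep the districts within 100 km (100000 m) of the first one
coord[coord$dist <= 100000, ]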
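
For readers without the geosphere package, here is a self-contained sketch of the same distance-column-plus-filter idea in base R. haversine_m() is a hypothetical helper written for this illustration, not a geosphere function. It uses an Earth radius of 6378137 m, and the five rows are the example coordinates from the question.

```r
# great-circle distance in metres (haversine formula), vectorised over
# the second point; r is the assumed Earth radius in metres
haversine_m <- function(lon1, lat1, lon2, lat2, r = 6378137) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * r * asin(pmin(1, sqrt(a)))
}

# the example centroids from the question
coord <- data.frame(
  Longitude = c(-61.68667, -61.88722, -61.79445, -61.68667, -61.72917),
  Latitude  = c(17.02444, 17.10527, 17.16333, 17.02444, 17.60861)
)

# distance of every district from the first one, without a loop
coord$dist <- haversine_m(coord$Longitude[1], coord$Latitude[1],
                          coord$Longitude, coord$Latitude)

# keep only the districts at most 100 km (100000 m) away
nearby <- coord[coord$dist <= 100000, ]
print(nearby)
```

Since the helper is vectorised, the whole distance column is computed in one call, which matters for a dataset with many districts.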


--
View this message in context: http://r.789695.n4.nabble.com/Select-all-districts-within-a-100-km-radius-of-a-district-tp4733799p4733801.html

From panugu at umc.edu  Thu May 18 23:05:19 2017
From: panugu at umc.edu (panugu)
Date: Thu, 18 May 2017 14:05:19 -0700 (PDT)
Subject: [datatable-help] had non-zero exit status
Message-ID: <1495141519962-4734085.post@n4.nabble.com>

I tried to install the tidyr package on R version 3.1.0 and am getting the
error message below. Please advise.

The downloaded source packages are in
        '/tmp/RtmpmKd1bj/downloaded_packages'
Updating HTML index of packages in '.Library'
Making 'packages.html' ... done
Warning messages:
1: In install.packages("tidyr") :
  installation of package 'rlang' had non-zero exit status
2: In install.packages("tidyr") :
  installation of package 'tibble' had non-zero exit status
3: In install.packages("tidyr") :
  installation of package 'tidyr' had non-zero exit status


--
View this message in context: http://r.789695.n4.nabble.com/had-non-zero-exit-status-tp4734085.html

From panugu at umc.edu  Fri May 19 14:30:11 2017
From: panugu at umc.edu (panugu)
Date: Fri, 19 May 2017 05:30:11 -0700 (PDT)
Subject: [datatable-help] had non-zero exit status
In-Reply-To: <1495141519962-4734085.post@n4.nabble.com>
References: <1495141519962-4734085.post@n4.nabble.com>
Message-ID: <1495197011540-4734109.post@n4.nabble.com>

I ran
install.packages("tidyr", dependencies=TRUE)

on R version 3.1.0 and am getting the error message below. Please advise.
pic  -g -O2  -c splice.c -o splice.o
In file included from splice.c:2:
vector.h: In function 'namespace_rlang_sym':
vector.h:94: error: 'R_DoubleColonSymbol' undeclared (first use in this
function)
vector.h:94: error: (Each undeclared identifier is reported only once
vector.h:94: error: for each function it appears in.)
make: *** [splice.o] Error 1
ERROR: compilation failed for package 'rlang'
* removing '/usr/local/lib64/R/library/rlang'
ERROR: dependency 'rlang' is not available for package 'tibble'
* removing '/usr/local/lib64/R/library/tibble'
ERROR: dependencies 'tibble', 'dplyr' are not available for package 'tidyr'
* removing '/usr/local/lib64/R/library/tidyr'


--
View this message in context: http://r.789695.n4.nabble.com/had-non-zero-exit-status-tp4734085p4734109.html

From fperickson at wisc.edu  Fri May 19 16:17:13 2017
From: fperickson at wisc.edu (Frank Erickson)
Date: Fri, 19 May 2017 10:17:13 -0400
Subject: [datatable-help] had non-zero exit status
In-Reply-To: <1495197011540-4734109.post@n4.nabble.com>
References: <1495141519962-4734085.post@n4.nabble.com>
 <1495197011540-4734109.post@n4.nabble.com>
Message-ID: <CAJd-hd=mdgVAhOFyQGeckEx_gV=NKJhgKV3YV_bEX_S4dbEy5w@mail.gmail.com>

This is a mailing list for the data.table package. Have a look at other
resources: https://www.r-project.org/help.html

On Fri, May 19, 2017 at 8:30 AM, panugu <panugu at umc.edu> wrote:

> Ran
> install.packages("tidyr", dependencies=TRUE)
>
> getting below error message. Please advise. R version 3.1.0
> pic  -g -O2  -c splice.c -o splice.o
> In file included from splice.c:2:
> vector.h: In function 'namespace_rlang_sym':
> vector.h:94: error: 'R_DoubleColonSymbol' undeclared (first use in this
> function)
> vector.h:94: error: (Each undeclared identifier is reported only once
> vector.h:94: error: for each function it appears in.)
> make: *** [splice.o] Error 1
> ERROR: compilation failed for package 'rlang'
> * removing '/usr/local/lib64/R/library/rlang'
> ERROR: dependency 'rlang' is not available for package 'tibble'
> * removing '/usr/local/lib64/R/library/tibble'
> ERROR: dependencies 'tibble', 'dplyr' are not available for package 'tidyr'
> * removing '/usr/local/lib64/R/library/tidyr'

From crosspide at hotmail.com  Tue May 23 13:42:35 2017
From: crosspide at hotmail.com (agent dunham)
Date: Tue, 23 May 2017 04:42:35 -0700 (PDT)
Subject: [datatable-help] cluster - daisy - date variable
Message-ID: <1495539755260-4734302.post@n4.nabble.com>

Dear community, 

I want to perform a cluster analysis.

My variables are of mixed types, and one of them is a date.

I have thought of using the Gower distance, with the function daisy to
compute the dissimilarity matrix.

What should I specify for the date variable in the type argument of daisy?
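
One possible approach, offered as an assumption rather than a documented way of passing dates to daisy(): convert the date column to a number (days since 1970-01-01) so that it is treated as an ordinary interval-scaled variable, in which case it needs no entry in the type argument at all. The data frame df and its columns are invented for illustration.

```r
library(cluster)  # provides daisy()

# hypothetical mixed data: one date column and one nominal column
df <- data.frame(
  d = as.Date(c("2017-01-01", "2017-01-11", "2017-01-21")),
  g = factor(c("a", "b", "a"))
)

# days since 1970-01-01; differences between rows are differences in days
df$d_num <- as.numeric(df$d)

# Gower dissimilarity on the numeric date and the factor
diss <- daisy(df[, c("d_num", "g")], metric = "gower")
print(diss)
```

With Gower's coefficient the numeric date is scaled by its range, so the result does not depend on the origin used for the conversion.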

Thanks in advance, 

 
--
View this message in context: http://r.789695.n4.nabble.com/cluster-daisy-date-variable-tp4734302.html

From super_jak1985 at gmx.de  Thu May 25 16:35:51 2017
From: super_jak1985 at gmx.de (Fabian Werner)
Date: Thu, 25 May 2017 16:35:51 +0200
Subject: [datatable-help] data.table global and local scope mixture
Message-ID: <trinity-b8a70bbb-7761-458b-b2eb-1fba9d47d84b-1495722951669@3capp-gmx-bs07>

An HTML attachment was scrubbed...
URL: <http://lists.r-forge.r-project.org/pipermail/datatable-help/attachments/20170525/121d643c/attachment.html>

From crosspide at hotmail.com  Wed May 31 15:15:55 2017
From: crosspide at hotmail.com (agent dunham)
Date: Wed, 31 May 2017 06:15:55 -0700 (PDT)
Subject: [datatable-help] kproto - clustMixType - optimal number of clusters
Message-ID: <1496236555945-4735538.post@n4.nabble.com>

Dear community, 

I've a dataset of 430000 rows, and 6 columns (1 continuous, 4 nominal, 1
ordinal).

I'm trying to cluster this data via kproto. 

How can I estimate the optimal number of clusters?
I haven't found anything for this in clustMixType. Is there anything in any
other package?

Thanks in advance, 


--
View this message in context: http://r.789695.n4.nabble.com/kproto-clustMixType-optimal-number-of-clusters-tp4735538.html