[datatable-help] Generating pseudodata as in Elements of Statistical Learning

machinelearner supipia at gmx.de
Sat Dec 9 19:00:15 CET 2017


Hi dear statisticians,

I am trying to implement a simulation from the book "Elements of Statistical
Learning" by Hastie et al.
My problem is that I don't understand how to generate the pseudodata as they
did.
The book says: /For each of N = 100 samples, we generated p standard Gaussian
features X with pairwise correlation 0.2. The outcome Y was generated
according to a linear model/  Y = \sum_{j=1}^p X_j*b_j + sigma*Epsilon,
(Sorry, don't know if a math mode exists here?)
/where Epsilon was generated from a standard Gaussian distribution. For
each dataset, the set of coefficients b_j were also generated from a
standard Gaussian distribution. We investigated p = 20, 100 and 1000. The
standard deviation sigma was chosen in each case so that the
signal-to-noise ratio Var[E(Y|X)]/sigma² equaled 2./

So far I have managed to generate the Xs, the Epsilons and the bs.
I don't see how I am meant to generate Y without knowing sigma, yet according
to the description of sigma, I seem to need Y to compute it.
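
One possible reading, sketched here under stated assumptions and not taken
from the book: if the coefficients b are treated as fixed once drawn, then
E(Y|X) = \sum_j X_j*b_j, so Var[E(Y|X)] = b' Sigma b, where Sigma is the
feature correlation matrix (1 on the diagonal, 0.2 elsewhere). That quantity
depends only on b and the known correlation, not on Y, so sigma could be
computed before Y is generated. The function name simulate_dataset, its
parameters, and the shared-factor trick for producing correlated features are
illustrative choices of mine, written in plain Python with no extra packages.

```python
import math
import random


def simulate_dataset(n=100, p=20, rho=0.2, snr=2.0, seed=0):
    """Return (X, y, b, sigma) for one simulated dataset (hypothetical helper)."""
    rng = random.Random(seed)

    # Coefficients b_j ~ N(0, 1), drawn once per dataset.
    b = [rng.gauss(0.0, 1.0) for _ in range(p)]

    # Assumption: b is treated as fixed. Then Var[E(Y|X)] = b' Sigma b with
    # Sigma = (1 - rho) * I + rho * (matrix of ones), which simplifies to
    # (1 - rho) * sum(b_j^2) + rho * (sum(b_j))^2.
    signal_var = (1.0 - rho) * sum(bj * bj for bj in b) + rho * sum(b) ** 2

    # Solve signal_var / sigma^2 = snr for sigma; Y is not needed for this.
    sigma = math.sqrt(signal_var / snr)

    X, y = [], []
    for _ in range(n):
        # Shared-factor construction: every X_j has variance 1 and any two
        # distinct features have correlation rho.
        shared = rng.gauss(0.0, 1.0)
        row = [
            math.sqrt(rho) * shared + math.sqrt(1.0 - rho) * rng.gauss(0.0, 1.0)
            for _ in range(p)
        ]
        signal = sum(xj * bj for xj, bj in zip(row, b))
        X.append(row)
        y.append(signal + sigma * rng.gauss(0.0, 1.0))
    return X, y, b, sigma
```

Calling simulate_dataset(p=20), simulate_dataset(p=100) and
simulate_dataset(p=1000) would cover the three cases from the quote. If the
book instead means the variance taken over random b as well, the formula for
signal_var would need to change.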

Can someone please help me? What am I not understanding here?
Thanks and best regards!
 
--
Sent from: http://r.789695.n4.nabble.com/datatable-help-f2315188.html

