From supipia at gmx.de Sat Dec 9 19:00:15 2017
From: supipia at gmx.de (machinelearner)
Date: Sat, 9 Dec 2017 11:00:15 -0700 (MST)
Subject: [datatable-help] Generating pseudodata as in Elements of Statistical Learning
Message-ID: <1512842415435-0.post@n4.nabble.com>

Hi dear statisticians,

I am trying to implement a simulation from the book "Elements of Statistical Learning" by Hastie et al. My problem is that I don't understand how to generate the pseudodata as they did. The book says:

/For each of N = 100 samples, we generated p standard Gaussian features X with pairwise correlation 0.2. The outcome Y was generated according to a linear model/

Y = \sum_{j=1}^p X_j*b_j + sigma*Epsilon

(Sorry, I don't know if a math mode exists here.)

/where Epsilon was generated from a standard Gaussian distribution. For each dataset, the set of coefficients b_j were also generated from a standard Gaussian distribution. We investigated p = 20, 100 and 1000. The standard deviation sigma was chosen in each case so that the signal-to-noise ratio Var[E(Y|X)]/sigma^2 equaled 2./

So far I have managed to generate the Xs, the Epsilons and the bs. I don't see how I am meant to generate Y without knowing sigma, and according to the description of sigma, I need Y to compute it.

Can someone please help me? What am I not understanding here?

Thanks and best regards!

--
Sent from: http://r.789695.n4.nabble.com/datatable-help-f2315188.html

From agribhimchaulagain at gmail.com Thu Dec 21 08:13:21 2017
From: agribhimchaulagain at gmail.com (bhimchaulagain)
Date: Thu, 21 Dec 2017 00:13:21 -0700 (MST)
Subject: [datatable-help] Count consecutive number of days with condition in R
Message-ID: <1513840401153-0.post@n4.nabble.com>

My question is about counting the days that have consecutive hours meeting certain conditions. For example, I have a data frame of hourly weather data, shown below.
I am trying to calculate the number of days within a specified period (e.g. start date 11/15/13, end date 11/19/13) on which '2m T avg (F)' is between 65 and 75 and 'RelHum avg 2m (pct)' is >= 90 for 4 or more consecutive hours. Both the temperature and the relative humidity conditions must be met for those 4 or more consecutive hours. I couldn't even start to work on this problem. I would greatly appreciate your help.

Period            2m T avg (F)   RelHum avg 2m (pct)
---------------   ------------   -------------------
11/15/13 0:00     57.91          93
11/15/13 1:00     57.93          93
11/15/13 2:00     58.8           92
11/15/13 3:00     58.99          92
11/15/13 4:00     58.79          93
11/15/13 5:00     59.56          94
11/15/13 6:00     59.82          94
11/15/13 7:00     61.39          95
11/15/13 8:00     66.56          92
11/15/13 9:00     72.93          82
11/15/13 10:00    76.79          72
11/15/13 11:00    77.82          70
11/15/13 12:00    77.99          70
11/15/13 13:00    78.69          68
11/15/13 14:00    77.66          70
11/15/13 15:00    76.94          70
11/15/13 16:00    76.53          70
11/15/13 17:00    74.7           76
11/15/13 18:00    72.96          81
11/15/13 19:00    71.63          84
11/15/13 20:00    70.79          87
11/15/13 21:00    70.33          88
11/15/13 22:00    68.49          90
11/15/13 23:00    67.86          92
11/16/13 0:00     68.81          92
11/16/13 1:00     69.3           91
11/16/13 2:00     69.07          92
11/16/13 3:00     69.35          92
11/16/13 4:00     69.33          93
11/16/13 5:00     69.3           94
11/16/13 6:00     69.04          95
11/16/13 7:00     69.08          95
11/16/13 8:00     70.73          95
11/16/13 9:00     72.86          94
11/16/13 10:00    75.15          93
11/16/13 11:00    76.09          89

I have written some code to count the number of hours in a given time period that meet the temperature and RH conditions. However, I ran into problems when trying to count the days in a specified period on which those conditions hold continuously for 4 or more hours. I used the following code to count the hours that meet the temperature and RH conditions; it does not check whether those hours are consecutive for 4 or more hours.
library(readxl)     # read_excel()
library(lubridate)  # floor_date(), hour()
library(dplyr)      # %>%

fawn <- read_excel('FAWN_report.xlsx')
fawn <- as.data.frame(fawn)

# Visualize data (if needed)
head(fawn)

fawn$days  <- floor_date(fawn$Period, 'day')
fawn$weeks <- floor_date(fawn$Period, 'week')
fawn$month <- floor_date(fawn$Period, 'month')
fawn$hours <- floor_date(fawn$Period + 1, 'hours')

# This function returns the hour count between two dates satisfying
# specific conditions. Enter start date, end date, relative
# humidity (inclusive), average temperature low and high (inclusive) if needed.
numberOfHours <- function(start_date, end_date, start_time, end_time, RH,
                          avgTempLow = NULL, avgTempHigh = NULL) {
  fawnSubset <- fawn %>%
    subset(days > start_date & days <= end_date & `RelHum avg 2m (pct)` >= RH)

  if (start_time > 24 || start_time < 0 || end_time > 24 || end_time < 0) {
    stop('Please choose time that is from 0-24')
  }

  if (start_time > end_time) {
    # Time window wraps around midnight: keep both ends of the day
    fawnTime1 <- subset(fawnSubset, hour(fawnSubset$hours) >= start_time)
    fawnTime2 <- subset(fawnSubset, hour(fawnSubset$hours) <= end_time)
    fawnSubset <- rbind(fawnTime1, fawnTime2)
  } else {
    fawnSubset <- subset(fawnSubset, hour(fawnSubset$hours) >= start_time &
                                     hour(fawnSubset$hours) <= end_time)
  }

  if (is.null(avgTempLow)) {
    timePeriod <- paste(start_date, ' to ', end_date)
    tally <- dim(fawnSubset)[1]
    answer <- cbind(timePeriod, tally)
  } else {
    fawnSubset <- fawnSubset %>%
      subset(`2m T avg (F)` >= avgTempLow & `2m T avg (F)` <= avgTempHigh)
    timePeriod <- paste(start_date, ' to ', end_date)
    tally <- dim(fawnSubset)[1]
    answer <- cbind(timePeriod, tally)
  }
  return(answer)
}

write.table(numberOfHours('2013-12-20', '2014-01-09', 0, 24, 90, 65, 75),
            file = '2014_numberOfHours_specifiedHours__RH_T_7-27.csv',
            sep = ",", col.names = TRUE, row.names = FALSE)

--
Sent from: http://r.789695.n4.nabble.com/datatable-help-f2315188.html

From raymond.fu at usaa.com Thu Dec 21 18:51:58 2017
From: raymond.fu at usaa.com (cameron)
Date: Thu, 21 Dec 2017 10:51:58 -0700 (MST)
Subject: [datatable-help] Error: evaluation nested too deeply: infinite recursion
Message-ID:
 <1513878718411-0.post@n4.nabble.com>

Hi,

I am trying to convert a data frame into a timeSeries object and I get an infinite recursion error:

Error: evaluation nested too deeply: infinite recursion / options(expressions=)?
Error during wrapup: evaluation nested too deeply: infinite recursion / options(expressions=)?

My code:

a <- data.frame(c(1:10))
rownames(a) <- seq(from = as.Date("2017-12-01"), to = as.Date("2017-12-10"), by = "day")
timeSeries(a)

I tried raising the expression limit with options(expressions = 5e5), but RStudio crashed. I am using R version 3.3.3.

Thanks for your help.

--
Sent from: http://r.789695.n4.nabble.com/datatable-help-f2315188.html
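
A possible workaround for the timeSeries question above, offered as a sketch rather than a confirmed fix. The post does not show which packages were loaded, so the cause of the recursion cannot be determined from it. What is certain is that data frame row names are stored as character strings, so the Date class is lost when the dates are assigned as row names. The sketch below assumes the timeSeries package is installed and passes the dates explicitly through the charvec argument instead of through row names. The names dates, values and ts_obj are made up for illustration.

```r
# Dates and values taken from the original post.
dates  <- seq(from = as.Date("2017-12-01"), to = as.Date("2017-12-10"), by = "day")
values <- 1:10

# Row names of a data frame are stored as character, so the Date class is dropped.
a <- data.frame(x = values)
rownames(a) <- dates
print(class(rownames(a)))

# Assumption: the timeSeries package is installed. If it is not, this part is skipped.
if (requireNamespace("timeSeries", quietly = TRUE)) {
  # Pass the time stamps explicitly via charvec rather than relying on row names.
  ts_obj <- timeSeries::timeSeries(data = values, charvec = dates, units = "x")
  print(ts_obj)
}
```

If the error persists with an explicit charvec, it may be worth checking in a fresh R session with only timeSeries attached, since the original post gives no information about other loaded packages.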