From marcis.bratka at gmail.com Fri Oct 23 21:46:37 2015
From: marcis.bratka at gmail.com (Mārcis Bratka)
Date: Fri, 23 Oct 2015 22:46:37 +0300
Subject: [MonetDB.R] Non ascii characters

Hi,

Does MonetDB.R support non-ASCII characters? I get an error when running the following lines:

    testDf1 <- data.frame(test = '??????')
    testDf2 <- data.frame(test = '?')

    dbWriteTable(conn, "testDf1", testDf1)
    dbWriteTable(conn, "testDf2", testDf2)

    Error in .local(conn, statement, ...) :
      Unable to execute statement 'INSERT INTO testDf2 VALUES ('?')'.
    Server says '!invalid start of UTF-8 sequence'.

    R version 3.2.2 (2015-08-14)
    Platform: x86_64-w64-mingw32/x64 (64-bit)
    Running under: Windows 7 x64 (build 7601) Service Pack 1

    locale:
    [1] LC_COLLATE=Latvian_Latvia.1257  LC_CTYPE=Latvian_Latvia.1257  LC_MONETARY=Latvian_Latvia.1257
    [4] LC_NUMERIC=C                    LC_TIME=Latvian_Latvia.1257

    attached base packages:
    [1] stats  graphics  grDevices  utils  datasets  methods  base

    other attached packages:
    [1] readr_0.2.1  dplyr_0.4.3  MonetDB.R_0.9.7  digest_0.6.8  DBI_0.3.1

    loaded via a namespace (and not attached):
    [1] lazyeval_0.1.10  magrittr_1.5  R6_2.1.1  assertthat_0.1  parallel_3.2.2  tools_3.2.2
    [7] Rcpp_0.12.1

Thanks
Marcis

From ajdamico at gmail.com Sat Oct 24 09:50:10 2015
From: ajdamico at gmail.com (Anthony Damico)
Date: Sat, 24 Oct 2015 03:50:10 -0400
Subject: [MonetDB.R] Non ascii characters

I believe the problem is that the '?' character is in a Latin encoding and MonetDB needs UTF-8 encoding?
_______________________________________________
Monetr-users mailing list
Monetr-users at lists.r-forge.r-project.org
http://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/monetr-users

From ajdamico at gmail.com Sat Oct 24 12:35:57 2015
From: ajdamico at gmail.com (Anthony Damico)
Date: Sat, 24 Oct 2015 06:35:57 -0400
Subject: [MonetDB.R] Non ascii characters

This looks like a bug to me. Thanks for reporting, Marcis! Note that your testDf1 does not throw an error for me; only testDf2 does.

Hannes, I am using MonetDB.R 1.0.0. Here is the script, and below it the output. Note that it creates the table even though it returned an error, which also should not happen. What do you think?

    library(MonetDB.R)
    sessionInfo()
    conn <- dbConnect(MonetDB.R(), "monetdb://localhost/demo")
    testDf2 <- data.frame(test = '?', stringsAsFactors = FALSE)
    Encoding(testDf2$test) <- 'UTF-8'

    # this throws an error
    dbWriteTable(conn, "testdf2", testDf2)

    # AND the table is now in the database! which should not happen
    dbListTables( conn )

Output:

    > library(MonetDB.R)
    > sessionInfo()
    R version 3.2.2 (2015-08-14)
    Platform: i386-w64-mingw32/i386 (32-bit)
    Running under: Windows 8 x64 (build 9200)

    locale:
    [1] LC_COLLATE=English_United States.1252   LC_CTYPE=English_United States.1252
    [3] LC_MONETARY=English_United States.1252  LC_NUMERIC=C
    [5] LC_TIME=English_United States.1252

    attached base packages:
    [1] stats  graphics  grDevices  utils  datasets  methods  base

    other attached packages:
    [1] MonetDB.R_1.0.0  DBI_0.3.1

    loaded via a namespace (and not attached):
    [1] tools_3.2.2  codetools_0.2-14  digest_0.6.8
    > conn <- dbConnect(MonetDB.R(), "monetdb://localhost/demo")
    > testDf2 <- data.frame(test = '?', stringsAsFactors = FALSE)
    > Encoding(testDf2$test) <- 'UTF-8'
    >
    > # this throws an error
    > dbWriteTable(conn, "testdf2", testDf2)
    Error in if (chr == "\n") f <- qs <- paste0(qs, "\\", "n") :
      missing value where TRUE/FALSE needed
    In addition: Warning message:
    In strsplit(str, "", fixed = TRUE) : input string 1 is invalid UTF-8
    >
    >
    > # AND the table is now in the database! which should not happen
    > dbListTables( conn )
    [1] "testdf2"
    >

On Sat, Oct 24, 2015 at 6:20 AM, Mārcis Bratka wrote:
> I tried version 1.0 and UTF-8 encoding, but no luck:
>
>     testDf1 <- data.frame(test = '??????', stringsAsFactors = FALSE)
>     testDf2 <- data.frame(test = '?', stringsAsFactors = FALSE)
>     testDf3 <- data.frame(test = 'qwerty', stringsAsFactors = FALSE)
>
>     Encoding(testDf1$test) <- 'UTF-8'
>     Encoding(testDf2$test) <- 'UTF-8'
>
>     > Encoding(testDf1$test)
>     [1] "UTF-8"
>     > testDf1$test
>     [1] "\xe2\xe8\xe7\xec\xef\xe8"
>     > Encoding(testDf2$test)
>     [1] "UTF-8"
>     > testDf2$test
>     [1] "\u0080"
>
>     dbWriteTable(conn, "testdf1", testDf1) # error
>     dbWriteTable(conn, "testdf2", testDf2) # error
>     dbWriteTable(conn, "testdf3", testDf3) # works
>
>     Error in if (chr == "\n") f <- qs <- paste0(qs, "\\", "n") :
>       missing value where TRUE/FALSE needed
>     In addition: Warning message:
>     In strsplit(str, "", fixed = TRUE) : input string 1 is invalid UTF-8
>
> Thanks
> Marcis
From hannes.muehleisen at cwi.nl Mon Oct 26 14:13:22 2015
From: hannes.muehleisen at cwi.nl (Hannes Mühleisen)
Date: Mon, 26 Oct 2015 14:13:22 +0100
Subject: [MonetDB.R] Non ascii characters
References: <3D6161E9-90A6-4E22-8B94-22C4D3AB7D09@cwi.nl> <7F1397C8-1366-4708-B65D-44234B1C0999@cwi.nl>

Hi Marcis,

I added a fix that calls enc2utf8() on character columns before importing them into MonetDB. Install the preview version as follows:

    install.packages("MonetDB.R", repos=c("http://dev.monetdb.org/Assets/R/", "http://cran.rstudio.com/"))

Hannes

> On 26 Oct 2015, at 13:08, Anthony Damico wrote:
>
> Marcis, when Hannes uploads a fix, you'll see the new version (not yet on CRAN) at
>
> https://www.monetdb.org/Assets/R/
>
>
> On Mon, Oct 26, 2015 at 7:54 AM, Hannes Mühleisen wrote:
> Agreed, it should not. I think I fixed it, but it is not uploaded yet.
>
> > On 26 Oct 2015, at 12:51, Anthony Damico wrote:
> >
> > Thanks. And you see that these are two separate bugs? If dbWriteTable fails for any reason, the table it was writing should not end up in the database.
> >
> > On Mon, Oct 26, 2015 at 7:46 AM, Hannes Mühleisen wrote:
> > Hi,
> >
> > This works fine on OS X/Linux with MonetDB.R 1.0.0, but I can confirm the issue on Windows. Will have a look.
> >
> > Hannes
> >
> > > On 24 Oct 2015, at 19:42, Anthony Damico wrote:
> > >
> > > Hannes understands encoding hiccups better than I do. I dumped the special character in my surname long ago. ;)
> > >
> > > On Saturday, October 24, 2015, Mārcis Bratka wrote:
> > > Then this might be due to some locale settings?! I guess if we get rid of the error with testdf2, it will likely work with testdf1 for me as well.
> > >
> > > 2015-10-24 14:58 GMT+03:00 Anthony Damico:
> > > That does not occur for me.
> > >
> > > On Sat, Oct 24, 2015 at 7:53 AM, Mārcis Bratka wrote:
> > > Actually, for me testdf1 also gives the same error.
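A minimal R sketch of the kind of conversion Hannes describes (calling enc2utf8() on character columns before the data frame reaches the database). It also illustrates a likely reason why setting Encoding(x) <- 'UTF-8' did not help in Mārcis's follow-up test: that assignment only changes the encoding label and leaves the underlying bytes untouched. Assumptions: the helper name to_utf8_columns() is hypothetical and this is not MonetDB.R's actual code; factor columns (which data.frame() produced by default in R 3.2.x unless stringsAsFactors = FALSE) are not converted; validUTF8() requires R 3.3.0 or later, newer than the R 3.2.2 used in this thread.

```r
# Sketch only, not MonetDB.R's implementation: re-encode character columns
# to UTF-8 before a data frame is handed to dbWriteTable().
# to_utf8_columns() is a hypothetical helper name.
to_utf8_columns <- function(df) {
  is_chr <- vapply(df, is.character, logical(1))
  if (any(is_chr)) {
    # enc2utf8() translates from the marked encoding (e.g. "latin1") or, for
    # unmarked strings, from the native encoding, and marks the result UTF-8.
    # Strings already marked "UTF-8" are returned unchanged.
    df[is_chr] <- lapply(df[is_chr], enc2utf8)
  }
  df
}

# Build a Latin-1 string from raw bytes so the example does not depend on
# the encoding of this source file: 0x61 is "a", 0xE4 is a-umlaut in ISO-8859-1.
x <- rawToChar(as.raw(c(0x61, 0xe4)))
Encoding(x) <- "latin1"

# Relabelling only: the bytes stay 61 e4, which is not valid UTF-8.
relabelled <- x
Encoding(relabelled) <- "UTF-8"
print(validUTF8(relabelled))    # FALSE

# Converting: the bytes become 61 c3 a4, a valid UTF-8 sequence.
df <- data.frame(test = x, stringsAsFactors = FALSE)
df_utf8 <- to_utf8_columns(df)
print(Encoding(df_utf8$test))   # "UTF-8"
print(charToRaw(df_utf8$test))  # 61 c3 a4
```

On a Latvian Windows locale, unmarked strings like those in the original report would be treated as native (Windows-1257) text and converted the same way. Strings that were relabelled with Encoding() <- 'UTF-8' beforehand are taken at their word by enc2utf8() and stay invalid, so the relabelling step is best left out.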