[Blotter-commits] r582 - in pkg/FinancialInstrument: . sandbox
noreply at r-forge.r-project.org
noreply at r-forge.r-project.org
Thu Mar 17 16:59:08 CET 2011
Author: peter_carl
Date: 2011-03-17 16:59:08 +0100 (Thu, 17 Mar 2011)
New Revision: 582
Added:
pkg/FinancialInstrument/sandbox/
pkg/FinancialInstrument/sandbox/download.pitrader.R
pkg/FinancialInstrument/sandbox/parse.EODdata.R
Log:
- added sandbox folder
- added two scripts for downloading and parsing historical data
- scripts might be used as demos later
Added: pkg/FinancialInstrument/sandbox/download.pitrader.R
===================================================================
--- pkg/FinancialInstrument/sandbox/download.pitrader.R (rev 0)
+++ pkg/FinancialInstrument/sandbox/download.pitrader.R 2011-03-17 15:59:08 UTC (rev 582)
@@ -0,0 +1,116 @@
+# Script for managing the download of data from
+# http://pitrading.com/free_market_data.htm
+# including some long series of continuous futures contracts
+
+# This script requires the following directory structure:
+# filesroot [directory set in the script below]
+# Each symbol's processed csv files are stored in sub-directories
+# named for each symbol, e.g., ~/Data/EOD\ Global\ Indexes/XMI.IDX.
+# These directories and files will be created and updated by this script.
+# filesroot/.incoming
+# New or updated zip files should be placed here for processing.
+# This is also the working directory for the processing done in
+# this script. Unzipped csv files are redirected here for processing.
+# Temporary files are stored here before being appended to the symbol
+# file csv in the appropriate directory
+
+filesroot = "/home/peter/Data/pitrading"
+
+start_t<-Sys.time()
+
+# Create and set the working directory if it doesn't exist
+if (!file.exists(paste(filesroot, "/.incoming", sep="")))
+ dir.create(paste(filesroot, "/.incoming", sep=""), mode="0777")
+setwd(paste(filesroot, "/.incoming", sep=""))
+
+# Does the archive directory structure exist?
+if (!file.exists("../.archive")){
+ dir.create("../.archive", mode="0777")
+ dir.create("../.archive/zip_files", mode="0777")
+}
+if (!file.exists("../.archive/zip_files"))
+ dir.create("../.archive/zip_files", mode="0777")
+
+# Use wget so that we don't need a list of files to work from
+system("wget -r -l1 -H -t1 -nd -N -np -A.zip http://pitrading.com/free_market_data.htm")
+
+# -r -H -l1 -np These options tell wget to download recursively.
+# That means it goes to a URL, downloads the page there, then follows
+# every link it finds. The -H tells the app to span domains, meaning
+# it should follow links that point away from the page. And the -l1
+# (a lowercase L with a numeral one) means to only go one level deep;
+# that is, don't follow links on the linked site. It will take each
+# link from the list of pages, and download it. The -np switch stands
+# for "no parent", which instructs wget to never follow a link up to a
+# parent directory.
+#
+# We don't, however, want all the links -- just those that point to
+# zip files we haven't yet seen. Including -A.zip tells wget to only
+# download files that end with the .zip extension. And -N turns on
+# timestamping, which means wget won't download something with the same
+# name unless it's newer.
+#
+# To keep things clean, we'll add -nd, which makes the app save every
+# thing it finds in one directory, rather than mirroring the directory
+# structure of linked sites.
+
+# Unzip the files to text files
+system("unzip \\*.zip")
+system("mv *.zip ../.archive/zip_files/")
+
+# What files did we download?
+files = list.files()
+
+# Each file contains the full history for the symbol, so we just need to
+# move the file into the correct base directory. We don't need to do any
+# data parsing.
+pisymbols = vector()
+for(i in 1:length(files)) {
+ # generate a list of symbols from the files we downloaded
+ filename.txt <- files[i]
+ pisymbols[i] <- substr(filename.txt, 1, nchar(filename.txt) - 4)
+}
+
+# The extra ".CC" appended to each symbol to indicate that the data is for a
+# "continuous contract" rather than a futures contract to be used as a root.
+# We're modifying the symbols used so that they don't conflict with actual
+# futures contracts.
+for(pisymbol in pisymbols) {
+ # check to make sure directories exist for each
+ dir.create(paste("../", pisymbol, ".CC", sep=""), showWarnings = FALSE,
+ recursive = FALSE, mode = "0777")
+ # move files into appropriate directory
+ system(paste("mv ", pisymbol, ".txt", " ../", pisymbol, ".CC/", pisymbol, ".CC.csv", sep=""))
+}
+
+end_t<-Sys.time()
+print(c("Elapsed time: ",end_t-start_t))
+print(paste("Processed ", length(pisymbols), " symbols.", sep=""))
+
+# The following currencies need to be created first:
+# require(FinancialInstrument)
+# currency("USD")
+# currency("JPY")
+# currency("AUD")
+# currency("CHF")
+# currency("GBP")
+# currency("MXN")
+# currency("EUR")
+# currency("CAD")
+
+# We need to have a csv file that describes the metadata
+# for all the symbols. Remember to change the symbol to append ".CC" to each.
+# load.instruments(paste(filesroot, "/.scripts/instr.pitrading.csv", sep=""))
+
+# Dates in files are formatted as "%m/%d/%Y".
+# Now, whenever you log in you need to register the instruments. This
+# might be a line you put into .Rprofile so that it happens automatically:
+# require(quantmod) # this requires a development build after revision 560 or so.
+# setSymbolLookup.FI(base_dir=filesroot, split_method='common', storage_method='csv', src='csv', extension='csv', format='%m/%d/%Y')
+
+# Now you should be able to:
+# > getSymbols("INX.CC")
+# [1] "INX.CC"
+# > chart_Series(INX.CC)
+# > head(INX.CC)
+
Property changes on: pkg/FinancialInstrument/sandbox/download.pitrader.R
___________________________________________________________________
Added: svn:keywords
+ Revision Id Date Author
Added: pkg/FinancialInstrument/sandbox/parse.EODdata.R
===================================================================
--- pkg/FinancialInstrument/sandbox/parse.EODdata.R (rev 0)
+++ pkg/FinancialInstrument/sandbox/parse.EODdata.R 2011-03-17 15:59:08 UTC (rev 582)
@@ -0,0 +1,170 @@
+# Script for parsing downloaded Global Index data from
+# http://EODdata.com (which requires purchase),
+# and instructions for defining the symbols as instruments and
+# registering them with quantmod's getSymbols function.
+
+# Peter Carl
+
+# This script requires the following directory structure:
+# filesroot [directory set in the script below]
+# Each symbol's processed csv files are stored in sub-directories
+# named for each symbol, e.g., ~/Data/EOD\ Global\ Indexes/XMI.IDX.
+# These directories and files will be created and updated by this script.
+# filesroot/.incoming
+# New or updated zip files should be placed here for processing.
+# This is also the working directory for the processing done in
+# this script. Unzipped csv files are redirected here for processing.
+# Temporary files are stored here before being appended to the symbol
+# file csv in the appropriate directory
+# filesroot/.archive
+# Directory that contains the original files after processing, sorted
+# into the following two sub-directories:
+# filesroot/.archive/zip_files
+# filesroot/.archive/csv_files
+
+# Files are downloaded into yearly zip files:
+# INDEX_2005.zip
+# INDEX_2006.zip
+# ... etc.
+
+# Set the root of the file structure here:
+# filesroot = "/home/peter/Data/EOD\ Global\ Indexes"
+filesroot = "/home/peter/Data/EOD.Global.Indexes"
+
+setwd(paste(filesroot, "/.incoming", sep=""))
+
+start_t<-Sys.time()
+
+# Does the directory structure exist?
+if (!file.exists("../.archive")){
+ dir.create("../.archive", mode="0777")
+ dir.create("../.archive/csv_files", mode="0777")
+ dir.create("../.archive/zip_files", mode="0777")
+}
+if (!file.exists("../.archive/zip_files"))
+ dir.create("../.archive/zip_files", mode="0777")
+
+if (!file.exists("../.archive/csv_files"))
+ dir.create("../.archive/csv_files", mode="0777")
+
+# Unzip the zip files contained in filesroot/.incoming
+zipfiles = list.files(pattern="*.zip")
+if(length(zipfiles)>0){
+ system("unzip \\*.zip")
+ system("mv *.zip ../.archive/zip_files/")
+} else {
+ print("No zip files to process.")
+}
+
+# That creates a set of daily files, with filenames formatted as:
+# INDEX_20060123.csv
+
+# What csv files are we working with?
+files = list.files(pattern="*.csv")
+if(length(files) == 0){
+ stop("There are no csv files to process in the .incoming directory.")
+}
+
+# Check to see if the files have already been processed
+prevfiles = list.files("../.archive/csv_files")
+rmfiles = files[files %in% prevfiles]
+if(length(rmfiles) >0){
+ file.remove(rmfiles)
+ files = files[!files %in% prevfiles]
+}
+
+if (length(files) == 0)
+ stop("There are no files to process or these files have been processed previously. Stopping.")
+
+# Each file contains something like the following:
+# Symbol,Date,Open,High,Low,Close,Volume
+# ADR.IDX,23-Jan-2006,748.15,758.54,748.15,757.8,0
+# ADVA.IDX,23-Jan-2006,45,559,45,549,0
+# ADVN.IDX,23-Jan-2006,899,2213,899,2147,0
+# ADVQ.IDX,23-Jan-2006,1440,1666,1440,1645,0
+# AEX.IDX,23-Jan-2006,428.16,432.38,427.9,431.88,102237000
+# AJT.IDX,23-Jan-2006,16.66,16.67,16.54,16.59,0
+# ATX.IDX,23-Jan-2006,3846.97,3872.43,3804.52,3872.43,3018800
+# BANK.IDX,23-Jan-2006,3088.71,3109.2,3086.72,3105.77,0
+# BDI.IDX,23-Jan-2006,2417,2417,2417,2417,0
+
+# Remove the first column and place the rest of the line in a csv file
+# named for the symbol using awk
+# awk -F "," 'NR!=1 {file=$1;sub($1FS,blank); print >>file ".csv"}' INDEX_20060207.csv
+# Ignores the header in the original file (with NR!=1)
+for (file in files){
+ print(paste("Splitting ",file,sep=""))
+ system(paste('awk -F "," ' , "'NR!=1 {filename=$1; sub($1FS,blank); print >> filename",'".csv"',"}'" , file, sep=" "))
+ # Move the now-processed dated csv files into a 'processed' directory
+ system(paste("mv ", file, " ../.archive/csv_files/", file, sep=""))
+}
+# That creates a .csv file for each symbol processed and moves processed
+# files into the archive directory.
+
+# Now, we want to append the resulting csv files to csv data in the
+# directory for each symbol.
+
+# What symbols do we need to process?
+tmpfiles = list.files()
+header = 'Date,Open,High,Low,Close,Volume'
+for (file in tmpfiles){
+ targetdir = strtrim(file, (nchar(file)-4))
+ fullpathdir = paste("../", targetdir, sep="")
+ fullpathfile = paste("../", targetdir, "/", file, sep="")
+ if (!file.exists(fullpathdir)){ # Does the directory exist?
+ # No, create the directory
+ dir.create(fullpathdir, mode="0777")
+ # ...and an empty file with a header within it
+ system(paste("echo ", header," > ", fullpathfile, sep=""))
+ }
+ # Yes, directory exists
+ # ... so append the local tmp file to the existing data file
+ print(paste("Updating ", file, sep=""))
+ system(paste("cat ", file, " >> ", fullpathfile, sep=""))
+ # ... and remove the local tmp file
+ file.remove(file)
+}
+end_t<-Sys.time()
+print(c("Elapsed time: ",end_t-start_t))
+print(paste("Processed ", length(files) ," days of prices for ", length(tmpfiles), " symbols.", sep=""))
+
+# EOD provides a two-column text file, tab separated, that contains a list
+# of symbols and a short description. Its usually downloaded as
+# "INDEX.txt" from somewhere on the website. Although it is sparse, we
+# need to load the list of symbols as instruments and we might as well
+# keep the description (for what it's worth).
+
+# In addition, we need to add columns for other metadata we want to
+# associate with the symbols. Create a csv file in a spreadsheet with
+# the columns from the INDEX.txt file labeled "primary_id" and "description",
+# then add columns for currency (fill the column with a nonsense value
+# like "EOD", for now), exchange ("EOD"), multiplier ("1"),
+# source ("EODdata.com"), and any other attribute you want to associate
+# with the symbol. These suggested values are, of course, junk but
+# indicate that the metadata needs to be corrected later. With a
+# large number of symbols like this, this is just a quick way to get
+# started with the data. Once a few symbols have been selected, you
+# can correct the metadata for those contracts and use them as you
+# see fit.
+
+# Define the nonsense currency:
+# require(FinancialInstrument)
+# currency(EOD)
+
+# Then load the instruments csv file you created:
+# load.instruments("~/Data/EOD.Global.Indexes/.scripts/instr.EODdata.csv")
+
+# Now, whenever you log in you need to register the instruments. This
+# might be a line you put into .Rprofile so that it happens automatically:
+# require(quantmod) # this requires a development build after revision 560 or so.
+# setSymbolLookup.FI(base_dir='/home/peter/Data/EOD.Global.Indexes', split_method='common', storage_method='csv', src='csv', extension='csv', format='%d-%b-%Y')
+
+# Now you should be able to:
+# > getSymbols("FTSE.IDX")
+# [1] "FTSE.IDX"
+# > chart_Series(FTSE.IDX)
+# > head(FTSE.IDX)
+
+
+
+
Property changes on: pkg/FinancialInstrument/sandbox/parse.EODdata.R
___________________________________________________________________
Added: svn:keywords
+ Revision Id Date Author
More information about the Blotter-commits
mailing list