R: Input and output: scripts, saving and loading data (including database access)
|
Back to Local tips for R
http://www.psychol.cam.ac.uk/statistics/R/enteringdata.html
|
A few file-handling commands may be useful:
setwd("c:/myfiles") # use / or \\ to separate directories under Windows (\\ becomes \ once processed through the escape character mechanism) dir() # list the contents of the current directory
Running scripts:
source("myfile.R") # load and execute a script of R commands
For a startup script, edit ".Rprofile" in your home directory (for details see ?Startup). Here's an example:
# RNC ~/.Rprofile # auto width adjustment .adjustWidth <- function(...){ options(width=Sys.getenv("COLUMNS")) TRUE } .adjustWidthCallBack <- addTaskCallback(.adjustWidth) .First <- function() cat("\n Script ~/.Rprofile executed.\n\n") .Last <- function() cat("\n Goodbye!\n\n")
Redirecting output:
sink("myfile.txt") # redirect console output to a file sink() # restore output to the screen pdf("mygraph.pdf") # subsequent graphical output will go to a PDF png("mygraph.png") # subsequent graphical output will go to a PNG jpeg("mygraph.jpeg") # subsequent graphical output will go to a JPEG bmp("mygraph.bmp") # subsequent graphical output will go to a BMP postscript("mygraph.ps") # subsequent graphical output will go to a PostScript file dev.off() # back to the screen
my.data = read.csv(filename) my.data = read.csv(file.choose()) # Note: (1) = and <- are synonymous, and are the assignment operator (while == tests for equality) # (2) file.choose() pops up a live filename picker # (3) The default is to assume a header row with variable names (header=TRUE), # and no row names, but you can change all these defaults (e.g. row.names=1 reads # row names from the first column). attach(my.data) # you might then want to attach the new data to the path, though this is optional write.csv(my.data, filename2) # Write the data to a new file. There are several options available; see the help (use ?write.csv) write.csv(my.data, file="d:/temp/newfile.csv", row.names=FALSE) # Here's one: turn off row names to avoid creating a spurious additional column. read.table(...) # } A more generic way to read/write tabular data from/to disk write.table(...) # } (read.csv and write.csv are specialized versions of read.table and write.table)
In the R Commander, you can use Data / Import data / from text file or clipboard, and, having selected a data set, Data / Active data set / Export active data set.
There are several ways to read from Excel spreadsheets. I find this easy:
library(RODBC) channel <- odbcConnectExcel("Osteomalacia_data.xls") # specify the filename patientdata <- sqlFetch(channel, "Vitamin_D_levels") # specify a sheet within the spreadsheet indexcasedata <- sqlFetch(channel, "Sheet2") # by default Excel names individual sheets Sheet1, Sheet2, ..., though you may have renamed them something more informative odbcClose(channel)
But in a Linux environment, the Windows ODBC settings etc. probably won't work out of the box, so this works well:
library(gdata) data <- read.xls("myexceldata.xls", sheet=1) # load the first worksheet
Very easy:
library(foreign) mydata <- data.frame(read.spss("filename.sav")) # Remember you can also use file.choose() in place of the filename, as above.
For saving in SPSS format from R, see extensions.
There are special mechanisms to support Excel files (see above) and Microsoft Access databases, but in general any database can be accessed via its ODBC interface. To use this, the actual database in question (e.g. "C:\MyExperiment\MonkeyCantab_database_7.mdb") needs to be set up via the Windows ODBC Manager (Data Source Administrator) tool, giving that particular database a data source name (DSN), such as "MonkeyCantabDB7". This is then used by all programs that request data via ODBC. Like this:
# 1. Connect library(RODBC) channel <- odbcConnect("my_DSN") # specify your DSN here # if you need to specify a username/password, use: # channel <-odbcConnect("mydsn", uid="username", pwd="password") # 2. List all tables sqlTables(channel) # 3. Fetch a whole table into a data frame mydataframe <- sqlFetch(channel, "my_table_name") # fetch a table from the database in its entirety close(channel) # 4. Fetch the results of a query into a data frame. Example: mydf2 <- sqlQuery(channel, "SELECT * FROM MonkeyCantab_LOOKUP_TaskTypes WHERE TaskType < 6")
If you're using MySQL, you can talk to the database directly:
library(RMySQL) # use install.packages("RMySQL") if this produces an error # if the install.packages() command produces an error, under Ubuntu: # use "sudo apt-get install libmysql++-dev" (in addition to MySQL itself, i.e. the # "mysql-server mysql-client mysql-navigator mysql-admin" packages) con <- dbConnect(MySQL(), host="localhost", port=3306, dbname="mydatabase", user="myuser", password="mypassword") dbListTables(con) dbListFields(con, "table_name") d <- dbReadTable(con, "table_name") e <- dbGetQuery(con, "SELECT COUNT(*) FROM table_name") # and much more possible
As you'd expect, the easiest:
save(myobject1, myobject2, ..., file="D:/temp/mydata.rda") load(file="D:/temp/mydata.rda") # note that the load command recreates the "mydata" object without prompting # you can also use save.image() to save a whole workspace
In the R Commander, you can use Data / Load data set, and, having selected a data set, Data / Active data set / Save active data set.
To export the definition of an R object (which you can then re-import using "object = THISTHING"):
dput(object, "")
To read a tabular object with a header row from the clipboard (see here):
object = read.table("clipboard", header=T)