The following article details how to create R command scripts for HTCondor. It also demonstrates how to configure an R repository and install packages into your local directory.

What is R?

  • R is a programming language and free software environment for statistical computing and graphics, supported by the R Foundation for Statistical Computing. The R language is widely used among statisticians and data miners for developing statistical software and for data analysis.

1. Installing R packages without root access

   First, designate a directory where you will store the downloaded packages, and create it if it does not exist yet. On my machine, I use the directory /home/kross48/packages/. Then start R from the shell:

 $ R

   Inside R, install a package into that directory and load it from there:

 > install.packages("ggplot2", lib="/home/kross48/packages/")
 > library(ggplot2, lib.loc="/home/kross48/packages/")
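On a cluster it is often handier to run the same install non-interactively from the shell. The following is a minimal sketch under stated assumptions: install_r_pkg is a hypothetical helper name, Rscript is assumed to be on PATH, the default directory $HOME/packages is only an example, and the CRAN mirror is passed explicitly because a non-interactive R session cannot show the mirror menu.

```shell
#!/bin/bash
# Hypothetical helper (not part of R): install one CRAN package into a
# personal library without root access. Assumes Rscript is on PATH.
install_r_pkg() {
  local pkg="$1"
  local lib="${2:-$HOME/packages}"   # assumed default; use your own directory
  # Create the library directory first so install.packages() can write to it.
  mkdir -p "$lib" || return 1
  # The package name and library path are passed as trailing arguments,
  # which R reads back with commandArgs(trailingOnly = TRUE).
  Rscript -e 'a <- commandArgs(trailingOnly = TRUE)' \
          -e 'install.packages(a[1], lib = a[2], repos = "https://cloud.r-project.org/")' \
          "$pkg" "$lib"
}

# Example: install_r_pkg ggplot2 /home/kross48/packages
```

Passing the values as arguments rather than splicing them into the R expression avoids having to nest shell and R quoting.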

2. Installing R packages locally from a tar file

From the shell (not from inside R), run:

 $ R CMD INSTALL --library=/home/kross48/packages arules_1.1-9.tar.gz
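When several downloaded tarballs need installing, the R CMD INSTALL command can be run in a loop. This is a sketch only: install_tarballs is a hypothetical helper, R is assumed to be on PATH, and the tarballs are processed in alphabetical (glob) order, so any package that another one depends on must already be installed or be handled first.

```shell
#!/bin/bash
# Hypothetical helper: install every source tarball found in a directory
# into a personal library with R CMD INSTALL. Assumes R is on PATH.
install_tarballs() {
  local src_dir="$1"
  local lib="${2:-$HOME/packages}"   # assumed default library directory
  mkdir -p "$lib" || return 1
  local tarball
  for tarball in "$src_dir"/*.tar.gz; do
    # If nothing matches, the glob stays literal; skip it.
    [ -e "$tarball" ] || continue
    # Stop at the first failure so a broken package is not silently skipped.
    R CMD INSTALL --library="$lib" "$tarball" || return 1
  done
}

# Example: install_tarballs ~/downloads /home/kross48/packages
```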

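It’s a bit of a pain having to type "/your_packages_directory/" all the time. To avoid this, create a file named .Renviron in your home directory and add the line R_LIBS=/your_packages_directory/ to it (for example, R_LIBS=/data/Rpackages/). Whenever you start R, that directory is then added to the list of places R searches for packages. The directory must already exist, because R only adds existing directories to its search path. Since it comes first in that list, install.packages() also installs into it by default, and so you can simply write: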

 > install.packages("ggplot2")
 > library(ggplot2)
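The .Renviron setup can also be scripted. A minimal sketch, assuming the hypothetical helper name set_r_libs: it creates the directory (since R ignores library directories that do not exist at startup) and appends the R_LIBS line only once. It does not merge with an existing R_LIBS setting; if your ~/.Renviron already sets R_LIBS, edit that line by hand instead.

```shell
#!/bin/bash
# Hypothetical helper: register a personal library in ~/.Renviron.
# It does not merge with an R_LIBS line that is already present.
set_r_libs() {
  local pkg_dir="$1"
  local renviron="${2:-$HOME/.Renviron}"
  # R only adds library directories that exist at startup, so create it now.
  mkdir -p "$pkg_dir" || return 1
  # Append the setting once; repeated runs leave the file unchanged.
  if ! grep -qxF "R_LIBS=$pkg_dir" "$renviron" 2>/dev/null; then
    printf 'R_LIBS=%s\n' "$pkg_dir" >> "$renviron"
  fi
}

# Example: set_r_libs /home/kross48/packages
```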

3. Setting the repository: creating an .Rprofile

Every time you install an R package in an interactive session, R asks which repository (CRAN mirror) it should use. To set the repository once and avoid being asked at every install, create a file named .Rprofile in your home directory and add the following code to it:
   cat(".Rprofile: Setting Cloud repository\n")
   r = getOption("repos") # hard code the cloud repo for CRAN
   r["CRAN"] = "https://cloud.r-project.org/"
   options(repos = r)
   rm(r)

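or, equivalently, wrap the same code in local() so that the temporary variable r is never left in your workspace:

  local({
    r <- getOption("repos")
    r["CRAN"] <- "https://cloud.r-project.org/"
    options(repos = r)
  })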
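If you set up accounts on several machines, the .Rprofile can be written from the shell. This sketch assumes the hypothetical helper name write_rprofile; it writes the local() variant of the snippet and keeps any existing file as a .bak copy instead of silently overwriting it.

```shell
#!/bin/bash
# Hypothetical helper: write the CRAN-mirror snippet into an .Rprofile.
# An existing file is kept as <file>.bak rather than lost.
write_rprofile() {
  local target="${1:-$HOME/.Rprofile}"
  if [ -e "$target" ]; then
    cp "$target" "$target.bak" || return 1
  fi
  # The quoted 'EOF' delimiter stops the shell from expanding anything,
  # so the R code is written verbatim.
  cat > "$target" <<'EOF'
local({
  r <- getOption("repos")
  r["CRAN"] <- "https://cloud.r-project.org/"
  options(repos = r)
})
EOF
}

# Example: write_rprofile            # writes ~/.Rprofile
```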

4. Setting up HTCondor Jobs

   Sample R script (test2.R):
   library("mvtnorm",lib.loc="/home/kross48/packages/")
   library("rngWELL",lib.loc="/home/kross48/packages/")
   library("randtoolbox",lib.loc="/home/kross48/packages/")
   sink('test2.txt')
   cat('This is my first R program\n')
   sink()
   print("success")

   Sample command file (an HTCondor submit description file):
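   universe = vanilla
   # Copy the submitting shell's environment (including R_LIBS) into the job
   getenv = true
   executable = /usr/bin/Rscript
   arguments = test2.R
   # $(Cluster) and $(Process) expand to the job's cluster and process IDs
   log = $(Cluster).log
   output = $(Cluster).$(Process).out
   error = $(Cluster).$(Process).error
   queue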
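When you have several R scripts, a small generator keeps their submit files consistent. This is a sketch, not an HTCondor tool: make_r_submit is a hypothetical helper, and naming the submit file after the script (test2.R becomes test2.sub) is an assumed convention. It writes the same commands as the sample command file; the backslashes keep $(Cluster) and $(Process) literal so that condor_submit, not the shell, expands them.

```shell
#!/bin/bash
# Hypothetical helper: write an HTCondor submit description file that runs
# one R script with Rscript, using the same commands as the sample file.
make_r_submit() {
  local script="$1"                          # R script to run, e.g. test2.R
  local subfile="${2:-${script%.R}.sub}"     # assumed naming convention
  cat > "$subfile" <<EOF
universe   = vanilla
getenv     = true
executable = /usr/bin/Rscript
arguments  = $script
log        = \$(Cluster).log
output     = \$(Cluster).\$(Process).out
error      = \$(Cluster).\$(Process).error
queue
EOF
}
```

Usage: run make_r_submit test2.R, then submit with condor_submit test2.sub and check the job's status with condor_q.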

Alternatively, the job can run a shell script that sets up the environment and then calls Rscript:

   universe = vanilla
   getenv = true
   executable = test.sh
   log = $(Cluster).log
   output = $(Cluster).$(Process).out
   error = $(Cluster).$(Process).error
   queue

   Sample Bash script (test.sh):
   #!/bin/bash
   export R_LIBS=/home/kross48/packages
   # run your script
   /usr/bin/Rscript test.R
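The wrapper can be made a little more defensive. A sketch under stated assumptions: run_r_job is a hypothetical function you would paste into test.sh, and PKG_DIR and RSCRIPT are optional overrides invented for this example (they are not HTCondor settings). It fails with a clear message if the R script is missing, sets R_LIBS only for the Rscript command, and forwards any extra arguments from the submit file's arguments line to the R script.

```shell
#!/bin/bash
# Hypothetical wrapper function for test.sh. PKG_DIR and RSCRIPT are
# optional overrides introduced for this sketch, not HTCondor settings.
run_r_job() {
  local script="${1:-test.R}"
  if [ "$#" -gt 0 ]; then shift; fi
  # Fail early; the message lands in the job's error file.
  if [ ! -f "$script" ]; then
    echo "run_r_job: $script not found in $(pwd)" >&2
    return 1
  fi
  # Set R_LIBS for this one command and forward any remaining arguments.
  # The function returns Rscript's exit status; as the last command in
  # test.sh, that status becomes the job's exit code.
  R_LIBS="${PKG_DIR:-/home/kross48/packages}" \
    "${RSCRIPT:-/usr/bin/Rscript}" "$script" "$@"
}

# In test.sh, call it as the last line:  run_r_job test.R "$@"
```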