Thursday, April 8, 2010

Dynamic R Plotting

I needed to create a graph that contained multiple lines of data.
This is relatively easy to do in R when you know things about the data ahead of time, such as the maximum values and the number of lines.
What is not so easy is writing an R script that creates the graph dynamically, without knowing anything about the data ahead of time.

It appears that the call to plot() needs some sort of data or parameters in order to create the graph.
Just calling plot.new() isn't going to give me the results that I need.
I am certain that there is a better way to do this, but it works.
I use a counter in the loop: the first time through, I call plot() to set up an empty graph from the first year's data.
On every pass, including the first, lines() is called to add the data.
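
One possible way around the first-pass special case, offered only as a sketch: if all of the series can be collected before plotting, the axis limits can be computed from the data and the empty plot set up once. The list `series` and its values below are made-up stand-ins for the query results, not data from the database.

```r
# Hypothetical stand-in for the per-year query results:
# a named list of numeric vectors of different lengths.
series <- list(
  "2007" = c(-12, -3, 0, 4, 9),
  "2008" = c(-20, -5, 1, 2, 6, 14),
  "2009" = c(-7, -1, 3)
)

# Derive the axis limits from the data instead of hard-coding them.
xlim <- c(1, max(sapply(series, length)))
ylim <- range(unlist(series))

# Draw an empty plot once; plotting the two corner points with
# type = "n" sets the axes without drawing any data.
plot(xlim, ylim, type = "n", xlab = "Game Number", ylab = "mov")

# Add every series with lines(), one color per series.
cols <- rainbow(length(series))
for (i in seq_along(series)) {
  lines(seq_along(series[[i]]), series[[i]], lwd = 2, col = cols[i])
}
legend("topleft", names(series), fill = cols)
```

The trade-off is that every series has to be held in memory before the first line is drawn, which is fine for a handful of seasons.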

The colors are dynamically created based on the number of lines.
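
As a small illustration, rainbow(n) returns a character vector of n color strings, so indexing it with the loop counter gives each line its own color. The value of `n` here is arbitrary.

```r
n <- 4               # hypothetical number of lines
cols <- rainbow(n)   # character vector with one color string per line
length(cols)         # 4
cols[2]              # the color that would be used for the second line
```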

# required libraries
library(RMySQL)

cntr <- 1  # counter for the number of times through the loop

# connect to db using credentials in my.cnf
dbcon <- dbConnect( MySQL(), group="localnfl")

# get the years with data from the database
# returns a data frame
years <- dbGetQuery( dbcon, "SELECT DISTINCT( year) FROM nfl.bet_lines")

#define the image that will hold the graph
png("../../public_html/betline.png", width=1024, height=1024)

#create a vector to hold text for legend
legend_text <- character()

# choose enough colors (one for each year)
# returns a character vector
colors <- rainbow( length(years$year) )

# iterate over each year
for( y in years$year )
{
  #create a string that will be the SQL
  # used this as a poor man's prepared statement to pass in year
  sql <- sprintf("SELECT IF( (hscore-vscore)>0 AND lineopen>0,
                         ABS(lineopen- (hscore-vscore))*1,
                         ABS(lineopen- (hscore-vscore))*-1) AS mov
                  FROM nfl.bet_lines
                  WHERE week <= 17 AND year = %d
                  ORDER BY mov asc",y )

  # run the query returns a one column data frame
  values <- dbGetQuery( dbcon, sql )

  # if this is the first time through, create an empty plot
  # (type="n" sets up the axes but draws no data)
  if( cntr == 1 )
  {
    # create the plot with no axis labels ( ann=FALSE )
    # note: the x range is taken from the first year's row count
    plot( seq_along(values$mov), values$mov, ylim=c(-50,50), type="n", ann=FALSE )
  }

  # add this year's line to the plot
  lines( values$mov, type="l", lwd=2, col=colors[cntr] )

  #add the current year numeric value to the legend text as a string
  legend_text[length(legend_text)+1] = as.character(y)

  cntr <- cntr + 1  #increment the counter
}

# create the main title: double font size (cex.main=2), bold italic (font.main=4)
title(main="Mad Title", cex.main=2, font.main=4)

# place text on top(3)
mtext( "actual > 0 and opening line > 0", 3 )

# Label the x and y axes
title(ylab="| Line - Actual |")
title(xlab="Game Number")

#create the legend
legend( "topleft", legend_text, fill=colors )

dev.off()

dbDisconnect(dbcon) 
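
The sprintf() call used as a poor man's prepared statement could be made a little more defensive by checking the value before it is formatted. The helper below is a hypothetical sketch, not part of the script above; the name `year_query` and the simplified SELECT are made up for illustration.

```r
# Hypothetical helper: build a per-year query, refusing anything
# that is not a single whole number before it reaches sprintf().
year_query <- function(year) {
  stopifnot(is.numeric(year), length(year) == 1, year == round(year))
  sprintf("SELECT year FROM nfl.bet_lines WHERE week <= 17 AND year = %d",
          as.integer(year))
}

year_query(2009)
```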

Let me know if you have any suggestions on improving this R script.
