Thursday, April 29, 2010

Parsing Atom Feeds with Python

One of the things that I have really come to like about Python is how easy it is to prototype some code to try an idea out. I wanted to parse some blog feeds to try out some semantic analysis algorithms.

I just imported feedparser and let 'er rip.
Here is a simple example that prints the titles from this blog's Atom feed.

import feedparser

def parseit(feedurl):
    # fetch and parse the feed; feed-level metadata lives under data.feed
    data = feedparser.parse(feedurl)

    print('Title => ' + data.feed.title)

    print('There are %d entries.' % len(data.entries))

    # each entry is one post in the feed
    for entry in data.entries:
        print(entry.title)

if __name__ == "__main__":

    print('Running')
    parseit('http://dataoracle.blogspot.com/feeds/posts/default')
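
To see roughly what feedparser is pulling out for me, here is a minimal sketch that reads the same two things (the feed title and the entry titles) using only the standard library. This is not how feedparser works internally. The sample feed and the feed_titles name are made up for illustration, and real feeds are messier than this.

```python
import xml.etree.ElementTree as ET

# Atom elements live in this XML namespace
ATOM_NS = {"atom": "http://www.w3.org/2005/Atom"}

# Hypothetical sample feed, made up for this sketch
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example Blog</title>
  <entry><title>First post</title></entry>
  <entry><title>Second post</title></entry>
</feed>"""


def feed_titles(xml_text):
    """Return (feed title, list of entry titles) for an Atom document."""
    root = ET.fromstring(xml_text)
    # the feed-level title is a direct child of <feed>
    feed_title = root.findtext("atom:title", default="", namespaces=ATOM_NS)
    # each <entry> carries its own <title>
    entry_titles = [
        entry.findtext("atom:title", default="", namespaces=ATOM_NS)
        for entry in root.findall("atom:entry", ATOM_NS)
    ]
    return feed_title, entry_titles


if __name__ == "__main__":
    title, entries = feed_titles(SAMPLE_FEED)
    print("Title => " + title)
    print("There are %d entries." % len(entries))
    for entry_title in entries:
        print(entry_title)
```

For anything beyond a quick experiment I would still reach for feedparser, since it copes with the different feed formats and malformed XML that this sketch ignores.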

Thursday, April 8, 2010

Dynamic R Plotting

I needed to create a graph that contained multiple lines of data.
This is relatively easy to do in R when you know about the data like the maximum values and the number of lines.
What is not so easy is to create an R script that will dynamically create the graph without knowing anything about the data ahead of time.

It appears that the call to plot() needs some sort of parameters to create the graph.
Just calling plot.new() isn't going to give me the results that I need.
I am certain that there is a better way to do this, but it works.
I just used a counter in the loop, and if it is the first time through I call plot() with the first line.
If it is not the first time through, then the lines() function is called to add the data.

The colors are dynamically created based on the number of lines.

# required libraries
library(RMySQL)

cntr <- 1  # counter for the number of times through the loop

# connect to db using credentials in my.cnf
dbcon <- dbConnect( MySQL(), group="localnfl")

# get the years with data from the database
# returns a data frame
years <- dbGetQuery( dbcon, "SELECT DISTINCT( year) FROM nfl.bet_lines")

#define the image that will hold the graph
png("../../public_html/betline.png", width=1024, height=1024)

#create a vector to hold text for legend
legend_text <- character()

# choose enough colors (one for each year)
# returns a character vector
colors <- rainbow( length(years$year) )

# iterate over each year
for( y in years$year )
{
  #create a string that will be the SQL
  # used this as a poor man's prepared statement to pass in year
  sql <- sprintf("SELECT IF( (hscore-vscore)>0 AND lineopen>0,
                         ABS(lineopen- (hscore-vscore))*1,
                         ABS(lineopen- (hscore-vscore))*-1) AS mov
                  FROM nfl.bet_lines
                  WHERE week <= 17 AND year = %d
                  ORDER BY mov asc",y )

  # run the query returns a one column data frame
  values <- dbGetQuery( dbcon, sql )

  # if this is the first time through create the plot with
  # type="n" for no actual plot ( could probably just plot the first line)
  if( cntr == 1 )
  {
    # create the plot with no axis labels ( ann=FALSE )
    # use the numeric row index as x so it matches what lines() draws below
    plot( seq_along(values$mov), values$mov, ylim=c(-50,50), type="n", ann=FALSE )
  }

  lines( values$mov, type="l", lwd=2, col=colors[cntr] )

  #add the current year numeric value to the legend text as a string
  legend_text[length(legend_text)+1] <- as.character(y)

  cntr <- cntr + 1  #increment the counter
}

#create the main title, double font size (cex.main=2), bold italic (font.main=4)
title(main="Mad Title", cex.main=2, font.main=4)

# place text on top(3)
mtext( "actual > 0 and opening line > 0", 3 )

# Label the x and y axes
title(ylab="| Line - Actual |")
title(xlab="Game Number")

#create the legend
legend( "topleft", legend_text, fill=colors )

dev.off()

dbDisconnect(dbcon) 

Let me know if you have any suggestions on improving this R script.

Thursday, April 1, 2010

R Convert Numbers to String

For a recent R graph I needed to pull some numbers out of a database and later use those numbers as labels in the graph legend.

The legend function takes a "character or expression vector", NOT a vector of numbers, much to my dismay. So I was left with a vector of numbers that I wanted to use as legend labels, and the legend function would not accept them. I was unable to find a method that converted this for me à la Java ( String.valueOf() ).

Here is how I did it:
#assume a vector of numbers already exists
numbers <- c(10,20,30,40)

#create a vector to hold text for legend
legend_text <- character()

#add the current numeric value to the legend text as a string
for( number in numbers )
{
  legend_text[length(legend_text)+1] = as.character(number)
}

#legend_text now contains a character vector and can be passed to the legend function
#note: as.character() is vectorized, so as.character(numbers) gives the same result in one call