Archive for December, 2009

Genome Browser Snapshots

December 22, 2009

We use the UCSC Genome Browser for a wide variety of genomic visualisation tasks. I spend a fair proportion of my time preparing tracks for UCSC, including (but not limited to):

  • Solexa runs.
  • Restriction enzyme digestion sites.
  • Marking particular genes (and also occasionally using a colour map to indicate expression level.)

I’ve also implemented a navigation frame (hello, late 90s technology) so you can click through different regions. It’s a bit of a hack, but has been very useful to some of my colleagues, so on the whole a worthwhile bit of code.

The UCSC browser has a very clear URL based interface to uploading tracks and selecting the proper genome build for working with. You can put a track on a webserver and reference it by its URL to upload it. You can also reference a URL full of other URLs to upload many tracks at once, which is really handy. It also has a neat session file feature, which allows you to save the other tracks you want displayed below your own data.

Most of the time, uploading the tracks to UCSC and browsing manually is the right way to interact with the data. However, as always, it’s sometimes useful to be able to script the visualisations. So I wrote a class that allows you to reference a session and track URL and retrieve a PDF file with a snapshot of the browser with the chosen data.

According the UCSC techs, there is a limit of one hit every 15 seconds and 5,000 images a day.

#!/usr/bin/env python

import urllib
import re
import os
import sys

# gets a screenshot from UCSC of a specifid bed file
# a lot of the work is done in the bed file itself
# and a session file, which tells UCSC what tracks should display and how.
# Upgrades of this code might include exposing a few of those session parameters
# through the object

baseurl = "http://genome.ucsc.edu/cgi-bin/"
pagename = "hgTracks"
convertcmd = "/usr/bin/convert"


class UCSCTools:
	def __init__(self, params=None):
		self.params = params
		
	# params is a dictionary of url parameters
	def setParams(self, params):
		self.params = params

	def setOneParam(self, param, value):
		self.params[param] = value

	def removeParam(self, param):
		if param in self.params:
			del self.params[param]

	def getImageURL(self):
		self.setOneParam("hgt.psOutput", "on")
		return self._getURL()
		
	def getLinkURL(self):
		self.removeParam("hgt.psOutput")
		return self._getURL()

	def _getURL(self):
		paramsenc = urllib.urlencode(self.params)
		url = "%s%s?%s" % (baseurl, pagename, paramsenc)
		return url

	def dumpPDF(self, pdffile):
		url = self.getImageURL()
		imgpagestring = urllib.urlopen(url).read()
		self._getPDFFromImagePage(imgpagestring, pdffile)

	def dumpPDFAndPNG(self, filebase):
		pdffile = filebase + ".pdf"
		self.dumpPDF(pdffile)
		pngfile = self.convertImage(pdffile, "png")
		return pdffile, pngfile

	# do we need both of these parameters?
	def setSessionDetails(self, sessionurl):
		self.setOneParam("hgS_doLoadUrl", "submit")
		self.setOneParam("hgS_loadUrlName", sessionurl)

	# in this case, bedurl could be a single bed file or
	# a url with a list of urls containing the bed files we want to upload
	def setBEDFiles(self, bedurl):
		self.setOneParam("hgt.customText", bedurl)

	def convertImage(self, inpath, outtype):
		# I wish python had extension removal in basename...
		basepath = inpath.replace(".pdf", "")
		outpath = basepath + "." + outtype
		# magic density value gives us the right appearance for a web output
		cmd = "%s -density 125 %s %s" % (convertcmd, inpath, outpath)
		print "executing convert command:", cmd
		os.system(cmd)
		return outpath

	# private
	def _getPDFLinkFromImagePage(self, pagestring):
		pdfre = 'HREF="([^"]+.pdf)"'
		match = re.search(pdfre, pagestring)
		return match.group(1)

	# private
	def _getPDFFromImagePage(self, pagestring, pdffile):
		link = self._getPDFLinkFromImagePage(pagestring)
		url = baseurl + link
		pdfdata = urllib.urlopen(url).read()
		pdffh = open(pdffile, "w")
		pdffh.write(pdfdata)

if __name__ == "__main__":
	if len(sys.argv) != 3:
		print "usage: ./ucscimagegrab.py <sessionfile> <bedfileurl>"
		sys.exit(1)
	# these should both be publically available urls
	sessionurl = sys.argv[1]
	bedurl = sys.argv[2]
	params = { 	"hgt.psOutput"	:	"on",
				}
	ut = UCSCTools(params)
	ut.setSessionDetails(sessionurl)
	ut.setBEDFiles(bedurl)
	print ut.getLinkURL()
#	ut.dumpPDFAndPNG("tester")

(I know I should find a better way to distribute my code than this. Sorry.)