Genome Browser Snapshots

We use the UCSC Genome Browser for a wide variety of genomic visualisation tasks. I spend a fair proportion of my time preparing tracks for UCSC, including (but not limited to):

  • Solexa runs.
  • Restriction enzyme digestion sites.
  • Marking particular genes (and also occasionally using a colour map to indicate expression level.)

I’ve also implemented a navigation frame (hello, late 90s technology) so you can click through different regions. It’s a bit of a hack, but has been very useful to some of my colleagues, so on the whole a worthwhile bit of code.

The UCSC browser has a very clear URL based interface to uploading tracks and selecting the proper genome build for working with. You can put a track on a webserver and reference it by its URL to upload it. You can also reference a URL full of other URLs to upload many tracks at once, which is really handy. It also has a neat session file feature, which allows you to save the other tracks you want displayed below your own data.

Most of the time, uploading the tracks to UCSC and browsing manually is the right way to interact with the data. However, as always, it’s sometimes useful to be able to script the visualisations. So I wrote a class that allows you to reference a session and track URL and retrieve a PDF file with a snapshot of the browser with the chosen data.

According the UCSC techs, there is a limit of one hit every 15 seconds and 5,000 images a day.

#!/usr/bin/env python

import urllib
import re
import os
import sys

# gets a screenshot from UCSC of a specifid bed file
# a lot of the work is done in the bed file itself
# and a session file, which tells UCSC what tracks should display and how.
# Upgrades of this code might include exposing a few of those session parameters
# through the object

baseurl = "http://genome.ucsc.edu/cgi-bin/"
pagename = "hgTracks"
convertcmd = "/usr/bin/convert"


class UCSCTools:
	def __init__(self, params=None):
		self.params = params
		
	# params is a dictionary of url parameters
	def setParams(self, params):
		self.params = params

	def setOneParam(self, param, value):
		self.params[param] = value

	def removeParam(self, param):
		if param in self.params:
			del self.params[param]

	def getImageURL(self):
		self.setOneParam("hgt.psOutput", "on")
		return self._getURL()
		
	def getLinkURL(self):
		self.removeParam("hgt.psOutput")
		return self._getURL()

	def _getURL(self):
		paramsenc = urllib.urlencode(self.params)
		url = "%s%s?%s" % (baseurl, pagename, paramsenc)
		return url

	def dumpPDF(self, pdffile):
		url = self.getImageURL()
		imgpagestring = urllib.urlopen(url).read()
		self._getPDFFromImagePage(imgpagestring, pdffile)

	def dumpPDFAndPNG(self, filebase):
		pdffile = filebase + ".pdf"
		self.dumpPDF(pdffile)
		pngfile = self.convertImage(pdffile, "png")
		return pdffile, pngfile

	# do we need both of these parameters?
	def setSessionDetails(self, sessionurl):
		self.setOneParam("hgS_doLoadUrl", "submit")
		self.setOneParam("hgS_loadUrlName", sessionurl)

	# in this case, bedurl could be a single bed file or
	# a url with a list of urls containing the bed files we want to upload
	def setBEDFiles(self, bedurl):
		self.setOneParam("hgt.customText", bedurl)

	def convertImage(self, inpath, outtype):
		# I wish python had extension removal in basename...
		basepath = inpath.replace(".pdf", "")
		outpath = basepath + "." + outtype
		# magic density value gives us the right appearance for a web output
		cmd = "%s -density 125 %s %s" % (convertcmd, inpath, outpath)
		print "executing convert command:", cmd
		os.system(cmd)
		return outpath

	# private
	def _getPDFLinkFromImagePage(self, pagestring):
		pdfre = 'HREF="([^"]+.pdf)"'
		match = re.search(pdfre, pagestring)
		return match.group(1)

	# private
	def _getPDFFromImagePage(self, pagestring, pdffile):
		link = self._getPDFLinkFromImagePage(pagestring)
		url = baseurl + link
		pdfdata = urllib.urlopen(url).read()
		pdffh = open(pdffile, "w")
		pdffh.write(pdfdata)

if __name__ == "__main__":
	if len(sys.argv) != 3:
		print "usage: ./ucscimagegrab.py <sessionfile> <bedfileurl>"
		sys.exit(1)
	# these should both be publically available urls
	sessionurl = sys.argv[1]
	bedurl = sys.argv[2]
	params = { 	"hgt.psOutput"	:	"on",
				}
	ut = UCSCTools(params)
	ut.setSessionDetails(sessionurl)
	ut.setBEDFiles(bedurl)
	print ut.getLinkURL()
#	ut.dumpPDFAndPNG("tester")

(I know I should find a better way to distribute my code than this. Sorry.)

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s


%d bloggers like this: