Dan Starr

Software Engineer, Dept. of Astronomy, UC Berkeley

July 2006 - Present


Primary software engineer for Berkeley's Center for Time Domain Informatics (http://cftd.info/). Also primary developer of Berkeley's real-time "Transients Classification Pipeline". This project uses machine learning to identify and classify science from the PTF telescope's nightly datastream. The pipeline is primarily Python-based, runs in parallel over 20-120 cores, and uses a spatially indexed MySQL database approaching 1 billion rows. It classifies around 1 million time-series datasets per night in order to identify science that should be followed up with other telescopes. The project combines a variety of ML classifiers and is deployed on a Beowulf cluster as well as other machines. It was recently tested using a Hadoop Streaming, Cascading pipeline on both Yahoo's M45 cluster and an EC2 cluster built from a custom, Cloudera-derived AMI.

My group is hosting UC Berkeley's "Machine-Learning with Real-time & Streaming Applications" conference, taking place May 7-11, 2012.


  • P146.pdf   1064k - Jan 19, 2012 1:47 PM by Dan Starr (v1)
    ADASS 2011 Paper: ALLStars: Overcoming Multi-Survey Selection Bias using Crowd-Sourced Active Learning
  • http://arxiv.org/abs/1106.2832   Jul 8, 2011 9:36 AM by Dan Starr (v1)
    Active Learning to Overcome Sample Selection Bias: Application to Photometric Variable Star Classification
  • arXiv.org 1101.1959v1   Jan 12, 2011 10:06 AM by Dan Starr (v1)
    Astrophysical time series classification paper using a variety of machine learning algorithms. Submitted to ApJ 2011-01-11.
  • 2010Aug_BerkeleyPythonBootCamp_TestingDebugging.pdf   3642k - Aug 26, 2010 2:22 PM by Dan Starr (v2)
    Python Testing & Debugging talk given at Berkeley Python Bootcamp 2010-08-25