Dan Starr
Software Engineer, Dept. of Astronomy, UC Berkeley
July 2006 – Present
Primary software engineer for Berkeley's Center for Time Domain Informatics (http://cftd.info/). Also primary developer of Berkeley's real-time "Transients Classification Pipeline". This project uses machine learning to identify and classify science from the PTF telescope's nightly datastream. The primarily Python-based pipeline runs parallelized over 20-120 cores, uses a spatially indexed MySQL database approaching 1 billion rows, and classifies around 1 million time-series datasets per night in order to identify science that should be followed up with other telescopes. The project combines a variety of ML classifiers, is deployed on a Beowulf cluster as well as other machines, and was recently tested using a Hadoop Streaming, Cascading pipeline on both Yahoo's M45 cluster and a custom Cloudera-derived AMI EC2 cluster.
My group is hosting UC Berkeley's "Machine-Learning with Real-time & Streaming Applications" Conference, taking place May 7-11, 2012.

-
P146.pdf
1064k - Jan 19, 2012 1:47 PM by Dan Starr (v1)
ADASS 2011 Paper:
ALLStars: Overcoming Multi-Survey Selection Bias using Crowd-Sourced Active Learning
-
http://arxiv.org/abs/1106.2832
0k - Jul 8, 2011 9:36 AM by Dan Starr (v1)
Active Learning to Overcome Sample Selection Bias: Application to Photometric Variable Star Classification
-
arXiv.org 1101.1959v1
0k - Jan 12, 2011 10:06 AM by Dan Starr (v1)
Astrophysical time series classification paper using a variety of machine learning algorithms. Submitted to ApJ 2011-01-11.
-
2010Aug_BerkeleyPythonBootCamp_TestingDebugging.pdf
3642k - Aug 26, 2010 2:22 PM by Dan Starr (v2)
Python Testing & Debugging talk given at Berkeley Python Bootcamp 2010-08-25