CDI Seminar Series


The Center for Time Domain Informatics hosts a series of talks under the auspices of the NSF CDI grant "Real-time Classification of Massive Time-series Data Streams."  The seminar series brings in speakers who work at the interface of astronomy, statistics, and computer science.  Many speakers stay at UC Berkeley for several days.  If you would like to arrange to meet a speaker, contact Joey Richards.


Upcoming Talks:



Title:  Wavelet Spectral Analysis for Irregularly Sampled Time Series

Speaker:   Debashis Mondal
              University of Chicago, Dept. of Statistics
Tuesday, April 10, 2012 |  2:00 - 3:00 PM  | 1011 Evans Hall  

Abstract:

Examples of irregularly sampled time series abound in many areas of science, but their analyses introduce numerous statistical challenges. For example, the standard wavelet variance analysis, which has emerged as an accepted statistical approach for studying the variability of time series, is intended to be applied only to regularly sampled time series, and cannot easily cope with irregular or unevenly sampled data.  After a brief review of the existing approaches to analysis of irregularly sampled time series, we will explore two new statistical approaches to this problem.  First, we will discuss approximate scale-based analysis of variance for time series based upon the so-called Slepian wavelets.  In many ways, this approach is comparable to the multitaper spectral approach based on the notion of generalized Slepian sequences and others. Slepian wavelets arise as eigenfunctions of an energy maximization problem in a pass band of frequencies.  For irregularly sampled time series data, we will extend the notion of dyadic scales, and derive corresponding statistical theory for Slepian-based wavelet variances. We will show via a simulation study how our method adapts to sampling times with mild irregularities.  Second, we will consider a general framework for estimating wavelet variances for irregularly sampled time series.  Here, we will extend the work of Mondal and Percival (2010), and propose new inference procedures.  We will demonstrate potential use of our methods on light curve data from variable stars. If time permits, we will discuss situations where wavelet approaches might have an edge over more traditional spectral approaches, such as the famous Lomb-Scargle periodogram, the multitaper spectral analysis, and the work of Masry.

This is joint work with Don Percival.



Title:  Measuring the Undetectable: Finding faint rare objects in large astronomical surveys

Speaker:   David Hogg
              Associate Professor of Physics, NYU
Friday, April 13, 2012 |  11:00 AM - 12:00 PM  | 1011 Evans Hall  

Abstract:

Standard astronomical practice (make catalogs, search in catalogs, follow up with image analysis or new data) prevents us from making many important kinds of discoveries in large archived astronomical surveys.  I will show that we can measure the proper motions or variabilities of sources that are too faint to be detected at any of the imaging epochs in a multi-epoch survey (like SDSS Stripe 82 or LSST).  I will show methodologies we are pursuing to measure the properties of stellar populations that are unresolved (because of confusion) or gravitational lenses that are unresolved (because of poor PSF).  If we can find ways to avoid the lossy step of making catalogs, we might be able to enormously amplify the scientific return from the next generation of astronomical imaging surveys.  Warning: Some of my proposals may appear unrealistic!



Past Talks:


Title:  Uncovering the Morphological Properties of Galaxies at High Redshift

Speaker:   Peter Freeman
              Carnegie Mellon University, Dept. of Statistics
Tuesday, March 13, 2012 |  1:00 - 2:00 PM  | 1011 Evans Hall  

Abstract:

A thorough investigation of cosmological theories of hierarchical structure formation requires the accurate and precise identification of galaxy morphologies as a function of redshift.  One aspect of any such investigation is the determination of the galaxy merger rate and its time evolution.  Astronomers identify mergers by finding complex substructures within a galaxy's projected brightness profile, such as double nuclei.  Because visual classification is time consuming both in development of infrastructure and implementation, and because working with full images is computationally inefficient, astronomers fall back on using nonparametric summary statistics in their attempt to detect mergers. However, they are finding that established statistics that work well in the local Universe do not work as efficiently at high redshift.  In this talk, I will discuss new summary statistics that we have developed at Carnegie Mellon that markedly improve merger detection, as well as new avenues for morphological description that we are beginning to explore.


(Special Mini-CDI Seminar)
Title:  Exploring the Dark Universe: Computational, Statistical, and Data Challenges 

Speaker:   Katrin Heitmann
              Intelligence & Space Research Division, Los Alamos Nat. Lab
Wednesday, March 14, 2012 |  1:00 - 2:00 PM  | 1011 Evans Hall  

Abstract:

Cosmology -- the study of the origin, evolution, and constituents of the Universe -- is in a scientifically very exciting phase. Two decades of surveying the sky have culminated in the celebrated "Cosmological Standard Model". Yet, two of its key pillars, dark matter and dark energy -- together accounting for 95% of the mass-energy of the Universe -- remain mysterious.  Deep fundamental questions demand answers: What is dark matter made of? Why is the Universe's expansion rate accelerating? Should general relativity be modified? What is the nature of primordial fluctuations?  What is the exact geometry of the Universe? To address these burning questions, survey capabilities are being exponentially improved. Next-generation observatories will open new routes to understand the true nature of the "Dark Universe". These observations will pose tremendous challenges on many fronts -- from the sheer size of the data that will be collected (more than a hundred Petabytes) to its modeling and interpretation. The interpretation of the data requires sophisticated simulations on the world's largest supercomputers. The cost of these simulations, the uncertainties in our modeling abilities, and the fact that we have only one Universe that we can observe, as opposed to carrying out controlled experiments, all come together to create a major test for computational, statistical, and data analysis methods.

In this talk I will give a very brief introduction to the Dark Universe and outline the challenges ahead. To combat these challenges, close cross-disciplinary collaborations between physicists, statisticians, and computer scientists will be crucial. I will discuss two examples of successful collaborative work and propose new tasks where cosmologists urgently need help from the data and statistics community. 



Title:  Weighing the Dark Sky

Speaker:   Ethan Anderes
               Assistant Professor, UC Davis Dept. of Statistics
Thursday, March 8, 2012 |  12:30 - 1:30 PM  | 1011 Evans Hall  

Abstract:

This talk presents a new estimation method for mapping dark matter density from observed CMB intensity and polarization fields. Our method uses Bayesian techniques to estimate the average curvature of the lensing gravitational potential over small local regions. These local curvatures are then used to construct an estimate of a low pass filter of the projected dark matter density. By utilizing Bayesian/likelihood methods one can easily overcome problems with missing and/or non-uniform pixels and problems with partial sky observations (E and B mode mixing, for example). Moreover, our methods are local in nature, which allows us to easily model spatially varying beams, and are highly parallelizable. We note that our estimates do not rely on the typical Taylor approximation which is used to construct estimates of the gravitational potential by Fourier coupling. This work is based on collaboration with Lloyd Knox (Physics, UC Davis) and Alexander van Engelen (Physics, McGill).



Title:  Machine Learning Methods for Real Time and Archival Classification of Astronomical Transients and Variables

Speaker:   Umaa Rebbapragada
                Principal Investigator, JPL Machine Learning and Instrument Autonomy (MLIA) Group
Thursday, March 1, 2012 |  12:30 PM - 2:00 PM  | 1011 Evans Hall  

Abstract:

This talk presents machine learning techniques for archival and real time classification of astronomical transients and variables.  These methods were developed as part of collaborations with the Australian Square Kilometre Array Pathfinder's (ASKAP) Variable and Slow Transients (VAST) survey and the Palomar Transient Factory.    VAST is an unprecedented wide-field survey that will enable novel scientific discoveries related to known and unknown classes of radio transients and variables.  Archival (offline) classification occurs in the data archive in order to enable source type queries from end users.  Real time (online) classification occurs during real time processing in order to trigger appropriate follow up when transient phenomena are detected.  Both tasks require automated methods to classify sources in the time domain.  In order to estimate classification performance in both settings, and determine best practices prior to the launch of ASKAP's BETA in 2012, we performed a study of machine learning techniques on simulated VAST light curves.  Through this study, we identify candidate light curve characterizations and classification algorithms, and study performance under different observing strategies and levels of noise in both the offline and online settings.  Our results show that the choice of light curve characterization influences classification performance more strongly than learning algorithm selection, and that a combination of feature sets yields best performance.

The Palomar Transient Factory (PTF) is a fully-automated synoptic sky survey that has demonstrated real-time discovery of astronomical transient events.  I will briefly discuss preliminary results on the binary classification of optical transient and variable sources from PTF as real or bogus.

Talk Slides



Title:  Data Mining to Perform Novel Science on Large Astronomical Datasets

Speaker:   Nick Ball
                Assistant Research Officer, Herzberg Institute for Astrophysics

 

Tuesday, February 14, 2012 |  11 AM - 12:30 PM  | 1011 Evans Hall  

Abstract:

I will give an overview of my work since 2004 on using data mining to perform novel science on large astronomical datasets, focusing on (1) Morphological galaxy classification in the Sloan Digital Sky Survey (SDSS) using artificial neural networks; (2) Star-galaxy separation in the SDSS using decision trees; (3) Photometric redshifts of SDSS and Galaxy Evolution Explorer quasars using k nearest neighbors; and (4) Separation of galaxies that are Virgo members from those in the background using unsupervised clustering in the Next Generation Virgo Cluster Survey. Several of these represent somewhat pioneering studies that have much relevance to the current and future era of terascale and petascale data. For each study, I will provide a brief review of the result, then relate the result to more recent developments and possible future directions. Finally, I will provide a few general remarks on the current state of Astroinformatics, and its future prospects, from the point of view of an astronomer who utilizes data mining.



Title:  Mapping the Galactic Halo in the Era of Wide-Area Surveys

Speaker:   Branimir Sesar

               Postdoctoral Scholar, Astronomy Department, Caltech

Tuesday, January 24, 2012 | 1:00 PM - 2:30 PM | 1011 Evans Hall  

Abstract:

Studies of the Galactic stellar halo can help constrain the formation history of the Milky Way and galaxy formation processes in general. In the past few years, these studies have benefited greatly from the wealth of data provided by wide-area, multi-wavelength, and multi-epoch surveys such as SDSS, LINEAR, and PTF. I will present an analysis of Galactic halo structure and substructure traced by main-sequence and RR Lyrae stars selected from these wide-area surveys and will outline some of the challenges and solutions to handling such large data sets in astronomy.



Title:  Modeling stellar variability and correlated noise in photometric time-series with Gaussian processes

Speaker:   Suzanne Aigrain

                 Lecturer of Astrophysics, Dept of Astrophysics, Oxford Univ.

Friday, December 9, 2011 | 11:00 AM - 12:00 PM | 1011 Evans Hall              

Abstract:

Gaussian processes (GPs) are a family of Bayesian statistical methods that have become widely used in the machine learning literature over the last 15 years. They are particularly well-suited to modeling time-series containing correlated noise and/or stochastic astrophysical signals. In my talk I will attempt to give a didactic introduction to Gaussian processes and show a few example applications from my own research on exoplanets and stellar activity. I will also briefly mention sequential decision-making methods based on Gaussian processes, which can be used to optimise sampling strategy in the context of limited observational and computational resources.



Title:  Classification of the variable stars of Hipparcos and Gaia

Speaker:   Laurent Eyer
                 Geneva Observatory, Master of Research and Teaching
                 Coordination Unit Manager of the Gaia space mission (Variability Processing)
Monday, December 5, 2011 | 11:00 AM - 12:00 PM | 1011 Evans Hall              

Abstract:

Hipparcos was a space mission of the European Space Agency dedicated to astrometry, launched in 1989. In addition to the astrometric measurements, this mission also provided photometric measurements which were used to make a systematic study of the variable phenomena.  In 2011, Hipparcos is still the only whole sky survey with a published systematic variability analysis.  Building on the success of Hipparcos, the Gaia mission, a future space mission of the European Space Agency, to be launched in 2013, will allow us to probe the variable universe in an unprecedented way. In this talk, a global approach towards the study of variable phenomena will be presented.



Title: Staring at the Black-Box: Statistical Inference in the Physical Sciences

Speaker:   Paul Baines
                 Assistant Professor, Department of Statistics, UC Davis
Monday, October 24, 2011 | 3:30 PM - 4:30 PM | 1011 Evans Hall
               
Abstract:

Many modern statistical applications involve noisy observations of an underlying process that can best be described by a complex deterministic system. In fields such as astronomy, astrophysics and the environmental sciences, these systems often involve the solution of partial differential equations that represent the best available understanding of the physical processes. Statistical computation in this context is typically hampered by either look-up tables or expensive "black-box" function evaluations. 

We present an example from astrophysics with a "look-up table likelihood": the analysis of stellar populations. Astrophysicists have developed sophisticated models describing how intrinsic physical properties of stars relate to observed photometric data. The mapping between the parameters and the data-space cannot be solved analytically and is represented as a series of look-up tables. We present a flexible hierarchical model for analyzing stellar populations. Our computational framework is applicable to many "black-box" settings, and robust to the structure of the black-box. The performance of various sampling schemes will be presented, together with the results for an astronomical dataset.

This is joint work with Xiao-Li Meng, Andreas Zezas and Vinay Kashyap.


Title: The Era of Precision Astronomy

Speaker:   Christopher Miller
                 Assistant Professor, Department of Astronomy, University of Michigan
Monday, September 26, 2011 | 3:30 PM - 4:30 PM | 1011 Evans Hall
                 
Abstract: 

Astronomy, Astrophysics, and Cosmology have entered an era defined by precision. From a technical perspective, progress will rely on modern computational and statistical algorithms, as well as high performance computing and networking infrastructure. From a scientific perspective, progress will rely on an unprecedented understanding of the systematic uncertainties in the data. I will discuss these topics in the context of cosmology, as well as the evolution of galaxies and clusters of galaxies, paying particular attention to future astronomical facilities (or a lack thereof).


Title: Transfer and Active Learning in Astronomical Datasets

Speaker:    Ricardo Vilalta
                 Associate Prof., University of Houston, Department of Computer Science
                 Director, Pattern Analysis Laboratory
Thursday, September 1, 2011 | 2:00 PM - 3:00 PM | 1011 Evans Hall

Abstract: 

Machine learning has recently become a key computational tool for the analysis and understanding of scientific data. The talk discusses two paradigms in machine learning of great interest to the astronomical community. The first paradigm is called "transfer learning", where the idea is to exploit the existence of a data model from a similar -but not exactly the same- domain of application to the new domain. The second paradigm is called "active learning", where we are able to dynamically select only those instances of the new domain most informative for prediction. Both transfer and active learning can greatly enhance the predictive accuracy and efficiency of learning. The talk describes how these paradigms can be applied to the automatic geomorphic mapping of planet Mars by dividing a landscape (represented by an image, digital elevation model DEM, or other spatial datasets) into a set of landscape elements having specific surface patterns. These elements are later grouped into a set of clear landforms (e.g., craters, valleys, ridges, etc.). The talk discusses how these two paradigms can be applied to other problems in astronomy such as variable star classification, galaxy classification, etc. 

Slides: www



Title: Ultracompact binaries containing two stellar remnants

Speaker: Paul Groot
                 Department Chair in Nijmegen, the Netherlands
                 (on 2011 sabbatical at Caltech)
Tuesday, June 21, 2011 | 11:00 AM - 12:30 PM | 1011 Evans Hall
Abstract
The ultimate end point of binary stellar evolution, interacting
ultracompact binaries contain two stellar remnants in periods as short
as 5 minutes. These systems play a central role in the discussion of
steady gravitational wave sources, mergers of stellar remnants,
progenitors of supernovae Type Ia and .Ia, and are a unique
possibility to study the influence of chemical composition on
accretion disk, nova and dwarf nova outburst physics. I will give an
overview of our current understanding of these enigmatic binaries,
some of which are no larger than 8 times the Earth's radius. Over the
last ten years enormous observational and theoretical steps forward
have been made in this field.

Title: Accounting for Calibration Uncertainty in High Energy Astrophysics via the Partially Collapsed Gibbs Samplers with MH updates
Speaker: David A. van Dyk
                University of California, Irvine & Imperial College London
Monday, April 4, 2011 | 2:30 PM - 4:00 PM | 1011 Evans Hall

Abstract

The analysis of high-energy spectra and images in astronomy relies on prelaunch and space-based analysis of the operating characteristics of the photon detectors used for space-based data collection. This involves the observation and analysis of known sources along with sophisticated computer models of the telescopes. The resulting calibration products include point-spread functions, exposure maps, effective area curves, and redistribution matrices for the photon energies. Although these products are only known approximately and with complex correlation structures among their components, they are typically taken as known in the final analyses. In this talk we explore the effect of calibration uncertainty on parameter estimation and uncertainty assessment and develop a suite of statistical methods that aim to properly account for calibration uncertainty.  Our proposed methods vary from a relatively simple but approximate technique based on multiple imputation to a computationally expensive fully Bayesian technique.  Our Bayesian model fitting relies on Markov chain Monte Carlo for posterior simulation and involves the use of Metropolis-Hastings (MH) updates within a Partially Collapsed Gibbs (PCG) sampler (van Dyk and Park, 2008, Journal of the American Statistical Association; Park and van Dyk, 2009, Journal of Computational and Graphical Statistics). While computationally efficient, the PCG sampler may involve functionally incompatible conditional distributions, and we illustrate how the introduction of MH into the sampler requires care in order to ensure that the target stationary distribution is maintained.  Finally, we use a sample of radio loud quasars to illustrate the substantial effect that properly accounting for calibration uncertainty can have on the error bars of the fitted parameters in high-energy spectral analysis.


Title: The Challenge and Potential of Likelihood-Free Inference in Cosmology
Speaker: Chad Schafer
               Carnegie Mellon University, Department of Statistics
Monday, March 28, 2011 | 2:30 PM - 4:00 PM | 1011 Evans Hall

Abstract

Statistical inference of cosmological quantities of interest is complicated by significant observational limitations, including heteroscedastic measurement error and irregular selection effects. These observational difficulties exacerbate challenges posed by the often-complex relationship between estimands and the distribution of observables; indeed, in some situations it is only possible to simulate realizations of observations under various assumed cosmological theories. When faced with these challenges, one is naturally led to consider utilizing repeated simulations of the full data generation process, and then comparing observed and simulated data sets to constrain the parameters. This talk will discuss the issues faced in implementing such a likelihood-free procedure, with emphasis on how to best make use of the rich data sources available. I will discuss Approximate Bayesian Computation, a procedure originally motivated by similar inference problems in population genetics, and propose frequentist alternatives in the context of a bivariate luminosity function estimation problem.
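
For readers unfamiliar with the idea, the simulate-and-compare loop behind Approximate Bayesian Computation can be sketched in a few lines. This is only a minimal rejection-sampling illustration on a toy Gaussian problem, not the speaker's procedure; the function names (abc_rejection, the prior, simulator, and summary callables) and the tolerance value are hypothetical choices made for the example.

```python
import random
import statistics


def abc_rejection(observed, prior_sampler, simulator, summary,
                  tolerance, n_draws, seed=None):
    """Keep prior draws whose simulated summary lands near the observed one."""
    rng = random.Random(seed)
    target = summary(observed)
    accepted = []
    for _ in range(n_draws):
        theta = prior_sampler(rng)                   # draw a candidate parameter
        fake = simulator(theta, len(observed), rng)  # simulate a full data set
        if abs(summary(fake) - target) <= tolerance:
            accepted.append(theta)                   # approximate posterior draw
    return accepted


# Toy problem (an assumption for illustration): unknown mean, known unit variance.
def uniform_prior(rng):
    return rng.uniform(0.0, 10.0)


def gaussian_simulator(theta, n, rng):
    return [rng.gauss(theta, 1.0) for _ in range(n)]


data_rng = random.Random(1)
observed = [data_rng.gauss(3.0, 1.0) for _ in range(50)]
posterior_draws = abc_rejection(observed, uniform_prior, gaussian_simulator,
                                statistics.fmean, tolerance=0.2,
                                n_draws=5000, seed=2)
```

The accepted values approximate the posterior only as well as the summary statistic and tolerance allow; shrinking the tolerance sharpens the approximation at the cost of fewer accepted draws.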

This work is supported by NASA AISR Grant NNX09AK58G.


(CDI Mini-Seminar)
Title: The Damped Random Walk: A Statistical Description for the Time Variability of Quasars
Speaker: Chelsea MacLeod
               University of Washington, Department of Astronomy
Thursday, March 17, 2011 | 10:30 AM - 11:30 AM | 544 Campbell Hall

Abstract

The optical variability of Type I broad line quasars provides information on the physics of accretion disks.  The mathematical damped random walk model successfully describes complex quasar light curves using only three free parameters: the mean value, a driving amplitude, and a characteristic timescale. I will discuss the application of this model to SDSS Stripe 82 light curves for ~10,000 quasars. Using the knowledge gained from the Stripe 82 sample, we can understand the observed ensemble variability of ~35,000 quasars with only 2 observations in the full SDSS footprint.  I will conclude with the current challenges and further improvements to be made using future sky surveys such as LSST.
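
As a rough illustration of a three-parameter model of this kind, a damped random walk (an Ornstein-Uhlenbeck process) can be simulated exactly at irregular observation times. The sketch below is an assumption-laden example rather than the speaker's code: the function name simulate_drw is hypothetical, and the amplitude is parameterized here as the long-run standard deviation sigma, which may differ from the "driving amplitude" convention used in the talk.

```python
import math
import random


def simulate_drw(times, mean, sigma, tau, seed=None):
    """Simulate a damped random walk at the given increasing times.

    mean  : long-run mean of the light curve
    sigma : long-run standard deviation (assumed parameterization)
    tau   : characteristic damping timescale, same units as times
    """
    rng = random.Random(seed)
    values = [rng.gauss(mean, sigma)]  # start from the stationary distribution
    for t_prev, t_next in zip(times, times[1:]):
        decay = math.exp(-(t_next - t_prev) / tau)
        # Conditional on the previous value, the next one is Gaussian:
        cond_mean = mean + (values[-1] - mean) * decay
        cond_std = sigma * math.sqrt(1.0 - decay * decay)
        values.append(rng.gauss(cond_mean, cond_std))
    return values


# Irregular sampling: gaps drawn at random, as in a ground-based survey.
gap_rng = random.Random(0)
times = [0.0]
for _ in range(199):
    times.append(times[-1] + gap_rng.uniform(1.0, 30.0))
light_curve = simulate_drw(times, mean=19.5, sigma=0.2, tau=200.0, seed=1)
```

Because the update uses the exact conditional distribution, no fine time grid is needed; widely separated epochs simply become nearly independent draws around the mean.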



Title: Bayesian Inference and Nested Sampling in Astronomy

Speaker: Brendon Brewer
               UC Santa Barbara, Department of Physics
Tuesday, February 22, 2011 | 2:00PM - 3:30PM | 1011 Evans Hall

Abstract:
 
In recent years, Bayesian Inference has become the standard framework for quantifying uncertainty in astronomical conclusions. In practice, methods such as Markov Chain Monte Carlo (MCMC) are commonly used to produce samples from the posterior distribution, making marginalization straightforward. However, MCMC typically cannot compute the normalizing constant needed for model selection, and often has difficulty exploring multimodal or strongly correlated distributions. For these reasons, a new method called Nested Sampling is gaining popularity, particularly among astronomers. I will describe the basic ideas behind Nested Sampling, and then explain a modified version that significantly improves its accuracy, at no additional computational cost. Finally, I will present some applications to the fields of gravitational lens modeling and reverberation mapping of active galactic nuclei.
Slides: www


Title: Reconstructing the Truth
Speaker: Andy Connolly
               University of Washington, Department of Astronomy
Wednesday, February 9, 2011 | 12:00PM - 1:30PM | 1011 Evans Hall

Abstract
With the dramatic growth in survey astronomy, mapping the sky over a broad range of wavelengths is becoming a commonplace activity. Imaging and spectroscopic surveys will soon map large volumes of the universe in unprecedented detail. One of the challenges we now face is how we can reconstruct an underlying model for the universe or find unusual or anomalous sources from inherently noisy and incomplete data streams. In this talk I will discuss how we can approach these questions in the context of reconstructing the mass density field from gravitational lensing and, through comparisons to non-parametric models, searching for unusual sources within spectral surveys.




Title: Exoplanets, Galactic Orbits, and Multiplicity
Speaker: Jim Berger
               Duke University, Department of Statistics
Thursday, February 3, 2011 | 2:00PM - 3:30PM | 1011 Evans Hall

Abstract: Some recent work on discovery of exoplanets using Bayesian model selection methods will be discussed, including a new algorithm for computing marginal likelihoods of models. A problem involving the orbital composition of galaxies will also be considered; this becomes a statistical inverse problem with high-dimensional constraints. Finally, if time permits, some thoughts on the Bayesian approach to control of multiple hypothesis testing will be presented. 



Title: Bayesian Inference from Photometric Surveys
Speaker: Tamas Budavari
              Johns Hopkins University, Department of Astronomy
Tuesday, November 30, 2010 | 11:00AM - 12:30PM | 1011 Evans Hall

Abstract:  
With the upcoming survey telescopes just around the corner, the 
statistical and computational challenges of astronomy are more prominent 
than ever. We will discuss some of the fundamental issues that are at the
core of all photometric analyses. A powerful Bayesian approach is 
introduced for cross-identifying astronomical sources, which is 
extendable, e.g., to incorporate models of spectral energy distributions, 
to accommodate the proper motion of stars, or to match transient events in 
space and time. Probabilistic inferences open up new possibilities for 
determining properties of celestial objects based on their photometric 
observations. Constraints on photometric redshifts and other physical 
parameters in the more general inversion problem are derived from first 
principles that also point us toward the next steps. 

Slides: www


(CDI Mini-Seminar)
Title: Data-Intensive Computing in Astronomical Analysis
Speaker: Tamas Budavari
               Johns Hopkins University, Department of Astronomy
Wednesday, December 1, 2010 | Informal Discussion at CFTDI Coffee Hour

Slides: www


Title: Chi-square fitting, KS tests, MLE, and the bootstrap: Fitting astrophysical models to data
Speaker: Jogesh Babu
               Penn State University, Department of Statistics
Monday, November 22, 2010 | 11:00AM - 12:00PM | 261 Campbell Hall

Abstract: Complicated models from astrophysical theory are often fit to observational data. An example from X-ray astronomy, fitting a complicated thermal model with several temperatures to a spectrum from the Chandra X-ray Observatory, is considered here. First, the "chi-square minimization" commonly used for fitting functions often disregards mathematical assumptions. Second, the Kolmogorov-Smirnov (KS) test for goodness of fit is misused in astronomy when the model parameters are estimated from the dataset under study. Third, the KS test is inefficient at detecting deviations between the data and model at the tails of the distribution. Fourth, the KS test cannot justifiably be applied to multivariate data, as KS is no longer distribution-free. After a historical review of maximum likelihood approaches to model fitting, we show how bootstrap resampling methods, a simple Monte Carlo procedure on data, can be used to estimate the null distributions in such cases, including multivariate problems. Recent extensions of resampling methods address inference when the data are drawn from an unknown distribution which may or may not belong to a specified family of distributions. This is the "model misspecification" problem; e.g., does the X-ray spectrum arise from thermal or nonthermal processes?
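
To make the second point concrete, here is a minimal sketch of how a bootstrap can supply the null distribution of the KS statistic when the model parameters were fit to the same data. It uses a parametric bootstrap with a one-parameter exponential model purely for illustration; that model, and the function names ks_statistic, fit_exponential, and bootstrap_ks_pvalue, are assumptions of this example and not taken from the talk.

```python
import math
import random


def ks_statistic(sample, cdf):
    """Largest gap between the empirical CDF and a model CDF."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


def fit_exponential(sample):
    """Maximum likelihood rate of an exponential model."""
    return len(sample) / sum(sample)


def bootstrap_ks_pvalue(sample, n_boot=500, seed=None):
    """KS p-value with the null distribution built by parametric bootstrap."""
    rng = random.Random(seed)
    rate = fit_exponential(sample)
    observed = ks_statistic(sample, lambda x: 1.0 - math.exp(-rate * x))
    exceed = 0
    for _ in range(n_boot):
        # Simulate from the fitted model, then refit, so each replicate
        # repeats the same "estimate, then test" step applied to the data.
        fake = [rng.expovariate(rate) for _ in sample]
        fake_rate = fit_exponential(fake)
        stat = ks_statistic(fake, lambda x, r=fake_rate: 1.0 - math.exp(-r * x))
        if stat >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_boot + 1)
```

Refitting inside the loop is the essential step: comparing the observed statistic with standard KS tables instead would ignore the fact that the parameters were tuned to the data, which typically makes the test too lenient.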

Slides: www


Title: Large-Scale Prediction Problems
Speaker: Brad Efron (www)
              Stanford University Department of Statistics
Wednesday, November 10, 2010 | 11:00AM - 12:30PM | 1011 Evans Hall

Abstract: Classical prediction methods such as Fisher's linear discriminant function were designed for small-scale problems, where the number N of candidate predictors was much smaller than the number of observations n. Modern scientific devices often reverse this situation. A micro-array analysis, for example, might include n=100 subjects measured on N=10,000 genes, each of which is a potential predictor. I will discuss "Ebay", an empirical Bayes prediction algorithm designed to handle N >> n situations. It is closely related to the Shrunken Centroids algorithm of Tibshirani, Hastie, Narasimhan, and Chu.

Slides: www


(CDI Mini-Seminar)
Title: Supernova Challenge smackdown
Speaker: Dovi Poznanski, Joey Richards
               UC Berkeley, Department of Astronomy
June 10, 2010 | 3PM | 1011 Evans Hall

Abstract: A discussion about the SN challenge and the results of their separate classification entries. 
About the SN Challenge: http://arxiv.org/abs/1001.5210



Title: The New LSST Informatics and Statistical Sciences Research Team
Speaker: Kirk Borne  (www)
               George Mason University, Professor of Astrophysics and Computational Science
May 10, 2010 | 11am | 1011 Evans Hall

Abstract: The proposed Large Synoptic Survey Telescope (LSST) project (www.lsst.org) has several research collaboration teams that support science planning for future operations (2016-2026).  Most of these teams are focused on traditional astronomical research subdisciplines.  The newest research team is moving in a new direction: Data Sciences.  More details on the science that will be enabled by the LSST are available in the LSST Science Book (http://www.lsst.org/lsst/scibook).  This talk will review the LSST project and the new research team.  The Informatics and Statistical Sciences research collaboration team comprises over 30 scientists from nearly as many institutions. The focus of the group will be to inform the LSST project management team (particularly the LSST Science Working Group and the data management team) on issues regarding the uses and usability of LSST's massive 100-petabyte science data archive and 20-petabyte science catalog database for research in the focus areas of Informatics and Statistics.  Informatics refers to the data-intensive science research areas of data mining, machine learning, visualization, and data-intensive computing. These research activities will also be described within the context of the new Data-Enabled Science (DES) agenda of the NSF Directorate for Mathematical and Physical Sciences (MPS).  We will review highlights from the recent DES report and some of the DES recommendations for NSF's MPS divisions -- this report was presented to the NSF MPS Advisory Committee in April 2010.



Title: Machine Learning and Statistics on Astronomically Large Datasets

Speaker: Alex Gray  (www)
               Georgia Institute of Technology, Director of FASTlab, CS and Eng. Division College of Computing
April 19, 2010 | 11am | 1011 Evans Hall

Abstract: I'll describe algorithms and data structures that allow the most powerful machine learning and multivariate statistical methods, which often scale quadratically or even cubically with the number of data points, to be performed many orders of magnitude faster than naive implementations. Such techniques can make previously impossible statistical analyses tractable on the scale of entire astronomical sky surveys, which contain hundreds of millions of data points. I will discuss scalable algorithms we have developed for n-point correlations, friends-of-friends (aka hierarchical clustering), nearest-neighbors, kernel density estimation, nonparametric Bayes classification, local linear regression, principal component analysis, semidefinite manifold learning methods, hidden Markov models, k-means, support vector machine classifiers, and Gaussian process regression, among others. In addition to techniques inspired by computational geometry, fast multipole methods, Monte Carlo integration, and optimization theory, we employ a distributed framework which can be thought of as a higher-order analog to Google's MapReduce. Our algorithms have enabled several first-of-a-kind large-scale cosmological analyses probing fundamental questions of physics.



(CDI Mini-Seminar)
Title: Gamma-Ray Burst Forecasting: Redshift Inference From Early-Time Metrics
Speaker: Adam Morgan
               UC Berkeley, Department of Astronomy
February 17, 2010 | 11am | 1011 Evans Hall

Slides: (www)


Title: Variable Stars - What do we know?

Speaker: Steven B. Howell
               National Optical Astronomy Observatory (NOAO)              
February 8, 2010 | 11am | 1011 Evans Hall

Abstract: Variable stars have been observed in some detail for over 100 years. Many recent surveys have attempted to categorize variable objects beyond the basic division into periodic and non-periodic, with limited success. I will present an introduction to the many aspects of variable stars and where we stand today. I will then look ahead to the next generation of large, multi-color surveys and what they could do to help us understand variability.

Title: Analytics using Aster Data’s nCluster

Speaker: Jonathan Goldman  (www)
               Director of Analytics and Applications at Aster Data                
December 14, 2009 | 11am | 1011 Evans Hall

Abstract: Aster Data is a proven leader in big data management and big data analysis for data-driven applications.  Aster Data’s nCluster is the first MPP data warehouse architecture that allows applications to be fully embedded within the database engine to enable ultra-fast, deep analysis of massive data sets. Jonathan Goldman, Director of Analytics and Applications, will present the overall architecture of Aster Data and discuss the vision for embedding advanced analytics within Aster Data’s nCluster. He’ll draw on examples developed while at LinkedIn and discuss MapReduce applications that can be run on the Aster system. The goal is to make this an informal and open discussion and to learn more about the challenges facing the Real-time Classification of Massive Time-series Data Streams.

(CDI Mini-Seminar)
Title: Time series related attributes in astronomy and results from the Transients Classification Pipeline.
Speaker: Dan Starr  (www)
               UC Berkeley, Department of Astronomy             
November 7, 2009 | 11am | 1011 Evans Hall

Abstract: I'll lead a discussion about the time-series attributes currently used by the TCP, with the intent of getting insight and feedback from the group on other attributes or algorithms that may be of use.  I'll present some work and successes of other astronomers in classifying periodic variable sources, and I'll show decision trees and classification results generated by the TCP.


Title: New Analysis Methods for Event Data; with Applications to Fermi Gamma Ray Space Telescope Data.

Speaker: Jeff Scargle  (www)

               NASA Ames, Astrophysicist in the Planetary Systems Branch, Astrobiology and Space Science Division                 
October 22, 2009 | 1pm | 1011 Evans Hall

Abstract: Jeff is best known for his statistical work on periodic signatures in unevenly sampled data (e.g., the Lomb-Scargle periodogram), but he has worked on a variety of interesting statistics and computation problems involving astronomical time-series data.