Research Systems Administrator Group

2023 RSAG Meetings

June 21, 3:30-4:30, Hybrid Meeting: Room 3139A Computer Science and Zoom


John Jay Miller, Office of the Vice Chancellor for Research and Graduate Education


NSPM-33 refers to guidance issued in January 2022 by the White House Office of Science and Technology Policy to federal agencies for implementing National Security Presidential Memorandum 33. This guidance includes cybersecurity areas and protocols that may be relevant to UW-Madison’s research systems.

 

John, Interim Director of the Research Security Program at the OVCRGE, will lead a discussion of the requirements, the UW-Madison response so far, and the implications for UW-Madison’s research computing and data systems.





NSPM33_RSAG_6_21_23.pptx

April 19, 3:30-4:30, Hybrid Meeting: Room 3139A Computer Science and Zoom


Brian Lin and Aaron Moate, Center for High Throughput Computing

 

The Infrastructure Services (IS) team at the Center for High Throughput Computing manages a fabric of services that enable research computing on campus and across the nation. At this meeting, the IS team will discuss these services and what it takes to maintain them while advancing the frontiers of research cyberinfrastructure.




2023-04-19.chtc-inf-svc-rsag-1.pdf

March 15, 3:30-4:30, Hybrid Meeting: Room 3139A Computer Science and Zoom


Cory Sherman, Department of Medicine

 

Cory will talk about his work creating an integration between Kubernetes and GitLab.



February 15, 3:30-4:30, Hybrid Meeting: Room 3139A Computer Science and Zoom


Brian Wilt and Ken Hahn, Engineering

 

Brian and Ken will talk about some of the research support they're doing and their work expanding services available to research groups in the College of Engineering, including storage and data movement.


January 25, 3:30-4:30, Hybrid Meeting: Room 3139A Computer Science and Zoom


Russell Dimond and David Thompson, Social Science Computing Coop

 

Russell and David will talk about a new Slurm-based computing resource being set up with funding from the Research Core Revitalization Program. They’ll share their experience putting this new resource into production, as well as the training and documentation they’ve developed to help researchers take advantage of the new service.
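For readers unfamiliar with Slurm, a typical interaction with such a resource looks like the sketch below; the partition name, resource limits, and module name are hypothetical illustrations, not the SSCC’s actual configuration.

```bash
#!/bin/bash
# Hypothetical Slurm batch script -- partition name, limits, and module
# are illustrative, not the SSCC's actual setup.
#SBATCH --partition=sscc          # hypothetical partition name
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --mem=16G
#SBATCH --time=02:00:00
#SBATCH --job-name=example

module load R                     # assumes an environment-module setup
Rscript analysis.R
```

The script would be submitted with `sbatch example.sh` and monitored with `squeue -u $USER`.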


RSAG_SSCC_012523_RLD.pdf
RSAG_SSCC_012523_DAT.pdf

2022 RSAG Meetings

November 30, 3:30-4:30, Hybrid Meeting: Room 3139A Computer Science and Zoom

Vladimir Brik, IceCube


IceCube has recently deployed a 9.6PB (raw) Ceph cluster and migrated some of their Lustre file systems to it. Vladimir will talk about his experiences making this happen.


November 16, 3:30-4:30, Hybrid Meeting: Room 3139A Computer Science and Zoom

Suman Banerjee, Computer Science


Suman will discuss research projects and resources at the Wisconsin Wireless and NetworkinG Systems (WiNGS) lab.


September 28, 3:30-4:30, Hybrid Meeting: Room 3139A Computer Science and Zoom

Jeff Savoy, Chief Information Security Officer, Office of Cybersecurity


Jeff will share information about UW System Policy 1038 and its implications for event logging. Topics could include logging efforts at the System, campus, and departmental levels, as well as backend technologies.




Cybersec_C2E_Logging_Preso_to_RSAG.pdf

July 20, 3:30-4:30, Hybrid Meeting: Room 3139A Computer Science and Zoom

RSAG Community


This will be a very informal discussion in which we’ll share updates from each of our groups. Please feel free to briefly discuss any projects, new resources and capabilities, or even ideas you are addressing in your work.




June 15, 3:30-4:30, Hybrid Meeting: Room 3139A Computer Science and Zoom

Brian Bockelman, Center for High Throughput Computing


Brian will share updates on activities at the CHTC.



2021 RSAG Meetings

June 16, 3:30-4:30, Zoom Meeting

Brian Kyle, Wisconsin Energy Institute


Brian will present on the Great Lakes Bioenergy Research Center’s (GLBRC) Data Catalog, the one-stop shop for GLBRC data, and how he and colleagues at the Wisconsin Energy Institute host, maintain, and deploy a highly available Rails application and its supporting infrastructure. This covers a range of things in their app stack, including GitOps/IaC, app test and release processes, high availability, object storage, Ansible, logging, and the like.



June 9, 3:30-4:30, Zoom Meeting

Brian Bockelman, Morgridge Institute/Center for High Throughput Computing


Brian will discuss recent improvements in capabilities for machine learning at the Center for High Throughput Computing.



CHTC and GPULab - RSAG - June 2021.pdf

April 21, 3:30-4:30, Zoom Meeting

Richard Kunert, UW Biotechnology Center


Richard will talk about his work setting up various mechanisms for delivering data to UW Biotechnology Center customers, including ResearchDrive uploads, Globus, SFTP, and web downloads.


UWBC Data Downloads.pdf

March 17, 3:30-4:30, Zoom Meeting

Matthew Larson, Cryo-EM Center, Biochemistry


Matt will talk about the computing and storage that help support the Cryo-EM Research Center. He’ll discuss the uses of cryo-EM, detectors and generation of movie data, GPU processing requirements, his center’s Ceph storage, and use of Globus with the systems.


RSAG-Talk-MRL-3-17-2021_final.pdf

2020 RSAG Meetings

June 3, 3:30-4:30, Virtual Meeting

Michael Layde, Steve Devoti, Jeremy Sarauer, Pat Christian, DoIT

UW-Madison now has an enterprise license for Globus for data transfers between groups on campus and with external collaborators. The presenters will provide an overview of what the enterprise license includes and what’s new in Globus Connect Server version 5 that may be of interest. The enterprise license supports multiple servers/nodes at our institution, and we’d like to facilitate a discussion at this meeting about collaborations and data-sharing use cases at your research center and how Globus could support them.

Globus

May 21, 3:30-4:30, Virtual Meeting

Marc West and Kristopher Keipert, NVIDIA

This session will provide an update on new NVIDIA capabilities pertinent to high performance computing clusters, which were formally announced by the NVIDIA CEO on May 14 (see https://www.youtube.com/nvidia). Topics will include the A100 GPU and the DGX A100 server, plus some software updates: Spark 3.0, Jarvis, and Parabricks. This session will be interactive with plenty of opportunity for questions and discussion.

May 13, 3:30-4:30, Virtual Meeting

Colin Vanden Heuvel from Mechanical Engineering will talk about management of heterogeneous hardware at the Simulation Based Engineering Lab.

Management of Heterogeneous Hardware

March 25, 3:30-4:30, Virtual Meeting

David Schultz and Steve Barnet, Physics, will talk about IceCube cloud bursting in the public cloud.

IceCube Cloud Demo.pdf

January 15, 3:30-4:30, Rm 302 Middleton Building, 1305 Linden Drive (3rd floor)

Chris Harrison from Biostatistics and Medical Informatics will describe his experience organizing the UW-Madison booth at SuperComputing 19 last fall and get your ideas about how to represent UW-Madison at next year's conference.


SC19 - UW.pdf

2019 RSAG Meetings

July 24, 2019, Union South (See TITU)

Sage Weil, founder and chief architect of Ceph, will share some Ceph basics plus what's new in Nautilus and coming in the Octopus release. Bring details of your Ceph implementation to share, along with any questions you'd like to ask Sage.


2019.07.24 wisc it group.pdf

June 5, 2019

3:30-4:30, Union South, see TITU (NEW LOCATION!)

Tom Limoncelli, Site Reliability Engineering Manager, StackOverflow

Tom is the keynote speaker at the IT Professionals conference on June 6 (https://itproconf.wisc.edu). He agreed to meet with the RSAG group shortly after he arrives in Madison the day before! This will be an informal discussion that can be about any topics we like, so it will likely be completely different from his keynote the next day, which is about applying DevOps outside of software development.

Tom suggested some topics as a starter.

Tom’s background is in sys admin and he has authored several books on that and related topics (https://www.amazon.com/Thomas-A.-Limoncelli/e/B004J0QIVM%3Fref=dbs_a_mng_rwt_scns_share)

March 13, 2019

3:30-4:30, Room 302 Middleton Building, 1305 Linden Drive (3rd floor)

CaRCC (Campus Research Computing Consortium)

Lauren Michael from the Center for High Throughput Computing will talk about CaRCC (https://carcc.org), the role UW is playing, and opportunities for RSAG members and others to participate.

February 13, 2019

 3:30-4:30, Rm 302 Middleton Building, 1305 Linden Drive (3rd floor)

Storage Discussion

Several research centers on campus will share their current storage solutions and challenges:

Biotechnology Center (Richard Kunert)

IceCube (Steve Barnet) 

SSEC Ceph (Kevin Hrpcek) 

SSEC Lustre (Scott Nolin)

WEI (Dirk Norman) 

DoIT Storage (Mike Layde)


2018  RSAG Meetings

October 10, 2018

3:30-4:30, Room 302 Middleton Building, 1305 Linden Drive (3rd floor)

Dennis Lange and Mike Ippolito, DoIT Network Services

Dennis and Mike will facilitate an open discussion to collect input from RSAG on the impact and challenges that changes to the routable private address space would create for research networking.

September 26, 2018 (Rescheduled date)

3:30-4:30, Room 302 Middleton Building, 1305 Linden Drive (3rd floor)

RSAG members who visited Argonne National Laboratory last month will debrief on what they saw and learned.

August 17, 2018

A subset of RSAG members is taking a field trip to Argonne National Laboratory.

June and July, 2018 

No meetings

Wednesday, May 16, 2018

3:30-4:30, Room 302 Middleton Building, 1305 Linden Drive (3rd floor)

Chris Harrison, Biostats

Chris discussed early findings from a paper he’ll be presenting at a workshop associated with the 32nd IEEE International Parallel and Distributed Processing Symposium (IPDPS 2018) at the end of May. The topic of the paper is “atSNPInfrastructure, a case study for searching billions of records while providing significant cost savings over cloud providers.”


Atsnp Infrastructure.pdf
atsnp_newOrg.pdf

Wednesday, March 21, 2018

3:30-4:30, Room 302 Middleton Building, 1305 Linden Drive (3rd floor)

Chad Seys, Physics

Chad will talk about the HDFS (Hadoop Distributed File System) deployment at Physics, continuing the distributed file system theme we’ve been on.

 

PhysicsHDFS032118.pdf

Wednesday, February 21, 2018

3:30-4:30, Room 302 Middleton Building, 1305 Linden Drive (3rd floor)

Kevin Hrpcek, SSEC

Kevin was back to talk about SSEC's Ceph installation: how it is set up and how it works. The installation currently runs Ceph Luminous 12.2.2, sized at 5 PB with 50 million objects. Features of the system include an erasure-coded pool to save space relative to replicating objects, at the cost of roughly 25% overhead in IOPS. BlueStore is used as the storage backend, and the Ceph RADOS Block Device is integrated with Kubernetes to section off some of the data collection and analysis work they need to do on short timeframes. For automated management and monitoring, the system is set up with Puppet and Icinga with the NRPE plugin.
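The space savings from erasure coding come down to simple arithmetic; the sketch below compares classic 3x replication against a common k=4, m=2 erasure-code profile. The k/m values are illustrative, not necessarily the profile SSEC uses.

```python
# Raw-storage overhead: n-way replication vs. erasure coding.
# The k=4, m=2 profile below is a common choice, used here purely
# for illustration -- SSEC's actual EC profile is not stated above.

def replication_overhead(copies):
    """Raw bytes stored per byte of user data with n-way replication."""
    return float(copies)

def ec_overhead(k, m):
    """Raw bytes per user byte with k data chunks + m coding chunks."""
    return (k + m) / k

rep3 = replication_overhead(3)   # 3x replication -> 3.0 raw bytes per byte
ec42 = ec_overhead(4, 2)         # k=4, m=2       -> 1.5 raw bytes per byte
savings = 1 - ec42 / rep3        # EC stores half the raw data of 3x replication
print(rep3, ec42, f"{savings:.0%}")
```

The trade-off, as Kevin noted, is extra IOPS cost: every read or write touches more OSDs than a replicated pool would.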

rsag_ceph.pdf

Wednesday, January 17, 2018

3:30-4:30, Room 302 Middleton Building, 1305 Linden Drive (3rd floor)

Kevin Hrpcek, SSEC

Kevin described some unique things he’s doing with Condor, flocking, and Docker. He runs a Docker container on each of his compute hosts and allows flocking only within that container, and only when his management scripts allow it. This is all done to increase security and reduce risk to his hosts and network.

rsag_condor_flock.pdf

2017 RSAG Meetings

Wednesday, December 13, 2017

3:30-4:30, Room 302 Middleton Building, 1305 Linden Drive (3rd floor)

Heath Skarlupka, IceCube

Heath discussed his recent deployment of a Kubernetes cluster for running a containerized Elasticsearch cluster.

Wednesday, November 15, 2017

3:30-4:30, Room 302 Middleton Building, 1305 Linden Drive (3rd floor)

Tom Jordan, DoIT Middleware

Tom discussed federated identity management in the research space and how it can be used for cross-institutional research collaboration. DoIT’s middleware group has been heavily engaged with Internet2/InCommon, and Tom talked about how other research collaborations (LIGO, XSEDE, etc.) are using federated identity management, as well as the campus resources DoIT can provide to support cross-institutional collaboration, including help with federated identity management in vendor products and with configuring research Service Providers to work with InCommon and eduGAIN. He introduced COmanage, a tool for building and managing identity within cross-institutional virtual organizations.


Federated Identity for Research.pptx

Wednesday, October 18, 3:30-4:30, Rm 302 Middleton Building (1305 Linden Dr)

Globus Connect

Richard Kunert, Biotechnology Center

Richard discussed several ways the Biotechnology Center supports data downloads, including web downloads, SFTP, and Globus Connect. He also covered how the Biotechnology Center manages authentication to enable data sharing between PIs.

RSAG Oct 18 UW Biotech.pdf

Wednesday, September 20, 3:30-4:30, Rm 302 Middleton Building (1305 Linden Dr)

The National Research Platform Workshop

Pat Christian, DoIT Network Services and Jan Cheetham, CIO Office

Pat and Jan summarized the workshop on the National Research Platform (NRP) they attended last month and the group discussed the potential of the platform to benefit UW-Madison researchers who participate in data-intensive research with external collaborators. 

The NRP is a vision for connecting the science DMZs across US research institutions to enhance data transfers and enable data-driven, collaborative science. It has implications for the physical sciences as well as biomedical and genomics research.

NRPmeeting.pdf

Wednesday, August 16, 3:30-4:30, Rm 302 Middleton Building (1305 Linden Dr)

Netdata, a tool for real-time monitoring

Vladimir Brik, Virtualization Systems Administrator, WIPAC

Vlad demonstrated and discussed a real-time monitoring tool called Netdata (https://my-netdata.io/). He uses Netdata to replace real-time command-line monitoring tools like dstat. It is easy to install and use even without configuration, has thousands of metrics built in, and provides a good visualization interface. Because it samples the system every second, it is useful for analyzing performance problems that require high resolution and occur over short timeframes. Vlad first started using Netdata when he needed to diagnose an issue with a ZFS server and wanted to try its ZFS-specific metrics. Because it collects a lot of data, Netdata is not as useful for following histories over hours or weeks; instead, Vlad finds it complements tools like Nagios or Ganglia that are better suited to monitoring long-term trends.

Real-time system monitoring with netdata.pdf

Wednesday, July 19, 3:30-4:30, Rm 302 Middleton Building (1305 Linden Dr)

High Density Storage Solution for Light Sheet Imaging Microscopy

Derek Cooper and Neil Van Lysel, Morgridge Institute

Derek and Neil discussed the data storage system they designed and built for the Jan Huisken lab at the Morgridge Institute. The Huisken lab uses light sheet imaging microscopy to capture terabytes of data per day from living specimens. Neil and Derek conducted a proof of concept comparing two network-attached storage systems. The microscope setup consists of an array of up to 12 cameras, each capturing and streaming 800 Mbps of data (100 frames per second, 40 megapixels per frame) to an analysis server connected to the file server system. The two systems, Nimble and Isilon Nitro, were tested against several metrics such as file system throughput, image file creation time, and network throughput.
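The camera numbers above imply a substantial sustained ingest rate; the arithmetic below works out the aggregate bandwidth, assuming (for illustration only) that all 12 cameras stream simultaneously — real acquisitions typically run in bursts.

```python
# Aggregate ingest rate for the microscope array described above:
# 12 cameras x 800 Mbps each. Simultaneous continuous streaming is an
# assumption for illustration; actual capture runs in bursts.
cameras = 12
per_camera_mbps = 800

aggregate_gbps = cameras * per_camera_mbps / 1000    # 9.6 Gbit/s total
aggregate_gbytes_per_s = aggregate_gbps / 8          # 1.2 GB/s to disk
tb_per_hour = aggregate_gbytes_per_s * 3600 / 1000   # ~4.3 TB per hour
print(aggregate_gbps, aggregate_gbytes_per_s, round(tb_per_hour, 1))
```

Even an hour or two of full-array capture per day lands in the "terabytes per day" range quoted above, which is why file-creation latency and sustained throughput were the metrics under test.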

Huisken Lab Proof of Concept - Copy.pdf

Wednesday, April 19, 3:30-4:30 , Rm 302 Middleton Building (1305 Linden Dr)

Docker containerization for a variety of reasons


Andy Davis, Associate Scientist, Engineering Physics

Using Docker with a nuclear engineering application

Andy discussed the use of Docker with software used to model radiation transport and the geometry of complex nuclear systems. Because the applications integrate with libraries such as MOAB (Mesh-Oriented Database), HDF5, and specialized compilers, using containers to pre-package the dependencies has simplified the start-up process for new members of the research group. It has also provided a persistent testing environment and enhanced reproducibility of simulations and analysis.

Erin Grasmick and Aaron Moate, CHTC

Prototyping "docker-universe" jobs on HTCondor execute nodes 

Erin talked about using Docker to run computing jobs on resources at the Center for High Throughput Computing. CHTC has supported this for a couple of years and is currently upgrading its machines to the CentOS 7 operating system, which supports Docker much more reliably than Enterprise Linux 6. To run a CHTC job inside a Docker container, the container image must be hosted on Docker Hub and a few changes need to be made to the HTCondor submit file. Docker images for applications often used by CHTC customers, such as R and Python, already exist on Docker Hub and can be deployed on CHTC resources.
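A minimal docker-universe submit file following the pattern described above might look like the sketch below; the image name, script, and resource requests are placeholders, not CHTC's actual recommendations.

```
# Sketch of an HTCondor docker-universe submit file.
# Image, file names, and resource requests are placeholders.
universe              = docker
docker_image          = rocker/r-ver:4.3.1    # any image published on Docker Hub
executable            = run_analysis.sh
arguments             = input.csv
transfer_input_files  = run_analysis.sh, input.csv
should_transfer_files = YES
when_to_transfer_output = ON_EXIT
request_cpus          = 1
request_memory        = 2GB
request_disk          = 4GB
output                = job.out
error                 = job.err
log                   = job.log
queue
```

Compared with a vanilla-universe job, the only structural changes are the `universe = docker` line and the `docker_image` attribute; the rest of the submit file is ordinary HTCondor.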

Jesse Thompson, Technical Architect, DoIT

Examples of modernizing enterprise applications with Docker

Jesse described ways Docker is being used in campus infrastructure for cloud productivity and collaboration services like Office 365 and G Suite, to improve processes such as account provisioning, SMTP relaying, and DMARC processing for email authentication. Docker has allowed Jesse’s team, which includes student developers working across multiple platforms, to move away from the monolithic code bases that have been used to run infrastructure services toward a set of microservices that are easier to upgrade, integrate, and deploy across the team. However, the move to containerization has also introduced new complexities, including the need to experiment with new ways of integrating external data sources and challenges with orchestration.

davis_docker.pdf
docker-chtc(5).pdf
Docker.pdf

Wednesday, March 15, 3:30-4:30 , Rm 302 Middleton Building (1305 Linden Dr)

Flexible Architectures for Contemporary Data Processing

William Benton, Red Hat Emerging Technology


William will cover his work in a single application area (anomaly detection in infrastructure logs) and discuss some of the architectural lessons his team has learned as they've put machine learning techniques into production.

Wednesday, February 15, 3:30-4:30, 1360 Genetics/Biotech Center (425 Henry Mall)

Methods for managing the software stacks used by researchers at SSEC 

Jesse Stroik, Space Science and Engineering


Jesse Stroik and Scott Nolin, system administrators in the Space Science and Engineering Center (SSEC), described the software management system they created to let SSEC scientists use software they’ve built in a user-friendly environment that provides a uniform experience for every researcher and workstation at SSEC. Behind the scenes, the platform manages a large body of software versions, compilers, libraries, and operating systems: resources that individual research groups previously had to maintain for each workstation.

The system is built with Lmod (a Lua-based module system developed at the Texas Advanced Computing Center) and the RPM Package Manager for Linux. They’ve found the system easy to use with configuration management tools like Puppet, which are well suited to managing software distributed via RPM. This allowed SSEC to define a software stack in configuration management and distribute it consistently to a set of hosts. Lmod provides a consistent mechanism for users to ensure the compiler and supported software are consistently installed and loaded in their current environment.
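As a sketch of how such a module system ties a package to its toolchain, here is a minimal hypothetical Lmod modulefile; the package, paths, and compiler version are illustrative, not SSEC's actual stack.

```lua
-- Hypothetical Lmod modulefile, e.g. .../modulefiles/hdf5/1.10.5.lua.
-- Package, paths, and compiler version are illustrative only.
help([[HDF5 1.10.5 built with the site GCC toolchain]])
whatis("Name: HDF5")
whatis("Version: 1.10.5")

-- Record the compiler this build depends on; loading the module pulls
-- in the matching toolchain, giving the provenance trail described above.
depends_on("gcc/8.3.0")

local root = "/opt/sw/hdf5/1.10.5"
prepend_path("PATH",            pathJoin(root, "bin"))
prepend_path("LD_LIBRARY_PATH", pathJoin(root, "lib"))
prepend_path("CPATH",           pathJoin(root, "include"))
```

A user then runs `module load hdf5/1.10.5` and gets the library, its paths, and its compiler in one consistent step.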

One challenge they encountered was consistency in naming and versioning the software. This was mitigated by setting a naming and versioning policy for Lmod, which had the added benefits of reproducibility and quick user adoption, since it gave users less uncertainty in their own software builds. In fact, because the system records the links between a software version and its libraries, compilers, and so on, it essentially creates a provenance trail for scientists, documenting all the software, library, and compiler versions they used to generate a particular result with a given version of an application.

This system is used by SSEC scientists and their external collaborators for testing proof-of-concept software enhancements. It works well with cluster applications, since Lmod is written to manage MPI libraries. Researchers have successfully used the system to specify cluster and cron jobs in MATLAB; when a new version of MATLAB is loaded, the system recognizes that the user still needs the older version to run those jobs. The software management system underlies the S4 cluster, a new cluster for a NOAA-funded project. It also supports researchers who need older software for specific tasks, because it preserves the older operating systems and compilers required to make that software run, which is necessary for long-term reproducibility.

RSAG_021517_Stroik.pdf