BSOS Guide to Research Data and Computing


What's the difference between DIT and my departmental IT or OACS?

For most departments in BSOS, the Office of Academic Computing Services (OACS) serves as the main resource for help with computer labs and university endpoints (laptops and desktops). OACS also provides resources for research computing, including server hosting, access to the BSWIFT high-performance computing cluster, virtual machines with specialized software, web design and development for research applications, and database hosting. If you’re a member of the Department of Economics or the Department of Geographical Sciences, these services may be provided directly by your department.

The Division of Information Technology (DIT) manages the large-scale enterprise systems that the entire campus uses and benefits from, such as email and other Google products, the campus network, and many administrative systems. DIT also manages several systems that facilitate research, including data storage systems, virtual workspaces, and software licensing. DIT offers two dedicated research systems: the High Performance Computing Cluster for large-scale analysis and the Controlled Unclassified Information Environment for highly sensitive data.

Is it ok to keep research data on my laptop/desktop computer?

There is no prohibition against storing most research data on a local computer (laptop or desktop) owned by the university. All university computers are encrypted according to state mandates and require login with your university credentials. Data classified as "Restricted" under the UMD IT-2 Data Classification Standard should be stored only in the Controlled Unclassified Information Environment. University data must not be stored, copied, saved, or downloaded onto computers that the university does not own.

Storing research data on university laptops and desktops has several potential drawbacks to consider:

Thus, UMD Box is the recommended solution for most research data totaling under 5TB. Installing Box Drive on your computer lets you access data stored in Box as if it were an external or network drive. As long as you have an internet connection, you don't need to sync the data to access it, and Box is automatically backed up to prevent catastrophic loss.

I have a dataset that will require special security measures. How do I get started?

Start by reviewing the university's IT-2 Data Classification Standard to determine the risk level of your data. If it's "Restricted," it needs to be stored and analyzed in the university's Controlled Unclassified Information Environment. Most other data sets can be stored in UMD Box. However, other options are available, depending on the security and storage needs of your project. Keep in mind that storage is only one consideration for keeping data secure. See the page on Data Security for additional information.

My computer is slow/has a hard time running my job. What can I do?

This could be due to a number of causes, including:

Before you invest in new hardware, try some tweaks to your computer that might solve the problem. You could also consider using the computing resources offered by BSOS and the university.

Check how much RAM you have available on your machine relative to the size of the data you are processing. Make sure your computer's operating system and statistical software are up to date with all available patches. Consider whether you can rewrite your code to process less data (maybe fewer observations or variables) at one time or minimize the reading/writing of data from/to the hard drive.
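As a rough illustration of the first and last suggestions, here is a minimal Python sketch. The function names, the sample column `income`, and the file name `my_data.csv` are hypothetical. It estimates a lower bound on the memory a numeric table needs, assuming every value is stored as an 8-byte floating-point number (statistical software adds its own overhead), compares that with the machine's physical RAM on Linux or macOS, and computes a column mean by streaming a CSV file one row at a time instead of loading it all at once.

```python
import csv
import io
import os


def total_ram_bytes():
    """Physical RAM in bytes via POSIX sysconf (Linux/macOS); None where unsupported, e.g. Windows."""
    try:
        return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
    except (AttributeError, ValueError, OSError):
        return None


def estimated_bytes(n_rows, n_cols, bytes_per_value=8):
    """Lower-bound memory for a numeric table of 8-byte floats; real software adds overhead."""
    return n_rows * n_cols * bytes_per_value


def column_mean(lines, column):
    """Mean of one numeric column, holding only one row in memory at a time."""
    total, count = 0.0, 0
    for row in csv.DictReader(lines):
        total += float(row[column])
        count += 1
    return total / count if count else float("nan")


if __name__ == "__main__":
    needed = estimated_bytes(10_000_000, 50)  # 10 million rows x 50 columns
    ram = total_ram_bytes()
    ram_text = f"{ram / 1e9:.1f} GB" if ram else "an unknown amount"
    print(f"Data needs at least {needed / 1e9:.1f} GB; this machine has {ram_text} of RAM")

    # With a real file, pass open("my_data.csv", newline="") instead of StringIO.
    sample = io.StringIO("id,income\n1,10\n2,20\n3,30\n")
    print(column_mean(sample, "income"))  # 20.0
```

If the estimate approaches the RAM you have available, reading fewer observations or variables at a time, as suggested above, is a reasonable first step before moving to larger hardware.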

DIT offers a Virtual Workspace with many preloaded software packages, but it's limited in terms of RAM (16GB) and processors (2). The OACS Virtual Lab can create a custom virtual machine that provides access to additional computing hardware. In addition, the BSWIFT cluster is a powerful resource available to BSOS researchers. For even more power, DIT offers Zaratan, the university's high-performance computing cluster (HPCC). Keep in mind that none of these systems is intended for long-term data storage.
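Extra processors only speed up a job if your code can split the work across them. The sketch below is a minimal Python example that assumes your analysis consists of independent tasks (for example, separate simulation runs); `one_replicate` is a hypothetical placeholder for your own computation. It uses the standard library's `concurrent.futures.ProcessPoolExecutor` to run the tasks in parallel, defaulting to as many processes as the operating system reports cores.

```python
import os
from concurrent.futures import ProcessPoolExecutor


def one_replicate(seed):
    """Hypothetical stand-in for one independent task, e.g. one simulation run."""
    return sum((seed * i) % 7 for i in range(200_000))


def worker_count(requested=None):
    """Use the requested number of workers, else every core the OS reports."""
    return requested or os.cpu_count() or 1


def run_replicates(seeds, workers=None):
    """Run independent tasks in separate processes; results keep the input order."""
    with ProcessPoolExecutor(max_workers=worker_count(workers)) as pool:
        return list(pool.map(one_replicate, seeds))


if __name__ == "__main__":
    # The __main__ guard is required on platforms that start worker
    # processes by re-importing this script (e.g. Windows and macOS).
    results = run_replicates(range(16))
    print(f"Finished {len(results)} tasks using up to {worker_count()} processes")
```

On a 2-processor virtual machine this would use two processes. On a shared cluster node, `os.cpu_count()` may report every core on the node rather than the cores your job was allocated, so pass `workers` explicitly to match your allocation.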

My data set is too large for my office computer. What options do I have?

Depending on the risk level of your data and the operations you need to perform on it, several options are available for data storage. Box is a good starting place for most data and provides UMD users with 500GB of storage by default. Larger quotas (up to 5TB) are available for shared accounts. If you need more space, both DIT and OACS offer network storage. If you need more than roughly 10-15TB, it may be more economical to purchase your own server, but a server requires professional IT administration, so please contact OACS if this is the case.
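To decide which option fits, it helps to know how large your data actually is. The sketch below is a minimal Python example under stated assumptions: `folder_size_bytes` adds up file sizes under a folder, and the hypothetical `storage_tier` helper maps that total onto the thresholds described in this guide. It assumes decimal units (1TB = 1000^4 bytes), uses 10TB as the cutoff within the "~10-15TB" range, and treats any Restricted data as belonging in the Controlled Unclassified Information Environment regardless of size. Its output is a starting point for a conversation with OACS or DIT, not a substitute for one.

```python
from pathlib import Path

GB = 1000**3  # decimal gigabyte (an assumption about how quotas are counted)
TB = 1000**4  # decimal terabyte


def folder_size_bytes(folder):
    """Total size in bytes of the regular files under `folder`, skipping file symlinks."""
    total = 0
    for path in Path(folder).rglob("*"):
        if path.is_file() and not path.is_symlink():
            total += path.stat().st_size
    return total


def storage_tier(size_bytes, restricted=False):
    """Hypothetical helper mapping a dataset's size onto the options in this guide."""
    if restricted:
        # Restricted data (UMD IT-2) belongs in the CUI Environment regardless of size.
        return "Controlled Unclassified Information Environment"
    if size_bytes <= 500 * GB:
        return "Box (500GB default quota)"
    if size_bytes <= 5 * TB:
        return "Box shared account (up to 5TB)"
    if size_bytes <= 10 * TB:  # 10TB chosen from the guide's ~10-15TB range
        return "DIT or OACS network storage"
    return "Network storage or a dedicated server; contact OACS"
```

For example, `print(storage_tier(folder_size_bytes("my_project_data")))` reports a suggestion for a hypothetical folder named `my_project_data`.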

Large datasets often require significant processing power as well as significant storage, so keep in mind which system(s) will need to access the data. BSWIFT and Zaratan are powerful clusters, but they do not provide long-term data storage.

I want to learn R/SAS/Python/Stata. Where do I start learning to program?

Outside of for-credit classes, the university supports several resources for learning these tools, including:

In addition, you may find these external resources helpful:

You will need to install the relevant program(s) on your computer or get access to the software through a hosted environment in order to complete the exercises. The university's Virtual Workspace provides free access to R/RStudio, Python/Jupyter Notebooks, SAS, Stata, and many other programs.

Often the best way to learn a new tool is to use it for a project you're interested in. Struggling a bit to figure out the syntax and organization for your own situation can keep you motivated and help you learn and retain a programming language better than predetermined exercises on an example dataset. So don't feel that you need to complete an entire module or course before getting started with your own data. Google searches and ChatGPT can help you figure out much of the code you need once you know which questions to ask.

For more details about software acquisition and programming best practices, please see the Software and Programming page.

Am I required to share data used in my research project? How do I do that?

It depends on your funder's requirements. Federal agencies increasingly require that data collected with public money be made publicly available. Sharing data (and code!) contributes to the scientific enterprise by expanding the store of available information and facilitating the reproducibility of research results. So even if you aren't required to share, it is generally good practice, subject to privacy considerations and as long as your grant or contract terms don't prohibit it. Visit the Open Science page for more information and resources related to this growing movement.

What's the difference between Google Drive and Box? Which should I use?

Google Drive and Box are both cloud-based storage options supported by the university. For research data, Box is generally the better solution because it is rated for more sensitive data, offers a higher quota, and allows finer-grained permission controls. Please see this article in DIT's Knowledge Base comparing Google Drive and Box features for more information.

I heard I have to report conflicts of interest and consulting work. How do I do that?

Please see the BSOS Research Administration team's webpage on conflicts of interest. You can also contact Rebecca Hunsaker, Executive Director of Research Administration in BSOS.