Tools for Shareable Protein Analysis: ProteinCodeathon@ISMB2022

Ravi Abrol, Philippe Youkharibache, Jiyao Wang, Allissa Dillman, Alexa Salsbury, Tom Madej

The two main goals of this hybrid Codeathon are:

· To develop open-source software modules and collaboration tools for universal biomolecular analyses that enable the integration of diverse knowledge and datasets.

· To create a sustainable, diverse community of developers and designers for the continued development of such tools.

Preamble

During the last two hackathons (Hackathon at ISMB2020 and Hackathon at ISMB2021), we started a series of developments toward the in-depth and systematic analysis of molecular interactions, effects of mutations, protein flexibility, and large datasets, as well as the annotation of topological domains of membrane protein receptors. We also opened up iCn3D to evolve toward an open platform that interoperates with external data and software. A feature unique to iCn3D is the ability to share data analysis and structural views through a simple web link (example).

We will continue these developments at this year’s codeathon, with a focus on the biochemical analysis of proteins, including mutations; large sequence/structure datasets (AlphaFold, UniProt, PDB, etc.); and the iCn3D user interface.

Potential Team Projects

We propose a list of potential projects spanning the themes listed below. Each team will have a leader, 2-5 additional developers, and a subject-matter expert interested in the topic. Anyone with the passion and knowledge to lead a project that fits one of the themes below can propose one by filling out the Project Pitch form or by contacting Ravi Abrol (abrol@csun.edu), provided they can recruit team members from among the registered participants and/or invite new participants.

Theme 1. ANALYSIS OF LARGE DATASETS

Theme 2. ANALYSIS OF BIOMOLECULAR INTERACTIONS AND MUTATIONS

Theme 3. BIOMOLECULAR REPRESENTATIONS AND VISUALIZATION

Theme 4. USER INTERFACES AND DATA SHARING MECHANISMS

Registration and fees

Registration link

Applications are due by 9 July 2022.

There is no registration fee for the Codeathon@ISMB2022 beyond your normal ISMB2022 registration fees. You must be registered for ISMB2022 to participate in this codeathon in person.

Who can participate and what skills are needed?

As the codeathon will be conducted in a hybrid format, we will have two types of teams: all-in-person teams and all-virtual teams. Each team will be composed of individuals with different backgrounds and skills. We know from experience that blending a diverse set of skills is what forms a good team. We welcome hard-core programmers as well as scientists with little programming experience, as long as they have knowledge of the scientific application side. Some participants may bring both programming and scientific expertise and some may bring only one, but the outcome of any project is a team effort.

We encourage researchers working in all areas of computational biology to join us in developing the above-mentioned tools. We will mainly write code in Python or JavaScript, but coding experience in any language, such as C/C++, Java, Perl, Julia, or Rust, and/or a working knowledge of MySQL or the XML and JSON formats should be good enough, especially for algorithmic, database/backend, or middleware development. Experience with deploying web-based tools will also be very useful. Our code will be deposited on GitHub, so some experience with Git/GitHub is helpful. What matters most is to bring new ideas and develop prototype functionality, not a final product. We are also seeking wet-lab researchers working on all aspects of protein sequence, structure, and function to join the codeathon teams and provide scientific input reflecting the needs of the community.

Outcome

Participants will contribute to prototyping new open-source software functionality, learn new skills, and get a chance to become part of a growing community. One of the objectives of this codeathon is to develop tools and publish their development and/or application in a peer-reviewed publication in Frontiers in Genetics (Computational Genomics), Frontiers in Molecular Biosciences (Structural Biology), or Frontiers in Bioinformatics (Integrative Bioinformatics) within 3-6 months after the codeathon. Your contributions will be recognized through co-authorship of the submitted manuscripts. In addition, we would like to sustain this community of developers between codeathons through dedicated communication channels, as we hope that you will continue to help develop these tools.

Getting Started with iCn3D

1. PROGRAM:

a. iCn3D publications: You can familiarize yourself with iCn3D through our recent publication and preprint.

b. Try iCn3D now!

c. iCn3D@GitHub

2. TUTORIALS and DOCUMENTATION

a. iCn3D Tutorial Part 1 (the basics): Video, Slides

b. iCn3D Tutorial Part 2 (molecular interactions, structural comparisons, etc): Video, Slides

c. Full documentation (Help pages)

Technical and Project Support

Each team will have a Slack subchannel for communication among team members and with the team leader. Each participant will also be part of a Technical Support subchannel, which should be used for technical questions related to GitHub, Slack, compute nodes, etc., and which will be monitored continuously by the codeathon’s tech-support team.

We look forward to seeing you at the codeathon!

- Ravi, Philippe, Jiyao, Allissa, Alexa, Tom