Sessions

Session II - Team Organization

When

August 10th 3-4pm PT

Topic

Team Organization — Relationship between data engineering and data science

Topic Lead

Greg Rokita

Participants

John Carnahan, Ian Small, Cindi Thompson, Mikhail Lyukmanov, Paddy Hannon (guest), Greg Rokita

Members Excused

Ali Ghodsi, Rahul Pathak, Mark Madsen, Chris Wensel

Administration

Discussed raising the member count to 12. Participants tentatively agreed.

Based on historical data, the participation rate for similar events is ~50%. For DataESC we expect participation of 50-70%: out of 12 members, that equals roughly 6-8 attendees, which is manageable on a conference call.

The vote will take place during the September session.

If the proposal passes, the council will vote on new member nominations.

Topic Discussion Notes

One organizational challenge arises when independent units within an organization need to coordinate on similar problems. The units cannot be integrated horizontally, since they often have separate architectures and rules. If the output of an experiment is useful in one unit, it often makes sense to productionize the model in all units, but the model first has to be proven to fit the specifics of each unit. For this to work, squads (tiger teams, pods, project teams) need to be built horizontally, while vertical career ladders for data engineers and data scientists are kept separate from the squads to provide reporting structure and career growth tracks. That is often the only way to move quickly.

Aside from the fact that units may not agree with each other, another challenge is persuading "domain experts" that a data experiment can produce better results than their intuition. Big data algorithms vs. domain expertise is an interesting dynamic, and persuading domain experts often takes more time than the experimentation itself.

Each squad's metrics are determined by business outcomes in some organizations and by production software outcomes in others. A balance of business-centric squads and platform squads seems to make sense if there are needs for both.

A good data scientist will be able to turn the KPI goal assigned to the team into a model that keeps the team aligned without a large investment in documentation.

The ideal ratio seems to be 2 data engineers, 1 data scientist, and 1 analyst/product owner. Team sizes can vary from 4 to 8 depending on the challenge (2 engineers is good, but if you can afford it, 3 is even better).

In some cases a small team finds a breakthrough, and to put the model into production the team grows to as many as 15-20 people.

Such a team model progresses as follows:

A small team explores the data; the engagement is experimental in nature
The team proves a case and justifies a bigger effort with additional spending
A bigger team is built, more data is made available, and full-fledged productionization happens

BI plays a support role for the teams.

Ad hoc questions from executives (use case):

Priority of requests is determined by the seniority of the person asking the questions.
Data scientists burn out faster on those teams than on squads. Why?
If a question turns out to be useless, a squad stops immediately and moves on to the next task.
If an executive asks, the ad hoc team continues regardless.
Because of the fast burnout, it is important to rotate people from such teams to squads.
Rule of thumb: you lose data scientists if you turn them into mediocre analysts.

What squads are good for:

Turning KPIs into models
The squad model works especially well with data science objectives
Motivating members
Getting stuff done
Providing horizontal learning across disciplines

What squads don’t do:

Give individuals with a particular skill set someone else to learn from
Enforce a vertical capability mindset. Capabilities should be reused across squads. Therefore, it is important to:
- Develop a strong culture within the reporting structure of vertical teams.
- Hold meetings, meetups, and other opportunities to grow vertical skill sets.
- Incorporate a Q&A cycle into projects, where data scientists or data engineers answer questions.
- Organize lunches, events, and special projects with homework activities, plus side projects that eventually spin off into squad teams.
- Keep the skill community alive.

Other challenges:

Alienating a person with a particular skill set and eventually losing them: a data engineer forced to do non-data-engineering work, or a data scientist forced to do more data engineering work than they want to.
Some data scientists don't want to touch any code.
Long-running projects: subject matter expertise becomes tribal and people become possessive.
- Rotation is necessary to avoid territoriality.
- Rotation is part of the normal flow of business.
Job descriptions get more and more specific. In that case, one option is to manage skills so that everyone can acquire new skills, rather than creating numerous title tracks.

Opportunities:

Often, if a data scientist does not feel there is room for growth, she or he will acquire data engineering skills. People who can do both are usually more valuable because they are more effective engineers and more effective data scientists.
Allow side work to happen within squads: if a data engineer wants to do data science, or vice versa, let them.

Agenda

Final amendments, if any, and votes on the Charter, Statute, and Logistics
Topic discussion

Team Organization — Relationship between data engineering and data science

Questions to jump start the discussion:

-How to balance data science experimentation and productionization roles, e.g., who productionizes the data science pipelines?

-How to organize teams given established portfolios vs. constantly changing new initiatives?

-Do product owners and analysts participate in this process?

-How to balance specific end-user product vs. technology platform needs?

Topic Lead selection for Session III
