August 10th 3-4pm PT
Team Organization — Relationship between data engineering and data science
Greg Rokita
John Carnahan, Ian Small, Cindi Thompson, Mikhail Lyukmanov, Paddy Hannon (guest), Greg Rokita
Ali Ghodsi, Rahul Pathak, Mark Madsen, Chris Wensel
Discussed raising the member count to 12; participants tentatively agreed.
Based on historical data, the participation rate for similar events is ~50%. For DataESC we expect participation of 50-70%; out of 12 members, that is roughly 6-8 attendees (50% of 12 is 6, 70% is 8.4), which is manageable on a conference call.
The vote will take place during the September session.
If the motion passes, the council will vote on new member nominations.
One organizational challenge arises when independent units within an organization face similar problems and need to coordinate. The units cannot be integrated horizontally, since they often have separate architectures and rules. If the output of an experiment is useful in one unit, it often makes sense to productionize the model in all units, but the model first has to be proven to fit the specifics of each particular unit. For this to work, squads (tiger teams, pods, project teams) need to be built horizontally, while vertical career ladders for data engineers and data scientists are maintained separately from the squads to provide the reporting structure and career growth tracks. That is often the only way to move quickly.
Beyond the fact that units may disagree with each other, another challenge is persuading “domain experts” that a data experiment can produce a better result than their intuition. The interplay between big data algorithms and domain expertise is an interesting dynamic, and persuading domain experts is often more time-consuming than the experimentation itself.
In some organizations each squad's metrics are determined by business outcomes; in others, by production software outcomes. A balance of business-centric squads and platform squads seems to make sense if there are needs for both.
A good data scientist will be able to turn the KPI goal assigned to the team into a model that keeps the team aligned without a large investment in documentation.
The ideal ratio seems to be 2 data engineers, 1 data scientist, and 1 analyst/product owner. Team sizes can vary from 4 to 8 depending on the challenge (2 engineers is good, but if you can afford it, 3 is even better).
In some cases a small team makes a breakthrough, and to put the model into production the team grows to as many as 15-20 people.
Such a team model progresses as follows:
BI plays a support role for the teams.
Ad hoc questions from executives (use case):
What squads are good for:
What squads don’t do:
Other challenges:
Opportunities:
Team Organization — Relationship between data engineering and data science
Questions to jump-start the discussion:
-How to balance data science experimentation and productionization roles, e.g., who productionizes the data science pipelines?
-How to organize teams given established portfolios vs. constantly changing new initiatives?
-Do product owners and analysts participate in this process?
-How to balance the needs of specific end-user products vs. technology platforms?