The Data Feed
Welcome to our blog! We'll post periodic updates here covering the latest on projects, new initiatives, research, and more!
Connect with us: data@kippteamandfamily.org / Slack #data_comms
Exploration of Grade 3 - 8 students scoring Below and Far Below on New Jersey Student Learning Assessments from 2022 to 2024
Kevin Verhoff | December 2025
Every educator knows the challenge: many of our students enter the classroom ready to go and thriving, while others struggle. They often start behind, and year after year they fail to catch up. Instead of making the years of growth they need, they fall further behind, and the gap widens. We know that for many kids, school is a place where they are chronically below where they need to be.
If promises to our kids are sacred, then finding ways to support our struggling learners is essential. It’s some of the hardest work in schools, and teachers and school staff are already doing the heavy lifting every day. The data team wants to bring their knowledge and skills to the table in order to support school staff in this vital work.
With that spirit, we recently partnered informally with former Rise DSO and current PhD candidate Pasha Zandieh (he says Hi to all his Rise friends, btw!) to dig deeper into why some of our students stay behind year after year.
We knew there wouldn’t be one single solution. Each struggling student is unique, and there’s no one simple model that can capture the full complexity of a child’s life.
But we also knew that the data could help to reveal some patterns. We can identify places where our assumptions need to be challenged, where we could spot counter-intuitive trends, or where we could identify areas where small early interventions could be implemented. This blog post reveals some of the very initial findings from our investigation.
Working with student data always involves making strategic choices about which data to include and what to prioritize. To get the cleanest, most reliable picture possible, we narrowed our efforts to grades 3-8 NJSLA for now, because it covers a critical age range and because we have a lot of good data going back to the 2022-23 school year.
Using only students with full sets of records, this window gave us 12,304 unique student-subject-years from 2,058 individual students (two subjects per student across three years). While this is a good dataset to work with, there are some real trade-offs we had to make:
Using only “complete cases” rules out many students with missing data. Sometimes that missing data is an important data point in itself, but it is tricky to handle and introduces a lot of noise.
Focusing on 3-8 means we may miss important trends affecting our lower elementary and high school students.
Relying on New Jersey assessment data ignores the unique characteristics of our Miami students.
We want to emphasize that this is meant to be an initial analysis - not a final answer. Our goal at this stage is to move quickly, generate hypotheses, and identify promising patterns to explore more. As we refine this work, we expect to expand our dataset and address these limitations more fully.
Below, you can use this interactive visualization to follow the general “flows” of the students in our dataset over the course of the three year window:
The data confirms a well-known reality: our special education students are statistically more likely to perform Below or Far Below on state assessments. The analysis shows:
77.4% of students with an IEP score Below/Far Below
36.7% of students without IEPs score Below/Far Below
What might be surprising is that students with IEPs only make up 20% of the total group of students performing Below or Far Below. This means that four out of five students performing in the lowest academic tiers do not have an IEP.
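The arithmetic behind that "four out of five" figure is worth making concrete. Here is a back-of-the-envelope sketch; the ~10.6% IEP enrollment share used below is an illustrative assumption that makes the reported rates line up, not a figure from our data:

```python
# Back-of-the-envelope check: given the two rates above, what share of all
# students would need IEPs for IEP students to be ~20% of the Below/Far
# Below group? (Illustrative only; the real share comes from enrollment data.)

def iep_share_of_low_performers(iep_rate_low, non_iep_rate_low, iep_pop_share):
    """Fraction of the Below/Far Below group that has an IEP."""
    iep_low = iep_rate_low * iep_pop_share
    non_iep_low = non_iep_rate_low * (1 - iep_pop_share)
    return iep_low / (iep_low + non_iep_low)

# If roughly 10-11% of students have IEPs, the numbers are consistent:
share = iep_share_of_low_performers(0.774, 0.367, 0.106)
print(round(share, 3))  # 0.2
```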
This reframes the challenge. To be sure, we want to support our special education students, but the biggest impact may come from strengthening our early warning systems and Tier 1 supports, which benefit students with and without IEPs.
We also found a meaningful connection between teacher performance and student outcomes. Holding all other factors constant:
A teacher who is one tier higher on our performance rating system corresponds to a 2 to 7.5 percentage point decrease in the probability that a student will perform Below or Far Below at the end of the year.
This doesn’t prove causation, but it does suggest a meaningful link between teacher performance management tiers and overall student achievement. It reinforces something we already know: strong teaching changes student trajectories and supporting, developing, and retaining great teachers matters for each and every one of our students.
Because we want to reach students early, our analysis focused on early warning indicators. As much as possible, we used data points from previous school years and from early in the current school year. One factor that stood out as a clear leading indicator was Quarter 1 GPA.
We found that students who eventually perform Below/Far Below had an average Q1 GPA 0.70–0.74 points lower than their peers. The difference is statistically significant. There is still overlap between groups—GPA alone can’t predict outcomes, but when paired with other variables, it can help flag students who may need support sooner rather than later.
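As a rough illustration of how a gap like this could feed into a flagging rule, here is a minimal sketch. The function name and the use of the 0.70-point gap as a cutoff are hypothetical choices for illustration, not our actual early warning logic; the reported gap is a difference in group averages, and a real threshold would be tuned against school data:

```python
# Minimal sketch of a Q1 GPA early-warning flag. The 0.70-point value is an
# average gap between groups, used here as a placeholder threshold only.
from statistics import mean

def flag_low_q1_gpa(q1_gpas, gap=0.70):
    """Flag students whose Q1 GPA is at least `gap` below the cohort mean."""
    cohort_mean = mean(q1_gpas.values())
    return {sid for sid, gpa in q1_gpas.items() if gpa <= cohort_mean - gap}

gpas = {"s1": 3.6, "s2": 3.2, "s3": 2.1, "s4": 3.4}
print(flag_low_q1_gpa(gpas))  # {'s3'}
```

In practice a flag like this would be one input among several, combined with attendance, prior assessment scores, and other variables.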
Perhaps the biggest mindset shift came from moving beyond single indicators to a more holistic lens. Instead of taking each factor one at a time, we used a clustering algorithm (k-means clustering, for those who really want to dig in) to analyze all 28 variables at once. The plot to the right is a two-dimensional compression of those 28 factors, but you can see the distinct clusters showing up.
The results were intriguing. Instead of a monolithic "at-risk" group, the analysis identified at least five distinct clusters of students, each with a unique combination of characteristics.
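For readers who want a feel for the technique, here is a toy sketch of clustering standardized features with k-means (a common choice of clustering algorithm). The data is synthetic and the implementation is deliberately minimal; it is not our analysis pipeline:

```python
# Toy sketch of clustering "student profiles": standardize features, then run
# k-means (Lloyd's algorithm). The real analysis used 28 variables; here we
# use 2 synthetic ones so the example needs nothing beyond NumPy.
import numpy as np

def kmeans(X, centers, iters=50):
    """Lloyd's algorithm: alternate nearest-center assignment and center updates."""
    centers = centers.astype(float).copy()
    labels = np.zeros(len(X), dtype=int)
    for _ in range(iters):
        # Assign each point to its nearest center (squared Euclidean distance).
        labels = np.argmin(((X[:, None, :] - centers) ** 2).sum(axis=2), axis=1)
        # Move each center to the mean of its assigned points.
        for j in range(len(centers)):
            if (labels == j).any():
                centers[j] = X[labels == j].mean(axis=0)
    return labels, centers

# Two well-separated synthetic groups of 50 "students" each.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.5, (50, 2)), rng.normal(5, 0.5, (50, 2))])
X = (X - X.mean(axis=0)) / X.std(axis=0)  # standardize mixed-scale features

# Simple deterministic initialization: opposite corners of the data.
labels, _ = kmeans(X, np.stack([X.min(axis=0), X.max(axis=0)]))
print(np.bincount(labels))  # [50 50]
```

Standardizing first matters because attendance rates, GPAs, and binary flags live on very different scales, and k-means is distance-based.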
Here are some examples:
High-attendance middle schoolers struggling with math (Cluster 3): These students show up to school consistently (higher than average Q1 attendance) and don't have IEPs, yet they struggle specifically in 4th-8th grade math.
Low-attendance third and fourth graders (Cluster 4): For this group of younger students, low attendance in Quarter 1 is a primary characteristic, pointing to an actionable barrier to their success.
Vulnerable and underserved students (Cluster 2): This group includes students who are newer to our schools and may be MLL, homeless, or have IEPs. Critically, the data also shows they are less likely to have a Tier 4 teacher.
This shift from a single label to multiple profiles is transformative. It moves the conversation away from a one-size-fits-all approach and toward analyzing our student data at scale. By understanding the specific needs of each profile, schools can design and deploy tailored support strategies that address the root cause of a student's academic challenges.
These initial findings make one thing clear: to truly understand and address student struggles, we need to embrace the complex, nuanced reality. An IEP is not the only thing about a student that matters, a teacher’s PM score is not the only driver, and there is no single profile of an “at-risk” student.
For those of us who work with students, this is all common sense, but now we are beginning to see it in our data, and that presents an opportunity.
As we move forward, we are excited to dig more deeply:
We’re partnering more closely with Teaching & Learning to gather richer descriptive student information.
We’re seeking input from school teams—if you’d like to share your perspective, please fill out the form linked below.
We hope to pilot a predictive model in the 26–27 school year, which we can evaluate and refine.
Long term, we want to expand our partnership with Pasha. We think that we could bring new, more nuanced approaches to modeling student experiences and outcomes.
Ultimately, the goal is simple: To better understand who is struggling, why they are struggling, and how we can reach them earlier and more effectively.
This deeper understanding will help us improve our systems and strengthen our promise to every student we serve. Please share your reactions to this data here and let us know if you'd like to learn more.
A fundamental re-thinking of how all users interact with data, and how we level up in the age of AI
Cristina Baldor | December 2025
As part of strategic planning efforts for 2030, we are digging into our central question: what can data support look like in the next five years? One of our big bets is to fundamentally rethink the way all users interact with data, and we’re leveraging some really exciting new tools to make it happen.
Right now, our users interact with data in a few ways that have limitations when it comes to reliability and flexibility. Luckily, we’re in the midst of an explosion in tools that solve these kinds of problems: semantic layers, self-service business intelligence, and privately hosted chatbots.
A semantic layer solves the reliability problems above: it is a dictionary of both our organizational context and the calculations we use to transform raw data into the visualizations on spreadsheets and dashboards. Instead of the formulas we use to calculate metrics living within the cells of several Google Sheets or Tableau workbooks, they are stored and maintained in one central place.
Definitions in a central semantic layer make it possible for both humans and computers to know that "ADA" at our organization means Average Daily Attendance, and you calculate it by taking the sum of the days a student was present and dividing that by the days they were enrolled at school. Instead of starting from scratch with those sums, you have a pre-calculated metric called "ADA" accessible to you in a spreadsheet or dashboarding tool. You also don’t have to import new rows every time you need an update. It will refresh on the same schedule that data is imported to our data warehouse.
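The ADA definition above translates directly into code. Here is a minimal sketch with hypothetical field names; the point of a semantic layer is that this one formula would live in a single central definition instead of being re-derived in every sheet:

```python
# ADA as defined above: sum of days present divided by days enrolled.
# Field names are hypothetical; the formula itself is the post's definition.

def average_daily_attendance(records):
    """records: list of dicts with per-student 'days_present' and 'days_enrolled'."""
    present = sum(r["days_present"] for r in records)
    enrolled = sum(r["days_enrolled"] for r in records)
    return present / enrolled if enrolled else 0.0

students = [
    {"days_present": 170, "days_enrolled": 180},
    {"days_present": 90, "days_enrolled": 100},  # mid-year enrollee
]
print(round(average_daily_attendance(students), 3))  # 0.929
```

Note that dividing by days enrolled (not total school days) is exactly the kind of detail a central definition protects: the mid-year enrollee above is counted only against the days they could have attended.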
Source: Enterprise Knowledge
The semantic layer is a great organizational and governance tool, but the most exciting part is that it cracks open the ability to provide new tools we think users will love.
With the calculation and the context in one place, you don’t have to figure out which kids and days to include, build the formula, and then try to slice by school or grade with row filters or multiple tabs. You just drag over ADA from a list of metrics and then filter by school, grade level, IEP status, or anything else we use to slice and dice our data today.
The most powerful aspect for ensuring reliability and trust in the data is that we're all using the same calculation. We're not accidentally including days that the student wasn't enrolled, for example, or not up to date on a new policy mandating that something has to be calculated in a different way.
Self-service business intelligence basically means dashboards that you design yourself.
Once we have all our metrics centrally defined, making them available in a new kind of dashboarding tool means that trained users (not just the Data Team) can drag and drop metrics and slices (grades, schools, IEP status, anything we use as a filter today) to build the exact tool they need for their use case.
Semantic layers also let us leverage all the exciting work happening in the world of AI by implementing a data chatbot. Since our chatbot would only have access to our data and definitions, it can’t hallucinate weird answers based on data from other schools, states, or dental associations.
Since it has access to live data, it also means that you can log in and ask “What was the ADA for all of TEAM today, by grade?” and get the answer as of this morning.
If you can’t tell, we’re really excited! Right now we’re engaged in product selection and pilots for the semantic layer and dashboarding tools, and the mammoth task of documenting all of our metrics. Every number that you see displayed on a dashboard right now has to be meticulously defined and engineered in the semantic layer, but once we do that the possibilities are practically endless.
Really want to nerd out? Check out this explainer from Data Camp on semantic layers and/or leave us some feedback about what you’d like data tools to look like in the future.
2025 August
We get this question a lot: “where do I find this dashboard?” We hear you! Over the summer we completely redesigned the Data Portal to include curated lists of dashboards according to your role. You will also find help guides, blog posts and more!
Bookmark these links and relax knowing you’ll always have a full list of the latest version of your dashboards:
Teachers and Coaches: kippteamandfamily.org/data/teachers
School Leadership: kippteamandfamily.org/data/leaders
Operations: kippteamandfamily.org/data/ops
Regional and CMO: kippteamandfamily.org/data/region
We also organized where you can find key resources such as:
2025 July
Greetings from the Data team! If you ever wanted to know who does what, check out the Our Team page for more info.
Here is a quick view into our different roles:
And, here is a picture of Data team getting fancy at the Newark TEAMspys this past June!
2025 July
One of our major projects this year is to launch the first-ever org-wide goal tracking dashboard. The Topline Goals Dashboard will be a multi-year project, but this summer we are focusing on building "Version 1," which will track Academic, Ops and Talent goals across Heads of School, SL, DSO and MDSO roles.
This project is very complex: we need to organize org-wide metrics (such as ADA or State Assessment Results) and combine them into one reporting tool that allows us to 1) roll up results across org, region and school levels, 2) track results against goals, and 3) track all of this data week over week. The benefit of all this work is that it will enable all teammates to know where to find these key results and monitor progress in one place.
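To make the rollup idea concrete, here is a small sketch of aggregating the same weekly metric rows at school, region, or org level. The school names, numbers, and function are illustrative, not the dashboard's actual implementation:

```python
# Sketch of the rollup idea behind a goals dashboard: one set of weekly
# metric rows aggregates up from school to region to org. Rows and values
# are made up for illustration.
from collections import defaultdict

rows = [
    # (region, school, metric, value_this_week, goal)
    ("Newark", "TEAM", "ADA", 0.94, 0.95),
    ("Newark", "Rise", "ADA", 0.96, 0.95),
    ("Miami", "Sunrise", "ADA", 0.93, 0.95),
]

def rollup(rows, level):
    """Average the metric at 'school', 'region', or 'org' level."""
    groups = defaultdict(list)
    for region, school, metric, value, goal in rows:
        key = {"school": (region, school), "region": region, "org": "org"}[level]
        groups[key].append(value)
    return {k: round(sum(v) / len(v), 3) for k, v in groups.items()}

print(rollup(rows, "region"))  # {'Newark': 0.95, 'Miami': 0.93}
```

A real implementation would weight by enrollment rather than averaging schools equally, and would compare each rollup against its goal; this sketch only shows the grouping structure.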
We are excited about next steps and will be sharing more in September!
2024 May
Check out this post from Dagster (a vendor we use to manage how data moves across different systems and platforms).
In this interview, you can learn how Charlie Bini, our Data Engineer, built out our new data warehouse and improved how data can be managed across the organization.
🔗 Case Study: KIPP - Building a Resilient Data Platform with Dagster