From a Software Engineer at Google who develops programs that let app designers know how users are using their apps.
Asking Questions or Defining Problems:
The product team decides that we want to add a feature to our software development kit (SDK). We then take the product requirements document (PRD) and turn it into a design document (DD). We estimate what the results of this feature will be (increased load on servers, roughly what the collected data will look like).
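To make that estimation step concrete, here is a minimal back-of-envelope sketch of the kind of load estimate a design document might include. Every number here (user count, event rate, payload size) is an illustrative assumption, not a real figure from the product.

```python
# Rough, hypothetical estimate of the extra server load a new SDK feature
# could add; all numbers are made-up assumptions for illustration.
daily_active_users = 5_000_000
events_per_user_per_day = 3          # how often the new feature fires per user
payload_bytes = 400                  # approximate size of each new event

extra_qps = daily_active_users * events_per_user_per_day / 86_400
extra_mb_per_day = daily_active_users * events_per_user_per_day * payload_bytes / 1e6
print(f"~{extra_qps:.0f} extra queries/sec, ~{extra_mb_per_day:.0f} MB/day of new data")
```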
We examine the logs and determine whether an issue exists with the data being collected (e.g. is it in the format we expect? Is the data within reasonable parameters?). We look at the data and determine whether it falls outside of those bounds.
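A minimal sketch of that kind of log check is below. The field names, expected types, and bounds are hypothetical stand-ins for whatever schema the real SDK defines.

```python
# Sketch of validating collected records against an expected format and bounds.
# EXPECTED_FIELDS and BOUNDS are illustrative, not the actual SDK schema.
EXPECTED_FIELDS = {"event_name": str, "timestamp_ms": int, "payload_bytes": int}
BOUNDS = {"payload_bytes": (0, 1_000_000)}  # assumed "reasonable parameters"

def validate_record(record):
    """Return a list of human-readable problems found in one log record."""
    problems = []
    for field, expected_type in EXPECTED_FIELDS.items():
        if field not in record:
            problems.append(f"missing field: {field}")
        elif not isinstance(record[field], expected_type):
            problems.append(f"{field} has type {type(record[field]).__name__}, "
                            f"expected {expected_type.__name__}")
    for field, (lo, hi) in BOUNDS.items():
        value = record.get(field)
        if isinstance(value, int) and not (lo <= value <= hi):
            problems.append(f"{field}={value} is outside [{lo}, {hi}]")
    return problems

print(validate_record({"event_name": "tap", "timestamp_ms": "oops", "payload_bytes": 2_000_000}))
```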
Developing and Using Models:
We build a prototype of the feature that we want to implement. This is normally tested using a test program that interfaces with a known system to see what the preliminary data would look like.
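As a rough illustration, a throwaway test driver for such a prototype might look like the sketch below: feed the prototype a known sequence of inputs and inspect what the reported data would look like. The collector class and inputs are hypothetical.

```python
# Hypothetical test driver: exercise a prototype collector with known input
# and print the data it would report.
from collections import Counter

class PrototypeUsageCollector:
    """Stand-in for the SDK feature being prototyped (not real SDK code)."""
    def __init__(self):
        self.counts = Counter()

    def record(self, screen):
        self.counts[screen] += 1

    def snapshot(self):
        return dict(self.counts)

if __name__ == "__main__":
    collector = PrototypeUsageCollector()
    for screen in ["home", "settings", "home", "checkout"]:  # known input sequence
        collector.record(screen)
    print(collector.snapshot())  # {'home': 2, 'settings': 1, 'checkout': 1}
```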
Planning and Carrying Out Investigations and Analyzing and Interpreting Data:
When making a change to the product we're working on, the change is put behind a flag (i.e. a switch that controls whether it is executed or not), and A/B tests are run with and without the feature to analyze its impact. Subsets of users are placed in control and experimental groups, a series of metrics is collected (e.g. crashes, latency, traffic), and the change is rolled out to everyone if the important metrics stay within acceptable bounds (e.g. traffic went up 10%, which is fine, and crashes/latency stayed neutral).
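A minimal sketch of those two pieces, flag-based bucketing and an acceptance check on the collected metrics, is below. The flag name, metric names, and thresholds are invented for illustration and are not the real experiment infrastructure.

```python
# Sketch of gating a change behind a flag and comparing group metrics.
# Flag names, metrics, and thresholds are illustrative assumptions.
import hashlib

FLAG_NEW_FEATURE = "new_sdk_feature_enabled"

def is_enabled(flag, user_id, rollout_fraction=0.5):
    """Deterministically bucket a user into the experimental group."""
    digest = hashlib.md5(f"{flag}:{user_id}".encode()).hexdigest()
    return int(digest, 16) % 1000 / 1000 < rollout_fraction

def acceptable(control, experiment):
    """Accept if traffic grows modestly and crashes/latency stay roughly neutral."""
    traffic_delta = (experiment["traffic"] - control["traffic"]) / control["traffic"]
    crash_delta = experiment["crash_rate"] - control["crash_rate"]
    latency_delta = experiment["latency_ms"] - control["latency_ms"]
    return traffic_delta <= 0.15 and crash_delta <= 0.001 and latency_delta <= 5

control = {"traffic": 1000, "crash_rate": 0.002, "latency_ms": 120}
experiment = {"traffic": 1100, "crash_rate": 0.002, "latency_ms": 121}
print(is_enabled(FLAG_NEW_FEATURE, user_id=42), acceptable(control, experiment))
```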
Additionally, any changes to the code will have unit tests (tests run on small portions of code) and integration tests (tests run end to end across the whole product), which report success or failure with context. Failures are analyzed for log statements that indicate where in the code the process went awry, and sometimes developers step through the code and inspect variables to see where things went wrong.
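The unit-test half of that looks roughly like the sketch below, using Python's standard unittest module. The helper being tested is a hypothetical example, not code from the actual SDK.

```python
# Illustrative unit test for a small, hypothetical helper function.
import unittest

def batch_events(events, max_batch_size):
    """Split a list of events into batches no larger than max_batch_size."""
    if max_batch_size <= 0:
        raise ValueError("max_batch_size must be positive")
    return [events[i:i + max_batch_size] for i in range(0, len(events), max_batch_size)]

class BatchEventsTest(unittest.TestCase):
    def test_splits_into_even_batches(self):
        self.assertEqual(batch_events([1, 2, 3, 4], 2), [[1, 2], [3, 4]])

    def test_last_batch_may_be_smaller(self):
        self.assertEqual(batch_events([1, 2, 3], 2), [[1, 2], [3]])

    def test_rejects_nonpositive_batch_size(self):
        with self.assertRaises(ValueError):
            batch_events([1], 0)

if __name__ == "__main__":
    unittest.main()
```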
Using Mathematics and Computational Thinking:
The A/B tests mentioned above spit out values and ranges for those values (e.g. the experimental group may show a -0.1% to +0.5% change in some metric, with the ability to adjust the confidence interval). Additionally, the data sent may have to be queried to take a look at problematic datasets and form theories about where things went wrong.
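As a rough sketch of where a range like "-0.1% to +0.5%" comes from, the snippet below computes a percent change with a normal-approximation confidence interval from two samples. The z-values and sample data are illustrative; the real experiment tooling is assumed to do something more sophisticated.

```python
# Sketch: percent change in a metric's mean with a confidence interval,
# using a simple normal approximation. Data and z-scores are illustrative.
import math
import statistics

Z = {0.90: 1.645, 0.95: 1.960, 0.99: 2.576}

def percent_change_ci(control, experiment, confidence=0.95):
    """Return (point_estimate, low, high) as percent change versus control."""
    mc, me = statistics.mean(control), statistics.mean(experiment)
    se = math.sqrt(statistics.variance(control) / len(control)
                   + statistics.variance(experiment) / len(experiment))
    delta = me - mc
    z = Z[confidence]
    return tuple(100 * d / mc for d in (delta, delta - z * se, delta + z * se))

control_latency = [118, 122, 119, 121, 120, 123]
experiment_latency = [119, 123, 120, 122, 121, 124]
print(percent_change_ci(control_latency, experiment_latency))
```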
Designing Solutions:
Based on the data, we can say that the feature increased traffic by 10%. Was this because the feature caused 10% more use? If we look more closely, traffic specifically related to the new feature amounted to 20% of the control group's traffic, so a purely additive feature should have increased overall traffic by 20%. Looking at the data sent for other parts of the product, usage of another feature was down by half of that increase. The new feature likely cannibalized that existing traffic.
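Worked through with made-up baseline numbers, that reasoning looks like this:

```python
# Worked version of the cannibalization argument above; numbers are hypothetical.
control_traffic = 1000          # requests/day in the control group
new_feature_traffic = 200       # new-feature traffic equals 20% of control traffic
observed_total = 1100           # but overall traffic only rose 10%

expected_total = control_traffic + new_feature_traffic   # 1200 if purely additive
shortfall = expected_total - observed_total              # 100 requests/day missing
print(f"Likely cannibalized traffic: {shortfall} requests/day "
      f"({100 * shortfall / control_traffic:.0f}% of baseline)")
```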
Engaging in Argument from Evidence:
Answered in the previous practices.
Obtaining, Evaluating, and Communicating Information:
As the new feature/bugfix is rolled out to more and more users, snapshots are taken at each stage to see the change in the data, and likely explanations for that change are documented by the engineers rolling out the feature. If the data matches expectations and is within bounds, the feature is rolled out to 100% of users. This doesn't mean that something negative couldn't be hidden in the data (or not collected at all) and force a rollback later, but it gives us reasonable confidence that the change isn't catastrophic. If the data isn't within acceptable bounds, steps to stabilize the feature are documented. If the data suggests the feature is fundamentally flawed, it is rolled back.
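A minimal sketch of that staged-rollout loop is below. The stage percentages and the callback names (take_snapshot, within_bounds) are hypothetical placeholders for whatever the real release tooling provides.

```python
# Hypothetical staged rollout: widen the audience only while the metric
# snapshot at each stage stays within acceptable bounds.
ROLLOUT_STAGES = [0.01, 0.05, 0.20, 0.50, 1.00]  # fraction of users, illustrative

def run_rollout(take_snapshot, within_bounds, log=print):
    """take_snapshot(stage) -> metrics dict; within_bounds(metrics) -> bool."""
    for stage in ROLLOUT_STAGES:
        metrics = take_snapshot(stage)
        log(f"stage {stage:.0%}: {metrics}")
        if not within_bounds(metrics):
            log(f"metrics out of bounds at {stage:.0%}; pausing or rolling back")
            return False
    return True  # reached 100% of users

# Example usage with fake callbacks:
run_rollout(lambda stage: {"crash_rate": 0.002 + 0.001 * stage},
            lambda m: m["crash_rate"] < 0.0025)
```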