I take a holistic approach to Data Analysis that starts by referring back to a checklist I wrote for myself. Having this checklist helps make sure nothing is missed.
Clean Data is Reliable Data. Validation steps in the data input process help prevent issues upfront. Learnings from Data Analysis projects can help improve processes to keep data clean as organizations grow. During the Process phase of an analysis, dirty data is cleaned so that it can be analyzed and used to make data-driven business decisions.
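As a minimal sketch of what a validation step at data input could look like, the function below checks one incoming record before it is stored. The field names `email` and `signup_date` and the rules applied to them are assumptions made up for illustration, not part of any specific project.

```python
from datetime import date, datetime


def validate_record(record: dict) -> list:
    """Return a list of problems found in one incoming record (empty if clean)."""
    problems = []

    # Hypothetical rule: an email must at least contain an "@".
    email = str(record.get("email") or "").strip()
    if "@" not in email:
        problems.append("email is missing or malformed")

    # Hypothetical rule: signup_date must parse as year-month-day
    # and must not lie in the future.
    raw_date = str(record.get("signup_date") or "")
    try:
        parsed = datetime.strptime(raw_date, "%Y-%m-%d").date()
    except ValueError:
        problems.append("signup_date is not a valid year-month-day date")
    else:
        if parsed > date.today():
            problems.append("signup_date is in the future")

    return problems
```

Rejecting or flagging a record when this list is non-empty keeps the problem at the point of entry, where it is cheapest to fix.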
If you can’t find it, you can’t use it. If it isn’t stored safely, it can be compromised. If you don’t collect data appropriately, you can introduce bias or even misrepresent to the data owner what’s happening with their data. To prevent these problems and others, there are a few key things I pay attention to:
Ethical communication to data owners, transparency, and openness when collecting data.
Data privacy and data retention policies, and user-based permissioning for data access.
Consistent file naming conventions, and file folder best practices for storing.
Data structure, eliminating redundant data, and well-planned metadata for databases.
Documentation, Documentation, and more Documentation for clarity to stakeholders.
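One way to keep a file naming convention consistent is to check names automatically. The sketch below assumes a hypothetical convention of `<project>_<description>_<YYYYMMDD>_v<NN>.<ext>` in lowercase; the pattern and the function name `follows_convention` are illustrative only, and a real team would substitute its own rules.

```python
import re

# Hypothetical convention: <project>_<description>_<YYYYMMDD>_v<NN>.<ext>
# Lowercase letters and digits only; hyphens are allowed inside the description.
FILE_NAME_PATTERN = re.compile(r"[a-z0-9]+_[a-z0-9-]+_\d{8}_v\d{2}\.[a-z0-9]+")


def follows_convention(file_name: str) -> bool:
    """Return True if the whole file name matches the assumed convention."""
    return FILE_NAME_PATTERN.fullmatch(file_name) is not None
```

Under this assumed pattern, a name like `sales_q1-report_20240131_v02.csv` passes, while `Final Report (2).xlsx` does not.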
Following all steps in the Data Analysis process is important to make sure the analysis is complete, consistent, and thorough. I created this checklist as part of my Data Analysis course, and it has helped me categorize my knowledge based on my experience and see where I excel and where I can still grow.
Ask questions of stakeholders and collaborate to define goals of the project.
Who, what, why, where, when, and how will we achieve the expected results?
Prepare the project timeline, data needed, data gathering plan, and deliverables.
How are we going to get the data we need to achieve the objectives?
When will we accomplish the goals? What reports will we deliver?
Process the data from raw to clean so that it’s ready for analysis
Remove duplicate data. Confirm date range of relevant data. Review data consistency.
Confirm completeness of data. Remove incorrect or inaccurate data.
Recommend process changes to improve cleanliness of source data
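The Process steps above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the rows are dictionaries with hypothetical `order_id`, `order_date`, and `amount` fields, a negative amount is treated as incorrect, and the first row seen for an `order_id` is the one kept.

```python
from datetime import date


def clean_rows(rows, start, end, required=("order_id", "order_date", "amount")):
    """Return (clean, rejected_count) for a list of dict rows."""
    seen_ids = set()
    clean = []
    rejected = 0
    for row in rows:
        # Completeness: every required field must be present and non-empty.
        if any(row.get(field) in (None, "") for field in required):
            rejected += 1
            continue
        # Date range: keep only rows inside the period being analyzed.
        if not (start <= row["order_date"] <= end):
            rejected += 1
            continue
        # Accuracy: a negative amount counts as incorrect (an assumption).
        if row["amount"] < 0:
            rejected += 1
            continue
        # Duplicates: keep the first row seen for each order_id.
        if row["order_id"] in seen_ids:
            rejected += 1
            continue
        seen_ids.add(row["order_id"])
        clean.append(row)
    return clean, rejected
```

The rejected count is worth keeping: a high number is concrete evidence to bring to stakeholders when recommending changes to how the source data is collected.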
Analyze the data, document the findings, and confirm completeness of the findings compared to the project goals.
Be sure to ask WHY at least 5 times to get to the root cause
Share the results of the analysis according to the defined deliverables
Provide reports and analysis to stakeholders
Refine based on stakeholder feedback
Act on the results of the analysis by collaborating with stakeholders
Make use of the analysis to drive decisions
Reflect on the project and what I’ve learned to improve my processes going forward