Looking for Social Finance Fund resources? Start here.

KnowledgeResources

Impact measurement, Pathfinder Pilot   Articles

Steps for reviewing your ETL pipeline

If you work for a social purpose organization (SPO), you know that nearly every SPO, no matter its mission and activities, deals with diverse types of data. From dollars in quarterly funding to the number of participants in a mentorship program or pounds of produce from a community farm, collecting, analyzing, and reporting this data are necessary when measuring your organization’s impact. These processes can be time-consuming and full of hurdles. In our work with the Pathfinder Pilot, we’ve learned about an additional, often unaddressed step that is the cause of many of these hurdles. Identifying it might be all you need to do to clear your path.

The missing step

The additional essential step lies between data collection and analysis. It looks different across organizations but has a common name: the ETL (extract, transform, load) pipeline. It may sound technical, but knowing its proper name allows us to find solutions and information on how to simplify it.

We’ve observed that what organizations find most difficult about impact measurement isn’t data collection, data analysis or data reporting—it’s the ETL pipeline that lives in between. The discovery that the ETL pipeline is the root cause of many of the hurdles organizations are experiencing was a key a-ha moment from the Pathfinder Pilot. We have learned that when an SPO takes the time to review and refine their ETL pipeline, they can uncover major benefits—benefits that can make the lives of data managers that much easier.

Why review & refine?

  • Reviewing the current process can show opportunities for improvement
  • Understanding the process will make it easier to address future issues
  • Documenting the process will help if any new team members or external contractors need to work with your data

Overall, reviewing the ETL pipeline helps data managers see exactly how they’re getting their data to a useful state and assess whether it’s useful enough. The question then becomes, how do you go about reviewing and refining? We’ve created a list of suggested steps to make the most of an ETL pipeline review, based on our process with a Pathfinder Pilot participant.

The review process

1. Focus on the “why,” not just the “how”
Don’t just look at the steps you take to get the data. Instead, ask why you are doing each step. For example, a staff member might be manually performing a complex lookup (the “how”) to determine a customer’s discount level (the “why”). Understanding the underlying goal may guide you towards a more efficient solution, such as calculating the discount directly from transaction data.

2. Look for the biggest pain points
Pay close attention to what you and/or your team find most frustrating. Think of steps that feel like they take hours, or elements that always break. Pain points like a data export that always takes ages or a manual lookup that often fails are the most valuable areas to automate because fixing them will have the greatest impact on your time and energy.

3. Differentiate between data sources
Recognize that different data sources have different levels of complexity. For instance, data from a source like an API might provide clean, well-structured data, while another might require complex manual exports and calculations.

4. Account for exceptions
A reliable pipeline must be built to handle not only the typical data flow, but also the exceptions! A process is defined as much by the 5% of the time it fails as it is by the 95% of the time it works. Investigate what causes failures, like a new volunteer not showing up in a mailing list, and if there is a pattern to when and why they happen. Then, look at how the issues are currently being resolved. If you’re resolving them manually, see if there’s an automation that can be used to handle these exceptions.

5. Document everything
Be sure to document throughout your review, including the existing ETL process, key findings, decisions, and any action items that result from your review. Save this document, and create a summary to ensure clarity across the team & future staff working with organizational data.

 

Ultimately, taking the time to review your ETL pipeline is an investment in your organization’s future. By following these best practices, you can move your data from a merely usable state to a truly efficient and reliable one. Review and documentation will help ensure that your organization’s data can consistently provide the valuable insights needed to further its impact goals.