Impact evaluation

An impact evaluation can help you establish whether your program is achieving the outcomes it is aiming for. Impact evaluations can also help you understand what makes your program work: Which elements of the program drive outcomes? Are there specific groups of people for whom the program works (better)? Does the program work in new contexts?

Impact evaluations determine program effectiveness by asking the question: what would have happened in the absence of my program? They allow you to conclude with reasonable confidence that your program caused the outcomes you are seeing. Although we can never observe what happens for the same set of people both with and without a program, we can use different techniques of generating a comparison group to construct what likely would have happened – and this process allows us to attribute causality to your program. 

There are many ways of conducting an impact evaluation, based on how you estimate what would likely have happened in the absence of your program. Click here to learn more about the different ways of conducting an impact evaluation, from randomized controlled trials to quasi-experimental approaches.

Example

Say you conduct an impact evaluation of a vocational training program. Six months after finishing the program, a majority of participants were employed. When you compare this group to a similar group of people who did not receive the program, however, you realize that employment increased by a similar amount in both groups. The impact evaluation thus revealed that your program did not cause improved employment outcomes. As you look for other explanatory factors, you realize your program coincided with economic growth and job creation – which could have caused the positive employment outcomes you are seeing.
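
The arithmetic behind this example can be sketched in a few lines of Python. The employment rates below are invented purely for illustration (the example above gives no figures), and the function name is hypothetical. The sketch subtracts the change in the comparison group from the change among participants, a calculation often called difference-in-differences.

```python
def estimated_impact(participants_before, participants_after,
                     comparison_before, comparison_after):
    """Change among participants minus change in the comparison group."""
    participant_change = participants_after - participants_before
    comparison_change = comparison_after - comparison_before
    return participant_change - comparison_change


# Hypothetical employment rates (share employed), not taken from any real study.
impact = estimated_impact(
    participants_before=0.30, participants_after=0.55,
    comparison_before=0.31, comparison_after=0.55,
)
print(f"Naive before/after change among participants: {0.55 - 0.30:+.2f}")
print(f"Change net of the comparison group: {impact:+.2f}")
```

With these made-up numbers, the large before/after change among participants almost disappears once the comparison group's change is subtracted, which is the pattern the example describes.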

Which questions can an impact evaluation answer?

An impact evaluation reveals whether your program is achieving the desired outcomes. Evidence of high impact usually indicates that your program should be scaled up. If your impact evaluation shows that your program does not have any impact or has lower impact than expected, you can use this information to refine your program. For example, if you find that your vocational training program does not cause improved job outcomes, you would want to consider redesigning and retesting the program, or possibly terminating the program altogether. 

Impact evaluations can be used in two ways: to create knowledge about what works, and to inform specific decisions about your program. For example, you may conduct an impact evaluation of your vocational training program to understand whether the program leads to long-term full-time employment. You could do this to generally understand the impact of vocational programs, or specifically to inform your decision of whether to scale the training program to another state.

You can also use an impact evaluation to provide accountability to donors and secure additional funding. Showing that your program works would help you demonstrate to your donors that the funding they provided led to positive impact. It would also provide credibility for you to secure additional funding to expand your program.


However, your program may not be ready for an impact evaluation. Using another method can give you evidence that your program is moving in the right direction. Use the decision guide to find the right method for your program.

Impact evaluations can be used to compare the impact of different programs and determine which is the most effective. This can be useful in situations where you need to figure out if a more intensive version of a program is worth the extra expense, or when you’ve had several successful pilot programs and need to choose between them. You can vary a particular component of your program and compare outcomes across versions to determine which one is most effective at achieving your goal (this method is often referred to as A/B testing). Such impact evaluations can help you make decisions about program elements in a short timeframe.

Conducting a needs assessment when your program is already up and running can help you assess whether you are reaching the people you need to. Continuing from a previous example on parental perceptions affecting school attendance, let’s say you observe after a couple of years that attendance in early grades has improved dramatically, but many older children are dropping out between primary and secondary school. You realize that the needs of your target population have shifted, and shift the focus of your program to retention and remedial education, rather than convincing parents to send their children to school in the first place.

Next steps - Conducting an impact evaluation

Conducting an impact evaluation involves the following steps:

  1. Developing an evaluation design that is feasible given the context of your program. You need to identify a comparison group that doesn’t receive the program, and is unlikely to accidentally avail of the benefits of the program
  2. Offering the program to a set of people distinct from the comparison group
  3. Allowing enough time to pass for the outcomes you expect to manifest
  4. Collecting data on outcomes. This will most probably be through a survey, although it is possible to use administrative data
  5. Comparing outcomes across the comparison group and the group that received the program to reveal the effect of the program
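
The final step above can be sketched as a simple difference in average outcomes. This is only an illustration: the survey responses are hypothetical, the function name is our own, and the interval uses a rough normal approximation. A real evaluation would use a far larger sample and an analysis plan agreed with your evaluator.

```python
from math import sqrt
from statistics import mean, variance


def difference_in_means(program_outcomes, comparison_outcomes):
    """Return (estimated effect, standard error) for two independent groups."""
    effect = mean(program_outcomes) - mean(comparison_outcomes)
    # Standard error of a difference between two independent sample means.
    se = sqrt(variance(program_outcomes) / len(program_outcomes)
              + variance(comparison_outcomes) / len(comparison_outcomes))
    return effect, se


# Hypothetical survey data: 1 = employed six months later, 0 = not employed.
program = [1, 1, 0, 1, 1, 0, 1, 1, 1, 0]
comparison = [1, 0, 0, 1, 0, 1, 0, 1, 0, 0]

effect, se = difference_in_means(program, comparison)
low, high = effect - 1.96 * se, effect + 1.96 * se
print(f"Estimated effect: {effect:.2f} (approx. 95% interval {low:.2f} to {high:.2f})")
```

Notice that with only ten people per group the interval is wide and includes zero, which is one reason impact evaluations generally need large samples.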


Should you do the impact evaluation yourself or hire an expert?

Under most circumstances, organizations opt to work with an external evaluator for an impact evaluation. The main considerations for deciding whether to do the impact evaluation yourself or work with an expert are:

Conducting an impact evaluation requires technical capacity to develop a research design and data collection tools, and the ability to perform statistical analyses. It is also critical to ensure unbiased data collection. Most organizations cannot guarantee unbiased data collection or don’t have the capacity to conduct such a complicated analysis in-house, and usually opt to work with an external organization to conduct all or part of the evaluation. Should you choose to collect data internally, you should attempt to minimize response bias by hiring external enumerators, especially if the evaluation is asking sensitive personal questions.

If the purpose of your impact evaluation is accountability to external constituents like funders or taxpayers, you almost certainly want an independent evaluator (regardless of your internal capacity) so that the evaluation has sufficient credibility.

If you are working with an external evaluator, it is a good idea to familiarize yourself with key concepts and terms in impact evaluations, as listed in the Guide to Impact Evaluation on page 16 of this Impact Evaluability Toolkit from J-PAL and CLEAR South Asia.

How to conduct your impact evaluation

The other methods covered in the Impact Measurement Guide can help you identify problems, check whether you’re on the right path, and build a base of evidence that your program is likely having the positive impact you set out to achieve. Only an impact evaluation, however, can provide rigorous evidence of the size of that impact, and how much your program is responsible for it. While data on community needs and program performance is important and useful, there are situations when an impact evaluation is absolutely the right tool for the job. For example, evidence of impact may help you feel confident enough to expand your program’s reach, decide between different versions of a program, or unlock more funding.

These are all high-stakes decisions – and impact evaluations often feel like high-stakes activities! They are certainly a bigger investment of time and resources than many of the other methods covered here. Therefore, it is critical to be clear on the purpose of the impact evaluation before forging ahead. Which decisions will you make on the basis of the impact evaluation?

Then, you need to make sure your program is well-implemented. If the program has significant implementation flaws, an impact evaluation is likely to provide disappointing, confusing results. You won’t be sure whether the program’s theory of change just doesn’t hold up, or whether the program could have been successful with better execution.

Once you are confident you need an impact evaluation and that your program is well-run, you should also consider what to measure and how you will measure it. All these decisions will affect the evaluation design. Even if you are working with an external evaluator, you should plan to be involved in these conversations as they are driven by your program’s context.


We suggest you work through the following checklist with your team and the evaluator. This will help you design an impact evaluation that answers your questions while also keeping your constraints in mind:

Which decisions will your organization make on the basis of an impact evaluation? How will you use the results? Which people do the results need to convince?

The answers to these questions will help you define the research question for the impact evaluation. This exercise will also help you identify the level of rigor you need and the timeline you are bound to. You may also learn that an impact evaluation is not suited to answering your question – for example, it is difficult to evaluate systems-wide change, or a program that is very small.

We also recommend going through the decision guide to make sure an impact evaluation is what you truly need.

Knowing your program is run well will make your impact evaluation more credible. For example, if the impact evaluation reveals that your program has low impact, you can have greater confidence that you need to update the theory of change, rather than make superficial changes to improve implementation.

You can check the implementation of your program through a monitoring system or a process evaluation:

  • A monitoring system collects real-time data about your program’s performance. You can use data from your monitoring system to know whether your program is performing as expected.
  • A process evaluation is a one-time exercise comparing implementation to the steps laid out in your theory of change and implementation plan. You can conduct a process evaluation before or alongside an impact evaluation.
  1. Conducting a process evaluation beforehand enables you to identify and fix issues before your impact evaluation. This way, you will know that your impact evaluation is measuring the impact of a well-run program.
  2. Conducting a process evaluation alongside an impact evaluation allows you to identify explanations for the results you find – whether the results you see are because of implementation processes (process evaluation) or because assumptions for success do or do not hold (impact evaluation).

When designing an impact evaluation, it’s important to be thoughtful about whether to measure outcomes or outputs in your theory of change. In most cases, the outputs delivered by your program, such as number of trainings delivered or bed nets distributed, don’t represent real changes in outcomes, such as increased earnings – you would likely need to measure the outcomes.  

However, it may be sufficient to measure outputs in cases where it is well-established that the output causes the outcome (e.g. vaccination programs), or where the program goal is related to outputs and not outcomes. For example, if your program aims to get more children to attend preschool, the impact evaluation would measure the effectiveness of the program at increasing attendance in preschool (output), and not the impact of preschool on children’s learning levels (final outcome).

The “unit” is the level at which you assign groups for treatment or comparison. This can be individuals, schools, villages, etc. The unit you choose will affect the level at which you can measure outcomes – if you would like to measure the impact of a teacher training program on school rankings, you would need to choose schools as your unit, not individuals.

The choice of a unit depends on the indicator you want to measure, the possibility of people from your comparison group being exposed to your program (“spillovers”), the likelihood of participants dropping out of the sample, statistical power, and operational feasibility. For more information, please see Running Randomized Evaluations, Module 4.2: Choosing the Level of Randomization.
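
To make the idea of a “unit” concrete, here is a minimal Python sketch in which schools, not individual teachers, are assigned to groups. The teacher and school names are hypothetical, and `assign_by_unit` is an illustrative helper rather than a standard tool. The point is that everyone within a unit shares the same status.

```python
import random


def assign_by_unit(units, seed=0):
    """Randomly split a list of units (e.g. schools) into two groups."""
    rng = random.Random(seed)  # fixed seed so the assignment can be reproduced
    shuffled = list(units)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {unit: ("treatment" if i < half else "comparison")
            for i, unit in enumerate(shuffled)}


# Hypothetical teachers, each belonging to one school.
teachers = {
    "Asha": "School A", "Bilal": "School A",
    "Chen": "School B", "Dara": "School B",
    "Esi": "School C", "Farid": "School D",
}

school_status = assign_by_unit(sorted(set(teachers.values())))
# Every teacher inherits the status of their school, so colleagues are never split.
teacher_status = {name: school_status[school] for name, school in teachers.items()}
print(teacher_status)
```

Because colleagues in the same school always end up in the same group, a trained teacher cannot pass the program on to an untrained colleague in the comparison group, which reduces one source of spillovers.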

While randomized controlled trials are the gold standard for impact evaluations, it is not always possible to randomize, for logistical, budgetary, or ethical reasons. You can consider a nonrandomized, “quasi-experimental” impact evaluation in such cases. Have a look at our guidance on types of impact evaluations to decide whether you can randomize access to your program.

The unit at which your program is delivered (to individuals, to schools or hospitals, or to entire regions at a time) is one of the determinants of the number of participants (“sample size”) for your study. Impact evaluations generally require a large number of units, and the exact number required will depend on the evaluation method you’re using and the size of the impact you expect to find. Picking up subtle changes (such as a small boost in test scores) requires a bigger sample than detecting large changes (such as mortality rates falling by half). The sample size will have major implications for the cost of the impact evaluation.

For more information on sample size calculations, please refer to J-PAL’s guidance on conducting power calculations.
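
As a rough illustration of why subtle changes need bigger samples, the sketch below uses a common textbook approximation for comparing the averages of two equal-sized groups when individuals are assigned one by one. The function name and default values (a 5% significance level and 80% power) are our assumptions, and the effect is expressed in standard deviations of the outcome. Designs that assign whole schools or villages typically need more units than this, so treat it as a starting point for a conversation with your evaluator, not a substitute for a full power calculation.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_group(effect_in_sd, alpha=0.05, power=0.80):
    """Approximate number of individuals needed in each of two equal groups."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided test
    z_power = NormalDist().inv_cdf(power)
    return ceil(2 * (z_alpha + z_power) ** 2 / effect_in_sd ** 2)


for effect in (0.5, 0.2, 0.1):
    print(f"Effect of {effect} SD: about {sample_size_per_group(effect)} per group")
```

Halving the effect you want to detect roughly quadruples the sample you need, which is why the expected size of the impact has such a large bearing on cost.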

You will either collect data yourself or use data that’s already being collected for operational reasons. We call the latter “administrative data.” Generally, using administrative data will save you time and money. However, it is important to be mindful of the quality and availability of the data, and of what information it captures, when using this source. If administrative data is available for your study, have a look at J-PAL’s guidance on using administrative data.

The timeline of the impact evaluation should leave enough time for outcomes to manifest. For example, if your program works to increase learning levels, it’s unlikely you’ll see a change in test scores until a few months later – and your data collection timelines should account for this.

You can potentially shorten the evaluation timeframe by measuring intermediate outcomes, in cases where there’s a well-established link between the intermediate and final outcome. For example, if you are evaluating the impact of polio vaccinations, you can measure polio vaccine take-up rather than incidence of disease because it is already well-established that polio vaccinations prevent polio.

The timeline of the impact evaluation will also depend on the purpose of the impact evaluation – you may need to shorten the evaluation timeframe if you need the results soon in order to make a major program decision.

Guide

Not sure where to go from here? 
Use our guide to frame a question and match it to the right method.

Types of impact evaluations

Impact evaluations compare the people who receive your program to a similar group of people who did not receive your program. Based on how this similar group is chosen, impact evaluations can be randomized controlled trials or quasi-experimental.

A randomized controlled trial (RCT) is considered the gold standard of impact evaluation. In an RCT, the program is randomly assigned among a target population. Those who receive the program are called the treatment group; those who do not receive the program are called the control group. We consider the outcomes from the control group to represent what would have happened to the treatment group in the absence of the program. By comparing outcomes among those who receive the program (the treatment group) to those who don’t (the control group), we can rigorously estimate the impact of the program. The random assignment of the program to the treatment and control group provides the rigor, as it ensures that the selection of people is not based on biased criteria that could affect the results.
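
The sketch below illustrates random assignment with a made-up population. The ages are generated at random and the seed is arbitrary, so nothing here comes from a real study. It shows that once people are assigned by chance, the two groups tend to look alike on characteristics such as age, which is what allows the control group to stand in for the treatment group.

```python
import random
from statistics import mean

rng = random.Random(42)  # fixed seed so the illustration is reproducible

# Hypothetical target population: 1,000 people aged between 18 and 60.
population = [{"id": i, "age": rng.randint(18, 60)} for i in range(1000)]

# Random assignment: shuffle, then offer the program to the first half.
rng.shuffle(population)
treatment, control = population[:500], population[500:]

age_gap = mean(p["age"] for p in treatment) - mean(p["age"] for p in control)
print(f"Difference in average age between groups: {age_gap:.2f} years")
```

In practice you would run the same kind of check on several baseline characteristics before the program starts.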

When a randomized design is not feasible, there are other, “quasi-experimental,” ways of constructing a valid comparison group. 

  • In a matched design we would match individuals who receive the program to individuals who don’t receive the program based on some observable characteristics (such as age, gender, number of years of schooling, etc.), and compare outcomes across these groups. 
  • Another common technique is regression discontinuity design, in which you create a cutoff based on which individuals are eligible to receive the program, and then compare outcomes from groups just below and just above the cutoff to estimate impact. 
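
The two techniques above can be sketched in simplified form. Both functions and all the data are hypothetical illustrations. The matching sketch uses exact matches on the listed characteristics, and the discontinuity sketch simply compares averages in a narrow window around the cutoff, whereas real analyses usually rely on more careful statistical models.

```python
from collections import defaultdict
from statistics import mean


def matched_effect(people, match_on):
    """Average treated-minus-untreated outcome gap within matched cells."""
    cells = defaultdict(lambda: {True: [], False: []})
    for person in people:
        key = tuple(person[field] for field in match_on)
        cells[key][person["treated"]].append(person["outcome"])
    gaps, weights = [], []
    for groups in cells.values():
        if groups[True] and groups[False]:  # keep cells containing both groups
            gaps.append(mean(groups[True]) - mean(groups[False]))
            weights.append(len(groups[True]))
    if not weights:
        raise ValueError("no treated person has an exact match")
    return sum(g * w for g, w in zip(gaps, weights)) / sum(weights)


def discontinuity_effect(people, cutoff, bandwidth):
    """Mean outcome just above the cutoff minus mean outcome just below it."""
    above = [p["outcome"] for p in people if cutoff <= p["score"] < cutoff + bandwidth]
    below = [p["outcome"] for p in people if cutoff - bandwidth <= p["score"] < cutoff]
    return mean(above) - mean(below)


# Hypothetical trainees and non-trainees; outcome = monthly earnings.
people = [
    {"gender": "F", "schooling": 10, "treated": True, "outcome": 120},
    {"gender": "F", "schooling": 10, "treated": False, "outcome": 100},
    {"gender": "M", "schooling": 8, "treated": True, "outcome": 90},
    {"gender": "M", "schooling": 8, "treated": True, "outcome": 110},
    {"gender": "M", "schooling": 8, "treated": False, "outcome": 90},
    {"gender": "M", "schooling": 12, "treated": True, "outcome": 150},  # no match
]
matched = matched_effect(people, match_on=("gender", "schooling"))

# Hypothetical applicants; assume those scoring 50 or more receive the program.
applicants = [
    {"score": 20, "outcome": 40}, {"score": 44, "outcome": 60},
    {"score": 47, "outcome": 62}, {"score": 49, "outcome": 64},
    {"score": 50, "outcome": 72}, {"score": 52, "outcome": 74},
    {"score": 55, "outcome": 76}, {"score": 80, "outcome": 95},
]
jump = discontinuity_effect(applicants, cutoff=50, bandwidth=10)
print(f"Matched estimate: {matched:.1f}; jump at the cutoff: {jump:.1f}")
```

In both cases the estimate is only as credible as the assumption behind it: that matched people differ only in program receipt, or that people just either side of the cutoff are otherwise alike.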

Matched designs and regression discontinuity designs are just two of many quasi-experimental techniques. J-PAL provides an overview of common methods of conducting an impact evaluation. All such methods seek to identify what would have happened to your target population if they had never received the program, and their success relies on the strength of the assumptions they make about whether the comparison group is a credible stand-in for your program’s target population. 

Recommendation: Theory of Change

A theory of change is a narrative about how and why a program will lead to social impact. Every development program rests on a theory of change – it’s a crucial first step that helps you remain focused on impact and plan your program better.

You can use diagrams.net to create your Theory of Change.

Recommendation: Needs assessment

A needs assessment describes the context in which your program will operate (or is already operating). It can help you understand the scope and urgency of the problems you identified in the theory of change. It can also help you identify the specific communities that can benefit from your program and how you can reach them.

Once you’re satisfied that your program can be implemented as expected:

Your program's theory of change

You have a written plan for how your program will improve lives – great! Make sure to refer to it as you explore the different sections in the Impact Measurement Guide, as it is the foundation for any other method you’ll use.

Recommendation: Process Evaluation

A process evaluation can tell you whether your program is being implemented as expected, and if assumptions in your theory of change hold. It is an in-depth, one-time exercise that can help identify gaps in your program.

Once you are satisfied that your program can be implemented as expected:

Recommendation: Evidence Review

An evidence review summarizes findings from research related to your program. It can help you make informed decisions about what’s likely to work in your context, and can provide ideas for program features.

Once you are satisfied with your evidence review:

Recommendation: Monitoring

A monitoring system provides continuous real-time information about how your program is being implemented and how you’re progressing toward your goals. Once you set up a monitoring system, you would receive regular information on program implementation to track how your program is performing on specific indicators.

Once you are satisfied with your monitoring system:

How to compile evidence, method 2

You’re on this page because you want to search for evidence relevant to your program.

Here are some academic sources where you can search for relevant research:

  • Google Scholar is a search engine for academic papers – enter the keywords relevant to your program, and you’ll find useful papers in the top links
  • Once you identify some useful papers, you can consult their literature review and bibliography sections to find other papers that might be relevant
  • Speaking to a sector expert can guide you to useful literature

 However, don’t include only academic studies in your review! You should also consult:

  • Policy reports
  • Websites of organizations involved in this issue, such as think tanks, NGOs, or the World Bank
  • Public datasets
  • Your program archives – data and reports from earlier iterations of the program can be very valuable!

 The free Zotero plug-in provides an easy way to save, organize, and format citations collected during internet research. Note that Zotero can help you start your annotated bibliography, but it is not a substitute since it does not include any summary or interpretation of each study’s findings.

How to compile evidence, method 1

Start with the 3ie Development Evidence Portal, which has compiled over 3,700 evaluations and over 700 systematic evidence reviews. Steps 2-7 of this example are specific to locating evidence on the 3ie website, but you can also consider looking for a review by J-PAL, the Campbell Collaboration, or Cochrane.

For example, suppose your goal is to increase immunization rates in India. Type “immunization vaccination” or other related terms into the search box, and click the magnifying glass to search.

The search results include individual studies, which are usually about a single program in a single location, as well as “systematic reviews”, which are what we are looking for because they are more comprehensive. To show only the systematic reviews, on the left of the screen under Filter Results, click on PRODUCTS and check the Systematic Reviews box. We’re now left with 17 evidence reviews related to immunization.

Now you might want to further narrow your search by region or country. In our example, suppose we want to see only those evidence reviews that contain at least one study from India. Click on COUNTRY and scroll down to click on India.

There are still 9 evidence reviews! Now read the titles of each review and start going through the ones that seem applicable to you.

  • Note that they are sorted with the most recent first, which is helpful as newer reviews tend to be more comprehensive.
  • 3ie has made the hardest part – assessing how strong the evidence is – easy for us. They use a 3-star scale to indicate the level of confidence in the systematic review.
  • In this example, the most recent review, Interventions For Improving Coverage Of Childhood Immunisation In Low- And Middle-Income Countries, is also the only one rated 3 stars (high quality). Click on its title.

The next page gives you an overview of the study. If it is “Open access”, this means you can read it for free – click “Go to source” below the star rating. If it isn’t open access, you can try some of the strategies in Step 9 to see if you can find the study for free elsewhere.

Clicking on “Go to source” opens a new tab with a PDF of the article. Don’t be intimidated by the length and technical terminology, and start with the summary – these articles usually include an “abstract” and sometimes a “plain language summary” and/or “summary of findings”.

The summary will likely be useful but too vague – dig into the review and look for details about which programs were tried and where, and how well they worked.

  • Keep track of which programs and studies seem particularly relevant, so that you can look them up later.
  • Consider the conclusions of the authors of the systematic review – are there trends that emerge across countries and contexts that are relevant for you? Overall, what are the main lessons from this review that you take away?
  • Be sure to read with a skeptical mindset – just because something worked in Japan doesn’t mean it will work in India – nor does it mean it won’t. And just because something worked in India before doesn’t mean it will continue to work – context is more complicated than country! Think about the evidence you find as well-documented ideas, but not the last word.
  • You can skip the parts about the methodology followed by the authors of the systematic review.
  • If the systematic review was helpful, add it to the bibliography.
  • Copy the citation from the references section and paste it into a search engine.
  • Usually, the first result will be the full paper on a journal’s website. If it is open access, you can read it directly. A lot of academic literature is not open access, unfortunately. Here are some tricks you can try if the article is not available:

a) Go back to your search and see if you can find a PDF posted on one of the authors’ websites – authors often share “working papers”, which might differ only slightly from the final paper, for free on their site.

b) Email the paper’s authors if you can’t find it elsewhere – many researchers are happy to share a copy with people looking to learn from their experience.

  • Read the paper. You probably don’t need to read all of it – the abstract, introduction, a description of the program, the results, and the conclusion are probably enough, whereas sections on technical methods can be skipped. This paper may also include references to other literature that could be relevant to your program. Keep track of these references so you can look into them later.
  • Add the paper to the annotated bibliography if it seems relevant.

For each piece of evidence that you find, there should be a clear justification for including it in the bibliography, such as: it is a landmark study in the topic (i.e. it has a large number of citations or is cited by many other studies in your review), it is relevant to specific aspects of this evaluation (such as measuring similar outcomes, being conducted in a similar context, or evaluating a similar intervention), etc. However, there are no absolute standards for inclusion, and since not all studies will be used in writing up the review, it is better to err on the side of including a study in the annotated bibliography.

Repeat step 9 for every paper from the systematic review that you found relevant!

  • If an organization ran the program that was evaluated, you might be able to find information on the organization’s website. If not, try emailing the organization.
  • Sometimes people share program details elsewhere – blogs, policy briefs, videos, etc. Try searching more about the program and you might find something.
  • In our example, we found a review from 2016, which is quite recent. However, keep in mind it can take 1-2 years to write and publish a review – and since the review is citing only published literature, the cited articles would be a year or two old as well. This means that it is likely that this review is missing anything done since 2014 or 2013, and hopefully the world has learned a lot about how to address your problem since then. While this shortcut helped you find relevant evidence, you might still want to use Method 2 so that you can see what has been learned since 2014.

 

  • This method used only academic sources. Non-academic sources are also a very useful source of information, and you should look into them. In Method 2, we have included a list of non-academic sources to consult.

Process evaluation vs monitoring: Which one do you need?

Process evaluations and monitoring both provide information on how your program is running and whether it is meeting expectations. The key difference is that process evaluations are a one-off activity, while monitoring is ongoing. That means that process evaluations are often more intensive exercises to collect more data and dive deeper into the theory of change. In contrast, ongoing monitoring must not overburden program staff and often tracks just a few high-priority indicators that are critical to program success.  

Consider the following questions to help you decide between conducting a process evaluation and building a monitoring system:  

1. Are you interested in identifying specific problems or general problems?

Process evaluations typically identify general or systemic problems along the theory of change, whereas monitoring typically identifies specific entities (e.g. service providers or locations) that need more attention.

2. Are you looking to hold program staff accountable?

Both process evaluations and monitoring are implemented for learning – is our program being implemented as planned? If not, at which steps is it breaking down? However, if you are seeking an accountability system, monitoring is better-suited as it is continuous, whereas a process evaluation is a one-time exercise.  

3. Do you need ongoing data on how your program is performing?

A process evaluation typically offers a snapshot in time, whereas monitoring involves ongoing data collection and analysis for the entire duration of the program. For example, a process evaluation may do in-depth interviews with program participants on their experiences, whereas a monitoring system might collect data on just a few questions related to beneficiary satisfaction.  

4. Do you need comprehensive data?

A process evaluation is typically based on a sample, while monitoring is usually comprehensive. For example, in a teacher training program, you would monitor the training of all teachers (because it is useful to know exactly which teachers did not attend the training), whereas in a process evaluation, you would interview a subset of teachers to understand the reasons why they did not attend the training.