How Measures Are Scored

The IMPACT Measures Tool was developed to address the diverse needs of early childhood and parenting initiatives. Our team of measurement and evaluation experts uses a research-driven scoring method to examine each measure’s cost, usability, cultural relevance, and technical merit. Scores are based on information available from developer websites, technical manuals, and published peer-reviewed studies.

Scoring Diamond

The scoring diamond is a visual representation of how well a measure scores in each of the four key categories: cost, usability, cultural relevance, and technical merit. Each category is assigned to a specific corner of the scoring diamond. The shaded region depicts the score of a measure in relation to the maximum score of 10 possible points for each category.

IMPACT minimum score: We have highlighted measures that do not meet IMPACT’s recommended minimum score for technical merit. These measures either do not have sufficient technical merit evidence to score, or the evidence available is minimal and unsatisfactory. 

You can also set your preferences for a desired score in each of the four key categories. Once you set preferences, a dotted line will display on the scoring diamond, indicating how each measure scores in comparison to your desired preferences. 
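As an illustration of the comparison the dotted line conveys, the sketch below checks a measure's four category scores against a set of desired scores. The function name, the dictionary layout, and the example numbers are hypothetical and are not taken from the IMPACT Measures Tool itself. Only the four categories and the 10-point maximum come from the description above.

```python
MAX_SCORE = 10  # maximum possible points per category, as stated above
CATEGORIES = ("cost", "usability", "cultural_relevance", "technical_merit")


def compare_to_preferences(scores, preferences):
    """Return, per category, whether the score meets the desired score.

    Both arguments map category names to values between 0 and MAX_SCORE.
    Categories without a stated preference are treated as met (assumption).
    """
    result = {}
    for category in CATEGORIES:
        score = scores[category]
        if not 0 <= score <= MAX_SCORE:
            raise ValueError(f"{category} score must be between 0 and {MAX_SCORE}")
        desired = preferences.get(category, 0)
        result[category] = score >= desired
    return result


# Hypothetical example values, for illustration only.
example_scores = {
    "cost": 8,
    "usability": 6.5,
    "cultural_relevance": 4,
    "technical_merit": 7,
}
example_preferences = {"cost": 5, "cultural_relevance": 6}

for category, met in compare_to_preferences(example_scores, example_preferences).items():
    print(f"{category}: {'meets' if met else 'below'} desired score")
```

In this made-up example, only cultural relevance falls below the desired score, which is the kind of gap the dotted line makes visible on the diamond.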

Overview of Scoring System

Four Key Categories

I. Cost (10 points): How much does this measure cost?

The cost of a measure is determined by its price and how easy it is to access. Measures that cost less and that are easier to access receive higher scores.

The IMPACT Measures Tool includes information on both free and at-cost measures. Each measure is assigned a score based on any financial costs required to administer it. For example, a measure might require purchasing a license to use it, additional user and/or training manuals, or hard copies of administration materials.

Each measure is also scored on how easy it is to access, based on whether the actual measure is downloadable online and ready for use. Measures that are free but require an account or contacting the developer for access are not considered accessible in the scoring system. Measures at cost do not receive any points in this category.

II. Usability (10 points): How easy is it to use this measure?

Usability represents how easy it is to administer a measure and to interpret the results. Measures score higher if they are easier to administer and interpret.

Scores in this subcategory reflect how long it takes to administer the measure, including any setup required, delivery of materials, result calculation, and interpretation.

Scores are assigned based on the level of difficulty to learn to administer the measure, including any equipment, materials, and standardized language needed.

Scores are assigned based on how easy it is to interpret the measure’s results, such as whether the measure is norm-referenced (i.e., allowing comparison of scores to a norming sample), whether supplemental materials are provided, and/or how thoroughly the coding process is described (if applicable).

Scores are assigned based on how many formats the measures can be administered in (i.e., paper and/or electronic). Measures are also scored based on whether internet access is required to administer the measure. 

III. Cultural Relevance (10 points): Does this measure serve different groups?

Cultural relevance explores the extent to which measures are developed with different communities in mind, and the steps taken to prevent or address measurement bias. Measures are subject to specific scoring criteria depending on whether they are developed for use in a single country, internationally (i.e., across multiple countries), or for a specific program.

Scores reflect how inclusive the validation sample is for age, gender, ethnicity, socioeconomic status, linguistic diversity, geographic regions, and urbanicity.

Scores in this subcategory are based on the measure’s item generation process. This includes whether the measure authors consulted with the community that the measure was intended for and developed it with a diverse population in mind.

Measures receive higher scores if authors conducted analyses to identify potential bias between demographic groups (e.g., men and women) at the item level and if these biases were addressed. Scores in this subcategory also reflect if analyses were conducted to identify the potential bias of the measure as a whole and if these biases were addressed. Measures get higher scores if the developers were able to statistically demonstrate that the measure was unbiased with respect to these dimensions.

Our history subscore gives a measure credit if it was originally developed and/or validated in the past twenty years.

IV. Technical Merit (10 points): How accurate is this measure?

Technical merit reflects how consistent and accurate a measure is. Each measure is scored on both.

Validity indicates whether a measure accurately measures the topic it is intended to measure.

Scores for this subcategory are based on the measure’s stated purpose, the extent to which measure items and results represent the intended topic (such as parental stress), and the expected relationships between the measure topic, other topics, and/or individual characteristics.

For example, a measure intended to assess children’s vocabulary would be scored on validity based on whether it actually measures vocabulary skills versus other skills such as working memory or attention. For additional details and examples on types of validity, please review our Scoring Guidebook.

For screening tools, validity is assessed through sensitivity and specificity. Screening tools receive higher scores for accurately identifying individuals at risk while avoiding over-identifying individuals who are not at risk (i.e., false positives).
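For readers unfamiliar with these two terms, the sketch below computes them from the standard definitions: sensitivity is the share of at-risk individuals the screener flags, and specificity is the share of not-at-risk individuals it correctly leaves unflagged. The function name and the counts are hypothetical, and the sketch does not show how IMPACT converts these values into points.

```python
def sensitivity_specificity(true_pos, false_neg, true_neg, false_pos):
    """Compute sensitivity and specificity from screening outcome counts.

    true_pos:  at-risk individuals the screener flagged
    false_neg: at-risk individuals the screener missed
    true_neg:  not-at-risk individuals the screener did not flag
    false_pos: not-at-risk individuals the screener flagged
    """
    at_risk = true_pos + false_neg
    not_at_risk = true_neg + false_pos
    if at_risk == 0 or not_at_risk == 0:
        raise ValueError("need at least one at-risk and one not-at-risk individual")
    sensitivity = true_pos / at_risk
    specificity = true_neg / not_at_risk
    return sensitivity, specificity


# Hypothetical counts, for illustration only.
sens, spec = sensitivity_specificity(true_pos=40, false_neg=10, true_neg=180, false_pos=20)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
# prints "sensitivity=0.80, specificity=0.90"
```

A screener with many false positives would show a low specificity here, which corresponds to the over-identification described above.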

Reliability indicates how consistent the results of a measure are. There are multiple types of reliability that factor into a measure’s score. These include rating the consistency of:

  • Measure’s results over time (test-retest reliability)
  • Different raters of a measure (inter-rater reliability)
  • Items within a measure (internal consistency)

Note: Not all types of reliability apply for every measure.
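To make one of these reliability types concrete, the sketch below computes Cronbach's alpha, a commonly reported statistic for internal consistency. This is an assumption for illustration: the text above does not state which statistics IMPACT reviews or how they map to points, and the function name and response data are hypothetical.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of item scores.

    alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores)
    where k is the number of items. Sample variances are used throughout.
    """
    k = len(responses[0])
    if k < 2 or len(responses) < 2:
        raise ValueError("need at least two items and two respondents")
    if any(len(row) != k for row in responses):
        raise ValueError("every respondent must answer the same number of items")
    item_columns = list(zip(*responses))  # one tuple of scores per item
    sum_item_variances = sum(variance(column) for column in item_columns)
    total_variance = variance([sum(row) for row in responses])
    if total_variance == 0:
        raise ValueError("total scores do not vary; alpha is undefined")
    return k / (k - 1) * (1 - sum_item_variances / total_variance)


# Hypothetical responses: 5 respondents answering 3 items.
example_responses = [
    [2, 3, 3],
    [4, 4, 5],
    [1, 2, 2],
    [5, 5, 4],
    [3, 3, 3],
]
print(f"alpha={cronbach_alpha(example_responses):.2f}")
```

Values closer to 1 indicate that the items move together across respondents. Test-retest and inter-rater reliability are typically summarized with other statistics, such as correlations or agreement coefficients, which this sketch does not cover.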

Measures are also reviewed for whether they report the means and standard deviations of scores, which allow users to calculate where a score falls with respect to the norming sample.
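As a sketch of the calculation that a reported mean and standard deviation make possible, the example below converts a raw score to a standard (z) score and an approximate percentile. The percentile step assumes scores in the norming sample are roughly normally distributed, which will not hold for every measure. The function name and the numbers are hypothetical.

```python
from statistics import NormalDist


def standing_in_norming_sample(score, norm_mean, norm_sd):
    """Return (z, percentile) for a raw score given the norming sample's
    mean and standard deviation.

    The percentile uses a normal approximation (an assumption).
    """
    if norm_sd <= 0:
        raise ValueError("standard deviation must be positive")
    z = (score - norm_mean) / norm_sd
    percentile = NormalDist().cdf(z) * 100
    return z, percentile


# Hypothetical values, for illustration only.
z, pct = standing_in_norming_sample(score=115, norm_mean=100, norm_sd=15)
print(f"z={z:.2f}, percentile={pct:.1f}")
```

Here a score one standard deviation above the mean gives z = 1.00, which under the normal approximation sits at roughly the 84th percentile.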

More Information

Looking for more in-depth information about our scoring system? View our Scoring Guidebook.

Technical Assistance Consultation

Looking for support with research design, implementation, or evaluation?

Request a free consultation call with our team to discuss our technical assistance services.

Contact Us

We are committed to supporting early childhood programs. Have feedback or questions? Contact us or visit our FAQ.

Phone: 541-346-4815