STEEP - Research

STEEP has a solid research base, with multiple studies published in top peer-reviewed scholarly journals. Hence, STEEP goes beyond being merely "research-based." STEEP is a program that includes several components (e.g., intervention and progress monitoring). It was built by researchers on a solid foundation of research. Hence, in creating or selecting components, it was important that each part of the program have strong supporting research. However, the fact that each part is effective does not mean that the system as a whole will produce meaningful outcomes for students. Hence, the process as a whole has also been evaluated. Most of the studies reviewed below are published in peer-reviewed scholarly journals and meet the criteria for scientifically based research.



Review of Research Support for the STEEP RTI Program

In describing the research base for STEEP, the information will be divided into (a) the research support for the program as an integrated process and (b) the research support for the individual components.

Research on the Model as a Whole

Researchers have investigated and evaluated the STEEP model as a whole. This type of research is needed because it is possible to select the very best screening procedures, the very best progress monitoring procedures, and the best research based intervention and still not produce good outcomes because the various components do not work well together to produce good academic outcomes for students.

Improving referral accuracy. The goal of RTI is to improve achievement. However, as a result of screening and improved achievement, the need for special education is reduced. With respect to reducing referrals, it is more precise to say that STEEP increases the accuracy of referrals. Amanda VanDerHeyden (University of California-Santa Barbara, see Note 1) and her colleagues (VanDerHeyden et al., 2003) studied students in the Southeastern U.S. and used a comprehensive assessment and intervention process to establish a "gold standard" as to whether a child truly did or did not have a problem. For the purpose of identifying students for special education, teacher referral was accurate 19% of the time in 406 cases, whereas the STEEP process was approximately 3 times more accurate than teacher referral and various other screening methods and tests. Given that teacher referral is so important in a traditional problem-solving model, the researchers concluded that data-based decision making involving universal screening plays an important role in determining who needs assistance. In particular, given the finding that teacher referral was accurate less than 20% of the time and given the importance assigned to teacher judgment, it seems important to take a closer look at a broader range of variables. In addition, this same study found that classroom context significantly and negatively affected the accuracy of teacher referral. That is, teachers became much less accurate at identifying students who did and did not have a problem in both low-achieving and high-achieving classrooms (as compared with "normally" achieving classrooms), whereas STEEP maintained or achieved even greater accuracy across those contexts.

More recently, VanDerHeyden et al. (2007, Journal of School Psychology) studied students in the Western United States and showed reductions in referrals and improvements in achievement as STEEP was introduced sequentially, one school at a time, across 5 schools within one district. They also found that the quality of referrals increased. That is, fewer students were referred, but the students who did not respond to the STEEP program were more likely to qualify for special education. The program had a generally positive effect with respect to disproportionality in terms of ethnicity and language proficiency. This paper was awarded best scholarly article of the year in 2008 by the Society for the Study of School Psychology.

Over-identification and disproportionality. VanDerHeyden and Witt (2005, School Psychology Review) examined the effect of STEEP in situations where there were either many high-achieving students or proportionately high numbers of low-achieving students. The findings indicated that teacher referral is markedly affected by the situation. For example, the "low" student in a high-achieving classroom may get referred even though that student is still in the normal range. Such students "stand out" to the teacher because they are low relative to high-performing peers. STEEP places an objective lens on the situation and is much more accurate regardless of context. A very interesting finding in this study was that minority children (who were primarily African American) were disproportionately represented as "low achievers" and fell into the bottom 25% of classes. However, the minority students were more likely to show rapid acceleration of learning when given a strong intervention. The researchers hypothesized that the quality of the intervention used may have been more in line with the needs of minority students than was their core curriculum.

Improving achievement in general education. VanDerHeyden and Burns (2005, Assessment for Effective Intervention) found that STEEP intervention procedures produced statistically significant gains in math performance for at-risk students. This study, along with VanDerHeyden et al. (2006), will be of interest to principals and teachers because the studies show the importance and relevance of RTI to general education. VanDerHeyden and Burns found that CBM assessment and intervention produced significant achievement gains in math and statistically significant improvements in scores on the Arizona state test. RTI is best viewed as an instructional model, and these studies show that RTI can produce gains for all students. A "side effect" of improved achievement is a reduced need for special education and a reduction in problems such as disproportionality and over-referral.

Research on the Components of STEEP

In addition to being evaluated as an integrated process, the various components of STEEP (screening, intervention, and progress monitoring) have undergone separate testing. Each of the components will be discussed separately.

Universal Screening and Progress Monitoring.

Universal screening and progress monitoring procedures have undergone extensive testing by a large number of researchers. These procedures rely on curriculum-based measurement (CBM), which has been around for many years, and hundreds of studies have supported its use in decision making. Books by Shinn (1985) or Shapiro (1986) provide detailed reviews of this extensive literature.

Passage leveling. The STEEP benchmark assessment probes were developed using a three-step process. First, we purchased a research database containing 5 million words. The words came from an exhaustive study of several thousand books read by students at various grade levels. The study reported frequencies of all words used at each grade level. From the list of words for each grade level, words of high and medium frequency were selected. STEEP probes were then written with those words. Second, the probes were checked for readability. Spache readability was used for Grades 1-3 and Dale-Chall readability was used for Grades 4 and up. These readability procedures have been shown to be the best for the specific grades. Finally, the probes were teacher tested and researcher evaluated. This process produces a probe that has high generality. Because the words used are the words students see every day, their ability to read STEEP probes is highly predictive of how well they will read every day.

A probe that is too difficult lacks sensitivity and, in screening, will over-identify students as in need of intervention. Most forms of CBM are adequate, but some are better than others. Reports of DIBELS, for example, indicate that the first Oral Reading Fluency probe that students encounter in first grade has a Spache readability of 2.5 (meaning a middle-of-second-grade reading level). If a probe is too difficult, students will read fewer words and more students will be identified as "at risk." A school will then identify more students as needing intervention, consuming more school resources.

Readability. As noted above, each of the STEEP benchmark and progress monitoring passages is leveled using a readability formula. At one point we began to suspect that readability may not be the best method for leveling a passage, and we became interested in studying readability formulas. Scott Ardoin led a study (Ardoin, Suldo, Witt, Aldrich, & McDonald, 2005) in which we evaluated the accuracy of many different methods for estimating passage readability. The research indicated that our concerns were justified in that many readability methods are not very accurate. We continue to use readability in constructing our passages; however, we have bolstered it with the methods described above. We also continue to look for improved methods of leveling a reading passage. The use of a readability formula is unlikely ever to yield an accurate estimate for any one student. This is because readability is more or less a norm-referenced concept. In general, most fourth grade students can read the word "horse." However, a particular student may have difficulty with that word. Therefore, passages leveled with a readability formula will be appropriate "in general," but there will be some individual differences between students.

Three vs. One Benchmark Probe. With universal screening, some systems utilize a process whereby each student receives three benchmark probes and the median is taken. With STEEP, only one benchmark probe is used. This, combined with more efficient administration procedures, means that STEEP takes less than one-third the time of DIBELS and other procedures. However, the question is whether one probe still yields valid results. A study by Scott Ardoin (University of South Carolina), Joe Witt, and colleagues (which was awarded School Psychology Review article of the year by the Editorial Board for the journal in 2005) indicated that one probe yields results equivalent to three probes. It should be mentioned, however, that one probe is not sufficient for progress monitoring. One probe results in too much "bounce" in the data, and this is very problematic when intervening and making important decisions about progress in the context of RTI. STEEP therefore incorporates three probes for progress monitoring.
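
The two scoring approaches compared in this section can be sketched in a few lines of Python. This is only an illustration: the function name benchmark_score and the words-correct-per-minute scores are hypothetical and are not taken from STEEP materials or from the study.

```python
from statistics import median


def benchmark_score(probe_scores):
    """Return one screening score from a student's probe scores.

    With three probes this is the median; with one probe it is
    simply that probe's score.
    """
    if not probe_scores:
        raise ValueError("at least one probe score is required")
    return median(probe_scores)


# Hypothetical words-correct-per-minute scores for one student.
three_probe_score = benchmark_score([42, 51, 47])  # median of three probes
one_probe_score = benchmark_score([47])            # single-probe screening
```

Taking the median means that one unusually high or low reading does not determine the score, which is why several probes help reduce "bounce" when monitoring progress.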

Instructional Placement Standards. The standards used for interpreting the STEEP CBM screening are those developed and recommended by Stan Deno (University of Minnesota) for instruction. They represent instructional standards for each grade level. Other systems use norms for interpreting CBM data. The norms are frequently not representative of the students at any one school and are not made available for review. Norms also take CBM into the realm of the standardized test and away from whether students can read adequately.

Can't Do/Won't Do Assessment. The can't do/won't do assessment has also undergone research and evaluation, and peer-reviewed research supporting its use has been published in scholarly journals. Duhon, Noell, Witt et al. (2004) found that the procedure identified the correct intervention to use in all cases. An earlier study by Noell, Gansle, Witt, and colleagues (1998) yielded similar results. Additional information about these procedures is available in Noell, G. H., Gansle, K. A., Witt, J. C., Whitmarsh, E. L., Freeland, J. T., LeFleur, L. H., Gilbertson, D. A., & Northup, J. (1998). Effects of contingent reward and instruction on oral reading performance at differing levels of passage difficulty. Journal of Applied Behavior Analysis, 31, 659-664; and in Duhon, G. J., Noell, G. H., Witt, J. C., Freeland, J. T., Dufrene, B. A., & Gilbertson, D. N. (2004). Identifying academic skills and performance deficits: The experimental analysis of brief assessments of academic skills. School Psychology Review, 33, 429-443.

Progress Monitoring

The STEEP intervention progress monitoring system involves setting a goal, drawing an aimline and monitoring progress relative to standard decision rules. The intervention manual provides suggested progress goals for setting modest, reasonable and ambitious goals. These goals were based upon an integration of published studies pertaining to student progress. Once progress monitoring begins, the rate of progress for an individual student is evaluated relative to the aimline using standards established and recommended by Stan Deno of the University of Minnesota.
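
The goal, aimline, and decision-rule steps described above can be sketched in Python. This is a minimal illustration under stated assumptions: the aimline is drawn as a straight line from the baseline score to the goal, and the decision rule shown (flag the intervention for review when the last four scores all fall below the aimline) is a placeholder chosen for the example, not the STEEP standard. The function names and all scores are hypothetical.

```python
def aimline(baseline, goal, weeks):
    """Expected score for weeks 0..weeks on a straight line from baseline to goal."""
    if weeks <= 0:
        raise ValueError("weeks must be positive")
    slope = (goal - baseline) / weeks
    return [baseline + slope * week for week in range(weeks + 1)]


def needs_review(scores, expected, run=4):
    """Assumed rule: True when the last `run` observed scores are all below the aimline."""
    if len(scores) < run:
        return False  # not enough data to decide yet
    recent = list(zip(scores, expected))[-run:]
    return all(observed < target for observed, target in recent)


# Hypothetical student: baseline of 40 words correct per minute,
# goal of 60 after 10 weeks, scores observed for weeks 0 through 5.
line = aimline(baseline=40, goal=60, weeks=10)
lagging = [40, 43, 43, 44, 45, 46]
on_track = [40, 43, 45, 47, 49, 52]

flag_lagging = needs_review(lagging, line)
flag_on_track = needs_review(on_track, line)
```

In practice, the goal would come from the progress goals suggested in the intervention manual, and the decision rule would follow the published standards rather than the placeholder used here.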

Intervention Selection

Within RTI, there are two basic approaches: the problem-solving approach and the standard protocol model. The problem-solving model calls on the team, through discussion and brainstorming, to identify student needs and to determine an appropriate intervention. With the standard protocol model, each step, including the selection of an appropriate intervention, is guided by research-based decision rules. Using research to guide decisions means connecting an academic problem with an intervention that is known to be effective for that problem. This increases the likelihood that the correct intervention is matched with each problem. The STEEP program uses a standard protocol method, which uses data to recommend a specific intervention matched to the student's unique needs. To implement a standard protocol for intervention selection, one needs an instructional model, an assessment that determines student status within the instructional model, and research showing that students with a specific status in the instructional model improve more with specific interventions and improve less or not at all with other interventions. STEEP intervention selection is based upon an instructional model called the Instructional Hierarchy. Support for the STEEP standard protocol comes from research by Duhon (2006), who found that use of the intervention indicated by the standard protocol was markedly superior to use of an intervention that was not matched via protocol to student needs. Numerous other supporting studies have been conducted by Ed Daly and his colleagues.

Intervention Fidelity

Intervention fidelity means that the intervention is implemented as intended. The STEEP program incorporates an implementation protocol to enhance fidelity. This protocol was created by Joe Witt, who has published over 30 research studies on intervention implementation in schools. The protocol incorporates three practices that have been shown to improve implementation of school-based interventions:

(a) an implementation protocol to help prevent problems,
(b) monitoring of fidelity using permanent products, and
(c) periodic review and performance management

STEEP Outcomes

The most important outcome of RTI is improved student achievement. Hence, the RTI process for a district must have a clear focus on student outcomes. There are many tools available to assist districts in conducting RTI. However, some tools focus only on screening and progress monitoring. Screening and progress monitoring are parts of RTI, but they are merely assessment. There is an old saying that "Weighing a cow does not make it fatter." Weighing does not help the cow to grow; eating grass does. Similarly, assessment does not improve achievement; instruction improves achievement. Because districts weigh outcomes like these when selecting an RTI model, it was important to us to have research that lets us say "YES" to the following questions:
  1. Does your RTI program improve achievement of students in general education?
  2. Does your RTI program reduce the need for special education placement?
  3. Does your RTI program help you select an appropriate intervention to respond to unique student instructional needs?
  4. Does your RTI program have a positive effect on disproportionality in special education?
  5. Is your RTI program reliable and valid, and does it increase the accuracy of the referral and placement process over traditional methods such as teacher referral and traditional screening and testing?
See above for a review of STEEP outcomes.

Notes
  1. In the early literature on STEEP, parts of the process have been variously labeled Problem-Validation Screening, Screening to Enhance Equitable Placement, and Screening to Enhance Educational Performance. All of these names refer to the same process.