## Factors in Student Performance: Improving Research

Our data work and our research in collegiate mathematics education tend to be simple in design and ambiguous in results. We often see good or even great initial results from a project, only to see regression toward the mean over time (or worse). I’d like to propose a more complete analysis of the problem space.

The typical data collection or research design measures student characteristics: test scores, HS GPA, prior college work, grades in math classes, etc. In classical laboratory research, this would be equivalent to measuring the subjects without directly measuring the treatment effects.

So, think about measurements for our ‘treatments’. If we are looking into the effectiveness of math courses, the treatments are the net results of the course and of how that course is delivered. Since we often disaggregate the data by course, we at least ‘control’ for the course effects. However, we are not very sophisticated in measuring the delivery of the course, even though we have data available that provides some level of measurement.

As an example, we offer many sections of pre-calculus at my college. Over a period of 4 years, there might be 20 distinct faculty who teach this course. A few of these faculty only teach one section in one semester; however, the more typical situation is that a faculty member routinely teaches the same course … and develops a relatively consistent delivery treatment.

We often presume (implicitly) that the course outcomes students experience are relatively stable across instructor treatment. This presumption is easily disproved, and easily compensated for.
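One standard way to check the stability presumption, offered here as a sketch rather than as the method behind this post, is a chi-square test of homogeneity: if every instructor shared one pass rate, each instructor’s pass and non-pass counts should sit close to what the pooled rate predicts. The function names and the pure-Python chi-square tail calculation are assumptions made for illustration; a statistics package would normally supply the p-value.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X >= x) for a chi-square variable with df degrees of freedom.

    Uses the power series for the regularized lower incomplete gamma function.
    Adequate for moderate statistics; very small p-values lose relative precision.
    """
    if x <= 0:
        return 1.0
    a = df / 2.0
    z = x / 2.0
    term = 1.0 / a  # k = 0 term of the sum z^k / (a (a+1) ... (a+k))
    total = term
    k = 1
    while term > total * 1e-15:
        term *= z / (a + k)
        total += term
        k += 1
    lower = total * math.exp(-z + a * math.log(z) - math.lgamma(a))
    return min(1.0, max(0.0, 1.0 - lower))


def homogeneity_test(passed: list[int], enrolled: list[int]) -> tuple[float, int, float]:
    """Chi-square test of the hypothesis that all instructors share one pass rate.

    passed[i] and enrolled[i] (assumed > 0) are instructor i's pass count and enrollment.
    Returns (statistic, degrees of freedom, p-value).
    """
    pooled = sum(passed) / sum(enrolled)
    if not 0 < pooled < 1:
        raise ValueError("pooled pass rate must be strictly between 0 and 1")
    stat = 0.0
    for p, n in zip(passed, enrolled):
        # Compare observed passes and non-passes with what the pooled rate predicts.
        for observed, expected in ((p, n * pooled), (n - p, n * (1 - pooled))):
            stat += (observed - expected) ** 2 / expected
    df = len(enrolled) - 1
    return stat, df, chi2_sf(stat, df)


if __name__ == "__main__":
    # Hypothetical counts for four instructors, for illustration only.
    stat, df, p = homogeneity_test(passed=[16, 27, 33, 22], enrolled=[40, 38, 37, 35])
    print(f"chi-square = {stat:.2f}, df = {df}, p = {p:.4f}")
```

The chi-square approximation gets shaky when expected counts are small, which is exactly the case for faculty who teach only one section; pooling those instructors or using an exact test would be safer.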

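Here is a typical graph of instructor variation in treatment within one course:

[Figure: pass rate by instructor within one course; the horizontal line marks the weighted course mean.]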

The pass rates range from about 40% to about 90%, with the weighted course mean shown by the horizontal line at about 65%. As a statistician, I do not view either extreme as good or bad (as a mathematician, I might view both as ‘bad’); however, I do view these pass rates as a measure of the instructor treatment in this course. Ideally, we would have more than one treatment measure, but this one (instructor pass rate) is a good place to start for practitioner ‘research’. In analyzing student results, the statistical issue is:
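The post does not say what the course mean is weighted by; a natural assumption is enrollment, which makes the weighted mean of instructor pass rates equal to total passes divided by total students. A minimal sketch, assuming per-student records of the hypothetical form `(instructor_id, passed)`:

```python
from collections import defaultdict


def instructor_pass_rates(records):
    """Return ({instructor: pass rate}, {instructor: number of students}).

    records is an iterable of (instructor_id, passed) with passed coded 1 or 0.
    """
    passes: dict = defaultdict(int)
    counts: dict = defaultdict(int)
    for instructor, passed in records:
        passes[instructor] += passed
        counts[instructor] += 1
    rates = {i: passes[i] / counts[i] for i in counts}
    return rates, dict(counts)


def weighted_course_mean(rates, counts):
    """Enrollment-weighted mean of instructor pass rates (= total passes / total students)."""
    total = sum(counts.values())
    return sum(rates[i] * counts[i] for i in rates) / total


if __name__ == "__main__":
    # Hypothetical records: instructor "A" has 4 students, "B" has 2.
    records = [("A", 1), ("A", 1), ("A", 1), ("A", 0), ("B", 1), ("B", 0)]
    rates, counts = instructor_pass_rates(records)
    print(rates, weighted_course_mean(rates, counts))
```

In that toy example, A’s 0.75 and B’s 0.5 average to 0.625 unweighted but to 4/6 (about 0.667) when weighted by enrollment, so instructors who teach more students pull the course mean toward their own rate.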

Does a group of students (identified by some characteristic) experience results which are significantly different from the treatment measure as estimated by the instructor pass rate?
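One way to put numbers on this question (a sketch under stated assumptions, not necessarily how the author tests it) is to treat each student’s outcome as an independent pass/fail trial at their instructor’s pass rate. The group’s expected passes are then the sum of those rates, and a normal approximation gives a z-score for the observed passes. The function name is hypothetical.

```python
import math
from statistics import NormalDist


def group_vs_treatment(outcomes, instructor_rates):
    """Compare a group's passes with the passes its instructors' rates would predict.

    outcomes[i] is 1 (pass) or 0 for student i; instructor_rates[i] is the pass
    rate of that student's instructor. Assumes independent Bernoulli outcomes at
    the instructor's rate and uses a normal approximation.
    Returns (observed - expected passes, z, two-sided p-value).
    """
    observed = sum(outcomes)
    expected = sum(instructor_rates)
    variance = sum(p * (1 - p) for p in instructor_rates)
    if variance == 0:
        raise ValueError("instructor rates of exactly 0 or 1 give no variance")
    z = (observed - expected) / math.sqrt(variance)
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return observed - expected, z, p_value


if __name__ == "__main__":
    # Hypothetical group of six students spread across three instructors.
    outcomes = [1, 1, 0, 1, 1, 1]
    rates = [0.45, 0.45, 0.64, 0.64, 0.88, 0.88]
    print(group_vs_treatment(outcomes, rates))
```

The first returned value, observed minus expected passes, is the same as adding up each student’s outcome minus their instructor’s pass rate. For small groups the normal approximation is rough.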

The data set then includes a treatment measure as well as the measurements of students. In a regression, we include this ‘instructor pass rate’ as a variable. When there is substantial variation in instructor treatment measures, that variable is often the strongest correlate of success. If we attempt to measure student results without controlling for this treatment, we can report false positives or false negatives because of those confounding variables. Another tool, then, is to compute a ‘gain’ for each student: use the typical binary coding (1 = passed with 2.0/C or better; 0 = otherwise), and then subtract the instructor treatment measure. Examples:

- Student passes, instructor pass rate = .64 … gain = 1-.64 = .36
- Student does not pass, instructor pass rate = .64 … gain = 0-.64 = -.64
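The two examples above translate directly into code. This is a minimal sketch: the 2.0 threshold comes from the coding just described, while representing a non-graded outcome such as a withdrawal as `None` is an assumption, and the function names are hypothetical.

```python
from typing import Optional


def code_outcome(grade_points: Optional[float]) -> int:
    """Binary success coding: 1 = passed with 2.0 (C) or better, 0 = anything else.

    None stands in for a non-graded outcome such as a withdrawal (an assumption).
    """
    return 1 if grade_points is not None and grade_points >= 2.0 else 0


def gain(grade_points: Optional[float], instructor_rate: float) -> float:
    """Student 'gain': coded outcome minus the instructor's pass rate."""
    return code_outcome(grade_points) - instructor_rate


if __name__ == "__main__":
    print(gain(3.0, 0.64))   # passing student, as in the first example
    print(gain(1.7, 0.64))   # non-passing student, as in the second example
    print(gain(None, 0.64))  # withdrawal, coded 0 like any other non-pass
```

A student of a high-pass-rate instructor earns little gain for passing and a large negative gain for not passing, which is exactly how the treatment measure is removed from the student outcome.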

When we analyze something like placement test scores versus success, we can graph this gain by test score:

[Figure: mean ‘gain’ by ACT Math score, with confidence intervals (Minitab).]

This ‘gain’ value for each score shows no significant change in student results until the ACT Math score reaches 26 (well above the cutoff of 22). The graph is from Minitab, which does not report the n for each group; as you’d expect, the wide confidence interval for a score of 28 is due to the small n (6 in this case).
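Since the graph does not show group sizes, a small summary table that reports n next to each interval can help. The sketch below groups gains by score and computes a normal-approximation confidence interval for each mean; it is not a reproduction of Minitab’s interval plot, and the function name and sample data are hypothetical. With groups as small as 6, a t-based interval would be noticeably wider than this one.

```python
import math
from collections import defaultdict
from statistics import NormalDist, mean, stdev


def gain_by_score(rows, confidence=0.95):
    """Summarize student gains by test score.

    rows is an iterable of (score, gain). Returns {score: (n, mean gain, low, high)},
    where (low, high) is a normal-approximation confidence interval for the mean.
    """
    groups = defaultdict(list)
    for score, g in rows:
        groups[score].append(g)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    summary = {}
    for score in sorted(groups):
        gains = groups[score]
        n = len(gains)
        m = mean(gains)
        # A single student gives no spread estimate, so the interval is unbounded.
        half = z * stdev(gains) / math.sqrt(n) if n > 1 else math.inf
        summary[score] = (n, m, m - half, m + half)
    return summary


if __name__ == "__main__":
    # Hypothetical (score, gain) pairs, for illustration only.
    rows = [(22, 0.36), (22, -0.64), (22, 0.36), (26, 0.36), (26, 0.36), (26, -0.64)]
    for score, (n, m, low, high) in gain_by_score(rows).items():
        print(f"ACT {score}: n = {n}, mean gain = {m:+.2f}, CI = ({low:+.2f}, {high:+.2f})")
```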

That conclusion is hidden if we look only at the pass rate instead of the ‘gain’. The pass-rate graph shows an apparent ‘decrease’ in outcomes for scores of 24 and 25, which show no such drop in the ‘gain’ graph above:

[Figure: pass rate by ACT Math score.]
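The mechanism behind that difference can be shown with a tiny, entirely hypothetical data set (not the data behind these graphs): if students with one score happen to be concentrated in sections with a low instructor pass rate, their raw pass rate drops even though they do exactly as well as that instructor’s students generally do. The function name is an assumption for illustration.

```python
def pass_rate_and_gain(rows):
    """rows: iterable of (score, passed, instructor_rate).

    Returns {score: (raw pass rate, mean gain)}.
    """
    totals = {}
    for score, passed, rate in rows:
        n, passes, gain_sum = totals.get(score, (0, 0, 0.0))
        totals[score] = (n + 1, passes + passed, gain_sum + passed - rate)
    return {s: (p / n, g / n) for s, (n, p, g) in sorted(totals.items())}


if __name__ == "__main__":
    # Hypothetical: all score-23 students have an instructor whose pass rate is 0.75,
    # and all score-24 students have an instructor whose pass rate is 0.25.
    rows = ([(23, 1, 0.75)] * 6 + [(23, 0, 0.75)] * 2
            + [(24, 1, 0.25)] * 2 + [(24, 0, 0.25)] * 6)
    for score, (rate, g) in pass_rate_and_gain(rows).items():
        print(f"score {score}: pass rate {rate:.2f}, mean gain {g:+.2f}")
```

Here the score-24 students pass at 0.25 against 0.75 for score 23, yet both groups have a mean gain of exactly 0; the whole difference in raw pass rate comes from which instructors the students had.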

The main point of this post is not how our pre-calculus course is doing, or how good our faculty are. The issue is measuring treatments separately from students. One of the primary weaknesses of educational research is that we generally do not control for treatments when comparing subjects; that is a fundamental defect, and it needs to be corrected before we can have stable research results that help practitioners.

This is one of the reasons why we should not trust the ‘results’ reported by change agents such as Complete College America, Jobs for the Future, or even the Community College Research Center. Not only do treatment measures vary by instructor at one institution, but I am pretty sure they also vary across institutions and regions. Unless we can show that there is no significant variation in treatment results, there is no way to trust ‘results’ that reach conclusions based only on student measures.

Join Dev Math Revival on Facebook: