# My Research Philosophy Statement (WIP)

**Note:** This is a work in progress, but I figured: why not work in the open? If you have any thoughts or feedback, I would of course appreciate them!

Throughout my PhD, I have primarily been concerned with methodological developments related to measurement error as it pertains to causal inference generally, and to dynamic treatment regimes more specifically. While my work has predominantly focused on expanding methodology, I have a keen interest in providing a theoretical basis for (comparatively) straightforward methods: methods that are easy for non-statisticians to use while exhibiting provably good theoretical properties.

## Dissertation Work

Broadly, my research has focused on the fields of causal inference (in particular, precision medicine) and measurement error, where we are conducting the (to our knowledge) first investigation of how the latter impacts the former. Briefly, precision medicine seeks to use an individual’s observed covariates to determine optimal treatment strategies. These are referred to as adaptive treatment strategies or dynamic treatment regimes (DTRs). That is, precision medicine answers the question: “What is the optimal treatment for a patient when treatment can be ‘tailored’ based on patient-specific information?” Measurement error refers to any situation where variables of interest are not accurately observed.

My dissertation work focused primarily on two related questions:

- How does the presence of measurement error impact the estimation of optimal adaptive treatment strategies, and can these impacts be remedied?
- How can existing, commonly used measurement error correction techniques be expanded to have broader utility for applied researchers?

The core question of my dissertation came out of the work I completed during my Master’s (at the University of Waterloo, under the supervision of Michael Wallace), where we showed that measurement error in tailoring covariates has a substantial impact on the estimation of optimal treatment regimes, calling into question the use of DTRs estimated with error-prone data for future treatment prescription. My PhD work (at the University of Waterloo, supervised by Michael Wallace and Grace Yi) continued this thread, developing methods to remedy the effects of error in DTRs. We showed that a comparatively simple procedure, regression calibration, can largely overcome the issues introduced by simple covariate measurement error, under fairly mild assumptions (Spicker and Wallace, 2020).

Doing this required an expansion of the regression calibration technique to make it applicable to the data at hand. This prompted the secondary question of my dissertation: how can we adapt techniques that are in common use, maintaining their appeal while broadening the scenarios in which they are theoretically defensible? Our first investigation of this question took its inspiration from the generalizations needed for regression calibration in DTRs. In particular, we looked at a class of measurement error correction techniques that rely on “replicate measurements” (that is, independent and identically distributed remeasurements of the quantity of interest) to correct for the effects of measurement error; regression calibration is one such technique.

We demonstrated how, with a slight modification to the estimation procedure, these techniques can be applied with any set of independent measurements, whether or not they are identically distributed (Spicker, Wallace, and Yi, 2021). Relaxing this assumption is an important contribution, since observed data often dramatically contradict the assumption of identically distributed replicates, and these techniques are among the most widely applied error correction procedures.
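For readers less familiar with regression calibration, the following is a minimal sketch of the textbook version with i.i.d. replicates, assuming a simple linear regression and classical additive error (each replicate is W = X + U). It is not the modified, non-identically-distributed estimator described above; the function names and simulation settings are hypothetical, chosen only for illustration. The unobserved covariate is replaced by the best linear predictor of X given the averaged replicates, with the within-subject spread of the replicates used to estimate the error variance.

```python
import random
import statistics


def ols_slope(x, y):
    """Least-squares slope from regressing y on a single covariate x."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def regression_calibration_slope(w_reps, y):
    """Textbook regression calibration with i.i.d. replicates, W_ij = X_i + U_ij.

    w_reps is a list of per-subject replicate lists, all of the same length k.
    """
    k = len(w_reps[0])
    w_bar = [statistics.fmean(r) for r in w_reps]
    # Within-subject sample variance of the replicates estimates sigma_u^2.
    sigma_u2 = statistics.fmean([statistics.variance(r) for r in w_reps])
    # Var(W_bar) = sigma_x^2 + sigma_u^2 / k, so subtract the error component.
    var_wbar = statistics.variance(w_bar)
    sigma_x2 = var_wbar - sigma_u2 / k
    reliability = sigma_x2 / var_wbar
    mu = statistics.fmean(w_bar)
    # Best linear predictor of X given W_bar, used in place of the true X.
    x_hat = [mu + reliability * (wb - mu) for wb in w_bar]
    return ols_slope(x_hat, y)


# Hypothetical simulation: true slope 2, error SD 1, two replicates per subject.
rng = random.Random(2020)
n = 4000
x = [rng.gauss(0.0, 1.0) for _ in range(n)]
y = [1.0 + 2.0 * xi + rng.gauss(0.0, 0.5) for xi in x]
w_reps = [[xi + rng.gauss(0.0, 1.0) for _ in range(2)] for xi in x]

naive = ols_slope([statistics.fmean(r) for r in w_reps], y)
corrected = regression_calibration_slope(w_reps, y)
print(f"naive slope: {naive:.3f}, regression calibration: {corrected:.3f}")
```

Because the averaged replicate has variance 1 + 1/2 in this setup, the naive slope is attenuated toward roughly 2/1.5 ≈ 1.33, while the calibrated slope should land close to the true value of 2.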

We continued to expand extant methodology through an extension of simulation extrapolation (SIMEX), another frequently used measurement error correction procedure. Simulation extrapolation relies on a bootstrap-like “remeasurement” procedure and assumes normally distributed errors for its theoretical results. We demonstrated how this technique can be made nonparametric, removing the need for any assumptions on the error distribution, by “remeasuring” from an empirical distribution instead (Spicker, Wallace, and Yi, 2021). This contribution is particularly valuable, since the assumption of normality is pervasive in the measurement error literature, yet there is strong evidence that it is frequently violated in observed data.
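The remeasurement idea behind simulation extrapolation can be illustrated with a minimal sketch of the standard, parametric version, assuming normal errors with a known standard deviation `sigma_u` in a simple linear regression. This is the normality-based procedure that the nonparametric extension relaxes, not that extension itself; all names and settings are hypothetical. Extra noise is added at increasing levels λ, the naive slope is averaged at each level, and the resulting curve is extrapolated back to λ = -1, which corresponds to no measurement error.

```python
import random
import statistics


def ols_slope(x, y):
    """Least-squares slope from regressing y on a single covariate x."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def lagrange_at(xs, ys, x0):
    """Evaluate the polynomial interpolating the points (xs, ys) at x0."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        basis = 1.0
        for j, xj in enumerate(xs):
            if j != i:
                basis *= (x0 - xj) / (xi - xj)
        total += basis * yi
    return total


def simex_slope(w, y, sigma_u, lambdas=(0.0, 1.0, 2.0), n_sims=20, seed=1):
    """Parametric SIMEX for a simple linear regression slope (normal errors, known sigma_u)."""
    rng = random.Random(seed)
    averaged = []
    for lam in lambdas:
        # Remeasure: add N(0, lam * sigma_u^2) noise so the total error variance
        # becomes (1 + lam) * sigma_u^2, then refit the naive model.
        slopes = []
        for _ in range(n_sims):
            w_star = [wi + (lam ** 0.5) * sigma_u * rng.gauss(0.0, 1.0) for wi in w]
            slopes.append(ols_slope(w_star, y))
        averaged.append(statistics.fmean(slopes))
    # lambda = -1 corresponds to zero error; with three grid points the
    # interpolating polynomial is a quadratic extrapolant.
    return lagrange_at(lambdas, averaged, -1.0)


# Hypothetical simulation: true slope 2, one error-prone measurement with sigma_u = 0.5.
rng = random.Random(2021)
n = 4000
x = [rng.gauss(0.0, 1.0) for _ in range(n)]
y = [1.0 + 2.0 * xi + rng.gauss(0.0, 0.5) for xi in x]
w = [xi + rng.gauss(0.0, 0.5) for xi in x]

naive = ols_slope(w, y)
corrected = simex_slope(w, y, sigma_u=0.5)
print(f"naive slope: {naive:.3f}, SIMEX: {corrected:.3f}")
```

Interpolating through three grid points keeps the sketch dependency-free; a least-squares fit over a finer λ grid is more common in practice. In this setting the quadratic extrapolant under-corrects slightly, so the SIMEX slope typically lands a little below 2, while the naive slope sits near 2/1.25 = 1.6.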

The final project specifically associated with my dissertation involves questions regarding adherence to treatment regimes. While my work has predominantly looked at errors in continuous variates, it is also important to consider what happens when the assigned treatment (typically a binary indicator) may not be accurately observed: that is, when some patients do not adhere to the treatment they were assigned. This misclassification problem is of particular importance in the causal inference literature, since ignoring it leads to parameter estimates with no substantive causal interpretation. My ongoing work considers ways to recover valid causal estimators by modifying DTR estimation techniques rooted in the estimating equation literature.

## Non-Dissertation Research Work

Outside of the work completed for my dissertation, I have also been involved in several interdisciplinary research teams. At the University of Waterloo, I am involved in a group with members from the School of Public Health and from Systems Design Engineering, investigating the use of novel machine learning techniques to study dietary patterns and their implications for health. Dietary data are notoriously error-prone, and machine learning techniques are typically opaque in terms of interpretability. As a result, naively applying these techniques to such data is unjustified without a thorough investigation of how error can be handled, and without substantial improvements to the interpretability and explainability of the models. While the core aim of the group is to address applied questions related to health outcomes and health equity, this investigation has branched into several methodological questions that I am interested in pursuing.

Prior to my master’s degree, I was involved in another interdisciplinary research team at the Smith School of Business (Queen’s University). The team included researchers from Queen’s, in addition to members from the United States, Poland, and Italy. In this team, I provided guidance on the application of machine learning principles to questions of management science. In particular, the group was interested in determining how executives make decisions: whether predominantly on intuition or on analysis. This work was published in a book and has since been expanded into articles (Liebowitz et al., 2018). While the work itself predates my involvement with statistics research entirely, this formative experience of interdisciplinary research with a diverse team instilled in me the importance of working with diverse collaborators to approach interesting and important questions. Moreover, seeing the inner workings of an applied team demonstrated the value of ensuring that methodological developments are both theoretically sound and accessible to practitioners, a lesson I have carried throughout my research to date.

## Present and Future Considerations

In the future, I hope to broaden my specific areas of research while maintaining the core underlying principles: developing methodology that is useful for practitioners, theoretically grounded, and tied to interesting questions. Moreover, I hope to continue expanding my multidisciplinary involvement, both as a source of methodological questions and as a way to ensure that sound statistical theory is applied in the literature.

While there are several avenues for continuing to investigate measurement error in the context of DTRs (for instance, a more thorough investigation of optimal estimation in the presence of error), precision medicine presents many interesting and understudied areas that are of direct interest to me. Some possible lines of inquiry include:

- Considering other forms of noisy data (for instance, missing or censored data) and their impact on optimal DTR estimation;
- Considering relaxations of assumptions that lie at the heart of causal inference (for instance, how to accommodate unmeasured confounding);
- Considering utility-based outcomes that take into account patient preferences and alternative outcomes, in place of optimizing a single numerical variable;
- Considering problems related to so-called “non-regular” inference, which arises from non-differentiability in the estimation procedures;
- Considering the application of these methods to problems outside of medicine (for instance, personalized education or marketing).

These problems range from very applied to very theoretical, and could serve as the basis for a research program focused on precision medicine. I fully expect that as the adaptive treatment literature continues to move forward, additional theoretical challenges will arise, providing further ground for investigation.

Outside of a focus on precision medicine, I also have a particular interest in the statistical development of machine learning techniques for the purposes of inference rather than prediction. My interdisciplinary work has suggested that, outside of statistics, there is strong interest in applying state-of-the-art deep learning techniques in a wide variety of domains. From my perspective, a key limitation on doing this successfully stems from the lack of interpretability, explainability, and statistical basis for these models. To this end, I am also interested in fairly simple questions regarding the statistical basis for these methods (for instance, variable selection techniques, significance testing, model comparisons and tests, and parameter interpretation), with the goal of providing insight into how the methods truly function. The long-term goal of this type of investigation would be not only to ground these novel computational techniques in statistical theory, but also, in doing so, to provide a mechanism for using these methods in areas (such as causal inference) that require deeper insight.