ITSEW 2009
The Total Survey Error Concept: Uses and Abuses

Abstracts and Authors:

1. Effects of Disruptions on Total Survey Error

Alan F. Karr, National Institute of Statistical Sciences
Research Triangle Park, NC 27709 USA

Increasingly frequently, surveys are disrupted by exogenous events such as natural disasters, as well as internal events such as budget reductions. In this paper, I will discuss a conceptual taxonomy for data quality/total survey error effects of disruptions, and illustrate it using the National Health Interview Survey. I will also present a problem formulation that explicitly links data quality to cost, which will be illustrated in the setting of a natural disaster such as Hurricane Ike in Texas (September, 2008).

This is joint work with Myron Katzoff (US National Center for Health Statistics), Meena Khare (US National Center for Health Statistics) and Daniel Nussbaum (Naval Postgraduate School).

2. Relationship between Measurement Error and Unit Nonresponse in Household Surveys: An Empirical Approach in the Absence of Validation Data

Andy Peytchev, RTI International, e-mail: andrey@umich.edu (corresponding author)
Emilia Peytcheva, RTI International, e-mail: emilia@umich.edu

Social desirability is often noted as a cause of measurement error. If it is socially undesirable to be overweight or to have had an abortion, one may not disclose their true response.

The same causes of measurement error may also lead to unit nonresponse – the same reasons to conceal a fact or behavior during an interview may also lead to not agreeing to participate in the survey.

Such a socio-psychological common cause can create an association between measurement error and unit nonresponse. Understanding the causal relationship between measurement error and unit nonresponse is critical to the minimization of total survey error, yet both are elusive – they are difficult, often impossible, to measure. In most studies, true values are not known and the association between measurement error and unit nonresponse cannot be estimated.

We propose a model-based approach to estimate measurement error in the absence of true values. First, predictive models are specified for selected variables to obtain estimates of measurement error. Next, this approach is used to estimate the association between measurement error and the likelihood of unit nonresponse through iteratively fitting two sequential models. This is tested with data from the National Health and Nutrition Examination Survey and the National Survey of Family Growth.

To evaluate the performance of this approach, measurement error is then calculated using proxies for true values: between self-reports and physical measurements for body weight in NHANES and between reports of abortions to interviewers versus self-reports in NSFG.

The performance of this approach is critically discussed in the context of its usefulness in understanding the relationship between multiple sources of survey error.

3. Quality Monitoring in Cross-National Survey Research: A Framework

Beth-Ellen Pennell (contact: bpennell@isr.umich.edu), Sue Ellen Hansen, Kirsten Alcser (University of Michigan), Janet Harkness (University of Nebraska, Lincoln), Brad Edwards and Pat Montalvan (Westat)

In general survey research, assessing the quality of survey data requires adequate documentation of the entire survey lifecycle and an understanding of protocols used to assure quality. The assessment procedures and criteria become more complicated in cross-national research, which in addition to methodological, organizational and operational barriers to the implementation of quality monitoring and producing documentation, may often include additional production processes, such as adaptation and translation of questions and pretesting in diverse contexts.

Discussions of quality often focus on fitness for use, survey error, or both. This paper will discuss seven dimensions of quality in a cross-national context and how they are impacted by cost, burden, design constraints and ‘professionalism’ (see Figures 1 and 2 for the basic framework). Such a process quality approach requires the use of quality standards, management for quality, and the collection of standardized study metadata, question metadata, and process paradata. Figure 3 outlines the elements of process quality management that allow users to assess quality.

Using this framework, this paper provides examples and challenges to quality monitoring and producing quality profiles in cross-national projects. We then focus on new approaches being used in the Survey of Health, Ageing and Retirement in Europe (SHARE) and proposed for the OECD’s Programme for the International Assessment of Adult Competencies (PIAAC), and make general recommendations with regard to quality monitoring across the survey lifecycle.


4. Common Misinterpretations and Pitfalls in the Use of Cohen’s Kappa and Cronbach’s Alpha

Paul P. Biemer, RTI International
Research Triangle Park, NC

Two ubiquitous measures of reliability are Cohen’s kappa statistic for test-retest reinterviews and Cronbach’s alpha statistic for scale score measures. Both of these statistics require fairly strong assumptions that are seldom satisfied in practice. In addition, they often mislead and can be misinterpreted by social science researchers. This paper discusses some problems with these two measures of reliability commonly encountered in practice. Some alternative estimators of reliability are proposed that can be applied in many situations where kappa and alpha would be used but provide more accurate assessments of reliability. Some examples are provided to illustrate the problems and the proposed solutions.
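For readers who want the two statistics in concrete form, the following is a minimal sketch of their textbook definitions, not the alternative estimators the paper proposes. The function names and the sample data are illustrative assumptions.

```python
from collections import Counter
from statistics import variance


def cohens_kappa(first, second):
    """Cohen's kappa for two categorical ratings of the same units
    (for example, an interview and a reinterview)."""
    if len(first) != len(second) or not first:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(first)
    # Observed proportion of units classified identically both times.
    observed = sum(a == b for a, b in zip(first, second)) / n
    # Agreement expected by chance from the two marginal distributions.
    freq1, freq2 = Counter(first), Counter(second)
    expected = sum(freq1[c] * freq2[c] for c in freq1) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1 - expected)


def cronbach_alpha(items):
    """Cronbach's alpha; `items` holds one list of scores per scale item,
    with respondents in the same order in every list."""
    k = len(items)
    if k < 2:
        raise ValueError("alpha needs at least two items")
    totals = [sum(scores) for scores in zip(*items)]
    item_variance_sum = sum(variance(scores) for scores in items)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))


# Made-up data, for illustration only.
interview = ["yes", "yes", "no", "no"]
reinterview = ["yes", "no", "no", "no"]
print(cohens_kappa(interview, reinterview))
print(cronbach_alpha([[1, 2, 3], [1, 2, 3]]))
```

Both functions use only the standard library; `cronbach_alpha` uses the sample variance (divisor n - 1) throughout, and it fails with a division error if every respondent has the same total score.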

5. Evaluation of Respondents’ Reporting of Medical Events and Relationship to Response Propensities in the Medical Expenditure Panel Survey

Trena M. Ezzati-Rice, Frederick Rohde, Robert Baskin
Trena.ezzati-rice@ahrq.gov (Contact Author), Frederick.rohde@ahrq.gov (Coauthor), Robert.baskin@ahrq.gov (Coauthor)

Agency for Healthcare Research and Quality (AHRQ)
540 Gaither Road, Rockville, MD 20850 USA

The Medical Expenditure Panel Survey (MEPS), ongoing since 1996, is a large-scale complex sample survey. The Household Component of the MEPS is a nationally representative survey of the U.S. civilian noninstitutionalized population and includes a comprehensive data collection effort from a sample of families and individuals in communities across the United States. During the household interviews, detailed information for each person in the household is collected including: demographic and socioeconomic characteristics, health conditions, health status, health care use, expenditures, sources of payment for health care services, and health insurance coverage. The MEPS is a panel survey which features five rounds of interviewing covering two full calendar years. The panel design provides continuous and current estimates and facilitates evaluations of transitions in respondents' health status, income and employment status, health insurance coverage, use of medical services, health care expenses, and varying sources of payment for health care. Also, the MEPS two-year panel design with multiple rounds of data collection provides an important resource for methodological evaluations of multiple sources of survey error. Previous research has shown differential reporting of medical events (e.g., inpatient, outpatient, emergency room and office-based doctor visits, as well as prescription drug purchases) across the five rounds of interviews which has the potential to impact the annual total number of events and expenditures. In this research analysis, we examine the correlates of potential event level measurement bias by mode of interview, type of respondent (cooperative versus reluctant), and other selected variables. Potential relationship of lower event reporting across rounds with variations in response propensities based on auxiliary response predictors will also be examined. 
Patterns among various event types will be assessed to determine whether a common set of correlates and/or propensities exist for different medical event types and reporting levels. Multiple years of MEPS data are evaluated.

6. Tracking error in a process control sample survey system: a study in interaction between administrative and behavioural deviations from design.

David Lawrence and Stephen Horn
Australian Government Department of Families, Housing, Community Services and Indigenous Affairs

The RSS has been developed to provide an independent measure of process health in delivering government payments. As such, following Franklin’s paradigm, it exemplifies the shift from rule-based to principle-based public regulation. The principles involved in this case are those attached to sample survey methods combined with repeated measure estimation.

Because the surveys are conducted by the delivery agency (under separate contract), designed by the program owners in separate streams who have vested interests in the results, and used for external audit certification, most elements have received scrutiny, not all of it informed by survey principles.

The system is unusual for a face-to-face population survey in having (virtually) no nonresponse. It relies instead on behavioural assumptions concerning reporting capacity and intention to obtain measures of illicit or damaging activity. This, while rare, entails an elaborate procedure of cross and third party checking, quality checks and independent validations.

This paper focuses on sources of survey error and how impact has been explored. It describes a test of the bias arising from excluding sample for random review found to be already scheduled for review under one or another of the regular compliance programs. A control sample was drawn that mirrored the review sample. Over and under payment inferred from compliance history in the control sample was compared to the results of the random reviews. The results show regular reviews disclosing fewer deviations from payment accuracy but on average larger amounts, consistent with a plausible model of slight downward bias from exclusion in the random reviews.

Reference
Horn, S., Quality Assurance Surveys and Program Administration, Proceedings of Q2008, European Conference on Quality in Official Statistics, Rome 2008

7. Identifying Multiple Sources of Errors: The 2007 Classification Error Survey for the US Census of Agriculture

Jaki McCarthy and Denise Abreu

Following the 2002 Census of Agriculture, a Classification Error Study (CES) was conducted to estimate the number of operations misclassified (either as farms or non-farms) in the census. This was done by matching operations who reported in the separate area frame based June Agricultural Survey (JAS) conducted in June of 2002 to their census report and comparing their answers. The information on the JAS was assumed to be correct, since it was collected in person by trained enumerators, while the census is a self administered mail form. 2002 misclassification estimates were generated based on cases where the census report was classified differently than the matching JAS report. Overall, the estimated misclassification rate was small but it was clear that in some cases the assumption that the JAS was correct was not justified.

Since the 2002 misclassification estimates were not used to adjust published Census estimates, a different approach was taken for a classification error study in 2007. For 2007, the focus was on understanding why operations reported differently in June and on the census, rather than estimating misclassification rates. Census records were again matched to operations' reports from the 2007 JAS, but neither report was assumed to be "the truth." As in 2002, records where an operation was classified as a farm in one case and as a non-farm in the other were targeted. In addition, operations who reported total acres operated that differed by more than 25% between June and the Census were also included. Instead of assuming one source was correct, these operations were re-interviewed and shown their survey and census questionnaires and asked to resolve and explain the discrepancies. In addition, operators were asked general questions related to suspected problems in reporting their acreage.

The re-interviews uncovered several different sources of errors in reporting. These occurred in both the JAS and the Census with the majority in the JAS, not the Census. Errors were related to respondents, enumerators and NASS procedures and clearly show that a multipart solution will be required to address them.

8. Measurement equivalence vs. Representativeness: The influence of response enhancing measures on the comparability of answers.

Mr. JWS Kappelhof (J.Kappelhof@scp.nl)
The Social and Cultural Planning Office, The Netherlands

In recent years several large scale surveys among difficult to survey groups have been conducted in the Netherlands. Tailor made approaches have been developed to increase the response among these difficult to survey groups. The reason to try and achieve a higher response is to reduce bias in population estimates which will occur when non-respondents systematically differ from respondents with regard to the variables under investigation. The probability of getting biased estimates will increase when response is unequal across different groups. In the Netherlands ethnic groups are seen as difficult to survey. The response among ethnic groups can be increased by the use of interviewers with the same ethnic background, translated questionnaires, longer fieldwork periods and an increased number of contact attempts.

The drawback of these response enhancing measures is that one cannot assume that a (latent) factor will be measurement invariant across these ethnic groups because of the ethnicity of the interviewer and language and cultural differences. This is, on the one hand, because respondents with different cultural and ethnic backgrounds might differ in their opinion as to what is important about the factor being measured and, on the other hand, because of the perceived social undesirability of certain answers or opinions. With regard to social desirability the ethnicity of the interviewer and language might be of influence. The ethnicity of the interviewer would particularly play a role when the question is about specific ethnic issues. Also the gender match between interviewer and respondent will have an effect as will the sensitivity of the issue.

One of the main objectives of cross cultural survey research is to compare concepts across groups. Specifically in the case of difficult to survey groups this usually leads to a tradeoff between representativeness (high and equally distributed response among groups) and measurement invariance. There is the need to adequately measure concepts among all groups involved in the survey and to be able to compare those concepts across groups for which they need to be measurement invariant.

This research focuses on the effect of response enhancing measures, such as the use of ethnic interviewers and gender matching on the measurement invariance of concepts. The second aim of this research is to establish if the effect of these response enhancing measures on the measurement invariance of concepts increases if the concepts are more socially sensitive.

9. TSE in mixed-mode survey systems: A trade-off between errors & costs

Katja Lozar Manfreda, University of Ljubljana, Faculty of Social Sciences, katja.lozar@fdv.uni-lj.si
Vasja Vehovar, University of Ljubljana, Faculty of Social Sciences, vasja.vehovar@fdv.uni-lj.si
Nejc Berzelak, University of Ljubljana, Faculty of Social Sciences, nejc.berzelak@fdv.uni-lj.si
Eva Belak, Statistical Office of Slovenia, eva.belak@gov.si

Mixed-mode survey systems are increasingly used in order to overcome the problem of declining response rates. In addition, availability of new technological solutions for contacting respondents and collecting survey data (Internet, mobile phones) at reasonable costs supports the introduction of new ways of implementing social surveys. As a result, different survey systems are possible, with different modes of contacting respondents and different modes of actual collection of survey responses for each wave (pre-contact, first contact, follow-ups). Their evaluation from the point of view of errors & costs is of extremely practical importance. It enables an informed decision about allocating the resources in order to obtain an optimal balance between errors & costs for a certain survey project.

We present an approach on how to find an optimum balance between errors & costs when confronted with a practical question of which combination of contacting strategies and which mode of data collection to use for a certain project. We propose two approaches for comparing survey errors (measured with MSE) and survey costs: for given mixed-mode survey systems we search for the condition giving either (1) minimum product of MSE & costs or (2) minimal MSE at given fixed budget. After the theoretical discussion of the proposed approach several examples of our own evaluation studies of errors & costs (done in Slovenia) will be presented, from our early simple examples to the most complex recent ones:

1) Study from 1995: a mail survey of a specific population of individuals (i.e. follow-up of students) with different numbers of follow-up contacts. RQ: How many follow-up contacts are enough?
2) Study from 1999: a survey of business companies on ICT. RQ: Which of the following survey conditions gives the minimum error at fixed budget: CATI, mail, fax, web with mail invitation and web with email invitation?
3) Study from 2008: a survey of general population on ICT. RQ: Does any of the alternative strategies (CATI, web with different incentive conditions, mail) outperform a standard face-to-face survey done by Statistical Office?
4) Study from 2009: a telephone survey of general population on ICT. RQ: Comparing the costs and errors of a conventional telephone survey and a mobile telephone survey using RDD dialling with a standard face-to-face survey done by Statistical Office – is there time for a replacement of the modes?
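The two selection criteria named above, minimum product of MSE and cost, and minimum MSE within a fixed budget, can be sketched as follows. This is an illustrative sketch only: the `Design` class, the decomposition of MSE as squared bias plus variance, and all numbers are assumptions for the example, not figures from the studies listed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Design:
    """One candidate survey system (hypothetical structure)."""
    name: str
    bias: float      # estimated bias of the key estimate
    variance: float  # estimated sampling variance
    cost: float      # total cost in arbitrary units

    @property
    def mse(self) -> float:
        # Mean squared error taken as squared bias plus variance.
        return self.bias ** 2 + self.variance


def min_mse_cost_product(designs):
    """Criterion (1): the design with the smallest MSE x cost."""
    return min(designs, key=lambda d: d.mse * d.cost)


def min_mse_within_budget(designs, budget):
    """Criterion (2): the smallest-MSE design costing at most `budget`,
    or None if no design is affordable."""
    affordable = [d for d in designs if d.cost <= budget]
    if not affordable:
        return None
    return min(affordable, key=lambda d: d.mse)


# Made-up values, for illustration only.
candidates = [
    Design("face-to-face", bias=0.01, variance=0.0004, cost=100),
    Design("CATI", bias=0.02, variance=0.0005, cost=40),
    Design("web", bias=0.05, variance=0.0003, cost=10),
]
print(min_mse_cost_product(candidates).name)
print(min_mse_within_budget(candidates, budget=50).name)
```

With these made-up inputs the two criteria select different designs, which is the kind of disagreement that makes the choice of criterion matter.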

The results of all these studies show that the decision on the optimal survey implementation strategy is not straightforward and that several different design features and possible outcomes need to be taken into account (length of the questionnaire, mode, response rate, measurement bias). They also show that new approaches which are cost convenient (such as web surveys) do not necessarily lead to the optimum decision due to lower response and high bias.

Presenting the approach and giving examples on how to make informed decisions when deciding on an appropriate mixed-mode system contributes to the cumulating research of TSE, especially as regards the practical question about allocating resources for an optimal trade-off between survey errors & costs.

10. Minimizing Total Survey Error in an International Assessment of Adult Competencies

Leyla Mohadjer, Westat

The Programme for the International Assessment of Adult Competencies (PIAAC) is a multi-cycle international survey of assessment of adult skills and competencies sponsored by the Organization for Economic Cooperation and Development (OECD). PIAAC will collect information on skills required in the workplace, educational background, professional attainment, and the ability to use information and communications technology. In addition, PIAAC will include an assessment of cognitive skills to measure participants’ general levels of numeracy and literacy. In-person interviews will be used to complete the background questionnaire and to administer the direct assessment.

PIAAC has established an overall set of Quality Assurance (QA) and Quality Control (QC) procedures covering all aspects of the study to ensure the sources of survey variability are kept to a minimum and that the survey design and implementation processes of PIAAC yield high-quality and internationally comparable data.

PIAAC has evolved from two previous international literacy surveys. The standards and guidelines (QA procedures) developed for PIAAC are based on, and expanded upon, the standards developed for the earlier surveys. In addition, PIAAC is developing a comprehensive set of QC plans and procedures covering all aspects of the survey to help ensure national PIAAC surveys follow the international QA standards. This presentation will include a brief summary of the PIAAC standards and guidelines, and will discuss the quality control plans for PIAAC survey design, implementation, and analysis.

11. Choosing the number of call attempts to minimize the nonresponse bias under a RHG approach

Annica Isaksson and Peter Lundquist, Statistics Sweden

We are interested in the problem of choosing the number of call attempts, from now on referred to as the level of effort (LOE), in a telephone survey. We hereby build upon earlier work presented at ITSEW 2008. Our focus is on choosing the LOE in such a way that the nonresponse bias is minimized under a variance constraint. The bias for each LOE is estimated from process data. The choice of LOE is allowed to be influenced by two sources of error; the sampling error and the error due to nonresponse. (Potential measurement errors are ignored, since we believe that their impact on the choice of LOE is likely to be small.) Various costs associated with the data collection are taken into account. We choose LOEs separately for different response homogeneity groups (RHGs). The cost functions for different RHGs are based on historical process data. The suggested approach is illustrated with survey data from Statistics Sweden.
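A much-simplified sketch of the selection step might look like the following. It assumes that, for each RHG, an estimated bias and variance are already available for every candidate LOE; it ignores the cost functions mentioned above and treats each group independently. The function name, data layout and numbers are hypothetical.

```python
def choose_loe(candidates, max_variance):
    """Pick the level of effort with the smallest absolute estimated bias
    among those whose estimated variance meets the constraint.

    candidates: dict mapping LOE (number of call attempts) to a tuple
                (estimated_bias, estimated_variance).
    Returns the chosen LOE, or None if no LOE satisfies the constraint.
    Ties on absolute bias go to the smaller number of call attempts.
    """
    feasible = {
        loe: estimates
        for loe, estimates in candidates.items()
        if estimates[1] <= max_variance
    }
    if not feasible:
        return None
    return min(feasible, key=lambda loe: (abs(feasible[loe][0]), loe))


# Made-up estimates per response homogeneity group, for illustration only.
rhg_estimates = {
    "group A": {2: (0.05, 0.0010), 4: (0.03, 0.0008), 6: (0.03, 0.0007)},
    "group B": {2: (-0.02, 0.0006), 4: (-0.04, 0.0005)},
}
for group, table in rhg_estimates.items():
    print(group, choose_loe(table, max_variance=0.0009))
```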

12. Quality Indicators and Survey Costs

Marina Jansson, Statistics Sweden

Statistics Sweden has started a project on how survey quality and cost-efficiency are affected by different choices of tools and procedures. The main project goal is to provide guidance on how to make an optimal allocation of the resources for the various survey processes based on a planning criterion involving total survey error.

As a first step, the project identified processes that can have a large effect on cost and quality, including data collection and data processing. The project also developed a list of indicators for quality and cost.

As a second step the project planned two studies related to the data collection process. The first one is a study concerning the effects of number of call attempts in a survey of individuals, and the second is a study on how to measure the cost of the reminder in combination with how the nonresponse rate is changing after the regular reminder has been issued in a business survey.

The results of the studies will be used to generalize to other similar surveys.

The project also attempts to define the customer quality requirements, how our quality declarations reflect those requirements, and how we account for and follow up costs associated with our work on commission.

We also conducted some interviews with senior methodologists at Statistics Sweden to get further ideas on quality and cost issues.

13. Survey Design Handbook

Lars Lyberg, Anders Holmberg, Eva Elvers, Bo Sundgren

We will follow up on last year’s ITSEW presentation on the same topic. There is now a first draft of a survey design handbook for survey managers. The purpose of this document is to make it easier for the managers to make informed decisions about the allocation of resources in their surveys. The contents of the document are planning criteria, contacts with clients and other stakeholders, cost and error structures, handling trade-off situations and IT options.

14. A Study of Sources for the Error Structure in Estimates of Census Coverage Error Components

Mary H. Mulry
U. S. Census Bureau

A major goal and challenge for coverage measurement for the 2010 Census is to design a survey that measures the components of coverage error, namely erroneous enumerations and omissions. Previous coverage measurement surveys, including the 2000 Accuracy and Coverage Evaluation (A.C.E.) (U.S. Census Bureau 2004) and the 1990 Post Enumeration Survey (PES) (Hogan 1992, 1993) were designed primarily to estimate census net error using dual system estimation. Total error models guided the estimation and syntheses of sampling and nonsampling errors in the estimates of net error for both the 2000 A.C.E. and the 1990 PES.
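As background, dual system estimation in its simplest textbook (capture-recapture) form can be sketched as below. This is only the basic estimator under the usual independence assumption, not the production estimator used in the A.C.E. or the PES; the function names and counts are illustrative.

```python
def dual_system_estimate(census_count, survey_count, matched_count):
    """Basic dual system estimate of the true population size:
    (census count x survey count) / number matched in both."""
    if matched_count <= 0:
        raise ValueError("at least one matched record is required")
    return census_count * survey_count / matched_count


def net_undercount_rate(census_count, population_estimate):
    """Net error expressed as the share of the estimated population
    missing from the census (negative values indicate a net overcount)."""
    return (population_estimate - census_count) / population_estimate


# Made-up counts, for illustration only.
estimate = dual_system_estimate(census_count=900, survey_count=100, matched_count=90)
print(estimate, net_undercount_rate(900, estimate))
```

Because the estimate targets net error only, erroneous enumerations and omissions can offset each other within it, which is why a separate design is needed to measure the components.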

The 2010 Census Coverage Measurement Program (CCM) incorporates new technologies and methods in an attempt to address the problems found in the 2000 A.C.E. The evaluations of the 2000 A.C.E. revealed errors in measurement, particularly the failure to identify substantial numbers of erroneous enumerations, including duplicates. The response to rectify the problems has led to the CCM person interview, follow-up interview, and matching operations becoming more complicated than for previous coverage measurement surveys. One reason for the additional complexity is the new requirement to identify each sample person’s correct location for enumeration, not just whether or not the sample block is the correct location. Also, measuring component errors has led to an expansion of the definition of enumerations eligible for matching to include some records that do not have complete name and two characteristics.

Recent studies examined the error structure in components of census coverage error based on a poststratified estimator of net coverage error. With poststratified estimates, some of the error that is present in the estimate of erroneous enumerations and thereby the net error estimates may offset in the estimate of omissions. However, there is no offsetting in the estimate of the erroneous enumerations error component.

This paper further examines the error structure for estimates of components of census coverage error by identifying sources of the errors. In addition, the study investigates the measurement of the different error sources in a manner that does not double-count errors. A further refinement of the error structure for the estimates of component errors will provide insight for the design of a simulation to study their impact. Also, gaining more knowledge about the error structure will aid in avoiding errors in the new design of the 2010 CCM as well as providing guidance for an assessment of its quality.

15. Can Speech Disfluency and Voice Pitch Predict Item Non-response and Accuracy to Income Questions?

Matt Jans, University of Michigan, Michigan Program in Survey Methodology
mattjans@isr.umich.edu

This paper explores two error sources (item nonresponse and measurement error) that are influenced by interviewer-respondent interactions. The interaction between interviewers and respondents has long been a focus of multiple sources of Total Survey Error (Kahn & Cannell, 1956), specifically unit nonresponse (Oksenberg and Cannell, 1988; Groves, O’Hare, Gould-Smith, Benki, & Maher, 2008) and measurement error (Conrad, Schober, & Dijkstra, 2008; Fowler & Mangione, 1990; O’Muircheartaigh & Campanelli, 1998). This paper looks at item nonresponse and measurement error in the context of answers to questions about respondents’ income.

Theoretical motivation for this paper comes from previous work in survey methodology, psycholinguistics and social psychology. Voice pitch and speech disfluency have been linked to survey error (nonresponse and measurement error) and also to internal psychological states (cognitive and emotional). These psychological states, evidenced through speech and voice, are hypothesized to be key variables in the production of item nonresponse and measurement error. Interviewer voice pitch has been related to unit nonresponse (Oksenberg & Cannell, 1988; Groves, Benki, etc), but has not been studied in item nonresponse. Speech disfluency has been linked to measurement error (Conrad, Schober, & Dijkstra, 2008; Schober & Bloom, 2004). The relationships between pitch, disfluencies, and income nonresponse and accuracy have not yet been tested in the literature.

Income nonresponse and inaccuracy are major data quality problems for survey researchers, so it is important to develop a more complete understanding of the mechanisms behind these errors. Research in psycholinguistics and social psychology suggests that general cognitive difficulty can be linked to speech disfluency (Schober & Bloom, 2004), and anxiety can be linked to variation in voice pitch (Bachorowski, 1999). Deception can be linked to both (DePaulo, Lindsay, Malone, Muhlenbruck, Charlton & Cooper, 2003). Each of these psychological states or processes has been suggested as a mechanism behind income reporting, but these hypotheses are difficult to test without direct access to respondents’ psychological processes. In this research, voice and speech serve as proxy variables for internal psychological states and processes.

Data come from an RDD telephone survey (Survey of Consumers, SCA) and a mixed mode (face-to-face and phone) survey (Health and Retirement Study, HRS), both conducted by the University of Michigan’s Survey Research Center. The outcome variable examined in the SCA data is household income nonresponse, for which there are three levels: an exact dollar-amount, a bracketed-amount (such as “between $25,000 and $30,000”), and nonresponse. The outcome variable examined in the HRS is previous month Social Security income accuracy, for which records are available to validate the amount of Social Security income received by each respondent.

Initial results from the SCA data suggest that both interviewer and respondent voice qualities and behaviors are related to the type of income response. The primary analysis will consist of latent variable models of the hypothesized error mechanisms of cognitive difficulty and anxiety. Further, this study illustrates the use of two software programs that are gaining popularity in survey methodological research: Sequence Viewer (for utterance-level coding of interviewer-respondent interaction) and Praat (for acoustic analysis of voice).

16. The different roles of interviewers: How does interviewer personality affect respondents’ survey participation and response behavior?

Michael Weinhardt (SOEP – DIW, Berlin), mweinhardt@diw.de - presenting
Frauke Kreuter (JPSM University of Maryland, College Park), fkreuter@survey.umd.edu

In household panel surveys, such as the German Socioeconomic Panel Study (GSOEP), interviewers play a prominent role in securing cooperation--cooperation with the survey request itself and continued cooperation throughout the interview. However, only a few studies have focused on the influence of the interviewer's personality on response behavior. Most studies of interviewer effects on non-response and measurement error have had to rely on data provided by fieldwork agencies to relate interviewer characteristics to respondents' data. In December 2006, a survey of all current interviewers of the GSOEP was conducted (N=586). 94% of all interviewers responded to the 10-page paper questionnaire, including self-rating measures of attitudes, values, beliefs, and personality characteristics in exactly the same question format used in the GSOEP respondent's questionnaires. The interviewer questionnaire also contained a 15-item measure of the 'Big Five' personality traits: openness, conscientiousness, extroversion, agreeableness, and neuroticism. With this data, it is possible to examine effects of the interviewers on survey participation and response behavior by linking survey data from the interviewers with household and individual level information on respondents. In a multilevel logistic regression model predicting item nonresponse on income questions, extroversion in interviewers increased the chance of item non-response significantly. Further analysis will also look at the interaction effect between respondents' and interviewers' personalities on this measurement error, and will examine the effect of interviewer personality traits on response to the survey request. Results will be discussed in terms of their potential usefulness for recruiting and training of interviewers, highlighting the tension between ideal interviewer characteristics for recruitment vs. those ideal for the interview itself.

17. Nonresponse and Measurement Error in Employment Research

Frauke Kreuter
JPSM University of Maryland College Park, USA

Gerrit Mueller, Mark Trappmann
IAB Institute for Employment Research, Germany

Survey methodologists are increasingly concerned with the interaction of multiple error sources. Particularly prominent are discussions about nonresponse and measurement error. One hypothesis that is often found among practitioners is that sample cases that are brought into the survey only after repeated attempts and altered recruitment strategies are more likely to provide low quality data (e.g. Groves and Couper 1998). Data quality is often internally assessed through the proportion of missing items, the proportion of “don’t know” responses and the like (e.g. Fricker 2007). Rarely, in these studies, are external data available to evaluate the quality of respondents’ answers (e.g. Cannell & Fowler 1963, Olsen 2006).

The panel study PASS (Trappmann et al. 2009) is a novel dataset in the field of labor market, welfare state and poverty research in Germany. With almost 19,000 interviewed persons in more than 12,500 households, PASS is currently one of the most comprehensive panel surveys in Germany. The first round of data collection started in 2006. In PASS, survey data on the employment and unemployment history, income and education of participants can be linked to corresponding data from respondents' administrative records. Furthermore, the distributions of these variables in the sampling frame are known.

Based on this study, we give an assessment of data quality as a function of contactability and response propensity. For only some variables does the measurement error (variance or bias) assessed through the administrative records increase with decreasing contactability and response propensity of the target persons. In particular, this is found in the case of retrospective questions. Here, the differing length of time between date of interview and event explains a large part of the difference in measurement error between respondents with high vs. low response propensity. Given this finding, we decompose total absolute bias into contributions due to nonresponse bias and measurement error bias and evaluate how all three are affected by successively bringing respondents with low response propensity into the sample.
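The decomposition described above can be illustrated with a minimal sketch. It is not the authors' procedure: it assumes that administrative values are available for the whole sample and can be treated as the truth, it works with signed rather than absolute bias of a simple mean, and the field names (admin, reported, attempts) and all data values are hypothetical.

```python
from statistics import mean

def decompose_bias(sample, max_attempts):
    """Split the total bias of the respondent mean into two additive parts.

    sample: list of dicts with keys (all names are hypothetical)
      'admin'    - administrative-record value, treated here as the truth
      'reported' - survey answer, or None for a nonrespondent
      'attempts' - contact attempts needed, or None for a nonrespondent
    max_attempts: cases needing more attempts count as nonrespondents
    """
    full_truth = mean(r["admin"] for r in sample)
    resp = [r for r in sample
            if r["attempts"] is not None and r["attempts"] <= max_attempts]
    resp_truth = mean(r["admin"] for r in resp)
    resp_reported = mean(r["reported"] for r in resp)
    return {
        # respondents differ from the full sample on the true values
        "nonresponse_bias": resp_truth - full_truth,
        # respondents' answers differ from their own true values
        "measurement_bias": resp_reported - resp_truth,
        # the sum of the two parts above
        "total_bias": resp_reported - full_truth,
    }

# Made-up illustration data: six sampled persons, one never responds.
sample = [
    {"admin": 10, "reported": 11, "attempts": 1},
    {"admin": 12, "reported": 12, "attempts": 1},
    {"admin": 8, "reported": 9, "attempts": 2},
    {"admin": 16, "reported": 18, "attempts": 2},
    {"admin": 20, "reported": 24, "attempts": 3},
    {"admin": 30, "reported": None, "attempts": None},
]

# Successively bring harder-to-reach respondents into the respondent pool.
for cutoff in (1, 2, 3):
    print(cutoff, decompose_bias(sample, cutoff))
```

Raising the cutoff mimics adding respondents with lower response propensity, so the three quantities can be tracked as field effort increases.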

18. Modeling Multiple Sources of Survey Error in Physical Activity Data

Nick Beyler*^ (email: beylern@iastate.edu), Sarah Nusser* (email: nusser@iastate.edu), Alicia Carriquiry* (email: alicia@iastate.edu), Greg Welk# (email: gwelk@iastate.edu)
*Center for Survey Statistics and Methodology, Department of Statistics, Iowa State University
# Department of Kinesiology, Iowa State University
^Contact author

Physical activity measurement in large-scale surveys relies on respondent recall of physical activity from the previous 24 hours or even longer periods of time (e.g., previous week or month). Studies show that recall instruments, compared to more objective reference instruments (e.g., accelerometers, armband monitors, doubly labeled water), measure physical activity with considerable measurement error. However, most studies fail to investigate other forms of survey error like nonresponse error and coverage error. We are designing a survey that offers the opportunity to investigate the relationship between measurement error and nonresponse error in physical activity data. Survey participants will provide replicate measurements of physical activity from both a 24 hour physical activity recall and an armband monitor. Paradata related to recruitment efforts will also be available. We will discuss possible models for multiple sources of error using data from this survey. Measurement error (bias and variance) can be modeled using the recall and armband monitor measurements, and response propensity using data on effort required to obtain survey participation. In particular, we want to explore whether measurement errors and propensity to respond are individually or jointly dependent on physical activity and/or other factors such as demographic characteristics. If a second sample from a more complete sampling frame becomes available, we may also be able to investigate coverage error.

19. Using the TSE Framework in Legal Proceedings

Paul J. Lavrakas, Ph.D.

Survey evidence is used in legal testimony and for legal proceedings for a variety of purposes and with growing frequency. However, it is conspicuous that the Total Survey Error perspective has not as yet become the “standard” framework used to guide and structure the development and presentation of such evidence.
This paper will address the use of the TSE framework to structure my work in four legal proceedings. In each of these cases, multiple aspects of TSE came into play in how I wrote an expert report, how I critiqued a survey done (or proposed to be done) by opposing experts, and/or how I planned an original survey to gather key evidence to be used in the legal proceedings.

In the first case, I served as an expert to the Office of the Illinois Attorney General and used TSE to critique a research study conducted by a very prominent psychologist that was being entered in federal court as evidence to overturn a death penalty conviction. The major problems that I noted in my testimony were related to Sampling Error, Nonresponse Error, and Measurement Error. (The case was decided in favor of the state attorney during an appeals stage.) In the second instance, I served as an expert to the Office of the New York Attorney General and used TSE to critique a new survey research methodology that was to be implemented by a major media research company. Here, I wrote an expert report which the state attorneys used to make civil rights claims against the company’s methodology. I structured my report so as to identify serious Coverage Errors, Nonresponse Errors, and Measurement Errors in the new methodology as it applied to the mis-measurement of Asians, Blacks, and Hispanics. (The case was settled out of court in favor of the state attorneys and included an agreement that the company would fund a nonresponse bias study.) In the third case, I used TSE to conceptualize and manage the implementation of a large general population survey to meet the evidentiary needs of the federal government attorneys who were planning a suit against a large media company. In planning the survey I specifically chose survey methods that would hold up to future expert scrutiny from the standpoint of Coverage Error, Sampling Error, Nonresponse Error, Measurement Error, and Adjustment Error. (The outcome of that suit is pending.) In the fourth case, I am using the TSE framework to write an expert report to critique a survey being proposed to a federal court as gathering valid evidence to support a class action suit. The opposing researchers in this case appear unaware of the many sources of bias and variance that are readily identified within TSE and thus their survey design is highly flawed.

As part of my presentation, I will highlight how I used TSE in each case and will comment on how I presented TSE in the legal proceedings. I also will call attention to an important aspect of validity that is not encompassed by TSE: the issue of whether or not a survey has gathered experimental evidence to support causal reasoning.

20. Measuring Potential Error in Population Estimates for Local Government Areas of England and Wales

Joanne Clements, joanne.clements@ons.gsi.gov.uk
and Ruth Fulton, ruth.fulton@ons.gsi.gov.uk
Office for National Statistics, UK

The Office for National Statistics has established a project to improve its understanding, measurement and reporting of the quality of the mid-year population estimates. Additionally, there are a number of planned and implemented methodological improvements to mid-year population estimates for local government areas (Local Authorities of England and Wales). Objective quality measures indicating total potential error for the existing and the revised population estimates will improve evaluation of the new methods.

Potential error measures are not straightforward to produce because the population estimates are derived from multiple data sources (Census, administrative and survey sources) using a variety of estimation procedures. There is also a lack of independent data with which to corroborate the estimates, because any independent sources are of uncertain quality themselves with no total error measures.

Measures of potential error can be achieved by initially mapping out and describing the sources of potential error (quality issues) associated with each of the data sources and methods used, estimating the potential error for each quality issue and developing methods for combining these into a total quality measure of potential error.

This paper will detail
• the work completed to date, including the overall theoretical approach, the simulation methodology adopted for measuring total potential error and the composite quality measures considered
• issues to consider when using the proposed methodology in practice to measure total error
• the potential error issues which have been identified relating to Internal Migration within England and Wales, the current proposals for investigating each of these issues, and initial findings from this research.
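As a rough illustration of how a simulation can combine component errors into one total measure, the sketch below draws each component of a population estimate from an assumed error distribution and summarises the spread of the resulting totals. It is not the ONS methodology: the component names, the numbers, and the assumption of independent normal errors are all hypothetical.

```python
import random
import statistics

def simulate_total_error(base, components, n_sims=10000, seed=0):
    """Monte Carlo combination of component errors (illustrative only).

    base: starting population count, treated as fixed here
    components: dict mapping a component name to (point_estimate, std_dev);
                each is drawn independently from a normal distribution
    Returns the mean, standard deviation and a central 95% interval
    of the simulated population totals.
    """
    rng = random.Random(seed)
    totals = []
    for _ in range(n_sims):
        estimate = base
        for point, sd in components.values():
            estimate += rng.gauss(point, sd)
        totals.append(estimate)
    totals.sort()
    lower = totals[int(0.025 * n_sims)]
    upper = totals[int(0.975 * n_sims) - 1]
    return statistics.mean(totals), statistics.stdev(totals), (lower, upper)

# Hypothetical components of change for one local authority.
components = {
    "births": (1200, 10),
    "deaths": (-900, 8),
    "internal_migration": (300, 150),
    "international_migration": (150, 200),
}

sim_mean, sim_sd, interval = simulate_total_error(100000, components)
print(round(sim_mean), round(sim_sd), interval)
```

A simulation of this kind makes it easy to swap in non-normal or correlated component errors, which matters when the component sources are of very different quality.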

Benefits from this work include improved information for users of population statistics and objective measures for evaluation and quality assurance.

21. Coverage Rates and Coverage Error When Interviewers Create Frames

Stephanie Eckman

Area probability surveys in the U.S. devote considerable resources to housing unit listing, yet the error properties of this method of frame construction are poorly understood. For too long, the high quality of these frames has been an assertion rather than a conclusion of careful research. My dissertation will first explore the extent of undercoverage and overcoverage on housing unit frames, then test hypotheses about the mechanisms of lister error and finally estimate the effects of these lister errors on coverage bias in survey estimates. To address these aims, I have two datasets: one from the Census Bureau and another I am collecting with interviewers from the National Survey of Family Growth. Both use repeated independent listings to create a gold standard frame of housing units. I expect to see more errors of both undercoverage and overcoverage in segments where there is a lot of crime, where the local language is not one the lister speaks and when the lister drives rather than walks. I suspect that listers who update an existing frame commit errors of confirmation bias, such as we see in dependent coding. Finally, I wonder if interviewers who create frames from which they will later be interviewing tend to undercover units that are likely nonrespondents. If correct, this last hypothesis would demonstrate a connection between two important components of total survey error. I would appreciate feedback from those at the conference on my design, hypotheses and analysis plan.

22. Using Substantive Diagnostics to Evaluate the Validity of Latent Class Indicators of Measurement Error

Clyde Tucker and Brian Meekins, U.S. Bureau of Labor Statistics
tucker.clyde@bls.gov

Paul Biemer, RTI International

Latent class (or structure) analysis (LCA), a theory for detecting unobserved variables, was developed by Paul Lazarsfeld (1950). According to Lazarsfeld, an unobserved variable (measurement error in this case) could be constructed by taking into account the interrelationships among observed or “manifest” variables. The mathematics underlying this theory was extended by Lazarsfeld and Henry (1968) and Goodman (1974). Over the last twenty years, survey methodologists have used LCA to study measurement error or response error in surveys (Van de Pol and de Leeuw 1986; Tucker 1992; Van de Pol and Langeheine 1997; Bassi et al. 2000; Biemer and Bushery 2000; Biemer and Wiesen 2002; Tucker et al. 2003, 2004, 2005, 2006, 2008).

The concept underlying LCA is relatively straightforward. The idea is to find a latent variable that explains the relationships between the observed variables. Thus, statistically speaking, the relationships between the observed variables disappear after conditioning on the latent variable. In the case of measurement error, the observed variables are often repeated measures or a set of indicators that measure various aspects of respondent performance. Maximum likelihood estimation, using an EM algorithm, is used to identify the latent model. A chi-square test is used to measure the goodness of fit.
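The estimation step can be made concrete with a minimal sketch of EM for the simplest case: two latent classes and binary indicators that are independent within each class. This is a generic textbook version, not the models used in the paper; the function names, the simulated data and the "true" parameter values are assumptions for illustration, and the chi-square goodness-of-fit test is omitted.

```python
import random

def fit_two_class_lca(data, n_iter=100, seed=0):
    """EM for a two-class latent class model with binary indicators.

    data: list of tuples of 0/1 values, one tuple per respondent
    Returns (class_weights, item_probs), where item_probs[c][j] is
    the estimated P(indicator j = 1 | latent class c).
    """
    rng = random.Random(seed)
    n_items = len(data[0])
    weights = [0.5, 0.5]
    item_probs = [[rng.uniform(0.3, 0.7) for _ in range(n_items)]
                  for _ in range(2)]
    for _ in range(n_iter):
        # E-step: posterior class membership under local independence
        posteriors = []
        for row in data:
            joint = []
            for c in range(2):
                p = weights[c]
                for j, y in enumerate(row):
                    p *= item_probs[c][j] if y else 1.0 - item_probs[c][j]
                joint.append(p)
            total = sum(joint)
            posteriors.append([p / total for p in joint])
        # M-step: posterior-weighted proportions
        for c in range(2):
            mass = sum(post[c] for post in posteriors)
            weights[c] = mass / len(data)
            for j in range(n_items):
                hits = sum(post[c] * row[j]
                           for post, row in zip(posteriors, data))
                # keep estimates strictly inside (0, 1)
                item_probs[c][j] = min(max(hits / mass, 1e-6), 1 - 1e-6)
    return weights, item_probs

def simulate(n, weights, item_probs, seed=1):
    """Draw n response patterns from a known two-class model."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        c = 0 if rng.random() < weights[0] else 1
        rows.append(tuple(int(rng.random() < p) for p in item_probs[c]))
    return rows

# Hypothetical truth: 30% of respondents are in the "poor reporter" class.
true_weights = [0.3, 0.7]
true_probs = [[0.9, 0.8, 0.85, 0.9], [0.1, 0.2, 0.15, 0.1]]
data = simulate(1000, true_weights, true_probs)
est_weights, est_probs = fit_two_class_lca(data)
print(est_weights, est_probs)
```

The class labels returned by EM are arbitrary, so class 0 in the output need not correspond to class 0 in the simulation. Attaching a meaning to each class is exactly the kind of substantive judgment that a good statistical fit cannot supply.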

The problem comes when trying to interpret the results. Although the mathematics are understandable, the estimation procedure does operate something like a “black box.” While the fit may be good from a statistical standpoint, the question remains as to whether valid conclusions can be drawn about the structure of measurement error in the data. Much of the answer to this question lies in having strong theoretical reasons for the choice of observed variables, understanding what their interrelationships say about measurement error, and being able to see how this information is captured in the latent variable. In the latter case, substantive, and not statistical, diagnostics are essential.

This paper will discuss ongoing work involving the use of latent class models to estimate measurement error in the reports of expenditures in the Consumer Expenditure Survey Program (CE). The paper begins with a brief description of the Consumer Expenditure Surveys sponsored by the Bureau of Labor Statistics (BLS) and conducted by the Census Bureau. The main body of the paper provides examples of increasingly complex latent class models of measurement error in consumer expenditure reports. The paper does not focus so much on the estimates of the errors themselves but, instead, on the methods for creating and using substantive diagnostics to evaluate the validity of the error measures produced from LCA prior to their further use in the analysis of the underlying causes of the errors. That is, an attempt is made to unlock the “black box” of LCA. In the process, the authors also discuss how the observed or manifest variables were chosen and some of the problems encountered with using latent class methodology in a variety of circumstances.

23. A Structured Approach to Inference from Internet Access Panels

Yehuda Dayan, Ipsos-MORI and London School of Economics

Internet Access panels are a dominant platform for commercial survey-based research in the fields of advertising, brand loyalty and marketing. However, they have failed to establish themselves as a reliable source in areas where the focus is on estimating descriptive statistics, such as media measurement and political polling.

It has been recognized from the outset that such panels suffer acutely from errors associated with the representativeness of the sample, primarily coverage and self-selection, although increasingly also from survey nonresponse error: panellist response rates currently average only 30%.

In this paper I first present a conceptual framework that deconstructs the access panel survey sampling process into two major design stages, the Panel Assembly Stage and the Survey Sampling Stage. Notably, within the Panel Assembly Stage we treat separately the process of a household connecting to the Internet and the mechanism of joining an Internet access panel.

An immediate benefit of this framework is that it allows the model specification of the statistical adjustment to be error-specific and so to be informed by established theories of ICT acceptance (for coverage error) as well as panel and survey participation theories (for the self-selection and survey nonresponse errors).

Based on this framework we suggest a model that expands the classic two-phase sampling approach to the survey nonresponse problem (Sarndal and Swensson 1987) to a Three Phase Sampling Model: a Panel assembly phase followed by Sampling and Response set phases.

Similar to the view of Groves and Couper (1998), we expect all well-specified response propensity models to have different functional forms. In our case, for the Panel assembly component, we expect that the causes of ICT acceptance and panel volunteering are different and even contradictory. This view is supported by empirical results, for example the positive correlation between Education and ICT Acceptance propensity versus the (conditional on Internet connectivity) negative correlation between Education and Panel Participation propensity. It is further supported by established theories such as the Technology Acceptance Model (TAM) and the Theory of Planned Behaviour (TPB). Similarly, we believe the tendency of the literature and of professionals to lump these phenomena together (along with the large nonresponse error) into one post-survey adjustment model is probably one reason for the failure to achieve good, or at least consistent, estimates through the Internet Access Panel platform.

Estimation of the panel assembly propensity mirrors the two consecutive stages: (a) ICT acceptance followed by (b) panel volunteering (using an approach suggested by Groves and Couper (1998)). The sampling design probabilities and a survey-specific nonresponse model are both conditional on the estimated panel assembly propensity. To add robustness we introduce a regression estimator that calibrates the Horvitz-Thompson (HT) estimator using known population, panel and sample auxiliary information, each relating to one of the three phases of the survey process.
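The chaining of phase probabilities can be sketched as follows. This is an assumption-laden illustration rather than the author's estimator: the propensities are taken as already estimated, the field names and numbers are hypothetical, and the calibration (regression) step is left out.

```python
def overall_inclusion(unit):
    """Multiply the phase probabilities for one responding panellist.

    Keys (hypothetical names):
      p_ict       - propensity of being connected to the Internet
      p_volunteer - propensity of joining the panel, given connection
      p_sample    - design probability of selection from the panel
      p_respond   - propensity of responding, given selection
    """
    p_assembly = unit["p_ict"] * unit["p_volunteer"]  # panel assembly phase
    return p_assembly * unit["p_sample"] * unit["p_respond"]

def ht_total(respondents):
    """Horvitz-Thompson-style estimate of a population total."""
    return sum(u["y"] / overall_inclusion(u) for u in respondents)

def weighted_mean(respondents):
    """Ratio of the weighted total to the sum of weights (a Hajek-type mean)."""
    weight_sum = sum(1.0 / overall_inclusion(u) for u in respondents)
    return ht_total(respondents) / weight_sum

# Two made-up respondents with estimated propensities.
respondents = [
    {"y": 1, "p_ict": 0.8, "p_volunteer": 0.05, "p_sample": 0.5, "p_respond": 0.25},
    {"y": 0, "p_ict": 0.5, "p_volunteer": 0.10, "p_sample": 0.5, "p_respond": 0.40},
]

print(ht_total(respondents), weighted_mean(respondents))
```

Keeping the four factors separate is what allows each one to be modelled with its own covariates and functional form, rather than through a single post-survey adjustment.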

In the presentation we show initial results of an application of the model to the UK case. For the specification of a Panel Assembly propensity model we require three sets of data with overlapping auxiliary data: (1) a large representative sample of the target population, the general adult UK population, (2) a large representative sample of the online-connected segment of the target population and (3) an online access panel. Using the Ipsos Interactive Services (IIS) panel and the National Readership Survey (NRS) we construct a panel assembly propensity model by estimating consecutively both components, ICT acceptance and volunteering. We test its effectiveness separately on a media survey and a political survey, applying the relevant three-level auxiliary data through regression estimation as well as a survey-specific nonresponse model.

Some References
Sarndal, C. E. and B. Swensson (1987). A general view of estimation for two phases of selection with applications to two-phase sampling and nonresponse. International Statistical Review 55 (3), 279–294.

Groves, R. M. and M. P. Couper (1998). Nonresponse in Household Interview Surveys. New York: Wiley.

24. Total Survey Error: Past, Present, and Future

Robert M. Groves, University of Michigan and Joint Program in Survey Methodology
and Lars Lyberg, Statistics Sweden

Total survey error is a conceptual framework or model describing statistical error properties of sample survey statistics. Early in the history of sample surveys it arose as a tool to focus on implications of various gaps between the conditions under which probability samples yielded unbiased estimates of finite population parameters and practical situations in implementing survey design. While the framework has components that permit design-based estimates, many of the design burdens to produce those estimates are large, and in practice most surveys do not implement them. Further, the framework does not incorporate all sources of quality that are commonly utilized in statistical information. Thus, many components of the total survey error framework are practically measurable only with model-based approaches. The importation of new modeling tools brings new promise, but also new challenges. A lasting value of the total survey error framework lies at the design stage of a survey, in the attempt to balance costs and various errors. Indeed, this framework is the central organizing structure of the field of survey methodology.

25. The Role of Total Survey Error in Business Excellence Models Applied in Statistical Organizations

Lilli Japec and Åke Pettersson, Statistics Sweden

Business excellence models such as the Malcolm Baldrige Award and the European Foundation for Quality Management (EFQM) are increasingly used in statistical organizations as a means to assess the organizations’ performances across various criteria or dimensions. For instance, the basic Malcolm Baldrige criteria are leadership, strategic planning, customer focus, measurement, analysis and knowledge management, workforce focus, process management, and results. For each criterion an organization assesses its status by defining approaches used, how widely these approaches are implemented throughout the organization and how they are evaluated. The evaluation’s ultimate goal is that approaches should be continuously improved so that the organization ends up using best possible or world-class approaches.

A minimal total survey error given constraints such as budgets and fixed product characteristics is an important planning criterion in statistical organizations. A minimal total survey error is achieved not only by choosing proper methods and efficient trade-off strategies in the design phase but also by using good approaches regarding the excellence criteria mentioned. For instance, overarching issues such as developing and maintaining relevant competence among the work force, fostering a climate of process control and improvement, and implementing scientific working standards are all important when it comes to designing individual surveys. In this presentation we will provide examples of approaches defined within a business excellence framework that can have a positive effect on the total survey error. We will use EFQM as an illustration since that is the model used by Statistics Sweden.

 
