COMM650/Class notes
From Driscollwiki
Dr. Sheila Murphy
- COMM650
- Office hours: Monday 11-1
August 23
Who am I?
- SM came to ASC directly out of PhD
- Straight through U of Michigan (undergrad through PhD)
- Institute for Social Research, UMich
- Destination for lots of exiled European scientists during WWII
- Many stopped at MIT first and moved to UM
- Highly quantitative program
- Trained in experimental design
- In addition to survey, interview, focus group, etc.
- Still using "quasi-experimental design"
- "If you can put together a good survey, you'll never starve"
Surveys in Annenberg
- Need to practice
- Goal of this course, to assemble a survey you actually hope to conduct
- Draft of survey due at end of class
- 10 min presentation regarding the survey
5 smaller assignments throughout semester
- All posted on Blackboard
Guest speaker, Sep 20th
- Gerry Power
- Former Annenberg grad
- "Non-traditional" careers
- Worked first at Frank N. Magid Associates
- Media research firm, TV
- Moved to BBC World Service Trust (non-profit)
- Large-scale interventions
- Teaching people how to use media
- Inserting health, governance info into the media
- Moved again to Intermedia (US-based)
- Also giving the lunch talk on Sep 20
History of surveys
- Egyptian monarchs used surveys for taxation
- 1790: U.S. conducts first Census, repeat every 10 years
- 1889: "Life and labour of the people of London"
- Charles Booth
- More than "just a headcount"
- 1920: Introduction of standardized survey
- Previously, interviewers could word and sequence questions however they saw fit
- 1932: Rensis Likert developed a scale for measuring attitudes
- Previously, questions tended to be comparative
- e.g. 20 types of soda, comparing 2 at a time, very time consuming, complicated results
- Interval scale
- Reduce number of questions without sacrificing accuracy
- 1946: Likert became head of Survey Research Center (SRC), UMich
- Primary peer: National Opinion Research Center (NORC) at UChicago
- 1936: George Gallup predicted the outcome of the 1936 presidential election
- Literary Digest, best known poll of the time, 6mil reader responses + telephone interviews
- Gallup identified selection bias: who is a subscriber? Who has a phone?
- Leading to new thinking around selection and sampling
Total survey error paradigm
- Assumption: Accept that truth exists
- Researcher's job is to represent or describe that truth as accurately as possible
- Many types of error threaten this task:
- Measurement error
- Sampling error
- Response error
- Nonresponse error
- Processing or coding error
- Statistical error
Initial steps constructing a survey
- Decide what you need to know
- Formulate a problem statement (either a research question or hypothesis)
- Research Question: Is there a relationship...?
- Hypothesis: suggests a direction to the relationship
- Decide how to measure or operationalize your constructs
Example Hs, RQs
"The mass media portrays women negatively" (broad starting point)
- RQ: Is there a relationship between stereotyped media portrayals and the self image of young female viewers?
- H: Stereotyped media portrayals of women cause a more negative self image among young female viewers.
Still too broad
- "... portrayals in primetime dramas/telenovelas/reality TV/etc ..."
- Seek existing scales for measuring "self image", e.g. Rosenberg's self-esteem scale
Internal validity
Construct validity
Extent to which the measure is related to the underlying construct
- Discriminant validity: Extent to which a measure distinguishes between individuals who do and do not have certain characteristics
- Convergent validity: extent to which a new measure correlates with other previously validated measures
- Content validity: extent to which a measure thoroughly assesses a particular domain of content
- Face validity: how compelling the measure is on its face? Person on the street? "Common sense"?
- Criterion validity: validity assessed against an existing criterion outside of the instrument
- Predictive validity: extent to which the measure forecasts future performance
- Concurrent validity: extent to which the measure predicts scores on an established criterion measure administered at the same time
External validity
- Generalizable to the wider world?
- Often the weakness of lab studies, student-sampling (freshman psych students)
Reliability
Reliability = true score variance / (true score variance + error variance)
Types of reliability
- Test-retest reliability: same people, same result?
- Parallel forms reliability: are different forms equivalent? (e.g., SAT: different questions, same test)
- Split half reliability: half items on one version, half on other
- Not typically a good idea
- Often used when survey is too long
- Internal consistency reliability: Do items assess one and only one dimension
- Interrater reliability: Is there consistency between raters (e.g. in a content analysis)?
- Percent agreement = (Number of agreements)/(Number of agreements + Number of disagreements)
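A minimal Python sketch of interrater agreement, assuming two raters coded the same set of items. Percent agreement is computed here as agreements divided by all coded items. Cohen's kappa is added as a common chance-corrected alternative; it is not mentioned in the notes, and the function names and example codes are illustrative.

```python
from collections import Counter

def percent_agreement(rater_a, rater_b):
    """Share of items on which two raters assigned the same code."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two equal-length, non-empty lists of codes")
    agreements = sum(a == b for a, b in zip(rater_a, rater_b))
    return agreements / len(rater_a)

def cohens_kappa(rater_a, rater_b):
    """Observed agreement corrected for the agreement expected by chance."""
    n = len(rater_a)
    observed = percent_agreement(rater_a, rater_b)
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1:
        return 1.0  # both raters used one identical code throughout
    return (observed - expected) / (1 - expected)

# Hypothetical content-analysis codes for ten news stories
coder_1 = ["pos", "neg", "neg", "pos", "neu", "neg", "pos", "pos", "neu", "neg"]
coder_2 = ["pos", "neg", "pos", "pos", "neu", "neg", "pos", "neg", "neu", "neg"]

print(percent_agreement(coder_1, coder_2))
print(round(cohens_kappa(coder_1, coder_2), 3))
```

Kappa comes out lower than raw agreement because some matches would occur even if both coders assigned codes at random.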
Jump offs
- Computer-Assisted Telephone Interviewing Systems (CATI)
Aug 30
while you and i have lips and voices which are for kissing and to sing with who cares if some one-eyed son of a bitch invents an instrument to measure spring with -- e. e. cummings
e.g. Identify factors that predict intervening in spousal abuse
- You COULD ask someone to respond on a 5-pt scale, how likely are you to intervene...?
Hypothetical scenario
- Present a hypothetical scenario about a couple who suspects that there is abuse in a neighboring apartment
- Ask about the characters:
- Should the people intervene?
Defining spousal abuse
- Be specific, what is of interest in "spousal abuse"
- Only physical abuse?
- Only between married (not cohabitating) couples
- Only male on female
- Write up a def: "Physical abuse by a husband such as .... meant to cause physical pain on his wife."
Defining intervention
- "Engaging in one or more of these behaviors."
- Should Nina ... ?
- Talk to abused spouse
- Talk to abuser
- Offer abused spouse a place to go
- Offer abused spouse other resources like money
- Try to physically intervene the next time it happens
- etc. (Are these all the interventions we need?)
- We could make an additive score
- 1 point for YES
- 0 for No
- We could make a weighted score
- Some interventions are worth more points than others
- We could also offer a Likert scale rather than a yes/no
- E.g. Talk to abused spouse, "Very likely, likely, somewhat likely, unlikely, very unlikely"
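The additive and weighted scoring options can be sketched in a few lines of Python. The item names, the example answers, and especially the weights are assumptions made up for illustration; the notes do not say which interventions should count for more.

```python
# Hypothetical yes/no answers (1 = yes, 0 = no) to "Should Nina ...?"
answers = {
    "talk_to_abused_spouse": 1,
    "talk_to_abuser": 0,
    "offer_place_to_go": 1,
    "offer_money": 0,
    "physically_intervene": 1,
}

# Assumed weights: riskier or costlier interventions count for more.
weights = {
    "talk_to_abused_spouse": 1,
    "talk_to_abuser": 2,
    "offer_place_to_go": 2,
    "offer_money": 2,
    "physically_intervene": 3,
}

def additive_score(responses):
    """1 point for every YES, 0 for every NO."""
    return sum(responses.values())

def weighted_score(responses, item_weights):
    """Each YES contributes that item's weight instead of a flat point."""
    return sum(item_weights[item] * value for item, value in responses.items())

print(additive_score(answers))
print(weighted_score(answers, weights))
```

Replacing the 0/1 answers with Likert ratings (for example 1 to 5) works with the same two functions, since both simply sum numeric responses.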
How do we know that this measures a single construct?
- We want all of these to combine into one construct
- Factor analysis, Cronbach's alpha
- Perhaps there are two or more subgroups that "hang together"
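As a rough illustration of checking whether items "hang together", here is a sketch of Cronbach's alpha using only the Python standard library. The ratings are invented, and the function name is illustrative. Alpha summarizes internal consistency but does not by itself show that the items form one dimension; that is what factor analysis is for.

```python
from statistics import variance

def cronbach_alpha(rows):
    """rows: one list of item scores per respondent (same items, same order)."""
    k = len(rows[0])
    if k < 2 or len(rows) < 2:
        raise ValueError("need at least two items and two respondents")
    items = list(zip(*rows))                      # transpose to per-item columns
    item_variances = sum(variance(col) for col in items)
    total_variance = variance([sum(r) for r in rows])
    return (k / (k - 1)) * (1 - item_variances / total_variance)

# Hypothetical 5-point likelihood ratings on four intervention items
ratings = [
    [5, 4, 5, 4],
    [2, 2, 1, 2],
    [4, 4, 3, 4],
    [1, 2, 1, 1],
    [3, 3, 4, 3],
]
alpha = cronbach_alpha(ratings)
print(round(alpha, 3))
```

A conventional rule of thumb treats values of about 0.7 or higher as acceptable, though the threshold depends on the purpose of the scale.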
"Fence-sitting"
- Weakness of using Likert response scales
- People might only want to answer in the middle for each intervention
Variable
- Measurable counterpart of a construct
- Manifest variables are variables that have obvious measurements available
- Latent variables are more difficult to measure
- e.g. intelligence, aggression
Dependent
- Variables being explained or predicted
Independent
- Vary naturally or through some sort of manipulation
Background variables
- Past behavior
- Demographic variables
- Attitudes toward targets
- Personality, moods and emotions
- Other individual differences such as perceived risk
- Exposure to a media campaign
Fishbein and Cappella's integrative model of behavior
- Full model measures behavior
- Final product may be attitudes
- But people also use open-ended questions, focus groups to get preliminary info on attitudes
- Behavioral intent is imperfect but the best we have
Reality Isomorphism or "Fit"
- Measurements should be taken in a structure similar to off-line reality
Levels of measurement
- Progressive
- Each level has all the benefits of the former plus additional ones
- In general you want the highest level possible without sacrificing fit
Nominal
- "Name"
- Doesn't matter what number you assign to it
- Male/female, 0/1, 1/0, doesn't matter
- No particular order
- Nominal categories should be
- Exhaustive: account for all possible responses
- Exclusive: responses fall into one and only one category (including "other")
- "Please specify: ____" might accompany "Other"
- Which might inform future versions of the survey
Ordinal
- Relationship between categories
- You might order them
- But the distance between the choices is not equivalent
- e.g. Ranking football teams where team 1 is much better than team 2 and 3
- People have a tendency to treat these as if they are interval
- Often possible to take something ordinal and bump it up to interval
- Preferable!
Interval
- Feels ordinal (low to high, high to low) but
- Assumes equal or roughly equal distance between each rank
- Most common level of measurement used in the social sciences
- Most common interval scale is the Likert scale
- Similar to "bipolar" scale
- Tend to either offer choices:
- Strongly disagree, disagree, ..., strongly agree
- Very minor 1 2 3 4 5 Very severe
- Could also be smiley faces (D:, ):, :|, :), :D)
- Advantage of a ten-point scale (1-10)
- Discourages fence-sitting
- May capture more slight variation
- Ideal scales might be:
- 7pts, all labeled, for bipolar
- 5pts, all labeled, for unipolar
- "Semantic differential scales"
- Good ... Bad
- Weak ... Strong
- Boring ... Interesting
Ratio
- Natural zero point signifies the complete absence of the variable being measured
- Can assume that the numeric values are ratios of one another
- Most physical dimensions can be measured in ratio scales; it is much rarer that nonphysical variables such as attitudes can be measured this way
Scale or index
- Scales are groups of individual questions or items that all try to measure the same underlying construct
- Advantage of scales is that if they are measured on the same interval scales, you can group them into a single construct
Pre-existing scales
Advantages:
- Validated, "bugs out"
- Can make direct comparisons between your population and what others have found
Disadvantages:
- Constrained from changing the wording on a scale
- In general, you should use a scale in its entirety
- Or use complete scale followed by additional items that interest you
7 decisions to make when using scales
- Number of response options
- Labeling of response options
- Physical format of the scale
- Balanced v unbalanced
- Balanced scales have equal positive/negative choices (bad/ok/good)
- Unbalanced scales don't and are not interval scale
- Odd v even number of categories
- Do you want a midpoint?
- Often you get fence-sitters with a midpoint
- Forced v nonforced choice
- If you offer a "don't know", you may get a lot of these responses
- Some people believe that it's important to provide these options so that people don't just pick the midpoint and skew the results
- Social desirability (or the extent to which people are willing to tell you the truth)
- If people are lying because they think that they "should", we have a problem
Reducing social desirability
- Online survey may reduce SD bias compared with in-person or phone survey
- Reduce "judgemental" aspect
- Anonymity
- Confidentiality
- Word questions according to "other people", "most people"
- Use indirect or nonself-report measures
- Observation, tracking, response time, software assistant
- Potential probs: observation apprehension, demand characteristics, internal validity
"Simpatico"
- People try to please the survey taker
- One way to circumvent this is to use a bigger scale
- If they are always choosing 6 and 7 on a 7pt scale, expand to 10pt scale
Response time
- Do you notify the respondent?
- How do you make meaning?
Sept 13, Question ordering, biases, and ANHCS
- Academic v. non-academic survey research
- "Audiencescapes"
- Descriptive statistics,
- Inferential statistics
- Ev Rogers Award, annual award for people who produce educational entertainment
- Martine Bouman from U of Amsterdam
- Next Wednesday
Information processing
- 1970s, "cognitive revolution", "computer metaphor of the mind"
- Could we model the mind in software, input/output
- Applying these ways of thinking (psych, info processing) to survey construction
- Notable practitioners: Sudman, Bradburn & Schwarz; Roger & ...geau
Thinking about answers (S, B, & S)
Suggest the process of answering a question can be broken down into at least 4 separate processes:
- Comprehension
- Retrieval
- Forming a judgement
- Editing the answer (for social desirability, or to match response options)
Methods access different stages
- "Retrospective thinkaloud" verbal protocols
- But can people access/articulate their own thought processes
- Dick Nisbett, Wilson, "Telling more than we can know".
- People come up with all kinds of reasons for doing something beyond what they evidently did
Retrieval of autobiographical memories (Ch 7, S, B, & S)
Autobiographical memories have 3 components:
- Personal memories - visiting the Eiffel tower
- Autobiographical fact - city where born (no actual personal memory)
- Usually pretty accurate
- Generic personal memory (driving to USC - no specific memory but composite)
- Generally most difficult, nothing particularly notable
Estimation is particularly problematic when the event is ...
- Frequent
- Has occurred for a long time
- Happened in the distant past
- Not distinctive
- Mundane or unimportant
Cannell, Miller and Oksenberg (1981)
Proposed 2 main routes to answering survey items:
- A high road (careful "optimizers")
- A low road (sloppy, superficial, use heuristics)
- This "low road" is the same as the "satisficing" proposed by Krosnick and Tourangeau (based on Herb Simon's 1957 concept of satisficing)
Dealing with satisficing
Be specific!
- Ask about narrow bounds rather than general
- Think about your "most recent [sex partner]"
- Frame the question in terms of closer bounds
Satisficing
- People give you quick'n'dirty
- Simon won Nobel Prize
- Big problem for survey researchers
More potential biases
Generic memory
- R will generate a typical instance as opposed to an actual one
Retrospective bias
- R will color the past to match current attitudes
How to reduce these potential biases
Note: see Groves Table 7.1
- Supplement R recall with available records
- Bank, phone, GPA, IQ, perhaps not from R themselves
- Cues (what, who, where)
- Taking more time (slow pace of interview)
- Diary methods
- (Smith, 1991 showed poor match between diary and recall. Although diary is generally considered to be more accurate it is still subject to fatigue, social desirability, etc.)
- (e.g., Nielsen diaries v. Peoplemeter)
- Experience Sampling Method (ESM)
- Avoids recall problems altogether by collecting concurrent reports at random moments in time using pager or phone
- Cons: expensive and places considerable burden on researchers as well as respondents
- Used in MSM research
- The Day Reconstruction Method (DRM)
- Cheaper approximation to ESM
- Call in only on days where event occurred - report subject id number and describe event
Proxy reporting
- Single household member asked to report on behavior of entire household
Pros
- Doesn't overrepresent large families with more members
- "Household" is more common unit of analysis than individual
Cons
- Not all members can report on other family members
- e.g. teenage porn viewing (DiClemente)
- All members may not be equally able to report the information you want
- For example to get the best estimates of grocery purchase you probably want the grocery shopper (not always mother)
- Not everyone in the family knows the family finances
Order effects
Response order
- Due to limits in working memory (7 +/- 2)
- First (primacy) and last (recency) response options have an advantage
(But recent work by Bishop and Smith looked at split ballots conducted by Gallup from the 1930s to the 1950s and showed that the response order effect size was probably no more than 5%)
- You can randomize the order of responses
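One way to randomize response order per respondent, sketched in Python. Pinning catch-all options such as "Other" at the end is an assumption of this sketch, not something stated in the notes, and the option labels and respondent IDs are invented.

```python
import random

def randomized_options(options, respondent_id, keep_last=("Other", "Don't know")):
    """Shuffle substantive options per respondent; pin catch-all options at the end."""
    rng = random.Random(respondent_id)          # seeded so the order can be reproduced
    movable = [o for o in options if o not in keep_last]
    pinned = [o for o in options if o in keep_last]
    rng.shuffle(movable)
    return movable + pinned

options = ["Television", "Radio", "Newspaper", "Internet", "Other"]
for rid in (101, 102, 103):
    print(rid, randomized_options(options, rid))
```

Seeding by respondent ID means the order each person saw can be reconstructed at analysis time, which is needed to test for primacy or recency effects.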
Sequence bias
- Are you priming a response?
- Questions that might influence responses to subsequent questions should be placed toward the end
- If you are unsure whether or not one question will influence the answer to a second question, you can
- Employ split-half design
- Half don't get the question at all
- Now you can compare and find the effect
- Counter-balance
- Some get it with the question first, others get the questions later
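Both the split-half design and counterbalancing come down to random assignment of respondents to versions of the questionnaire. A minimal sketch, with hypothetical condition labels and respondent IDs:

```python
import random

def assign_conditions(respondent_ids, conditions, seed=0):
    """Randomly assign respondents to conditions in near-equal numbers."""
    rng = random.Random(seed)
    shuffled = list(respondent_ids)
    rng.shuffle(shuffled)
    # Deal the shuffled respondents out to the conditions in rotation
    return {rid: conditions[i % len(conditions)] for i, rid in enumerate(shuffled)}

ids = range(1, 201)

# Split-half: half of the sample never sees the possibly priming question.
split_half = assign_conditions(ids, ["with_prime", "without_prime"])

# Counterbalance: everyone gets both questions, in one of two orders.
counterbalanced = assign_conditions(ids, ["A_then_B", "B_then_A"], seed=1)

print(list(split_half.values()).count("with_prime"))
```

Comparing answers to the second question across the two groups then gives an estimate of the order effect.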
Adopt a general organization pattern that complements your objectives
Two general patterns
Inverted funnel
- Ask super specific question right out front
- Fans of this believe that there are fewer order effects from narrow to broad than reverse
Funnel
- 90% of the surveys use this pattern
Softballs: the first several questions should be easy to answer and nonthreatening
- Questions that are difficult, time consuming, or embarrassing should come near the end
- Income almost always comes last, most likely the primary hang-up, break-off question
- Demographic questions usually come at the end
Topically related questions should be grouped together
- Make it feel like a conversation
Questions should be ordered in such a way as to minimize response bias
- Such as yea-saying, nay-saying, or fence-sitting
- Yea-saying: saying yes to everything
- Nay-saying: saying no to everything
- Fence-sitting: choosing the midpoint for everything
- Some psych scales have questions to filter for yea/nay
- "Do you make all your clothes?"
- Create two questions that contradict each other if both are answered "yes"
- Keep relatively short
- Reverse some items
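Reversed items only help if they are flipped back before scoring. The sketch below assumes a 5-point agree/disagree scale and invented item positions; it reverse-codes the flagged items and flags a respondent who gave the identical raw answer to everything, which is one simple sign of yea-saying, nay-saying, or fence-sitting.

```python
SCALE_MIN, SCALE_MAX = 1, 5   # assumed 5-point agree/disagree scale

def reverse_code(score):
    """Flip a reverse-worded item so high always means more of the construct."""
    return SCALE_MAX + SCALE_MIN - score

def looks_like_straight_lining(raw_answers):
    """Flag respondents who gave the same raw answer to every item."""
    return len(set(raw_answers)) == 1

# Hypothetical respondent; the second and fourth items are reverse-worded.
reversed_items = {1, 3}       # zero-based positions
raw = [5, 5, 5, 5]

scored = [reverse_code(v) if i in reversed_items else v for i, v in enumerate(raw)]
print(scored)
print(looks_like_straight_lining(raw))
```

A respondent who "strongly agrees" with both a statement and its reversal ends up with contradictory scored values, which is what makes the pattern detectable.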
"Filter questions" and "skip patterns" should be specified so that respondents are not asked irrelevant questions
- Must not violate rules of convo
- If people have no children, don't ask children's names
- Easy on Qualtrics
Screener questions up front
- Eliminate ineligible respondents right off the bat
Otherwise demographics at end to avoid breakoffs
- Income last
- Most people don't want to be in the lowest
History of ANHCS
- Monthly survey instrument
- http://anhcs.asc.upenn.edu/
The Core
- First 20 minutes of the survey
- Contains items measuring media use and health behaviors, knowledge and beliefs, as well as health policy
- Remains fairly constant over time
The modules
- Build off of the core
- The last 5 or so minutes of the survey
- Available to Annenberg faculty and graduate students for health related research
- On a competitive basis
Additional ANHCS features
- Random assignment of respondents to different conditions
- Present images and stream short video clips
- Leverage ANHCS data by paying Knowledge Networks to administer a 2nd survey and have them combine the data from ANHCS and the second survey into a single datafile
Sept 20
- Visit from Gerry Power
Takeaways
- Threats to internal validity:
- Referring to "access to information" when you mean "access to media"
- Or access to "information" when you don't know what's quality
- DHS survey asks: Do you have radio/tv?
- No other info about signal, reception, preferences, habits
Sample scenario
- National survey in Angola
- Baseline to understand the gaps, knowledge, practices in HIV/AIDS
- Arrive in Luanda, Angola
- No census, 20-30 yrs out of date
- Many many borders, corridors for HIV transmission
- My central interest: knowledge about HIV
- Logistically difficult to reach certain groups
- e.g. in Cambodia budget included elephants to reach very remote people
- Assume that all the work is face-to-face
How to manage border communities?
- People are Angolans but they live on the borders of Zambia
- What if they live across the border?
- What if they have migrated because of conflicts?
- Who is included in the sample? Who is not?
How to achieve more and better agriculture coverage in African media?
- How do you begin? Where do you start?
- Examine existing content?
- We want to know about coverage?
- Start with media professionals?
- Diasporic reporting happens elsewhere
- "News production environments" vary considerably
Sampling in multi-lingual situations
- Contacted all BBC bureaus, stringers for names+numbers
- Snowball before first contact
- Needed 10 contacts for 4 interviews
Translation? Seven target languages
- Original survey in English
- How to get this surveyed?
- Translate it and then back-translate it
- "Verbatims" in 7 languages and translated back into English
- 10 week turn around
Limits of comm theories and methods developed in Western tradition?
- What biases, assumptions will creep up in new contexts?
- Social construction in methods
- "The Social Life of Methods", conference at St Hugh's College, Oxford
- Presentation by someone from b-school at U of Copenhagen
- Seeking influence of religion (Catholicism, Protestantism) on research methodology
- Linguistic, epistemological issues
- One, two, many...
- Very practice of sharing thoughts with a researcher
- Roots in confessional
- Placing value on individual opinion, experience
- Perhaps giving an opinion is an issue of targeting
Example: universal education for girls research
- In Somalia
- No Somali women on the team
- Research conducted by huge 350lb Somali man
- But actually the women were more likely to open up to a man
- Because the man talked about it, he normalized, legitimized it
- Reduced the potential embarrassment
Africa Talks Climate
- One of their social science requirements: none of the respondents can know each other
- But over 50 men showed up, literally everyone in the village
- Couldn't turn away anyone from the focus group
- Later, seeking women's input, men said, "why would you want to talk to them?"
- Finally, with men's permission, they interviewed the women and they said, "why would you want to talk to us?"
- Go with design that will yield highest quality data
Scales?
- Typically using 5-pt Likert scales also 5+1 with a Don't Know
- In Burma, considerable fence-sitting, yea/nay-saying
- Also DKAs, neutrals on questions regarding politics, governance
- People didn't talk about politics or government
- http://africatalksclimate.com/
Non-academic jobs
Non-academic project design v. academic
- "Soft money", if you're not generating grants, you're not getting paid
- Similar research might address both theoretical and practical questions
- Designs are not mutually exclusive
- Important to keep abreast of the latest theory, methods
- Turn around time is much faster, 3-6mos
Jump offs
- Southern theory, Connell
Sept 27, Populations and sampling
Midterm
- "First half of a paper"
- Starting a lit review
- Something we can work with for the 2nd half of the course
Terms
Census
- A survey that attempts to include every member of the population of interest (POI)
Sample
- A subset of individuals from the larger group of interest (aka the population of interest or the target population)
- A "well-stirred soup"
Bias
- Refers to the systematic over- or underrepresentation of certain segments of the population (aka a nonrepresentative sample)
Stages of drawing a sample
Define the target population aka population of interest (POI)
Demographics
- The social grouping or categories that we use to describe precisely who it is we are targeting
- Common demographic categories include: age, sex, education level, race or ethnicity, etc.
Psychographics
Attempts to target specific "types" of individuals, focusing not on the standard demos or location but on their psych profile
- Lifestyle
- Soccermoms, Yuppies, Gen X
- VALS: Values, Attitudes, Lifestyles Survey
Determine the sampling frame
Survey elements, sampling units, units of analysis (group? individual? org?)
- Sampling frame: relatively complete list of those in your population of interest eligible to participate in your survey
- "All fortune 500 companies..."
- But what if you want to survey everyone in a town?
- You must define your pop-of-interest carefully considering both inclusion and exclusion criteria
- Also you must assess the degree of potential sampling frame error
Selecting a sampling technique
Probability sampling:
- Each individual or household or "element" has a known probability or chance of being selected
Nonprobability sampling:
- Does not use a chance selection procedure and therefore makes it easier for bias to creep in
Simple random sample
Samples in which every member of the population of interest (POI) has an equal and known chance of being selected
- Certain kinds of analysis:
- e.g. if you want to do a network analysis, they have to know one another
- Also, face-to-face interviews
- You might constrain your sample to a local area, "clustered" or "multi-staged" sample
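Drawing a simple random sample from a sampling frame can be sketched as follows. The frame of 5,000 household IDs and the sample size of 400 are invented for illustration.

```python
import random

def simple_random_sample(sampling_frame, n, seed=None):
    """Draw n elements without replacement; each has the same chance n/N."""
    if n > len(sampling_frame):
        raise ValueError("sample cannot be larger than the sampling frame")
    return random.Random(seed).sample(sampling_frame, n)

# Hypothetical frame: ID numbers for 5,000 listed households
frame = list(range(1, 5001))
sample = simple_random_sample(frame, 400, seed=42)

# Known, equal probability of selection for every element in the frame
inclusion_probability = 400 / len(frame)
print(len(sample), inclusion_probability)
```

The equal, known probability only holds for elements that are actually on the frame, which is why sampling frame error matters.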
Stratified random sampling
- Decide on the subgroups that you want to be sure are represented
- Be sure that your final sample contains a certain "mix" of people by matching the proportions you want on your key "variables of interest"
- Never have less than 100 in a particular subgroup
- Use random selection process but stop entering when a particular subcategory is filled
- Cochran (1961) generally suggests that the biggest gains are made with up to the first 6 categories or strata in a given variable
- The father of stratified random sampling is Jerzy Neyman who demonstrated in 1934 that stratified RS can produce samples equal or superior in quality to SRS particularly if you use a system called the Neyman allocation
| | Male | Female |
|---|---|---|
| White | 50 | 50 |
| AfAm | 50 | 50 |
| Latino/a | 50 | 50 |
| Asian | 50 | 50 |
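For comparison with equal cell sizes, here is a sketch of Neyman allocation, which gives each stratum a share of the sample proportional to its population size times its standard deviation on the variable of interest. The strata, counts, and standard deviations are invented; in practice the standard deviations have to be estimated from earlier data.

```python
def neyman_allocation(total_n, strata):
    """strata: {name: (population_size, std_dev)} -> {name: sample_size}."""
    products = {name: size * sd for name, (size, sd) in strata.items()}
    denom = sum(products.values())
    return {name: round(total_n * p / denom) for name, p in products.items()}

def proportional_allocation(total_n, strata):
    """Sample each stratum in proportion to its population size only."""
    population = sum(size for size, _ in strata.values())
    return {name: round(total_n * size / population)
            for name, (size, _) in strata.items()}

# Hypothetical strata: population counts and assumed standard deviations
strata = {
    "urban": (6000, 10.0),
    "suburban": (3000, 20.0),
    "rural": (1000, 40.0),
}
print(proportional_allocation(400, strata))
print(neyman_allocation(400, strata))
```

The small but highly variable rural stratum gets more interviews under Neyman allocation than under proportional allocation. Because of rounding, the allocated sizes may not always sum exactly to the requested total.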
Cluster sampling (aka Area Sampling or Multistage sampling)
- The population is divided into mutually exclusive and exhaustive categories (eg. counties)
- A random subset of these clusters is then selected (hence "area sampling")
- Most often used in face-to-face surveys to control costs and logistics
Rate of homogeneity (ROH)
- Measure of similarity among respondents within the same cluster
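The practical cost of a high ROH is usually expressed through Kish's design effect, deff = 1 + (m - 1) * roh, where m is the number of respondents per cluster. The formula is not given in the notes; the cluster size and ROH value below are assumed for illustration.

```python
def design_effect(cluster_size, roh):
    """Kish's approximation: variance inflation relative to a simple random sample."""
    return 1 + (cluster_size - 1) * roh

def effective_sample_size(n, cluster_size, roh):
    """Size of the simple random sample that would give the same precision."""
    return n / design_effect(cluster_size, roh)

# Hypothetical design: 1,000 interviews, 20 per cluster, roh = 0.05
deff = design_effect(cluster_size=20, roh=0.05)
n_eff = effective_sample_size(1000, 20, 0.05)
print(round(deff, 2), round(n_eff))
```

Even a modest ROH nearly halves the effective sample size here, which is the trade-off for the lower cost of clustered face-to-face fieldwork.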
Nonprobability sampling
There are at least 3 appropriate situations for nonprobability sampling:
- Hard to identify groups, e.g. gang members
- Very specific groups, e.g. patients with rare disease
- Program evaluations
Quota sampling
- Takes this idea of selecting respondents on the basis of their demographics one step further
- No sampling frame
- Collects a sample until study's demographics resemble those of the population as a whole or matches those demographics you are particularly interested in
- Most common demographic categories used include: age, gender, race, income
Snowball sampling
- A nonrandom sample that uses participants to recruit other participants who have the characteristics of interest
- For use only in hard to reach or rare populations of interest
- e.g., commercial sex workers, "straightedge kids"
Convenience sampling
- Grabbing almost any warm bodies without worrying about demographics and representativeness
- Least expensive and fastest, but you get what you pay for
Determine the sample size
- People are overly swayed by the N=1 experience
- e.g. your one friend bought a lemon but consumer reports tells you the car is great
Sample size should be dictated by ...
- Homogeneity of the population (e.g. close races need more respondents than landslide)
- Japan smaller sample than the US
- Type of data and future analysis (dichotomous (chi-square) v. interval (regression))
- Precision required (what is the acceptable level of error?)
- e.g. +/- 3% too much in the Bush/Gore election
- Magnitude of the difference you are trying to detect (power, or 1 - beta)
- Number of subgroups of interest
- Need to almost double your sample for each subgroup of interest
- Present goals: complex questions where you are interested in multiple variables need larger samples (more variance)
- Future goals
- For example, in planning a longitudinal or panel poll in which you reinterview the same individuals at several points in time, you will want to draw a larger initial sample because of attrition over time at each "wave" of data collection
- Typically calculate at least 10% attrition between each wave
- Tougher for elderly or highly mobile populations
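Two of the factors above lend themselves to quick arithmetic: the precision required and the attrition expected across waves. The sketch assumes a simple random sample, the standard formula n = z^2 * p(1 - p) / e^2 for a proportion, and a constant attrition rate between waves; function names are illustrative.

```python
from math import ceil
from statistics import NormalDist

def sample_size_for_proportion(margin, confidence=0.95, p=0.5):
    """Respondents needed so a proportion is estimated within +/- margin."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil(z ** 2 * p * (1 - p) / margin ** 2)

def initial_panel_size(final_n, waves, attrition=0.10):
    """Wave-1 sample needed to retain final_n respondents at the last wave."""
    retained = (1 - attrition) ** (waves - 1)
    return ceil(final_n / retained)

n_3pts = sample_size_for_proportion(0.03)   # +/- 3 percentage points
n_1pt = sample_size_for_proportion(0.01)    # +/- 1 point, e.g. for a very close race
panel_start = initial_panel_size(n_3pts, waves=3)
print(n_3pts, n_1pt, panel_start)
```

Setting p = 0.5 is the most conservative choice (the least homogeneous population); tightening the margin from 3 points to 1 point multiplies the required sample by about nine.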
Power
- Sample size
- Magnitude of effect expecting
- Variability
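How these three ingredients trade off can be sketched with the usual normal-approximation formula for comparing two group means: n per group = 2 * (z_alpha + z_beta)^2 / d^2, where d is the expected difference divided by the standard deviation. The alpha and power defaults are conventional choices, not values from the notes.

```python
from math import ceil
from statistics import NormalDist

def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Respondents needed in each of two groups to detect a standardized
    mean difference (Cohen's d), using the normal approximation."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)   # two-sided test
    z_beta = z.inv_cdf(power)
    return ceil(2 * (z_alpha + z_beta) ** 2 / effect_size ** 2)

for d in (0.2, 0.5, 0.8):
    print(d, n_per_group(d))
```

Exact t-test calculations give slightly larger numbers, so this is best read as a lower bound. Smaller expected effects or more variable measures (both of which shrink d) drive the required sample up quickly.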
Sample size calculator
VERY large sample size
- Statistical significance is a function of sample size
- If the sample is very large, everything starts to come up statistically significant
- Large N, small effect size
- But not necessarily meaningful
- Might have to do a sample from within it
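A small illustration of why significance alone says little with a very large N: the same tiny correlation moves from clearly nonsignificant to overwhelmingly significant as the sample grows. The sketch uses a large-sample normal approximation to the usual t-test for a correlation, and the r value is invented.

```python
from math import sqrt
from statistics import NormalDist

def p_value_for_correlation(r, n):
    """Two-sided p-value for H0: rho = 0, using a large-sample normal approximation."""
    t = r * sqrt((n - 2) / (1 - r ** 2))
    return 2 * (1 - NormalDist().cdf(abs(t)))

tiny_r = 0.02   # explains 0.04% of the variance
for n in (100, 10_000, 1_000_000):
    print(n, p_value_for_correlation(tiny_r, n))
```

Reporting an effect size alongside the p-value keeps a trivially small relationship from being mistaken for a meaningful one.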
Oct 11, Survey mode
Housekeeping
- 10/18, Next homework due (4, 5, 6)
- 11/22, optional presentation day
- 11/29, 10 min presentations
- 12/3, papers due
4 Modes of survey administration
- Face to face
- Telephone
- Online
- By mail
Factors driving survey mode
Appropriateness
- Can I really reach my target population?
Quality of resulting data
Logistics
- Time, resources, monies
Cost
- Face to face is most $$
Amount and type of information you need
- Limits to how many questions you can ask in diff modes
Potential biases (Schwarz et al 1991)
Acronyms to know
- CAPI, computer assisted personal interview
- CASI, computer assisted self-interview
- CATI, computer assisted telephone interview
- IVR, interactive voice response (telephone counterpart to ACASI, audio computer-assisted self-interview)
- Touchtone data entry (TDE)
Brief history
Pre 1960s
- Up through early 60s, it was all face-to-face
- People tended to agree ("politeness norm"), 70-90% response rates
- Phone surveys were only rarely used after the Alf Landon Literary Digest debacle in 1936
- Also, no area codes until 1960s
- Very expensive long distance
- People were not accustomed to talking to strangers on the phone
- Mail surveys began to be used widely in the 1940s but only for highly specialized populations (e.g. USC alumni)
- Lists did not exist and response rate was very low
1960s-1980s
- By the late 1960s, shifts in tech/cultural norms
- For a time, the 'politeness norm' carried over to the phone
- Electronic typewriters enabled people to make copies on normal paper (instead of mimeo)
- Both telephone + mail enabled surveys to cover the nation and eliminated the need for "complex multistage samples", CHEAPER
1980s
- National household samples (phone)
- Mail when postal addresses are adequate and cost is a concern
- Face-to-face when even small coverage omissions are not acceptable (still expensive!)
1990s-present
- Trend toward gated communities and locked apartment bldgs
- Move toward unlisted numbers
- Telemarketers poisoned public, mistrust, end of "politeness"
- Less tolerant of long surveys
- Answering machines, caller ID, multiple lines
- Still not OK to call someone on a cellphone because of the billing
- 25% of U.S. is cellphone only
- Result: mail surveys currently can have higher response rate than phone
Note: face-to-face surveys almost always exclude Hawaii and Alaska
Factors to consider in selecting survey mode
- Coverage of population of interest
- Availability of a sampling frame
- Giving up probability sampling
- Unit of analysis
- Most common: household
- Degree of interviewer involvement
- Logistics
- Type of info collected, cost, time, length, response rate, etc.
- Post office will sell a list of addresses
Hypo: Surveying a fortune 500
- Start with a letter, follow up with call?
- Face to face?
- Trusted listserv?
Coverage of population of interest
- Surveys of households (FTF or mail) have much better coverage than telephone
- 50% of homes have internet access but no sampling frame
Logistics
- Channel of comm
- Sometimes even with computer entry, researchers admin with pen+paper because people might find the laptop to be rude
- Cost
- Turnaround time
- Data entry: entered, cleaned
- Data cleaning: eliminate "wild codes"
- Computer assistance can clean data for you
- Analysis
- Type of info collected (e.g. reaction time)
Teleforms
- Scantron, teleforms
- Somewhat out of fashion because the computer is cheaper
Length
- F-to-F, 1 hour unpaid, 90min paid (in single sitting)
- Mail, 1 hr if paid
- Online, half hour if unpaid and topic is of interest (otherwise 5 min)
- Phone, 10-to-15 min
Lottery incentives?
- Sometimes a lottery ticket is better than a small amount of money
- If you pre-pay someone, they may feel "reciprocity"
Online incentive?
- Gift certificate
- Paypal accounts
Response rate
Number of surveys completed / number attempted
Best rates tend to be:
- FTF
- Telephone
- Online
Why not keep going?
- If the number of people refusing to participate goes up, you're more likely to have a systematic bias
- Jeopardizing the representativeness
- 80% very good
- 60% acceptable
- Under 50% questionable
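As a rough sketch, the response rate and the bands above can be computed like this. The counts are made up, and the "borderline" label for 50-60% is an assumption, since the notes do not name that range.

```python
def response_rate(completed: int, attempted: int) -> float:
    """Number of surveys completed divided by number attempted."""
    if attempted <= 0:
        raise ValueError("attempted must be positive")
    return completed / attempted


def rate_label(rate: float) -> str:
    """Map a response rate onto the rough bands from the notes."""
    if rate >= 0.80:
        return "very good"
    if rate >= 0.60:
        return "acceptable"
    if rate >= 0.50:
        return "borderline"   # assumed label; the notes skip 50-60%
    return "questionable"


# Hypothetical fieldwork counts.
rate = response_rate(completed=312, attempted=480)
print(f"{rate:.0%} -> {rate_label(rate)}")   # 65% -> acceptable
```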
Refusal conversion: Increasing response rate
- Emphasize the importance of the study
- Personalize by emphasizing the respondent's importance to the study
- Offer incentives ($, raffle, coupons)
- University sponsorship
- "Stamped" as opposed to metered return postage
- Personalized postcard pre-notification and follow-up
- Increase the number of follow-up contacts to at least 3
- The color mint green
- Notification of cutoff date (if mail survey)
- "Foot-in-the-door", get agreement to a small request first (fill out screener to see if eligible)
- Burden - if your survey is indeed short highlight length
- Interviewer training and manner (address concerns of R "not selling anything")
- Interviewer matching (or switch to a more skilled "converter" on multiple attempts)
What is appropriate monetary incentive?
- Close to the amount they'd make in the same time
- Make it "worth their time"
Leverage-saliency theory (Groves, Singer and Corning, 2000)
- Figure out what is most important to people
Major survey modes
Mail surveys
- Self-administered
- Return stamped envelope
Internet surveys
- Preferable for international sampling
- Cheaper than mail
- Data entered automatically
- No sampling frame
- Site-based surveys for a certain kind of skew
Telephone surveys
- Random digit dialing (RDD) pool is helpful
- Not that expensive
- Fast turnaround, good to do immediately after an event
Personal interviews
- OK response rate (65-75%)
- Useful for sensitive info
Biases by mode interactions
- Time pressure
- Nonverbal + clarification cues
- Perceived confidentiality
- External distractions
- Self selection of respondents
- Order effects
- Context effects
- Response order effects
- Worse in self-admin
- Recall
- Social desirability
- Worst on FTF
- Question form
- FTF open ended questions are longer, more hetero
- Fence-sitting worst on self-admin
Multimodal surveys
- "Mixed mode surveys" (Dillman, Smyth, Christian, 2009)
- Controversial....
- Still need "unimode or unified design", questions are standardized across modes
- Occasionally there is a need for mode specific design
Nov 1, Piloting surveys
Housekeeping
- Team B: Elisheva, David
- Send pilot survey to teammates
- Do the "intensive" pilot of your own + the other ppl in your group
- Do the "polishing" version with people also in Team A
- Pilot on paper, not Qualtrics
3 distinct standards that all survey questions should meet
Content standards
- Are your questions assessing what you want to assess?
- Internal validity, measuring what you think you are
Cognitive standards
- Do Rs understand the questions?
- Are they able to answer (ie, do they have the information required to answer them? The ability to answer them?)
- Are they willing to answer? (social desirability or privacy issues)
Usability standards
- Can interviewers and Rs complete survey as intended? (ie. fatigue, scales, etc)
The purposes of pretests (Converse & Presser, 1986)
For specific questions:
- Ensuring sufficient variation among respondents
- If there is no variation? Throw it out! Waste of time.
- Meaning
- Internal validity
- Task difficulty (too hard to answer)
- People might drop out if they can't answer on the spot
- Respondent interest and attention
- Softball, warmup questions can make ppl feel comfortable, good about taking the survey
For questionnaire as a whole:
- "Flow" and naturalness of sections
- Is it like a conversation?
- Do you explain the purpose?
- Also need IRB instructions
- Order of questions
- Is it funnel?
- Dealing with "order effects"?
- Skip patterns
- Scales
- Keep your responses increasing from left to right
- SM tends to use 7+ scales
- Timing
- Need to get a sense of timing when piloting
- Respondent interest and attention
- Fatigue! Is the same alternative checked over + over?
- Respondent well-being
- Is it upsetting?
- Appropriate reading level?
- 4th grade level
- If bilingual, be sure that you are using appropriate terms (not just auto translate)
- Is it manageable?
- Page breaks?
- Put questions into grids, test on Qualtrics
Side note: don't refer to humans as "subjects"; use "participants" in a focus group or "respondents" in a survey
6 ways to review surveys
1. Expert content review
- Subject matter expert
- Survey design expert
- An expert panel is most effective + efficient (least $) way of debugging yr survey
- Strength: good for finding ambiguous terms, order and measurement problems
- Especially good to be sure you aren't overlooking a MAJOR scale used in this area
- Graesser, Kennedy, Wiemer-Hastings, and Ottati (1999) developed a list of potential problems
- Graesser et al. have developed a computer program that is supposed to identify these problems as a rough expert appraisal, but it is inferior to human experts.
2. Focus groups
- Focus groups are most useful early on in the survey construction process to get a sense of key issues, nomenclature, and a general sense of how the target population thinks about the issues
- Pros: quick turnaround
- Cons: Ps may not be representative, bandwagon effects, not good for specific wording issues
3. Cognitive interviews
NRC held workshop in 1983 that raised interest in cog psych in survey design
- Protocol analysis ("talk aloud")
- Rs think aloud as they answer questions, narration
- Retrospective think-alouds
- Rs describe how they arrived at their answers either just after each question or at the end of the survey
- Confidence ratings
- Rs assess how confident they are in each response
- Paraphrasing
- Rs restate each question in their own words
- Definitions
- Rs provide definitions of key terms in the question
- Probes
- Rs are asked follow-up questions designed to reveal their response strategies, for example, "Could you tell me more about that? Could you give me an example?"
4. Pretests or piloting
Two outcomes:
- Interviewer debriefing, interviewer feedback
- Improving wording, order, streamlining
- If it's hard for the interviewer to speak aloud, there may be a problem with the question wording
- Quantitative info
- Based on response of a small sample
- Look at item distribution (items with no variance may be dropped or scale changed) and missing data
Participating survey
- More in depth (using draft version)
- Uses talk alouds, definitions, ask about alternate wordings
Rs are told that it's a pretest...
- Ask Rs for feedback, "what did that question mean to you...?"
- Drawback: very time intensive
- Small number of Rs (Converse & Presser recommend 25-75 participants)
Polishing survey
- More like dress rehearsal
- Show must go on!
- Done on very polished (near finished) instrument
- Would include a timing of each section or page
- Online survey will give you a sense of the timing
5. Behavior coding
Videotape interviews + rate interviewer behavior
- Note wording, clarification issues
6. Randomized or split ballot experiments
- If you are uncertain of wording, pilot with two versions
- Assess the outcomes
- Look at length of time taken to answer
- And expectations regarding outcome
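A split-ballot pilot depends on assigning respondents to the two wordings at random. A minimal sketch, assuming respondents are identified by number and the versions are simply called A and B (both are placeholders):

```python
import random


def assign_versions(respondent_ids, seed=0):
    """Randomly split respondents into two near-equal groups, A and B."""
    rng = random.Random(seed)        # fixed seed so the split can be reproduced
    shuffled = list(respondent_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"A": sorted(shuffled[:half]), "B": sorted(shuffled[half:])}


# Hypothetical pilot sample of 20 respondents.
groups = assign_versions(range(1, 21))
print("Version A:", groups["A"])
print("Version B:", groups["B"])
```

Outcomes and answer times would then be compared between the two groups.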
Other points
Don't reinvent the wheel
- Double check for pre-existing scales
Timing
- Self-administered - overall time or "sections" if possible
- Administered - per page / sections
- Online - can do for individual items, by section and overall
POI
- For pilots, go to your least common denominator
Interviewing
- Interviewer must "deliver" the interview, performance!
- High energy!
Ways of evaluating pretests
- Margin notes
- Oral debriefing with Rs
- Written reports section by section
- Field observation
- Optional: statistical analysis if you have enough pretest Rs
Questions for interviewers
- Did any questions make R uncomfortable or confused?
- Did you have to repeat any questions?
- What questions were the most difficult or awkward?
- Did any of the sections seem to drag?
- Were there any sections where the respondent would have liked the opportunity to say more?
Nov 8
- Following up on pilot surveys
Income measures
- Try to accommodate 95% of yr target population
Dealing with "sacred" questions
- You may find that a survey starts to go stale over time
- Current events, etc change the terms
- Need to be self-critical, -reflective and make changes
Time scales
- Minutes v hours?
Social desirability
- Presenting R with scenarios may be a good way to address this
Pilot
- Enter the data into an SPSS data sheet
Nov 22: Reliability, Validity
2 basic properties of empirical measurement:
- Reliability: will repeated trials show consistent results?
- Validity: are you measuring what you think you are measuring?
Note: You can have reliable results that are not valid
Steps to analyzing data
Remember: use reason and interpret the data. Don't just rely on what SPSS spits out at you.
1. Clean data
Garbage in, garbage out (GIGO)
- Eliminate missing data
- Check for wild codes, data outside acceptable range; unacceptable
- If you enter zeros for "no answer", SPSS treats 0 as a valid response. SPSS convention for missing values: a period (.)
- Listwise missing data: if any data is missing in a case, remove the whole case
- Pairwise missing data: a case is dropped only from the analyses that use the item it is missing, so each pair of variables can be based on a different set of cases (results in "swiss cheese data")
- Outliers, more than 3 std. deviations from the mean, top 0.5% of the population acceptable but will seriously skew your data
- Response biases (nay-sayers, yay-sayers)
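The difference between listwise and pairwise handling of missing data, in the usual statistical sense, can be seen by counting how many cases survive each approach. The data and item names here are hypothetical, with None standing in for a missing answer.

```python
# Hypothetical data; None marks a missing answer.
rows = [
    {"q1": 5, "q2": 4, "q3": 6},
    {"q1": 3, "q2": None, "q3": 2},
    {"q1": 6, "q2": 5, "q3": None},
    {"q1": 2, "q2": 1, "q3": 3},
]


def listwise(rows):
    """Keep only cases with no missing values at all."""
    return [r for r in rows if all(v is not None for v in r.values())]


def pairwise(rows, a, b):
    """Keep cases that answered both a and b, whatever else is missing."""
    return [r for r in rows if r[a] is not None and r[b] is not None]


n_listwise = len(listwise(rows))             # 2 complete cases
n_q1_q2 = len(pairwise(rows, "q1", "q2"))    # 3 cases usable for q1 with q2
n_q1_q3 = len(pairwise(rows, "q1", "q3"))    # 3 cases, but a different 3
print(n_listwise, n_q1_q2, n_q1_q3)
```

Pairwise keeps more data, but each correlation rests on a different subset of respondents.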
Check distribution of items
Need to ask these questions about every single item in the survey.
Is data normally distributed?
- If not, this may substantially limit your ability to analyze the data.
- Can be non-normal if there is a significant skew to one side or the other
Is data skewed?
- Items with means nearer the center of the response options better. (If very skewed, should you adjust your scale to measure smaller differences?)
If no or almost no variance, should you cut the item?
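These item-by-item checks can be sketched with the standard library. The three items are invented, and the skewness measure used (mean cubed deviation over SD cubed) is one common definition, not necessarily the one SPSS reports.

```python
from statistics import mean, pvariance


def skewness(values):
    """Population skewness: mean cubed deviation divided by SD cubed."""
    m = mean(values)
    var = pvariance(values)
    if var == 0:
        return 0.0   # no spread, so skew is undefined; report 0 here
    return sum((v - m) ** 3 for v in values) / len(values) / var ** 1.5


# Hypothetical responses on a 1..7 scale.
items = {
    "q1": [1, 2, 3, 4, 5, 6, 7],   # symmetric around the midpoint
    "q2": [7, 7, 7, 7, 7, 7, 7],   # no variance at all
    "q3": [1, 1, 1, 2, 2, 3, 7],   # piled up at the low end
}

report = {
    name: {"mean": mean(v), "variance": pvariance(v), "skew": skewness(v)}
    for name, v in items.items()
}
flag_no_variance = [name for name, r in report.items() if r["variance"] == 0]

for name, r in report.items():
    print(f"{name}: mean={r['mean']:.2f} var={r['variance']:.2f} skew={r['skew']:.2f}")
print("Candidates to cut:", flag_no_variance)
```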
Check validity of your items
Interpretation of the validity coefficient
- Can range from -1 to 1 but almost always between 0 and 1.
- Rarely exceeds .5
- Cohen suggests that .1 = small correlation, .3 moderate, and .5 large
Coefficient of determination
- Percent of variance explained
- Validity coefficient squared
- e.g. validity coefficient of .4 predicts .16 or 16% of the variance
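The arithmetic is just squaring the coefficient. A small sketch applying it to Cohen's benchmarks and the .4 example above (the function name is illustrative):

```python
def variance_explained(validity_coefficient: float) -> float:
    """Coefficient of determination: the validity coefficient squared."""
    return validity_coefficient ** 2


for r in (0.1, 0.3, 0.4, 0.5):
    print(f"r = {r:.1f} -> {variance_explained(r):.0%} of variance explained")
```

Even a "large" correlation of .5 leaves three quarters of the variance unexplained.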
Validity coefficients influenced by sample size
- With a small sample size, you may get a very small validity coefficient, or no result -- even when the measure is actually predictive
Scales
Easy mistake
- Don't forget to reverse code items for flipped scales!!!
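Reverse coding on a numeric scale is one line of arithmetic: the flipped value is the low end plus the high end minus the response. A sketch assuming a 1..7 scale:

```python
def reverse_code(value: int, low: int = 1, high: int = 7) -> int:
    """Flip a response on a low..high scale (7 -> 1, 6 -> 2, and so on)."""
    if not low <= value <= high:
        raise ValueError(f"{value} is outside {low}..{high}")
    return low + high - value


raw = [1, 2, 4, 6, 7]
flipped = [reverse_code(v) for v in raw]
print(flipped)   # [7, 6, 4, 2, 1]
```

The midpoint (4 on a 7-point scale) stays where it is.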
Item-scale correlations for a given item
- Corrected item-scale correlation: with all items excluding this one
- Uncorrected item-scale correlation: with all items including itself
- Compare the 2 to assess impact of specific items
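The two item-scale correlations differ only in whether the item's own score is left in the scale total. A sketch with a hand-rolled Pearson correlation and made-up data for a 4-item scale:

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical data: rows are respondents, columns are items of one scale.
data = [
    [5, 4, 5, 4],
    [3, 3, 4, 3],
    [6, 5, 6, 5],
    [2, 2, 1, 3],
    [4, 4, 3, 4],
]


def item_scale_correlations(data, item):
    """Return (uncorrected, corrected) item-scale correlations for one item."""
    scores = [row[item] for row in data]
    total_with = [sum(row) for row in data]                 # includes the item
    total_without = [sum(row) - row[item] for row in data]  # excludes the item
    return pearson(scores, total_with), pearson(scores, total_without)


uncorrected, corrected = item_scale_correlations(data, item=0)
print(f"uncorrected = {uncorrected:.3f}, corrected = {corrected:.3f}")
```

The uncorrected value is inflated because the item is partly being correlated with itself.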
Check internal consistency or Reliability
- How strongly do your items correlate with one another?
- Number of items in the scale?
Coefficient alpha
- Common measure of reliability of items in terms of internal consistency
- Proportion of variance in the scale attributed to the "true" score (as opposed to measurement error)
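Raw-score formula: \alpha = \frac{k}{k-1}\left(1 - \frac{\sum_i \sigma_i^2}{\sigma_x^2}\right)
Standardized formula: \alpha = \frac{k\,\bar{r}}{1 + \bar{r}(k-1)}
Where alpha is the alpha coefficient (Note: SPSS does both of these)
- k is the number of items on the test
- sigma squared sub i is the variance of specific item i
- sigma squared sub x is the total variance of the scale
- r-bar is the mean intercorrelation among items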
Imagine you have a 6 item scale and the intercorrelation among items is .5.
- Alpha equals 6(.5) / [1+.5(6-1)] = 3/3.5=.857.
- Very good reliability
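The worked example can be checked in code, alongside the raw-score version of alpha computed from item variances. This is a sketch, not SPSS's implementation, and the 5-respondent dataset is hypothetical.

```python
from statistics import pvariance


def standardized_alpha(k: int, mean_r: float) -> float:
    """Alpha from the number of items and their mean intercorrelation."""
    return k * mean_r / (1 + mean_r * (k - 1))


def cronbach_alpha(data):
    """Alpha from raw scores; rows are respondents, columns are items."""
    k = len(data[0])
    item_vars = [pvariance([row[i] for row in data]) for i in range(k)]
    total_var = pvariance([sum(row) for row in data])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# The 6-item, r = .5 example from the notes.
print(round(standardized_alpha(6, 0.5), 3))   # 0.857

# Hypothetical raw data for a 4-item scale.
data = [
    [5, 4, 5, 4],
    [3, 3, 4, 3],
    [6, 5, 6, 5],
    [2, 2, 1, 3],
    [4, 4, 3, 4],
]
alpha = cronbach_alpha(data)
print(round(alpha, 2))
```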
Interpretation of the alpha coefficient
- 0 implies no reliability (all measurement error)
- 1 implies perfect reliability (no measurement error)
- Over .90, might consider shortening the scale; over .80 is considered good; .70 is acceptable; below .60 unacceptable
Factor analysis
Seeking 1 underlying factor to explain relationship among variables
- Set of items is not necessarily a scale
- Items may share no common underlying latent variable or they may share several
- Factor analysis is a generic term for set of statistical techniques that reduce set of observable variables into a small number of latent variables
- Factor analysis begins with the assumption that a single factor will be sufficient to explain the pattern of responses and then performs a statistical check on how well the single factor solution fares.
- If one factor is not adequate to explain the pattern of results it will try a 2 factor solution and so on until the unexplained residual correlations are small.
- For our purposes factor analysis is particularly useful for construct validation (of latent variables that are not easily operationalized) and assessing the number of factors within a scale.
Terminology:
- "Loads higher" means that an item has a stronger relationship with a given factor
Rules of thumb:
- Only include variables that you believe are related to one another
- Sample size: you need at least 50 cases
- Based on correlations which are unstable with small samples (Tabachnick and Fidell, 2001)
- Have at least 3 observed variables for each "factor"
- Factors of only 1 (singlet) or 2 (doublet) items are undesirable (Thurstone)
- Add more differently-worded items if need be
- If relationships are curvilinear or strongly nonnormal, factor analysis will not work.
Determining extraction method
Common Factor Analysis (CFA)
- Typically used only when there is a lot of error
- Represents hypothetical variables and analyses only the common variance of the observed variables
Principal Component Analysis (PCA)
- Grounded in actual items
- Considers the total variance (common and unique to individual items) to account for the maximum proportion of variance with the minimum number of factors
- Use when measurement error is relatively small
Determining the method of rotation
Regardless of which one you pick, you can specify that a simple structure be maximized
Orthogonal rotation:
- Factors are forced to keep the angle between the reference axes perpendicular (90 degrees)
- First two factors are 2-D, factor three adds a third dimension
- Making sure factors are as different as can be
- VARIMAX, most common
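What an orthogonal rotation does to a loading matrix can be sketched for the two-factor case. The loadings are invented, and the 40-degree angle is picked by eye for illustration. This is not VARIMAX itself, which searches for the angle that best simplifies the structure.

```python
from math import cos, sin, radians

# Hypothetical unrotated loadings: one row per item, one column per factor.
loadings = [
    [0.70, 0.50],
    [0.65, 0.55],
    [0.60, -0.50],
    [0.55, -0.60],
]


def rotate(loadings, degrees):
    """Rotate a two-factor loading matrix; the axes stay perpendicular."""
    t = radians(degrees)
    return [
        [a * cos(t) + b * sin(t), -a * sin(t) + b * cos(t)]
        for a, b in loadings
    ]


def communalities(loadings):
    """Sum of squared loadings per item (variance explained by the factors)."""
    return [sum(x * x for x in row) for row in loadings]


rotated = rotate(loadings, 40)
for before, after in zip(loadings, rotated):
    print(before, "->", [round(x, 2) for x in after])
```

After rotation each item loads mainly on one factor, while each item's communality is unchanged.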
Oblique rotation
- Less common, more complicated rotation that allows rotation angle to vary
- Used when you suspect underlying factors are correlated
- (e.g., residual correlations after orthogonal rotation are .15 or more)
- PROMAX, most common
Determining the number of factors
Note: ultimately this is a subjective decision but there are some criteria for determining the number of factors statistically
Kaiser-Guttman rule of eigenvalues
- An indicator of the amount of info or variance accounted for by a factor
- Over 1 (you can set this minimum differently)
The percentage of variance explained
- Keep extracting factor until some % of variance has been explained
Scree test
- Image showing rate of change in size of eigenvalue
- Look for an "elbow"
Size of residuals
- If you are extracting the right factors, residuals will be small
Most important
- Interpretability - factors should be "theoretically meaningful"
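Two of the statistical criteria above, the eigenvalue-over-1 rule and the percentage of variance explained, can be sketched as follows. The eigenvalues are hypothetical and the 60% target is an arbitrary choice for illustration. In an analysis of a correlation matrix the eigenvalues sum to the number of items, here 6.

```python
# Hypothetical eigenvalues for a 6-item analysis, largest first.
eigenvalues = [2.8, 1.3, 0.9, 0.5, 0.3, 0.2]


def kaiser_retained(eigenvalues, cutoff=1.0):
    """Kaiser-Guttman rule: keep factors whose eigenvalue exceeds the cutoff."""
    return [ev for ev in eigenvalues if ev > cutoff]


def cumulative_variance(eigenvalues):
    """Running share of total variance explained as factors are added."""
    total = sum(eigenvalues)
    running, shares = 0.0, []
    for ev in eigenvalues:
        running += ev
        shares.append(running / total)
    return shares


def factors_needed(eigenvalues, target=0.60):
    """Smallest number of factors whose cumulative share reaches the target."""
    for n, share in enumerate(cumulative_variance(eigenvalues), start=1):
        if share >= target:
            return n
    return len(eigenvalues)


print("Kaiser-Guttman keeps:", len(kaiser_retained(eigenvalues)), "factors")
print("Cumulative variance:", [f"{s:.0%}" for s in cumulative_variance(eigenvalues)])
print("Factors needed for 60%:", factors_needed(eigenvalues))
```

Plotting the same eigenvalues in order gives the scree plot; either way the final call still rests on interpretability.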
Interpreting factors
- Examine factor loadings for significance
- Items with highest loadings are considered most similar to underlying latent factor
- Try to name the factors that reach significance
- If you can't name this structure, it might not be meaningful
Reviewing survey, remember...
Construct validity
- Measure what it's supposed to measure
Convergent validity
- Does item/scale correlate with other established measures of same construct?
Face validity
- Common sense
Criterion-related validity
- Is measure a good predictor of some external criteria?
Content validity
- Does measurement reflect the entire domain
Exploratory v. Confirmatory factor analysis
Exploratory
- Analysis explores underlying factor structure
Confirmatory
- Begin with a structure and plan a study to confirm (or disconfirm) the hypothesis
- Need to use LISREL or another kind of SEM (structural equation modeling)
Examples
- Green & Salkind (2008)
- Williams and Monge (2001)
Factor analysis
Data reduction
- Identify factors that explain shared variance among a set of variables
Assumptions
- Measured variables are linearly related to the factors plus errors
- Measured items are multivariately normally distributed (for chi-squared test used in maximum likelihood)
Syntax v menus
- When you run 100s of factor analyses, syntax can be easier to manage
- Possible to use pulldown to generate a chunk of syntax and then manually edit for future analyses
Rotation
- After rotation, items are shifted to "best fit"
- Looking for items that are high on one component and low on the other
Reliability analysis
Item-total statistics
- If I have too many items, my alpha is not reliable
- Look at "Scale variance if Item Deleted"
- Cut ones that will raise the scale variance
Nov 29
Adam
- Mapping (Norman, 1988)
- Natural mapping (Skalski et al, in press)
Skalski's typology of mapping
- Directional n.m.
- Kinesic
- Incomplete tangible
- Realistic tangible
Psychology of n.m.
- Activation of mental models
Next steps
- Using Mechanical Turk to get a bigger non-student sample
Andrew
- Crimes against Children Research Center (CCRC)
- Lots of descriptive surveys
Elisheva
"Give the public what it wants to have and part of what it ought to have whether it wants it or not." -- Herbert Bayard Swope
Take aways
- "PCA": Principal Component Analysis
- Oblique rotation is best when there are interdependencies
- Promax: oblique
- Varimax: orthogonal

