So we've agreed on the specifics of our project and we're happy with the Batch and Exhaust process. We have 2000 surveys to complete and... we would like them completed in 2-3 days. When analysis time comes around, will we be analysing a random spread of the population?
No
No
No
No
No!
If we expect fieldwork to be undertaken in the quickest turnaround time possible we will not gain the same age/gender/location spread as we would achieve if given weeks (not days) to undertake the same project. There may be a valid reason for this limited timeframe (like the client has a radio/TV advertisement occurring soon and they need the fieldwork undertaken prior to the marketing campaign) but we can't expect the quality of data or the spread of demographics/firmographics to be meaningful for analysis.
So generally speaking this approach is a BIG mistake!!
Why?
Well, in a nutshell, we CANNOT get a true RANDOM spread of the population, regardless of sample size, in less than 7-10 days.
For our field team to run the project effectively these are the steps they MUST take:
1. Organise sample into Batches;
2. Brief All Staff on the Project;
3. Put their largest team on the Project for the first 20-30% of the project target (so 400-600 for N=2000 in this case). The interviewers work through the Batches, appointment setting, removing invalid numbers etc. Then the team is reduced to a manageable size of the top performers;
4. When 80-90% of the project is complete, the number of interviewers is reduced even further and Refusal Converters are introduced into the mix. Refusal Converters (RCs) are interviewers who specialise in overcoming soft refusals. The process works by re-introducing soft refusals into the sample for RCs to contact and try to convert to a survey. This boosts the overall response rate.
5. When the project is complete you should see that the vast majority of sample has been exhausted (an average of 3-5 dials per sample record) and the only refusals remaining should be HARD REFUSALS.
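As a rough planning aid, the percentage milestones in steps 3 and 4 can be turned into interview counts for any target. This is a minimal sketch; the function name `phase_milestones` is hypothetical, and integer arithmetic is used so the counts aren't affected by floating-point rounding.

```python
def phase_milestones(target):
    """Convert the field plan's percentage milestones into interview counts.

    Returns ((low, high) for the end of the full-team phase,
             (low, high) for when Refusal Converters are introduced).
    """
    full_team_phase = (target * 20 // 100, target * 30 // 100)      # step 3: 20-30%
    refusal_conversion = (target * 80 // 100, target * 90 // 100)   # step 4: 80-90%
    return full_team_phase, refusal_conversion


full_team, rc_start = phase_milestones(2000)
print(f"Full team until roughly {full_team[0]}-{full_team[1]} completes")
print(f"Refusal Converters from roughly {rc_start[0]}-{rc_start[1]} completes")
```

For N=2000 this reproduces the 400-600 figure from step 3.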
Thanks for Reading,
U :)
Monday, November 1, 2010
Wednesday, October 27, 2010
True Randomness is Hard to Find
Soooooo here's the deal...
We want a Random sample of the population and we'd like it to fall by Age, Gender and Location...just as ABS % falls (purely by using CATI Last-Birthday method and no quota chasing).
Soooooo here's the fact....
IT WON'T!!
Just because we want our project to PERFECTLY reflect ABS proportions it doesn't mean it will.
Just because I close my eyes and wish for an Aston Martin V8 Vantage to appear with two tickets to Fiji in the glove box it doesn't mean it will.
Just because I would like to have World Peace by the time you finish reading this blog it doesn't mean I will.
Now you may just think I'm being a bit silly but time and time again we spend too much time "hoping" for a result without actively getting involved to think about a solution that will work for our clients, and their budgets.
....let's continue....
Just hoping for the "right" fallout isn't enough to get by. We need to watch the proportions collected in field closely...if the incidence is not falling as expected then we may need to be creative in how the remainder of the project is executed.
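To make "watching the proportions closely" concrete, here is a minimal sketch that compares achieved counts against target shares and flags cells drifting outside a tolerance. The function name, the 5-percentage-point tolerance and the example counts are all illustrative assumptions, not figures from a real project.

```python
def fallout_report(achieved, target_share, tolerance=0.05):
    """Compare achieved counts per cell with target shares (e.g. ABS %).

    Returns {cell: (achieved_share, gap, status)} where status is
    "under", "over" or "ok" relative to the tolerance (in proportion units).
    """
    total = sum(achieved.values())
    report = {}
    for cell, target in target_share.items():
        share = achieved.get(cell, 0) / total if total else 0.0
        gap = share - target
        if gap < -tolerance:
            status = "under"
        elif gap > tolerance:
            status = "over"
        else:
            status = "ok"
        report[cell] = (round(share, 3), round(gap, 3), status)
    return report


# Illustrative mid-field check after 500 completes (hypothetical numbers).
print(fallout_report({"Male": 175, "Female": 325}, {"Male": 0.52, "Female": 0.48}))
```

Running a check like this daily shows early whether a cell is lagging, while there is still time to decide how the remainder of the project should be executed.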
...however, prior to going into field we'll need to ask ourselves these questions...
What does the client really need analysed? Do they really need absolute proportions of sample to reach these quotas?.....We need to be clear on the answers we receive from these questions as they will form the backbone of data collection.
Here is an example of a "quota table" that a client may provide...
Quota cell | Proportionate sample (n) |
ACT Male (18-34 yrs) | 3 |
ACT Male (34-55 yrs) | 3 |
ACT Male (55+ yrs) | 2 |
ACT Female (18-34 yrs) | 3 |
ACT Female (34-55 yrs) | 3 |
ACT Female (55+ yrs) | 2 |
NSW Male (18-34 yrs) | 49 |
NSW Male (34-55 yrs) | 61 |
NSW Male (55+ yrs) | 51 |
NSW Female (18-34 yrs) | 50 |
NSW Female (34-55 yrs) | 63 |
NSW Female (55+ yrs) | 57 |
NT Male (18-34 yrs) | - |
NT Male (34-55 yrs) | 2 |
NT Male (55+ yrs) | 1 |
NT Female (18-34 yrs) | 2 |
NT Female (34-55 yrs) | 2 |
NT Female (55+ yrs) | 1 |
QLD Male (18-34 yrs) | 30 |
QLD Male (34-55 yrs) | 36 |
QLD Male (55+ yrs) | 29 |
QLD Female (18-34 yrs) | 30 |
QLD Female (34-55 yrs) | 38 |
QLD Female (55+ yrs) | 32 |
SA Male (18-34 yrs) | 11 |
SA Male (34-55 yrs) | 14 |
SA Male (55+ yrs) | 13 |
SA Female (18-34 yrs) | 11 |
SA Female (34-55 yrs) | 15 |
SA Female (55+ yrs) | 15 |
TAS Male (18-34 yrs) | 3 |
TAS Male (34-55 yrs) | 4 |
TAS Male (55+ yrs) | 4 |
TAS Female (18-34 yrs) | 3 |
TAS Female (34-55 yrs) | 5 |
TAS Female (55+ yrs) | 5 |
VIC Male (18-34 yrs) | 38 |
VIC Male (34-55 yrs) | 46 |
VIC Male (55+ yrs) | 37 |
VIC Female (18-34 yrs) | 38 |
VIC Female (34-55 yrs) | 48 |
VIC Female (55+ yrs) | 43 |
WA Male (18-34 yrs) | 15 |
WA Male (34-55 yrs) | 19 |
WA Male (55+ yrs) | 15 |
WA Female (18-34 yrs) | 15 |
WA Female (34-55 yrs) | 19 |
WA Female (55+ yrs) | 17 |
Total | 1004 |
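The rounded cells in this table add up to 1,003 rather than the 1,004 shown, most likely because each cell was rounded on its own (e.g. NT Male 18-34 shows as "-"). One common way to keep a proportionate allocation summing exactly to n is the largest-remainder method, sketched below. This is a swapped-in technique, not necessarily how the client built their table, and the population counts in the example are made up.

```python
def allocate_proportionally(population, n):
    """Split n interviews across cells in proportion to population counts,
    using the largest-remainder method so the cells always sum to n."""
    total = sum(population.values())
    allocation, remainders = {}, {}
    for cell, count in population.items():
        # Integer division keeps the arithmetic exact: floor share and remainder.
        allocation[cell], remainders[cell] = divmod(count * n, total)
    shortfall = n - sum(allocation.values())
    # Hand the leftover interviews to the cells with the largest remainders.
    for cell in sorted(remainders, key=remainders.get, reverse=True)[:shortfall]:
        allocation[cell] += 1
    return allocation


# Hypothetical population counts for three cells.
print(allocate_proportionally({"Cell A": 60_000, "Cell B": 15_000, "Cell C": 70_000}, 10))
```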
......Is our project going to fall out perfectly like this table (above)...?

.....soooooo what do we need to check?
What is the client really looking for? Are WE (as THE EXPERTS) overcomplicating the brief?
More often than not the client just wants a reasonable spread of age, gender and location. It doesn't have to fully replicate ABS % and the most accurate way to undertake this (without blowing our budget) is with the.....
Batch and Exhaust Process
It’s important, in particular for Social Research projects, that we use a Batch and Exhaust method of sample management.
The process is very easy to follow:
1. Source Nationally representative sample (by State);
2. Data Team Randomise entire sample;
3. Data Team split sample into the number of Batches required (usually 3, labelled Batch A, Batch B and Batch C, but each project may differ depending on the ratio of sample records to interviews projected... usually a 10:1 ratio will suffice unless there are specific qualifying criteria likely to TERMINATE a high proportion of candidates, in which case you may require more);
4. Data Team allocate sample locations;
5. All interviewers are placed into the first Batch and exhaust it (as determined by the sample specifications, anywhere from 3 dials to 15+ dials per active sample record);
6. Once interviewers begin running out of phone numbers in Batch A, they are slowly placed in Batch B. We need to make sure some interviewers are still moving in and out of the first Batch so that its sample is exhausted and Appointments are honoured;
7. As the project is drawing to a close it should “tail off” so no fresh Batches are being accessed in the last few days of fieldwork (unless it is absolutely necessary to reach quotas).
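Steps 2 and 3 can be sketched in a few lines: shuffle the entire sample, check it against the 10:1 rule of thumb, then deal it into Batches. This is a minimal sketch assuming the sample is simply a list of phone numbers; `build_batches` and its parameters are hypothetical, and in practice the CATI system's own sample manager would do this.

```python
import random
import string


def build_batches(records, target_interviews, n_batches=3, ratio=10, seed=None):
    """Randomise the whole sample, then split it into near-equal Batches A, B, C...

    Raises ValueError if there are fewer than `ratio` records per projected interview.
    """
    needed = target_interviews * ratio
    if len(records) < needed:
        raise ValueError(f"{len(records)} records is short of the {needed} "
                         f"needed for a {ratio}:1 ratio")
    pool = list(records)
    random.Random(seed).shuffle(pool)          # step 2: randomise entire sample
    # step 3: deal the shuffled records into batches (sizes differ by at most 1)
    return {f"Batch {string.ascii_uppercase[i]}": pool[i::n_batches]
            for i in range(n_batches)}


batches = build_batches(range(20_000), target_interviews=2000, seed=1)
print({name: len(batch) for name, batch in batches.items()})
```

Because the whole file is shuffled before it is split, each Batch is itself a random slice of the national sample, so working through Batch A first does not favour any State.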
The aim of this process is to ensure that our random sample of the population gives a good spread, so that all potential respondents have an opportunity to answer the survey (via the Last-Birthday method) without an inherent bias towards contacting respondents on a certain day of the week or at a certain time of day.
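For readers who haven't met it, the Last-Birthday method selects, within a contacted household, the eligible person whose birthday was most recent. The interviewer simply asks the question, but the rule itself is easy to express. This is a minimal sketch; the function names are hypothetical, and treating a 29 February birthday as 28 February in non-leap years is our own assumption.

```python
import calendar
from datetime import date


def _birthday_in_year(year, month, day):
    # Assumption: a 29 Feb birthday is treated as 28 Feb in non-leap years.
    if (month, day) == (2, 29) and not calendar.isleap(year):
        day = 28
    return date(year, month, day)


def days_since_last_birthday(month, day, today):
    """Days since this person's most recent birthday (0 if it is today)."""
    birthday = _birthday_in_year(today.year, month, day)
    if birthday > today:  # not reached yet this year, so use last year's
        birthday = _birthday_in_year(today.year - 1, month, day)
    return (today - birthday).days


def last_birthday_pick(household, today):
    """household: list of (name, birth_month, birth_day) for eligible members."""
    name, _, _ = min(household, key=lambda m: days_since_last_birthday(m[1], m[2], today))
    return name


household = [("Alex", 3, 14), ("Sam", 10, 1), ("Jo", 11, 20)]
print(last_birthday_pick(household, date(2010, 10, 27)))  # Sam (birthday on 1 October)
```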
The expectation is that EVERY Batch that has been accessed is exhausted by the end of the project, although a fresh Batch may need to be accessed on an ad hoc basis to achieve specific quotas.
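Whether a Batch is exhausted can be checked mechanically: it is exhausted once no record is still callable, i.e. every record either has a final outcome or has hit the dial limit set in the sample specifications. This is a minimal sketch; the outcome codes and record layout are hypothetical (every CATI system has its own), and soft refusals and appointments are deliberately left callable so they can go to Refusal Converters or be honoured.

```python
# Hypothetical final outcome codes; soft refusals and appointments are NOT final.
FINAL_OUTCOMES = {"complete", "hard_refusal", "invalid_number", "terminated"}


def batch_status(batch, max_dials):
    """Summarise a batch: is it exhausted, how many records are still callable,
    and the average number of dials per record."""
    callable_records = [r for r in batch
                        if r["outcome"] not in FINAL_OUTCOMES and r["dials"] < max_dials]
    average_dials = sum(r["dials"] for r in batch) / len(batch) if batch else 0.0
    return {"exhausted": not callable_records,
            "callable": len(callable_records),
            "avg_dials": round(average_dials, 2)}


batch_a = [
    {"id": 1, "outcome": "complete", "dials": 2},
    {"id": 2, "outcome": "no_answer", "dials": 5},     # hit the dial limit
    {"id": 3, "outcome": "appointment", "dials": 3},   # still to be honoured
    {"id": 4, "outcome": "hard_refusal", "dials": 1},
]
print(batch_status(batch_a, max_dials=5))
```

The average-dials figure is the same "dials per sample record" measure used to judge whether the sample has been properly worked.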
Let's go through the 2 PRIMARY questions clients have when concerned with this method:
1. Is it true that when we have one big "bucket" of sample we can't target individual locations?
Correct. Using Last-Birthday method, given sufficient timelines, we will get the spread our "ABS CATI Reality" provides. If for any reason we need to top up locations for analysis purposes then we can add further locations (specific by State, Metro, Rural etc. randomised and de-duped against the rest of the sample in the project). This will give us flexibility (but we should only use this as a LAST RESORT as this reduces the true Randomness of the process).
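The top-up step ("randomised and de-duped against the rest of the sample in the project") might look like this. This is a minimal sketch assuming numbers are compared on digits only; `add_top_up` is a hypothetical name and the phone numbers are fictitious.

```python
import random


def _digits(number):
    # Compare on digits only so "08 8999 0001" and "0889990001" match.
    return "".join(ch for ch in number if ch.isdigit())


def add_top_up(existing, top_up, seed=None):
    """Return top-up numbers not already in the project, de-duped and shuffled."""
    seen = {_digits(n) for n in existing}
    fresh = []
    for number in map(_digits, top_up):
        if number not in seen:
            seen.add(number)          # also removes duplicates within the top-up file
            fresh.append(number)
    random.Random(seed).shuffle(fresh)
    return fresh


existing = ["08 8999 0001", "08 8999 0002"]
top_up = ["0889990002", "08 8999 0003", "(08) 8999 0003", "08 8999 0004"]
print(add_top_up(existing, top_up, seed=7))
```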
2. What if we just started with State/specific locations so we didn't have the issues with trying to chase these quotas in the end. Wouldn't this give the same result?
Incorrect. If we target individual States but are trying to get a National spread, we are likely to introduce a bias into our sample. For instance: if our Call Centre is in the Eastern States, we may only call WA after 7pm for ease of Call Centre management, thus denying candidates who are only available before 7pm the opportunity to take part. We may also find that our older age brackets fill up first when targeting QLD, while our younger age brackets fill up fastest when targeting NSW/VIC...
..then when we look at our data we will notice heavy skews in specific States. When we weight it we may end up with some very small counts by State and age.
So our aim is to take the "choice" away from the Call Centre and keep the process automated by our systems that are designed to do so.
At the end of the day using this method is a lot easier to manage (so we're not juggling multiple sample locations) and it gives us a much better spread (the closest to natural fallout we will find in the CATI environment).
Lastly, as you may or may not have read in my previous blogs...whatever doesn't fall out during this Batch and Exhaust process (whether it be age or gender) can be filled with 1 of 3 options.
Either:
1. Chase the quota/s at the incidence they constitute in the population (usually costing much more than a client is prepared to bear);
2. Use panel/purchased sample to top up quotas, analyse this data separately and if there is a major skew in findings with respondents of the same demographics then report separately. If there is little difference between the two sets of data then merge them and report as a whole; or
3. Accept the Fall Out as is!
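For option 2 above, one way to decide whether the panel top-up shows a "major skew" on a key measure is a two-proportion z-test comparing the same demographic group across the two sources. This is a swapped-in technique rather than a rule from this post, the counts are hypothetical, and with small top-up samples a non-significant result is only weak evidence that the two groups answer alike.

```python
import math


def two_proportion_z(x1, n1, x2, n2):
    """Two-sided two-proportion z-test. Returns (difference, z, p_value)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:                      # both groups all-yes or all-no: no difference
        return p1 - p2, 0.0, 1.0
    z = (p1 - p2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))   # equals 2 * (1 - Phi(|z|))
    return p1 - p2, z, p_value


# Hypothetical: Males 18-34 agreeing with a statement, CATI vs panel top-up.
diff, z, p = two_proportion_z(120, 300, 90, 200)
print(f"difference={diff:+.1%}, z={z:.2f}, p={p:.3f}")
```

A small difference with a large p-value points towards merging and reporting as a whole; a large, clear difference points towards reporting the panel respondents separately.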
Thanks for reading,
Saturday, October 23, 2010
Crunch those ABS! (the ebb and flow of field)
So you're a Researcher and you get a request from your client to provide data on how the general public feels about, say, Child Obesity, a very important Social Research initiative.
You begin to discuss the requirements of the proposal.
In a nutshell the client has particular interest in learning what 30 - 65+ year olds in the Northern Territory (NT) think about this topic and how it currently affects, or is likely to affect, their lives or the lives of their loved ones in the future.
The client wishes to get a "spread" of this population. In theory this seems quite an easy task. Using RDD we can call into the NT and conduct the interviews over the telephone thus ending up with a spread of ages, genders and locations.
But wait...After further discussion with the client we discover there is more to their requirements than we realised...
The client expects that we will capture a sufficient sample size to report on, ideally with ABS % in mind. This being the case (and thinking about an N=500 scenario) they would like:
(NT) 30-34 year olds: 80 (which is 8% of the Population, 16% of the quota group)
(NT) 35-39 year olds: 80 (which is 8% of the Population, 16% of the quota group)
(NT) 40-44 year olds: 70 (which is 7% of the Population, 14% of the quota group)
(NT) 45-49 year olds: 69 (which is 7% of the Population, 14% of the quota group)
(NT) 50-54 year olds: 61 (which is 6% of the Population, 12% of the quota group)
(NT) 55-59 year olds: 52 (which is 5% of the Population, 10% of the quota group)
(NT) 60-64 year olds: 37 (which is 4% of the Population, 7% of the quota group)
(NT) 65+ year olds: 51 (which is 5% of the Population, 10% of the quota group)
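The percentages in this list can be reproduced from the counts. Each "quota group" share is count ÷ 500, and the "Population" share appears to be that figure scaled by the target group's share of the NT (117,000 of 226,000, the figures quoted below). A small sketch that checks this; the function name is hypothetical.

```python
QUOTAS = {"30-34": 80, "35-39": 80, "40-44": 70, "45-49": 69,
          "50-54": 61, "55-59": 52, "60-64": 37, "65+": 51}
TARGET_GROUP = 117_000   # NT residents aged 30-65+ (figure from this post)
NT_TOTAL = 226_000       # total NT population (figure from this post)


def quota_shares(quotas, target_group, total_population):
    """Return {band: (share_of_population, share_of_quota_group)}."""
    n = sum(quotas.values())
    return {band: (count / n * target_group / total_population, count / n)
            for band, count in quotas.items()}


for band, (pop, group) in quota_shares(QUOTAS, TARGET_GROUP, NT_TOTAL).items():
    print(f"(NT) {band}: {QUOTAS[band]} ({pop:.0%} of the Population, {group:.0%} of the quota group)")
```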
Now we could say... hang on a sec... how about we just get a spread of the population (via the Last-Birthday, Rizzo or Kish methods... each has its advantages) and then weight the data at the end of the project? Well, with a target as low as N=500, weighting quotas up or down by more than 5% could cause a major skew in findings.
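To see why heavy weighting is risky at N=500, here is a minimal cell-weighting sketch (weight = target share ÷ achieved share), plus Kish's approximate effective sample size, n_eff = (Σw)² / Σw², which shows how much precision the weights cost. (This Kish formula is unrelated to the Kish selection grid mentioned above.) The example weights a 35%/65% gender fallout back to a 52%/48% target purely as an illustration.

```python
def cell_weights(achieved, target_share):
    """Post-stratification weight per cell: target share / achieved share."""
    total = sum(achieved.values())
    return {cell: target_share[cell] / (achieved[cell] / total) for cell in target_share}


def effective_sample_size(achieved, weights):
    """Kish's approximation: (sum of weights)^2 / (sum of squared weights)."""
    sum_w = sum(achieved[c] * weights[c] for c in achieved)
    sum_w2 = sum(achieved[c] * weights[c] ** 2 for c in achieved)
    return sum_w ** 2 / sum_w2


achieved = {"Male": 175, "Female": 325}
weights = cell_weights(achieved, {"Male": 0.52, "Female": 0.48})
print({cell: round(w, 3) for cell, w in weights.items()})
print(round(effective_sample_size(achieved, weights)))
```

In this illustration the 500 weighted interviews carry roughly the precision of about 444 unweighted ones, and each Male respondent counts for about twice as much as each Female respondent.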
So we have a two pronged issue:
1) The Target of N=500 provides very little flexibility to adjust the quotas to suit the ebbs and flows of data collection;
2) The Population of the Northern Territory doesn't warrant a target much higher than N=500, considering the part of the population we are targeting is only 117,000 (out of a TOTAL (NT) POPULATION of 226,000), and we would potentially spend a large share of our budgeted hours chasing tough-to-reach quotas.
But...the client then throws another spanner in the works..."We also need to analyse by Gender!" they say.
This is where we come up against the dreaded "Interlocking Quota" (shudders).
The interlocking quota is essentially a quota that is used to garner very specific information within a quota that already exists.
So we go from a "Mother Quota" of say (NT) 30-34 year olds to a couple of subsets of this as (NT) 30-34 year old Males and (NT) 30-34 year old Females. We then have a situation whereby the population incidence per quota has reduced even further....
(NT) 30-34 year old Males: 41 (which is 4% of the Population, 8% of the Quota Group)
(NT) 30-34 year old Females: 39 (which is 4% of the Population, 8% of the Quota Group)
(NT) 35-39 year old Males: 41 (which is 4% of the Population, 8% of the Quota Group)
(NT) 35-39 year old Females: 39 (which is 4% of the Population, 8% of the Quota Group)
(NT) 40-44 year old Males: 36 (which is 4% of the Population, 7% of the Quota Group)
(NT) 40-44 year old Females: 34 (which is 4% of the Population, 7% of the Quota Group)
(NT) 45-49 year old Males: 36 (which is 4% of the Population, 7% of the Quota Group)
(NT) 45-49 year old Females: 33 (which is 3% of the Population, 7% of the Quota Group)
(NT) 50-54 year old Males: 31 (which is 3% of the Population, 6% of the Quota Group)
(NT) 50-54 year old Females: 30 (which is 3% of the Population, 6% of the Quota Group)
(NT) 55-59 year old Males: 28 (which is 3% of the Population, 6% of the Quota Group)
(NT) 55-59 year old Females: 24 (which is 3% of the Population, 5% of the Quota Group)
(NT) 60-64 year old Males: 21 (which is 2% of the Population, 4% of the Quota Group)
(NT) 60-64 year old Females: 16 (which is 2% of the Population, 3% of the Quota Group)
(NT) 65+ year old Males: 27 (which is 3% of the Population, 5% of the Quota Group)
(NT) 65+ year old Females: 24 (which is 2% of the Population, 5% of the Quota Group)
So let me tell you what the issues are with this approach:
1) The more specific the quota, the more costly it is to achieve;
2) Broader quotas (of say an age group without interlocking gender) are a lot quicker to achieve, thus reducing time in field, and making the project more affordable;
3) The more specific the quota, the more unknowns you introduce to the project (what if Males in the Northern Territory are less likely to have landlines to call? What if Females 30-34 only work in very specific Metro locations within the NT so more time is necessary to track them down?)
Sooooo....The key message to take out of this is....
The more specific the quota, the lower the incidence, the higher the cost.
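A back-of-the-envelope way to see this: the number of households that must be reached is roughly target ÷ (incidence × cooperation rate). This is a minimal sketch; the 25% cooperation rate is a made-up assumption, while the incidences come from this post (the 30-65+ group is about 117,000 of 226,000 NT residents, and the 30-34 Male cell about 4%).

```python
import math


def contacts_needed(target_completes, incidence, cooperation):
    """Rough number of households that must be reached to fill a quota
    when chasing it on its own: completes / (incidence * cooperation)."""
    return math.ceil(target_completes / (incidence * cooperation))


COOPERATION = 0.25  # hypothetical share of qualifying people who complete

broad = contacts_needed(500, 117_000 / 226_000, COOPERATION)   # whole 30-65+ group
narrow = contacts_needed(41, 0.04, COOPERATION)                # (NT) 30-34 Males only
print(f"N=500 across the whole group: ~{broad} households reached")
print(f"41 interviews with 30-34 Males alone: ~{narrow} households reached")
```

Under these assumptions a single interlocking cell of 41, chased on its own, needs more households reached than the entire broad quota of 500. In practice cells fill in parallel from the same dialling, so treat this as an illustration of relative effort, not a forecast.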
Why?
Your main cost drivers are the casual CATI interviewers. They are on the phone trying to get in touch with these very hard-to-reach quota groups... and not only that, CATI interviewer motivation for the project can wane if they're sifting through phone numbers chasing a quota with a 4% incidence.
Have you ever worked on the telephones trying to get in touch with members of the population at less than a 20% incidence rate? No? Well then have a chat to a CATI interviewer and they will tell you it's agonising, and this frustration can come through in an interviewer's voice. So when the interviewer eventually gets in contact with someone who qualifies, they could lose the interview to a refusal simply because they are out of practice at running a full interview (they are so used to having respondents disqualify).
Remember...at the end of the day you have "PEOPLE" calling "PEOPLE" so the process should be as user friendly as possible.
Lastly, if the client really MUST achieve these quotas we need to be as cost effective as possible. How do we do this?
Be honest with the client on likely scenarios good and bad (speak to your Field Team if you're unsure of the potential pitfalls). The client MUST be flexible.
In a recent project that had very similar quotas to those just discussed I discovered that although the Gender Split within the population as ABS states should be 52% Males and 48% Females, using the Last-Birthday method of random selection the quotas fell out 35% Males and 65% Females. Now this was very difficult to determine "WHY?". Are Males less likely to answer a telephone from an unknown number, are phones primarily issued to the Female in the household, or is there another factor causing this?
Well we may never know the true answer (or answers) but we can work with this information to mitigate risk of project failure. So HOW do we do this?
1. We can re-adjust our CATI quotas and target Males via a mixed-methodology approach;
2. We can use representative panel sample to boost quotas (but isolate data from this sample to determine any skews before merging it with the rest of the data collected);
3. We can snowball by contacting respondents who have already taken part in the study and asking for phone numbers of friends/family we can target directly.
A word of CAUTION...the alternative approaches just discussed are likely to produce at least a slight variation to the results we would produce if undertaking purely RDD.
So I'm going to leave you with a final point of note... think carefully about the questionnaire design and make it as ironclad as possible, as you may need to be creative about how the data is collected mid-field.
Thanks for reading.
U



