Incentives and Customer Service Evaluations


Introduction

The purpose of this study is to test the assumption that adding an incentive at the end of a transaction can make customers feel they were treated better than usual. If so, customers who take a customer satisfaction survey may rate the service in the store they patronized more highly than the service itself warranted. Organizations could then offer incentives both to make customers feel better about the organization and to raise their customer service scores, retaining more customers and increasing overall profits while satisfying a greater number of patrons. This research also examines whether incentives lead more customers to take the survey, which would raise the feedback rate organizations receive. With more feedback, organizations gain a clearer idea of what their customers demand and what they must do to please them. This chapter describes the analysis and results, employing both the Chi-Square Test of Independence and one-way Analysis of Variance (ANOVA).

Results

Test of hypothesis 1

Hypothesis H₀¹ states that the type of incentive offered for participating in the interactive voice response (IVR), computer-scripted and automated telephone survey makes no difference in shopper ratings of store cleanliness. Put another way, the attractiveness of the incentive does not unduly bias shoppers from giving their true opinion about how well the store comes up to ordinary standards for cleanliness.

In this and subsequent analyses, shoppers are classified into one of four independent groups according to the store where they were randomly selected to participate in the survey. The “cleanliness rating” group came from the store that offered no incentive for participation, “cleanliness rating 1” participants received a candy bar, “cleanliness rating 2” shoppers were entered into a drawing for a $25 gift card, and “cleanliness rating 3” shoppers qualified for a drawing for a $250 gift card. These group designations are maintained for the three other variables analyzed.

When satisfaction ratings with store cleanliness are averaged for the entire 31-day experimental period, Table 1 (below) and Fig. 1 (overleaf) show that the control group was prone to give marginally better ratings than any experimental group. On the other hand, one notes that the candy bar and $250 drawing shopper groups yielded the highest maximum satisfaction ratings of 91.

Table 1: Mean Satisfaction Ratings with Cleanliness After 31 Days of Fieldwork

Descriptive Statistics
	N	Mean	Std. Deviation	Minimum	Maximum
Cleanliness rating	13	84.31	2.057	81	87
Cleanliness rating1	13	83.85	2.996	81	91
Cleanliness rating2	13	83.15	2.193	80	89
Cleanliness rating3	13	83.92	2.929	80	91

Figure 1: Average Cleanliness Rating, by Experimental Condition

The researcher opted to run multiple chi square tests of independence against the no-incentive group rather than testing goodness-of-fit across all four shopper groups all at once because:

The multi-group analysis defaults to testing against “expected values” across the entire contingency table, as a result of which variances are inflated and “significant” differences almost inevitable.
The research design is really a field quasi-experiment in which the independent variable, incentive to participate, is manipulated at three levels and the no-incentive group is effectively the control group. Hence, one-to-one tests between the control condition and each experimental intervention in turn are the preferred option.

Tables 2 to 4 summarize the results of the Chi-Square Test of Independence for one incentive at a time against the control group. Taking the first case, no incentive versus receiving a candy bar to call the customer satisfaction survey “hotline”, the chi square values are 3.15 and 3.61, respectively. At five and seven degrees of freedom (df), the associated significance values (p = 0.68 and p = 0.82) both exceed the α = 0.05 threshold. We are unable to reject the null hypothesis H₀¹ and therefore conclude that the incentive of a candy bar does not make a detectable difference in shifting store cleanliness ratings upwards.

Table 2 – Summary Statistics for Chi Square Test of Significance: No Incentive versus Candy Bar

Test Statistics
	Cleanliness rating	Cleanliness rating1
Chi-Square	3.154^a	3.615^b
df	5	7
Asymp. Sig.	.676	.823

a. 6 cells (100.0%) have expected frequencies less than 5. The minimum expected cell frequency is 2.2.
b. 8 cells (100.0%) have expected frequencies less than 5. The minimum expected cell frequency is 1.6.

Comparing the incentive of a $25 gift card drawing versus the control group, the chi square values in Table 3 bear significance statistics that are similarly p > 0.05. In particular, the finding for the second incentive group suggests that any difference in group distributions (compared to the control) could occur by chance more than half the time if more such samplings were taken. We fail to reject this null hypothesis again and conclude that a $25 gift card drawing makes no difference in provoking more positive ratings about store cleanliness.

Table 3 – Summary Statistics for Chi Square Test of Significance: No Incentive versus $25 Drawing

Test Statistics
	Cleanliness rating	Cleanliness rating2
Chi-Square	3.154^a	4.769^b
df	5	6
Asymp. Sig.	.676	.574

a. 6 cells (100.0%) have expected frequencies less than 5. The minimum expected cell frequency is 2.2.
b. 7 cells (100.0%) have expected frequencies less than 5. The minimum expected cell frequency is 1.9.

Nor does raising the incentive to a $250 gift card drawing alter matters. At 5 degrees of freedom, the computed chi square value for the third experimental treatment group (Table 4 overleaf) is associated with a significance statistic of p = 0.93, almost diametrically opposite the p < 0.05 required to rule out chance occurrence. Any observed differences between control and treatment 3 are so minor that differences at least as large would be expected by chance about 93 percent of the time if the null hypothesis were true.

Table 4 – Summary Statistics for Chi Square Test of Significance: No Incentive versus $250 Drawing

Test Statistics
	Cleanliness rating	Cleanliness rating3
Chi-Square	3.154^a	1.308^a
df	5	5
Asymp. Sig.	.676	.934

a. 6 cells (100.0%) have expected frequencies less than 5. The minimum expected cell frequency is 2.2.
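The notes beneath Tables 2 to 4 flag that every cell has an expected frequency below 5, which falls short of the usual rule of thumb for the chi square approximation. As a minimal sketch, assuming the expected count is simply the 13 sampling points divided equally among the distinct rating values observed (an assumption inferred from the reported minimums, not stated in the text), the figures can be checked as follows. The function names are illustrative.

```python
N_SAMPLING_POINTS = 13  # sampling points per store, from Table 1
RULE_OF_THUMB_MIN = 5   # conventional minimum expected count per cell


def expected_count(n_observations, n_cells):
    """Expected count per cell when all cells are assumed equally likely."""
    return n_observations / n_cells


def below_rule_of_thumb(n_observations, n_cells):
    """True when the equal expected count falls under the conventional minimum."""
    return expected_count(n_observations, n_cells) < RULE_OF_THUMB_MIN


for cells in (6, 7, 8):  # cell counts reported under Tables 2 to 4
    e = expected_count(N_SAMPLING_POINTS, cells)
    print(cells, round(e, 1), below_rule_of_thumb(N_SAMPLING_POINTS, cells))
```

With only 13 observations per group, any split into three or more equally likely cells falls below the minimum, so the significance values are best read as approximate.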

Test of hypothesis 2

Hypothesis H₀² states the customer’s attitude toward the checkout process being efficient is independent of the incentive used. Tables 5 to 7 below summarize the results of the Chi-Square Tests of Independence.

Against the no-incentive control, the candy bar incentive resulted in a chi square value so low that, at 7 degrees of freedom, the significance statistic of 0.94 (Table 5) far exceeds the required threshold of p < 0.05. Technically, this means the differences between the two distributions are so minimal that differences at least as large would be expected by chance around 94% of the time if the candy bar incentive were tested again under the null hypothesis. Hence, one cannot reject the null hypothesis and concludes that a candy bar is not enough to induce a more positive perception of the efficiency of the checkout process.

Under the $25 gift card drawing condition, evaluations of checkout speed differ somewhat more on cursory inspection. However, the chi square value of 4.8 (Table 6) is not enough, at 6 df, to yield the desired statistical significance: p = 0.57, a far cry from the p < 0.05 hurdle needed to have confidence that the “difference” represents meaningful rather than random variation. The second test condition under Hypothesis 2 does not, therefore, provoke better evaluations of checkout efficiency.

Table 5 – Summary Statistics for Chi Square Test of Significance: Efficiency of Checkout Process: No Incentive versus Candy Bar

Test Statistics
	Speedy checkout rating	Speedy checkout1
Chi-Square	3.615^a	2.385^a
df	7	7
Asymp. Sig.	.823	.936

a. 8 cells (100.0%) have expected frequencies less than 5. The minimum expected cell frequency is 1.6.

Table 6 – Summary Statistics for Chi Square Test of Significance: Efficiency of Checkout Process: No Incentive versus $25 Drawing

Test Statistics
	Speedy checkout rating	Speedy checkout2
Chi-Square	3.615^a	4.769^b
df	7	6
Asymp. Sig.	.823	.574

a. 8 cells (100.0%) have expected frequencies less than 5. The minimum expected cell frequency is 1.6.
b. 7 cells (100.0%) have expected frequencies less than 5. The minimum expected cell frequency is 1.9.

Thirdly, Table 7 overleaf shows the comparison of response distributions for the no-incentive and $250 gift card drawing conditions. There is some difference, the interesting part being that it is in the reverse direction from experimental condition 2. Once again, the chi square value is so low for 3 degrees of freedom that the significance statistic, p = 0.56, fails to meet the p < 0.05 criterion. The null hypothesis cannot be rejected.

Table 7 – Summary Statistics for Chi Square Test of Significance: Efficiency of Checkout Process: No Incentive versus $250 Drawing

Test Statistics
	Speedy checkout rating	Speedy checkout3
Chi-Square	3.615^a	2.077^b
df	7	3
Asymp. Sig.	.823	.557

a. 8 cells (100.0%) have expected frequencies less than 5. The minimum expected cell frequency is 1.6.
b. 4 cells (100.0%) have expected frequencies less than 5. The minimum expected cell frequency is 3.3.

Test of hypothesis 3

Hypothesis H₀³ states the customer’s attitude toward the staff being friendly and courteous is independent from the incentive used. Tables 8 and 9 below summarize the frequency distribution for the control and first incentive variables, while Table 10 summarizes the results of the Chi-Square Test to assess whether the distributions are independent and can therefore be evaluated as representing meaningfully different shopper response.

Table 8 – Satisfaction Score Distribution for Friendliness of Store Clerks: Control Store

Friendliness rating
	Observed N	Expected N	Residual
80	2	2.2	-.2
81	2	2.2	-.2
82	1	2.2	-1.2
83	5	2.2	2.8
84	2	2.2	-.2
86	1	2.2	-1.2
Total	13

Table 9 – Satisfaction Score Distribution for Friendliness of Store Clerks: Incentive: Candy Bar

Friendliness1
	Observed N	Expected N	Residual
80	1	2.2	-1.2
81	1	2.2	-1.2
82	2	2.2	-.2
83	5	2.2	2.8
84	3	2.2	.8
98	1	2.2	-1.2
Total	13

The first null hypothesis involving perceptions of store clerk attitudes is evaluated by the chi square test in Table 10 below. Here we see that the chi square values for the control and candy bar conditions, 5.0 and 5.9 respectively, are so close for the given degrees of freedom that such a gap can occur roughly a third of the time if further re-samplings are conducted. Hence, we are unable to reject the null hypothesis and must conclude that a candy bar incentive is not enough to sway shoppers into giving a more positive evaluation of store clerk friendliness.

Table 10 – Summary Statistics for Chi Square Test of Significance: Friendliness of Store Clerks: No Incentive versus Candy Bar

Test Statistics
	Friendliness rating	Friendliness1
Chi-Square	5.000^a	5.923^a
df	5	5
Asymp. Sig.	.416	.314

a. 6 cells (100.0%) have expected frequencies less than 5. The minimum expected cell frequency is 2.2.
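The statistics in Table 10 can be reproduced from the observed counts in Tables 8 and 9. The sketch below assumes, based on the constant Expected N of 2.2 in both tables, that each statistic is computed for one group at a time against equal expected frequencies. The function and variable names are illustrative rather than taken from the original analysis.

```python
def chi_square_equal_expected(observed):
    """Chi square statistic and df against equal expected counts per cell."""
    n = sum(observed)
    expected = n / len(observed)  # e.g. 13 / 6 = 2.17, shown as 2.2 in Table 8
    statistic = sum((o - expected) ** 2 / expected for o in observed)
    df = len(observed) - 1
    return statistic, df


# Observed N per rating value, copied from Tables 8 and 9.
control = [2, 2, 1, 5, 2, 1]    # ratings 80, 81, 82, 83, 84, 86
candy_bar = [1, 1, 2, 5, 3, 1]  # ratings 80, 81, 82, 83, 84, 98

control_stat, control_df = chi_square_equal_expected(control)
candy_stat, candy_df = chi_square_equal_expected(candy_bar)

print(round(control_stat, 3), control_df)
print(round(candy_stat, 3), candy_df)
```

Under this assumption the two columns of Table 10 describe how evenly each group's ratings are spread across the values it produced, and the comparison between control and incentive is made by setting the two results side by side.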

Table 11 below reveals that the frequency distribution for the second incentive condition is rather different from the “norm” of the no-incentive control (Table 8 above). For one, the range of response for this incentive is wider.

Table 11 – Satisfaction Score Distribution for Friendliness of Store Clerks: Incentive: $25 Drawing

Friendliness2
	Observed N	Expected N	Residual
80	2	1.6	.4
81	1	1.6	-.6
82	2	1.6	.4
83	2	1.6	.4
84	1	1.6	-.6
85	2	1.6	.4
86	2	1.6	.4
96	1	1.6	-.6
Total	13

Nonetheless, the result of the chi square test (Table 12 below) suggests that the difference in response distributions is so minuscule at seven df that such can be expected to appear virtually every time (significance statistic: p = 0.99) a $25 drawing is run in the field alongside a no-incentive condition. In short, one cannot reject the null hypothesis and must conclude that the chance of winning a $25 gift card does not materially contribute to enhancing shopper perceptions of store staff friendliness.

Table 12 – Summary Statistics for Chi Square Test of Significance: Friendliness of Store Clerks: No Incentive versus $25 Drawing

Test Statistics
	Friendliness rating	Friendliness2
Chi-Square	5.000^a	1.154^b
df	5	7
Asymp. Sig.	.416	.992

a. 6 cells (100.0%) have expected frequencies less than 5. The minimum expected cell frequency is 2.2.
b. 8 cells (100.0%) have expected frequencies less than 5. The minimum expected cell frequency is 1.6.

Table 13 below depicts the distribution of average daily ratings for evaluations of store clerk friendliness under the third test condition, a $250 gift card incentive. The following Table 14 reveals a chi square result that coincides with that for the second test condition. That is, the summed residuals against expected values for perceptions of store clerk friendliness under incentive 3 differ so little from the distribution for the no-incentive control that the estimated p = 0.99. The statistic suggests that such narrow differences can occur in virtually every future time series where a $250 incentive is run against a control condition. The null hypothesis cannot be rejected. In practice, this finding means that the prospect of winning even a $250 gift card yields no appreciable difference in shopper satisfaction with store clerk friendliness.

Table 13 – Satisfaction Score Distribution for Friendliness of Store Clerks (Incentive: $250 Drawing)

Friendliness3
	Observed N	Expected N	Residual
80	2	1.6	.4
81	1	1.6	-.6
82	2	1.6	.4
83	2	1.6	.4
84	2	1.6	.4
85	2	1.6	.4
86	1	1.6	-.6
97	1	1.6	-.6
Total	13

Table 14: Summary Statistics for Chi Square Test of Significance: Friendliness of Store Clerks (No Incentive versus $250 Drawing)

Test Statistics
	Friendliness rating	Friendliness3
Chi-Square	5.000^a	1.154^b
df	5	7
Asymp. Sig.	.416	.992

a. 6 cells (100.0%) have expected frequencies less than 5. The minimum expected cell frequency is 2.2.
b. 8 cells (100.0%) have expected frequencies less than 5. The minimum expected cell frequency is 1.6.

Analysis of Variance (Hypothesis 4)

Recall the null hypothesis that there is no difference in the mean daily calls of customers as a function of the incentive used by the store. The analytical approach that applies in this case is analysis of variance because one continuous dependent variable, the number of calls, is compared across four groups, as in the table below:

Table 15: Contingency Table for Number of Calls Made by Randomly-Chosen Customers

Time	Control store	Candy bar	$25 GC	$250 GC
22-Mar-2010	36	72	64	61
24-Mar-2010	29	28	25	22
27-Mar-2010	35	25	21	19
29-Mar-2010	37	26	28	24
01-Apr-2010	32	32	19	26
03-Apr-2010	39	29	24	21
06-Apr-2010	32	33	25	22
08-Apr-2010	34	27	26	20
11-Apr-2010	36	28	22	24
13-Apr-2010	38	34	28	26
16-Apr-2010	31	22	34	18
19-Apr-2010	38	27	27	22
22-Apr-2010	32	31	22	26

This being a time series, visual inspection of the trend lines over the 31-day period (Fig. 2 overleaf) gives an early clue to the comparative value of the incentives. The “no-incentive” control condition tended to elicit more calls than any other test condition, except for what seem to be fluke measurements on the first day of the quasi-experiment, when the three incentives all yielded markedly more call-ins.

The results of the One-Way ANOVA, the appropriate tool for comparing means across more than two levels of a single independent variable, are depicted in Table 16 below. The differences between groups, across all incentive conditions as well as the control, are so minor as to yield an F value of just 2.00. Such a value can be generated by chance about one-eighth of the time (p = 0.127). This probability is low, but it does not meet the commonly accepted criterion of p < 0.05. One concludes, based on the mean number of calls made to participate in the computer-scripted telephone shopper satisfaction survey, that the null hypothesis cannot be safely rejected. None of the three incentives is any better than the no-incentive control for inducing participation in the telephone survey.

Table 16: One-Way ANOVA Summary for Calls Made to IVR

ANOVA
Calls made to IVR
	Sum of Squares	df	Mean Square	F	Sig.
Between Groups	627.904	3	209.301	2.000	.127
Within Groups	5023.077	48	104.647
Total	5650.981	51
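The sums of squares and F value in Table 16 follow directly from the call counts in Table 15. The sketch below is a plain-Python reconstruction of the one-way ANOVA arithmetic, not the statistical package output itself; the function name and dictionary keys are illustrative.

```python
# Daily call counts copied column by column from Table 15.
calls = {
    "control":   [36, 29, 35, 37, 32, 39, 32, 34, 36, 38, 31, 38, 32],
    "candy_bar": [72, 28, 25, 26, 32, 29, 33, 27, 28, 34, 22, 27, 31],
    "gc_25":     [64, 25, 21, 28, 19, 24, 25, 26, 22, 28, 34, 27, 22],
    "gc_250":    [61, 22, 19, 24, 26, 21, 22, 20, 24, 26, 18, 22, 26],
}


def one_way_anova(groups):
    """Return (SS between, SS within, df between, df within, F)."""
    values = [x for g in groups for x in g]
    grand_mean = sum(values) / len(values)
    means = [sum(g) / len(g) for g in groups]
    # Between-groups: group size times squared distance of group mean from grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-groups: squared distance of each value from its own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return ss_between, ss_within, df_between, df_within, f_value


ss_b, ss_w, df_b, df_w, f_value = one_way_anova(list(calls.values()))
print(round(ss_b, 3), round(ss_w, 3), df_b, df_w, round(f_value, 3))
```

The significance value of .127 additionally requires the F distribution with 3 and 48 degrees of freedom, which a statistics library would supply.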

Discussion, Implications, Recommendations

Research Questions and Hypotheses

Recall that the research questions were, in temporal order of desired shopper response:

Will an extra incentive, over and above the existent $1,000 gift card drawing, provoke randomly-selected shoppers to participate in the phone-in survey? Is there a difference in the mean daily calls in locations or during periods of no incentive versus when there is an incentive?
What kind of extra incentive has a positive biasing effect on shopper satisfaction with critical shopper experience criteria?

Recall, in turn, that the null and alternative hypotheses were articulated as follows:

Series	Null Hypotheses	Alternative Hypotheses
1	The customer’s attitude toward the store’s cleanliness is independent of the incentive used.	The customer’s attitude toward the store’s cleanliness is influenced by the incentive used.
2	The customer’s perception of checkout process efficiency is independent of the incentive used.	The customer’s perception of checkout process efficiency is influenced by the incentive used.
3	The customer’s perception of friendliness and courtesy is independent of the incentive used.	The customer’s perception of friendliness and courtesy is influenced by the incentive used.
4	There is no difference in phone survey participation according to the incentive used.	The type of incentive offered influences the propensity to participate in the phone survey.

Summary of the Results

Going by the averages of thirteen sampling points over a thirty-one day period, none of the three incentives are any better than no incentive at all for stimulating:

Participation in what is, after all, a voluntary call-in facility (no doubt conveniently located in the stores’ premises) for dialing and responding to a customer satisfaction survey service.
Agreement by a greater number of shoppers that the respective branches meet their personal criteria for outlet cleanliness.
Assent by a larger number of shoppers that the checkout process meets their standard for speed and efficiency. On further analysis, this may be a compound variable (two variables in one) since rapid and error-free checkout are conceptually different.
Agreement by more shoppers that the store staff is unfailingly friendly and courteous.

Theoretical Analysis and Summary

The base tool of chi square analysis being sensitive to degrees of freedom or number of daily observations made, the findings of this 31-day test yielded no statistically significant differences in surveyed perceptions of the customer satisfaction parameters: store cleanliness, checkout efficiency, staff courtesy and phone-in volume. This can be considered analogous to the case of Sherlock Holmes’ “dog that did not bark” because the absence of statistically significant differences itself has certain implications. On the surface, the results signify that the chain can implement the least-cost option because consumer response will be the same regardless.

The question then becomes, what behavior is really being encouraged? The experiment mixed an immediate token reward (the candy bar) and a sweepstake with an unknown but expectedly low chance of winning the gift cards. This is an indirect sales promotion, given the underlying intent of customer satisfaction programs for maximizing customer experience, thereby enhancing loyalty and switching those who alternate their shopping trips in CHAIN X with competing retail establishments. This quest for competitive advantage requires answers to certain strategic issues, among which are: how does one consistently incentivize voluntary participation in a call-in program? At this level, however, the test was really about encouraging more participation and not about biasing shopper evaluations favorably.

The results of the experiment point to a uniform base of shoppers across all four stores who respond to an invitation to participate in the phone-in survey, whether or not there is an incentive. This uniform base is the headcount of shoppers:

Who already respond to the prospect of winning a thousand dollars, even though the odds of winning may be low; and/or,
Those who relish the chance to give feedback either because they are very pleased or very disappointed, regardless of the incentive.

In the first case, we therefore see low “marginal utility”, as economists like to call it, for the three minor incentives tested by this month-long experiment. The three incentives have no impact, or at best a minor one, because shoppers already behave with the prospect of winning a thousand dollars in mind.

If that is so, then there is a distinct possibility that this channel for gathering customer satisfaction data has effectively been downgraded to a strictly voluntary, shopper-driven research method. Like comment cards, call-from-home/place-of-work campaigns, or Web site comment boxes, forums and chat/IM facilities, the operation of the low-cost, computer script-driven service subject of this experiment basically invites participation by the 20 percent or less of shoppers who are either very satisfied or extremely disappointed. Such methods risk being unrepresentative of shoppers in the middle who have more moderate opinions. In short, the incentive theory of motivation does not apply because shoppers participate in the same numbers and offer the same service attribute satisfaction ratings irrespective of secondary incentive used.

Limitations

The principal “limitation” of this field experiment design is that the prevailing incentive of a $1,000 drawing remained in force throughout the 31-day trial period. Shopper behavior was reinforced not just by the three levels of the incentive independent variable but by the continuing operation of a thousand-dollar drawing. Granted, it is open to debate whether a large incentive with low odds of winning is more compelling than a more modest incentive with provision for more winners. The chain can determine for itself whether there is incremental participation value to the three incentives tried if the data were restated as a percentage of the shoppers in each store’s traffic who were invited by having their attention called to the marking on their receipts. As it is, the comparison with control-store headcounts creates the impression that participation is not altered by any extra incentive.

Conclusion

That the results are equivalent no matter the seeming attractiveness of the $25 and $250 cash incentives is more meaningful than at first meets the eye. First of all, a much higher incentive of a $1,000 drawing remained before, during and after the experimental period. The key finding of this experiment is therefore this: the chain can save money by implementing a drawing for the smaller amounts rather than the thousand-dollar prize since degree of participation (at least by raw headcount) is unaltered.

It is comforting to know that incentives of differing (perceived) value do not necessarily bias shopper ratings. There is corporate illogic and myopia involved in wanting to test the effectiveness of incentives for raising customer satisfaction ratings. If the $250 drawing as incentive had provoked statistically significant improvements in ratings at that store, the result would still have begged the question of whether the branch was really meeting its key customer satisfaction performance indicators (KPIs). To be hardnosed about it, the sole independent variable in a customer satisfaction campaign should be how well each store meets shopper criteria for acceptable cleanliness, efficiency at checkout and staff attitudes.

In evaluating these results, one should also not neglect the possibility that all four stores uniformly meet their customer satisfaction KPIs. Assuming that a customer satisfaction program had been in place for some time, one can reasonably expect that branch personnel have become adept at meeting minimum KPI levels at least. This explains why stores obtain the same satisfaction ratings better than variable incentives do.

Suggestions for Future Research

To be more confident about eliciting the same participation rate even at lower incentive levels, the chain should consider running a longer field trial where the $1,000 incentive is officially discontinued and where actual performance on the three key indicators is also logged on a continuing basis for analysis as the true independent variable. A longer sampling series bears the triple virtues of evening out seasonality, reflecting the variety of competitive store chain promotions, and increasing the “degrees of freedom” that the chi square statistic employs to evaluate the reliability of a data series.

In customer satisfaction research, perceptions logically count for more than store managers’ protestations that they manage key performance indicators according to system-wide standards. Nonetheless, a statistical series and analysis that includes KPIs would be more rigorous for having greater explanatory power and being more action-oriented than is the case with the limited data series at hand.

Two minor adjustments are also worth exploring. Total store traffic count and headcount notified of having been picked at random bear including in the evaluation database. This means that the surveyed customers can be reckoned in terms of participation rates or proportions. After all, staying with headcounts, as the customer satisfaction test program appears to do at present, relies too much on the assumption of equal store populations all the time. Lastly, future analysis should incorporate pre-intervention baseline measures since testing for improvement or decline is also essential to customer satisfaction monitoring.
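The adjustment from headcounts to participation rates can be sketched as follows. The call counts are the 22-Mar-2010 row of Table 15, but the invited-shopper counts are purely hypothetical placeholders, since the study did not record them; the function name is likewise illustrative.

```python
# Calls on 22-Mar-2010, from Table 15.
calls = {"control": 36, "candy_bar": 72, "gc_25": 64, "gc_250": 61}

# HYPOTHETICAL numbers of shoppers notified on their receipts (not in the study data).
invited = {"control": 400, "candy_bar": 800, "gc_25": 640, "gc_250": 610}


def participation_rate(n_calls, n_invited):
    """Share of invited shoppers who actually called the survey line."""
    if n_invited <= 0:
        raise ValueError("invited count must be positive")
    return n_calls / n_invited


rates = {store: participation_rate(calls[store], invited[store]) for store in calls}
for store, rate in rates.items():
    print(store, f"{rate:.1%}")
```

With these placeholder denominators, the candy bar store's doubled headcount corresponds to the same participation rate as the control store, which illustrates why raw counts alone can mislead when store populations differ.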