A Catering Theory of Dividends

Malcolm Baker

Harvard Business School
Jeffrey Wurgler

NYU Stern School of Business
June 4, 2016

We develop a theory in which the decision to pay dividends is driven by investor demand. Managers cater to investors by paying dividends when investors put a stock price premium on payers and not paying when investors prefer nonpayers. To test this prediction, we construct four time series measures of investor demand for dividend payers: the difference in the average market-to-book ratios of current payers and nonpayers; the difference in the prices of Citizens Utilities cash and stock dividend share classes; the average announcement effect of recent dividend initiations; and the difference in future stock returns of payers and nonpayers. By each of these measures, nonpayers initiate dividends when demand for payers is high. By some measures, payers omit dividends when demand is low. Further analysis indicates that these results are better explained by the catering theory than other theories of dividends.

I. Introduction

Miller and Modigliani (1961) prove that dividend policy is irrelevant to stock price in perfect and efficient capital markets. In their setup, no rational investor has a preference between dividends and capital gains. Arbitrage ensures that dividend policy does not affect stock prices.

Forty years later, perhaps the only assumption in this proof that has not been thoroughly scrutinized is market efficiency.1 In this paper, we present a theory of dividends that relaxes this assumption. Our theory has three ingredients. First, for a variety of psychological and institutional reasons, some investors have an uninformed, time varying demand for dividend-paying stocks. Second, arbitrage fails to prevent this demand from occasionally driving apart the prices of stocks that do and do not pay dividends. Third, managers cater to this demand, paying dividends when investors put a higher price on the shares of payers, and not paying when investors prefer nonpayers. We call this a catering theory of dividends, and we formalize it in a simple theoretical model.

The catering theory is conceptually distinct from the traditional view of the relationship between dividend policy and investor demand, which emphasizes dividend irrelevance even when some investors have a rational preference for dividends. For example, Black and Scholes (1974) write: “If a corporation could increase its share price by increasing (or decreasing) its payout ratio, then many corporations would do so, which would saturate the demand for higher (or lower) dividend yields, and would bring about an equilibrium in which marginal changes in a corporation’s dividend policy would have no effect on the price of its stock” (p. 2). This intuition for dividend irrelevance can also be found in corporate finance textbooks.

The catering theory and the Black and Scholes view differ on several important points. One difference is that catering takes seriously the possibility that demand for dividends is affected by investor sentiment. This adds a new and unexplored dimension to traditional sources of demand for dividends, such as taxes and transaction costs, which are the context of the Black and Scholes quote. Another difference is that catering focuses on the demand for shares that pay dividends, and not necessarily the demand for an overall level of dividends. For example, we discuss the possibility that certain investors categorize all dividend-paying shares together, and pay less attention to whether the yield on those shares is three or four percent. But perhaps the most crucial difference is that catering takes a less extreme view on how fast managers or arbitrageurs eliminate an emerging dividend premium or discount. According to Black and Scholes, managers compete so aggressively that a nontrivial dividend premium or discount never arises, and therefore dividend policy remains effectively irrelevant. But this argument is compelling only if fluctuations in the demand for dividends are small relative to the capacity of firms to adjust dividends. It is not obvious a priori that this is the case, particularly if demand is affected by sentiment.

The main prediction of the catering theory is that the propensity to pay dividends depends on a measurable dividend premium in stock prices. To test this hypothesis, we construct four time series measures of the demand for dividend-paying shares. The broadest one is what we simply call the dividend premium – the difference between the average market-to-book ratio of dividend payers and nonpayers. The other measures are the difference in the prices of Citizens Utilities’ cash dividend and stock dividend share classes (between 1956 and 1989 CU had two classes of shares which differed in the form but not the level of their payouts); the average announcement effect of recent dividend initiations; and the difference in the future stock returns of payers and nonpayers. Intuition suggests that the dividend premium, the CU dividend premium, and initiation effects would be positively related to investor demand for dividends. In contrast, the difference in future returns of payers and nonpayers would be negatively related to any such demand – if demand for payers is so high that they are relatively overpriced, their future returns will be relatively low.
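To make the broadest measure concrete, here is a minimal sketch that computes a yearly dividend premium as the equal-weighted average market-to-book ratio of payers minus that of nonpayers. The `(year, market_to_book, pays_dividend)` record layout, the function name, and the numbers are hypothetical. Any weighting or transformation used in the actual tests is not reproduced here.

```python
from collections import defaultdict
from statistics import mean


def dividend_premium_by_year(records):
    """Return {year: mean M/B of payers - mean M/B of nonpayers}.

    `records` is an iterable of (year, market_to_book, pays_dividend) tuples,
    a hypothetical layout. Years lacking either payers or nonpayers are skipped.
    """
    payers = defaultdict(list)
    nonpayers = defaultdict(list)
    for year, mb, pays in records:
        (payers if pays else nonpayers)[year].append(mb)

    premium = {}
    for year in sorted(set(payers) | set(nonpayers)):
        if payers[year] and nonpayers[year]:
            premium[year] = mean(payers[year]) - mean(nonpayers[year])
    return premium


# Hypothetical firm-year records.
records = [
    (1999, 1.0, True), (1999, 1.5, True), (1999, 2.0, False), (1999, 3.0, False),
    (2000, 2.0, True), (2000, 1.0, False),
]
print(dividend_premium_by_year(records))  # {1999: -1.25, 2000: 1.0}
```

A negative value means investors are paying more per unit of book value for nonpayers than for payers in that year.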

We then use these four measures of the demand for dividend-paying shares to explain time variation in dividend initiations and omissions. The results on initiations are the strongest. Each of the four demand measures is a significant predictor of the aggregate propensity to initiate dividends. In terms of economic magnitude, the lagged dividend premium variable by itself explains a remarkable sixty percent of the annual variation in the propensity to initiate. Another perspective is future stock returns. When the propensity to initiate dividends increases by one standard deviation, returns on payers are lower than nonpayers by nine percentage points per year over the next three years. Conversely, the propensity to omit dividends is high when the dividend premium variable is low, and when future returns on payers are high.
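The statement that the lagged dividend premium "explains sixty percent" of the annual variation refers to the R² of a time-series regression. A minimal sketch of such a lagged univariate regression follows, written in plain Python to stay self-contained. The dict layout, function names, and toy numbers are hypothetical, are not the paper's data, and the paper's actual specifications may include more than this single regressor.

```python
def ols_fit(x, y):
    """Univariate OLS of y on x; returns (intercept, slope, r_squared)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    slope = sxy / sxx
    intercept = my - slope * mx
    r_squared = sxy ** 2 / (sxx * syy)  # share of the variation in y explained by x
    return intercept, slope, r_squared


def lagged_fit(premium, initiation_rate):
    """Regress the initiation rate in year t on the dividend premium in year t-1.

    Both arguments are {year: value} dicts (a hypothetical layout).
    """
    years = sorted(t for t in initiation_rate if t - 1 in premium)
    x = [premium[t - 1] for t in years]
    y = [initiation_rate[t] for t in years]
    return ols_fit(x, y)


# Toy numbers for illustration only.
premium = {1990: -0.20, 1991: -0.05, 1992: 0.10, 1993: 0.25}
initiation_rate = {1991: 0.02, 1992: 0.04, 1993: 0.05, 1994: 0.08}
intercept, slope, r2 = lagged_fit(premium, initiation_rate)
print(f"slope={slope:.3f}, R^2={r2:.2f}")
```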

We consider several other explanations for these results, but conclude that they are best explained by catering. Alternative explanations based on time varying firm characteristics such as investment opportunities or profitability, for example, do not account for the results: The dividend premium variable helps to explain the residual propensity to initiate dividends that remains after controlling for changing firm characteristics, including investment opportunities, profits, and firm size. Alternative explanations based on time varying contracting problems, such as agency costs or signaling theories, do not address many aspects of the results, such as why dividend policy is related to the CU dividend premium and future returns. We view the lack of a compelling alternative explanation, and the close connection between the predictions of catering and the patterns that we document, as evidence in favor of the catering explanation.

The next question is which aspect of investor demand creates a time varying dividend premium. One possibility is sharp variations in tax clienteles or the transaction costs that determine the cost of homemade dividends. Rational tax and transaction cost clienteles should be satisfied by changes in the overall level of dividends, not the number of shares that pay dividends. But the dividend premium variable does not affect the overall dividend yield or payout ratio, just initiations and omissions. Also, the relationship between initiations and omissions and the dividend premium is apparent in regressions that control explicitly for time-series variation in taxes and transaction costs. Another possibility is that investor sentiment creates a demand for dividend-paying shares. Consistent with this hypothesis, we find a significant correlation between the dividend premium and the closed-end fund discount. This suggests the possibility that unsophisticated investors view nonpayers as growth firms, and prefer them to payers when they are optimistic about growth prospects in general.

In summary, we develop and find some initial empirical support for a theory of dividends that relaxes the market efficiency assumption of the Miller and Modigliani proof. The theory thus adds to the collection of dividend theories that relax other assumptions of the proof. It also adds to the growing literature on behavioral corporate finance. Shefrin and Statman (1984) develop a theory of investor demand for dividends that emphasizes self-control problems. The catering theory is closer in spirit to recent research that views corporate decisions as rational responses to mispricing. For example, Baker and Wurgler (2000, 2002) and Baker, Greenwood, and Wurgler (2002) view capital structure and security issuance decisions as rational responses to mispricing, or to perceptions of mispricing. Shleifer and Vishny (2002) develop a theory of mergers based on rational responses to mispricing. Morck, Shleifer, and Vishny (1990), Stein (1996), Baker, Stein, and Wurgler (2001), and Polk and Sapienza (2001) study rational corporate investment in inefficient capital markets. The survey results of Graham and Harvey (2001) and the insider trading patterns in Jenter (2001) provide further evidence for the theme that managers react to perceived mispricing.

Section II develops the theory and outlines a simple model. Section III presents the main empirical results. Section IV considers potential alternative explanations. Section V concludes and highlights directions for future research.

II. A catering theory of dividends

The theory has three ingredients. First, there is a time varying, uninformed demand for the shares of firms that pay cash dividends. This demand could reflect institutional changes, psychological influences, or both. Second, limited arbitrage means that this demand affects prices. Third, managers rationally cater in response. They tend to pay dividends if investors put a higher price on payers, and do not pay if investors favor nonpayers. A simple model illustrates some subtleties of catering as a managerial policy.

A. Uninformed demand for dividends

We posit that sometimes investors generally prefer stocks that pay cash dividends, and sometimes they generally prefer nonpayers. A useful framework for thinking about this hypothesis is categorization. Categorization refers to the cognitive process of grouping objects into discrete categories such as “birds” or “chairs.” This allows related objects to be considered together, in terms of a small set of common features that define category membership, rather than as individual objects, each with its own long list of identifying attributes. Categorization thus speeds up communication and inference. Rosch (1978) provides a detailed discussion of theory and evidence on categorization.

In standard investment theory, of course, investors conspicuously do not categorize. They view each security as a list of abstract statistics, such as mean, variance, and covariance. But in reality, as Barberis and Shleifer (2002) point out, investors typically do categorize securities into groups such as “small stocks,” “value stocks,” “tech stocks,” “old-economy stocks,” “junk bonds,” “utilities,” and so forth. For many investors, these labels appear to capture all they want to know, or have the ability to process, about the securities within the category.

There are several reasons to expect that unsophisticated investors and certain institutions categorize “dividend payers” directly or use dividend policy to classify stocks as “old economy,” for example. Whether a stock pays dividends is a salient characteristic, perhaps even more so than industry, size, or index membership. One reason why dividends are salient is a pervasive belief that dividend-paying stocks are less risky.2 This notion is common in the popular financial press, and was once common in the academic literature.3 Naïve investors, such as retirees and those who hold dividend-paying stocks for “income” despite the tax penalty, are especially likely to fall prey to this bird-in-the-hand argument. For them, the quarterly dividend check is much more salient than daily gyrations in the stock price, with the result that dividends and capital gains are in separate mental accounts. To the extent that the risk tolerance of bird-in-the-hand investors changes over time, their preferences for payers and nonpayers will change over time. This is one mechanism by which unsophisticated investors may display a time varying preference for dividend payers.

Another way dividend policy becomes salient is if some investors use it to infer managers’ investment plans. For example, it is reasonable to expect that investors interpret nonpayment, controlling for profitability, as evidence that the firm thinks it has excellent investment opportunities. Conversely, payment may be taken as evidence that opportunities are weak. These inferences create another channel through which payers and nonpayers become distinct categories, and they lead to a second mechanism that generates a time-varying uninformed demand for payers. That is, when investors’ perceptions of overall growth opportunities are high, they prefer nonpayers, and vice versa. Note that time variation in the demand for payers here is driven by perceptions of growth opportunities, not by risk tolerance as in the mechanism outlined above. One popular model (Shiller (1984, 2000)) that combines both of these effects is that steady dividends mean “old economy.” Old-economy stocks are viewed as safer but also as having less potential than the “new-economy” stocks that plow back everything to finance growth.

Black and Scholes (1974) and Allen, Bernardo, and Welch (2000), among others, suggest that institutional frictions also lead to the rational categorization of dividend payers. Taxes and the transaction costs of making homemade dividends are obvious examples of such frictions, and time variation in these frictions can induce time-varying preferences for payers. Many endowed institutions, for example, are restricted to spending from income, an obvious reason to categorize payers. In terms of time variation, the 1970s witnessed a number of potentially significant events. The Employee Retirement Income Security Act (ERISA) of 1974 may have increased the attractiveness of payers to pension funds (Del Guercio (1996) and Brav and Heaton (1998)). The 1975 advent of negotiated commissions reduced the cost of creating homemade dividends and therefore may have increased the demand for nonpayers. The Nixon dividend controls, which limited dividend growth between 1971 and 1974, may have raised the prices of “grandfathered” shares whose issuers had already established a high level of dividends. And of course changes in the tax treatment of dividends, such as those generated by the 1986 Tax Reform Act, may change the demand for dividend payers without any link to their pretax fundamentals.

Given that categorization occurs, time varying demand between categories could also arise from what Mullainathan (2002) calls categorical inference. Investors using categorical inference may, for example, overestimate the impact of news about a particular dividend payer for other dividend payers, and underestimate its impact for nonpayers. This suggests that even without any explicit preference for cash dividends, the fact that categories have already been built around dividends could potentially lead to variation in demand between payers and nonpayers.

In summary, there are several reasons why some investors may view dividend payers as special. Some of them reflect investor psychology, while others reflect institutional constraints or frictions. The discussion also identifies psychological and institutional mechanisms that can lead to a time varying preference for dividend payers.4

B. Limited arbitrage

In the perfect and efficient markets of Miller and Modigliani (1961), uninformed demand for dividends would not affect stock prices. Arbitrage would prevent it. Arbitrageurs could short the firm with a preferred dividend policy and go long a correctly priced “perfect substitute” – a firm with the same investment policy but a different dividend policy. In perfect and efficient markets, only investment policy affects stock prices, so an arbitrage follows by making homemade dividends on the long firm to match the dividends declared by the short firm. In the absence of further frictions, this position delivers an up-front gain and can be risklessly held forever, or liquidated whenever prices move back in line. Competition for such arbitrage opportunities would then eliminate any dividend premium or discount.
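The homemade-dividend step can be checked with simple arithmetic. The sketch below assumes the frictionless Miller and Modigliani setting just described: no taxes or trading costs, identical investment policies, and a payer whose price falls by exactly the dividend, so the nonpayer trades at the payer's cum-dividend price. Function names and numbers are hypothetical.

```python
def payer_position(shares, cum_price, dividend):
    """Cash and stock value after a payer distributes `dividend` per share."""
    ex_price = cum_price - dividend  # frictionless: the price drops by the dividend
    return shares * dividend, shares * ex_price


def homemade_position(shares, price, dividend):
    """Cash and stock value after selling enough nonpayer shares to mimic the dividend."""
    shares_sold = shares * dividend / price
    return shares_sold * price, (shares - shares_sold) * price


# Hypothetical holding: 100 shares, price 50, dividend 2 per share.
print(payer_position(100, 50.0, 2.0))     # (200.0, 4800.0)
print(homemade_position(100, 50.0, 2.0))  # (200.0, 4800.0)
```

Because the two positions are equivalent, any price gap between the payer and the nonpayer is the arbitrageur's up-front gain in the argument above.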

In practice, however, the long-short arbitrage that drives the M&M irrelevance proof is risky and costly.5 Limited arbitrage is the second postulate of the catering theory. An obvious risk in long-short arbitrage is fundamental risk, which arises simply because individual stocks do not have perfect substitutes (Wurgler and Zhuravskaya (2002)). This risk is in principle diversifiable, but arbitrageurs also face a systematic risk, often called noise-trader risk, if they try to trade against systematic sentiment. With short horizons or limited capital, they are sensitive to this risk (De Long, Shleifer, Summers, and Waldmann (1990) and Shleifer and Vishny (1997)). Finally, long-short arbitrage is costly. Nontrivial shorting costs are reported in D’Avolio (2002), Geczy, Musto, and Reed (2002), and Lamont and Jones (2002).

If arbitrage is limited and uninformed demand varies at the category level, as Barberis and Shleifer propose, then prices can also vary at the category level.6 In particular, if dividend payers and nonpayers are special investor categories, as the previous discussion suggests, then uninformed demand can affect their relative prices.

Our own empirical evidence is presented in Section III. In the meantime, we point to Long (1978) as initial evidence that uninformed, time-varying demand for dividends gets through arbitrage forces and does affect stock prices. Long studies the Citizens Utilities Company, which between 1956 and 1989 had one share class that paid cash dividends and another that paid stock dividends. By charter, the payouts to both classes were supposed to be of equal pretax value. In practice, the stock dividend averaged ten to twelve percent more than the cash dividend. Long finds that during his sample period, the cash dividend shares traded at a relative price that was too high, given their pretax dividend disadvantage and their further tax disadvantage.7 More interesting for our purposes, the relative price fluctuates substantially over time. Long, Poterba (1986), and Hubbard and Michaely (1997) conclude that these fluctuations cannot be explained by traditional theories of dividends.

C. Catering as a managerial policy

The third element of the theory is that managers cater to uninformed demand. In the setting of dividends, catering implies that managers will tend to initiate dividends when investors put a higher price on payers, and tend to omit dividends, or avoid initiating them, when investors favor nonpayers. The ultimate objective of a catering policy is to capture the stock price premium associated with the characteristics investors favor. Catering is thus distinct from the usual policy of maximizing shareholder value. In inefficient markets, managers must decide which of two prices to maximize: a short-run price affected by uninformed demand, or a fundamental value driven by investment policy. Catering maximizes the short-run price, while the traditional policy emphasizes long-run value.

In general, whether managers will rationally cater to a perceived short-run mispricing is an empirical question. It is rational in some circumstances and not others.8 One key factor is how much of a tradeoff there really is between catering and fundamental investment policy – if managers can maximize short-run and long-run price without conflict, they will presumably do both.9 Another factor is whether managers can personally profit from any short-term overvaluation that follows from successful catering. If they hold a significant amount of equity themselves, they can sell their overvalued shares. Or they may be able to exploit short-term overpricing by issuing dilutive, overpriced shares. A third factor is the horizon of managers, or the horizon of the investors they care about most. Managers with short horizons will be more likely to cater to short-run mispricing. The fact that managers’ bonuses and employment often depend on short-run performance suggests that short horizons may often be important in practice. These tradeoffs are made precise in the following simple model.

D. A model of dividend catering

Consider a firm with Q shares outstanding. At t = 1, it pays a liquidating dividend of $V = F + \eta$ per share, where $\eta$ is a normally distributed error term with mean zero and variance $\sigma^2$. At t = 0, it has the choice of paying an interim dividend $d \in \{0, 1\}$ per share, which reduces the liquidating dividend by $d(1+c)$. The risk-free rate is zero. The cost c is a way of capturing tradeoffs between dividend and investment policy, such as the net influence of financial constraints. The Miller and Modigliani case has c equal to zero: dividend policy does not interact with investment policy and has no tax consequences.

There are two types of investors, category investors and arbitrageurs. Both have constant absolute risk aversion. The aggregate risk tolerance per period is $\gamma_C$ for the category investors and $\gamma_A$ for the arbitrageurs. Arbitrageurs have rational expectations over the terminal dividend, expecting an average payoff of F. Uninformed demand for dividends is implemented through an irrational expectation of the liquidating dividend by category investors. For simplicity, they misestimate the mean payout, but not the distribution around the mean. They expect a final payment of $E_C(V) = V_D$ from dividend payers and $E_C(V) = V_G$ from nonpayers, which they view as growth firms. They also fail to realize that paying dividends may come with long-run costs. These expectations could reflect biased inferences that overweight within-category information as in Mullainathan (2002), biased risk perceptions arising from the bird-in-the-hand fallacy, or biased expectations of investment opportunities; they could also capture institutional constraints or other frictions in reduced form. Typically, the net effect will place $V_D$ and $V_G$ on opposite sides of F.

If the firm fits a given category, investor group $k \in \{C, A\}$ will demand

$$N_k = \gamma_k \, \frac{E_k(V) - P}{\sigma^2} \qquad (1)$$

shares at price P, where $E_k(V)$ is the group's expected total payout per share.

With unlimited arbitrage, meaning $\gamma_A$ is large relative to $\gamma_C$, the category investors do not affect price. If dividend payers and nonpayers are not perfect substitutes, however, or if agency costs limit arbitrage horizons and capital, then the irrational expectations of category investors do affect price. With such limits on arbitrage, setting $N_C + N_A = Q$ gives the prices of dividend payers $P_D$ (cum dividend) and growth firms $P_G$. For payers, arbitrageurs expect a total payout of $F - c$ and category investors expect $V_D$; for nonpayers, the expectations are $F$ and $V_G$:

$$P_D = \frac{\gamma_C V_D + \gamma_A (F - c) - Q\sigma^2}{\gamma_C + \gamma_A}, \qquad P_G = \frac{\gamma_C V_G + \gamma_A F - Q\sigma^2}{\gamma_C + \gamma_A}. \qquad (2)$$
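The market-clearing step behind (2) can be checked numerically. The sketch below assumes the CARA demand of (1), solves for the price at which the two groups' total demand equals the Q shares outstanding, and applies it to payers (arbitrageurs expect F − c) and nonpayers (arbitrageurs expect F). The parameter values are arbitrary and the function names are hypothetical; the point is that the risk-premium term Qσ² drops out of the difference P_D − P_G.

```python
def cara_demand(expectation, tolerance, price, sigma2):
    """Shares demanded by a group with aggregate risk tolerance `tolerance`, as in (1)."""
    return tolerance * (expectation - price) / sigma2


def clearing_price(expectations, tolerances, Q, sigma2):
    """Price at which the groups' total demand equals the Q shares outstanding."""
    weighted = sum(g * e for g, e in zip(tolerances, expectations))
    return (weighted - Q * sigma2) / sum(tolerances)


# Arbitrary illustrative parameters.
F, c, V_D, V_G = 10.0, 0.1, 11.0, 9.0
gamma_C, gamma_A, Q, sigma2 = 1.0, 1.0, 1.0, 1.0

# Payers: category investors expect V_D, arbitrageurs expect F - c (cum dividend).
P_D = clearing_price([V_D, F - c], [gamma_C, gamma_A], Q, sigma2)
# Nonpayers: category investors expect V_G, arbitrageurs expect F.
P_G = clearing_price([V_G, F], [gamma_C, gamma_A], Q, sigma2)

# Market clears: total demand at P_D equals Q.
total = cara_demand(V_D, gamma_C, P_D, sigma2) + cara_demand(F - c, gamma_A, P_D, sigma2)
print(P_D, P_G, total)

# The Q * sigma2 term cancels, leaving the closed-form premium.
closed_form = (gamma_C * (V_D - V_G) - gamma_A * c) / (gamma_C + gamma_A)
print(P_D - P_G, closed_form)
```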

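Given these prices, the manager chooses dividend policy. As argued above, the choice depends on his horizon. In particular, suppose that the manager is risk neutral and cares about both the current stock price and the fundamental value of total distributions. The manager has no control over total distributions except through the cost parameter c. With his horizon measured as $\alpha \in [0, 1)$, the manager’s maximization problem is:

$$\max_{d \in \{0,1\}} \; (1-\alpha)\, P(d) + \alpha\, (F - c\,d), \qquad (3)$$

where $P(1) = P_D$, $P(0) = P_G$, and $F - c\,d = d + F - d(1+c)$ is the fundamental value of total distributions.

The solution is straightforward. The manager pays dividends if the dividend premium exceeds the present value of the long-run cost that he incorporates. That is, when

$$P_D - P_G = \frac{\gamma_C}{\gamma_C + \gamma_A}(V_D - V_G) - \frac{\gamma_A}{\gamma_C + \gamma_A}\, c > \frac{\alpha}{1-\alpha}\, c, \qquad (4)$$

where $\gamma_C$ and $\gamma_A$ are the aggregate risk tolerances of category investors and arbitrageurs.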
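One way to read (3) and (4) together is to evaluate the manager's objective for both choices of d and compare. The sketch below does this for given prices and checks that it agrees with the threshold form of (4). The names and numbers are hypothetical, and the horizon weight alpha is assumed to lie in [0, 1).

```python
def objective(price, F, c, d, alpha):
    """Objective (3): weight 1 - alpha on the price, alpha on fundamental value F - c*d."""
    return (1 - alpha) * price + alpha * (F - c * d)


def pays_by_objective(P_D, P_G, F, c, alpha):
    """Pay (d = 1) if that strictly beats not paying (d = 0)."""
    return objective(P_D, F, c, 1, alpha) > objective(P_G, F, c, 0, alpha)


def pays_by_threshold(P_D, P_G, c, alpha):
    """Threshold form of (4): the premium must exceed alpha * c / (1 - alpha)."""
    return P_D - P_G > alpha * c / (1 - alpha)


# Hypothetical prices; F = 10 and c = 0.1. A longer horizon raises the hurdle.
for alpha in (0.2, 0.5, 0.95):
    print(alpha,
          pays_by_objective(9.95, 9.0, 10.0, 0.1, alpha),
          pays_by_threshold(9.95, 9.0, 0.1, alpha))
```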

The first term in the middle is the immediate positive price impact of switching categories. The second term is the immediate negative price impact of the arbitrageurs’ recognition of the cost of paying dividends. To induce payment, the net of these must exceed the long-run cost that the manager incorporates, the term on the right. Qualitatively, the propensity to pay dividends is decreasing in c, increasing in the dividend premium, decreasing in the prevalence of arbitrage, and decreasing in managers’ horizons. The announcement effect of a dividend initiation is positive and increasing in the dividend premium. Note that an uninformed demand interpretation of announcement effects could explain why dividend changes affect prices even though they appear to contain more information about past earnings than about future earnings (Lintner (1956), Fama and Babiak (1968), Watts (1973), DeAngelo, DeAngelo, and Skinner (1996) and Benartzi, Michaely, and Thaler (1997)).
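The qualitative claims above can be spot-checked numerically. The sketch below computes the margin by which the dividend premium in (4), [γ_C(V_D − V_G) − γ_A c]/(γ_C + γ_A), exceeds the long-run cost αc/(1 − α), then raises one parameter at a time. The parameter values are arbitrary and the names are hypothetical; a finite-difference check at one point illustrates, but does not prove, the comparative statics.

```python
def premium(V_D, V_G, c, gamma_C, gamma_A):
    """P_D - P_G: category-switching gain minus arbitrageurs' pricing of the cost c."""
    return (gamma_C * (V_D - V_G) - gamma_A * c) / (gamma_C + gamma_A)


def margin(V_D, V_G, c, gamma_C, gamma_A, alpha):
    """Positive when the manager pays: premium minus alpha * c / (1 - alpha)."""
    return premium(V_D, V_G, c, gamma_C, gamma_A) - alpha * c / (1 - alpha)


base = dict(V_D=11.0, V_G=9.0, c=0.1, gamma_C=1.0, gamma_A=1.0, alpha=0.5)


def bumped(name, step):
    """Margin after raising one parameter by `step`, holding the others at `base`."""
    params = dict(base, **{name: base[name] + step})
    return margin(**params)


m0 = margin(**base)
print("higher cost c lowers the margin:", bumped("c", 0.05) < m0)
print("higher V_D raises the margin:", bumped("V_D", 0.5) > m0)
print("more arbitrage lowers the margin:", bumped("gamma_A", 1.0) < m0)
print("longer horizon lowers the margin:", bumped("alpha", 0.2) < m0)
```

Because the initiation announcement effect in the model is P_D − P_G, the `premium` function also shows it rising with V_D − V_G.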

As in most theories of dividend policy (for example, Miller and Rock (1985)), the decisions to initiate and omit dividends are symmetric in (4). However, the decision to pay dividends is empirically quite persistent: past dividend policy has an important effect on the current decision to pay. To incorporate this asymmetry within the same conceptual framework, we introduce a third group of stocks, former dividend payers. This group consists of firms with no current dividends and low historical earnings growth (assuming that their past dividends were not fully replenished by stock issues). It lacks any of the salient features that category investors notice, so it attracts demand only from arbitrageurs. Setting arbitrageurs’ demand in (1) equal to Q, the price of these former dividend payers is therefore just

$$P_F = F - \frac{Q\sigma^2}{\gamma_A},$$

where $\sigma^2$ is the variance of the liquidating dividend.

With former payers in the model, the decision for growth firms to initiate dividends is still governed by (4), while current payers continue to pay when:

$$P_D - P_F = \frac{\gamma_C (V_D - F) - \gamma_A c}{\gamma_C + \gamma_A} + \frac{\gamma_C\, Q\sigma^2}{\gamma_A(\gamma_C + \gamma_A)} > \frac{\alpha}{1-\alpha}\, c. \qquad (5)$$

The inequality in (5) has much the same structure as (4). As before, the propensity to pay is decreasing in the long-run cost and increasing in the dividend premium. The new insight is that continuing to pay dividends can be desirable even when initiating them is not. More formally, if arbitrage capacity $\gamma_A$ is small, or if c is small and $V_G$ and $V_D$ fall on opposite sides of F, then (5) is satisfied whenever (4) is satisfied. Intuitively, former payers are neglected companies that attract only arbitrageurs. So even when initiations are undesirable, current payers may want to continue paying if arbitrage is weak and the long-run savings on the fundamental cost are modest. In these circumstances, the price drop from cutting the dividend would be especially large. This third category of neglected stocks can also explain why some firms might initiate dividends even when dividends are not currently favored, and why such initiations might still have a positive announcement effect.
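The claim that continuing can be worthwhile when initiating is not can be illustrated with one set of parameter values. The sketch below reuses the CARA market-clearing logic of (1) and (2), prices the neglected former payer with arbitrageurs alone, and evaluates (4) and (5). The numbers are arbitrary and chosen so that arbitrage capacity gamma_A is small; the function names are hypothetical.

```python
def clearing_price(expectations, tolerances, Q, sigma2):
    """Price at which total CARA demand sum_k gamma_k (E_k - P) / sigma2 equals Q."""
    weighted = sum(g * e for g, e in zip(tolerances, expectations))
    return (weighted - Q * sigma2) / sum(tolerances)


def initiates(F, c, V_D, V_G, gamma_C, gamma_A, Q, sigma2, alpha):
    """Condition (4): a growth firm initiates dividends."""
    P_D = clearing_price([V_D, F - c], [gamma_C, gamma_A], Q, sigma2)
    P_G = clearing_price([V_G, F], [gamma_C, gamma_A], Q, sigma2)
    return (1 - alpha) * (P_D - P_G) > alpha * c


def keeps_paying(F, c, V_D, V_G, gamma_C, gamma_A, Q, sigma2, alpha):
    """Condition (5): a current payer keeps paying rather than becoming a former payer."""
    P_D = clearing_price([V_D, F - c], [gamma_C, gamma_A], Q, sigma2)
    P_F = clearing_price([F], [gamma_A], Q, sigma2)  # arbitrageurs only
    return (1 - alpha) * (P_D - P_F) > alpha * c


params = dict(F=10.0, c=0.5, V_D=10.1, V_G=9.9, gamma_C=1.0, gamma_A=0.2,
              Q=1.0, sigma2=1.0, alpha=0.5)
print(initiates(**params), keeps_paying(**params))  # False True
```

With thin arbitrage, the former-payer price carries a large risk discount, which is what makes omission costly in (5).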

The third category is also useful in resolving a remaining problem with (4) and (5): in the model as it stands, omissions carry a positive announcement effect. This is not true in practice (Healy and Palepu (1988) and Michaely, Thaler, and Womack (1995)). To remedy this, one could of course introduce fundamental risk, financial constraints, or some asymmetric information. While potentially realistic, these additions would take us away from our goal of developing a model that relaxes only the market efficiency assumption of the Miller and Modigliani setup. A more internally consistent approach is to introduce an intermediate period between t = 0 and t = 1 in which the neglected former payers face a positive probability of being recategorized as growth firms, for example because of a random earnings shock. In this case, dividend payers may choose to omit a dividend at t = 0 even when (5) is satisfied. They suffer a short-run negative announcement effect, but the possibility of eventually being recategorized may be worth it. It is straightforward to formally incorporate this effect.

This simple model illustrates the basic tradeoffs in dividend catering. A robust conclusion is that the propensity to pay dividends is increasing in the dividend premium, and decreasing in the long-run costs of paying dividends. As discussed earlier, this means that the existence of catering behavior is in general an empirical issue. In the presence of financial constraints, for instance, dividend policy interacts with investment policy, so a rational manager’s propensity to cater to a mispricing associated with dividend policy will depend on the size of this tradeoff. Realistic variants of the model also suggest that the decisions to initiate and to continue paying should be analyzed separately.

III. Empirical tests

We test the prediction that dividend policy depends on uninformed demand for dividend payers as revealed through stock price signals. We have just discussed some cross-sectional wrinkles, but this is primarily a time series prediction because uninformed demand is hypothesized to be systematic. Time series data are therefore most appropriate.10

