Common Accountics Science and
Econometric Science Statistical Mistakes
Bob Jensen
at Trinity University
Accountics is the mathematical science of values.
Charles Sprague [1887] as quoted by McMillan [1998, p. 1]
http://www.trinity.edu/rjensen/395wpTAR/Web/TAR395wp.htm#_msocom_1
Tom Lehrer on Mathematical Models and Statistics 
http://www.youtube.com/watch?v=gfZWyUXn3So
You must watch this to the ending to appreciate it.
David Johnstone asked me to write a paper on the following:
"A Scrapbook on What's Wrong with the Past, Present and Future of Accountics
Science"
Bob Jensen
February 19, 2014
SSRN Download:
http://papers.ssrn.com/sol3/papers.cfm?abstract_id=2398296
Abstract
For operational convenience I define accountics science as
research that features equations and/or statistical inference. Historically,
there was a heated debate in the 1920s as to whether the main research
journal of academic accounting, The Accounting Review (TAR) that
commenced in 1926, should be an accountics journal with articles that mostly
featured equations. Practitioners and teachers of college accounting won
that debate.
TAR articles and accountancy doctoral dissertations prior to
the 1970s seldom had equations. For reasons summarized below, doctoral
programs and TAR evolved to the point where, in the 1990s, having equations
became virtually a necessary condition for a doctoral dissertation and
acceptance of a TAR article. Qualitative normative and case method
methodologies disappeared from doctoral programs.
What’s really meant by “featured
equations” in doctoral programs is merely symbolic of the fact that North
American accounting doctoral programs pushed out most of the accounting to
make way for econometrics and statistics that are now keys to the kingdom
for promotion and tenure in accounting schools 
http://www.trinity.edu/rjensen/Theory01.htm#DoctoralPrograms
The purpose of this paper is to make a case that the accountics science
monopoly of our doctoral programs and published research is seriously
flawed, especially its lack of concern about replication and focus on
simplified artificial worlds that differ too much from reality to creatively
discover findings of greater relevance to teachers of accounting and
practitioners of accounting. Accountics scientists themselves became a Cargo
Cult.
Eight Econometrics Multiple-Choice Quiz Sets from David Giles
You might have to go to his site to get the quizzes to work.
Note that there are multiple questions for each quiz set.
Click on the arrow button to go to a subsequent question.
O.K., I know - that was a really cheap way of getting your attention.
However, it worked, and this post really is about Hot Potatoes - not the
edible variety, but some teaching apps. from "Half-Baked Software" here at
the University of Victoria.
To quote:
"The Hot Potatoes suite includes six applications, enabling you to create
interactive multiple-choice, short-answer, jumbled-sentence, crossword,
matching/ordering and gap-fill exercises for the World Wide Web. Hot
Potatoes is freeware, and you may use it for any purpose or project you
like."
I've included some Hot
Potatoes multiple choice exercises on the web pages for
several of my courses for some years now. Recently, some of the
students in my introductory graduate econometrics course
mentioned that these exercises were quite helpful. So, I thought
I'd share the Hot Potatoes apps. for that course with
readers of this blog.
There are eight multiple-choice
exercise sets in total, and you can run them from here:
I've also put the HTML and associated PDF
files on the
code page
for this blog. If you're going to download
them and use them on your own computer or website, just make sure
that the PDF files are located in the same folder (directory) as the
HTML files.
I plan to extend and update these Hot Potatoes exercises in
the near future, but hopefully some readers will find them useful in
the meantime.
From my "Recently Read" list:

Born, B. and J. Breitung, 2014. Testing for serial correlation
in fixed-effects panel data models. Econometric Reviews, in
press.

Enders, W. and J. Lee, 2011. A unit root test using a Fourier
series to approximate smooth breaks. Oxford Bulletin of Economics and
Statistics, 74, 574-599.

Götz, T. B. and A. W. Hecq, 2014. Testing for Granger causality
in large mixed-frequency VARs. RM/14/028, Maastricht University, SBE,
Department of Quantitative Economics.

Kass, R. E., 2011. Statistical inference: The big picture.
Statistical Science, 26, 1-9.

Qian, J. and L. Su, 2014. Structural change estimation in time
series regressions with endogenous variables. Economics Letters,
in press.

Wickens, M., 2014. How did we get to where we are now? Reflections on 50
years of macroeconomic and financial econometrics. Discussion No. 14/17,
Department of Economics and Related Studies, University of York.
"Statistical Inference: The Big Picture," by Robert E. Kass,
Statistical Science 2011, Vol. 26, No. 1, 1–9 DOI: 10.1214/10-STS337 ©
Institute of Mathematical Statistics 
http://www.stat.cmu.edu/~kass/papers/bigpic.pdf
Abstract.
Statistics has moved beyond the frequentist-Bayesian controversies of the
past. Where does this leave our ability to interpret results? I suggest that
a philosophy compatible with statistical practice, labeled here statistical
pragmatism, serves as a foundation for inference. Statistical pragmatism is
inclusive and emphasizes the assumptions that connect statistical models
with observed data. I argue that introductory courses often mischaracterize
the process of statistical inference and I propose an alternative “big
picture” depiction.
Common Accountics Science and Econometric Science Statistical Mistakes 
http://www.cs.trinity.edu/~rjensen/temp/AccounticsScienceStatisticalMistakes.htm
Statistical Science Reading List for June 2014 Compiled by David Giles in
Canada 
http://davegiles.blogspot.com/2014/05/junereadinglist.html
Put away that novel! Here's some really fun June reading:

Berger, J., 2003. Could Fisher, Jeffreys and Neyman have agreed on testing?
Statistical Science, 18, 1-32.

Canal, L. and R. Micciolo, 2014. The chi-square controversy.
What if Pearson had R? Journal of Statistical Computation and
Simulation, 84, 1015-1021.

Harvey, D. I., S. J. Leybourne, and A. M. R. Taylor, 2014. On
infimum Dickey-Fuller unit root tests allowing for a trend break under
the null. Computational Statistics and Data Analysis, 78,
235-242.

Karavias, Y. and E. Tzavalis, 2014. Testing for unit roots in
short panels allowing for a structural break. Computational
Statistics and Data Analysis, 76, 391-407.

King, G. and M. E. Roberts, 2014. How robust standard errors expose
methodological problems they do not fix, and what to do about it.
Mimeo., Harvard University.

Kuroki, M. and J. Pearl, 2014. Measurement bias and effect
restoration in causal inference. Biometrika, 101, 423-437.

Manski, C., 2014. Communicating uncertainty in official economic
statistics. Mimeo., Department of Economics, Northwestern University.

Martinez-Camblor, P., 2014. On correlated z-values in hypothesis
testing. Computational Statistics and Data Analysis, in press.
The Cult of Statistical Significance: How Standard Error Costs Us Jobs,
Justice, and Lives 
http://www.cs.trinity.edu/~rjensen/temp/DeirdreMcCloskey/StatisticalSignificance01.htm
November 7, 2014 posting by David Giles in his Econometrics Beat blog.
This post is one of a sequence of posts, the earlier members of which can
be found here, here, here, and here. These posts are based on Giles (2014).
Some of the standard tests that we perform in econometrics can be affected
by the level of aggregation of the data. Here, I'm concerned only with
time-series data, and with temporal aggregation. I'm going to show you some
preliminary results from work that I have in progress with Ryan Godwin.
Although these results relate to just one test, our work covers a range of
testing problems.
I'm not supplying the EViews program code that was used to obtain the
results below - at least, not for now. That's because what I'm reporting is
based on work in progress. Sorry!
As in the earlier posts, let's suppose that the aggregation is over "m"
high-frequency periods. A lower case symbol will represent a
high-frequency observation on a variable of interest; and an uppercase
symbol will denote the aggregated series.
So,
Y_{t} = y_{t} + y_{t-1} + ... + y_{t-m+1} .
If we're aggregating monthly (flow) data to
quarterly data, then m = 3. In the case of
aggregation from quarterly to annual data, m = 4,
etc.
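The aggregation rule above can be sketched in a few lines of Python. This is only an illustration, not code from the Giles post: the function name aggregate_flows and the monthly figures are made up, and the sketch assumes the aggregated series is sampled once every m periods (non-overlapping blocks), with any incomplete final block dropped.

```python
def aggregate_flows(y, m):
    """Sum each non-overlapping block of m high-frequency observations.

    Trailing observations that do not fill a complete block are dropped.
    """
    if m < 1:
        raise ValueError("m must be a positive integer")
    n_blocks = len(y) // m
    return [sum(y[i * m:(i + 1) * m]) for i in range(n_blocks)]


# Twelve monthly flow observations (made-up numbers) aggregated to quarters.
monthly = [10, 12, 11, 13, 15, 14, 16, 18, 17, 19, 21, 20]
quarterly = aggregate_flows(monthly, m=3)
print(quarterly)  # [33, 42, 51, 60]
```

With m = 4 the same function would turn quarterly flows into annual totals.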
Now, let's investigate how such aggregation affects the performance of the
well-known Jarque-Bera (1987) (JB) test for the normality of the errors in a
regression model. I've discussed some of the limitations of this test in an
earlier post, and you might find it helpful to look at that post (and this
one) at this point. However, the JB test is very widely used by
econometricians, and it warrants some further consideration.
Consider the following small Monte Carlo experiment.
Continued at
http://davegiles.blogspot.com/2014/11/theeconometricsoftemporal.html#more
Jensen Comment
Perhaps an even bigger problem in aggregation is the assumption of stationarity.
From Two Former Presidents of the AAA
"Some Methodological Deficiencies in Empirical Research Articles in
Accounting." by Thomas R. Dyckman and Stephen A. Zeff, Accounting
Horizons: September 2014, Vol. 28, No. 3, pp. 695-712 
http://aaajournals.org/doi/full/10.2308/acch50818 (not free)
This paper uses a sample of the regression and
behavioral papers published in The Accounting Review and the Journal of
Accounting Research from September 2012 through May 2013. We argue first
that the current research results reported in empirical regression papers
fail adequately to justify the time period adopted for the study. Second, we
maintain that the statistical analyses used in these papers as well as in
the behavioral papers have produced flawed results. We further maintain that
their tests of statistical significance are not appropriate and, more
importantly, that these studies do not—and cannot—properly address the
economic significance of the work. In other words, significance tests are
not tests of the economic meaningfulness of the results. We suggest ways to
avoid some but not all of these problems. We also argue that replication
studies, which have been essentially abandoned by accounting researchers,
can contribute to our search for truth, but few will be forthcoming unless
the academic reward system is modified.
The free SSRN version of this paper is at
http://papers.ssrn.com/sol3/papers.cfm?abstract_id=2324266
This Dyckman and Zeff paper is indirectly related to the following technical
econometrics research:
"The Econometrics of Temporal Aggregation - IV - Cointegration," by
David Giles, Econometrics Blog, September 13, 2014 
http://davegiles.blogspot.com/2014/09/theeconometricsoftemporal.html
Recall that Bill Sharpe of CAPM fame and controversy is a Nobel Laureate 
http://en.wikipedia.org/wiki/William_Forsyth_Sharpe
"Don’t Over-Rely on Historical Data to Forecast Future Returns," by
Charles Rotblut and William Sharpe, AAII Journal, October 2014 
http://www.aaii.com/journal/article/dontoverrelyonhistoricaldatatoforecastfuturereturns?adv=yes
Jensen Comment
The same applies to not over-relying on historical data in valuation. My
favorite case study that I used for this in teaching is the following:
"Questrom vs. Federated Department Stores, Inc.: A Question of Equity
Value," by University of Alabama faculty members Gary Taylor, William
Sampson, and Benton Gup, May 2001 edition of Issues in Accounting Education 
http://www.trinity.edu/rjensen/roi.htm
Jensen Comment
I want to especially thank David Stout, Editor of the May 2001
edition of Issues in Accounting Education. There has been something
special in all the editions edited by David, but the May edition is very
special to me. All the articles in that edition are helpful, but I want to
call attention to three articles that I will use intently in my graduate
Accounting Theory course.

- "Questrom vs. Federated Department Stores, Inc.: A Question of Equity
  Value," by University of Alabama faculty members Gary Taylor, William
  Sampson, and Benton Gup, pp. 223-256.
  This is perhaps the best short case that I've ever read. It will
  undoubtedly help my students better understand weighted average cost of
  capital, free cash flow valuation, and the residual income model. The
  three student handouts are outstanding. Bravo to Taylor, Sampson, and
  Gup.

- "Using the Residual-Income Stock Price Valuation Model to Teach and
  Learn Ratio Analysis," by Robert Halsey, pp. 257-276.
  What a follow-up case to the Questrom case mentioned above! I have long
  used the Dupont Formula in courses and nearly always use the excellent
  paper entitled "Disaggregating the ROE: A New Approach," by T.I. Selling
  and C.P. Stickney, Accounting Horizons, December 1990, pp. 9-17.
  Halsey's paper guides students through the swamp of stock price
  valuation using the residual income model (which by the way is one of
  the few academic accounting models that has had a major impact on
  accounting practice, especially consulting practice in equity valuation
  by CPA firms).

- "Developing Risk Skills: An Investigation of Business Risks and Controls
  at Prudential Insurance Company of America," by Paul Walker, Bill
  Shenkir, and Stephen Hunn, pp. 291
  I will use this case to vividly illustrate the "tone-at-the-top"
  importance of business ethics and risk analysis. This case is easy
  to read and highly informative.
Bob Jensen's threads on accounting theory 
http://www.trinity.edu/rjensen/Theory01.htm
"Proof of a Result About the "Adjusted" Coefficient of Determination,"
by David Giles, Econometrics Blog, April 16, 2014 
http://davegiles.blogspot.com/2014/04/proofofresultaboutadjusted.html
. . .
Let's take a look at the proof.
The model we're going to look at is the standard, k-regressor, linear
multiple regression model:
y = Xβ + ε .    (1)
We have n observations in our sample.
The result that follows is purely
algebraic, and not statistical, so in actual fact I don't
have to assume anything in particular about the errors in the model, and
the regressors can be random. So that the definition of the coefficient
of determination is unique, I will assume that the model includes
an intercept term.
The adjusted coefficient of determination when model (1) is estimated by
OLS is
R_{A}^{2} = 1 - [e'e / (n - k)] / [(y*'y*) / (n - 1)] ,    (2)
where e is the OLS residual vector, and y* is the y vector, but with each
element expressed as a deviation from the sample mean of the y data.
Now consider J independent exact linear restrictions on the elements of β,
namely Rβ = r, where R is a known non-random (J x k) matrix of rank J; and
r is a known non-random (J x 1) vector. The F-statistic that we would use
to test the validity of these restrictions can be written as:
F = [(e_{R}'e_{R} - e'e) / J] / [e'e / (n - k)] ,    (3)
where e_{R} is the residual vector when the restrictions on β are imposed,
and the model is estimated by RLS.
In the latter case, the adjusted coefficient of determination is
R_{AR}^{2} = 1 - [e_{R}'e_{R} / (n - k + J)] / [(y*'y*) / (n - 1)] .    (4)
From equation (3), F ≥ 1 if and only if
(n - k) e_{R}'e_{R} ≥ (n - k + J) e'e .    (5)
From (2) and (4), R_{A}^{2} ≥ R_{AR}^{2} if and only if
(n - k) e_{R}'e_{R} ≥ (n - k + J) e'e .
But this is just the condition in (5).
So, we have the following result:
Imposing a set of exact linear restrictions on the coefficients of a linear
regression model will decrease (increase) the adjusted coefficient of
determination if the F-statistic for testing the validity of those
restrictions is greater (less) than one in value. If this statistic is
exactly equal to one, the adjusted coefficient of determination will be
unchanged.
Notice that the result quoted at the beginning of this post is a special
case of this result, where the restrictions are all "zero" restrictions.
Recalling that the square of a t statistic with v degrees of freedom is
just an F statistic with 1 and v degrees of freedom, the other principal
result given in the earlier post is also obviously a special case of this,
with just one zero restriction:
Adding a regressor will increase (decrease) R_{A}^{2} depending on whether
the absolute value of the t-statistic associated with that regressor is
greater (less) than one in value. R_{A}^{2} is unchanged if that absolute
t-statistic is exactly equal to one.
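The algebraic result can be checked numerically. The sketch below is not from the Giles post: it assumes a made-up data-generating process with an intercept and two regressors, imposes a single zero restriction (J = 1) by dropping the second regressor, and computes equations (2), (3) and (4) directly. The helper names ols_rss and compare are hypothetical, and the small normal-equations solver stands in for a proper regression routine.

```python
import random


def ols_rss(X, y):
    """Residual sum of squares from an OLS fit of y on the columns of X."""
    n, k = len(X), len(X[0])
    # Augmented normal equations [X'X | X'y].
    A = [[sum(X[i][r] * X[i][c] for i in range(n)) for c in range(k)]
         + [sum(X[i][r] * y[i] for i in range(n))]
         for r in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        for r in range(k):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    beta = [A[r][k] / A[r][r] for r in range(k)]
    return sum((y[i] - sum(b * x for b, x in zip(beta, X[i]))) ** 2
               for i in range(n))


def compare(seed, n=50):
    """Return (F, adjusted R^2 unrestricted, adjusted R^2 restricted)."""
    rng = random.Random(seed)
    x1 = [rng.gauss(0, 1) for _ in range(n)]
    x2 = [rng.gauss(0, 1) for _ in range(n)]
    y = [1.0 + 0.5 * a + 0.1 * b + rng.gauss(0, 1) for a, b in zip(x1, x2)]
    k, J = 3, 1  # three coefficients unrestricted; one zero restriction
    ybar = sum(y) / n
    tss = sum((v - ybar) ** 2 for v in y)                      # y*'y*
    rss_u = ols_rss([[1.0, a, b] for a, b in zip(x1, x2)], y)  # e'e
    rss_r = ols_rss([[1.0, a] for a in x1], y)                 # e_R'e_R
    F = ((rss_r - rss_u) / J) / (rss_u / (n - k))              # equation (3)
    r2a_u = 1 - (rss_u / (n - k)) / (tss / (n - 1))            # equation (2)
    r2a_r = 1 - (rss_r / (n - k + J)) / (tss / (n - 1))        # equation (4)
    return F, r2a_u, r2a_r


for seed in range(5):
    F, r2a_u, r2a_r = compare(seed)
    print(f"F = {F:6.3f}  unrestricted = {r2a_u:.4f}  restricted = {r2a_r:.4f}")
```

In every draw, the unrestricted adjusted coefficient of determination should exceed the restricted one exactly when F exceeds one, whatever the seed.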
Jensen Comment
My question is how robust these results are to the order in which regressors are
added or deleted from the model. The model is not very robust if there are
ordering effects. My experience years ago was that ordering effects are a
problem.
David Giles Econometrics Beat Blog 
http://davegiles.blogspot.com/
Strategies to Avoid Data Collection Drudgery and Responsibilities for Errors in
the Data
Obsession With R-Squared
Drawing Inferences From Very Large Data-Sets
The Insignificance of Testing the Null
Zero Testing for Beta Error
Scientific Irreproducibility
Can You Really Test for Multicollinearity?
Models That Aren't Robust
Simpson's Paradox and Cross-Validation
Reverse Regression
David Giles' Top Five Econometrics Blog Postings for 2013
David Giles Blog
A Cautionary Bedtime Story
Gasp! How could an accountics scientist question such
things? This is sacrilege!
A Scrapbook on What's Wrong with the Past, Present and Future of Accountics
Science
574 Shields Against Validity Challenges in Plato's Cave 
http://www.trinity.edu/rjensen/TheoryTAR.htm
Real Science versus Pseudo Science 
http://www.cs.trinity.edu/~rjensen/temp/AccounticsDamn.htm#PseudoScience
How Accountics Scientists Should Change:
"Frankly, Scarlett, after I get a hit for my resume in The Accounting Review
I just don't give a damn"
http://www.cs.trinity.edu/~rjensen/temp/AccounticsDamn.htm
One more mission in what's left of my life will be to try to change this
http://www.cs.trinity.edu/~rjensen/temp/AccounticsDamn.htm
"How Non-Scientific Granulation Can Improve Scientific
Accountics"
http://www.cs.trinity.edu/~rjensen/temp/AccounticsGranulationCurrentDraft.pdf
Gaming for Tenure as an Accounting Professor 
http://www.trinity.edu/rjensen/TheoryTenure.htm
(with a reply about tenure publication point systems from Linda Kidwell)
Strategies to Avoid Data Collection
Drudgery and Responsibilities for Errors in the Data
In 2013 I scanned all six issues of The Accounting Review (TAR)
published in 2013 to detect what public databases were used (usually purchased
at relatively heavy fees for a system of databases) in the 72 articles published
January-November, 2013 in TAR. The outcomes were as follows:
Many of these 72 articles used more than one public
database, and when the Compustat and CRSP joint database was used I counted
one for the Compustat Database and one for the CRSP Database. Most of the
nonpublic databases are behavioral experiments using students as surrogates
for realworld decision makers.
My opinion is that 2013 was a typical year, in which over 92% of the articles
published in TAR used purchased public databases. The good news is that most
of these public databases are enormous, thereby allowing for huge samples. With
very large samples, however, even minuscule differences are statistically
significant in hypothesis testing, which makes statistical inference testing
superfluous.
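A small calculation shows why. The sketch below is only an illustration under stated assumptions: a two-sample z-test with known unit variance, equal group sizes, and an observed mean difference fixed at a trivial 0.01 standard deviations. The function name two_sided_p and the sample sizes are made up for the example.

```python
import math


def two_sided_p(diff_in_sd, n_per_group):
    """Two-sided p-value of a two-sample z-test.

    diff_in_sd is the observed difference in means, measured in standard
    deviations; each group has n_per_group observations with unit variance.
    """
    z = diff_in_sd / math.sqrt(2.0 / n_per_group)
    return math.erfc(abs(z) / math.sqrt(2.0))


# The same economically trivial difference at ever larger sample sizes.
for n in (1_000, 10_000, 100_000, 1_000_000):
    print(f"n per group = {n:>9,}: p = {two_sided_p(0.01, n):.3g}")
```

The difference never changes, yet the p-value shrinks steadily as n grows, so with database-sized samples "significance" says little about whether the difference matters.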
My theory is that accountics science gained
dominance in accounting research, especially in North American accounting Ph.D.
programs, because it abdicated responsibility:
1. Most accountics scientists buy data, thereby avoiding the greater cost
   and drudgery of collecting data.
2. By relying so heavily on purchased data, accountics scientists abdicate
   responsibility for errors in the data.
3. Since adding missing variable data to the public database is generally
   not at all practical in purchased databases, accountics scientists have
   an excuse for not collecting missing variable data.
4. Software packages for modeling and testing data abound. Accountics
   researchers need only feed purchased data into the hopper of statistical
   and mathematical analysis programs. It still takes a lot of knowledge to
   formulate hypotheses and to invent and understand complex models. But the
   really hard work of collecting data and error checking is avoided by
   purchasing data.
David Johnstone posted the
following message on the AECM Listserv on November 19, 2013:
An interesting aspect of all this is that there is
a widespread a priori or learned belief in empirical research that all and only
what you have to do to get meaningful results is to get data and run statistics
packages, and that the more advanced the stats the better. It's then just a
matter of turning the handle. Admittedly it takes a lot of effort to get very
proficient at this kind of work, but the presumption that it will naturally lead
to reliable knowledge is an act of faith, like a religious tenet. What needs to
be taken into account is that the human systems (markets, accounting reporting,
asset pricing etc.) are madly complicated and likely changing structurally
continuously. So even with the best intents and best methods, there is no
guarantee of reliable or lasting findings a priori, no matter what “rigor” has
gone in.
Part and parcel of the presumption that empirical
research methods are automatically “it” is the even stronger position that no
other type of work is research. I come across this a lot. I just had a 4th
year Hons student do his thesis, he was particularly involved in the
superannuation/pension fund industry, and he did a lot of good practical stuff,
thinking about risks that different fund allocations present, actuarial life
expectancies etc. The two young guys (late 20s) grading this thesis, both
excellent thinkers and not zealots about anything, both commented to me that the
thesis was weird and was not really a thesis like they would have assumed
necessary (electronic data bases with regressions etc.). They were still
generous in their grading, and the student did well, and it was only their
obvious astonishment that there is any kind of worthy work other than the
formulaic-empirical that astonished me. This represents a real narrowing of mind
in academe, almost like a tendency to dark age, and cannot be good for us long
term. In Australia the new push is for research “impact”, which seems to include
industry relevance, so that presents a hope for a cultural widening.
I have been doing some work with a lawyer-PhD
student on valuation in law cases/principles, and this has caused similar raised
eyebrows and genuine intrigue with young colleagues – they just have never heard
of such stuff, and only read the journals/specific papers that do what they do.
I can sense their interest, and almost envy of such freedom, as they are all
worrying about how to compete and make a long term career as an academic in the
new academic world.
This could also happen in
accountics science, but we'll probably never know! 
http://www.cs.trinity.edu/~rjensen/temp/AccounticsDamn.htm
"Statistical Flaw Punctuates Brain Research in Elite Journals," by
Gary Stix, Scientific American, March 27, 2014 
http://blogs.scientificamerican.com/talkingback/2014/03/27/statisticalflawpunctuatesbrainresearchinelitejournals/
Neuroscientists need a statistics refresher.
That is the message of a
new analysis in Nature Neuroscience that
shows that more than half of 314 articles on neuroscience in elite
journals during an 18-month period failed to take adequate measures to
ensure that statistically significant study results were not, in fact,
erroneous. Consequently, at least some of the results from papers in
journals like Nature, Science, Nature Neuroscience and Cell
were likely to be false positives, even after going through the arduous
peer-review gauntlet.
The problem of false positives appears to be rooted
in the growing sophistication of both the tools and observations made by
neuroscientists. The increasing complexity poses
a challenge to one of the fundamental assumptions made in statistical
testing, that each observation, perhaps of
an electrical signal from a particular neuron, has nothing to do with a
subsequent observation, such as another signal from that same neuron.
In fact, though, it is common in neuroscience
experiments—and in studies in other areas of biology—to produce readings
that are not independent of one another. Signals from the same neuron are
often more similar than signals from different neurons, and thus the data
points are said by statisticians to be clustered, or “nested.” To
accommodate the similarity among signals, the authors from VU University
Medical Center and other Dutch institutions suggest that a technique called
multilevel analysis is needed to take the clustering of data points into
account.
No adequate correction was made in any of the 53
percent of the 314 papers that contained clustered data when surveyed in
2012 and the first half of 2013. “We didn’t see any of the studies use the
correct multilevel analysis,” says Sophie van der Sluis, the lead
researcher. Seven percent of the studies did take steps to account for
clustering, but these methods were much less sensitive than multilevel
analysis in detecting actual biological effects. The researchers note that
some of the studies surveyed probably report false-positive results,
although they couldn’t extract enough information to quantify precisely how
many. Failure to statistically correct for the clustering in the data can
increase the probability of false-positive findings to as high as 80
percent—a risk of no more than 5 percent is normally deemed acceptable.
Jonathan D. Victor, a professor of neuroscience at
Weill Cornell Medical College had praise for the study, saying it “raises
consciousness about the pitfalls specific to a nested design and then
counsels you as to how to create a good nested design given limited
resources.”
Emery N. Brown, a professor of computational
neuroscience in the department of brain and cognitive sciences at the
MIT-Harvard Division of Health Sciences and Technology, points to a dire
need to bolster the level of statistical sophistication brought to bear in
neuroscience studies. “There’s a fundamental flaw in the system and the
fundamental flaw is basically that neuroscientists don’t know enough
statistics to do the right things and there’s not enough statisticians
working in neuroscience to help that.”
The issue of
reproducibility of research results has preoccupied the editors of many top
journals in recent years. The Nature
journals have instituted a checklist to help authors on reporting on the
methods used in their research, a list that inquires about whether the
statistical objectives for a particular study were met. (Scientific
American is part of the Nature Publishing Group.) The one clear message
from studies like that of van der Sluis and others is that the statistician
will take on an increasingly pivotal role as the field moves ahead in
deciphering ever more dense networks of neural signaling.
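The clustering problem described in the article above can be illustrated with a small simulation. This is a sketch under assumptions that are not taken from the Nature Neuroscience study: two groups with no true difference, 5 clusters per group, 20 readings per cluster, and an intraclass correlation of 0.5. It compares a naive t-test on pooled readings with a t-test on cluster means. The second test is a simple stand-in, not the multilevel analysis the authors recommend. The critical values 1.972 and 2.306 are the approximate two-sided 5 percent points of the t distribution with 198 and 8 degrees of freedom.

```python
import math
import random
import statistics


def simulate_group(rng, n_clusters, n_per_cluster, icc):
    """Return one group's data as a list of clusters; total variance is 1."""
    between_sd = math.sqrt(icc)
    within_sd = math.sqrt(1.0 - icc)
    clusters = []
    for _ in range(n_clusters):
        effect = rng.gauss(0.0, between_sd)  # shared by every reading in the cluster
        clusters.append([effect + rng.gauss(0.0, within_sd)
                         for _ in range(n_per_cluster)])
    return clusters


def t_stat(a, b):
    """Equal-variance two-sample t statistic."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * statistics.variance(a)
              + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    return (statistics.fmean(a) - statistics.fmean(b)) / math.sqrt(
        pooled * (1 / na + 1 / nb))


def false_positive_rates(reps=1000, n_clusters=5, n_per_cluster=20,
                         icc=0.5, seed=1):
    """Share of null simulations rejected at the nominal 5 percent level."""
    rng = random.Random(seed)
    naive_hits = means_hits = 0
    for _ in range(reps):
        g1 = simulate_group(rng, n_clusters, n_per_cluster, icc)
        g2 = simulate_group(rng, n_clusters, n_per_cluster, icc)
        # Naive: pool all readings and pretend they are independent.
        flat1 = [v for c in g1 for v in c]
        flat2 = [v for c in g2 for v in c]
        naive_hits += abs(t_stat(flat1, flat2)) > 1.972
        # Aggregated: one mean per cluster, so the units are independent.
        m1 = [statistics.fmean(c) for c in g1]
        m2 = [statistics.fmean(c) for c in g2]
        means_hits += abs(t_stat(m1, m2)) > 2.306
    return naive_hits / reps, means_hits / reps


naive_rate, means_rate = false_positive_rates()
print(f"naive: {naive_rate:.2f}, cluster means: {means_rate:.2f}")
```

Under these assumptions the naive test should reject far more often than the nominal 5 percent, while the cluster-means test should stay close to it.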
Jensen Comment
Accountics science differs from neuroscience in that reproducibility of research
results does not preoccupy research journal editors 
http://www.trinity.edu/rjensen/TheoryTAR.htm
Obsession With R-Squared
"Good Old R-Squared," by David Giles, Econometrics Beat: Dave
Giles’ Blog, University of Victoria, June 24, 2013 
http://davegiles.blogspot.com/2013/05/goodoldrsquared.html
My students are often horrified when I
tell them, truthfully, that one of the last pieces of information that I
look at when evaluating the results of an OLS regression, is the coefficient
of determination (R^{2}), or its "adjusted" counterpart.
Fortunately, it doesn't take long to change their perspective!
After all, we all know that with
time-series data, it's really easy to get a "high" R^{2} value,
because of the trend components in the data. With cross-section data, really
low R^{2} values are really common. For most of us, the signs,
magnitudes, and significance of the estimated parameters are of primary
interest. Then we worry about testing the assumptions underlying our
analysis. R^{2} is at the bottom of the list of priorities.
Continued in article
Also see
http://davegiles.blogspot.com/2013/07/theadjustedrsquaredagain.html
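A minimal sketch of the point about trends, using simulated data that are not from the Giles post. The helper name r_squared, the sample size, the noise level, and the random seed are all assumptions made for illustration. Two series that share nothing except a time trend produce a "high" R^{2}, while two unrelated trendless variables produce a very low one.

```python
import random


def r_squared(x, y):
    """R^2 from a simple OLS regression of y on x with an intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy ** 2 / (sxx * syy)


rng = random.Random(42)  # fixed seed so the sketch is repeatable
n = 200                  # hypothetical sample size

# Two series whose noise is independent; only the time trend is shared.
trend_x = [1.0 * t + rng.gauss(0, 5) for t in range(n)]
trend_y = [2.0 * t + rng.gauss(0, 5) for t in range(n)]

# Two unrelated "cross-section" variables with no trend at all.
cross_x = [rng.gauss(0, 5) for _ in range(n)]
cross_y = [rng.gauss(0, 5) for _ in range(n)]

r2_trend = r_squared(trend_x, trend_y)
r2_cross = r_squared(cross_x, cross_y)
print(f"R^2, trending series:  {r2_trend:.3f}")
print(f"R^2, unrelated series: {r2_cross:.3f}")
```

Neither number says anything about whether the estimated coefficients are sensible, which is the reason Giles puts R^{2} at the bottom of his list.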
Drawing Inferences From Very Large Data-Sets
David Johnstone wrote the following:
Indeed if you hold H_{0} the same and keep
changing the model, you will eventually (generally soon) get a significant
result, allowing “rejection of H_{0} at 5%”, not because H_{0} is
necessarily false but because you have built upon a false model (of which
there are zillions, obviously).
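Johnstone's point can be given a rough order of magnitude. The sketch below assumes, purely for illustration, that each alternative model yields an independent test of a true H_{0} at the 5% level. Real specification searches reuse the same data, so the tests are correlated and these figures are only indicative. The function name is hypothetical.

```python
def prob_at_least_one_rejection(k, alpha=0.05):
    """P(at least one of k independent tests rejects a TRUE H0 at level alpha)."""
    return 1.0 - (1.0 - alpha) ** k


for k in (1, 5, 14, 50):
    p = prob_at_least_one_rejection(k)
    print(f"{k:3d} models tried -> P(some 'significant' result) = {p:.2f}")
```

Under that independence assumption, about 14 models are enough to make a spurious "rejection at 5%" more likely than not.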
"Drawing Inferences From Very Large Data-Sets," by David Giles, Econometrics
Beat: Dave Giles’ Blog, University of Victoria, April 26, 2011
http://davegiles.blogspot.ca/2011/04/drawinginferencesfromverylargedata.html
. . .
Granger (1998; 2003) has
reminded us that if the sample size is sufficiently large, then it's
virtually impossible not to reject almost any hypothesis.
So, if the sample is very large and the p-values associated with
the estimated coefficients in a regression model are of the order of, say,
0.10 or even 0.05, then this is really bad news. Much,
much, smaller p-values are needed before we get all excited about
'statistically significant' results when the sample size is in the
thousands, or even bigger. So, the p-values reported above are
mostly pretty marginal, as far as significance is concerned. When you work
out the p-values for the other 6 models I mentioned, they range
from 0.005 to 0.460. I've been generous in the models I selected.
Here's another set of results taken from a second, really nice, paper by
Ciecierski et al. (2011) in the same issue of
Health Economics:
Continued in article
Jensen Comment
My research suggests that over 90% of the recent papers published in TAR use
purchased databases that provide enormous sample sizes in those papers. Their
accountics science authors keep reporting those meaningless levels of
statistical significance.
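A small sketch of why significance levels lose meaning in very large samples. It holds a substantively negligible effect fixed and lets only the sample size grow. The effect size, the standard deviation, and the function name are hypothetical values chosen for illustration; the test is a textbook two-sided z-test with known variance.

```python
from math import erfc, sqrt


def two_sided_p_value(effect, sigma, n):
    """Two-sided z-test p-value for H0: mean = 0 when the sample mean equals `effect`."""
    z = abs(effect) / (sigma / sqrt(n))
    # For a standard normal, 2 * (1 - Phi(z)) equals erfc(z / sqrt(2)).
    return erfc(z / sqrt(2.0))


effect = 0.01  # hypothetical, substantively negligible difference
sigma = 1.0    # hypothetical known standard deviation

for n in (100, 10_000, 100_000, 1_000_000):
    print(f"n = {n:>9,}: p-value = {two_sided_p_value(effect, sigma, n):.3g}")
```

The effect never changes, yet the p-value moves from nowhere near significance to far below any conventional cutoff. The asterisks are measuring the size of the database rather than the size of the effect.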
What is even worse is when meaningless statistical significance tests are
used to support decisions.
"Statistical Significance - Again," by David Giles, Econometrics
Beat: Dave Giles’ Blog, University of Victoria, December 28, 2013
http://davegiles.blogspot.com/2013/12/statisticalsignificanceagain.html
Statistical Significance - Again
With all of this emphasis
on "Big Data", I was pleased to see
this post on the Big Data
Econometrics blog, today.
When you have a sample that runs
to the thousands (billions?), the conventional significance
levels of 10%, 5%, 1% are completely inappropriate. You need to
be thinking in terms of tiny significance levels.
I discussed this in some
detail back in April of 2011, in a post titled, "Drawing
Inferences From Very Large Data-Sets".
If you're one of those (many) applied
researchers who uses large cross-sections of data, and then
sprinkles the results tables with asterisks to signal
"significance" at the 5%, 10% levels, etc., then I urge
you to read that earlier post.
It's sad to encounter so many
papers and seminar presentations in which the results, in
reality, are totally insignificant!
The Cult of Statistical Significance: How Standard Error Costs Us Jobs,
Justice, and Lives, by Stephen T. Ziliak and Deirdre N. McCloskey (Ann
Arbor: University of Michigan Press, ISBN-13: 978-0-472-05007-9, 2007)
http://www.cs.trinity.edu/~rjensen/temp/DeirdreMcCloskey/StatisticalSignificance01.htm
Page 206
Like scientists today in medical and economic and
other sizeless sciences, Pearson mistook a large sample size for the definite,
substantive significance - evidence, as Hayek put it, of "wholes." But it was as
Hayek said "just an illusion." Pearson's columns of sparkling asterisks, though
quantitative in appearance and as appealing as is the simple truth of the sky,
signified nothing.
pp. 250-251
The textbooks are wrong. The teaching is wrong. The
seminar you just attended is wrong. The most prestigious journal in your
scientific field is wrong.
You are searching, we know,
for ways to avoid being wrong. Science, as Jeffreys said, is mainly a series of
approximations to discovering the sources of error. Science is a systematic way
of reducing wrongs or can be. Perhaps you feel frustrated by the random
epistemology of the mainstream and don't know what to do. Perhaps you've been
sedated by significance and lulled into silence. Perhaps you sense that the
power of a Rothamsted test against a plausible Dublin alternative is
statistically speaking low but you feel oppressed by the instrumental variable
one should dare not to wield. Perhaps you feel frazzled by what Morris Altman
(2004) called the "social psychology rhetoric of fear," the deeply embedded path
dependency that keeps the abuse of significance in circulation. You want to come
out of it. But perhaps you are cowed by the prestige of Fisherian dogma. Or,
worse thought, perhaps you are cynically willing to be corrupted if it will keep
a nice job.
Bob Jensen's threads on the way analysts, particularly accountics
scientists, often cheer for the statistical significance of large-sample
outcomes even when the results are substantively insignificant, such as R^{2}
values of .0001
The Cult of Statistical Significance: How Standard Error Costs Us Jobs, Justice,
and Lives 
http://www.cs.trinity.edu/~rjensen/temp/DeirdreMcCloskey/StatisticalSignificance01.htm
The Insignificance of
Testing the Null
"Statistics: reasoning on uncertainty, and
the insignificance of testing null," by Esa Läärä
Ann. Zool. Fennici 46: 138–157
ISSN 0003-455X (print), ISSN 1797-2450 (online)
Helsinki 30 April 2009 © Finnish Zoological and Botanical Publishing Board 2009
http://www.sekj.org/PDF/anz46free/anz46138.pdf
The practice of statistical
analysis and inference in ecology is critically reviewed. The dominant doctrine
of null hypothesis significance testing (NHST) continues to be applied
ritualistically and mindlessly. This dogma is based on superficial understanding
of elementary notions of frequentist statistics in the 1930s, and is widely
disseminated by influential textbooks targeted at biologists. It is
characterized by silly null hypotheses and mechanical dichotomous division of
results being “significant” (P < 0.05) or not. Simple examples are given to
demonstrate how distant the prevalent NHST malpractice is from the current
mainstream practice of professional statisticians. Masses of trivial and
meaningless “results” are being reported, which are not providing adequate
quantitative information of scientific interest. The NHST dogma also retards
progress in the understanding of ecological systems and the effects of
management programmes, which may at worst contribute to damaging decisions in
conservation biology. In the beginning of this millennium, critical discussion
and debate on the problems and shortcomings of NHST has intensified in
ecological journals. Alternative approaches, like basic point and interval
estimation of effect sizes, likelihood-based and information theoretic methods,
and the Bayesian inferential paradigm, have started to receive attention. Much
is still to be done in efforts to improve statistical thinking and reasoning of
ecologists and in training them to utilize appropriately the expanded
statistical toolbox. Ecologists should finally abandon the false doctrines and
textbooks of their previous statistical gurus. Instead they should more
carefully learn what leading statisticians write and say, collaborate with
statisticians in teaching, research, and editorial work in journals.
Jensen Comment
And to think Alpha (Type 1) error is the easy part. Does anybody ever test for
the more important Beta (Type 2) error? I think some engineers test for Type 2
error with Operating Characteristic (OC) curves, but these are generally applied
where controlled experiments are super controlled such as in quality control
testing.
Jensen Comment
Beta Error 
http://en.wikipedia.org/wiki/Beta_error#Type_II_error
I've never seen an accountics science study
published anywhere that tested for Beta Error.
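For readers who have not seen a Beta (Type 2) error calculation, here is a minimal sketch for the simplest textbook case: a one-sided z-test of H_{0}: mean = 0 with known standard deviation. The sample size, the standard deviation, the candidate true effects, and the function name are hypothetical. Listing beta against the true effect gives the points of an Operating Characteristic (OC) curve.

```python
from math import sqrt
from statistics import NormalDist


def type_ii_error(true_effect, sigma, n, alpha=0.05):
    """Beta for a one-sided z-test of H0: mean = 0 when the true mean is `true_effect`."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1.0 - alpha)   # rejection cutoff under H0
    shift = true_effect / (sigma / sqrt(n))    # how far the alternative moves the z statistic
    return std_normal.cdf(z_crit - shift)      # P(fail to reject | alternative is true)


sigma, n = 1.0, 25  # hypothetical standard deviation and sample size
for true_effect in (0.0, 0.1, 0.3, 0.5, 0.8):
    beta = type_ii_error(true_effect, sigma, n)
    print(f"true effect {true_effect:.1f}: beta = {beta:.3f}, power = {1 - beta:.3f}")
```

The calculation requires a stated alternative (the true effect), which is exactly what a test of the null alone never forces a researcher to write down.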
Scientific Irreproducibility (Frequentists Versus
Bayesians)
"Weak statistical standards implicated in scientific irreproducibility:
One-quarter of studies that meet commonly used statistical cutoff may be false,"
by Erika Check Hayden, Nature, November 11, 2013 
http://www.nature.com/news/weakstatisticalstandardsimplicatedinscientificirreproducibility1.14131
The
plague of nonreproducibility in science may be
mostly due to scientists’ use of weak statistical tests, as shown by an
innovative method developed by statistician Valen Johnson, at Texas A&M
University in College Station.
Johnson compared the strength of two types of
tests: frequentist tests, which measure how unlikely a finding is to occur
by chance, and Bayesian tests, which measure the likelihood that a
particular hypothesis is correct given data collected in the study. The
strength of the results given by these two types of tests had not been
compared before, because they ask slightly different types of questions.
So Johnson developed a method that makes the
results given by the tests — the P value in the frequentist paradigm,
and the Bayes factor in the Bayesian paradigm — directly comparable. Unlike
frequentist tests, which use objective calculations to reject a null
hypothesis, Bayesian tests require the tester to define an alternative
hypothesis to be tested — a subjective process. But Johnson developed a
'uniformly most powerful' Bayesian test that defines the alternative
hypothesis in a standard way, so that it “maximizes the probability that the
Bayes factor in favor of the alternate hypothesis exceeds a specified
threshold,” he writes in his paper. This threshold can be chosen so that
Bayesian tests and frequentist tests will both reject the null hypothesis
for the same test results.
Johnson then used these uniformly most powerful
tests to compare P values to Bayes factors. When he did so, he found
that a P value of 0.05 or less — commonly considered evidence in
support of a hypothesis in fields such as social science, in which
nonreproducibility has become a serious issue —
corresponds to Bayes factors of between 3 and 5, which are considered weak
evidence to support a finding.
False positives
Indeed, as many as 17–25% of such findings are
probably false, Johnson calculates.
He advocates for scientists to use more stringent P values of 0.005
or less to support their findings, and thinks that the use of the 0.05
standard might account for most of the problem of nonreproducibility in
science — even more than other issues, such as biases and scientific
misconduct.
“Very few studies that fail to replicate are based
on P values of 0.005 or smaller,” Johnson says.
Some other mathematicians said that though there
have been many calls for researchers to use more stringent tests,
the new paper makes an important contribution by laying bare exactly how lax
the 0.05 standard is.
“It shows once more that standards of evidence that
are in common use throughout the empirical sciences are dangerously
lenient,” says mathematical psychologist Eric-Jan Wagenmakers of the
University of Amsterdam. “Previous arguments centered on ‘P-hacking’,
that is, abusing standard statistical procedures to obtain the desired
results. The Johnson paper shows that there is something wrong with the P
value itself.”
Other researchers, though, said it would be
difficult to change the mindset of scientists who have become wedded to the
0.05 cutoff. One implication of the work, for instance, is that studies will
have to include more subjects to reach these more stringent cutoffs, which
will require more time and money.
“The family of Bayesian methods has been well
developed over many decades now, but somehow we are stuck to using
frequentist approaches,” says physician John Ioannidis of Stanford
University in California, who studies the causes of nonreproducibility. “I
hope this paper has better luck in changing the world.”
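The correspondence between P values and Bayes factors reported in the Nature article above can be sketched for one simple case. The formula used here is an assumption of this sketch, not something stated in the article: for a one-sided z-test with known variance, the point alternative that maximizes the probability of the Bayes factor exceeding a threshold gamma yields the same rejection region as a size-alpha test when gamma = exp(z_alpha^2 / 2). The function names are hypothetical.

```python
from math import exp, log, sqrt
from statistics import NormalDist


def bayes_factor_threshold(alpha):
    """Threshold gamma whose one-sided z-test rejection region matches a size-alpha test."""
    z_alpha = NormalDist().inv_cdf(1.0 - alpha)
    return exp(z_alpha ** 2 / 2.0)


def alpha_for_threshold(gamma):
    """Inverse mapping: the one-sided test size that matches evidence threshold gamma."""
    z = sqrt(2.0 * log(gamma))
    return 1.0 - NormalDist().cdf(z)


for alpha in (0.05, 0.01, 0.005, 0.001):
    print(f"alpha = {alpha:<6}: matching Bayes factor threshold = "
          f"{bayes_factor_threshold(alpha):.1f}")
```

Under this assumed mapping, a P value of 0.05 lands inside the 3-to-5 range the article describes as weak evidence, while 0.005 corresponds to a threshold in the high twenties.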
Accountics Scientists are More Interested in
Their Tractors Than Their Harvests 
http://www.trinity.edu/rjensen/TheoryTAR.htm
Can You Really Test for
Multicollinearity?
Unlike real scientists, accountics scientists seldom replicate published
accountics science research by the exacting standards of real science
http://www.trinity.edu/rjensen/TheoryTAR.htm#Replication
Multicollinearity 
http://en.wikipedia.org/wiki/Multicollinearity
"Can You Actually TEST for Multicollinearity?" by David Giles, Econometrics
Beat: Dave Giles’ Blog, University of Victoria, June 24, 2013 
http://davegiles.blogspot.com/2013/06/canyouactuallytestfor.html
. . .
Now, let's
return to the "problem" of multicollinearity.
What do we mean by
this term, anyway? This turns out to be the key question!
Multicollinearity
is a phenomenon associated with our particular sample of data
when we're trying to estimate a regression model. Essentially, it's a
situation where there is insufficient information in the sample of
data to enable us to draw "reliable" inferences about
the individual parameters of the underlying (population) model.
I'll be elaborating more on the "informational content" aspect of this
phenomenon in a follow-up post. Yes, there are various sample measures
that we can compute and report, to help us gauge how severe this data
"problem" may be. But they're not statistical tests, in any sense
of the word.
Because multicollinearity is a characteristic of the sample, and
not a characteristic of the population, you should immediately be
suspicious when someone starts talking about "testing for
multicollinearity". Right?
Apparently not everyone gets it!
There's an old paper by Farrar and Glauber (1967) which, on the face of
it might seem to take a different stance. In fact, if you were around
when this paper was published (or if you've bothered to actually read it
carefully), you'll know that this paper makes two contributions. First,
it provides a very sensible discussion of what multicollinearity is all
about. Second, the authors take some well known results from the
statistics literature (notably, by Wishart, 1928; Wilks, 1932; and
Bartlett, 1950) and use them to give "tests" of the hypothesis that the
regressor matrix, X, is orthogonal.
How can this be? Well, there's a simple explanation if you read the
Farrar and Glauber paper carefully, and note what assumptions are made
when they "borrow" the old statistics results. Specifically, there's an
explicit (and necessary) assumption that in the population the X
matrix is random, and that it follows a multivariate normal
distribution.
This assumption is, of course totally at odds with what is usually
assumed in the linear regression model! The "tests" that Farrar and
Glauber gave us aren't really tests of multicollinearity in the
sample. Unfortunately, this point wasn't fully appreciated by
everyone.
There are some sound suggestions in this paper, including looking at the
sample multiple correlations between each regressor, and all of
the other regressors. These, and other sample measures such as
variance inflation factors, are useful from a diagnostic viewpoint, but
they don't constitute tests of "zero multicollinearity".
So, why am I even mentioning the Farrar and Glauber paper now?
Well, I was intrigued to come across some STATA code (Shehata, 2012)
that allows one to implement the Farrar and Glauber "tests". I'm not
sure that this is really very helpful. Indeed, this seems to me to be a
great example of applying someone's results without understanding
(bothering to read?) the assumptions on which they're based!
Be careful out there - and be highly suspicious of strangers bearing
gifts!
References
Shehata, E. A. E., 2012. FGTEST: Stata module to compute Farrar-Glauber
Multicollinearity Chi2, F, t tests.
Wilks, S. S., 1932. Certain generalizations in the analysis of
variance. Biometrika, 24, 477-494.
Wishart, J., 1928. The generalized product moment distribution
in samples from a multivariate normal population. Biometrika,
20A, 32-52.
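A minimal sketch of one of the sample measures Giles mentions, the variance inflation factor (VIF), for the two-regressor case where it reduces to 1 / (1 - r^2). The data and function names are hypothetical. In keeping with the post above, the number is a descriptive diagnostic of this particular sample and not a statistical test of anything.

```python
def pearson_r(x, y):
    """Sample correlation between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def vif_two_regressors(x1, x2):
    """VIF shared by both regressors in a two-regressor model: 1 / (1 - r^2)."""
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r ** 2)


x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2_nearly_collinear = [1.1, 1.9, 3.2, 3.9, 5.1, 5.8]  # tracks x1 closely
x2_uncorrelated = [2.0, 0.0, 1.0, 1.0, 0.0, 2.0]      # zero sample correlation with x1

vif_high = vif_two_regressors(x1, x2_nearly_collinear)
vif_low = vif_two_regressors(x1, x2_uncorrelated)
print(f"VIF, nearly collinear regressors: {vif_high:.1f}")
print(f"VIF, uncorrelated regressors:     {vif_low:.1f}")
```

A different sample from the same population could give a very different VIF, which is why the figure describes the data in hand rather than the underlying model.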
Multicollinearity 
http://en.wikipedia.org/wiki/Multicollinearity
Singular Matrix 
http://en.wikipedia.org/wiki/Invertible_matrix#singular
"Least Squares, Perfect Multicollinearity, & Estimable Function," by
David Giles, Econometrics Blog, September 19, 2014 
http://davegiles.blogspot.com/2014/09/leastsquaresperfectmulticollinearity.html
. . .
The best way to think about
multicollinearity in a regression setting is that it reflects a shortage
of information. Sometimes additional information can be obtained via additional
data. Sometimes we can "inject" additional information into the problem
by means of exact or stochastic restrictions on the parameters. (The
latter is how the problem is avoided in a Bayesian setting.) Sometimes,
we can't do either of these things.
Here, I'll focus on the most extreme case
possible - one where we have "perfect multicollinearity". That's the
case where X has less than full rank, so that (X'X) doesn't have a
regular inverse. It's the situation outlined above.
For the least squares estimator, b, to be
defined, we need to be able to solve the normal equation, (1). What
we're interested in, of course, is a solution for every element
of the b vector. This is simply not achievable in the case of perfect
multicollinearity. There's not enough information in the sample for us
to be able to uniquely identify and estimate every individual regression
coefficient. However, we should be able to identify and estimate certain
linear combinations of those coefficients. These combinations are
usually referred to as "estimable functions" of the parameters.
Continued in article
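A tiny numerical sketch of the perfect multicollinearity case described above, using made-up integer data with no noise and no intercept. The second regressor is exactly twice the first, so X'X is singular and the individual coefficients cannot be recovered, but the linear combination b1 + 2*b2 can. The variable names and the data are assumptions for illustration.

```python
x1 = [1, 2, 3, 4]
x2 = [2 * v for v in x1]                     # exact linear dependence: x2 = 2 * x1
y = [1 * a + 3 * b for a, b in zip(x1, x2)]  # generated with b1 = 1, b2 = 3

# Elements of X'X for the two-regressor model without an intercept.
s11 = sum(a * a for a in x1)
s12 = sum(a * b for a, b in zip(x1, x2))
s22 = sum(b * b for b in x2)
det_xtx = s11 * s22 - s12 * s12
print("det(X'X) =", det_xtx)  # zero, so (X'X) has no regular inverse


def fitted(b1, b2):
    """Fitted values for a given coefficient pair."""
    return [b1 * a + b2 * b for a, b in zip(x1, x2)]


# Very different coefficient pairs fit the data equally well ...
print(fitted(1, 3) == y, fitted(7, 0) == y, fitted(3, 2) == y)

# ... but every one of them has the same value of b1 + 2*b2, which can be
# estimated by regressing y on x1 alone.
estimable_combo = sum(a * c for a, c in zip(x1, y)) / s11
print("estimate of b1 + 2*b2 =", estimable_combo)
```

The sample simply carries no information that would separate b1 from b2, only information about their combination.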
It's relatively uncommon for accountics scientists to criticize each other's
published works. A notable exception is as follows:
"Selection Models in Accounting Research," by Clive S. Lennox, Jere R.
Francis, and Zitian Wang, The Accounting Review, March 2012, Vol. 87,
No. 2, pp. 589-616.
This study explains the challenges associated with
the Heckman (1979) procedure to control for selection bias, assesses the
quality of its application in accounting research, and offers guidance for
better implementation of selection models. A survey of 75 recent accounting
articles in leading journals reveals that many researchers implement the
technique in a mechanical way with relatively little appreciation of
important econometric issues and problems surrounding its use. Using
empirical examples motivated by prior research, we illustrate that selection
models are fragile and can yield quite literally any possible outcome in
response to fairly minor changes in model specification. We conclude with
guidance on how researchers can better implement selection models that will
provide more convincing evidence on potential selection bias, including the
need to justify model specifications and careful sensitivity analyses with
respect to robustness and
multicollinearity.
. . .
CONCLUSIONS
Our review of the accounting literature indicates
that some studies have implemented the selection model in a questionable
manner. Accounting researchers often impose ad hoc exclusion restrictions or
no exclusion restrictions whatsoever. Using empirical examples and a
replication of a published study, we demonstrate that such practices can
yield results that are too fragile to be considered reliable. In our
empirical examples, a researcher could obtain quite literally any outcome by
making relatively minor and apparently innocuous changes to the set of
exclusionary variables, including choosing a null set. One set of exclusion
restrictions would lead the researcher to conclude that selection bias is a
significant problem, while an alternative set involving rather minor changes
would give the opposite conclusion. Thus, claims about the existence and
direction of selection bias can be sensitive to the researcher's set of
exclusion restrictions.
Our examples also illustrate that the selection
model is vulnerable to high levels of multicollinearity, which can
exacerbate the bias that arises when a model is misspecified (Thursby 1988).
Moreover, the potential for misspecification is high in the selection model
because inferences about the existence and direction of selection bias
depend entirely on the researcher's assumptions about the appropriate
functional form and exclusion restrictions. In addition, high
multicollinearity means that the statistical insignificance of the inverse
Mills' ratio is not a reliable guide as to the absence of selection bias.
Even when the inverse Mills' ratio is statistically insignificant,
inferences from the selection model can be different from those obtained
without the inverse Mills' ratio. In this situation, the selection model
indicates that it is legitimate to omit the inverse Mills' ratio, and yet,
omitting the inverse Mills' ratio gives different inferences for the
treatment variable because multicollinearity is then much lower.
In short, researchers are faced with the following
tradeoff. On the one hand, selection models can be fragile and suffer from
multicollinearity problems, which hinder their reliability. On the other
hand, the selection model potentially provides more reliable inferences by
controlling for endogeneity bias if the researcher can find good exclusion
restrictions, and if the models are found to be robust to minor
specification changes. The importance of these advantages and disadvantages
depends on the specific empirical setting, so it would be inappropriate for
us to make a general statement about when the selection model should be
used. Instead, researchers need to critically appraise the quality of their
exclusion restrictions and assess whether there are problems of fragility
and multicollinearity in their specific empirical setting that might limit
the effectiveness of selection models relative to OLS.
Another way to control for unobservable factors
that are correlated with the endogenous regressor (D) is to use panel data.
Though it may be true that many unobservable factors impact the choice of D,
as long as those unobservable characteristics remain constant during the
period of study, they can be controlled for using a fixed effects research
design. In this case, panel data tests that control for unobserved
differences between the treatment group (D = 1) and the control group (D =
0) will eliminate the potential bias caused by endogeneity as long as the
unobserved source of the endogeneity is time-invariant (e.g., Baltagi 1995;
Meyer 1995; Bertrand et al. 2004). The advantages of such a
difference-in-differences research design are well recognized by accounting
researchers (e.g., Altamuro et al. 2005; Desai et al. 2006; Hail and Leuz
2009; Hanlon et al. 2008). As a caveat, however, we note that the
time-invariance of unobservables is a strong assumption that cannot be
empirically validated. Moreover, the standard errors in such panel data
tests need to be corrected for serial correlation because otherwise there is
a danger of over-rejecting the null hypothesis that D has no effect on Y
(Bertrand et al. 2004).
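As an illustration that is not taken from the Lennox, Francis, and Wang paper, the arithmetic of the design described in the paragraph above can be sketched with made-up numbers. In this hypothetical setup the treated firms carry a permanent unobserved advantage of +5, both groups share a time shock of +2, and the true treatment effect is +3. The function name and all figures are assumptions.

```python
from statistics import mean


def diff_in_diff(treat_pre, treat_post, ctrl_pre, ctrl_post):
    """Change in the treatment group minus change in the control group."""
    return (mean(treat_post) - mean(treat_pre)) - (mean(ctrl_post) - mean(ctrl_pre))


# Hypothetical outcomes Y for three control firms (D = 0) and three treated firms (D = 1).
ctrl_pre = [10.0, 11.0, 9.0]
ctrl_post = [12.0, 13.0, 11.0]    # common time shock of +2
treat_pre = [15.0, 16.0, 14.0]    # permanent unobserved advantage of +5
treat_post = [20.0, 21.0, 19.0]   # +2 time shock and +3 treatment effect

naive_estimate = mean(treat_post) - mean(ctrl_post)
did_estimate = diff_in_diff(treat_pre, treat_post, ctrl_pre, ctrl_post)
print("naive post-period comparison:", naive_estimate)
print("difference-in-differences:  ", did_estimate)
```

The differencing removes the +5 advantage only because it is constant over time in this example, which is the time-invariance assumption the authors flag as strong and untestable.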
Finally, we note that there is a recent trend in
the accounting literature to use samples that are matched based on their
propensity scores (e.g., Armstrong et al. 2010; Lawrence et al. 2011). An
advantage of propensity score matching (PSM) is that there is no MILLS
variable and so the researcher is not required to find valid Z variables
(Heckman et al. 1997; Heckman and Navarro-Lozano 2004). However, such
matching has two important limitations. First, selection is assumed to occur
only on observable characteristics. That is, the error term in the first
stage model is correlated with the independent variables in the second stage
(i.e., u is correlated with X and/or Z), but there is no selection on
unobservables (i.e., u and υ are uncorrelated). In contrast, the purpose of
the selection model is to control for endogeneity that arises from
unobservables (i.e., the correlation between u and υ). Therefore, propensity
score matching should not be viewed as a replacement for the selection model
(Tucker 2010).
A second limitation arises if the treatment
variable affects the company's matching attributes. For example, suppose
that a company's choice of auditor affects its subsequent ability to raise
external capital. This would mean that companies with higher quality
auditors would grow faster. Suppose also that the company's characteristics
at the time the auditor is first chosen cannot be observed. Instead, we
match at some stacked calendar time where some companies have been using the
same auditor for 20 years and others for not very long. Then, if we matched
on company size, we would be throwing out the companies that have become
large because they have benefited from high-quality audits. Such companies
do not look like suitable “matches,” insofar as they are much larger than
the companies in the control group that have low-quality auditors. In this
situation, propensity matching could bias toward a nonresult because the
treatment variable (auditor choice) affects the company's matching
attributes (e.g., its size). It is beyond the scope of this study to provide
a more thorough assessment of the advantages and disadvantages of propensity
score matching in accounting applications, so we leave this important issue
to future research.
A second indicator is our journals.
They have proliferated in number. But we struggle with an intertemporal
sameness, with incremental as opposed to discontinuous attempts to move our
thinking forward, and with referee intrusion and voyeurism. Value relevance is a
currently fashionable approach to identifying statistical regularities in the
financial market arena, just as a focus on readily observable components of
compensation is a currently fashionable dependent variable in the compensation
arena. Yet we know measurement error abounds, that other sources of information
are both present and hardly unimportant, that compensation is broad-based
and intertemporally managed, and that compensating wage differentials are part
of the stew. Yet we continue on the comfortable path of sameness.
Joel Demski, AAA President's Message, Accounting Education News, Fall 2001
http://aaahq.org/pubs/AEN/2001/Fall2001.pdf
Models That aren't Robust
Robust Statistics 
http://en.wikipedia.org/wiki/Robust_statistics
Robust statistics are statistics with good
performance for data drawn from a wide range of probability distributions,
especially for distributions that are not normally distributed. Robust
statistical methods have been developed for many common problems, such as
estimating location, scale and regression parameters. One motivation is to
produce statistical methods that are not unduly affected by outliers.
Another motivation is to provide methods with good performance when there
are small departures from parametric distributions. For example, robust
methods work well for mixtures of two normal distributions with different
standard deviations, for example, one and three; under this model,
non-robust methods like a t-test work badly.
Continued in article
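A minimal sketch of what "not unduly affected by outliers" means in the Wikipedia passage above, comparing a non-robust location estimate (the mean) with a robust one (the median). The data are hypothetical.

```python
from statistics import mean, median

clean = [9.0, 10.0, 10.0, 11.0, 10.0]
contaminated = clean + [1000.0]  # one wild observation, e.g. a data-entry error

print("mean:  ", mean(clean), "->", mean(contaminated))
print("median:", median(clean), "->", median(contaminated))
```

A single bad observation drags the mean from 10 to 175 while the median stays at 10. The same logic carries over to regression coefficients estimated from databases that nobody has cleaned.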
The phrase "Plato's Cave" is most often used to distinguish assumed (shadow)
worlds that differ in usually important ways from the real world, such as when
economists assume steady-state conditions, equilibrium conditions,
corporate utility functions, etc.
The Gaussian Copula function blamed for the collapse of the economy in 2007
is an example of a derivation in Plato's Cave that was made operational
inappropriately by Wall Street Investment Banks:
"In Plato's Cave:
Mathematical models are a powerful way of predicting financial markets. But
they are fallible," The Economist, January 24, 2009, pp. 10-14
http://www.trinity.edu/rjensen/2008Bailout.htm#Bailout
Conceivably a subset of Wall Street analysts make up a subset of "alumni" in
Plato's Cave. But they are joined by the many more quants in all disciplines who
do analytics and empirical research in the realm of assumed worlds that differ
from reality in possibly serious ways.
Game Theory Model Solutions Are Rarely Robust
Nash Equilibrium 
http://en.wikipedia.org/wiki/Nash_equilibrium
Question
Why do game theory model solutions like Nash Equilibrium fail so often in the
real world?
"They Finally Tested The 'Prisoner's Dilemma' On Actual Prisoners — And
The Results Were Not What You Would Expect," by Max Nisen, Business
Insider, July 13, 2013 
http://www.businessinsider.com/prisonersdilemmainreallife20137
The "prisoner's dilemma" is a familiar concept to just
about everyone who took Econ 101. The basic
version goes like this: Two criminals are arrested, but police can't convict
either on the primary charge, so they plan to sentence them to a year in
jail on a lesser charge. Each of the prisoners, who can't communicate with
each other, are given the option of testifying against their partner. If
they testify, and their partner remains silent, the partner gets three years
and they go free. If they both testify, both get two. If both remain silent,
they each get one.
In game theory, betraying your partner, or
"defecting" is always the dominant strategy as it always has a slightly
higher payoff in a simultaneous game. It's what's known as a "Nash
Equilibrium," after Nobel Prize winning mathematician and "A
Beautiful Mind" subject John Nash.
In sequential
games, where players know each other's previous behavior and have the
opportunity to punish each other, defection is the dominant strategy as
well.
However, on an overall basis, the best outcome for
both players is mutual cooperation.
Yet no one's ever
actually run the experiment on real prisoners before, until
two University of Hamburg economists
tried it out in a recent study comparing the behavior of inmates and
students.
Surprisingly, for
the classic version of the game, prisoners were far more cooperative than
expected.
Menusch Khadjavi and Andreas Lange put
the famous game to the test for the first time
ever, putting a group of prisoners in Lower Saxony's primary women's prison,
as well as students, through both simultaneous and sequential versions of
the game.
The payoffs
obviously weren't years off sentences, but euros for students, and the
equivalent value in coffee or cigarettes for prisoners.
They expected, building off of game theory and
behavioral economic research that show humans are more cooperative than the
purely rational model that economists traditionally use, that there would be
a fair amount of first-mover cooperation, even in the simultaneous
simulation where there's no way to react to the other player's decisions.
And even in the sequential game, where you get a
higher payoff for betraying a cooperative first mover, a fair amount will
still reciprocate.
As for the difference between student and prisoner
behavior, you'd expect that a prison population might be more jaded and
distrustful, and therefore more likely to defect.
The results went exactly the other way for the
simultaneous game: only 37% of students cooperate. Inmates cooperated 56% of
the time.
On a pair basis, only 13% of student pairs managed
to get the best mutual outcome and cooperate, whereas 30% of prisoners do.
In the sequential
game, far more students (63%) cooperate, so the mutual cooperation rate
skyrockets to 39%. For prisoners, it remains about the same.
What's interesting
is that the simultaneous game requires far more blind trust from both
parties, and you don't have a chance to retaliate or make up for being
betrayed later. Yet prisoners are still significantly more cooperative in
that scenario.
Obviously the
payoffs aren't as serious as a year or three of your life, but the paper
still demonstrates that prisoners aren't necessarily as calculating,
self-interested, and untrusting as you might expect, and as behavioral
economists have argued for years, as mathematically interesting as Nash
equilibria might be, they don't line up with real behavior all that well.
"Nobody understands “Prisoner’s dilemma”" July 23, 2013
http://beranger.org/2013/07/23/nobodyunderstandsprisonersdilemma/
. . .
Now, the theory says they’d be better off by
betraying — i.e. by confessing. And they invoke the
Nash equilibrium to “prove” that they’d be better
off this way.
The problem in real life is that:
 there’s no such thing as an “iterated
prisoners’ dilemma” — the deal is only offered you once!
 once again, this is a one-time shot,
not something that repeats, so anything Nash-related is pure stupidity —
you can’t have any kind of “equilibrium” when a unique,
unrepeatable decision affects the entire outcome!
 “maximizing” WHAT? You’re not playing
baccarat, you’re getting out or staying in, game over!
 also, any discussion of a “probability
distribution” is pointless — it’s a ONE-TIME ISSUE, and then you go to
jail, dammit! Statistics doesn’t work with a unique sample.
 consequently, any analysis of what the other
prisoner might be f***ing thinking is f***ing useless — you
cannot possibly know how stupid or how intelligent the other guy is, and
again, any “statistical assumption” is pure intellectual masturbation;
you can only hope he’s not mentally deranged;
 as a practical issue, the “classical” dilemma,
which uses prison terms of 1, 2, and 3 years, is not
only confusing and making the judgement more difficult (as the terms are
too close to each other), but it’s also highly unrealistic in terms of
what the lack of evidence or the presence of a confession would give —
therefore, the variant with 1, 5 and 20 years is much
more appropriate.
OK, now let me say what I’d do if I were a
prisoner to have been offered such a deal: I’ll keep being silent —
they’d call this “cooperation”, but by elementary logic, this is obviously
the best thing to do, especially when thinking of real jail terms:
20 years is horrendous, 5 years is painful, but 1 year is rather cheap, so
I’d assume the other prisoner would think the same. Just common
sense. Zero years would be ideal, but there is a risk, and the risk reads “5
years”. This is not altruism, but compared to 5 years, 1 year would
be quite acceptable for a felon, wouldn’t you think so? Nothing
about any remorse of possibly putting the other guy behind bars for 20 years
— just selfish considerations are enough to choose this strategy! (Note that
properly choosing the prison terms makes the conclusion easier to reach: 2
years are not as much different from 1 year as the 5 years are.)
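The textbook claim that the post argues against can be checked mechanically. The sketch below is a minimal illustration that assumes one particular mapping of the post's 0, 1, 5 and 20 year terms onto the four outcomes; the post does not spell out the full table, so the mapping is an assumption, and the names `YEARS`, `best_response` and `nash_equilibria` are illustrative.

```python
# Years in prison for (my_choice, other_choice); fewer years is better.
# ASSUMPTION: this mapping of the 0/1/5/20-year terms onto outcomes is
# one plausible reading of the post, not a table taken from it.
YEARS = {
    ("silent", "silent"): 1,     # both stay silent
    ("silent", "confess"): 20,   # I stay silent, the other confesses
    ("confess", "silent"): 0,    # I confess, the other stays silent
    ("confess", "confess"): 5,   # both confess
}
CHOICES = ("silent", "confess")


def best_response(other_choice: str) -> str:
    """Choice that minimizes my own sentence, given the other's choice."""
    return min(CHOICES, key=lambda mine: YEARS[(mine, other_choice)])


def nash_equilibria() -> list:
    """Pure-strategy profiles in which each choice is a best response
    to the other (the game is symmetric, so one table serves both)."""
    return [
        (a, b)
        for a in CHOICES
        for b in CHOICES
        if best_response(b) == a and best_response(a) == b
    ]


for other in CHOICES:
    print(f"If the other prisoner chooses {other!r}, "
          f"my best response is {best_response(other)!r}")
print("Pure-strategy Nash equilibria:", nash_equilibria())
```

Under this assumed table, confessing is the best response to either choice and mutual confession is the only pure-strategy equilibrium, even though mutual silence (1 year each) leaves both prisoners better off than mutual confession (5 years each). Whether real people reason this way in a one-shot setting is what the experiment and the post dispute.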
They’ve now for the
first time tried this dilemma in practice.
The idiots have used two groups: students — the stake
being a material reward –, and real inmates — where not the freedom was at
stake, but merely some cigarettes or coffee.
In such a flawed test environment, 37% of the
students did “cooperate”, versus 56% of the inmates. The “iterated”
(sequential) version of the dilemma showed an increased cooperation, but
only amongst the students (which, in my opinion, proves that they were
totally dumb).
Now, I should claim victory, as long as this
experiment contradicts the theory saying the cooperation should have been
negligible — especially amongst “immoral convicts”. And really, invoking a
Pareto standpoint
(making one individual better off without making any other individual worse
off) is equally dumb, as nobody thinks in terms of ethics… for some bloody
cigarettes! In real conditions though, where PERSONAL FREEDOM would
be at stake FOR YEARS (1, 5, or 20) — not just peanuts –, an experiment
would show even more “cooperation”, meaning that most people would remain
silent!
They can’t even design an experiment properly. Not
winning a couple of bucks, or a cuppa coffee is almost irrelevant to
all the subjects involved (this is not a real stake!), whereas the
stress of staying in jail for 5 or for 20 years is almost a
life-or-death issue. Mathematicians and sociologists seem
unbelievably dumb when basic empathy is needed in order to
analyze a problem or conduct an experiment.
__
P.S.: A classical example that’s
commonly mentioned is that during the Cold War, both parts have chosen to
continuously arm, not to disarm — which means they didn’t “cooperate”. Heck,
this is a continuously iterated prisoners’ dilemma, which
is a totally different issue than a one-time shot
prisoners’ dilemma! In such a continuum, the “official theory” applies with
great success.
__
LATE EDIT: If it wasn’t clear
enough, the practical experiment was flawed for two major reasons:

The stake. When it’s not about losing personal
FREEDOM for years, but merely about
not earning a few euros or not being
given some cigarettes or coffee, people are more
prone to take chances and face the highest possible
risk… because they don’t risk that much!

The reversed logic. How can you replace
penalties with rewards (on a reversed scale, obviously) and still have
people apply the same judgement? Being put in jail for 20 years is
replaced with what? With not earning anything? Piece of cake!
What’s the equivalent of being set free? Being given a maximum of cash
or of cigarettes? To make the equivalent of a real prisoner’s dilemma,
the 20 years, 5 years or 1 year penalties shouldn’t have meant
“gradually lower earnings”, but rather fines imposed
to the subjects! Say, for the students:
 FREE means you’re given 100 €
 1 year means you should pay 100 €
 5 years means you should pay 500 €
 20 years means you should pay 2000 €
What do you think the outcome would have been
in such an experiment? Totally different, I’m telling you!
Also see
http://freakonomics.com/2012/04/25/ukgameshowgoldenballsanewsolutiontotheprisoner%E2%80%99sdilemma/
"ECONOMICS AS ROBUSTNESS ANALYSIS," by Jaakko Kuorikoski, Aki Lehtinen
and Caterina Marchionni, the University of Pittsburgh, 2007 
http://philsciarchive.pitt.edu/3550/1/econrobu.pdf
ECONOMICS AS ROBUSTNESS ANALYSIS
Jaakko Kuorikoski, Aki Lehtinen and Caterina
Marchionni
25.9. 2007
1. Introduction
2. Making sense of robustness
3. Robustness in economics
4. The epistemic import of robustness analysis
5. An illustration: geographical economics models
6. Independence of derivations
7. Economics as a Babylonian science
8. Conclusions
1. Introduction
Modern economic analysis consists largely in building abstract
mathematical models, and deriving familiar results from ever sparser
modeling assumptions is considered as a theoretical contribution. Why do
economists spend so much time and effort in deriving same old results
from slightly different assumptions rather than trying to come up with
new and exciting hypotheses? We claim that this is because the process
of refining economic models is essentially a form of robustness
analysis. The robustness of modeling results with respect to particular
modeling assumptions, parameter values or initial conditions plays a
crucial role for modeling in economics for two reasons. First, economic
models are difficult to subject to straightforward empirical tests for
various reasons. Second, the very nature of economic phenomena provides
little hope of ever making the modeling assumptions completely
realistic. Robustness analysis is therefore a natural methodological
strategy for economists because economic models are based on various
idealizations and abstractions which make at least some of their
assumptions unrealistic (Wimsatt 1987; 1994a; 1994b; Mäki 2000; Weisberg
2006b). The importance of robustness considerations in economics
ultimately forces us to reconsider many commonly held views on the
function and logical structure of economic theory.
Given that much of economic research praxis can
be characterized as robustness analysis, it is somewhat surprising that
philosophers of economics have only recently become interested in
robustness. William Wimsatt has extensively discussed robustness
analysis, which he considers in general terms as triangulation via
independent ways of determination. According to Wimsatt, fairly varied
processes or activities count as ways of determination: measurement,
observation, experimentation, mathematical derivation etc. all qualify.
Many ostensibly different epistemic activities are thus classified as
robustness analysis. In a recent paper, James Woodward (2006)
distinguishes four notions of robustness. The first three are all
species of robustness as similarity of the result under different forms
of determination. Inferential robustness refers to the idea that there
are different degrees to which inference from some given data may depend
on various auxiliary assumptions, and derivational robustness to whether
a given theoretical result depends on the different modelling
assumptions. The difference between the two is that the former concerns
derivation from data, and the latter derivation from a set of
theoretical assumptions. Measurement robustness means triangulation of a
quantity or a value by (causally) different means of measurement.
Inferential, derivational and measurement robustness differ with respect
to the method of determination and the goals of the corresponding
robustness analysis. Causal robustness, on the other hand, is a
categorically different notion because it concerns causal dependencies
in the world, and it should not be confused with the epistemic notion of
robustness under different ways of determination.
In Woodward’s typology, the kind of theoretical
model-refinement that is so common in economics constitutes a form of
derivational robustness analysis. However, if Woodward (2006) and Nancy
Cartwright (1991) are right in claiming that derivational robustness
does not provide any epistemic credence to the conclusions, much of
theoretical model building in economics should be regarded as
epistemically worthless. We take issue with this position by developing
Wimsatt’s (1981) account of robustness analysis as triangulation via
independent ways of determination. Obviously, derivational robustness in
economic models cannot be a matter of entirely independent ways of
derivation, because the different models used to assess robustness
usually share many assumptions. Independence of a result with respect to
modelling assumptions nonetheless carries epistemic weight by supplying
evidence that the result is not an artefact of particular idealizing
modelling assumptions. We will argue that although robustness analysis,
understood as systematic examination of derivational robustness, is not
an empirical confirmation procedure in any straightforward sense,
demonstrating that a modelling result is robust does carry epistemic
weight by guarding against error and by helping to assess the relative
importance of various parts of theoretical models (cf. Weisberg 2006b).
While we agree with Woodward (2006) that arguments presented in favour
of one kind of robustness do not automatically apply to other kinds of
robustness, we think that the epistemic gain from robustness derives
from similar considerations in many instances of different kinds of
robustness.
In contrast to physics, economic theory itself
does not tell which idealizations are truly fatal or crucial for the
modeling result and which are not. Economists often proceed on a
preliminary hypothesis or an intuitive hunch that there is some core
causal mechanism that ought to be modeled realistically. Turning such
intuitions into a tractable model requires making various unrealistic
assumptions concerning other issues. Some of these assumptions are
considered or hoped to be unimportant, again on intuitive grounds. Such
assumptions have been examined in economic methodology using various
closely related terms such as Musgrave’s (1981) heuristic assumptions,
Mäki’s (2000) early step assumptions, Hindriks’ (2006) tractability
assumptions and Alexandrova’s (2006) derivational facilitators. We will
examine the relationship between such assumptions and robustness in
economic model-building by way of discussing a case: geographical
economics. We will show that an important way in which economists try to
guard against errors in modeling is to see whether the model’s
conclusions remain the same if some auxiliary assumptions, which are
hoped not to affect those conclusions, are changed. The case also
demonstrates that although the epistemological functions of guarding
against error and securing claims concerning the relative importance of
various assumptions are somewhat different, they are often closely
intertwined in the process of analyzing the robustness of some modeling
result.
. . .
8. Conclusions
The practice of economic theorizing largely consists of building models with
slightly different assumptions yielding familiar results. We have argued
that this practice makes sense when seen as derivational robustness
analysis. Robustness analysis is a sensible epistemic strategy in situations
where we know that our assumptions and inferences are fallible, but not in
what situations and in what way. Derivational robustness analysis guards
against errors in theorizing when the problematic parts of the ways of
determination, i.e. models, are independent of each other. In economics in
particular, proving robust theorems from different models with diverse
unrealistic assumptions helps us to evaluate what results correspond to
important economic phenomena and what are merely artefacts of particular
auxiliary assumptions. We have addressed Orzack and Sober’s criticism
against robustness as an epistemically relevant feature by showing that
their formulation of the epistemic situation in which robustness analysis is
useful is misleading. We have also shown that their argument actually shows
how robustness considerations are necessary for evaluating what a given
piece of data can support. We have also responded to Cartwright’s criticism
by showing that it relies on an untenable hope of a completely true economic
model.
Viewing economic model building as robustness
analysis also helps to make sense of the role of the rationality axioms that
apparently provide the basis of the whole enterprise. Instead of the
traditional Euclidian view of the structure of economic theory, we propose
that economics should be approached as a Babylonian science, where the
epistemically secure parts are the robust theorems and the axioms only form
what Boyd and Richerson call a generalized sample theory, whose role is
to help organize further modelling work and facilitate communication between
specialists.
Jensen Comment
As I've mentioned before, I spent a goodly proportion of my time for two years in
a think tank trying to invent adaptive regression and cluster analysis models.
In every case the main reason for my failures was lack of robustness. In
particular, any two models fed the predictor variables w, x, y, and z
generated different outcomes depending on the time ordering
of the variables fed into the algorithms. This made the results dependent on
dynamic programming, which has rarely been noted for computing practicality 
http://en.wikipedia.org/wiki/Dynamic_programming
Simpson's Paradox and Cross-Validation
Simpson's Paradox 
http://en.wikipedia.org/wiki/Simpson%27s_paradox
"Simpson’s Paradox: A Cautionary Tale in Advanced Analytics," by Steve
Berman, Leandro DalleMule, Michael Greene, and John Lucker, Significance:
Statistics Making Sense, October 2012 
http://www.significancemagazine.org/details/webexclusive/2671151/SimpsonsParadoxACautionaryTaleinAdvancedAnalytics.html
Analytics projects often present us with situations
in which common sense tells us one thing, while the numbers seem to tell us
something much different. Such situations are often opportunities to learn
something new by taking a deeper look at the data. Failure to perform a
sufficiently nuanced analysis, however, can lead to misunderstandings and
decision traps. To illustrate this danger, we present several instances of
Simpson’s Paradox in business and nonbusiness environments. As we
demonstrate below, statistical tests and analysis can be confounded by a
simple misunderstanding of the data. Often taught in elementary probability
classes, Simpson’s Paradox refers to situations in which a trend or
relationship that is observed within multiple groups reverses when the
groups are combined. Our first example describes how Simpson’s Paradox
accounts for a highly surprising observation in a healthcare study. Our
second example involves an apparent violation of the law of supply and
demand: we describe a situation in which price changes seem to bear no
relationship with quantity purchased. This counterintuitive relationship,
however, disappears once we break the data into finer time periods. Our
final example illustrates how a naive analysis of marginal profit
improvements resulting from a price optimization project can potentially
mislead senior business management, leading to incorrect conclusions and
inappropriate decisions. Mathematically, Simpson’s Paradox is a fairly
simple—if counterintuitive—arithmetic phenomenon. Yet its significance for
business analytics is quite far-reaching. Simpson’s Paradox vividly
illustrates why business analytics must not be viewed as a purely technical
subject appropriate for mechanization or automation. Tacit knowledge, domain
expertise, common sense, and above all critical thinking, are necessary if
analytics projects are to reliably lead to appropriate evidence-based
decision making.
The past several years have seen decision making in
many areas of business steadily evolve from judgment-driven domains into
scientific domains in which the analysis of data and careful consideration
of evidence are more prominent than ever before. Additionally, mainstream
books, movies, alternative media and newspapers have covered many topics
describing how fact and metric driven analysis and subsequent action can
exceed results previously achieved through less rigorous methods. This trend
has been driven in part by the explosive growth of data availability
resulting from Enterprise Resource Planning (ERP) and Customer Relationship
Management (CRM) applications and the Internet and eCommerce more generally.
There are estimates that predict that more data will be created in the next
four years than in the history of the planet. For example, Wal-Mart handles
over one million customer transactions every hour, feeding databases
estimated at more than 2.5 petabytes in size - the equivalent of 167 times
the books in the United States Library of Congress.
Additionally, computing power has increased
exponentially over the past 30 years and this trend is expected to continue.
In 1969, astronauts landed on the moon with a 32-kilobyte memory computer.
Today, the average personal computer has more computing power than the
entire U.S. space program at that time. Decoding the human genome took 10
years when it was first done in 2003; now the same task can be performed in
a week or less. Finally, a large consumer credit card issuer crunched two
years of data (73 billion transactions) in 13 minutes, which not long ago
took over one month.
This explosion of data availability and the
advances in computing power and processing tools and software have paved the
way for statistical modeling to be at the front and center of decision
making not just in business, but everywhere. Statistics is the means to
interpret data and transform vast amounts of raw data into meaningful
information.
However, paradoxes and fallacies lurk behind even
elementary statistical exercises, with the important implication that
exercises in business analytics can produce deceptive results if not
performed properly. This point can be neatly illustrated by pointing to
instances of Simpson’s Paradox. The phenomenon is named after Edward
Simpson, who described it in a technical paper in the 1950s, though the
prominent statisticians Karl Pearson and Udny Yule noticed the phenomenon
over a century ago. Simpson’s Paradox, which regularly crops up in
statistical research, business analytics, and public policy, is a prime
example of why statistical analysis is useful as a corrective for the many
ways in which humans intuit false patterns in complex datasets.
Simpson’s Paradox is in a sense an arithmetic
trick: weighted averages can lead to reversals of meaningful
relationships—i.e., a trend or relationship that is observed within each of
several groups reverses when the groups are combined. Simpson’s Paradox can
arise in any number of marketing and pricing scenarios; we present here case
studies describing three such examples. These case studies serve as
cautionary tales: there is no comprehensive mechanical way to detect or
guard against instances of Simpson’s Paradox leading us astray. To be
effective, analytics projects should be informed by both a nuanced
understanding of statistical methodology as well as a pragmatic
understanding of the business being analyzed.
The first case study, from the medical field,
presents a surface indication on the effects of smoking that is at odds with
common sense. Only when the data are viewed at a more refined level of
analysis does one see the true effects of smoking on mortality. In the
second case study, decreasing prices appear to be associated with decreasing
sales and increasing prices appear to be associated with increasing sales.
On the surface, this makes no sense. A fundamental tenet of economics is
that of the demand curve: as the price of a good or service increases,
consumers demand less of it. Simpson’s Paradox is responsible for an
apparent—though illusory—violation of this fundamental law of economics. Our
final case study shows how marginal improvements in profitability in each of
the sales channels of a given manufacturer may result in an apparent
marginal reduction in the overall profitability of the business. This seemingly
contradictory conclusion can also lead to serious decision traps if not
properly understood.
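The reversal described above is easy to reproduce with a few counts. The sketch below uses hypothetical numbers invented purely for illustration; they are not the data from the smoking study discussed in Case Study 1. The names `data`, `within` and `pooled` are illustrative.

```python
# HYPOTHETICAL counts as (deaths, group size), invented for illustration;
# these are NOT the figures from the follow-up study cited in the article.
data = {
    "younger": {"smokers": (20, 200), "nonsmokers": (5, 100)},
    "older": {"smokers": (25, 50), "nonsmokers": (80, 200)},
}
STATUSES = ("smokers", "nonsmokers")


def pooled_rate(status: str) -> float:
    """Death rate for one status after combining all age groups."""
    deaths = sum(data[group][status][0] for group in data)
    total = sum(data[group][status][1] for group in data)
    return deaths / total


# Death rate within each age group.
within = {
    group: {s: data[group][s][0] / data[group][s][1] for s in STATUSES}
    for group in data
}
# Death rate once the age groups are combined.
pooled = {s: pooled_rate(s) for s in STATUSES}

for group, rates in within.items():
    print(f"{group}: smokers {rates['smokers']:.1%}, "
          f"nonsmokers {rates['nonsmokers']:.1%}")
print(f"combined: smokers {pooled['smokers']:.1%}, "
      f"nonsmokers {pooled['nonsmokers']:.1%}")
```

In these made-up counts smokers fare worse within each age group, yet look better once the groups are combined, because most of the smokers are in the younger, lower-risk group. The combined rate is a weighted average, and the weights differ between the two columns.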
Case Study 1: Are those warning labels
really necessary?
We start with a simple example from the healthcare
world. This example both illustrates the phenomenon and serves as a reminder
that it can appear in any domain.
The data are taken from a 1996 follow-up study from
Appleton, French, and Vanderpump on the effects of smoking. The follow-up
catalogued women from the original study, categorizing based on the age
groups in the original study, as well as whether the women were smokers or
not. The study measured the deaths of smokers and nonsmokers during the 20-year
period.
Continued in article
"Is the Ohlson
(1995) Model an Example of the Simpson's Paradox?" by Samithamby
Senthilnathan, SSRN 1417746, June 11, 2009 
http://papers.ssrn.com/sol3/papers.cfm?abstract_id=1417746
The Equity
Prices and Accounting Variables: The role of the most recent prior period's
price in value relevance studies
Paperback
by Samithamby Senthilnathan (Author)
Publisher: LAP LAMBERT Academic Publishing (May 22, 2012)
ISBN-10: 3659103721 ISBN-13: 978-3659103728
http://www.amazon.com/dp/3659103721?tag=beschevac20
"Does an End of
Period's Accounting Variable Assessed have Relevance for the Particular Period?"
by Samithamby Senthilnathan, SSRN 1415182, June 6, 2009 
http://papers.ssrn.com/sol3/papers.cfm?abstract_id=1415182
Also see
http://papers.ssrn.com/sol3/papers.cfm?abstract_id=1406788
What happened to cross-validation in
accountics science research?
Over time I've become increasingly critical of
the lack of validation in accountics science, and I've focused mainly upon lack
of replication by independent researchers and lack of commentaries published in
accountics science journals 
http://www.trinity.edu/rjensen/TheoryTAR.htm
Another type of validation that seems to be on
the decline in accountics science is so-called cross-validation.
Accountics scientists seem to be content with their statistical inference tests
on Z-scores, F-tests, and correlation significance testing. Cross-validation
seems to be less common; at least I'm having trouble finding examples of
cross-validation. Cross-validation entails comparing sample findings with
findings in holdout samples.
Cross Validation 
http://en.wikipedia.org/wiki/Crossvalidation_%28statistics%29
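A holdout check of the kind described above can be sketched in a few lines. The example below is a minimal illustration on synthetic data, not a reconstruction of any published study: the single predictor, the true coefficients, the 75/25 split and the helper names (`fit_logit`, `accuracy`) are all assumptions made here. It fits a one-variable logit model on an estimation sample and then scores it on observations that played no part in the estimation.

```python
import math
import random


def sigmoid(z: float) -> float:
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logit(xs, ys, lr=0.5, epochs=500):
    """Fit P(y=1|x) = sigmoid(a + b*x) by gradient ascent on the
    average log-likelihood (a simple stand-in for a logit routine)."""
    a, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        residuals = [y - sigmoid(a + b * x) for x, y in zip(xs, ys)]
        a += lr * sum(residuals) / n
        b += lr * sum(r * x for r, x in zip(residuals, xs)) / n
    return a, b


def accuracy(a, b, xs, ys):
    """Share of observations classified correctly at a 0.5 cutoff."""
    hits = sum((sigmoid(a + b * x) >= 0.5) == (y == 1) for x, y in zip(xs, ys))
    return hits / len(xs)


# SYNTHETIC data: x is a hypothetical firm characteristic and
# y = 1 stands for "changed auditor". Nothing here is real data.
rng = random.Random(42)
xs = [rng.gauss(0.0, 1.0) for _ in range(400)]
ys = [1 if rng.random() < sigmoid(-0.5 + 1.5 * x) else 0 for x in xs]

# Hold out the last 25% of observations; they are not used in estimation.
split = int(0.75 * len(xs))
train_x, train_y = xs[:split], ys[:split]
hold_x, hold_y = xs[split:], ys[split:]

a_hat, b_hat = fit_logit(train_x, train_y)
train_acc = accuracy(a_hat, b_hat, train_x, train_y)
holdout_acc = accuracy(a_hat, b_hat, hold_x, hold_y)
print(f"in-sample accuracy: {train_acc:.2f}")
print(f"holdout accuracy:   {holdout_acc:.2f}")
```

The comparison of interest is the gap between the two accuracy figures. A model that classifies well in the estimation sample but poorly in the holdout sample has probably fitted noise, which significance tests on the estimation sample alone would not reveal.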
When reading the following paper using logit
regression to predict audit firm changes, it struck me that this would've
been an ideal candidate for the authors to have performed cross-validation using
holdout samples.
"Audit Quality and Auditor Reputation: Evidence from Japan," by Douglas J.
Skinner and Suraj Srinivasan, The Accounting Review, September 2012, Vol.
87, No. 5, pp. 1737-1765.
We study events surrounding
ChuoAoyama's failed audit of Kanebo, a large Japanese cosmetics company
whose management engaged in a massive accounting fraud. ChuoAoyama was PwC's
Japanese affiliate and one of Japan's largest audit firms. In May 2006, the
Japanese Financial Services Agency (FSA) suspended ChuoAoyama for two months
for its role in the Kanebo fraud. This unprecedented action followed a
series of events that seriously damaged ChuoAoyama's reputation. We use
these events to provide evidence on the importance of auditors' reputation
for quality in a setting where litigation plays essentially no role. Around
one quarter of ChuoAoyama's clients defected from the firm after its
suspension, consistent with the importance of reputation. Larger firms and
those with greater growth options were more likely to leave, also consistent
with the reputation argument.
Rather than just use statistical inference tests
on logit model Z-statistics, it struck me that in statistics journals the
referees might've requested cross-validation tests on holdout samples of firms
that changed auditors and firms that did not change auditors.
I do find somewhat more frequent
cross-validation studies in finance, particularly in the areas of discriminant
analysis in bankruptcy prediction models.
Instances of cross-validation in accounting
research journals seem to have died out in the past 20 years. There are earlier
examples of cross-validation in accounting research journals. Several examples
are cited below:
"A field study examination of budgetary
participation and locus of control," by Peter Brownell, The Accounting
Review, October 1982 
http://www.jstor.org/discover/10.2307/247411?uid=3739712&uid=2&uid=4&uid=3739256&sid=21101146090203
"Information choice and utilization in an
experiment on default prediction," by Abdel-Khalik and K. M. El-Sheshai, 
Journal of Accounting Research, 1980 
http://www.jstor.org/discover/10.2307/2490581?uid=3739712&uid=2&uid=4&uid=3739256&sid=21101146090203
"Accounting ratios and the prediction of
failure: Some behavioral evidence," by Robert Libby, Journal of
Accounting Research, Spring 1975 
http://www.jstor.org/discover/10.2307/2490653?uid=3739712&uid=2&uid=4&uid=3739256&sid=21101146090203
There are other examples of cross-validation
in the 1970s and 1980s, particularly in bankruptcy prediction.
I have trouble finding illustrations of
cross-validation in the accounting research literature in more recent years. Has
the interest in cross-validating waned along with interest in validating
accountics research? Or am I just being careless in my search for illustrations?
Reverse Regression
"Solution to Regression Problem," by David Giles, Econometrics
Beat: Dave Giles’ Blog, University of Victoria, December 26, 2013 
http://davegiles.blogspot.com/2013/12/solutiontoregressionproblem.html
O.K. - you've had long enough to think about that
little regression problem I
posed the other day.
It's time to put you
out of your misery!
Here's the problem again, with a solution.
Problem:
Suppose that we estimate the following regression model by OLS:
y_{i} = α + β x_{i} + ε_{i}.
The model has a single regressor, x, and the point
estimate of β turns out to be 10.0.
Now consider the "reverse regression", based on
exactly the same data:
x_{i} = a + b y_{i} + u_{i}.
What can we say about the value of the OLS point
estimate of b?
 It will be 0.1.
 It will be less than or equal to 0.1.
 It will be greater than or equal to 0.1.
 It's impossible to tell from the information
supplied.
Solution:
Continued in article
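The algebra behind the puzzle can be checked numerically. With one regressor, the forward slope is S_xy / S_xx and the reverse slope is S_xy / S_yy, so their product equals the squared sample correlation r^2, which cannot exceed 1. The sketch below is a minimal illustration on simulated data and is not taken from Giles's post; the data-generating values and the helper names (`ols_slope`, `r_squared`) are assumptions made here.

```python
import random


def ols_slope(x, y):
    """OLS slope from regressing y on x with an intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx


def r_squared(x, y):
    """Squared sample correlation between x and y."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy ** 2 / (sxx * syy)


# SIMULATED data with a true slope of 10; all values are illustrative.
rng = random.Random(0)
x = [rng.gauss(0.0, 1.0) for _ in range(200)]
y = [2.0 + 10.0 * xi + rng.gauss(0.0, 5.0) for xi in x]

beta_hat = ols_slope(x, y)   # forward regression: y on x
b_hat = ols_slope(y, x)      # reverse regression: x on y
r2 = r_squared(x, y)

print(f"forward slope beta_hat = {beta_hat:.4f}")
print(f"reverse slope b_hat    = {b_hat:.4f}")
print(f"1 / beta_hat           = {1.0 / beta_hat:.4f}")
print(f"beta_hat * b_hat       = {beta_hat * b_hat:.4f}  (r^2 = {r2:.4f})")
```

Because the product of the two slopes is r^2, a forward estimate of 10.0 implies that the reverse estimate is r^2 / 10, which is at most 0.1 and equals 0.1 only when the fit is perfect.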
David Giles' Top Five Econometrics Blog Postings for 2013
Econometrics Beat: Dave Giles’ Blog, University of Victoria, December
31, 2013 
http://davegiles.blogspot.com/2013/12/mytop5for2013.html
Everyone seems to be doing it at this time of the year.
So, here are the five most popular new posts on this blog in 2013:

Econometrics and "Big Data"
Ten Things for Applied Econometricians to Keep in Mind
ARDL Models - Part II - Bounds Tests
The Bootstrap - A Non-Technical Introduction
ARDL Models - Part I
Thanks for reading, and for your comments.
Happy New Year!
Jensen Comment
I really like the way David Giles thinks and writes about econometrics. He does
not pull his punches about validity testing.
Econometrics Beat: Dave Giles' Blog 
http://davegiles.blogspot.com/
Back to work, and back to reading:

Basturk, N., C. Cakmakli, S. P. Ceyhan, and H. K. van Dijk,
2013. Historical developments in Bayesian econometrics after
Cowles Foundation monographs 10, 14. Discussion Paper 13-191/III,
Tinbergen Institute.

Bedrick, E. J., 2013. Two useful reformulations of the
hazard ratio. American Statistician, in press.

Nawata, K. and M. McAleer, 2013. The maximum number of
parameters for the Hausman test when the estimators are from
different sets of equations. Discussion Paper 13-197/III,
Tinbergen Institute.

Shahbaz, M., S. Nasreen, C. H. Ling, and R. Sbia, 2013.
Causality between trade openness and energy consumption: What
causes what in high, middle and low income countries. MPRA Paper
No. 50832.

Tibshirani, R., 2011. Regression shrinkage and selection
via the lasso: A retrospective. Journal of the Royal
Statistical Society, B, 73, 273-282.

Zamani, H. and N. Ismail, 2014. Functional form for the
zero-inflated generalized Poisson regression model.
Communications in Statistics - Theory and Methods, in press.
Once upon a time, when all the world and you
and I were young and beautiful, there lived in the
ancient town of Metrika a young boy by the name of Joe.
Now, young Joe was a talented lad, and his home town was
prosperous and filled with happy folk - Metricians, they were
called. Joe was a member of the Econo family, and his ancestors
had been among the founding fathers of the town. Originating in
the neighbouring city of Econoville, Joe
Econometrician's forebears had arrived in Metrika not long after
the original settlers of that town - the Biols (from nearby
Biologica), and the unfortunately named Psychos (from the hamlet
of Psychovia).
In more recent times, other families (or "specialists", as they
were sometimes known) had also established themselves in the
town, and by the time that Joe was born there was already a
sprinkling of Clios (from the ancient city of Historia), and
even a few Environs. Hailing from the suburbs of Environmentalia,
the Environs were regarded with some disdain by many of the more
established families of Metrika.
Metrika began as a small village - little more than a coach-stop
and a mandatory tavern at a junction in the highway running from
the ancient data mines in the South, to the great city of
Enlightenment, far to the North. In Metrika, the transporters of
data of all types would pause overnight on their long journey;
seek refreshment at the tavern; and swap tales of their
experiences on the road.
To be fair, the data transporters were more than just humble
freight carriers. The raw material that they took from the data
mines was largely unprocessed. The vast mountains of raw numbers
usually contained valuable gems and nuggets of truth, but
typically these were buried from sight. The data transporters
used the insights that they gained from their raucous,
beer-fired discussions and arguments (known locally as
"seminars") with the Metrika locals at
the tavern to help them to sift through the data and extract the
valuable jewels. With their loads considerably lightened, these
"data-miners" then continued on their journey to the City of
Enlightenment in a much improved frame of mind, hangovers
notwithstanding!
Over time, the town of Metrika prospered and grew as the talents
of its citizens were increasingly recognized and valued by those
in the surrounding districts, and by the data
transporters.
Young Joe grew up happily, supported by his family of
econometricians, and he soon developed the skills that were
expected of his societal class. He honed his computing skills;
developed a good nose for "dodgy" data; and studiously broadened
and deepened his understanding of the various tools wielded by
the artisans in the neighbouring town of Statsbourg.
In short, he was a model child!
But - he was torn! By the time that he reached the tender age of
thirteen, he felt the need to make an important,
life-determining, decision.
Should he align his talents with the burly crew who
frequented the gym near his home - the macroeconometricians - or
should he throw in his lot with the physically challenged bunch
of empirical economists known locally as the
microeconometricians?
What a tough decision! How to decide?
He discussed his dilemma with his parents, aunts, and uncles.
Still, the choice was unclear to him.
Then, one fateful day, while sitting by the side of the highway
and watching the data-miners pass by with their increasingly
heavy loads, the answer came to him! There was a simple
solution - he would form his own breakaway movement that was
free of the shackles of his Econo heritage.
Overwhelmed with excitement, Joe raced back to the tavern to
announce to the locals
that henceforth he was to be known as a Data Scientist.
As usual, the locals largely ignored what he was saying, and
instead took turns at talking loudly about things that they
thought would make them seem important to their peers. Finally,
though, after many interruptions, and the consumption of copious
quantities of ale, Joe was able to hold their attention.
"You see", he said, "the data that are now being mined, and
transported to the City of Enlightenment, are available in such
vast quantities that the truth must lie within them."
"All of this energy that we've been expending on building
economic models, and then using the data to test their validity
- it's a waste of time! The data are now so vast that the models
are superfluous."
(To be perfectly truthful, he probably used words of one
syllable, but I think you get the idea.)
"We don't need to use all of those silly simplifying
assumptions that form the basis of the analysis being undertaken
by the microeconometricians and macroeonometricians."
(Actually, he slurred these last three words due to a mixture of
youthful enthusiasm and a mouthful of ale.)
"Their models are just a silly game, designed to create the
impression that they're actually adding some knowledge to the
information in the data. No, all that we need to do is to gather
together lots and lots of our tools, and use them to drill deep
into the data to reveal the true patterns that govern our
lives."
"The answer was there all of the time. While we referred to
those Southerners in disparaging terms, calling them "data
miners" as if such activity were beneath the dignity of serious
modellers such as ourselves, in reality data-mining is our
future. How foolish we were!"
Now, it must be said that there were a few older econometricians
who were somewhat unimpressed by Joe's revelation. Indeed, some
of them had an uneasy feeling that they'd heard this sort of
talk before. Amid much head-scratching, beard-stroking, and
ale-quaffing, some who were present that day swear they heard
mention of long-lost names such as Koopmans and Vining. Of
course, we'll never know for sure.
However, young Joe was determined that he had found his destiny.
A Data Scientist he would be, and he was convinced that others
would follow his lead. Gathering together as many calculating
tools as he could lay his hands on, Joe hitched a ride North, to
the great City of Enlightenment. The protestations of his family
and friends were to no avail. After all, as he kept insisting,
we all know that "E" comes after "D".
And so, Joe was last seen sitting in a large wagon of data,
trundling North while happily picking through some particularly
interesting looking nuggets, and smiling the smile of one who
knows the truth.
To this day, econometricians gather, after a hard day of
modelling, in the taverns of Metrika. There, they swap tales of
new theories, interesting computer algorithms, and even the
characteristics of their data. Occasionally, Joe's departure
from the town is recalled, but what became of him, or his
followers, we really don't know. Perhaps he never actually found
the City of Enlightenment after all. (Shock, horror!)
And that, dear children, is what can happen to you - yes, even
you - if you don't eat all of your vegetables, or if you believe
everything that you hear at the
tavern.
"Some
Thoughts About Accounting Scholarship," by Joel Demski, AAA President's
Message, Accounting Education News, Fall 2001
http://aaahq.org/pubs/AEN/2001/Fall2001.pdf
Some Thoughts on Accounting Scholarship From Annual Meeting Presidential
Address, August 22, 2001
Tradition calls for me to reveal plans and aspirations for the coming year.
But a slight deviation from tradition will, I hope, provide some perspective
on my thinking.
We have, in the past half century, made considerable strides in our
knowledge of accounting institutions. Statistical connections between
accounting measures and market prices, optimal contracting, and professional
judgment processes and biases are illustrative. In the process we have
raised the stature, the relevance, and the sheer excitement of intellectual
inquiry in accounting, be it in the classroom, in the cloak room, or in the
journals.
Of late, however, a malaise appears to have settled in. Our progress has
turned flat, our tribal tendencies have taken hold, and our joy has
diminished.
Some Warning Signs
One indicator is our textbooks, our primary communication
medium and our statement to the world about ourselves. I see several
patterns here. One is the unrelenting march to make every text look like
People magazine. Form now leads, if not swallows, substance. Another is the
insatiable appetite to list every rule published by the FASB (despite the
fact we have a tidal wave thanks to DIG, EITF, AcSEC, SABs, and what have
you). Closely related is the interest in fads. Everything, including this
paragraph of my remarks, is now subject to a value-added test. Benchmarking,
strategic vision, and EVA® are everywhere. Foundations are nowhere.
Building blocks are languishing in appendices and wastebaskets.
A second indicator is our journals. They have proliferated in number. But we
struggle with an intertemporal sameness, with incremental as opposed to
discontinuous attempts to move our thinking forward, and with referee
intrusion and voyeurism. Value relevance is a currently fashionable approach
to identifying statistical regularities in the financial market arena, just
as a focus on readily observable components of compensation is a currently
fashionable dependent variable in the compensation arena.
Yet we know measurement error abounds, that other sources of information are
both present and hardly unimportant, that compensation is broad-based and
intertemporally managed, and that compensating wage differentials are part of
the stew. Yet we continue on the comfortable path of sameness.
A third indicator is our work habits. We have embraced, indeed been
swallowed by, the multiple adjective syndrome, or MAS: financial, audit,
managerial, tax, analytic, archival, experimental, systems, cognitive, etc.
This applies to our research, to our reading, to our courses, to our
teaching assignments, to our teaching, and to the organization of our Annual
Meeting. In so doing, we have exploited specialization, but in the process
greatly reduced communication networks, and taken on a near tribal
structure.
A useful analogy here is linearization. In accounting we linearize
everything in sight: additive components on the balance sheet, linear cost
functions, and the most glaring of all, the additive representation inherent
in ABC, which by its mere structure denies the scope economy that causes the
firm to jointly produce that set of products in the first place.
Linearization denies interaction, denies synergy; and our recent propensity
for multiple adjectives does precisely the same to us. We are doing to
ourselves what we’ve done to our subject area. What, we might ask, happened
to accounting? Indeed, I worry we will someday have a section specialized in
depreciation or receivables or intangibles.
I hasten to add this particular tendency has festered for some time. Rick
Antle, discussing the “Intellectual Boundaries in Accounting Research” at
the ’88 meeting observed:
In carving out tractable pieces of institutionally defined problems, we
inevitably impose intellectual boundaries. ... My concern arises when,
instead of generating fluid, useful boundaries, our processes of
simplification lead to rigid, dysfunctional ones. (6/89 Horizons, page
109).
I fear we have perfected and made a virtue out of Rick’s concern. Fluid
boundaries are now held at bay by our work habits and natural defenses.
A final indicator is what appears to be coming down the road, our work in
progress. Doctoral enrollment is down, a fact. It is also arguably factual
that doctoral training has become tribal. I, personally, have witnessed this
at recent Doctoral and New Faculty Consortia, and in our recruiting at UF.
This reinforces the visible patterns in our textbooks, in our journals, and
in our work habits.
Some Contributors
These patterns, of course, are not accidental. They are largely endogenous.
And I think it is equally instructive to sketch some of the contributors.
One contributor is employers, their firms, and their professional
organizations. Employers want and lobby for the student well equipped with
the latest consulting fad, or the student well equipped to transition into a
billable audit team member or tax consultant within two hours of the first
day of employment. Immediacy is sought and championed, though with the
caveat of critical-thinking skills somehow being added to the stew.
Continued in article
Jensen Comment
I agree with much of what Joel said, but I think he overlooks what I consider a
major problem in accounting scholarship: the takeover of accountancy doctoral
programs in North America by accountics science, to the point where accounting
dissertations are virtually unacceptable unless they have equations
http://www.trinity.edu/rjensen/Theory01.htm#DoctoralPrograms
Recommendation 2 of the American Accounting
Association Pathways Commission (emphasis added)
Scrapbook1083
http://www.trinity.edu/rjensen/TheoryTar.htm#Scrapbook1083 
Promote accessibility
of doctoral education by allowing for flexible content and structure
in doctoral programs and developing multiple pathways for degrees.
The current path to an accounting Ph.D. includes lengthy, full-time
residential programs and research training that is for the most
part confined to quantitative rather than qualitative methods.
More flexible programs - that might be part-time, focus on applied
research and emphasize training in teaching methods and curriculum
development - would appeal to graduate students with professional
experience and candidates with families, according to the report.
http://commons.aaahq.org/groups/2d690969a3/summary 
It has been well over a year in which I've scanned the media for signs of change. But
in that time I've seen little progress and zero encouragement that
accounting doctoral programs and our leading accounting research journals are
going to change. Having equations remains a necessary condition for acceptance
of an accounting doctoral dissertation or an Accounting Review article.
Accounting scholarship in doctoral programs is still "confined to quantitative rather than
qualitative methods." The main reason is simple. Quantitative research is
easier.
My theory is that
accountics science gained dominance in accounting research, especially in North
American accounting Ph.D. programs, because it abdicated responsibility:
1. Most accountics scientists buy data, thereby avoiding the greater cost and
drudgery of collecting data.
2. By relying so heavily on purchased data, accountics scientists abdicate
responsibility for errors in the data.
3. Since adding missing variable data to the public database is generally not at
all practical in purchased databases, accountics scientists have an excuse for
not collecting missing variable data.
4. Software packages for modeling and testing data abound. Accountics researchers
need only feed purchased data into the hopper of statistical and mathematical
analysis programs. It still takes a lot of knowledge to formulate hypotheses and
to understand the complex models. But the really hard work of collecting data
and error checking is avoided by purchasing data.
"Some Thoughts About Accounting Scholarship," by Joel Demski, AAA
President's Message, Accounting Education News, Fall 2001
http://aaahq.org/pubs/AEN/2001/Fall2001.pdf
. . .
A second indicator is our journals. They have proliferated in number. But we
struggle with an intertemporal sameness, with incremental as opposed to
discontinuous attempts to move our thinking forward, and with referee intrusion
and voyeurism. Value relevance is a currently fashionable approach to
identifying statistical regularities in the financial market arena, just as a
focus on readily observable components of compensation is a currently
fashionable dependent variable in the compensation arena.
Yet we know measurement error abounds, that other sources of information are
both present and hardly unimportant, that compensation is broad-based and
intertemporally managed, and that compensating wage differentials are part of
the stew. Yet we continue on the comfortable path of sameness.
It has been well over a year since the Pathways Report was issued. Nobody is listening
on the AECM or anywhere else! Sadly the accountics researchers who generate this
stuff won't even discuss their research on the AECM or the AAA Commons:
"Frankly, Scarlett, after I get a hit for my resume in The Accounting Review I just
don't give a damn"
http://www.cs.trinity.edu/~rjensen/temp/AccounticsDamn.htm
One more mission in what's left of my life will be to try to change this
http://www.cs.trinity.edu/~rjensen/temp/AccounticsDamn.htm
Bob Jensen's threads on validity testing in accountics science 
http://www.trinity.edu/rjensen/TheoryTAR.htm
How did academic accounting research become a pseudo science?
http://www.trinity.edu/rjensen/theory01.htm#WhatWentWrong

Avoiding applied research for practitioners and failure to attract
practitioner interest in academic research journals 
"Why business ignores the business schools," by Michael Skapinker
Some ideas for applied research 
http://www.trinity.edu/rjensen/theory01.htm#AcademicsVersusProfession

Clinging to Myths in Academe and Failure to Replicate and Authenticate
Research Findings
http://www.trinity.edu/rjensen/theory01.htm#Myths

Poorly designed and executed experiments that are rarely, I mean very, very
rarely, authenticated
http://www.trinity.edu/rjensen/theory01.htm#PoorDesigns

Discouragement of case method research by leading journals (TAR, JAR, JAE, etc.)
by turning back most submitted cases
http://www.trinity.edu/rjensen/000aaa/thetools.htm#Cases

Economic Theory Errors
Where analytical mathematics in accountics research made a huge mistake
relying on flawed economic theory and interval/ratio scaling
http://www.trinity.edu/rjensen/theory01.htm#EconomicTheoryErrors

Accentuate the Obvious and Avoid the Tough Problems (like fraud) for Which
Data and Models are Lacking
http://www.trinity.edu/rjensen/theory01.htm#AccentuateTheObvious

Financial Theory Errors
Where capital market research in accounting made a huge mistake by relying
on CAPM
http://www.trinity.edu/rjensen/theory01.htm#AccentuateTheObvious

Philosophy of Science is a Dying Discipline
Most scientific papers are probably wrong
http://www.trinity.edu/rjensen/theory01.htm#PhilosophyScienceDying
An Unlikely Debate Between Leading Accountics Researchers:
CAN SCIENCE HELP SOLVE THE ECONOMIC CRISIS?
http://commons.aaahq.org/posts/2ae5ce5297
History of Quantitative Finance
"Four features in appreciation of the life and work of Benoit Mandelbrot,"
Simoleon Sense, February 3, 2011 
http://www.simoleonsense.com/fourfeaturesinappreciationofthelifeandworkofbenoitmandelbrot/
"Psychology’s
Treacherous Trio: Confirmation Bias, Cognitive Dissonance, and Motivated
Reasoning," by sammcnerney, Why We Reason, September 7, 2011 
http://whywereason.wordpress.com/2011/09/07/psychologystreacheroustrioconfirmationbiascognitivedissonanceandmotivatedreasoning/
Gasp! How could an accountics scientist question such things? This is
sacrilege!
Let me end my remarks with a question: Have Ball and
Brown (1968)—and Beaver (1968) for that matter, if I can bring Bill Beaver into
it—have we had too much influence on the research agenda to the point where
other questions and methods are being overlooked?
Phil Brown of Ball and Brown Fame
"How Can We Do Better?" by Philip R. Brown (of Ball and Brown Fame),
Accounting Horizons (Forum on the State of Accounting Scholarship),
December 2013 
http://aaajournals.org/doi/full/10.2308/acch10365
Not Free
Philip R. Brown AM is an Honorary Professor at The
University of New South Wales and Senior Honorary Research Fellow at The
University of Western Australia.
I acknowledge the thoughtful comments of Sudipta Basu,
who arranged and chaired this session at the 2012 American Accounting
Association (AAA) Annual Meeting, Washington, DC.
The video presentation can be accessed by clicking the
link in Appendix A.
Corresponding author: Philip R. Brown AM.
Email:
philip.brown@uwa.edu.au
When Sudipta Basu asked me whether I
would join this panel, he was kind enough to share with me the proposal
he put to the conference organizers. As background to his proposal,
Sudipta had written:
Analytical and
empirical researchers generate numerous results about accounting, as
do logicians reasoning from conceptual frameworks. However, there
are few definitive tests that permit us to negate propositions about
good accounting.
This panel aims to
identify a few “most wrong” beliefs held by accounting
experts—academics, regulators, practitioners—where a “most wrong”
belief is one that is widespread and fundamentally misguided about
practices and users in any accounting domain.
While Sudipta's proposal resonated
with me, I did wonder why he asked me to join the panel, and whether I
am seen these days as just another “grumpy old man.” Yes, I am no doubt
among the oldest here today, but grumpy? You can make your own mind on
that, after you have read what I have to say.
This essay begins with
several gripes about editors, reviewers, and authors, along with
suggestions for improving the publication process for all concerned. The
next section contains observations on financial accounting standard
setting. The essay concludes with a discussion of research myopia,
namely, the unfortunate tendency of researchers to confine their work to
familiar territory, much like the drunk who searches for his keys under
the street light because “that is where the light is.”
ON EDITORS AND REVIEWERS, AND
AUTHORS 
I have never been a regular editor,
although I have chaired a journal's board of management and been a guest
editor, and I appointed Ray Ball to his first editorship (Ray was the
inaugural editor of the Australian Journal of Management). I
have, however, reviewed many submissions for a whole raft of journals,
and written literally hundreds of papers, some of which have been
published. As I reflect on my involvement in the publications process
over more than 50 years, I do have a few suggestions on how we can do
things better. In the spirit of this panel session, I have put my
suggestions in the form of gripes about editors, reviewers, and authors.
One-eyed editors (and reviewers) who
define the subject matter as outside their journal's interests are my
first gripe; and of course I except journals with a mission that is
stated clearly and in unequivocal terms for all to see. The best editors
and the best reviewers are those who are open-minded, who avoid
prejudging submissions by reference to some particular set of questions
or modes of thinking that have become popular over the last five years
or so. Graeme Dean, former editor of Abacus, and Nick Dopuch,
former editor of the Journal of Accounting Research, are fine
examples, from years gone by, of what it means to be an excellent
editor.
Editors who are reluctant to entertain
new ways of looking at old questions are a second gripe. Many years ago
I was asked to review a paper titled “The Last Word on …” (I will not
fill in the dots because the author may still be alive.) But at the time
I thought, what a strange title! Can any academic reasonably believe
they are about to have the last say on any important accounting issue?
We academics thrive on questioning previous works, and editors and their
reviewers do well when they nurture this mindset.
My third gripe concerns editors who,
perhaps unwittingly, send papers to reviewers with vested interests and
the reviewers do not just politely return the paper to the editor and
explain their conflict of interest. A fourth concerns editors and
reviewers who discourage replications: their actions signal a
disciplinary immaturity. I am referring to rejecting a paper that
repeats an experiment, perhaps in another country, purely because it has
been done before. There can be good reasons for replicating a study, for
example if the external validity of the earlier study legitimately can
be questioned (perhaps different outcomes are reasonably expected in
another institutional setting), or if methodological advances indicate a
likely design flaw. Last, there are editors and reviewers who do not
entertain papers that fail to reject the null hypothesis. If the
alternative is well-reasoned and the study is sound, and they can be big
“ifs,” then failure to reject the null can be informative, for it may
indicate where our knowledge is deficient and more work can be done.^{1}
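Illustration (not part of Brown's essay): a minimal Python sketch of why a sound study that fails to reject the null can still be informative. Everything here is an assumption made for illustration: the data are simulated, the names diff_ci, treated, and control are hypothetical, and the interval relies on a normal approximation. With large samples the confidence interval around a near-zero estimate is narrow, so large effects are ruled out even though the null is not rejected.

```python
import random
import statistics

def diff_ci(a, b, z=1.96):
    """Approximate 95% confidence interval for mean(a) - mean(b)."""
    diff = statistics.fmean(a) - statistics.fmean(b)
    # Standard error of a difference between two independent sample means.
    se = (statistics.variance(a) / len(a) + statistics.variance(b) / len(b)) ** 0.5
    return diff - z * se, diff + z * se

rng = random.Random(0)
# Simulated outcomes with no true difference between the two groups.
treated = [rng.gauss(0.0, 1.0) for _ in range(5000)]
control = [rng.gauss(0.0, 1.0) for _ in range(5000)]

low, high = diff_ci(treated, control)
print(f"95% CI for the difference: [{low:.3f}, {high:.3f}]")
```

The width of the interval, rather than the bare "not significant" verdict, is what tells a reader which effect sizes the data exclude.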
It is not only editors and reviewers
who test my emotional state. I do get a bit short when I review papers
that fail to appreciate that the ideas they are dealing with have long
yet uncited histories, sometimes in journals that are not based in North
America. I am particularly unimpressed when there is an
all-too-transparent and excessive citation of works by editors and
potential reviewers, as if the judgments of these folks could possibly
be influenced by that behavior. Other papers frustrate me when they are
technically correct but demonstrate the trivial or the obvious, and fail
to draw out the wider implications of their findings. Then there are
authors who rely on unnecessarily coarse “control” variables which, if
measured more finely, may well threaten their findings.^{2}
Examples are dummy variables for common law/code law countries, for
“high” this and “low” that, for the presence or absence of an
audit/nomination/compensation committee, or the use of an industry or
sector variable without saying which features of that industry or sector
are likely to matter and why a binary representation is best. In a
nutshell, I fear there may be altogether too many dummies in financial
accounting research!
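Illustration (not part of Brown's essay): a minimal Python sketch of how a coarse "high/low" dummy control can threaten a finding. The setup is entirely hypothetical: z is a simulated continuous confounder, x is the variable of interest, and y depends on z only, so the true coefficient on x is zero. The helper names residualize and partial_slope are assumptions; the partial coefficient is computed by residualizing both y and x on the control (the Frisch-Waugh-Lovell result).

```python
import random

def residualize(v, c):
    """Residuals from an OLS regression of v on a constant and c."""
    n = len(v)
    mv, mc = sum(v) / n, sum(c) / n
    slope = (sum((vi - mv) * (ci - mc) for vi, ci in zip(v, c))
             / sum((ci - mc) ** 2 for ci in c))
    return [(vi - mv) - slope * (ci - mc) for vi, ci in zip(v, c)]

def partial_slope(y, x, control):
    """Coefficient on x in the regression y ~ 1 + x + control."""
    ry, rx = residualize(y, control), residualize(x, control)
    return sum(a * b for a, b in zip(rx, ry)) / sum(a * a for a in rx)

def simulate(n=20000, seed=0):
    rng = random.Random(seed)
    z = [rng.gauss(0, 1) for _ in range(n)]        # continuous confounder
    x = [zi + rng.gauss(0, 1) for zi in z]         # correlated with z
    y = [2.0 * zi + rng.gauss(0, 1) for zi in z]   # true effect of x is zero
    dummy = [1.0 if zi > 0 else 0.0 for zi in z]   # coarse "high/low" control
    return partial_slope(y, x, dummy), partial_slope(y, x, z)

coarse, fine = simulate()
print(f"coefficient on x with dummy control: {coarse:.3f}")
print(f"coefficient on x with fine control:  {fine:.3f}")
```

In this simulated setting the dummy absorbs only part of the confounder's variation, so x picks up a spurious coefficient that largely disappears once z is measured finely.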
Finally, there are the
International Financial Reporting Standards (IFRS) papers that fit into
the category of what I describe as “before and after studies.” They
focus on changes following the adoption of IFRS promulgated by the
London-based International Accounting Standards Board (IASB). A major
concern, and I have been guilty too, is that these papers, by and large,
do not deal adequately with the dynamics of what has been for many
countries a period of profound change. In particular, there is a
tradeoff between (1) experimental noise from including too long a
“before” and “after” history, and (2) not accommodating the process of
change, because the “before” and “after” periods are way too short.
Neither do they appear to control convincingly for other time-related
changes, such as the introduction of new accounting and auditing
standards, amendments to corporations laws and stock exchange listing
rules, the adoption of corporate governance codes of conduct, more
stringent compliance monitoring and enforcement mechanisms, or changes
in, say, stock market liquidity as a result of the introduction of new
trading platforms and protocols, amalgamations among market providers,
the explosion in algorithmic trading, and the increasing popularity
among financial institutions of trading in “dark pools.”
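Illustration (not part of Brown's essay): a minimal Python sketch of how a "before and after" comparison can attribute a concurrent time-related change to IFRS adoption. The numbers are invented: a hypothetical liquidity index rises by 1.0 per year for every firm, and adoption in 2005 adds nothing. Benchmarking against a non-adopting group is a swapped-in remedy chosen for this sketch, not a method the essay proposes, and the names before_after, adopters, and non_adopters are assumptions.

```python
def mean(values):
    return sum(values) / len(values)

def before_after(series, adoption_year):
    """Mean of the 'after' years minus mean of the 'before' years."""
    before = [v for year, v in series if year < adoption_year]
    after = [v for year, v in series if year >= adoption_year]
    return mean(after) - mean(before)

# Hypothetical index with a common upward trend and no adoption effect.
years = range(2001, 2009)
adopters = [(y, 10.0 + 1.0 * (y - 2001)) for y in years]
non_adopters = [(y, 8.0 + 1.0 * (y - 2001)) for y in years]

naive = before_after(adopters, 2005)          # trend mistaken for an effect
benchmark = before_after(non_adopters, 2005)  # same trend, no adoption
adjusted = naive - benchmark                  # change net of the common trend

print(f"naive before/after change: {naive}")
print(f"change net of benchmark:   {adjusted}")
```

The benchmark only helps if the two groups really share the same time-related changes, which is exactly the assumption the paragraph above says is hard to defend.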
ON FINANCIAL ACCOUNTING STANDARD
SETTING 
I count a number of highly experienced
financial accounting standard setters among my friends and professional
acquaintances, and I have great regard for the difficulties they face in
what they do. Nonetheless, I do wonder
. . .
A not uncommon belief among academics
is that we have been or can be a help to accounting standard setters. We
may believe we can help by saying something important about whether a
new financial accounting standard, or set of standards, is an
improvement. Perhaps we feel this way because we have chosen some
predictive criterion and been able to demonstrate a statistically
reliable association between accounting information contained in some
database and outcomes that are consistent with that criterion. Ball and
Brown (1968, 160) explained the choice of criterion this way: “An
empirical evaluation of accounting income numbers requires agreement as
to what real-world outcome constitutes an appropriate test of
usefulness.” Note their reference to a requirement to agree on the test.
They were referring to the choice of criterion being important to the
persuasiveness of their tests, which were fundamental and related to the
“usefulness” of U.S. GAAP income numbers to stock market investors 50
years ago. As time went by and the financial accounting literature grew
accordingly, financial accounting researchers have looked in many
directions for capital market outcomes in their quest for publishable
results.
Research on IFRS can be used to
illustrate my point. Those who have looked at the consequences of IFRS
adoption have mostly studied outcomes they believed would interest
participants in equity markets and to a lesser extent parties to debt
contracts. Many beneficial outcomes have now been claimed,^{4}
consistent with benefits asserted by advocates of IFRS. Examples are
more comparable accounting numbers; earnings that are higher “quality”
and less subject to managers' discretion; lower barriers to
international capital flows; improved analysts' forecasts; deeper and
more liquid equity markets; and a lower cost of capital. But the
evidence is typically coarse in nature; and so often the results are
inconsistent because of the different outcomes selected as tests of
“usefulness,” or differences in the samples studied (time periods,
countries, industries, firms, etc.) and in research methods (how models
are specified and variables measured, which estimators are used, etc.).
The upshot is that it can be difficult if not impossible to reconcile
the many inconsistencies, and for standard setters to relate reported
findings to the judgments they must make.
Despite the many largely capital
market outcomes that have been studied, some observers of our efforts
must be disappointed that other potentially beneficial outcomes of
adopting IFRS have largely been overlooked. Among them are the wider
benefits to an economy that flow from EU membership (IFRS are required),^{5}
or access to funds provided by international agencies such as the World
Bank, or less time spent by CFOs of international companies when
comparing the financial performance of divisions operating in different
countries and on consolidating the financial statements of foreign
subsidiaries, or labor market benefits from more flexibility in the
supply of professionally qualified accountants, or “better” accounting
standards from pooling the skills of standard setters in different
jurisdictions, or less costly and more consistent professional advice
when accounting firms do not have to deal with as much cross-country
variation in standards and can concentrate their high-level technical
skills, or more effective compliance monitoring and enforcement as
regulators share their knowledge and experience, or the usage of IFRS by
“millions (of small and medium enterprises) in more than 80 countries” (Pacter
2012), or in some cases better education of tomorrow's accounting
professionals.^{6}
I am sure you could easily add to this list if you wished.
In sum, we can help standard setters,
yes, but only in quite limited ways.^{7}
Standard setting is inherently political in nature and will remain that
way as long as there are winners and losers when standards change. That
is one issue. Another is that the results of capital markets studies are
typically too coarse to be definitive when it comes to the detailed
issues that standard setters must consider. A third is that accounting
standards have ramifications extending far beyond public financial
markets and a much more expansive view needs to be taken before we can
even hope to understand the full range of benefits (and costs) of
adopting IFRS.
Let me end my remarks
with a question: Have Ball and Brown (1968)—and Beaver (1968) for that
matter, if I can bring Bill Beaver into it—have we had too much
influence on the research agenda to the point where other questions and
methods are being overlooked?
February 27, 2014 Reply from Paul Williams
Bob,
If you read that last Horizons section provided by "thought leaders" you
realize the old guys are not saying anything they could not have realized 30
years ago. That they didn't realize it then (or did, but it was not in their
interest to say so), which led them to run journals whose singular purpose
seemed to be to enable them and their cohorts to create politically correct
academic reputations, is not something to ask forgiveness for at the end of
your career.
Like the sinner on his deathbed asking
for God's forgiveness, now is a hell of a time to suddenly get religion. If
you heard these fellows speak when they were young they certainly didn't
speak with voices that adumbrated any doubt that what they were doing was
rigorous research and anyone doing anything else was the intellectual hoi
polloi.
Oops, sorry we created an academy that
all of us now regret, but, hey, we got ours. It's our mess, but now we are
telling you it's a mess you have to clean up. It isn't like no one was saying
these things 30 years ago (you were as well as others including yours truly)
and we have intimate knowledge of how we were treated by these geniuses.
Shielding Against Validity Challenges in Plato's Cave 
http://www.trinity.edu/rjensen/TheoryTAR.htm
Common Accountics Science and Econometric Science Statistical Mistakes 
http://www.cs.trinity.edu/~rjensen/temp/AccounticsScienceStatisticalMistakes.htm
The Cult of Statistical Significance:
How Standard Error Costs Us Jobs, Justice, and Lives 
http://www.cs.trinity.edu/~rjensen/temp/DeirdreMcCloskey/StatisticalSignificance01.htm
How Accountics Scientists Should Change:
"Frankly, Scarlett, after I get a hit for my resume in The Accounting Review
I just don't give a damn"
http://www.cs.trinity.edu/~rjensen/temp/AccounticsDamn.htm
One more mission in what's left
of my life will be to try to change this
http://www.cs.trinity.edu/~rjensen/temp/AccounticsDamn.htm
What went wrong in accounting/accountics research? 
http://www.trinity.edu/rjensen/theory01.htm#WhatWentWrong
The Sad State of Accountancy Doctoral
Programs That Do Not Appeal to Most Accountants 
http://www.trinity.edu/rjensen/theory01.htm#DoctoralPrograms