Akshay Ranganath

Notes from "Naked Statistics" by Charles Wheelan

Blog Post created by Akshay Ranganath Employee on Dec 9, 2015

I recently read the book "Naked Statistics: Stripping the Dread from the Data" by Charles Wheelan. It had a lot of interesting points about statistics, so I took notes as I went. I thought I'd share them here.

 

These notes could be useful to someone who is embarking on Web Performance and Real User Monitoring.

 

It's easy to lie with statistics, but it's hard to tell the truth without them - Andrejs Dunkels. (p xv)

 

Descriptive statistics exist to simplify, which always implies some loss of nuance or detail. Anyone working with numbers needs to recognize as much. (p 6)

 

Regression analysis is the tool that enables researchers to isolate a relationship between two variables.. while holding constant ("controlling for") the effects of other important variables. (p 11)

 

Limitation of regression analysis: We can isolate a strong association between two variables by using statistical analysis, but we cannot necessarily explain why that relationship exists. (p 12)

 

Median is the point that divides a distribution in half, meaning half the observations lie above the median and half lie below. (p 19)

 

Standard deviation - how dispersed the data are from their mean. (p 23)
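
To make this concrete for performance data, here is a minimal sketch using Python's standard `statistics` module. The two lists of page load times are made-up numbers (my assumption, not from the book), chosen so that both pages have the same mean but very different spread.

```python
import statistics

# Hypothetical page load times in milliseconds for two pages (made-up data).
steady_ms = [1900, 2000, 2100, 2000]
spiky_ms = [500, 3500, 500, 3500]

# Both pages have the same mean load time...
steady_mean = statistics.mean(steady_ms)
spiky_mean = statistics.mean(spiky_ms)

# ...but very different dispersion around that mean.
# pstdev treats each list as the whole population rather than a sample.
steady_sd = statistics.pstdev(steady_ms)
spiky_sd = statistics.pstdev(spiky_ms)

print(f"steady: mean={steady_mean} ms, sd={steady_sd:.1f} ms")
print(f"spiky:  mean={spiky_mean} ms, sd={spiky_sd:.1f} ms")
```

Reporting only the mean would make these two pages look identical, which is why the spread is worth a second number.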

 

.. an index, which is a descriptive statistic made up of other descriptive statistics..

The advantage of any index is that it consolidates lots of complex information into a single number. We can then rank things that otherwise defy simple comparison..

The disadvantage of any index is that it consolidates lots of complex information into a single number. (p 30)

 

Precision reflects the exactitude with which we can express something. (p 37)

 

Accuracy is a measure of whether a figure is broadly consistent with the truth - hence the danger of confusing precision with accuracy. If an answer is accurate, then more precision is usually better. But no amount of precision can make up for inaccuracy. (p 37)

 

The key lesson is to pay attention to the unit of analysis. Who or what is being described, and is that different from the "who" or "what" being described by someone else? (p 41)

 

The median is not sensitive to outliers (p 43)

.. the median can also do its share of dissembling because it is not sensitive to outliers. (p 43)

 

From a standpoint of accuracy, the median versus mean question revolves around whether the outliers in a distribution distort what is being described or are instead an important part of the message. (p 44)
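
A small sketch of the mean-versus-median question, again with made-up load times (an assumption for illustration). One request out of six hangs for 30 seconds; the mean moves a lot, the median does not move at all.

```python
import statistics

# Hypothetical load times in milliseconds; the last request hit a 30 s hang.
load_times_ms = [1200, 1300, 1250, 1350, 1300, 30000]

mean_ms = statistics.mean(load_times_ms)
median_ms = statistics.median(load_times_ms)

print(f"mean:   {mean_ms:.0f} ms")   # pulled far upward by the single outlier
print(f"median: {median_ms:.0f} ms")  # unaffected by how large the outlier is
```

Which number is "right" depends on the question. If the 30-second hang is a measurement glitch, the median describes the typical user better. If real users are actually waiting 30 seconds, the outlier is the message and the median hides it.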

 

Percentages don't lie - but they can exaggerate. One way to make growth look explosive is to use percentage change to describe some change relative to a low starting point. (p 48)

 

Obviously the flip side is true. A small percentage of an enormous sum can be a big number. (p 49)
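
Both sides of the percentage story fit in a few lines. This is only a sketch; the error counts, page views, and failure rate are hypothetical numbers I picked for illustration.

```python
def percent_change(old, new):
    """Percentage change from old to new, relative to old."""
    return (new - old) / old * 100

# Low starting point: a tiny absolute change looks explosive as a percentage.
errors_last_week, errors_this_week = 2, 6
growth_pct = percent_change(errors_last_week, errors_this_week)
absolute_increase = errors_this_week - errors_last_week

# Flip side: a small percentage of an enormous total is still a big number.
page_views = 10_000_000
failure_rate_pct = 0.5
failed_views = page_views * failure_rate_pct / 100

print(f"errors grew {growth_pct:.0f}% (only {absolute_increase} more errors)")
print(f"{failure_rate_pct}% of {page_views:,} views is {failed_views:,.0f} failures")
```

Showing the absolute number next to the percentage is usually enough to keep either one from misleading.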

 

"You can't manage what you can't measure." True. But you had better be darn sure that what you are measuring is really what you are trying to manage. (p 51)

 

A detailed knowledge of statistics does not deter wrongdoing any more than a detailed knowledge of the law averts criminal behavior. With both statistics and crime, the bad guys often know exactly what they're doing! (p 57)

 

Correlation measures the degree to which two phenomena are related to one another.. Two variables are positively correlated if a change in one is associated with a change in the other in the same direction.. A correlation is negative if a positive change in one variable is associated with a negative change in the other.. The tricky thing about these kinds of associations is that not every observation fits the pattern. (p 59)

 

The power of correlation as a statistical tool is that we can encapsulate an association between two variables in a single descriptive statistic: the correlation coefficient.. The second attractive feature of the correlation coefficient is that it has no units attached to it. (p 60)
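
Here is a rough sketch of the correlation coefficient, computed by hand with the usual Pearson formula (my choice of formula; the notes above don't spell one out). The page weights and load times are invented. The second call converts milliseconds to seconds to illustrate the "no units" point: the coefficient stays the same.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical observations: page weight (KB) and load time (ms).
page_weight_kb = [300, 500, 800, 1200, 2000]
load_time_ms = [900, 1400, 1600, 2600, 3900]

r_ms = pearson_r(page_weight_kb, load_time_ms)

# Same load times expressed in seconds: the coefficient should not change.
load_time_s = [t / 1000 for t in load_time_ms]
r_s = pearson_r(page_weight_kb, load_time_s)

print(f"r with load time in ms: {r_ms:.4f}")
print(f"r with load time in s:  {r_s:.4f}")
```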

 

.. the law of large numbers tells us that as the number of independent trials increases, the average of the outcomes will get closer and closer to its expected value. (p 79)
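
The law of large numbers is easy to watch in a simulation. This sketch rolls a fair six-sided die (my example, not the book's) and compares the running average with the expected value of 3.5. The seed is fixed only so that the run is repeatable.

```python
import random

def average_roll(num_rolls, rng):
    """Average of num_rolls independent rolls of a fair six-sided die."""
    return sum(rng.randint(1, 6) for _ in range(num_rolls)) / num_rolls

rng = random.Random(42)  # fixed seed so the run is repeatable
expected_value = 3.5     # (1 + 2 + 3 + 4 + 5 + 6) / 6

# Average over more and more independent trials.
averages = {n: average_roll(n, rng) for n in (10, 1_000, 100_000)}

for n, avg in averages.items():
    gap = abs(avg - expected_value)
    print(f"{n:>7} rolls: average={avg:.3f}, gap from 3.5={gap:.3f}")
```

The gap does not have to shrink at every step, but with many trials it becomes very unlikely to be large.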

 

.. one of the core lessons of personal finance - is that you should always insure yourself against any adverse contingency that you cannot comfortably afford to withstand. (p 82)

 

The greatest risks are never the ones you can see and measure, but the ones you can't see and therefore can never measure. The ones that seem so far outside the boundary of normal probability that you can't imagine they could happen in your lifetime - even though, of course, they do happen, more often than you care to realize. (p 99)

 

Probability doesn't make mistakes, people using probability make mistakes. (p 100)

 

People's intuitive conceptions of randomness depart systematically from the laws of chance. (p 103)

 

"reversion to mean" - Probability tells us that any outlier - an observation that is particularly far from the mean in one direction or the other - is likely to be followed by outcomes that are more consistent with the long-term average. (p 105)

 

..In other words, when a CEO appears on the cover of Businessweek, sell the stock. (p 107)
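
Reversion to the mean can be sketched with a deliberately extreme assumption: every score is pure luck, with no skill at all. That assumption is mine and is stronger than anything the book claims about CEOs. The "stars" of round one are picked, and then the same people are scored again in round two.

```python
import random
import statistics

rng = random.Random(7)  # fixed seed so the run is repeatable
num_players = 10_000

# Assumption: each score is pure luck, drawn from the same distribution.
round_one = [rng.gauss(0, 1) for _ in range(num_players)]
round_two = [rng.gauss(0, 1) for _ in range(num_players)]

# The "stars" are the top 10% of performers in round one (the outliers).
top_n = num_players // 10
cutoff = sorted(round_one)[-top_n]
stars = [i for i, score in enumerate(round_one) if score >= cutoff]

stars_round_one = statistics.fmean([round_one[i] for i in stars])
stars_round_two = statistics.fmean([round_two[i] for i in stars])

print(f"stars in round one: {stars_round_one:.2f}")
print(f"same stars in round two: {stars_round_two:.2f}")
```

With real performance there is usually some skill mixed in, so the drop back toward the average would be partial rather than complete.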

 

The broader point here is that our ability to analyze data has grown far more sophisticated than our thinking about what we ought to do with the results.. For all the elegance and precision of probability, there is no substitute for thinking about what calculations we are doing and why we are doing them. (p 109)

 

The core principle underlying the central limit theorem is that a large, properly drawn sample will resemble the population from which it is drawn. (p 128)

 

According to the central limit theorem, the sample means for any population will be distributed roughly as a normal distribution around the population mean. (p 132)

 

The standard error measures the dispersion of the sample means... The standard error is the standard deviation of the sample means! (p 136)
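
The central limit theorem and the standard error can both be checked with a simulation. In this sketch the population is skewed (exponential with mean 1 and standard deviation 1, my choice), yet the sample means cluster around the population mean. The comparison value uses the standard formula "population standard deviation divided by the square root of the sample size", which is not quoted in my notes above.

```python
import math
import random
import statistics

rng = random.Random(0)  # fixed seed so the run is repeatable
sample_size = 100
num_samples = 2_000

def sample_mean(n):
    """Mean of n draws from a skewed population (exponential, mean 1, sd 1)."""
    return statistics.fmean([rng.expovariate(1.0) for _ in range(n)])

sample_means = [sample_mean(sample_size) for _ in range(num_samples)]

# The standard error is the standard deviation of the sample means.
observed_se = statistics.stdev(sample_means)
predicted_se = 1.0 / math.sqrt(sample_size)  # population sd / sqrt(n)

print(f"mean of sample means: {statistics.fmean(sample_means):.3f}")
print(f"observed standard error:  {observed_se:.3f}")
print(f"predicted standard error: {predicted_se:.3f}")
```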

 

Statistics cannot prove anything with certainty. Instead, the power of statistical inference derives from observing some pattern or outcome and then using probability to determine the most likely explanation for that outcome. (p 144)

 

The point of statistics is not to do myriad rigorous mathematical calculations; the point is to gain insight into meaningful social phenomena. Statistical inference is really just the marriage of two concepts.. data and probability (with a little help from the central limit theorem). (p 146)

 

One of the most common tools in statistical inference is hypothesis testing.. Any statistical inference begins with an implicit or explicit null hypothesis. This is our starting assumption, which will be rejected or not on the basis of subsequent statistical analysis. (p 146) It may seem counterintuitive, but researchers often create a null hypothesis in the hope of being able to reject it. (p 148)

 

The p-value is the specific probability of getting a result at least as extreme as the one you've observed if the null hypothesis is true. (p 152)

 

.. when we can reject a null hypothesis at some reasonable significance level, the results are said to be "statistically significant". (p 153)
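
A worked p-value helps here. The example is mine: a coin is flipped 100 times and lands heads 60 times, and the null hypothesis is that the coin is fair. The sketch adds up the exact binomial probabilities of getting 60 or more heads. It is a one-sided test, and the 0.05 threshold is just a common convention that I am assuming.

```python
from math import comb

def p_value_at_least(heads, flips, p_heads=0.5):
    """P(X >= heads) for X ~ Binomial(flips, p_heads): a one-sided p-value."""
    return sum(
        comb(flips, k) * p_heads**k * (1 - p_heads) ** (flips - k)
        for k in range(heads, flips + 1)
    )

# Null hypothesis: the coin is fair. Observed: 60 heads in 100 flips.
p_value = p_value_at_least(60, 100)

significance_level = 0.05  # a common convention, assumed here
is_significant = p_value < significance_level

print(f"p-value: {p_value:.4f}")
print(f"reject the null at {significance_level}? {is_significant}")
```

The p-value comes out at roughly 0.03, so a fair coin would rarely produce a result this lopsided. That is not proof the coin is biased, only a reason to doubt the starting assumption.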

 

The fundamental difference between a poll and other forms of sampling is that the sample statistic we care about will not be a mean but rather a percentage or proportion. (p 171)
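
For a proportion, the standard error is usually computed as the square root of p(1 - p)/n. The sketch below applies that formula to a hypothetical poll of 1,000 people in which half say yes. The 1.96 multiplier for a 95% margin comes from the normal approximation and is my addition.

```python
import math

def proportion_standard_error(p, n):
    """Standard error of a sample proportion p from n respondents."""
    return math.sqrt(p * (1 - p) / n)

# Hypothetical poll: 1,000 respondents, 50% answer "yes".
share_yes = 0.5
respondents = 1000

se = proportion_standard_error(share_yes, respondents)
margin_95 = 1.96 * se  # approximate 95% margin of error

print(f"standard error: {se:.4f}")
print(f"95% margin of error: +/- {margin_95 * 100:.1f} percentage points")
```

This covers only sampling error. It says nothing about a biased sample or a badly worded question.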

 

Bad polling results do not typically stem from bad math when calculating the standard errors. Bad polling typically stems from a biased sample, or bad questions, or both. (p 178) The real challenge of polling is twofold: finding and reaching that proper sample; and eliciting information from that representative group in a way that accurately reflects what its members believe. (p 183)

 

3 things to keep in mind in polling: (p 178-181)

  • Is this an accurate sample of the population whose opinions we're trying to capture?
  • Have the questions been posed in a way that elicits accurate information on the topic of interest? When we solicit public opinion, the phrasing of the question and the choice of language can matter enormously.
  • Are respondents telling the truth?

 

.. regression analysis allows us to quantify the relationship between a particular variable and an outcome that we care about while controlling for other factors. In other words, we can isolate the effect of one variable.. while holding the effects of other variables constant. (p 186)

 

The problem is that the mechanics of regression analysis are not the hard part; the hard part is determining which variables ought to be considered in the analysis and how that can best be done. (p 187)

 

Once we have a coefficient and a standard error, we can test the null hypothesis that there is in fact no relationship between the explanatory variable and the dependent variable. (p 197)
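
To show where the coefficient and its standard error come from, here is a minimal sketch of a one-variable least-squares regression written out by hand. The data (number of third-party scripts against load time in seconds) are invented, and a single explanatory variable is a simplification: it does not control for anything else. The t-statistic is the coefficient divided by its standard error.

```python
import math

def simple_regression(xs, ys):
    """Ordinary least squares for y = intercept + slope * x.

    Returns (slope, intercept, standard error of the slope).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual variance uses n - 2 degrees of freedom (two fitted parameters).
    residual_ss = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    slope_se = math.sqrt(residual_ss / (n - 2) / sxx)
    return slope, intercept, slope_se

# Hypothetical data: third-party scripts on a page vs. load time in seconds.
third_party_scripts = [1, 2, 3, 4, 5]
load_time_s = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept, slope_se = simple_regression(third_party_scripts, load_time_s)

# Null hypothesis: the true slope is zero (no relationship).
t_statistic = slope / slope_se

print(f"slope={slope:.2f} s per script, standard error={slope_se:.3f}")
print(f"t-statistic={t_statistic:.1f}")
```

A coefficient many standard errors away from zero is hard to square with "no relationship", so the null hypothesis would be rejected here. With only five made-up points this is an illustration of the mechanics, not evidence of anything.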

 

Regression analysis is the hydrogen bomb of the statistics arsenal. (p 213)

 

Reverse causality: A statistical association between A and B does not prove that A causes B. In fact, it's entirely plausible that B is causing A. (p 216)
