Just go out and collect enormous amounts of data, put it through the right pre-processing and turn the crank on various statistical learning algorithms – and all your problems will be solved. That is the argument being made in this opinion piece: Alon Halevy, Peter Norvig and Fernando Pereira, The Unreasonable Effectiveness of Data, IEEE […]

# Why do simple techniques work?

My past few posts have been driven by an underlying question that was pointedly raised by someone in a discussion group I follow on LinkedIn (if you’re curious, this is a Quant Finance group that I follow due to my interest in autonomous agent design, and the question was posed by a hedge fund person […]

# ET Interview with Prof. C.R. Rao

My colleague Michael Herrmann kicked off our Information Geometry reading group last week with a discussion on the Cramér-Rao bound. We’re taking an easy tour around various areas of statistics and geometry before jumping into Amari’s book and related material. By way of introduction, he mentioned an interview article (The ET Interview: Professor C.R. Rao, […]

# The Bad Egg Problem

Quick Summary: How many eggs does one need to randomly sample from a box containing one dozen eggs to be 50% or more confident that there are no bad eggs? An interesting aspect of grocery shopping is that it offers tons of opportunities to hone your statistics skills. Here is one such instance I […]
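The excerpt is cut off before it states its assumptions, so the following is only a sketch of one building block, not the post's own solution. It assumes the box holds a known number of bad eggs (the parameter `bad`, a hypothetical name) and computes the hypergeometric probability that a sample drawn without replacement contains none of them. The function name `prob_sample_all_good` is likewise an illustration.

```python
from math import comb


def prob_sample_all_good(total: int, bad: int, n: int) -> float:
    """Probability that n eggs drawn without replacement are all good.

    Assumes (hypothetically) that exactly `bad` of the `total` eggs are bad.
    """
    if not 0 <= bad <= total:
        raise ValueError("bad must be between 0 and total")
    if not 0 <= n <= total:
        raise ValueError("n must be between 0 and total")
    # Ways to pick n eggs from the good ones, over ways to pick n from all.
    # math.comb returns 0 when n exceeds the number of good eggs.
    return comb(total - bad, n) / comb(total, n)


if __name__ == "__main__":
    # How quickly a clean sample becomes unlikely if one egg in the dozen is bad.
    for n in range(0, 13):
        print(n, prob_sample_all_good(12, 1, n))
```

Under this assumption with a single bad egg, the probability reduces to (12 - n) / 12, so a sample of six misses the bad egg exactly half the time. Turning such likelihoods into a confidence that the box has no bad eggs would also need a prior over how many bad eggs a box contains, which the excerpt does not give.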

# Statistical inference

A statistical hypothesis is a hypothesis that is testable on the basis of observing a process that is modeled via a set of random variables.[1] A statistical hypothesis test is a method of statistical inference. Commonly, two statistical data sets are compared, or a data set obtained by sampling is compared against a synthetic data […]

# Analysis of variance (ANOVA)

Analysis of variance (ANOVA) is a collection of statistical models used to analyze the differences among group means and their associated procedures (such as “variation” among and between groups), developed by statistician and evolutionary biologist Ronald Fisher. In the ANOVA setting, the observed variance in a particular variable is partitioned into components attributable to different […]

# Latin square

The “Gamma plus two” method for generating “odd order” magic squares, the “Gamma plus two plus swap” method for generating “singly even order” magic squares, and Dürer’s method for generating “doubly even order” magic squares. By Professor Edward Brumgnach, P.E., City University of New York, Queensborough Community College. In combinatorics and in experimental design, a Latin square is […]

# Simpson’s paradox

In probability and statistics, Simpson’s paradox, or the Yule–Simpson effect, is a paradox in which a trend that appears in several different groups of data disappears or reverses when these groups are combined. This result is often encountered in social-science and medical-science statistics,[1] and is particularly confounding when frequency […]
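The reversal is easiest to see with numbers. The sketch below uses counts invented purely for illustration (they are not from the post or from any study): option A has the higher success rate inside each group, yet option B has the higher rate once the groups are pooled, because the two options are spread very unevenly across the groups.

```python
from fractions import Fraction

# Hypothetical (successes, trials) counts, made up for illustration only.
counts = {
    "group 1": {"A": (8, 10), "B": (70, 100)},
    "group 2": {"A": (30, 100), "B": (2, 10)},
}


def rate(successes: int, trials: int) -> Fraction:
    """Exact success rate as a fraction."""
    return Fraction(successes, trials)


def pooled_rate(option: str) -> Fraction:
    """Success rate for one option after combining all groups."""
    successes = sum(group[option][0] for group in counts.values())
    trials = sum(group[option][1] for group in counts.values())
    return Fraction(successes, trials)


# A beats B within every group...
a_wins_in_every_group = all(
    rate(*group["A"]) > rate(*group["B"]) for group in counts.values()
)
# ...but B beats A in the aggregate.
a_wins_overall = pooled_rate("A") > pooled_rate("B")

if __name__ == "__main__":
    for name, group in counts.items():
        print(name, float(rate(*group["A"])), float(rate(*group["B"])))
    print("pooled", float(pooled_rate("A")), float(pooled_rate("B")))
    print(a_wins_in_every_group, a_wins_overall)
```

Exact fractions are used so the comparisons do not depend on floating-point rounding.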

# Boy or girl paradox

Published on Mar 8, 2016. TED-Ed presented a riddle last week based on a classic probability problem. However, in the riddle there is a small and seemingly insignificant detail that changes the calculation. In this video I present the pertinent details of the frog riddle, explain its connection to the boy or girl paradox, and […]

# Hill’s criteria for causation

The Bradford Hill criteria, otherwise known as Hill’s criteria for causation, are a group of minimal conditions necessary to provide adequate evidence of a causal relationship between an incidence and a possible consequence, established by the English epidemiologist Sir Austin Bradford Hill (1897–1991) in 1965. The list of the criteria is as follows: Strength (effect […]