Posts tagged statistics
“Karl Broman is here putting forward a very interesting problem. Interesting, not only because it involves socks, but because it involves what I would like to call Tiny Data™. The problem is this: Given the Tiny dataset of eleven unique socks, how many socks does Karl Broman have in his laundry in total?”
(via Rasmus Bååth)
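The problem is easy to state but hard to answer, and a few lines of code show why. This is a minimal sketch under a simplifying assumption that is not part of the original problem: every sock in the laundry belongs to a complete pair, so a laundry of n pairs holds 2n socks. Under that assumption, the chance that 11 socks drawn at random are all unique is C(n, 11) · 2¹¹ / C(2n, 11). You choose 11 distinct pairs, then one of the two socks from each, and divide by the number of ways to pick any 11 socks. The helper name `p_all_unique` is hypothetical.

```python
from fractions import Fraction
from math import comb


def p_all_unique(k: int, n_pairs: int) -> Fraction:
    """Probability that k socks drawn without replacement contain no matching pair.

    Simplifying assumption (hypothetical model): the laundry holds n_pairs
    complete pairs, i.e. 2 * n_pairs socks and no singletons.
    """
    if n_pairs < k:
        # Fewer than k distinct pairs exist, so two drawn socks must match.
        return Fraction(0)
    # Pick k distinct pairs, then one of the two socks in each chosen pair.
    favourable = comb(n_pairs, k) * 2**k
    # Any k socks out of the 2 * n_pairs in the laundry.
    possible = comb(2 * n_pairs, k)
    return Fraction(favourable, possible)


# Likelihood of the observation "11 socks, all unique" for several laundry sizes.
for n in (11, 15, 20, 30, 50, 100):
    print(f"{n:>4} pairs: {float(p_all_unique(11, n)):.3f}")
```

The probability keeps rising as the laundry grows, so the observation alone cannot single out a total. A prior belief about laundry size is needed to turn this into an estimate. A realistic model would also have to allow for unpaired socks, which this sketch leaves out.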
IndexMundi contains detailed country statistics, charts, and maps compiled from multiple sources. You can explore and analyze thousands of indicators organized by region, country, topic, industry sector, and type.
What does “causality” mean, and how can you represent it mathematically? How can you encode causal assumptions, and what bearing do they have on data analysis? These types of questions are at the core of the practice of data science, but deep knowledge about them is surprisingly uncommon.
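One common way to write causal assumptions down is a structural causal model. It is shown here as an illustration, not as the notation the posts in the series necessarily use. Each variable is assigned by a function of its direct causes plus independent noise terms U, and the causal assumptions are exactly the choices of which arguments each function may take. For a hypothetical three-variable system in which Z influences both a treatment X and an outcome Y:

```latex
\begin{aligned}
Z &:= f_Z(U_Z), \\
X &:= f_X(Z, U_X), \\
Y &:= f_Y(X, Z, U_Y).
\end{aligned}

\begin{aligned}
P(Y = y \mid X = x) &= \sum_z P(Y = y \mid X = x, Z = z)\, P(Z = z \mid X = x), \\
P(Y = y \mid \operatorname{do}(X = x)) &= \sum_z P(Y = y \mid X = x, Z = z)\, P(Z = z).
\end{aligned}
```

The first line is ordinary conditioning, and the second (the backdoor adjustment formula) describes setting X by intervention. They differ only in how the strata of Z are weighted. That is why an observed association between X and Y need not equal the causal effect of X when Z drives both.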
We’ve all heard in school that “correlation does not imply causation,” but what does imply causation?! The gold standard for establishing cause and effect is a double-blind controlled trial (or its A/B test equivalent). If you’re working with a system on which you can’t perform experiments, is all hope for scientific progress lost? Can we ever understand systems over which we have limited or no control? That would be a very bleak state of affairs. Fortunately, there has been real progress showing that hope is not lost: given explicit causal assumptions, causal effects can sometimes be estimated from observational data alone!
So what is causality good for? Any time you decide to take an action, in a business context or otherwise, you’re making some assumptions about how the world operates. That is, you’re making assumptions about the causal effects of possible actions.
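A tiny simulation makes the point about observational data concrete. The model is hypothetical. A binary confounder Z raises both the chance of treatment X and the outcome Y, and the true effect of X on Y is fixed at 2.0 by construction. Comparing treated and untreated units directly mixes Z’s effect into the estimate. Stratifying on Z and weighting each stratum’s difference by P(Z = z) is the backdoor adjustment, and it recovers the causal effect. The adjustment is valid here only because Z is assumed to be the sole confounder and is observed. A randomized A/B test would instead break the Z → X link by assigning X with a coin flip.

```python
import random


def simulate(n: int, seed: int = 0):
    """Draw n rows (z, x, y) from a hypothetical model: Z -> X, Z -> Y, X -> Y.

    The true causal effect of X on Y is 2.0 by construction.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        z = 1 if rng.random() < 0.5 else 0                  # confounder
        x = 1 if rng.random() < (0.8 if z else 0.2) else 0  # treatment depends on Z
        y = 2.0 * x + 3.0 * z + rng.gauss(0.0, 1.0)         # outcome depends on X and Z
        rows.append((z, x, y))
    return rows


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def naive_effect(rows):
    """Difference in mean outcome between treated and untreated, ignoring Z."""
    return (mean(y for z, x, y in rows if x == 1)
            - mean(y for z, x, y in rows if x == 0))


def adjusted_effect(rows):
    """Backdoor adjustment: per-stratum differences weighted by P(Z = z)."""
    total = 0.0
    for z_val in (0, 1):
        stratum = [(x, y) for z, x, y in rows if z == z_val]
        p_z = len(stratum) / len(rows)
        diff = (mean(y for x, y in stratum if x == 1)
                - mean(y for x, y in stratum if x == 0))
        total += p_z * diff
    return total


rows = simulate(100_000)
print(f"naive difference:    {naive_effect(rows):.2f}")     # inflated by the confounder
print(f"adjusted difference: {adjusted_effect(rows):.2f}")  # should be close to 2.0
```

In this setup the naive comparison lands near 3.8, because treated units are mostly ones with Z = 1. The adjusted estimate stays near the built-in effect of 2.0.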
I started a series of posts aimed at helping people learn about causality in data science (and science in general), and wanted to compile them all together here in a living index.
The Gini Coefficient, which can measure inequality in any set of numbers, has been in use for a century, but until recently it rarely left the halls of academia. Its one-number simplicity endeared it to political scientists and economists; its usual subject—economic inequality—made it popular with sociologists and policy makers. The Gini Coefficient has been the sort of workhorse metric that college freshmen learn about in survey courses and some PhD statisticians devote a lifetime to. It’s been so useful, so adaptable, that its strange history has survived only as a footnote: the coefficient was developed in 1912 by Corrado Gini, an Italian sociologist and statistician—who also wrote a paper called “The Scientific Basis of Fascism.”
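To see the one number being computed, here is a minimal sketch of one standard definition. It sums the absolute differences over all ordered pairs of values i, j, that is |x_i − x_j|, and divides by 2 · n² · mean(x). It assumes non-negative values with a positive total. The function names `gini` and `gini_pairwise` are illustrative. The sorted-data form avoids the O(n²) double sum, and the direct definition is included as a cross-check.

```python
from itertools import product


def gini(values):
    """Gini coefficient of non-negative values (population form, no n/(n-1) correction).

    Uses the sorted-data identity
        G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n,
    with x sorted ascending and i running from 1 to n.
    """
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total <= 0:
        raise ValueError("need at least one value and a positive total")
    if xs[0] < 0:
        raise ValueError("values must be non-negative")
    weighted = sum(i * x for i, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


def gini_pairwise(values):
    """Direct O(n^2) definition: mean absolute difference divided by twice the mean."""
    n = len(values)
    mean = sum(values) / n
    abs_diffs = sum(abs(a - b) for a, b in product(values, repeat=2))
    return abs_diffs / (2 * n * n * mean)


incomes = [10, 20, 30, 40, 100]
print(gini(incomes), gini_pairwise(incomes))
```

Both functions give 0.4 for this list, up to floating-point rounding. A value of 0 means every entry is equal. With n values, the population form tops out at (n − 1)/n when a single entry holds the entire total, which is why some analysts multiply by n/(n − 1) for small samples.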