# PMean: Writing the methods section of a research paper

I’m teaching a class on Clinical Research Methodology and at least a few of the students are confused about what to put in the methods section of a research paper or a thesis. They’re confused? I’m even more confused than they are. Every paper and every thesis is different, so it is impossible to offer any coherent guidance. But let me try anyway.

# Recommended: Data science curriculum roadmap

How do you teach data science? That’s not an easy question, because data science means different things to different people. This site shows different curricula depending on what you want your program to emphasize.

# Recommended: Separating Unique and Duplicated Observations Using PROC SORT in SAS 9.3 and Newer Versions

SAS has some very powerful ways to find duplicate values and to store the duplicates separate from the unique values. Many of these use the sort procedure. Here is a nice guideline for what would otherwise be very difficult to figure out on your own.
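To make the idea concrete without reproducing the SAS syntax from the linked guideline, here is a minimal Python sketch of the same task: splitting a data set into observations whose key occurs exactly once and observations whose key is repeated. The function name, the record layout, and the `id` and `visit` fields are all invented for illustration; this is an analogy to the PROC SORT approach, not a translation of it.

```python
from collections import Counter


def split_unique_and_duplicated(records, key):
    """Split records into those whose key occurs once and those whose key repeats.

    `key` is a function that extracts the matching key from a record.
    The original order of the records is preserved within each output list.
    """
    counts = Counter(key(r) for r in records)
    unique = [r for r in records if counts[key(r)] == 1]
    duplicated = [r for r in records if counts[key(r)] > 1]
    return unique, duplicated


# Hypothetical patient records; the field names are made up for this example.
records = [
    {"id": 101, "visit": 1},
    {"id": 102, "visit": 1},
    {"id": 101, "visit": 2},
    {"id": 103, "visit": 1},
]

unique, duplicated = split_unique_and_duplicated(records, key=lambda r: r["id"])
```

Note that every copy of a repeated key lands in `duplicated`, which is one of several possible conventions. Keeping the first copy with the unique records and storing only the later copies separately would be a different, equally reasonable choice.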

# Recommended: What would Florence Nightingale make of big data?

This is a nice video, professionally produced and very short (4 minutes), that shows the importance Florence Nightingale attached to Statistics. It reviews how she used Statistics aggressively to lobby for improvements to health care, and speculates on what she would think about the efforts today to use big data for decision making. The narrator is David Spiegelhalter, a famous statistician.

# Recommended: When Big Data goes to school

File this under the “dark side” of data science. Alfie Kohn is a critic of many of the motivational methods used in business and education, and he makes many good points in this blog post about relying on readily available data without questioning its quality.

# Recommended: Case for omitting tied observations in the two-sample t-test and the Wilcoxon-Mann-Whitney Test

When you are running a non-parametric test, like the Wilcoxon-Mann-Whitney test, you can only be 100% sure of the properties of that test (including Type I and Type II error rates) if the data are continuous. If there are ties in the data, the properties of the test are unknown. This paper shows four commonly used approaches for settings where values might be tied and runs simulations to measure Type I and Type II error rates for both the two-sample t-test and the Wilcoxon-Mann-Whitney test under a range of tied values and a range of distributions. The results are, at least to me, quite surprising.
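If you want a feel for what such a simulation looks like, here is a minimal Python sketch. It is not the paper's design and it does not implement any of the paper's four approaches. It covers only the Wilcoxon-Mann-Whitney test, uses midranks with the standard tie correction and a normal approximation without a continuity correction, and draws both groups from the same discrete uniform distribution so that ties are common and the null hypothesis is true. The function names, sample size, number of levels, and number of simulations are all assumptions chosen for illustration.

```python
import random
from statistics import NormalDist


def wmw_p_value(x, y):
    """Two-sided Wilcoxon-Mann-Whitney p-value from the normal approximation.

    Tied values receive midranks, and the variance of U includes the
    usual tie correction. No continuity correction is applied.
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    # Pool the data, tagging each value with its group (0 = x, 1 = y).
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    rank_sum_x = 0.0
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        t = j - i                  # number of observations in this tie group
        midrank = (i + 1 + j) / 2  # average of the ranks i+1, ..., j
        n_from_x = sum(1 for k in range(i, j) if pooled[k][1] == 0)
        rank_sum_x += midrank * n_from_x
        tie_term += t**3 - t
        i = j
    u = rank_sum_x - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return 1.0  # every pooled value is identical, so there is no evidence
    z = (u - mean_u) / var_u**0.5
    return 2 * (1 - NormalDist().cdf(abs(z)))


def type_i_error_rate(n_per_group=20, n_levels=5, n_sims=2000, alpha=0.05, seed=1):
    """Estimate the rejection rate when both groups share one discrete distribution."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sims):
        # Few possible values relative to the sample size guarantees many ties.
        x = [rng.randrange(n_levels) for _ in range(n_per_group)]
        y = [rng.randrange(n_levels) for _ in range(n_per_group)]
        if wmw_p_value(x, y) < alpha:
            rejections += 1
    return rejections / n_sims


if __name__ == "__main__":
    print(type_i_error_rate())
```

Changing `n_levels` controls how heavily the data are tied, so running the function across several values gives a rough picture of how the rejection rate behaves as ties become more or less frequent. Extending the sketch to the t-test or to Type II error would need a shifted alternative and a t distribution, which are left out here to keep it short.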