I’m planning to give a talk on “The Dark Side of Data Science” and I’m hoping to get some interesting references and articles from my colleagues. Here is a first draft of my abstract, with a few references that I am already familiar with.

# Recommended: GloVe word vector embeddings

When you are working with text mining, you might want to reduce the dimensionality of your problem. The word2vec algorithm, developed by Tomas Mikolov and others at Google, offers a nice approach. This page shows how to apply this algorithm within R.

# Recommended: Practical deep learning for coders

This is a MOOC (Massive Open Online Course) covering deep learning models. I have not taken it, but it comes highly recommended by others. It uses Python as the underlying language.

# PMean: A simple structure for documentation

Everybody has different standards for documentation, and if you are already using a standard you like, don’t let me stop you. But if you’ve never used much documentation and decide that you need to do better, here’s a guideline that I developed.

# Recommended: Use of Electronic Health Record Data in Clinical Investigations

This press release announces a “Guidance for Industry” document, a type of document that the U.S. Food and Drug Administration provides from time to time on technical issues. This document discusses the use of the Electronic Health Record as an additional source of information for prospective clinical trials.

# PMean: Grading rubric for computer assignments

I’ve been teaching a variety of classes that require students to run a statistical analysis in a package like SAS or R and report the results. There is a tremendous variety of formats that students use, and I thought it would be helpful to offer some guidance. It would save me time in grading, but more importantly it would emphasize that students need to think about what they produce rather than just tossing together whatever comes out of the computer. The five requirements for homework assignments are that they be complete, concise, clear, error-free, and interpretable.

# Recommended: A Review of Published Analyses of Case-Cohort Studies and Recommendations for Future Reporting

I got a question about how to analyze a case-cohort study in Stata. The person was following the code in a Stata conference presentation but was unsure about some of the details. Always looking for simple explanations of topics that I myself don’t understand well, I found this nice article on how case-cohort studies are written up in the literature. Naturally, it provides a brief explanation of how you analyze data from a case-cohort design along with several helpful references.

# Recommended: How to be more effective in your professional life

This article starts with a nice anecdote about how being dismissive of what someone else is saying ends up hurting you. It also provides a nice structure, POWER, for organizing consulting meetings. POWER stands for Prepare, Open, Work, End, and Reflect. This article was a basis for some of the content in an interesting webinar on consulting.

# PMean: How much missingness can you tolerate?

I got a question about how much missing data you could have in a study and still feel comfortable with your data analysis. It’s a question with no hard and fast answer, but I get the question so often that I have developed some general guidance.

# Recommended: Bayesian meta-analysis of two proportions in random control trials

I got a question about Bayesian meta-analysis and found this nice teaching example. I’m not sure if the graphs are from the R package bayesmeta, but they look like it.