Someone on the Statistical Consulting forum mentioned that she is going to become an independent consultant when she graduates, and she asked people who are currently in that position what one thing they hate most. Her email drew a lot of responses, including several from people who cautioned her about how difficult it is for a young person to become an independent consultant. Here are the thoughts I shared on the thing I hate most and on the issues with setting out on your own as an independent consultant early in your career. Continue reading
Category Archives: Statistics
PMean: Those darn commas in SQL
I should know better, but I made a rookie mistake with SQL that took me a long time to fix. It’s one of those detail-oriented things, and if you aren’t detail-oriented, you can’t call yourself a programmer. Continue reading
Recommended: A Tutorial on Loops in R
This is a very clear but also very detailed explanation of the for, while, and repeat loops, along with the concept of vectorization. A great resource for beginners. Continue reading
Recommended: Oracle Dates and Times
I’m working with R and SQL. Some of the work uses SQLite and some uses Oracle. There are subtle differences between the two, and for that matter between any two database programs. While there are SQL standards, most packages have minor deviations or enhancements. Dates in Oracle represent one deviation. In particular, Oracle does not use the ISO 8601 standard date format (yyyy-mm-dd) by default. Here’s a nice overview of how to work with Oracle dates. Continue reading
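As a rough illustration of the format difference described above, here is a minimal Python sketch (not taken from the linked overview) that converts a date string in the `DD-MON-RR` style, a common Oracle default display format, into ISO 8601. The function name `oracle_to_iso` and the sample value are hypothetical. Note also that Python's `%y` uses a fixed pivot for two-digit years, which is not the same rule as Oracle's `RR`.

```python
from datetime import datetime


def oracle_to_iso(text: str) -> str:
    """Convert a DD-MON-YY string such as '07-MAR-17' to yyyy-mm-dd.

    Assumption: the input uses English month abbreviations, as in the
    default C locale. The two-digit year follows Python's %y rule
    (00-68 -> 2000-2068, 69-99 -> 1969-1999), not Oracle's RR rule.
    """
    # strptime matches the month abbreviation case-insensitively,
    # so 'MAR', 'Mar', and 'mar' are all accepted.
    parsed = datetime.strptime(text.strip(), "%d-%b-%y")
    return parsed.date().isoformat()


if __name__ == "__main__":
    print(oracle_to_iso("07-MAR-17"))  # prints 2017-03-07
```

In practice it is usually safer to ask the database for an explicit format, for example with Oracle's `TO_CHAR(date_column, 'YYYY-MM-DD')`, than to parse display strings after the fact.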
PMean: What greedy means to a geek
I’ve run across the term “greedy” in several work-related contexts, so I thought it might be worth explaining what it means. Continue reading
PMean: One small grant for me, one giant leap for Biostatisticians
I’m so busy these days that it is silly to take on anything new, but I found an opportunity for a small research grant that I might want to submit a proposal for. Continue reading
PMean: By the skin of my teeth
I have to brag a bit. I’m working part-time at Kansas University Medical Center (along with a couple of other part-time jobs), and two weeks ago my boss asked me if I was interested in writing a paper on the data analyses I had been working on. It would be submitted to the AMIA 2017 Joint Summit on Translational Research, and I’d be the first author. Continue reading
PMean: Turning off large blocks of an R Markdown document
When you’re running a large and complicated program using R Markdown, you can use the cache chunk option (cache=TRUE) to save a lot of time. With caching on, knitr will notice if a program chunk has stayed the same and avoid running it again. I tend to avoid the cache option, though, because sometimes it fails to execute something that you want executed, even though it looks on the surface like nothing has changed. So I created some simple program chunks that allow me to explicitly turn off parts of the R Markdown program that I don’t need to evaluate at the time. Think of it as a manual cache.
It’s a very simple thing, but one which confounded me for a while, so I am writing about it here. That way I won’t forget six months down the road. Continue reading
PMean: Merging in dplyr is a lot faster
At the Joint Statistical Meetings, I found out that the advantages of some of the new libraries for data manipulation (like dplyr and tidyr) go beyond just the flexibility of the new methods of data manipulation. These libraries produce code that is easier to read and that also runs a lot faster. I did not appreciate how much faster until I tried a test today. Continue reading
Recommended: The FAIR Guiding Principles for scientific data management and stewardship
I’ve always been supportive of efforts to share data. For me, it’s a bit selfish, because I want to find interesting real-world examples to use in teaching and on my web pages. But the issue goes way beyond this, of course. Sharing data is an ethical imperative, especially for federally funded research or research that relies on volunteer subjects. It has led to many important discoveries beyond the original context in which the data was collected. For data sharing to be effective, you need to embrace four guiding principles: your data needs to be findable, accessible, interoperable, and reusable. This paper highlights those principles and offers some current examples of data sharing systems.