I’m an experienced R programmer trying to learn a little about SQL. One of my good friends, who lives totally in the database world (I call her the Teradata Queen), shared a link to a blog post at SQLServerCentral about using R. Microsoft is including R in its SQL Server distribution, so this is an opportunity for a lot of interesting work combining the data manipulation power of SQL Server with the data analysis power of R. The blog post explains some of the cost and performance issues associated with running R scripts on a SQL Server CPU. Continue reading
Tag Archives: R software
Recommended: A Tutorial on Loops in R
This is a very clear but also very detailed explanation of the for, while, and repeat loops, along with the concept of vectorization. A great resource for beginners. Continue reading
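To give a flavor of what the tutorial covers, here is a minimal sketch of my own (not taken from the tutorial) that computes the same sum of squares with each loop type and then with a vectorized expression. The variable names are just for illustration.

```r
# Sum of the squares of 1 through 10, computed four ways.
x <- 1:10

# for loop: visit each element in turn
total_for <- 0
for (value in x) {
  total_for <- total_for + value^2
}

# while loop: same accumulation, with an explicit index and condition
total_while <- 0
i <- 1
while (i <= length(x)) {
  total_while <- total_while + x[i]^2
  i <- i + 1
}

# repeat loop: runs until break is reached
total_repeat <- 0
i <- 1
repeat {
  if (i > length(x)) break
  total_repeat <- total_repeat + x[i]^2
  i <- i + 1
}

# Vectorized: square every element at once, then add them up
total_vec <- sum(x^2)

print(c(total_for, total_while, total_repeat, total_vec))
```

All four totals agree; the vectorized version is simply the shortest and usually the fastest.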
PMean: Turning off large blocks of an R Markdown document
When you’re running a large and complicated program using R Markdown, you can use the cache chunk option to save a lot of time. With caching turned on, knitr will notice if a program chunk has stayed the same and avoid running it again. I tend to avoid using the cache option, though, because sometimes it fails to execute something that you want executed, even though it looks on the surface like nothing has changed. So I created some simple program chunks that allow me to explicitly turn off parts of the R Markdown program that I don’t need to evaluate at the time. Think of it as a manual cache.
It’s a very simple thing, but one which confounded me for a while, so I am writing about it here. That way I won’t forget six months down the road. Continue reading
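One way to get this kind of manual switch is sketched below. It is an illustration under my own assumptions, not necessarily the approach in the full post: the flag names run_import and run_models are hypothetical. It relies on the fact that knitr chunk options such as eval accept R expressions, so a logical variable set in an early chunk can turn later chunks on or off.

```r
# Put these flags in a chunk near the top of the R Markdown file.
# (The names run_import and run_models are made up for this sketch.)
run_import <- FALSE  # skip the slow data import this time
run_models <- TRUE   # re-run the model fitting

# A later chunk header can then refer to a flag, for example:
#   {r fit-models, eval=run_models}
# knitr evaluates the option and skips the chunk when the flag is FALSE.

# The same flags work in plain R code outside of chunk headers:
if (run_import) {
  raw_data <- read.csv("big-file.csv")  # hypothetical file, not run here
}
if (run_models) {
  fit <- lm(mpg ~ wt, data = mtcars)    # built-in data set
}
```

Flipping a flag from FALSE to TRUE is then the only change needed to bring a block back.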
PMean: Merging in dplyr is a lot faster
At the Joint Statistical Meetings, I found out that the advantages of some of the new packages for data manipulation (like dplyr and tidyr) go beyond just the flexibility of the new methods of data manipulation. These packages produce code that is easier to read and that also runs a lot faster. I did not appreciate how much faster until I tried a test today. Continue reading
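A rough way to try this comparison yourself is sketched below. It is not the test from the full post: the data are simulated, the sizes are arbitrary, and the timings will vary by machine and package version, so no particular speedup is claimed here.

```r
library(dplyr)

# Two simulated data frames sharing an id column (each id appears once).
set.seed(1)
n <- 1e5
left  <- data.frame(id = sample(n), x = rnorm(n))
right <- data.frame(id = sample(n), y = rnorm(n))

# Base R merge versus the dplyr equivalent, timed with system.time().
t_base  <- system.time(m_base  <- merge(left, right, by = "id"))
t_dplyr <- system.time(m_dplyr <- inner_join(left, right, by = "id"))

print(t_base)
print(t_dplyr)
```

Both calls return the same matched rows, although the row order can differ, since merge() sorts on the key by default.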
Recommended: Tibbles (Tibbles are a modern take on data frames)
I’m an old dog R programmer who tends to rely on features of R that were available 10 years ago (an eternity for computers). But it’s time for this old dog to learn new tricks. One thing I need to use in my R programs is called a “tibble” (sometimes called a “tidy tibble”). It’s a minor but important improvement on data frames, and many of the newer packages are using tibbles instead of data frames. Tibbles are available in the tibble package. This web page offers a nice description of the improvements that tibbles make over data frames. Continue reading
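As a small illustration of one such improvement (my own sketch, not taken from the recommended page), compare what happens when you pull a single column out with square brackets.

```r
library(tibble)

df <- data.frame(x = 1:3, y = c("a", "b", "c"))
tb <- tibble(x = 1:3, y = c("a", "b", "c"))

# A data frame silently drops to a plain vector here...
df_col <- df[, "x"]
# ...while a tibble always stays a tibble.
tb_col <- tb[, "x"]

class(df_col)  # "integer"
class(tb_col)  # "tbl_df" "tbl" "data.frame"
```

Getting the same type back every time makes code that subsets columns easier to reason about.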
PMean: Changing the font size in RStudio
Suppose you’re giving a talk and using RStudio. You want to make the fonts a bit larger so your audience can read them. It’s easy to do, once you know where to look. Continue reading
PMean: Changing the font size in R
This is one of those obvious things that’s not obvious when you need it most. Suppose I’m doing a demo of R for a group like our wonderful Kansas City R Users Group. I want the font to be large enough to read. Here’s how you do it. Continue reading
Recommended: dplyr and pipes: the basics
One of the recent developments in R that I was unaware of until I attended some talks at the Joint Statistical Meetings was the use of dplyr and pipes. It’s an approach to data management that isn’t different from earlier approaches, but the code is much easier to read and maintain. This blog post explains in simple terms how these work and why you would use them. Continue reading
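Here is a short sketch of my own (not from the recommended post) showing the same summary written as nested function calls and then with the %>% pipe, using the built-in mtcars data.

```r
library(dplyr)

# Nested calls: read from the inside out.
nested <- summarise(
  group_by(filter(mtcars, hp > 100), cyl),
  mean_mpg = mean(mpg)
)

# Piped: read from top to bottom, one step per line.
piped <- mtcars %>%
  filter(hp > 100) %>%
  group_by(cyl) %>%
  summarise(mean_mpg = mean(mpg))

print(piped)
```

The two versions produce the same result; the piped one just lists the steps in the order they happen.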
Recommended: Hadley Wickham, the Man Who Revolutionized R
Hadley Wickham has written many popular R packages, so many that they are sometimes referred to as the “Hadleyverse.” This is a nice biography that emphasizes the impact that Dr. Wickham has had on R. Continue reading
PMean: Simple string substitutions
I had to make a rather complex string substitution for a project, and I thought it would help to briefly review some simpler string substitution examples in R. You can find the R code at my GitHub site. Continue reading
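The examples below are a minimal sketch using base R's sub() and gsub(). They are my own illustrations with made-up strings, not the code from the GitHub site.

```r
x <- c("apple pie", "banana split")

# sub() replaces only the first match in each string.
first_only <- sub("a", "A", x)
first_only   # "Apple pie"  "bAnana split"

# gsub() replaces every match.
all_matches <- gsub("a", "A", x)
all_matches  # "Apple pie"  "bAnAnA split"

# fixed = TRUE treats the pattern as literal text, so "." is just a period.
dots <- gsub(".", "-", "a.b.c", fixed = TRUE)
dots         # "a-b-c"

# Parentheses capture groups; \\1 and \\2 refer back to them.
swapped <- gsub("(\\w+) (\\w+)", "\\2 \\1", x)
swapped      # "pie apple"  "split banana"
```

Forgetting fixed = TRUE is a common stumbling block, because an unescaped "." in a regular expression matches any character.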