Tag Archives: Big data

PMean: Examining the storage format for sparse matrices in R

I’ve been working with sparse matrices a bit for my work with the Greater Plains Collaborative. They are a very useful way of storing matrices where most of the entries are zero. This occurs quite often in medical data. There are thousands of medical procedures that you can torture your patients with, so any matrix that has an indicator variable for every medical procedure will be quite big. Fortunately, both for us and for the patients, the number of procedures that a particular patient has to endure is far smaller. So for each row of the matrix, the number of non-zero entries will be very small, probably in the single digits. A sparse matrix takes up much less space because it stores only the non-zero entries and their locations. Here’s some R code that shows how this works. I have the code available at my new github site.
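The R code from the full post is not reproduced in this excerpt. As a rough, language-neutral illustration of the storage idea only, here is a minimal Python sketch. The patient and procedure numbers are made up, and the helper names `to_triplets` and `lookup` are hypothetical. It keeps just the (row, column) positions of the non-zero indicators and treats every other cell as an implicit zero.

```python
# Hypothetical data: 5 patients (rows), 1000 possible procedures (columns).
n_patients = 5
n_procedures = 1000

# For each patient, the column indices of the procedures they actually had.
procedures_by_patient = {
    0: [12, 407],
    1: [3],
    2: [12, 88, 950],
    3: [],
    4: [407],
}


def to_triplets(rows):
    """Return the sorted (row, column) positions of the non-zero indicators."""
    return sorted((i, j) for i, cols in rows.items() for j in cols)


def lookup(triplets, i, j):
    """Value of cell (i, j): 1 if its position is stored, otherwise an implicit 0."""
    return 1 if (i, j) in set(triplets) else 0


triplets = to_triplets(procedures_by_patient)
dense_cells = n_patients * n_procedures  # cells a dense matrix would hold
stored = len(triplets)                   # positions the sparse form holds

print(f"dense cells: {dense_cells}, stored non-zero entries: {stored}")
```

With these made-up numbers the dense matrix has 5,000 cells while the sparse form keeps only 7 positions, which is the kind of saving the paragraph above describes. Sparse matrix libraries typically use more compact layouts than a plain list of pairs, but the principle is the same.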

Recommended: Tessera. Open source environment for deep analysis of large complex data

I have not had time to preview this software, but it looks very interesting. It takes large problems and converts them to a form suitable for parallel processing, not by changing the underlying algorithm, which would be very messy, but by splitting the data into subsets, analyzing each subset, and recombining the results. This method, called “Divide and Recombine,” should work well for some analyses, but perhaps not so well for others. It is based on the R programming language. If I get a chance to work with this software, I’ll let you know what I think.
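To make the split, analyze, and recombine idea concrete, here is a minimal Python sketch. It is not Tessera's API and not R. The data and the function names `divide`, `analyze`, and `recombine` are hypothetical. It computes a mean by combining per-subset sums and counts, then shows a statistic (the median) where a naive recombination does not reproduce the full-data answer.

```python
from statistics import mean, median

# Hypothetical data set, small enough to check by hand.
data = [1, 2, 9, 3, 4, 10, 5, 11, 12]


def divide(values, size):
    """Split the data into consecutive subsets of at most `size` items."""
    return [values[k:k + size] for k in range(0, len(values), size)]


def analyze(subset):
    """Per-subset summary that is sufficient for the mean: (sum, count)."""
    return (sum(subset), len(subset))


def recombine(results):
    """Combine the per-subset summaries into the overall mean."""
    total = sum(s for s, _ in results)
    count = sum(n for _, n in results)
    return total / count


subsets = divide(data, 3)
dr_mean = recombine([analyze(s) for s in subsets])

# A naive recombination for the median: the median of the subset medians.
median_of_medians = median(median(s) for s in subsets)

print("recombined mean:", dr_mean, "full-data mean:", mean(data))
print("median of subset medians:", median_of_medians,
      "full-data median:", median(data))
```

In this sketch the recombined mean matches the full-data mean, because sums and counts add up across subsets. The median of the subset medians is 4 while the full-data median is 5, a small example of an analysis where recombining is not straightforward.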

Recommended: Special issue–Using Big Data to Transform Care

The July 2014 issue of Health Affairs is devoted entirely to “big data.” The articles provide a general overview of big data and cover several applications of big data, big data and genomics, the use of electronic health records, and ethical issues, including privacy concerns. For now, at least, the articles are available for free to any user.

PMean: NIH is interested in big data

The National Institutes of Health has recently shown an interest in “big data.” You can define big data in several ways, but a common characterization is the three V’s. Big data takes up a lot of space (volume) and/or it comes at you very rapidly (velocity) and/or it comes in a wide range of differing formats (variety). One of the recent Requests for Applications (RFAs) from NIH spells out what types of big data research they are interested in seeing. I might be interested in applying, and would love to find some collaborators. Here’s a summary of what the RFA is all about.