Most popular programming languages have one thing in common – they are all “Open source”. Open source is a decentralised development model which is based on community participation. The community members contribute to the development of the programming language and these contributions are publicly available to be accessed by anyone.
Community participation is the prime reason for continuous development and innovation in open source languages like R, C++, C#, Java, PHP, Python, Ruby, etc. For data science, R is one of the most popular languages. The main reason for its popularity is the continuous contribution and support from R practitioners in the data science community, much of which arrives in the form of packages. These packages form the backbone of the R programming language.
While a lot of tutorials on solving problems using R are being shared, open source development gets less attention. For me, creating a package and giving it back to the community meant a great deal. It was my way of starting to give back, and I know this is just the start.
In order to help expand the community further, I decided to write an article about the process of creating a package and contributing it to the open source R community. Along the way, we are going to create a package and contribute it to the open source community.
Read on!
A package in R is simply a set of reusable R functions with standard, self-explanatory documentation on how to use them. Sometimes, packages come with sample data as well.
There are 10,000+ packages on CRAN as of today, and the majority of them depend on some other R package(s). In other words, most packages are built on top of the functionality of other packages.
For example, ensembleR, a package I authored, depends mainly on the caret package, along with some other packages (e1071, ipred, knitr, rmarkdown) which are used for running the examples and creating vignettes.
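To make the idea of dependencies concrete, here is a minimal sketch of how they are declared in a package's DESCRIPTION file and how you can read them back from R. The DESCRIPTION text below is illustrative only: the split between Imports and Suggests is my assumption based on the roles described above, not a copy of the real ensembleR file, and `split_deps` is a hypothetical helper.

```r
# Illustrative DESCRIPTION content (an assumption, not the actual ensembleR file).
desc_text <- c(
  "Package: ensembleR",
  "Title: Illustrative DESCRIPTION for a Package With Dependencies",
  "Version: 0.1.0",
  "Imports: caret",
  "Suggests: e1071, ipred, knitr, rmarkdown",
  "VignetteBuilder: knitr"
)

# Write it to a temporary file and parse it the way R tooling does.
desc_file <- tempfile("DESCRIPTION_")
writeLines(desc_text, desc_file)
desc <- read.dcf(desc_file, fields = c("Package", "Imports", "Suggests"))

# Hypothetical helper: turn a comma-separated field into a character vector.
split_deps <- function(field) trimws(strsplit(field, ",")[[1]])

imports  <- split_deps(desc[1, "Imports"])   # needed for the package to work
suggests <- split_deps(desc[1, "Suggests"])  # only needed for examples/vignettes

print(imports)
print(suggests)
```

Packages listed under Imports are installed along with your package, while those under Suggests are optional extras, which is why example and vignette helpers usually go there.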
You can realise the importance of packages in R by this handy infographic with the most commonly used libraries in R:
Some time back, in one of the competitions on Analytics Vidhya, I was trying to ensemble various models. I realized that there was no easy-to-use open source package for ensembling in R.
That’s when I decided to take this opportunity to create a simple package that enables people to perform ensembling (stacking) using a few lines of code. The result is a package named ensembleR, which can be accessed on CRAN. This package enables people to create ensembles of several models in R. To know more about ensembling in R, read here.
This package can create millions of unique ensembles (Stacked models) and give predictions using all of them within a single line of code. The current version on CRAN is the initial release of the package. You can find more details on how this package works here.
The process of creating a package in R is both challenging and exciting, especially the first time. I started off by learning the basic structure and process of creating a package.
Once I had coded the package, I learnt how to submit it to CRAN to make it available to other community members. Getting it on CRAN was the toughest part because of the extensive and rigorous testing of the package, which is also what maintains the quality and consistency of packages that go on CRAN.
In this article, I’ll take you through the complete process of creating a package from scratch and getting it on CRAN and/or GitHub to make it publicly available.
The advantages of creating an R package are:
There are also certain challenges in creating a package:
Before beginning to write a package, there are a few prerequisites you should be familiar with. These prerequisites are:
Now let’s get started with creating a simple package of our own. Within this package, we’ll create a function to offer the functionality of predicting the stock price movement for tomorrow using simple logistic regression given the stock symbol. Simple enough. Let’s begin!
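If you prefer the console to the RStudio menus, the skeleton of a package can also be generated from plain R. The following is a minimal sketch, assuming a placeholder version of `stock_predict` (hypothetical, it does nothing useful yet) and a temporary directory as the location; it uses `utils::package.skeleton()`, which ships with base R.

```r
# Placeholder function so that the skeleton has something to package.
# (Hypothetical stand-in; the real function body comes later.)
stock_predict <- function(symbol) {
  stopifnot(is.character(symbol), length(symbol) == 1)
  invisible(NULL)
}

# Create the package in a temporary directory to keep this example tidy.
pkg_parent <- tempfile("pkgdemo_")
dir.create(pkg_parent)

utils::package.skeleton(
  name        = "StockPredictor",
  list        = "stock_predict",   # objects to include in the package
  environment = environment(),     # where to look up those objects
  path        = pkg_parent
)

# Inspect what was generated: DESCRIPTION, NAMESPACE, R/ and man/ files.
pkg_root <- file.path(pkg_parent, "StockPredictor")
print(list.files(pkg_root, recursive = TRUE))
```

The generated DESCRIPTION and man files contain placeholder text that you are expected to edit before the package will pass any checks.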
#' @title
#'
#' @description
#'
#' @param
#'
#' @return
#'
#' @examples
#'
#' @export
Here,
#' @title Predicts Stock Price Movement for Given Stock Symbol
#'
#' @description This package predicts whether the stock price at tomorrow's market close would be higher or lower compared to today's closing price.
#'
#' @param symbol Stock symbol as a character string, for example 'AAPL'.
#'
#' @return Prints the predicted probability of the price going up; the probability is also returned invisibly.
#'
#' @examples stock_predict('AAPL')
#'
#' @export stock_predict
stock_predict <- function(symbol)
{
  # To ignore the warnings during usage
  options(warn = -1)
  options("getSymbols.warning4.0" = FALSE)

  # Importing price data for the given symbol
  data <- data.frame(xts::as.xts(get(quantmod::getSymbols(symbol))))

  # Assigning the column names
  colnames(data) <- c("data.Open", "data.High", "data.Low", "data.Close",
                      "data.Volume", "data.Adjusted")

  # Creating lag and lead features of the price column
  data <- xts::xts(data, order.by = as.Date(rownames(data)))
  data <- as.data.frame(merge(data, lm1 = stats::lag(data[, 'data.Adjusted'], c(-1, 1, 3, 5, 10))))

  # Extracting features from Date
  data$Date <- as.Date(rownames(data))
  data$Day_of_month <- as.integer(format(as.Date(data$Date), "%d"))
  data$Month_of_year <- as.integer(format(as.Date(data$Date), "%m"))
  data$Year <- as.integer(format(as.Date(data$Date), "%y"))
  data$Day_of_week <- as.factor(weekdays(data$Date))

  # Naming variables for reference
  today <- 'data.Adjusted'
  tomorrow <- 'data.Adjusted.5'

  # Creating outcome
  data$up_down <- as.factor(ifelse(data[, tomorrow] > data[, today], 1, 0))

  # Creating train and test sets
  train <- data[stats::complete.cases(data), ]
  test <- data[nrow(data), ]

  # Training model
  model <- stats::glm(up_down ~ data.Open + data.High + data.Low + data.Close +
                        data.Volume + data.Adjusted + data.Adjusted.1 +
                        data.Adjusted.2 + data.Adjusted.3 + data.Adjusted.4 +
                        Day_of_month + Month_of_year + Year + Day_of_week,
                      family = stats::binomial(link = 'logit'), data = train)

  # Making predictions
  pred <- as.numeric(stats::predict(model,
                                    test[, c('data.Open', 'data.High', 'data.Low', 'data.Close',
                                             'data.Volume', 'data.Adjusted', 'data.Adjusted.1',
                                             'data.Adjusted.2', 'data.Adjusted.3', 'data.Adjusted.4',
                                             'Day_of_month', 'Month_of_year', 'Year', 'Day_of_week')],
                                    type = 'response'))

  # Printing results
  print("Probability of Stock price going up tomorrow:")
  print(pred)
}
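Since the function above needs an internet connection to download prices, it can help to see the modelling idea on its own. Here is a simplified sketch on synthetic data: the random-walk prices, the `lag_by` helper and the column names are all my assumptions for illustration, and the feature set is much smaller than in the package function.

```r
set.seed(42)

# Synthetic random-walk series standing in for the adjusted closing price.
n <- 300
price <- 100 + cumsum(rnorm(n))

# Hypothetical helper: value of x from k days earlier, padded with NA.
lag_by <- function(x, k) c(rep(NA_real_, k), head(x, -k))

d <- data.frame(
  today    = price,
  lag1     = lag_by(price, 1),
  lag3     = lag_by(price, 3),
  lag5     = lag_by(price, 5),
  tomorrow = c(price[-1], NA_real_)   # next day's price, unknown for the last row
)

# Outcome: 1 if tomorrow's price is higher than today's, else 0.
d$up_down <- as.integer(d$tomorrow > d$today)

# Train on rows where everything is known; predict for the latest day.
train  <- d[stats::complete.cases(d), ]
latest <- d[nrow(d), ]

model <- stats::glm(up_down ~ today + lag1 + lag3 + lag5,
                    family = stats::binomial(link = "logit"),
                    data = train)

prob_up <- as.numeric(stats::predict(model, newdata = latest, type = "response"))
print(prob_up)
```

On a pure random walk the model has nothing real to learn, so treat the number as a demonstration of the mechanics rather than a forecast.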
Now check the “Generate documentation with Roxygen” option and put “--as-cran” in the Check Package field to simulate the CRAN package checking and testing.
Once you have successfully created a package in R, you’ll want to share it with others so that they can use the functions in your package. For publishing a package, there are two popular platforms: CRAN and GitHub.
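The same two steps can also be run from the R console. This is a minimal sketch, assuming the devtools package is installed and that your package lives in a folder named StockPredictor under the current working directory (both are assumptions); if either is missing, the code only prints a message.

```r
# Assumed location of the package source; adjust to your own path.
pkg_dir <- "StockPredictor"

can_check <- file.exists(file.path(pkg_dir, "DESCRIPTION")) &&
  requireNamespace("devtools", quietly = TRUE)

if (can_check) {
  # Regenerate the man/ pages and NAMESPACE from the roxygen comments.
  devtools::document(pkg_dir)
  # Run R CMD check with CRAN-like settings.
  devtools::check(pkg_dir, cran = TRUE)
} else {
  message("Skipping: package folder or devtools not found.")
}
```

Any errors, warnings or notes reported here are worth fixing before you think about submitting.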
Getting your package on CRAN is a difficult task due to the extensive and rigorous testing carried out on the packages before they can be published on CRAN. Along with passing these tests, you need to have comprehensive vignettes describing the working of your package. These vignettes will be stored in vignettes folder that you can create in the main project directory.
Once you’re sure that your package is doing well against the local simulation tests and is well documented, you need to create the source package by going to Build > Build Source Package.
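If you are working outside RStudio, the source package can be built from the console as well. A minimal sketch, again assuming devtools is installed and the package sits in a folder named StockPredictor (both assumptions); otherwise it does nothing apart from printing a message.

```r
# Assumed location of the package source; adjust to your own path.
pkg_dir <- "StockPredictor"

tarball <- NA_character_
if (file.exists(file.path(pkg_dir, "DESCRIPTION")) &&
    requireNamespace("devtools", quietly = TRUE)) {
  # Builds the source package (.tar.gz) and returns the path to it.
  tarball <- devtools::build(pkg_dir)
  print(tarball)
} else {
  message("Skipping: package folder or devtools not found.")
}
```

The resulting .tar.gz file is what you upload when submitting to CRAN.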
After the source package is created, you can then submit an application to publish it on CRAN here.
Generally, a much easier way to make your package public is to publish it on GitHub. The simplest way to publish your package on GitHub is to create a new repository and upload the contents of the main folder (StockPredictor in our case) to that repository. I have done the same here.
Now, anyone can install and use this package using the following command:
devtools::install_github("sauravkaushik8/SamplePackage")
I can’t express how I feel after putting a package on CRAN. The usability of the package probably means very little to the outside world, but that is irrelevant. For me, I know that I have started the journey to make my favourite tool even stronger.
After making the CRAN contribution, I have realised that it has helped me in a number of ways:
Hopefully, you have found this article helpful in creating your first open source package in R. Based on my experience, I’d like to make a few useful suggestions:
I believe this article has given you a good understanding of the process involved in creating your own packages in R from scratch. By following this tutorial, you should now have hands-on experience in creating a package in R.
Packages form the backbone of the R programming language, and I’d strongly encourage you to contribute to the development of the R language as well.
Did you enjoy reading this article? Do share your views in the comment section below. If you have any doubts / questions feel free to drop them in the comments below.
Article is really helpful for the newbies learning R language. I was interested in learning about creating R package. This article clearly conveys the required information and motivates individuals to contribute to the R community. Will definitely try to create a package in the near future. Thanks for sharing such an interesting article.
Hi Shanthi Bhojaraj, I'm really glad you found it helpful and I'll encourage you to make contributions as well. Best, Saurav.
Excellent article. The human touch of sharing your feelings while you created a package in R & published it on CRAN / GitHub adds a sense of reality to the task and a very helpful dimension to the article. Please keep up your good work. Best, Marshall Keyes, MD
Hi Mr. Keyes, I'm really glad you found it interesting. And thank you for your wishes. Best, Saurav.
That's really cool. A very good initiative and truly motivating.
Hi Surya, I'm really glad you found it helpful. Best, Saurav.