Abstract:
Aim: This paper aims to use a lasso regression model to ascertain how the level of development in a country affects its number of internet users.
Methodology: Least Absolute Shrinkage and Selection Operator (lasso) regression, fitted with the Least Angle Regression (LARS) selection algorithm and 5-fold cross-validation (k=5), was used to estimate the association between the number of internet users in a country and the development indicators for that country. The change in the cross-validation mean squared error at each step was used to identify the best subset of predictor variables. The lasso regression model was estimated on a training data set consisting of observations from the year 2012 (N=199), and the test data set consisted of observations from the year 2013 (N=196).
Results: The lasso regression model was trained on N=199 countries and used to identify the best subset of predictors of the response variable, the number of internet users, in N=196 countries around the world for the year 2013. The number of internet users per 100 people ranged from 1.06 to 96.2 in the training set and from 1.30 to 96.55 in the test set, indicating substantial variation in the response variable.
Conclusion: It is possible that the few indicator variables we identified as strong predictors of internet use are confounded by other factors not considered in the analysis. We therefore recommend that future work focus on other ways to fill in the missing observations, since a large number of national development indicators are associated with the number of internet users.
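
Illustrative sketch: the procedure described under Methodology (fit a lasso model, choose the penalty by 5-fold cross-validation on the mean squared error, then check the selected predictors on a separate test set) can be outlined roughly as follows. This is not the authors' code. It swaps in cyclic coordinate descent for the LARS algorithm so that it runs with the Python standard library alone, and it uses synthetic data in place of the development indicators. The function names, the penalty grid, and the data-generating coefficients are all assumptions made for illustration; only the sample sizes (199 and 196) and k=5 come from the abstract.

```python
import random

def soft_threshold(z, gamma):
    """Soft-thresholding operator: the closed-form lasso update for one coefficient."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0

def fit_lasso(X, y, lam, n_iter=200, tol=1e-8):
    """Minimise (1/2n)*||y - b0 - X b||^2 + lam*||b||_1 by cyclic coordinate descent.
    Note: coordinate descent is used here instead of LARS (an assumption for brevity)."""
    n, p = len(X), len(X[0])
    y_mean = sum(y) / n
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]  # centred predictors
    resid = [v - y_mean for v in y]                             # residual when beta = 0
    col_sq = [sum(Xc[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    beta = [0.0] * p
    for _ in range(n_iter):
        max_delta = 0.0
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Covariance of column j with the partial residual (j's own effect added back).
            rho = sum(Xc[i][j] * (resid[i] + Xc[i][j] * beta[j]) for i in range(n)) / n
            new = soft_threshold(rho, lam) / col_sq[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= Xc[i][j] * delta
                beta[j] = new
                max_delta = max(max_delta, abs(delta))
        if max_delta < tol:  # stop once a full sweep changes nothing meaningful
            break
    intercept = y_mean - sum(b * m for b, m in zip(beta, x_mean))
    return intercept, beta

def predict(X, intercept, beta):
    return [intercept + sum(b * v for b, v in zip(beta, row)) for row in X]

def mse(y_true, y_pred):
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)

def cv_select_lambda(X, y, lambdas, k=5, seed=0):
    """Return the penalty with the lowest k-fold cross-validated MSE, plus all CV MSEs."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    cv_mse = []
    for lam in lambdas:
        errs = []
        for fold in folds:
            held = set(fold)
            X_tr = [X[i] for i in idx if i not in held]
            y_tr = [y[i] for i in idx if i not in held]
            b0, b = fit_lasso(X_tr, y_tr, lam)
            errs.append(mse([y[i] for i in fold], predict([X[i] for i in fold], b0, b)))
        cv_mse.append(sum(errs) / k)
    best = min(range(len(lambdas)), key=cv_mse.__getitem__)
    return lambdas[best], cv_mse

def make_data(n, seed):
    """Synthetic stand-in data (hypothetical): only predictors 0 and 1 affect y."""
    rng = random.Random(seed)
    X = [[rng.gauss(0, 1) for _ in range(5)] for _ in range(n)]
    y = [3.0 * r[0] - 2.0 * r[1] + rng.gauss(0, 0.5) for r in X]
    return X, y

X_train, y_train = make_data(199, seed=1)  # training size taken from the abstract
X_test, y_test = make_data(196, seed=2)    # test size taken from the abstract
lambdas = [1.0, 0.3, 0.1, 0.03, 0.01]      # assumed penalty grid
best_lam, cv_mse = cv_select_lambda(X_train, y_train, lambdas, k=5)
b0, beta = fit_lasso(X_train, y_train, best_lam)
selected = [j for j, b in enumerate(beta) if abs(b) > 1e-8]
test_mse = mse(y_test, predict(X_test, b0, beta))
print("chosen penalty:", best_lam)
print("selected predictor indices:", selected)
print("test MSE:", round(test_mse, 3))
```

Predictors whose coefficients are shrunk exactly to zero drop out of `selected`, which is the sense in which lasso identifies a subset of predictors. With real indicator data, the predictors would typically be standardised first so that the penalty treats them on a common scale.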