We examine three common transformations (identity, fourth-root, and log) to determine which is most suitable for evaluating how certain common features surrounding Twin Cities Metropolitan Area (TCMA) city parks affect park visitation. The distances between these features and city parks are approximately exponentially distributed, because the features' relative locations closely follow a spatial Poisson process. Because a fourth-root transformation improves the normality of exponential random variables, we verify by simulation that the fourth-root transformation performs best, comparing correlation coefficients of the fourth-root-transformed data with those of the untransformed and log-transformed data. Using the TCMA city parks data, we also confirm that the fourth-root transformation improves bivariate normality. Finally, we show that applying the fourth-root transformation to distance-type variables improves the probability of selecting the most important features affecting park visitation when using least absolute shrinkage and selection operator (LASSO) regression.
“Modeling Park Visitation Using Transformations of the Distance-Type Predictor Variables with LASSO” by Ashley Hall (Western Washington University)
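The claim that a fourth root brings exponential data closer to normality can be checked with a small simulation. The sketch below is not the poster's procedure: the poster compares correlation coefficients, whereas this example uses sample skewness as a simpler stand-in for "closeness to normal" (a normal distribution has skewness 0). The function names, sample size, scale, and seed are all illustrative assumptions.

```python
import numpy as np


def sample_skewness(x):
    """Moment-based sample skewness: third central moment / variance^1.5."""
    x = np.asarray(x, dtype=float)
    centered = x - x.mean()
    return float(np.mean(centered**3) / np.mean(centered**2) ** 1.5)


def skewness_by_transform(n=100_000, scale=1.0, seed=0):
    """Simulate exponential 'distances' and report skewness per transform.

    n, scale, and seed are arbitrary choices for illustration only.
    """
    rng = np.random.default_rng(seed)
    d = rng.exponential(scale=scale, size=n)  # simulated distances
    transforms = {
        "identity": d,
        "fourth_root": d ** 0.25,
        "log": np.log(d),
    }
    return {name: sample_skewness(v) for name, v in transforms.items()}


results = skewness_by_transform()
for name, skew in results.items():
    print(f"{name:12s} skewness = {skew:+.3f}")
```

The untransformed exponential sample is strongly right-skewed (the theoretical skewness is 2), the log transform overcorrects into left skew, and the fourth root should land much closer to zero than either. Skewness is only one facet of normality, so this is a quick sanity check rather than a substitute for the correlation-based comparison described in the abstract.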
Hi Ashley! I just moved to the Twin Cities area and have been exploring lots of parks, so I really enjoyed reading your poster. Have you tried applying similar techniques to data for other cities? It would be interesting to know whether distance to bus stops is an important factor in other areas of the country as well.
Hello, Lisa! Thank you so much for your comment. To answer your question, yes! I have done some research on the applicability of this technique using park data from New York City and came up with similar results. Based on my research, the same or similar techniques would absolutely be applicable in other regions of the country. It’s important to note that certain variables may be excluded while others would be included, depending on the surrounding features. For example, in a city that didn’t have public transportation access, that particular amenity may not be reasonable to include. However, including different variables may be necessary to analyze a different area. The model we created applies LASSO regression to whichever variables are desired!
Thank you again for your comment.