Data Analytics Using R
In late 2019, I embarked on my specialist diploma in Business and Decision Analytics, partly out of interest and partly out of curiosity. During the Python data analytics course, I found that the concepts in statistics, regression and machine learning were not so different from what I had studied in school. In my Mathematics degree, we derived log-likelihoods and maximum likelihood estimators very extensively for the various discrete and continuous distributions. I enjoyed the derivations and found them quite ‘therapeutic’ to do, but frankly, when I started work, I struggled whenever people asked what use all that theory in a Mathematics degree was. What is it good for?
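As a small example of the kind of derivation I mean (a standard textbook case, chosen here purely for illustration): for a Bernoulli sample $x_1, \dots, x_n$ with parameter $p$, the log-likelihood is

$$\ell(p) = \sum_{i=1}^{n} \left[ x_i \log p + (1 - x_i) \log(1 - p) \right],$$

and setting $\ell'(p) = 0$ gives the maximum likelihood estimator $\hat{p} = \bar{x}$, the sample mean.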
That theory did open doors for me to teach upper Secondary and Junior College mathematics, but the sums and models in class do not fully showcase the years I spent at university deriving, studying, analysing and interpreting statistical models. I also felt a little handicapped, because most of what I learned was theoretical models; how do they apply in the real world?
When I signed up for the diploma, I was mainly keen to make up for this so-called handicap. At the start of the diploma, we were doing mainly data visualisation using different software. I was very happy to be accepted into the pioneer batch, and to me, it answered a deep-rooted question I had about my university education: it is not very helpful to understand concepts well yet be unable to use statistical software to present the data for good storytelling.
What piqued my interest was a Machine Learning module last semester, in February 2020. The lecturer started bringing in the regression model and the logistic model, and talked of maximum-likelihood estimators and the y-hats. He presented the formulae of the models and we were mainly expected to apply them, but I realised I could actually look under the hood and figure out why each formula was the way it was, thanks to my statistics background.
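To make that concrete, here is a minimal sketch in R of the kind of model the lecturer introduced: a logistic regression fitted by maximum likelihood, with the fitted probabilities (the y-hats) and the maximised log-likelihood extracted afterwards. The simulated data and coefficients are illustrative, not from the actual class.

```r
# Illustrative only: simulate binary data from a known logistic relationship
set.seed(42)
x <- rnorm(200)
p <- 1 / (1 + exp(-(0.5 + 1.2 * x)))   # true success probabilities
y <- rbinom(200, size = 1, prob = p)

# glm() fits the logistic model by maximising the log-likelihood numerically
fit <- glm(y ~ x, family = binomial)

summary(fit)       # coefficient estimates are the maximum-likelihood estimators
head(fitted(fit))  # the y-hats: predicted probabilities for each observation
logLik(fit)        # the maximised log-likelihood itself
```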
In the current statistics bootcamp this semester, in May 2020, I felt like a second prayer of mine was being answered. The p-values, QQ-plots, Durbin-Watson tests for autocorrelation, and all the statistical tests we were applying in class were a combination of two of my university modules: Data Analysis Using Computer (which, coincidentally, was my first brush with R) and Econometrics.
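For readers unfamiliar with these diagnostics, here is a hedged sketch of how they look in R, run on the built-in `cars` dataset rather than our actual class data; the Durbin-Watson test comes from the `lmtest` package.

```r
# Illustrative only: a simple linear model on R's built-in cars dataset
fit <- lm(dist ~ speed, data = cars)
summary(fit)                 # t-statistics and p-values for each coefficient

# QQ-plot of the residuals to assess normality
qqnorm(resid(fit))
qqline(resid(fit))

# Durbin-Watson test for autocorrelation in the residuals
# install.packages("lmtest")   # if not already installed
library(lmtest)
dwtest(fit)
```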
In 2009, as part of my degree, it was compulsory to take Basic and Intermediate Econometrics. I got very average marks in those modules, and try as I might, I never fully understood the concepts. We were just running models for our project, quite clueless about statistical significance and the different tests we were applying. One of my project mates was so stressed by the presentation that she took MC on the day itself. I presented her part, but it was very evident that I was not sure what I was talking about. There and then, I decided that I would revisit Econometrics one day.
It felt like coming full circle when I encountered the same models in the current statistics bootcamp of my Specialist Diploma. As the lecturer started speaking on Zoom, I went around my room searching through old textbooks and university notes. I really had seen all of this before in school. I had not expected to meet econometric models again in my life; but now that I am more familiar with R, I can run my own analyses and fulfil the wish I made ten years ago.
Read on to see my projects in R.