Udemy projects
I have fallen in love with a new instructor on Udemy, Kirill Eremenko from www.superdatascience.com, and this is my second course with him.
I have been exposed to R for a while, but I could never figure out what is so amazing about this programming language that mathematicians and some academics stick by it. As he explained each topic in the course, I had one aha moment after another.
Interestingly, he was re-introduced to me by my Machine Learning lecturer, who said in Feb 2020 that Kirill's course was one of the best for learning Machine Learning and that it gave him new realizations each day. I went back and realized that I had actually bought a ton of Kirill's courses; they were sitting in my Udemy account, yet I never found the motivation to get started. Thanks to the circuit breaker (and the extended circuit breaker), I am finding more focused blocks of time for the self-improvement that I had always put on the back burner. Here are the projects from his Udemy course, R Programming from A to Z: R for Data Science with Real Exercises.
Enjoy!
Investigating the Law of Large Numbers
Barbara Yam
5/1/2020
# Investigating the law of Large Numbers
N <- 1000
counter <- 0
for (i in rnorm(N)){
  if(i > -1 && i < 1){
    counter <- counter + 1
  }
}
answer <- counter/N * 100
answer
# Thoughts: N could be replaced with a bigger and bigger number to see
# if the answer converges to the expected value of about 68.3%
# (the share of a standard normal distribution within one standard deviation of the mean)
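The thought above can be tried out directly. The following is a minimal sketch, not part of the course material: it replaces the loop with a vectorised comparison and repeats the estimate for several sample sizes. The seed and the particular sample sizes are assumptions chosen only for illustration.

```r
# Sketch: estimate P(-1 < Z < 1) for growing sample sizes and compare
# each estimate with the exact value from the normal CDF.
set.seed(42)                                  # assumed seed, for reproducibility only
expected <- (pnorm(1) - pnorm(-1)) * 100      # exact percentage, about 68.27
sample_sizes <- c(100, 1000, 10000, 100000, 1000000)   # illustrative choices
estimates <- sapply(sample_sizes, function(n) {
  x <- rnorm(n)
  mean(x > -1 & x < 1) * 100                  # vectorised replacement for the loop
})
data.frame(N = sample_sizes, estimate = estimates, error = estimates - expected)
```

The error column should generally shrink as N grows, which is the Law of Large Numbers at work.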
Financial Analysis Project
Barbara Yam
6 May 2020
#Data
revenue <- c(14574.49, 7606.46, 8611.41, 9175.41, 8058.65, 8105.44, 11496.28, 9766.09, 10305.32, 14379.96, 10713.97, 15433.50)
expenses <- c(12051.82, 5695.07, 12319.20, 12089.72, 8658.57, 840.20, 3285.73, 5821.12, 6976.93, 16618.61, 10054.37, 3803.96)
The task is to calculate the following financial metrics:
- profit for each month
- profit after tax for each month (the tax rate is 30%)
- profit margin for each month, equal to profit after tax divided by revenue
- good months, where the profit after tax was greater than the mean for the year
- bad months, where the profit after tax was less than the mean for the year
- the best month, where the profit after tax was the maximum for the year
- the worst month, where the profit after tax was the minimum for the year
Notes:
- Results for dollar values need to be calculated with $0.01 precision, but need to be presented in units of $1000 with no decimal points.
- Results for the profit margin ratio need to be presented in units of % with no decimal points.
Solution
#profit for each month
profit <- revenue - expenses
profit
## [1] 2522.67 1911.39 -3707.79 -2914.31 -599.92 7265.24 8210.55 3944.97
## [9] 3328.39 -2238.65 659.60 11629.54
#profit after tax for each month (tax is 30%)
profit_after_tax <- round(0.7 * profit,2)
profit_after_tax
## [1] 1765.87 1337.97 -2595.45 -2040.02 -419.94 5085.67 5747.38 2761.48
## [9] 2329.87 -1567.06 461.72 8140.68
profit margin for each month
profit_margin <- round(profit_after_tax/revenue,2) *100
profit_margin
## [1] 12 18 -30 -22 -5 63 50 28 23 -11 4 53
good months - where profit after tax is greater than the mean for the year
mean_profit_after_tax <- mean(profit_after_tax)
good_months <- profit_after_tax > mean_profit_after_tax
good_months
## [1] TRUE FALSE FALSE FALSE FALSE TRUE TRUE TRUE TRUE FALSE FALSE TRUE
bad months
bad_months <- !good_months
bad_months
## [1] FALSE TRUE TRUE TRUE TRUE FALSE FALSE FALSE FALSE TRUE TRUE FALSE
best month where profit after tax is the max for the year
best_month <- profit_after_tax == max(profit_after_tax)
best_month
## [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE
worst month where profit after tax is the min for the year
worst_month <- profit_after_tax == min(profit_after_tax)
worst_month
## [1] FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
units of thousands
revenue_1000 <- round(revenue/1000,0)
expenses_1000 <- round(expenses/1000,0)
profit_1000 <- round(profit/1000,0)
profit_after_tax_1000 <- round(profit_after_tax/1000,0)
profit_margin
## [1] 12 18 -30 -22 -5 63 50 28 23 -11 4 53
good_months
## [1] TRUE FALSE FALSE FALSE FALSE TRUE TRUE TRUE TRUE FALSE FALSE TRUE
bad_months
## [1] FALSE TRUE TRUE TRUE TRUE FALSE FALSE FALSE FALSE TRUE TRUE FALSE
best_month
## [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE
worst_month
## [1] FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
#matrices
report_matrix <- rbind(revenue_1000,expenses_1000,profit_1000,profit_after_tax_1000,
profit_margin,good_months,bad_months,best_month,worst_month)
report_matrix
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11]
## revenue_1000 15 8 9 9 8 8 11 10 10 14 11
## expenses_1000 12 6 12 12 9 1 3 6 7 17 10
## profit_1000 3 2 -4 -3 -1 7 8 4 3 -2 1
## profit_after_tax_1000 2 1 -3 -2 0 5 6 3 2 -2 0
## profit_margin 12 18 -30 -22 -5 63 50 28 23 -11 4
## good_months 1 0 0 0 0 1 1 1 1 0 0
## bad_months 0 1 1 1 1 0 0 0 0 1 1
## best_month 0 0 0 0 0 0 0 0 0 0 0
## worst_month 0 0 1 0 0 0 0 0 0 0 0
## [,12]
## revenue_1000 15
## expenses_1000 4
## profit_1000 12
## profit_after_tax_1000 8
## profit_margin 53
## good_months 1
## bad_months 0
## best_month 1
## worst_month 0
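The logical vectors in the report are easier to read once the months carry names. The sketch below is one possible way to do that, assuming the first element of each vector is January (the exercise does not state the starting month); `month.abb` is a built-in R constant.

```r
# Data copied from the exercise above.
revenue <- c(14574.49, 7606.46, 8611.41, 9175.41, 8058.65, 8105.44,
             11496.28, 9766.09, 10305.32, 14379.96, 10713.97, 15433.50)
expenses <- c(12051.82, 5695.07, 12319.20, 12089.72, 8658.57, 840.20,
              3285.73, 5821.12, 6976.93, 16618.61, 10054.37, 3803.96)
profit_after_tax <- round(0.7 * (revenue - expenses), 2)
names(profit_after_tax) <- month.abb   # assumption: element 1 is January

# Names of the months above the yearly mean, plus the best and worst month.
good <- names(profit_after_tax)[profit_after_tax > mean(profit_after_tax)]
best <- names(which.max(profit_after_tax))
worst <- names(which.min(profit_after_tax))
good
best
worst
```

Under that assumption the good months come out as Jan, Jun, Jul, Aug, Sep and Dec, with Dec the best month and Mar the worst, matching the TRUE positions in the vectors above.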
Basketball Trends
Barbara Yam
9 May 2020
Dear Student,
Welcome to the dataset for the homework exercise.
Instructions for this dataset: You have only been supplied vectors. You will need to create the matrices yourself.
Matrices:
- FreeThrows
- FreeThrowAttempts
Sincerely, Kirill Eremenko www.superdatascience.com
Copyright: These datasets were prepared using publicly available data. However, these scripts are subject to Copyright Laws. If you wish to use these R scripts outside of the R Programming Course by Kirill Eremenko, you may do so by referencing www.superdatascience.com in your work.
Comments: Seasons are labeled based on the first year in the season. E.g. the 2012-2013 season is presented as simply 2012.
Notes and Corrections to the data:
- Kevin Durant: 2006 - College Data Used
- Kevin Durant: 2005 - Proxied With 2006 Data
- Derrick Rose: 2012 - Did Not Play
- Derrick Rose: 2007 - College Data Used
- Derrick Rose: 2006 - Proxied With 2007 Data
- Derrick Rose: 2005 - Proxied With 2007 Data
Seasons
Seasons <- c("2005","2006","2007","2008","2009","2010","2011","2012","2013","2014")
Players
Players <- c("KobeBryant","JoeJohnson","LeBronJames","CarmeloAnthony","DwightHoward","ChrisBosh","ChrisPaul","KevinDurant","DerrickRose","DwayneWade")
Free Throws
KobeBryant_FT <- c(696,667,623,483,439,483,381,525,18,196)
JoeJohnson_FT <- c(261,235,316,299,220,195,158,132,159,141)
LeBronJames_FT <- c(601,489,549,594,593,503,387,403,439,375)
CarmeloAnthony_FT <- c(573,459,464,371,508,507,295,425,459,189)
DwightHoward_FT <- c(356,390,529,504,483,546,281,355,349,143)
ChrisBosh_FT <- c(474,463,472,504,470,384,229,241,223,179)
ChrisPaul_FT <- c(394,292,332,455,161,337,260,286,295,289)
KevinDurant_FT <- c(209,209,391,452,756,594,431,679,703,146)
DerrickRose_FT <- c(146,146,146,197,259,476,194,0,27,152)
DwayneWade_FT <- c(629,432,354,590,534,494,235,308,189,284)
Matrix
FreeThrows <- rbind(KobeBryant_FT,JoeJohnson_FT,LeBronJames_FT,CarmeloAnthony_FT,DwightHoward_FT,ChrisBosh_FT,ChrisPaul_FT,KevinDurant_FT,DerrickRose_FT,DwayneWade_FT)
colnames(FreeThrows) <- Seasons
rownames(FreeThrows) <- Players
FreeThrows
## 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014
## KobeBryant 696 667 623 483 439 483 381 525 18 196
## JoeJohnson 261 235 316 299 220 195 158 132 159 141
## LeBronJames 601 489 549 594 593 503 387 403 439 375
## CarmeloAnthony 573 459 464 371 508 507 295 425 459 189
## DwightHoward 356 390 529 504 483 546 281 355 349 143
## ChrisBosh 474 463 472 504 470 384 229 241 223 179
## ChrisPaul 394 292 332 455 161 337 260 286 295 289
## KevinDurant 209 209 391 452 756 594 431 679 703 146
## DerrickRose 146 146 146 197 259 476 194 0 27 152
## DwayneWade 629 432 354 590 534 494 235 308 189 284
Free Throw Attempts
KobeBryant_FTA <- c(819,768,742,564,541,583,451,626,21,241)
JoeJohnson_FTA <- c(330,314,379,362,269,243,186,161,195,176)
LeBronJames_FTA <- c(814,701,771,762,773,663,502,535,585,528)
CarmeloAnthony_FTA <-c(709,568,590,468,612,605,367,512,541,237)
DwightHoward_FTA <- c(598,666,897,849,816,916,572,721,638,271)
ChrisBosh_FTA <- c(581,590,559,617,590,471,279,302,272,232)
ChrisPaul_FTA <- c(465,357,390,524,190,384,302,323,345,321)
KevinDurant_FTA <- c(256,256,448,524,840,675,501,750,805,171)
DerrickRose_FTA <- c(205,205,205,250,338,555,239,0,32,187)
DwayneWade_FTA <- c(803,535,467,771,702,652,297,425,258,370)
Matrix
# rbind (not cbind) so that each player is a row and each season a column,
# matching the layout of FreeThrows
FreeThrowsAttempts <- rbind(KobeBryant_FTA,JoeJohnson_FTA,LeBronJames_FTA,
CarmeloAnthony_FTA,DwightHoward_FTA,ChrisBosh_FTA,
ChrisPaul_FTA,KevinDurant_FTA,DerrickRose_FTA,DwayneWade_FTA)
colnames(FreeThrowsAttempts) <- Seasons
rownames(FreeThrowsAttempts) <- Players
FreeThrowsAttempts
## 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014
## KobeBryant 819 768 742 564 541 583 451 626 21 241
## JoeJohnson 330 314 379 362 269 243 186 161 195 176
## LeBronJames 814 701 771 762 773 663 502 535 585 528
## CarmeloAnthony 709 568 590 468 612 605 367 512 541 237
## DwightHoward 598 666 897 849 816 916 572 721 638 271
## ChrisBosh 581 590 559 617 590 471 279 302 272 232
## ChrisPaul 465 357 390 524 190 384 302 323 345 321
## KevinDurant 256 256 448 524 840 675 501 750 805 171
## DerrickRose 205 205 205 250 338 555 239 0 32 187
## DwayneWade 803 535 467 771 702 652 297 425 258 370
Game Matrix
KobeBryant_G <- c(80,77,82,82,73,82,58,78,6,35)
JoeJohnson_G <- c(82,57,82,79,76,72,60,72,79,80)
LeBronJames_G <- c(79,78,75,81,76,79,62,76,77,69)
CarmeloAnthony_G <- c(80,65,77,66,69,77,55,67,77,40)
DwightHoward_G <- c(82,82,82,79,82,78,54,76,71,41)
ChrisBosh_G <- c(70,69,67,77,70,77,57,74,79,44)
ChrisPaul_G <- c(78,64,80,78,45,80,60,70,62,82)
KevinDurant_G <- c(35,35,80,74,82,78,66,81,81,27)
DerrickRose_G <- c(40,40,40,81,78,81,39,0,10,51)
DwayneWade_G <- c(75,51,51,79,77,76,49,69,54,62)
Games <- rbind(KobeBryant_G, JoeJohnson_G, LeBronJames_G, CarmeloAnthony_G, DwightHoward_G, ChrisBosh_G, ChrisPaul_G, KevinDurant_G, DerrickRose_G, DwayneWade_G)
rm(KobeBryant_G, JoeJohnson_G, CarmeloAnthony_G, DwightHoward_G, ChrisBosh_G, LeBronJames_G, ChrisPaul_G, DerrickRose_G, DwayneWade_G, KevinDurant_G)
colnames(Games) <- Seasons
rownames(Games) <- Players
Plot free throw attempts per game
myplot2<-function(){
matplot(t(FreeThrowsAttempts/Games),type="b",pch=15:18,col=c(1:4,6),main ="Free Throw Attempts Per Game")
legend("topleft",cex=0.5,inset=0.05,legend=Players[1:10], col=c(1:4,6),pch=15:18, horiz=F)
}
myplot2()
Plot accuracy of free throws
# DerrickRose 2012 is NaN (0/0) because he did not play that season
round(FreeThrows/FreeThrowsAttempts,3)
## 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014
## KobeBryant 0.850 0.868 0.840 0.856 0.811 0.828 0.845 0.839 0.857 0.813
## JoeJohnson 0.791 0.748 0.834 0.826 0.818 0.802 0.849 0.820 0.815 0.801
## LeBronJames 0.738 0.698 0.712 0.780 0.767 0.759 0.771 0.753 0.750 0.710
## CarmeloAnthony 0.808 0.808 0.786 0.793 0.830 0.838 0.804 0.830 0.848 0.797
## DwightHoward 0.595 0.586 0.590 0.594 0.592 0.596 0.491 0.492 0.547 0.528
## ChrisBosh 0.816 0.785 0.844 0.817 0.797 0.815 0.821 0.798 0.820 0.772
## ChrisPaul 0.847 0.818 0.851 0.868 0.847 0.878 0.861 0.885 0.855 0.900
## KevinDurant 0.816 0.816 0.873 0.863 0.900 0.880 0.860 0.905 0.873 0.854
## DerrickRose 0.712 0.712 0.712 0.788 0.766 0.858 0.812 NaN 0.844 0.813
## DwayneWade 0.783 0.807 0.758 0.765 0.761 0.758 0.791 0.725 0.733 0.768
myplot3<- function(){
matplot(t(round(FreeThrows/FreeThrowsAttempts,1)),type="b",pch=15:18,col=c(1:4,6), main="Accuracy of Free Throws")
legend("topleft",cex=0.5,inset=0.02,legend=Players[1:10], col=c(1:4,6),pch=15:18, horiz=F)
}
myplot3()
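Because made free throws can never exceed attempts, every accuracy value has to fall between 0 and 1. A cheap check along these lines can catch matrices that are not aligned row for row. The sketch below uses only two players and three seasons from the vectors above; the names `made`, `attempts` and `bad` are illustrative, not from the course.

```r
Seasons <- c("2005", "2006", "2007")
Players <- c("KobeBryant", "JoeJohnson")
made <- rbind(c(696, 667, 623), c(261, 235, 316))       # free throws made
attempts <- rbind(c(819, 768, 742), c(330, 314, 379))   # free throws attempted
dimnames(made) <- list(Players, Seasons)
dimnames(attempts) <- list(Players, Seasons)

accuracy <- made / attempts
# A made/attempted ratio must lie in [0, 1]; NaN (0/0) marks a season not played.
ok <- is.nan(accuracy) | (accuracy >= 0 & accuracy <= 1)
stopifnot(all(ok))

# Transposing one operand mimics mixing up rbind() and cbind().
# For a square matrix dim() would not reveal this, but the range check does.
bad <- made[, 1:2] / t(attempts[, 1:2])
any(bad > 1)
```

With 10 players and 10 seasons both orientations give a 10 x 10 matrix, so a value check like this is more telling than a dimension check.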
Points Matrix
KobeBryant_PTS <-
c(2832,2430,2323,2201,1970,2078,1616,2133,83,782)
JoeJohnson_PTS <-
c(1653,1426,1779,1688,1619,1312,1129,1170,1245,1154)
LeBronJames_PTS <- c(2478,2132,2250,2304,2258,2111,1683,2036,2089,1743)
CarmeloAnthony_PTS <- c(2122,1881,1978,1504,1943,1970,1245,1920,2112,966)
DwightHoward_PTS <- c(1292,1443,1695,1624,1503,1784,1113,1296,1297,646)
ChrisBosh_PTS <- c(1572,1561,1496,1746,1678,1438,1025,1232,1281,928)
ChrisPaul_PTS <- c(1258,1104,1684,1781,841,1268,1189,1186,1185,1564)
KevinDurant_PTS <- c(903,903,1624,1871,2472,2161,1850,2280,2593,686)
DerrickRose_PTS <- c(597,597,597,1361,1619,2026,852,0,159,904)
DwayneWade_PTS <- c(2040,1397,1254,2386,2045,1941,1082,1463,1028,1331)
Points <- rbind(KobeBryant_PTS, JoeJohnson_PTS, LeBronJames_PTS, CarmeloAnthony_PTS, DwightHoward_PTS, ChrisBosh_PTS, ChrisPaul_PTS, KevinDurant_PTS, DerrickRose_PTS, DwayneWade_PTS)
rm(KobeBryant_PTS, JoeJohnson_PTS, LeBronJames_PTS, CarmeloAnthony_PTS, DwightHoward_PTS, ChrisBosh_PTS, ChrisPaul_PTS, KevinDurant_PTS, DerrickRose_PTS, DwayneWade_PTS)
colnames(Points) <- Seasons
rownames(Points) <- Players
Field Goals Matrix
KobeBryant_FG <- c(978,813,775,800,716,740,574,738,31,266)
JoeJohnson_FG <- c(632,536,647,620,635,514,423,445,462,446)
LeBronJames_FG <- c(875,772,794,789,768,758,621,765,767,624)
CarmeloAnthony_FG <- c(756,691,728,535,688,684,441,669,743,358)
DwightHoward_FG <- c(468,526,583,560,510,619,416,470,473,251)
ChrisBosh_FG <- c(549,543,507,615,600,524,393,485,492,343)
ChrisPaul_FG <- c(407,381,630,631,314,430,425,412,406,568)
KevinDurant_FG <- c(306,306,587,661,794,711,643,731,849,238)
DerrickRose_FG <- c(208,208,208,574,672,711,302,0,58,338)
DwayneWade_FG <- c(699,472,439,854,719,692,416,569,415,509)
FieldGoals <- rbind(KobeBryant_FG, JoeJohnson_FG, LeBronJames_FG, CarmeloAnthony_FG, DwightHoward_FG, ChrisBosh_FG, ChrisPaul_FG, KevinDurant_FG, DerrickRose_FG, DwayneWade_FG)
rm(KobeBryant_FG, JoeJohnson_FG, LeBronJames_FG, CarmeloAnthony_FG, DwightHoward_FG, ChrisBosh_FG, ChrisPaul_FG, KevinDurant_FG, DerrickRose_FG, DwayneWade_FG)
colnames(FieldGoals) <- Seasons
rownames(FieldGoals) <- Players
Plot player playing style (2 vs 3 points preference) excluding Free Throws
PointWithoutFreeThrows <- Points - FreeThrows
PointWithoutFreeThrows/FieldGoals
## 2005 2006 2007 2008 2009 2010 2011
## KobeBryant 2.184049 2.168512 2.193548 2.147500 2.138268 2.155405 2.151568
## JoeJohnson 2.202532 2.222015 2.261206 2.240323 2.203150 2.173152 2.295508
## LeBronJames 2.145143 2.128238 2.142317 2.167300 2.167969 2.121372 2.086957
## CarmeloAnthony 2.048942 2.057887 2.079670 2.117757 2.085756 2.138889 2.154195
## DwightHoward 2.000000 2.001901 2.000000 2.000000 2.000000 2.000000 2.000000
## ChrisBosh 2.000000 2.022099 2.019724 2.019512 2.013333 2.011450 2.025445
## ChrisPaul 2.122850 2.131234 2.146032 2.101426 2.165605 2.165116 2.185882
## KevinDurant 2.267974 2.267974 2.100511 2.146747 2.161209 2.203938 2.206843
## DerrickRose 2.168269 2.168269 2.168269 2.027875 2.023810 2.180028 2.178808
## DwayneWade 2.018598 2.044492 2.050114 2.103044 2.101530 2.091040 2.036058
## 2012 2013 2014
## KobeBryant 2.178862 2.096774 2.203008
## JoeJohnson 2.332584 2.350649 2.271300
## LeBronJames 2.134641 2.151239 2.192308
## CarmeloAnthony 2.234679 2.224764 2.170391
## DwightHoward 2.002128 2.004228 2.003984
## ChrisBosh 2.043299 2.150407 2.183673
## ChrisPaul 2.184466 2.192118 2.244718
## KevinDurant 2.190150 2.226148 2.268908
## DerrickRose NaN 2.275862 2.224852
## DwayneWade 2.029877 2.021687 2.056974
myplot4 <- function(){
matplot(t(round(PointWithoutFreeThrows/FieldGoals,1)),type="b",pch=15:18,col=c(1:4,6), main="Player Playing Style")
legend("topleft",cex=0.5,inset=0.02,legend=Players[1:10], col=c(1:4,6),pch=15:18, horiz=F)
}
myplot4()
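The ratio plotted above has a direct reading. If a player makes a two-pointers and b three-pointers, points per field goal equal (2a + 3b) / (a + b) = 2 + b / (a + b), so the amount above 2 is the share of made field goals that were threes. A small worked sketch for one cell of the matrix (Kobe Bryant, 2005), with variable names chosen here for illustration:

```r
# Values copied from the Points, FreeThrows and FieldGoals vectors above.
pts <- 2832
free_throws <- 696
field_goals <- 978

pts_per_fg <- (pts - free_throws) / field_goals
three_share <- pts_per_fg - 2                 # share of made field goals that were threes
threes_made <- three_share * field_goals      # implied number of three-pointers
round(c(pts_per_fg = pts_per_fg, three_share = three_share, threes_made = threes_made), 3)
```

This is why Dwight Howard sits almost exactly at 2.0 in the table (practically no threes), while Joe Johnson and Kevin Durant sit higher.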
World Bank
Barbara Yam
12 May 2020
Project details: You are required to produce a scatterplot depicting Life Expectancy (y-axis) and Fertility Rate (x-axis) statistics by Country.
The scatterplot also needs to be categorised by the Countries' Region. Two years' worth of data have been supplied, 1960 and 2013, and a visualisation is required for each of these years.
Finally, provide insights into how the two periods compare.
stats2 <- read.csv("Section5-Homework-Data.csv")
#split data into data1960 and data2013
data1960 <- stats2[stats2$Year==1960,]
data1960$Region <- factor(data1960$Region)
head(data1960)
## Country.Name Country.Code Region Year Fertility.Rate
## 1 Aruba ABW The Americas 1960 4.820
## 2 Afghanistan AFG Asia 1960 7.450
## 3 Angola AGO Africa 1960 7.379
## 4 Albania ALB Europe 1960 6.186
## 5 United Arab Emirates ARE Middle East 1960 6.928
## 6 Argentina ARG The Americas 1960 3.109
str(data1960)
## 'data.frame': 187 obs. of 5 variables:
## $ Country.Name : chr "Aruba" "Afghanistan" "Angola" "Albania" ...
## $ Country.Code : chr "ABW" "AFG" "AGO" "ALB" ...
## $ Region : Factor w/ 6 levels "Africa","Asia",..: 6 2 1 3 4 6 2 6 5 3 ...
## $ Year : int 1960 1960 1960 1960 1960 1960 1960 1960 1960 1960 ...
## $ Fertility.Rate: num 4.82 7.45 7.38 6.19 6.93 ...
data2013 <- stats2[stats2$Year==2013,]
data2013$Region <- factor(data2013$Region)
head(data2013)
## Country.Name Country.Code Region Year Fertility.Rate
## 188 Aruba ABW The Americas 2013 1.669
## 189 Afghanistan AFG Asia 2013 5.050
## 190 Angola AGO Africa 2013 6.165
## 191 Albania ALB Europe 2013 1.771
## 192 United Arab Emirates ARE Middle East 2013 1.801
## 193 Argentina ARG The Americas 2013 2.335
str(data2013)
## 'data.frame': 187 obs. of 5 variables:
## $ Country.Name : chr "Aruba" "Afghanistan" "Angola" "Albania" ...
## $ Country.Code : chr "ABW" "AFG" "AGO" "ALB" ...
## $ Region : Factor w/ 6 levels "Africa","Asia",..: 6 2 1 3 4 6 2 6 5 3 ...
## $ Year : int 2013 2013 2013 2013 2013 2013 2013 2013 2013 2013 ...
## $ Fertility.Rate: num 1.67 5.05 6.16 1.77 1.8 ...
nrow(data2013)
## [1] 187
nrow(data1960)
## [1] 187
#showing equal split
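The same split and row-count check can also be done in one step with `split()`. This is only a sketch on a toy stand-in for `stats2` (three countries, two years, values taken from the output above), since the CSV itself is not reproduced here.

```r
# Toy stand-in for stats2; the real data has 187 countries per year.
stats2 <- data.frame(
  Country.Code = rep(c("ABW", "AFG", "AGO"), times = 2),
  Year = rep(c(1960, 2013), each = 3),
  Fertility.Rate = c(4.820, 7.450, 7.379, 1.669, 5.050, 6.165)
)
by_year <- split(stats2, stats2$Year)   # named list with elements "1960" and "2013"
sapply(by_year, nrow)                   # one row count per year
table(stats2$Year)                      # the same check without splitting
```

`by_year[["1960"]]` then plays the role of `data1960`, which scales better if more years are added later.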
#Execute below code to generate three new vectors
Country_Code <- c("ABW","AFG","AGO","ALB","ARE","ARG","ARM","ATG","AUS","AUT","AZE","BDI","BEL","BEN","BFA","BGD","BGR","BHR","BHS","BIH","BLR","BLZ","BOL","BRA","BRB","BRN","BTN","BWA","CAF","CAN","CHE","CHL","CHN","CIV","CMR","COG","COL","COM","CPV","CRI","CUB","CYP","CZE","DEU","DJI","DNK","DOM","DZA","ECU","EGY","ERI","ESP","EST","ETH","FIN","FJI","FRA","FSM","GAB","GBR","GEO","GHA","GIN","GMB","GNB","GNQ","GRC","GRD","GTM","GUM","GUY","HKG","HND","HRV","HTI","HUN","IDN","IND","IRL","IRN","IRQ","ISL","ITA","JAM","JOR","JPN","KAZ","KEN","KGZ","KHM","KIR","KOR","KWT","LAO","LBN","LBR","LBY","LCA","LKA","LSO","LTU","LUX","LVA","MAC","MAR","MDA","MDG","MDV","MEX","MKD","MLI","MLT","MMR","MNE","MNG","MOZ","MRT","MUS","MWI","MYS","NAM","NCL","NER","NGA","NIC","NLD","NOR","NPL","NZL","OMN","PAK","PAN","PER","PHL","PNG","POL","PRI","PRT","PRY","PYF","QAT","ROU","RUS","RWA","SAU","SDN","SEN","SGP","SLB","SLE","SLV","SOM","SSD","STP","SUR","SVK","SVN","SWE","SWZ","SYR","TCD","TGO","THA","TJK","TKM","TLS","TON","TTO","TUN","TUR","TZA","UGA","UKR","URY","USA","UZB","VCT","VEN","VIR","VNM","VUT","WSM","YEM","ZAF","COD","ZMB","ZWE")
Life_Expectancy_At_Birth_1960 <- c(65.5693658536586,32.328512195122,32.9848292682927,62.2543658536585,52.2432195121951,65.2155365853659,65.8634634146342,61.7827317073171,70.8170731707317,68.5856097560976,60.836243902439,41.2360487804878,69.7019512195122,37.2782682926829,34.4779024390244,45.8293170731707,69.2475609756098,52.0893658536585,62.7290487804878,60.2762195121951,67.7080975609756,59.9613658536585,42.1183170731707,54.2054634146342,60.7380487804878,62.5003658536585,32.3593658536585,50.5477317073171,36.4826341463415,71.1331707317073,71.3134146341463,57.4582926829268,43.4658048780488,36.8724146341463,41.523756097561,48.5816341463415,56.716756097561,41.4424390243903,48.8564146341463,60.5761951219512,63.9046585365854,69.5939268292683,70.3487804878049,69.3129512195122,44.0212682926829,72.1765853658537,51.8452682926829,46.1351219512195,53.215,48.0137073170732,37.3629024390244,69.1092682926829,67.9059756097561,38.4057073170732,68.819756097561,55.9584878048781,69.8682926829268,57.5865853658537,39.5701219512195,71.1268292682927,63.4318536585366,45.8314634146342,34.8863902439024,32.0422195121951,37.8404390243902,36.7330487804878,68.1639024390244,59.8159268292683,45.5316341463415,61.2263414634146,60.2787317073171,66.9997073170732,46.2883170731707,64.6086585365854,42.1000975609756,68.0031707317073,48.6403170731707,41.1719512195122,69.691756097561,44.945512195122,48.0306829268293,73.4286585365854,69.1239024390244,64.1918292682927,52.6852682926829,67.6660975609756,58.3675853658537,46.3624146341463,56.1280731707317,41.2320243902439,49.2159756097561,53.0013170731707,60.3479512195122,43.2044634146342,63.2801219512195,34.7831707317073,42.6411951219512,57.303756097561,59.7471463414634,46.5107073170732,69.8473170731707,68.4463902439024,69.7868292682927,64.6609268292683,48.4466341463415,61.8127804878049,39.9746829268293,37.2686341463415,57.0656341463415,60.6228048780488,28.2116097560976,67.6017804878049,42.7363902439024,63.7056097560976,48.3688048780488,35.0037073170732,
43.4830975609756,58.7452195121951,37.7736341463415,59.4753414634146,46.8803902439024,58.6390243902439,35.5150487804878,37.1829512195122,46.9988292682927,73.3926829268293,73.549756097561,35.1708292682927,71.2365853658537,42.6670731707317,45.2904634146342,60.8817073170732,47.6915853658537,57.8119268292683,38.462243902439,67.6804878048781,68.7196097560976,62.8089268292683,63.7937073170732,56.3570487804878,61.2060731707317,65.6424390243903,66.0552926829268,42.2492926829268,45.6662682926829,48.1876341463415,38.206,65.6598292682927,49.3817073170732,30.3315365853659,49.9479268292683,36.9658780487805,31.6767073170732,50.4513658536585,59.6801219512195,69.9759268292683,68.9780487804878,73.0056097560976,44.2337804878049,52.768243902439,38.0161219512195,40.2728292682927,54.6993170731707,56.1535365853659,54.4586829268293,33.7271219512195,61.3645365853659,62.6575853658537,42.009756097561,45.3844146341463,43.6538780487805,43.9835609756098,68.2995365853659,67.8963902439025,69.7707317073171,58.8855365853659,57.7238780487805,59.2851219512195,63.7302195121951,59.0670243902439,46.4874878048781,49.969512195122,34.3638048780488,49.0362926829268,41.0180487804878,45.1098048780488,51.5424634146342)
Life_Expectancy_At_Birth_2013 <- c(75.3286585365854,60.0282682926829,51.8661707317073,77.537243902439,77.1956341463415,75.9860975609756,74.5613658536585,75.7786585365854,82.1975609756098,80.890243902439,70.6931463414634,56.2516097560976,80.3853658536585,59.3120243902439,58.2406341463415,71.245243902439,74.4658536585366,76.5459512195122,75.0735365853659,76.2769268292683,72.4707317073171,69.9820487804878,67.9134390243903,74.1224390243903,75.3339512195122,78.5466585365854,69.1029268292683,64.3608048780488,49.8798780487805,81.4011219512195,82.7487804878049,81.1979268292683,75.3530243902439,51.2084634146342,55.0418048780488,61.6663902439024,73.8097317073171,62.9321707317073,72.9723658536585,79.2252195121951,79.2563902439025,79.9497804878049,78.2780487804878,81.0439024390244,61.6864634146342,80.3024390243903,73.3199024390244,74.5689512195122,75.648512195122,70.9257804878049,63.1778780487805,82.4268292682927,76.4243902439025,63.4421951219512,80.8317073170732,69.9179268292683,81.9682926829268,68.9733902439024,63.8435853658537,80.9560975609756,74.079512195122,61.1420731707317,58.216487804878,59.9992682926829,54.8384146341464,57.2908292682927,80.6341463414634,73.1935609756098,71.4863902439024,78.872512195122,66.3100243902439,83.8317073170732,72.9428536585366,77.1268292682927,62.4011463414634,75.2682926829268,68.7046097560976,67.6604146341463,81.0439024390244,75.1259756097561,69.4716829268293,83.1170731707317,82.290243902439,73.4689268292683,73.9014146341463,83.3319512195122,70.45,60.9537804878049,70.2024390243902,67.7720487804878,65.7665853658537,81.459756097561,74.462756097561,65.687243902439,80.1288780487805,60.5203902439024,71.6576829268293,74.9127073170732,74.2402926829268,49.3314634146342,74.1634146341464,81.7975609756098,73.9804878048781,80.3391463414634,73.7090487804878,68.811512195122,64.6739024390244,76.6026097560976,76.5326585365854,75.1870487804878,57.5351951219512,80.7463414634146,65.6540975609756,74.7583658536585,69.0618048780488,54.641512195122,62.8027073170732,
74.46,61.466,74.567512195122,64.3438780487805,77.1219512195122,60.8281463414634,52.4421463414634,74.514756097561,81.1048780487805,81.4512195121951,69.222,81.4073170731707,76.8410487804878,65.9636829268293,77.4192195121951,74.2838536585366,68.1315609756097,62.4491707317073,76.8487804878049,78.7111951219512,80.3731707317073,72.7991707317073,76.3340731707317,78.4184878048781,74.4634146341463,71.0731707317073,63.3948292682927,74.1776341463415,63.1670487804878,65.878756097561,82.3463414634146,67.7189268292683,50.3631219512195,72.4981463414634,55.0230243902439,55.2209024390244,66.259512195122,70.99,76.2609756097561,80.2780487804878,81.7048780487805,48.9379268292683,74.7157804878049,51.1914878048781,59.1323658536585,74.2469268292683,69.4001707317073,65.4565609756098,67.5223658536585,72.6403414634147,70.3052926829268,73.6463414634147,75.1759512195122,64.2918292682927,57.7676829268293,71.159512195122,76.8361951219512,78.8414634146341,68.2275853658537,72.8108780487805,74.0744146341464,79.6243902439024,75.756487804878,71.669243902439,73.2503902439024,63.583512195122,56.7365853658537,58.2719268292683,59.2373658536585,55.633)
#(c) Kirill Eremenko, www.superdatascience.com
life_expectancydf <- data.frame(Code=Country_Code, LifeExpectancy1960 = Life_Expectancy_At_Birth_1960,
LifeExpectancy2013= Life_Expectancy_At_Birth_2013)
str(life_expectancydf)
## 'data.frame': 187 obs. of 3 variables:
## $ Code : chr "ABW" "AFG" "AGO" "ALB" ...
## $ LifeExpectancy1960: num 65.6 32.3 33 62.3 52.2 ...
## $ LifeExpectancy2013: num 75.3 60 51.9 77.5 77.2 ...
head(life_expectancydf)
## Code LifeExpectancy1960 LifeExpectancy2013
## 1 ABW 65.56937 75.32866
## 2 AFG 32.32851 60.02827
## 3 AGO 32.98483 51.86617
## 4 ALB 62.25437 77.53724
## 5 ARE 52.24322 77.19563
## 6 ARG 65.21554 75.98610
head(data1960)
## Country.Name Country.Code Region Year Fertility.Rate
## 1 Aruba ABW The Americas 1960 4.820
## 2 Afghanistan AFG Asia 1960 7.450
## 3 Angola AGO Africa 1960 7.379
## 4 Albania ALB Europe 1960 6.186
## 5 United Arab Emirates ARE Middle East 1960 6.928
## 6 Argentina ARG The Americas 1960 3.109
summary(data1960)
## Country.Name Country.Code Region Year
## Length:187 Length:187 Africa :53 Min. :1960
## Class :character Class :character Asia :33 1st Qu.:1960
## Mode :character Mode :character Europe :40 Median :1960
## Middle East :12 Mean :1960
## Oceania :13 3rd Qu.:1960
## The Americas:36 Max. :1960
## Fertility.Rate
## Min. :1.940
## 1st Qu.:4.311
## Median :6.210
## Mean :5.537
## 3rd Qu.:6.806
## Max. :8.187
merge1960 <- merge(data1960,life_expectancydf, by.x="Country.Code",by.y="Code")
merge1960$LifeExpectancy2013 <-NULL
merge1960$Year <- NULL
#year not necessary because we already know this is for 1960.
merge2013 <- merge(data2013,life_expectancydf, by.x="Country.Code",by.y="Code" )
merge2013$LifeExpectancy1960 <- NULL
merge2013$Year <- NULL
#year not necessary because we already know this is for 2013.
head(merge2013)
## Country.Code Country.Name Region Fertility.Rate
## 1 ABW Aruba The Americas 1.669
## 2 AFG Afghanistan Asia 5.050
## 3 AGO Angola Africa 6.165
## 4 ALB Albania Europe 1.771
## 5 ARE United Arab Emirates Middle East 1.801
## 6 ARG Argentina The Americas 2.335
## LifeExpectancy2013
## 1 75.32866
## 2 60.02827
## 3 51.86617
## 4 77.53724
## 5 77.19563
## 6 75.98610
str(merge2013)
## 'data.frame': 187 obs. of 5 variables:
## $ Country.Code : chr "ABW" "AFG" "AGO" "ALB" ...
## $ Country.Name : chr "Aruba" "Afghanistan" "Angola" "Albania" ...
## $ Region : Factor w/ 6 levels "Africa","Asia",..: 6 2 1 3 4 6 2 6 5 3 ...
## $ Fertility.Rate : num 1.67 5.05 6.16 1.77 1.8 ...
## $ LifeExpectancy2013: num 75.3 60 51.9 77.5 77.2 ...
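By default `merge()` performs an inner join, so a country whose code appears in only one of the two data frames is dropped without a warning. Here both merges keep all 187 rows, but a quick key comparison makes that explicit. The sketch below uses toy data frames; the code "XXX" is a made-up country code included only to show a mismatch.

```r
# Toy data frames; "XXX" is a hypothetical code with no life-expectancy entry.
fertility <- data.frame(Country.Code = c("ABW", "AFG", "XXX"),
                        Fertility.Rate = c(4.820, 7.450, 5.000))
life <- data.frame(Code = c("ABW", "AFG", "AGO"),
                   LifeExpectancy1960 = c(65.56937, 32.32851, 32.98483))

merged <- merge(fertility, life, by.x = "Country.Code", by.y = "Code")
nrow(merged)                                  # inner join keeps matching keys only
setdiff(fertility$Country.Code, life$Code)    # keys lost from the left table
setdiff(life$Code, fertility$Country.Code)    # keys lost from the right table
```

If both `setdiff()` calls return empty vectors, the merge has not lost any countries.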
library(ggplot2)
qplot(data=merge1960, x=Fertility.Rate, y=LifeExpectancy1960,
colour=Region,alpha=I(0.6),main="Life Expectancy vs Fertility (1960)")
qplot(data=merge2013, x=Fertility.Rate, y=LifeExpectancy2013,
colour=Region,alpha=I(0.6),main="Life Expectancy vs Fertility (2013)")
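Recent ggplot2 releases mark `qplot()` as deprecated, so the same chart may be worth writing with `ggplot()` as well. The following is a sketch of an equivalent call, using six rows from the outputs above as a stand-in for `merge1960`; the name `sample1960` is introduced here for illustration.

```r
library(ggplot2)

# Six countries (ABW, AFG, AGO, ALB, ARE, ARG) taken from the head() outputs above.
sample1960 <- data.frame(
  Region = c("The Americas", "Asia", "Africa", "Europe", "Middle East", "The Americas"),
  Fertility.Rate = c(4.820, 7.450, 7.379, 6.186, 6.928, 3.109),
  LifeExpectancy1960 = c(65.56937, 32.32851, 32.98483, 62.25437, 52.24322, 65.21554)
)

p <- ggplot(sample1960,
            aes(x = Fertility.Rate, y = LifeExpectancy1960, colour = Region)) +
  geom_point(alpha = 0.6) +                       # same transparency as alpha=I(0.6)
  ggtitle("Life Expectancy vs Fertility (1960)")
p
```

Swapping `sample1960` for `merge1960` (and the 2013 equivalents) should reproduce the two charts.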
Analysis: The European and American countries tend to sit at the top left of both graphs, where the fertility rate is lower and life expectancy is higher.
In 1960, the European countries had a fertility rate between 2 and 4, with life expectancy between 60 and 70 years.
By 2013, their life expectancy had increased to between 70 and 80 years, and their fertility rate had fallen below 2.
In 1960, the African nations had a fertility rate between 5 and 8 and life expectancy below 55 years.
In 2013, the same African nations had a slightly lower fertility rate of between 3 and 7, and life expectancy had risen to between 50 and 70 years.
Overall, there seems to be a trend: as life expectancy increases, the fertility rate falls.
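The visual impression of an inverse relationship can be backed with a rank correlation. This sketch uses only the six 1960 rows visible in the outputs above, so it illustrates the method and is not a result for all 187 countries.

```r
# 1960 values for ABW, AFG, AGO, ALB, ARE, ARG, copied from the outputs above.
fertility_1960 <- c(4.820, 7.450, 7.379, 6.186, 6.928, 3.109)
life_1960 <- c(65.56937, 32.32851, 32.98483, 62.25437, 52.24322, 65.21554)

# Spearman works on ranks, so it does not assume a straight-line relationship.
rho <- cor(fertility_1960, life_1960, method = "spearman")
rho   # a negative value means higher fertility goes with lower life expectancy
```

On the full data the equivalent call would be `cor(merge1960$Fertility.Rate, merge1960$LifeExpectancy1960, method = "spearman")`.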
Movies Visualisation
Barbara Yam
15 May 2020
#task is to reproduce the boxplot that is created by the end of
# this exercise
movies <- read.csv("Section6-Homework-Data.csv")
# data prep
head(movies)
## Day.of.Week Director Genre Movie.Title Release.Date
## 1 Friday Brad Bird action Tomorrowland 22/05/2015
## 2 Friday Scott Waugh action Need for Speed 14/03/2014
## 3 Friday Patrick Hughes action The Expendables 3 15/08/2014
## 4 Friday Phil Lord, Chris Miller comedy 21 Jump Street 16/03/2012
## 5 Friday Roland Emmerich action White House Down 28/06/2013
## 6 Friday David Ayer action Fury 17/10/2014
## Studio Adjusted.Gross...mill. Budget...mill. Gross...mill.
## 1 Buena Vista Studios 202.1 170 202.1
## 2 Buena Vista Studios 204.2 66 203.3
## 3 Lionsgate 207.1 100 206.2
## 4 Sony 208.8 42 201.6
## 5 Sony 209.7 150 205.4
## 6 Sony 212.8 80 211.8
## IMDb.Rating MovieLens.Rating Overseas...mill. Overseas. Profit...mill.
## 1 6.7 3.26 111.9 55.4 32.1
## 2 6.6 2.97 159.7 78.6 137.3
## 3 6.1 2.93 166.9 80.9 106.2
## 4 7.2 3.62 63.1 31.3 159.6
## 5 8.0 3.65 132.3 64.4 55.4
## 6 5.8 2.85 126 59.5 131.8
## Profit. Runtime..min. US...mill. Gross...US
## 1 18.9 130 90.2 44.6
## 2 208.0 132 43.6 21.4
## 3 106.2 126 39.3 19.1
## 4 380.0 109 138.4 68.7
## 5 36.9 131 73.1 35.6
## 6 164.8 134 85.8 40.5
summary(movies)
## Day.of.Week Director Genre Movie.Title
## Length:608 Length:608 Length:608 Length:608
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
## Release.Date Studio Adjusted.Gross...mill. Budget...mill.
## Length:608 Length:608 Length:608 Min. : 0.60
## Class :character Class :character Class :character 1st Qu.: 45.00
## Mode :character Mode :character Mode :character Median : 80.00
## Mean : 92.47
## 3rd Qu.:130.00
## Max. :300.00
## Gross...mill. IMDb.Rating MovieLens.Rating Overseas...mill.
## Length:608 Min. :3.600 Min. :1.490 Length:608
## Class :character 1st Qu.:6.375 1st Qu.:3.038 Class :character
## Mode :character Median :6.900 Median :3.365 Mode :character
## Mean :6.924 Mean :3.340
## 3rd Qu.:7.600 3rd Qu.:3.672
## Max. :9.200 Max. :4.500
## Overseas. Profit...mill. Profit. Runtime..min.
## Min. : 17.2 Length:608 Min. : 7.7 Min. : 30.0
## 1st Qu.: 49.9 Class :character 1st Qu.: 201.8 1st Qu.:100.0
## Median : 58.2 Mode :character Median : 338.6 Median :116.0
## Mean : 57.7 Mean : 719.3 Mean :117.8
## 3rd Qu.: 66.3 3rd Qu.: 650.1 3rd Qu.:130.2
## Max. :100.0 Max. :41333.3 Max. :238.0
## US...mill. Gross...US
## Min. : 0.0 Min. : 0.0
## 1st Qu.:107.0 1st Qu.:33.7
## Median :141.7 Median :41.8
## Mean :167.1 Mean :42.3
## 3rd Qu.:202.1 3rd Qu.:50.1
## Max. :760.5 Max. :82.8
str(movies)
## 'data.frame': 608 obs. of 18 variables:
## $ Day.of.Week : chr "Friday" "Friday" "Friday" "Friday" ...
## $ Director : chr "Brad Bird" "Scott Waugh" "Patrick Hughes" "Phil Lord, Chris Miller" ...
## $ Genre : chr "action" "action" "action" "comedy" ...
## $ Movie.Title : chr "Tomorrowland" "Need for Speed" "The Expendables 3" "21 Jump Street" ...
## $ Release.Date : chr "22/05/2015" "14/03/2014" "15/08/2014" "16/03/2012" ...
## $ Studio : chr "Buena Vista Studios" "Buena Vista Studios" "Lionsgate" "Sony" ...
## $ Adjusted.Gross...mill.: chr "202.1" "204.2" "207.1" "208.8" ...
## $ Budget...mill. : num 170 66 100 42 150 80 50 85 70 5 ...
## $ Gross...mill. : chr "202.1" "203.3" "206.2" "201.6" ...
## $ IMDb.Rating : num 6.7 6.6 6.1 7.2 8 5.8 6 6.8 6.3 5.9 ...
## $ MovieLens.Rating : num 3.26 2.97 2.93 3.62 3.65 2.85 3.16 3.45 2.92 2.9 ...
## $ Overseas...mill. : chr "111.9" "159.7" "166.9" "63.1" ...
## $ Overseas. : num 55.4 78.6 80.9 31.3 64.4 59.5 39.9 39.3 73.9 49.8 ...
## $ Profit...mill. : chr "32.1" "137.3" "106.2" "159.6" ...
## $ Profit. : num 18.9 208 106.2 380 36.9 ...
## $ Runtime..min. : int 130 132 126 109 131 134 125 115 92 84 ...
## $ US...mill. : num 90.2 43.6 39.3 138.4 73.1 ...
## $ Gross...US : num 44.6 21.4 19.1 68.7 35.6 40.5 60.1 60.7 26.1 50.2 ...
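The str() output above shows that several of the "mill." columns (Adjusted.Gross, Gross, Overseas and Profit) were read in as chr rather than num, and Release.Date is also plain text. One likely cause, which is my assumption since the raw CSV is not shown here, is thousands separators such as "1,234.5" in some rows. A minimal sketch of the cleanup on a made-up toy data frame (the values are illustrative, not taken from the real file):

```r
# Toy data frame mimicking two of the chr columns above.
# The "1,234.5" entry is an assumed example of a thousands separator.
toy <- data.frame(
  Gross...mill. = c("202.1", "1,234.5", "206.2"),
  Release.Date  = c("22/05/2015", "14/03/2014", "15/08/2014"),
  stringsAsFactors = FALSE
)

# Strip the commas, then convert the text to numbers
toy$Gross...mill. <- as.numeric(gsub(",", "", toy$Gross...mill., fixed = TRUE))

# Parse day/month/year strings into proper Date objects
toy$Release.Date <- as.Date(toy$Release.Date, format = "%d/%m/%Y")

str(toy)
```

The same two lines could be applied to each chr money column in movies if those columns are needed for plotting or arithmetic later on.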
# rename all 18 columns to shorter names without dots
colnames(movies) <- c("DayofWeek","Director","Genre","MovieTitle",
"ReleaseDate","Studio","AdjustedGrossinMillions",
"BudgetinMillions","GrossinMillions","IMDBRating",
"MovieLensRating","OverseasinMillions","OverseasPercent",
"ProfitinMillions","ProfitPercent","RuntimeinMin",
"USinMillions","GrossPercentUS")
# change studio and genre from character to factors
movies$Studio <- factor(movies$Studio)
movies$Genre <- factor(movies$Genre)
# start mini visualizations
library(ggplot2)
# off topic: how many movies were released on each day of the week?
ggplot(data=movies, aes(x=DayofWeek)) +geom_bar()
# interestingly most movies are released on Fridays and
# no movies are released on Mondays!
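The bar chart only shows the days that actually occur in the data, so a day with zero releases is easy to miss. A small sketch of how the same observation could be checked with counts. The release_day vector here is hypothetical and stands in for movies$DayofWeek:

```r
# Hypothetical release days, standing in for movies$DayofWeek
release_day <- c("Friday", "Friday", "Wednesday", "Thursday", "Friday", "Sunday")

weekdays_all <- c("Monday", "Tuesday", "Wednesday", "Thursday",
                  "Friday", "Saturday", "Sunday")

# A factor with every weekday as a level keeps zero-count days in the table
day_counts <- table(factor(release_day, levels = weekdays_all))
day_counts

# Days with no releases at all
empty_days <- names(day_counts)[as.vector(day_counts == 0)]
empty_days
```

Setting the levels explicitly also puts the days in calendar order rather than alphabetical order, which would carry over to a bar chart built from the same factor.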
v <- ggplot(data=movies,aes(x=Genre, y=GrossPercentUS,
colour=Studio))
v +geom_boxplot(size=1.2)
# too much data! keep only the genres and studios of interest
str(movies)
## 'data.frame': 608 obs. of 18 variables:
## $ DayofWeek : chr "Friday" "Friday" "Friday" "Friday" ...
## $ Director : chr "Brad Bird" "Scott Waugh" "Patrick Hughes" "Phil Lord, Chris Miller" ...
## $ Genre : Factor w/ 15 levels "action","adventure",..: 1 1 1 5 1 1 2 1 1 10 ...
## $ MovieTitle : chr "Tomorrowland" "Need for Speed" "The Expendables 3" "21 Jump Street" ...
## $ ReleaseDate : chr "22/05/2015" "14/03/2014" "15/08/2014" "16/03/2012" ...
## $ Studio : Factor w/ 36 levels "Art House Studios",..: 2 2 11 25 25 25 2 31 31 20 ...
## $ AdjustedGrossinMillions: chr "202.1" "204.2" "207.1" "208.8" ...
## $ BudgetinMillions : num 170 66 100 42 150 80 50 85 70 5 ...
## $ GrossinMillions : chr "202.1" "203.3" "206.2" "201.6" ...
## $ IMDBRating : num 6.7 6.6 6.1 7.2 8 5.8 6 6.8 6.3 5.9 ...
## $ MovieLensRating : num 3.26 2.97 2.93 3.62 3.65 2.85 3.16 3.45 2.92 2.9 ...
## $ OverseasinMillions : chr "111.9" "159.7" "166.9" "63.1" ...
## $ OverseasPercent : num 55.4 78.6 80.9 31.3 64.4 59.5 39.9 39.3 73.9 49.8 ...
## $ ProfitinMillions : chr "32.1" "137.3" "106.2" "159.6" ...
## $ ProfitPercent : num 18.9 208 106.2 380 36.9 ...
## $ RuntimeinMin : int 130 132 126 109 131 134 125 115 92 84 ...
## $ USinMillions : num 90.2 43.6 39.3 138.4 73.1 ...
## $ GrossPercentUS : num 44.6 21.4 19.1 68.7 35.6 40.5 60.1 60.7 26.1 50.2 ...
# logical filters: TRUE for rows whose genre / studio we want to keep
movies_filter <- movies$Genre %in% c("action", "adventure", "animation",
                                     "comedy", "drama")
movies_filter2 <- movies$Studio %in% c("Buena Vista Studios", "Fox",
                                       "Paramount Pictures", "Sony",
                                       "Universal", "WB")
movies_filter2
## [1] TRUE TRUE FALSE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE FALSE
## [13] FALSE FALSE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [25] TRUE TRUE FALSE TRUE TRUE FALSE TRUE TRUE TRUE TRUE TRUE FALSE
## [37] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE FALSE TRUE
## [49] TRUE TRUE TRUE TRUE TRUE FALSE TRUE TRUE TRUE TRUE TRUE TRUE
## [61] TRUE TRUE TRUE FALSE TRUE TRUE TRUE FALSE TRUE TRUE TRUE TRUE
## [73] FALSE TRUE TRUE FALSE TRUE TRUE FALSE TRUE TRUE TRUE TRUE FALSE
## [85] TRUE TRUE TRUE TRUE TRUE TRUE FALSE TRUE TRUE TRUE FALSE TRUE
## [97] TRUE TRUE TRUE FALSE TRUE FALSE TRUE FALSE TRUE FALSE TRUE FALSE
## [109] TRUE TRUE TRUE FALSE TRUE TRUE FALSE TRUE FALSE TRUE TRUE TRUE
## [121] TRUE TRUE FALSE TRUE FALSE FALSE TRUE TRUE TRUE TRUE TRUE TRUE
## [133] TRUE FALSE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE FALSE
## [145] TRUE TRUE TRUE FALSE TRUE FALSE TRUE TRUE TRUE TRUE TRUE FALSE
## [157] FALSE TRUE TRUE TRUE TRUE FALSE TRUE TRUE TRUE TRUE TRUE TRUE
## [169] TRUE TRUE FALSE FALSE FALSE TRUE TRUE TRUE TRUE TRUE FALSE FALSE
## [181] TRUE TRUE TRUE TRUE TRUE TRUE FALSE TRUE TRUE TRUE TRUE TRUE
## [193] TRUE FALSE TRUE TRUE TRUE FALSE TRUE TRUE TRUE TRUE FALSE FALSE
## [205] TRUE TRUE TRUE TRUE TRUE FALSE FALSE TRUE FALSE TRUE TRUE TRUE
## [217] TRUE FALSE FALSE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [229] TRUE TRUE TRUE TRUE TRUE TRUE FALSE TRUE FALSE TRUE TRUE FALSE
## [241] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE FALSE TRUE TRUE FALSE
## [253] TRUE TRUE TRUE FALSE FALSE TRUE TRUE FALSE TRUE TRUE TRUE TRUE
## [265] TRUE TRUE TRUE TRUE TRUE FALSE TRUE FALSE TRUE FALSE TRUE FALSE
## [277] TRUE FALSE TRUE TRUE TRUE FALSE TRUE TRUE TRUE TRUE TRUE FALSE
## [289] TRUE FALSE TRUE TRUE TRUE TRUE TRUE TRUE FALSE TRUE TRUE FALSE
## [301] TRUE FALSE FALSE TRUE FALSE TRUE TRUE TRUE FALSE TRUE TRUE TRUE
## [313] TRUE TRUE TRUE TRUE TRUE FALSE TRUE TRUE TRUE TRUE TRUE TRUE
## [325] TRUE FALSE FALSE TRUE TRUE TRUE FALSE FALSE TRUE TRUE TRUE TRUE
## [337] TRUE TRUE TRUE TRUE TRUE FALSE TRUE TRUE TRUE FALSE TRUE FALSE
## [349] TRUE FALSE TRUE TRUE TRUE TRUE FALSE TRUE TRUE TRUE FALSE TRUE
## [361] TRUE TRUE TRUE TRUE TRUE TRUE FALSE TRUE TRUE TRUE TRUE FALSE
## [373] TRUE TRUE TRUE TRUE TRUE FALSE TRUE TRUE FALSE TRUE FALSE TRUE
## [385] TRUE TRUE TRUE TRUE TRUE FALSE TRUE TRUE TRUE TRUE TRUE TRUE
## [397] TRUE TRUE TRUE FALSE TRUE TRUE TRUE TRUE TRUE FALSE TRUE TRUE
## [409] TRUE TRUE TRUE TRUE TRUE TRUE TRUE FALSE TRUE TRUE FALSE TRUE
## [421] FALSE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE FALSE
## [433] TRUE TRUE TRUE TRUE TRUE FALSE TRUE TRUE TRUE TRUE TRUE TRUE
## [445] TRUE TRUE TRUE TRUE TRUE TRUE TRUE FALSE TRUE TRUE TRUE TRUE
## [457] TRUE TRUE TRUE FALSE TRUE FALSE TRUE TRUE TRUE TRUE TRUE FALSE
## [469] FALSE FALSE TRUE TRUE TRUE TRUE FALSE TRUE TRUE TRUE TRUE TRUE
## [481] FALSE TRUE TRUE TRUE FALSE FALSE FALSE TRUE TRUE TRUE TRUE FALSE
## [493] TRUE FALSE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE FALSE
## [505] TRUE FALSE TRUE TRUE FALSE TRUE TRUE FALSE FALSE FALSE TRUE TRUE
## [517] TRUE TRUE FALSE TRUE TRUE TRUE TRUE TRUE FALSE FALSE TRUE TRUE
## [529] TRUE TRUE TRUE TRUE FALSE TRUE TRUE TRUE FALSE TRUE TRUE TRUE
## [541] FALSE TRUE TRUE TRUE FALSE TRUE FALSE TRUE TRUE TRUE TRUE TRUE
## [553] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [565] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE FALSE TRUE
## [577] TRUE FALSE FALSE FALSE TRUE FALSE TRUE TRUE TRUE TRUE TRUE TRUE
## [589] TRUE TRUE TRUE TRUE FALSE TRUE TRUE TRUE TRUE TRUE TRUE FALSE
## [601] TRUE TRUE TRUE TRUE TRUE TRUE TRUE FALSE
movies_filtered <- movies[movies_filter & movies_filter2,]
head(movies_filtered)
## DayofWeek Director Genre MovieTitle ReleaseDate
## 1 Friday Brad Bird action Tomorrowland 22/05/2015
## 2 Friday Scott Waugh action Need for Speed 14/03/2014
## 4 Friday Phil Lord, Chris Miller comedy 21 Jump Street 16/03/2012
## 5 Friday Roland Emmerich action White House Down 28/06/2013
## 6 Friday David Ayer action Fury 17/10/2014
## 7 Thursday Rob Marshall adventure Into the Woods 25/12/2014
## Studio AdjustedGrossinMillions BudgetinMillions GrossinMillions
## 1 Buena Vista Studios 202.1 170 202.1
## 2 Buena Vista Studios 204.2 66 203.3
## 4 Sony 208.8 42 201.6
## 5 Sony 209.7 150 205.4
## 6 Sony 212.8 80 211.8
## 7 Buena Vista Studios 213.9 50 212.9
## IMDBRating MovieLensRating OverseasinMillions OverseasPercent
## 1 6.7 3.26 111.9 55.4
## 2 6.6 2.97 159.7 78.6
## 4 7.2 3.62 63.1 31.3
## 5 8.0 3.65 132.3 64.4
## 6 5.8 2.85 126 59.5
## 7 6.0 3.16 84.9 39.9
## ProfitinMillions ProfitPercent RuntimeinMin USinMillions GrossPercentUS
## 1 32.1 18.9 130 90.2 44.6
## 2 137.3 208.0 132 43.6 21.4
## 4 159.6 380.0 109 138.4 68.7
## 5 55.4 36.9 131 73.1 35.6
## 6 131.8 164.8 134 85.8 40.5
## 7 162.9 325.8 125 128.0 60.1
str(movies_filtered)
## 'data.frame': 423 obs. of 18 variables:
## $ DayofWeek : chr "Friday" "Friday" "Friday" "Friday" ...
## $ Director : chr "Brad Bird" "Scott Waugh" "Phil Lord, Chris Miller" "Roland Emmerich" ...
## $ Genre : Factor w/ 15 levels "action","adventure",..: 1 1 5 1 1 2 1 1 3 8 ...
## $ MovieTitle : chr "Tomorrowland" "Need for Speed" "21 Jump Street" "White House Down" ...
## $ ReleaseDate : chr "22/05/2015" "14/03/2014" "16/03/2012" "28/06/2013" ...
## $ Studio : Factor w/ 36 levels "Art House Studios",..: 2 2 25 25 25 2 31 31 34 25 ...
## $ AdjustedGrossinMillions: chr "202.1" "204.2" "208.8" "209.7" ...
## $ BudgetinMillions : num 170 66 42 150 80 50 85 70 80 60 ...
## $ GrossinMillions : chr "202.1" "203.3" "201.6" "205.4" ...
## $ IMDBRating : num 6.7 6.6 7.2 8 5.8 6 6.8 6.3 4.5 5.6 ...
## $ MovieLensRating : num 3.26 2.97 3.62 3.65 2.85 3.16 3.45 2.92 2.17 2.84 ...
## $ OverseasinMillions : chr "111.9" "159.7" "63.1" "132.3" ...
## $ OverseasPercent : num 55.4 78.6 31.3 64.4 59.5 39.9 39.3 73.9 50.3 60.6 ...
## $ ProfitinMillions : chr "32.1" "137.3" "159.6" "55.4" ...
## $ ProfitPercent : num 18.9 208 380 36.9 164.8 ...
## $ RuntimeinMin : int 130 132 109 131 134 125 115 92 80 133 ...
## $ USinMillions : num 90.2 43.6 138.4 73.1 85.8 ...
## $ GrossPercentUS : num 44.6 21.4 68.7 35.6 40.5 60.1 60.7 26.1 49.7 39.4 ...
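Notice that after filtering, Genre still reports 15 levels and Studio 36 levels, even though only 5 genres and 6 studios remain. Subsetting rows does not remove unused factor levels. A minimal sketch of how droplevels() tidies this up, using a made-up toy data frame rather than the real movies data:

```r
# Toy stand-in for movies; the values are illustrative only
toy <- data.frame(
  Genre  = factor(c("action", "comedy", "horror", "drama")),
  Studio = factor(c("Sony", "Fox", "Lionsgate", "WB"))
)

keep_genres <- c("action", "comedy", "drama")
toy_filtered <- toy[toy$Genre %in% keep_genres, ]

# Still counts the level that was filtered out
levels_before <- nlevels(toy_filtered$Genre)

# droplevels() removes levels that no longer occur in any factor column
toy_filtered <- droplevels(toy_filtered)
levels_after <- nlevels(toy_filtered$Genre)

c(before = levels_before, after = levels_after)
```

This is optional for the plot that follows, but it keeps str(), table() and summary() output free of empty categories.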
library(ggplot2)
w <- ggplot(data=movies_filtered, aes(x=Genre, y=GrossPercentUS))
w <- w + geom_jitter(aes(size=BudgetinMillions, colour=Studio)) +
  geom_boxplot(alpha=0.7, outlier.colour=NA) +
  ylab("Gross % US") +
  ggtitle("Domestic Gross % by Genre") +
  labs(size="Budget $M") +  # legend title for the point-size scale
  theme(
    axis.title.x = element_text(colour="Blue", size=20),
    axis.title.y = element_text(colour="Blue", size=20),
    axis.text.x = element_text(size=10),
    axis.text.y = element_text(size=10),
    plot.title = element_text(size=25),
    legend.title = element_text(size=10),
    legend.text = element_text(size=10),
    # on Windows this family has to be registered with windowsFonts() first,
    # otherwise R warns that the font family is not found
    text = element_text(family="Comic Sans MS")
  )
w
## Warning in grid.Call(C_stringMetric, as.graphicsAnnot(x$label)): font family not
## found in Windows font database
## Warning in grid.Call.graphics(C_text, as.graphicsAnnot(x$label), x$x, x$y, :
## font family not found in Windows font database
## Warning in grid.Call(C_textBounds, as.graphicsAnnot(x$label), x$x, x$y, : font
## family not found in Windows font database
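These warnings appear because the Windows graphics device only knows font families that have been registered in its font database, and "Comic Sans MS" was passed as a family name without being registered. One way to handle this, sketched below, is to map an alias to the installed font with windowsFonts() and then use that alias in theme(). The alias ComicSans and the variable font_family are names I made up for this sketch, and the "sans" fallback for other operating systems is my own assumption:

```r
# Family name to pass to element_text(family = ...).
# "sans" is a generic fallback that every graphics device understands.
font_family <- "sans"

if (.Platform$OS.type == "windows") {
  # Map the alias "ComicSans" to the installed font so the Windows
  # graphics device can find it; this only runs on Windows
  grDevices::windowsFonts(ComicSans = grDevices::windowsFont("Comic Sans MS"))
  font_family <- "ComicSans"
}

font_family
```

With this in place, the last line inside theme() would become text = element_text(family = font_family), and the plot should render without the font warnings as long as the font is actually installed.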