Module #8: Input/Output, String Manipulation, and the plyr Package
Github Link: https://github.com/rayhankhan-svg/r-programming-assignments
R Code:
# Step 1: Import dataset
x <- read.table(file.choose(), header = TRUE, sep = ",")
# View dataset
x
# Step 2: Install/load plyr and calculate mean Grade by Sex
install.packages("plyr")
library(plyr)
y <- ddply(x, "Sex", transform, Grade.Average = mean(Grade))
# View result
y
# Step 3: Write the mean result to a CSV-style file
write.table(y, "Sorted_Average", sep = ",", row.names = FALSE)
# Step 4: Filter names containing the letter i or I
newx <- subset(x, grepl("[iI]", x$Name))
# View filtered dataset
newx
# Step 5: Write the filtered subset to a CSV-style file
write.table(newx, "DataSubset", sep = ",", row.names = FALSE)
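Step 4 relies on the character class [iI] to match the letter in either case. The sketch below is a small, self-contained check of that match, assuming the Name column is re-entered by hand from the dataset printed in the Output section; the variable names student_names, has_i_class, and has_i_flag are hypothetical. It also compares the character class with grepl()'s ignore.case argument, which is an alternative spelling and not what the assignment code uses.

```r
# Names re-entered by hand from the dataset printed in the Output section
student_names <- c("Raul", "Booker", "Lauri", "Leonie", "Sherlyn", "Mikaela",
                   "Raphael", "Aiko", "Tiffaney", "Corina", "Petronila",
                   "Alecia", "Shemika", "Fallon", "Deloris", "Randee",
                   "Eboni", "Delfina", "Ernestina", "Milo")

# Character class, as in Step 4: TRUE where the name contains i or I
has_i_class <- grepl("[iI]", student_names)

# Alternative spelling of the same test using the ignore.case argument
has_i_flag <- grepl("i", student_names, ignore.case = TRUE)

# Both approaches should flag exactly the same names
print(identical(has_i_class, has_i_flag))

# Names that would survive the subset() call, and how many there are
print(student_names[has_i_class])
print(sum(has_i_class))
```

Because grepl() returns one logical value per name, the result can be passed straight to subset() as the row filter, which is what Step 4 does.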
Output:
> # Step 1: Import dataset
> x <- read.table(file.choose(), header = TRUE, sep = ",")
>
> # View dataset
> x
        Name Age    Sex Grade
1       Raul  25   Male    80
2     Booker  18   Male    83
3      Lauri  21 Female    90
4     Leonie  21 Female    91
5    Sherlyn  22 Female    85
6    Mikaela  20 Female    69
7    Raphael  23   Male    91
8       Aiko  24 Female    97
9   Tiffaney  21 Female    78
10    Corina  23 Female    81
11 Petronila  23 Female    98
12    Alecia  20 Female    87
13   Shemika  23 Female    97
14    Fallon  22 Female    90
15   Deloris  21 Female    67
16    Randee  23 Female    91
17     Eboni  20 Female    84
18   Delfina  19 Female    93
19 Ernestina  19 Female    93
20      Milo  19   Male    67
>
> # Step 2: Install/load plyr and calculate mean Grade by Sex
> install.packages("plyr")
trying URL 'https://cran.rstudio.com/bin/macosx/big-sur-arm64/contrib/4.5/plyr_1.8.9.tgz'
Content type 'application/x-gzip' length 984753 bytes (961 KB)
==================================================
downloaded 961 KB
The downloaded binary packages are in
/var/folders/3k/3m0n90_x4pj426bkxvf2p01h0000gn/T//RtmpFal0Bh/downloaded_packages
> library(plyr)
>
> y <- ddply(x, "Sex", transform, Grade.Average = mean(Grade))
>
> # View result
> y
        Name Age    Sex Grade Grade.Average
1      Lauri  21 Female    90       86.9375
2     Leonie  21 Female    91       86.9375
3    Sherlyn  22 Female    85       86.9375
4    Mikaela  20 Female    69       86.9375
5       Aiko  24 Female    97       86.9375
6   Tiffaney  21 Female    78       86.9375
7     Corina  23 Female    81       86.9375
8  Petronila  23 Female    98       86.9375
9     Alecia  20 Female    87       86.9375
10   Shemika  23 Female    97       86.9375
11    Fallon  22 Female    90       86.9375
12   Deloris  21 Female    67       86.9375
13    Randee  23 Female    91       86.9375
14     Eboni  20 Female    84       86.9375
15   Delfina  19 Female    93       86.9375
16 Ernestina  19 Female    93       86.9375
17      Raul  25   Male    80       80.2500
18    Booker  18   Male    83       80.2500
19   Raphael  23   Male    91       80.2500
20      Milo  19   Male    67       80.2500
>
> # Step 3: Write the mean result to a CSV-style file
> write.table(y, "Sorted_Average", sep = ",", row.names = FALSE)
>
> # Step 4: Filter names containing the letter i or I
> newx <- subset(x, grepl("[iI]", x$Name))
>
> # View filtered dataset
> newx
        Name Age    Sex Grade
3      Lauri  21 Female    90
4     Leonie  21 Female    91
6    Mikaela  20 Female    69
8       Aiko  24 Female    97
9   Tiffaney  21 Female    78
10    Corina  23 Female    81
11 Petronila  23 Female    98
12    Alecia  20 Female    87
13   Shemika  23 Female    97
15   Deloris  21 Female    67
17     Eboni  20 Female    84
18   Delfina  19 Female    93
19 Ernestina  19 Female    93
20      Milo  19   Male    67
>
> # Step 5: Write the filtered subset to a CSV-style file
> write.table(newx, "DataSubset", sep = ",", row.names = FALSE)
>
Sorted_Average:
"Name","Age","Sex","Grade","Grade.Average"
"Lauri",21,"Female",90,86.9375
"Leonie",21,"Female",91,86.9375
"Sherlyn",22,"Female",85,86.9375
"Mikaela",20,"Female",69,86.9375
"Aiko",24,"Female",97,86.9375
"Tiffaney",21,"Female",78,86.9375
"Corina",23,"Female",81,86.9375
"Petronila",23,"Female",98,86.9375
"Alecia",20,"Female",87,86.9375
"Shemika",23,"Female",97,86.9375
"Fallon",22,"Female",90,86.9375
"Deloris",21,"Female",67,86.9375
"Randee",23,"Female",91,86.9375
"Eboni",20,"Female",84,86.9375
"Delfina",19,"Female",93,86.9375
"Ernestina",19,"Female",93,86.9375
"Raul",25,"Male",80,80.25
"Booker",18,"Male",83,80.25
"Raphael",23,"Male",91,80.25
"Milo",19,"Male",67,80.25
DataSubset:
"Name","Age","Sex","Grade"
"Lauri",21,"Female",90
"Leonie",21,"Female",91
"Mikaela",20,"Female",69
"Aiko",24,"Female",97
"Tiffaney",21,"Female",78
"Corina",23,"Female",81
"Petronila",23,"Female",98
"Alecia",20,"Female",87
"Shemika",23,"Female",97
"Deloris",21,"Female",67
"Eboni",20,"Female",84
"Delfina",19,"Female",93
"Ernestina",19,"Female",93
"Milo",19,"Male",67
Explanation:
For Module #8, I imported the Assignment 6 dataset into R and used the plyr package to calculate the mean grade by Sex. The ddply() function split the data by Sex, and transform added a new column named Grade.Average that holds the mean of the Grade column for each group, so every student's row carries the mean for that student's group. The average grade was 86.9375 for female students and 80.25 for male students. I then used write.table() with comma separation to write this output to a file named Sorted_Average.
Next, I created a subset of the original dataset that keeps only names containing the letter i, in either lowercase or uppercase. To get those names, I used subset() together with grepl("[iI]", x$Name). The resulting smaller dataset contains 14 students: Lauri, Leonie, Mikaela, Aiko, Tiffaney, Corina, Petronila, Alecia, Shemika, Deloris, Eboni, Delfina, Ernestina, and Milo. Lastly, I wrote this filtered subset to a second CSV-style file named DataSubset.
This assignment helped me practice importing data, using plyr to summarize grouped information, working with string matching, and exporting results to files.
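As a cross-check on the two averages and the export step, the sketch below computes one row per group and then writes and re-reads the result. It is only an illustration under stated assumptions: it swaps in base R's aggregate() for the plyr call so it runs without installing anything, it re-enters the Sex and Grade columns by hand from the dataset printed above, and it writes to a temporary file. The names grades, group_means, out_path, and read_back are hypothetical.

```r
# Sex and Grade columns re-entered by hand from the dataset printed in this post
grades <- data.frame(
  Sex = c("Male", "Male", "Female", "Female", "Female", "Female", "Male",
          "Female", "Female", "Female", "Female", "Female", "Female",
          "Female", "Female", "Female", "Female", "Female", "Female",
          "Male"),
  Grade = c(80, 83, 90, 91, 85, 69, 91, 97, 78, 81,
            98, 87, 97, 90, 67, 91, 84, 93, 93, 67)
)

# Base R substitute for the plyr call: one row per group, not one per student
group_means <- aggregate(Grade ~ Sex, data = grades, FUN = mean)
names(group_means)[2] <- "Grade.Average"
print(group_means)

# With plyr loaded, the equivalent one-row-per-group call would be:
# ddply(grades, "Sex", summarise, Grade.Average = mean(Grade))

# Write to a temporary file so no real file is overwritten
out_path <- tempfile(fileext = ".csv")
write.csv(group_means, out_path, row.names = FALSE)

# Read the file back to confirm the values survive the round trip
read_back <- read.csv(out_path)
print(read_back)
```

The difference from the assignment code is the shape of the result: transform keeps all 20 rows and repeats the group mean on each, while a summary of this kind returns only two rows, one for each value of Sex.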