Module # 8 Input/Output, string manipulation and plyr package

Github Link: https://github.com/rayhankhan-svg/r-programming-assignments

R Code:

# Step 1: Import dataset

x <- read.table(file.choose(), header = TRUE, sep = ",")

# View dataset

x

# Step 2: Install/load plyr and calculate mean Grade by Sex

install.packages("plyr")

library(plyr)

y <- ddply(x, "Sex", transform, Grade.Average = mean(Grade))

# View result

y

# Step 3: Write the mean result to a CSV-style file

write.table(y, "Sorted_Average", sep = ",", row.names = FALSE)

# Step 4: Filter names containing the letter i or I

newx <- subset(x, grepl("[iI]", x$Name))

# View filtered dataset

newx

# Step 5: Write the filtered subset to a CSV-style file

write.table(newx, "DataSubset", sep = ",", row.names = FALSE)
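
The ddply() call in Step 2 uses transform, which keeps every student row and repeats the group mean on each one. As a minimal sketch of the difference, the block below runs both transform and summarise on the first four rows of the dataset. The names grades, per_row, and per_group are made up for this illustration, and the summarise version is an alternative, not what the assignment code uses.

```r
library(plyr)

# First four rows of the dataset shown in this post (Age omitted for brevity)
grades <- data.frame(
  Name  = c("Raul", "Booker", "Lauri", "Leonie"),
  Sex   = c("Male", "Male", "Female", "Female"),
  Grade = c(80, 83, 90, 91),
  stringsAsFactors = FALSE
)

# transform: one row per student, with the group mean added as a new column
per_row <- ddply(grades, "Sex", transform, Grade.Average = mean(Grade))

# summarise: one row per group, holding only Sex and the group mean
per_group <- ddply(grades, "Sex", summarise, Grade.Average = mean(Grade))

print(per_row)
print(per_group)
```

With transform the result has four rows; with summarise it has two, one for each value of Sex.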

Output:

> # Step 1: Import dataset

> x <- read.table(file.choose(), header = TRUE, sep = ",")

> # View dataset

> x

Name Age Sex Grade

1 Raul 25 Male 80

2 Booker 18 Male 83

3 Lauri 21 Female 90

4 Leonie 21 Female 91

5 Sherlyn 22 Female 85

6 Mikaela 20 Female 69

7 Raphael 23 Male 91

8 Aiko 24 Female 97

9 Tiffaney 21 Female 78

10 Corina 23 Female 81

11 Petronila 23 Female 98

12 Alecia 20 Female 87

13 Shemika 23 Female 97

14 Fallon 22 Female 90

15 Deloris 21 Female 67

16 Randee 23 Female 91

17 Eboni 20 Female 84

18 Delfina 19 Female 93

19 Ernestina 19 Female 93

20 Milo 19 Male 67

> # Step 2: Install/load plyr and calculate mean Grade by Sex

> install.packages("plyr")

trying URL 'https://cran.rstudio.com/bin/macosx/big-sur-arm64/contrib/4.5/plyr_1.8.9.tgz'

Content type 'application/x-gzip' length 984753 bytes (961 KB)

==================================================

downloaded 961 KB

The downloaded binary packages are in

/var/folders/3k/3m0n90_x4pj426bkxvf2p01h0000gn/T//RtmpFal0Bh/downloaded_packages

> library(plyr)

> y <- ddply(x, "Sex", transform, Grade.Average = mean(Grade))

> # View result

> y

Name Age Sex Grade Grade.Average

1 Lauri 21 Female 90 86.9375

2 Leonie 21 Female 91 86.9375

3 Sherlyn 22 Female 85 86.9375

4 Mikaela 20 Female 69 86.9375

5 Aiko 24 Female 97 86.9375

6 Tiffaney 21 Female 78 86.9375

7 Corina 23 Female 81 86.9375

8 Petronila 23 Female 98 86.9375

9 Alecia 20 Female 87 86.9375

10 Shemika 23 Female 97 86.9375

11 Fallon 22 Female 90 86.9375

12 Deloris 21 Female 67 86.9375

13 Randee 23 Female 91 86.9375

14 Eboni 20 Female 84 86.9375

15 Delfina 19 Female 93 86.9375

16 Ernestina 19 Female 93 86.9375

17 Raul 25 Male 80 80.2500

18 Booker 18 Male 83 80.2500

19 Raphael 23 Male 91 80.2500

20 Milo 19 Male 67 80.2500

> # Step 3: Write the mean result to a CSV-style file

> write.table(y, "Sorted_Average", sep = ",", row.names = FALSE)

> # Step 4: Filter names containing the letter i or I

> newx <- subset(x, grepl("[iI]", x$Name))

> # View filtered dataset

> newx

Name Age Sex Grade

3 Lauri 21 Female 90

4 Leonie 21 Female 91

6 Mikaela 20 Female 69

8 Aiko 24 Female 97

9 Tiffaney 21 Female 78

10 Corina 23 Female 81

11 Petronila 23 Female 98

12 Alecia 20 Female 87

13 Shemika 23 Female 97

15 Deloris 21 Female 67

17 Eboni 20 Female 84

18 Delfina 19 Female 93

19 Ernestina 19 Female 93

20 Milo 19 Male 67

> # Step 5: Write the filtered subset to a CSV-style file

> write.table(newx, "DataSubset", sep = ",", row.names = FALSE)

Sorted_Average:

"Name","Age","Sex","Grade","Grade.Average"

"Lauri",21,"Female",90,86.9375

"Leonie",21,"Female",91,86.9375

"Sherlyn",22,"Female",85,86.9375

"Mikaela",20,"Female",69,86.9375

"Aiko",24,"Female",97,86.9375

"Tiffaney",21,"Female",78,86.9375

"Corina",23,"Female",81,86.9375

"Petronila",23,"Female",98,86.9375

"Alecia",20,"Female",87,86.9375

"Shemika",23,"Female",97,86.9375

"Fallon",22,"Female",90,86.9375

"Deloris",21,"Female",67,86.9375

"Randee",23,"Female",91,86.9375

"Eboni",20,"Female",84,86.9375

"Delfina",19,"Female",93,86.9375

"Ernestina",19,"Female",93,86.9375

"Raul",25,"Male",80,80.25

"Booker",18,"Male",83,80.25

"Raphael",23,"Male",91,80.25

"Milo",19,"Male",67,80.25

DataSubset:

"Name","Age","Sex","Grade"

"Lauri",21,"Female",90

"Leonie",21,"Female",91

"Mikaela",20,"Female",69

"Aiko",24,"Female",97

"Tiffaney",21,"Female",78

"Corina",23,"Female",81

"Petronila",23,"Female",98

"Alecia",20,"Female",87

"Shemika",23,"Female",97

"Deloris",21,"Female",67

"Eboni",20,"Female",84

"Delfina",19,"Female",93

"Ernestina",19,"Female",93

"Milo",19,"Male",67

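One way to confirm that a file written with write.table() and sep = "," reads back correctly is a quick round trip. This is only a sketch: it writes two rows from the subset to a temporary file with a .csv extension (the assignment files have no extension), and the names subset_df, out_path, and check are assumptions made for the example.

```r
# Two rows taken from the filtered subset shown above
subset_df <- data.frame(
  Name  = c("Lauri", "Milo"),
  Age   = c(21L, 19L),
  Sex   = c("Female", "Male"),
  Grade = c(90L, 67L),
  stringsAsFactors = FALSE
)

# Write to a temporary file so nothing in the working directory is touched
out_path <- tempfile(fileext = ".csv")
write.table(subset_df, out_path, sep = ",", row.names = FALSE)

# Read the file back and compare it with the original data frame
check <- read.csv(out_path, stringsAsFactors = FALSE)
print(all.equal(check, subset_df))
```
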
Explanation:

For Module #8, I imported the Assignment 6 dataset into R and used the plyr package to calculate the mean grade, with Sex as the grouping variable. The ddply() function split the data by Sex, and transform added a new column named Grade.Average that holds the mean of the Grade column for each group. Because transform keeps every original row, the group mean is repeated on each student's row. The average grade was 86.9375 for female students and 80.25 for male students. I then used write.table() with sep = "," to write this output to a file named Sorted_Average.

Next, I created a subset of the original dataset that keeps only the students whose names contain the letter i (lowercase or uppercase). To get those names, I used subset() together with grepl("[iI]", x$Name). The resulting 14-row dataset includes Lauri, Leonie, Mikaela, Aiko, Tiffaney, Corina, Petronila, Alecia, Shemika, Deloris, Eboni, Delfina, Ernestina, and Milo. Lastly, I wrote this filtered subset to a second CSV-style file named DataSubset.

This assignment helped me practice importing data, using plyr to summarize grouped information, working with string matching, and exporting results to files.
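
The pattern "[iI]" matches either case by listing both letters. A minimal sketch of an equivalent approach uses the ignore.case argument of grepl(). The vector students below mixes four names from the dataset with one hypothetical name, "Ivan", added only so that an uppercase I is exercised.

```r
# Four names from the dataset plus "Ivan", a hypothetical name
# included only to show a match on uppercase I
students <- c("Raul", "Booker", "Lauri", "Milo", "Ivan")

# Character class: matches a lowercase i or an uppercase I anywhere in the name
by_class <- grepl("[iI]", students)

# Same test written with ignore.case instead of a character class
by_flag <- grepl("i", students, ignore.case = TRUE)

# Both logical vectors select the same names
print(identical(by_class, by_flag))
print(students[by_flag])
```

Either form works inside subset(); the ignore.case version can be easier to read when the pattern is longer than one letter.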


R Programming Journal – Rayhan Khan
