Assignment #9: Visualization in R – Base Graphics, Lattice, and ggplot2

GitHub Link: https://github.com/rayhankhan-svg/r-programming-assignments


R Code: 

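# Load the BankWages CSV chosen interactively; keep text columns as character
data <- read.csv(file.choose(), stringsAsFactors = FALSE)

# Inspect the first rows and the column types
head(data)
str(data)

# Convert the integer columns to numeric (double) before plotting
data$rownames <- as.numeric(data$rownames)
data$education <- as.numeric(data$education)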


# Base graphics: scatter plot of education against observation number
plot(data$rownames, data$education,
     main = "Base: Education by Observation",
     xlab = "Observation",
     ylab = "Education",
     col = "blue")

# Base graphics: histogram of education
hist(data$education,
     main = "Base: Distribution of Education",
     xlab = "Education",
     col = "lightgreen")


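library(lattice)

# Lattice: education by observation number, one panel per gender
xyplot(education ~ rownames | gender,
       data = data,
       main = "Lattice: Education by Observation and Gender")

# Lattice: box-and-whisker plot of education for each job type
bwplot(education ~ job,
       data = data,
       main = "Lattice: Education by Job Type")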


library(ggplot2)

# ggplot2: scatter plot colored by gender, with one linear trend line per gender
ggplot(data, aes(x = rownames, y = education, color = gender)) +
  geom_point() +
  geom_smooth(method = "lm") +
  labs(title = "ggplot2: Education by Observation with Trend")

# ggplot2: histogram of education, one facet per job type
ggplot(data, aes(x = education)) +
  geom_histogram(binwidth = 1, fill = "blue") +
  facet_wrap(~ job) +
  labs(title = "ggplot2: Education Distribution by Job Type")


Output: 

> data <- read.csv(file.choose(), stringsAsFactors = FALSE)
> head(data)
  rownames    job education gender minority
1        1 manage        15   male       no
2        2  admin        16   male       no
3        3  admin        12 female       no
4        4  admin         8 female       no
5        5  admin        15   male       no
6        6  admin        15   male       no
> str(data)
'data.frame': 474 obs. of  5 variables:
 $ rownames : int  1 2 3 4 5 6 7 8 9 10 ...
 $ job      : chr  "manage" "admin" "admin" "admin" ...
 $ education: int  15 16 12 8 15 15 15 12 15 12 ...
 $ gender   : chr  "male" "male" "female" "female" ...
 $ minority : chr  "no" "no" "no" "no" ...
> data$rownames <- as.numeric(data$rownames)
> data$education <- as.numeric(data$education)
> plot(data$rownames, data$education,
+      main = "Base: Education by Observation",
+      xlab = "Observation",
+      ylab = "Education",
+      col = "blue")
> hist(data$education,
+      main = "Base: Distribution of Education",
+      xlab = "Education",
+      col = "lightgreen")
> library(lattice)
> xyplot(education ~ rownames | gender,
+        data = data,
+        main = "Lattice: Education by Observation and Gender")
> bwplot(education ~ job,
+        data = data,
+        main = "Lattice: Education by Job Type")
> library(ggplot2)
> ggplot(data, aes(x = rownames, y = education, color = gender)) +
+   geom_point() +
+   geom_smooth(method = "lm") +
+   labs(title = "ggplot2: Education by Observation with Trend")
`geom_smooth()` using formula = 'y ~ x'
> ggplot(data, aes(x = education)) +
+   geom_histogram(binwidth = 1, fill = "blue") +
+   facet_wrap(~ job) +
+   labs(title = "ggplot2: Education Distribution by Job Type")

Explanation: 

For Module #9, I compared three R visualization systems (base graphics, lattice, and ggplot2) using the BankWages dataset. The dataset contains the variables rownames, job, education, gender, and minority. I chose education as the primary variable because it is the only substantive numeric column; rownames is just the observation index.

With base R graphics, I produced a scatter plot of education by observation number and a histogram of the distribution of education. Base graphics were the simplest to get started with, but every plot element (title, axis labels, colors) had to be specified manually, and a plot is hard to modify once it has been drawn.

With lattice, I used xyplot() to show education by observation number conditioned on gender, and bwplot() to compare education across job categories. Lattice handles grouped data well: its formula interface produces multi-panel plots automatically, which makes comparisons across categories easier.

With ggplot2, I made a scatter plot colored by gender with a linear regression line for each gender, and a separate histogram of education faceted by job type. ggplot2's layered design makes plots easy to customize and extend, and in my view it produced the clearest and most polished graphics of the three systems.

Overall, base R was the most straightforward, lattice worked well for grouped and panel data, and ggplot2 offered the most flexibility and the best-looking output. The main difficulty was adjusting to the different syntax of each system.
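
To illustrate what is meant by ggplot2 being built in layers, here is a minimal sketch. It assumes a small made-up data frame called toy with the same column names as the BankWages CSV; the values and the object names p_points and p_trend are hypothetical and only for illustration. The idea is that a plot is stored as an object, and adding a geom extends it without rebuilding it.

```r
library(ggplot2)

# Hypothetical stand-in rows with the same column names as the BankWages CSV
# (values are made up for illustration, not taken from the real file)
toy <- data.frame(
  rownames  = 1:8,
  education = c(15, 16, 12, 8, 15, 12, 19, 16),
  gender    = c("male", "male", "female", "female",
                "male", "female", "male", "female")
)

# Start with a single layer: the points
p_points <- ggplot(toy, aes(x = rownames, y = education, color = gender)) +
  geom_point()

# Extend the stored plot with a trend layer and a title
p_trend <- p_points +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  labs(title = "Toy data: education by observation with trend")

# Count the layers in each plot object
n_layers_points <- length(p_points$layers)
n_layers_trend  <- length(p_trend$layers)
```

Typing p_trend at the console (or calling print(p_trend)) draws the plot. Comparing n_layers_points with n_layers_trend shows that geom_smooth() was added as a second layer on top of the points. A base plot() call, by contrast, draws directly to the device and returns no object to extend.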
