PH Biostatistics Assignment 5

Eunice

9 months ago

Problem 1

cor.test()

Problem 2

correlation coefficient

Problem 3

True

Problem 4

0

Problem 5

there is a relationship between the two variables

Problem 6

- R = -0.85
- R = 0.49
- R = -0.03
- R = -0.48

Problem 7

library(readr)
NHANES_2020 <- read_csv("C:/Users/MyNIST/Desktop/NHANES_2020.csv", show_col_types = FALSE)
View(NHANES_2020)

library(ggplot2)

t1 = table(NHANES_2020$Gender, NHANES_2020$BMI_Cat)
t1

##
##     Normal Obese Overweight Underweight
##   1   1044 1644       1521          62
##   2   1141 2043       1246          88

addmargins(t1)

##
##       Normal Obese Overweight Underweight Sum
##   1     1044 1644       1521          62 4271
##   2     1141 2043       1246          88 4518
##   Sum   2185 3687       2767         150 8789

prop.table(t1, margin=1)

##
##         Normal      Obese Overweight Underweight
##   1 0.24443924 0.38492156 0.35612269 0.01451651
##   2 0.25254537 0.45219124 0.27578575 0.01947764

The table cross-tabulates Gender (1 = male, 2 = female) against BMI-Cat (Normal, Obese, Overweight, Underweight). A total of 3,687 participants fall in the obese category (1,644 males and 2,043 females). The dataset contains 4,518 females and 4,271 males, so slightly more females than males participated. Obese is the most common category for both genders, accounting for about 38.5% of males and 45.2% of females.
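As a quick check of the figures quoted above, the table can be rebuilt from the printed counts alone. This is a minimal sketch: the matrix `counts` is a hypothetical stand-in typed in from the output, not the NHANES data frame itself, and the Male/Female labels assume the 1/2 coding described in the text.

```r
# Counts copied from the table output (rows: Gender, columns: BMI_Cat).
# `counts` is a hypothetical stand-in for table(NHANES_2020$Gender, NHANES_2020$BMI_Cat).
counts <- matrix(
  c(1044, 1644, 1521, 62,
    1141, 2043, 1246, 88),
  nrow = 2, byrow = TRUE,
  dimnames = list(Gender  = c("Male", "Female"),
                  BMI_Cat = c("Normal", "Obese", "Overweight", "Underweight"))
)

obese_total  <- sum(counts[, "Obese"])          # participants in the obese category
female_total <- sum(counts["Female", ])         # female participants
row_props    <- prop.table(counts, margin = 1)  # BMI distribution within each gender

obese_total
female_total
round(row_props, 3)
```

Working from row proportions rather than raw counts matters here because the two gender groups differ in size.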

Problem 8

t1 <- table(NHANES_2020$Gender, NHANES_2020$`BMI_Cat`)
test1 = chisq.test(t1)
test1

##
## Pearson's Chi-squared test
##
## data: t1
## X-squared = 72.439, df = 3, p-value = 1.282e-15

test1$observed

##
##     Normal Obese Overweight Underweight
##   1   1044 1644       1521          62
##   2   1141 2043       1246          88

test1$expected

##
##       Normal    Obese Overweight Underweight
##   1 1061.797 1791.692   1344.619    72.89225
##   2 1123.203 1895.308   1422.381    77.10775

The null hypothesis for the chi-square test is that there is no association between Gender and BMI-Cat, meaning that the distribution of BMI categories is the same for males and females. The alternative hypothesis is that Gender and BMI-Cat are associated. The test gives a chi-square statistic of 72.439 with 3 degrees of freedom and a p-value of 1.282e-15. Because the p-value is far below the conventional threshold of 0.05, we reject the null hypothesis. There is a statistically significant association between Gender and BMI-Cat, indicating that the BMI distribution differs by gender.
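To see where the statistic comes from, the expected counts and the chi-square value can be recomputed by hand from the observed table. This is a sketch under the assumption that the printed counts are the full data; `obs`, `expected`, `chi_sq`, and `dof` are illustrative names, not objects from the original script.

```r
# Observed counts copied from test1$observed (rows: Gender 1 and 2).
obs <- matrix(
  c(1044, 1644, 1521, 62,
    1141, 2043, 1246, 88),
  nrow = 2, byrow = TRUE
)

# Expected count under independence: (row total * column total) / grand total.
expected <- outer(rowSums(obs), colSums(obs)) / sum(obs)

# Pearson chi-square statistic: sum over cells of (O - E)^2 / E.
chi_sq <- sum((obs - expected)^2 / expected)

# Degrees of freedom: (rows - 1) * (columns - 1).
dof <- (nrow(obs) - 1) * (ncol(obs) - 1)

# Upper-tail probability of the chi-square distribution.
p_value <- pchisq(chi_sq, df = dof, lower.tail = FALSE)

round(expected, 3)
chi_sq
dof
p_value
```

The largest contributions come from the Obese and Overweight columns, where observed and expected counts differ most.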

Problem 9

ggplot(data=NHANES_2020, aes(x=Height, y=Weight)) +
geom_point()

The scatter plot of Height vs. Weight shows a positive relationship, with the points generally following an upward trend as height increases. This suggests that as height increases, weight tends to increase as well. However, the plot indicates variability in weight for individuals of similar height, implying that other factors, beyond just height, may influence weight.
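With several thousand points, a plain scatter plot can hide the trend behind overplotting. The sketch below shows one common way to make the pattern easier to read: semi-transparent points plus a fitted straight line. The data frame `sim` is simulated and purely hypothetical; it only stands in for `NHANES_2020` so the example runs on its own.

```r
library(ggplot2)

# Simulated, hypothetical data standing in for NHANES_2020 (values are not real).
set.seed(1)
sim <- data.frame(Height = rnorm(500, mean = 167, sd = 10))
sim$Weight <- 0.9 * sim$Height - 70 + rnorm(500, sd = 18)

p <- ggplot(data = sim, aes(x = Height, y = Weight)) +
  geom_point(alpha = 0.3) +                # transparency reduces overplotting
  geom_smooth(method = "lm", se = FALSE)   # fitted straight line shows the trend
p
```

Replacing `sim` with `NHANES_2020` would apply the same layers to the assignment data.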

Problem 10

cor(NHANES_2020$Height, NHANES_2020$Weight)

## [1] 0.4146947

The correlation coefficient between Height and Weight is 0.4147, indicating a moderate positive linear relationship between the two variables. This means that as height increases, weight also tends to increase, but the relationship is not perfect. A correlation coefficient closer to 1 would suggest a stronger linear relationship, whereas 0.4147 indicates that while height is associated with weight, other factors likely play a role in determining weight.
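For readers who want to see what `cor()` computes, the sketch below applies the Pearson formula directly and then squares the reported coefficient. The `height` and `weight` vectors are hypothetical toy values, not NHANES data; only `r_reported` is taken from the output above.

```r
# Hypothetical toy values, not taken from NHANES_2020.
height <- c(150, 160, 165, 170, 180)
weight <- c(55, 62, 70, 68, 85)

# Pearson's r: sum of cross-products of deviations divided by the
# square root of the product of the sums of squared deviations.
dx <- height - mean(height)
dy <- weight - mean(weight)
r_manual <- sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))

all.equal(r_manual, cor(height, weight))   # the formula agrees with cor()

# Share of variance in Weight linearly associated with Height, using the reported r.
r_reported <- 0.4146947
r_squared  <- r_reported^2   # about 0.172
r_squared
```

An r² of roughly 0.17 is consistent with the point made above that factors other than height account for most of the variation in weight.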

Problem 11

cor.test(NHANES_2020$Height, NHANES_2020$Weight)

##
## Pearson's product-moment correlation
##
## data: NHANES_2020$Height and NHANES_2020$Weight
## t = 42.72, df = 8787, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.3972318 0.4318574
## sample estimates:
## cor
## 0.4146947

The 95% confidence interval for the population correlation coefficient ranges from 0.3972 (lower limit) to 0.4319 (upper limit). Because the interval does not contain zero, it supports the conclusion that there is a statistically significant positive correlation between Height and Weight. The interval gives the range of plausible values for the population correlation, all of which indicate a moderate positive relationship. The interval is narrow, which reflects the large sample (df = 8787, so n = 8,789) and indicates a precise estimate.
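The interval and the t statistic can be reproduced from the reported r and degrees of freedom. The sketch below uses the Fisher z-transformation, which is the method `cor.test()` documents for its Pearson confidence interval; the variable names are illustrative.

```r
r <- 0.4146947   # sample correlation from the output
n <- 8787 + 2    # df = n - 2, so n = 8789

# t statistic for H0: population correlation = 0.
t_stat <- r * sqrt(n - 2) / sqrt(1 - r^2)

# Fisher z-transformation: atanh(r) is approximately normal
# with standard error 1 / sqrt(n - 3).
z  <- atanh(r)
se <- 1 / sqrt(n - 3)

# 95% limits on the z scale, transformed back to the correlation scale.
ci <- tanh(z + c(-1, 1) * qnorm(0.975) * se)

t_stat
ci
```

Because the standard error shrinks with the square root of the sample size, a sample of this size yields a narrow interval even for a moderate correlation.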

Problem 12

The chi-square test is used to assess whether two categorical variables are associated; here, it tests the association between Gender and BMI-Cat. It compares the observed frequency in each cell of the contingency table with the frequency expected if the variables were independent, and asks whether the differences are larger than random variation would explain. The test is appropriate when the sample is large and the expected count in each cell is at least five. Low expected counts distort the chi-square approximation and can lead to incorrect conclusions, so the sample must be large enough for the results to be reliable. Because the test only detects association, it gives no information about the strength or direction of the relationship; comparing observed with expected counts helps show where the pattern lies.

The test's limitations determine how it should be used. It establishes whether a statistically significant association exists, but it does not quantify the magnitude of that association, so it should be paired with an effect-size measure such as Cramér's V. It also depends on adequate expected frequencies in every cell; when this assumption fails, the results are easy to misinterpret. Before relying on a chi-square result, one should therefore confirm that the data meet these conditions and supplement the test with other statistical tools.
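Both considerations discussed above, adequate expected counts and the size of the association, can be checked in a few lines. One commonly used effect-size measure for a contingency table is Cramér's V; it is used here as an illustration, not as part of the original assignment. The matrix `obs` is retyped from the Problem 8 output.

```r
# Observed counts retyped from the Problem 8 output (rows: Gender 1 and 2).
obs <- matrix(
  c(1044, 1644, 1521, 62,
    1141, 2043, 1246, 88),
  nrow = 2, byrow = TRUE
)

test <- chisq.test(obs)

# Assumption check: every expected count should be at least 5.
expected_ok <- all(test$expected >= 5)

# Cramér's V = sqrt(chi-square / (N * (min(rows, columns) - 1))).
n <- sum(obs)
k <- min(dim(obs)) - 1
cramers_v <- sqrt(unname(test$statistic) / (n * k))

expected_ok
cramers_v
```

A value of about 0.09 would conventionally be read as a weak association, which illustrates that a very small p-value in a large sample does not by itself imply a strong relationship.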

Question

Remember to start all of your answers with a header (# Problem 1, # Problem 2, etc.). Check that your RMarkdown-generated Word output is formatted properly and submit your solutions through Blackboard by the due date.

Concepts

Which R command can you use to examine the relationship between two continuous variables? (5 points)

test()
test()
mean()
relation()

To examine the relationship between two continuous variables, you can use ______. (5 points)

ANOVA
t-test
χ²
correlation coefficient

The χ² test is used to test whether there is a statistical relationship between two categorical variables. (5 points)

True
False

If two variables have no relationship, the Pearson’s r is ______. (5 points)

–1
0
1
05

To determine if the correlation coefficient in the sample is statistically significant, your alternate hypothesis is ______. (5 points)
there is no relationship between the two variables
there is a relationship between the two variables
the correlation coefficient is larger than .5
the correlation coefficient is smaller than –.5

Match each correlation to the corresponding scatter plot. (5 points)

Application

This HW uses the NHANES 2022 data (the same data set that we used in module 2). First, you will need to import the data into your R Markdown file, but change the data path to match the path on your computer.

library(readr)

NHANES_2020 <- read_csv("C:/Users/Ruaa Al-Juboori/Desktop/PH610/NHANES_2020.csv")

Next answer the following questions:

Create a table to explore the relationship between Gender and BMI-Cat. How many participants are in the obese category? How many participants are females? (10 points)
Run a chi-square test of the relationship between Gender and BMI-Cat.

What is your Null hypothesis and what is the alternative hypothesis? (10 points)

Would you keep or reject the null hypothesis? (Hint: check the P-value) (10 points)

Create a scatter plot to examine the association between Height and Weight. What do you notice? (10 points).
What is the correlation coefficient? How can you describe this relationship? (10 points)
What are the upper and lower limits of the 95% CI of the population correlation coefficient? How would you describe it? (10 points)

Reflection

How does the Chi-Square test help in understanding relationships between categorical variables? Reflect on when it is appropriate to use this test, and what key considerations, such as sample size and expected counts, can impact the accuracy of the test results? (10 points)