PH Biostatistics Assignment 5
Problem 1
- cor.test()
Problem 2
- correlation coefficient
Problem 3
- True
Problem 4
- 0
Problem 5
- there is a relationship between the two variables
Problem 6
- R = -0.85
- R = 0.49
- R = -0.03
- R = -0.48
Problem 7
library(readr)
NHANES_2020 <- read_csv("C:/Users/MyNIST/Desktop/NHANES_2020.csv", show_col_types = FALSE)
View(NHANES_2020)
library(ggplot2)
t1 = table(NHANES_2020$Gender, NHANES_2020$BMI_Cat)
t1
##
##     Normal Obese Overweight Underweight
##   1   1044  1644       1521          62
##   2   1141  2043       1246          88
addmargins(t1)
##
##       Normal Obese Overweight Underweight  Sum
##   1     1044  1644       1521          62 4271
##   2     1141  2043       1246          88 4518
##   Sum   2185  3687       2767         150 8789
prop.table(t1, margin=1)
##
##         Normal      Obese Overweight Underweight
##   1 0.24443924 0.38492156 0.35612269  0.01451651
##   2 0.25254537 0.45219124 0.27578575  0.01947764
The table cross-tabulates Gender (1 = male, 2 = female) with BMI-Cat (Normal, Obese, Overweight, Underweight). A total of 3,687 participants are in the obese category (1,644 males and 2,043 females). The dataset contains 4,518 females and 4,271 males, so slightly more females than males participated. The row proportions show that obese is the most common category for both genders (about 38.5% of males and 45.2% of females), followed by overweight (about 35.6% of males and 27.6% of females).
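As a quick check on the totals quoted above, the counts can be re-entered by hand and summed. This is only a sketch that copies the numbers from the printed table; the object names (`counts`, `obese_total`, `female_total`) are illustrative and are not part of the original solution.

```r
# Counts copied by hand from the Gender x BMI_Cat table above
# (rows: Gender 1 = male, 2 = female)
counts <- matrix(
  c(1044, 1644, 1521, 62,
    1141, 2043, 1246, 88),
  nrow = 2, byrow = TRUE,
  dimnames = list(Gender  = c("1", "2"),
                  BMI_Cat = c("Normal", "Obese", "Overweight", "Underweight"))
)

obese_total  <- sum(counts[, "Obese"])  # participants in the obese category
female_total <- sum(counts["2", ])      # female participants (Gender = 2)

obese_total   # 3687
female_total  # 4518
```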
Problem 8
t1 <- table(NHANES_2020$Gender, NHANES_2020$`BMI_Cat`)
test1 = chisq.test(t1)
test1
##
## Pearson's Chi-squared test
##
## data: t1
## X-squared = 72.439, df = 3, p-value = 1.282e-15
test1$observed
##
##     Normal Obese Overweight Underweight
##   1   1044  1644       1521          62
##   2   1141  2043       1246          88
test1$expected
##
##       Normal    Obese Overweight Underweight
##   1 1061.797 1791.692   1344.619    72.89225
##   2 1123.203 1895.308   1422.381    77.10775
The null hypothesis for the chi-square test is that there is no association between Gender and BMI-Cat, meaning that the distribution of BMI categories is the same for both genders. The alternative hypothesis is that there is an association between Gender and BMI-Cat. The test gives a chi-square statistic of 72.439 with 3 degrees of freedom and a p-value of 1.282e-15. Because the p-value is far below the commonly used threshold of 0.05, we reject the null hypothesis. There is a statistically significant association between Gender and BMI-Cat, indicating that the BMI distribution differs by gender.
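To make the test statistic less of a black box, the sketch below recomputes it by hand from the observed counts, using the usual formula: the sum over cells of (observed - expected)^2 / expected, where each expected count is row total × column total / grand total. The counts are copied from the output above; the variable names are illustrative.

```r
# Observed counts copied from test1$observed above
observed <- matrix(
  c(1044, 1644, 1521, 62,
    1141, 2043, 1246, 88),
  nrow = 2, byrow = TRUE
)

# Expected counts under independence: (row total * column total) / grand total
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)

# Pearson chi-square statistic and its degrees of freedom
chi_sq <- sum((observed - expected)^2 / expected)
df     <- (nrow(observed) - 1) * (ncol(observed) - 1)

# Upper-tail p-value from the chi-square distribution
p_value <- pchisq(chi_sq, df = df, lower.tail = FALSE)

round(chi_sq, 3)  # should agree with the X-squared value reported by chisq.test()
df
p_value
```

Because the table is larger than 2 x 2, `chisq.test()` applies no continuity correction, so the hand calculation should match its output.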
Problem 9
ggplot(data = NHANES_2020, aes(x = Height, y = Weight)) +
  geom_point()
The scatter plot of Height vs. Weight shows a positive relationship, with the points generally following an upward trend as height increases. This suggests that as height increases, weight tends to increase as well. However, the plot indicates variability in weight for individuals of similar height, implying that other factors, beyond just height, may influence weight.
Problem 10
cor(NHANES_2020$Height, NHANES_2020$Weight)
## [1] 0.4146947
The correlation coefficient between Height and Weight is 0.4147, indicating a moderate positive linear relationship between the two variables. This means that as height increases, weight also tends to increase, but the relationship is not perfect. A correlation coefficient closer to 1 would suggest a stronger linear relationship, whereas 0.4147 indicates that while height is associated with weight, other factors likely play a role in determining weight.
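One way to see how much room is left for "other factors" is to square the coefficient. The sketch below does that with the reported value, and then uses a small made-up data set (the `height_cm` and `weight_kg` vectors are hypothetical, not NHANES values) to show that `cor()` is the covariance divided by the product of the two standard deviations.

```r
# Correlation reported above for Height and Weight
r <- 0.4146947

# Squared correlation: share of the variance in one variable that is
# linearly associated with the other
r_squared <- r^2
round(r_squared, 3)

# Hypothetical toy data, for illustration only
height_cm <- c(150, 160, 165, 172, 180)
weight_kg <- c(52, 61, 70, 68, 85)

# Pearson's r from its definition, compared with cor()
manual_r <- cov(height_cm, weight_kg) / (sd(height_cm) * sd(weight_kg))
all.equal(manual_r, cor(height_cm, weight_kg))
```

With r of about 0.41, roughly 17% of the variability is shared, which is consistent with describing the relationship as moderate.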
Problem 11
cor.test(NHANES_2020$Height, NHANES_2020$Weight)
##
## Pearson's product-moment correlation
##
## data: NHANES_2020$Height and NHANES_2020$Weight
## t = 42.72, df = 8787, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.3972318 0.4318574
## sample estimates:
## cor
## 0.4146947
The 95% confidence interval for the population correlation coefficient has a lower limit of 0.3972 and an upper limit of 0.4319. Because the interval does not contain zero, it supports the conclusion that there is a statistically significant positive correlation between Height and Weight. The interval also gives the range of plausible values for the population correlation, all of which correspond to a moderate positive relationship. The interval is narrow because the sample is large (n = 8,789), so the correlation is estimated precisely.
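The limits and the t statistic can be reproduced from the reported correlation and degrees of freedom alone. The sketch below assumes the interval is based on Fisher's z transformation (the approach `cor.test()` uses for Pearson's correlation); the variable names are illustrative.

```r
# Values taken from the cor.test() output above
r  <- 0.4146947
df <- 8787
n  <- df + 2              # sample size implied by the degrees of freedom

# Fisher z transformation and its approximate standard error
z  <- atanh(r)
se <- 1 / sqrt(n - 3)

# 95% confidence interval, transformed back to the correlation scale
ci <- tanh(z + c(-1, 1) * qnorm(0.975) * se)

# t statistic for testing H0: true correlation equals 0
t_stat <- r * sqrt(df) / sqrt(1 - r^2)

round(ci, 4)
round(t_stat, 2)
```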
Problem 12
The chi-square test is used to assess the association between two categorical variables; in this assignment, Gender and BMI-Cat. It examines whether the differences between the observed cell frequencies and those expected under independence can be attributed to chance or reflect a real association. The test is appropriate when the observations are independent and the sample is large enough that the expected count in each cell of the contingency table is at least five. Low expected counts distort the chi-square approximation and can lead to incorrect conclusions, so the sample size must be adequate for the results to be reliable. In addition, the test only detects whether an association exists; it does not indicate its strength or direction, so further steps, such as comparing observed with expected counts, are needed to understand the pattern.
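For this data set the expected-count condition can be checked directly. The sketch below copies the expected counts from the `test1$expected` output and compares the smallest one with the rule-of-thumb minimum of five; `min_required` is an illustrative name for that threshold.

```r
# Expected counts copied from test1$expected above
expected <- c(1061.797, 1791.692, 1344.619, 72.89225,
              1123.203, 1895.308, 1422.381, 77.10775)

min_required <- 5  # common rule of thumb for the chi-square approximation

smallest_expected <- min(expected)
assumption_met    <- all(expected >= min_required)

smallest_expected
assumption_met
```

With the full data frame loaded, the same check could be run as `all(test1$expected >= 5)`.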
Using the chi-square test well requires being clear about its limitations. While it establishes whether a statistically significant association exists between categorical variables, it does not quantify the magnitude of that association. It should therefore be combined with other statistical measures that indicate how strong the association is. The test also relies on the assumption of sufficient expected frequencies in each cell of the contingency table; when this assumption is violated, the results are easily misinterpreted. Consequently, when using the chi-square test, one should also draw on supplementary statistical tools and check that the data set meets the criteria for a valid and reliable analysis.
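One such supplementary measure is Cramér's V, which rescales the chi-square statistic to a 0 to 1 range. It is not part of the original solution; the sketch below is an assumption about which measure could be used, and it plugs in the statistic and sample size reported earlier.

```r
# Values taken from the chi-square output and the table margins above
chi_sq <- 72.439   # X-squared reported by chisq.test()
n      <- 8789     # total number of participants in the table
n_rows <- 2        # Gender levels
n_cols <- 4        # BMI_Cat levels

# Cramér's V = sqrt(chi-square / (n * (min(rows, cols) - 1)))
cramers_v <- sqrt(chi_sq / (n * (min(n_rows, n_cols) - 1)))

round(cramers_v, 3)
```

A value this close to zero suggests that, although the association is statistically significant in a sample this large, it is weak in practical terms.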
Question
Remember to start all of your answers with a header (# Problem 1, # Problem 2, etc.). Check that your RMarkdown-generated Word output is formatted properly and submit your solutions through Blackboard by the due date.
Concepts
- Which R command can you use to examine the relationship between two continuous variables? (5 points)
- test()
- test()
- mean()
- relation()
- To examine the relationship between two continuous variables, you can use ______. (5 points)
- ANOVA
- t-test
- χ²
- correlation coefficient
- The χ² test is used to test whether there is a statistical relationship between two categorical variables. (5 points)
- True
- False
- If two variables have no relationship, the Pearson’s r is ______. (5 points)
- –1
- 0
- 1
- 05
- To determine if the correlation coefficient in the sample is statistically significant, your alternate hypothesis is ______. (5 points)
- there is no relationship between the two variables
- there is a relationship between the two variables
- the correlation coefficient is larger than .5
- the correlation coefficient is smaller than –.5
- Match each correlation to the corresponding scatter plot. (5 points)
Application
This HW uses the NHANES 2022 data (the same data set that we used in Module 2). First, import the data into your R Markdown file, changing the data path to match the path on your computer:
library(readr)
NHANES_2020 <- read_csv("C:/Users/Ruaa Al-Juboori/Desktop/PH610/NHANES_2020.csv")
Next answer the following questions:
- Create a table to explore the relationship between Gender and BMI-Cat. How many participants are in the obese category? How many participants are females? (10 points)
- Run a chi-square test of the relationship between Gender and BMI-Cat.
What is your Null hypothesis and what is the alternative hypothesis? (10 points)
Would you keep or reject the null hypothesis? (Hint: check the P-value) (10 points)
- Create a scatter plot to examine the association between Height and Weight. What do you notice? (10 points).
- What is the correlation coefficient? How can you describe this relationship? (10 points)
- What are the upper and lower limits of the 95% CI of the population correlation coefficient? How would you describe it? (10 points)
Reflection
- How does the Chi-Square test help in understanding relationships between categorical variables? Reflect on when it is appropriate to use this test, and what key considerations, such as sample size and expected counts, can impact the accuracy of the test results? (10 points)