Descriptive Statistics and Reporting Results
In today's digitized society, people conduct many of their activities online. Digital devices and the internet mediate daily activities such as shopping, social interaction, information search, and entertainment. These activities are easy to record and analyze, which has given rise to computational social science, targeted online marketing, and personalized search engines (Kosinski et al., 2013). It is therefore important to distinguish between data that are actually recorded and information that can be predicted through statistical analysis of those records.
People may choose not to reveal characteristics such as age and gender. Still, this information can be inferred from other traces of behavior that they reveal without realizing it. The study therefore seeks to establish the extent to which basic digital records of human behavior can be used to predict personal attributes that individuals typically keep private. The study uses Facebook Likes, a generic type of digital record comparable to web search queries and browsing histories.
Figure 1: Study Design
The figure presents the study design and the attributes and traits selected for predictive analysis. These include religion, sex, personality, ethnic origin, and life satisfaction. Other characteristics, such as use of addictive substances, political views, and relationship status, are also predicted. The statistical tables present the study outcomes. The participants were a sample of 58,466 volunteers from the US whose Facebook Likes and profile information were obtained and compiled.
The user-Like matrix records whether a user is associated with a given Like: an entry of 1 indicates an association, and 0 indicates none. Linear regression predicts numeric variables such as age and intelligence, while logistic regression predicts dichotomous variables such as gender. The user-Like matrix itself is simple and explicit. However, the rest of the figure may be hard for a typical reader to follow because the methods are specialized: singular value decomposition (SVD) and 10-fold cross-validation are uncommon techniques in data presentation.
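The two most mechanical parts of this design, the binary user-Like matrix and the 10-fold split, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' code: the function names and the toy data are hypothetical, and the SVD and regression steps are deliberately left out.

```python
from random import Random


def build_user_like_matrix(user_likes):
    """Return (users, likes, matrix); matrix[i][j] is 1 if user i has Like j, else 0."""
    users = sorted(user_likes)
    likes = sorted({like for liked in user_likes.values() for like in liked})
    column = {like: j for j, like in enumerate(likes)}
    matrix = [[0] * len(likes) for _ in users]
    for i, user in enumerate(users):
        for like in user_likes[user]:
            matrix[i][column[like]] = 1
    return users, likes, matrix


def k_fold_indices(n_rows, k=10, seed=0):
    """Yield (train, test) row-index lists; every row is tested exactly once."""
    order = list(range(n_rows))
    Random(seed).shuffle(order)
    folds = [order[f::k] for f in range(k)]
    for f in range(k):
        train = [i for g in range(k) if g != f for i in folds[g]]
        yield train, folds[f]


# Hypothetical toy data, not taken from the study.
toy_likes = {
    "user_a": {"hiking", "jazz"},
    "user_b": {"jazz"},
    "user_c": {"chess", "hiking"},
}
users, likes, matrix = build_user_like_matrix(toy_likes)
for user, row in zip(users, matrix):
    print(user, row)
```

In each fold, a model would be fitted on the training rows and scored on the held-out rows, so that no user contributes to both fitting and evaluation in the same fold.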
An alternative way to present the data would be a simple table showing the codes 0 and 1 and the variables they represent. Cross-tabulation output from statistical software such as SPSS could depict the associations more vividly (Field, 2018). Since the figure represents the study design, few conclusions can be drawn from it. One is that the described components are supported by the large number of participants from which the information was drawn.
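To make the cross-tabulation suggestion concrete, the sketch below counts how often each combination of two 0/1 variables occurs. It is only an illustration: the function name, variable names, and values are hypothetical, and SPSS output would contain further statistics beyond the raw counts.

```python
from collections import Counter


def cross_tabulate(row_values, col_values):
    """Return a nested dict of counts: table[row_level][col_level]."""
    counts = Counter(zip(row_values, col_values))
    row_levels = sorted(set(row_values))
    col_levels = sorted(set(col_values))
    return {r: {c: counts[(r, c)] for c in col_levels} for r in row_levels}


# Hypothetical 0/1 codes for six users, not taken from the study.
liked_item = [1, 1, 0, 0, 1, 0]        # 1 = user has the Like
in_relationship = [1, 0, 0, 0, 1, 1]   # 1 = in a relationship

table = cross_tabulate(liked_item, in_relationship)
for row_level, row in table.items():
    print(row_level, row)
```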
Figure 2: Dichotomous Variables Prediction
The graph presents the accuracy of predicting dichotomous variables, expressed as the area under the receiver operating characteristic curve (AUC). The AUC measures the probability of correctly classifying two randomly selected users, one from each class, for instance, single vs. in a relationship. The graph includes all the data needed to assess how close each prediction comes to perfect classification. The choice of colors for the different variables makes the figure appealing and understandable to the reader.
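The pairwise reading of the AUC can be computed directly by comparing every user from one class with every user from the other. The sketch below does exactly that. It is an illustration with hypothetical scores rather than the procedure used in the study, and it counts ties as half a correct ordering, which is a common convention.

```python
def auc_by_pairs(scores, labels):
    """Share of (positive, negative) pairs in which the positive user scores higher."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one user from each class")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # a tie counts as half a correct ordering
    return wins / (len(positives) * len(negatives))


# Hypothetical predicted scores and true classes for four users.
scores = [0.9, 0.8, 0.4, 0.3]
labels = [1, 0, 1, 0]
auc = auc_by_pairs(scores, labels)
print(auc)
```

A value of 0.5 corresponds to guessing at random, and 1.0 to perfect separation of the two classes.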
Given the nature of the question being probed, this method of presentation fits the intended output well. An additional way to assess the model's predictions is the Mean Bias Error (MBE), which indicates whether the estimates are systematically too high or too low; a negative MBE value means underestimation. It can be derived from the figure that Caucasian Americans and African Americans were correctly classified in 95% of cases, followed by gender at 93% (Kosinski et al., 2013). This implies that the groups' behaviors, as expressed through Likes, differ enough to allow almost perfect classification.
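As a rough illustration of the MBE, the sketch below averages the signed differences between predicted and actual values, using the convention that a negative result means underestimation. The MBE is defined for numeric predictions, so the toy example uses hypothetical ages rather than the dichotomous variables shown in the figure.

```python
def mean_bias_error(predicted, actual):
    """Average of (predicted - actual); negative values indicate underestimation."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(p - a for p, a in zip(predicted, actual)) / len(predicted)


# Hypothetical ages, not taken from the study.
predicted_ages = [24, 30, 41]
actual_ages = [26, 33, 42]
mbe = mean_bias_error(predicted_ages, actual_ages)
print(mbe)
```

Because positive and negative errors cancel, an MBE near zero shows only that the model is unbiased on average, not that individual predictions are accurate.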
Figure 3: Numeric Variables Prediction
The graph shows the accuracy of predicting numeric variables, expressed as the Pearson product-moment correlation coefficient between the actual and predicted values. The density and size variables describe the Facebook friendship network. The psychological traits were estimated using questionnaires since they cannot be measured directly (Kosinski et al., 2013). The accuracy of the questionnaires is shown by the transparent bars, which are based on test-retest reliability scores. All the correlations are significant at the p < 0.001 level. The graph presents the data adequately, and the representation is understandable even to a typical reader.
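For readers unfamiliar with the measure, the Pearson coefficient between actual and predicted values can be computed as the covariance divided by the product of the two standard deviations. The sketch below is a plain illustration with hypothetical numbers, not data from the study.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("inputs must have equal length of at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical actual and predicted values that rise together exactly.
actual = [1, 2, 3, 4, 5]
predicted = [2, 4, 6, 8, 10]
r = pearson_r(actual, predicted)
print(r)
```

The coefficient ranges from -1 to 1, so a value such as r = 0.75 indicates a strong but imperfect linear relationship between predicted and actual values.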
We can derive from the figure that openness is predicted almost as accurately as the questionnaire allows, since the prediction accuracy (r = 0.43) lies close to the test-retest reliability score (r = 0.5). This suggests that observing a user's Likes is nearly as informative about openness as the user's own questionnaire score (Kosinski et al., 2013). Age showed the highest correlation with r = 0.75, followed by density with r = 0.52 and size with r = 0.47. The predictions with the lowest accuracy were life satisfaction with r = 0.17 and conscientiousness with r = 0.29. For these traits, the prediction accuracy is about half the test-retest reliability score.
Figure 4: Data Amount Available and the Accuracy of Prediction
The graph presents results for participants with between 1 and 700 Likes. The median number of Likes per individual was 68, with an interquartile range of 152. The graph shows how prediction accuracy changes with the number of Likes available for a participant. A random sample of n = 500 users with at least 300 Likes was selected, and predictive models were then run on randomly selected subsets of 1, 2, …, 300 of their Likes (Kosinski et al., 2013). The line graphs do not show at a glance the correlation coefficient for a specific number of Likes.
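The two ideas in this paragraph, summarizing Like counts by median and interquartile range and drawing random subsets of a user's Likes, can be sketched as follows. The function names and data are hypothetical, the quartiles use the default method of Python's `statistics.quantiles`, and this is not the procedure as implemented by the authors.

```python
import random
import statistics


def median_and_iqr(counts):
    """Return (median, interquartile range) of a list of Like counts."""
    q1, _, q3 = statistics.quantiles(counts, n=4)
    return statistics.median(counts), q3 - q1


def random_like_subsets(user_likes, sizes, seed=0):
    """Draw one random subset of each requested size from a user's Likes."""
    rng = random.Random(seed)
    pool = sorted(user_likes)  # sorted so the draw is reproducible for a given seed
    return {size: rng.sample(pool, size) for size in sizes if size <= len(pool)}


# Hypothetical Like counts for seven users, not taken from the study.
like_counts = [10, 20, 40, 68, 90, 150, 300]
median, iqr = median_and_iqr(like_counts)
print(median, iqr)

# Hypothetical Likes for one user; a model would be scored on each subset size.
subsets = random_like_subsets({"a", "b", "c", "d", "e"}, sizes=[1, 2, 3])
print(subsets)
```

Scoring a model on subsets of increasing size is what produces a curve of accuracy against the number of Likes.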
Bar charts pairing numbers of Likes with their coefficients could present the data better. However, the graph clearly shows how prediction accuracy for each attribute (gender, age, and openness), including Pearson's correlation coefficient, changes with the number of Likes. We can derive from the graph that about 50% of the users had at least 100 Likes, whereas about 20% had at least 250 Likes. Dobbin et al. (2008) suggest a model-based approach as an alternative way to determine the amount of data (sample size) required to reach a given prediction accuracy.
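Statements such as "about 50% of the users had at least 100 Likes" are shares of users at or above a threshold. The sketch below shows the calculation on hypothetical counts, which were chosen only so that the output mirrors the proportions quoted above. They are not data from the study.

```python
def share_with_at_least(like_counts, threshold):
    """Fraction of users whose number of Likes is at or above the threshold."""
    if not like_counts:
        raise ValueError("like_counts must not be empty")
    return sum(1 for count in like_counts if count >= threshold) / len(like_counts)


# Hypothetical Like counts for ten users.
like_counts = [5, 40, 68, 120, 150, 260, 90, 30, 300, 100]
print(share_with_at_least(like_counts, 100))
print(share_with_at_least(like_counts, 250))
```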
Dobbin, K., Zhao, Y., & Simon, R. (2008). How large a training set is needed to develop a classifier for microarray data? Clinical Cancer Research, 14(1), 108-114. doi: 10.1158/1078-0432.CCR-07-0443
Field, A. (2018). Discovering statistics using IBM SPSS statistics. Sage Publishers, California.
Kosinski, M., Stillwell, D., & Graepel, T. (2013). Private traits and attributes are predictable from digital records of human behavior. Proceedings of the National Academy of Sciences, 110(15), 5802-5805. doi: 10.1073/pnas.1218772110
Session 1 Research Assignment
Descriptive Statistics and Reporting Results
Manipulating and describing statistics so that they will be easily understood are basic but required skills for the researcher. Academic researchers may tolerate uninspiring text-based presentations if they must (but really do not need to). However, in the business world visuals and graphic presentations are a requirement. Right up there at the top of the presentation tools arsenal are those beautiful color charts and graphs that help explain the relationships represented in the numbers.
In their article entitled “Private traits and attributes are predictable from digital records of human behavior”, Kosinski, Stillwell, and Graepel (2013) use the Facebook “Likes” of over 58,000 volunteers in the United States to see if they can make predictions about them (and various sub-groups) based on those Likes. This article blends academic and business research in a way that is interesting and productive for decision-makers in both of these categories. The mechanics of setting up and manipulating such a large data set (big data) are outside the scope of this course, but this is an important emerging field for supporting decision-making of all types. The terabytes of information that are now available require ever more sophisticated ways to store, access, categorize, analyze, slice and dice.
The authors use the Facebook data to test their hypotheses and present their results. Even with only four figures (a flow chart, two bar charts, and a line graph), this article is a good example of how descriptive statistics can be used depending on the questions of interest. Note that some types of graphic descriptions or presentations are better suited for certain types of information (data) than others. As an aside, why might this data set be of interest to a business researcher? The authors explain in their opening paragraph:
A growing proportion of human activities, such as social interactions, entertainment, shopping, and gathering information, are now mediated by digital services and devices. Such digitally mediated behaviors can easily be recorded and analyzed, fueling the emergence of computational social science and new services such as personalized search engines, recommender systems, and targeted online marketing. (Kosinski et al., 2013)
For this assignment, read the article Private Traits and Attributes are Predictable from Digital Records of Human Behavior, available at https://www.pnas.org/content/110/15/5802.full. Then answer the following questions for each of the Figures 1, 2, 3, and 4:
- What type of graphic (descriptive statistical representation) is it and what data/information does it represent?
- Does the graphic represent the data well (is it easy to understand)? Why or why not? (E.g. Think of the type of graphic it is, color choice, and placement, data that is included versus excluded; if possible, will it be clear to the target reader, etc.)
- What are some other ways that the data might be presented?
- What ideas or conclusions can you draw from the graphic?
Your paper should be 3-4 pages long, not including cover and reference pages, and be formatted according to APA guidelines.
Kosinski, M., Stillwell, D., & Graepel, T. (2013). Private traits and attributes are predictable from digital records of human behavior. Proceedings of the National Academy of Sciences, 110(15). Retrieved from http://www.pnas.org/content/110/15/5802.full