Discussion – Big Data
Big data is large data that cannot be handled or processed using the traditional techniques of managing data. Some of the traditional techniques include relational databases such as SQL and Oracle (Sharma, 2020). Large data, which is big data, consists of data such as that shared on the internet daily. For example, the data shared daily on applications including Facebook, AliExpress.com, and Amazon.com is humongous (Sharma, 2020). According to Cielen et al. (2016), big data characteristics include volume, variety, velocity, and veracity. Volume describes the amount of data to be managed. Variety describes different data and their diversity. Velocity describes the speed at which new data is generated. Veracity describes data accuracy. Storage, analysis, and reporting are big data challenges that should be dealt with to enhance data management (Sharma, 2020). Because RDBMS cannot be used to manage and store big data, compression technology can be employed for big data to occupy smaller storage. In big data, data differ in size and structure, making it difficult to process. This challenge could be solved by employing distributed processing, where multiple computers can process data in a network. Reports from big data can be technical in nature because they are mainly statistical. To solve this challenge, reports could be interpreted and simplified before dissemination. Need help with your assignment ? Reach out to us. We offer excellent services.
According to research, Python is ranked first in data analytics (Sharma, 2020). Big data experts find python easy to use Python because of its inbuilt features that facilitate efficient data analysis. For example, python has data analytical tools and is extensible on various platforms (Sharma, 2020). Some of the areas where Python is used to perform data analysis on big data include learning institutions, hospitals, government institutions, online stores, and supermarkets. In supermarkets and grocery stores, data analysis could be used to enhance marketing (Schaaf, 2017). This is done by collecting customer information based on their purchases. Such information would enable the supermarket to identify fast-moving and slow-moving products. As a result, the supermarket would know how to manage their inventory as well as their shelves display. For example, there would be no empty shelves or products going bad on the shelves. However, during customer data collection, a company should ensure that data privacy is upheld (Schaaf, 2017). Bid data analysis in supermarkets could also facilitate the evaluation of current market performance and predict future performance. For example, an advertisement could be evaluated based on the number of customers it reached, how many liked or disliked it, and how many made purchases of the advertised products. Feedback could also be collected from customers and analyzed. Another area that would benefit from big data analysis is a learning institution such as a university. Data would be collected and analyzed based on students’ preferences, number of enrollments for certain courses, and fee pay ability. The software can determine human behavior by making advertisement pop-ups and noting how humans respond. For example, advertising limited pizza discounts could make customers rush to place orders even though the customers had not planned to order pizza on that day. This activity would support the decision-making process for the pizza house. The house could identify days when a pizza discount is more effective. For example, more customers might react to a pizza discount advertisement on a weekend than on a Monday.
Some of the data analysis tools used in Python include numerical Python (NumPY), Matplotlib, and Pandas (Sharma, 2020). NumPY is used for scientific, mathematical, and engineering operations on data. NumPY arrays are lists that allow sufficient storage even as data increases in size (Sharma, 2020). With sufficient memory, NumPY facilitates fast data operations. Pandas provide more features of data handling as compared to NumPY (Sharma, 2020). For example, Pandas has high performance and uses data frames to read and write data on a file. Its indexing functionality facilitates fast manipulation of data. Some of the fast operations enhanced by Pandas indexing functionality include aggregation, slicing, and reshaping. Matplotlib plots data in 2D, enhancing the data analysis presentation (Sharma, 2020). The plots are also suitable for publication purposes. All plots on Matplotlib allow users to resize and study them closer (Sharma, 2020). IPython can be used together with Matplotlib to facilitate data exploitation.
Python can be used to analyze data and find least squares and regression for some two variables from a CSV file. The variables could describe data against a determinant variable and provide a prediction. To find least squares and regression, Pandas can be used to read CSV files for data analysis (Sharma, 2020). After the CSV file is read, mathematical and statistical formulas could be applied, and the results plotted using Matplotlib. This task would require three tools, Pandas, NumPy, and Matplotlib, but not just a single tool or library. The tools would be fast for the task of large data, unlike when writing SQL code to analyze data. SQL code could be successful at computing least squares and regression, but it would not analyze large data because it is a relational database management system.
References
Cielen, D., Meysman, A., & Ali, M. (2016). Introducing data science: Big data, machine learning, and more, using Python tools. Manning Publications.
Schaaf, K. V. (2017). How Big Data Analytics Can Contribute to the Marketing Performance of Supermarkets [Master’s thesis]. https://edepot.wur.nl/422370
Sharma, R. D. (2020). Python Tools for Big Data Analytics. International Journal of Science and Research (IJSR), 9(5), 597-602. https://www.ijsr.net/archive/v9i5/SR20507222308.pdf
ORDER A PLAGIARISM-FREE PAPER HERE
We’ll write everything from scratch
Question
Research the following and write a 2- to 3-page double-spaced paper (not including the title and reference pages) addressing the following:
• Define big data.
Reference Introducing Data Science: Big Data, Machine Learning, and More, Using Python Tools, Chapter 1: Data Science in a Big Data World, or other scholarly resource.
• Discuss the role of languages like Python in the analysis of big data, such as, but not limited to, a grocery store database of purchases.
• How can software determine patterns of human behavior from this data?
• How does this information affect business decisions?
• What tools are available in Python to analyze data? List at least three.
Reference Introducing Data Science: Big Data, Machine Learning, and More, Using Python Tools, Chapter 3: Machine Learning / 3.1.3 Python Tools Used in Machine Learning; Python.org documentation; or other resources.
• Select one specific tool and examine a hypothetical scenario where it would be used in analyzing data to determine a pattern.
Describe the tool and share a documentation resource.
Describe the hypothetical scenario where this tool could be used.
Address how this tool would help in analyzing data to determine patterns.
Viewpoint and purpose should be clearly established and sustained, and the paper should be well-ordered, logical, and unified, as well as original and insightful.
Assignment should follow the conventions of Standard English (correct spelling, grammar, and punctuation) and be in APA format with separate titles and reference pages formatted per APA guidelines.
Requirements:
For more information on APA style formatting, go to Academic Writer, formerly APA Style Central, under the Academic Tools area of this course.
Also, the university policy on plagiarism should be reviewed. If you have any questions, please contact your professor.
Directions for Submitting Your Assignment