Data Science and Machine Learning
Abstract—This report outlines the impacts of data science and its constituent field, machine learning (ML), on social organizations in general. It describes the benefits of ML, such as its ability to manipulate and understand large datasets and transform them into actionable knowledge for decision making. It also discusses critical issues accompanying the rise of AI, such as cybersecurity and the potential adverse effects on individuals and society in general.
Keywords—Machine Learning, Data Science, data, dataset, enterprise, consumer, and cybersecurity
I. INTRODUCTION
Machine learning and data science remain among the most significant domains in today’s world. The fields of data science, AI and machine learning continue to contribute substantially to how life is conceptualized in the modern world. In areas ranging from social media to health and travel, data science has come to play a significant role, and that role is expanding to cover all aspects of life. In this era of information and growing internet use, more and more people have seen aspects of the virtual world become incorporated into their day-to-day work. Take advertising, for example. It is very common today to find advertisements for goods tailored to an individual’s needs in a market with large and diverse offerings. Fast ordering and timely delivery of all kinds of goods, some of which come from distant countries, together with easy payment and the confidence of receiving quality goods and services without ever meeting the seller, have become the norm. How is all this achieved?
All of this is made possible by the availability of data and by the large pool of resources invested to collect, analyze, evaluate, use and store it. The internet’s popularity and its ability to make all of these things possible likewise depend on data, which customers supply to various organizations as a kind of reciprocal exchange for services; the organizations, in turn, aim to establish patterns and extract value from otherwise meaningless raw data. The information commonly collected today ranges from individual and company values, personal interests and wishes to more sensitive data such as health information, dates of birth, identity and even financial information. For companies to know what an individual needs, access to this data is critical, making data the key resource and lifeline of the digital world. Data is so valuable that some of the world’s major companies choose to offer their primary services to customers free of charge in a bid to increase their number of users and acquire as much data as possible. This data is then used to pursue other significant opportunities in the marketplace and beyond.
Companies such as Facebook and Google have come to use this data for advertising: just a few clicks on their platforms can be enough to identify and group en masse, on behalf of other organizations, potential buyers by interests, political views, intelligence, customer preferences, sexual orientation and gender, among other attributes, all of which are important in determining an individual company’s profit. This is highly profitable and influential in defining the prevailing world order, especially in a globalized world where capitalism continues to define and affect the social value systems central to human development. Whereas organizations once stored most of their data in Excel spreadsheets, increased internet uptake has led to the development of advanced data analysis tools that can accommodate and extract value from the data generated. Data science, as such, can be defined as the process of extracting useful insights from data using a variety of integrated tools, algorithms and machine learning fundamentals, in a bid to understand customers or social organizations and support their development. With data becoming synonymous with understanding social identity, data science and machine learning are increasingly used to produce quality predictions and estimates that ultimately enable sound decision making and smarter actions in real time with near-zero human intervention.
II. BACKGROUND
The development of machine learning (ML) is commonly associated with the Fourth Industrial Revolution (Industry 4.0). Machine learning has come to be described as the new electricity because it has driven massive, ground-shaking changes in organizational and business frameworks worldwide. Data science and machine learning are today’s technology hotbeds. As a result, they are growing rapidly, employing highly paid specialists while facing a shortage of talent. But this was not always the case for data science. In the past, people’s ability to reach workable solutions to common problems was constrained by their inability to perform the required tasks and functions within a specified time frame.
While data has always been available, the ability to compile and transform it into useful information about common social, economic and political problems often was not. This meant that data was seldom applied credibly within the time required, hampering people’s ability to carry out tasks in a meaningful, results-oriented manner. While computer technology was developed partly out of the need to understand data, it was the emergence of the Internet of Things (IoT) that proved critical in launching the Big Data trends that ultimately revolutionized the fields of information and information technology, computer science and operations management. The idea of data-led social, economic, cultural and political growth rests on understanding “data” as an interpretive framework for making sense of the world, and the information gained from that data as critical to interacting with the world, people and phenomena.
Data science and its constituent fields enable people to interact with big data and, through specialized tools and algorithms, ask meaningful questions that generate novel insights. Passi and Jackson argue that data science should be understood as a sociomaterial practice in which human beings and the technical forms they create work together to achieve specific, significant and mutually shaping results. Machine learning, as a constituent field of data science, has been key in studying algorithms and techniques that automate solutions to common problems that are hard to solve with conventional, explicitly programmed methods. Simply stated, data science seeks to transform information gained from real-world actions into actionable knowledge for decision makers, predicting social behaviors from current data trends.
While data science has been effective in extending actionable knowledge to real-life solutions, ML has been key in creating detailed designs that help make sense of large datasets and reduce human error, designs that in many cases would be hard to produce by conventional means even with a clear specification. Nonetheless, their rise to prominence in information technology has come at the cost of growing cybersecurity risks, which constantly adapt to a dynamic sociopolitical world and have real-world implications, many of which have played, and continue to play, a significant role in shaping public opinion, moral panics and, consequently, how the internet works.
III. TECHNICAL OVERVIEW
Data identity is a critical framework conceptualized to promote the effective collection and use of data in relation to real-world problems. As already pointed out, data science enables enterprises and organizations across a wide range of settings to measure, track and log performance metrics, chiefly to support enhanced decision making across the organization. Individuals, companies and organizations with access to this data can analyze trends to make critical decisions that engage customers in a more nuanced manner; they can also improve company or enterprise performance in pursuit of its mission, vision or stated objectives. In the business world, for example, this mission is closely tied to profitability, while in the political world it may be linked to policy implementation, the advancement of a specific agenda or the election of an individual. The nature of these goals inherently shapes how data is used and regulated in the real world.
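To make the idea of tracking performance metrics and analyzing trends more concrete, a minimal Python sketch follows. It assumes a hypothetical monthly sales log (the figures are invented for illustration) and computes month-over-month percentage change and a simple moving average, two common ways of turning logged metrics into a trend that a decision maker can act on. The function names are illustrative, not part of any particular tool.

```python
from statistics import mean


def month_over_month(values):
    """Percentage change between each pair of consecutive periods."""
    return [
        (curr - prev) / prev * 100.0
        for prev, curr in zip(values, values[1:])
    ]


def moving_average(values, window):
    """Mean of each run of `window` consecutive values (smooths short-term noise)."""
    return [
        mean(values[i:i + window])
        for i in range(len(values) - window + 1)
    ]


# Hypothetical monthly sales figures (illustrative only).
sales = [100.0, 110.0, 99.0, 120.0, 132.0]

print(month_over_month(sales))
print(moving_average(sales, window=3))
```

The moving average trades responsiveness for stability: a larger window hides month-to-month swings but reacts more slowly to a genuine change in trend.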
In the contemporary understanding of data science, data is a critical resource with the ability to make meaningful impacts on people and the economy in general. The impact may be positive or negative depending on how the data is used. Human actions involving data are therefore critical to understanding its uses and applications, and have shaped other technologies built around data (e.g., ML, discussed later in this section). Personal data is any data that can identify a natural person. It includes names, dates of birth, financial information, religious beliefs, health information, genetic or biometric data, and IP addresses, among others. All of these fall under an umbrella of data that ought to be protected, as they are sensitive and, in the hands of an unauthorized user, may be used to the individual’s detriment. As a field, data science depends heavily on the credibility and trustworthiness of its users. In the context of society, human action and agency become the main basis on which data is made actionable. Researchers note that in data science today, effective algorithmic outputs based on ‘sufficient’ data… are often judged through the credibility of the user of that data, which establishes their actionable framework. This perception of data science allows for effective practice and governance of the field, and makes proper governance, accountability and responsibility the main priorities of researchers and practitioners in data science.
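Because identifiers such as email addresses and IP addresses count as personal data, they are often masked before data is shared or analyzed. The Python sketch below is a simplified illustration, not a compliance tool: the two regular expressions are assumptions that catch only common email shapes and dotted IPv4 addresses (the IPv4 pattern does not check that each part is at most 255), and the placeholder tokens are arbitrary.

```python
import re

# Simplified patterns (assumptions): common email shapes and dotted IPv4 addresses.
EMAIL_RE = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")
IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def redact(text):
    """Replace email addresses and IPv4 addresses with placeholder tokens."""
    # Emails are masked first so their parts cannot be mistaken for other data.
    text = EMAIL_RE.sub("[EMAIL]", text)
    return IPV4_RE.sub("[IP]", text)


record = "User jane.doe@example.com logged in from 192.168.0.12"
print(redact(record))
```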
In practice, trust remains central to ensuring credibility. Algorithms are judged by their ability to quantify statistical probabilities and meet performance metrics and, drawing on prior knowledge and current experts, to offer reliable insights. Research indicates that within the data science community, the problem is not a lack of access to relevant information but rather excessive access to irrelevant information. The main goal of data science is having the knowledge to process and sort information by its relevance to the current time and context, “filtering” it and “transforming” it into actionable knowledge. This knowledge becomes actionable for data scientists and policy makers across the world seeking to solve both organizational and social problems. Data science bases its answers and its ability to reach workable solutions on mastery of mathematics, statistics and visualization techniques. These techniques work best when blended with computer science and programming skills. Luna-Reyes outlines that data science is a more holistic approach to understanding social organizations and their interaction with other domains of human activity.
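As one illustration of the performance metrics by which algorithms are judged, the Python sketch below computes accuracy, precision and recall for a binary classifier. The label lists are hypothetical, and these three metrics are common choices rather than the only ones used in practice.

```python
def classification_metrics(y_true, y_pred):
    """Return (accuracy, precision, recall) for binary labels where 1 = positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    correct = sum(1 for t, p in pairs if t == p)
    accuracy = correct / len(pairs)
    # Guard against division by zero when there are no positive predictions or labels.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall


# Hypothetical ground-truth labels and model predictions.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 1, 0, 1, 0, 1, 1, 0]
print(classification_metrics(y_true, y_pred))
```

Reporting several metrics together matters because a single number can mislead: a model can score well on accuracy while still missing many of the positive cases that recall measures.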
The competencies of a data scientist go beyond computational thinking: they require an understanding of decision, statistical and simulation modeling, data management, enterprise architecture and domain knowledge, so that acquired knowledge can be applied in context to the available data. Berry, Mohamed and Yap point to a demand for advanced data analytics tools. It has been noted that “Data scientists may have the necessary expertise to test systems, but the manual intractability of large-scale data coupled with the complex, sometimes opaque, nature of state-of-the-art models makes it hard even for them to clearly articulate and ascribe trustworthiness to approaches and insights. This is even harder for lay users who lack specialized data science knowledge but are sometimes the users of these systems or those most impacted by them”. The need to establish calculated trust, together with the fact that most of these analyses are complex, is what drives the uptake of ML.
ML, a subfield of artificial intelligence (AI), generally aims to understand the structure of data and fit that data into simpler models that people can readily understand as actionable knowledge. Although it is a field of computer science, its applications and computational approaches differ from conventional computation. In conventional computing, algorithms are explicitly programmed instructions; ML algorithms do not follow this path, and are instead designed to let the computer train on input data and, through statistical analysis, produce outputs that carry valuable information within a specified range of queries. Machine learning thus allows computers to build models from sampled datasets in order to automate decision-making processes based on the data gathered. For the most part, such models have been very useful and effective in automated decision making. The tasks in machine learning are generally divided into two main categories.
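The contrast between explicitly programmed instructions and a model built from sampled data can be sketched in a few lines of Python. Rather than hard-coding the relationship between input and output, the function below estimates a straight-line model by ordinary least squares from example pairs and then uses that learned model to make a prediction. The data points are hypothetical, and linear regression is only one of many possible models.

```python
def fit_line(xs, ys):
    """Estimate slope and intercept of y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-style sums used by the closed-form least-squares solution.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical training samples, e.g. advertising spend (x) against sales (y).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.1, 4.9, 7.2, 8.8, 11.0]

slope, intercept = fit_line(xs, ys)
# The prediction comes from parameters learned from data, not from a hand-written rule.
prediction = slope * 6.0 + intercept
print(round(slope, 2), round(intercept, 2), round(prediction, 2))
```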
These categories are distinguished by how learning is achieved, that is, by how feedback is provided to the self-taught ML system. They are supervised learning and unsupervised learning. In the former, learning is based on feedback provided by humans; in the latter, the algorithm receives no labeled data and is compelled to find structure within the input datasets on its own. In supervised learning, the computer is taught by being given example inputs labeled with the desired outputs, and the algorithm learns to map inputs to those outputs. In unsupervised learning, by contrast, the data is unlabeled, compelling the learning algorithm to identify what specific data points in a dataset have in common. Through techniques such as clustering (grouping similar data points) and association analysis (identifying relationships between two or more variables), the algorithm learns the structure of the data. Overall, ML has allowed computers to take in large amounts of input data, process it, teach themselves new skills from the data and algorithm, and provide solutions or automate decision making to help solve real-world problems. ML can generally be regarded as a first step toward achieving AI in machines, as it enables computers to learn and act semi-autonomously, reducing human error in some cases.
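The difference between the two categories can be illustrated with a short Python sketch on toy one-dimensional data (all values hypothetical). The supervised part uses a 1-nearest-neighbor classifier, which needs labeled examples; the unsupervised part uses k-means clustering, a common unsupervised technique chosen here for illustration, which groups unlabeled points on its own. Both are deliberately minimal teaching sketches, not production implementations.

```python
def nearest_neighbor_predict(train, x):
    """Supervised: return the label of the labeled example closest to x."""
    _, label = min(train, key=lambda pair: abs(pair[0] - x))
    return label


def kmeans_1d(points, centers, iterations=10):
    """Unsupervised: refine cluster centers for unlabeled 1-D points."""
    centers = list(centers)
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        # Assignment step: put each point in the cluster of its nearest center.
        for p in points:
            idx = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[idx].append(p)
        # Update step: move each center to the mean of its points (keep it if empty).
        centers = [
            sum(c) / len(c) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
    return centers


# Supervised: every example carries a human-provided label.
labeled = [(1.0, "low"), (1.5, "low"), (8.0, "high"), (9.0, "high")]
print(nearest_neighbor_predict(labeled, 7.2))

# Unsupervised: similar values, but no labels at all; the groups are discovered.
unlabeled = [1.0, 1.5, 2.0, 8.0, 8.5, 9.0]
print(sorted(kmeans_1d(unlabeled, centers=[0.0, 5.0])))
```

Note that k-means finds two groups but cannot name them; attaching meaning such as "low" or "high" to a cluster still requires human interpretation.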
IV. IMPACTS OF THE TECHNOLOGY
A. On the Economy
The “death of distance” is what the internet is thought to have achieved, as it made faraway places seem near and advanced the ability to communicate and share information. While the internet reduced spatial barriers, ML has been more effective in providing integrated services and continuous delivery based on people’s needs and wants, anticipating those needs and ensuring that everything is at their fingertips when they need it. It could be argued that ML is in the initial stages of bringing about the “death of time.” The availability of data on consumer needs, wants, ideologies and perceptions has streamlined the provision of products and services, as well as the exchange of money among people, businesses and governments, making the marketplace more integrated and developed. In capitalist nations, this has been immensely important in building businesses and linking producers, retailers and wholesalers with their buyers. It has increased the likelihood of profit, as it exposes enterprises to their customers’ needs and identifies what customers dislike and why, fostering greater competition between enterprises and more goods for consumers.
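One simple way in which consumer data can be used to anticipate needs is user-based collaborative filtering. The Python sketch below is a toy example under stated assumptions: the purchase histories are hypothetical, users are compared with cosine similarity over the sets of items they bought, and the items owned by the most similar user but not yet by the target user are recommended. Real recommender systems are far more elaborate.

```python
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two users' item sets (treated as binary vectors)."""
    if not a or not b:
        return 0.0
    return len(a & b) / sqrt(len(a) * len(b))


def recommend(histories, user):
    """Suggest items bought by the most similar other user but not yet by `user`."""
    others = [u for u in histories if u != user]
    best = max(others, key=lambda u: cosine(histories[user], histories[u]))
    return sorted(histories[best] - histories[user])


# Hypothetical purchase histories.
histories = {
    "alice": {"tent", "stove", "lantern"},
    "bob": {"tent", "stove", "sleeping bag"},
    "carol": {"novel", "tea"},
}
print(recommend(histories, "alice"))
```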
B. On Society
At the social level, the impacts have been both positive and negative. While access to goods and services has given the average consumer greater freedom of choice, the sheer number of product and service options can hinder their ability to make good choices. Social media companies have been effective in socializing the average user, introducing them to like-minded people and expanding their social spaces. This has been immensely beneficial in increasing social interaction and political discourse and in fostering general growth and development. On the other hand, they have contributed to problems such as voter apathy and misinformation by overloading the public consciousness with too much actionable knowledge (much of it from a variety of perspectives), creating a generally desensitized public. They have also expanded spheres of socialization beyond traditional boundaries, enabling bullying and other social vices, such as financial fraud and stalking, to spread beyond schools, banking halls and similar settings.
C. On the Environment
ML accelerates the understanding of data and provides actionable knowledge for accurate decision making. In this regard, ML can be a force for good, as it can use this same information to enhance its predictive power and help detect emissions, guide efforts to reduce CO2, assist in developing greener transportation networks, monitor deforestation and predict hotspots for adverse weather conditions. On the negative side, its heavy reliance on power-intensive GPUs for learning and prediction may contribute to CO2 emissions.
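The link between GPU use and emissions can be made concrete with a common back-of-the-envelope estimate: the energy E consumed by a training run is the number of GPUs N_GPU times the average power per GPU P times the running time t, scaled by the data center’s power usage effectiveness (PUE); the resulting emissions are that energy times the grid’s carbon intensity I. All figures below are hypothetical assumptions chosen for easy arithmetic, not measurements of any real system.

```latex
E = N_{\text{GPU}} \times P \times t \times \text{PUE}
  = 8 \times 0.3\,\text{kW} \times 100\,\text{h} \times 1.5
  = 360\,\text{kWh}

\text{CO}_2 = E \times I
  = 360\,\text{kWh} \times 0.4\,\tfrac{\text{kg CO}_2}{\text{kWh}}
  = 144\,\text{kg CO}_2
```

The estimate scales linearly in every factor, so running the same job on a lower-carbon grid or in a more efficient data center reduces emissions proportionally.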
D. In a Global Context
The internet brought about the death of distance and turned the globe into something like a small village. ML has done more. From a global economic perspective, AI has produced a more integrated, globalized economic system in which multinational companies rake in huge profits. Both data science and economics have solid foundations in statistics, which, more generally, means that data can be readily translated into information for profit-making purposes. It is in the economic sector of most nations that data science and ML have probably made the most inroads, primarily because most of the organizations that pioneered ML and data science did so to achieve profitability and larger market share. Companies such as Facebook, Twitter, Google and Tesla, along with other technology, social media, marketing and advertising firms, have gained a great deal of information from the people they serve and used it to expand and become multinational. In the age of globalization, interconnected economies and deregulated financial markets that allow greater movement of goods, services and people have led to the multinationalization of such companies, their expansion to reach more people around the globe and, ultimately, greater profit. This means more revenue for these companies and, depending on the applicable laws, greater tax revenue for their respective governments.
V. FUTURE OF THE TECHNOLOGY
In the current landscape, various important issues have been raised regarding data science and the effects of further growth in ML. Cybersecurity risks present by far the greatest concern. The dynamic growth of the technology has brought constant shifts in machine learning techniques, processes and systems throughout the industry, and this constant change has exposed many companies and enterprises to vulnerabilities through hacking. A major driving force here is that big data trends, and the data itself, are extremely valuable to any entity that can derive actionable knowledge from them [4, 5]. Only a handful of companies have access to this critical data, and they too use it for their own benefit, often exposing the industry’s dark underbelly, where a lack of accountability and trust, combined with access to data, may result in some companies (e.g., Facebook, Google) selling this data to the highest bidder. The result may be that large groups of people are manipulated, with their decisions, tastes and wants socially engineered by those companies. Importantly, this has already been noticed by many people, and in many cases platforms and various social scaffolding initiatives have been launched to introduce laws, regulations and policies that will guide the use of ML in the future.
In conclusion, data science and ML are a force for good. Their impact on decision making to solve real-world problems continues to be felt across all sectors. The growth of data, the so-called big data trend, has only led to more capable ML (given the limited human capacity to keep building complex data analysis systems to solve problems), with a greater ability to learn and automate decision making. The jury is still out on whether automated decision making is an effective tool; nonetheless, most of the actionable knowledge gained from the analysis of large datasets by ML tools has been effective in solving real-world problems.
REFERENCES
[1] J. Sempf, “The age of information: What makes your data so valuable?,” Hornetsecurity, 2019. [Online]. Available: https://www.hornetsecurity.com/en/security-information/data-value/?_adin=02021864894. [Accessed: Mar. 28, 2022].
[2] G. Rebala, A. Ravi, and S. Churiwala, “Machine learning definition and basics,” in An Introduction to Machine Learning. Cham: Springer, 2019, pp. 1-17.
[3] C. Morrison, “A brief history of data science and machine learning,” Triplebyte.com, 2020. [Online]. Available: https://triplebyte.com/blog/brief-history-of-data-science-and-machine-learning. [Accessed: Mar. 28, 2022].
[4] S. Passi and S. J. Jackson, “Trust in data science: Collaboration, translation, and accountability in corporate data science projects,” Proc. ACM Hum.-Comput. Interact., vol. 2, no. CSCW, pp. 1-28, 2018.
[5] L. F. Luna-Reyes, “The search for the data scientist: Creating value from data,” ACM SIGCAS Computers and Society, vol. 47, no. 4, pp. 12-16, 2018.
[6] M. W. Berry, A. Mohamed, and B. W. Yap, Eds., Supervised and Unsupervised Learning for Data Science. Springer Nature, 2019.
[7] I. H. Sarker, A. S. M. Kayes, S. Badsha, H. Alqahtani, P. Watters, and A. Ng, “Cybersecurity data science: An overview from a machine learning perspective,” Journal of Big Data, vol. 7, no. 1, pp. 1-29, 2020.