deutero [reflective/learning capacity]: How are people and organizations denoting and worrying about the phenomena you study?
In 2013, the total universe of digital data amounted to 4.4 zettabytes; by 2020 this figure had grown to 44 zettabytes, a tenfold increase. For many, it is clear that digital data, and its manipulation and analysis through Big Data, are key to the future development of our society and our research. Big Data thus has implications on multiple levels: individual, social, ethical-moral, scientific and economic.
In recent years, a good deal of research has been carried out in the critical study of algorithms and on users' experience and decision-making regarding their data. One example is the work on self-tracking devices. However, I have had more trouble finding literature that focuses on the technicians who manipulate, care for and analyze these data. Likewise, in the social sciences, the discourse of Big Data as an epistemological revolution is gaining ground, despite initial reluctance.
In general, then, the subject is valued and respected, and it attracts the interest of many people. But it is also an uncomfortable subject: it calls into question the Big Data discourse of objectivity, neutrality and precision, and so tends to be ignored.
meta [dominant discourses]: What discourses constitute and circulate around the phenomena you study? Where are there discursive risks and gaps?
Given the magnitude of the phenomenon, it is not surprising to find multiple discourses competing for hegemony in this field. On the one hand, there is the discourse of Big Data as an epistemological revolution, which is linked to thinking of Big Data as a cognitive device. This discourse holds that, in the medium to long term, Big Data will transform our way of thinking.
On the other hand, there is the discourse of Big Data as a product. The claim that Big Data is 'the new oil' is famous, and refers to the multibillion market that exists around it.
The discourse of Big Data as a transformative element of the social order is also relevant. The different imaginaries surrounding it already affect many aspects of our lives, such as the way we socialize on the Internet, the use of self-tracking devices, personal safety, and social and civic responsibility. An interesting example is the self-tracking applications that have emerged in response to COVID-19.
This also leads us to the legalistic discourse on digital personal data and Big Data, which evaluates the ethical-moral character of the use of personal data and revolves around the concepts of property, privacy and consent.
Linked to this are discourses that approach Big Data and the use of digital personal data through the concept of extractivism. They draw attention to the danger of over-surveillance, to the loss of personal freedom, to the general lack of knowledge about how Big Data works, and to the biases of algorithms and, more broadly, of the digital infrastructure of Big Data.
macro [law, political economy]: What laws and economies undergird and shape the phenomena you study?
In the field of Big Data and personal digital data, it is not difficult to observe an extractivist logic. Big Data is a tool that requires many resources: large amounts of data, storage space, large amounts of energy, large numbers of technical workers, etc. Access to the Big Data product, as well as its management, generally falls to powerful actors: governments, big companies, multinationals, etc. This imbalance characterizes the economy of Big Data, in which users produce raw information without knowing or being aware of the mechanics of the process. That process is left in the hands of a few, and its results directly benefit only a few. A widespread lack of knowledge about the operation, location and characteristics of the process is characteristic of this situation, and necessary for it to be maintained.
In this same field, data protection and privacy laws also play a significant role. They tend to treat data as an object owned by an individual person.
meso [organizations]: What organizations are implicated in the phenomena you study? What geopolitics are in play?
Big Data involves organizations of all kinds, among which data circulates and acquires different values and meanings. Having the resources needed to handle and analyze Big Data, or even to access the results of those analyses, places these organizations on a rather tense game board. The main organizations and actors are companies, governments and civil society organizations. All of these organizations vary greatly in size, power and resources.
Among the organizations that see data as a product, we find: companies that own data but lack the resources to do anything with it; companies that offer data maintenance and analysis services; companies that have the resources both to own data and to analyze it; and companies that sell their data to third parties. Most governments and administrations also use their data to design new public policies, analyze economic distribution, or design surveillance systems, for example. There are also research groups that use Big Data tools to store, maintain, care for and analyze their research data.
There are also organizations whose objective is not to use data but to protect it. Here, the role of national and supranational governments in determining what is legal or illegal in the use and acquisition of data should be taken into account. An example of this is the European Union and its GDPR. We should also keep in mind the (mostly local) organizations of digital activists who fight against the extractivist logics of Big Data and defend personal integrity against the abuse of data mining.
There are also organizations that think of Big Data not as a product but as a tool for solving social and environmental problems, such as ‘Data for Good’ or ‘So Good Data’.
To all this we could add certain tensions that have recently appeared in the media concerning the use and abuse of Big Data by large companies. A clear example is the case of Cambridge Analytica; another is the efforts of the United States to keep some companies of Chinese origin out of the country.
bio [bodies]: What are the bodily effects of the phenomena you study?
According to multiple works, this phenomenon is having several consequences for our bodies and the way we live. The most prevalent example is that of self-tracking devices, which have altered a good part of our habits and the way we attend to bodily patterns and vital signs. For example, the appearance of smartwatches has been a determining factor in our idea of 'a healthy life' and has altered our perception of how our body functions, by allowing it to be visualized in graphs and by allowing data to be accumulated, compared and analyzed alongside other data. It also allows us to share, and make a social event of, habits that previously went unnoticed or were part of our private life, such as ovulation cycles, sleep patterns or diets.
It is also a phenomenon that is clearly affecting the medical and research fields, allowing correlations to be established more easily.
The COVID-19 apps are an example of both points.
micro [practices]: What (labor, reproductive, communicative) practices constitute and are animated by the phenomena you study?
This is a rather large question, and difficult to answer in a short space. The phenomenon of Big Data is large enough to be constituted by, and to animate, very diverse practices. I will therefore use this space to emphasize the practices that seem most relevant to my research, which is empirically located in a data laboratory of an international company (Telefonica), the first in Europe and the fifth worldwide.
The manipulation of digital data by technicians involves a series of work practices (they are salaried workers in a company), but it also allows us to talk about care practices. According to some authors, a large percentage of data scientists' working time is spent on cleaning, storing and sorting data. That means making sure that the data have the characteristics needed to be grouped in a table and later cross-referenced for analysis (i.e. that there are no erroneous or false data), but also making sure that, with use, the data remain useful and functional, that they continue to be accessible, that they can be repurposed, etc. These tasks show that data enter into cycles of repair, decay, construction, reconstruction and growth.
As Pink et al. (2018) put it, “[a] focus on how data is made and broken highlights the collaboration between people and infrastructures: people’s lives and work become entangled with data production (Berson, 2015). Such entanglements reveal themselves when we familiarize ourselves with data worlds. This familiarization can involve collaborations with the custodians of data, ‘geeks and quants’ (Bell, 2015: 25), or following how data becomes appropriated into the everyday, is valued and converted into forms of value (Fiore-Gartland and Neff, 2015; Ruckenstein, 2014).”
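To make the cleaning and sorting tasks described in this section more concrete, the following is a minimal sketch in Python, assuming two small tables that a technician wants to clean and then cross-reference. Everything in it is hypothetical and invented for illustration: the records, the column names (`user_id`, `minutes`, `plan`) and the function names do not come from the laboratory I study. The point is only to show how many small decisions (what counts as an erroneous value, what counts as a duplicate) are built into even a trivial cleaning step.

```python
from typing import Optional

# Hypothetical raw records, invented for illustration only.
RAW_USAGE = [
    {"user_id": "u1", "minutes": "42"},
    {"user_id": "u2", "minutes": ""},      # missing value
    {"user_id": "u3", "minutes": "-5"},    # impossible value
    {"user_id": "u1", "minutes": "42"},    # exact duplicate
    {"user_id": " U4 ", "minutes": "17"},  # inconsistent formatting
]
RAW_PLANS = [
    {"user_id": "u1", "plan": "basic"},
    {"user_id": "u4", "plan": "premium"},
]


def clean_usage_row(row: dict) -> Optional[dict]:
    """Return a normalised row, or None if the row is judged unusable."""
    user_id = row["user_id"].strip().lower()
    try:
        minutes = int(row["minutes"])
    except ValueError:
        return None  # missing or non-numeric value
    if not user_id or minutes < 0:
        return None  # rejecting negatives is a decision, not a given
    return {"user_id": user_id, "minutes": minutes}


def clean_usage(rows: list) -> list:
    """Clean, de-duplicate and sort the usage rows by user_id."""
    seen = set()
    cleaned = []
    for row in rows:
        result = clean_usage_row(row)
        if result is None:
            continue
        key = (result["user_id"], result["minutes"])
        if key in seen:
            continue  # drop exact duplicates
        seen.add(key)
        cleaned.append(result)
    return sorted(cleaned, key=lambda r: r["user_id"])


def cross_tables(usage: list, plans: list) -> list:
    """Cross-reference cleaned usage rows with plan rows on user_id."""
    plan_by_user = {p["user_id"].strip().lower(): p["plan"] for p in plans}
    return [
        {**u, "plan": plan_by_user[u["user_id"]]}
        for u in usage
        if u["user_id"] in plan_by_user
    ]


usage_table = clean_usage(RAW_USAGE)
joined_table = cross_tables(usage_table, RAW_PLANS)
print(joined_table)
```

In this sketch, three of the five raw rows never reach the analysis. Which rows are discarded depends entirely on rules written by whoever does the cleaning.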
nano [language, subjectivity]: What kinds of subjects are produced by and imbricated in the phenomena you study?
The subjects of this phenomenon are, broadly speaking: users (data producers), digital activists, NGOs, technicians (workforce) and owners (of data, means of production and final product). But all of these categories are quite broad, and we should keep in mind the diversity and heterogeneity of these subjects.
When we talk about users, we mean the general public who use or access digital platforms and, by doing so, produce digital data. It is worth spending a line or two on how the legalistic discourse has produced an idea of this public as separate individuals who own their data. This also frames data as something that can be owned, and as something private and individual. This, I believe, is one of the keys to thinking about data. Some authors and activists have criticized as naïve this way of thinking of data as something intrapersonal, when most of our lives and experiences on digital platforms consist of interpersonal relationships.
I would also like to point out the invisibilization of most of the technicians' labor, which I believe can fairly be described as blackboxed. As much as we have an algorithmic imaginary and are able to talk freely about algorithms, technicians tend to be ignored. Their role in categorizing, organizing, storing and cleaning our data is thus ignored as well.
edxo [education and expertise]: What modes of expertise and education are imbricated in the phenomena you study?
I believe it would not be wrong to say that Big Data is quite an interdisciplinary phenomenon, even though most imaginaries of Big Data, and media discourse, tend to link it to hardcore computation, coding and mathematics/statistics. It has been stated that, because of the interdisciplinarity that characterizes Big Data, it is quite difficult to train or educate experts in the field.
This kind of thinking is tied to the idea that Big Data produces the most objective, neutral and precise information (as has been claimed of other new methods before, only for us to learn that no such thing exists), and it has to do with an emphasis on method. The whole technical process of dealing with digital data is full of decision-making, even when it seems that it is not. Categorization, cleaning, algorithm design, algorithm evaluation, data visualization and analysis, to name a few, are complex processes that are fully imbricated with conceptions and preconceptions of what the world and society are and how they work. That is one of the main reasons why the social sciences are highly relevant in this field.
data [data infrastructure]: What data, infrastructure, analytic and visualization capabilities account for and animate the phenomena you study?
I work primarily with qualitative observation and analysis, but I am currently learning some programming languages, such as SQL and Python. Learning these programming languages allows me to better understand technicians and their work, as well as the logic behind their labor. I will probably have to work with data sets, documents and data visualizations produced in the lab. The group I want to study is also quite active on social media, so I may need to do virtual ethnography.
techno [roads, transport]: What technical conditions produce and delimit the phenomena you study?
As far as I know, the work of Big Data technicians is quite an individual job that is always done in the presence of one or more computers. Data is also usually located in the cloud. So roads and transport do not really limit Big Data, which can be considered one of its strengths.
But precisely because it needs powerful computers and supercomputers, internet usage, cloud access and storage, the interconnection of thousands of devices, etc., energy consumption definitely does delimit and affect its existence. Data is captured, stored and analyzed digitally, which usually means that it needs one or more servers that are not necessarily located next to the lab or the company. All this colossal infrastructure consumes a lot of energy and resources. Big Data itself is a product of the development of computational capacities, which have been increasingly energy-consuming.
Also, because data is stored and accessed digitally, cybersecurity is one of the main concerns that limit this phenomenon.
eco-atmo [ecology, climate]: What ecological and climatic conditions situate the phenomena you study?
I am not aware of any ecological or climatic conditions that situate the phenomenon of Big Data other than those stated in the answer to the next question.
geo [earth systems]: What geological formations, contaminations, resources and scarcities ground the phenomena?
Big Data has a material infrastructure that is usually overlooked. It has been shown that this material infrastructure pollutes. As Pierson and Lefevre (2015) point out: “terminal equipment, networks, and data centers […] each consume similar amounts of electrical power, on the order of 40 gigawatts in 2013, which is equivalent of about forty nuclear units. This figure obviously has repercussions on the climate, even if this carbon footprint depends on the energy mix of the user country (34 grams of CO2 per kwh in France in February).”
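To give a sense of scale, the figures in this quotation can be turned into a rough back-of-the-envelope calculation. The sketch below is only illustrative and rests on assumptions that are mine, not the authors': that the 40 gigawatts are drawn continuously over a non-leap year, and that all of that electricity has the carbon intensity quoted for France (34 grams of CO2 per kWh), which is a low figure and not representative of other energy mixes. The function names are hypothetical.

```python
HOURS_PER_YEAR = 24 * 365  # assumption: non-leap year, continuous draw


def annual_energy_twh(power_gw: float) -> float:
    """Energy used in one year by a constant power draw, in terawatt-hours."""
    gwh = power_gw * HOURS_PER_YEAR
    return gwh / 1000  # 1 TWh = 1000 GWh


def annual_co2_megatonnes(power_gw: float, grams_co2_per_kwh: float) -> float:
    """CO2 emitted in one year, in megatonnes, for a given carbon intensity."""
    kwh = power_gw * 1e6 * HOURS_PER_YEAR  # 1 GW = 1e6 kW
    grams = kwh * grams_co2_per_kwh
    return grams / 1e12  # 1 Mt = 1e12 g


# Figures taken from the quotation: 40 GW per component, 34 g CO2 per kWh.
energy = annual_energy_twh(40)
co2 = annual_co2_megatonnes(40, 34)
print(f"{energy:.1f} TWh per year, {co2:.1f} Mt CO2 per year")
```

Under these assumptions, each of the three components (terminal equipment, networks, data centers) would use roughly 350 TWh per year and emit about 12 megatonnes of CO2. With a more carbon-intensive energy mix the emissions would be proportionally higher, which is the dependence the authors point to.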