From Clouds to Ground: Rethinking Data for AI Through Planetary Justice

Presented at “Data for Public Goals: The Social Impact of Data-Driven Approaches” | Osservatori Digital Innovation, Politecnico di Milano, June 9, 2025

As we continue to expand the use of data to inform public policy and services, it becomes increasingly important to ask not just how we use data, but for whom and at what cost. At the recent Data for Public Goals event organized by the Politecnico di Milano’s Osservatori Digital Innovation, we had the opportunity to share a different perspective–one that challenges dominant narratives of data neutrality and technological progress.

Representing the AI + Planetary Justice Alliance, I introduced a framework for evaluating AI systems across their entire supply chain—from raw material extraction to end-of-life disposal. Our goal with the Framework is to make visible the often-overlooked social, ecological, and planetary justice impacts embedded in the life cycle of AI systems.

Artificial Intelligence is frequently framed as ethereal–“in the cloud,” detached from territory, matter, and life. Yet the systems we refer to as AI are profoundly material. They depend on global infrastructures made of mines, machines, energy flows, human labor, and political decisions. The reality of AI is not virtual; it is rooted in lands, bodies, water, ecosystems, minerals, and energy.

Behind every seemingly neutral model lies a chain of extraction, production, and invisibilized labor: the mining of rare earth minerals, the manufacturing of microchips and servers, the harvesting of massive datasets, the annotation and moderation of content by precarious workers, and finally, the deployment and disposal of digital systems. This chain is not metaphorical. It leaves traces–geological, ecological, social–and its consequences are unequally distributed.

At the Data for Public Goals event, I invited the audience to focus on this chain–the AI supply chain–and to understand it not as a backdrop but as central to any conversation about the responsible use of data. I concentrated on two key phases: model training and model deployment. These are the moments when data is extracted, processed, and made operational, often with little regard for the impacts on those who generate, host, or are subjected to it.

Training the Model: Extraction in Disguise

To train AI models, vast datasets are compiled–often comprising billions of elements such as images, text, audio, and video. Most of this content is scraped from online platforms, without the knowledge or consent of those who created it. Accessibility is conflated with permission, and technical capacity is used to justify ethical silence.

A striking example is LAION-5B, one of the largest openly released datasets of image–caption pairs, used to train generative models such as Stable Diffusion. Despite being described as a resource for researchers, LAION-5B was assembled through automated web scraping and included links to private photos, medical records, copyrighted artworks, and images of minors. Its existence reveals the porous line between what is publicly visible and accessible and what is ethically and justly usable. It also highlights a deeper issue: decisions about what constitutes legitimate data are rarely made democratically. They reflect existing hierarchies–those who can object, and those who cannot.

Moreover, the process of preparing these datasets depends on human labor that is largely hidden. Before data can be used, it must be cleaned, labeled, translated, and sorted–a process known as data annotation. This labor is frequently outsourced to workers in the Global South through gig platforms such as Amazon Mechanical Turk, Appen, or Scale AI. Tasks include tagging emotions in speech, identifying inappropriate images, and moderating violent content.

In 2023, Time Magazine uncovered that OpenAI outsourced content moderation to a subcontractor in Kenya. Workers were exposed to horrific material–scenes of sexual violence, racism, and torture–while earning less than two dollars an hour. Many suffered long-term psychological effects. Yet their contributions are nowhere visible in the end products they helped build.

This is what some call data work–a form of invisible labor embedded in AI systems, sanitized away by polished interfaces and product narratives. It is a mode of cognitive and affective extractivism, where human judgment and emotional resilience are commodified under conditions of exploitation, with no recognition, redistribution, or repair. The very systems we label as “intelligent” are scaffolded on the labor of those excluded from the benefits they produce.

The Environmental Weight of Intelligence

It is not only human bodies that are exploited. The training and deployment of AI models also demand large-scale infrastructures that consume significant amounts of energy and water.

Studies such as Strubell et al. (2019) have shown that training a single large NLP model can emit as much carbon dioxide as five average cars over their entire lifecycles, manufacturing included. And that estimate is now outdated–contemporary models are orders of magnitude larger and more demanding. At the same time, the cooling of data centers–critical to prevent overheating–relies heavily on water. Google, for example, plans a data center in Mesa, Arizona that could consume up to 1 million gallons per day, with potential capacity to use as much as 4 million gallons daily if fully built out. Local residents and officials raised serious concerns about the project’s environmental impact and long-term water use, with some city council members and community leaders criticizing the lack of transparency and consultation. Mesa’s Vice Mayor, Jenn Duff, warned during council discussions that such high levels of water consumption could jeopardize the region's future: "I have very serious concerns about our water in Arizona," she said, as reported by Data Centre Dynamics.

Similarly, in Pima County, Arizona, a new AI-focused data center–referred to as “Project Blue”–was approved in June 2025. The center is expected to consume between 1 and 5 million gallons of water per day. The project sparked strong local opposition, with District 3 Supervisor Jennifer Allen voting against the land deal, arguing that the decision failed to prioritize the well-being of future generations in an already water-stressed desert region.
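To make these volumes more tangible, here is a rough back-of-envelope sketch. It takes the 1 million gallons per day cited for the Mesa facility as its input; the figure of roughly 300 gallons per day for an average US household is an illustrative assumption, not a number drawn from the reporting above.

```python
# Back-of-envelope sketch: putting "1 million gallons per day" in context.
# The household figure below is an illustrative assumption, not a number
# taken from the reporting cited in this article.

GALLONS_PER_DAY = 1_000_000          # lower-bound figure cited for the Mesa facility
LITERS_PER_GALLON = 3.785            # one US gallon in liters
HOUSEHOLD_GALLONS_PER_DAY = 300      # assumed average US household use (illustrative)

liters_per_day = GALLONS_PER_DAY * LITERS_PER_GALLON                  # ~3.8 million liters/day
households_equivalent = GALLONS_PER_DAY / HOUSEHOLD_GALLONS_PER_DAY   # ~3,300 households
gallons_per_year = GALLONS_PER_DAY * 365                              # ~365 million gallons/year

print(f"~{liters_per_day:,.0f} liters per day")
print(f"~{households_equivalent:,.0f} households' worth of daily water use")
print(f"~{gallons_per_year:,} gallons per year")
```

At the upper figures cited above–4 to 5 million gallons per day–the same arithmetic yields well over ten thousand households' worth of daily use, drawn in an already water-stressed desert region.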

This lack of transparency is widespread. Bloomberg recently estimated that two-thirds of US data centers are located in areas experiencing high water stress. Yet major tech companies frequently withhold data on water consumption under the guise of commercial confidentiality. These are not neutral siting decisions–they reflect and reinforce asymmetries of power, where environmental burdens are shifted onto territories with limited voice or visibility in global governance.

The ecological footprint of AI is not an externality–rather, it seems to be constitutive of its operation. Every byte of data processed implies land use, electricity, mining, and water withdrawal. To talk about “data for public goals” without addressing this chain is to ignore the planetary infrastructures that make AI possible, and consequently the material implications of its development and deployment.

Deployment: Governance Without Consent

Once trained, AI systems are integrated into decision-making processes that shape everyday life. From credit scores and welfare eligibility to predictive policing and immigration controls, algorithms are being used to determine access, risk, and legitimacy. This integration is rarely accompanied by transparency, and even less by accountability.

In the U.S., the COMPAS system used to assess recidivism risk was found to overestimate risk for Black defendants and underestimate it for white ones. These biases were not glitches, but reflections of historical data encoded as neutral facts. In the Netherlands, the SyRI system flagged individuals for potential welfare fraud by correlating administrative data. It operated with little transparency until a 2020 court ruling found it unlawful, citing violations of fundamental rights–notably the right to private life–and its disproportionate impact on marginalized communities.

These cases are not anomalies. They reveal a structural pattern in how AI is deployed–without community involvement, often in contexts of vulnerability, and with little possibility of contestation or resistance. The logic of automation is presented as efficient, objective, and inevitable. Yet it restructures the terms of belonging and recognition, redistributing not only risk, but also visibility and voice.

The right to understand how automated systems make decisions, and the right to refuse their application in sensitive domains, are foundational to any democratic governance of AI. Their absence is not merely a technical oversight–it is a reconfiguration of power.

A Planetary Justice Approach

To truly address the social and ecological implications of AI, we need frameworks that move beyond technical or ethical lenses. Planetary justice offers such a frame. Emerging from environmental justice debates, it expands our ethical horizon to include not only human rights, but also the rights of ecosystems and more-than-human life.

It also allows us to connect domains often treated separately: labor exploitation, data extraction, carbon emissions, and knowledge hierarchies. These are not discrete problems: they are interdependent outcomes of a model of innovation that prioritizes scale, speed, and prediction–often at the expense of care, reciprocity, and sustainability.

Planetarity, as theorized by Gayatri Spivak, also cautions against universalizing technologies. It reminds us that there is no single vision of intelligence or progress. Data is never raw–it is embedded in worldviews, structured by power, and shaped by histories of inclusion and erasure.

Applying this lens to AI means asking different questions: What relationships are enabled–or severed–by datafication? Whose knowledges are centered, and whose are ignored? What forms of life become legible, and what forms of resistance remain illegible?

Rethinking What Counts as Impact

If we are serious about developing AI systems that serve the public good, we may want to interrogate not just how they perform, but what kind of world they help (re)produce. That means:

  • Recognizing data as relational and often collective, rather than merely extractable. This calls for forms of governance that include community deliberation, collective consent, the right to negotiate the terms of data use, and the freedom to question the need for AI in the first place.

  • Evaluating AI not only through accuracy and speed, but also through its labor, environmental, and epistemic costs. We need impact assessments that span the entire lifecycle–from raw material extraction to device disposal. A minimal, purely illustrative sketch of what such a checklist might look like follows this list.

  • Questioning the drive toward automation as a default means of achieving “public goals”. It is worth asking: who decides what “public goals” are? Whose goals are they, in practice? What’s the price of achieving them, and who pays that price? Sometimes, AI might not be the best means we have at our disposal. Sometimes, AI might not be the point.
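As a purely illustrative sketch–not the Alliance’s published framework–the structure below shows one way a lifecycle impact checklist could be organized: supply-chain stages crossed with the impact dimensions discussed above. Every name and category here is a hypothetical placeholder.

```python
# Hypothetical sketch of a lifecycle impact checklist for an AI system.
# Stages and dimensions are illustrative placeholders, not the Alliance's
# published framework.

from dataclasses import dataclass, field

STAGES = [
    "raw material extraction",
    "hardware manufacturing",
    "data collection",
    "data annotation and moderation",
    "model training",
    "deployment and inference",
    "end-of-life disposal",
]

DIMENSIONS = ["labor", "environmental", "epistemic", "governance"]

@dataclass
class ImpactEntry:
    stage: str
    dimension: str
    questions: list = field(default_factory=list)  # open questions to document
    evidence: list = field(default_factory=list)   # sources, measurements, testimony

def blank_assessment():
    """Return an empty checklist covering every stage/dimension pair."""
    return [ImpactEntry(stage=s, dimension=d) for s in STAGES for d in DIMENSIONS]

# Example: record a question about water use during model training.
assessment = blank_assessment()
for entry in assessment:
    if entry.stage == "model training" and entry.dimension == "environmental":
        entry.questions.append(
            "How much water does the training facility withdraw per day, "
            "and who else depends on that source?"
        )
```

The point of such a structure is not the code itself but the discipline it encodes: no stage of the supply chain, and no dimension of impact, is allowed to drop silently out of view.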

These are provocations–deliberately unsettling to the dominant narratives that portray AI as inevitable, neutral, and universally beneficial. They are intended to interrupt the comfort of “responsible” or “ethical” AI discourse, which too often asks how to reduce harm within predefined parameters, rather than questioning the parameters themselves.

Too much of the current conversation remains trapped in the vocabulary of optimization: how can we make AI more accurate, less biased, more explainable, more energy efficient? These are not unimportant questions–but they presume that the ultimate goal of AI is already settled. That it is inherently good. That its expansion is non-negotiable.

What if, instead, we opened the conversation to a broader, more radical horizon?

Rather than asking how to make AI better, we may want to ask:

What kind of society do we want AI to enable, constrain, or co-create?

What forms of life, knowledge, and relation do we want to prioritize?

Whose voices should shape these systems–not just as “users” or “stakeholders,” but as political agents with the right to refuse, to imagine alternatives, to set conditions? To decide what they want and need?

This shift–from betterment to world-making–is uncomfortable because it challenges the foundational assumptions of current AI development. It demands that we look beyond the machine to the social and ecological orders in which it is embedded. It asks us to confront the deep asymmetries–of voice, of power, of resources–that AI technologies currently reproduce and intensify.

To embrace this discomfort is not to reject technology, but to reclaim the political. It is to insist that AI, like any sociotechnical system, is not above contestation. It is not a destiny–it is a choice. And that choice must be grounded not in abstract ideals, but in concrete, situated struggles for justice–planetary, epistemic, ecological, and social.

