A glitched-out bird overlaid on a chaotic field of plain text arrows and symbols. Scattered phrases such as 'of a cybernetic ecology' emerge from the background.

Image via AC Gillette. Poem fragment from Richard Brautigan's "All Watched Over by Machines of Loving Grace." "Emerald toucanet" by wuestengia (CC BY 2.0)

You Are A Strange Animal

A Study in Intimate Datasets

AC Gillette

I don’t remember how I found the forum. Perhaps I had been directed from one of my usual haunts at the time, referred by another user who shared my interest. The internet of 2007 still felt like a serendipitous place, where you could stumble onto mysterious corners. These kinds of spaces were still popular in the days before ubiquitous social media. Back then, communities seemed to emerge like constellations in the spacious inky void of the net.

Like many forums, this one centered on a niche topic: Philip Pullman’s fantasy trilogy His Dark Materials. The plot featured a world in which humans were born with their soul split from their body and manifested in the form of an animal companion called a daemon. In childhood, a daemon would shift at will into many different forms, until settling into a fixed animal representation of the person’s true self. The focus of the forum was finding your daemon form, and by proxy, your true inner self.

No surprise that those active on the forum were those eager to plumb the depths of the self and translate it into animal form. Many on the forum were like myself: shy and awkward teenagers, who were queer and neurodivergent in some way. Years before I came into my own identity and before it would enter mainstream consciousness, I was first exposed to trans identities, polyamorous relationship structures, and neurodivergent identities such as DID (Dissociative Identity Disorder). An at times painfully strange outsider in my own life, I found for the first time a community I could immerse myself in, one where I could be myself openly. You can trace my adolescence through the animal forms I embodied during my active years on the forum.

It has a curious shy nature.

When I first joined I was an octopus, soon changing and settling into a fuzzy-faced red panda. As my teenage years became more turbulent, in both my interior and exterior life, I began to favor animals small and hidden: a moth, an anole, a blue-tongued skink, a praying mantis, and an underground blind salamander. By the time I was finishing high school I had moved on to birds, a whooping crane transitioning finally into an emerald toucanet.

Much of the discourse revolved around animal forms, with a public subforum completely devoted to mapping animal behaviors to human personality traits. Here forum members would request or post their own analysis of a specific species of animal, from animals as well known as the coyote to ones as obscure as the blue dragon sea slug. Discussions ranged from whether it was possible for a daemon to settle as a mythical animal or a human (common consensus was no, though there were fringe arguments for it) to mapping Myers-Briggs types or Hogwarts houses onto specific animals. Some of the more avid members developed analysis templates and standardized ways of interpreting animals. My contributions included an analysis of different cetaceans, as well as a two-part general analysis of bugs called "So you think you’re a bug soul."

It’s the kind that only gets so very close to you when you’re completely fused to it.

What resulted was a corpus of information uniquely synthesized for our needs. Built and tended to by the community, it held both accumulated knowledge and a fingerprint of the community itself as it grew and evolved. It’s heterogeneous and diverse, encompassing differing interpretations, tones, emojis, and formatting. It is what I think happens when people tend to data like they would a shared garden.

It’s also what I think of as an intimate dataset: a set of data that is intensely personal and specific to a community need. It’s data that is grown from its own unique conditions on the internet, and that is handled with care and consideration by the people cultivating and using it. Like the physical world, it is messy, dynamic, and adaptive, resistant to the decontextualization that so often happens on the internet.

It’s very much like your real world but more often than not you find yourself wandering around and exploring it.

Social media brought with it the rise of people not only as users, but as products themselves. Data is harvested and monetized, reduced to atomized labels with which to target us more efficiently with ads. The process by which this happens is intentionally opaque. We are alienated from our own data by design. The consequence of this can be seen in the atomization of community within the walled gardens of the internet. Influenced by algorithms designed to maximize attention for profit, we have become disempowered to make sense of or influence our own online identity and the content we see. People become drawn into wells of misinformation, controversy, and homogeneity, all so that our information is easier to sell and products are easier to sell to us. The form of the centralized, privately owned internet community space is dependent on our inability to steward and cultivate how our information is stored and used.

It’s very hard to keep in touch with its strange ways of thinking.

Over a decade after I had first joined, I came back to the forum with the intention of archiving the collected animal analyses. To my delight it was still active. A new community was now tending to the small corner of the internet I had called home for those formative years. New ways of writing about animals had unfolded, new standards. It was, and still is, a living dataset, being made and remade every moment.

Sifting through the scraped text, I could see how changes in the forum were being reflected in changes in the material itself. There’s a transient quality often lacking in conventional datasets that demand homogeneity. The perspective shifted over time from second person to third. Trends shifted from intuitive snippets and insights to more rigorous and community-backed interpretations. The xD and :3 emoticons so common on the 2000s internet gave way to decorative text headers and embedded images.

At the same time I had also decided to participate in National Novel Generation Month, an event where people write code that generates 50,000 words of text. Interested in exploring the poetic potential of this data, I fed it into GPT-2, a text generation model. The output reflected the emerging plurality of the source. Animals would shift from you, to I, to their own type of people, speaking in the collective we. The line between animals and humans seemed to blur, with animals becoming humans and humans taking on animal aspects.

It’s big and bold.

The landscape of machine learning is rife with vast, impersonal, decontextualized datasets. Little consideration is given to the original sourcing or context of the data used. GPT-2, the latest iteration of a text generation machine learning model, was trained on eight million text documents scraped from the internet. Uprooted from its original context, anonymized and sanitized and rendered opaque, the data loses its sense of history and authorship. The only mention of human labor in the paper on COCO (Common Objects in Context), a dataset commonly used to train computer vision models, was that it made heavy use of Amazon Mechanical Turk for the job of categorizing images. Behind every machine learning model is the unrecognized and unpaid or underpaid labor of real humans.

In New Mexico, where I went to college, the ubiquitous chile pepper so beloved across the state is uniquely synthesized for the specific conditions in which it’s grown. Its taste and cultivation cannot be separated from the high desert soil in which it is grown. Take Hatch chile seeds outside of New Mexico and they transform into Anaheim peppers. What would it mean to cultivate data in this way, to recognize that to take it out of its source is to change it? How can we find, grow, and cultivate more intimate datasets, ones that are personal, just, and mindful of the context from which they came?

It’s like a rock being dropped into the water.

Subheaders were generated using curated output from a GPT-2 model.

A thank you to the users who contributed to this dataset.

A cartoon praying mantis dancing in front of a laptop.

AC is a technologist, writer, artist, and peer counselor currently residing in the Bay Area. A non-exhaustive list of things they are interested in includes small scale technology, radical mental health, mycology, queer ecology, textiles, and dreaming up new futures.