Dr. Anne Luther spoke with Professor Yanni Loukissas by phone to discuss his research focus on critical data studies and local readings of data collections. Yanni Loukissas is an assistant professor of digital media in the School of Literature, Media, and Communication at Georgia Tech, where he directs the Local Data Design Lab. He teaches courses in Digital Media, Computational Media, Human-Computer Interaction, and Science, Technology, and Society.
Anne Luther: I would like to talk to you about your research focus and the Local Data Design Lab. What is the lab, and do you have a current research project you could introduce?
Yanni Loukissas: The term ‘lab’ has to be understood kind of broadly. It is basically an umbrella under which I can work with both Masters and PhD students, although I’ve occasionally had some undergrads tied to projects. The lab is, as you said, part of the School of Literature, Media, and Communication, which is a place for the humanities and social sciences at Georgia Tech, but one that treats media design as an integral part of scholarship and inquiry. My group is interested in what might be called humanistic and social perspectives on data. We often pursue this work through design, including data visualizations. But we treat data not simply as evidence that can help us objectively understand some external subject, but as subjects of inquiry themselves. Why are data created in certain ways by one organization and in different ways by another? Why have procedures for generating and managing data changed over time? When used critically, visualization allows us to turn our view back onto the data themselves. We often work in dialogue with the people who make data—whether at an institution like the Arnold Arboretum, where staff make and manage data about plant life, or within grassroots organizations like the Housing Justice League, where data are being used to contest inequitable development in the city of Atlanta. I see these as not only design projects but also data studies projects. At the end of the day, they seek to reveal something about data cultures and how they are changing.
Our work is done in the shadow of big data. I use that term to mean not just data that meet certain technical requirements in terms of their magnitude, but a whole rhetoric and culture emerging around data sets that are larger than those we’ve seen in the past—data sets that present new problems and new challenges for those who seek to understand them. Many of the data sets we work with are big, but only because they bring together smaller data sets from lots of different sources. In our work with metadata from the Digital Public Library of America and housing data from Zillow.com, the actual sources are distributed across the country. In the case of the arboretum, the data are drawn together from across time, stretching back almost 150 years. And really, even though it is one organization, there have been many different regimes of data collection over that time period. The heterogeneity of these data sets allows us to develop comparative perspectives on data and the cultures that produce them.
A.L.: I’m really fascinated by that, using the mixed-method approach in these research projects to integrate a qualitative research approach into researching big data and the visualization of quantitative data. In these projects, the boundary between those two research approaches seems to disappear completely. I would like to talk more about the idea of visualizing data from a social perspective. How can we bring an understanding of data as socially constructed to the field of data analysis, and how can we make data visual from a social perspective?
Y.L.: I think there are two ways we can advance a social perspective on data. One is to simply change the way we talk about data. Recently, I have been thinking a lot about the phrase “data set.” The word “set” suggests a closed, discrete, mobile collection of things that are independent of their place of origin or the particular instruments that we use to manage them. This strikes me as a problematic way of talking about data. Instead of data sets, we might talk in terms of “data settings.” After all, data are local. They are made by people and their dutiful machines at a time and in a place, using specific kinds of instruments, within existing organizations, and often for audiences that are conditioned to receive them. Data are very much grounded, and we need to understand their context. Talking about data in terms of their settings suggests that data are part of larger social and technical systems of people, instruments, and audiences. It would help us acknowledge that if we are moving data from one place to another, some kind of translation work will be necessary. We also have to think about what has to come along with the data. Do certain experts need to travel with the data? Do we need certain kinds of instruments to understand those data? These days, everything you need is too often just assumed to fit in a README file. People who work with data on a regular basis know that this is not really good enough. Making data visual from a social perspective means being in dialogue with those who live with the data and can act as guides.
The other thing we can do is try to develop visualization methods that acknowledge data settings, particularly when data are taken out of context. Often this means broadening the work of visualization to include other media, to include other kinds of annotations or textual accompaniments that put data back into a broader context. And often I think about other artistic genres that do this already. Documentary film, I think, is a nice model of an interpretive and reflexive way of handling evidence, which is essentially what data are. Contemporary documentaries consider the role of the filmmaker, and how she might be part of the film. Michael Moore, for instance, makes his documentaries all about the process, revealing the obstacles, frustrations and failures along the way. Another interesting genre that can offer something to data visualization is the biography. Catherine D’Ignazio introduced me to the term “data biography.” What is the social life of a data set? Who made it, and who is making use of it? When we make visualizations, how do we think of that as the story of the data setting as opposed to just “letting the data speak for themselves,” which is a phrase I often hear from technical people working in visualization.
A.L.: I’m thinking about your background in architecture, and I read that place is a topic that is important to you. I’m curious whether an outsider perspective on traditional computer science, design, or other technical disciplines informs your curiosity for that cultural, social, and ethnographic research.
Y.L.: I try to make my research deliberately interdisciplinary, even though I think disciplines themselves are important and need boundaries. They have significant histories grounded in certain discourses, instruments, and places. But disciplines can talk to one another. You can move through them and combine them in different ways as long as you acknowledge their respective histories. I certainly bring certain practices and values intact from architecture. For example, I approach drawing as a way of thinking, rather than simply as a means of externalization or communication. When I use visualization it’s not just a presentation technique, but a way of thinking through data. I also bring from architecture a concern for place, of course, and a sense of what it means to design for a local context. Most buildings are one-off designs that are tailored or customized for a particular location, time, and set of material resources. I am not necessarily interested in generalized tools. I am much more interested in ad hoc approaches and improvisational tools. I’m also often drawn to data sets that have an important geographic or space-related dimension that can help me think about space and place in new ways.
The other major field that contributed to my own sensibility is STS (science, technology, and society), which I studied during my time at MIT. That field is itself interdisciplinary: composed of anthropologists, sociologists, historians and increasingly artists and designers who are interested in the social and political implications of science and technology. My ethnographic interest in data comes from working in that field and training with people like Sherry Turkle and David Mindell. They have both helped me to ground my understanding of data in human life worlds. In that sense, I think of my visualization work as a means of doing social studies of data.
A.L.: Have you ever encountered disciplinary boundaries where you thought that this kind of interdisciplinary approach cannot bridge different disciplines?
Y.L.: Yes, this happens all the time. I’m constantly encountering new disciplinary languages and opaque ways of working that I have to grapple with. I see that both on the technical side, when I work with people in information visualization, and on the social science and humanities side, where people may have different expectations for what scholarship looks like. They may be working from a different lexicon or set of references. My approach may seem strange to them. But these differences are productive; if we didn’t have differences, we wouldn’t have anything to discuss. I think that preserving disciplinary distinctions is a means of acknowledging and protecting important epistemological histories and trajectories. If we erased all disciplinary boundaries, the trendiest and most business-friendly disciplines would take over the university the next day, because they have the most money, students, and space. The arts, humanities and social sciences are historically important ways of thinking and doing that need to be protected from market forces. They provide counter-narratives necessary to keep society from being corrupted by the profit motive.
A.L.: It kind of goes back to what you were saying before, because working with specific methodologies or with specific software already tells you something about the discipline or the locality where this research is coming from. Therefore, even having these disciplinary boundaries is a first step toward thinking about the research projects or where the questions are coming from. One project of yours that really fascinates me is The Life and Death of Data. Can you speak about the data visualization and the multi-format exploration that you developed online? How would you describe this format, and why did you choose it?
Y.L.: The visualization (http://lifeanddeathofdata.org) is made using accessions data from the Arnold Arboretum, Harvard’s “living collection” of trees, vines, and shrubs, which dates back more than 140 years. These plant records include details such as Latin and common names, dates of accession to the collection, and provenance or sources of origin. All this is incorporated into a timeline, with the y-axis representing the year of collection and the x-axis representing the days of the year. Each line is comparable to a tree ring in that it registers what happened at the Arboretum in the course of a year of the institution’s life. I wanted to create an institutional portrait, which would let you read the history of the arboretum year by year. As you scroll over the timeline, you get these interactive section cuts through different years. Each cut tells the story of that year and what was collected. The text adds context about why and how the staff were collecting specific kinds of plants and how their collecting practices changed over time. It points to specific places in the history of accession. It also connects directly to the visualization at several moments, so as you scroll through the text the visualization changes to correspond with what the text is describing. Sometimes the text illuminates important fields in the data, like provenance type, which indicates how the plant was collected: either from the wild, from a nursery, or if the provenance is unknown, why that is the case. Other times, the text calls for the reader to look at data by individual collectors, by species, or by country of origin. There are a lot of different ways to take the data set apart and the text helps you do that. It also makes an argument for how we might use this kind of institutional portraiture to study collections in new ways, specifically using digital technologies.
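[Editor's note: The timeline layout described above, with one row per year and position along the row given by day of the year, might be sketched roughly as follows. This is only an illustration, not the project's actual code. The sample records, field order, and function names are all hypothetical and were made up for the example.]

```python
from collections import defaultdict
from datetime import date

# Hypothetical accession records, invented for illustration only:
# (date of accession, common name, provenance type)
SAMPLE_RECORDS = [
    (date(1872, 4, 15), "example maple", "nursery"),
    (date(1872, 10, 2), "example oak", "wild"),
    (date(1905, 4, 15), "example lilac", "unknown"),
]


def timeline_position(accessioned):
    """Map an accession date to (x, y) on the timeline.

    x is the day of the year (1 to 366); y is the year of collection.
    """
    return accessioned.timetuple().tm_yday, accessioned.year


def rows_by_year(records):
    """Group records into one row per year, like the 'tree rings' of the
    timeline, with each row sorted by day of the year."""
    rows = defaultdict(list)
    for accessioned, name, provenance in records:
        x, y = timeline_position(accessioned)
        rows[y].append((x, name, provenance))
    return {year: sorted(items) for year, items in rows.items()}


if __name__ == "__main__":
    for year, items in sorted(rows_by_year(SAMPLE_RECORDS).items()):
        print(year, items)
```

[Reading a single year's row from `rows_by_year` corresponds loosely to one "section cut" through the timeline; filtering the tuples by provenance type would be one way to take the data apart along the lines the text suggests.]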
A.L.: It is great to see that the project shows the data in a visual way without dismissing a textual understanding of the data. The interviews that you conducted with people who actually work in these places with this data, who really understand it from an insider perspective, were also an interesting part of the project to explore. It brings a holistic understanding of the data set but still grants you your own curiosity to take it apart with your own exploration and approach. What might be a future project that relates to these projects?
Y.L.: I am currently working on a book project with MIT Press called “All Data are Local,” which will bring together a number of my own critical and creative studies of data, including work on the Arboretum, the Digital Public Library, NewsScape (a news archive), and Zillow (a marketplace for housing data). Each of these projects involves data from a variety of different sources. I use these cases to make an argument about how and why we should look at data as local, with attachments to places, people, times, instruments, and audiences. The book is a combination of text and visualization. I also try to lay out a series of guidelines that readers can follow to develop their own local and critical sensibilities towards data. Building on these case studies and rules of practice, the book introduces an agenda for what I call “critical data pedagogy”: teaching people to understand data in ways that are more humanistic, more culturally attuned, and more sensitive to local sites of production and use. How can we reorient the culture around big data to challenge the perception of data as homogeneous sources of information? In the coming years, as big data becomes something that the public engages with in their everyday lives, we need to build that capacity. I think we have to do that by changing the way we talk about data but also by visualizing it in new ways.