In the last few decades, a vast number of Digital Humanities (DH) projects have digitized museum collections, archives, and libraries, making new data types and sources available. Interdisciplinary labs such as the médialab at Sciences Po focus on developing new tools suited to research methods in the humanities. In particular, they address the growing gap between these new digital sources and the tools currently available to analyze, comprehend, and present them. The provenance and translocation of cultural assets, and the social, cultural, and economic mechanisms underlying the circulation of art, form an emerging field of scholarship that spans all humanities disciplines.
‘With the migration of cultural materials into networked environments, questions regarding the production, availability, validity, and stewardship of these materials present new challenges and opportunities for humanists. In contrast with most traditional forms of scholarship, digital approaches are conspicuously collaborative and generative, even as they remain grounded in the traditions of humanistic inquiry. This changes the culture of humanities work as well as the questions that can be asked of the materials and objects that comprise the humanistic corpus.’
Internationally, researchers concerned with the ‘social, cultural, and economic mechanisms underlying the circulation of art’ work with object-based databases that describe, document, and store information about objects and their operations, movements, and provenance. The scholarship built on these databases is grounded primarily in the humanities and social sciences: Sociology of Art, which studies the social worlds that construct a discourse of art and aesthetics; the closely related Social History of Art, concerned with the social factors behind the emergence of certain art forms and practices; Economics and Art Business, which explore global economic flows and the influence of wealth and management strategies on art; Art Theory, which develops discourse on concepts relating to a philosophy of art and is therefore closely tied to curating and art criticism; and Art History, with research in provenance, image analysis, and museum studies.
Although established methods for collecting, analysing, and storing data differ with each researcher's methodological approach, there is a shared need for new computational tools for analysing, storing, sharing, and understanding the new digital data sources. Digital data collections cannot be studied without tools suited to analysing digital textual, pictorial, and numeric databases. Borgman (2015) surveys the current discourse on big data and little data and the associated methodological approaches in today's research communities and, much like Anne Burdick (2012), observes that “there exists not one method, but many” for digital data analysis in the humanities.
The research project builds on the infrastructure of digital databases produced with DH methods, but its primary focus is computer-aided research using current web-based software development standards. Only in the past seven years have researchers started to develop “standardized, native application programming interfaces (APIs) for providing graphics, animation, multimedia, and other advanced features.” This standardization allows researchers to develop tools and applications that run in multiple browsers and on different devices. In short, while the digitization of object-based collections (museum collections, libraries, archives, etc.) has an established history, the development of tools to access, display, and analyse these digitized collections has only begun to exploit the possibilities opened up by web standards shared across browsers and devices. The research group will apply these standards to digital museum collections and, more generally, to digital object-based collections.
Museums produce phenomenal amounts of data.
They do so by tradition: research about the provenance, narrative, material, production, and cultural embeddedness of each work is attached to every piece in a museum collection. Digitization therefore concerns not only pictorial surrogates of the works or their economic value (acquisition price, insurance value, and other administrative costs of keeping a work of art in a specific physical condition), but also the knowledge, history, and institutional contextualization that researchers have produced about works of art in institutions. The current modes of sharing this knowledge and of accessing information about works of art in a museum context are unsatisfying, yet the move toward sharing it is inevitable.
Another development is the emergence of a form of data capitalism in the museum context, visible especially in applications such as the face-matching feature of Google Arts & Culture, in which a private company asks users to trade their user data and facial images for a match with a museum portrait; these images are believed to be used to “train and improve the quality of their facial recognition AI technology.” In contrast, research projects in the humanities aim to develop solutions for accessing and disseminating information about museum collections that build on the research, methods, and data structures of the institution itself.
The advent of digitization and of digital modes of exhibition has, moreover, multiplied the facets under which a work of art can be approached. Researchers from all disciplines in the humanities analyse and explore object-based databases with a variety of quantitative and qualitative methods and data analysis tools. The proposed project aims to produce new digital tools for interdisciplinary and mixed-method research in the digital humanities, working with extensive datasets and accommodating a variety of methods, data types, and analysis tools.
In its preliminary phases, this collaboration between three institutions aims to exploit databases of cultural material to pave the way for new research tools in contemporary art history. The collaborating institutions, the médialab at Sciences Po (Paris), the Translocation Cluster at the Technische Universität Berlin, and the Center for Data Arts at The New School (New York City), are organizing a series of data sprints that began in Paris in fall 2017.
The Data Sprints
Within the range of methods in the social sciences, this methodology was borrowed from free-software development and adapted to the new constraints that weigh on researchers venturing into the world of digital data. It takes the form of a data sprint: a data-centred workshop designed to deliver a better understanding of datasets and to help formulate research questions through their complex exploration. Over a short period, typically one week, participants develop new digital tools for the humanities in an interdisciplinary sequence that mixes data mining, data (re)shaping, and the production of descriptive statistics and data visualizations.
“Data-sprints are intensive research and coding workshops where participants coming from different academic and non-academic backgrounds convene physically to work together on a set of data and research questions.”
The six phases of a data sprint are:
1) Posing research questions;
2) Operationalizing research questions into feasible digital methods projects;
3) Procuring and preparing datasets;
4) Writing and adapting code;
5) Designing data visualizations and interfaces;
6) Eliciting engagement and co-production of knowledge.
History of the project
The médialab at Sciences Po, in coordination with the Centre national des arts plastiques (CNAP) and videomuseum (a consortium of public modern and contemporary art collections), has already tested the possibility of exploiting a data management infrastructure to tease out research questions in art history and the sociology of art during a data sprint organized in September 2016 in Paris. This workshop produced an initial study of the Goûts de l’État (the tastes of the State, for once a better rhyme in English than in French) by exploiting the rich documentation of the acquisition and circulation of contemporary art by the French State since the Revolution (about 83,956 works in its Fonds national d’art contemporain, or FNAC, managed by the CNAP). The experience was positive both for the participating art historians and for the CNAP staff, who also quickly learned about many aspects of their database from the new formats and visualizations tried by the programmers and designers. Several research projects emerged from this encounter between art historians, programmers, and designers.
Data Sprint at the médialab at Sciences Po in November 2017
Since April 2017, the researchers involved in the project have been working together to bring museums on board as partner institutions, plan data sprints, and assemble a team of researchers to be invited to the ongoing sprints. The médialab at Sciences Po organized the first data sprint in November 2017, inviting over 25 participants, including museum staff from the Centre Pompidou. The participants gathered for one week at Sciences Po and developed over 50 visualizations and two working software prototypes based on their preliminary research and on-site data analysis.
Research Questions in the First Data Sprint included:
- Provenance and the translocation of cultural assets
- Modalities and temporality of acquisition
- Exhibition in the museum and circulation outside of the museum
- Artistic groups and collectives in the museum
- Social, cultural and historical embeddedness of the collection
Small working groups focused on at least one of the following approaches:
Qualitative and quantitative methods are applied to the same issue with equal priority. Quantitative methods give insight into a dataset as a whole, while qualitative methods are used for single cases. The qualitative approach can lead to research questions that structure the quantitative analysis, and vice versa. Machine learning and quantitative methods are practised as forms of ‘distant reading’ that let the qualitative researcher take a ‘deep dive’ informed by an understanding of patterns across the entire dataset.
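The back-and-forth between distant reading and a deep dive can be illustrated with a minimal sketch. It assumes a collection export reduced to hypothetical toy records with made-up field names (`title`, `year_acquired`); the helper functions are illustrative, not part of any partner institution's tooling. A quantitative pass counts acquisitions per decade across the whole dataset, and the busiest decade is then handed back as individual records for qualitative study.

```python
from collections import Counter

# Hypothetical toy records standing in for a collection export; the field
# names ("title", "year_acquired") are assumptions, not a museum's real schema.
records = [
    {"title": "Work A", "year_acquired": 1976},
    {"title": "Work B", "year_acquired": 1977},
    {"title": "Work C", "year_acquired": 1977},
    {"title": "Work D", "year_acquired": 1985},
    {"title": "Work E", "year_acquired": 1992},
]


def acquisitions_per_decade(rows):
    """Distant reading: count acquisitions per decade across the whole dataset."""
    return Counter((row["year_acquired"] // 10) * 10 for row in rows)


def deep_dive(rows, decade):
    """Close reading: return the individual records of one decade for qualitative study."""
    return [row for row in rows if decade <= row["year_acquired"] < decade + 10]


counts = acquisitions_per_decade(records)
peak_decade, _ = counts.most_common(1)[0]
for row in deep_dive(records, peak_decade):
    print(row["title"], row["year_acquired"])
```

In practice the direction can also be reversed: a question raised by one record during the deep dive can suggest a new aggregate to compute over the full dataset.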
Machine Learning: Natural Language Processing
The natural language processing method the workshop focuses on is named entity recognition (NER), which detects entities such as people, places, or organizations in large text-based datasets. NER makes it possible to map these entities by geolocation, temporal expression, or category (attributes). After the first data sprint, a group of researchers began work on a forthcoming focus on image recognition as a tool for authentication in art history.
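To make the NER step concrete, the following is a minimal sketch under strong simplifying assumptions. NER is usually performed with trained statistical models; here a small hand-made gazetteer (dictionary lookup) plus a regular expression for four-digit years stands in for such a model. The gazetteer entries, categories, and the sample provenance note are hypothetical. The output groups entities by category, the form in which they could then be passed on to geocoding or timeline tools (not shown).

```python
import re
from collections import defaultdict

# Hypothetical gazetteer: a hand-made lookup table standing in for a trained
# NER model. Entries and categories are illustrative only.
GAZETTEER = {
    "Marcel Duchamp": "PERSON",
    "Paris": "PLACE",
    "New York": "PLACE",
    "Centre Pompidou": "ORGANIZATION",
}

# Toy temporal pattern: four-digit years between 1000 and 2099.
YEAR = re.compile(r"\b(1\d{3}|20\d{2})\b")


def tag_entities(text):
    """Return (entity, category) pairs found in the text, in order of appearance."""
    found = []
    for name, category in GAZETTEER.items():
        # Word boundaries keep "Paris" from matching inside "Parisian".
        for match in re.finditer(rf"\b{re.escape(name)}\b", text):
            found.append((match.start(), name, category))
    for match in YEAR.finditer(text):
        found.append((match.start(), match.group(), "DATE"))
    return [(name, category) for _, name, category in sorted(found)]


def group_by_category(pairs):
    """Group tagged entities by category, e.g. places for a map, dates for a timeline."""
    groups = defaultdict(list)
    for name, category in pairs:
        groups[category].append(name)
    return dict(groups)


note = "Acquired in 1976 from a dealer in Paris; exhibited at the Centre Pompidou in 1977."
pairs = tag_entities(note)
print(group_by_category(pairs))
```

A gazetteer only finds names it already knows, which is exactly the limitation a trained NER model is meant to overcome; the sketch shows the shape of the output rather than the recognition technique used in the workshop.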
Machine Learning: Image recognition
In recent years, the digitization of cultural collections (archives, museum collections, libraries, etc.) has produced a massive number of digital images. These surrogates for material objects can be analyzed according to the qualities of the digital image (shape, color, theme, etc.), which can lead to categorization and to the identification of patterns in large-scale datasets.
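As one simple illustration of analysing images by a visual quality, the sketch below compares images by color alone. It is not an image-recognition model: it uses a coarse color histogram per image and histogram intersection as a similarity score, a classic hand-crafted technique. The "images" are hypothetical lists of RGB pixels with made-up values; real collection images would first be decoded with an imaging library, which is assumed and not shown.

```python
from itertools import combinations

# Toy "images": lists of (R, G, B) pixels with 0-255 channels. Real images
# would be decoded with an imaging library; these values are made up.
images = {
    "blue_1": [(10, 20, 200), (15, 30, 220), (0, 0, 180), (20, 20, 240)],
    "blue_2": [(5, 25, 210), (30, 40, 230), (10, 10, 190), (0, 20, 250)],
    "red_1": [(220, 10, 10), (200, 30, 20), (240, 0, 0), (180, 20, 30)],
}

BINS = 4  # bins per channel, so 4 * 4 * 4 = 64 color cells


def color_histogram(pixels, bins=BINS):
    """Normalized coarse 3-D color histogram: share of pixels in each color cell."""
    hist = [0.0] * (bins ** 3)
    step = 256 // bins
    for r, g, b in pixels:
        index = (r // step) * bins * bins + (g // step) * bins + (b // step)
        hist[index] += 1
    return [count / len(pixels) for count in hist]


def intersection(h1, h2):
    """Histogram intersection: 1.0 for identical distributions, 0.0 for disjoint ones."""
    return sum(min(a, b) for a, b in zip(h1, h2))


hists = {name: color_histogram(px) for name, px in images.items()}
# Find the pair of images whose color distributions overlap most.
most_similar = max(
    combinations(hists, 2),
    key=lambda pair: intersection(hists[pair[0]], hists[pair[1]]),
)
print(most_similar)
```

Coarse bins make the comparison tolerant of small color differences between reproductions; learned features from a trained model would be needed to capture shape or theme.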
The data that the partner institution, the Centre Pompidou in Paris, made available to the interdisciplinary team of researchers covers “over 100,000 works, the collections of the Centre Pompidou (musée national d’art moderne – centre de création industrielle) which make up one of the world’s leading references for art of the 20th and 21st centuries.”