CfP: The web: source and archive

Call for papers, deadline 31 August 2022

This international conference sets out to examine the place of web sources in the scientific field and to situate web archiving practices within a plurality of scientific approaches and questions.

Funded by the GIS CollEx-Persée, the project is led by the University of Lille and the National Library of France, in partnership with the GERiiCO of the University of Lille, Sciences Po and the Condorcet Campus. It brings libraries and research teams together to think, experiment and share practices related to web archives. The main goal is to bring the producers and the users of web archive collections closer together, with the help and mediation of academic libraries. With this in mind, the international symposium, organized by the scientific committee led by Laurence Favier (Pr. University of Lille, GERiiCO), Emmanuelle Bermès (Curator, PhD, Deputy director for services and networks – Bibliothèque nationale de France) and Madeleine Géroudet (Curator, Head of Research Support Department - University of Lille Library), aims to develop exchanges of knowledge between STI professionals and scientific teams by inviting the international academic community to take part in its work.

Organized at LILLIAD (Learning Center Innovation of the University of Lille) on the Campus of the Faculty of Science and Technology in Villeneuve d’Ascq, the event will take place over two and a half days, on April 3, 4 and 5, 2023.

The conference proceedings will be published.

Scientific positioning

Only when it is the object of archiving practices does the web become a research corpus (Beaudouin, Pehlivan, 2016: 15). As a medium for the ephemeral inscription of signs, codes and data that circulate through digital networks, the web as scientific material challenges researchers as a source or a field of research, and challenges information and documentation professionals as an archive.

From 2013 to 2016, the project “Le devenir en ligne du patrimoine numérisé : l’exemple de la Grande Guerre” (The online future of digitized heritage: the example of the Great War) was the first to use a collection from the Internet archives to apply global analysis methods. A corpus of sites was specifically delimited for the collection, which would allow for the analysis of amateur practices around digitized heritage content made available by libraries. In the spring of 2022, the Datasprint organized as part of the Respadon project confirmed that the use of web archives could complement the study of the live web. The exploration of the web as a source or as a field of study is now essential to many disciplines: contemporary history, political science, sociology, history of science, information and communication science. And because it allows researchers to write the history of an academic field and to inscribe their work in a tradition, the web has become a necessary source for many scientific approaches.

However, the scriptural infrastructures (Denis, 2018) that allow the production of data and its circulation through the web are paradoxically fragile and the contents volatile. This is why cultural and heritage institutions such as the Bibliothèque nationale de France (BnF) and the National Audiovisual Institute (INA) took an early interest in Internet archiving.

As early as 1996, the Internet Archive began to build an international archive of the Web. In France, the 2006 law known as “DADVSI” (Copyright and Neighboring Rights in the Information Society) extended legal deposit to “signs, signals, writings, images, sounds or messages of any kind communicated to the public by electronic means”. It helps define the French web space as a national heritage representative of the editorial production of its time (Bermès, 2019; Illien et al., 2011).

Many other countries have adopted a similar legal framework and are united in the International Internet Preservation Consortium (IIPC). In parallel, researchers who have started to study web archives as an object have organized themselves into networks such as WARCnet and RESAW (Schafer, Musiani, Borelli, 2016; Brügger, 2018).

In line with the objectives of the Respadon project (Network of partners for the exploration and analysis of digital data), which aims to make web archives more accessible to researchers from many disciplines, this conference sets out to examine the scientific practices in which the web can be approached as a source and as an archive.

The event will bring together both concrete examples of the scientific uses of web archives, in relation to their role as source and field, and heuristic reflections on the nature of these archives and the ways in which they are preserved and made available by heritage and documentary institutions.

Topic Proposals

The expected papers will focus on the following themes, related to the 3 main axes of the conference described below:

  • Presentations of scientific projects using web corpora and digital humanities
  • Presentations of feedback from research projects using the web as a source: obstacles encountered, success stories
  • Reflections on professional and academic practices involving the collection and preservation of web-based data
  • Presentation of experiments and devices promoting access to digital archives and web corpora
  • Methodological and epistemological reflections on the needs of accessing and preserving online data in different disciplines.

Axis 1 - The web at the intersection of memory and knowledge: epistemological issues

Treating the web as a scientific field implies considering the material available online as a deposit of traces of collective memory. How can a scientific discourse be built from an essentially volatile source? What kind of memory is involved when it is “distributed” on the web? What is the status of web sources in the scientific practices that produce new knowledge and contribute to the emergence of sciences henceforth assisted by the digital? The aim here is to collectively question the methods, modalities and status of the results produced when online data come into play in research, whatever the discipline concerned (contemporary history, political science, sociology, history of science, media studies, computer science...). Approaches and proposals that illustrate scientific work using the Web and its traces, as well as reflections on the nature of the archives and data used in contemporary scientific research, will be appreciated.

Axis 2 - Archival policies, practices, techniques and web archives: from document to corpus

For several years, the professional community has been developing technical definitions of the digital archive and its associated devices. This practice questions the notion of “archive places” and the preservation of corpora: the services offered around web archives must allow a shared working field to emerge with researchers, taking into account several questions:

  • In terms of informational sovereignty, how does the diversity of existing private and public initiatives materialize in the policies for building archives?
  • In terms of access to archives, what definition of granularity, unity of meaning, and coherence of the archival collection is needed to make web archives usable for research?
  • How can efficient scientific practices be developed while respecting the legal framework and the constraints related to the notion of legal deposit? How can this framework evolve to promote research?

To shed light on these issues, contributions could draw on the history of archiving or on a comparative approach to international frameworks.

Axis 3 - Relations between technical devices and scientific data: the networked web archive

A prospective approach to the study of web sources implies considering them not in isolation, but in relation to the different types of data and sources used in the scientific process. The methodological approaches that draw on digital materials are plural. Trace analysis, network analysis, and data visualization (quantitative approaches) can be complemented by qualitative approaches in digital ethnography that seek to question the relationship between traces and users. In connection with the issues of research data, the ways in which researchers build corpora can draw on archival know-how: life cycle of data, editorialization and documentation of sources, preservation of technical devices related to data, preservation of research data. We will focus here on the current legal and technical limits to the constitution of the web archive, considered as an incomplete archive: the archiving of some borderline objects involving technological interactivity (e.g. online video games, social networks, software) calls into question the documentary nature of the web. Conversely, digital archiving technologies can be used to collect and preserve documentary objects accessible online that are not usually considered as web archives (daily press, scientific articles). This axis will therefore focus on all the issues, methods and reflections that allow us to consider the web as a source and archive that is not isolated, but linked with other corpora and collections.

Evaluation procedure and participation

Anonymized contributions will undergo double-blind review.

To participate and submit a contribution:

Provisional timetable

  • Distribution of the call for papers: May 2022
  • Receipt of proposals: August 31, 2022
      – Summaries for plenary conferences (4500 characters max., bibliography included)
      – Round tables (abstracts and short presentations): 1 page, 1500 characters
  • Feedback from the scientific committee to the authors, acceptance of proposals: end of November 2022
  • Receipt of complete articles (25,000 to 35,000 characters, spaces included) and short presentations: mid-January 2023
  • Conference: April 3, 4 and 5, 2023

Scientific Committee

  • Eléonore Alquier (INA, Deputy Director, Data and Technologies)
  • Olivier Baude (Université Paris Nanterre, Modyco, Pr., Director of TGIR Huma-Num)
  • Emmanuelle Bermès (Deputy for scientific and technical matters to the Director of Services and Networks, BnF)
  • Niels Brügger (Aarhus University, Pr., Head of WARCnet)
  • Dominique Cardon (Sciences Po, Pr., Scientific Director of the medialab)
  • Marie Cornu (ISP, CNRS Research Director)
  • Laurence Favier (GERiiCO, Université de Lille, Pr., Head of the Department of Information and Document Sciences)
  • Madeleine Géroudet (SCD, Université de Lille, Head of the Research and Researcher Services Department)
  • Abigail Grotke (Library of Congress, Assistant Head, Digital Content Management Section, IIPC Chair)
  • Ian Milligan (University of Waterloo, Department of History, AP, Archives Unleashed Project)
  • Laurent Romary (INRIA de Paris, Research Director, Dir. Culture)
  • Philippe Useille (Univ. Polytechnique Hauts-de-France, Institut Sociétés et Humanités - ISH / Laboratoire de Recherche Sociétés & Humanités - LaRSH, MCF, scientific lead of the Digital Humanities division, MESHS Lille-Nord de France).

More information


Emmanuelle BERMÈS (2019) « Quand le dépôt légal devient numérique : épistémologie d’un nouvel objet patrimonial », Quaderni [Online], 98 | Winter 2018-2019.

Niels BRÜGGER (2018) The archived web: Doing history in the digital age. MIT Press.

Valérie GAME, Gildas ILLIEN (2006) « Le Dépôt légal d’Internet à la Bibliothèque nationale de France », Bulletin des bibliothèques de France (BBF), n° 3, p. 82-85. Online. ISSN 1292-8399.

Illien, 2008

Valérie BEAUDOUIN, Philippe CHEVALLIER, Lionel MAUREL (2018) Le web français de la Grande Guerre. Réseaux amateurs et institutionnels. Presses universitaires de Paris Nanterre.

Valérie BEAUDOUIN, Zeynep PEHLIVAN (2016) « Cartographie de la Grande Guerre sur le Web : Rapport final ». Bibliothèque nationale de France ; Bibliothèque de documentation internationale contemporaine ; Télécom ParisTech.

Jérôme DENIS (2018) Le travail invisible des données. Éléments pour une sociologie des infrastructures scripturales, Presses des Mines, 208 p.

Gildas ILLIEN, Pascal SANZ, Sophie SEPETJAN, Peter STIRLING (2011) « La situation du dépôt légal de l’internet en France : retour sur cette nouvelle législation, sur sa mise en pratique depuis cinq ans, et perspectives pour le futur ». Actes du 77e congrès de la Fédération internationale des associations de bibliothécaires et d’institutions (IFLA).

Emily MAEMURA (2021) « Data Here and There: Studying Web Archives Research Infrastructures in Danish and Canadian Settings ». University of Toronto, Faculty of Information, Doctoral Paper.

Emily MAEMURA, Christoph BECKER, Ian MILLIGAN (2016) « Understanding computational web archives research methods using research objects ». In: IEEE International Conference on Big Data, p. 3250-3259, DOI: 10.1109/BigData.2016.7840982

Frédéric MARTIN (2017) « Les archives de l’internet comme axe de coopération nationale ». In: Webcorpora.

Francesca MUSIANI (ed.) (2019) Qu’est-ce qu’une archive du web ? OpenEdition Press.

Jean-Charles PAJOU (2016) « L’Observatoire du dépôt légal : un certain regard sur l’édition », Bulletin des bibliothèques de France (BBF), n° 9, p. 134-144. Online. ISSN 1292-8399.

Nick RUEST, Jimmy LIN, Ian MILLIGAN, Samantha FRITZ (2020) « The Archives Unleashed Project: Technology, Process, and Community to Improve Scholarly Access to Web Archives ». IEEE/ACM Joint Conference on Digital Libraries, Wuhan, China. Online.

Valérie SCHAFER, Francesca MUSIANI, Marguerite BORELLI (2016) « Negotiating the web of the past ». French Journal for Media Research, La toile négociée/Negotiating the web, ?id =963

Peter STIRLING (2017) « Le dépôt légal de l’internet dans le projet Corpus ». In: Webcorpora.