Text generation has become a critical component of modern AI applications such as chatbots and agentic assistants. To combat hallucinations and produce trustworthy output, retrieval-augmented generation (RAG) is the current norm for information-dense tasks. Report Generation is a long-form RAG task with strict attestation requirements, which makes it well suited to exploring questions of RAG evaluation and multilingual generation. In this task, a long-form report summarizing the relevant information in a corpus is generated in response to a report request, which consists of a user background and an information need. The generated report should provide proper attribution to the source documents to establish trust.
RAG4Reports aims to draw the attention of the ACL community to developing systems and evaluation methods for Multilingual Report Generation, which can lead to more general solutions for other long-form RAG problems as well.
Strong focus on groundedness and citation support.
Incorporating information in different languages is critical for effective RAG.
Paragraph-level, coherent, and information-dense responses.
RAG4Reports is interested in many aspects of the Report Generation problem. The following is a non-exhaustive list of topics. Please reach out to us if you are unsure whether your work is within the scope of this workshop.
Papers can either be submitted directly to RAG4Reports through OpenReview or committed through ACL Rolling Review (ARR). Submissions should use the ACL format and be at most 8 pages, excluding references. Reviewers will be asked to consider the paper's length when writing their reviews.
Direct submissions will undergo a single-blind peer-review process (i.e., reviewers will see the identities of the authors), and each paper will receive at least three reviews and one meta-review. For submissions proposing new resources, such as datasets or software, please provide the reviewers with access to the resource so they can assess its merit.
RAG4Reports will host two tasks:
As input for shared task participants, we will provide system-generated reports from 2025 TREC RAGTIME submissions that have been judged by human annotators. The task is to produce a system ranking for each report request (a long-form query with a description of the user's background), as well as an overall ranking across all report requests. Submitted rankings will be evaluated by their correlation with the rankings derived from human annotations. We will accept two types of submissions:
To study the effect of document language on evaluation, we will accept submissions that use an English translation of the corpus (provided by the organizers) or the multilingual corpus with documents in their original languages. We will use Auto-ARGUE as the baseline for this task.
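The exact correlation measure is not specified here; as an illustration only, the sketch below compares a submitted system ranking against a hypothetical human-derived ranking using Kendall's tau from SciPy. The system IDs and both rankings are purely hypothetical.

```python
# Minimal sketch (not the official scorer): comparing a submitted system
# ranking against a human-derived ranking with Kendall's tau.
# The system IDs and both rankings below are hypothetical.
from scipy.stats import kendalltau

# Rankings are expressed as ordered lists of system IDs, best first.
human_ranking = ["sys_A", "sys_C", "sys_B", "sys_D"]
submitted_ranking = ["sys_A", "sys_B", "sys_C", "sys_D"]

# Convert each ranking into a rank position per system so the two
# sequences can be compared pairwise.
systems = sorted(human_ranking)
human_ranks = [human_ranking.index(s) for s in systems]
submitted_ranks = [submitted_ranking.index(s) for s in systems]

tau, p_value = kendalltau(human_ranks, submitted_ranks)
print(f"Kendall's tau = {tau:.3f} (p = {p_value:.3f})")
```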
Participants will receive a set of report-generation responses to evaluate. Each generation system corresponds to a JSONL file in which each line is the response to one request; the file name is the generation system ID. Please see the submission format of the Multilingual Report Generation task for details.
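As an illustration only, the sketch below loads such per-system JSONL files. The directory name is hypothetical, and the field names follow the Multilingual Report Generation submission format described later on this page.

```python
# Minimal sketch: loading the per-system response files for the evaluation task.
# The directory name "generated_reports/" is hypothetical; each file is named
# after a generation system ID and contains one JSON response per line.
import json
from pathlib import Path

responses_by_system = {}
for path in Path("generated_reports").glob("*.jsonl"):
    system_id = path.stem  # file name is the generation system ID
    with path.open(encoding="utf-8") as f:
        # Each line is one report, keyed here by its topic_id.
        responses_by_system[system_id] = {
            report["metadata"]["topic_id"]: report
            for report in map(json.loads, f)
        }

print(f"Loaded {len(responses_by_system)} systems")
```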
The output format should be a TSV with the columns:
There will be a field in the submission portal to indicate which metric you would like used for the shared task. Your submission may contain multiple metrics; pick one for the official evaluation.
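As an illustration only (the official column names are defined by the task, not by this sketch), the snippet below writes a TSV with hypothetical columns for metric name, system ID, topic ID, and score.

```python
# Minimal sketch: writing a metric TSV for the evaluation task.
# The official column names are specified by the task guidelines; the
# columns used here (metric, system_id, topic_id, score) are hypothetical.
import csv

rows = [
    # (metric, system_id, topic_id, score) -- all values hypothetical
    ("my_metric", "sys_A", "topic_001", 0.82),
    ("my_metric", "sys_B", "topic_001", 0.47),
]

with open("evaluation_submission.tsv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f, delimiter="\t")
    writer.writerow(["metric", "system_id", "topic_id", "score"])
    writer.writerows(rows)
```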
This task involves generating long-form reports in response to a request, using information retrieved from a multilingual corpus. Report requests consist of background information about the user and a statement describing their information need in English. In contrast to other RAG tasks, reports should contain only information that is grounded in the corpus. Generated reports should consist of sentences with citations and will be subject to a length limit. Reports should be written in the same language as the report request. The corpus consists of four million English, Chinese, Russian, and Arabic documents drawn from Common Crawl News, sampled evenly from 2021 to 2024. The organizers will provide search services accessible through an API in addition to the corpus itself. Submitted reports will be judged automatically with the Auto-ARGUE framework, which scores reports based on whether nuggets of related information are present and correctly cited in the report. We plan to score reports using a range of LLMs to understand their agreement.
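Auto-ARGUE itself is not reproduced here; the sketch below only illustrates the general shape of nugget-based scoring, assuming an upstream judge has already decided, for each nugget, whether it is present in the report and whether it is supported by a citation. Both the data structure and the aggregation are illustrative only.

```python
# Minimal sketch of nugget-style scoring in the spirit of Auto-ARGUE
# (not the official implementation). It assumes an upstream judge has
# already decided whether each information nugget is present and whether
# it is attributed to a correct citation.

def nugget_score(judged_nuggets):
    """judged_nuggets: list of dicts with boolean 'present' and 'cited' keys."""
    if not judged_nuggets:
        return 0.0
    # A nugget only counts if it appears in the report AND carries a
    # citation judged to support it.
    supported = sum(1 for n in judged_nuggets if n["present"] and n["cited"])
    return supported / len(judged_nuggets)

# Hypothetical judgments for one report.
example = [
    {"nugget": "event date", "present": True, "cited": True},
    {"nugget": "key actor", "present": True, "cited": False},
    {"nugget": "location", "present": False, "cited": False},
]
print(nugget_score(example))  # 1 of 3 nuggets present and correctly cited
```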
Report requests will be distributed in JSONL format as a list of individual requests, one per line. Each request will contain the following JSON fields:
topic_id (string): A unique ID for this report request
title (string): A short description of the report request
background (string): Describes the context in which the report is being written
problem_statement (string): Describes what should and should not be included in the report
limit (int): Maximum number of NFKC-normalized Unicode characters the report may include

The submission format is a sequence of JSONL entries, each representing one report. Each report is a JSON object containing three main objects:
metadata (dictionary)
topic_id (string): The unique ID of the input report request
Other metadata fields may be present but will be ignored.
responses (array): A list of sentence dictionaries.
references (array): A list of reference document IDs (strings). This should be the union of all cited documents.
Sentences must appear in report order. Each sentence dictionary has the following fields:
text (string): The text of the sentence
citations (dictionary): A dictionary of zero or more document IDs (strings) mapped to floating-point scores. The higher the score, the more confidence the system has in the validity of that citation.

Please submit your runs through the Google Form that will be announced later. Each team can submit an unlimited number of runs, but only the last three submissions from a team for each task will be evaluated and considered in the competition.
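To make the format concrete, here is a minimal sketch that assembles one report entry with the metadata, responses, and references objects described above and appends it to a JSONL run file. The topic ID, sentence text, and document IDs are hypothetical.

```python
# Minimal sketch: assembling one report entry in the submission format
# described above. All IDs and text below are hypothetical.
import json

report = {
    "metadata": {
        "topic_id": "topic_001",  # must match the input report request
    },
    "responses": [  # sentences in report order
        {
            "text": "Example sentence summarizing a finding from the corpus.",
            # document IDs mapped to citation confidence scores (floats)
            "citations": {"doc_12345": 0.91, "doc_67890": 0.55},
        },
    ],
    # union of all documents cited anywhere in the report
    "references": ["doc_12345", "doc_67890"],
}

# One JSON object per line in the run file.
with open("my_run.jsonl", "a", encoding="utf-8") as f:
    f.write(json.dumps(report, ensure_ascii=False) + "\n")
```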
Each participating team is expected to submit a system paper after the results are announced. During the conference, the winner of each task will receive a slot for an oral presentation; other teams will be invited to present at the poster session. We strongly encourage every team to participate in the poster session to share their findings.
HLTCOE, Johns Hopkins University
University of Glasgow
HLTCOE, Johns Hopkins University
Allen Institute for AI
HLTCOE, Johns Hopkins University
HLTCOE, Johns Hopkins University