RAG4Reports @ ACL 2026
RAG for Report Generation

Workshop Features

Text generation has become a critical component of modern AI applications such as chatbots and agentic assistants. To combat hallucinations and provide trustworthy output, retrieval-augmented generation (RAG) is the current norm for information-dense tasks. Report Generation is a long-form RAG task with strict attestation requirements, which makes it well suited for exploring questions of RAG evaluation and multilingual generation. In this task, a long-form report summarizing the relevant information in a corpus is generated in response to a report request, which consists of a user background and an information need. The generated report should provide proper attribution to the source documents to establish trust.

RAG4Reports aims to draw the attention of the ACL community to developing systems and evaluation methods for Multilingual Report Generation, which can lead to more general solutions for other long-form RAG problems as well.


Grounded Generation

A strong focus on groundedness and citation support.

Multilingual RAG

Incorporating information in different languages is critical for effective RAG.

Long Generation

Coherent, information-dense responses spanning multiple paragraphs.

Call for Papers

Submission on OpenReview (TBA)

RAG4Reports is interested in various aspects of the Report Generation problem. The following is an incomplete list of topics; please reach out to us if you are unsure whether your work is within the scope of this workshop.

  • Report generation and long-form RAG systems
  • Multilingual report generation or RAG datasets
  • Analyses of components in report generation systems, such as multilingual retrieval and multilingual generative models
  • Evaluation methods for report generation

Submission Guidelines and Review Process

Papers can either be submitted directly to RAG4Reports through OpenReview or committed through ARR (ACL Rolling Review). Submissions should use the ACL format and be at most 8 pages long, excluding references. Reviewers will be asked to consider the paper’s length when writing their reviews.

Direct submissions will undergo a single-blind peer-review process (i.e., reviewers will see the identity of the authors), and each paper will receive at least three reviews and one meta-review. For submissions proposing new resources, such as datasets or software, please provide access to the resource so reviewers can assess its merit.

Important Dates

  • First call for papers: December 10, 2025
  • Second call for papers: January 15, 2026
  • Third call for papers: February 20, 2026
  • Paper submission deadline: March 5, 2026
  • Pre-reviewed ARR commitment deadline: March 24, 2026
  • Notification of acceptance: April 28, 2026
  • Camera-ready paper due: May 12, 2026
  • Workshop dates: July 2 or 3, 2026 (TBA)

Shared Tasks

Submission Portal (TBA)
Data
Email Eugene Yang to join the RAG4Reports Slack Channel!

Task Description and Evaluation

RAG4Reports will host two tasks:

  1. Automatic Report Evaluation
  2. Multilingual Report Generation

Automatic Report Evaluation

As input, shared task participants will receive system-generated reports from 2025 TREC RAGTIME submissions that have been judged by human annotators. The task is to produce a system ranking for each report request (a long-form query with a description of the user background) as well as an overall ranking across all report requests. Submitted rankings will be evaluated on their correlation with the rankings derived from the human annotations. We will accept two types of submissions:

  1. fully automatic evaluators, which use no additional human input;
  2. semi-automatic evaluators, which additionally receive human-curated essential facts (provided by the organizers) that should be included in a useful report.

To study the effect of document language on evaluation, we will accept submissions that use an English translation of the corpus (provided by the organizers) or the multilingual corpus with documents in their original languages. We will use Auto-ARGUE as the baseline for this task.
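
As a concrete illustration of the ranking comparison, the sketch below correlates an automatic evaluator's per-system scores with human-derived scores for a single report request. The correlation measure (Kendall's tau), the system IDs, and all score values are assumptions for illustration only; the official correlation measure and data layout may differ.

    # Hypothetical sketch: correlating an automatic evaluator's system ranking
    # with the ranking derived from human annotations for one report request.
    # Kendall's tau is an assumption; the official correlation measure may differ.
    from scipy.stats import kendalltau

    automatic = {"system_1": 0.81, "system_2": 0.54, "system_3": 0.77}  # evaluator scores (made up)
    human = {"system_1": 0.90, "system_2": 0.40, "system_3": 0.70}      # human-derived scores (made up)

    systems = sorted(automatic)
    tau, _ = kendalltau([automatic[s] for s in systems], [human[s] for s in systems])
    print(f"Kendall's tau for this request: {tau:.3f}")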

Data and Submission Format

Participants will receive a set of report generation responses to be evaluated. Each generation system corresponds to a JSONL file in which each line is the response to one request, and the file name is the generation system ID. Please see the submission format of the Multilingual Report Generation task for details.

The output format should be a TSV file with the following columns:

  • topic_id (string): the topic ID this row reports on
  • generation_system_id (string): the ID of the generation system this row reports on
  • metric_name (string): the metric name
  • score (float): the numerical score of the metric for this generation system on this topic

There will be a field in the submission portal for indicating which metric should be used for the shared task. You may include multiple metrics in your submission and pick one for the evaluation.
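
For concreteness, a minimal sketch of producing this TSV is shown below. The directory layout, the .jsonl extension, the header row, the metric name, and the score_report placeholder are all assumptions; substitute your own evaluator and check the official format before submitting.

    # Sketch only: file layout, the .jsonl extension, the header row, and
    # score_report are assumptions for illustration.
    import csv, json, pathlib

    def score_report(report: dict) -> float:
        return 0.0  # placeholder for your automatic evaluation metric

    with open("task_scores.tsv", "w", newline="") as out:
        writer = csv.writer(out, delimiter="\t")
        writer.writerow(["topic_id", "generation_system_id", "metric_name", "score"])
        for run_file in pathlib.Path("generated_reports").glob("*.jsonl"):
            system_id = run_file.stem  # file name is the generation system ID
            with run_file.open() as f:
                for line in f:
                    report = json.loads(line)
                    topic_id = report["metadata"]["topic_id"]
                    writer.writerow([topic_id, system_id, "my_metric", score_report(report)])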

Multilingual Report Generation

This task involves generating long-form reports in response to a request, using information retrieved from a multilingual corpus. Report requests consist of background information about the user and a statement, in English, describing their information need. In contrast to other RAG tasks, reports should contain only information that is grounded in the corpus. Generated reports should consist of sentences with citations and are subject to a length limit. Reports should be written in the same language as the report request. The corpus consists of four million English, Chinese, Russian, and Arabic documents drawn from Common Crawl News, sampled evenly from 2021 to 2024. The organizers will provide search services accessible through an API in addition to the corpus itself. Submitted reports will be judged automatically with the Auto-ARGUE framework, which scores reports based on whether nuggets of related information are present and correctly cited in the report. We plan to score reports using a range of LLMs to understand their agreement.

Request and Submission Format

Report requests will be distributed in JSONL format as a list of individual requests, one per line. Each request will contain the following JSON fields:

  • topic_id (string): A unique ID for this report request
  • title (string): A short description of the report request
  • background (string): Describes the context in which the report is being written
  • problem_statement (string): Describes what should and should not be included in the report
  • limit (int): The maximum number of NFKC-normalized Unicode characters the report may contain
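
A short sketch of reading the request file and checking a draft report against the character limit is given below. The file name requests.jsonl and the draft_report variable are hypothetical; the limit is counted in NFKC-normalized Unicode characters as described above.

    # Sketch: "requests.jsonl" and draft_report are hypothetical.
    import json, unicodedata

    def nfkc_length(text: str) -> int:
        # Length in NFKC-normalized Unicode characters, as used by the limit field.
        return len(unicodedata.normalize("NFKC", text))

    with open("requests.jsonl") as f:
        for line in f:
            request = json.loads(line)
            draft_report = "..."  # your system's generated report for this request
            if nfkc_length(draft_report) > request["limit"]:
                print(f"Report for {request['topic_id']} exceeds the character limit")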

The submission format is a sequence of JSONL entries, each representing one report. Each report is a JSON object containing three main fields (a complete example appears after the field descriptions):

  • metadata (dictionary)
    • topic_id (string): The unique ID of the input report request
    • run_id (string): An arbitrary string to identify the run. It is recommended to include your team name as part of the run_id

    Other metadata fields may be present but will be ignored.

  • responses (array): a list of sentence dictionaries.
  • references (array): a list of reference document IDs (strings). This should be the union of all cited documents.

Sentences must appear in report order. Each sentence dictionary has the following fields:

  • text (string): a string containing the text of the sentence
  • citations (dictionary): a dictionary mapping zero or more document IDs (strings) to floating-point scores. The higher the score, the more confident the system is in the validity of that citation.
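
Putting the pieces together, a single submission line might be constructed as in the sketch below. The document IDs, citation scores, topic ID, run ID, and output file name are all invented for illustration; the references field is built as the union of all cited documents, as required above.

    # Sketch: all IDs, scores, and file names below are made up for illustration.
    import json

    sentences = [
        {"text": "First sentence of the report, summarizing a key finding.",
         "citations": {"doc-001": 0.92, "doc-047": 0.61}},
        {"text": "Second sentence, supported by a single source.",
         "citations": {"doc-047": 0.80}},
    ]

    entry = {
        "metadata": {"topic_id": "topic-123", "run_id": "myteam-run1"},
        "responses": sentences,  # sentences in report order
        "references": sorted({doc for s in sentences for doc in s["citations"]}),
    }

    with open("myteam-run1.jsonl", "a") as out:
        out.write(json.dumps(entry, ensure_ascii=False) + "\n")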

Submission Instructions

Please submit your runs through a Google Form that will be announced later. Each team may make an unlimited number of submissions, but only the last three submissions from each team for each task will be evaluated and considered in the competition.

Each participating team is expected to submit a system paper after the results are announced. During the workshop, the winner of each task will receive a slot for an oral presentation, and other teams will be invited to present at the poster session. We strongly encourage every team to participate in the poster session to share what they have learned.

Important Dates

  • Data release: December 10, 2025
  • Shared task submission deadline (both tasks): March 5, 2026
  • Result announcement: April 28, 2026
  • System papers due: May 12, 2026
  • Workshop dates: July 2 or 3, 2026 (TBA)

Organizing Committee

Dawn Lawrie

HLTCOE, Johns Hopkins University

Sean MacAvaney

University of Glasgow

James Mayfield

HLTCOE, Johns Hopkins University

Luca Soldaini

Allen Institute for AI

Eugene Yang

HLTCOE, Johns Hopkins University

Andrew Yates

HLTCOE, Johns Hopkins University