FinanceRAG Challenge: Retrieval-Augmented Generation (RAG) for Financial Documents

Competition at ICAIF '24

📅 Key Dates:

Submission Deadline: November 9 (Sat), 2024

Winner Announcement: November 10 (Sun), 2024

Presentation at ICAIF '24: November 15 (Fri), 2024, 13:00 - 14:00 ET


📍 Location:

5 MetroTech Center, 4th Floor, LC400, Brooklyn, NY 11201


If you'd like to attend the presentation, please register here to reserve your spot.


Winner Announcement

Speakers
Title
Better RAG Through Better Data — How the world's best ML teams build reliable ingestion pipelines

Adit Abraham
CEO at Reducto

Title

Parsing is the Key: Unlocking Seamless Data Pipelines for RAG

Lingjie (Kimi) Kong

CTO at CambioML, ex-Google DeepMind

Moderator

Title

Introduction to FinanceRAG

Jin Kim

Co-founder at Linq

Program Schedule

Introduction to Challenge: 1:00 PM - 1:15 PM (15 minutes)

Winning Presentation: 1:15 PM - 1:30 PM (15 minutes)

RAG for Finance Session 1: 1:30 PM - 1:45 PM (15 minutes)

RAG for Finance Session 2: 1:45 PM - 2:00 PM (15 minutes)

The goal of the competition is to apply Retrieval-Augmented Generation (RAG) for better handling of financial documents.

Overview

Start: Sep 24, 2024
Close: Nov 9, 2024

The FinanceRAG Challenge aims to advance Retrieval-Augmented Generation (RAG) systems that can efficiently handle lengthy and complex financial documents. Participants are tasked with building systems capable of retrieving relevant contexts from large financial datasets and generating precise answers to financial queries, addressing real-world challenges such as financial terminology, industry-specific language, and numerical data.

In this competition, we construct a consolidated set of textual and tabular financial datasets. These datasets are designed to test the system's ability to retrieve and reason over financial data. Participants will benefit from baseline examples and official submission code on GitHub at FinanceRAG, and streamlined access to the dataset on Hugging Face at Linq-AI-Research/FinanceRAG.

Participants with standout solutions (and clear explanations of how they achieved them) will be invited to present their work at ICAIF '24.

For a reference baseline, you can explore this Kaggle notebook, which showcases how to implement both the retrieval and reranking components of the FinDER Task using the SentenceTransformers package.

Important: When submitting for the competition, ensure you generate results for all tasks and merge them into a single consolidated file.
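One way to consolidate per-task outputs is sketched below. The two-column (query_id, corpus_id) layout and header names are assumptions for illustration; verify the exact expected format against the official submission code on GitHub.

```python
import csv
import io

def merge_task_results(task_results):
    """Concatenate per-task result rows into one list for a single file.

    task_results: dict mapping task name -> list of (query_id, corpus_id)
    tuples in ranked order. Column layout is an assumption; verify it
    against the official submission template.
    """
    merged = []
    for task_name in task_results:  # keep tasks in insertion order
        merged.extend(task_results[task_name])
    return merged

def write_submission(fileobj, rows):
    """Write the consolidated rows as CSV with an (assumed) header."""
    writer = csv.writer(fileobj)
    writer.writerow(["query_id", "corpus_id"])
    writer.writerows(rows)

# Results from two tasks consolidated into one submission
results = {"FinDER": [("q1", "d3"), ("q1", "d7")], "FinQA": [("q9", "d2")]}
buf = io.StringIO()
write_submission(buf, merge_task_results(results))
```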

Description

In the FinanceRAG Challenge, you are tasked with building a Retrieval-Augmented Generation (RAG) system that accurately retrieves relevant contexts from financial documents and generates precise answers to specific queries. The task is divided into two steps: retrieval and generation. Only Task 1 [Retrieval] is within the scope of this challenge and will be evaluated for scoring and ranking; Task 2 [Generation] is not part of this challenge and will not be evaluated. Participants therefore need to complete only the retrieval step: identifying the most relevant contexts from a large collection of financial documents to help answer each question. The primary objective of Task 1 is to maximize retrieval accuracy, ensuring that the system identifies the most relevant and correct contexts from the provided financial datasets.

Task 1 [Retrieval]: This will be the main task for evaluation.

In Task 1, you are required to retrieve the most relevant document chunks (contexts) for a given query from a large document corpus. The process begins by transforming the query into an embedding, which is then used to search a pre-indexed vector database of document embeddings. You need to implement an effective retrieval and re-ranking system that prioritizes the most relevant contexts in the document corpus based on their similarity to the query.
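The two-stage pipeline described above can be sketched as follows. The toy embeddings and the pluggable `score_fn` stand in for real models (e.g. a SentenceTransformers bi-encoder for stage 1 and a cross-encoder for stage 2); this is a minimal illustration of the ranking logic, not the official baseline.

```python
from math import sqrt

def cosine(u, v):
    """Cosine similarity between two dense vectors."""
    num = sum(a * b for a, b in zip(u, v))
    den = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return num / den if den else 0.0

def retrieve(query_vec, corpus, top_k=10):
    """Stage 1: rank every corpus embedding by similarity to the query."""
    scored = [(doc_id, cosine(query_vec, vec)) for doc_id, vec in corpus.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]

def rerank(query, candidates, score_fn):
    """Stage 2: re-score the shortlist with a (typically costlier) model."""
    rescored = [(doc_id, score_fn(query, doc_id)) for doc_id, _ in candidates]
    rescored.sort(key=lambda item: item[1], reverse=True)
    return rescored

# Toy corpus: document id -> embedding (a real system would use an encoder)
corpus = {"d1": [1.0, 0.0], "d2": [0.7, 0.7], "d3": [0.0, 1.0]}
shortlist = retrieve([1.0, 0.1], corpus, top_k=2)
print([doc_id for doc_id, _ in shortlist])  # ['d1', 'd2']
```

In practice the corpus embeddings would be precomputed and indexed once, and only the shortlist from stage 1 is passed through the more expensive reranker.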

Goal: The goal of Task 1 is to maximize the retrieval accuracy (nDCG@10) by ranking the most relevant contexts in response to each query. Evaluation is based on how well the system ranks the retrieved contexts compared to the ground truth.
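For local validation, nDCG@10 can be computed per query as below. This sketch uses the linear-gain form of DCG; the official scorer may use a different variant (e.g. exponential gain), so treat it as a sanity check rather than a reproduction of the leaderboard metric.

```python
from math import log2

def dcg(gains):
    """Discounted cumulative gain for graded relevances in ranked order."""
    return sum(g / log2(rank + 2) for rank, g in enumerate(gains))

def ndcg_at_k(ranked_ids, qrels, k=10):
    """nDCG@k for a single query.

    ranked_ids: document ids in the order your system returned them.
    qrels: dict mapping relevant doc id -> graded relevance (ground truth).
    """
    gains = [qrels.get(doc_id, 0) for doc_id in ranked_ids[:k]]
    ideal = sorted(qrels.values(), reverse=True)[:k]
    idcg = dcg(ideal)
    return dcg(gains) / idcg if idcg else 0.0

# A perfect ranking scores 1.0; swapping the two documents scores lower.
print(ndcg_at_k(["d1", "d2"], {"d1": 2, "d2": 1}))  # 1.0
```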

Task 2 [Generation]: This task is not part of the official evaluation criteria.

In Task 2, you move beyond retrieval and are tasked with generating an answer to a query using the retrieved contexts. This step involves using both the query and the retrieved contexts as inputs to a large language model (LLM) to produce the final output. The system must extract the correct information from the retrieved chunks and generate a precise and concise response.
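The assembly of query and contexts into an LLM input can be sketched as below. The function name and prompt template are illustrative assumptions, not part of the challenge code; the resulting string would be passed to whatever LLM you choose.

```python
def build_prompt(query, contexts):
    """Combine retrieved contexts and the query into a single LLM prompt.

    The template here is an illustrative choice, not an official one.
    """
    context_block = "\n\n".join(
        f"[Context {i + 1}]\n{text}" for i, text in enumerate(contexts)
    )
    return (
        "Answer the financial question using only the contexts below. "
        "Be precise and concise.\n\n"
        f"{context_block}\n\n"
        f"Question: {query}\nAnswer:"
    )

# Toy usage with placeholder contexts
prompt = build_prompt(
    "What was net revenue in FY2023?",
    ["Net revenue for FY2023 was X.", "Operating margin was Y."],
)
```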

Goal: The goal of Task 2 is to ensure the quality of the generated answers by evaluating both their relevance to the query and their correctness, compared to the ground-truth answers.

Citation

Chanyeol Choi, Jy-Yong Sohn, Yongjae Lee, Subeen Pang, Jaeseon Ha, Hoyeon Ryoo, Yongjin Kim, Hojun Choi, Jihoon Kwon. (2024). ACM-ICAIF '24 FinanceRAG Challenge. Kaggle. https://kaggle.com/competitions/icaif-24-finance-rag-challenge

Organized and Backed by

Contact

For further details and guidelines, please contact the workshop organizers via the Kaggle website.