Winners

1st Place: Dr. Jing Wang, Senior Applied Scientist at Accrete (Physics PhD, Brown University)
Talk: Finance RAG with Hybrid Search and Reranking

2nd Place: David (Joohyun) Lee, Senior Researcher at Finance Security Institute (ECE MS, Seoul National University)
Talk: Multi-Reranker: To Maximize Performance of RAG System

3rd Place: Ho-young Lee, Researcher (CS BS, HUFS)
Talk: Mixture of Experts for Retrieval
Speakers

Parsing is the Key: Unlocking Seamless Data Pipelines for RAG
Lingjie (Kimi) Kong, CTO at CambioML, ex-Google DeepMind (Moderator)

Introduction to FinanceRAG
Jin Kim, Co-founder at Linq
Program Schedule
Introduction to Challenge: 1:00 PM - 1:15 PM (15 minutes)
Winning Presentation: 1:15 PM - 1:30 PM (15 minutes)
RAG for Finance Session 1: 1:30 PM - 1:45 PM (15 minutes)
RAG for Finance Session 2: 1:45 PM - 2:00 PM (15 minutes)
Overview
Start: Sep 24, 2024
Close: Nov 9, 2024
The Financial RAG Challenge aims to advance Retrieval-Augmented Generation (RAG) systems that can efficiently handle lengthy and complex financial documents. Participants are tasked with building systems capable of retrieving relevant contexts from large financial datasets and generating precise answers to financial queries, addressing real-world challenges such as financial terminology, industry-specific language, and numerical data.
In this competition, we construct a consolidated set of textual and tabular financial datasets designed to test a system's ability to retrieve and reason over financial data. Participants can start from the baseline examples and official submission code on GitHub at FinanceRAG, and get streamlined access to the datasets on Hugging Face at Linq-AI-Research/FinanceRAG.
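As a minimal sketch of pulling the data down with the Hugging Face datasets library (the subset name "FinDER" and the split layout are assumptions here; confirm both on the dataset card):

```python
from datasets import load_dataset

# Subset and split names are assumptions; confirm them on the
# Linq-AI-Research/FinanceRAG dataset card before relying on them.
ds = load_dataset("Linq-AI-Research/FinanceRAG", "FinDER")
print(ds)  # inspect the available splits (e.g., corpus / queries)
```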
Participants whose solutions impress (and who explain how they achieved them) will be invited to present their work at ICAIF ’24.
For a reference baseline, you can explore this Kaggle notebook, which showcases how to implement both the retrieval and reranking components of the FinDER Task using the SentenceTransformers package.
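In the same spirit as that notebook, a minimal retrieve-then-rerank sketch with the sentence-transformers package might look like the following; the model names and toy corpus are illustrative, not the official baseline's choices:

```python
# Sketch: two-stage retrieval with sentence-transformers.
# Model choices are illustrative; the official baseline may differ.
from sentence_transformers import SentenceTransformer, CrossEncoder, util

corpus = {
    "d1": "Apple reported quarterly revenue of $89.5 billion ...",
    "d2": "The effective tax rate decreased year over year ...",
}
query = "What was Apple's quarterly revenue?"

# Stage 1: bi-encoder retrieval over dense embeddings.
bi_encoder = SentenceTransformer("all-MiniLM-L6-v2")
doc_ids = list(corpus)
doc_emb = bi_encoder.encode([corpus[i] for i in doc_ids], convert_to_tensor=True)
query_emb = bi_encoder.encode(query, convert_to_tensor=True)
hits = util.semantic_search(query_emb, doc_emb, top_k=50)[0]

# Stage 2: cross-encoder reranking of the retrieved candidates.
cross_encoder = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
pairs = [(query, corpus[doc_ids[h["corpus_id"]]]) for h in hits]
scores = cross_encoder.predict(pairs)
reranked = sorted(zip(hits, scores), key=lambda x: x[1], reverse=True)

for hit, score in reranked[:10]:
    print(doc_ids[hit["corpus_id"]], float(score))
```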
Important: When submitting for the competition, ensure you generate results for all tasks and merge them into a single consolidated file.
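One way to do that consolidation, assuming each task's results are written to a separate CSV (the file pattern and column schema below are hypothetical; mirror the sample submission from the official FinanceRAG code):

```python
# Sketch: merge per-task result files into one consolidated submission.
# The "results_*.csv" pattern and the column layout are hypothetical;
# match them to the official sample submission.
import glob
import pandas as pd

frames = [pd.read_csv(path) for path in sorted(glob.glob("results_*.csv"))]
submission = pd.concat(frames, ignore_index=True)
submission.to_csv("submission.csv", index=False)
```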
Description
In the FinanceRAG Challenge, you are tasked with building a Retrieval-Augmented Generation (RAG) system capable of accurately retrieving relevant contexts from financial documents and generating precise answers to specific queries. The task is divided into two main steps: retrieval and generation. However, only Task 1 [Retrieval] is within the scope of this challenge and will be evaluated for scoring and ranking; Task 2 [Generation] will not be evaluated. Participants need only complete the retrieval step: identifying the most relevant contexts from a large collection of financial documents to help answer each question. The primary objective of Task 1 is to maximize retrieval accuracy, ensuring that the system identifies the most relevant and correct contexts from the financial datasets provided.
Task 1 [Retrieval]: This will be the main task for evaluation.
In Task 1, you are required to retrieve the most relevant document chunks (contexts) for a given query from a large document corpus. The process begins by transforming the query into an embedding, which is then used to search a pre-indexed vector database of document embeddings. You need to implement an effective retrieval and re-ranking system that prioritizes the contexts most similar to the query.
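As one way to realize the pre-indexed vector database described above, here is a minimal FAISS sketch; FAISS is an assumption on our part, and the challenge does not mandate any particular index:

```python
# Sketch: pre-indexing document embeddings with FAISS, one possible
# vector database choice among many.
import numpy as np
import faiss
from sentence_transformers import SentenceTransformer

docs = [
    "Revenue grew 12% year over year, driven by services.",
    "Operating margin contracted due to higher input costs.",
]
model = SentenceTransformer("all-MiniLM-L6-v2")
doc_emb = model.encode(docs, normalize_embeddings=True)

# With normalized embeddings, inner product equals cosine similarity.
index = faiss.IndexFlatIP(doc_emb.shape[1])
index.add(np.asarray(doc_emb, dtype="float32"))

query_emb = model.encode(["How fast did revenue grow?"], normalize_embeddings=True)
scores, ids = index.search(np.asarray(query_emb, dtype="float32"), 2)
print(ids[0], scores[0])  # top-2 document indices and similarities
```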
Goal: The goal of Task 1 is to maximize the retrieval accuracy (nDCG@10) by ranking the most relevant contexts in response to each query. Evaluation is based on how well the system ranks the retrieved contexts compared to the ground truth.
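To sanity-check a ranking locally, nDCG@10 can be computed directly from its definition. This is only a sketch; the leaderboard's official implementation may differ in details such as how graded relevance is handled:

```python
import math

def ndcg_at_k(ranked_ids, relevance, k=10):
    """nDCG@k for a single query.
    ranked_ids: doc ids in predicted order.
    relevance: dict mapping doc id -> graded relevance (0 if absent)."""
    dcg = sum(relevance.get(d, 0) / math.log2(i + 2)
              for i, d in enumerate(ranked_ids[:k]))
    ideal = sorted(relevance.values(), reverse=True)[:k]
    idcg = sum(r / math.log2(i + 2) for i, r in enumerate(ideal))
    return dcg / idcg if idcg > 0 else 0.0

# Example: relevant doc ranked 1st, another relevant doc ranked 3rd.
print(ndcg_at_k(["d1", "d9", "d2"], {"d1": 2, "d2": 1}))  # ~0.95
```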
Task 2 [Generation]: Not part of the official evaluation criteria.
In Task 2, you move beyond retrieval and are tasked with generating an answer to a query using the retrieved contexts. This step involves feeding both the query and the retrieved contexts into a large language model (LLM) to produce the final output. The system must extract the correct information from the retrieved chunks and generate a precise and concise response.
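A model-agnostic sketch of that step is to assemble the query and retrieved contexts into a prompt for whichever LLM you choose; the template below is an illustration, not a prescribed format:

```python
# Sketch: assemble query + retrieved contexts into an LLM prompt.
# The template is illustrative; any LLM and prompt format may be used.
def build_prompt(query: str, contexts: list[str], max_contexts: int = 5) -> str:
    numbered = "\n\n".join(
        f"[{i + 1}] {c}" for i, c in enumerate(contexts[:max_contexts])
    )
    return (
        "Answer the financial question using only the contexts below.\n\n"
        f"Contexts:\n{numbered}\n\n"
        f"Question: {query}\nAnswer:"
    )

prompt = build_prompt("What drove the margin decline?",
                      ["Operating margin fell 150 bps on higher input costs ..."])
print(prompt)
```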
Goal: The goal of Task 2 is to ensure the quality of the generated answers by evaluating both their relevance to the query and their correctness, compared to the ground-truth answers.
Citation
Chanyeol Choi, Jy-Yong Sohn, Yongjae Lee, Subeen Pang, Jaeseon Ha, Hoyeon Ryoo, Yongjin Kim, Hojun Choi, Jihoon Kwon. (2024). ACM-ICAIF '24 FinanceRAG Challenge. Kaggle. https://kaggle.com/competitions/icaif-24-finance-rag-challenge
Contact
For further details and guidelines, please contact the workshop organizers via the Kaggle website.