Abstract:
The application of Large Language Models (LLMs) has transformed the processing and analysis of textual information, opening new possibilities for understanding the vast
amount of news published on the Internet every day. This makes it both feasible and pressing to analyze media stories and to understand how their content
differs by region, source, and historical context.
However, given the sheer volume of available information and the variety of media outlets, tools are needed to automate and systematize the processing of large
news collections. Detecting bias, coverage shifts, or perception shifts demands
techniques that integrate information retrieval with natural language generation.
The project’s main objective is to develop an LLM-based news analysis system for studying
media narrative trends in coverage of the Russia-Ukraine war. To this end, it adopts a
Retrieval-Augmented Generation (RAG) architecture that combines LangChain, ChromaDB,
and Ollama with the Gemma 7B model, supported by Streamlit and Newspaper3k.
The system lets users ask custom questions about media coverage and receive answers
grounded in the relevant retrieved fragments, with the option to filter by source, language,
or region. Customizable queries and a detailed view of the retrieved fragments support
in-depth, flexible analysis.
Lastly, the project’s findings are summarized together with directions for future work,
including the incorporation of new datasets and the extension of the analysis to future events.