Contact

Prof. Dr. Philipp Schaer

Institut für Informationswissenschaft (IWS)

  • Phone: +49 221-8275-3845

External Project Partner

Dr. Meik Bittkowski
Science Media Center Germany
Head of software development and data science

  • Phone: +49 221-8888-25-142

JoIE - Journalistic Information Extraction

The project Journalistic Information Extraction (JoIE) addresses the problem of extracting information from unstructured sources that are relevant to (data) journalism. Building on two state-of-the-art tools, Workbench and Fonduer, the project will develop a solution that can handle a variety of web data sources and make them usable for journalism by converting them into a structured form.

Data journalism is a new journalistic discipline that focuses on data-driven research and presentation formats. A fundamental problem for data journalism, and for traditional journalism as well, is that much data of journalistic interest is available only in unstructured form: as text, tables, and graphics in documents of various types (Word, PDF, e-mail, etc.) or on websites.

JoIE tackles this problem of extracting information from unstructured sources relevant to (data) journalism. Based on the two state-of-the-art tools Workbench and Fonduer, the project will develop a solution that handles the data sources mentioned above and makes them usable for journalism by converting them into a structured and thus analyzable form. Workbench is a web-based platform for preparing and analyzing data that, among other things, supports the extraction of web data. Fonduer is a toolkit that uses recent machine-learning methods to learn extraction patterns automatically, for example to recognize tables. The two applicants, the Science Media Center Germany (SMC) and Professor Schaer's working group at TH Köln - University of Applied Sciences, have already collaborated successfully on information extraction and bring the corresponding experience and expertise.

During a 36-month research and development phase, JoIE will develop a software system as part of a PhD project. The system will integrate the two components, Workbench and Fonduer, and will provide an interface based on the principles of Learnable Programming. This system is intended to address the information extraction problem that is central to (data) journalism. The system design will be based on data journalists' requirements, gathered in a requirements engineering phase, and will be examined in user tests and evaluations. The work combines theoretical approaches from Human-Computer Interaction (HCI) with practical implementations derived from real-world applications, and thereby addresses a gap in current research.

At a Glance

Research project: JoIE - Journalistic Information Extraction
Administration: Prof. Dr. Philipp Schaer
Faculty: Faculty of Information Science and Communication Studies
Institute: Institute of Information Management
Partners: Science Media Center Germany
Sponsors: Klaus-Tschira-Stiftung
Duration: 36 months
