KIT-Bibliothek

Project 01: Scholars in the loop? Bridging the gap between distinctive knowledge of small disciplines and training data for AI

Authors

Germaine Götzelmann, Danah Tonne, Charlotte Debus

Editors

Uwe Ehret, Martin Frank, KIT-Zentrum MathSEE

Participating institutes

KIT-Zentrum Mathematik in den Natur-, Ingenieur- und Wirtschaftswissenschaften (KIT-Zentrum MathSEE)
Scientific Computing Center (SCC)
Institut für Wasser und Gewässerentwicklung (IWG)

Genre

Event

Description

01 Scholars in the loop? Bridging the gap between distinctive knowledge of small disciplines and training data for AI
MATH PI: Dr. Danah Tonne, Steinbuch Centre for Computing (SCC), Data Exploitation Methods (SCC-DEM)
SEE PI: Dr. Charlotte Debus, Steinbuch Centre for Computing (SCC), Junior Research Group Robust and Efficient AI (SCC-RAI); Germaine Götzelmann, Steinbuch Centre for Computing (SCC), Data Exploitation Methods (SCC-DEM)
Department(s): Informatics (Computer Science)
Type of position: 75% FTE, E13 TV-L
Recently, critical assessments of AI have pointed out that machine learning approaches act as a magnifying glass on social biases, revealing critical blind spots and imbalances in the underlying data labels. Small research disciplines, on the other hand, are especially likely to deal with edge cases and to fill in blind spots regarding human culture and knowledge. They are important correction factors for addressing and ultimately adjusting cultural biases, skewed views and information gaps in a postcolonial world, and they provide much-needed scholarly information about areas prone to misinformation, fake news and pseudoscience. To substantiate their findings by evaluating large data collections, the research field of Digital Humanities has introduced new methodological approaches such as Artificial Intelligence (AI).
In small research disciplines, scholarly digital annotation has gained traction in recent years, capturing vast amounts of unique knowledge in tedious manual or semiautomatic processes. However, the exchange formats and publications of results mainly address human knowledge sharing and cannot be directly transformed into training data. Machine learning projects therefore quite often have to start from scratch with regard to data labeling, making their training data potentially shallow and sparse.
This makes the research process both unsustainable and less likely to succeed. State-of-the-art solutions to these data labeling challenges involve activating a broader community with diversified world views through crowdsourcing concepts as well as interactive labeling (IL) and ‘human in the loop’ (HITL) approaches. However, one important puzzle piece for resolving the stated challenges is making the in-depth knowledge already produced by research projects usable for data labeling purposes.
The goal of this dissertation project is to identify the main obstacles and to develop solutions that facilitate the flow of research data from more ‘traditional’ projects into much-needed high-quality training data. The focus lies on selected use cases from small (humanities) research fields with close connections to domain experts, e.g. philological, medieval, or religious studies, which offer the possibility to adequately survey their needs and constraints.
Requirements for this position:
- Solid background in either Computer Science or related fields like Mathematics, Physics, Electrical Engineering or Digital Humanities
- Experience in software development and basic programming languages, e.g. Python, C/C++
- Prior experience in data science and machine learning, and with corresponding software frameworks (e.g. PyTorch), is advantageous
- High interest in interdisciplinary research

Duration (hh:mm:ss)

00:04:41

Series

KCDS Virtual Open House 2023 - Fall

Published on

19.10.2023

Subject area

Computer science

License

KITopen-Lizenz

Resolution 1430 x 720 pixels
Aspect ratio 143:72
Audio bitrate 65411 bps
Audio channels 1
Audio codec aac
Audio sample rate 32000 Hz
Total bitrate 443768 bps
Container mov,mp4,m4a,3gp,3g2,mj2
Duration 281.344000 s
File name DIVA-2023-260_mp4.mp4
File size 15,606,462 bytes
Frame rate 25
Video bitrate 372164 bps
Video codec h264
