Project 01: Scholars in the loop? Bridging the gap between distinctive knowledge of small disciplines and training data for AI
Authors
Germaine Götzelmann, Danah Tonne, Charlotte Debus
Editors
Uwe Ehret, Martin Frank, KIT-Zentrum MathSEE
Participating institutes
KIT-Zentrum Mathematik in den Natur-, Ingenieur- und Wirtschaftswissenschaften (KIT-Zentrum MathSEE)
Scientific Computing Center (SCC)
Institut für Wasser und Gewässerentwicklung (IWG)
Description
01 Scholars in the loop? Bridging the gap between distinctive knowledge of small disciplines and training data for AI
MATH PI: Dr. Danah Tonne, Steinbuch Centre for Computing (SCC), Data Exploitation Methods (SCC-DEM)
SEE PI: Dr. Charlotte Debus, Steinbuch Centre for Computing (SCC), Junior Research Group Robust and Efficient AI (SCC-RAI); Germaine Götzelmann, Steinbuch Centre for Computing (SCC), Data Exploitation Methods (SCC-DEM)
Department(s): Informatics (Computer Science)
Type of position: 75% FTE, E13 TV-L
Recent critical assessments of AI have pointed out that machine learning approaches act as a magnifying glass on social biases, revealing critical blind spots and imbalances in the underlying data labels. Small research disciplines, on the other hand, are especially likely to deal with edge cases and to fill in blind spots regarding human culture and knowledge. They are important correctives that address and ultimately adjust cultural biases, skewed views and information gaps in a postcolonial world, and they provide much-needed scholarly information about areas prone to misinformation, fake news and pseudoscience. To substantiate their findings by evaluating large data collections, the field of Digital Humanities has introduced new methodological approaches such as Artificial Intelligence (AI).
In small research disciplines, scholarly digital annotation has gained traction in recent years, capturing vast amounts of unique knowledge in tedious manual or semi-automatic processes. But the exchange formats and published results mainly serve knowledge sharing between humans and cannot be directly transformed into training data. Machine learning projects therefore often have to start from scratch with data labeling, making their training data potentially shallow and sparse.
This makes the research process both unsustainable and less likely to succeed. State-of-the-art solutions to these data labeling challenges involve engaging a broader community with diverse world views through crowdsourcing concepts, as well as interactive labeling (IL) and ‘human in the loop’ (HITL) approaches. However, one important piece of the puzzle is still missing: making the in-depth knowledge already produced by research projects usable for data labeling purposes.
The goal of this dissertation project is to identify the main obstacles and to develop solutions that ease the flow of research data from more ‘traditional’ projects into much-needed, high-quality training data. The focus lies on selected use cases from small (humanities) research fields with close connections to domain experts, e.g. philological, medieval, or religious studies, which offer the possibility to adequately survey their needs and constraints.
Requirements for this position:
- Solid background in either Computer Science or related fields like Mathematics, Physics, Electrical Engineering or Digital Humanities
- Software development skills and basic knowledge of a programming language, e.g. Python or C/C++
- Prior experience in data science and machine learning, and with corresponding software frameworks (e.g. PyTorch), is advantageous
- High interest in interdisciplinary research
Duration (hh:mm:ss)
00:04:41
Series
KCDS Virtual Open House 2023 - Fall
Published on
19.10.2023
Resolution | 1430 x 720 pixels |
Aspect ratio | 143:72 |
Audio bitrate | 65411 bps |
Audio channels | 1 |
Audio codec | aac |
Audio sample rate | 32000 Hz |
Overall bitrate | 443768 bps |
Container | mov,mp4,m4a,3gp,3g2,mj2 |
Duration | 281.344000 s |
File name | DIVA-2023-260_mp4.mp4 |
File size | 15,606,462 bytes |
Frame rate | 25 |
Video bitrate | 372164 bps |
Video codec | h264 |