Samia Touileb
Position
Associate Professor, Natural Language Processing
Research
I am an Associate Professor in Natural Language Processing (NLP). I am also the co-leader of the NLP work package at the Research Centre for Responsible Media Technology & Innovation. Prior to this I was a researcher in MediaFutures on Norwegian Language Technologies, and a Postdoc at the Language Technology Group (LTG), Department of Informatics, at the University of Oslo. I have a PhD in NLP from the University of Bergen, and have worked on both NLP research and its applications.
My main research interests are bias and fairness in NLP, alignment, information extraction, summarization, and applications of NLP and machine learning methods to tasks within social science research. I work mainly on under- and mid-resourced languages such as Norwegian.
Outreach
- Keynote at the AI Regulation and Governance: A Cross-Jurisdictional Approach conference, June 2024. Title of my keynote: Questioning the Machine: Decisions, Bias, and Fairness in AI.
- Invited speaker at the Western Norway Film Fund Industry day (Vestnorske bransjedagar), May 2024. Title of my talk: Etiske problemstillingar rundt bruken av AI.
- Invited talk at KPMG, Bergen, May 2024. Title of the session: Kurs: Etikk, sikkerhet og fremtiden med AI. Title of my talk: Etikk og AI.
- Invited speaker at the Nordic Media Days (Nordiske mediedager), May 2024. Title of my talk: ChatGPT, Copilot og Bard – hvordan fungerer store språkmodeller?.
- Invited speaker at the Bergen Chamber of Commerce and Industry seminar, February 2024. Title of the seminar: Realiser AI potensialet på arbeidsplassen - Kan Copilot utløse mulighetene?. I gave a presentation entitled Begrensninger og etiske betraktinger rundt språkmodeller, and I was a member of a panel discussion. The seminars are aimed at members from all companies and industries in the Bergen region.
- Invited speaker at Metis high school, Bergen, February 2024. Invited to give a talk at their Internet safety day. I gave a presentation about AI and large language models, their capabilities and weaknesses.
- Speaker at Lærernes dag 2024, January 2024. Invited to give a talk at this year’s lecturers’ day at the University of Bergen. This yearly event brings together teachers from the Bergen area to promote research. The title of my presentation was: Læring i språkmodellenes tid.
- Invited speaker at the Norwegian Association for the protection of Industrial Property, November 2023. Title of the seminar: Ettermiddagsseminar om kunstig intelligens og immaterialrett. The seminar was organised by Opphavsrettsforeningen (Norsk Forening for Opphavsrett) and the Norwegian Association for the protection of Industrial Property. Title of my talk: Hva er kunstig intelligens i stand til å skape og hva kan vi vente oss i nær fremtid?.
- Interviewed for an article on forskning.no, October 2023. Interviewed by Øystein Rygg Haanæs for an article published on the research platform forskning.no. Title: Språkteknologi på villspor: Drømmer kvinner om å bli voldtatt?.
- Invited talk at Sampol-Konferansen, October 2023. The Comparative Politics Conference, University of Bergen. Title: Big Science: Gullgruve eller fallgruve?.
- Invited talk at The Norwegian Academy of Science and Letters, September 2023. Det Norske Videnskaps-Akademi. Title: Store språkmodeller: muligheter og utfordringer.
- Interviewed for TV 2’s impact on society report, October 2023. Interviewed by Øystein Rygg Haanæs for TV 2’s impact on society report, which celebrates their 30 years as independent news providers in Norway. Title: Kunstig intelligens krever åpnehet og integritet.
- Op-ed piece in Morgenbladet, July 2023. Title: Chat GPT egner seg dårlig til eksamenssensuren. Authors: Pierre Lison (senior researcher at Norwegian ) and Samia Touileb.
- Invited talk at a European Broadcasting Union (EBU) webinar, June 2023. Theme of the webinar: LLM Benchmarking Strategies. Title of my talk: Benchmarking the societal and ethical implications of large language models.
- Invited talk at the Norwegian School of Economics (NHH), June 2023. Title: Demystifying ChatGPT and language models.
- Invited talk at the Department of Mathematics, University of Bergen, May 2023. Title: Large Language models: What are they, and what are their ethical implications?.
- Invited speaker at Future Week, June 2023. Title of the session: Innovation in the newsroom.
- Invited panellist at Eilerts salong, organised by the University of Agder, May 2023. Title: Blir vi overflødige? En samtale om kunstig intelligens og utdanning.
- Invited talk at norske dataforeningen, May 2023. Title: Sosiale og etiske utfordringer med språkmodeller som ChatGPT.
- Invited to a podcast episode – Nordepodden, May 2023. Title of the episode: Når kunstig intelligens får ordet i sin makt.
- Op-ed piece in Medier 24, April 2023. Title: KI-dyret må mates med varsomhet. Authors: Samia Touileb and Per Christian Magnus (Leader of the center of investigative journalism SUJO). Check this for a short English summary.
- Invited to a podcast episode – Abels tårn, March 2023. The episode was a small section of a forthcoming AI series. I talked about NLP and pre-trained language models, and discussed some of the ethical and societal issues.
- Invited talk at the University of Stavanger, March 2023. Title: Når kunstig intelligens inntar redaksjonen, organised by Media City Bergen.
- Invited talk at the Department of Archaeology, History, Cultural Studies and Religion, UiB, March 2023. Title: ChatGPT: teknologien, datasettet, og det vi (ikke) vet.
- Invited speaker at its learning webinar, March 2023. Title of the webinar: ChatGPT and AI in education.
- Invited to a podcast episode – lektorlomsdalen podcast, February 2023. Title of the episode: ChatGPT og etiske perspektiver.
- Invited speaker at Vestland Fylkeskommune seminar about teaching, February 2023. Title: ChatGPT: teknologien, datasettet, og det vi (ikke) vet.
- Invited speaker at the lecturers’ conference 2023 (University of Bergen), February 2023. Invited talk about large language models, their potential and drawbacks for use in education. Solstrand, Bergen.
- Invited speaker and panellist at NORA ground-breaking seminar series, February 2023. Title of the talk: The Societal and Ethical Implications of Language Models. Title of the panel discussion: The Ethics of Large Language Models.
- Invited speaker and panellist at UiB AI seminar series, February 2023. Title of the seminar: ChatGPT – trussel eller mulighet i forskning og utdanning?.
- Invited talk at Wolftech, February 2023. Title: Measuring harmful and toxic representations in Scandinavian language models.
- Invited speaker at ForskningsdageneUNG 2022, September 2022. I was invited to give an inspirational talk about my research to high school students during the research days, at the University of Bergen. I gave a presentation about NLP, and tried to motivate future students to seek a degree within informatics, NLP, or the broad field of AI.
- Invited guest to a podcast episode, April 2022. I was invited as a guest to the podcast UiB Popviten. The podcast is the University of Bergen’s popular science podcast. The theme of the episode was the problematic aspects of black box machine learning models and how they can contain various types of biases and cause harmful outcomes when deployed. The podcast is in Norwegian, and can be listened to here or here.
- Speaker at UiB AI seminar series, April 2022. Title: But, why? - make AI answer!. I presented and discussed interpretability of neural models.
- Panel member (invited), March 2022. I was invited to be a member of a panel discussion on "AI: friend, foe, or fad" at the annual Booster conference in Bergen. Booster is a software conference organized for developers, project managers, architects, UX professionals, testers, and security professionals. The panel was led by Professor Marija Slavkovik, University of Bergen. The other panelists were Kevin Baum (University of Saarland) and Martin Gundersen (NRK – Norwegian Broadcasting Corporation). More information can be found here.
Teaching
- INFO371 – Research Topics in Networks and Text Analysis (Spring 2022, Spring 2024, Spring 2025). Master’s course. Focus on Natural Language Processing, machine learning, and deep learning. Department of Information Science and Media Studies. University of Bergen.
- DIGI117 – Natural Language Processing (Autumn 2023, Spring 2024, Autumn 2024, Spring 2025). New course I have developed for the University of Bergen’s Digital understanding, knowledge and competence (DIGI) course package. Department of Information Science and Media Studies. University of Bergen.
- DIGI114 – Basic introduction to artificial intelligence (Spring 2025). Department of Information Science and Media Studies. University of Bergen.
- Invited guest lecturer at the Norwegian School of Economics (NHH – Autumn 2023). Lecture for Executive MBA students about large language models, how they are currently used, how they can be used, and their ethical and societal impacts. The Norwegian School of Economics, NHH, Bergen.
- AIKI100 – Examen facultatum - Introduction to Artificial Intelligence (Autumns 2021, 2022, 2023, 2024). Bachelor course. I gave an introductory lecture about Natural Language Processing. Department of Information Science and Media Studies. University of Bergen.
- Invited guest lecturer MA8701 – Advanced statistical methods in inference and learning (15.03.2021). PhD course in statistics. Theme of the lecture: Analysing text with neural networks. Department of Mathematical Sciences. NTNU (Norwegian University of Science and Technology).
- IN1140 – Introduction to Language Technology (Autumns 2017, 2018, 2019, and 2020). Bachelor course. Department of Informatics. University of Oslo.
- Guest lecturer IN2110 – Methods in Language Technology (19.02.2019). Bachelor course. Theme of the lecture: lexical semantics and word vectors. Department of Informatics. University of Oslo.
- INFO134 – Client programming (CSS3, HTML5, and JavaScript) (Spring 2017). Bachelor course. Department of Information Science and Media Studies. University of Bergen.
- INFO103 – Information and Knowledge (Spring 2015). Bachelor course. Department of Information Science and Media Studies. University of Bergen.
Publications
Academic article
- Mahmood, Bilal; Elahi, Mehdi; Vadiee, Farhad et al. (2025). A Supervised Machine Learning Approach for Supporting Editorial Article Selection.
- Mahmood, Bilal; Elahi, Mehdi; Steskal, Lubos et al. (2024). Can Large Language Models Support Editors Pick Related News Articles?.
- Blum, Sophie; Koudijs, Raoul; Ozaki, Ana et al. (2023). Learning Horn envelopes via queries from language models.
- Touileb, Samia; Steskal, Lubos (2016). ADIOS LDA: When Grammar Induction Meets Topic Modeling.
- Salway, Andrew; Touileb, Samia (2014). Applying grammar induction to text mining.
- Salway, Andrew; Touileb, Samia; Tvinnereim, Endre (2014). Inducing Information Structures for Data-driven Text Analysis.
Academic anthology/Conference proceedings
- Habash, Nizar; Bouamor, Houda; Eskander, Ramy et al. (2024). Proceedings of The Second Arabic Natural Language Processing Conference.
- Galimullin, Rustam; Touileb, Samia (2023). Proceedings of the 5th Symposium of the Norwegian AI Society (NAIS 2023).
- Habash, Nizar; Bouamor, Houda; Hajj, Hazem et al. (2021). Proceedings of the Sixth Arabic Natural Language Processing Workshop.
Poster
- Mahmood, Bilal; Elahi, Mehdi; Vadiee, Farhad et al. (2024). A Supervised Machine Learning Approach for Supporting Editorial Article Selection.
- Touileb, Samia; Øvrelid, Lilja; Velldal, Erik (2021). Using Gender- and Polarity-informed Models to Investigate Bias.
- Touileb, Samia; Pedersen, Truls Andre; Sjøvaag, Helle (2018). Automatically identifying names of unrecognized politicians.
- Touileb, Samia; Steskal, Lubos (2015). A computational approach to organize and analyze online communication data.
- Salway, Andrew; Hofland, Knut; Touileb, Samia (2013). Applying Corpus Techniques to Climate Change Blogs.
Academic chapter/article/Conference paper
- Skulstad, Aud Solbjørg; Touileb, Samia (2024). Large Language Models and their usage in EAL education.
- Fares, Murhaf; Touileb, Samia (2024). BabelBot at AraFinNLP2024: Fine-tuning T5 for Multi-dialect Intent Detection with Synthetic Data and Model Ensembling.
- Touileb, Samia; Murstad, Jeanett; Mæhlum, Petter et al. (2024). EDEN: A Dataset for Event Detection in Norwegian News.
- Simon, Étienne; Olsen, Helene Bøsei; You, Huiling et al. (2024). Generative Approaches to Event Extraction: Survey and Outlook.
- Olsen, Helene Bøsei; Touileb, Samia; Velldal, Erik (2023). Arabic dialect identification: An in-depth error analysis on the MADAR parallel corpus.
- Touileb, Samia; Øvrelid, Lilja; Velldal, Erik (2023). Measuring normative and descriptive biases in language models using census data.
- Samuel, David; Kutuzov, Andrei; Touileb, Samia et al. (2023). NorBench – A Benchmark for Norwegian Language Models.
- Barnes, Jeremy Claude; Touileb, Samia; Mæhlum, Petter et al. (2023). Identifying Token-Level Dialectal Features in Social Media.
- You, Huiling; Touileb, Samia; Øvrelid, Lilja (2023). JSEEGraph: Joint Structured Event Extraction as Graph Parsing.
- Sheikhi, Ghazaal; Opdahl, Andreas Lothe; Touileb, Samia et al. (2023). Making sense of nonsense: Integrated gradient-based input reduction to improve recall for check-worthy claim detection.
- Sheikhi, Ghazaal; Touileb, Samia; Khan, Sohail Ahmed (2023). Automated Claim Detection for Fact-checking: A Case Study using Norwegian Pre-trained Language Models.
- You, Huiling; Samuel, David; Touileb, Samia et al. (2022). EventGraph: Event Extraction as Semantic Graph Parsing.
- You, Huiling; Samuel, David; Touileb, Samia et al. (2022). EventGraph at CASE 2021 Task 1: A General Graph-based Approach to Protest Event Extraction.
- Touileb, Samia; Nozza, Debora (2022). Measuring Harmful Representations in Scandinavian Language Models.
- Touileb, Samia (2022). Exploring the Effects of Negation and Grammatical Tense on Bias Probes.
- Touileb, Samia (2022). NERDz: A Preliminary Dataset of Named Entities for Algerian.
- Mæhlum, Petter; Kåsen, Andre; Touileb, Samia et al. (2022). Annotating Norwegian language varieties on Twitter for Part-of-speech.
- Kutuzov, Andrei; Touileb, Samia; Mæhlum, Petter et al. (2022). NorDiaChange: Diachronic Semantic Change Dataset for Norwegian.
- Touileb, Samia; Øvrelid, Lilja; Velldal, Erik (2022). Occupational Biases in Norwegian and Multilingual Language Models.
- Touileb, Samia; Barnes, Jeremy (2021). The interplay between language similarity and script on a novel multi-layer Algerian dialect corpus.
- Touileb, Samia; Øvrelid, Lilja; Velldal, Erik (2021). Using Gender- and Polarity-Informed Models to Investigate Bias.
- Barnes, Jeremy; Mæhlum, Petter; Touileb, Samia (2021). NorDial: A Preliminary Corpus of Written Norwegian Dialect Use.
- Touileb, Samia (2020). LTG-ST at NADI Shared Task 1: Arabic Dialect Identification using a Stacking Classifier.
- Touileb, Samia; Øvrelid, Lilja; Velldal, Erik (2020). Gender and sentiment, critics and authors: a dataset of Norwegian book reviews.
- Lison, Pierre; Barnes, Jeremy; Hubin, Aliaksandr et al. (2020). Named Entity Recognition without Labelled Data: A Weak Supervision Approach.
- Adouane, Wafia; Touileb, Samia; Bernardy, Jean-Philippe (2020). Identifying Sentiments in Algerian Code-switched User-generated Comments.
- Rodina, Julia; Bakshandaeva, Daria; Fomin, Vadim et al. (2019). Measuring Diachronic Evolution of Evaluative Adjectives with Word Embeddings: the Case for English, Norwegian, and Russian.
- Barnes, Jeremy Claude; Touileb, Samia; Øvrelid, Lilja et al. (2019). Lexicon information in neural sentiment analysis: a multi-task learning approach.
- Touileb, Samia; Pedersen, Truls Andre; Sjøvaag, Helle (2018). Automatic identification of unknown names with specific roles.
- Velldal, Erik; Øvrelid, Lilja; Bergem, Eivind Alexander et al. (2018). NoReC: The Norwegian Review Corpus.
- Touileb, Samia; Salway, Andrew (2014). Constructions: a new unit of analysis for corpus-based discourse analysis.
Lecture
- Touileb, Samia (2023). Benchmarking the societal and ethical implications of large language models.
- Touileb, Samia (2023). Sosiale og etiske utfordringer med språkmodeller som ChatGPT.
- Touileb, Samia (2023). The Societal and Ethical Implications of Language Models.
- Touileb, Samia (2023). ChatGPT: teknologien, datasettet, og det vi (ikke) vet.
- Touileb, Samia; Fahlvik, Morten; Berg, John Arthur (2023). ChatGPT & AI in education.
- Touileb, Samia (2023). ChatGPT: teknologien, datasettet, og det vi (ikke) vet.
- Touileb, Samia; Schjøll, Anita; Throndsen, Eivind et al. (2023). The Ethics of Large Language Models.
- Touileb, Samia; Åkernes, Hanne Louise (2023). Når kunstig intelligens inntar redaksjonen.
- Touileb, Samia (2023). Demystifying ChatGPT and language models.
- Touileb, Samia; Lemaire, Pauline Marguerite (2023). Big Science: Gullgruve eller fallgruve?.
- Touileb, Samia; Duarte, Katherine (2016). Getting to know large newsflows: Automatically induced information structures as keyphrases for news content analysis.
- Touileb, Samia; Elgesem, Dag; Steskal, Lubos (2012). Networks of texts and people.
Popular scientific lecture
- Touileb, Samia (2023). Store språkmodeller: muligheter og utfordringer.
- Goodwin, Morten; Touileb, Samia; Bøhn, Einar Duenger (2023). Blir vi overflødige? En samtale om kunstig intelligens og utdanning.
- Touileb, Samia (2023). Hva er ChatGPT og hvordan fungerer det og lignende verktøy?.
- Touileb, Samia (2023). Sosiale og etiske utfordringer med språkmodeller.
Academic lecture
- Touileb, Samia (2023). Large Language models: What are they, and what are their ethical implications?.
- Sjøvaag, Helle; Pedersen, Truls Andre; Touileb, Samia (2018). Operationalising Diversity for Big Data Policy Research.
- Pedersen, Truls Andre; Touileb, Samia; Sjøvaag, Helle (2017). Finding Voices in the Margins: Computer-Assisted Discovery of Naturally Belonging Names.
- Iversen, Magnus Hoem; Pedersen, Truls Andre; Stavelin, Eirik et al. (2015). Computer supported deliberation and argumentation online. Proposing a system for online argumentation.
- Touileb, Samia (2013). Inducing local grammars from n-grams.
Projects
OPINION COST action: https://www.cost.eu/actions/CA21129/
MediaFutures: https://mediafutures.no/2021/01/20/postdoc-samia-touileb/
NorDial: https://github.com/jerbarnes/nordial