Samia Touileb
Position
Associate Professor, Language Technology
Affiliation
Research groups
Short info
Research
I am an associate professor in language technology (Natural Language Processing). Before this, I was a researcher at MediaFutures (WP5: Norwegian language technology) and a postdoctoral fellow at the Language Technology Group (LTG), Department of Informatics, University of Oslo. I hold a PhD in language technology from the University of Bergen.
My main research interests include bias and fairness in language technology models, information extraction, automatic summarisation, and applications of language technology and machine learning methods in social science research.
Outreach
- Keynote at the AI Regulation and Governance: A Cross-Jurisdictional Approach conference, June 2024. Title of my keynote: Questioning the Machine: Decisions, Bias, and Fairness in AI.
- Invited speaker at the Western Norway Film Fund Industry day (Vestnorske bransjedagar), May 2024. Title of my talk: Etiske problemstillingar rundt bruken av AI.
- Invited talk at KPMG, Bergen, May 2024. Title of the session: Kurs: Etikk, sikkerhet og fremtiden med AI. Title of my talk: Etikk og AI.
- Invited speaker at the Nordic Media Days (Nordiske mediedager), May 2024. Title of my talk: ChatGPT, Copilot og Bard – hvordan fungerer store språkmodeller?.
- Invited speaker at the Bergen Chamber of Commerce and Industry seminar, February 2024. Title of the seminar: Realiser AI potensialet på arbeidsplassen - Kan Copilot utløse mulighetene?. I gave a presentation entitled Begrensninger og etiske betraktinger rundt språkmodeller, and I took part in a panel discussion. The seminars are aimed at members from all companies and industries in the Bergen region.
- Invited speaker at Metis high school, Bergen, February 2024. I gave a talk at their Internet safety day about AI and large language models, their capabilities and weaknesses.
- Speaker at Lærernes dag 2024, January 2024. Invited to give a talk at this year’s teachers’ day at the University of Bergen. This annual event promotes research to teachers from the Bergen area. The title of my presentation was: Læring i språkmodellenes tid.
- Invited speaker at the Norwegian Association for the Protection of Industrial Property, November 2023. Title of the seminar: Ettermiddagsseminar om kunstig intelligens og immaterialrett. The seminar was organised by Opphavsrettsforeningen (Norsk Forening for Opphavsrett) and the Norwegian Association for the Protection of Industrial Property. Title of my talk: Hva er kunstig intelligens i stand til å skape og hva kan vi vente oss i nær fremtid?.
- Interviewed for an article on forskning.no, October 2023. Interviewed by Øystein Rygg Haanæs for an article published on the research platform forskning.no. Title: Språkteknologi på villspor: Drømmer kvinner om å bli voldtatt?.
- Invited talk at Sampol-Konferansen, October 2023. The Comparative Politics Conference, University of Bergen. Title: Big Science: Gullgruve eller fallgruve?.
- Invited talk at The Norwegian Academy of Science and Letters (Det Norske Videnskaps-Akademi), September 2023. Title: Store språkmodeller: muligheter og utfordringer.
- Interviewed for TV 2’s impact on society report, October 2023. Interviewed by Øystein Rygg Haanæs for TV 2’s impact on society report, which celebrates their 30 years as an independent news provider in Norway. Title: Kunstig intelligens krever åpenhet og integritet.
- Op-ed piece in Morgenbladet, July 2023. Title: Chat GPT egner seg dårlig til eksamenssensuren. Authors: Pierre Lison (senior researcher at Norwegian ) and Samia Touileb.
- Invited talk at a European Broadcasting Union (EBU) webinar, June 2023. Theme of the webinar: LLM Benchmarking Strategies. Title of my talk: Benchmarking the societal and ethical implications of large language models.
- Invited talk at the Norwegian School of Economics (NHH), June 2023. Title: Demystifying ChatGPT and language models.
- Invited talk at the Department of Mathematics, University of Bergen, May 2023. Title: Large Language models: What are they, and what are their ethical implications?.
- Invited speaker at Future Week, June 2023. Title of the session: Innovation in the newsroom.
- Invited panellist at Eilerts salong, organised by the University of Agder, May 2023. Title: Blir vi overflødige? En samtale om kunstig intelligens og utdanning.
- Invited talk at Den Norske Dataforening (the Norwegian Computer Society), May 2023. Title: Sosiale og etiske utfordringer med språkmodeller som ChatGPT.
- Invited to a podcast episode – Nordepodden, May 2023. Title of the episode: Når kunstig intelligens får ordet i sin makt.
- Op-ed piece in Medier 24, April 2023. Title: KI-dyret må mates med varsomhet. Authors: Samia Touileb and Per Christian Magnus (leader of the Centre for Investigative Journalism, SUJO). A short English summary is also available.
- Invited to a podcast episode – Abels tårn, March 2023. The episode was a small section of a forthcoming AI series. I talked about NLP and pre-trained language models, and discussed some of the ethical and societal issues.
- Invited talk at the University of Stavanger, March 2023. Title: Når kunstig intelligens inntar redaksjonen, organised by Media City Bergen.
- Invited talk at the Department of Archaeology, History, Cultural Studies and Religion, UiB, March 2023. Title: ChatGPT: teknologien, datasettet, og det vi (ikke) vet.
- Invited speaker at an itslearning webinar, March 2023. Title of the webinar: ChatGPT and AI in education.
- Invited to a podcast episode – lektorlomsdalen podcast, February 2023. Title of the episode: ChatGPT og etiske perspektiver.
- Invited speaker at a Vestland Fylkeskommune seminar about teaching, February 2023. Title: ChatGPT: teknologien, datasettet, og det vi (ikke) vet.
- Invited speaker at the lecturers’ conference 2023 (University of Bergen), February 2023. Invited talk about large language models, their potential and drawbacks for use in education. Solstrand, Bergen.
- Invited speaker and panellist at NORA ground-breaking seminar series, February 2023. Title of the talk: The Societal and Ethical Implications of Language Models. Title of the panel discussion: The Ethics of Large Language Models.
- Invited speaker and panellist at UiB AI seminar series, February 2023. Title of the seminar: ChatGPT – trussel eller mulighet i forskning og utdanning?.
- Invited talk at Wolftech, February 2023. Title: Measuring harmful and toxic representations in Scandinavian language models.
- Invited speaker at ForskningsdageneUNG 2022, September 2022. I was invited to give an inspirational talk about my research to high school students during the research days, at the University of Bergen. I gave a presentation about NLP, and tried to motivate future students to seek a degree within informatics, NLP, or the broad field of AI.
- Invited guest to a podcast episode, April 2022. I was invited as a guest to UiB Popviten, the University of Bergen’s popular science podcast. The theme of the episode was the problematic aspects of black box machine learning models and how they can contain various types of biases and cause harmful outcomes when deployed. The podcast is in Norwegian.
- Speaker at UiB AI seminar series, April 2022. Title: But, why? - make AI answer!. I presented and discussed interpretability of neural models.
- Panel member (invited), March 2022. I was invited to be a member of a panel discussion on "AI: friend, foe, or fad" at the annual Booster conference in Bergen. Booster is a software conference organized for developers, project managers, architects, UX professionals, testers, and security professionals. The panel was led by Professor Marija Slavkovik, University of Bergen. The other panelists were Kevin Baum (University of Saarland) and Martin Gundersen (NRK – Norwegian Broadcasting Corporation).
Teaching
- INFO371 – Research Topics in Networks and Text Analysis (Spring 2022, Spring 2024, Spring 2025). Master’s course. Focus on Natural Language Processing, machine learning, and deep learning. Department of Information Science and Media Studies. University of Bergen.
- DIGI117 – Natural Language Processing (Autumn 2023, Spring 2024, Autumn 2024, Spring 2025). New course I have developed for the University of Bergen’s Digital understanding, knowledge and competence (DIGI) course package. Department of Information Science and Media Studies. University of Bergen.
- DIGI114 – Basic introduction to artificial intelligence (Spring 2025). Department of Information Science and Media Studies. University of Bergen.
- Invited guest lecturer at the Norwegian School of Economics (NHH – Autumn 2023). Lecture for Executive MBA students about large language models, how they are currently used, how they can be used, and their ethical and societal impacts. The Norwegian School of Economics, NHH, Bergen.
- AIKI100 – Examen facultatum - Introduction to Artificial Intelligence (Autumns 2021, 2022, 2023, 2024). Bachelor course. I gave an introductory lecture about Natural Language Processing. Department of Information Science and Media Studies. University of Bergen.
- Invited guest lecturer MA8701 – Advanced statistical methods in inference and learning (15.03.2021). PhD course in statistics. Theme of the lecture: Analysing text with neural networks. Department of Mathematical Sciences. NTNU (Norwegian University of Science and Technology).
- IN1140 – Introduction to Language Technology (Autumns 2017, 2018, 2019, and 2020). Bachelor course. Department of Informatics. University of Oslo.
- Guest lecturer IN2110 – Methods in Language Technology (19.02.2019). Bachelor course. Theme of the lecture: lexical semantics and word vectors. Department of Informatics. University of Oslo.
- INFO134 – Client programming (CSS3, HTML5, and JavaScript) (Spring 2017). Bachelor course. Department of Information Science and Media Studies. University of Bergen.
- INFO103 – Information and Knowledge (Spring 2015). Bachelor course. Department of Information Science and Media Studies. University of Bergen.
Publications
Academic article
- Mahmood, Bilal; Elahi, Mehdi; Vadiee, Farhad et al. (2025). A Supervised Machine Learning Approach for Supporting Editorial Article Selection.
- Mahmood, Bilal; Elahi, Mehdi; Steskal, Lubos et al. (2024). Can Large Language Models Support Editors Pick Related News Articles?.
- Blum, Sophie; Koudijs, Raoul; Ozaki, Ana et al. (2023). Learning Horn envelopes via queries from language models.
- Touileb, Samia; Steskal, Lubos (2016). ADIOS LDA: When Grammar Induction Meets Topic Modeling.
- Salway, Andrew; Touileb, Samia (2014). Applying grammar induction to text mining.
- Salway, Andrew; Touileb, Samia; Tvinnereim, Endre (2014). Inducing Information Structures for Data-driven Text Analysis.
Academic anthology/Conference proceedings
- Habash, Nizar; Bouamor, Houda; Eskander, Ramy et al. (2024). Proceedings of The Second Arabic Natural Language Processing Conference.
- Galimullin, Rustam; Touileb, Samia (2023). Proceedings of the 5th Symposium of the Norwegian AI Society (NAIS 2023).
- Habash, Nizar; Bouamor, Houda; Hajj, Hazem et al. (2021). Proceedings of the Sixth Arabic Natural Language Processing Workshop.
Poster
- Mahmood, Bilal; Elahi, Mehdi; Vadiee, Farhad et al. (2024). A Supervised Machine Learning Approach for Supporting Editorial Article Selection.
- Touileb, Samia; Øvrelid, Lilja; Velldal, Erik (2021). Using Gender- and Polarity-informed Models to Investigate Bias.
- Touileb, Samia; Pedersen, Truls Andre; Sjøvaag, Helle (2018). Automatically identifying names of unrecognized politicians.
- Touileb, Samia; Steskal, Lubos (2015). A computational approach to organize and analyze online communication data.
- Salway, Andrew; Hofland, Knut; Touileb, Samia (2013). Applying Corpus Techniques to Climate Change Blogs.
Academic chapter/article/conference paper
- Simon, Étienne; Olsen, Helene Bøsei; You, Huiling et al. (2024). Generative Approaches to Event Extraction: Survey and Outlook.
- Skulstad, Aud Solbjørg; Touileb, Samia (2024). Large Language Models and their usage in EAL education.
- Fares, Murhaf; Touileb, Samia (2024). BabelBot at AraFinNLP2024: Fine-tuning T5 for Multi-dialect Intent Detection with Synthetic Data and Model Ensembling.
- Touileb, Samia; Murstad, Jeanett; Mæhlum, Petter et al. (2024). EDEN: A Dataset for Event Detection in Norwegian News.
- Olsen, Helene Bøsei; Touileb, Samia; Velldal, Erik (2023). Arabic dialect identification: An in-depth error analysis on the MADAR parallel corpus.
- Touileb, Samia; Øvrelid, Lilja; Velldal, Erik (2023). Measuring normative and descriptive biases in language models using census data.
- Samuel, David; Kutuzov, Andrei; Touileb, Samia et al. (2023). NorBench – A Benchmark for Norwegian Language Models.
- Barnes, Jeremy Claude; Touileb, Samia; Mæhlum, Petter et al. (2023). Identifying Token-Level Dialectal Features in Social Media.
- You, Huiling; Touileb, Samia; Øvrelid, Lilja (2023). JSEEGraph: Joint Structured Event Extraction as Graph Parsing.
- Sheikhi, Ghazaal; Opdahl, Andreas Lothe; Touileb, Samia et al. (2023). Making sense of nonsense: Integrated gradient-based input reduction to improve recall for check-worthy claim detection.
- Sheikhi, Ghazaal; Touileb, Samia; Khan, Sohail Ahmed (2023). Automated Claim Detection for Fact-checking: A Case Study using Norwegian Pre-trained Language Models.
- You, Huiling; Samuel, David; Touileb, Samia et al. (2022). EventGraph: Event Extraction as Semantic Graph Parsing.
- You, Huiling; Samuel, David; Touileb, Samia et al. (2022). EventGraph at CASE 2021 Task 1: A General Graph-based Approach to Protest Event Extraction.
- Touileb, Samia; Nozza, Debora (2022). Measuring Harmful Representations in Scandinavian Language Models.
- Touileb, Samia (2022). Exploring the Effects of Negation and Grammatical Tense on Bias Probes.
- Touileb, Samia (2022). NERDz: A Preliminary Dataset of Named Entities for Algerian.
- Mæhlum, Petter; Kåsen, Andre; Touileb, Samia et al. (2022). Annotating Norwegian language varieties on Twitter for Part-of-speech.
- Kutuzov, Andrei; Touileb, Samia; Mæhlum, Petter et al. (2022). NorDiaChange: Diachronic Semantic Change Dataset for Norwegian.
- Touileb, Samia; Øvrelid, Lilja; Velldal, Erik (2022). Occupational Biases in Norwegian and Multilingual Language Models.
- Touileb, Samia; Barnes, Jeremy (2021). The interplay between language similarity and script on a novel multi-layer Algerian dialect corpus.
- Touileb, Samia; Øvrelid, Lilja; Velldal, Erik (2021). Using Gender- and Polarity-Informed Models to Investigate Bias.
- Barnes, Jeremy; Mæhlum, Petter; Touileb, Samia (2021). NorDial: A Preliminary Corpus of Written Norwegian Dialect Use.
- Touileb, Samia (2020). LTG-ST at NADI Shared Task 1: Arabic Dialect Identification using a Stacking Classifier.
- Touileb, Samia; Øvrelid, Lilja; Velldal, Erik (2020). Gender and sentiment, critics and authors: a dataset of Norwegian book reviews.
- Lison, Pierre; Barnes, Jeremy; Hubin, Aliaksandr et al. (2020). Named Entity Recognition without Labelled Data: A Weak Supervision Approach.
- Adouane, Wafia; Touileb, Samia; Bernardy, Jean-Philippe (2020). Identifying Sentiments in Algerian Code-switched User-generated Comments.
- Rodina, Julia; Bakshandaeva, Daria; Fomin, Vadim et al. (2019). Measuring Diachronic Evolution of Evaluative Adjectives with Word Embeddings: the Case for English, Norwegian, and Russian.
- Barnes, Jeremy Claude; Touileb, Samia; Øvrelid, Lilja et al. (2019). Lexicon information in neural sentiment analysis: a multi-task learning approach.
- Touileb, Samia; Pedersen, Truls Andre; Sjøvaag, Helle (2018). Automatic identification of unknown names with specific roles.
- Velldal, Erik; Øvrelid, Lilja; Bergem, Eivind Alexander et al. (2018). NoReC: The Norwegian Review Corpus.
- Touileb, Samia; Salway, Andrew (2014). Constructions: a new unit of analysis for corpus-based discourse analysis.
Lecture
- Touileb, Samia (2023). Benchmarking the societal and ethical implications of large language models.
- Touileb, Samia (2023). Sosiale og etiske utfordringer med språkmodeller som ChatGPT.
- Touileb, Samia (2023). The Societal and Ethical Implications of Language Models.
- Touileb, Samia (2023). ChatGPT: teknologien, datasettet, og det vi (ikke) vet.
- Touileb, Samia; Fahlvik, Morten; Berg, John Arthur (2023). ChatGPT & AI in education.
- Touileb, Samia (2023). ChatGPT: teknologien, datasettet, og det vi (ikke) vet.
- Touileb, Samia; Schjøll, Anita; Throndsen, Eivind et al. (2023). The Ethics of Large Language Models.
- Touileb, Samia; Åkernes, Hanne Louise (2023). Når kunstig intelligens inntar redaksjonen.
- Touileb, Samia (2023). Demystifying ChatGPT and language models.
- Touileb, Samia; Lemaire, Pauline Marguerite (2023). Big Science: Gullgruve eller fallgruve?.
- Touileb, Samia; Duarte, Katherine (2016). Getting to know large newsflows: Automatically induced information structures as keyphrases for news content analysis.
- Touileb, Samia; Elgesem, Dag; Steskal, Lubos (2012). Networks of texts and people.
Popular science lecture
- Touileb, Samia (2023). Store språkmodeller: muligheter og utfordringer.
- Goodwin, Morten; Touileb, Samia; Bøhn, Einar Duenger (2023). Blir vi overflødige? En samtale om kunstig intelligens og utdanning.
- Touileb, Samia (2023). Hva er ChatGPT og hvordan fungerer det og lignende verktøy?.
- Touileb, Samia (2023). Sosiale og etiske utfordringer med språkmodeller.
Op-ed
Academic lecture
- Touileb, Samia (2023). Large Language models: What are they, and what are their ethical implications?.
- Sjøvaag, Helle; Pedersen, Truls Andre; Touileb, Samia (2018). Operationalising Diversity for Big Data Policy Research.
- Pedersen, Truls Andre; Touileb, Samia; Sjøvaag, Helle (2017). Finding Voices in the Margins: Computer-Assisted Discovery of Naturally Belonging Names.
- Iversen, Magnus Hoem; Pedersen, Truls Andre; Stavelin, Eirik et al. (2015). Computer supported deliberation and argumentation online. Proposing a system for online argumentation.
- Touileb, Samia (2013). Inducing local grammars from n-grams.
Doctoral dissertation
Projects
OPINION COST action: https://www.cost.eu/actions/CA21129/
MediaFutures: https://mediafutures.no/2021/01/20/postdoc-samia-touileb/
NorDial: https://github.com/jerbarnes/nordial