
Research across borders

Language resources are becoming more accessible for researchers throughout Europe.

ACCESS TO DATA: “This initiative is important for the humanities; it will give researchers access to a wide diversity of language databases,” says Professor Koenraad De Smedt.
Photo: Ingrid Endal


The CLARINO project, established in 2012, will soon be concluded, and researchers in many fields can look forward to a simpler and more efficient working day. Through a common search engine, they gain access to existing and future language resources in their own and other European countries.

“The resources are databases such as digital dictionaries, text corpora, speech recordings and literary and historical archives: everything to do with language, including video recordings of conversations in different languages and psychological experiments involving language,” explains Koenraad De Smedt.

Making research more effective

De Smedt is a professor of computational linguistics and has coordinated the project. The task has been to build the Norwegian part of CLARIN, a European digital infrastructure for language research.

“Research that needed data from different databases and scientific collections has until now been demanding in both time and resources. A common infrastructure for the databases makes the research more efficient: you no longer need to search through 100 catalogues; it is sufficient to search in one,” says De Smedt.

“Additionally, when Norwegian language databases are connected to the European databases, great opportunities for interesting comparative studies become available. You can, for example, compare the use of language both historically and across national borders,” he adds.

Asks new research questions

The partners in CLARINO have performed different tasks in the project. Some have developed the technical platforms; others have delivered content. Language researchers at UiB have made it easier to search the common catalogue.

“The work undertaken at UiB has been about standardising the data in the databases or, most important of all, the metadata, in other words the information about the data. We have standardised and catalogued data, and this is what ensures that the information in the databases can be compared and searched,” says De Smedt.

The most important thing about this work, and the major aim of the project, is that all this information is now accessible and can be used. At the same time, now that the metadata from different databases can be seen in new contexts, it will form a basis for totally new research questions.

“We do not develop resources; we contribute to the curation of the resources that exist. Curating the data well is an important part of the academic system.”

Important to the humanities

“To put it briefly, CLARIN is about preservation, re-use, accessibility and sharing of research data within the humanities,” De Smedt summarises.

Building the database is more or less complete; only the finishing touches remain. However, De Smedt underlines that this is just the first part of the project.

“The building is in place, but it must be operated and, in the same way as a library, it will never be completely finished. Also, the world has changed since the start of the project: new technical solutions have become available and new requirements for the use of data have been put forward. It is a continuous development, and we need a ‘service scientist’ to operate the infrastructure.”

“If we receive funding to further develop the infrastructure and ensure its operation, this will form the basis of leading international research in areas in which Norway already has a strong research environment,” says De Smedt.