Methodology - Toscana Open Research

ToscanaOpenResearch is built on a platform for integrating and accessing heterogeneous data, based on semantic technologies and standard interoperability formats, i.e. Linked Open Data (LOD).

The dashboard was built with the ONTOP system (ontop.inf.unibz.it/), an open-source tool that follows an ontology-based approach (Ontology-Based Data Access and Integration, OBDA/I). ONTOP works on top of relational databases and exposes a SPARQL endpoint; SPARQL is the language used to query the integrated datasets in RDF format.
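As a rough illustration of what querying such an endpoint involves, the sketch below builds a standard SPARQL 1.1 Protocol request and flattens a JSON result set. The endpoint URL is a placeholder and the query is deliberately generic (it counts instances per class), so nothing here should be read as the project's actual endpoint address or vocabulary.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint URL, used only for illustration.
ENDPOINT = "https://example.org/sparql"

# Generic query: count instances per class.
# It relies on no project-specific vocabulary.
QUERY = """
SELECT ?class (COUNT(?s) AS ?n)
WHERE { ?s a ?class }
GROUP BY ?class
ORDER BY DESC(?n)
LIMIT 10
"""


def build_request(endpoint: str, query: str) -> Request:
    """Build a SPARQL 1.1 Protocol GET request asking for JSON results."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


def rows_from_results(payload: str) -> list[dict[str, str]]:
    """Flatten SPARQL JSON results into a list of {variable: value} dicts."""
    data = json.loads(payload)
    return [
        {var: cell["value"] for var, cell in binding.items()}
        for binding in data["results"]["bindings"]
    ]


# To actually send the query (needs network access and a real endpoint):
#   from urllib.request import urlopen
#   with urlopen(build_request(ENDPOINT, QUERY)) as resp:
#       rows = rows_from_results(resp.read().decode("utf-8"))
```

Keeping request construction separate from result parsing makes both parts easy to check without contacting a server.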

Integrating data through a domain ontology lets users write queries in the vocabulary of the domain, without having to deal with the technical terminology of the physical database organization or its complex internal structure.

What makes the platform unique is its domain ontology: it is compliant with global (VIVO) and European (CERIF) standards, fully open, and adapted to the Italian context. The functional components of the system also keep monitoring and analysis constantly up to date: every time the original open data is updated, the system updates automatically.

The information system also supports a series of real-time benchmarks for mapping the regional ecosystem of skills and specializations in higher education, research and innovation.

To promote data interoperability and make it easier to extract and analyze information from different classification systems, ToscanaOpenResearch uses a classification based on three information levels:

Combining different national (e.g. National University Council – CUN, Scientific-Disciplinary Sector – SSD) and European (European Research Council – ERC, bibliometric areas) classifications;
Classifying information from project and publication abstracts (text mining);
Performing vertical analyses using semantic “vocabularies”.

In particular, combining national and European classifications makes it possible to relate information such as research staff (associated with the CUN classification) to the number of publications (classified by bibliometric areas) and the number of European projects (associated with the ERC classification).
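The idea of relating indicators that live in different classification systems can be sketched with a simple crosswalk table. Everything in the example is an assumption made for illustration: the codes, the shared area names and the counts are invented placeholders, and the project's real mapping between CUN, ERC and bibliometric areas is not reproduced here.

```python
from collections import defaultdict

# Hypothetical crosswalk from (scheme, code) to a shared thematic area.
# Codes and area names are placeholders, not the project's actual mapping.
CROSSWALK = {
    ("CUN", "cun-01"): "area-A",
    ("ERC", "erc-x1"): "area-A",
    ("BIBLIO", "bib-math"): "area-A",
    ("CUN", "cun-05"): "area-B",
    ("ERC", "erc-y2"): "area-B",
    ("BIBLIO", "bib-bio"): "area-B",
}

# Invented counts, each expressed in its own classification scheme.
staff = [("CUN", "cun-01", 40), ("CUN", "cun-05", 25)]
publications = [("BIBLIO", "bib-math", 310), ("BIBLIO", "bib-bio", 190)]
projects = [("ERC", "erc-x1", 4), ("ERC", "erc-y2", 7)]


def combine(**indicators):
    """Aggregate each indicator by shared area using the crosswalk."""
    table = defaultdict(dict)
    for name, records in indicators.items():
        for scheme, code, count in records:
            area = CROSSWALK.get((scheme, code))
            if area is None:
                continue  # unmapped codes are skipped in this sketch
            table[area][name] = table[area].get(name, 0) + count
    return dict(table)


summary = combine(staff=staff, publications=publications, projects=projects)
for area, row in sorted(summary.items()):
    print(area, row)
```

Once every record is expressed in the shared areas, staff, publications and projects can be read side by side for the same area.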

More details are available in the user manual.

For comments or feedback, please write to staff@toscanaopenresearch.it.

The development of ToscanaOpenResearch is the result of a process led by the Region of Tuscany, supported by IRPET, FST and a technical partner, Siris Academic.

To date, the main open data integrated are the following:

Data from national, European and global open databases;
A series of additional data integrated thanks to various memoranda of understanding (e.g. MIUR, for the integration of CTN/PRIN 2012 data in the Tuscan perimeter; AlmaLaurea data in the Tuscan perimeter; data provided by some research institutions based in Tuscany, such as CNR, INFN, INGV, INAF);
For publications, the system uses non-open bibliometric databases, but it is already prepared for integration with CINECA-IRIS data.

Focus – The semantic analysis of the “research portfolio”

Summaries of publications, patent descriptions, R&I project objectives, etc. contain a wealth of textual information detailing current challenges, proposed or demonstrated progress and the expected impact of the innovation process.

New Natural Language Processing (NLP) techniques can now be used to exploit this semantic richness and characterize research portfolios in support of strategic decision making. Semantic approaches are powerful tools for mapping scientific and technological fields because they make it possible to:

analyze each document individually, avoiding potential confusion related to taxonomy;
construct ad hoc semantic perimeters of the fields of interest, cutting across taxonomies, to allow the cross analysis of several data sources at the same time;
systematically analyze documents in customized geographical perimeters, thus allowing benchmarking and related specialization analysis.
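To make the benchmarking and specialization idea concrete, here is a minimal sketch of one common indicator: the ratio between a field's share of the documents in a region and its share in a benchmark perimeter. The source does not state which indicator ToscanaOpenResearch uses, so this ratio, the function name and the sample field labels are all assumptions.

```python
from collections import Counter


def specialization_index(region_docs, benchmark_docs):
    """Return {field: regional share / benchmark share}.

    Each argument is a list with one field label per document.
    A value above 1 means the field is over-represented in the region
    relative to the benchmark perimeter.
    """
    if not region_docs or not benchmark_docs:
        raise ValueError("both perimeters need at least one document")
    region = Counter(region_docs)
    bench = Counter(benchmark_docs)
    r_total = sum(region.values())
    b_total = sum(bench.values())
    index = {}
    for field, b_count in bench.items():
        r_share = region.get(field, 0) / r_total
        b_share = b_count / b_total
        index[field] = r_share / b_share
    return index


# Invented example: 10 regional documents against 100 benchmark documents.
region_docs = ["energy"] * 6 + ["health"] * 4
benchmark_docs = ["energy"] * 30 + ["health"] * 70

index = specialization_index(region_docs, benchmark_docs)
for field, value in sorted(index.items()):
    print(f"{field}: {value:.2f}")
```

Because the field label is attached to each document, the same function works for any geographical perimeter that can be expressed as a list of documents.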

These types of analysis can be either “horizontal”, i.e. without a predefined thematic focus, or “vertical”, i.e. targeted at a specific topic of interest. More precisely, horizontal analysis relies on topic modelling, a technique for extracting research topics and characterizing research portfolios, while vertical analysis relies on developing and applying controlled vocabularies to analyze research in a specific area of interest (e.g. the Sustainable Development Goals, SDGs, or Cultural Heritage). Both techniques are used in ToscanaOpenResearch, and a methodology has been developed to build controlled vocabularies rapidly and effectively from an initial set of relevant terms.
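A vertical analysis of this kind can be sketched in a few lines: tag each abstract with the controlled-vocabulary terms it contains, then propose new candidate terms that co-occur with the seed terms. The co-occurrence heuristic, the function names, the seed terms and the sample abstracts are assumptions chosen for illustration; the methodology actually developed for ToscanaOpenResearch is not described in this text and may work differently.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep alphabetic words only."""
    return re.findall(r"[a-z]+", text.lower())


def matches(text, vocabulary):
    """Return the vocabulary terms found in the text as whole words or phrases."""
    normalized = " ".join(tokenize(text))
    return {
        term
        for term in vocabulary
        if re.search(r"\b" + re.escape(term) + r"\b", normalized)
    }


def suggest_terms(abstracts, seeds, stopwords, top_n=3):
    """Rank words that co-occur with seed terms in the same abstract.

    Illustrative heuristic only: a human curator would review the
    candidates before adding any of them to the vocabulary.
    """
    seed_words = {word for seed in seeds for word in seed.split()}
    counts = Counter()
    for text in abstracts:
        if matches(text, seeds):
            counts.update(set(tokenize(text)) - stopwords - seed_words)
    return [word for word, _ in counts.most_common(top_n)]


# Invented sample data.
seeds = {"cultural heritage", "museum"}
stopwords = {"of", "in", "a", "for"}
abstracts = [
    "Restoration of frescoes in a cultural heritage site",
    "Digital archive supporting museum restoration workflows",
    "Battery chemistry for grid storage",
]

for text in abstracts:
    print(sorted(matches(text, seeds)), "<-", text)
print(suggest_terms(abstracts, seeds, stopwords, top_n=1))
```

In an iterative workflow, accepted candidates would be added to the seed set and the suggestion step run again until the vocabulary stabilizes.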