Member feature Archives - DBpedia Association
https://www.dbpedia.org/member-feature/

LOD activities at the National Archives of the Netherlands
https://www.dbpedia.org/blog/lod-activities-at-the-national-archives-of-the-netherlands/ (Tue, 14 Feb 2023)

By Ed de Heer

About the National Archives

This article describes the Linked Open Data (LOD) activities of the National Archives of the Netherlands and is based on my presentation at Semantics 2022 in Vienna.

At the National Archives people find information about their lives, Dutch (political/administrative) history and society. Our mission is: “we serve every person’s right to information, and we offer insight into the history of our country.” The National Archives believes in the power of open data. We want to offer as much open data as possible, not only to the government and historians but also to third parties that develop new applications and websites. In this way the general public can participate, and new ways of disclosing heritage information can arise. We publish our data (archives, indexes, and photographs) under a CC0 license as CSV, XML and via APIs. Below is an overview of our overall collection and services.

Linked open data

We have been working on the development of Linked Open Data since 2018, when we started our first LOD experiments and bought an ETL tool to transform our data into RDF. In 2019 we developed a URI strategy and started to model the indexes. We have indexes about enslaved people and slavery, fish rights, emigrants and finance, etc., so we had to develop all kinds of LOD models and use different ontologies. We have now finished the publication of our 400,000 digitized pictures with a CC0 license as RDF through our SPARQL endpoint (Beta): https://www.nationaalarchief.nl/onderzoeken/linked-open-data/sparql-interface
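The endpoint can be explored with standard SPARQL. The sketch below is illustrative only: the class and property names are assumptions, and the endpoint's own documentation describes the actual data model.

```sparql
# Illustrative only -- Photograph typing and naming property are assumptions;
# consult the endpoint documentation for the actual vocabulary.
PREFIX schema: <http://schema.org/>

SELECT ?photo ?title WHERE {
  ?photo a schema:Photograph ;
         schema:name ?title .
}
LIMIT 10
```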

Challenges of linked open data

When transforming to RDF we faced some challenges, for instance the challenge of data quality. We don’t improve the quality of our data: if we wanted to curate it, we would have to check the original archives, which would take a lot of effort. And what is right or wrong? When a particular archive speaks of “Amsteredam” instead of “Amsterdam”, the record states “Amsteredam” and not Amsterdam, because that is the original spelling in the archive. Also, within an organization such as the National Archives, a lot of stakeholders are involved: IT, Collection, Services, and management. It takes a lot of time and effort to get all the priorities straight.

The Verkaufsbücher

One of our most successful LOD projects is the Verkaufsbücher. This is an administration kept by the Nazis during World War II in which they recorded the expropriation of Jewish properties in the Netherlands. These houses were “bought” from their Jewish owners far below the real price, and the owners were often deported shortly afterwards. The National Archives wanted to visualize this story and this data. We worked with the Dutch land registry (Kadaster) and developed a data story: https://labs.kadaster.nl/stories/verkaufsbucher/index.html. The data story was noticed by a Dutch broadcasting company, which aired an item about it on national television. The broadcast triggered a lot of exposure and the attention of Dutch government agencies and municipalities. Due to this story, local governments have started to investigate what happened with these properties during and directly after the war, and some municipalities are going to compensate the victims or their next of kin.

Digital Heritage Network and the Dataset Register

All these LOD developments don’t thrive on their own. Working together with other institutions and professionals is vital. The Dutch Digital Heritage Network is a partnership of cultural heritage agencies in the Netherlands. It focuses on developing a system of national facilities and services for improving the visibility, usability, and sustainability of digital heritage based on linked open data. The network is open to all Dutch institutions and organizations in the digital heritage field.

The Dataset Register is an initiative of the Digital Heritage Network, and the National Archives hosts it. The register provides insight into the availability of datasets in the heritage field and thus stimulates their use. It makes it easier to publish information about heritage datasets. By analyzing the datasets we can build a knowledge graph on heritage data for better use, and the Dataset Register can help software (such as Google’s crawlers) find collections.

Heritage institutions are encouraged to make their datasets available, to describe them, to publish them online, and to submit the URLs of the dataset descriptions to the Dataset Register. The Dataset Register retrieves the dataset descriptions, creating an overall picture of what is available. See also https://datasetregister.netwerkdigitaalerfgoed.nl/?lang=en
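As a rough illustration of what such harvested descriptions enable, a query along the following lines could list registered datasets and their publishers. This assumes schema.org Dataset descriptions; the exact property usage in the live register data may differ.

```sparql
# Illustrative sketch over harvested dataset descriptions (schema.org assumed).
PREFIX schema: <http://schema.org/>

SELECT ?dataset ?name ?publisherName WHERE {
  ?dataset a schema:Dataset ;
           schema:name ?name ;
           schema:publisher/schema:name ?publisherName .
}
LIMIT 25
```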

Drs. Ed de Heer MIM is advisor and project manager for Linked and Open Data at the National Archives and administrator for the Dataset Register. 

Can Machine Translation be a Reasonable Alternative for Multilingual Question Answering Systems over Knowledge Graphs?
https://www.dbpedia.org/blog/can-machine-translation-be-a-reasonable-alternative-for-multilingual-question-answering-systems-over-knowledge-graphs/ (Tue, 20 Sep 2022)

Spoiler alert: yes it can!

TLDR

Providing access to information is the main and most important purpose of the Web. Despite the availability of easy-to-use tools (e.g., search engines, question answering), accessibility is typically limited by the user’s ability to use the English language.

In this work, we evaluate Knowledge Graph Question Answering (KGQA) systems that aim at providing natural-language access to data stored in Knowledge Graphs (KG). What makes this work special is that we look at questions in multiple languages. In particular, we compare the answer quality achieved with native language support to the quality obtained when integrating machine translation (MT) tools. The evaluation results demonstrate that monolingual KGQA systems can be effectively ported to most of the considered languages with MT tools.

The Problem

According to recent statistics, only 25.9% of online users speak English. At the same time, 61.2% of web content is published in English. Therefore, users who cannot speak or read English have only limited access to the information provided on the Web. Even if the exact figures are disputed, a clear gap in information accessibility exists on the Web. This gap is referred to as the digital language divide.

Nowadays, whenever a user queries a search engine they expect to receive a direct answer. The direct-answer functionality is based on Question Answering (QA) methods and driven by knowledge graphs. In 2012, Google presented the Google Knowledge Graph, which powers its direct answers. For example, when one asks Google “Who wrote Harry Potter?”, the following structured result is expected (see figure below).

Google search result for query “Who wrote Harry Potter?” (screenshot by author)

These direct answers are more readable and satisfy the user’s information need right away, so that they don’t need to open each “relevant” web page returned by the search engine and manually search it for the requested information (as was necessary before). But if we ask the same very simple query in a low-resource language, e.g., Bashkir (a Turkic language with ca. 1.4 million speakers), the results are not really satisfying (see figure below).

Google search result for query “Who wrote Harry Potter?” in Bashkir language (screenshot by author)

Thus, there is an obvious limitation of knowledge accessibility on the Web, visible to people who do not speak any “popular” language at a sufficient level. Therefore, in this article we address this problem in the context of Question Answering over Knowledge Graphs (KGQA), following the question: “Are automatic machine translation tools usable to increase the accessibility of the knowledge on the Web for non-English speakers?”
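For reference, the structured answer behind such a direct answer corresponds to a query over a knowledge graph. A minimal sketch over DBpedia might look as follows; the resource and property choices are illustrative of how a KGQA system could formalise the question.

```sparql
# "Who wrote Harry Potter?" expressed against DBpedia
PREFIX dbr: <http://dbpedia.org/resource/>
PREFIX dbo: <http://dbpedia.org/ontology/>

SELECT DISTINCT ?author WHERE {
  dbr:Harry_Potter dbo:author ?author .
}
```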

Approach

To answer the research question above, we conducted a large evaluation study. During our experiments, we follow a component-oriented methodology, reusing off-the-shelf QA components provided by the Qanary framework [1] to gain insights on the defined topic. As our goal is to evaluate KGQA systems that are adapted to an unsupported language via machine translation (MT) tools, we need:

1. A set of multilingual KGQA systems supporting particular languages.

2. A set of high-quality questions written by native speakers in different languages (where two groups are required: one that is supported and one that is not supported by existing KGQA systems).

3. A set of MT tools that are able to translate questions given in unsupported languages into the supported ones.

For the first point, we used the following well-known systems: QAnswer [2], DeepPavlov KBQA [3], and Platypus [4]. For the second point, we used our QALD-9-Plus dataset [5] that has high-quality questions in English, German, Russian, French, Ukrainian, Belarusian, Lithuanian, Bashkir, and Armenian languages and the corresponding answers represented as SPARQL queries over DBpedia and Wikidata knowledge graphs. For the third point, we used Yandex Translate (commercial MT tool) and Opus MT models by Helsinki NLP (open source MT tool) [6]. The evaluation was done using the GERBIL platform [7]. Hence, the following experimental setup was built (see figure below).

Overview of the experimental setup (figure by author)

Results

From the experimental results, we clearly observe a strong dominance of English as the target translation language. In the majority of the experiments, translating a source language into English gave the best QA quality results (e.g., German → English, Ukrainian → English).

Experimental values for German and Ukrainian languages as source. The native questions are highlighted with bold text. The star (*) corresponds to the highest quality target language. The best values regarding the system and metric are color-coded with green (table by author).

In the very first case, where English was the source language, the best QA quality was achieved on the original (native) questions, since applying machine translation on top only decreased the QA quality (see table below).

Experimental values for English language as source. The native questions are highlighted with bold text. The star (*) corresponds to the highest quality target language. The best values regarding the system and metric are color-coded with green (table created by author).

Only in the case where Lithuanian was the source language did the best target language regarding QA quality turn out to be German (i.e., Lithuanian → German), while English also demonstrated reasonable quality (i.e., Lithuanian → English). Although the experiment was carefully designed, we consider this case an outlier. Nevertheless, such outliers might have a significant impact when improving the answer quality of QA systems.

Experimental values for Lithuanian language as source. The native questions are highlighted with bold text. The star (*) corresponds to the highest quality target language. The best values regarding the system and metric are color-coded with green (table created by author).

Summary

Our main conclusion is that machine translation can be used effectively to establish multilingual KGQA systems for the majority of languages. It turns out that the best way of using MT tools is to simply translate the source language (e.g., German, Russian, etc.) into English, as this results in the highest question answering quality. Hence, even if the source and target languages are from the same group (e.g., Ukrainian → Russian, both Slavic), it is better, from a quality point of view, to translate into English. Despite our results and the resulting recommendation to extend QA systems with a machine translation component, we would like to point the research community to many open questions that might affect the answer quality. Therefore, we plan to extend our experiments with additional languages. We welcome any input to help us expand this research.

If you want to see more detailed results, please see our recent paper: Aleksandr Perevalov, Andreas Both, Dennis Diefenbach, and Axel-Cyrille Ngonga Ngomo. 2022. Can Machine Translation be a Reasonable Alternative for Multilingual Question Answering Systems over Knowledge Graphs? In Proceedings of the ACM Web Conference 2022 (WWW ’22). Association for Computing Machinery, New York, NY, USA, 977–986. https://dl.acm.org/doi/10.1145/3485447.3511940

The corresponding video presentation is available here:

Acknowledgments

I would like to thank the co-authors of this work, namely: Andreas Both, Dennis Diefenbach and Axel-Cyrille Ngonga Ngomo.

The authors of the paper would like to thank all the contributors involved in the translation of the dataset, specifically: Konstantin Smirnov, Mikhail Orzhenovskii, Andrey Ogurtsov, Narek Maloyan, Artem Erokhin, Mykhailo Nedodai, Aliaksei Yeuzrezau, Anton Zabolotsky, Artur Peshkov, Vitaliy Lyalin, Artem Lialikov, Gleb Skiba, Vladyslava Dordii, Polina Fominykh, Tim Schrader, Susanne Both, and Anna Schrader. Additionally, the authors would like to thank the Open Data Science community for connecting data science enthusiasts all over the world.

References

[1] Both, A., Diefenbach, D., Singh, K., Shekarpour, S., Cherix, D., & Lange, C. (2016, May). Qanary–a methodology for vocabulary-driven open question answering systems. In European Semantic Web Conference (pp. 625-641). Springer, Cham.

[2] Diefenbach, D., Migliatti, P. H., Qawasmeh, O., Lully, V., Singh, K., & Maret, P. (2019, May). QAnswer: a question answering prototype bridging the gap between a considerable part of the LOD cloud and end-users. In The World Wide Web Conference (pp. 3507-3510).

[3] Evseev, D. and Arkhipov, M.Y. (2020). SPARQL query generation for complex question answering with BERT and BiLSTM-based model. In: Computational Linguistics and Intellectual Technologies: Proceedings of the International Conference Dialogue (pp. 270–282).

[4] Pellissier Tanon, T., Assunção, M. D. D., Caron, E., & Suchanek, F. M. (2018, June). Demoing Platypus–A multilingual question answering platform for Wikidata. In European Semantic Web Conference (pp. 111-116). Springer, Cham.

[5] Perevalov, A., Diefenbach, D., Usbeck, R., & Both, A. (2022, January). QALD-9-plus: A Multilingual Dataset for Question Answering over DBpedia and Wikidata Translated by Native Speakers. In 2022 IEEE 16th International Conference on Semantic Computing (ICSC) (pp. 229-234). IEEE.

[6] Tiedemann, J., & Thottingal, S. (2020, November). OPUS-MT–Building open translation services for the World. In Proceedings of the 22nd Annual Conference of the European Association for Machine Translation. European Association for Machine Translation.

[7] Usbeck, R., Röder, M., Ngonga Ngomo, A. C., Baron, C., Both, A., Brümmer, M., … & Wesemann, L. (2015, May). GERBIL: general entity annotator benchmarking framework. In Proceedings of the 24th international conference on World Wide Web (pp. 1133-1143).

From voice to value using AI
https://www.dbpedia.org/blog/from-voice-to-value-using-ai/ (Tue, 19 Apr 2022)

DBpedia Member Feature – Over the last year we gave DBpedia members multiple chances to present their work, tools and applications. In this way, our members gave exclusive insights on the DBpedia blog. This time we will continue with Wallscope, who support organisational goals, improve existing processes and embed new technologies by generating the insights that power change. David Eccles presents the opportunities of digital audio. Have fun reading!

by David Eccles, Wallscope

Motivation

The use of digital audio has accelerated throughout the pandemic, creating a cultural shift in the use of audio form content within business and consumer communications.

Alongside this, the education and entertainment industries embraced semantic technologies as a means to develop sustainable delivery platforms under very difficult circumstances.

Wallscope’s research and development activities were already aligned with exploring speech-driven applications, and through this we engaged with Edinburgh University’s Creative Informatics department to explore practical use cases focusing on enhancing the content of podcasts.

Our focus now is on how the user experience can be enhanced with knowledge graph interaction, providing contextually relevant information to add value to the overall experience. As DBpedia provides the largest knowledge repository available, Wallscope embedded semantic queries against the service into the resulting workflow.

Speech to Linked Data

Speech-driven applications require a high level of accuracy and are notoriously difficult to develop, as anyone with experience of spoken dialog systems will probably be aware. A range of Natural Language Processing models are available which perform with a high degree of accuracy – particularly for basic tasks such as Named Entity Recognition – to recognise people, places, and organisations (spaCy and PyTorch are good examples of this). Obviously the tasks become more difficult to achieve when inherently complicated concepts are brought into the mix such as cultural references and emotional reactions.


To this end Wallscope re-deployed and trained a machine learning model called BERT. This stands for Bidirectional Encoder Representations from Transformers and it is a technique for NLP pre-training originally developed by Google.

BERT uses the mechanism of “paying attention” to better understand the contextual relationships between each word (or sub-words) within a sentence. Having previous experience deploying BERT models within the healthcare industry, we adapted and trained the model on a variety of podcast conversations.


As an example of how this works in practice, consider the phrase “It looked like a painting”. BERT looks at the word “it” and then checks its relationship with every other word in the sentence. This way, BERT can tell that “it” refers strongly to “painting”. This allows BERT to understand the context of each word within a given sentence. 


Simple process diagram

We then looked at how this could be used to better engage users across the podcast listening experience, and provide points of knowledge expansion, engagement and ‘socialisation’ of content in web-based environments. This in turn can create a richer and more meaningful experience for listeners that runs in parallel with podcasting platforms.

Working across multiple files containing podcast format audio, we looked at several areas of improvement for listeners, creators and researchers. Our primary aim was to demonstrate the value of semantic enhancements to the transcriptions.

We worked with these transcriptions across several processes to enhance them with Named Entity Recognition using our existing stack. From there we extended the analysis of ‘topics’ using a blend of machine learning models. That very quickly allowed us to gain a deep understanding of the relationships contained within the spoken word content. By visualising that, we could gain a deeper insight into the content and how it could be better presented, by reconciling it with references within DBpedia.

This analysis led us to ideate around an interface that was built around the timeline presented by the audio content.

Playback of audio with related terms

This allows the listener to gain contextually related insights by dynamically querying DBpedia for entities extracted from the podcast itself. This knowledge extension is valuable to enhance not only the listeners’ experience but also to provide a layer of ‘stickiness’ for the content across the internet as it enhances findability.

This shows how knowledge can be added to a page using DBpedia.
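The enrichment step boils down to looking up each linked entity in DBpedia at playback time. A minimal sketch of such a lookup is shown below, assuming the extracted mention has already been disambiguated to a DBpedia resource (Edinburgh is used here purely as an example).

```sparql
# Fetch a label and English abstract for an entity extracted from the audio.
PREFIX dbr:  <http://dbpedia.org/resource/>
PREFIX dbo:  <http://dbpedia.org/ontology/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?label ?abstract WHERE {
  dbr:Edinburgh rdfs:label ?label ;
                dbo:abstract ?abstract .
  FILTER (lang(?label) = "en" && lang(?abstract) = "en")
}
```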

One challenge is the quality of transcriptions. With digital speech recognition, there is never a 100% confidence level across unique audio recordings such as podcasts as well as within video production.

We are currently working with services which are increasingly harnessing AI technologies to not only improve the quality of transcription but also the insights which can be derived from spoken word data sources. A current area of research for Wallscope is how our ML models can be utilised to improve the curation layer of transcripts. This is important as keeping the human in the loop is critical to ensure the fidelity of any transcription process. By deploying the same techniques – albeit in reverse – there is an interesting opportunity to create dynamic ‘sense-checking’ models. While this is at an early stage, DBpedia undoubtedly will be an important part of that.

We are also developing some visualisation techniques to assist curators in identifying ‘errors’ and to provide suggestions for more robust topic classification models. This allows more generalised suggestions for labels: for example, while we may have a specific reference to ‘zombie’, presenting it as a subset of ‘horror’ has more value in categorisation systems. Another example relates to location: if we identify ‘France’ in a transcription with 100% certainty, then we can create greater certainty around ‘Paris’ being Paris, France as opposed to Paris, Texas. This also applies to machine learning-based summarisation techniques.
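That kind of contextual check can be phrased directly against DBpedia. The sketch below is not Wallscope's actual pipeline, and the property choice is illustrative; it simply asks whether the candidate resource is the Paris located in France.

```sparql
# Is the candidate resource the Paris located in France? (returns true/false)
PREFIX dbr: <http://dbpedia.org/resource/>
PREFIX dbo: <http://dbpedia.org/ontology/>

ASK { dbr:Paris dbo:country dbr:France . }
```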

Next steps

We are further exploring how these approaches can best assist in the exploration of archives as well as incorporating text analysis to improve the actual curation of archives.

Please contact Ian Allaway or David Eccles for more information, or visit www.wallscope.co.uk

Further reading on ‘Podcasting Exploration’

Data Virtualization: From Graphs to Tables and Back
https://www.dbpedia.org/blog/data-virtualization-from-graphs-to-tables-and-back/ (Wed, 26 Jan 2022)

Ontotext believes you should be able to connect your data with the knowledge graph regardless of where that data lives on the internet or what format it happens to be in. GraphDB’s data virtualization opens your graph to the wider semantic web and to relational databases.

DBpedia Member Feature – Over the last year we gave DBpedia members multiple chances to present their work, tools and applications. In this way, our members gave exclusive insights on the DBpedia blog. This time we will continue with Ontotext, who helps enterprises identify meaning across diverse datasets and massive amounts of unstructured information. Jarred McGinnis presents the beauty of data virtualization. Have fun reading!

by Jarred McGinnis, Ontotext

The beauty and power of knowledge graphs lie in their abstraction away from the fiddly implementation details of our data. The data and information are organized in a way human users understand, regardless of the physical location of the data, the format and other low-level technical details. This is because the RDF of the knowledge graph enables a schema-less, or schema-agnostic, approach to facilitate the integration of multiple heterogeneous datasets.

Semantic technology defines how data and information is inter-related. These relationships give context and that context is what gives our data meaning that can be understood by humans AND machines. That’s where the knowledge part of the graph comes from and it is a powerful way of providing a single view on disparate pieces of information.

ETL is Still Your Best Bet

When possible, it’s better to pay the initial costs of the ETL process. In a previous blog post, we talked about how knowledge graphs generously repay that investment of time and effort taken in data preparation tasks. However, there are a number of reasons why ETL may be impossible or impractical, such as the size of the dataset, or data that lives in a critical legacy system where an ETL process would create more problems than it fixes. In these cases, it is better to take a data virtualization approach.

Ontotext GraphDB provides data virtualization functionality to realize the benefits of real-time access to external data, remove or reduce the need for middleware for query processing, reduce support and development time, increase governance and reduce data storage requirements.

Firstly, There’s Federation.

RDF is the language of the semantic web. If you are working with Linked Data, it opens up a world of billions upon billions of factual statements about the world, which is probably why you chose to work with linked data in the first place. Nice work! And that means I don’t have to tell you that DBpedia, a single data set among hundreds, has three billion triples alone. You are no longer limited by the data your organization holds. Queries about internal data can be seamlessly integrated with multiple external data sources.

For example, suppose you want to query well-known people and their birth places for a map application. It’s possible to create a single query that gets the person’s information from DBpedia, which gives you the birthplace, and then uses those results to query another data source such as GeoNames for the geographic coordinates needed to place them on a map. Since both of these data sources are linked data, it’s relatively straightforward to write a SPARQL query that retrieves the information.
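A hedged sketch of the federation mechanism is shown below. It keeps things to a single SERVICE clause and uses the WGS84 coordinates that DBpedia itself exposes; in the scenario above, a second SERVICE clause (or a locally loaded GeoNames dataset) would supply the coordinates instead.

```sparql
# Run from a local repository: fetch writers, their birthplaces and the
# birthplaces' coordinates from the remote DBpedia endpoint.
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX geo: <http://www.w3.org/2003/01/geo/wgs84_pos#>

SELECT ?person ?birthPlace ?lat ?long WHERE {
  SERVICE <https://dbpedia.org/sparql> {
    ?person a dbo:Writer ;
            dbo:birthPlace ?birthPlace .
    ?birthPlace geo:lat ?lat ;
                geo:long ?long .
  }
}
LIMIT 20
```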

It doesn’t even have to be another instance of GraphDB, which is part of the reason Ontotext insists on using open standards. With any W3C-compliant knowledge graph that exposes a SPARQL endpoint, it is possible to retrieve the information you want and add it to your own knowledge graph to do with as you please. A single query could pull information from multiple external data sources to get the data you are after, which is why federation is an incredibly powerful tool to have.

The Business Intelligence Ecosystem Runs on SQL.

Ontotext is committed to lowering the costs of creating and consuming knowledge graphs. Not every app developer or DBA in an organization is going to have the time to work directly with RDF data models. A previous version, GraphDB 9.4, added a JDBC driver to ensure those who need to think and work in SQL can access the power of the knowledge graph with SQL.

Knowing the importance and prominence of SQL for many applications, we have a webinar demonstrating how GraphDB does SQL-to-SPARQL transformation and query optimization and how Microsoft’s Power BI and Tableau can be empowered by knowledge graphs. GraphDB provides a SQL interface to ensure those who prefer a SQL view of the world can have it.

Virtualization vs ETL

The most recent GraphDB release has added virtualization functionality beyond simple federation. It is now possible to create a virtual graph by mapping the columns and rows of a table to entities in the graph. It becomes possible to retrieve information from external relational databases and have it play nice with our knowledge graph. We aren’t bound by data that exists in our graph or even in RDF format. Of course it would be easier and certainly quicker to ETL a data source into a single graph and perform the query, but it is not always possible, because either the size of the dataset is too large, it gets updated too frequently or both.

For example, in the basement of your organization is a diesel-powered database that holds geological strata of decades-old data and that is critical to the organization. You and I both know that database is never going to be ETLed into the graph. Virtualization is your best bet: you create a virtual graph and map that decades-old format for client orders by saying, “When I query about ‘client order’, go to this table and this column in that behemoth server belching black smoke that’s run by Quasimodo and return the results”.
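Once such a mapping is in place, the legacy table can be queried as if it were RDF. The sketch below is purely hypothetical: the ex: vocabulary and property names are invented for illustration and would in practice come from your own mapping.

```sparql
# Query the virtual graph exposed over the legacy client-order table.
PREFIX ex: <http://example.org/schema#>

SELECT ?order ?client ?date WHERE {
  ?order a ex:ClientOrder ;
         ex:client ?client ;
         ex:orderDate ?date .
}
ORDER BY DESC(?date)
LIMIT 10
```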

There will be an inevitable hit to query performance, but there are a number of situations where slow is better than not at all, such as the over-egged example above. It is important to understand the trade-offs and practicalities between ETL and virtualization. The important thing for Ontotext is to make sure GraphDB is capable of both and provides a combined approach to maximize flexibility. There is also a webinar on this topic, introducing the open-source Ontop.

GraphDB Gives You Data Agility

Data virtualization and federation come with costs as well as benefits. There is no way we are ever going to control where the data you need exists and in what format; the days of centralized control are over. It’s about finding the technology that gives you agility, and GraphDB’s added virtualization capabilities enable you to create queries that include external open sources and merge the results seamlessly with your own knowledge graph. Virtualization of relational databases creates incredible opportunities for applications to provide users with a single coherent view on the complex and diverse reality of your data ecosystem.

Jarred McGinnis

Jarred McGinnis is a managing consultant in Semantic Technologies. Previously he was the Head of Research, Semantic Technologies, at the Press Association, investigating the role of technologies such as natural language processing and Linked Data in the news industry. Dr. McGinnis received his PhD in Informatics from the University of Edinburgh in 2006.

How Innovative Organizations Use The World’s Largest Knowledge Graph
https://www.dbpedia.org/blog/how-innovative-organizations-use-the-worlds-largest-knowledge-graph/ (Tue, 09 Nov 2021)

DBpedia Member Features – Over the last year we gave DBpedia members multiple chances to present their work, tools and applications. In this way, our members gave exclusive insights on the DBpedia blog. This time we will continue with Diffbot, a California-based company whose mission is to “extract knowledge in an automated way from documents.” They will introduce the Diffbot Knowledge Graph and present topics like Market Intelligence and Ecommerce. Have fun reading!

by Filipe Mesquita & Merrill Cook, Diffbot

Diffbot is on a mission to create a knowledge graph of the entire public web. We are teaching a robot, affectionately known as Diffy, to read the web like a human and translate its contents into a format that (other perhaps less sophisticated) machines can understand. All of this information is linked and cleaned on a continuous basis to populate the Diffbot Knowledge Graph.

The Diffbot Knowledge Graph already contains billions of entities, including over 240M organizations, 700M people, 140M products, and 1.6B news articles. This scale is only possible because Diffy is fully autonomous and doesn’t depend on humans to build the Diffbot Knowledge Graph. Using cutting-edge crawling technology, natural language processing, and computer vision, Diffy is able to read and extract facts from across the entire web.

While we believe a knowledge graph like Diffbot’s will be used by virtually every organization one day, there are 4 use cases where the Diffbot Knowledge Graph excels today: (1) Market Intelligence, (2) News Monitoring, (3) E-commerce, and (4) Machine learning.

Market Intelligence

Video: https://www.diffbot.com/assets/video/solutions-for-media-monitoring.mp4

At its simplest, market intelligence is the generation of insights about participants in a market. These can include customers, suppliers, competitors, as well as attitudes of the general public and political establishment.

While market intelligence data is all over the public web, this can be a “double-edged sword.” The range of potential sources for market intelligence data can exhaust the resources of even large teams performing manual fact accumulation.

Diffbot’s automated web data extraction eliminates the inefficiencies of manual fact gathering. Without such automation, it’s simply not possible to monitor everything about a company across the web.

We see market intelligence as one of the most well-developed use cases for the Diffbot Knowledge Graph. Here’s why:

  • The Diffbot Knowledge Graph is built around organizations, people, news articles, products, and the relationships among them. These are the types of facts that matter in market intelligence.
  • Knowledge graphs have flexible schemas, allowing new fact types to be added “on the fly” as the things we care about in the world change.
  • Knowledge graphs provide unique identifiers for all entities, supporting the disambiguation of entities like Apple (the company) vs. apple (the fruit).

Market intelligence uses from our customers include:

  • Querying the Knowledge Graph for companies that fit certain criteria (size, revenue, industry, location) rather than manually searching for them in Google
  • Creating dashboards to receive insights about companies in a certain industry
  • Improving an internal database by using the data from the Diffbot Knowledge Graph.
  • Custom solutions that incorporate multiple Diffbot products (custom web crawling, natural language processing, and Knowledge Graph data consumption)

News Monitoring

Sure, the news is all around us. But most companies are overwhelmed by the sheer amount of information produced every day that can impact their business.

The challenges faced by those trying to perform news monitoring on unstructured article data are numerous. Articles are structured differently across the web, making aggregation of diverse sources difficult. Many sources and aggregators silo their news by geographic location or language.

Strengths of providing article data through a wider Knowledge Graph include the ability to link articles to the entities (people, organizations, locations, etc) mentioned in each article. Additional natural language processing includes the ability to identify quotes and who said them as well as the sentiment of the article author towards each entity mentioned in the article.

In high-velocity, socially-fueled media, the need for automated analysis of information in textual form is even more pressing. Among the many applications of our technology, Diffbot is helping anti-bias and misinformation initiatives with partnerships involving FactMata as well as the European Journalism Centre.

Check out how easy it is to build your own custom pan-lingual news feed in our news feed builder.

Ecommerce

Many of the largest names in ecommerce have utilized Diffbot’s ability to transform unstructured product, review, and discussion data into valuable ecommerce intelligence, whether by pointing AI-enabled crawlers at their own marketplaces to detect fraudulent, duplicate, or underperforming products, or by analyzing competitor or supplier product listings.

One of the reasons to use Diffbot’s AI-enabled product API or our product entities within the Knowledge Graph is the difficulty of scraping product data at scale. Many ecommerce sites employ active measures to make scraping their pages at scale difficult. We’ve already built out the infrastructure and can begin returning product data at scale in minutes.

The use of rule-based scraping by many competitors or in-house teams means that whenever ecommerce sites shift their layout or you try to extract ecommerce web data from a new location, your extraction method is likely to break. Additionally, hidden or toggleable fields on many ecommerce pages are more easily extracted by solutions with strong machine vision capabilities.

Diffbot’s decade-long focus on natural language processing also allows the inclusion of rich discussion data parsed for entities, connections, and sentiment. On large ecommerce sites, the structuring and additional processing of review data can be a large feat and provide high value.

Machine Learning

Even when you can get your hands on the right raw data to train machine learning models, cleaning and labeling the data can be a costly process. To help with this, Diffbot’s Knowledge Graph provides potentially the largest selection of once unstructured web data, complete with data provenance and confidence scores for each fact.

Our customers use a wide range of web data to quickly and accurately train models on diverse data types. Need highly informal text input from reviews? Video data in a particular language? Product or firmographic data? It’s all in the Knowledge Graph, structured and with API access so customers can quickly jump into validating new models.

With a long association with Stanford University and many research partnerships, Diffbot’s experts in web-scale machine learning work in tandem with many customers to create custom solutions and mutually beneficial partnerships.

To some, 2020 was the year of the knowledge graph. And while innovative organizations have long seen the benefits of graph databases, recent developments in the speed of fact accumulation online mean the future of graphs has never been more bright.

A big thank you to Diffbot, especially to Filipe Mesquita and Merrill Cook for presenting the Diffbot Knowledge Graph.  

Yours,

DBpedia Association

Bringing Linked Data to the Domain Expert with TriplyDB Data Stories
https://www.dbpedia.org/blog/triplydb-data-stories/ (Fri, 08 Oct 2021)

DBpedia Member Features – Over the last year we gave DBpedia members multiple chances to present their work, tools and applications. In this way, our members gave exclusive insights on the DBpedia blog. This time we will continue with Triply, a Dutch company. They will introduce TriplyDB and data stories to us. Have fun reading!

by Kathrin Dentler, Triply

Triply and TriplyDB

Triply is an Amsterdam-based company with the mission to (help you to) make linked data the new normal. Every day, we work towards making every step around working with linked data easier, such as converting and publishing it, integrating, querying, exploring and visualising it, and finally sharing and (re-)using it. We believe in the benefits of FAIR (findable, accessible, interoperable and reusable) data and open standards. Our product, TriplyDB, is a user-friendly, performant and stable platform, designed for potentially very large linked data knowledge graphs in practical and large-scale production-ready applications. TriplyDB not only allows you to store and manage your data, but also provides data stories, a great tool for storytelling. 

Data stories 

Data stories are data-driven stories, such as articles, business reports or scientific papers, that incorporate live, interactive visualizations of the underlying data. They are written in markdown, and can be underpinned by an orchestration of powerful visualizations of SPARQL query results. These visualizations can be charts, maps, galleries or timelines, and they always reflect the current state of your data. That data is just one click away: A query in a data story can be tested or even tweaked by its readers. It is possible to verify, reproduce and analyze the results and therefore the narrative, and to download the results or the entire dataset. This makes a data story truly FAIR, understandable, and trustworthy. We believe that a good data story can be worth more than a million words. 

Examples

With a data story, the domain expert is in control and empowered to work with, analyze, and share his or her data as well as interesting research results. There are some great examples that you can check out straight away:

  • The fantastic data story on the Spanish Flu, which has been created by history and digital humanities researchers, who usually use R and share their results in scientific papers. 
  • Students successfully published data stories in the scope of a course of only 10 weeks. 
  • The beautiful data story on the Florentine Catasto of 1427.

DBpedia on triplydb.com

Triplydb.com is our public instance of TriplyDB, where we host many valuable datasets, which currently consist of nearly 100 billion triples. Among our most interesting and frequently used datasets are those published by the DBpedia Association.

We also have several interesting saved queries based on these datasets. 

A data story about DBpedia

To showcase the value of DBpedia and data stories to our users, we published a data story about DBpedia. This data story includes comprehensible and interactive visualizations, such as a timeline and a tree hierarchy, all of which are powered by live SPARQL queries against the DBpedia dataset. 

Let us have a look at the car timeline: DBpedia contains a large amount of content regarding car manufacturers and their products. Based on that data, we constructed a timeline which shows the evolution within the car industry. 

If you navigate from the data story to the query, you can analyze it and try it yourself. You see that the query limits the number of manufacturers so that we are able to look at the full scale of the automotive revolution without cluttering the timeline. You can play around with the query, change the ordering, visualize less or more manufacturers, or change the output format altogether. 
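For readers who want a feel for what such a saved query looks like, here is a rough approximation over DBpedia; the actual query behind the data story may differ in its filtering and output format.

```sparql
# Car manufacturers and their founding dates, ordered chronologically.
PREFIX dbo:  <http://dbpedia.org/ontology/>
PREFIX dbr:  <http://dbpedia.org/resource/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?manufacturer ?name ?founded WHERE {
  ?manufacturer dbo:industry dbr:Automotive_industry ;
                rdfs:label ?name ;
                dbo:foundingDate ?founded .
  FILTER (lang(?name) = "en")
}
ORDER BY ?founded
LIMIT 50
```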

Advanced features

If you wish to use a certain query programmatically, we offer preconfigured code snippets that allow you to run a query from a python or an R script. You can also configure REST APIs in case you want to work with variables. And last but not least, it is possible to embed a data story on any website. Just scroll to the end of the story you want to embed and click the “</> Embed” button for a copy-pasteable code snippet. 

Try it yourself! 

Sounds interesting? We still have a limited number of free user accounts over at triplydb.com. You can conveniently log in with your Google or Github account and start uploading your data. We host your first million open data triples for free! Of course, you can also use public datasets, such as the ones from DBpedia, link your data, work together on queries, save them, and then one day create your own data story to let your data speak for you. We are already looking forward to what your data has to say!

A big thank you to Triply for being a DBpedia member since 2020, and especially to Kathrin Dentler for presenting her work at the last DBpedia Day in Amsterdam and for her amazing contribution to DBpedia.

Yours,

DBpedia Association

WordLift – Building Knowledge Graphs for SEO
https://www.dbpedia.org/blog/wordlift-building-knowledge-graphs-for-seo/ (Mon, 05 Jul 2021)

DBpedia Member Features – Over the last year we gave our DBpedia members multiple chances to present their work, tools and applications. In this way, our members gave exclusive insights on the DBpedia blog. This time we will continue with WordLift. They will show us how they help companies speak Google’s native language with data from DBpedia.  Have fun while reading!

by Andrea Volpini, WordLift

WordLift is a Software as a Service designed to help companies speak Google’s native language by converting unstructured content into structured data that search engines understand. 

It does so automatically, using natural language processing and machine learning. Most SEO tools provide insights on improving a website. WordLift creates a knowledge graph and automates some of these SEO tasks to help a site rank. We call it agentive SEO: from search intent analysis to content generation, from building internal links to improving on-page user engagement. At the core of this automation, WordLift creates 5-star linked data using DBpedia.

Artificial Intelligence is shaping the online world. Commercial search engines like Google and Bing have changed dramatically in the last two decades: from 10 blue links to algorithms that answer user questions without ever clicking a link or leaving the results page. As search evolves, so do the SEO tools that marketers need to cope with these changes. 

Why does creating a Knowledge Graph improve SEO?

Imagine the knowledge graph behind a website as the most thoughtful way to help crawlers index and understand its content. Much like Google uses the graph as the engine to power up its search results, a knowledge graph that describes a website’s content helps machines understand the semantic meanings behind it.

In practical terms, a customised knowledge graph helps content marketers in different ways:

  • Enhancing SERP results with structured data and helping Google and Bing disambiguate your brand name or services adequately. 
  • Automating internal links to increase rankings around entities that matter for the business.
  • Providing content recommendations to enhance the customer journey.
  • Bringing additional insights to web analytics by grouping traffic related to entities and not only pages (i.e. how is the content on “artificial intelligence” performing this week?).
  • Providing the factual data required for training language models that can automatically generate content (you can read all about it in my latest blog post on AI text generation for SEO, where you will find the code to fine-tune Google’s T5 using triples from DBpedia 🎉).

Here is an example of Natural Language Generation using Google’s T5 

Search Intent Discovery: a practical example of how marketers can use a KG

Let me give you another example: keyword research. The purpose of keyword research is to find and analyze search queries that people enter into search engines to create new content or improve existing ones. Using the custom knowledge graphs that WordLift produces, we help our clients quickly scout for new untapped search opportunities. 

The chart above shows a series of search intents (queries) that WordLift has generated after the user provided three ideas. Using the knowledge graph, these intents are grouped into topics such as “Android”, “Anonymity” or “Gamer” and the content editor can find the right query to target. In the treemap chart, larger boxes correspond to a higher search volume, and lighter colors indicate less competitive queries.   

How does WordLift build a Knowledge Graph?

An entity represents the “thing” described in web pages. Entities help computers understand everything about a person, an organization or a place mentioned on a website. Each entity holds the information required to provide direct answers to questions about itself, and questions that can be answered by looking at its relationships with other entities. WordLift uses natural language processing to extract and connect entities with web pages. Therefore we primarily use the schema.org vocabulary to build these knowledge graphs.

WordLift heavily relies on DBpedia. We train our content analysis API on concepts that are in DBpedia and we build knowledge graphs that interlink, among other graphs, with DBpedia.
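In practice, interlinking means that an entity extracted from a page can be resolved against DBpedia and enriched from it. The sketch below is not WordLift's actual API, just a plain DBpedia query for a concept's types and abstract, using an SEO-related resource as an example.

```sparql
# Look up the types and English abstract of a DBpedia concept.
PREFIX dbr: <http://dbpedia.org/resource/>
PREFIX dbo: <http://dbpedia.org/ontology/>

SELECT ?type ?abstract WHERE {
  dbr:Search_engine_optimization a ?type ;
                                 dbo:abstract ?abstract .
  FILTER (lang(?abstract) = "en")
}
```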

We are also starting to automatically create content using language models trained with data from DBpedia. More on this front will come in the near future.

WordLift and the open DBpedia Knowledge Graph

Our users constantly publish and update public web data from various sources (their websites, the catalogue of their products, a data feed from a partner and more) and interlink the entities they publish with DBpedia. 

We are now excited to contribute back some of this knowledge. With the help of DBpedia we can keep on building a distributed, queryable and decentralized linked data cloud. 

Keep following us and keep contributing to the LOD cloud!

A big thank you to WordLift, and especially Andrea Volpini, for presenting how WordLift creates a knowledge graph and automates SEO tasks that help a site rank.

Yours,

DBpedia Association

ContextMinds: Concept mapping supported by DBpedia
https://www.dbpedia.org/blog/contextminds/ (Fri, 16 Apr 2021)

Contribution from Marek Dudáš (Prague University of Economics and Business – VŠE)

ContextMinds is a tool that combines two ideas: concept mapping and knowledge graphs. What’s concept mapping? With a bit of simplification, when you take a small subgraph of not more than a few tens of nodes from a knowledge graph (kg) and visualize it with the classic node-link (or “bubbles and arrows”) approach, you get a concept map. But concept maps are much older than knowledge graphs. They emerged in the 1970s and were originally intended to be created by hand, to represent a person’s understanding of a given problem or question. Shortly after their “discovery” (using diagrams to represent relationships is probably a much older idea), they turned out to be a very useful educational tool.

Going back to knowledge graphs and DBpedia, ContextMinds lets you quickly create an overview of some problem you need to solve, study or explain.  

Figure 1 Text search in concepts from DBpedia: starting point of concept map creation in ContextMinds. 

How you can start 

Starting from a classic text search, you select concepts (nodes) from a knowledge graph, and ContextMinds shows how they are related (it loads the links from the knowledge graph). It also suggests other concepts in the kg that you might be interested in. The suggestions are drawn from the joint neighborhood of the nodes you have already selected and put into the view. Nodes are scored by relevance, basically by the number of links to what you have in the view. So, as you are creating your concept map, an always-updated list of around 30 of the most related concepts is available for simple drag & drop onto your map.

Figure 2 Concept map and a list of top related concepts found in DBpedia by ContextMinds. 
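The suggestion idea can be approximated with a plain DBpedia query. The sketch below is not ContextMinds' actual implementation, just an illustration of ranking candidate concepts by how many links they share with two selected nodes (and it may be slow on the public endpoint for highly connected resources).

```sparql
# Rank resources by the number of links they share with the selected concepts.
PREFIX dbr: <http://dbpedia.org/resource/>

SELECT ?candidate (COUNT(*) AS ?score) WHERE {
  VALUES ?selected { dbr:Isaac_Newton dbr:Calculus }
  { ?candidate ?p ?selected } UNION { ?selected ?p ?candidate }
  FILTER (isIRI(?candidate) &&
          ?candidate NOT IN (dbr:Isaac_Newton, dbr:Calculus))
}
GROUP BY ?candidate
ORDER BY DESC(?score)
LIMIT 30
```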

This helps you make the concept map complete quickly. It also helps you discover relationships between the concepts that you were not aware of. If a concept or relationship is not yet in the knowledge graph, you can create it. It will not only appear in your concept map, but will also become part of an extended knowledge base for anyone who has access to your map. You can at any time select the sources of concept & relationship suggestions by choosing any combination of the personal scope (concepts from maps created by you), workspace scope (shared space with teammates), DBpedia (or a different kg) and public scope (everything created by the community and made public).

The best way of explaining how it works is a short video.

Use Case: Knowledge Graph 

ContextMinds was of course built with DBpedia as the initial knowledge graph. That instance is available at app.contextminds.com and more than 100 schools are using it as an educational aid. Recently, we discovered that the same model can be useful with other knowledge graphs. 

Say you run some machine learning that identifies objects in the knowledge graph as having interesting properties. Now you might need to look at what the graph says about them, either to explain the results or to show them to domain experts so that they can use them for further research. That is where ContextMinds comes in. You put the concepts from the machine learning results into the view, and ContextMinds automatically adds the links between them and finds related concepts from their neighborhood. We have done this with kg-covid, a knowledge graph built from various biomedical and Covid-related datasets. There we use RDFrules to mine interesting rules and then visualize the results in ContextMinds (available at contextminds.vse.cz), so that biology experts may interpret them and explore further related information. More about that, perhaps, in a later blog post.

Our Vision 

An additional fun fact: since we started developing ContextMinds to work solely with DBpedia, its data model is more or less hard-coded into it. The plan is to enable loading multiple knowledge graphs into a single ContextMinds instance, so that users can interconnect objects from DBpedia with those from other datasets when creating a concept map; for the moment, however, we have to transform the data so that it looks like DBpedia before it can be loaded into ContextMinds. 

A big thank you to ContextMinds, especially Marek Dudáš, for presenting how ContextMinds combines concept mapping and knowledge graphs.

Yours,

DBpedia Association

The post ContextMinds: Concept mapping supported by DBpedia appeared first on DBpedia Association.

Structure mining with DBpedia https://www.dbpedia.org/blog/structure-mining-with-dbpedia/ Tue, 23 Feb 2021 08:48:36 +0000

DBpedia Member Features – Last year we gave DBpedia members the chance to present special products, tools and applications on the DBpedia blog. We have already published several posts in which our members provided unique insights. This week we continue with Wallscope, who will show how you can derive value from existing data by coupling it with readily available open sources such as DBpedia. Have fun while reading!

by Antero Duarte, Lead Developer, Wallscope

Wallscope and DBpedia

Wallscope has been using DBpedia for many years – for example as part of our demos; to inform discussions with clients; and fundamentally to help people understand linked data and the power of knowledge graphs.

We also work with Natural Language Processing, and for us the intersection of these two areas of research and technology is extremely powerful. We can provide value to organisations at a low cost of entry, since resources like DBpedia and open source NLP models can be used with little effort.

We quickly realised that while linking entities and finding and expanding on keywords was interesting, it was more of a novelty than a technology that would directly solve our clients' problems. 

For example, when speaking to an art gallery about classifying content, the prospect of automatically identifying artists' names in text might appeal, because this is often a manual process. The gallery might also be interested in expanding artists' profiles with things like their birth date and place, and in the ability to find other artists based on properties of a previously identified artist. This is all interesting, but it's mostly information that people looking at an artist's work will already know.
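
As a hedged sketch of what such automation could look like, independent of any particular product: the snippet below links artist names in free text to DBpedia resources via the public DBpedia Spotlight service and then pulls birth date and place from the DBpedia SPARQL endpoint. Endpoint URLs, parameters and response fields follow the publicly documented services and may change; this is not Wallscope's implementation.

```python
import json
import urllib.parse
import urllib.request

# Public services assumed here; adjust if they have changed.
SPOTLIGHT = "https://api.dbpedia-spotlight.org/en/annotate"
SPARQL = "https://dbpedia.org/sparql"

def get_json(url, params):
    """Small helper: GET a URL with query parameters and parse JSON."""
    request = urllib.request.Request(
        url + "?" + urllib.parse.urlencode(params),
        headers={"Accept": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)

text = "The exhibition features works by Frida Kahlo and Diego Rivera."

# 1. Spot entity names in free text and link them to DBpedia resources.
annotations = get_json(SPOTLIGHT, {"text": text, "confidence": "0.5"})
uris = {resource["@URI"] for resource in annotations.get("Resources", [])}

# 2. Expand each linked resource with birth date and birth place, if present.
for uri in sorted(uris):
    query = f"""
    PREFIX dbo: <http://dbpedia.org/ontology/>
    SELECT ?birthDate ?birthPlace WHERE {{
      OPTIONAL {{ <{uri}> dbo:birthDate  ?birthDate }}
      OPTIONAL {{ <{uri}> dbo:birthPlace ?birthPlace }}
    }} LIMIT 1
    """
    rows = get_json(SPARQL, {"query": query, "format": "application/sparql-results+json"})
    for row in rows["results"]["bindings"]:
        print(uri,
              row.get("birthDate", {}).get("value", "unknown"),
              row.get("birthPlace", {}).get("value", "unknown"))
```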

The eureka moment was realising how we could use this information to expand and build on the things we already know about the artist: presenting well-known aspects of the artist's life alongside things that are less obvious or well known, to create a wider story.

Frida Kahlo

We can take someone like Frida Kahlo and her audience and expose them to different facets of her work and life. By leveraging structure and connections between entities, rather than just the entities themselves, we can arrange previously created content in a different way and generate a new perspective, new insight and new value.

There’s a common expression in the data world that says that ‘data is the new oil’. While this is a parallel between the richest companies making their money from data nowadays rather than oil, as it used to be just a few decades ago, it is also true that: We mine data like oil. We treat data as a finite resource that we burn once or maybe refine and then burn, or maybe refine and turn into something else. But we don’t really think of data as reusable.

I’d like to propose that data is in fact a renewable resource. Like the wind…

Take the previous example of the art gallery and the data they hold about Frida Kahlo: they might want to use the same content in different ways, and why wouldn't they?

We have different ways of building a story around a single artist and their life. We can learn from DBpedia that Frida Kahlo is considered a surrealist artist, which allows us to build an exhibition about Surrealism.

But we can also learn that Frida Kahlo is a self-taught artist. We can build an exhibition centred around people who were self-taught and influential in different fields.

We can think of Frida’s personal life and how she is an LGBTQ+ icon for being openly and unapologetically queer, more specifically bisexual. This opens up an avenue to show LGBTQ+ representation in media throughout history.

Exploring data related to Frida Kahlo
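
A small sketch of how such an exhibition list might be pulled from DBpedia: the query below looks for other artists who share an art movement with Frida Kahlo. The property choice (dbo:movement) reflects how DBpedia commonly models artists; the exact results depend on the current data, and this is only an illustration, not Wallscope's pipeline.

```python
import json
import urllib.parse
import urllib.request

ENDPOINT = "https://dbpedia.org/sparql"

# Other artists who share an art movement (e.g. Surrealism) with Frida Kahlo.
query = """
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbr: <http://dbpedia.org/resource/>

SELECT DISTINCT ?artist ?movement WHERE {
  dbr:Frida_Kahlo dbo:movement ?movement .
  ?artist dbo:movement ?movement .
  FILTER (?artist != dbr:Frida_Kahlo)
}
LIMIT 25
"""

url = ENDPOINT + "?" + urllib.parse.urlencode(
    {"query": query, "format": "application/sparql-results+json"})
with urllib.request.urlopen(url) as response:
    for row in json.load(response)["results"]["bindings"]:
        print(row["movement"]["value"], "->", row["artist"]["value"])
```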

For us this is one of the most powerful things about linked data, and it’s one of the easiest ways to show potential clients how they can derive value from existing data by coupling it with readily available open sources such as DBpedia.

This also promotes a culture of data reusability that actively works against the problem of siloed data: those gathering data think not just about their specific use case but also about how their data can be useful to others and how best to design it so that it's reusable elsewhere.

Lateral Search Technique

Besides the more obvious benefits of an open knowledge structure, one aspect that can sometimes be overlooked is the inherent hierarchy of concepts in something like Wikipedia's (and consequently DBpedia's) category pages. By starting at a specific level and generalising, we are able to find relevant information that relates to the subject laterally.

Lateral search can provide very good results, but it can also involve a difficult process of testing different mechanisms and finding the best way to select the most relevant connections, usually on a trial-and-error basis. Over the years we have used this lateral search technique as a more nuanced approach to topic classification that doesn't require explicit training data, since we can rely on DBpedia's structure rather than training data to make assertions.
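
The lateral search idea can be approximated with DBpedia's category structure: start from a resource's categories (dct:subject), climb to broader categories (skos:broader), and come back down to sibling resources. A rough sketch follows; selecting which of the returned connections are actually relevant is the hard, trial-and-error part mentioned above.

```python
import json
import urllib.parse
import urllib.request

ENDPOINT = "https://dbpedia.org/sparql"

# Lateral search: resource -> its categories -> broader categories ->
# other resources filed under those broader categories.
query = """
PREFIX dct:  <http://purl.org/dc/terms/>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX dbr:  <http://dbpedia.org/resource/>

SELECT DISTINCT ?related ?broader WHERE {
  dbr:Frida_Kahlo dct:subject ?category .
  ?category skos:broader ?broader .
  ?related dct:subject/skos:broader ?broader .
  FILTER (?related != dbr:Frida_Kahlo)
}
LIMIT 50
"""

url = ENDPOINT + "?" + urllib.parse.urlencode(
    {"query": query, "format": "application/sparql-results+json"})
with urllib.request.urlopen(url) as response:
    for row in json.load(response)["results"]["bindings"]:
        print(row["broader"]["value"], "->", row["related"]["value"])
```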

Out of this trial-and-error approach, Wallscope has created a set of tools that help us iterate faster, depending on the use case, on implementations that combine Natural Language Processing with structure mining from knowledge graphs.

Data Foundry

Data Foundry frontend

Data Foundry is Wallscope’s main packaged software offering for knowledge graph creation and manipulation. It is an extendable platform that is modular by design and scalable across machine clusters. Its main function is to act as a processing platform that can connect multiple data sources (usually a collection of files in a file system) to a single knowledge graph output (usually an RDF triplestore). Through a pipeline of data processors that can be tailored to specific use cases, information is extracted from unstructured data formats and turned into structured data before being stored in the knowledge graph.

Several processors in Data Foundry use the concept of structure mining and lateral search. Some use cases use DBpedia, others use custom vocabularies.
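
Data Foundry itself is a proprietary product, but the processor-pipeline pattern described above can be sketched in a few lines: a chain of small processors, each taking documents in and passing enriched documents on, with an RDF serialisation step at the end. All class names and the toy extraction logic below are invented for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class Document:
    """A unit of data moving through the pipeline."""
    uri: str
    text: str
    facts: list = field(default_factory=list)  # (subject, predicate, object) triples

class Processor:
    """One modular step in the pipeline: documents in, enriched documents out."""
    def process(self, doc: Document) -> Document:
        raise NotImplementedError

class EntityTagger(Processor):
    """Toy extraction step: link documents to known entities mentioned in their text."""
    KNOWN = {"Frida Kahlo": "http://dbpedia.org/resource/Frida_Kahlo"}

    def process(self, doc):
        for name, entity_uri in self.KNOWN.items():
            if name in doc.text:
                doc.facts.append((doc.uri, "http://purl.org/dc/terms/references", entity_uri))
        return doc

class NTriplesWriter(Processor):
    """Final step: serialise the collected facts as (simplified) N-Triples."""
    def process(self, doc):
        for s, p, o in doc.facts:
            print(f"<{s}> <{p}> <{o}> .")
        return doc

def run_pipeline(docs, processors):
    """Push every document through the chain of processors in order."""
    for doc in docs:
        for processor in processors:
            doc = processor.process(doc)

run_pipeline(
    [Document("https://example.org/doc/1", "Exhibition notes on Frida Kahlo and Surrealism.")],
    [EntityTagger(), NTriplesWriter()],
)
```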

STRVCT

STRVCT frontend

STRVCT is Wallscope’s structured vocabulary creation tool. It aims to allow any user to create/edit SKOS vocabularies with no prior knowledge of RDF, linked data, or structured vocabularies. By virtue of its function, STRVCT gives users ownership of their own data throughout the development process, ensuring it is in the precise shape that they want it to be in.

STRVCT is a stepping stone in Wallscope’s pipeline – once a vocabulary is created, it can be processed by Data Foundry and used with any of our applications.
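
Under the hood, a SKOS vocabulary of the kind STRVCT produces is just a handful of triples. A minimal illustration using rdflib follows (not STRVCT's own code; the namespace and concepts are made up).

```python
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF, SKOS

# Build a tiny SKOS vocabulary by hand -- the kind of structure STRVCT
# lets non-RDF users create through its UI.
EX = Namespace("https://example.org/vocab/")

g = Graph()
g.bind("skos", SKOS)
g.bind("ex", EX)

scheme = EX["art-movements"]
g.add((scheme, RDF.type, SKOS.ConceptScheme))
g.add((scheme, SKOS.prefLabel, Literal("Art movements", lang="en")))

surrealism = EX["surrealism"]
g.add((surrealism, RDF.type, SKOS.Concept))
g.add((surrealism, SKOS.prefLabel, Literal("Surrealism", lang="en")))
g.add((surrealism, SKOS.inScheme, scheme))
g.add((surrealism, SKOS.topConceptOf, scheme))

magic_realism = EX["magic-realism"]
g.add((magic_realism, RDF.type, SKOS.Concept))
g.add((magic_realism, SKOS.prefLabel, Literal("Magic realism", lang="en")))
g.add((magic_realism, SKOS.broader, surrealism))  # toy hierarchy for illustration only
g.add((magic_realism, SKOS.inScheme, scheme))

print(g.serialize(format="turtle"))
```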

HiCCUP

HiCCUP frontend

Standing for Highly Componentised Connection Unification Platform, HiCCUP is the “glue” in many of Wallscope’s projects and solutions.

It gives users the ability to create connections to SPARQL endpoints for templated queries and RDF manipulation, and it exposes those templates as an API with RDF outputs. The latest version also allows users to connect to a JSON API and convert its output to RDF in real time. This has proven useful for integrating data sources such as IoT device readings into knowledge graph environments.
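
HiCCUP is Wallscope's own product, but the two ideas it packages — parameterised SPARQL templates exposed to callers, and on-the-fly conversion of JSON into RDF — can be sketched roughly as follows. The template, vocabulary URIs and JSON shape are invented for the example.

```python
from string import Template

# 1. A parameterised SPARQL template: callers fill in the artist URI and get
#    a ready-to-run query back, much like exposing the template as an API.
ARTIST_TEMPLATE = Template("""
PREFIX dbo: <http://dbpedia.org/ontology/>
SELECT ?birthDate WHERE { <$artist> dbo:birthDate ?birthDate } LIMIT 1
""")

def build_query(artist_uri: str) -> str:
    """Fill the template with a concrete artist URI."""
    return ARTIST_TEMPLATE.substitute(artist=artist_uri)

# 2. A toy JSON-to-RDF conversion, e.g. for an IoT sensor reading.
def sensor_json_to_ntriples(reading: dict) -> str:
    subject = f"https://example.org/sensor/{reading['id']}"
    return "\n".join([
        f'<{subject}> <https://example.org/vocab/temperature> "{reading["temperature"]}" .',
        f'<{subject}> <https://example.org/vocab/timestamp> "{reading["timestamp"]}" .',
    ])

print(build_query("http://dbpedia.org/resource/Frida_Kahlo"))
print(sensor_json_to_ntriples(
    {"id": "42", "temperature": 21.5, "timestamp": "2021-02-23T08:48:36Z"}))
```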

Pronto

Pronto landing page

Pronto was created to overcome the challenges related to the reuse of ontologies. It is an open-source ontology search engine that provides fuzzy matching across many popular ontologies, originally selected from the prefix.cc user-curated “popular” list, along with others selected by Wallscope. 
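
The core of an ontology search of this kind is fuzzy string matching over term labels. A minimal stand-in using Python's standard library is shown below; the term index is a tiny invented sample, and Pronto's actual indexing and ranking are of course more sophisticated.

```python
import difflib

# A tiny stand-in for an index of ontology terms: label -> URI.
TERMS = {
    "Person": "http://xmlns.com/foaf/0.1/Person",
    "Organization": "http://xmlns.com/foaf/0.1/Organization",
    "creator": "http://purl.org/dc/terms/creator",
    "Concept": "http://www.w3.org/2004/02/skos/core#Concept",
    "prefLabel": "http://www.w3.org/2004/02/skos/core#prefLabel",
}

def search(query: str, n=3, cutoff=0.4):
    """Return the closest-matching term labels with their URIs."""
    matches = difflib.get_close_matches(query, TERMS.keys(), n=n, cutoff=cutoff)
    return [(label, TERMS[label]) for label in matches]

for label, uri in search("preflabel"):
    print(label, uri)
```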

Pronto has already proved to be a reliable internal solution, used by our team to shorten the searching process and to aid visualisation.

If you’re interested in collaborating with us or using any of the tools mentioned above, send an email to contact@wallscope.co.uk 

You can find more Wallscope articles at https://medium.com/wallscope and more articles written by me at https://medium.com/@anteroduarte  

A big thank you to Wallscope, especially Antero Duarte, for presenting how to extract knowledge from DBpedia and for showcasing cool and innovative tools.  

Yours,

DBpedia Association

The post Structure mining with DBpedia appeared first on DBpedia Association.

Why Data Centricity Is Key To Digital Transformation https://www.dbpedia.org/blog/why-data-centricity-is-key-to-digital-transformation/ Wed, 17 Feb 2021 09:01:17 +0000

DBpedia Member Features – Last year we gave DBpedia members the chance to present special products, tools and applications on the DBpedia blog. We have already published several posts in which our members provided unique insights. This week we continue with eccenca, who will explain why companies struggle with digital transformation and why data centricity is the key to that transformation. Have fun while reading!

by Hans-Christian Brockmann, CEO eccenca

Why Data Centricity Is Key To Digital Transformation

Only a few large enterprises like Google, Amazon, and Uber have made the mindset and capability transition to turn data and knowledge into a strategic advantage. They have one thing in common: their roadmap is built on data-centric principles (and yes, they use knowledge graph technology)!


Over the last years it has become obvious that the majority of companies fail at digital transformation as long as they continue to follow their outdated IT management best practices. We, the knowledge graph community, have long been reacting to this with rather technical explanations about RDF and ontologies. While our arguments have been right at all times, they did not really address the elephant in the room: that it’s not only a technological issue but a question of mindset.


Commonly, IT management is stuck with application-centric principles. Solutions for a particular problem (e.g. financial transactions, data governance, GDPR compliance, customer relationship management) are thought about in singular applications. This has created a plethora of stand-alone applications in companies which store and process interrelated or even identical data but are unable to integrate. That's because every application has its own schema and data semantics. And companies have hundreds or even thousands of different applications at work. Still, when talking about data integration projects or digital transformation, IT management starts the argument from an application point of view.

mastering complexity

Companies Struggle With Digital Transformation Because Of Application Centricity

This application-centric mindset has created an IT quagmire. It prevents automation and digital transformation because of three main shortcomings.

  1. Data IDs are local. The identification of data is restricted to its source application which prevents global identification, access and (re)use.
  2. Data semantics are local. The meaning of data, information about their constraints, rules and context are hidden either in the software code or in the user’s head. This makes it difficult to work cooperatively with data and also hinders automation of data-driven processes.
  3. The knowledge about data’s logic is IT turf. Business users who actually need this knowledge to scale their operations and develop their business in an agile way are always dependent on an overworked IT which knows the technicalities but doesn’t understand the business context and needs. Thus, scalability and agility are prevented from the start.

Data centricity changes this perspective because it puts data before applications. Moreover, it simplifies data management. The term was coined by author and IT veteran Dave McComb. The aim of data centricity is to “base all application functionality on a single, simple, extensible and federateable data model”, as Dave recently outlined in the latest Escape From Data Darkness webcast episodes. At first, this might sound like advocating yet another one of these US$ 1bn data integration / consolidation projects done by a big-name software vendor, the likes of which have failed over and over again. Alas, it’s quite the opposite.

A Central Data Hub For Knowledge Driven Automation

Data centricity does not strive to exchange the existing IT infrastructure with just another proprietary application. Data centricity embraces the open-world assumption and agility concepts and thus natively plays well with the rest of the data universe. The application-centric mindset always struggles with questions of integration, consolidation and a religious commitment to being the “single source of truth”. The data-centric mindset does not have to, because integration is (no pun intended) an integral part of the system. Or as Dave puts it in his book “The Data-Centric Revolution”: “In the Data-Centric approach […] integration is far simpler within a domain and across domains [because] it is not reliant on mastering a complex schema. […] In the Data-Centric approach, all identifiers (all keys) are globally unique. Because of this, the system integrates information for you. Everything relating to an entity is already connected to that entity” without having to even consolidate it in a central silo.
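
A small illustration of that last point: when two datasets use the same global identifier for an entity, "integration" amounts to a simple union of triples — no key mapping or consolidation project required. The graphs and facts below are invented for the example (using rdflib).

```python
from rdflib import Graph, Literal, Namespace, URIRef

DBR = Namespace("http://dbpedia.org/resource/")
EX = Namespace("https://example.org/vocab/")

# Dataset A: facts about an entity, keyed by a global URI.
a = Graph()
a.add((DBR.Frida_Kahlo, EX.occupation, Literal("Painter")))

# Dataset B: facts from a completely different system, same global URI.
b = Graph()
b.add((DBR.Frida_Kahlo, EX.exhibitedAt, URIRef("https://example.org/gallery/retrospective-2021")))

# "Integration" is just set union of triples -- the shared identifier does the work.
merged = a + b
for s, p, o in merged:
    print(s, p, o)
```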


Of course, this sounds exactly like what we have been talking about all those years with knowledge graph technology and FAIR data. And we have seen it working beautifully with our customers like Nokia, Siemens, Daimler and Bosch. eccenca Corporate Memory has provided them with a central data hub for their enterprise information that digitalizes expert knowledge, connects disparate data and makes it accessible to both machines and humans. Still, what we have learned from those projects is this: Conviction comes before technology, just as data comes before the application. Knowledge graph technology certainly is the key maker to digital transformation. But a data-centric mindset is key.

A big thank you to eccenca, especially Hans-Christian Brockmann, for explaining why data centricity is the key to digital transformation. Four years ago eccenca became a member of the DBpedia Association and helped to grow the DBpedia network. Thanks for your contribution and constant support! Feel free to check out eccenca’s member presentation page: https://www.dbpedia.org/dbpedia-members/eccenca/

Yours,

DBpedia Association

The post Why Data Centricity Is Key To Digital Transformation appeared first on DBpedia Association.
