guest article Archives - DBpedia Association
https://www.dbpedia.org/guest-article/

From voice to value using AI
https://www.dbpedia.org/blog/from-voice-to-value-using-ai/ (Tue, 19 Apr 2022)

DBpedia Member Feature – Over the last year we gave DBpedia members multiple chances to present their work, tools and applications. In this way, our members gave exclusive insights on the DBpedia blog. This time we will continue with Wallscope, who supports organisational goals, improves existing processes and embeds new technologies by generating the insights that power change. David Eccles presents the opportunities of digital audio. Have fun reading!

by David Eccles, Wallscope

Motivation

The use of digital audio has accelerated throughout the pandemic, creating a cultural shift in the use of audio form content within business and consumer communications.

Alongside this the education and entertainment industries embraced semantic technologies as a means to develop sustainable delivery platforms under very difficult circumstances.

Wallscope’s research and development activities were already aligned to exploring speech-driven applications and through this we engaged with Edinburgh University’s Creative Informatics department to explore practical use cases focusing on enhancing the content of podcasts.

Our focus now is on how user experience can be enhanced with knowledge graph interaction, providing contextually relevant information to add value to the overall experience. As DBpedia provides the largest knowledge repository available, Wallscope embedded semantic queries against the DBpedia service into the resulting workflow.

Speech to Linked Data

Speech-driven applications require a high level of accuracy and are notoriously difficult to develop, as anyone with experience of spoken dialog systems will probably be aware. A range of Natural Language Processing models are available which perform with a high degree of accuracy – particularly for basic tasks such as Named Entity Recognition – to recognise people, places, and organisations (spaCy and PyTorch are good examples of this). Obviously the tasks become more difficult to achieve when inherently complicated concepts are brought into the mix such as cultural references and emotional reactions.
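As a rough illustration of the kind of entity recognition mentioned above, here is a minimal spaCy sketch; the en_core_web_sm model and the sample sentence are assumptions for demonstration, not Wallscope's production pipeline.

```python
# Minimal NER sketch with spaCy (assumes the en_core_web_sm model is installed:
# python -m spacy download en_core_web_sm)
import spacy

nlp = spacy.load("en_core_web_sm")

doc = nlp("David Bowie recorded the album in Berlin with Brian Eno.")
for ent in doc.ents:
    # ent.label_ is PERSON, GPE, ORG, WORK_OF_ART, etc.
    print(ent.text, ent.label_)
```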


To this end Wallscope re-deployed and trained a machine learning model called BERT. This stands for Bidirectional Encoder Representations from Transformers and it is a technique for NLP pre-training originally developed by Google.

BERT uses the mechanism of “paying attention” to better understand the contextual relationships between each word (or sub-words) within a sentence. Having previous experience deploying BERT models within the healthcare industry, we adapted and trained the model on a variety of podcast conversations.


As an example of how this works in practice, consider the phrase “It looked like a painting”. BERT looks at the word “it” and then checks its relationship with every other word in the sentence. This way, BERT can tell that “it” refers strongly to “painting”. This allows BERT to understand the context of each word within a given sentence. 
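To make the "paying attention" idea concrete, here is a small sketch using the Hugging Face transformers library; bert-base-uncased is an assumed stand-in for the model Wallscope actually trained, and inspecting raw attention weights is purely illustrative.

```python
# Sketch: inspecting BERT's attention for "It looked like a painting".
# bert-base-uncased is an assumed stand-in for Wallscope's fine-tuned model.
import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased", output_attentions=True)

inputs = tokenizer("It looked like a painting", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
# Last-layer attention, averaged over heads: shape (seq_len, seq_len)
attention = outputs.attentions[-1][0].mean(dim=0)
it_index = tokens.index("it")
for token, weight in zip(tokens, attention[it_index]):
    # How strongly "it" attends to each other token in the sentence
    print(f"{token:12s} {weight.item():.3f}")
```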


Simple process diagram

We then looked at how this could be used to better engage users across the podcast listening experience, and provide points of knowledge expansion, engagement and ‘socialisation’ of content in web-based environments. This in turn can create a richer and more meaningful experience for listeners that runs in parallel with podcasting platforms.

Working across multiple files containing podcast format audio, we looked at several areas of improvements for listeners, creators and researchers. Our primary aim was to demonstrate the value of semantic enhancements to the transcriptions.

We worked with these transcriptions across several processes to enhance them with Named Entity Recognition using our existing stack. From there we extended the analysis of ‘topics’ using a blend of Machine Learning models. That very quickly allowed us to gain a deep understanding of the relationships contained within the spoken word content. By visualising that we could gain a deeper insight into the content and how it could be better presented, by reconciling it with references within DBpedia.

This analysis led us to ideate around an interface that was built around the timeline presented by the audio content.

Playback of audio with related terms

This allows the listener to gain contextually related insights by dynamically querying DBpedia for entities extracted from the podcast itself. This knowledge extension is valuable to enhance not only the listeners’ experience but also to provide a layer of ‘stickiness’ for the content across the internet as it enhances findability.
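A minimal sketch of such a dynamic lookup, using SPARQLWrapper against the public DBpedia endpoint; the entity (David Bowie, standing in for something extracted from a podcast) and the chosen properties are illustrative assumptions.

```python
# Sketch: fetching contextual facts from DBpedia for an entity extracted
# from a podcast transcript. The entity and properties are illustrative.
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("https://dbpedia.org/sparql")
sparql.setQuery("""
    PREFIX dbr: <http://dbpedia.org/resource/>
    PREFIX dbo: <http://dbpedia.org/ontology/>

    SELECT ?abstract ?thumbnail WHERE {
      dbr:David_Bowie dbo:abstract ?abstract .
      OPTIONAL { dbr:David_Bowie dbo:thumbnail ?thumbnail }
      FILTER (lang(?abstract) = "en")
    }
""")
sparql.setReturnFormat(JSON)

for row in sparql.query().convert()["results"]["bindings"]:
    # Show a snippet of the English abstract as "related context"
    print(row["abstract"]["value"][:200], "...")
```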

This shows how knowledge can be added to a page using DBpedia.

One challenge is the quality of transcriptions. With digital speech recognition there is never a 100% confidence level across unique audio recordings such as podcasts, or within video production.

We are currently working with services which are increasingly harnessing AI technologies to not only improve the quality of transcription but also the insights which can be derived from spoken word data sources. A current area of research for Wallscope is how our ML models can be utilised to improve the curation layer of transcripts. This is important as keeping the human in the loop is critical to ensure the fidelity of any transcription process. By deploying the same techniques – albeit in reverse – there is an interesting opportunity to create dynamic ‘sense-checking’ models. While this is at an early stage, DBpedia undoubtedly will be an important part of that.

We are also developing some visualisation techniques to assist curators in identifying ‘errors’ and to provide suggestions for more robust topic classification models. This allows more generalised suggestions for labels. For example, while we may have a specific reference to ‘zombie’, presenting that as a subset of ‘horror’ has more value in categorisation systems. Another example could relate to location: if we identify ‘France’ in a transcription with 100% certainty, then we can create greater certainty around ‘Paris’ being Paris, France as opposed to Paris, Texas. This also applies to machine learning-based summarisation techniques.

Next steps

We are further exploring how these approaches can best assist in the exploration of archives as well as incorporating text analysis to improve the actual curation of archives.

Please contact Ian Allaway or David Eccles for more information, or visit www.wallscope.co.uk

Further reading on ‘Podcasting Exploration’

Data Virtualization: From Graphs to Tables and Back
https://www.dbpedia.org/blog/data-virtualization-from-graphs-to-tables-and-back/ (Wed, 26 Jan 2022)

Ontotext believes you should be able to connect your data with the knowledge graph regardless of where that data lives on the internet or what format it happens to be in. GraphDB’s data virtualization opens your graph to the wider semantic web and to relational databases.

DBpedia Member Feature – Over the last year we gave DBpedia members multiple chances to present their work, tools and applications. In this way, our members gave exclusive insights on the DBpedia blog. This time we will continue with Ontotext, who helps enterprises identify meaning across diverse datasets and massive amounts of unstructured information. Jarred McGinnis presents the beauty of data virtualization. Have fun reading!

by Jarred McGinnis, Ontotext

The beauty and power of knowledge graphs is their abstraction away from the fiddly implementation details of our data. The data and information are organized in a way human users understand, regardless of the physical location of the data, its format and other low-level technical details. This is because the RDF of the knowledge graph enables a schema-less, or schema-agnostic, approach to facilitate the integration of multiple heterogeneous datasets.

Semantic technology defines how data and information is inter-related. These relationships give context and that context is what gives our data meaning that can be understood by humans AND machines. That’s where the knowledge part of the graph comes from and it is a powerful way of providing a single view on disparate pieces of information.

ETL is Still Your Best Bet

When possible, it’s better to pay the initial costs of the ETL process. In a previous blog post, we talked about how knowledge graphs generously repay that investment of time and effort taken in data preparation tasks. However, there are a number of cases where this is impossible or impractical, such as when the dataset is too large or the data lives in a critical legacy system where an ETL process would create more problems than it fixed. In these cases, it is better to take a data virtualization approach.

Ontotext GraphDB provides data virtualization functionality to realize the benefits of real-time access to external data, remove or reduce the need for middle-ware for query processing, reduce support and development time, increase governance and reduce data storage requirements.

Firstly, There’s Federation.

RDF is the language of the semantic web. If you are working with Linked Data, it opens up a world of billions upon billions of factual statements about the world, which is probably why you chose to work with linked data in the first place. Nice work! And that means I don’t have to tell you that DBpedia, a single data set among hundreds, has three billion triples alone. You are no longer limited by the data your organization holds. Queries about internal data can be seamlessly integrated with multiple external data sources.

For example, suppose you want to query well-known people and their birth places for a map application. It’s possible to create a single query that gets the person’s information from DBpedia, which would give you the birthplace and take those results to query another data source like Geonames to provide the geographic coordinates to be able to add them to a mapping application. Since both of these data sources are linked data, it’s relatively straightforward to write a SPARQL query that retrieves the information.
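A sketch of what that single federated query could look like with SPARQL 1.1 SERVICE clauses, run from Python against a local repository. The repository URL and the Geonames-style endpoint are placeholder assumptions, not endpoints named in this post.

```python
# Sketch of SPARQL 1.1 federation: people and birth places from DBpedia,
# coordinates joined in from a second endpoint. The repository URL and the
# Geonames-style endpoint below are placeholders, not real services.
import requests

QUERY = """
PREFIX dbo:  <http://dbpedia.org/ontology/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX gn:   <http://www.geonames.org/ontology#>
PREFIX wgs:  <http://www.w3.org/2003/01/geo/wgs84_pos#>

SELECT ?person ?placeName ?lat ?long WHERE {
  SERVICE <https://dbpedia.org/sparql> {
    ?person a dbo:Person ;
            dbo:birthPlace ?place .
    ?place rdfs:label ?label .
    FILTER (lang(?label) = "en")
    BIND (STR(?label) AS ?placeName)
  }
  SERVICE <http://example.org/geonames/sparql> {   # placeholder endpoint
    ?feature gn:name ?placeName ;
             wgs:lat ?lat ;
             wgs:long ?long .
  }
}
LIMIT 10
"""

response = requests.post(
    "http://localhost:7200/repositories/demo",     # placeholder repository
    data={"query": QUERY},
    headers={"Accept": "application/sparql-results+json"},
)
for row in response.json()["results"]["bindings"]:
    print(row["person"]["value"], row["lat"]["value"], row["long"]["value"])
```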

It doesn’t even have to be another instance of GraphDB. It’s part of the reason Ontotext insists on using open standards. With any equally W3C-compliant knowledge graph that supports a SPARQL endpoint, it is possible to retrieve the information you want and add it to your own knowledge graph to do with as you please. A single query could pull information from multiple external data sources to get the data you are after, which is why federation is an incredibly powerful tool to have.

The Business Intelligence Ecosystem Runs on SQL.

Ontotext is committed to lowering the costs of creating and consuming knowledge graphs. Not every app developer or DBA in an organization is going to have the time to work directly with RDF data models. A previous release, GraphDB 9.4, added the JDBC driver to ensure those who need to think and work in SQL can access the power of the knowledge graph with SQL.

Knowing the importance and prominence of SQL for many applications, we have a webinar demonstrating how GraphDB does SQL-to-SPARQL transformation and query optimization and how Microsoft’s Power BI and Tableau can be empowered by knowledge graphs. GraphDB provides a SQL interface to ensure those who prefer a SQL view of the world can have it.

Virtualization vs ETL

The most recent GraphDB release has added virtualization functionality beyond simple federation. It is now possible to create a virtual graph by mapping the columns and rows of a table to entities in the graph. It becomes possible to retrieve information from external relational databases and have it play nice with our knowledge graph. We aren’t bound by data that exists in our graph or even in RDF format. Of course it would be easier and certainly quicker to ETL a data source into a single graph and perform the query, but it is not always possible, because either the size of the dataset is too large, it gets updated too frequently or both.

For example, in the basement of your organization is a diesel-powered database with geological strata of decades-old data that is critical to the organization. You and I both know that database is never going to be ETLed into the graph. Virtualization is your best bet: create a virtual graph and map that decades-old format for client orders by saying, “When I query about ‘client order’, you need to go to this table and this column in that behemoth server belching black smoke that’s run by Quasimodo and return the results”.

There will be an inevitable hit to query performance, but there are a number of situations where slow is better than not at all, such as the over-egged example above. It is important to understand the trade-offs and practicalities between ETL and virtualization. The important thing for Ontotext is to make sure GraphDB is capable of both and provides a combined approach to maximize flexibility. There is also a webinar on this topic, introducing the open-source Ontop.

GraphDB Gives You Data Agility

Data virtualization and federation come with costs as well as benefits. There is no way we are ever going to master where the data you need exists and in what format. The days of centralized control are over. It’s about finding the technology that gives you agility, and GraphDB’s added virtualization capabilities enable you to create queries that include external open sources and merge them seamlessly with your own knowledge graph. Virtualization of relational databases creates incredible opportunities for applications to provide users with a single coherent view on the complex and diverse reality of your data ecosystem.

Jarred McGinnis

Jarred McGinnis is a managing consultant in Semantic Technologies. Previously he was the Head of Research, Semantic Technologies, at the Press Association, investigating the role of technologies such as natural language processing and Linked Data in the news industry. Dr. McGinnis received his PhD in Informatics from the University of Edinburgh in 2006.

How Innovative Organizations Use The World’s Largest Knowledge Graph
https://www.dbpedia.org/blog/how-innovative-organizations-use-the-worlds-largest-knowledge-graph/ (Tue, 09 Nov 2021)

DBpedia Member Features – Over the last year we gave DBpedia members multiple chances to present their work, tools and applications. In this way, our members gave exclusive insights on the DBpedia blog. This time we will continue with Diffbot, a California-based company whose mission is to “extract knowledge in an automated way from documents.” They will introduce the Diffbot Knowledge Graph and present topics like Market Intelligence and Ecommerce. Have fun reading!

by Filipe Mesquita & Merrill Cook, Diffbot

Diffbot is on a mission to create a knowledge graph of the entire public web. We are teaching a robot, affectionately known as Diffy, to read the web like a human and translate its contents into a format that (other perhaps less sophisticated) machines can understand. All of this information is linked and cleaned on a continuous basis to populate the Diffbot Knowledge Graph.

The Diffbot Knowledge Graph already contains billions of entities, including over 240M organizations, 700M people, 140M products, and 1.6B news articles. This scale is only possible because Diffy is fully autonomous and doesn’t depend on humans to build the Diffbot Knowledge Graph. Using cutting-edge crawling technology, natural language processing, and computer vision, Diffy is able to read and extract facts from across the entire web.

While we believe a knowledge graph like Diffbot’s will be used by virtually every organization one day, there are 4 use cases where the Diffbot Knowledge Graph excels today: (1) Market Intelligence, (2) News Monitoring, (3) E-commerce, and (4) Machine learning.

Market Intelligence

Video: https://www.diffbot.com/assets/video/solutions-for-media-monitoring.mp4

At its simplest, market intelligence is the generation of insights about participants in a market. These can include customers, suppliers, competitors, as well as attitudes of the general public and political establishment.

While market intelligence data is all over the public web, this can be a “double-edged sword.” The range of potential sources for market intelligence data can exhaust the resources of even large teams performing manual fact accumulation.

Diffbot’s automated web data extraction eliminates the inefficiencies of manual fact gathering. Without such automation, it’s simply not possible to monitor everything about a company across the web.

We see market intelligence as one of the most well-developed use cases for the Diffbot Knowledge Graph. Here’s why:

  • The Diffbot Knowledge Graph is built around organizations, people, news articles, products, and the relationships among them. These are the types of facts that matter in market intelligence.
  • Knowledge graphs have flexible schemas, allowing for new fact types to be added “on the fly” as the things we care about in the world change.
  • Knowledge graphs provide unique identifiers for all entities, supporting the disambiguation of entities like Apple (the company) vs apple (the fruit).

Market intelligence uses from our customers include:

  • Querying the Knowledge Graph for companies that fit certain criteria (size, revenue, industry, location) rather than manually searching for them in Google
  • Creating dashboards to receive insights about companies in a certain industry
  • Improving an internal database by using the data from the Diffbot Knowledge Graph.
  • Custom solutions that incorporate multiple Diffbot products (custom web crawling, natural language processing, and Knowledge Graph data consumption)

News Monitoring

Sure, the news is all around us. But most companies are overwhelmed by the sheer amount of information produced every day that can impact their business.

The challenges faced by those trying to perform news monitoring on unstructured article data are numerous. Articles are structured differently across the web, making aggregation of diverse sources difficult. Many sources and aggregators silo their news by geographic location or language.

Strengths of providing article data through a wider Knowledge Graph include the ability to link articles to the entities (people, organizations, locations, etc) mentioned in each article. Additional natural language processing includes the ability to identify quotes and who said them as well as the sentiment of the article author towards each entity mentioned in the article.

In high-velocity, socially-fueled media, the need for automated analysis of information in textual form is even more pressing. Among the many applications of our technology, Diffbot is helping anti-bias and misinformation initiatives with partnerships involving FactMata as well as the European Journalism Centre.

Check out how easy it is to build your own custom pan-lingual news feed in our news feed builder.

Ecommerce

Many of the largest names in ecommerce have utilized Diffbot’s ability to transform unstructured product, review, and discussion data into valuable ecommerce intelligence, whether by pointing AI-enabled crawlers at their own marketplaces to detect fraudulent, duplicate, or underperforming products, or by analyzing competitor or supplier product listings.

One of the reasons to use Diffbot’s AI-enabled product API or our product entities within the Knowledge Graph is the difficulty of scraping product data at scale. Many ecommerce sites employ active measures to make the scraping of their pages at scale difficult. We’ve already built out the infrastructure and can begin returning product data at scale in minutes.

The use of rule-based scraping by many competitors or in-house teams means that whenever ecommerce sites shift their layout or you try to extract ecommerce web data from a new location, your extraction method is likely to break. Additionally, hidden or toggleable fields on many ecommerce pages are more easily extracted by solutions with strong machine vision capabilities.

Diffbot’s decade-long focus on natural language processing also allows the inclusion of rich discussion data parsed for entities, connections, and sentiment. On large ecommerce sites, the structuring and additional processing of review data can be a large feat and provide high value.

Machine Learning

Even when you can get your hands on the right raw data to train machine learning models, cleaning and labeling the data can be a costly process. To help with this, Diffbot’s Knowledge Graph provides potentially the largest selection of once unstructured web data, complete with data provenance and confidence scores for each fact.

Our customers use a wide range of web data to quickly and accurately train models on diverse data types. Need highly informal text input from reviews? Video data in a particular language? Product or firmographic data? It’s all in the Knowledge Graph, structured and with API access so customers can quickly jump into validating new models.

With a long association with Stanford University and many research partnerships, Diffbot’s experts in web-scale machine learning work in tandem with many customers to create custom solutions and mutually beneficial partnerships.

To some, 2020 was the year of the knowledge graph. And while innovative organizations have long seen the benefits of graph databases, recent developments in the speed of fact accumulation online mean the future of graphs has never been brighter.

A big thank you to Diffbot, especially to Filipe Mesquita and Merrill Cook for presenting the Diffbot Knowledge Graph.  

Yours,

DBpedia Association

Bringing Linked Data to the Domain Expert with TriplyDB Data Stories
https://www.dbpedia.org/blog/triplydb-data-stories/ (Fri, 08 Oct 2021)

DBpedia Member Features – Over the last year we gave DBpedia members multiple chances to present their work, tools and applications. In this way, our members gave exclusive insights on the DBpedia blog. This time we will continue with Triply, a Dutch company. They will introduce TriplyDB and data stories to us. Have fun reading!

by Kathrin Dentler, Triply

Triply and TriplyDB

Triply is an Amsterdam-based company with the mission to (help you to) make linked data the new normal. Every day, we work towards making every step around working with linked data easier, such as converting and publishing it, integrating, querying, exploring and visualising it, and finally sharing and (re-)using it. We believe in the benefits of FAIR (findable, accessible, interoperable and reusable) data and open standards. Our product, TriplyDB, is a user-friendly, performant and stable platform, designed for potentially very large linked data knowledge graphs in practical and large-scale production-ready applications. TriplyDB not only allows you to store and manage your data, but also provides data stories, a great tool for storytelling. 

Data stories 

Data stories are data-driven stories, such as articles, business reports or scientific papers, that incorporate live, interactive visualizations of the underlying data. They are written in markdown, and can be underpinned by an orchestration of powerful visualizations of SPARQL query results. These visualizations can be charts, maps, galleries or timelines, and they always reflect the current state of your data. That data is just one click away: A query in a data story can be tested or even tweaked by its readers. It is possible to verify, reproduce and analyze the results and therefore the narrative, and to download the results or the entire dataset. This makes a data story truly FAIR, understandable, and trustworthy. We believe that a good data story can be worth more than a million words. 

Examples

With a data story, the domain expert is in control and empowered to work with, analyze, and share his or her data as well as interesting research results. There are some great examples that you can check out straight away:

  • The fantastic data story on the Spanish Flu, which has been created by history and digital humanities researchers, who usually use R and share their results in scientific papers. 
  • Students successfully published data stories in the scope of a course of only 10 weeks. 
  • The beautiful data story on the Florentine Catasto of 1427.

DBpedia on triplydb.com

Triplydb.com is our public instance of TriplyDB, where we host many valuable datasets, which currently consist of nearly 100 billion triples. Some of our most interesting and frequently used datasets are those by the DBpedia Association.

We also have several interesting saved queries based on these datasets. 

A data story about DBpedia

To showcase the value of DBpedia and data stories to our users, we published a data story about DBpedia. This data story includes comprehensible and interactive visualizations, such as a timeline and a tree hierarchy, all of which are powered by live SPARQL queries against the DBpedia dataset. 

Let us have a look at the car timeline: DBpedia contains a large amount of content regarding car manufacturers and their products. Based on that data, we constructed a timeline which shows the evolution within the car industry. 

If you navigate from the data story to the query, you can analyze it and try it yourself. You see that the query limits the number of manufacturers so that we are able to look at the full scale of the automotive revolution without cluttering the timeline. You can play around with the query, change the ordering, visualize less or more manufacturers, or change the output format altogether. 
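The saved query itself lives on triplydb.com, but a plausible shape for such a timeline query against DBpedia, sketched here with SPARQLWrapper, could look as follows; the class and property choices are assumptions, not Triply's exact saved query.

```python
# A plausible shape for the car-timeline query described above (not Triply's
# exact saved query): car manufacturers with founding dates, ordered and
# limited so the timeline stays readable.
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("https://dbpedia.org/sparql")
sparql.setQuery("""
    PREFIX dbo:  <http://dbpedia.org/ontology/>
    PREFIX dbr:  <http://dbpedia.org/resource/>
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

    SELECT ?manufacturer ?name ?founded WHERE {
      ?manufacturer a dbo:Company ;
                    dbo:industry dbr:Automotive_industry ;
                    dbo:foundingDate ?founded ;
                    rdfs:label ?name .
      FILTER (lang(?name) = "en")
    }
    ORDER BY ?founded
    LIMIT 25
""")
sparql.setReturnFormat(JSON)

for row in sparql.query().convert()["results"]["bindings"]:
    print(row["founded"]["value"], row["name"]["value"])
```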

Advanced features

If you wish to use a certain query programmatically, we offer preconfigured code snippets that allow you to run a query from a python or an R script. You can also configure REST APIs in case you want to work with variables. And last but not least, it is possible to embed a data story on any website. Just scroll to the end of the story you want to embed and click the “</> Embed” button for a copy-pasteable code snippet. 

Try it yourself! 

Sounds interesting? We still have a limited number of free user accounts over at triplydb.com. You can conveniently log in with your Google or Github account and start uploading your data. We host your first million open data triples for free! Of course, you can also use public datasets, such as the ones from DBpedia, link your data, work together on queries, save them, and then one day create your own data story to let your data speak for you. We are already looking forward to what your data has to say!

A big thank you to Triply for being a DBpedia member since 2020, and especially to Kathrin Dentler for presenting her work at the last DBpedia Day in Amsterdam and for her amazing contribution to DBpedia.

Yours,

DBpedia Association

WordLift – Building Knowledge Graphs for SEO
https://www.dbpedia.org/blog/wordlift-building-knowledge-graphs-for-seo/ (Mon, 05 Jul 2021)

DBpedia Member Features – Over the last year we gave our DBpedia members multiple chances to present their work, tools and applications. In this way, our members gave exclusive insights on the DBpedia blog. This time we will continue with WordLift. They will show us how they help companies speak Google’s native language with data from DBpedia.  Have fun while reading!

by Andrea Volpini, WordLift

WordLift is a Software as a Service designed to help companies speak Google’s native language by converting unstructured content into structured data that search engines understand. 

It does so automatically, using natural language processing and machine learning. Most SEO tools provide insights on improving a website. WordLift creates a knowledge graph and automates some of these SEO tasks to help a site rank. We call it agentive SEO: from search intent analysis to content generation, from building internal links to improving on-page user engagement. At the core of this automation, WordLift creates 5-star linked data using DBpedia.

Artificial Intelligence is shaping the online world. Commercial search engines like Google and Bing have changed dramatically in the last two decades: from 10 blue links to algorithms that answer user questions without ever clicking a link or leaving the results page. As search evolves, so do the SEO tools that marketers need to cope with these changes. 

Why does creating a Knowledge Graph improve SEO?

Imagine the knowledge graph behind a website as the most thoughtful way to help crawlers index and understand its content. Much like Google uses the graph as the engine to power up its search results, a knowledge graph that describes a website’s content helps machines understand the semantic meanings behind it.

In practical terms, a customised knowledge graph helps content marketers in different ways:

  • Enhancing SERP results with structured data and helping Google and Bing disambiguate your brand name or services adequately. 
  • Automating internal links to increase rankings around entities that matter for the business.
  • Providing content recommendations to enhance the customer journey.
  • Bringing additional insights to web analytics by grouping traffic related to entities and not only pages (i.e. how is the content on “artificial intelligence” performing this week?).
  • Providing the factual data required for training language models that can automatically generate content (you can read all about it in my latest blog post on AI text generation for SEO, where you will find the code to fine-tune Google’s T5 using triples from DBpedia 🎉).

Here is an example of Natural Language Generation using Google’s T5 

Search Intent Discovery: a practical example of how marketers can use a KG

Let me give you another example: keyword research. The purpose of keyword research is to find and analyze search queries that people enter into search engines to create new content or improve existing ones. Using the custom knowledge graphs that WordLift produces, we help our clients quickly scout for new untapped search opportunities. 

The chart above shows a series of search intents (queries) that WordLift has generated after the user provided three ideas. Using the knowledge graph, these intents are grouped into topics such as “Android”, “Anonymity” or “Gamer” and the content editor can find the right query to target. In the treemap chart, larger boxes correspond to a higher search volume, and lighter colors indicate less competitive queries.   

How does WordLift build a Knowledge Graph?

An entity represents the “thing” described in web pages. Entities help computers understand everything about a person, an organization or a place mentioned on a website. Each entity holds the information required to provide direct answers to questions about itself and questions that can be answered by looking at the relationships with other entities. WordLift uses natural language processing to extract and connect entities with web pages. Therefore we primarily use the schema.org vocabulary to build our knowledge graphs.
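As a rough illustration of the output (not WordLift's actual markup), the structured data injected into a page might be JSON-LD along these lines, with a sameAs link anchoring the entity to DBpedia; the example entity and property choices are assumptions.

```python
# Illustrative sketch of schema.org JSON-LD for a page entity, linked to
# DBpedia via sameAs. Entity and properties are assumptions, not WordLift's
# actual output.
import json

entity = {
    "@context": "https://schema.org",
    "@type": "Organization",
    "name": "Apple Inc.",
    "url": "https://www.apple.com/",
    "sameAs": ["http://dbpedia.org/resource/Apple_Inc."],
}

# This would typically be embedded in the page inside a
# <script type="application/ld+json"> element.
print(json.dumps(entity, indent=2))
```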

WordLift heavily relies on DBpedia. We train our content analysis API on concepts that are in DBpedia and we build knowledge graphs that interlink, among other graphs, with DBpedia.

We are also starting to automatically create content using language models trained with data from DBpedia. More on this front will come in the near future.

WordLift and the open DBpedia Knowledge Graph

Our users constantly publish and update public web data from various sources (their websites, the catalogue of their products, a data feed from a partner and more) and interlink the entities they publish with DBpedia. 

We are now excited to contribute back some of this knowledge. With the help of DBpedia we can keep on building a distributed, queryable and decentralized linked data cloud. 

Keep following us and keep contributing to the LOD cloud!

A big thank you to WordLift, and especially to Andrea Volpini, for presenting how WordLift creates a KG and automates SEO tasks that help a site rank.

Yours,

DBpedia Association

ContextMinds: Concept mapping supported by DBpedia
https://www.dbpedia.org/blog/contextminds/ (Fri, 16 Apr 2021)

Contribution from Marek Dudáš (Prague University of Economics and Business – VŠE)

ContextMinds is a tool that combines two ideas: concept mapping and knowledge graphs. What’s concept mapping? With a bit of simplification, when you take a small subgraph of not more than a few tens of nodes from a knowledge graph (kg) and visualize it with the classic node-link (or “bubbles and arrows”) approach, you get a concept map. But concept maps are much older than knowledge graphs. They emerged in the 1970s and were originally intended to be created by hand, to represent a person’s understanding of a given problem or question. Shortly after their “discovery” (using diagrams to represent relationships is probably a much older idea), they turned out to be a very useful educational tool.

Going back to knowledge graphs and DBpedia, ContextMinds lets you quickly create an overview of some problem you need to solve, study or explain.  

Figure 1 Text search in concepts from DBpedia: starting point of concept map creation in ContextMinds. 

How you can start 

Starting from a classic text search, you select concepts (nodes) from a knowledge graph, and ContextMinds shows how they are related (it loads the links from the knowledge graph). It also suggests other concepts in the kg that you might be interested in. The suggestions are drawn from the joint neighborhood of the nodes you have already selected and put into the view. Nodes are scored by relevance, basically by the number of links to what you have in the view. So, as you are creating your concept map, a continuously updated list of around 30 most related concepts is available for simple drag & dropping into your map.
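A toy sketch of that relevance scoring, assuming a simple in-memory adjacency structure; the graph data and the function are illustrative, not ContextMinds' actual implementation.

```python
# Sketch of the relevance scoring idea: candidate concepts from the
# neighbourhood of the selected nodes, scored by how many selected nodes
# they link to. The tiny adjacency map below is a toy assumption.
from collections import Counter

links = {
    "Frida_Kahlo": {"Surrealism", "Mexico", "Diego_Rivera"},
    "Surrealism": {"Frida_Kahlo", "Salvador_Dali", "Andre_Breton"},
    "Diego_Rivera": {"Frida_Kahlo", "Mexico", "Muralism"},
}

def suggest(selected, links, top_n=30):
    """Score each neighbour of the selected nodes by its link count to them."""
    scores = Counter()
    for node in selected:
        for neighbour in links.get(node, set()):
            if neighbour not in selected:
                scores[neighbour] += 1
    return scores.most_common(top_n)

print(suggest({"Frida_Kahlo", "Surrealism"}, links))
```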

Figure 2 Concept map and a list of top related concepts found in DBpedia by ContextMinds. 

This helps you make the concept map complete quickly. It also helps to discover relationships between the concepts that you were not aware of. If a concept or relationship is not yet in the knowledge graph, you can create it. It will not only appear in your concept map, but will also become part of an extended knowledge base for anyone who has access to your map. You can at any time select the sources of concept & relationship suggestions. To do that, you can choose any combination of the personal scope (concepts from maps created by you), the workspace scope (shared space with teammates), DBpedia (or a different kg) and the public scope (everything created by the community and made public).

The best way of explaining how it works is a short video.

Use Case: Knowledge Graph 

ContextMinds was of course built with DBpedia as the initial knowledge graph. That instance is available at app.contextminds.com and more than 100 schools are using it as an educational aid. Recently, we discovered that the same model can be useful with other knowledge graphs. 

Say you run some machine learning that helps you identify some objects in the knowledge graph as having some interesting properties. Now you might need to look at what is there in the graph about them to either explain the results or show the results to domain experts so that they can use them for further research. And that is where ContextMinds comes in. You put the concepts from the machine learning results into the view and ContextMinds automatically adds the links between them and finds related concepts from their neighborhood. We have done this with kg-covid, a knowledge graph built from various biomedical and Covid-related datasets. There we use RDFrules to mine interesting rules and then visualize the results in ContextMinds (available at contextminds.vse.cz). Because of that biology experts may interpret them and explore further related information. More about that maybe later in another blogpost.

Our Vision 

An additional fun fact: since we started developing ContextMinds to work solely with DBpedia, its data model is kind-of hard-coded in it. The plan is to enable loading multiple knowledge graphs into a single ContextMinds instance so that the user may interconnect objects from DBpedia with those from other datasets when creating a concept map, but at the moment we have to transform the data so that they look like DBpedia before they can be loaded into ContextMinds.

A big thank you to ContextMinds, especially Marek Dudáš for presenting how ContextMinds combines concept mapping and knowledge graphs.

Yours,

DBpedia Association

Structure mining with DBpedia
https://www.dbpedia.org/blog/structure-mining-with-dbpedia/ (Tue, 23 Feb 2021)

DBpedia Member Features – Last year we gave DBpedia members the chance to present special products, tools and applications on the DBpedia blog. We have already published several posts in which our members provided unique insights. This week we will continue with Wallscope, they will show how you can derive value from existing data by coupling it with readily available open sources such as DBpedia. Have fun while reading!

by Antero Duarte, Lead Developer, Wallscope

Wallscope and DBpedia

Wallscope has been using DBpedia for many years – for example as part of our demos; to inform discussions with clients; and fundamentally to help people understand linked data and the power of knowledge graphs.

We also work with Natural Language Processing, and for us the intersection of these two areas of research and technology is extremely powerful. We can provide value to organisations at a low cost of entry, since resources like DBpedia and open source NLP models can be used with little effort.

We quickly realised that while linking entities and finding and expanding on keywords was interesting, it was more of a novelty than a technology that would directly solve our clients’ problems.

For example, when speaking to an art gallery about classifying content, the prospect of automatically identifying artists’ names in text might appeal, because this is often a manual process. The gallery might also be interested in expanding artists’ profiles with things like their birth date and place, and the ability to find other artists based on properties of a previously identified artist. This is all interesting, but it’s mostly information that people looking at an artist’s work will already know.

The eureka moment was how we could use this information to expand and build on the things we already know about the artist. Presenting well-known aspects of the artist’s life alongside things that are not as obvious or well known, to create a wider story.

Frida Kahlo

We can take someone like Frida Kahlo and her audience and expose them to different facets of her work and life. By leveraging structure and connections between entities, rather than just the entities themselves, we can arrange previously created content in a different way and generate a new perspective, new insight and new value.

There’s a common expression in the data world that says that ‘data is the new oil’. While this draws a parallel to the richest companies now making their money from data rather than oil, as they did just a few decades ago, it is also true that we mine data like oil. We treat data as a finite resource that we burn once, or maybe refine and then burn, or maybe refine and turn into something else. But we don’t really think of data as reusable.

I’d like to propose that data is in fact a renewable resource. Like the wind…

Take the previous example of the art gallery and how they manage the data they hold about Frida Kahlo: they might want to use the same content in different ways, and why wouldn’t they?

We have different ways of building a story around a single artist and their life. We can learn from DBpedia that Frida Kahlo is considered a surrealist artist, which allows us to build an exhibition about Surrealism.

But we can also learn that Frida Kahlo is a self-taught artist. We can build an exhibition centred around people who were self-taught and influential in different fields.

We can think of Frida’s personal life and how she is an LGBTQ+ icon for being openly and unapologetically queer, more specifically bisexual. This opens up an avenue to show LGBTQ+ representation in media throughout history.
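A sketch of the kind of DBpedia lookup behind these facets, using SPARQLWrapper; the choice of dbo:movement and dct:subject as facet properties is an assumption about what DBpedia currently holds for this resource.

```python
# Sketch: pulling "facets" for Frida Kahlo from DBpedia, i.e. her artistic
# movement and the Wikipedia categories she belongs to. Property choices
# are assumptions about the current DBpedia data.
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("https://dbpedia.org/sparql")
sparql.setQuery("""
    PREFIX dbr: <http://dbpedia.org/resource/>
    PREFIX dbo: <http://dbpedia.org/ontology/>
    PREFIX dct: <http://purl.org/dc/terms/>

    SELECT DISTINCT ?facet WHERE {
      { dbr:Frida_Kahlo dbo:movement ?facet }
      UNION
      { dbr:Frida_Kahlo dct:subject ?facet }
    }
""")
sparql.setReturnFormat(JSON)

for row in sparql.query().convert()["results"]["bindings"]:
    print(row["facet"]["value"])
```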

Exploring data related to Frida Kahlo

For us this is one of the most powerful things about linked data, and it’s one of the easiest ways to show potential clients how they can derive value from existing data by coupling it with readily available open sources such as DBpedia.

This also promotes a culture of data reusability that actively goes against the problem of siloed data. Those gathering data don’t just think about their specific use case but rather about how their data can be useful for others and how they can best design it so it’s reusable elsewhere.

Lateral Search Technique

Besides the more obvious aspects of an open knowledge structure, an aspect that can sometimes be overlooked is the inherent hierarchy of concepts in something like Wikipedia’s (and consequently DBpedia’s) category pages. By starting at a specific level and generalising, we are able to find relevant information that relates to the subject laterally.

This process of lateral search can provide very good results, but it can also be a difficult process of testing out different mechanisms and finding the best way to select the most relevant connections, usually on a trial-and-error basis. Over the years we have used this lateral search technique as a more nuanced approach to topic classification that doesn’t require explicit training data, as we can rely on DBpedia’s structure rather than training data to make assertions.
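A minimal sketch of that generalisation step, climbing the DBpedia category hierarchy via skos:broader; the example category and the two-hop limit are illustrative assumptions, and the real pipeline is considerably more nuanced.

```python
# Sketch of lateral search by generalisation: start from a specific DBpedia
# category and collect categories one or two skos:broader hops above it.
# The example category and the two-hop limit are illustrative assumptions.
from SPARQLWrapper import SPARQLWrapper, JSON

QUERY = """
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>

SELECT DISTINCT ?broader WHERE {
  <http://dbpedia.org/resource/Category:Zombie_films>
      skos:broader|(skos:broader/skos:broader) ?broader .
}
"""

sparql = SPARQLWrapper("https://dbpedia.org/sparql")
sparql.setQuery(QUERY)
sparql.setReturnFormat(JSON)

for row in sparql.query().convert()["results"]["bindings"]:
    # More general topics such as horror-related categories should appear here
    print(row["broader"]["value"])
```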

With the trial-and-error approach Wallscope has created a set of tools that helps us iterate faster based on the use case for implementations of combined Natural Language Processing and structure mining from knowledge graphs.

Data Foundry

Data Foundry frontend

Data Foundry is Wallscope’s main packaged software offering for knowledge graph creation and manipulation. It is an extendable platform that is modular by design and scalable across machine clusters. Its main function is to act as a processing platform that can connect multiple data sources (usually a collection of files in a file system) to a single knowledge graph output (usually an RDF triplestore). Through a pipeline of data processors that can be tailored to specific use cases, information is extracted from unstructured data formats and turned into structured data before being stored in the knowledge graph.

Several processors in Data Foundry use the concept of structure mining and lateral search. Some use cases use DBpedia, others use custom vocabularies.

STRVCT

STRVCT frontend

STRVCT is Wallscope’s structured vocabulary creation tool. It aims to allow any user to create/edit SKOS vocabularies with no prior knowledge of RDF, linked data, or structured vocabularies. By virtue of its function, STRVCT gives users ownership of their own data throughout the development process, ensuring it is in the precise shape that they want it to be in.

STRVCT is a stepping stone in Wallscope’s pipeline – once a vocabulary is created, it can be processed by Data Foundry and used with any of our applications.

HiCCUP

HiCCUP frontend

Standing for Highly Componentised Connection Unification Platform, HiCCUP is the “glue” in many of Wallscope’s projects and solutions.

It gives users the ability to create connections to SPARQL endpoints for templated queries and RDF manipulation and exposes those templates as an API with RDF outputs. The latest version also allows users to connect to a JSON API and real time conversion to RDF. This has proven useful in integrating data sources such as IoT device readings into knowledge graph environments.

Pronto

Pronto landing page

Pronto was created to overcome the challenges related to the reuse of ontologies. It is an open-source ontology search engine that provides fuzzy matching across many popular ontologies, originally selected from the prefix.cc user-curated “popular” list, along with others selected by Wallscope. 

Pronto has already proved a reliable internal solution used by our team to shorten the searching process and to aid visualisation.

If you’re interested in collaborating with us or using any of the tools mentioned above, send an email to contact@wallscope.co.uk 

You can find more Wallscope articles at https://medium.com/wallscope and more articles written by me at https://medium.com/@anteroduarte  

A big thank you to Wallscope, especially Antero Duarte for presenting how to extract knowledge from DBpedia and for showcasing cool and innovative tools.  

Yours,

DBpedia Association

Why Data Centricity Is Key To Digital Transformation
https://www.dbpedia.org/blog/why-data-centricity-is-key-to-digital-transformation/ (Wed, 17 Feb 2021)

DBpedia Member Features – Last year we gave DBpedia members the chance to present special products, tools and applications on the DBpedia blog. We have already published several posts in which our members provided unique insights. This week we will continue with eccenca. They will explain why companies struggle with digital transformation and why data centricity is the key to this transformation. Have fun while reading!

by Hans-Christian Brockmann, CEO eccenca

Why Data Centricity Is Key To Digital Transformation

Only a few large enterprises like Google, Amazon, and Uber have made the mindset and capability transition to turn data and knowledge into a strategic advantage. They have one thing in common: their roadmap is built on data-centric principles (and yes, they use knowledge graph technology)!


Over the last years it has become obvious that the majority of companies fail at digital transformation as long as they continue to follow their outdated IT management best practices. We, the knowledge graph community, have long been reacting to this with rather technical explanations about RDF and ontologies. While our arguments have been right at all times, they did not really address the elephant in the room: that it’s not only a technological issue but a question of mindset.


Commonly, IT management is stuck with application-centric principles. Solutions for a particular problem (e.g. financial transactions, data governance, GDPR compliance, customer relationship management) are thought about in singular applications. This has created a plethora of stand-alone applications in companies which store and process interrelated or even identical data but are unable to integrate. That’s because every application has its own schema and data semantics. And companies have hundreds or even thousands of different applications at work. Still, when talking about data integration projects or digital transformation the IT management starts the argument from an application point of view.

mastering complexity

Companies Struggle With Digital Transformation Because Of Application Centricity

This application-centric mindset has created an IT quagmire. It prevents automation and digital transformation because of three main shortcomings.

  1. Data IDs are local. The identification of data is restricted to its source application which prevents global identification, access and (re)use.
  2. Data semantics are local. The meaning of data, information about their constraints, rules and context are hidden either in the software code or in the user’s head. This makes it difficult to work cooperatively with data and also hinders automation of data-driven processes.
  3. The knowledge about data’s logic is IT turf. Business users who actually need this knowledge to scale their operations and develop their business in an agile way are always dependent on an overworked IT which knows the technicalities but doesn’t understand the business context and needs. Thus, scalability and agility are prevented from the start.

Data centricity changes this perspective because it puts data before applications. Moreover, it simplifies data management. The term was coined by author and IT veteran Dave McComb. The aim of data centricity is to “base all application functionality on a single, simple, extensible and federateable data model”, as Dave recently outlined in the latest Escape From Data Darkness webcast episodes. At first, this might sound like advocating yet another one of these US$ 1bn data integration / consolidation projects done by a big name software vendor, the likes of which have failed over and over again. Alas, it’s quite the opposite.

A Central Data Hub For Knowledge Driven Automation

Data centricity does not strive to exchange the existing IT infrastructure with just another proprietary application. Data centricity embraces the open-world assumption and agility concepts and thus natively plays well with the rest of the data universe. The application-centric mindset always struggles with questions of integration, consolidation and a religious commitment to being the “single source of truth”. The data-centric mindset does not have to, because integration is (no pun intended) an integral part of the system. Or as Dave puts it in his book “The Data-Centric Revolution”: “In the Data-Centric approach […] integration is far simpler within a domain and across domains [because] it is not reliant on mastering a complex schema. […] In the Data-Centric approach, all identifiers (all keys) are globally unique. Because of this, the system integrates information for you. Everything relating to an entity is already connected to that entity” without having to even consolidate it in a central silo.


Of course, this sounds exactly like what we have been talking about all those years with knowledge graph technology and FAIR data. And we have seen it working beautifully with our customers like Nokia, Siemens, Daimler and Bosch. eccenca Corporate Memory has provided them with a central data hub for their enterprise information that digitalizes expert knowledge, connects disparate data and makes it accessible to both machines and humans. Still, what we have learned from those projects is this: Conviction comes before technology, just as data comes before the application. Knowledge graph technology certainly is the key maker to digital transformation. But a data-centric mindset is key.

A big thank you to eccenca, especially Hans-Christian Brockmann, for explaining why data centricity is the key to digital transformation. Four years ago eccenca became a member of the DBpedia Association and helped to grow the DBpedia network. Thanks for your contribution and constant support! Feel free to check out eccenca’s member presentation page: https://www.dbpedia.org/dbpedia-members/eccenca/

Yours,

DBpedia Association

Linked Data projects at the Vrije Universiteit Amsterdam Network Institute
https://www.dbpedia.org/blog/linked-data-projects-at-the-vrije-universiteit-amsterdam-network-institute/
Mon, 08 Feb 2021

DBpedia Member Features – Last year we gave DBpedia members the chance to present special products, tools and applications on the DBpedia blog. We have already published several posts in which our members provided unique insights. This week we will continue with the Network Institute (NI) of VU Amsterdam. They will introduce the NI Academy Assistants programme for bright young master students. Have fun while reading!

by Victor de Boer

The Network Institute of VU Amsterdam is an interdisciplinary and cross-faculty institute that studies the interaction between digital technology and society, in other words the Digital Society. Within the VU, it is a central player in realizing the VU's Connected World research agenda. As such, the Network Institute supports the sharing of knowledge on the Web and is therefore one of the partner members of DBpedia.

Network Institute Academy Assistant (NIAA)  

Through a variety of activities, many VU researchers have benefited from their Network Institute-based collaborations, and the institute has introduced interdisciplinary research work to a generation of young VU scholars. One of the ways in which this is organized is the NI Academy Assistants programme. With the Network Institute Academy Assistant (NIAA) programme, the Network Institute aims to interest bright young master students in conducting scientific research and pursuing an academic career. The programme brings together scientists from different disciplines; every project combines methods and themes from informatics, social sciences and/or humanities. For each project, two or three student research assistants work together. Since 2010, several projects have included a Semantic Web or Linked Data component, and several have directly used or reflected on DBpedia information. We would like to showcase a selection here:

The Network Institute Academy Assistants programme supports interdisciplinary research on the topic of the Connected World

INVENiT

In the INVENiT project, which ran in 2014, researchers and students from VU's computer science and history faculties collaborated on connecting the image database and metadata of the Rijksmuseum with the bibliographical data of the STCN (Short Title Catalogue of the Netherlands, 1550-1800), thereby improving information retrieval for humanities researchers. Because humanities researchers depend on the efficiency and effectiveness of the search functionality provided by various online cultural heritage collections (e.g. images, videos and textual material), such links can drastically improve their work. This project combined crowdsourcing, linked data and knowledge engineering with research into methodology.

Academy Assistant Cristina Bucur of the INVENiT project showing a picture bible (Cristina is now a PhD student at VU)

Linked Art Provenance 

Another example of a digital humanities project was the Linked Art Provenance project. The goal of this endeavour was to support art-historical provenance research with methods that automatically integrate information from heterogeneous sources. Art provenance concerns the changes in ownership of artworks over time, involving actors, events and places, and it is an important source of information for researchers interested in the history of collections. In this project, the researchers collaborated on developing a provenance identification and processing workflow that incorporated Linked Data principles. The workflow was validated in a case study around an auction held in 1804, during which the paintings from the former collection of Pieter Cornelis van Leyden (1732-1788) were dispersed.
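As an illustration of what such Linked Data provenance records can look like, here is a minimal sketch using the W3C PROV-O vocabulary. It is not the project's actual data model; all IRIs and resource names are made up, and only the year of the auction is taken from the text above.

```python
# Sketch only: modelling a change of ownership as a PROV-O activity so that
# provenance chains become queryable. IRIs below are illustrative, not taken
# from the Linked Art Provenance project.
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF, XSD

PROV = Namespace("http://www.w3.org/ns/prov#")
EX = Namespace("http://example.org/provenance/")     # illustrative namespace

g = Graph()
g.bind("prov", PROV)

painting = EX.painting_001      # hypothetical artwork resource
auction = EX.auction_1804
seller = EX.van_leyden_estate
buyer = EX.unknown_buyer

g.add((auction, RDF.type, PROV.Activity))
g.add((auction, PROV.startedAtTime, Literal("1804", datatype=XSD.gYear)))
g.add((auction, PROV.used, painting))
g.add((seller, RDF.type, PROV.Agent))
g.add((buyer, RDF.type, PROV.Agent))
g.add((auction, PROV.wasAssociatedWith, seller))
g.add((auction, PROV.wasAssociatedWith, buyer))

print(g.serialize(format="turtle"))
```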


Interoperable Linguistic Corpora

A more recent project was “Provenance-Aware Methods for the Interoperability of Linguistic Corpora”. The project addressed interoperability between existing corpora through a case study on event-annotated corpora, creating a new, more interoperable representation of this data in the form of nanopublications. It demonstrated how linguistic annotations from separate corpora can be merged into a shared format, making the annotation content of both corpora accessible at the same time. The project describes the process for developing the nanopublications and uses SPARQL queries to extract interesting content from the new representations. These queries show that, once the information from different corpora is available in a uniform data format, it can be retrieved more easily and effectively.
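The following sketch shows the kind of retrieval this enables. The vocabulary, file names and query are assumptions for illustration, not the project's actual schema; the point is simply that once both corpora share a uniform RDF representation, a single SPARQL query reaches across them.

```python
# Sketch with an assumed annotation vocabulary (not the project's schema):
# after two event-annotated corpora are converted to a shared RDF format,
# one SPARQL query retrieves annotations from both at once.
from rdflib import Graph

g = Graph()
g.parse("corpus_a_annotations.ttl")   # hypothetical files in the shared format
g.parse("corpus_b_annotations.ttl")

query = """
PREFIX ex: <http://example.org/annotation#>
SELECT ?sentence ?eventType ?corpus WHERE {
    ?annotation ex:annotates ?sentence ;
                ex:eventType ?eventType ;
                ex:fromCorpus ?corpus .
}
"""

for row in g.query(query):
    print(row.sentence, row.eventType, row.corpus)
```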

Connected Knowledge

These are just three examples of the great many projects supported through the Network Institute Academy Assistants programme since 2010. A connected world needs connected knowledge, and we therefore look forward to more collaboration with the Semantic Web and Linked Data community, and with DBpedia in particular.

A big thank you to the Network Institute, especially Victor de Boer, for presenting their innovative programme.

Yours,

DBpedia Association

The post Linked Data projects at the Vrije Universiteit Amsterdam Network Institute appeared first on DBpedia Association.

The Diffbot Knowledge Graph and Extraction Tools
https://www.dbpedia.org/blog/the-diffbot-knowledge-graph-and-extraction-tools/
Thu, 14 Jan 2021

DBpedia Member Features – In the last few weeks, we gave DBpedia members the chance to present special products, tools and applications and share them with the community. We already published several posts in which DBpedia members provided unique insights. This week we will continue with Diffbot. They will present the Diffbot Knowledge Graph and various extraction tools. Have fun while reading!

by Diffbot

Diffbot’s mission to “structure the world’s knowledge” began with Automatic Extraction APIs meant to pull structured data from most pages on the public web by leveraging machine learning rather than hand-crafted rules.

More recently, Diffbot has emerged as one of only three Western entities to crawl the vast majority of the web, utilizing our Automatic Extraction APIs to build the world's largest commercially available Knowledge Graph.
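For readers curious what calling the extraction APIs looks like in practice, here is a rough sketch using the public article extraction endpoint; the exact endpoint, parameters and response fields should be checked against Diffbot's documentation, and the token below is a placeholder.

```python
# Rough sketch of an Automatic Extraction API call (endpoint shape and
# response fields are assumptions; consult Diffbot's docs for the
# authoritative contract). The token is a placeholder.
import requests

DIFFBOT_TOKEN = "YOUR_TOKEN"

response = requests.get(
    "https://api.diffbot.com/v3/article",
    params={"token": DIFFBOT_TOKEN, "url": "https://www.dbpedia.org/blog/"},
    timeout=30,
)
response.raise_for_status()

for obj in response.json().get("objects", []):
    print(obj.get("title"), obj.get("date"))
```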

A Knowledge Graph At The Scale Of The Web

The Diffbot Knowledge Graph is automatically constructed by crawling and extracting data from over 60 billion web pages. It currently represents over 10 billion entities and 1 trillion facts about People, Organizations, Products, Articles and Events, among other entity types.

Users can access the Knowledge Graph programmatically through an API. Other ways to access the Knowledge Graph include a visual query interface and a range of integrations (e.g., Excel, Google Sheets, Tableau). 

Visually querying the web like a database


Whether you’re consuming Diffbot KG data in a visual “low code” way or programmatically, we’ve continually added features to our powerful query language (Diffbot Query Language, or DQL) to allow users to “query the web like a database.” 
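As a rough illustration of what “querying the web like a database” can look like programmatically, here is a sketch of a DQL query sent to the Knowledge Graph search API. The endpoint path, parameters, query and response structure are assumptions based on this description, so treat it as a sketch rather than a reference.

```python
# Sketch of a programmatic DQL query against the Knowledge Graph search API.
# Endpoint, parameters and response fields are assumptions; the token and the
# query itself are illustrative.
import requests

DIFFBOT_TOKEN = "YOUR_TOKEN"
dql = 'type:Organization industries:"Semantic Web" location.country.name:"Germany"'

response = requests.get(
    "https://kg.diffbot.com/kg/v3/dql",
    params={"token": DIFFBOT_TOKEN, "type": "query", "query": dql, "size": 10},
    timeout=30,
)
response.raise_for_status()

for item in response.json().get("data", []):
    print(item.get("entity", {}).get("name"))
```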

Guilt-Free Public Web Data

Current use cases for Diffbot’s Knowledge Graph and web data extraction products run the gamut and include data enrichment; lead enrichment; market intelligence; global news monitoring; large-scale product data extraction for ecommerce and supply chain; sentiment analysis of articles, discussions, and products; and data for machine learning. For all of the billions of facts in Diffbot’s KG, data provenance is preserved with the original source (a public URL) of each fact.

Entities, Relationships, and Sentiment From Private Text Corpora 

The team of researchers at Diffbot has been developing new natural language processing techniques for years to improve its extraction and KG products. In October 2020, Diffbot made this technology commercially available to all via the Natural Language API.

Our Natural Language API Demo Parsing Text Input About Diffbot Founder, Mike Tung

Our Natural Language API pulls out entities, relationships/facts, categories and sentiment from free-form texts. This allows organizations to turn unstructured texts into structured knowledge graphs. 
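A minimal sketch of how such a call might look follows; the endpoint, field names and response structure are assumptions based on the description above, so check the Natural Language API documentation for the exact contract.

```python
# Sketch only: sending free-form text to the Natural Language API and reading
# back entities and sentiment. Endpoint and fields are assumptions; the token
# is a placeholder.
import requests

DIFFBOT_TOKEN = "YOUR_TOKEN"
text = "Diffbot was founded by Mike Tung and builds a web-scale knowledge graph."

response = requests.post(
    "https://nl.diffbot.com/v1/",
    params={"fields": "entities,facts,sentiment", "token": DIFFBOT_TOKEN},
    json={"content": text, "lang": "en"},
    timeout=30,
)
response.raise_for_status()
result = response.json()

for entity in result.get("entities", []):
    print(entity.get("name"), entity.get("salience"))
print("document sentiment:", result.get("sentiment"))
```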

Diffbot and DBpedia

In addition to extracting data from web pages, Diffbot's Knowledge Graph compiles public web data from many structured sources. One important source of knowledge is DBpedia. Diffbot also contributes to DBpedia by providing access to our extraction and KG services and by collaborating with researchers in the DBpedia community. For a recent collaboration between DBpedia and Diffbot, be sure to check out the Diffbot track in DBpedia's Autumn Hackathon for 2020.

A big thank you to Diffbot, especially Filipe Mesquita, for presenting their innovative Knowledge Graph.

Yours,

DBpedia Association

The post The Diffbot Knowledge Graph and Extraction Tools appeared first on DBpedia Association.
