NLP Archives - DBpedia Association

Structure mining with DBpedia

Tue, 23 Feb 2021 08:48:36 +0000

DBpedia Member Features – Last year we gave DBpedia members the chance to present special products, tools and applications on the DBpedia blog. We have already published several posts in which our members provided unique insights. This week we continue with Wallscope, who will show how you can derive value from existing data by coupling it with readily available open sources such as DBpedia. Have fun while reading!

by Antero Duarte, Lead Developer, Wallscope

Wallscope and DBpedia

Wallscope has been using DBpedia for many years – for example as part of our demos; to inform discussions with clients; and fundamentally to help people understand linked data and the power of knowledge graphs.

We also work with Natural Language Processing, and for us the intersection of these two areas of research and technology is extremely powerful. We can provide value to organisations at a low cost of entry, since resources like DBpedia and open source NLP models can be used with little effort.

We quickly realised that while linking entities and finding and expanding on keywords were interesting, they were more of a novelty than a technology that would directly solve our clients’ problems.

For example, when speaking to an art gallery about classifying content, the prospect of automatically identifying artists’ names in text might appeal, because this is often a manual process. The gallery might also be interested in expanding artists’ profiles with things like their birth date and place, and the ability to find other artists based on properties of a previously identified artist. This is all interesting, but it’s mostly information that people looking at an artist’s work will already know.

The eureka moment was realising that we could use this information to expand and build on the things we already know about the artist, presenting well-known aspects of the artist’s life alongside things that are less obvious or well known, to create a wider story.

Frida Kahlo

We can take someone like Frida Kahlo and her audience and expose them to different facets of her work and life. By leveraging structure and connections between entities, rather than just the entities themselves, we can arrange previously created content in a different way and generate a new perspective, new insight and new value.

There’s a common expression in the data world that ‘data is the new oil’. The phrase usually draws a parallel with the richest companies, which nowadays make their money from data rather than oil, as they did just a few decades ago. But it is also true in another sense: we mine data like oil. We treat data as a finite resource that we burn once, or maybe refine and then burn, or maybe refine and turn into something else. But we don’t really think of data as reusable.

I’d like to propose that data is in fact a renewable resource. Like the wind…

Take the previous example of the art gallery and how it manages the data it holds about Frida Kahlo. The gallery might want to use the same content in different ways, and why wouldn’t it?

We have different ways of building a story around a single artist and their life. We can learn from DBpedia that Frida Kahlo is considered a surrealist artist, which allows us to build an exhibition about Surrealism.

But we can also learn that Frida Kahlo is a self-taught artist. We can build an exhibition centred around people who were self-taught and influential in different fields.

We can think of Frida’s personal life and how she is an LGBTQ+ icon for being openly and unapologetically queer, more specifically bisexual. This opens up an avenue to show LGBTQ+ representation in media throughout history.
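As a rough illustration of how such facets could be pulled from DBpedia, here is a minimal sketch that builds a SPARQL query for a resource’s categories and art movements. It only constructs the query and the request URL; nothing is sent. The choice of the `dct:subject` and `dbo:movement` properties, the `format` parameter and the helper names `facet_query` and `request_url` are assumptions made for this example, not a description of how Wallscope queries DBpedia.

```python
from urllib.parse import urlencode

# Public DBpedia SPARQL endpoint; assumed here to accept a `format` parameter.
DBPEDIA_ENDPOINT = "https://dbpedia.org/sparql"


def facet_query(resource_uri: str) -> str:
    """Build a SPARQL query listing a resource's categories and movements.

    The two properties are an illustrative selection; other properties
    could be added as further UNION branches.
    """
    return f"""
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dct: <http://purl.org/dc/terms/>
SELECT DISTINCT ?facet WHERE {{
  {{ <{resource_uri}> dct:subject ?facet }}
  UNION
  {{ <{resource_uri}> dbo:movement ?facet }}
}}
""".strip()


def request_url(query: str) -> str:
    """Return a GET URL for the query (hypothetical helper, no request is made)."""
    params = {"query": query, "format": "application/sparql-results+json"}
    return DBPEDIA_ENDPOINT + "?" + urlencode(params)


query = facet_query("http://dbpedia.org/resource/Frida_Kahlo")
url = request_url(query)
```

Each facet returned by a query like this is a possible starting point for a different story about the same artist.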

Exploring data related to Frida Kahlo

For us this is one of the most powerful things about linked data, and it’s one of the easiest ways to show potential clients how they can derive value from existing data by coupling it with readily available open sources such as DBpedia.

This also promotes a culture of data reusability that actively goes against the problem of siloed data. Those gathering data don’t just think about their specific use case but rather about how their data can be useful for others and how they can best design it so it’s reusable elsewhere.

Lateral Search Technique

Besides the more obvious aspects of an open knowledge structure, an aspect that can sometimes be overlooked is the inherent hierarchy of concepts in something like Wikipedia’s (and consequently DBpedia’s) category pages. By starting at a specific level and generalising, we are able to find relevant information that relates to the subject laterally.

This process of lateral search can provide very good results, but it can also be a difficult process of testing out different mechanisms and finding the best way to select the most relevant connections, usually on a trial-and-error basis. Over the years we have used this lateral search technique as a more nuanced approach to topic classification that doesn’t require explicit training data, as we can rely on DBpedia’s structure rather than training data to make assertions.
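The idea of starting at a specific category and generalising can be sketched with a small breadth-first walk up a category hierarchy. This is a toy example under stated assumptions: the `BROADER` and `MEMBERS` dictionaries are invented for illustration and stand in for DBpedia’s category links, and the function name `lateral_search` and its `max_hops` parameter are hypothetical. It is not Wallscope’s implementation.

```python
from collections import deque

# Toy hierarchy, invented for illustration: category -> broader categories.
BROADER = {
    "Mexican_surrealist_artists": ["Surrealist_artists", "Mexican_artists"],
    "Surrealist_artists": ["Surrealism"],
    "Mexican_artists": ["Mexican_culture"],
}

# Toy membership data, also invented: category -> entities filed under it.
MEMBERS = {
    "Mexican_surrealist_artists": ["Frida_Kahlo"],
    "Surrealist_artists": ["Leonora_Carrington", "Rene_Magritte"],
    "Mexican_artists": ["Diego_Rivera"],
    "Surrealism": ["Andre_Breton"],
}


def lateral_search(start_categories, max_hops=1):
    """Generalise up to `max_hops` levels and collect the entities found.

    Returns a dict mapping each entity to the number of hops at which it
    was first reached, so closer (more specific) matches can be ranked first.
    """
    seen = set(start_categories)
    queue = deque((category, 0) for category in start_categories)
    found = {}
    while queue:
        category, hops = queue.popleft()
        for entity in MEMBERS.get(category, []):
            found.setdefault(entity, hops)
        if hops < max_hops:
            for parent in BROADER.get(category, []):
                if parent not in seen:
                    seen.add(parent)
                    queue.append((parent, hops + 1))
    return found


related = lateral_search(["Mexican_surrealist_artists"], max_hops=1)
```

The hop limit is the kind of knob that tends to need trial and error: too low and nothing new is found, too high and the results drift away from the subject.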

Through this trial-and-error approach, Wallscope has created a set of tools that help us iterate faster, depending on the use case, when implementing combined Natural Language Processing and structure mining from knowledge graphs.

Data Foundry

Data Foundry frontend

Data Foundry is Wallscope’s main packaged software offering for knowledge graph creation and manipulation. It is an extendable platform that is modular by design and scalable across machine clusters. Its main function is to act as a processing platform that can connect multiple data sources (usually a collection of files in a file system) to a single knowledge graph output (usually an RDF triplestore). Through a pipeline of data processors that can be tailored to specific use cases, information is extracted from unstructured data formats and turned into structured data before being stored in the knowledge graph.

Several processors in Data Foundry use the concept of structure mining and lateral search. Some use cases use DBpedia, others use custom vocabularies.

STRVCT

STRVCT frontend

STRVCT is Wallscope’s structured vocabulary creation tool. It aims to allow any user to create/edit SKOS vocabularies with no prior knowledge of RDF, linked data, or structured vocabularies. By virtue of its function, STRVCT gives users ownership of their own data throughout the development process, ensuring it is in the precise shape that they want it to be in.

STRVCT is a stepping stone in Wallscope’s pipeline – once a vocabulary is created, it can be processed by Data Foundry and used with any of our applications.

HiCCUP

HiCCUP frontend

Standing for Highly Componentised Connection Unification Platform, HiCCUP is the “glue” in many of Wallscope’s projects and solutions.

It gives users the ability to create connections to SPARQL endpoints for templated queries and RDF manipulation and exposes those templates as an API with RDF outputs. The latest version also allows users to connect to a JSON API and convert its output to RDF in real time. This has proven useful in integrating data sources such as IoT device readings into knowledge graph environments.
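To give a feel for what converting a JSON reading to RDF can involve, here is a minimal sketch that turns one flat JSON record into N-Triples lines. It does not reflect HiCCUP’s actual behaviour or API. The `http://example.org/` namespace, the `json_to_ntriples` function and the sample sensor record are all assumptions made for illustration.

```python
import json

# Hypothetical namespace used only for this example.
BASE = "http://example.org/"
XSD = "http://www.w3.org/2001/XMLSchema#"


def json_to_ntriples(payload: str, id_key: str = "id") -> list[str]:
    """Convert one flat JSON object into a list of N-Triples statements.

    Assumes keys are safe to use in IRIs and values are not nested;
    anything that is not a bool, int or float is written as a plain string.
    """
    record = json.loads(payload)
    subject = f"<{BASE}reading/{record[id_key]}>"
    triples = []
    for key, value in record.items():
        if key == id_key:
            continue
        predicate = f"<{BASE}prop/{key}>"
        if isinstance(value, bool):  # check bool before int: bool is an int subclass
            obj = f'"{str(value).lower()}"^^<{XSD}boolean>'
        elif isinstance(value, int):
            obj = f'"{value}"^^<{XSD}integer>'
        elif isinstance(value, float):
            obj = f'"{value}"^^<{XSD}double>'
        else:
            # json.dumps adds the quotes and escapes quotes, backslashes and newlines
            obj = json.dumps(str(value), ensure_ascii=False)
        triples.append(f"{subject} {predicate} {obj} .")
    return triples


lines = json_to_ntriples('{"id": "s1", "temperature": 21.5, "unit": "C"}')
```

A real integration would also need a mapping from JSON keys to terms in an agreed vocabulary, which is where a tool for templating such conversions earns its keep.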

Pronto

Pronto landing page

Pronto was created to overcome the challenges related to the reuse of ontologies. It is an open-source ontology search engine that provides fuzzy matching across many popular ontologies, originally selected from the prefix.cc user-curated “popular” list, along with others selected by Wallscope.

Pronto has already proved a reliable internal solution used by our team to shorten the searching process and to aid visualisation.
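For readers unfamiliar with fuzzy matching over ontology terms, the following sketch shows the general idea using Python’s standard `difflib` module. It is not how Pronto works internally: the tiny `TERMS` sample, the `search_terms` function and the similarity cutoff are assumptions chosen for this example.

```python
from difflib import get_close_matches

# Small hand-picked sample of well-known terms: prefixed name -> local label.
TERMS = {
    "foaf:Person": "Person",
    "foaf:name": "name",
    "schema:Person": "Person",
    "skos:Concept": "Concept",
    "skos:prefLabel": "prefLabel",
    "dcterms:creator": "creator",
}


def search_terms(query: str, cutoff: float = 0.6) -> list[str]:
    """Return the prefixed terms whose label is close to the (possibly misspelt) query."""
    labels = sorted({label.lower() for label in TERMS.values()})
    hits = get_close_matches(query.lower(), labels, n=5, cutoff=cutoff)
    return sorted(term for term, label in TERMS.items() if label.lower() in hits)


matches = search_terms("persn")  # a typo for "person"
```

Searching across several vocabularies at once makes it easier to spot an existing term worth reusing before minting a new one.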

If you’re interested in collaborating with us or using any of the tools mentioned above, send an email to contact@wallscope.co.uk

You can find more Wallscope articles at https://medium.com/wallscope and more articles written by me at https://medium.com/@anteroduarte

A big thank you to Wallscope, especially Antero Duarte for presenting how to extract knowledge from DBpedia and for showcasing cool and innovative tools.

Yours,

DBpedia Association

The post Structure mining with DBpedia appeared first on DBpedia Association.

The Diffbot Knowledge Graph and Extraction Tools

Thu, 14 Jan 2021 12:00:40 +0000

DBpedia Member Features – In the last few weeks, we gave DBpedia members the chance to present special products, tools and applications and share them with the community. We already published several posts in which DBpedia members provided unique insights. This week we will continue with Diffbot. They will present the Diffbot Knowledge Graph and various extraction tools. Have fun while reading!

by Diffbot

Diffbot’s mission to “structure the world’s knowledge” began with Automatic Extraction APIs meant to pull structured data from most pages on the public web by leveraging machine learning rather than hand-crafted rules.

More recently, Diffbot has emerged as one of only three Western entities to crawl the vast majority of the web, utilizing our Automatic Extraction APIs to make the world’s largest commercially available Knowledge Graph.

A Knowledge Graph At The Scale Of The Web

The Diffbot Knowledge Graph is automatically constructed by crawling and extracting data from over 60 billion web pages. It currently represents over 10 billion entities and 1 trillion facts about People, Organizations, Products, Articles and Events, among others.

Users can access the Knowledge Graph programmatically through an API. Other ways to access the Knowledge Graph include a visual query interface and a range of integrations (e.g., Excel, Google Sheets, Tableau).

Visually querying the web like a database

Whether you’re consuming Diffbot KG data in a visual “low code” way or programmatically, we’ve continually added features to our powerful query language (Diffbot Query Language, or DQL) to allow users to “query the web like a database.”

Guilt-Free Public Web Data

Current use cases for Diffbot’s Knowledge Graph and web data extraction products run the gamut and include data enrichment; lead enrichment; market intelligence; global news monitoring; large-scale product data extraction for ecommerce and supply chain; sentiment analysis of articles, discussions, and products; and data for machine learning. For all of the billions of facts in Diffbot’s KG, data provenance is preserved with the original source (a public URL) of each fact.

Entities, Relationships, and Sentiment From Private Text Corpora

The team of researchers at Diffbot has been developing new natural language processing techniques for years to improve their extraction and KG products. In October 2020, Diffbot made this technology commercially available to all via the Natural Language API.

Our Natural Language API Demo Parsing Text Input About Diffbot Founder, Mike Tung

Our Natural Language API pulls out entities, relationships/facts, categories and sentiment from free-form texts. This allows organizations to turn unstructured texts into structured knowledge graphs.

Diffbot and DBpedia

In addition to extracting data from web pages, Diffbot’s Knowledge Graph compiles public web data from many structured sources. One important source of knowledge is DBpedia. Diffbot also contributes to DBpedia by providing access to our extraction and KG services and collaborating with researchers in the DBpedia community. For a recent collaboration between DBpedia and Diffbot, be sure to check out the Diffbot track in DBpedia’s Autumn Hackathon for 2020.

A big thank you to Diffbot, especially Filipe Mesquita for presenting their innovative Knowledge Graph.

Yours,

DBpedia Association

Home Sweet Home – The 13th DBpedia Community Meeting

Tue, 18 Jun 2019 12:52:59 +0000

For the second time now, we co-located one of our DBpedia community meetings with the LDK conference. After the previous edition in Galway two years ago, it was Leipzig’s turn to host the event. Thus, the 13th DBpedia community meeting took place in this beautiful city, which is also home to the DBpedia Association’s head office. Win-win, we’d say.

After a very successful LDK conference on May 20th-21st, representatives of the European DBpedia community met at Villa Ida Mediencampus on Thursday, May 23rd, to present their work with DBpedia and to exchange ideas about the DBpedia Databus.

For those of you who missed it or for those who want a little retrospective on the day, this blog post provides you with a short LDK-wrap-up as well as a recap of our DBpedia Day.

First things first

First and foremost, we would like to thank the LDK organizers for co-locating our meeting and thus enabling fruitful synergies and a platform for the DBpedia community to exchange ideas.

LDK

The first presentation, which kicked off the conference, was given by Prof. Christiane Fellbaum from Princeton University. Her talk was on “Mapping the Lexicons of Signs and Words”, with the main focus on her research on mapping WordNet and SignStudy, a resource for American Sign Language. Shortly after, Prof. Eduard Werner from Leipzig University gave a very exciting talk on the “Sorbian languages”. He discussed the nature of the Sorbian languages, their historical background, and the unfortunate imminent extinction of Lower Sorbian due to a decline in native speakers.

The first day of LDK was full of exciting presentations related to various language-oriented topics. Researchers exchanged ideas about linguistic vocabularies, SPARQL query recommendations, role and reference grammar, language detection, entity recognition, machine translation, under-resourced languages, metaphor identification, event detection and linked data in general. The first day ended with fruitful discussions during the poster session. At the end of the first conference day, LDK visitors had the chance to mingle with locals in some of Leipzig’s most exciting bars during a pub crawl.

Prof. Christian Bizer from the University of Mannheim opened the second day with a keynote on “Schema.org Annotations and Web Tables: Underexploited Semantic Nuggets on the Web?”. In his talk, he gave a nice overview of the research on knowledge extraction around the large-scale Web Data Commons corpus, findings, open challenges and possible exploitations of this corpus.

The second day was busy with four sessions, each populated with presentations on exciting topics ranging from relation classification, dictionary linking and entity linking, to terminology models, topical thesauri and morphology.

The series of presentations ended with an Organ Prelude played by David Timm, the University Music Director at Leipzig University. Finally, the day and the conference were concluded with a conference dinner at Moritzbastei, one of Leipzig’s famous cultural centres.

DBpedia Day

On May 23rd, the DBpedia Community met for the 13th DBpedia community meeting. The event attracted more than 60 participants who extended their LDK experience or followed our call to Leipzig.

Opening & keynotes

The meeting was opened by Dr. Sebastian Hellmann, the executive director of the DBpedia Association. He gave an overview of the latest developments and achievements around DBpedia, with the main focus on the DBpedia Databus technologies. The first keynote was given by Dr. Peter Haase, from metaphacts, with an unusual interactive presentation on “Linked Data Fun with DBpedia”. The second keynote speaker was Prof. Heiko Paulheim, presenting findings, challenges and results from his work on the construction of the DBkWik Knowledge Graph by exploiting the DBpedia extraction framework.

Showcase session

The showcase session started with a presentation given by Krzysztof Węcel on “Citations and references in DBpedia”, followed by Peter Nancke with a presentation on the “TeBaQA Question Answering System”, Maribel Acosta Deibe speaking about “Crowdsourcing the Quality of DBpedia” and finally, a presentation by Angus Addlesee on “Data Reconciliation using DBpedia”.

NLP & DBpedia session

The DBpedia & NLP session was opened by Diego Moussallem presenting the results from his work on “Generating Natural Language from RDF Data”. The second presentation was given by Christian Jilek on the topic of “Named Entity Recognition for Real-Time Applications”, which also won the best research paper award at the LDK conference. Next, Jonathan Kobbe presented the best student paper at the LDK conference on the topic of “Argumentative Relation Classification”. Finally, Edgard Marx closed the session with an overview presentation on “From the word to the resource”.

Side-Event – Hackathon

The “Artificial Intelligence for Smart Agriculture” Hackathon focused on enhancing the usability of automatic analysis tools which utilize semantic big data for agriculture, as well as on introducing the DataBio project to the DBpedia community. The event was supported by PNO, Spacebel, PSNC, and InfAI e.V.

We improved the visualization module of Albatross, a platform for processing and analyzing Linked Open Data, and added functionalities to geo-L, the geospatial link discovery tool.

In addition, we presented a paper about Linked Data publication pipelines, focusing on agri-related data, at the co-located LSWT conference.

Wrap Up

After the event, DBpedians joined the DBpedia Association in the nearby pub Gosenschenke to delve into more vital talks about the Semantic Web world, Linked Data & DBpedia.

In case you missed the event, all slides and presentations are available on our website. Further insights, feedback and photos about the event can be found on Twitter via #DBpediaLeipzig.

We are currently looking forward to the next DBpedia Community Meeting, on September 12th in Karlsruhe, Germany. This meeting is co-located with the SEMANTiCS Conference. Contributions are still welcome. Just ping us via dbpedia@infai.org and show us what you’ve got. You should also get in touch with us if you want to host a DBpedia Meetup yourself. We will help you with the program, the dissemination or organizational matters of the event if need be.

Stay tuned, check Twitter, Facebook, and the website, or subscribe to our newsletter for the latest news and updates.

Your DBpedia Association

Artificial Intelligence (AI) and DBpedia

Thu, 11 Apr 2019 13:20:14 +0000

Artificial Intelligence (AI) is currently the central subject of the just-announced ‘Year of Science’ by the Federal German Ministry. In recent years, new approaches to facilitating AI were explored, new mindsets were established, new tools were developed and new technologies were implemented. AI is THE key technology of the 21st century. Together with Machine Learning (ML), it transforms society faster than ever before and will lead humankind to its digital future.

In this digital transformation era, success will be based on using analytics to discover the insights locked in the massive volume of data being generated today. Success with AI and ML depends on having the right infrastructure to process the data.[1]

The Value of Data Governance

One key element to facilitate ML and AI for the digital future of Europe is ‘decentralized semantic data flows’, as stated by Sören Auer, a founding member of DBpedia and current director at TIB, during a meeting about the digital future in Germany at the Bundestag. He further commented that major AI breakthroughs were indeed facilitated by easily accessible datasets, whereas the algorithms used were comparatively old.

In conclusion, Auer reasons that the actual value lies in data governance. In fact, in order to guarantee progress in AI, the development of a common and transparent understanding of data is necessary. [2]

DBpedia Databus – Digital Factory Platform

The DBpedia Databus – our Digital Factory Platform – is one of many drivers that will help to build the much-needed data infrastructure for ML and AI to prosper. With the DBpedia Databus, we create a hub that facilitates a ‘networked data-economy’ revolving around the publication of data. Upholding the motto, Unified and Global Access to Knowledge, the Databus facilitates exchanging, curating and accessing data between multiple stakeholders – always, anywhere. Publishing data on the Databus means connecting and comparing (your) data to the network. Check our current DBpedia releases via http://downloads.dbpedia.org/repo/dev/.

DBpediaDay – & AI for Smart Agriculture

Furthermore, you can learn about the DBpedia Databus during our 13th DBpedia Community meeting, co-located with the LDK conference, in Leipzig, May 2019. Additionally, as a special treat for you, we also offer an AI side-event on May 23rd, 2019.

May we present the thinktank and hackathon “Artificial Intelligence for Smart Agriculture”. The goal of this event is to develop new ideas and small tools which can demonstrate the use of AI in the agricultural domain or the use of AI for a sustainable bio-economy. In that regard, a special focus will be on the use and the impact of linked data for AI components.

In short, the two-part event, co-located with LSWT & DBpediaDay, comprises workshops, on-site team hacking as well as presentations of results. The activity is supported by the projects DataBio and Bridge2Era as well as CIAOTECH/PNO. All participating teams are invited to join and present their projects. Further information is available here. Please submit your ideas and projects here.

Finally, the DBpedia Association is looking forward to meeting you in Leipzig, home of our head office. Pay us a visit!

____

Resources:

[1] Zeus Kerravala; The Success of ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING Requires an Architectural Approach to Infrastructure. ZK Research: A Division of Kerravala Consulting © 2018 ZK Research, available via http://bit.ly/2UwTJRo

[2] Sören Auer; Statement at the Bundestag during a meeting on AI. A summary is available via https://www.tib.eu/de/service/aktuelles/detail/tib-direktor-als-experte-zu-kuenstlicher-intelligenz-ki-im-deutschen-bundestag/

Call for Participation – LDK Conference & DBpedia Day

Sun, 24 Mar 2019 12:40:01 +0000

With the advent of digital technologies, an ever-increasing amount of language data is now available across various application areas and industry sectors, thus making language data more and more valuable. In that context, we would like to draw your attention to the 2nd Language, Data and Knowledge conference, or LDK conference for short, which will be held in Leipzig from May 20th till 22nd, 2019.

The Conference

This new biennial conference series aims at bringing together researchers from across disciplines concerned with language data in data science and knowledge-based applications.

Keynote Speakers

We are happy that Christian Bizer, a founding member of DBpedia, will be one of the three amazing keynote speakers that open the LDK conference. Apart from Christian, Christiane Fellbaum from Princeton University and Eduard Werner, representative of Leipzig University, will share their thoughts on current language data issues to start vital discussions revolving around language data.

Be part of this event in Leipzig and catch up with the latest research outcomes in the areas of acquisition, provenance, representation, maintenance, usability, quality as well as legal, organizational and infrastructure aspects of language data.

DBpedia Community Meeting

To get the full Leipzig experience, we would also like to invite you to our DBpedia Community meeting, which is co-located with LDK and will be held on May 23rd, 2019. Contributions are still welcome. Just get in touch via dbpedia@infai.org.

We also offer an interesting side-event, the Thinktank and Hackathon “Artificial Intelligence for Smart Agriculture”. Visit our website for further information.

Join the LDK conference 2019 and our DBpedia Community Meeting to catch up with the latest research and developments in the Semantic Web Community.

Yours DBpedia Association
