LSWT 2021 – Wrap Up: DBpedia Talks and Tech Tutorial
https://www.dbpedia.org/blog/lswt-2021-wrap-up-dbpedia-talks-and-tech-tutorial/
Mon, 12 Jul 2021
Last week was a fantastic week for DBpedia. On July 7, 2021, we gave two talks at the Leipzig Semantic Web Day (LSWT 2021). One day later, we organized a DBpedia Tech Tutorial.

First and foremost, we would like to thank the LSWT organizing team for hosting these events. Below, we give you a brief retrospective of the presentations. For further details on the tutorial, follow the links to the slides.

Talks at the LSWT 

Opening

Nathanael Arndt (AKSW / InfAI) opened the third session of the LSWT 2021 with a few welcoming words and information about the programme schedule.

DBpedia Updates: Stable Releases

Afterwards, Marvin Hofer (InfAI / DBpedia Association) spoke about the milestones of the DBpedia Association before presenting the three major DBpedia editions – Latest Core, DBpedia Global and DBpedia Live.

Latest Core and Tiny Diamond

Marvin explained the differences between the Latest Core dataset and the snapshot edition, Tiny Diamond. Latest Core has been online for a year and receives monthly updates of the datasets it contains; it enables community extensions, developer debugging and rapid community reviewing. The DBpedia Snapshot release contains the same kind of data as Latest Core, covers the main content of DBpedia and is more stable and consistent. The next Snapshot release will be published by the end of July 2021. Check more details here.

DBpedia Global

Furthermore, Marvin introduced DBpedia Global, a more powerful kernel for the LOD Cloud Knowledge Graph that ultimately strengthens the utility of Linked Data principles by adding more decentralization, i.e. broadening the scope of Linked Data associated with DBpedia. It was released in June 2021, and you can read more details on the DBpedia blog. Finally, Marvin gave a visual demonstration of DBpedia Global using the example of Erich Schröger. To get more information, feel free to check out the presentation here.

DBpedia Archivo

Next, Denis Streitmatter (InfAI / DBpedia Association) explained how DBpedia Archivo can be used, particularly with respect to ontology FAIRness, which aims to improve (re)usability. He presented ways to find an ontology, access ontologies via an API, make an ontology interoperable and reuse it. Speaking about interoperable ontologies, he also showed the star rating system, which tests parsing, licensing and consistency to assess an ontology’s usability. To get more information about DBpedia Archivo, feel free to check out the presentation here.

DBpedia Live 2.0

As the last part of the session, Alex Winter (InfAI / DBpedia Association) and Maximilian Ressel (InfAI / DBpedia Association) presented DBpedia Live 2.0, a cool new API that is more flexible and usable than the previous DBpedia Live version. The goal of the project is an always up-to-date DBpedia Knowledge Graph. It started at the beginning of 2021 and is currently only available in German and English, although more languages will be added. Going forward, the team aims to find early adopters and gain first customers in order to further improve the service and make it even more usable. To get more information about DBpedia Live, please go to the DBpedia website or check out the presentation here.

Outro

At the very end, Nathanael closed the LSWT 2021 with words of thanks to all presenters, the audience and the co-organisers. The next LSWT will be part of a bigger event – the Data Week – which will also include a special DBpedia event. Stay tuned!

DBpedia Tech Tutorial @ LSWT 2021 on July 8

Opening

Jan Forberg (InfAI / DBpedia Association) opened the online tutorial with some general information about its programme, scope and technical setup.

DBpedia in a Nutshell session

After the opening, Jan continued with the first topic: background on the DBpedia Association – how it all started and how DBpedia has evolved. He also addressed the DBpedia Ontology as well as the mappings, extractors and data groups (e.g. mappings, generic, text, wikidata). Jan concluded the first topic with information on the DBpedia Knowledge Graph Diamonds.

Getting Started with DBpedia session

In addition, Jan explained where to find data, including the DBpedia SPARQL endpoint, the DBpedia Databus platform as a repository for DBpedia and related datasets, and the novel “collections” concept. Furthermore, he demonstrated how to use the DBpedia data and the DBpedia Knowledge Graph.

DBpedia Technology Stack

Fabian Götz (InfAI / DBpedia Association) opened the session with a talk about the DBpedia Databus platform. He explained how the Databus platform works, covering the Databus SPARQL endpoints, the Web API and the Maven plugin. After that, he presented dockerized services, including DBpedia Virtuoso and the DBpedia plugin, DBpedia Spotlight (incl. use cases) and DBpedia Lookup.

Afterwards, Marvin Hofer (InfAI / DBpedia Association) explained the DBpedia release process on the Databus and showed his work on debugging DBpedia and on the DBpedia Mods technology. He also demonstrated the quality assurance process using the concept of minidumps. Furthermore, he explained (pre)fusion, ID management and the novel concept of cartridges.

Subsequently, Denis Streitmatter (InfAI / DBpedia Association) presented the DBpedia Archivo ontology manager and how to add ontologies to it. He showed various use cases, e.g. how to find an ontology, how to test your ontology and how to back it up. He then shared the four-star schema of ontology tests and the SHACL-based tests for ontologies with the audience. Please read the official DBpedia Archivo call here.

Contributions to DBpedia and Outro

As the tutorial drew to a close, Denis Streitmatter (InfAI / DBpedia Association) explained how to improve existing mappings or introduce new ones, and talked about improving the DBpedia Information Extraction Framework as well as contributing to DBpedia tests. Afterwards, Marvin Hofer presented how to contribute to DBpedia by writing SHACL tests or by editing mappings. Finally, Jan Forberg closed the meeting by taking questions from the audience.

In case you missed the tutorial, our presentation is also available here. Further insights, feedback and photos about the event are available on Twitter (#DBpediaTutorial).

We are now looking forward to the next DBpedia tutorial, which will be held on September 1, 2021, co-located with the LDK conference in Zaragoza, Spain. Check more details here and register now! Furthermore, we will organize the DBpedia Day on September 9, 2021, at the SEMANTiCS Conference in Amsterdam. We are looking forward to meeting all Dutch DBpedians there!

Stay safe, and follow us on Twitter or LinkedIn. Furthermore, you can subscribe to our Newsletter for the latest news and information around DBpedia.

Julia & Emma

on behalf of the DBpedia Association

New Prototype: Databus Collection Feature
https://www.dbpedia.org/blog/databus-collections-feature/
Thu, 14 Nov 2019
We are thrilled to announce that the Databus Collection Feature for the DBpedia Databus has been developed and is now available as a prototype. It simplifies the way you bundle your data and use it in your applications.

A new Databus Collection Feature? How come, and how does it work? Read below and find out how using the DBpedia Databus becomes easier by the day and with each new tool.

Motivation

With more and more data being uploaded to the Databus, we started to develop test applications using that data. The SPARQL endpoint offers a central hub to access all metadata of the datasets uploaded to the Databus, provided you know how to write SPARQL queries. The metadata includes the download links of the data files; it was therefore possible to pass a SPARQL query to an application, download the actual data and then use it for whatever purpose the app had.
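
For illustration, such a metadata query could look like the sketch below, run against the public Databus SPARQL endpoint. The DataID/DCAT prefixes and property names are assumptions about the Databus metadata vocabulary, not something quoted from this post.

PREFIX dataid: <http://dataid.dbpedia.org/ns/core#>
PREFIX dct:    <http://purl.org/dc/terms/>
PREFIX dcat:   <http://www.w3.org/ns/dcat#>

# List datasets together with their version and file download links
SELECT ?dataset ?version ?file WHERE {
    ?dataset a dataid:Dataset ;
             dct:hasVersion ?version ;
             dcat:distribution ?distribution .
    ?distribution dcat:downloadURL ?file .
}
LIMIT 100

An application can send such a query to the endpoint, read the ?file column and fetch the listed files – exactly the pattern that the collection feature described below packages up.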

The Databus Collection Editor

The DBpedia Databus now provides an editor for collections. A collection is essentially a labelled SPARQL query that is retrievable via a URI. With the collection editor, you can group Databus groups and artifacts into a bundle and publish your selection using your Databus account. It is now a breeze to select the data you need, share the exact selection with others and/or use it in existing or self-made applications.

If you are not familiar with SPARQL and data queries, you can think of the feature as a shopping cart for data: You create a new cart, put data in it and tell your friends or applications where to find it. Quite neat, right?

In the following section, we will cover the user interface of the collection editor.

The Editor UI

Firstly, you can find the collection editor by going to the DBpedia Databus and following the Collections link at the top, or you can get there directly by clicking here.

What you will see is the collection editor’s main view, described section by section below.

General Collection Info

Secondly, since you do not have any collections yet, the editor has already created an empty collection named “Unnamed” for you. On the right side, next to the label and description, you will find a pen icon. By clicking the icon or the label itself, you can edit its content. The collection has not been published yet, so the Collection URI is blank.

Whenever you are not logged in or the collection has not been published yet, the editor will notify you that your changes are only saved in your local browser cache and NOT remotely on our server. Keep that in mind when clearing your cache. Publishing the collection, however, is easy: simply log into (or create) your Databus account and hit the publish button in the action bar. This will open a modal where you can pick your unique collection id and hit publish again. That’s it!

The Collection Info section will now show the collection URI. Following the link will take you to the HTML representation of your collection that will be visible to others. Hitting the Edit button in the action bar will bring you back to the editor.

Collection Hierarchy

Let’s have a look at the core piece of the collection editor: the hierarchy view. A collection can bundle different Databus groups and artifacts, but it is not limited to that. If you know how to write a SPARQL query, you can easily extend your collection with more powerful selections. The hierarchy is therefore split into two nodes:

  • Generated Queries: Contains all queries that are generated from your selection in the UI
  • Custom Queries: Contains all custom written SPARQL queries

Both hierarchy nodes have a “+” icon. Clicking this button will let you add generated or custom queries respectively.

Custom Queries

If you hit the “+” icon on the Custom Queries node, a new node called “Custom Query” will appear in the hierarchy. You can remove a custom query by clicking on the trashcan icon in the hierarchy. If you click the node it will take you to a SPARQL input field where you can edit the query.

To make your collection more understandable for others, you can even document the query by adding a label and description.

Writing Your Own Custom Queries

A collection query is a SPARQL query of the form:

SELECT DISTINCT ?file WHERE {
    {
        [SUBQUERY]
    }
    UNION
    {
        [SUBQUERY]
    }
    UNION
    ...
    UNION
    {
        [SUBQUERY]
    }
}

All selections made by generated and custom queries will be joined into a single result set with a single column called “file”. It is therefore important that your custom query also binds data to a variable called “file”.
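
As a sketch, a custom subquery could look like the one below. The artifact IRI and the DataID/DCAT properties are illustrative assumptions, not values taken from this post; any graph pattern works as long as the download URL ends up bound to ?file.

PREFIX dataid: <http://dataid.dbpedia.org/ns/core#>
PREFIX dcat:   <http://www.w3.org/ns/dcat#>

# Select all files of one artifact and bind their download URLs to ?file
SELECT DISTINCT ?file WHERE {
    ?dataset dataid:artifact <https://databus.dbpedia.org/dbpedia/mappings/mappingbased-literals> .
    ?dataset dcat:distribution ?distribution .
    ?distribution dcat:downloadURL ?file .
}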

Generated Queries

Clicking the “+” icon on the Generated Queries node will take you to a search field. Make use of the indexed search on the Databus to find and add the groups and artifacts you need. If you want to refine your search, don’t worry: you can do that in the next step!

Once the artifact or group has been added to your collection, the Add to Collection button will turn green. When you are done, you can return to the editor via the Back to Hierarchy button.

Your hierarchy will now contain several new nodes.

Group Facets, Artifact Facets and Overrides

Groups and artifacts that have been added to the collection will show up as nodes in the hierarchy. Clicking a node will open a filter where you can refine your dataset selection. Setting a filter on a group node will apply it to all artifact nodes unless you manually override that setting in an artifact node. The filter set in the group node is shown in the artifact facets in dark grey; any overrides in the artifact facets are highlighted in green.

Group Nodes

A group node provides a list of filters that will be applied to all artifacts of that group.

Artifact Nodes

Artifact nodes then actually select the data files, which are visible in the faceted view. The facets are generated dynamically from the available variants declared in the metadata.

Example: here we selected the latest version of the Databus dump as N-Triples. This collection is already in use: the collection URI is passed to the new generic lookup application, which then provides the search function for the Databus website. If you are interested in how to configure the lookup application, have a look here: https://github.com/dbpedia/lookup-application. Additionally, there will be another blog post about the lookup within the next few weeks.

Use Cases

The DBpedia Databus Collections are useful in many ways.

  • You can share a specific dataset with your community or colleagues.
  • You can re-use datasets others created.
  • You can plug collections into Databus-ready applications and avoid spending time on the download and setup process.
  • You can point to a specific piece of data (e.g. for testing) with a single URI in your publications.
  • You can help others to create data queries more easily.

We hope you enjoy the Databus Collection Feature and we would love to hear your feedback! You can leave your thoughts and suggestions in the new DBpedia Forum. Feedback of any kind is highly appreciated, since we want to improve the prototype in a way that is as fast and user-driven as possible! Cheers!

A big thanks goes to DBpedia developer Jan Forberg, who finalized the Databus Collection Feature and compiled this text.

Yours

DBpedia Association

DBpedia Live Restart – Getting Things Done
https://www.dbpedia.org/blog/dbpedia-live-restart-getting-things-done/
Thu, 01 Aug 2019
Part VI of the DBpedia Growth Hack series (View all)

DBpedia Live is a long-term core project of DBpedia that immediately extracts fresh triples from all changed Wikipedia articles. After a long hiatus, fresh, live-updated data is available once again, thanks to our former co-worker Lena Schindler, whose work we feature in this blog post. Before we dive into Lena’s report, let’s have a look at some general info about DBpedia Live:

Live Enterprise Version

OpenLink Software provides a scalable, dedicated, live Virtuoso instance built on Lena’s remastering. Kingsley Idehen announced the dedicated business service in our new DBpedia Forum.
On the Databus, we collect publicly shared and business-ready dedicated services in the same place where you can download the data. The Databus allows you to download the data, build a service, and offer that service, all in one place. Data uploaders can also see who builds something with their data.

Remastering the DBpedia Live Module

Contribution by Lena Schindler

After developing the DBpedia REST API as part of a student project in 2018, I worked as a student research assistant for DBpedia. My task was to analyze and patch severe issues in the DBpedia Live instance. I will briefly describe the purpose of DBpedia Live, the reasons it went out of service, what I did to fix these issues, and finally the changes needed to support multi-language abstract extraction.


Overview

The DBpedia Extraction Framework is a Scala-based software project with numerous features that have evolved around extracting knowledge (as RDF) from wikis. One part is the DBpedia Live module in the “live-deployed” branch, which is intended to provide a continuously updated version of DBpedia by processing Wikipedia pages on demand, immediately after they have been modified by a user. The backbone of this module is a queue that is filled with recently edited Wikipedia pages, combined with a relational database, called the Live Cache, that handles the diff between two consecutive versions of a page. The module that fills the queue, called the Feeder, needs some kind of connection to a wiki instance that reports changes to wiki pages. The processing then takes place in four steps, sketched in code after the list:

  1. A wiki page is taken out of the queue.
  2. Triples are extracted from the page with a given set of extractors.
  3. The new triples from the page are compared to the old triples from the Live Cache.
  4. The triple sets that have been deleted and added are published as text files, and the Cache is updated.
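
As a rough illustration of this flow, here is a compact Scala sketch of the four steps. All names (WikiPage, extract, publishDiff, the in-memory cache) are hypothetical stand-ins and do not correspond to the actual classes of the extraction framework:

object LiveLoopSketch {

  final case class WikiPage(title: String, wikitext: String)
  type Triple = String // simplified: one N-Triples line per triple

  val queue     = new java.util.concurrent.LinkedBlockingQueue[WikiPage]()
  val liveCache = scala.collection.mutable.Map.empty[String, Set[Triple]] // stand-in for the relational Live Cache

  // Stand-in for the real extractor set, which parses the wikitext
  def extract(page: WikiPage): Set[Triple] =
    Set(s"""<${page.title}> <http://www.w3.org/2000/01/rdf-schema#label> "${page.title}" .""")

  // The real module publishes the added/deleted triple sets as text files
  def publishDiff(added: Set[Triple], deleted: Set[Triple]): Unit = {
    added.foreach(t => println(s"+ $t"))
    deleted.foreach(t => println(s"- $t"))
  }

  def processOne(): Unit = {
    val page       = queue.take()                                       // 1. take a page out of the queue
    val newTriples = extract(page)                                      // 2. extract triples
    val oldTriples = liveCache.getOrElse(page.title, Set.empty[Triple]) // 3. compare against the cache
    publishDiff(newTriples -- oldTriples, oldTriples -- newTriples)     // 4. publish the diff ...
    liveCache(page.title) = newTriples                                  //    ... and update the cache
  }
}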

Background

DBpedia Live had been out of service since May 2018, due to the termination of the Wikimedia RCStream service, on which the old DBpedia Live Feeder module relied. This socket-based service provided information about changes on an existing Wikimedia instance and was replaced by the EventStreams service, which runs over a single HTTP connection using chunked transfer encoding and follows the Server-Sent Events (SSE) protocol. It provides a stream of events, each of which contains information about the title, id, language, author and time of every page edit across all Wikimedia instances.

Fix

Starting in September 2018, my first task was to implement a new Feeder for DBpedia Live based on this new Wikimedia EventStreams service. For the Java world, the Akka framework provides an implementation of an SSE client. Akka is a toolkit developed by Lightbend that simplifies the construction of concurrent and distributed JVM applications, with APIs for both Java and Scala. The Akka SSE client and the Akka Streams module are used in the new EventStreamsFeeder (Akka Helper) to extract and process the data stream. I decided to use Scala instead of Java because it is a more natural fit for Akka.

After I was able to process events, I ran into the problem that frequent interruptions in the upstream connection caused the processing stream to fail. Luckily, Akka provides a fallback mechanism with back-off, similar to the binary exponential backoff of the Ethernet protocol, which I could use to restart the stream (called a “Graph” in Akka terminology).
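
A minimal sketch of that restart pattern is shown below, using the Akka 2.5-era RestartSource API. The inner source is a toy stand-in for the actual SSE connection, whose real wiring is more involved:

import akka.actor.ActorSystem
import akka.stream.ActorMaterializer
import akka.stream.scaladsl.{RestartSource, Sink, Source}
import scala.concurrent.duration._

object BackoffSketch extends App {
  implicit val system: ActorSystem = ActorSystem("live-feeder-sketch")
  implicit val materializer: ActorMaterializer = ActorMaterializer()

  // Stand-in for the SSE source; in the Feeder this is the Akka SSE
  // client connected to the Wikimedia EventStreams endpoint.
  def events: Source[String, _] = Source(List("edit-1", "edit-2"))

  // If the inner source fails (e.g. the upstream connection drops) or
  // completes, it is recreated after an exponentially growing delay;
  // randomFactor adds jitter so restarts do not synchronise.
  val resilient = RestartSource.withBackoff(
    minBackoff = 3.seconds,
    maxBackoff = 30.seconds,
    randomFactor = 0.2
  ) { () => events }

  resilient.runWith(Sink.foreach(println)) // the toy source completes, so this sketch restarts it forever
}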

Another problem was that there were often many changes to a page within a short time interval, and if events were processed quickly enough, each change would be processed separately, stressing the Live instance with unnecessary load. A simple “thread sleep” reduced the number of change-sets being published every hour from thousands to a few hundred.
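
The thread sleep does the job; for completeness, the same rate cap can also be expressed declaratively with Akka’s throttle stage. The sketch below shows this alternative, not the code that actually shipped:

import akka.actor.ActorSystem
import akka.stream.ActorMaterializer
import akka.stream.scaladsl.{Sink, Source}
import scala.concurrent.duration._

object ThrottleSketch extends App {
  implicit val system: ActorSystem = ActorSystem("throttle-sketch")
  implicit val materializer: ActorMaterializer = ActorMaterializer()

  Source(1 to 10000)                        // stand-in for the stream of page-edit events
    .throttle(elements = 200, per = 1.hour) // cap the rate at which change-sets are produced
    .runWith(Sink.foreach(n => println(s"processing change-set $n")))
}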

Multi-language abstracts

The next task was to prepare the Live module for the extraction of abstracts (typically the first paragraph of a page, or the text before the table of contents). The extractors used for this task had been re-implemented in 2017. The work turned out to be, first, a configuration issue and, second, a source of long debugging sessions fixing issues in the dependencies between the “live” and “core” modules. Then, in order to allow the extraction of abstracts in multiple languages, the “live” module needed many small changes spread across the code base, and care had to be taken not to slow down extraction in the single-language case compared to the performance before the change. Deployment was delayed by an issue with the remote management unit of the production server, but was accomplished by May 2019.

Summary

I also collected my knowledge of the Live module in detailed documentation addressed to developers who want to contribute to the code. This includes an explanation of the architecture as well as installation instructions. After 400 hours of work, DBpedia Live is alive and kicking, and now supports multi-language abstract extraction. Being responsible for many aspects of software engineering, such as development, documentation and deployment, I was able to learn a lot about DBpedia and the Semantic Web, hone new skills in database development and administration, and expand my programming experience with Scala and Akka.

“Thanks a lot to the whole DBpedia Team who always provided a warm and supportive environment!”

Thank you, Lena! It is people like you who help DBpedia improve and develop further and who help make data networks a reality.

Follow DBpedia on LinkedIn, Twitter or Facebook and stop by the DBpedia Forum to check out the latest discussions.

Yours DBpedia Association
