DBpedia Databus Archives - DBpedia Association
https://www.dbpedia.org/dbpedia-databus/
Global and Unified Access to Knowledge Graphs

Wrap Up: DBpedia Tutorial 2.0 @ Knowledge Graph Conference 2022
https://www.dbpedia.org/blog/wrap-up-dbpedia-tutorial-2-0-knowledge-graph-conference-2022/
Wed, 11 May 2022

On Tuesday, the 2nd of May, the DBpedia team organized the second edition of the DBpedia tutorial at the Knowledge Graph Conference (KGC) 2022. This year Johannes Frey made his way to New York and gave the tutorial on site, while Milan Dojchinovski and Jan Forberg joined online. The ultimate goal of the tutorial was to teach the participants all relevant technology around DBpedia: the knowledge graph, the infrastructure and possible use cases. The tutorial was aimed at existing and potential new users of DBpedia, developers who wish to learn how to replicate the DBpedia infrastructure, service providers, data providers, as well as data scientists.

In the following, we give you a brief retrospective of the tutorial. For further details on the presentations, follow the links to the slides.

Session 1: DBpedia in a Nutshell

The tutorial was opened by Milan Dojchinovski (InfAI / DBpedia Association / CTU in Prague) with the DBpedia in a Nutshell session. In a 45-minute session, Milan presented a historical wrap-up of DBpedia, explained how a DBpedia triple is born, and demonstrated the power of SPARQL and the DBpedia KG.

Session 2: DBpedia Tech Stack

After a short break, Jan started the DBpedia Tech Stack session by giving an overview of the DBpedia technology stack. Furthermore, he explained the use of DBpedia for automation and data pipeline creation. This included an explanation of the Databus, possible ways to automate data tasks, and examples such as knowledge extraction and knowledge fusion. After that, he turned to the creation of a simple data flow using the Databus: creating new data, publishing the data on the Databus, aggregating it, and using it in a SPARQL service via Docker.

Session 3: Deployment on corporate infrastructure

In the third session Johannes started by presenting technical details of the Databus, such as identifiers, DataIDs and Mods. He also addressed popular datasets on the DBpedia Databus, where to find DBpedia datasets, how the DBpedia KG partitions are organized, as well as popular data collections. As the tutorial came to an end, he explained how to self-host critical services, including creating a custom copy of the latest-core collection (i.e. a subset of the DBpedia KG) and setting up a corporate Databus instance.

In case you missed the event, our presentation is also available on the DBpedia event page. Further insights, feedback and photos about the event are available on Twitter (#DBpediaTutorial hashtag).

Stay safe and check Twitter or LinkedIn. Furthermore, you can subscribe to our Newsletter for the latest news and information around DBpedia.

Yours DBpedia Association

The post Wrap Up: DBpedia Tutorial 2.0 @ Knowledge Graph Conference 2022 appeared first on DBpedia Association.

Databus Mods – Linked Data-driven Enrichment of Metadata
https://www.dbpedia.org/blog/databus-mods-linked-data-driven-enrichment-of-metadata/
Mon, 09 Aug 2021

DBpedia Databus Feature – Over the last few months, we gave our DBpedia members multiple chances to present their work, tools, and applications. In this way, our members have given exclusive insights on the DBpedia blog. This week we will start the DBpedia Databus Feature, which allows you to get more information about current and future developments around DBpedia and the DBpedia Databus. Have fun while reading!

As a recap, the DBpedia Databus is a digital factory platform that aims to support FAIRness by facilitating a registry of files (on the Web) using DataID metadata. In a broader perspective, the Databus is part of DBpedia's vision of establishing a FAIR Linked Data backbone by building an ecosystem around its stable identifiers as a central component. Currently, this ecosystem consists of the Databus file registry, DBpedia Archivo, and the DBpedia Global ID management.

As part of this vision, this article presents Databus Mods, a flexible metadata enrichment mechanism for files published on the Databus using Linked Data technologies. 

Databus Mods are activities that analyze and assess files published with the Databus DataID and provide additional metadata in the form of fine-grained information containing data summaries, statistics, or descriptive metadata enrichments.

These activities create provenance metadata based on PROV-O to link any generated metadata to the persistent Databus file identifiers, independent of their publisher. The generated metadata is provided via a SPARQL endpoint and an HTTP file server, increasing (meta)data discovery and access.
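Conceptually, such a provenance link can be sketched in Turtle roughly as follows (a sketch only: all IRIs are hypothetical placeholders and the property choice is illustrative; the actual Mods metadata model may use additional or different terms):

```turtle
@prefix prov: <http://www.w3.org/ns/prov#> .
@prefix dct:  <http://purl.org/dc/terms/> .

# Hypothetical Mod activity; the Databus file IRI below is a placeholder.
<#modActivity>
    a prov:Activity ;
    prov:used <https://databus.dbpedia.org/example/group/artifact/2021.06.01/data.ttl.bz2> ;
    prov:generated <#statisticsFile> .

# The generated enrichment, served from the Mod's HTTP file server.
<#statisticsFile>
    a prov:Entity ;
    dct:format "text/turtle" .
```

Because the `prov:used` triple points at the persistent Databus file identifier, any consumer can join Mod results with the DataID metadata, regardless of who published the file.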

Additionally, this work proposes the Databus Mods Architecture, which uses a master-worker approach to automate Databus file metadata enrichments. The Mod Master service monitors the Databus SPARQL endpoint for updates, distributes scheduled activities to Mod Workers, collects the generated metadata, and stores it uniformly. Mod Workers implement the metadata model and provide an HTTP interface through which the Mod Master invokes a Mod Activity for a specific Databus file. The Mod Master can handle multiple Mod Workers of the same type concurrently, allowing the system's throughput to scale.

The Databus Mods Architecture implementation is provided in a publicly accessible GitHub repository, allowing other users to deploy their Mods by reusing existing components. Further, the repository contains a Maven library that can be used to create your own Mod Workers in JVM languages or to validate an implementation of the so-called Mod API, which is necessary for the Mod Master to control a Mod Worker.

Currently, the DBpedia Databus provides five initial Databus Mod Workers. The following paragraphs showcase two essential Mods: the first is applicable to all Databus files, the second is specific to RDF files.

MIME-Type Mod. This essential Mod provides metadata for other applications or Mods about the specific MIME type of Databus files. The MIME-Type Mod analyzes every file on the Databus, sniffs its content using Apache Tika, and generates metadata that assigns the detected IANA media types to Databus file identifiers using the Mods metadata model.
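The kind of statement such a Mod could emit might look as follows in Turtle (the file IRI is a hypothetical placeholder and the property choice is illustrative, not the authoritative Mods vocabulary):

```turtle
@prefix dct: <http://purl.org/dc/terms/> .

# IANA media type detected for a (placeholder) Databus file by content sniffing
<https://databus.dbpedia.org/example/group/artifact/2021.06.01/data.ttl.bz2>
    dct:format <https://www.iana.org/assignments/media-types/text/turtle> .
```

Other Mods or applications can then query these statements to pick, for instance, only Turtle files out of a large dataset.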

VoID Mod. The Vocabulary of Interlinked Datasets (VoID) is a popular metadata vocabulary for describing the content of Linked Datasets. The VoID Mod generates statistics for RDF files based on the VoID vocabulary. A major use case of the VoID Mod is searching for relevant RDF datasets using the VoID Mod metadata. By writing federated queries, it is possible to filter files on the Databus that contain specific properties or classes.

Example: Federated query over the VoID Mod results and the DataID to retrieve Databus files containing RDF statements with dbo:birthDate as property or dbo:Person as type. The results are filtered by dct:version and dataid:account.
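A federated query in this spirit might look roughly as follows (a sketch only: the Mod endpoint URL, the account IRI, the version string and the exact graph layout are assumptions, not the authoritative schema):

```sparql
PREFIX void:   <http://rdfs.org/ns/void#>
PREFIX dct:    <http://purl.org/dc/terms/>
PREFIX dataid: <http://dataid.dbpedia.org/ns/core#>
PREFIX dcat:   <http://www.w3.org/ns/dcat#>
PREFIX dbo:    <http://dbpedia.org/ontology/>

SELECT DISTINCT ?file WHERE {
  # DataID metadata on the Databus SPARQL endpoint
  ?dataset dataid:account <https://databus.dbpedia.org/exampleuser> ;
           dct:version "2021.06.01" ;
           dcat:distribution ?dist .
  ?dist dataid:file ?file .

  # Federate out to the (hypothetical) VoID Mod endpoint
  SERVICE <https://mods.example.org/sparql> {
    { ?file void:propertyPartition/void:property dbo:birthDate }
    UNION
    { ?file void:classPartition/void:class dbo:Person }
  }
}
```

The `SERVICE` clause joins the Databus file identifiers with the VoID statistics, so only files whose content actually mentions `dbo:birthDate` or `dbo:Person` are returned.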

Databus Mods were created as part of my master’s thesis, which I submitted in spring 2021.

Stay safe and check Twitter or LinkedIn. Furthermore, you can subscribe to our Newsletter for the latest news and information around DBpedia.

Marvin Hofer

on behalf of the DBpedia Association

The post Databus Mods – Linked Data-driven Enrichment of Metadata appeared first on DBpedia Association.

GSoC2021 – Call for Students
https://www.dbpedia.org/blog/gsoc2021/
Wed, 17 Mar 2021

Pinky: Gee, Brain, what are we gonna do this year?
Brain: Wear a mask, keep our distance, and do the same thing we do every year, Pinky. Taking over GSoC2021.

For the 10th year in a row, we have been accepted to be part of this incredible program to support young, ambitious developers who want to work with open-source organizations like DBpedia.

So far, each year has brought us new project ideas, many amazing students and great project results that shaped the future of DBpedia. Even though Covid-19 changed a lot in the world, it couldn't shake Google Summer of Code (GSoC) much. The program, designed to mentor young developers from afar, is almost too perfect for us. One of the advantages of GSoC, especially in times like these, is the chance to work on projects remotely while still getting a first deep dive into open-source projects like ours.

DBpedia is now looking for students who want to work with us during the upcoming summer months.  

What is Google Summer of Code?

Google Summer of Code is a global program focused on bringing student developers into open source software development. Funds will be given to students (BSc, MSc, PhD.) to work for two and a half months on a specific task. For GSoC-Newbies, this short video and the information provided on their website will explain all there is to know about GSoC2021.

And this is how it works …

Step 1: Check out one of our projects here or draft your own.
Step 2: Get in touch with our mentors as soon as possible and write up a project proposal of at least 8 pages. Information about our proposal structure and a template are available here.
Step 3: After a selection phase, students are matched with a specific project and mentor(s) and start working on the project.

Application Procedure

Further information on the application procedure is available in our DBpedia Guidelines. There you will find information on how to contact us and how to appropriately apply for GSoC2021. Please also note the official GSoC 2021 timeline for your proposal submission and make sure to submit on time. Unfortunately, extensions cannot be granted. The final submission deadline is April 13, 2021 at 8 pm CEST.

Contact

Detailed information on how to apply is available on the DBpedia website. We've prepared an information kit for you. Please find all necessary information regarding the student application procedure here.

And in case you still have questions, please do not hesitate to contact us via dbpedia@infai.org.

Stay safe and check Twitter or LinkedIn. Furthermore, you can subscribe to our Newsletter for the latest news and information around DBpedia.

Finally, we are looking forward to your contribution!

Yours DBpedia Association

The post GSoC2021 – Call for Students appeared first on DBpedia Association.

A year with DBpedia – Retrospective Part 2/2020
https://www.dbpedia.org/blog/a-year-with-dbpedia-retrospective-part-2-2020/
Wed, 06 Jan 2021

This is the final part of our journey through 2020. In the previous blog post we already presented DBpedia highlights, events and tutorials. Now we want to take a deeper look at the second half of 2020 and give an outlook for 2021.

DBpedia Autumn Hackathon and the KGiA Conference

From September 21st to October 1st, 2020 we organized the first Autumn Hackathon. We invited all community members to join and contribute to this new format. Participants had the chance to experience the latest technology provided by the DBpedia Association members. We hosted special member tracks, a Dutch National Knowledge Graph Track and a track to improve DBpedia. Results were presented at the final hackathon event on October 5, 2020. We uploaded all contributions to our YouTube channel. Many thanks for all your contributions and invested time!

The Knowledge Graphs in Action event

Chairs opening the KGiA event on October 6, 2020

The SEMANTiCS onsite conference 2020 had to be postponed until September 2021. To bridge the gap, we took the opportunity to organize the Knowledge Graphs in Action online track as a SEMANTiCS satellite event on October 6, 2020. This new online conference combined two existing events: the DBpedia Community Meeting, which is regularly held as part of SEMANTiCS, and the annual Spatial Linked Data conference organised by EuroSDR and the Platform Linked Data Netherlands. As a bonus, we added a track about geo-information integration organized by EuroSDR. In special joint sessions we presented four keynote speakers. More than 130 knowledge graph enthusiasts joined the KGiA event, and it was a great success for the organizing team. Did you miss the event? No problem! We uploaded all recorded sessions to the DBpedia YouTube channel.

KnowConn Conference 2020

Our CEO, Sebastian Hellmann, gave the talk 'DBpedia Databus – A platform to evolve knowledge and AI from versioned web files' on December 2, 2020 at the KnowledgeConnexions Online Conference. It was a great success and we received a lot of positive and constructive feedback on the DBpedia Databus. If you missed his talk and are looking for Sebastian's slides, please check here: http://tinyurl.com/connexions-2020

DBpedia Archivo – Call to improve the web of ontologies

DBpedia Archivo – search bar to inspect an archived ontology

On December 7, 2020 we introduced DBpedia Archivo – an augmented ontology archive and interface to implement FAIRer ontologies. Each ontology is rated with up to 4 stars measuring basic FAIR features. We would like to call on all ontology maintainers and consumers to help us increase the average star rating of the web of ontologies by fixing and improving its ontologies. You can easily check an ontology at https://archivo.dbpedia.org/info. Further information on how to help us is available in a detailed post on our blog.

Member features on the blog

At the beginning of November 2020 we started the member feature on the blog. We gave DBpedia members the chance to present special products, tools and applications. We published several posts in which DBpedia members, like Ontotext, GNOSS, the Semantic Web Company, TerminusDB or FinScience, shared unique insights with the community. At the beginning of 2021 we will continue with interesting posts and presentations. Stay tuned!

We do hope we will meet you and some new faces during our events next year. The DBpedia Association wants to get to know you because DBpedia is a community effort and would not continue to develop, improve and grow without you. We plan to have meetings in 2021 at the Knowledge Graph Conference, the LDK conference in Zaragoza, Spain and the SEMANTiCS conference in Amsterdam, Netherlands.

Happy New Year to all of you! Stay safe and check Twitter, LinkedIn and our Website or subscribe to our Newsletter for the latest news and information.

Yours,

DBpedia Association

The post A year with DBpedia – Retrospective Part 2/2020 appeared first on DBpedia Association.

2020 – Oh What a Challenging Year
https://www.dbpedia.org/blog/2020-oh-what-a-challenging-year/
Mon, 21 Dec 2020

Can you believe it? Thirteen years ago the first DBpedia dataset was released. Thirteen years of development, improvements and growth. Now more than 2,600 GB of data are available on the DBpedia Databus. We want to take this as an opportunity to send out a big thank you to all contributors, developers, coders, hosters, funders, believers and DBpedia enthusiasts who made that possible. Thank you for your support!

In the upcoming blog series, we would like to take you on a retrospective tour through 2020, giving you insights into a year with DBpedia. We will highlight our past events and the development around the DBpedia dataset.

A year with DBpedia and the DBpedia dataset – Retrospective Part 1

DBpedia Workshop colocated with LDAC2020

On June 19, 2020 we organized a DBpedia workshop co-located with the LDAC workshop series to exchange knowledge regarding new technologies and innovations in the fields of Linked Data and the Semantic Web. Dimitris Kontokostas (diffbot, US) opened the meeting with his delightful keynote presentation '{RDF} Data quality assessment – connecting the pieces'. His presentation focused on defining data quality and identifying data quality issues. Following Dimitris' keynote, many community-based presentations were held, making for an exciting workshop day.

Most Influential Scholars

DBpedia has become a high-impact, high-visibility project because of our foundation in excellent Knowledge Engineering as the pivot point between scientific methods, innovation and industrial-grade output. The drivers behind DBpedia are 6 out of the TOP 10 Most Influential Scholars in Knowledge Engineering and the C-level executives of our members. Check all details here: https://www.aminer.cn/ai2000/country/Germany 

DBpedia (dataset) and Google Summer of Code 2020

For the 9th year in a row, we were part of this incredible journey of young, ambitious developers who joined us as an open-source organization to work on a GSoC coding project all summer. With 45 project proposals, this GSoC edition marked a new record for DBpedia. Even though Covid-19 changed a lot in the world, it couldn't shake GSoC. If you want deeper insights into our GSoC students' work, you can find their blogs and repos here: https://blog.dbpedia.org/2020/10/12/gsoc2020-recap/

DBpedia Tutorial Series 2020

Stack slide from the tutorial

During this year we organized three amazing tutorials in which more than 120 DBpedians took part. Over the last year, the DBpedia core team has consolidated a great amount of technology around DBpedia. These tutorials are targeted at developers (in particular of DBpedia Chapters) who wish to learn how to replicate local infrastructure, such as loading and hosting their own SPARQL endpoint. A core focus was the new DBpedia Stack, which contains several dockerized applications that automatically load data from the DBpedia Databus. We will continue organizing more tutorials in 2021. Looking forward to meeting you online! In case you missed the DBpedia Tutorial series 2020, watch all videos here.

In our upcoming blog post after the holidays we will give you more insights into past events and technical achievements. We are now looking forward to the year 2021. The DBpedia team plans to have meetings at the Knowledge Graph Conference, the LDK conference in Zaragoza, Spain and the SEMANTiCS conference in Amsterdam, Netherlands. We wish you a merry Christmas and a happy New Year. In the meantime, stay tuned and visit our Twitter channel or subscribe to our DBpedia Newsletter.

Yours DBpedia Association

The post 2020 – Oh What a Challenging Year appeared first on DBpedia Association.

DBpedia Workshop at LDAC
https://www.dbpedia.org/blog/dbpedia-workshop-at-ldac/
Thu, 25 Jun 2020

More than 90 DBpedia enthusiasts joined the DBpedia Workshop co-located with LDAC2020.

On June 19, 2020 we organized a DBpedia workshop co-located with the LDAC workshop series to exchange knowledge regarding new technologies and innovations in the fields of Linked Data and Semantic Web. This workshop series provides a focused overview on technical and applied research on the usage of Semantic Web, Linked Data and Web of Data technologies for the architecture and construction domains (design, engineering, construction, operation, etc.). The workshop aims at gathering researchers, industry stakeholders, and standardization bodies of the broader Linked Building Data (LBD) community.

First and foremost, we would like to thank the LDAC committee for hosting our virtual meeting and many thanks to Beyza Yaman, Milan Dojchinovski, Johannes Frey and Kris McGlinn for organizing and chairing the DBpedia workshop. 

In the following, we give you a brief retrospective of the presentations.

Opening & Keynote 

The first virtual DBpedia meeting was opened with the keynote presentation '{RDF} Data quality assessment – connecting the pieces' by Dimitris Kontokostas (diffbot, US). He gave an overview of the latest developments and achievements around data quality. His presentation focused on defining data quality and identifying data quality issues.

Sebastian Hellmann gave a brief overview of DBpedia’s history. Furthermore, he presented the updated DBpedia Organisational architecture, including the vision of the new DBpedia chapters and benefits of the DBpedia membership.

Shortly after, Milan Dojchinovski (InfAI/CTU in Prague) gave a presentation on 'Querying and Integrating (Architecture and Construction) Data with DBpedia'. 'The New DBpedia Release Cycle' was introduced by Marvin Hofer (InfAI). Closing the showcase session, Johannes Frey (InfAI) presented the Databus Archivo and demonstrated the downloading process with the DBpedia Databus.

For further details of the presentations follow the links to the slides.

  • Keynote: {RDF} Data quality assessment – connecting the pieces, by Dimitris Kontokostas, diffbot, US (slides)
  • Overview of DBpedia Organisational Architecture, by Sebastian Hellmann, Julia Holze, Bettina Klimek, Milan Dojchinovski, INFAI / DBpedia Association (slides)
  • Querying and Integrating (Architecture and Construction) Data with DBpedia by Milan Dojchinovski, INFAI/CTU in Prague (slides)
  • The New DBpedia Release Cycle by Marvin Hofer and Milan Dojchinovski, INFAI (slides)
  • Databus Archivo and Downloading with the Databus by Johannes Frey, Fabian Goetz and Milan Dojchinovski, INFAI (slides)

Geospatial Data & DBpedia Session

After the opening session we had the Geospatial Data & DBpedia Session. Milan Dojchinovski (InfAI/CTU in Prague) chaired this session with three very stimulating talks. Hereafter you will find all presentations given during this session:

  • Linked Geospatial Data & Data Quality by Wouter Beek, Triply Ltd. (slides)
  • Contextualizing OSi’s Geospatial Data with DBpedia by Christophe Debruyne, Vrije Universiteit Brussel and ADAPT at Trinity College Dublin
  • Linked Spatial Data: Beyond The Linked Open Data Cloud by Chaidir A. Adlan, The Deutsche Gesellschaft für Internationale Zusammenarbeit GmbH (slides)

Data Quality & DBpedia Session

The first online DBpedia workshop also covered a special data quality session. Johannes Frey (InfAI) chaired this session with three very stimulating talks. Hereafter you will find all presentations given during this session:

  • SeMantic AnsweR Type prediction with DBpedia – ISWC 2020 Challenge by Nandana Mihindukulasooriya, MIT-IBM Watson AI Lab (slides)
  • RDF Doctor: A Holistic Approach for Syntax Error Detection and Correction of RDF Data by Ahmad Hemid, Fraunhofer IAIS (slides)
  • The Best of Both Worlds: Unlocking the Power of (big) Knowledge Graphs with SANSA by Gezim Sejdiu,  Deutsche Post DHL Group and University of Bonn (slides)
  • Closing words by the workshop organizers

In case you missed the event, all slides and presentations are also available on the DBpedia workshop website. Further insights, feedback and photos about the event are available on Twitter (#DBpediaDay hashtag).

We are now looking forward to our first DBpedia Stack tutorial, which will be held online on July 1st, 2020. Over the last year, the DBpedia core team has consolidated a great amount of technology around DBpedia. The tutorial primarily targets developers (in particular of DBpedia Chapters) who wish to learn how to replicate local infrastructure, such as loading and hosting their own SPARQL endpoint. A core focus will also be the new DBpedia Stack, which contains several dockerized applications that automatically load data from the Databus. Attending the DBpedia Stack tutorial is free of charge. Please register to be part of the meeting.

Stay tuned and check Twitter, Facebook and our Website or subscribe to our Newsletter for the latest news and information.

Julia and Milan 

on behalf of the DBpedia Association

The post DBpedia Workshop at LDAC appeared first on DBpedia Association.

GSoC2020 – Call for Contribution
https://www.dbpedia.org/blog/gsoc2020/
Tue, 10 Mar 2020

James: Sherry with the soup, yes… Oh, by the way, the same procedure as last year, Miss Sophie?

Miss Sophie: Same procedure as every year, James.

…and we are proud of it. We are very grateful to be accepted as an open-source organization in this year's Google Summer of Code (GSoC2020) edition again. The upcoming GSoC2020 marks the 16th consecutive year of the program and the 9th year in a row for DBpedia.

What is GSoC2020? 

Google Summer of Code is a global program focused on bringing student developers into open source software development. Funds will be given to students (BSc, MSc, PhD.) to work for three months on a specific task. For GSoC-Newbies, this short video and the information provided on their website will explain all there is to know about GSoC2020.

This year’s Narrative

Last year we tried to increase female participation in the program, and we will continue to do so this year. We explicitly want to encourage female students to apply for our projects. That being said, we have already engaged excellent female mentors to also raise the female percentage in our mentor team.

In the following weeks, we invite all students, female and male alike, who are interested in Semantic Web and Open Source development to apply for our projects. You can also contribute your own ideas to work on during the summer. 

And this is how it works: 4 steps to GSoC2020 stardom

  1. Open source organizations such as DBpedia announce their project ideas. You can find our projects here.
  2. Students contact the mentor organizations they want to work with and write up a project proposal. Please get in touch with us via the DBpedia Forum or dbpedia@infai.org as soon as possible.
  3. The official application period at GSoC starts on March 16th. Please note, you have to submit your final application not through our Forum, but through the GSoC Website.
  4. After a selection phase, students are matched with a specific project and a set of mentors to work on the project during the summer.

To all the smart brains out there, if you are a student who wants to work with us during summer 2020, check our list of project ideas, warm-up tasks or come up with your own idea and get in touch with us.

Application Procedure

Further information on the application procedure is available in our DBpedia Guidelines. There you will find information on how to contact us and how to appropriately apply for GSoC2020. Please also note the official GSoC 2020 timeline for your proposal submission and make sure to submit on time. Unfortunately, extensions cannot be granted. The final submission deadline is March 31st, 2020, 8 pm CEST.

Finally, check our website for information on DBpedia, follow us on Twitter or subscribe to our newsletter.

And in case you still have questions, please do not hesitate to contact us via praetor@infai.org.

We are thrilled to meet you and your ideas.

Your DBpedia-GSoC-Team


The post GSoC2020 – Call for Contribution appeared first on DBpedia Association.

New Prototype: Databus Collection Feature
https://www.dbpedia.org/blog/databus-collections-feature/
Thu, 14 Nov 2019

We are thrilled to announce that our Databus Collection Feature for the DBpedia Databus has been developed and is now available as a prototype. It simplifies the way to bundle your data and use it in your application.

A new Databus Collection Feature? How come, and how does it work? Read below and find out how using the DBpedia Databus becomes easier by the day and with each new tool.

Motivation

With more and more data being uploaded to the Databus, we started to develop test applications using that data. The SPARQL endpoint offers a central hub to access all metadata for datasets uploaded to the Databus, provided you know how to write SPARQL queries. The metadata includes the download links of the data files – it was therefore possible to pass a SPARQL query to an application, download the actual data and then use it for whatever purpose the app had.
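For illustration, a query against the Databus SPARQL endpoint for retrieving download links could look roughly like this (a sketch under the assumption that DataID models distributions with dcat:distribution, dataid:file and dcat:downloadURL; the actual graph layout may differ):

```sparql
PREFIX dataid: <http://dataid.dbpedia.org/ns/core#>
PREFIX dcat:   <http://www.w3.org/ns/dcat#>

SELECT ?file ?download WHERE {
  # walk from dataset metadata to the concrete file download links
  ?dataset a dataid:Dataset ;
           dcat:distribution ?dist .
  ?dist dataid:file ?file ;
        dcat:downloadURL ?download .
}
LIMIT 10
```

An application could send this query to the endpoint, collect the download URLs from the result set and fetch exactly the files it needs.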

The Databus Collection Editor

The DBpedia Databus now provides an editor for collections. A collection is basically a labelled SPARQL query that is retrievable via URI. Hence, with the collection editor you can group Databus groups and artifacts into a bundle and publish your selection using your Databus account. It is now a breeze to select the data you need, share the exact selection with others and/or use it in existing or self-made applications.

If you are not familiar with SPARQL and data queries, you can think of the feature as a shopping cart for data: You create a new cart, put data in it and tell your friends or applications where to find it. Quite neat, right?

In the following section, we will cover the user interface of the collection editor.

The Editor UI

Firstly, you can find the collection editor by going to the DBpedia Databus and following the Collections link at the top or you can get there directly by clicking here.

What you will see is the following:

General Collection Info

Secondly, since you do not have any collections yet, the editor has already created an empty collection named “Unnamed” for you. On the right side, next to the label and description, you will find a pen icon. By clicking the icon or the label itself you can edit its content. The collection is not published yet, so the Collection URI is blank.

Whenever you are not logged in or the collection has not been published yet, the editor will also notify you that your changes are only saved in your local browser cache and NOT remotely on our server. Keep that in mind when clearing your cache. Publishing the collection, however, is easy: simply log into (or create) your Databus account and hit the publish button in the action bar. This will open up a modal where you can pick your unique collection id and hit publish again. That’s it!

The Collection Info section will now show the collection URI. Following the link will take you to the HTML representation of your collection that will be visible to others. Hitting the Edit button in the action bar will bring you back to the editor.

Collection Hierarchy

Let’s have a look at the core piece of the collection editor: the hierarchy view. A collection can be a bundle of different Databus groups and artifacts but is not limited to that. If you know how to write a SPARQL query, you can easily extend your collection with more powerful selections. Therefore, the hierarchy is split into two nodes:

  • Generated Queries: Contains all queries that are generated from your selection in the UI
  • Custom Queries: Contains all custom written SPARQL queries

Both hierarchy nodes have a “+” icon. Clicking this button lets you add generated or custom queries respectively.

Custom Queries

If you hit the “+” icon on the Custom Queries node, a new node called “Custom Query” will appear in the hierarchy. You can remove a custom query by clicking on the trashcan icon in the hierarchy. If you click the node it will take you to a SPARQL input field where you can edit the query.

To make your collection more understandable for others, you can even document the query by adding a label and description.

Writing Your Own Custom Queries

A collection query is a SPARQL query of the form:

SELECT DISTINCT ?file WHERE {
    {
        [SUBQUERY]
    }
    UNION
    {
        [SUBQUERY]
    }
    UNION
    ...
    UNION
    {
        [SUBQUERY]
    }
}

All selections made by generated and custom queries are joined into a single result set with a single column called “file”. It is therefore important that your custom query also binds its results to a variable called “file”.
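As a sketch of what such a subquery could look like, here is a custom query that pins the files of one artifact to a fixed version. The artifact IRI, version string and prefixes are hypothetical, following common Databus metadata conventions, and may need adjusting to your data.

```sparql
PREFIX dataid: <http://dataid.dbpedia.org/ns/core#>
PREFIX dct:    <http://purl.org/dc/terms/>
PREFIX dcat:   <http://www.w3.org/ns/dcat#>

# Hypothetical subquery: select the files of one artifact,
# pinned to a fixed version; the result is bound to ?file
SELECT DISTINCT ?file WHERE {
  ?dataset dataid:artifact <https://databus.dbpedia.org/dbpedia/generic/labels> .
  ?dataset dct:hasVersion "2019.08.30" .
  ?dataset dcat:distribution ?distribution .
  ?distribution dcat:downloadURL ?file .
}
```

Because the editor joins all subqueries with UNION into a single “file” column, any custom query has to project exactly this variable.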

Generated Queries

Clicking the “+” icon on the Generated Queries node will take you to a search field. Make use of the indexed search on the Databus to find and add the groups and artifacts you need. If you want to refine your search, don’t worry: you can do that in the next step!

Once the artifact or group has been added to your collection, the Add to Collection button will turn green. Once you are done you can go back to the Editor with Back to Hierarchy button.

Your hierarchy will now contain several new nodes.

Group Facets, Artifact Facets and Overrides

Groups and artifacts that have been added to the collection will show up as nodes in the hierarchy. Clicking a node will open a filter where you can refine your dataset selection. Setting a filter on a group node will apply it to all artifact nodes unless you override that setting in an artifact node manually. The filter set in the group node is shown in the artifact facets in dark grey. Any overrides in the artifact facets will be highlighted in green:

Group Nodes

A group node will provide a list of filters that will be applied to all artifacts of that group:

Artifact Nodes

Artifact nodes will then actually select data files which will be visible in the faceted view. The facets are generated dynamically from the available variants declared in the metadata.

Example: Here we selected the latest version of the Databus dump as N-Triples. This collection is already in use: the collection URI is passed to the new generic lookup application, which then creates the search function for the Databus website. If you are interested in how to configure the lookup application, have a look at https://github.com/dbpedia/lookup-application. Additionally, there will be another blog post about the lookup within the next few weeks.

Use Cases

The DBpedia Databus Collections are useful in many ways.

  • You can share a specific dataset with your community or colleagues.
  • You can re-use datasets others created.
  • You can plug collections into Databus-ready applications and avoid spending time on the download and setup process.
  • You can point to a specific piece of data (e.g. for testing) with a single URI in your publications.
  • You can help others to create data queries more easily.

We hope you enjoy the Databus Collection Feature and we would love to hear your feedback! You can leave your thoughts and suggestions in the new DBpedia Forum. Feedback of any kind is highly appreciated, since we want to improve the prototype in a fast and user-driven way! Cheers!

A big thanks goes to DBpedia developer Jan Forberg who finalized the Databus Collection Feature and compiled this text.

Yours

DBpedia Association

The post New Prototype: Databus Collection Feature appeared first on DBpedia Association.

]]>
One Billion derived Knowledge Graphs https://www.dbpedia.org/blog/one-billion-derived-knowledge-graphs/ Wed, 02 Oct 2019 13:41:02 +0000 https://blog.dbpedia.org/?p=1258 … by and for Consumers until 2025 One Billion – what a mission! We are proud to announce that the DBpedia Databus website at https://databus.dbpedia.org and the SPARQL API at https://databus.dbpedia.org/(repo/sparql|yasgui) (docu) are in public beta now! The system is usable (eat-your-own-dog-food tested) following a “working software over comprehensive documentation” approach. Due to its many […]

The post One Billion derived Knowledge Graphs appeared first on DBpedia Association.

]]>
… by and for Consumers until 2025

One Billion – what a mission! We are proud to announce that the DBpedia Databus website at https://databus.dbpedia.org and the SPARQL API at https://databus.dbpedia.org/(repo/sparql|yasgui) (docu) are in public beta now!

The system is usable (eat-your-own-dog-food tested) following a “working software over comprehensive documentation” approach. Due to its many components (website, SPARQL endpoints, keycloak, mods, upload client, download client, and data debugging), we estimate approximately six months in beta to fix bugs, implement all features and improve the details.

But, let’s start from the beginning

The DBpedia Databus is a platform that captures the effort invested by data consumers who need better data quality (fitness for use) and channels their improvements back to the data source and to other consumers. The DBpedia Databus enables anybody to build an automated DBpedia-style extraction, mapping and testing workflow for any data they need. The Databus incorporates features from DNS, Git, RSS, online forums and Maven to harness the full work power of data consumers.

Our vision

Professional consumers of data worldwide have already built stable cleaning and refinement chains for all available datasets, but their efforts are invisible and not reusable. Deep, cleaned data silos exist beyond the reach of publishers and other consumers trapped locally in pipelines. Data is not oil that flows out of inflexible pipelines. Databus breaks existing pipelines into individual components that together form a decentralized, but centrally coordinated data network. In this set-up, data can flow back to previous components, the original sources, or end up being consumed by external components.

One Billion interconnected, quality-controlled Knowledge Graphs until 2025

The Databus provides a platform for re-publishing these files with very little effort (leaving file traffic as the only cost factor) while offering the full benefits of built-in system features such as automated publication, structured querying, automatic ingestion, as well as pluggable automated analysis, data testing via continuous integration, and automated application deployment (software with data). The impact is highly synergistic: just a few thousand professional consumers and research projects can expose millions of cleaned datasets, on par with what has long existed in deep silos and pipelines.

To a data consumer network

As we are inverting the paradigm from a publisher-centric view to a data consumer network, we will open the download valve to enable discovery of and access to massive amounts of data that is cleaner than what the original sources published. The main DBpedia Knowledge Graph alone has 600k file downloads per year, complemented by downloads at over 20 chapters, e.g. http://es.dbpedia.org, as well as over 8 million daily hits on the main Virtuoso endpoint.

Community extensions from the alpha phase, such as DBkWik and LinkedHypernyms, are being loaded onto the bus and consolidated. We expect this number to reach over 100 by the end of the year. Companies and organisations who have previously uploaded their backlinks here will be able to migrate to the Databus. Other datasets are being cleaned and posted. In two of our research projects, LOD-GEOSS and PLASS, we will re-publish open datasets, clean them and create collections, which will result in DBpedia-style knowledge graphs for energy systems and supply-chain management.

A new era for decentralized collaboration on data quality

DBpedia was established around producing a queryable knowledge graph derived from Wikipedia content, able to answer questions like “What do Innsbruck and Leipzig have in common?” A community and consumer network quickly formed around this highly useful data, resulting in a large, well-structured, open knowledge graph that seeded the Linked Open Data Cloud, which is the largest knowledge graph on earth. The main lesson learned after these 13 years is that current data “copy” or “download” processes are inefficient by a magnitude that can only be grasped from a global perspective. Consumers spend tremendous effort fixing errors on the client side. If one unparseable line needs 15 minutes to find and fix, 10,000 downloads add up to roughly 104 days of work (10,000 × 15 min = 150,000 min ≈ 2,500 hours). Providers, on the other hand, will never have the resources to fix the last error, as cost increases exponentially (80/20 rule).

One billion knowledge graphs in mind – the progress so far

Discarding faulty data often means that a substitute source has to be found, which costs hours of research and might lead to similar problems. From the dozens of DBpedia Community meetings we have held, we can summarize that for each clean-up procedure, data transformation, linkset or schema mapping that a consumer creates client-side, dozens of consumers have invested the same effort before them, and none of it reaches the source or other consumers facing the same problem. The community meetings showed us just the tip of the iceberg.

As a foundation, we implemented a mappings wiki that allowed consumers to improve data quality centrally. The next advancement was the creation of the SHACL standard by our former CTO and board member Dimitris Kontokostas. SHACL allows consumers to specify repeatable tests on graph structures and datatypes, which is an effective way to systematically assess data quality. We established the DBpedia Databus as a central platform to better capture decentrally created, client-side value by consumers.

It is an open system; therefore, the value that is captured flows right back to everybody.

The full document “DBpedia’s Databus and strategic initiative to facilitate ‘One Billion derived Knowledge Graphs by and for Consumers’ until 2025” is available here.

If you have any feedback or questions, please use the DBpedia Forum, the “report issues” button, or dbpedia@infai.org.

Yours,

DBpedia Association

The post One Billion derived Knowledge Graphs appeared first on DBpedia Association.

]]>
More than 50 DBpedia enthusiasts joined the Community Meeting in Karlsruhe. https://www.dbpedia.org/blog/community-meeting-in-karlsruhe/ Thu, 19 Sep 2019 13:07:07 +0000 https://blog.dbpedia.org/?p=1229 SEMANTiCS is THE leading European conference in the field of semantic technologies and the platform for professionals who make semantic computing work, and understand its benefits and know its limitations. Following, we will give you a brief retrospective about the presentations. Opening Session Katja Hose – “Querying the web of data” ….on the search for […]

The post More than 50 DBpedia enthusiasts joined the Community Meeting in Karlsruhe. appeared first on DBpedia Association.

]]>
SEMANTiCS is THE leading European conference in the field of semantic technologies and the platform for professionals who make semantic computing work, understand its benefits and know its limitations.

Following, we will give you a brief retrospective about the presentations.

Opening Session

Katja Hose – “Querying the web of data”

….on the search for the killer App.

The concept of Linked Open Data and the promise of the Web of Data have been around for over a decade now. Yet, the great potential of free access to a broad range of data that these technologies offer has not yet been fully exploited. This talk will therefore review the current state of the art, highlight the main challenges from a query-processing perspective, and sketch potential ways to solve them. Slides are available here.

Dan Weitzner – “timbr-DBpedia – Exploration and Query of DBpedia in SQL”

The timbr SQL Semantic Knowledge Platform enables the creation of virtual knowledge graphs in SQL. The DBpedia version of timbr supports querying DBpedia in SQL and the seamless integration of DBpedia data into data warehouses and data lakes. We already published a detailed blog post about timbr where you can find all relevant information about this amazing new DBpedia service.

Showcase Session

Maribel Acosta“A closer look at the changing dynamics of DBpedia mappings”

Her presentation looked at the mappings wiki and how different language chapters use and edit it. Slides are available here.

Mariano Rico“Polishing a diamond: techniques and results to enhance the quality of DBpedia data”

DBpedia is more than a source for creating papers. It is also used by companies as a remarkable data source. This talk focused on how we can detect errors and how to improve the data, from the perspective of both academic researchers and private companies. We show the case for the Spanish DBpedia (the second-largest DBpedia after the English chapter) through a set of techniques, paying attention to results and further work. Slides are available here.

Guillermo Vega-Gorgojo – “Clover Quiz: exploiting DBpedia to create a mobile trivia game”

Clover Quiz is a turn-based multiplayer trivia game for Android devices with more than 200K multiple-choice questions (in English and Spanish) about different domains generated out of DBpedia. Questions are created off-line through a data extraction pipeline and a versatile template-based mechanism. A back-end server manages the question set and the associated images, while a mobile app has been developed and released in Google Play. The game is available free of charge and has been downloaded by more than 10K users, who have answered more than 1M questions. Clover Quiz therefore demonstrates the advantages of semantic technologies for collecting data and automating the generation of multiple-choice questions in a scalable way. Slides are available here.

Fabian Hoppe and Tabea Tiez – “The Return of German DBpedia”

Fabian and Tabea will present the latest news on the German DBpedia chapter as it returns to the language chapter family after an extended offline period. They will talk about the data set, discuss a few challenges along the way and give insights into future perspectives of the German chapter. Slides are available here.

Wlodzimierz Lewoniewski and Krzysztof Węcel  – “References extraction from Wikipedia infoboxes”

In Wikipedia’s infoboxes, some facts have references, which can be useful for checking the reliability of the provided data. We present challenges and methods connected with the metadata extraction of Wikipedia’s sources. We used the DBpedia Extraction Framework along with our own extensions in Python to provide statistics about citations in 10 language versions. The provided methods can be used to verify and synchronize facts depending on the quality assessment of sources. Slides are available here.

He gave insight into the process of extracting references from Wikipedia infoboxes, which we will use in our GFS project.

Afternoon Session

Sebastian Hellmann, Johannes Frey, Marvin Hofer – “The DBpedia Databus – How to build a DBpedia for each of your Use Cases”

The DBpedia Databus is a platform intended for data consumers. It will enable users to build an automated DBpedia-style knowledge graph for any data they need. The big benefit is that users not only have access to data but are also encouraged to contribute improvements, which enhances the data source and benefits other consumers. We want to use this session to officially introduce the Databus, which is currently in beta, and to demonstrate its power as a central platform that captures decentrally created, client-side value from consumers.

We will give insight on how the new monthly DBpedia releases are built and validated to copy and adapt for your use cases. Slides are available here.

Interactive session, moderator: Sebastian Hellmann – “DBpedia Connect & DBpedia Commerce – Discussing the new Strategy of DBpedia”

In order to keep growing and improving, DBpedia has been undergoing a growth hack for the last couple of months. As part of this process, we developed two new subdivisions of DBpedia: DBpedia Connect and DBpedia Commerce. The former is a low-code platform to interconnect your public or private databus data with the unified, global DBpedia graph and export the interconnected and enriched knowledge graph into your infrastructure. DBpedia Commerce is an access and payment platform to transform Linked Data into a networked data economy. It will allow DBpedia to offer any data, mod, application or service on the market. During this session, we will provide more insight into these as well as an overview of how DBpedia users can best utilize them. Slides are available here.

In case you missed the event, all slides and presentations are also available on our Website. Further insights, feedback and photos about the event are available on Twitter via #DBpediaDay

We are now looking forward to more DBpedia meetings next year. So, stay tuned and check Twitter, Facebook and the Website or subscribe to our Newsletter for the latest news and information.

If you want to organize a DBpedia Community meeting yourself, just get in touch with us via dbpedia@infai.org regarding program and organization.

Yours

DBpedia Association

The post More than 50 DBpedia enthusiasts joined the Community Meeting in Karlsruhe. appeared first on DBpedia Association.

]]>