Wrap Up: DBpedia Tutorial 2.0 @ Knowledge Graph Conference 2022
https://www.dbpedia.org/blog/wrap-up-dbpedia-tutorial-2-0-knowledge-graph-conference-2022/ (11 May 2022)

On Tuesday the 2nd of May, the DBpedia team organized the second edition of the DBpedia tutorial at the Knowledge Graph Conference (KGC) 2022. This year Johannes Frey made his way to New York and gave the tutorial on site. Milan Dojchinovski and Jan Forberg joined online. The ultimate goal of the tutorial was to teach the participants all relevant technology around DBpedia, the knowledge graph, the infrastructure and possible use cases. The tutorial aimed at existing and potential new users of DBpedia, developers that wish to learn how to replicate DBpedia infrastructure, service providers, data providers as well as data scientists.

In the following, we will give you a brief retrospective of the tutorial. For further details of the presentations, follow the links to the slides.

Session 1: DBpedia in a Nutshell

The tutorial was opened by Milan Dojchinovski (InfAI / DBpedia Association / CTU in Prague) with the DBpedia in a Nutshell session. In a 45-minute session, Milan presented a historical wrap-up of DBpedia, explained how a DBpedia triple is born and demonstrated the power of SPARQL and the DBpedia KG.
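To give a flavour of that demo, here is a minimal sketch of querying the public DBpedia SPARQL endpoint with the SPARQLWrapper library. The query is our own illustrative example, not one taken from the tutorial itself.

```python
from SPARQLWrapper import SPARQLWrapper, JSON

# Query the public DBpedia endpoint for people born in Berlin.
sparql = SPARQLWrapper("https://dbpedia.org/sparql")
sparql.setReturnFormat(JSON)
sparql.setQuery("""
    PREFIX dbo:  <http://dbpedia.org/ontology/>
    PREFIX dbr:  <http://dbpedia.org/resource/>
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    SELECT ?person ?name WHERE {
        ?person dbo:birthPlace dbr:Berlin ;
                rdfs:label ?name .
        FILTER (lang(?name) = "en")
    }
    LIMIT 10
""")

for row in sparql.query().convert()["results"]["bindings"]:
    print(row["person"]["value"], "-", row["name"]["value"])
```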

Session 2: DBpedia Tech Stack

After a short break, Jan started the DBpedia Tech Stack session by giving an overview of the DBpedia technology stack. Furthermore, he explained the use of DBpedia for automation and data pipeline creation. This included an explanation of the Databus, possible ways to automate data tasks, and examples such as knowledge extraction and knowledge fusion. After that, he moved on to the creation of a simple data flow using the Databus: creating new data, publishing it on the Databus, and aggregating and using it in a SPARQL service via Docker.
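As a rough sketch of that aggregation step, the following assumes the Databus SPARQL endpoint at https://databus.dbpedia.org/sparql and its DataID/DCAT vocabulary; the artifact IRI below is a hypothetical example, so treat this as illustrative rather than the exact tutorial pipeline.

```python
import requests
from SPARQLWrapper import SPARQLWrapper, JSON

# Assumption: the Databus describes its files with dcat:downloadURL and
# groups them into artifacts via the DataID vocabulary.
sparql = SPARQLWrapper("https://databus.dbpedia.org/sparql")
sparql.setReturnFormat(JSON)
sparql.setQuery("""
    PREFIX dcat:   <http://www.w3.org/ns/dcat#>
    PREFIX dataid: <http://dataid.dbpedia.org/ns/core#>
    SELECT ?file WHERE {
        # Hypothetical artifact IRI; substitute a real one from a collection.
        ?dataset dataid:artifact <https://databus.dbpedia.org/dbpedia/generic/labels> ;
                 dcat:distribution ?dist .
        ?dist dcat:downloadURL ?file .
    }
    LIMIT 3
""")

# Download the selected files so a local SPARQL store (e.g. a Virtuoso
# Docker container) can load them.
for row in sparql.query().convert()["results"]["bindings"]:
    url = row["file"]["value"]
    print("downloading", url)
    with open(url.rsplit("/", 1)[-1], "wb") as f:
        f.write(requests.get(url).content)
```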

Session 3: Deployment on corporate infrastructure

In the third session Johannes started by presenting technical details of the Databus such as identifiers, DataIDs and Mods. He also addressed popular datasets on the DBpedia Databus, where to find DBpedia datasets, how the DBpedia KG partitions are organized, as well as popular data collections. As the tutorial came to an end, he explained how to self-host critical services, including the creation of a custom copy of the latest-core collection (i.e. a subset of the DBpedia KG), and how to set up a corporate Databus instance.

In case you missed the event, our presentation is also available on the DBpedia event page. Further insights, feedback and photos about the event are available on Twitter (#DBpediaTutorial hashtag).

Stay safe and check Twitter or LinkedIn. Furthermore, you can subscribe to our Newsletter for the latest news and information around DBpedia.

Yours DBpedia Association

From voice to value using AI
https://www.dbpedia.org/blog/from-voice-to-value-using-ai/ (19 Apr 2022)

DBpedia Member Feature – Over the last year we gave DBpedia members multiple chances to present their work, tools and applications. In this way, our members gave exclusive insights on the DBpedia blog. This time we will continue with Wallscope, who support organisational goals, improve existing processes and embed new technologies by generating the insights that power change. David Eccles presents the opportunities of digital audio. Have fun reading!

by David Eccles, Wallscope

Motivation

The use of digital audio has accelerated throughout the pandemic, creating a cultural shift in the use of audio form content within business and consumer communications.

Alongside this, the education and entertainment industries embraced semantic technologies as a means to develop sustainable delivery platforms under very difficult circumstances.

Wallscope’s research and development activities were already aligned with exploring speech-driven applications, and through this we engaged with Edinburgh University’s Creative Informatics department to explore practical use cases focusing on enhancing the content of podcasts.

Our focus now is on how user experience can be enhanced with knowledge graph interaction, providing contextually relevant information to add value to the overall experience. As DBpedia provides the largest knowledge repository available, Wallscope embedded semantic queries against DBpedia into the resulting workflow.

Speech to Linked Data

Speech-driven applications require a high level of accuracy and are notoriously difficult to develop, as anyone with experience of spoken dialog systems will probably be aware. A range of Natural Language Processing models are available which perform with a high degree of accuracy – particularly for basic tasks such as Named Entity Recognition – to recognise people, places, and organisations (spaCy and PyTorch are good examples of this). Obviously the tasks become more difficult to achieve when inherently complicated concepts are brought into the mix such as cultural references and emotional reactions.
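As a minimal illustration of the basic NER mentioned above, here is a generic spaCy example (our sketch, not Wallscope's actual stack):

```python
import spacy

# Requires the small English model: python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm")

doc = nlp("Wallscope worked with Edinburgh University on a podcast recorded in Paris.")

# Print each recognised entity with its type, e.g. ORG for organisations
# and GPE for geopolitical places.
for ent in doc.ents:
    print(ent.text, "->", ent.label_)
```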


To this end, Wallscope re-deployed and trained a machine learning model called BERT. This stands for Bidirectional Encoder Representations from Transformers, and it is a technique for NLP pre-training originally developed by Google.

BERT uses the mechanism of “paying attention” to better understand the contextual relationships between each word (or sub-words) within a sentence. Having previous experience deploying BERT models within the healthcare industry, we adapted and trained the model on a variety of podcast conversations.


As an example of how this works in practice, consider the phrase “It looked like a painting”. BERT looks at the word “it” and then checks its relationship with every other word in the sentence. This way, BERT can tell that “it” refers strongly to “painting”. This allows BERT to understand the context of each word within a given sentence. 
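The sketch below shows how one can inspect this behaviour with the Hugging Face transformers library and a stock bert-base-uncased model. It is a simplified, generic illustration, not Wallscope's trained model, and attention weights are only a rough proxy for such word relationships.

```python
import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased", output_attentions=True)

inputs = tokenizer("It looked like a painting", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# outputs.attentions holds one tensor per layer with shape
# (batch, heads, tokens, tokens). Average the last layer's heads and read
# the row for "it" (index 1, right after [CLS]) to see how strongly it
# attends to every other token in the sentence.
attention = outputs.attentions[-1].mean(dim=1)[0]
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
for token, weight in zip(tokens, attention[1]):
    print(f"{token:>10}  {weight.item():.3f}")
```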


Simple process diagram

We then looked at how this could be used to better engage users across the podcast listening experience, and provide points of knowledge expansion, engagement and ‘socialisation’ of content in web-based environments. This in turn can create a richer and more meaningful experience for listeners that runs in parallel with podcasting platforms.

Working across multiple files containing podcast format audio, we looked at several areas of improvements for listeners, creators and researchers. Our primary aim was to demonstrate the value of semantic enhancements to the transcriptions.

We worked with these transcriptions across several processes to enhance them with Named Entity Recognition using our existing stack. From there we extended the analysis of ‘topics’ using a blend of machine learning models. That very quickly allowed us to gain a deep understanding of the relationships contained within the spoken word content. By visualising that, we could gain a deeper insight into the content and how it could be better presented, by reconciling it with references within DBpedia.

This analysis led us to ideate around an interface that was built around the timeline presented by the audio content.

Playback of audio with related terms

This allows the listener to gain contextually related insights by dynamically querying DBpedia for entities extracted from the podcast itself. This knowledge extension is valuable to enhance not only the listeners’ experience but also to provide a layer of ‘stickiness’ for the content across the internet as it enhances findability.

This shows how knowledge can be added to a page using DBpedia.

One challenge is the quality of transcriptions. Digital speech recognition never reaches a 100% confidence level on unique audio recordings such as podcasts, or on audio from video production.

We are currently working with services which increasingly harness AI technologies, not only to improve the quality of transcription but also the insights which can be derived from spoken word data sources. A current area of research for Wallscope is how our ML models can be utilised to improve the curation layer of transcripts. This is important as keeping the human in the loop is critical to ensure the fidelity of any transcription process. By deploying the same techniques, albeit in reverse, there is an interesting opportunity to create dynamic ‘sense-checking’ models. While this is at an early stage, DBpedia undoubtedly will be an important part of that.

We are also developing some visualisation techniques to assist curators in identifying ‘errors’ and to provide suggestions for more robust topic classification models. This allows more generalised suggestions for labels. For example, while we may have a specific reference to ‘zombie’, presenting that as a subset of ‘horror’ has more value in categorisation systems. Another example could relate to location. If we identify ‘France’ in a transcription with 100% certainty, then we can create greater certainty around ‘Paris’ being Paris, France as opposed to Paris, Texas. This also applies to machine learning-based summarisation techniques.

Next steps

We are further exploring how these approaches can best assist in the exploration of archives as well as incorporating text analysis to improve the actual curation of archives.

Please contact Ian Allaway or David Eccles for more information, or visit www.wallscope.co.uk

Further reading on ‘Podcasting Exploration’

Data Virtualization: From Graphs to Tables and Back
https://www.dbpedia.org/blog/data-virtualization-from-graphs-to-tables-and-back/ (26 Jan 2022)

Ontotext believes you should be able to connect your data with the knowledge graph regardless of where that data lives on the internet or what format it happens to be in. GraphDB’s data virtualization opens your graph to the wider semantic web and to relational databases.

DBpedia Member Feature – Over the last year we gave DBpedia members multiple chances to present their work, tools and applications. In this way, our members gave exclusive insights on the DBpedia blog. This time we will continue with Ontotext, who helps enterprises identify meaning across diverse datasets and massive amounts of unstructured information. Jarred McGinnis presents the beauty of data virtualization. Have fun reading!

by Jarred McGinnis, Ontotext

The beauty and power of knowledge graphs is their abstraction away from the fiddly implementation details of our data. The data and information is organized in a way humans understand, regardless of the physical location of the data, its format and other low-level technical details. This is because the RDF of the knowledge graph enables a schema-less, or schema-agnostic, approach to facilitate the integration of multiple heterogeneous datasets.

Semantic technology defines how data and information is inter-related. These relationships give context and that context is what gives our data meaning that can be understood by humans AND machines. That’s where the knowledge part of the graph comes from and it is a powerful way of providing a single view on disparate pieces of information.

ETL is Still Your Best Bet

When possible, it’s better to pay the initial costs of the ETL process. In a previous blog post, we talked about how knowledge graphs generously repay that investment of time and effort taken in data preparation tasks. However, there are a number of situations where ETL is impossible or impractical, for example because the dataset is too large, or because the data exists in a critical legacy system where an ETL process would create more problems than it fixed. In these cases, it is better to take a data virtualization approach.

Ontotext GraphDB provides data virtualization functionality to realize the benefits of real-time access to external data, remove or reduce the need for middle-ware for query processing, reduce support and development time, increase governance and reduce data storage requirements.

Firstly, There’s Federation.

RDF is the language of the semantic web. If you are working with Linked Data, it opens up a world of billions upon billions of factual statements about the world, which is probably why you chose to work with linked data in the first place. Nice work! And that means I don’t have to tell you that DBpedia, a single data set among hundreds, has three billion triples alone. You are no longer limited by the data your organization holds. Queries about internal data can be seamlessly integrated with multiple external data sources.

For example, suppose you want to query well-known people and their birth places for a map application. It’s possible to create a single query that gets the person’s information from DBpedia, which would give you the birthplace and take those results to query another data source like Geonames to provide the geographic coordinates to be able to add them to a mapping application. Since both of these data sources are linked data, it’s relatively straightforward to write a SPARQL query that retrieves the information.
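To make the mechanics concrete, here is a hedged sketch of such a federated query using SPARQL's SERVICE keyword. Since public endpoint availability varies, this version federates DBpedia with the public Wikidata endpoint (instead of Geonames) via DBpedia's owl:sameAs links to fetch coordinates; treat it as illustrative only.

```python
from SPARQLWrapper import SPARQLWrapper, JSON

# Run against DBpedia; the SERVICE block is evaluated remotely at Wikidata.
sparql = SPARQLWrapper("https://dbpedia.org/sparql")
sparql.setReturnFormat(JSON)
sparql.setQuery("""
    PREFIX dbo: <http://dbpedia.org/ontology/>
    PREFIX dbr: <http://dbpedia.org/resource/>
    PREFIX owl: <http://www.w3.org/2002/07/owl#>
    PREFIX wdt: <http://www.wikidata.org/prop/direct/>
    SELECT ?person ?coord WHERE {
        ?person dbo:birthPlace dbr:Edinburgh ;
                owl:sameAs ?wd .
        FILTER (STRSTARTS(STR(?wd), "http://www.wikidata.org/entity/"))
        SERVICE <https://query.wikidata.org/sparql> {
            ?wd ?p ?birthplace .
            ?birthplace wdt:P625 ?coord .   # geographic coordinates
            VALUES ?p { wdt:P19 }           # place of birth
        }
    }
    LIMIT 5
""")

for row in sparql.query().convert()["results"]["bindings"]:
    print(row["person"]["value"], row["coord"]["value"])
```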

It doesn’t even have to be another instance of GraphDB. It’s part of the reason Ontotext insists on using open standards. With any equally W3C-compliant knowledge graph that supports a SPARQL endpoint, it is possible to retrieve the information you want and add it to your own knowledge graph to do with as you please. A single query could pull information from multiple external data sources to get the data you are after, which is why federation is an incredibly powerful tool to have.

The Business Intelligence Ecosystem Runs on SQL.

Ontotext is committed to lowering the costs of creating and consuming knowledge graphs. Not every app developer or DBA in an organization is going to have the time to work directly with the RDF data models. GraphDB 9.4 added a JDBC driver to ensure those who need to think and work in SQL can access the power of the knowledge graph with SQL.

Knowing the importance and prominence of SQL for many applications, we have a webinar demonstrating how GraphDB does SQL-to-SPARQL transformation and query optimization and how Microsoft’s Power BI and Tableau can be empowered by knowledge graphs. GraphDB provides a SQL interface to ensure those who prefer a SQL view of the world can have it.

Virtualization vs ETL

The most recent GraphDB release has added virtualization functionality beyond simple federation. It is now possible to create a virtual graph by mapping the columns and rows of a table to entities in the graph. It becomes possible to retrieve information from external relational databases and have it play nice with our knowledge graph. We aren’t bound by data that exists in our graph or even in RDF format. Of course it would be easier and certainly quicker to ETL a data source into a single graph and perform the query, but it is not always possible, because the dataset is too large, it gets updated too frequently, or both.

For example, in the basement of your organization is a diesel-powered database that holds a geological strata of decades-old data and that is critical to the organization. You and I both know that database is never going to be ETLed into the graph. Virtualization is your best bet: create a virtual graph and map that decades-old format for client orders by saying, “When I query about ‘client order’, you need to go to this table and this column in that behemoth server belching black smoke that’s run by Quasimodo and return the results”.

There will be an inevitable hit to query performance, but there are a number of situations, such as the over-egged example above, where slow is better than not at all. It is important to understand the trade-offs and practicalities between ETL and virtualization. The important thing for Ontotext is to make sure GraphDB is capable of both and provides a combined approach to maximize flexibility. There is also a webinar on this topic, introducing the open-source Ontop.

GraphDB Gives You Data Agility

Data virtualization and federation come with costs as well as benefits. There is no way we are ever going to control where the data you need exists and in what format. The days of centralized control are over. It’s about finding the technology that gives you agility, and GraphDB’s added virtualization capabilities enable you to create queries that include external open sources and merge their results seamlessly with your own knowledge graph. Virtualization of relational databases creates incredible opportunities for applications to provide users with a single coherent view on the complex and diverse reality of your data ecosystem.

Jarred McGinnis

Jarred McGinnis is a managing consultant in Semantic Technologies. Previously he was the Head of Research, Semantic Technologies, at the Press Association, investigating the role of technologies such as natural language processing and Linked Data in the news industry. Dr. McGinnis received his PhD in Informatics from the University of Edinburgh in 2006.

A year with DBpedia – Retrospective Part 2/2021
https://www.dbpedia.org/blog/a-year-with-dbpedia-retrospective-part-2/ (6 Jan 2022)

This is the final part of our journey through 2021. In the previous blog post we already presented DBpedia highlights, events and tutorials. Now we want to take a look at the second half of 2021 and give an outlook for 2022.

LSWT 2021

We kicked off the summer with an online tutorial at the Leipzig Semantic Web Day (LSWT). For the first time ever the LSWT team extended the program and organized a second conference day for DBpedia enthusiasts. Many thanks to the hosts and organizing team! It was a pleasure to be part of the LSWT again. If you were unable to take part in the tutorial, please check our slides here or watch the video on the DBpedia YouTube channel.

DBpedia Snapshot 2021-06 Release

On the 23rd of July 2021 we announced the DBpedia Snapshot 2021-06 Release. Historically, this release has been associated with many names: “DBpedia Core”, “EN DBpedia”, and — most confusingly — just “DBpedia”. In fact, it is a combination of the EN Wikipedia data, 62 million community-contributed cross-reference links as well as community extensions such as additional ontologies and taxonomies. Read the announcement on the blog.

Tutorial at the LDK Conference and DBpedia Day at the SEMANTiCS Conference

At the beginning of September 2021 we jumped on a plane and gave a tutorial at the Language, Data and Knowledge (LDK) conference in Zaragoza, Spain. Building upon the success of the previous events held in Galway, Ireland in 2017, and in Leipzig, Germany in 2019, the conference brought together researchers from across disciplines concerned with the acquisition, curation and use of language data in the context of data science and knowledge-based applications. This tutorial was a great success; if you would like to catch up, please check our slides at https://tinyurl.com/TutAtLDK. A few days later we travelled to Amsterdam, The Netherlands, to join this year’s SEMANTiCS Conference.


The DBpedia Day was part of the conference and was held on its last day, the 9th of September, at the Theater de Meervaart. Our CEO, Sebastian Hellmann, opened the DBpedia Day with an update about the DBpedia Databus and our members. He presented the huge and diverse network DBpedia has built up in the last 13 years. Afterwards, Maria-Esther Vidal, TIB, completed the opening session with her keynote “Enhancing Linked Data Trustability and Transparency through Knowledge-driven Data Ecosystems”. Furthermore, we organized a member presentation session, an ontology session and an NLP session, where experts presented NLP and DBpedia-related topics. In case you missed the event, all slides are also available on our event page. Further insights, feedback and photos about the event are available on Twitter via #DBpediaDay.

Member Features on the Blog

At the beginning of November 2020 we started the member feature on our blog. In 2021 we continued and published further interesting posts and news about our members. We gave our members the chance to present special products, tools and applications. We published several posts in which members, i.e. Triply, WordLift, Wallscope, eccenca, Diffbot, and the Network Institute (NI) of VU Amsterdam, shared unique insights with the community. Next year we will continue with interesting posts and presentations. Stay tuned!

DBpedia Snapshot 2021-09 Release

On October 22, 2021 we announced the immediate availability of a new edition of the free and publicly accessible SPARQL Query Service Endpoint and Linked Data Pages, for interacting with the new Snapshot Dataset. Since the last release we made a few changes. Release notes are now maintained in the Databus collection (2021-09), we improved the image and abstract extractor and the DBpedia team worked on the community issue reporting and fix tracker at Github. The full release description including further statistics can be found on https://www.dbpedia.org/blog/snapshot-2021-09-release/.   

DBpedia Knowledge Graph Tutorial for Beginners

On 2nd of December, 2021 we organized the masterclass “Knowledge Graph tutorial for beginners” at the Connected Data World event. In this masterclass, participants learned how to consume the DBpedia Knowledge Graph with the least amount of effort. Furthermore, the masterclass introduced the DBpedia KG and we explained its dataset partitions. In case you missed the event, please watch the recorded session here.  

We do hope we will meet you and some new faces during our events next year. The association wants to get to know you because DBpedia is a community effort and would not continue to develop, improve and grow without you. We plan to have meetings or tutorials at the Data Week in Leipzig, the Web Conference’22, and the SEMANTiCS’22 conference. We wish you a happy New Year!

Stay safe and check Twitter, Instagram and LinkedIn, or subscribe to our Newsletter for the latest news and information.

Yours,

Julia 

on behalf of the DBpedia Association

2021 – Oh What a Fantastic Year
https://www.dbpedia.org/blog/recap-2021-part-1/ (15 Dec 2021)

Can you believe it? Fourteen years ago the first DBpedia dataset was released. Fourteen years of development, improvements and growth. Now more than 3,500 GB of data are uploaded on the Databus. We want to take this as an opportunity to send out a big “Thank you!” to all contributors, developers, members, funders, believers and enthusiasts who made that possible. Thank you for your support!

In the upcoming blog series, we would like to take you on a retrospective tour through 2021, giving you insights into a year with DBpedia. In the following we will also highlight our past events.

A year with DBpedia – Retrospective Part 1

Our New Face 

On January 28, 2021, the new DBpedia website went online. We worked on the completion for about a year and at the beginning of 2021 we proudly presented the new site to the community and our members. We used the New Year’s break 2020/2021 as an opportunity to alter the layout, design and content of the website, according to the requirements of the community and our members. We’ve created a new site to better present the DBpedia movement in its many facets. We additionally integrated the DBpedia blog on the website, a long overdue step. So now you have access to everything in one spot. Read our announcement here.

Giving knowledge back to Wikipedia: Towards a Systematic Approach to Sync Factual Data across Wikipedia, Wikidata and External Data Sources

Since the beginning of DBpedia, there has always been a strong consensus in the community that one of the goals of DBpedia was to feed semantic knowledge back into Wikipedia to improve its structure and data quality. How to achieve this goal was the subject of many discussions over the years. We received a Wikimedia Grant for our project GlobalFactSyncRE and re-iterated the issue. After almost two years of working on the topic, we would like to announce our final report. We submitted a summary of this report to the Qurator conference and presented it there on February 11, 2021:

Towards a Systematic Approach to Sync Factual Data across Wikipedia, Wikidata and External Data Sources. Sebastian Hellmann, Johannes Frey, Marvin Hofer, Milan Dojchinovski, Krzysztof Wecel and Włodzimierz Lewoniewski.

Read the submitted paper here.

DBpedia Tutorial at the Knowledge Graph Conference

On May 4, 2021, we organized a tutorial at the Knowledge Graph Conference 2021. The tutorial targeted existing and potential new users and developers that wish to learn how to replicate our infrastructure. During the course of the tutorial the participants gained knowledge about the DBpedia Knowledge Graph (KG) lifecycle and learned how to find information and how to access, query and work with the DBpedia KG, the Databus platform and services (Spotlight, Archivo, etc.). If you missed our presentations, please check our slides here.

Most Influential Scholars

DBpedia has become a high-impact, high-visibility project because of our foundation in excellent Knowledge Engineering as the pivot point between scientific methods, innovation and industrial-grade output. The drivers behind DBpedia are 4 out of the TOP 10 Most Influential Scholars in Knowledge Engineering and the C-level executives of our members. Check all details here https://www.aminer.cn/ai2000/ke.  

Google Summer of Code and DBpedia

For the 10th year in a row, we were part of this incredible journey of young ambitious developers who joined us as an open source organization to work on a GSoC coding project all summer. Even though Covid-19 changed a lot in the world, it couldn’t shake GSoC. If you want deeper insights into our GSoC students’ work, you can find their blogs and repos on the DBpedia blog.

DBpedia Global: Data Beyond Wikipedia

Since 2007, we’ve been extracting, mapping and linking content from Wikipedia into what is generally known as the DBpedia Snapshot, which provided the kernel for what is known today as the LOD Cloud Knowledge Graph. On June 7, 2021, we launched DBpedia Global. It’s a more powerful kernel for the LOD Cloud Knowledge Graph that ultimately strengthens the utility of Linked Data principles by adding more decentralization, i.e., broadening the scope of Linked Data associated with DBpedia. Think of this as “DBpedia beyond Wikipedia”, courtesy of additional reference data from various sources. Get more insight and read the announcement on the DBpedia blog.

In the upcoming blog post after the holidays we will give you more insights in the past events and technical achievements. We are now looking forward to the year 2022. We plan to have meetings at the Data Week 2022 in Leipzig, Germany and the SEMANTiCS 2022 conference in Vienna, Austria. Furthermore, we will be part of the WWW’22 conference and organize a tutorial. 

We wish you a merry Christmas and a happy new year. In the meantime, stay tuned and check our Twitter, Instagram or LinkedIn channels. You can subscribe to our Newsletter for the latest news and information around DBpedia.

Julia,   

on behalf of the DBpedia Association

How Innovative Organizations Use The World’s Largest Knowledge Graph
https://www.dbpedia.org/blog/how-innovative-organizations-use-the-worlds-largest-knowledge-graph/ (9 Nov 2021)

DBpedia Member Features – Over the last year we gave DBpedia members multiple chances to present their work, tools and applications. In this way, our members gave exclusive insights on the DBpedia blog. This time we will continue with Diffbot, a California-based company whose mission is to “extract knowledge in an automated way from documents.” They will introduce the Diffbot Knowledge Graph and present topics, like Market Intelligence and Ecommerce. Have fun reading!

by Filipe Mesquita & Merrill Cook, Diffbot

Diffbot is on a mission to create a knowledge graph of the entire public web. We are teaching a robot, affectionately known as Diffy, to read the web like a human and translate its contents into a format that (other perhaps less sophisticated) machines can understand. All of this information is linked and cleaned on a continuous basis to populate the Diffbot Knowledge Graph.

The Diffbot Knowledge Graph already contains billions of entities, including over 240M organizations, 700M people, 140M products, and 1.6B news articles. This scale is only possible because Diffy is fully autonomous and doesn’t depend on humans to build the Diffbot Knowledge Graph. Using cutting-edge crawling technology, natural language processing, and computer vision, Diffy is able to read and extract facts from across the entire web.

While we believe a knowledge graph like Diffbot’s will be used by virtually every organization one day, there are 4 use cases where the Diffbot Knowledge Graph excels today: (1) Market Intelligence, (2) News Monitoring, (3) E-commerce, and (4) Machine learning.

Market Intelligence

Video: https://www.diffbot.com/assets/video/solutions-for-media-monitoring.mp4

At its simplest, market intelligence is the generation of insights about participants in a market. These can include customers, suppliers, competitors, as well as attitudes of the general public and political establishment.

While market intelligence data is all over the public web, this can be a “double-edged sword.” The range of potential sources for market intelligence data can exhaust the resources of even large teams performing manual fact accumulation.

Diffbot’s automated web data extraction eliminates the inefficiencies of manual fact gathering. Without such automation, it’s simply not possible to monitor everything about a company across the web.

We see market intelligence as one of the most well-developed use cases for the Diffbot Knowledge Graph. Here’s why:

  • The Diffbot Knowledge Graph is built around organizations, people, news articles, products, and the relationships among them. These are the types of facts that matter in market intelligence.
  • Knowledge graphs have flexible schemas, allowing for new fact types to be added “on the fly” as the things we care about in the world change.
  • Knowledge graphs provide unique identifiers for all entities, supporting the disambiguation of entities like Apple (the company) vs. apple (the fruit).

Market intelligence uses from our customers include:

  • Querying the Knowledge Graph for companies that fit certain criteria (size, revenue, industry, location) rather than manually searching for them in Google (see the sketch after this list)
  • Creating dashboards to receive insights about companies in a certain industry
  • Improving an internal database by using the data from the Diffbot Knowledge Graph.
  • Custom solutions that incorporate multiple Diffbot products (custom web crawling, natural language processing, and Knowledge Graph data consumption)
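As a sketch of the first use above, Diffbot's Knowledge Graph can be queried over HTTP with its DQL query language. The endpoint path, parameter names and DQL syntax below are assumptions based on Diffbot's public documentation, and the token is a placeholder; check the current docs before relying on this.

```python
import requests

DIFFBOT_TOKEN = "YOUR_TOKEN"  # placeholder; obtain a real token from Diffbot

# Assumption: DQL endpoint and parameters as publicly documented by Diffbot.
resp = requests.get(
    "https://kg.diffbot.com/kg/v3/dql",
    params={
        "token": DIFFBOT_TOKEN,
        # DQL: organizations located in Leipzig with more than 100 employees
        "query": 'type:Organization locations.city.name:"Leipzig" nbEmployees>100',
        "size": 10,
    },
    timeout=30,
)
resp.raise_for_status()

# Assumption: results arrive as a "data" list of entity wrappers.
for hit in resp.json().get("data", []):
    print(hit["entity"].get("name"))
```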

News Monitoring

Sure, the news is all around us. But most companies are overwhelmed by the sheer amount of information produced every day that can impact their business.

The challenges faced by those trying to perform news monitoring on unstructured article data are numerous. Articles are structured differently across the web, making aggregation of diverse sources difficult. Many sources and aggregators silo their news by geographic location or language.

Strengths of providing article data through a wider Knowledge Graph include the ability to link articles to the entities (people, organizations, locations, etc.) mentioned in each article. Additional natural language processing includes the ability to identify quotes and who said them, as well as the sentiment of the article author towards each entity mentioned in the article.

In high-velocity, socially-fueled media, the need for automated analysis of information in textual form is even more pressing. Among the many applications of our technology, Diffbot is helping anti-bias and anti-misinformation initiatives with partnerships involving FactMata as well as the European Journalism Centre.

Check out how easy it is to build your own custom pan-lingual news feed in our news feed builder.

Ecommerce

Many of the largest names in ecommerce have utilized Diffbot’s ability to transform unstructured product, review, and discussion data into valuable ecommerce intelligence, whether by pointing AI-enabled crawlers at their own marketplaces to detect fraudulent, duplicate, or underperforming products, or by analyzing competitor or supplier product listings.

One of the benefits of utilizing Diffbot’s AI-enabled product API or our product entities within the Knowledge Graph is that they spare you the difficulty of scraping product data at scale. Many ecommerce sites employ active measures to make the scraping of their pages at scale difficult. We’ve already built out the infrastructure and can begin returning product data at scale in minutes.

The use of rule-based scraping by many competitors or in-house teams means that whenever ecommerce sites shift their layout or you try to extract ecommerce web data from a new location, your extraction method is likely to break. Additionally, hidden or toggleable fields on many ecommerce pages are more easily extracted by solutions with strong machine vision capabilities.

Diffbot’s decade-long focus on natural language processing also allows the inclusion of rich discussion data parsed for entities, connections, and sentiment. On large ecommerce sites, the structuring and additional processing of review data can be a major undertaking and provide high value.

Machine Learning

Even when you can get your hands on the right raw data to train machine learning models, cleaning and labeling the data can be a costly process. To help with this, Diffbot’s Knowledge Graph provides potentially the largest selection of once unstructured web data, complete with data provenance and confidence scores for each fact.

Our customers use a wide range of web data to quickly and accurately train models on diverse data types. Need highly informal text input from reviews? Video data in a particular language? Product or firmographic data? It’s all in the Knowledge Graph, structured and with API access so customers can quickly jump into validating new models.

With a long association with Stanford University and many research partnerships, Diffbot’s experts in web-scale machine learning work in tandem with many customers to create custom solutions and mutually beneficial partnerships.

To some, 2020 was the year of the knowledge graph. And while innovative organizations have long seen the benefits of graph databases, recent developments in the speed of fact accumulation online mean the future of graphs has never been brighter.

A big thank you to Diffbot, especially to Filipe Mesquita and Merrill Cook for presenting the Diffbot Knowledge Graph.  

Yours,

DBpedia Association

Bringing Linked Data to the Domain Expert with TriplyDB Data Stories
https://www.dbpedia.org/blog/triplydb-data-stories/ (8 Oct 2021)

DBpedia Member Features – Over the last year we gave DBpedia members multiple chances to present their work, tools and applications. In this way, our members gave exclusive insights on the DBpedia blog. This time we will continue with Triply, a Dutch company. They will introduce TriplyDB and data stories to us. Have fun reading!

by Kathrin Dentler, Triply

Triply and TriplyDB

Triply is an Amsterdam-based company with the mission to (help you to) make linked data the new normal. Every day, we work towards making every step around working with linked data easier, such as converting and publishing it, integrating, querying, exploring and visualising it, and finally sharing and (re-)using it. We believe in the benefits of FAIR (findable, accessible, interoperable and reusable) data and open standards. Our product, TriplyDB, is a user-friendly, performant and stable platform, designed for potentially very large linked data knowledge graphs in practical and large-scale production-ready applications. TriplyDB not only allows you to store and manage your data, but also provides data stories, a great tool for storytelling. 

Data stories 

Data stories are data-driven stories, such as articles, business reports or scientific papers, that incorporate live, interactive visualizations of the underlying data. They are written in markdown, and can be underpinned by an orchestration of powerful visualizations of SPARQL query results. These visualizations can be charts, maps, galleries or timelines, and they always reflect the current state of your data. That data is just one click away: A query in a data story can be tested or even tweaked by its readers. It is possible to verify, reproduce and analyze the results and therefore the narrative, and to download the results or the entire dataset. This makes a data story truly FAIR, understandable, and trustworthy. We believe that a good data story can be worth more than a million words. 

Examples

With a data story, the domain expert is in control and empowered to work with, analyze, and share his or her data as well as interesting research results. There are some great examples that you can check out straight away:

  • The fantastic data story on the Spanish Flu, which has been created by history and digital humanities researchers, who usually use R and share their results in scientific papers. 
  • Students successfully published data stories in the scope of a course of only 10 weeks. 
  • The beautiful data story on the Florentine Catasto of 1427.

DBpedia on triplydb.com

Triplydb.com is our public instance of TriplyDB, where we host many valuable datasets, which currently consist of nearly 100 billion triples. Among our most interesting and frequently used datasets are those published by the DBpedia Association.

We also have several interesting saved queries based on these datasets. 

A data story about DBpedia

To showcase the value of DBpedia and data stories to our users, we published a data story about DBpedia. This data story includes comprehensible and interactive visualizations, such as a timeline and a tree hierarchy, all of which are powered by live SPARQL queries against the DBpedia dataset. 

Let us have a look at the car timeline: DBpedia contains a large amount of content regarding car manufacturers and their products. Based on that data, we constructed a timeline which shows the evolution within the car industry. 

If you navigate from the data story to the query, you can analyze it and try it yourself. You see that the query limits the number of manufacturers so that we are able to look at the full scale of the automotive revolution without cluttering the timeline. You can play around with the query, change the ordering, visualize less or more manufacturers, or change the output format altogether. 
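The actual saved query lives on triplydb.com; the following is an approximate reconstruction of what such a query could look like against the public DBpedia endpoint. The class and property choices (e.g. dbo:industry pointing at dbr:Automotive_industry) are our assumptions, not Triply's exact query.

```python
import requests

QUERY = """
PREFIX dbo:  <http://dbpedia.org/ontology/>
PREFIX dbr:  <http://dbpedia.org/resource/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?name ?founded WHERE {
    ?maker a dbo:Company ;
           dbo:industry dbr:Automotive_industry ;
           dbo:foundingDate ?founded ;
           rdfs:label ?name .
    FILTER (lang(?name) = "en")
}
ORDER BY ?founded
LIMIT 25
"""

# Virtuoso-backed endpoints accept a plain GET with a JSON results format.
resp = requests.get(
    "https://dbpedia.org/sparql",
    params={"query": QUERY, "format": "application/sparql-results+json"},
    timeout=60,
)
for row in resp.json()["results"]["bindings"]:
    print(row["founded"]["value"], row["name"]["value"])
```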

Advanced features

If you wish to use a certain query programmatically, we offer preconfigured code snippets that allow you to run a query from a python or an R script. You can also configure REST APIs in case you want to work with variables. And last but not least, it is possible to embed a data story on any website. Just scroll to the end of the story you want to embed and click the “</> Embed” button for a copy-pasteable code snippet. 

Try it yourself! 

Sounds interesting? We still have a limited number of free user accounts over at triplydb.com. You can conveniently log in with your Google or Github account and start uploading your data. We host your first million open data triples for free! Of course, you can also use public datasets, such as the ones from DBpedia, link your data, work together on queries, save them, and then one day create your own data story to let your data speak for you. We are already looking forward to what your data has to say!

A big thank you to Triply for being a DBpedia member since 2020, and especially to Kathrin Dentler for presenting her work at the last DBpedia Day in Amsterdam and for her amazing contribution to DBpedia.

Yours,

DBpedia Association

WordLift – Building Knowledge Graphs for SEO
https://www.dbpedia.org/blog/wordlift-building-knowledge-graphs-for-seo/ (5 Jul 2021)

DBpedia Member Features – Over the last year we gave our DBpedia members multiple chances to present their work, tools and applications. In this way, our members gave exclusive insights on the DBpedia blog. This time we will continue with WordLift. They will show us how they help companies speak Google’s native language with data from DBpedia.  Have fun while reading!

by Andrea Volpini, WordLift

WordLift is a Software as a Service designed to help companies speak Google’s native language by converting unstructured content into structured data that search engines understand. 

It does so automatically, using natural language processing and machine learning. Most SEO tools provide insights on improving a website. WordLift creates a knowledge graph and automates some of these SEO tasks to help a site rank. We call it agentive SEO: from search intent analysis to content generation, from building internal links to improving on-page user engagement. At the core of this automation, WordLift creates 5-star linked data using DBpedia.

Artificial Intelligence is shaping the online world. Commercial search engines like Google and Bing have changed dramatically in the last two decades: from 10 blue links to algorithms that answer user questions without ever clicking a link or leaving the results page. As search evolves, so do the SEO tools that marketers need to cope with these changes. 

Why does creating a Knowledge Graph improve SEO?

Imagine the knowledge graph behind a website as the most thoughtful way to help crawlers index and understand its content. Much like Google uses the graph as the engine to power up its search results, a knowledge graph that describes a website’s content helps machines understand the semantic meanings behind it.

In practical terms, a customised knowledge graph helps content marketers in different ways:

  • Enhancing SERP results with structured data and helping Google and Bing disambiguate your brand name or services adequately. 
  • Automating internal links to increase rankings around entities that matter for the business.
  • Providing content recommendations to enhance the customer journey.
  • Bringing additional insights to web analytics by grouping traffic related to entities and not only pages (i.e. how is the content on “artificial intelligence” performing this week?).
  • Providing the factual data required for training language models that can automatically generate content (you can read all about it in my latest blog post on AI text generation for SEO, where you will find the code to fine-tune Google’s T5 using triples from DBpedia 🎉).

Here is an example of Natural Language Generation using Google’s T5 

Search Intent Discovery: a practical example of how marketers can use a KG

Let me give you another example: keyword research. The purpose of keyword research is to find and analyze search queries that people enter into search engines to create new content or improve existing ones. Using the custom knowledge graphs that WordLift produces, we help our clients quickly scout for new untapped search opportunities. 

The chart above shows a series of search intents (queries) that WordLift has generated after the user provided three ideas. Using the knowledge graph, these intents are grouped into topics such as “Android”, “Anonymity” or “Gamer” and the content editor can find the right query to target. In the treemap chart, larger boxes correspond to a higher search volume, and lighter colors indicate less competitive queries.   

How does WordLift build a Knowledge Graph?

An entity represents the “thing” described in web pages. Entities help computers understand everything about a person, an organization or a place mentioned on a website. Each entity holds the information required to provide direct answers to questions about itself, and questions that can be answered by looking at its relationships with other entities. WordLift uses natural language processing to extract and connect entities with web pages. Therefore, we primarily use the schema.org vocabulary to build our knowledge graphs.
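To make this concrete, here is a hand-written sketch (our illustration, not WordLift's actual output format) of schema.org JSON-LD for an entity on a web page, disambiguated by linking it to DBpedia via sameAs:

```python
import json

# Illustrative only: a schema.org description of an organization mentioned
# on a page, interlinked with DBpedia through sameAs (5-star linked data).
entity = {
    "@context": "https://schema.org",
    "@type": "Organization",
    "name": "DBpedia Association",
    "url": "https://www.dbpedia.org/",
    "sameAs": ["http://dbpedia.org/resource/DBpedia"],
}

# Embedded in the page head as:
#   <script type="application/ld+json"> ... </script>
print(json.dumps(entity, indent=2))
```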

WordLift heavily relies on DBpedia. We train our content analysis API on concepts that are in DBpedia and we build knowledge graphs that interlink, among other graphs, with DBpedia.

We are also starting to automatically create content using language models trained with data from DBpedia. More on this front will come in the near future.

WordLift and the open DBpedia Knowledge Graph

Our users constantly publish and update public web data from various sources (their websites, the catalogue of their products, a data feed from a partner and more) and interlink the entities they publish with DBpedia. 

We are now excited to contribute back some of this knowledge. With the help of DBpedia we can keep on building a distributed, queryable and decentralized linked data cloud. 

Keep following us and keep contributing to the LOD cloud!

A big thank you to WordLift, and especially to Andrea Volpini, for presenting how WordLift creates a KG and automates some SEO tasks to help a site rank.

Yours,

DBpedia Association

Retrospective 2021 – Half a year with DBpedia
https://www.dbpedia.org/blog/retrospective-2021-half-a-year-with-dbpedia/ (1 Jul 2021)

The 2nd of July 2021 marks the halfway point of the year: exactly 183 days have passed, and almost as many are still ahead of us. Time for us to look back on the past half year. What have we achieved? What still lies ahead of us? In the following, we will take you on a retrospective tour through the first half of 2021. We will highlight our past events and the development around the DBpedia dataset. Have fun reading!

New DBpedia Website

2021 started with very good news, because on January 28 we announced the completion of the new DBpedia website. We used the New Year’s break as an opportunity to alter the layout, design and content of the DBpedia website, according to the requirements of the DBpedia community and DBpedia members. We’ve created a new site to better present the DBpedia movement in its many facets. The positive feedback we received from the DBpedia community was overwhelming.

Paper presentation at Qurator Conference on February 11, 2021 

Giving Knowledge Back to Wikipedia: As part of this year’s Qurator Conference, Sebastian Hellmann, Johannes Frey, Marvin Hofer, Milan Dojchinovski, Krzysztof Wecel and Włodzimierz Lewoniewski presented their paper Towards a Systematic Approach to Sync Factual Data across Wikipedia, Wikidata and External Data Sources. One of the highlights of the paper, which was presented on February 11, 2021, is that it brings together many aspects that require attention and drafts a roadmap to bring external data into Wikipedia from Linked Data via DBpedia. For more highlights and information, read the paper or the DBpedia blog.

Restructured Data, Tools and Services Page

The DBpedia team have diligently cleaned up the website and have removed outdated content. Moreover, we’ve created a platform for new datasets, services, applications and tools. DBpedia evolves with the innovative and active use of its community members who constantly create the emerging DBpedia-related material. We value the efforts of contributing open data, tools and services for a shared reuse and application by the whole DBpedia community. Therefore, we have created a platform where all your contributions can be seen. Submit your DBpedia tool, demo or any kind of application and we will publish it here on the website. Please fill out the provided submission form!

DBpedia is part of the Google Summer of Code project

So far, each year has brought us new project ideas, many amazing students and great project results that shaped the future of DBpedia. As in previous years, we received many applications. Out of these applications 10 great projects from students all over the world were selected to work together with our mentors. Right now the students are in the middle of the coding phase. If you want to know more about this year’s projects, have a look at the DBpedia blog.

DBpedia Tech Tutorial @ Knowledge Graph Conference 

On Tuesday the 4th of May, DBpedia organized a tutorial at the Knowledge Graph Conference (KGC) 2021. The ultimate goal of the tutorial was to teach the participants all relevant tech around DBpedia, the knowledge graph, the infrastructure and possible use cases. The tutorial aimed at existing and potential new users of DBpedia, developers that wish to learn how to replicate DBpedia infrastructure, service providers, data providers as well as data scientists. Get more information and a deeper insight in the blog post.

Announcement of DBpedia Global 

Since 2007, we’ve been extracting, mapping and linking content from Wikipedia into what is generally known as the DBpedia Snapshot that provided the kernel for what is known today as the LOD Cloud Knowledge Graph

On June 7, 2021 we launched DBpedia Global, a more powerful kernel for LOD Cloud Knowledge Graph that ultimately strengthens the utility of Linked Data principles by adding more decentralization i.e., broadening the scope of Linked Data associated with DBpedia. Think of this as “DBpedia beyond Wikipedia” courtesy of additional reference data from various sources. Read more details and get more insights on the DBpedia blog.

What Will the Future Bring?

We are now looking forward to the next DBpedia tutorial, which will be held on July 8, 2021, co-located with the 9th Leipzig Semantic Web Day. Save your seat and register now! Furthermore, we will organize a tutorial on September 1, 2021, co-located with the LDK conference in Zaragoza, Spain. Check more details here! All good things come in threes 😉 In addition we will organize the DBpedia Day on September 9, 2021 at the SEMANTiCS Conference in Amsterdam. Check more details on our event page and save your seat now! We are looking forward to meeting all Dutch DBpedians there!

Stay safe and check Twitter or LinkedIn. Furthermore, you can subscribe to our Newsletter for the latest news and information around DBpedia.

Julia & Emma

on behalf of the DBpedia Association

GSoC Bonding Period 2021 – DBpedia
https://www.dbpedia.org/blog/gsoc-bonding-period-2021-dbpedia/ (13 May 2021)

Congratulations! You made it! You are selected as one of our GSoC students, who will work with DBpedia during the summer of 2021. In the following we will introduce how you can get in contact with the DBpedia community, the developers and your great mentors. Keep reading 😉

Student Projects Announced

Yesterday Google finally announced which students were selected for GSoC this year. Accepted students are now paired with a mentor and start planning their projects and milestones.

GSoC Community Bonding

It’s now time to spend a month learning more about the DBpedia community. Community Bonding takes place from the 17th of May to the 7th of June, before coding starts on the 7th of June. To get in touch with your mentors and everyone else from the DBpedia community, you have plenty of options:

  • First of all, you can chat with other DBpedians on Slack, where you are able to join DBpedia developer and technical discussions.
  • But not only that, you also can join our DBpedia-discussion-mailinglist, where we discuss current DBpedia developments. 
  • To increase your visibility in the DBpedia Community, try to answer some questions in the DBpedia forum (especially in the unanswered & support category) and browse the topics. 
  • Last but not least, check out our Github repository for open issues and see if you can help to solve them (e.g issues regarding the extraction framework or mappings).

When you share something about your project on your own blog or GitHub, please inform us and your mentors. That way, we can share it with the community and showcase your results.

In case you still have questions, please do not hesitate to contact us via dbpedia@infai.org.

Stay safe and check Twitter or LinkedIn. Furthermore, you can subscribe to our newsletter for the latest news and information around DBpedia.

We wish you all the best!

Emma

on behalf of the DBpedia Association
