DBpedia primarily focuses on representing the factual knowledge contained in the Wikipedia infoboxes. A vast amount of information, however, is contained in the unstructured Wikipedia article texts. To broaden and deepen the structured DBpedia data, the article texts are targeted as another data source.
With the representation of the wiki pages in the NLP Interchange Format (NIF), we provide all information directly extractable from the HTML source code, divided into three datasets:
- nif-context: the full text of a page as context (including begin and end index)
- nif-page-structure: the structure of the page in sections and paragraphs (titles, subsections etc.)
- nif-text-links: all in-text links to other DBpedia resources as well as external references
These datasets serve as the groundwork for further NLP fact extraction tasks to enrich the gathered knowledge of DBpedia.
Note: The first iteration of this extraction process only covers the abstracts of every wiki page as a trial run. It is based on the DBpedia 2016-10 release and provides the whole wiki page text in the NIF format.
IRIs: As the examples below show, in contrast to the IRI scheme used for other DBpedia datasets, these IRIs carry a query string containing the version of DBpedia under which the instances were extracted.
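The versioned IRI pattern can be illustrated with a short Python sketch. It assumes that the `dbr:` prefix expands to `http://dbpedia.org/resource/` and that the `nif` parameter follows the `kind_begin_end` shape seen in the examples; the function name `parse_nif_iri` is hypothetical and not part of any DBpedia or NIF library.

```python
from urllib.parse import urlsplit, parse_qs

# Assumption: dbr: stands for the usual DBpedia resource namespace.
DBR = "http://dbpedia.org/resource/"


def parse_nif_iri(iri):
    """Split a versioned NIF IRI into resource, DBpedia version and NIF part."""
    parts = urlsplit(iri)
    query = parse_qs(parts.query)
    nif = query["nif"][0]
    # "word_29_37" -> kind "word", span "29_37"; "context" has no span.
    kind, _, span = nif.partition("_")
    begin, end = (int(x) for x in span.split("_")) if span else (None, None)
    return {
        "resource": f"{parts.scheme}://{parts.netloc}{parts.path}",
        "dbpv": query["dbpv"][0],
        "kind": kind,
        "begin": begin,
        "end": end,
    }


word = parse_nif_iri(DBR + "Anthropology?dbpv=2016-04&nif=word_29_37")
context = parse_nif_iri(DBR + "Anthropology?dbpv=2016-04&nif=context")
print(word)
print(context)
```

Keeping the version in the query string leaves the resource part of the IRI identical to the plain DBpedia resource, so the two can be related by stripping the query.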
If you find inconsistencies in these files, please contact the DBpedia mailing lists or the DBpedia association directly.
Downloads
A sample of the most recent files is given in the table below. The full list of available languages can be found on the DBpedia Databus platform as nif-context, nif-page-structure, and nif-text-links.
Language | nif-context | nif-page-structure | nif-text-links |
de | .ttl | .ttl | .ttl |
en | .ttl | .ttl | .ttl |
es | .ttl | .ttl | .ttl |
fr | .ttl | .ttl | .ttl |
it | .ttl | .ttl | .ttl |
ja | .ttl | .ttl | .ttl |
ko | .ttl | .ttl | .ttl |
pl | .ttl | .ttl | .ttl |
pt | .ttl | .ttl | .ttl |
The Ontology
The following figure represents the main classes and properties of the NIF vocabulary.
Libraries
Integrate the NIF library into your project by:
- adding the NIF Maven library.
- compiling it on your own from the NIF-lib GitHub project.
- compiling the pyNIF-lib GitHub project.
Documentation
A deeper understanding of NIF can be gained by consulting the documentation. It provides the pointers to all important resources for the NLP Interchange Format.
Example:
input text: “Anthropology is the study of humanity. Its main subdivisions are social anthropology and cultural anthropology, which describes the workings of societies around the world, linguistic anthropology, which investigates the influence of language in social life, and biological or physical anthropology, which concerns long-term development of the human organism. Archaeology, which studies past human cultures through investigation of physical evidence, is thought of as a branch of anthropology in the United States, although in Europe, it is viewed as a discipline in its own right, or grouped under related disciplines such as history.”
The result is a set of .ttl files containing the context, page structure and text links.
nif-context.ttl
Represents the full text of a wiki page as the context for all subsequent information about this page.
dbr:Anthropology?dbpv=2016-04&nif=context a nif:Context .
dbr:Anthropology?dbpv=2016-04&nif=context nif:isString "Anthropology is the study of humanity. Its main subdivisions are social anthropology and cultural anthropology, which describes the workings of societies around the world, linguistic anthropology, which investigates the influence of language in social life, and biological or physical anthropology, which concerns long-term development of the human organism. Archaeology, which studies past human cultures through investigation of physical evidence, is thought of as a branch of anthropology in the United States, although in Europe, it is viewed as a discipline in its own right, or grouped under related disciplines such as history." .
dbr:Anthropology?dbpv=2016-04&nif=context nif:beginIndex "0"^^<http://www.w3.org/2001/XMLSchema#nonNegativeInteger> .
dbr:Anthropology?dbpv=2016-04&nif=context nif:endIndex "634"^^<http://www.w3.org/2001/XMLSchema#nonNegativeInteger> .
dbr:Anthropology?dbpv=2016-04&nif=context nif:sourceUrl <http://en.wikipedia.org/wiki/Anthropology> .
dbr:Anthropology?dbpv=2016-04&nif=context nif:predLang <http://lexvo.org/id/iso639-3/eng> .
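As a rough illustration of how the context properties relate to the page text, the following Python sketch derives them from a string. It assumes that the context always starts at offset 0 and that its end index equals the number of characters in `nif:isString`; the function `describe_context` is a hypothetical helper, and the sample text is only the first sentence of the example, not the full page.

```python
def describe_context(text, source_url, lang_iri):
    """Return the nif-context properties for one page text as a dict."""
    return {
        "rdf:type": "nif:Context",
        "nif:isString": text,
        "nif:beginIndex": 0,
        # Assumption: offsets count characters; Python's len() counts code points.
        "nif:endIndex": len(text),
        "nif:sourceUrl": source_url,
        "nif:predLang": lang_iri,
    }


sample = "Anthropology is the study of humanity."
ctx = describe_context(
    sample,
    "http://en.wikipedia.org/wiki/Anthropology",
    "http://lexvo.org/id/iso639-3/eng",
)
print(ctx["nif:beginIndex"], ctx["nif:endIndex"])
```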
nif-page-structure.ttl
Represents the structure of the wiki page as nif:Structure instances including section, paragraph and title.
dbr:Anthropology?dbpv=2016-04&nif=context nif:hasSection dbr:Anthropology?dbpv=2016-04&nif=section_0_634 .
dbr:Anthropology?dbpv=2016-04&nif=section_0_634 a nif:Section .
dbr:Anthropology?dbpv=2016-04&nif=section_0_634 nif:beginIndex "0"^^<http://www.w3.org/2001/XMLSchema#nonNegativeInteger> .
dbr:Anthropology?dbpv=2016-04&nif=section_0_634 nif:endIndex "634"^^<http://www.w3.org/2001/XMLSchema#nonNegativeInteger> .
dbr:Anthropology?dbpv=2016-04&nif=section_0_634 nif:referenceContext dbr:Anthropology?dbpv=2016-04&nif=context .
dbr:Anthropology?dbpv=2016-04&nif=section_0_634 nif:hasParagraph dbr:Anthropology?dbpv=2016-04&nif=paragraph_0_330 .
dbr:Anthropology?dbpv=2016-04&nif=section_0_634 nif:hasParagraph dbr:Anthropology?dbpv=2016-04&nif=paragraph_331_634 .
dbr:Anthropology?dbpv=2016-04&nif=section_0_634 nif:firstParagraph dbr:Anthropology?dbpv=2016-04&nif=paragraph_0_330 .
dbr:Anthropology?dbpv=2016-04&nif=section_0_634 nif:lastParagraph dbr:Anthropology?dbpv=2016-04&nif=paragraph_331_634 .
dbr:Anthropology?dbpv=2016-04&nif=paragraph_0_330 a nif:Paragraph .
dbr:Anthropology?dbpv=2016-04&nif=paragraph_0_330 nif:beginIndex "0"^^<http://www.w3.org/2001/XMLSchema#nonNegativeInteger> .
dbr:Anthropology?dbpv=2016-04&nif=paragraph_0_330 nif:endIndex "330"^^<http://www.w3.org/2001/XMLSchema#nonNegativeInteger> .
dbr:Anthropology?dbpv=2016-04&nif=paragraph_0_330 nif:referenceContext dbr:Anthropology?dbpv=2016-04&nif=context .
dbr:Anthropology?dbpv=2016-04&nif=paragraph_0_330 nif:superString dbr:Anthropology?dbpv=2016-04&nif=section_0_634 .
dbr:Anthropology?dbpv=2016-04&nif=paragraph_331_634 a nif:Paragraph .
dbr:Anthropology?dbpv=2016-04&nif=paragraph_331_634 nif:beginIndex "331"^^<http://www.w3.org/2001/XMLSchema#nonNegativeInteger> .
dbr:Anthropology?dbpv=2016-04&nif=paragraph_331_634 nif:endIndex "634"^^<http://www.w3.org/2001/XMLSchema#nonNegativeInteger> .
dbr:Anthropology?dbpv=2016-04&nif=paragraph_331_634 nif:referenceContext dbr:Anthropology?dbpv=2016-04&nif=context .
dbr:Anthropology?dbpv=2016-04&nif=paragraph_331_634 nif:superString dbr:Anthropology?dbpv=2016-04&nif=section_0_634 .
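The offsets in the page structure example can be sanity-checked with a small Python sketch. The offsets are copied from the triples above; the function `check_structure` and its rules (paragraphs must lie inside their section and must not overlap) are assumptions made for illustration, not checks prescribed by NIF or DBpedia.

```python
# Offsets copied from the nif-page-structure example.
section = (0, 634)
paragraphs = [(0, 330), (331, 634)]


def check_structure(section, paragraphs):
    """Return a list of problems; an empty list means the spans are consistent."""
    problems = []
    sec_begin, sec_end = section
    ordered = sorted(paragraphs)
    for begin, end in ordered:
        if not (sec_begin <= begin <= end <= sec_end):
            problems.append(f"paragraph {begin}_{end} lies outside the section")
    # Compare each paragraph with its successor to detect overlaps.
    for (_, prev_end), (next_begin, _) in zip(ordered, ordered[1:]):
        if next_begin < prev_end:
            problems.append(f"paragraph starting at {next_begin} overlaps")
    return problems


problems = check_structure(section, paragraphs)
# nif:firstParagraph and nif:lastParagraph correspond to the outermost spans.
first_paragraph, last_paragraph = min(paragraphs), max(paragraphs)
print(problems, first_paragraph, last_paragraph)
```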
nif-text-links.ttl
Represents all in-text links of a wiki page as nif:Word or nif:Phrase instances.
dbr:Anthropology?dbpv=2016-04&nif=word_29_37 a nif:Word .
dbr:Anthropology?dbpv=2016-04&nif=word_29_37 nif:referenceContext dbr:Anthropology?dbpv=2016-04&nif=context .
dbr:Anthropology?dbpv=2016-04&nif=word_29_37 nif:beginIndex "29"^^<http://www.w3.org/2001/XMLSchema#nonNegativeInteger> .
dbr:Anthropology?dbpv=2016-04&nif=word_29_37 nif:endIndex "37"^^<http://www.w3.org/2001/XMLSchema#nonNegativeInteger> .
dbr:Anthropology?dbpv=2016-04&nif=word_29_37 nif:superString dbr:Anthropology?dbpv=2016-04&nif=paragraph_0_330 .
dbr:Anthropology?dbpv=2016-04&nif=word_29_37 <http://www.w3.org/2005/11/its/rdf#taIdentRef> dbr:Human .
dbr:Anthropology?dbpv=2016-04&nif=word_29_37 nif:anchorOf "humanity" .
dbr:Anthropology?dbpv=2016-04&nif=phrase_65_84 a nif:Phrase .
dbr:Anthropology?dbpv=2016-04&nif=phrase_65_84 nif:referenceContext dbr:Anthropology?dbpv=2016-04&nif=context .
dbr:Anthropology?dbpv=2016-04&nif=phrase_65_84 nif:beginIndex "65"^^<http://www.w3.org/2001/XMLSchema#nonNegativeInteger> .
dbr:Anthropology?dbpv=2016-04&nif=phrase_65_84 nif:endIndex "84"^^<http://www.w3.org/2001/XMLSchema#nonNegativeInteger> .
dbr:Anthropology?dbpv=2016-04&nif=phrase_65_84 nif:superString dbr:Anthropology?dbpv=2016-04&nif=paragraph_0_330 .
dbr:Anthropology?dbpv=2016-04&nif=phrase_65_84 <http://www.w3.org/2005/11/its/rdf#taIdentRef> dbr:Social_anthropology .
dbr:Anthropology?dbpv=2016-04&nif=phrase_65_84 nif:anchorOf "social anthropology" .
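The relation between a link's offsets and its `nif:anchorOf` value can be sketched in Python: slicing the context string with the begin and end index should return the anchor text. The context here is truncated to the start of the example text, and the rule that a single token becomes a word and anything containing a space becomes a phrase is an assumption inferred from the two examples above; `anchor_matches` and `link_kind` are hypothetical helpers.

```python
# Beginning of the example context; enough to cover both links.
context = (
    "Anthropology is the study of humanity. Its main subdivisions are "
    "social anthropology and cultural anthropology"
)

# Values copied from the nif-text-links example.
links = [
    {"begin": 29, "end": 37, "anchor": "humanity", "target": "dbr:Human"},
    {"begin": 65, "end": 84, "anchor": "social anthropology",
     "target": "dbr:Social_anthropology"},
]


def anchor_matches(context, link):
    """True if the offsets select exactly the anchor text (end is exclusive)."""
    return context[link["begin"]:link["end"]] == link["anchor"]


def link_kind(link):
    """Assumed rule: one token is a word, several tokens form a phrase."""
    return "phrase" if " " in link["anchor"] else "word"


for link in links:
    print(link_kind(link), link["anchor"], anchor_matches(context, link))
```

A check of this kind is a cheap way to detect offset drift, for example when the text was normalized after the links were extracted.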
Related Publications
- Integrating NLP using Linked Data
- NIF Combinator: Combining NLP Tool Output
- NERD meets NIF: Lifting NLP Extraction Results to the Linked Data Cloud
- NIF4OGGD – NLP Interchange Format for Open German Governmental Data