I am trying to find the corresponding DBpedia entry for each company in a list, but I can't figure out how to do approximate matches. For example, "Audi" is called "Audi AG" in DBpedia and "Novartis" is called "Novartis International AG" (foaf:name). How do I search for entries with rdf:type dbo:Company whose name is closest to the one I provide?

I'm using SPARQL as the query language. (But I'm open to change if there is an advantage.)

select ?company
where {
  ?company foaf:name "Novartis"@en.
  ?company a dbo:Company.
}
LIMIT 100

I get no hits, but http://dbpedia.org/page/Novartis should be found. Matching the beginning of the name might be good enough for this.

  • SPARQL is a language, with many functions, and DBpedia is hosted on an engine with many extensions... Not to mention built-in tools that can help you build more queries based on one you build within it. – TallTed Apr 16 at 1:50
  • Possible duplicate of Query for best match to a string with SPARQL? – TallTed Apr 16 at 2:00
  • Why do you think http://dbpedia.org/resource/Novartis should be found with your query? The foaf:name of this resource is "Novartis International AG"@en and only its rdfs:label is "Novartis"@en - anything beyond exact matching of existing literals in the RDF triples can only be solved by some FILTER with one of the string functions (regex, contains, strstarts) or some extended functions not part of the SPARQL 1.1 standard but triple store dependent. – AKSW Apr 16 at 3:24
  • A more complete check for an exact match on DBpedia also considers redirects, also known as surface forms or synonyms of resources: ?company ^dbo:wikiPageRedirects?/(rdfs:label|foaf:name) "Novartis"@en. – AKSW Apr 16 at 6:15

For DBpedia, the best option might be to use the bif:contains full-text search pseudo property:

SELECT ?company {
  ?company a dbo:Company.
  ?company foaf:name ?name.
  ?name bif:contains "Novartis"@en.
}

This feature is specific to the Virtuoso database that powers the DBpedia SPARQL endpoint.
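
One caveat when working through a list of company names: as far as I know, Virtuoso treats the bif:contains argument as a full-text expression rather than a plain string, so a name containing spaces has to be wrapped in single quotes inside the double quotes. A minimal sketch, assuming the public DBpedia endpoint (where the bif: prefix is built in) and using "Novartis International" purely as an example phrase:

```sparql
PREFIX dbo:  <http://dbpedia.org/ontology/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>

SELECT DISTINCT ?company ?name {
  ?company a dbo:Company ;
           foaf:name ?name .
  # Multi-word phrase: single quotes inside the double-quoted string.
  # The phrase itself is only an illustrative example.
  ?name bif:contains "'Novartis International'" .
}
LIMIT 20
```

Individual words can apparently also be combined with AND inside the expression, e.g. "Novartis AND International", which matches names containing both words in any order.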

If you want to stick to standard SPARQL and match only at the beginning of the name:

SELECT ?company {
  ?company a dbo:Company.
  ?company foaf:name ?name.
  FILTER strStarts(?name, "Novartis")
}

Unlike the full-text feature, this version cannot make use of a text index, so it is slower.

If you want a more flexible match:

SELECT ?company {
  ?company a dbo:Company.
  ?company foaf:name ?name.
  FILTER contains(lCase(?name), lCase("Novartis"))
}

This will find a case-insensitive match anywhere in the name.
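
Since the question asks for the name "closest" to the input, one rough way to rank candidates in standard SPARQL is to prefer the shortest matching name: with a prefix match, fewer extra characters means a closer match. The following is only a sketch of that heuristic, not a real similarity measure. It assumes that checking both rdfs:label and foaf:name is acceptable (as the comments point out, "Novartis" is only the rdfs:label), and it may be slow or time out on the public endpoint because it cannot use a text index:

```sparql
PREFIX dbo:  <http://dbpedia.org/ontology/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

# ?shortest is a hypothetical helper variable: the length of the
# shortest label or name of each company that starts with the input.
SELECT ?company (MIN(STRLEN(STR(?name))) AS ?shortest) {
  ?company a dbo:Company ;
           (rdfs:label|foaf:name) ?name .
  # Case-insensitive prefix match on the plain string value
  FILTER STRSTARTS(LCASE(STR(?name)), LCASE("Novartis"))
}
GROUP BY ?company
ORDER BY ?shortest
LIMIT 10
```

The first row is then the company with a name that adds the fewest characters to the search term. An exact match such as the label "Novartis" would rank first.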

  • Thanks @cygri! I went for the first approach using "bif:contains" and it works perfectly! – Mike Dynamite Apr 22 at 11:57
