Code Documentation for NameDropper

spotlight

class namedropper.spotlight.DBpediaResource(uri, language='en', spotlight_info={})

An object to encapsulate properties and functionality related to a specific dbpedia item.

Parameters:
  • uri – dbpedia resource uri
  • language – optional language code, for multilingual properties like label; defaults to ‘en’
  • spotlight_info – optional dictionary of data returned from spotlight annotation, to avoid unnecessary look-ups (e.g., for type of resource)
type

high-level type of resource; currently only supports person, place, and organization.

namedropper.spotlight.OWL = Namespace(u'http://www.w3.org/2002/07/owl#')

rdflib.Namespace for OWL (Web Ontology Language)

class namedropper.spotlight.SpotlightClient(base_url=None, confidence=None, support=None, types=None)

Client for interacting with DBpedia Spotlight via REST API.

http://wiki.dbpedia.org/spotlight/usersmanual?v=ssd

Parameters:
  • base_url – Base URL for DBpedia Spotlight webservice, when not using the hosted service at default_url
  • confidence – default minimum confidence score (optional)
  • support – default minimum support score (optional)
  • types – list or string of default types to be returned when recognizing and annotating text
annotate(text, confidence=None, support=None, types=None)

Call the DBpedia Spotlight annotate service.

All arguments other than text are optional; if default configurations were specified when the client was initialized, those will be used unless an overriding value is specified here.

Parameters:
  • text – text string to be annotated
  • confidence – minimum confidence score (e.g., 0.5) [optional]
  • support – minimum support score [optional]
  • types – list or string of entity types that should be recognized and returned (such as Person, Place, Organization) [optional]
Returns:

dict with information on identified resources
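
The fallback from per-call arguments to client-level defaults described above can be sketched as follows. This is a minimal illustration, not the library's implementation: build_annotate_params and its defaults argument are hypothetical names, and the comma-joined form for a list of types is an assumption.

```python
def build_annotate_params(text, defaults=None, confidence=None,
                          support=None, types=None):
    """Merge client-level defaults with per-call overrides (hypothetical helper)."""
    defaults = defaults or {}
    overrides = {'confidence': confidence, 'support': support, 'types': types}
    params = {'text': text}
    for name, value in overrides.items():
        # a per-call value wins; otherwise fall back to the client default
        if value is None:
            value = defaults.get(name)
        if value is None:
            continue  # neither is set, so leave the parameter out entirely
        if name == 'types' and not isinstance(value, str):
            # types may be given as a list or a string; join lists (assumed format)
            value = ','.join(value)
        params[name] = value
    return params


params = build_annotate_params(
    'Frank Lloyd Wright lived in Chicago.',
    defaults={'confidence': 0.4, 'types': ['Person', 'Place']},
    confidence=0.5,
)
```

Here the per-call confidence replaces the default, the default types are kept, and support is omitted because it was never set.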

default_url = 'http://spotlight.dbpedia.org/rest'

Default base url for DBpedia Spotlight web service

total_api_calls

number of API calls made

total_api_duration

datetime.timedelta - total duration of all API calls
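
One way such counters could be maintained is sketched below. CallStats and its timed() method are hypothetical stand-ins written for this example; only the two attribute names and the use of datetime.timedelta come from the documentation above.

```python
import datetime
import time


class CallStats:
    """Hypothetical stand-in that tracks call count and cumulative duration."""

    def __init__(self):
        self.total_api_calls = 0
        self.total_api_duration = datetime.timedelta()

    def timed(self, func, *args, **kwargs):
        """Run func and record that a call was made and how long it took."""
        start = time.monotonic()
        try:
            return func(*args, **kwargs)
        finally:
            # record the call and its elapsed time even if func raised
            self.total_api_calls += 1
            elapsed = time.monotonic() - start
            self.total_api_duration += datetime.timedelta(seconds=elapsed)


stats = CallStats()
result = stats.timed(sum, [1, 2, 3])  # stands in for a web service request
stats.timed(len, 'abc')
```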

namedropper.spotlight.cached_property(f)

returns a cached property that is calculated by function f
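
A common way to write a decorator of this kind is sketched below. It is an illustration of the general technique and may differ from the actual code in namedropper.spotlight; the Resource class and its lookups counter are hypothetical and exist only to show that the function runs once.

```python
def cached_property(f):
    """Return a property that computes f once per instance and reuses the result."""
    cache_name = '_cached_' + f.__name__

    def getter(self):
        # compute on first access, then serve the stored value
        if not hasattr(self, cache_name):
            setattr(self, cache_name, f(self))
        return getattr(self, cache_name)

    getter.__doc__ = f.__doc__
    return property(getter)


class Resource:
    """Hypothetical class used only to demonstrate the decorator."""
    lookups = 0

    @cached_property
    def label(self):
        Resource.lookups += 1  # stands in for an expensive look-up
        return 'Chicago'


r = Resource()
first, second = r.label, r.label  # the second access hits the cache
```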

viaf

class namedropper.viaf.ViafClient

Client for interacting with VIAF (Virtual International Authority File) API.

http://www.oclc.org/developer/documentation/virtual-international-authority-file-viaf/using-api

autosuggest(term)

Query autosuggest API. Returns a list of results.

find_corporate(name)

Search VIAF by local.corporateNames

find_person(name)

Search VIAF by local.personalNames

find_place(name)

Search VIAF by local.geographicNames

search(query)

Query VIAF search interface. Returns a list of feed entries, as parsed by feedparser.

Parameters:query – CQL query in viaf syntax (e.g., cql.any all "term")
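
The CQL queries behind find_person(), find_corporate(), and find_place() might be assembled along these lines. This is a sketch under stated assumptions: the endpoint URL, the helper names build_cql and search_url, and the use of the "all" relation with the local.* indexes are assumptions for illustration, not confirmed details of ViafClient.

```python
from urllib.parse import urlencode

# Assumed endpoint, for illustration only
VIAF_SEARCH_URL = 'http://viaf.org/viaf/search'

# Index names taken from the method descriptions above
INDEXES = {
    'person': 'local.personalNames',
    'corporate': 'local.corporateNames',
    'place': 'local.geographicNames',
}


def build_cql(kind, name):
    """Build a CQL query string for one of the name indexes (hypothetical helper)."""
    # quote the term, escaping any embedded double quotes with a backslash
    escaped = name.replace('"', '\\"')
    return '%s all "%s"' % (INDEXES[kind], escaped)


def search_url(query):
    """URL-encode the CQL query into a search request URL (hypothetical helper)."""
    return VIAF_SEARCH_URL + '?' + urlencode({'query': query})


query = build_cql('person', 'Heaney, Seamus')
url = search_url(query)
```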

util

class namedropper.util.AnnotateXml(mode, viaf=False, geonames=False, track_changes=False, xml_object=None)

Annotate xml based on dbpedia spotlight annotation results.

Currently logs a message (info or warn) when a VIAF look-up fails, or when attributes are not inserted in order to avoid overwriting existing values.

When track changes is requested, processing instructions will be added around annotated names for review in OxygenXML 14.2+. In cases where a name was untagged, the text will be marked as a deletion and the tagged version of the name will be marked as an insertion with a comment containing the description of the DBpedia resource, to aid in identifying whether the correct resource has been added. If a recognized name was previously tagged, a comment will be added indicating what attributes were added, or would have been added if they did not conflict with attributes already present in the document.

When using the track changes option, it is recommended to also run enable_oxygen_track_changes() once on the document, so that Oxygen will automatically open the document with track changes turned on.

Parameters:
  • mode – mode (tei or ead) of tags to insert
  • viaf – if True, convert DBpedia person URIs to VIAF URIs when possible (optional, defaults to False)
  • geonames – if True, convert DBpedia place URIs to GeoNames.org URIs when possible (optional, defaults to False)
  • track_changes – if True, flag annotations with OxygenXML track changes processing instructions for later review
  • xml_object – eulxml.xmlmap.XmlObject for the top-level XML document associated with the node(s) to be annotated. Used for validation to check that inserted elements are allowed.
annotate(node, annotations)

Annotate xml based on dbpedia spotlight annotation results. Assumes that dbpedia annotate was called on the normalized text from this node. Currently updates the node that is passed in; whitespace will be normalized in text nodes where name tags are inserted. For TEI, DBpedia URIs are inserted as ref attributes; since EAD does not support referencing URIs, VIAF ids will be used where possible (currently only supports lookup for personal names).

If recognized names are already tagged as names in the existing XML, no new name tag will be inserted; attributes will only be added if they are not present in the original node.

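Parameters:
  • node – lxml element node to be updated
  • annotations – dbpedia spotlight annotation results for the normalized text of this node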
Returns:

total count of the number of entities inserted into the xml

geonames = False

GeoNames flag: if true, annotate will convert dbpedia place URIs to GeoNames URIs when possible

get_attributes(res, quiet=False)

Get the attributes to be inserted, based on the current document mode and the type of DBpediaResource.

Parameters:res – namedropper.spotlight.DBpediaResource
Returns:dictionary of attribute names -> values
get_tag(res)

Get the name of the tag to be inserted, based on the current document mode and the type of DBpedia resource.

Parameters:res – namedropper.spotlight.DBpediaResource instance for the tag to be inserted
Returns:string tag
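
A lookup keyed on document mode and resource type is one plausible shape for this logic. In the sketch below the element names are standard TEI and EAD name elements, but whether get_tag() returns exactly these is an assumption, and pick_tag is a hypothetical helper rather than part of the library.

```python
# Standard TEI and EAD name elements; that get_tag() uses exactly
# these is an assumption made for this example.
TAGS = {
    'tei': {'person': 'persName', 'place': 'placeName', 'organization': 'orgName'},
    'ead': {'person': 'persname', 'place': 'geogname', 'organization': 'corpname'},
}


def pick_tag(mode, resource_type):
    """Choose a tag name from document mode and resource type (hypothetical helper)."""
    try:
        return TAGS[mode][resource_type]
    except KeyError:
        raise ValueError('unsupported mode/type: %r, %r' % (mode, resource_type))


tei_tag = pick_tag('tei', 'person')
ead_tag = pick_tag('ead', 'place')
```
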
track_changes = False

OxygenXML track changes flag: if true, annotation will be tagged with OxygenXML track changes processing instruction, to enable review within Oxygen Author mode

viaf = False

VIAF flag: if true, annotate will convert dbpedia person URIs to VIAF URIs when possible

namedropper.util.OLDannotate_xml(node, result, mode='tei', track_changes=False)

Annotate xml based on dbpedia spotlight annotation results. Assumes that dbpedia annotate was called on the normalized text from this node. Currently updates the node that is passed in; whitespace will be normalized in text nodes where name tags are inserted. For TEI, DBpedia URIs are inserted as ref attributes; since EAD does not support referencing URIs, VIAF ids will be used where possible (currently only supports lookup for personal names).

If recognized names are already tagged as names in the existing XML, no new name tag will be inserted; attributes will only be added if they are not present in the original node.

Currently logs a message (info or warn) when a VIAF look-up fails, or when attributes are not inserted in order to avoid overwriting existing values.

When track changes is requested, processing instructions will be added around annotated names for review in OxygenXML 14.2+. In cases where a name was untagged, the text will be marked as a deletion and the tagged version of the name will be marked as an insertion with a comment containing the description of the DBpedia resource, to aid in identifying whether the correct resource has been added. If a recognized name was previously tagged, a comment will be added indicating what attributes were added, or would have been added if they did not conflict with attributes already present in the document.

When using the track changes option, it is recommended to also run enable_oxygen_track_changes() once on the document, so that Oxygen will automatically open the document with track changes turned on.

Parameters:
  • node – lxml element node to be updated
  • result – dbpedia spotlight result, as returned by namedropper.spotlight.SpotlightClient.annotate()
  • track_changes – mark changes using OxygenXML track changes processing instructions, to enable review in OxygenXML author mode
Returns:

total count of the number of entities inserted into the xml

namedropper.util.autodetect_file_type(filename)

Attempt to auto-detect input file type. Currently supported types are EAD XML, TEI XML, or text. Any document that cannot be loaded as XML is assumed to be text.

Returns:“tei”, “ead”, “text”, or None if file type is not recognized
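
The detection rule described above could look roughly like this. It is a sketch using only the standard library: detect_file_type is a hypothetical name, and deciding between TEI and EAD by inspecting the root element name is an assumption about how the real function works.

```python
import xml.etree.ElementTree as ET


def detect_file_type(filename):
    """Return 'tei', 'ead', 'text', or None (hypothetical sketch)."""
    try:
        root = ET.parse(filename).getroot()
    except ET.ParseError:
        return 'text'  # not loadable as XML, so assume plain text
    # drop any '{namespace}' prefix before comparing the root element name
    local_name = root.tag.rsplit('}', 1)[-1]
    if local_name in ('TEI', 'TEI.2'):
        return 'tei'
    if local_name == 'ead':
        return 'ead'
    return None  # well-formed XML, but not a recognized type
```
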
namedropper.util.enable_oxygen_track_changes(node)

Add a processing instruction to a document with an OxygenXML option to enable the track changes mode.

namedropper.util.normalize_whitespace(txt, next=None, prev=None)

Normalize whitespace in a string to match the logic of normalize-space() in XPath. Replaces all internal sequences of white space with a single space and conditionally removes leading and trailing whitespace.

Parameters:
  • txt – text string to be normalized
  • next – optional next string; used to determine if trailing whitespace should be removed
  • prev – optional preceding string; used to determine if leading whitespace should be removed
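
The behaviour described above can be approximated as follows. This is a sketch, not the library's code: normalize_ws is a hypothetical name, and the rule that a boundary space is kept only when adjacent text is supplied is an assumption about what "conditionally" means here.

```python
import re


def normalize_ws(txt, next=None, prev=None):
    """Collapse whitespace runs; trim the ends unless adjacent text exists."""
    # collapse every run of whitespace (spaces, tabs, newlines) to one space
    txt = re.sub(r'\s+', ' ', txt)
    # assumed rule: with no preceding text, leading whitespace is dropped
    if not prev:
        txt = txt.lstrip()
    # assumed rule: with no following text, trailing whitespace is dropped
    if not next:
        txt = txt.rstrip()
    return txt


standalone = normalize_ws('  Seamus \n  Heaney  ')
mid_sentence = normalize_ws('  wrote to  ', next='Ted Hughes', prev='Heaney')
```

With no neighbours the result matches XPath normalize-space(); with text on both sides a single space is preserved at each boundary so that words do not run together when name tags are inserted.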

scripts

class namedropper.scripts.ScriptBase

Base class for namedropper command-line scripts.

Directions for use:
  • script description, to be displayed via argparse, should be set as class docstring
  • extend init_parser() for additional command-line arguments
  • extend run() with main script functionality

Init method will initialize the argument parser, parse command-line arguments, check that file type is either specified or can be auto-detected, and execute run().

Parser is saved as parser, in case other script logic needs reference to it.

init_parser()

Initialize an argument parser with common arguments. Currently includes filename and input type.

Extend to add arguments.

init_xml_object()

Initialize an xmlobject based on user-specified arguments for filename and type. Returns an instance of the appropriate XmlObject, or displays an error message if the document could not be parsed as XML.

parser = None

argparse.ArgumentParser instance to be initialized by init_parser() at class instantiation.

run()

placeholder method - extend with script logic
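
The extension pattern described for ScriptBase can be illustrated with a self-contained stand-in. ScriptSketch below is a hypothetical class written for this example, not the real base class; the --input and --limit options and the argv parameter are assumptions added so that the sketch can run without a real command line.

```python
import argparse


class ScriptSketch:
    """Stand-in base class (hypothetical) mirroring the documented pattern."""
    parser = None

    def __init__(self, argv=None):
        # initialize the parser, parse arguments, then hand off to run()
        self.init_parser()
        self.args = self.parser.parse_args(argv)
        self.run()

    def init_parser(self):
        # the class docstring doubles as the script description
        self.parser = argparse.ArgumentParser(description=self.__doc__)
        self.parser.add_argument('filename')
        self.parser.add_argument('--input', choices=['text', 'tei', 'ead'])

    def run(self):
        pass  # placeholder, extended by subclasses


class ReportArgs(ScriptSketch):
    """Illustrative script that reports the arguments it received."""

    def init_parser(self):
        super().init_parser()  # keep the common arguments
        self.parser.add_argument('--limit', type=int, default=10)

    def run(self):
        self.summary = '%s (limit %d)' % (self.args.filename, self.args.limit)


script = ReportArgs(['letters.xml', '--input', 'tei', '--limit', '5'])
```

The subclass adds one argument by extending init_parser() and puts its logic in run(), which the base initializer calls after parsing.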
