Entity Extraction Question & Answer

Alchemist Hour – Entity Extraction
Question & Answer:
1. How many different entity types does AlchemyAPI support?
AlchemyAPI supports 42 primary types, along with hundreds
of subtypes. Click here for a list.
2. Is there a limit to the number of entities the API can identify in
one call?
No, as long as the size of the text falls within the 50 KB limit.
3. Can you raise the size restriction for the text or HTML?
No. With entities, however, you can break up the text into
multiple pieces if the file is too large. This will not impact the
number of entities extracted.
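As a rough illustration of breaking up a large file, the sketch below splits text at sentence boundaries so that each piece stays under a byte limit. The function name `split_text`, the reading of the 50 KB limit as 50,000 UTF-8 bytes, and the choice to split on sentence-ending punctuation are all assumptions made for this example. It also assumes no single sentence is larger than the limit.

```python
import re


def split_text(text, max_bytes=50_000):
    """Split text into pieces of at most max_bytes (UTF-8), at sentence ends.

    max_bytes=50_000 is an assumed reading of the 50 KB limit.
    A single sentence longer than max_bytes is emitted as its own
    oversized piece; this sketch does not split inside sentences.
    """
    # Break after '.', '!' or '?' followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks, current, size = [], [], 0
    for sentence in sentences:
        # +1 accounts for the space used when sentences are re-joined.
        sentence_size = len(sentence.encode("utf-8")) + 1
        if current and size + sentence_size > max_bytes:
            chunks.append(" ".join(current))
            current, size = [], 0
        current.append(sentence)
        size += sentence_size
    if current:
        chunks.append(" ".join(current))
    return chunks
```

Each returned piece can then be sent in its own call. Splitting at sentence boundaries, rather than at arbitrary byte offsets, avoids cutting a multi-word entity in half.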
4. How quickly does extraction occur?
The entity extraction call takes less than one second to return
a response with entities from the text.
5. When looking at the types of entities supported, it does not
appear that dates are included. Is there any reason for that?
To determine the publication date of a piece of text, use
the Pub Date API, as opposed to the Entity Extraction API.
6. So, your entity extraction uses machine learning to find entities?
Yes. We use a hybrid neural network and statistical
approach, and we train our systems in both unsupervised and
supervised ways.
7. Is the knowledge graph a dbpedia graph, or some other
customized graph?
The knowledge graph is our own custom graph and was built
by our engineers.
8. Will AlchemyAPI lemmatize or stem non-capitalized words? If so,
will this impact the entities extracted?
Capitalization will be taken into account when extracting
entities, but it is not the sole factor in determining whether
an entity is extracted.
9. Are entities based on ontology of general things or can they be
trusted when extracting entities from websites containing scientific
data?
Our Entity Extraction API is trained on relatively general data.
However, we have trained it on a large amount of data, and therefore
it performs well on industry-specific entities. In the near future, we
hope to allow people to customize based on their own documents.
Try our demo to see how well the Entity Extraction API works for
your use case.
10. When given an input, how does the API identify and extract
names?
The system is trained to recognize well-known names. That
being said, it can also recognize generic names based on the
context of the sentence.
11. How does pricing work for sentiment analysis in entity extraction?
Entity extraction counts as one transaction. When you add
sentiment analysis, it adds one more transaction, for a total of
two transactions per call. With AlchemyAPI’s free plan, you receive
1,000 transactions per day. To upgrade from the free plan, our
sales team can get you up and running quickly. Visit our sales
page to contact them.
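The transaction arithmetic above can be worked through in a few lines. The function names are hypothetical and exist only for this example; the figures (one transaction for entity extraction, one more for sentiment, 1,000 free transactions per day) come from the answer above.

```python
FREE_TRANSACTIONS_PER_DAY = 1_000  # free plan quota stated in the answer


def transactions_per_call(with_sentiment):
    """One transaction for entity extraction, plus one if sentiment is added."""
    return 2 if with_sentiment else 1


def max_calls_per_day(with_sentiment, quota=FREE_TRANSACTIONS_PER_DAY):
    """Number of whole calls that fit in the daily transaction quota."""
    return quota // transactions_per_call(with_sentiment)
```

Under these figures, the free plan covers 1,000 entity extraction calls per day, or 500 calls per day when sentiment analysis is included.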
12. What are some good examples of use cases for entity extraction?
Entity extraction is often used in combination with our other
services. For example, it is often used in conjunction with our
Sentiment Analysis API and News API to pull out relevant semantic
information.
13. Can you provide an example of the entity extraction using the
“Crime” entity?
If you input the sentence, “He was accused of murder”, the
Entity Extraction API will extract “murder” as a crime-type entity.
Try it for yourself in the demo.
14. Questions surrounding personalization
We received several questions about various types of
personalization, including:
• Can we feed the AlchemyAPI parser with our own
classes?
• Can we provide/add a third-party datasource for
entity disambiguation?
• Can I define a list of custom entities?
• Can I build a customized knowledge graph, which
includes entities specific to my company’s corpus?
We would like to address all of these questions
together. We are looking to accommodate customization requests,
but do not have any solutions available at this time.
15. How is the sentiment score of an entity calculated?
Our sentiment score represents how confident the API is
about the sentiment type of the associated term. Score values close to
zero represent low confidence, while values close to -1 indicate that we
have high confidence the sentiment is negative and values close to 1
indicate that we have a high confidence that the sentiment is
positive. Exactly how this number is calculated is proprietary, but it
involves using both supervised and unsupervised learning to train neural
nets.
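One way an application might read the score is sketched below. The function name `describe_sentiment` and the `neutral_band` cutoff of 0.25 are assumptions chosen for illustration; the answer above does not define what counts as "close to zero", so the cutoff should be tuned to your own data.

```python
def describe_sentiment(score, neutral_band=0.25):
    """Map a sentiment score in [-1, 1] to a coarse label.

    neutral_band is a hypothetical cutoff: scores whose magnitude is
    below it are treated as too low-confidence to call either way.
    """
    if not -1.0 <= score <= 1.0:
        raise ValueError("score must be between -1 and 1")
    if abs(score) < neutral_band:
        return "low confidence"
    return "positive" if score > 0 else "negative"
```

For instance, a score of 0.8 would be labeled positive, -0.6 negative, and 0.1 low confidence under the default cutoff.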
16. How does sentiment analysis in entity extraction differ from
sentiment analysis in the News API?
Using the News API, developers can pull entity-level
sentiment analysis information, determined on a word-by-word
basis. Document-level sentiment analysis is a bit different, as it
looks at the entire text to determine if it is positive or negative.
17. Is the knowledge graph the same as the taxonomy feature?
No, but they are similar. Taxonomy looks at the entire text
and tries to fit it within a hierarchy. The knowledge graph works
behind the scenes to find a hierarchy for a single keyword or
entity.
18. Does AlchemyAPI provide any functionality to compare entities in
two separate documents and match text, based on similar, if not
the same, entities?
This is not currently an available function. However, this is
fairly easy to do on the application side once you have the results
from the two separate calls.
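A minimal sketch of that application-side comparison follows. It assumes each call's results have already been parsed into a list of dictionaries with `"type"` and `"text"` keys; that shape, the function names, and the use of Jaccard similarity as the match score are choices made for this example, not features of the API.

```python
def entity_keys(entities):
    """Reduce a list of entity dicts to a set of (type, lowercased text) pairs."""
    return {(e["type"], e["text"].casefold()) for e in entities}


def entity_overlap(doc_a_entities, doc_b_entities):
    """Return the shared entities and a Jaccard similarity score in [0, 1]."""
    a = entity_keys(doc_a_entities)
    b = entity_keys(doc_b_entities)
    shared = a & b
    union = a | b
    score = len(shared) / len(union) if union else 0.0
    return shared, score
```

Two documents can then be treated as a match when the score passes a threshold that suits your data. Matching on lowercased text is deliberately simple; it will not merge different surface forms of the same entity.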
19. What is the difference between keywords, entities, and concepts?
Entities are specific nouns and noun phrases (e.g. specific
companies/persons, such as “Douglas Adams”).
Keywords are general nouns and noun phrases (e.g. nonspecific terms, such as “Author”).
Concepts are identified from the text as a whole, instead of
by extracting individual pieces, even if the topic is not explicitly
stated.
For example, the phrase, “The CEO of Space X and Tesla Motors.”
Keywords: “CEO”
Entities: “Space X”, “Tesla Motors”, “CEO”
Concepts: “Elon Musk”
20. Is it possible to exclude extraction of certain entity types?
Using the structuredEntities parameter, you can eliminate
certain types of entities, including Twitter handles and hashtags. It
is also easy to filter entities after your data is returned, based on
type. Visit the docs for more information.
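Filtering after the data is returned could look like the sketch below. It assumes the response has been parsed into a list of dictionaries with a `"type"` key; the function name `exclude_types` and the type names in the usage note are illustrative, so check the docs for the exact type strings.

```python
def exclude_types(entities, excluded):
    """Return the entities whose type is not in the excluded collection.

    Type names are compared case-insensitively.
    """
    excluded_lower = {t.casefold() for t in excluded}
    return [e for e in entities if e["type"].casefold() not in excluded_lower]
```

For example, `exclude_types(entities, ["TwitterHandle", "Hashtag"])` would keep every entity except those two types, leaving the original list unchanged.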
21. Does AlchemyAPI identify entailment? For HTML, for instance,
JavaScript that makes elements visible.
Our system tries to identify and analyze the main text in a
document.
22. Is all data extracted only in JSON format, or does AlchemyAPI
provide graph visualizations to communicate the data?
AlchemyAPI provides structured text output, such as JSON or
XML. To obtain visual representations of the data, you will need to
construct those visualizations from the output that is returned. Take a
look at this recipe that showcases building visualizations using R.
23. Is there much in academic literature about AlchemyAPI and its
features? Do you have scientists and researchers who might
publish?
AlchemyAPI has not published any papers surrounding the
techniques used for our services; that information is kept
proprietary.
24. When the API references entities related to pronouns, how is the
distance of that reference determined? Is it immediately adjacent
sentences only?
The way in which our systems operate is kept
proprietary. Please see the answer to question 23.
25. We noticed that some location entities have geonames ids, and
others do not. Will you be adding geonames ids to more geo
entities in the future?
We do not have any current plans to update our geo entities
feature. Keep checking in for any updates that may become
available.