This document was produced in the context of Innoradiant’s effort to provide people engaged in Covid-19 research with a text mining platform to explore relevant scientific papers. In this document we describe a prototypical exploration of the AsI-Health document base.
In this section we propose an example of how explorations and text mining could be performed in AsI-Health. This is just an example, which uses free-text search to restrict the initial amount of data. It is by no means a binding approach: it is perfectly plausible to start with some constraints on sentences and refine them with constraints on relations, or even to start directly from relations.
In most cases, a search for information will start with a full-text query. Imagine we are interested in the role played by proteins. We can type the word protein as a query:
All information on the page is updated accordingly, and data is gathered only from sentences containing that word. For instance, the tag clouds of concepts and medical terms (Named Entities) are updated to show all concepts and medical terms present in sentences containing the word protein:
For more information on how to formulate different kinds of queries, please consult section 2.1.
Suppose that among these tags I see something interesting, for instance the term binding:
By clicking on that word, I select information only from sentences containing both the word protein and the concept “binding”:
There is no limit to this selection process: I can keep adding constraints to restrict the amount of information “under study”. In this specific case, the top-left widget summarizes the information:
These numbers must be interpreted as follows: “with the current constraints active, AsI-Health displays information from 120 sentences, which are distributed across 72 distinct documents and contain 336 relations.”
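To make the three counters concrete, here is a minimal Python sketch of how they relate to each other when several constraints are active at once. It assumes a hypothetical in-memory record per sentence (`Sentence`, holding a document id, the sentence text, its concepts and its extracted relations) and hypothetical helpers `has_word`, `has_concept` and `summarize`; none of these belong to AsI-Health, whose internals are not described here. Word matching is assumed to be case-insensitive on whole words.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Set, Tuple


# Hypothetical sentence record, for illustration only
@dataclass
class Sentence:
    doc_id: str
    text: str
    concepts: Set[str] = field(default_factory=set)
    # (subject, predicate, object); an unidentified argument is None
    relations: List[Tuple[Optional[str], str, Optional[str]]] = field(default_factory=list)


Constraint = Callable[[Sentence], bool]


def has_word(word: str) -> Constraint:
    # Case-insensitive whole-word match (an assumption, not documented behaviour)
    return lambda s: word.lower() in re.findall(r"\w+", s.text.lower())


def has_concept(concept: str) -> Constraint:
    return lambda s: concept in s.concepts


def summarize(sentences: List[Sentence], constraints: List[Constraint]) -> Tuple[int, int, int]:
    """Count sentences, distinct documents and relations satisfying every constraint."""
    selected = [s for s in sentences if all(c(s) for c in constraints)]
    n_documents = len({s.doc_id for s in selected})
    n_relations = sum(len(s.relations) for s in selected)
    return len(selected), n_documents, n_relations


corpus = [
    Sentence("doc1", "The spike protein shows strong binding to ACE2.", {"binding"},
             [("spike protein", "show", "strong binding")]),
    Sentence("doc1", "Protein binding was measured.", {"binding"},
             [(None, "measure", "protein binding")]),
    Sentence("doc2", "This protein is stable.", set(),
             [("protein", "be", "stable")]),
]

counts = summarize(corpus, [has_word("protein"), has_concept("binding")])
print(counts)  # (2, 1, 2)
```

In this sketch a document is counted once however many of its sentences match, while relations are summed over all selected sentences.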
Looking at the Relations section (see our document “General Principles” to learn more about the nature of relations), we get a more precise picture of the situation:
By clicking, for instance, on binding in the object cloud, you restrict your selection to relations where the word binding appears as an object (as shown, for example, in the table to the right of the tag clouds):
The numbers at the top of the page must now be interpreted as follows:
There are 905 relations whose object is binding and which are contained in sentences that contain the word protein and the concept binding. These relations are distributed across 511 sentences, which are in turn contained in 212 papers.
The tables in the Relations section are to be interpreted as follows:
- The table to the right of the tag clouds shows only full relations, i.e. relations where both a subject and an object have been identified:
- The leftmost table in the Partial Relation section shows all predicate-object pairs where binding is the object, regardless of whether a subject was found (in this case it is useful only for looking at the aggregations of predicates).
- The rightmost table shows all predicate-subject pairs, regardless of whether an object was identified. Since in this specific case we imposed the constraint that the object must be “binding” (so it is necessarily present), this partial table is identical to the table of full triples.
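The difference between the three tables can be sketched in Python on a plain list of (subject, predicate, object) triples, where `None` marks an argument that was not identified. The triple representation and the function names are assumptions made for illustration, not the platform's data model. The last part applies the constraint used above (object literally equal to “binding”) and shows why, in that case, the predicate-subject pairs line up one-to-one with the full relations.

```python
from typing import List, Optional, Tuple

# (subject, predicate, object); None marks an argument the parser did not identify
Triple = Tuple[Optional[str], str, Optional[str]]


def full_relations(triples: List[Triple]) -> List[Triple]:
    """Relations where both a subject and an object were identified."""
    return [t for t in triples if t[0] is not None and t[2] is not None]


def predicate_object_pairs(triples: List[Triple]) -> List[Tuple[str, str]]:
    """Predicate-object pairs, whether or not a subject was found."""
    return [(p, o) for s, p, o in triples if o is not None]


def predicate_subject_pairs(triples: List[Triple]) -> List[Tuple[str, str]]:
    """Predicate-subject pairs, whether or not an object was found."""
    return [(p, s) for s, p, o in triples if s is not None]


triples: List[Triple] = [
    ("spike protein", "show", "binding"),
    (None, "inhibit", "binding"),
    ("antibody", "block", "binding"),
    ("protein", "be", None),
]

# Constraint from the walkthrough: the object must be literally "binding"
with_object = [t for t in triples if t[2] == "binding"]

print(full_relations(with_object))           # full triples only
print(predicate_object_pairs(with_object))   # includes the subject-less triple
print(predicate_subject_pairs(with_object))  # same rows as the full triples
```

Without the object constraint, the predicate-subject list would also contain the pair coming from the triple whose object is missing, and the two tables would no longer coincide.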
If a table contains more than 20 rows, you can scroll down to access the pagination controls:
The search on relations we have conducted so far is quite “strict”, in the sense that we selected only relations whose object is literally the word binding. For instance, we did not include relations where the object is “efficient binding”, “binding to human ace2”, etc. This is why we have only 905 relations:
To get a wider view, we must go to the Nominals section (please refer to the “General Principles” document for a conceptual explanation of nominals). From the lemma table we select the lemma “binding” (Filter for value):
We see that nothing changes in our data, which is as expected: the constraint on the literal word binding as object is still active. We can move the mouse over this “stronger” constraint and deselect it (or even delete it, see section 2.2):
(We could also have deselected binding as a concept, but in this case it would have made no difference, since sentences containing the word as an object also contain it as a concept.)
Now the situation looks different. First, we now have 2524 relations, all having the word binding as their head:
Second, if we look at any of the three relation tables, we see much more varied objects, all having binding as their head lemma:
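The difference between the strict selection and the head-lemma selection can be sketched as follows. The sketch assumes that each nominal object carries both its surface text and the lemma of its syntactic head, as an upstream parser would supply; the `Nominal` class and both functions are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Nominal:
    text: str        # full nominal phrase as it appears in the sentence
    head_lemma: str  # lemma of the syntactic head (assumed to come from the parser)


def strict_match(objects: List[Nominal], word: str) -> List[Nominal]:
    """Object must be literally the given word (the first, 'strict' search)."""
    return [o for o in objects if o.text == word]


def head_match(objects: List[Nominal], lemma: str) -> List[Nominal]:
    """Object's head lemma must match, whatever modifiers surround it."""
    return [o for o in objects if o.head_lemma == lemma]


objects = [
    Nominal("binding", "binding"),
    Nominal("efficient binding", "binding"),
    Nominal("binding to human ace2", "binding"),
    Nominal("binding site", "site"),  # head is "site": binding only modifies it
]

print([o.text for o in strict_match(objects, "binding")])  # ['binding']
print([o.text for o in head_match(objects, "binding")])
# ['binding', 'efficient binding', 'binding to human ace2']
```

In this sketch “binding site” is not selected by the head-lemma filter, because its head is “site”; the filter widens the selection only to phrases actually headed by binding.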
At any moment you can scroll down the page to see the sentences you are currently focusing on:
Note that documentText has to be interpreted as sentenceText. Moreover, for the time being, highlighting only works for free-text search results.
By clicking on View doc, the original document will pop up in a new window (assuming it is still available and freely accessible on the publisher’s site).
Here we provide a more detailed description of some of the widgets used in the AsI-Health Graphical User Interface.
By default, search terms are combined with OR: typing “protein erythrocyte” returns all sentences containing either word, ranked by relevance (e.g. sentences where the searched words are closer together appear first). To run a strict AND search you must use the AND operator explicitly, e.g. “protein AND erythrocyte”. To search for a phrase, i.e. sentences containing adjacent words, you must use quotes as in:
To search for sentences containing the word protein but not ace2, type:
protein NOT ace2
Parentheses can be used to combine Boolean operators, as in:
protein NOT (ace2 OR sars) AND cell
which means: “search for all sentences containing the words protein and cell but neither ace2 nor sars.”
Please note that, for the time being, words are not indexed in their lemmatized form, so for a complete search you may have to include both “protein” and “proteins”.
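The matching semantics described above (OR, AND, NOT, parentheses, quoted phrases, no lemmatization) can be sketched as a small evaluator over a query tree. This is not AsI-Health's query engine, and it does not parse query strings: the query is written directly as a tree of hypothetical `Term`, `Phrase`, `And`, `Or` and `Not` nodes, with a binary `Not(keep, exclude)` standing for “keep NOT exclude”. It only decides which sentences match and does not attempt relevance ranking. Tokens are lower-cased words, and no lemmatization is applied.

```python
import re
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass(frozen=True)
class Term:
    word: str


@dataclass(frozen=True)
class Phrase:
    words: Tuple[str, ...]  # must appear as adjacent tokens, in order


@dataclass(frozen=True)
class And:
    left: "Query"
    right: "Query"


@dataclass(frozen=True)
class Or:
    left: "Query"
    right: "Query"


@dataclass(frozen=True)
class Not:
    keep: "Query"     # sentences must match this...
    exclude: "Query"  # ...and must not match this ("keep NOT exclude")


Query = Union[Term, Phrase, And, Or, Not]


def tokens(sentence: str) -> List[str]:
    # Surface forms only: no lemmatization, so "proteins" does not match "protein"
    return re.findall(r"\w+", sentence.lower())


def matches(q: Query, toks: List[str]) -> bool:
    if isinstance(q, Term):
        return q.word in toks
    if isinstance(q, Phrase):
        n = len(q.words)
        return any(tuple(toks[i:i + n]) == q.words for i in range(len(toks) - n + 1))
    if isinstance(q, And):
        return matches(q.left, toks) and matches(q.right, toks)
    if isinstance(q, Or):
        return matches(q.left, toks) or matches(q.right, toks)
    if isinstance(q, Not):
        return matches(q.keep, toks) and not matches(q.exclude, toks)
    raise TypeError(f"unknown query node: {q!r}")


def search(q: Query, sentences: List[str]) -> List[str]:
    return [s for s in sentences if matches(q, tokens(s))]


# protein NOT (ace2 OR sars) AND cell, with the meaning given in the text:
# sentences containing protein and cell but neither ace2 nor sars
query = Not(And(Term("protein"), Term("cell")), Or(Term("ace2"), Term("sars")))

sentences = [
    "The protein enters the cell.",
    "The protein binds ACE2 on the cell surface.",
    "Proteins accumulate in the cell.",
    "The protein is stable.",
]
print(search(query, sentences))  # ['The protein enters the cell.']
```

The third sentence is rejected only because it contains “Proteins” rather than “protein”, which is exactly why, without lemmatized indexing, both forms may need to appear in a query.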
Constraints (with the exception of free-text search) appear in a scrolling window, as in:
When hovering the mouse over a constraint, several actions appear:
- Disable the constraint temporarily
- Pin/unpin the constraint (advanced usage)
- Turn the constraint into a negative one (e.g. select all sentences not containing the concept Wuhan)
- Delete the constraint
- Edit the constraint manually (advanced usage)
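The effect of disabling, negating and deleting a constraint can be sketched as follows, assuming that all enabled constraints are combined with AND, as in the walkthrough above. The `Constraint` class, its `enabled` and `negated` flags and `apply_constraints` are hypothetical names; pinning and manual editing are advanced features whose behaviour is not described here, so they are left out.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Sentence = Dict[str, object]


@dataclass
class Constraint:
    name: str
    predicate: Callable[[Sentence], bool]
    enabled: bool = True   # "disable temporarily" switches this off
    negated: bool = False  # "set to negative" inverts the test

    def accepts(self, sentence: Sentence) -> bool:
        result = self.predicate(sentence)
        return not result if self.negated else result


def apply_constraints(constraints: List[Constraint], sentences: List[Sentence]) -> List[Sentence]:
    """Keep sentences satisfying every enabled constraint; disabled ones are ignored."""
    active = [c for c in constraints if c.enabled]
    return [s for s in sentences if all(c.accepts(s) for c in active)]


def has_concept(concept: str) -> Callable[[Sentence], bool]:
    return lambda s: concept in s["concepts"]


sentences: List[Sentence] = [
    {"id": 1, "concepts": {"Wuhan", "binding"}},
    {"id": 2, "concepts": {"binding"}},
    {"id": 3, "concepts": set()},
]
binding = Constraint("concept binding", has_concept("binding"))
wuhan = Constraint("concept Wuhan", has_concept("Wuhan"))
constraints = [binding, wuhan]


def ids() -> List[object]:
    return [s["id"] for s in apply_constraints(constraints, sentences)]


print(ids())               # [1]
wuhan.negated = True       # sentences NOT containing the concept Wuhan
print(ids())               # [2]
wuhan.enabled = False      # temporarily disabled: only "binding" applies
print(ids())               # [1, 2]
constraints.remove(wuhan)  # deleted outright
print(ids())               # [1, 2]
```

Disabling and deleting give the same selection here; the practical difference in the sketch is that a disabled constraint keeps its settings (including negation) and can be switched back on.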
Some global actions (i.e. affecting all filters/constraints) are available as well:
AsI-Health offers the possibility to select a specific time interval. Formally, a temporal selection works in the same way as any other constraint. On histogram-type diagrams, a temporal selection is made by selecting an area of the diagram. For instance:
On mouse release, the selection gives:
That is, our set of sentences is now restricted to those that appeared between April 20th 2020 and May 18th 2020. Note that temporal constraints are stickier than standard constraints. First, they do not appear in the standard constraint area, but in the area circled in red in the previous diagram. Second, they cannot be disabled, only changed. If you click on the selected interval, you will get options for the selection.
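A minimal sketch of a temporal constraint that, like the one described above, can be changed but not disabled. It assumes each sentence carries a publication date and that both interval bounds are inclusive; neither detail is documented here. `TemporalConstraint` and its `change` method are hypothetical names.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Tuple


@dataclass
class TemporalConstraint:
    """Hypothetical 'sticky' time filter: it has no enabled flag, only change()."""
    start: date
    end: date

    def __post_init__(self) -> None:
        self.change(self.start, self.end)  # validate the initial interval

    def change(self, start: date, end: date) -> None:
        if start > end:
            raise ValueError("interval start must not be after its end")
        self.start, self.end = start, end

    def accepts(self, published: date) -> bool:
        # Both bounds treated as inclusive (an assumption)
        return self.start <= published <= self.end


sentences: List[Tuple[str, date]] = [
    ("s1", date(2020, 4, 20)),
    ("s2", date(2020, 5, 1)),
    ("s3", date(2020, 5, 19)),
    ("s4", date(2020, 3, 2)),
]

window = TemporalConstraint(date(2020, 4, 20), date(2020, 5, 18))
print([sid for sid, d in sentences if window.accepts(d)])  # ['s1', 's2']

window.change(date(2020, 3, 1), date(2020, 4, 30))  # changed, never disabled
print([sid for sid, d in sentences if window.accepts(d)])  # ['s1', 's4']
```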
In the panel that opens when you click on the selected interval, you will find different ways to obtain the desired time selection precisely: