Xynn is our proprietary technology platform, which incorporates our own algorithms as well as cutting-edge artificial intelligence research from academia. It powers all the products on our website and also allows us to quickly generate bespoke solutions. In this post, I’d like to share a little about Xynn and how it provides a very powerful basis for building natural language solutions.
Most of what we do here at Lore involves understanding human-generated content, and Xynn is really at the core of what allows us to do that. We sometimes call it a “knowledge discovery engine,” but at its lowest level it’s a “text understanding engine” (you can’t discover knowledge until you’ve understood what you’ve read!).
Xynn isn’t a single entity but rather a platform that powers our products and customized solutions. It’s a collection of tools we’ve developed over the past year that work closely together to ingest, annotate, understand, and analyze text. To process a client’s document, whether a large multi-page PDF presentation or a single tweet, we first need to pull out and understand the relevant text (and possibly images) in the document and then store and index that information. This is Xynn’s job.
In very general terms, Xynn processes text in several steps:
Extract: This involves parsing the input document (PDF, Word file, HTML, etc.), pulling out the text, and splitting it into a series of paragraphs, sentences, and words. Documents can also have metadata associated with them (author, publication date, category, etc.), which is all pulled in as well.
Parse & Annotate: Xynn tries to break down the grammatical structure of each sentence, determining what role each word plays (Is it the subject or object? What verb is it connected with? etc.). This information is also used to identify interesting objects (or entities) in the text, including proper nouns (people, organizations, cities, etc.) and other useful categories (currencies, dates, time periods, etc.). By leveraging its understanding of grammar, Xynn can identify entities it has never seen before because of how they’re used in a sentence.
Link & Contextualize: The annotations above all work at the level of individual sentences, but in this next step we apply global intelligence. This happens in two stages. First, each sentence is compared to the rest of the document and to all related documents (e.g., from the same client). Second, the information in the text is compared to all the relevant structured information we have: a client’s database, taxonomy, or even information in public data sources (like Wikipedia). For instance, Xynn would recognize that the word “Lincoln” might be a reference to a US president or to a brand of cars, and it would figure out which is correct based on the other text in the document (or how the word is used in the sentence).
Sentiment & Themes: Xynn uses the language of the text to tag it with more “subjective” annotations, such as sentiment or tone (positive, negative, etc.). It also assigns a set of likely keywords and topics to each part of a document. These may be specific, important terms from the text itself, but they can also be pulled from a list provided by the customer. These annotations provide very useful metadata when analyzing or interacting with the text.
Index & Enrich: Finally, the output of this process is saved into our high-performance database so that it’s ready to be searched and analyzed. Each document is stored and indexed along with all the information extracted above. In many cases, we also enrich the document with additional information pulled in from public data sources or customer databases. For instance, if a document mentions several suppliers of one of our customers, Xynn would retrieve each supplier’s country, along with the fact that they are suppliers, and index that information with the document. This lets us very quickly retrieve every document that mentions a supplier from a given region (even if the region’s name never appears in the document).
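To make the enrichment idea concrete, here is a minimal sketch of the supplier example. It is not Xynn's actual code: the `SUPPLIERS` table, the document texts, and the function names are all hypothetical, and the simple substring match stands in for the entity recognition described in the earlier steps.

```python
from collections import defaultdict

# Hypothetical customer database: supplier name -> country and region.
SUPPLIERS = {
    "Acme Metals": {"country": "Germany", "region": "Europe"},
    "Kyoto Parts": {"country": "Japan", "region": "Asia"},
    "Lyon Textiles": {"country": "France", "region": "Europe"},
}


def enrich(doc_id, text):
    """Return one metadata record per known supplier mentioned in the text."""
    lowered = text.lower()
    return [
        {"doc_id": doc_id, "entity": name, "role": "supplier", **info}
        for name, info in SUPPLIERS.items()
        if name.lower() in lowered  # naive stand-in for entity recognition
    ]


def build_region_index(documents):
    """Map each region to the documents that mention a supplier based there."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for record in enrich(doc_id, text):
            index[record["region"]].add(doc_id)
    return index


# Hypothetical documents; none of them names a region explicitly.
documents = {
    "doc-1": "Shipment from Acme Metals was delayed by two weeks.",
    "doc-2": "Kyoto Parts raised prices on connectors.",
    "doc-3": "Quarterly review of Lyon Textiles and Acme Metals.",
}
region_index = build_region_index(documents)
print(sorted(region_index["Europe"]))  # ['doc-1', 'doc-3']
```

Because the region is attached at indexing time, the lookup at query time is a single dictionary access rather than a scan over the text.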
After processing, Xynn’s version of each document can be thought of as an extremely informative “onion,” with the original document at the core and successive layers building on each other to provide an ever-richer understanding of the document.
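One way to picture the “onion” is as the original text plus an ordered stack of annotation layers, each built from the ones beneath it. The sketch below is only an illustration under that assumption: the `LayeredDocument` class, the stage functions, and the toy word lists are hypothetical, and the capitalization and keyword heuristics are crude stand-ins for real parsing and sentiment models.

```python
import re
from dataclasses import dataclass, field

POSITIVE = {"great", "good"}   # toy sentiment lexicon (assumption)
NEGATIVE = {"bad", "delayed"}


@dataclass
class LayeredDocument:
    """The original text at the core, with annotation layers added in order."""
    text: str
    layers: dict = field(default_factory=dict)

    def add_layer(self, name, data):
        self.layers[name] = data
        return self


def extract(doc):
    # Naive split: break after ., ! or ? when followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", doc.text.strip()) if s]
    return doc.add_layer("sentences", sentences)


def annotate(doc):
    # Toy entity detection: capitalized words that do not start the sentence.
    entities = []
    for sentence in doc.layers["sentences"]:
        words = re.findall(r"[A-Za-z]+", sentence)
        entities.append([w for w in words[1:] if w[0].isupper()])
    return doc.add_layer("entities", entities)


def tag_sentiment(doc):
    # Toy sentiment: compare counts of positive and negative lexicon words.
    labels = []
    for sentence in doc.layers["sentences"]:
        words = set(re.findall(r"[a-z]+", sentence.lower()))
        score = len(words & POSITIVE) - len(words & NEGATIVE)
        labels.append("positive" if score > 0 else "negative" if score < 0 else "neutral")
    return doc.add_layer("sentiment", labels)


text = "The launch in Paris went great. Shipping to Berlin was delayed."
doc = tag_sentiment(annotate(extract(LayeredDocument(text))))
for name, layer in doc.layers.items():
    print(name, layer)
```

Each stage reads only from earlier layers and the original text stays untouched, which is what lets later stages be added or re-run independently.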
A final step is to aggregate and analyze the extracted information across all the documents in a given library. Xynn builds a database of all the entities, the relationships between them, the properties they have, the contexts in which they appear, and so forth. This Knowledge Graph is a centralized and structured version of the information contained in all the documents. A lot more can (and should!) be said about this, but we’ll leave that for a future post…
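As a rough illustration of what aggregating facts across documents can look like, the sketch below stores subject/relation/object triples together with the documents they came from. The `KnowledgeGraph` class, the relation names, and the sample facts are hypothetical and say nothing about how Xynn actually stores its graph.

```python
from collections import defaultdict


class KnowledgeGraph:
    """A tiny in-memory store of (subject, relation, object) facts."""

    def __init__(self):
        self.edges = defaultdict(set)     # (subject, relation) -> objects
        self.mentions = defaultdict(set)  # entity -> ids of documents mentioning it

    def add_fact(self, doc_id, subject, relation, obj):
        self.edges[(subject, relation)].add(obj)
        self.mentions[subject].add(doc_id)
        self.mentions[obj].add(doc_id)

    def objects(self, subject, relation):
        # .get avoids creating empty entries for unknown keys.
        return sorted(self.edges.get((subject, relation), set()))


# Hypothetical facts, as if extracted from three different documents.
graph = KnowledgeGraph()
graph.add_fact("doc-1", "Acme Metals", "located_in", "Germany")
graph.add_fact("doc-2", "Acme Metals", "supplies", "Orbit Motors")
graph.add_fact("doc-3", "Acme Metals", "supplies", "Nova Rail")

print(graph.objects("Acme Metals", "supplies"))  # ['Nova Rail', 'Orbit Motors']
print(sorted(graph.mentions["Acme Metals"]))     # ['doc-1', 'doc-2', 'doc-3']
```

No single document states everything known about the entity; the combined view only exists once the facts are merged.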
Once documents have been ingested and processed by Xynn, the real work begins! The intelligence captured in the document “onion” provides the basis for a wide range of interesting applications, such as:
- Understanding questions from your customers in a support email or chat.
- Deciphering the subject, tone, and relevance of a tweet about your product.
- Evaluating whether a new report should be classified as “high” or “low” risk.
- Finding exactly the right document in response to a complex query.
Of course, to perform these kinds of tasks (the real “knowledge discovery”), we have to turn to the “higher level” of intelligence in Xynn. This is a set of more bespoke tools that leverage the rich structure described above to achieve very specific goals.
For instance, we used Xynn to build Salient, our “AI-powered assistant” for auditors or analysts. Salient harnesses the underlying intelligence of Xynn to provide the following additional functionality:
Classification (Highlighters): Documents, or individual sentences within them, can be tagged with a set of labels provided by our customers through an interactive web interface. Xynn analyzes the labeled text to determine what characterizes a given label. Bolstered by its rich understanding of the document (the “onion”), it can very quickly achieve high labeling accuracy using only a few dozen examples. This allows an analyst or auditor to teach Salient to identify exactly the kinds of phrases or sentences they’re interested in and extract them from hundreds or thousands of documents almost instantly, a task that would otherwise take several days.
Information Extraction (Patterns): Because Xynn breaks sentences down into their grammatical constituents, it can also be used to extract structured information from unstructured text. For instance, using Xynn, we can generate an investor/company/amount table by scouring press releases and finding any mention of an investor investing in a company. The information in that text is then converted into a row in a table, allowing us to aggregate information from hundreds of documents and put it into an immediately actionable form.
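The investor/company/amount example can be sketched as follows. Note the substitution: Xynn works from the grammatical structure of each sentence, whereas this illustration uses a plain regular expression, which is far more brittle. The pattern, the function name, and the press-release sentences are all made up for the example.

```python
import re

# A capitalized name made of one or more capitalized words.
NAME = r"[A-Z][\w&]*(?: [A-Z][\w&]*)*"

# Regex stand-in for a grammatical pattern: <investor> invested $<amount> in <company>.
PATTERN = re.compile(
    rf"(?P<investor>{NAME}) (?:invested|invests) "
    rf"\$(?P<amount>[\d.]+) (?P<unit>million|billion) "
    rf"in (?P<company>{NAME})"
)
MULTIPLIER = {"million": 1_000_000, "billion": 1_000_000_000}


def extract_investments(texts):
    """Turn each matching sentence into an (investor, company, amount_usd) row."""
    rows = []
    for text in texts:
        for match in PATTERN.finditer(text):
            amount = int(float(match["amount"]) * MULTIPLIER[match["unit"]])
            rows.append((match["investor"], match["company"], amount))
    return rows


# Hypothetical press-release sentences.
press_releases = [
    "Harbor Capital invested $12 million in Finchly.",
    "Yesterday, Northgate Ventures invests $1.5 billion in Blue Heron Labs, sources say.",
    "The board met on Tuesday.",
]
for row in extract_investments(press_releases):
    print(row)
```

A parse-based pattern would also catch phrasings such as passives or reordered clauses that this regex misses, which is the advantage of working from grammatical constituents.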
Of course, there are many more things that can be done with Xynn. It’s a very flexible, scalable, and performant system. Running on a single modern computer, it can process more than 1,000 sentences per second, and it’s also fully cluster-native, making it easy to run in parallel for high-volume workloads. Using Xynn, we process tens of thousands of news articles and corporate filings every day.
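For a feel of what that throughput means in practice, here is a back-of-the-envelope calculation. Only the 1,000 sentences per second figure comes from this post; the daily article count, the average article length, and the assumption of perfectly linear scaling across machines are illustrative guesses.

```python
SENTENCES_PER_SECOND = 1000   # lower bound quoted above, per machine
ARTICLES_PER_DAY = 50_000     # assumption: one reading of "tens of thousands"
SENTENCES_PER_ARTICLE = 40    # assumption: average document length


def processing_minutes(articles, sentences_per_article, rate, machines=1):
    """Minutes needed to process a batch, assuming perfectly linear scaling."""
    total_sentences = articles * sentences_per_article
    return total_sentences / (rate * machines) / 60


single = processing_minutes(ARTICLES_PER_DAY, SENTENCES_PER_ARTICLE, SENTENCES_PER_SECOND)
four = processing_minutes(ARTICLES_PER_DAY, SENTENCES_PER_ARTICLE, SENTENCES_PER_SECOND, machines=4)
print(f"1 machine:  {single:.1f} minutes")  # 33.3 minutes
print(f"4 machines: {four:.1f} minutes")    # 8.3 minutes
```

Under these assumptions, a full day of documents fits in roughly half an hour on one machine, and since the quoted rate is a lower bound, the real figure would be no worse than that.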