Moscow, Russia

ABBYY is an international software company that provides optical character recognition, document capture and language software for both PC and mobile devices. The majority of ABBYY products, such as ABBYY FineReader, are intended to simplify converting paper documents to digital data. ABBYY also provides language products and services. Wikipedia.



An algorithm for assigning priorities to tasks queued for processing by users, based on how heavily each task's user has used system resources in the past: the number of tasks the user queued previously, the volume of those tasks, and the amount of processor time consumed. In the OCR context, the tasks are graphic files placed on servers and selected for processing according to the assigned priorities.
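A minimal sketch of usage-based prioritization, assuming the three history factors are combined as a weighted sum (the abstract does not say how they are combined; `UserHistory`, `WEIGHTS`, `usage_score` and `processing_order` are hypothetical names). Users with lighter past usage are served first, and arrival order breaks ties.

```python
import heapq
from dataclasses import dataclass


@dataclass
class UserHistory:
    tasks_queued: int   # number of tasks the user queued in the past
    volume_mb: float    # total volume of those tasks (unit assumed: MB)
    cpu_seconds: float  # processor time the user's past tasks consumed


# Hypothetical weights: the abstract does not specify how the factors combine.
WEIGHTS = (1.0, 0.1, 0.01)


def usage_score(h: UserHistory) -> float:
    """Weighted past resource usage; a higher score means heavier past use."""
    w_tasks, w_volume, w_cpu = WEIGHTS
    return w_tasks * h.tasks_queued + w_volume * h.volume_mb + w_cpu * h.cpu_seconds


def processing_order(queue, histories):
    """Return queued file names in processing order, lightest past users first.

    queue: list of (user, filename) pairs in arrival order.
    histories: dict mapping user -> UserHistory.
    """
    heap = []
    for arrival, (user, filename) in enumerate(queue):
        # The arrival index breaks ties, so equal scores keep FIFO order.
        heapq.heappush(heap, (usage_score(histories[user]), arrival, filename))
    return [heapq.heappop(heap)[2] for _ in range(len(heap))]
```

For example, a user who has queued 2 small files gets a score of 3.3 under these weights, while a heavy user scores 136, so the light user's file is processed first even though it arrived later.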


Patent
Abbyy | Date: 2016-08-04

Disclosed are systems, computer-readable mediums, and methods for detecting glare in a frame of image data. A frame of image data is preprocessed. A set of connected components in the preprocessed frame is determined. A set of statistics is calculated for one or more connected components in the set of connected components. Using the calculated statistics, a decision is made for each of the one or more connected components on whether it is a light spot over text. Finally, whether glare is present in the frame is determined.
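The pipeline above can be sketched on a grayscale frame stored as a list of rows of 0..255 values. This is an illustration under stated assumptions, not the patented method: preprocessing is assumed to be a simple brightness threshold, the statistics are assumed to be component area plus the share of dark (text-like) pixels in a ring around the component's bounding box, and all thresholds are hypothetical.

```python
from collections import deque


def bright_mask(img, threshold=240):
    """Assumed preprocessing: mark near-saturated pixels."""
    return [[px >= threshold for px in row] for row in img]


def connected_components(mask):
    """4-connected components of True pixels, each a list of (row, col)."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    comps = []
    for r in range(h):
        for c in range(w):
            if mask[r][c] and not seen[r][c]:
                seen[r][c] = True
                comp, q = [], deque([(r, c)])
                while q:  # breadth-first flood fill
                    y, x = q.popleft()
                    comp.append((y, x))
                    for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                        if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            q.append((ny, nx))
                comps.append(comp)
    return comps


def component_stats(img, comp, margin=2, dark=80):
    """Area, plus the share of dark pixels in a ring around the bounding box."""
    h, w = len(img), len(img[0])
    rows = [p[0] for p in comp]
    cols = [p[1] for p in comp]
    r0, r1, c0, c1 = min(rows), max(rows), min(cols), max(cols)
    ring = [img[y][x]
            for y in range(max(0, r0 - margin), min(h, r1 + margin + 1))
            for x in range(max(0, c0 - margin), min(w, c1 + margin + 1))
            if not (r0 <= y <= r1 and c0 <= x <= c1)]
    dark_frac = sum(px < dark for px in ring) / len(ring) if ring else 0.0
    return {"area": len(comp), "dark_ring_fraction": dark_frac}


def has_glare(img, min_area=4, min_dark_frac=0.2):
    """Glare if any sizeable bright spot is surrounded by text-like pixels."""
    for comp in connected_components(bright_mask(img)):
        s = component_stats(img, comp)
        if s["area"] >= min_area and s["dark_ring_fraction"] >= min_dark_frac:
            return True
    return False
```

The ring test is what distinguishes "a light spot over text" from an ordinary blank page: a fully white frame yields one large component with no dark surroundings and is not flagged.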


Disclosed are methods, systems, and computer-readable mediums for automatic training of a syntactic and semantic parser using a genetic algorithm. An initial population is created, where the initial population comprises a vector of parameters for elements of syntactic and semantic descriptions of a source sentence. A natural language compiler (NLC) system is used to translate the sentence from the source language into a target language based on the syntactic and semantic descriptions of the source sentence. A vector of quality ratings is generated, in which each quality rating corresponds to a parameter in the vector of parameters. The quality ratings are evaluated according to specific criteria, such as a BLEU score and the number of emergency sentences. A number of parameters in the vector of parameters are then replaced with adjusted parameters.
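A generic genetic-algorithm loop of the kind described can be sketched as follows. The real fitness (translating with the NLC system and scoring BLEU and emergency sentences) is not available here, so `evaluate` is a hypothetical stand-in that rewards closeness to a target vector; the population of parameter vectors, elitist selection, uniform crossover and Gaussian mutation are standard GA choices assumed for illustration.

```python
import random


def evaluate(params, target):
    """Hypothetical stand-in for 'translate with the NLC system and rate it'.

    Higher is better; here it is the negated squared distance to a target.
    """
    return -sum((p - t) ** 2 for p, t in zip(params, target))


def evolve(target, pop_size=20, generations=200, n_replace=5, sigma=0.1, seed=0):
    rng = random.Random(seed)
    dim = len(target)
    # Initial population: random parameter vectors.
    population = [[rng.uniform(0.0, 1.0) for _ in range(dim)] for _ in range(pop_size)]
    for _ in range(generations):
        # Vector of quality ratings, one per candidate.
        ratings = [evaluate(p, target) for p in population]
        order = sorted(range(pop_size), key=lambda i: ratings[i], reverse=True)
        survivors = [population[i] for i in order[: pop_size - n_replace]]
        # Replace the weakest candidates with adjusted ones:
        # uniform crossover of two survivors plus Gaussian mutation.
        children = []
        for _ in range(n_replace):
            a, b = rng.sample(survivors, 2)
            child = [(x if rng.random() < 0.5 else y) + rng.gauss(0.0, sigma)
                     for x, y in zip(a, b)]
            children.append(child)
        population = survivors + children
    return max(population, key=lambda p: evaluate(p, target))
```

Because the best candidates always survive, the best rating never decreases from one generation to the next; in a real parser-tuning setting each `evaluate` call would be an expensive translation run, which is why small populations are typical.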


The invention describes a system and method for creating a comparable corpus by obtaining a set of source documents containing text; constructing language-independent semantic structures for at least one sentence of each text in the source documents; determining universal similarity measures for groups of the source documents by comparing the constructed language-independent semantic structures; identifying sets of similar documents based on the determined universal similarity measures; and creating the comparable corpus from the identified sets of similar documents.
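As a rough sketch, assume each document has already been reduced to a set of language-independent concept identifiers (a hypothetical simplification of the semantic structures). Jaccard similarity then stands in for the universal similarity measure, and union-find groups documents whose pairwise similarity passes a threshold. Neither choice comes from the abstract.

```python
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """Share of concepts two documents have in common."""
    return len(a & b) / len(a | b) if a | b else 0.0


def comparable_corpus(docs: dict, threshold: float = 0.5):
    """docs: doc id -> set of (hypothetical) language-independent concept ids.

    Returns a list of sets of similar documents (singletons are dropped).
    """
    parent = {d: d for d in docs}

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(docs, 2):
        if jaccard(docs[a], docs[b]) >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for d in docs:
        groups.setdefault(find(d), set()).add(d)
    return [g for g in groups.values() if len(g) > 1]
```

Since the concept identifiers are language-independent, an English and a Russian text about the same topic can land in the same set, which is what makes the resulting corpus "comparable".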


Patent
Abbyy | Date: 2015-03-26

A method and system are provided for facilitating a semantic search based on one or more corpuses of natural language texts and presenting clustered results. One or more corpuses of natural language texts are received, including indexed linguistic parameters and semantic structures of lexical units. The linguistic parameters and semantic structures are generated during a preliminary syntactico-semantic analysis. The one or more corpuses are searched for text fragments satisfying a query, and the relevance of the search results is estimated according to the selected lexical meaning.
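A toy version of meaning-aware search with clustered output is sketched below. It assumes each fragment already carries (lemma, meaning) annotations from a prior analysis (hypothetical data, not ABBYY's index format), scores relevance by how often the query lemma occurs with the selected meaning, and clusters results by source document as a simple stand-in for the clustering the abstract mentions.

```python
from collections import defaultdict


def semantic_search(fragments, lemma, meaning):
    """fragments: list of (doc_id, text, units), where units is a list of
    (lemma, meaning) pairs assumed to come from a prior semantic analysis.

    Returns [(doc_id, [fragment texts, most relevant first]), ...] with the
    clusters ordered by their best fragment's relevance.
    """
    clusters = defaultdict(list)
    for doc_id, text, units in fragments:
        # Relevance: occurrences of the lemma in the *selected* meaning only.
        hits = sum(1 for unit in units if unit == (lemma, meaning))
        if hits:
            clusters[doc_id].append((hits, text))

    result = []
    for doc_id in sorted(clusters, key=lambda d: max(h for h, _ in clusters[d]), reverse=True):
        frags = sorted(clusters[doc_id], key=lambda ht: ht[0], reverse=True)
        result.append((doc_id, [text for _, text in frags]))
    return result
```

Filtering on the meaning rather than the surface word is the point: a query for "bank" as a financial institution skips fragments where "bank" means a riverside.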


Patent
Abbyy | Date: 2016-06-26

Disclosed are systems, computer-readable mediums, and methods for determining that text contains Chinese, Japanese, or Korean characters. One method includes determining a language hypothesis for each text fragment in a plurality of text fragments identified from connected components in a document image. The method further includes selecting a first subset of text fragments from the plurality of text fragments based on ratings for the language hypothesis of each text fragment in the plurality of text fragments. The method further includes verifying, by a processor, the language hypothesis of one or more text fragments in the first subset of text fragments based on optical character recognition of the one or more text fragments. The method further includes determining, by the processor, that Chinese, Japanese, or Korean (CJK) characters are present in the document image based on the verification of the language hypothesis of each of the one or more text fragments.
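The hypothesis-then-verify flow can be sketched as follows. The OCR engine is replaced by a hypothetical callable `ocr(fragment_id, language)`, the first subset is assumed to be the top-k CJK hypotheses by rating, and verification is assumed to mean that most recognized characters fall in the main CJK Unicode blocks (CJK Unified Ideographs, Hiragana/Katakana, Hangul Syllables). The thresholds are illustrative only.

```python
def is_cjk_char(ch: str) -> bool:
    cp = ord(ch)
    return (0x4E00 <= cp <= 0x9FFF       # CJK Unified Ideographs
            or 0x3040 <= cp <= 0x30FF    # Hiragana and Katakana
            or 0xAC00 <= cp <= 0xD7AF)   # Hangul Syllables


def detect_cjk(fragments, ocr, top_k=3, min_share=0.5):
    """fragments: list of (fragment_id, language_hypothesis, rating).

    ocr: hypothetical callable (fragment_id, language) -> recognized text.
    Returns True if at least one highly rated CJK hypothesis is confirmed.
    """
    cjk = [f for f in fragments if f[1] in {"zh", "ja", "ko"}]
    # First subset: the best-rated CJK hypotheses.
    subset = sorted(cjk, key=lambda f: f[2], reverse=True)[:top_k]
    for frag_id, lang, _ in subset:
        text = ocr(frag_id, lang)
        chars = [c for c in text if not c.isspace()]
        # Verification: most recognized characters must actually be CJK.
        if chars and sum(map(is_cjk_char, chars)) / len(chars) >= min_share:
            return True
    return False
```

Running OCR only on a small, highly rated subset keeps the expensive verification step cheap compared with recognizing every fragment in every candidate language.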


There is disclosed a method, executable by an electronic device, of determining a document type associated with a digital document. A processor of the electronic device is configured to execute a plurality of machine learning algorithm (MLA) classifiers, each trained to identify a specific document type. The MLA classifiers are ranked in a hierarchical order of execution. A method of training the plurality of MLA classifiers is also disclosed.
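Executing ranked per-type classifiers can be sketched as a cascade: each classifier returns a confidence for its own document type, and the first one in the hierarchy that clears a threshold decides. The keyword "classifiers" in the usage example are hypothetical placeholders for trained MLAs; training is not shown.

```python
from typing import Callable, Optional, Sequence, Tuple

# A classifier returns its confidence that the document is of its type.
Classifier = Callable[[str], float]


def classify_document(document: str,
                      ranked: Sequence[Tuple[str, Classifier]],
                      threshold: float = 0.5) -> Optional[str]:
    """Run classifiers in hierarchical order; the first confident one wins."""
    for doc_type, clf in ranked:
        if clf(document) >= threshold:
            return doc_type
    return None  # no classifier recognized the document


# Hypothetical stand-ins for trained MLA classifiers.
def invoice_clf(doc: str) -> float:
    return 1.0 if "invoice" in doc.lower() else 0.0


def contract_clf(doc: str) -> float:
    return 1.0 if "contract" in doc.lower() else 0.0


RANKED = [("invoice", invoice_clf), ("contract", contract_clf)]
```

With this ranking, a document mentioning both an invoice and a contract is labeled "invoice", which shows why the order of execution matters.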


Patent
Abbyy | Date: 2015-09-16

The current document is directed to an electronic community-based translation service that includes a distributed computer system, electronic communications media and infrastructure, multiple user processor-controlled devices, and control components that control the distributed computer system, communications infrastructure, and processor-controlled user devices to provide an electronic community in which users can view and access translations, search for translations, and provide translations. Because the translation service is community-based, users accessing translations are able to determine various characteristics associated with community-provided translations and those who provide them. In addition, users can follow community use of translations, including determining who has used community-provided translations, shared community-provided translations with others, and rated translations.
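A minimal in-memory data model for such a service might look like the sketch below. It is hypothetical throughout (class and method names are assumptions, and the distributed infrastructure is omitted); it only shows how a translation can record its provider, ratings, users who used it and users who shared it, so that searches can surface the best-rated community translations.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Translation:
    source: str
    target: str
    provider: str                                # community member who supplied it
    ratings: dict = field(default_factory=dict)  # user -> score (1..5 assumed)
    used_by: set = field(default_factory=set)    # users who used this translation
    shared_by: set = field(default_factory=set)  # users who shared it with others

    def rate(self, user, score):
        self.ratings[user] = score  # one rating per user; re-rating overwrites

    def use(self, user):
        self.used_by.add(user)

    def share(self, user):
        self.shared_by.add(user)

    def average_rating(self):
        return sum(self.ratings.values()) / len(self.ratings) if self.ratings else None


class TranslationCommunity:
    def __init__(self):
        self._by_source = defaultdict(list)  # lowercased source -> translations

    def provide(self, source, target, provider):
        t = Translation(source, target, provider)
        self._by_source[source.lower()].append(t)
        return t

    def search(self, source):
        """Translations of `source`, best average rating first (unrated last)."""
        found = self._by_source.get(source.lower(), [])
        return sorted(found, key=lambda t: t.average_rating() or 0.0, reverse=True)
```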


Systems and methods for creating ontologies by analyzing natural language texts. An example method comprises: receiving a plurality of semantic structures associated with a text corpus; identifying a first semantic structure and a second semantic structure, wherein the first semantic structure comprises a first substructure and a second substructure, wherein the second semantic structure comprises a third substructure and a fourth substructure, and wherein the first substructure is similar to the third substructure in view of a first similarity criterion; and responsive to determining that the second substructure is similar to the fourth substructure in view of a second similarity criterion, associating, with a certain concept of an ontology associated with the text corpus, objects represented by the second substructure and the fourth substructure.
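The matching rule in the example method can be sketched with semantic structures reduced to a pair of substructures (for instance, a predicate and its object). Both similarity criteria below are hypothetical placeholders: the first requires identical labels and the second requires the same semantic class. When both hold, the objects of the second substructures are associated with one ontology concept.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Substructure:
    label: str           # word or predicate the substructure represents
    semantic_class: str  # hypothetical coarse semantic class


@dataclass(frozen=True)
class SemanticStructure:
    first: Substructure   # e.g. the predicate
    second: Substructure  # e.g. its object


def first_criterion(a: Substructure, b: Substructure) -> bool:
    # Hypothetical first similarity criterion: identical labels.
    return a.label == b.label


def second_criterion(a: Substructure, b: Substructure) -> bool:
    # Hypothetical second similarity criterion: same semantic class.
    return a.semantic_class == b.semantic_class


def build_ontology(structures):
    """Map concept name -> set of objects associated with that concept."""
    ontology = {}
    for s1, s2 in combinations(structures, 2):
        if first_criterion(s1.first, s2.first) and second_criterion(s1.second, s2.second):
            concept = f"{s1.first.label}:{s1.second.semantic_class}"
            ontology.setdefault(concept, set()).update({s1.second.label, s2.second.label})
    return ontology
```

For "drink tea", "drink coffee" and "drink hemlock", only tea and coffee share both the predicate and a semantic class, so they alone are grouped under one concept.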


Systems and methods for classifying document images using color layer information. An example method comprises: receiving, by a processing device, a document image; determining values of one or more parameters of the document image, wherein at least one parameter is evaluated by extracting one or more color layers of the document image; and associating, based on the values of the parameters, the document image with a category of a plurality of categories.
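A simple sketch of color-layer features feeding a classifier, assuming an RGB image given as rows of `(r, g, b)` tuples. The layers (red-dominant, blue-dominant, dark), the margin, and the category profiles in `CENTROIDS` are all hypothetical, and nearest-centroid matching is an assumed stand-in for whatever association rule the patent uses.

```python
import math


def color_layer_shares(image, margin=60):
    """Share of pixels in three assumed color layers: red, blue and dark."""
    counts = {"red": 0, "blue": 0, "dark": 0}
    total = 0
    for row in image:
        for r, g, b in row:
            total += 1
            if r - max(g, b) >= margin:      # red-dominant (e.g. a stamp)
                counts["red"] += 1
            elif b - max(r, g) >= margin:    # blue-dominant (e.g. ink signature)
                counts["blue"] += 1
            elif max(r, g, b) < 80:          # dark, text-like pixel
                counts["dark"] += 1
    return [counts[k] / total for k in ("red", "blue", "dark")]


# Hypothetical category profiles: (red, blue, dark) pixel shares.
CENTROIDS = {
    "plain_text": (0.0, 0.0, 0.15),
    "stamped_form": (0.05, 0.0, 0.10),
    "signed_blue_ink": (0.0, 0.05, 0.10),
}


def classify_image(image):
    """Assign the category whose profile is nearest to the image's features."""
    features = color_layer_shares(image)
    return min(CENTROIDS, key=lambda c: math.dist(features, CENTROIDS[c]))
```

A page with a noticeable red layer moves toward the "stamped_form" profile even if its text density is similar to a plain page, which is the kind of signal grayscale features would miss.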
