Moscow, Russia

ABBYY /ˈʌbɪ/ is an international software company that provides optical character recognition, document capture, and language software for both PC and mobile devices. The majority of ABBYY products, such as ABBYY FineReader, are intended to simplify converting paper documents to digital data. ABBYY also provides language products and services. Wikipedia.

Disclosed are methods, systems, and computer-readable media for automatic training of a syntactic and semantic parser using a genetic algorithm. An initial population is created, where the initial population comprises a vector of parameters for elements of syntactic and semantic descriptions of a source sentence. A natural language compiler (NLC) system is used to translate the sentence from the source language into a target language based on the syntactic and semantic descriptions of the source sentence. A vector of quality ratings is generated, where each quality rating in the vector corresponds to a parameter in the vector of parameters. Quality ratings are evaluated according to specific criteria, which comprise parameters such as a BLEU score and a number of emergency sentences. A number of parameters in the vector of parameters are replaced with adjusted parameters.
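The loop this abstract describes (create a population of parameter vectors, rate them, replace some parameters with adjusted ones) can be illustrated with a minimal genetic-algorithm sketch. Everything named here is an assumption for illustration: `evolve`, `toy_fitness`, the population size, and the mutation scheme are hypothetical, and the toy fitness function merely stands in for a quality rating such as a BLEU score. It is not the method claimed in the patent.

```python
import random


def evolve(fitness, dim, pop_size=20, generations=30, mutation_scale=0.1, seed=0):
    """Toy genetic algorithm over real-valued parameter vectors."""
    rng = random.Random(seed)
    # Initial population: random parameter vectors.
    population = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(pop_size)]
    for _ in range(generations):
        # Rate every vector and keep the better half unchanged (elitism).
        ranked = sorted(population, key=fitness, reverse=True)
        survivors = ranked[: pop_size // 2]
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = rng.sample(survivors, 2)
            # Crossover: take each parameter from one of the two parents.
            child = [rng.choice(pair) for pair in zip(a, b)]
            # Mutation: replace one parameter with an adjusted value.
            i = rng.randrange(dim)
            child[i] += rng.gauss(0, mutation_scale)
            children.append(child)
        population = survivors + children
    return max(population, key=fitness)


def toy_fitness(params):
    """Hypothetical stand-in for a quality rating (higher is better)."""
    target = [0.5, -0.2, 0.8]
    return -sum((p - t) ** 2 for p, t in zip(params, target))


best = evolve(toy_fitness, dim=3)
```

In a real setting the fitness call would be the expensive step, since each rating would require translating a corpus and scoring the result.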

Abbyy | Date: 2015-01-02

The present disclosure provides methods and systems for performing syntactic analysis of a text. In some implementations the method includes performing rough syntactic analysis of the text, generating a graph of generalized constituents of the text and filtering arcs of the graph of generalized constituents with a combination classifier which includes a tree classifier and one or more linear classifiers. The combination classifier is trained using parallel analysis of an untagged two-language text corpus.

The current document is directed to methods and systems for identifying Chinese, Japanese, Korean, or similar language symbols that correspond to symbol images in a scanned-document image or other text-containing image. In a first processing phase, each symbol image is associated with a set of candidate graphemes. In a second processing phase, each symbol image is evaluated with respect to the set of candidate graphemes identified for the symbol image during the first phase. As candidate graphemes are processed, the currently described methods and systems monitor progress towards identifying a matching grapheme and, when insufficient progress is observed, terminate processing of the candidate graphemes and identify the symbol image as a non-symbol-containing area of the scanned-document image or other text-containing image.
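The second-phase behaviour described above, stopping early when candidate graphemes are not converging on a match, might look roughly like the following sketch. The function name `match_symbol`, the thresholds `accept`, `min_gain`, and `patience`, and the idea of a single similarity score per candidate are all assumptions made for illustration; the abstract does not say how progress is measured.

```python
def match_symbol(candidate_scores, accept=0.9, min_gain=0.05, patience=3):
    """Return the first candidate scoring at least `accept`, or None.

    `candidate_scores` is an iterable of (grapheme, score) pairs. Processing
    stops early when `patience` consecutive candidates fail to improve the
    best score by more than `min_gain`; the image is then treated as a
    non-symbol region, signalled here by returning None.
    """
    best_score = 0.0
    stalled = 0
    for grapheme, score in candidate_scores:
        if score >= accept:
            return grapheme
        if score > best_score + min_gain:
            best_score = score
            stalled = 0
        else:
            stalled += 1
            if stalled >= patience:
                return None  # insufficient progress: give up early
    return None


found = match_symbol([("a", 0.2), ("b", 0.95)])
rejected = match_symbol([("a", 0.2), ("b", 0.21), ("c", 0.2), ("d", 0.22), ("e", 0.99)])
```

In the second call the last candidate would have matched, but it is never evaluated because three candidates in a row showed no meaningful improvement. That trade-off between speed and recall is inherent to any early-termination rule of this kind.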

The current document is directed to methods and systems that convert document images containing mathematical expressions into corresponding electronic documents. In one implementation, an image or sub-image containing a mathematical expression is recursively partitioned into blocks separated by white-space stripes. Horizontal and vertical partitioning are alternately and recursively applied to the image or sub-image until the lowest-level blocks obtained by partitioning correspond to symbols recognizable by character-recognition methods. Graph-based analysis of the recognized symbols provides a basis for encoding an equivalent representation of the mathematical expression contained in the image or sub-image.
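The alternating horizontal and vertical partitioning can be sketched on a small binary image. This is a simplified illustration under stated assumptions: the image is a list of rows of booleans (True for ink), the helper names `split_runs` and `partition` are hypothetical, and recursion stops when a block cannot be split in either direction, rather than when a character recognizer accepts it as the abstract describes.

```python
def split_runs(profile):
    """Return (start, end) index pairs for runs of consecutive True entries."""
    runs, start = [], None
    for i, filled in enumerate(profile):
        if filled and start is None:
            start = i
        elif not filled and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(profile)))
    return runs


def partition(image, top, bottom, left, right, horizontal=True, failed_once=False):
    """Recursively cut a region along blank stripes, alternating direction.

    Returns leaf blocks as (top, bottom, left, right) with exclusive ends.
    """
    if horizontal:
        # Cut along blank rows.
        profile = [any(image[r][left:right]) for r in range(top, bottom)]
        parts = [(top + s, top + e, left, right) for s, e in split_runs(profile)]
    else:
        # Cut along blank columns.
        profile = [any(image[r][c] for r in range(top, bottom)) for c in range(left, right)]
        parts = [(top, bottom, left + s, left + e) for s, e in split_runs(profile)]
    if parts == [(top, bottom, left, right)]:
        # No blank stripe in this direction; try the other one once.
        if failed_once:
            return [(top, bottom, left, right)]
        return partition(image, top, bottom, left, right, not horizontal, True)
    blocks = []
    for t, b, l, r in parts:
        blocks.extend(partition(image, t, b, l, r, not horizontal, False))
    return blocks


rows = ["1101",
        "1101",
        "0000",
        "1111"]
image = [[ch == "1" for ch in row] for row in rows]
blocks = partition(image, 0, 4, 0, 4)
# blocks == [(0, 2, 0, 2), (0, 2, 3, 4), (3, 4, 0, 4)]
```

The blank row splits the image into an upper and a lower band, and the blank column then splits the upper band into two blocks. A practical system would likely also need a minimum stripe width so that noise pixels or touching symbols do not defeat the cuts.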

Abbyy | Date: 2015-01-02

Systems and methods for enhancing and comparing documents. An example method comprises: comparing document images to identify a first document image of a reference document that corresponds with a second document image of a related document; transforming the second document image based on a layout of the first document image; and performing character recognition of the second document image.
