Moscow, Russia

ABBYY is an international software company that provides optical character recognition, document capture and language software for both PC and mobile devices. The majority of ABBYY products, such as ABBYY FineReader, are intended to simplify converting paper documents to digital data. ABBYY also provides language products and services. Source: Wikipedia.


An algorithm for assigning priorities to tasks that users queue for processing, based on how heavily the user who submitted each task has used system resources in the past, including the number of tasks that user queued, the volume of those tasks, and the amount of processor time used. In the OCR context, the tasks are graphic files placed on servers and chosen for processing in accordance with the assigned priorities.
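
As a rough illustration of this kind of usage-based prioritization, the sketch below orders a queue so that tasks from lighter past users come first. The `UsageHistory` fields mirror the three factors named in the abstract; the weights, the linear score, and all names are assumptions made for the example, not details taken from the patent.

```python
from dataclasses import dataclass


@dataclass
class UsageHistory:
    tasks_queued: int    # number of tasks the user queued in the past
    total_volume: int    # total size of those tasks (e.g. pages)
    cpu_seconds: float   # processor time the user's tasks consumed


def usage_score(h, w_tasks=1.0, w_volume=0.001, w_cpu=0.1):
    """Hypothetical linear score: a higher value means heavier past usage."""
    return w_tasks * h.tasks_queued + w_volume * h.total_volume + w_cpu * h.cpu_seconds


def order_queue(queue, history):
    """queue: list of (task_id, user). Returns task ids, lightest users first.

    sorted() is stable, so tasks from equally scored users keep queue order.
    Users with no recorded history are treated as having used nothing.
    """
    empty = UsageHistory(0, 0, 0.0)
    ranked = sorted(queue, key=lambda item: usage_score(history.get(item[1], empty)))
    return [task_id for task_id, _ in ranked]


history = {
    "alice": UsageHistory(tasks_queued=100, total_volume=50000, cpu_seconds=3600.0),
    "bob": UsageHistory(tasks_queued=2, total_volume=300, cpu_seconds=12.0),
}
queue = [("a1.tif", "alice"), ("b1.tif", "bob"), ("a2.tif", "alice"), ("c1.tif", "carol")]
print(order_queue(queue, history))
```

A real scheduler would probably also age waiting tasks so that heavy users are not starved; that is left out here.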


Patent
Abbyy | Date: 2016-08-04

Disclosed are systems, computer-readable mediums, and methods for detecting glare in a frame of image data. A frame of image data is preprocessed. A set of connected components in the preprocessed frame is determined. A set of statistics is calculated for one or more connected components in the set of connected components. Using the calculated statistics, a decision is made for each of the one or more connected components as to whether it is a light spot over text. Whether glare is present in the frame is then determined.
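
The pipeline described here (preprocess, find connected components, compute statistics, decide) can be sketched in plain Python. In this sketch, preprocessing is reduced to a brightness threshold, and the statistics (area and bounding-box fill) and the decision rule are placeholders chosen for illustration; the abstract does not say which statistics the patent uses.

```python
from collections import deque


def bright_components(img, threshold=240):
    """Return 4-connected components of pixels with value >= threshold.

    img is a grayscale image given as a list of rows of ints (0..255).
    """
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    components = []
    for y in range(h):
        for x in range(w):
            if seen[y][x] or img[y][x] < threshold:
                continue
            seen[y][x] = True
            queue, pixels = deque([(y, x)]), []
            while queue:  # breadth-first flood fill
                cy, cx = queue.popleft()
                pixels.append((cy, cx))
                for ny, nx in ((cy + 1, cx), (cy - 1, cx), (cy, cx + 1), (cy, cx - 1)):
                    if 0 <= ny < h and 0 <= nx < w and not seen[ny][nx] and img[ny][nx] >= threshold:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            components.append(pixels)
    return components


def component_stats(pixels):
    """Placeholder statistics: pixel count and share of the bounding box covered."""
    ys = [y for y, _ in pixels]
    xs = [x for _, x in pixels]
    box_area = (max(ys) - min(ys) + 1) * (max(xs) - min(xs) + 1)
    return {"area": len(pixels), "fill": len(pixels) / box_area}


def has_glare(img, threshold=240, min_area=6, min_fill=0.5):
    """Assumed rule: any large, compact bright blob counts as glare."""
    for pixels in bright_components(img, threshold):
        stats = component_stats(pixels)
        if stats["area"] >= min_area and stats["fill"] >= min_fill:
            return True
    return False
```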


Disclosed are methods, systems, and computer-readable mediums for automatic training of a syntactic and semantic parser using a genetic algorithm. An initial population is created, where the initial population comprises a vector of parameters for elements of syntactic and semantic descriptions of a source sentence. A natural language compiler (NLC) system is used to translate the sentence from the source language into a target language based on the syntactic and semantic descriptions of the source sentence. A vector of quality ratings is generated where each quality rating in the vector of quality ratings is of a corresponding parameter in the vector of parameters. Quality ratings are evaluated according to specific criteria, which comprise parameters such as a BLEU score and a number of emergency sentences. A number of parameters in the vector of parameters are replaced with adjusted parameters.
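
For readers unfamiliar with genetic algorithms, the loop below shows the general shape: rate each parameter vector, keep the better half, and replace the rest with recombined and slightly adjusted parameters. It is a generic sketch only. The `toy_quality` function is a stand-in for the translation-quality rating (BLEU score, emergency sentences) that the abstract mentions, and the population size, mutation width, and selection scheme are assumptions.

```python
import random


def evolve(fitness, dim, pop_size=20, generations=30, sigma=0.1, seed=0):
    """Generic genetic algorithm over real-valued parameter vectors.

    Returns the best vector found and the best fitness seen in each generation.
    """
    rng = random.Random(seed)
    population = [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)      # rate and rank
        history.append(fitness(population[0]))
        survivors = population[: pop_size // 2]         # selection (elitist)
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = rng.sample(survivors, 2)
            # Crossover picks each parameter from one parent; mutation adjusts it.
            children.append([rng.choice(pair) + rng.gauss(0.0, sigma) for pair in zip(a, b)])
        population = survivors + children
    return max(population, key=fitness), history


def toy_quality(params):
    """Stand-in quality rating: best (zero) when every parameter equals 0.5."""
    return -sum((p - 0.5) ** 2 for p in params)


best, history = evolve(toy_quality, dim=3)
```

Because the survivors are carried over unchanged, the best rating in `history` never decreases from one generation to the next.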


Systems and methods for identifying word collocations in natural language texts. An example method comprises: performing, by a computing device, semantico-syntactic analysis of a natural language text to produce a plurality of semantic structures; generating, in view of relationships defined by the semantic structures, a raw list of word combinations; producing a list of collocations by applying a heuristic filter to the raw list of word combinations; and using the list of collocations to perform a natural language processing operation.
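
The last two steps, building a raw list of word combinations and pruning it with a heuristic filter, can be illustrated with a much simpler stand-in. The sketch below uses adjacent word pairs instead of combinations derived from semantic structures, and its filter (minimum frequency, no stopwords) is an assumption; the abstract does not describe the actual heuristics.

```python
from collections import Counter

# Small illustrative stopword list, not an exhaustive one.
STOPWORDS = {"a", "an", "and", "in", "is", "of", "the", "to"}


def raw_combinations(tokens):
    """Stand-in for the raw list: every pair of adjacent tokens."""
    return list(zip(tokens, tokens[1:]))


def collocations(tokens, min_count=2):
    """Keep pairs that recur at least min_count times and contain no stopword."""
    counts = Counter(raw_combinations(tokens))
    return sorted(
        pair
        for pair, n in counts.items()
        if n >= min_count and not (set(pair) & STOPWORDS)
    )


text = "optical character recognition of the page and optical character recognition of the form"
print(collocations(text.split()))
```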


Patent
Abbyy | Date: 2015-12-14

A data capture component of a mobile device receives information for an identification of a data field in a physical document. The data capture component receives a video stream comprising a plurality of frames, wherein each frame comprises a portion of the physical document. A frame is selected from the plurality of frames in the video stream. One or more text regions in the frame are identified. Each of the identified text region(s) in the frame is processed to identify data of each of the identified text region(s) and to select data of one of the identified text region(s) that corresponds to a set of attributes associated with the data field. The selected data is then compared with data of text regions of a subsequent frame. If the data of the text regions of the subsequent frame is a closer match to the set of attributes, the selected data is updated. A display field is then provided with the selected data for presentation in a user interface.
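
The frame-by-frame update can be sketched as a running best match. Here the "set of attributes associated with the data field" is reduced to a single regular expression, and the match score (share of the text covered by the longest match) is an assumption made for the example; OCR is assumed to have already produced the text of each region.

```python
import re


def match_score(text, pattern):
    """Share of the text covered by its longest regex match, from 0.0 to 1.0."""
    if not text:
        return 0.0
    longest = max((len(m.group(0)) for m in re.finditer(pattern, text)), default=0)
    return longest / len(text)


def capture_field(frames, pattern):
    """frames: iterable of lists of recognized region texts, one list per frame.

    Keeps the best candidate seen so far and replaces it only when a region in
    a later frame is a closer match to the field pattern.
    """
    selected, selected_score = None, 0.0
    for regions in frames:
        for text in regions:
            score = match_score(text, pattern)
            if score > selected_score:
                selected, selected_score = text, score
    return selected


DATE_FIELD = r"\d{2}/\d{2}/\d{4}"
frames = [
    ["Invoice", "Date 12/03/20l5"],   # blurry frame: OCR misread a digit as "l"
    ["Date: 12/03/2015", "Total"],
    ["12/03/2015"],
]
print(capture_field(frames, DATE_FIELD))
```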


Patent
Abbyy | Date: 2015-12-18

Systems and methods for creating ontologies by analyzing natural language texts. An example method comprises: receiving identifiers of a first plurality of word groups within a natural language text, each word group comprising one or more natural language words; associating an object represented by each word group with a concept of an ontology; identifying, within the natural language text, a second plurality of word groups, wherein each word group of the second plurality of word groups is associated with the concept of the ontology; responsive to receiving a confirmation that a word group of the second plurality of word groups represents an object associated with the concept of the ontology, modifying a parameter of a classification model that produces a value reflecting a degree of association of a given object with the concept of the ontology.


Patent
Abbyy | Date: 2015-12-16

The current application is directed to a method and system for automatically determining the sense orientation of regions of scanned-document images. In one implementation, the sense-orientation method and system to which the current application is directed employs a relatively small set of orientation characters that occur frequently in printed text. In this implementation, for at least one set of orientation characters, each of two or more different orientations of character-containing subregions within a text-containing region of a scanned-document image is compared to each orientation character in the at least one set of orientation characters in order to determine an orientation for each of the character-containing subregions with respect to a reference orientation of the text-containing region. The determined orientations for the character-containing subregions are then used to determine an overall sense orientation for the text-containing region of the scanned-document image.
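
A toy version of this idea: try each 90-degree rotation of a character subregion against a small set of reference characters, record which rotation matches best, then take a majority vote over the region. The tiny binary bitmaps, the pixel-agreement similarity, and the majority vote are all assumptions chosen to keep the sketch short.

```python
from collections import Counter


def rotate90(bitmap):
    """Rotate a 2D list of pixels 90 degrees clockwise."""
    return [list(row) for row in zip(*bitmap[::-1])]


def similarity(a, b):
    """Share of pixels on which two equally sized bitmaps agree."""
    if len(a) != len(b) or len(a[0]) != len(b[0]):
        return 0.0
    cells = [pa == pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb)]
    return sum(cells) / len(cells)


def best_rotation(glyph, templates):
    """Number of clockwise quarter turns (0..3) that best aligns glyph with a template."""
    best_turns, best_score = 0, -1.0
    rotated = glyph
    for turns in range(4):
        score = max(similarity(rotated, t) for t in templates)
        if score > best_score:
            best_turns, best_score = turns, score
        rotated = rotate90(rotated)
    return best_turns


def region_orientation(glyphs, templates):
    """Majority vote over the per-glyph rotations."""
    votes = Counter(best_rotation(g, templates) for g in glyphs)
    return votes.most_common(1)[0][0]


# A 3x3 "L" serves as the single orientation character in this example.
L_SHAPE = [[1, 0, 0],
           [1, 0, 0],
           [1, 1, 1]]
sideways = rotate90(L_SHAPE)  # simulates a page scanned a quarter turn clockwise
```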


The invention describes a system and method for creating a comparable corpus by obtaining a set of source documents containing text, constructing language-independent semantic structures for at least one sentence of each of the texts in the source documents; determining universal similarity measures for groups of the source documents by comparing the constructed language-independent semantic structures of the texts in the source documents; identifying sets of similar documents based on the determined universal similarity measures for the groups of the source documents; and creating the comparable corpus based on the identified sets of similar documents.
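
The grouping step can be sketched once each document has been reduced to a language-independent representation. In the sketch below that representation is simply a set of concept identifiers supplied by the caller; Jaccard overlap stands in for the "universal similarity measure", and documents are grouped with a small union-find. These choices, and the threshold, are assumptions for illustration.

```python
from itertools import combinations


def jaccard(a, b):
    """Overlap of two concept sets, from 0.0 (disjoint) to 1.0 (identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def comparable_corpus(docs, threshold=0.5):
    """docs: {doc_id: set of concept ids}. Returns groups of similar documents.

    Documents are linked when their similarity reaches the threshold, and
    linked documents end up in the same group. Singletons are dropped.
    """
    parent = {d: d for d in docs}

    def find(d):
        while parent[d] != d:
            parent[d] = parent[parent[d]]  # path halving
            d = parent[d]
        return d

    for a, b in combinations(docs, 2):
        if jaccard(docs[a], docs[b]) >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for d in docs:
        groups.setdefault(find(d), []).append(d)
    return sorted(sorted(g) for g in groups.values() if len(g) > 1)


docs = {
    "en_1": {"BUY", "COMPANY", "SHARE"},
    "ru_1": {"BUY", "COMPANY", "SHARE", "MARKET"},
    "en_2": {"WEATHER", "RAIN"},
}
print(comparable_corpus(docs))
```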


Patent
Abbyy | Date: 2016-06-26

Disclosed are systems, computer-readable mediums, and methods for determining that text contains Chinese, Japanese, or Korean characters. One method includes determining a language hypothesis for each text fragment in a plurality of text fragments identified from connected components in a document image. The method further includes selecting a first subset of text fragments from the plurality of text fragments based on ratings for the language hypothesis of each text fragment in the plurality of text fragments. The method further includes verifying, by a processor, the language hypothesis of one or more text fragments in the first subset of text fragments based on optical character recognition of the one or more text fragments. The method further includes determining, by the processor, that Chinese, Japanese, or Korean (CJK) characters are present in the document image based on the verification of the language hypothesis of each of the one or more text fragments.
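
The select-then-verify flow can be sketched as follows. Each fragment carries a hypothetical rating for the "CJK" hypothesis and, in place of a real OCR call, the text that recognition would return. The Unicode ranges cover Hiragana and Katakana, CJK Unified Ideographs, and Hangul Syllables; the subset size and the verification threshold are assumptions.

```python
from dataclasses import dataclass


@dataclass
class Fragment:
    cjk_rating: float  # hypothetical confidence that the fragment is CJK text
    ocr_text: str      # stand-in for the result of running OCR on the fragment


# Hiragana + Katakana, CJK Unified Ideographs, Hangul Syllables.
CJK_RANGES = ((0x3040, 0x30FF), (0x4E00, 0x9FFF), (0xAC00, 0xD7AF))


def is_cjk_char(ch):
    return any(lo <= ord(ch) <= hi for lo, hi in CJK_RANGES)


def document_has_cjk(fragments, top_n=3, min_share=0.5):
    """Verify only the top-rated fragments and report whether any is CJK."""
    candidates = sorted(fragments, key=lambda f: f.cjk_rating, reverse=True)[:top_n]
    for fragment in candidates:
        chars = [c for c in fragment.ocr_text if not c.isspace()]
        if chars and sum(is_cjk_char(c) for c in chars) / len(chars) >= min_share:
            return True
    return False
```

Verifying only a highly rated subset keeps the comparatively expensive recognition step away from fragments that are unlikely to be CJK.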


There is disclosed a method of determining a document type associated with a digital document, the method executable by an electronic device. A processor of the electronic device is configured to execute a plurality of machine learning algorithm (MLA) classifiers, each of the plurality of MLA classifiers having been trained to identify a specific document type. The plurality of MLA classifiers is ranked in a hierarchical order of execution of the plurality of MLA classifiers. A method of training the plurality of MLA classifiers is also disclosed.
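
One way to picture a ranked set of per-type classifiers is as a cascade: run them in order and stop at the first one that accepts the document. The sketch below does exactly that. The keyword-based scorers are hypothetical stand-ins for trained MLA classifiers, and the ordering and acceptance threshold are assumptions.

```python
def keyword_classifier(keywords):
    """Hypothetical stand-in for a trained classifier: share of keywords present."""
    def score(text):
        words = set(text.lower().split())
        return sum(k in words for k in keywords) / len(keywords)
    return score


# Classifiers listed in their order of execution, one per document type.
RANKED = [
    ("invoice", keyword_classifier(["invoice", "total", "due"])),
    ("contract", keyword_classifier(["agreement", "party", "term"])),
]


def classify(text, ranked=RANKED, threshold=0.5):
    """Return the type of the first classifier whose score reaches the threshold."""
    for doc_type, score in ranked:
        if score(text) >= threshold:
            return doc_type
    return None  # no classifier accepted the document


print(classify("Invoice 42 total amount due"))
print(classify("This agreement binds each party"))
```

Ordering matters in such a cascade: placing cheaper or more selective classifiers first means later ones run on fewer documents.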
