Metadata-Version: 1.0
Name: guess-language
Version: 0.2
Summary: Guess the natural language of a text
Home-page: http://code.google.com/p/guess-language
Author: Kent Johnson
Author-email: kent3737@gmail.com
License: LGPL
Description: Attempts to determine the natural language of a selection of Unicode (utf-8) text.
        
        Based on `guesslanguage.cpp
        <http://websvn.kde.org/branches/work/sonnet-refactoring/common/nlp/guesslanguage.cpp?view=markup>`_
        by Jacob R Rideout for KDE which itself is based on
        `Language::Guess <http://languid.cantbedone.org/>`_ by Maciej Ceglowski.
        
        Detects over 60 languages - all languages listed in the `trigrams
        <http://code.google.com/p/guess-language/source/browse/trunk/guess_language/trigrams/>`_
        directory plus Japanese, Chinese, Korean and Greek.
        
        guess_language uses heuristics based on the character set and trigrams in a sample text
        to detect the language. It works better with longer samples and will be confused if
        the sample text includes markup such as HTML tags.
        
        Usage
        =====
        
        The main entry points all take a single string as input and return a language identifier.
        The string must be Unicode or UTF-8 text. The language identifer can be the language name
        in English, the two- or three-letter IANA language code, a language ID or a tuple containing
        all three codes.
        
        The primary entry points, and the return values, are as follows::
        
        guessLanguage(txt) - IANA language code
        guessLanguageTag(txt) - IANA language code (same as guessLanguage)
        guessLanguageName(txt) - Language name in English
        guessLanguageId(txt) - language ID
        guessLanguageInfo(txt) - tuple of (IANA code, id, name)
        
Platform: any
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: GNU Library or Lesser General Public License (LGPL)
Classifier: Natural Language :: English
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python
Classifier: Topic :: Text Processing :: Linguistic
