  1. #1

    Morphological Analysis


    Hi there,

    I'd like to ask about the proper way to stem English words in Java: how are affixes separated from the stem (root word) and recognized, so that the word is ready for translation into another language?

    Thanks in advance.

  2. #2
    You will be working with the built-in String functions or a string library.
    You should also keep a list of root words in memory or in a database.
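
    As a rough illustration of that idea, here is a minimal sketch that uses only built-in String methods (startsWith, endsWith, substring) plus an in-memory set of root words. The class name AffixSplitter and the tiny word, prefix, and suffix lists are made up for the example; a real list would be much larger and would come from a file or database.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class AffixSplitter {
    // Hypothetical sample data, for illustration only.
    static final Set<String> ROOTS =
            new HashSet<>(Arrays.asList("do", "read", "play", "kind", "happy"));
    // The empty string means "no prefix" / "no suffix".
    static final List<String> PREFIXES = Arrays.asList("", "un", "re");
    static final List<String> SUFFIXES = Arrays.asList("", "s", "ed", "ing", "able", "ness");

    /**
     * Tries every prefix/suffix pair and returns [prefix, root, suffix]
     * for the first split whose middle part is a known root.
     * Returns an empty list when no split matches.
     */
    static List<String> split(String word) {
        String w = word.toLowerCase();
        for (String prefix : PREFIXES) {
            if (!w.startsWith(prefix)) continue;
            for (String suffix : SUFFIXES) {
                // Skip pairs that do not fit or would leave nothing in the middle.
                if (!w.endsWith(suffix) || prefix.length() + suffix.length() >= w.length()) continue;
                String middle = w.substring(prefix.length(), w.length() - suffix.length());
                if (ROOTS.contains(middle)) {
                    return Arrays.asList(prefix, middle, suffix);
                }
            }
        }
        return Collections.emptyList();
    }

    public static void main(String[] args) {
        System.out.println(split("unreadable")); // [un, read, able]
        System.out.println(split("replaying"));  // [re, play, ing]
        System.out.println(split("xyz"));        // []
    }
}
```

    This only handles affixes that attach without changing the spelling of the root; words like 'happiness' (happy + ness) would need extra spelling rules.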

  3. #3
    Try one of the following stemming algorithms. The best solution would be to combine them to get the root word.

    1) Use brute force - In Java, create a Hashtable, Map, or HashMap (the implementation choice is yours). It will hold each root word as a key (which is, of course, unique). The values can be a collection or array of the word's inflected forms (e.g. the key is 'run' and the value is a collection: runs, running, ran, etc.).

    2) Use suffix stripping - in the case of 'running', prune the string to take off 'ing' (e.g. running -> runn, eating -> eat). Once you have the remainder, look it up using the brute-force table above.

    Those are the easiest algorithms I can think of. Other algorithms you can use are lemmatisation algorithms and stochastic algorithms.
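
    The two approaches above can be combined roughly as in the sketch below. It is only an illustration: the class name SimpleStemmer, the sample words, and the suffix list are made up, and the step that undoubles a final consonant ('runn' -> 'run') is an extra assumption added so that the stripped remainder can be found in the table.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class SimpleStemmer {
    // Brute-force table: root word -> known forms (hypothetical sample data).
    private static final Map<String, List<String>> FORMS_BY_ROOT = new HashMap<>();
    // Inverted index built from the table: any known form (or root) -> root.
    private static final Map<String, String> ROOT_BY_FORM = new HashMap<>();
    // Suffixes to try, longest first.
    private static final String[] SUFFIXES = {"ing", "ed", "es", "s"};

    static {
        FORMS_BY_ROOT.put("run", Arrays.asList("runs", "running", "ran"));
        FORMS_BY_ROOT.put("eat", Arrays.asList("eats", "ate", "eaten"));
        FORMS_BY_ROOT.put("stop", Arrays.asList("stops"));
        for (Map.Entry<String, List<String>> entry : FORMS_BY_ROOT.entrySet()) {
            ROOT_BY_FORM.put(entry.getKey(), entry.getKey());
            for (String form : entry.getValue()) {
                ROOT_BY_FORM.put(form, entry.getKey());
            }
        }
    }

    /** Returns the root of the word, or the lower-cased word itself if no root is found. */
    static String stem(String word) {
        String w = word.toLowerCase();

        // Step 1: brute force, look the whole word up in the table.
        String root = ROOT_BY_FORM.get(w);
        if (root != null) return root;

        // Step 2: strip a suffix, then look the remainder up in the table.
        for (String suffix : SUFFIXES) {
            if (w.length() > suffix.length() + 1 && w.endsWith(suffix)) {
                String rest = w.substring(0, w.length() - suffix.length());
                if (ROOT_BY_FORM.containsKey(rest)) return ROOT_BY_FORM.get(rest);

                // Assumption: undo a doubled final consonant, e.g. "stopp" -> "stop".
                int n = rest.length();
                if (n >= 2 && rest.charAt(n - 1) == rest.charAt(n - 2)) {
                    String undoubled = rest.substring(0, n - 1);
                    if (ROOT_BY_FORM.containsKey(undoubled)) return ROOT_BY_FORM.get(undoubled);
                }
            }
        }
        return w;
    }

    public static void main(String[] args) {
        System.out.println(stem("ran"));     // run  (found directly in the table)
        System.out.println(stem("eating"));  // eat  (suffix "ing" stripped)
        System.out.println(stem("stopped")); // stop (suffix "ed" stripped, "pp" undoubled)
    }
}
```

    Irregular forms such as 'ran' or 'ate' can only be resolved through the table, which is why the lookup runs before any suffix is stripped.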

    Better yet, if you want to pursue this project, take a look at libtranslate; you can download the source code and look at its implementation. Although it is written in C/C++, you can still study its algorithms. It's simple enough to provide your own natural language conversion plugin to convert whole paragraphs/documents from English->Cebuano and vice versa.

    http://www.nongnu.org/libtranslate

    Or if you really want to dig deeper into Natural Language Processing, get NLTK (the Natural Language Toolkit), which is written in Python - http://www.nltk.org

    Cheers!
    Last edited by kolz; 07-08-2009 at 10:30 AM.

  4. #4

  5. #5
    OK guys, I will try to take a look at your suggested algorithms and generators. But are these generators built into Java? I'd really like to do everything in Java only.

  6. #6
    I just want to make the point that NLP (Natural Language Processing) is different from Formal Language Processing (more commonly known as Computer Language Processing).

    Natural language refers to written and spoken languages (English, Cebuano, Tagalog, etc.), while formal or computer language processing deals with C++, Perl, and other programming languages.

    The tools suggested by eax are for formal languages, where you need parsers and parser generators (lex, yacc, bison, etc.). These tools are used for crafting your own compiler, interpreter, and so on; they target machine (computer) languages. The parsing concepts sometimes overlap, but these are two entirely different sets of tools and should not be confused with NLP.

    For NLP, which is what the TS needs, try OpenNLP, a set of Java tools for Natural Language Processing (see the OpenNLP homepage).

    To quote:

    "OpenNLP is an organizational center for open source projects related to natural language processing. Its primary role is to encourage and facilitate the collaboration of researchers and developers on such projects. Click here to see the current list of OpenNLP projects. We'll also try to keep a fairly up-to-date list of useful links related to NLP software in general.

    OpenNLP also hosts a variety of java-based NLP tools which perform sentence detection, tokenization, pos-tagging, chunking and parsing, named-entity detection, and coreference using the OpenNLP Maxent machine learning package. To start using these tools download the latest release here, and check out the OpenNLP Tools API. For the latest news about these tools and to participate in discussions, check out OpenNLP's Sourceforge project page. "

  7. #7
    Also check out the Stanford University Natural Language Processing lab:

    The Stanford NLP (Natural Language Processing) Group

    They have an online version of their Java NLP tools at work:
    http://nlp.stanford.edu:8080/parser/index.jsp
    Last edited by kolz; 07-09-2009 at 10:29 AM.

  8. #8
    Yep, thanks for the clarification, kolz. That's why I was asking whether the suggested generator is built into Java: it seems to be a separate parser generator, independent of NetBeans. But at least the idea is there; I think the way words are parsed in those generators can be applied in NetBeans.
