Porter Stemmer

Porter Stemmer

Philip Bennefall
Hi all,

Is the algorithm used in the stemming tokenizer in SQLite's FTS extension equivalent to the C implementation found at http://tartarus.org/~martin/PorterStemmer/ ?

I am asking because some sources say that improved versions of this algorithm were released well after 2000/2001. Does SQLite's implementation differ in any significant way from the C implementation at the above URL?

Kind regards,

Philip Bennefall
_______________________________________________
sqlite-users mailing list
[hidden email]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Re: Porter Stemmer

Richard Hipp-3
On Fri, Jun 15, 2012 at 5:51 AM, Philip Bennefall <[hidden email]> wrote:

> Hi all,
>
> Is the algorithm used in the stemming tokenizer in SQLite's FTS extension
> equivalent to the C implementation found at
> http://tartarus.org/~martin/PorterStemmer/
>

The built-in Porter stemmer is a copy/paste from the above link.
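
To make the effect of the stemming tokenizer concrete, here is a minimal sketch using Python's standard sqlite3 module. It assumes the SQLite library linked into Python was built with FTS4 enabled; the table name `docs`, the column `body`, and the helper `matches` are made up for this illustration and do not come from the thread.

```python
import sqlite3


def matches(tokenizer: str, text: str, query: str) -> bool:
    """Return True if `query` matches `text` in an FTS4 table built with `tokenizer`.

    `tokenizer` is interpolated into the SQL, so pass only trusted names
    such as "porter" or "simple".
    """
    conn = sqlite3.connect(":memory:")
    try:
        # The tokenizer is chosen when the virtual table is created.
        conn.execute(
            f"CREATE VIRTUAL TABLE docs USING fts4(body, tokenize={tokenizer})"
        )
        conn.execute("INSERT INTO docs(body) VALUES (?)", (text,))
        row = conn.execute(
            "SELECT count(*) FROM docs WHERE docs MATCH ?", (query,)
        ).fetchone()
        return row[0] > 0
    finally:
        conn.close()


if __name__ == "__main__":
    # With the Porter tokenizer, "running" and "run" reduce to the same stem.
    print(matches("porter", "running shoes", "run"))
    # The simple tokenizer only lowercases, so "run" does not match "running".
    print(matches("simple", "running shoes", "run"))
```

The same query gives different results depending on the tokenizer, which is why the choice matters when deciding between the built-in stemmer and a custom one.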



--
D. Richard Hipp
[hidden email]

Re: Porter Stemmer

Philip Bennefall
Thanks, Richard. That's good to know, because I am trying to decide whether to add a new tokenizer with some custom processing or to use the built-in stemmer.

Kind regards,

Philip Bennefall

Re: Porter Stemmer

Philip Bennefall
I had another quick question. If I have built an FTS table using the stemmer tokenizer and later decide that I want to switch to the simple one, is there an easy way to do this? I see the "rebuild" command; can I somehow tell it to change the tokenizer as well? I see the reference to custom tokenizers, but what about the built-in ones?

Kind regards,

Philip Bennefall

Re: Porter Stemmer

Richard Hipp-3
On Fri, Jun 15, 2012 at 9:00 AM, Philip Bennefall <[hidden email]> wrote:

> I had another quick question. If I have built an FTS table using the
> stemmer tokenizer, and then I later decide that I want to change to the
> simple one, is there an easy way to do this? I see the "rebuild" command,
> can I somehow tell that to change the tokenizer as well? I see the
> reference to custom ones, but what about the internal implementations?
>

If you change your tokenizer, you need to retokenize all of the source text.



--
D. Richard Hipp
[hidden email]

Re: Porter Stemmer

Philip Bennefall
I understand that, but let's say I already have an FTS virtual table that I set up to use the Porter tokenizer. How would I then go about rebuilding and retokenizing this table with the simple tokenizer at a later time? Would I need to create an entirely new table? Basically, I am wondering how I might take an existing FTS virtual table, change its tokenizer, and then rebuild the index.

Kind regards,

Philip Bennefall

Re: Porter Stemmer

Richard Hipp-3
On Fri, Jun 15, 2012 at 9:26 AM, Philip Bennefall <[hidden email]> wrote:

> I understand that, but let's say that I already have a virtual FTS table
> created that I set to use the Porter tokenizer, how then would I go about
> rebuilding and retokenizing this table with the simple tokenizer at a later
> time? Would I need to create an entirely new table? What I'm wondering is
> basically how I might take an existing FTS virtual table, change its
> tokenizer and then rebuild the index?
>

Yes.  You'll need to DROP or RENAME the original table, then CREATE the new
one.
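
As a rough sketch of the RENAME, CREATE, copy, DROP sequence, the following uses Python's standard sqlite3 module and assumes an FTS4 table with a single text column. The helper `switch_tokenizer` and the names `docs` and `body` are hypothetical, chosen only for this example. Identifiers are interpolated straight into the SQL, so they must come from trusted code, never from user input.

```python
import sqlite3


def switch_tokenizer(conn: sqlite3.Connection, table: str, column: str,
                     new_tokenizer: str) -> None:
    """Recreate a single-column FTS4 table with a different tokenizer.

    All source text is copied into the new table, so it is retokenized
    by `new_tokenizer` as it is inserted.
    """
    old = f"{table}_old"  # hypothetical temporary name for the original table
    # 1. Move the original table out of the way.
    conn.execute(f"ALTER TABLE {table} RENAME TO {old}")
    # 2. Create the replacement under the original name, with the new tokenizer.
    conn.execute(
        f"CREATE VIRTUAL TABLE {table} USING fts4({column}, tokenize={new_tokenizer})"
    )
    # 3. Copy the source text across, keeping the same docid values.
    conn.execute(
        f"INSERT INTO {table}(docid, {column}) SELECT docid, {column} FROM {old}"
    )
    # 4. Drop the old table together with its index.
    conn.execute(f"DROP TABLE {old}")
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE VIRTUAL TABLE docs USING fts4(body, tokenize=porter)")
    conn.execute("INSERT INTO docs(body) VALUES ('running shoes')")
    conn.commit()
    switch_tokenizer(conn, "docs", "body", "simple")
    # After the switch, only the unstemmed word matches.
    print(conn.execute(
        "SELECT count(*) FROM docs WHERE docs MATCH 'running'").fetchone()[0])
    conn.close()
```

For a large table it may be worth keeping the renamed original until the new index has been checked, and dropping it only afterwards.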


--
D. Richard Hipp
[hidden email]

Re: Porter Stemmer

Philip Bennefall
Understood. Thank you very much for your quick help. Now I have all the information I need to get coding. And thanks once again for a great library!

Kind regards,

Philip Bennefall