This report covers a personal selection of stand-alone modules which offer stop word lists.

Module names are read from data/module.list.ini, which is shipped with the distro.
Each module's entry has an indicator, 'include = Yes/No', which makes it easy to edit the selection and re-run the report.
However, because each included module has a different mechanism for returning its list of words, the names of the included modules are also hard-coded in the source code.
Excluded modules are listed at the end of this report.
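The selection mechanism described above can be sketched as follows. This is only an illustration in Python, not the report generator's actual code: the layout of data/module.list.ini is not shown in this report, so the one-section-per-module layout, the sample entries, and the helper names (`split_modules`, `load_word_lists`, `FETCHERS`) are all assumptions. The fetcher functions are hypothetical stand-ins for each module's own way of returning its words, and the words they return are tiny samples, not the real lists.

```python
import configparser

# Assumed layout: one section per module, each with an 'include' flag.
SAMPLE_INI = """
[Lingua::StopWords]
include = Yes

[Lingua::EN::StopWordList]
include = Yes

[Text::DeDuper]
include = No
"""


def words_from_hash_keys():
    # Stand-in for a module that exposes its words as the keys of a hash.
    return sorted({"a": 1, "about": 1}.keys())


def words_from_list():
    # Stand-in for a module that returns its words as a plain list.
    return ["a", "a's", "able"]


# The hard-coded part: every included module needs its own fetcher,
# because no two modules return their words the same way.
FETCHERS = {
    "Lingua::StopWords": words_from_hash_keys,
    "Lingua::EN::StopWordList": words_from_list,
}


def split_modules(ini_text):
    """Return (included, excluded) module names, in file order."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    included, excluded = [], []
    for name in parser.sections():
        # getboolean accepts Yes/No in any letter case.
        if parser.getboolean(name, "include"):
            included.append(name)
        else:
            excluded.append(name)
    return included, excluded


def load_word_lists(ini_text):
    """Fetch the word list of every included module."""
    included, _ = split_modules(ini_text)
    missing = [name for name in included if name not in FETCHERS]
    if missing:
        # Flipping a module to 'include = Yes' is not enough on its own:
        # a fetcher must also be added to FETCHERS.
        raise KeyError("No fetcher for: " + ", ".join(missing))
    return {name: FETCHERS[name]() for name in included}
```

The sketch also shows the limitation noted above: editing the flag alone is enough to drop a module from the report, but adding a new one means touching the code as well.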
| Module | Version |
| | 1.01 |
| Name | Package | Version | Word count |
| 1: Lingua::EN::StopWordList | Lingua::EN::StopWordList | 1.00 | 659 |
| 2: Lingua::EN::StopWords | Lingua::EN::Segmenter | 0.1 | 213 |
| 3: Lingua::StopWords | Lingua::StopWords | 0.09 | 174 |
| Id | Lingua::EN::StopWordList | Lingua::EN::StopWords | Lingua::StopWords |
| 1 | a | a | a |
| 2 | a's | ||
| 3 | able | ||
| 4 | about | about | about |
| 5 | above | above | above |
| 6 | abroad | ||
| 7 | according | ||
| 8 | accordingly | ||
| 9 | across | across | |
| 10 | actually | ||
| 11 | adj | adj | |
| 12 | after | after | after |
| 13 | afterwards | ||
| 14 | again | again | again |
| 15 | against | against | against |
| 16 | ago | ||
| 17 | ahead | ||
| 18 | ain't | ||
| 19 | all | all | all |
| 20 | allow | ||
| 21 | allows | ||
| 22 | almost | almost | |
| 23 | alone | alone | |
| 24 | along | along | |
| 25 | alongside | ||
| 26 | already | ||
| 27 | also | also | |
| 28 | although | although | |
| 29 | always | always | |
| 30 | am | am | am |
| 31 | amid | ||
| 32 | amidst | ||
| 33 | among | among | |
| 34 | amongst | ||
| 35 | an | an | an |
| 36 | and | and | and |
| 37 | another | another | |
| 38 | any | any | any |
| 39 | anybody | anybody | |
| 40 | anyhow | ||
| 41 | anyone | anyone | |
| 42 | anything | anything | |
| 43 | anyway | ||
| 44 | anyways | ||
| 45 | anywhere | anywhere | |
| 46 | apart | apart | |
| 47 | appear | ||
| 48 | appreciate | ||
| 49 | appropriate | ||
| 50 | are | are | are |
| 51 | aren't | aren't | |
| 52 | around | around | |
| 53 | as | as | as |
| 54 | aside | aside | |
| 55 | ask | ||
| 56 | asking | ||
| 57 | associated | ||
| 58 | at | at | at |
| 59 | available | ||
| 60 | away | away | |
| 61 | awfully | ||
| 62 | b | ||
| 63 | back | ||
| 64 | backward | ||
| 65 | backwards | ||
| 66 | be | be | be |
| 67 | became | ||
| 68 | because | because | because |
| 69 | become | ||
| 70 | becomes | ||
| 71 | becoming | ||
| 72 | been | been | been |
| 73 | before | before | before |
| 74 | beforehand | ||
| 75 | begin | ||
| 76 | behind | behind | |
| 77 | being | being | being |
| 78 | believe | ||
| 79 | below | below | below |
| 80 | beside | ||
| 81 | besides | besides | |
| 82 | best | ||
| 83 | better | ||
| 84 | between | between | between |
| 85 | beyond | beyond | |
| 86 | both | both | both |
| 87 | brief | ||
| 88 | but | but | but |
| 89 | by | by | by |
| 90 | c | ||
| 91 | c'mon | ||
| 92 | c's | ||
| 93 | came | ||
| 94 | can | can | |
| 95 | can't | can't | |
| 96 | cannot | cannot | cannot |
| 97 | cant | ||
| 98 | caption | ||
| 99 | cause | ||
| 100 | causes | ||
| 101 | certain | ||
| 102 | certainly | ||
| 103 | changes | ||
| 104 | clearly | ||
| 105 | co | ||
| 106 | co. | ||
| 107 | com | ||
| 108 | come | ||
| 109 | comes | ||
| 110 | concerning | ||
| 111 | consequently | ||
| 112 | consider | ||
| 113 | considering | ||
| 114 | contain | ||
| 115 | containing | ||
| 116 | contains | ||
| 117 | corresponding | ||
| 118 | could | could | could |
| 119 | couldn't | couldn't | |
| 120 | course | ||
| 121 | currently | ||
| 122 | d | ||
| 123 | dare | ||
| 124 | daren't | ||
| 125 | deep | ||
| 126 | definitely | ||
| 127 | described | ||
| 128 | despite | ||
| 129 | did | did | did |
| 130 | didn't | didn't | |
| 131 | different | ||
| 132 | directly | ||
| 133 | do | do | do |
| 134 | does | does | does |
| 135 | doesn't | doesn't | |
| 136 | doing | doing | doing |
| 137 | don't | don't | |
| 138 | done | done | |
| 139 | down | down | down |
| 140 | downwards | downwards | |
| 141 | during | during | during |
| 142 | e | ||
| 143 | each | each | each |
| 144 | edu | ||
| 145 | eg | ||
| 146 | eight | ||
| 147 | eighty | ||
| 148 | either | either | |
| 149 | else | else | |
| 150 | elsewhere | ||
| 151 | end | ||
| 152 | ending | ||
| 153 | enough | enough | |
| 154 | entirely | ||
| 155 | especially | ||
| 156 | et | ||
| 157 | etc | etc | |
| 158 | even | even | |
| 159 | ever | ever | |
| 160 | evermore | ||
| 161 | every | every | |
| 162 | everybody | everybody | |
| 163 | everyone | everyone | |
| 164 | everything | ||
| 165 | everywhere | ||
| 166 | ex | ||
| 167 | exactly | ||
| 168 | example | ||
| 169 | except | except | |
| 170 | f | ||
| 171 | fairly | ||
| 172 | far | far | |
| 173 | farther | ||
| 174 | few | few | few |
| 175 | fewer | ||
| 176 | fifth | ||
| 177 | first | ||
| 178 | five | ||
| 179 | followed | ||
| 180 | following | ||
| 181 | follows | ||
| 182 | for | for | for |
| 183 | forever | ||
| 184 | former | ||
| 185 | formerly | ||
| 186 | forth | forth | |
| 187 | forward | ||
| 188 | found | ||
| 189 | four | ||
| 190 | from | from | from |
| 191 | further | further | |
| 192 | furthermore | ||
| 193 | g | ||
| 194 | get | get | |
| 195 | gets | gets | |
| 196 | getting | ||
| 197 | given | ||
| 198 | gives | ||
| 199 | go | ||
| 200 | goes | ||
| 201 | going | ||
| 202 | gone | ||
| 203 | got | got | |
| 204 | gotten | ||
| 205 | greetings | ||
| 206 | h | ||
| 207 | had | had | had |
| 208 | hadn't | hadn't | |
| 209 | half | ||
| 210 | happens | ||
| 211 | hardly | hardly | |
| 212 | has | has | has |
| 213 | hasn't | hasn't | |
| 214 | have | have | have |
| 215 | haven't | haven't | |
| 216 | having | having | having |
| 217 | he | he | |
| 218 | he'd | he'd | |
| 219 | he'll | he'll | |
| 220 | he's | he's | |
| 221 | hello | ||
| 222 | help | ||
| 223 | hence | ||
| 224 | her | her | her |
| 225 | here | here | here |
| 226 | here's | here's | |
| 227 | hereafter | ||
| 228 | hereby | ||
| 229 | herein | ||
| 230 | hereupon | ||
| 231 | hers | hers | |
| 232 | herself | herself | herself |
| 233 | hi | ||
| 234 | him | him | him |
| 235 | himself | himself | himself |
| 236 | his | his | his |
| 237 | hither | ||
| 238 | hopefully | ||
| 239 | how | how | how |
| 240 | how's | ||
| 241 | howbeit | ||
| 242 | however | however | |
| 243 | hundred | ||
| 244 | i | i | i |
| 245 | i'd | i'd | |
| 246 | i'll | i'll | |
| 247 | i'm | i'm | |
| 248 | i've | i've | |
| 249 | ie | ||
| 250 | if | if | if |
| 251 | ignored | ||
| 252 | immediate | ||
| 253 | in | in | in |
| 254 | inasmuch | ||
| 255 | inc | ||
| 256 | inc. | ||
| 257 | indeed | indeed | |
| 258 | indicate | ||
| 259 | indicated | ||
| 260 | indicates | ||
| 261 | inner | ||
| 262 | inside | ||
| 263 | insofar | ||
| 264 | instead | instead | |
| 265 | into | into | into |
| 266 | inward | inward | |
| 267 | is | is | is |
| 268 | isn't | isn't | |
| 269 | it | it | it |
| 270 | it'd | ||
| 271 | it'll | ||
| 272 | it's | it's | |
| 273 | its | its | its |
| 274 | itself | itself | itself |
| 275 | j | ||
| 276 | just | just | |
| 277 | k | ||
| 278 | keep | ||
| 279 | keeps | ||
| 280 | kept | kept | |
| 281 | know | ||
| 282 | known | ||
| 283 | knows | ||
| 284 | l | ||
| 285 | last | ||
| 286 | lately | ||
| 287 | later | ||
| 288 | latter | ||
| 289 | latterly | ||
| 290 | least | ||
| 291 | less | ||
| 292 | lest | ||
| 293 | let | ||
| 294 | let's | let's | |
| 295 | like | ||
| 296 | liked | ||
| 297 | likely | ||
| 298 | likewise | ||
| 299 | little | ||
| 300 | look | ||
| 301 | looking | ||
| 302 | looks | ||
| 303 | low | ||
| 304 | lower | ||
| 305 | ltd | ||
| 306 | m | ||
| 307 | made | ||
| 308 | mainly | ||
| 309 | make | ||
| 310 | makes | ||
| 311 | many | many | |
| 312 | may | ||
| 313 | maybe | maybe | |
| 314 | mayn't | ||
| 315 | me | me | |
| 316 | mean | ||
| 317 | meantime | ||
| 318 | meanwhile | ||
| 319 | merely | ||
| 320 | might | might | |
| 321 | mightn't | ||
| 322 | mine | mine | |
| 323 | minus | ||
| 324 | miss | ||
| 325 | more | more | more |
| 326 | moreover | ||
| 327 | most | most | most |
| 328 | mostly | mostly | |
| 329 | mr | ||
| 330 | mrs | ||
| 331 | much | much | |
| 332 | must | must | |
| 333 | mustn't | mustn't | |
| 334 | my | my | |
| 335 | myself | myself | myself |
| 336 | n | ||
| 337 | name | ||
| 338 | namely | ||
| 339 | nd | ||
| 340 | near | near | |
| 341 | nearly | ||
| 342 | necessary | ||
| 343 | need | ||
| 344 | needn't | ||
| 345 | needs | ||
| 346 | neither | neither | |
| 347 | never | ||
| 348 | neverf | ||
| 349 | neverless | ||
| 350 | nevertheless | ||
| 351 | new | ||
| 352 | next | next | |
| 353 | nine | ||
| 354 | ninety | ||
| 355 | no | no | no |
| 356 | no-one | ||
| 357 | nobody | nobody | |
| 358 | non | ||
| 359 | none | none | |
| 360 | nonetheless | ||
| 361 | noone | ||
| 362 | nor | nor | nor |
| 363 | normally | ||
| 364 | not | not | not |
| 365 | nothing | nothing | |
| 366 | notwithstanding | ||
| 367 | novel | ||
| 368 | now | ||
| 369 | nowhere | nowhere | |
| 370 | o | ||
| 371 | obviously | ||
| 372 | of | of | of |
| 373 | off | off | off |
| 374 | often | often | |
| 375 | oh | ||
| 376 | ok | ||
| 377 | okay | ||
| 378 | old | ||
| 379 | on | on | on |
| 380 | once | once | |
| 381 | one | ||
| 382 | one's | ||
| 383 | ones | ||
| 384 | only | only | only |
| 385 | onto | onto | |
| 386 | opposite | ||
| 387 | or | or | or |
| 388 | other | other | other |
| 389 | others | others | |
| 390 | otherwise | ||
| 391 | ought | ought | ought |
| 392 | oughtn't | ||
| 393 | our | our | our |
| 394 | ours | ours | ours |
| 395 | ourselves | ourselves | |
| 396 | out | out | out |
| 397 | outside | outside | |
| 398 | over | over | over |
| 399 | overall | ||
| 400 | own | own | own |
| 401 | p | p | |
| 402 | particular | ||
| 403 | particularly | ||
| 404 | past | ||
| 405 | per | per | |
| 406 | perhaps | ||
| 407 | placed | ||
| 408 | please | please | |
| 409 | plus | plus | |
| 410 | possible | ||
| 411 | pp | ||
| 412 | presumably | ||
| 413 | probably | ||
| 414 | provided | ||
| 415 | provides | ||
| 416 | q | ||
| 417 | que | ||
| 418 | quite | quite | |
| 419 | qv | ||
| 420 | r | ||
| 421 | rather | rather | |
| 422 | rd | ||
| 423 | re | ||
| 424 | really | really | |
| 425 | reasonably | ||
| 426 | recent | ||
| 427 | recently | ||
| 428 | regarding | ||
| 429 | regardless | ||
| 430 | regards | ||
| 431 | relatively | ||
| 432 | respectively | ||
| 433 | right | ||
| 434 | round | ||
| 435 | s | ||
| 436 | said | said | |
| 437 | same | same | |
| 438 | saw | ||
| 439 | say | ||
| 440 | saying | ||
| 441 | says | ||
| 442 | second | ||
| 443 | secondly | ||
| 444 | see | ||
| 445 | seeing | ||
| 446 | seem | seem | |
| 447 | seemed | ||
| 448 | seeming | ||
| 449 | seems | ||
| 450 | seen | ||
| 451 | self | self | |
| 452 | selves | selves | |
| 453 | sensible | ||
| 454 | sent | ||
| 455 | serious | ||
| 456 | seriously | ||
| 457 | seven | ||
| 458 | several | several | |
| 459 | shall | shall | |
| 460 | shan't | shan't | |
| 461 | she | she | she |
| 462 | she'd | she'd | |
| 463 | she'll | she'll | |
| 464 | she's | she's | |
| 465 | should | should | should |
| 466 | shouldn't | shouldn't | |
| 467 | since | since | |
| 468 | six | ||
| 469 | so | so | so |
| 470 | some | some | some |
| 471 | somebody | somebody | |
| 472 | someday | ||
| 473 | somehow | ||
| 474 | someone | ||
| 475 | something | ||
| 476 | sometime | ||
| 477 | sometimes | ||
| 478 | somewhat | somewhat | |
| 479 | somewhere | ||
| 480 | soon | ||
| 481 | sorry | ||
| 482 | specified | ||
| 483 | specify | ||
| 484 | specifying | ||
| 485 | still | still | |
| 486 | sub | ||
| 487 | such | such | such |
| 488 | sup | ||
| 489 | sure | ||
| 490 | t | ||
| 491 | t's | ||
| 492 | take | ||
| 493 | taken | ||
| 494 | taking | ||
| 495 | tell | ||
| 496 | tends | ||
| 497 | th | ||
| 498 | than | than | than |
| 499 | thank | ||
| 500 | thanks | ||
| 501 | thanx | ||
| 502 | that | that | that |
| 503 | that'll | ||
| 504 | that's | that's | |
| 505 | that've | ||
| 506 | thats | ||
| 507 | the | the | the |
| 508 | their | their | their |
| 509 | theirs | theirs | theirs |
| 510 | them | them | them |
| 511 | themselves | themselves | themselves |
| 512 | then | then | then |
| 513 | thence | ||
| 514 | there | there | there |
| 515 | there'd | ||
| 516 | there'll | ||
| 517 | there're | ||
| 518 | there's | there's | |
| 519 | there've | ||
| 520 | thereafter | ||
| 521 | thereby | ||
| 522 | therefore | therefore | |
| 523 | therein | ||
| 524 | theres | ||
| 525 | thereupon | ||
| 526 | these | these | these |
| 527 | they | they | they |
| 528 | they'd | they'd | |
| 529 | they'll | they'll | |
| 530 | they're | they're | |
| 531 | they've | they've | |
| 532 | thing | ||
| 533 | things | ||
| 534 | think | ||
| 535 | third | ||
| 536 | thirty | ||
| 537 | this | this | this |
| 538 | thorough | thorough | |
| 539 | thoroughly | thoroughly | |
| 540 | those | those | those |
| 541 | though | ||
| 542 | three | ||
| 543 | through | through | through |
| 544 | throughout | ||
| 545 | thru | ||
| 546 | thus | thus | |
| 547 | till | ||
| 548 | to | to | to |
| 549 | together | together | |
| 550 | too | too | too |
| 551 | took | ||
| 552 | toward | toward | |
| 553 | towards | towards | |
| 554 | tried | ||
| 555 | tries | ||
| 556 | truly | ||
| 557 | try | ||
| 558 | trying | ||
| 559 | twice | ||
| 560 | two | ||
| 561 | u | ||
| 562 | un | ||
| 563 | under | under | under |
| 564 | underneath | ||
| 565 | undoing | ||
| 566 | unfortunately | ||
| 567 | unless | ||
| 568 | unlike | ||
| 569 | unlikely | ||
| 570 | until | until | until |
| 571 | unto | ||
| 572 | up | up | up |
| 573 | upon | upon | |
| 574 | upwards | ||
| 575 | us | ||
| 576 | use | ||
| 577 | used | ||
| 578 | useful | ||
| 579 | uses | ||
| 580 | using | ||
| 581 | usually | ||
| 582 | v | v | |
| 583 | value | ||
| 584 | various | ||
| 585 | versus | ||
| 586 | very | very | very |
| 587 | via | ||
| 588 | viz | ||
| 589 | vs | ||
| 590 | w | ||
| 591 | want | ||
| 592 | wants | ||
| 593 | was | was | was |
| 594 | wasn't | wasn't | |
| 595 | way | ||
| 596 | we | we | |
| 597 | we'd | we'd | |
| 598 | we'll | we'll | |
| 599 | we're | we're | |
| 600 | we've | we've | |
| 601 | welcome | ||
| 602 | well | well | |
| 603 | went | ||
| 604 | were | were | were |
| 605 | weren't | weren't | |
| 606 | what | what | what |
| 607 | what'll | ||
| 608 | what's | what's | |
| 609 | what've | ||
| 610 | whatever | whatever | |
| 611 | when | when | when |
| 612 | when's | ||
| 613 | whence | ||
| 614 | whenever | whenever | |
| 615 | where | where | where |
| 616 | where's | where's | |
| 617 | whereafter | ||
| 618 | whereas | ||
| 619 | whereby | ||
| 620 | wherein | ||
| 621 | whereupon | ||
| 622 | wherever | ||
| 623 | whether | whether | |
| 624 | which | which | which |
| 625 | whichever | ||
| 626 | while | while | while |
| 627 | whilst | ||
| 628 | whither | ||
| 629 | who | who | who |
| 630 | who'd | ||
| 631 | who'll | ||
| 632 | who's | who's | |
| 633 | whoever | ||
| 634 | whole | ||
| 635 | whom | whom | whom |
| 636 | whomever | ||
| 637 | whose | whose | |
| 638 | why | why | |
| 639 | why's | ||
| 640 | will | will | |
| 641 | willing | ||
| 642 | wish | ||
| 643 | with | with | with |
| 644 | within | within | |
| 645 | without | without | |
| 646 | won't | won't | |
| 647 | wonder | ||
| 648 | would | would | would |
| 649 | wouldn't | wouldn't | |
| 650 | x | ||
| 651 | y | ||
| 652 | yes | ||
| 653 | yet | yet | |
| 654 | you | you | |
| 655 | you'd | you'd | |
| 656 | you'll | you'll | |
| 657 | you're | you're | |
| 658 | you've | you've | |
| 659 | young | ||
| 660 | your | your | your |
| 661 | yours | yours | |
| 662 | yourself | yourself | yourself |
| 663 | yourselves | yourselves | |
| 664 | z | ||
| 665 | zero | ||
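A table of this shape can be derived from the individual word lists by taking the sorted union of all words and, for each module, showing the word only where that module's list contains it. The following Python sketch illustrates the idea; the function names `comparison_rows` and `format_row` are hypothetical, and this is not necessarily how the report itself was generated.

```python
def comparison_rows(word_lists):
    """Build rows of [id, cell, cell, ...] from a dict of module name -> words.

    Columns follow the order of the dict. A cell holds the word if the
    module's list contains it, and an empty string otherwise.
    """
    names = list(word_lists)
    sets = {name: set(words) for name, words in word_lists.items()}
    all_words = sorted(set().union(*sets.values()))
    rows = []
    for idx, word in enumerate(all_words, start=1):
        cells = [word if word in sets[name] else "" for name in names]
        rows.append([idx] + cells)
    return rows


def format_row(row):
    """Render one row in pipe-delimited form."""
    return "| " + " | ".join(str(cell) for cell in row) + " |"
```

Rows in which only one column is filled are words unique to a single module, which makes it easy to see at a glance how much larger one list is than the others.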
| Name | Package | Notes |
| 1: AI::Categorizer::Document | AI::Categorizer | Not stand-alone. User may provide a stopword list |
| 2: Blog::Spam::Plugin::stopwords | Blog::Spam | Not stand-alone. Uses hard-coded path to stopword file |
| 3: Combine::Matcher | combine | Not stand-alone. User must provide the stopwords in an (apparently) undocumented fashion |
| 4: DBIx::FullTextSearch::StopList | DBIx::FullTextSearch | Not stand-alone |
| 5: DBIx::TextIndex::StopList::cz | DBIx::TextIndex | Czech-language stop words |
| 6: Elastic::Manual::Analysis | Elastic::Model | Not stand-alone. User may provide a stopword list |
| 7: HTML::Index::Store | HTML::Index | Not stand-alone. User may provide a stopword list |
| 8: Image::WordCloud::StopWords::EN | Image::WordCloud | Not stand-alone |
| 9: KinoSearch1::Analysis::Stopalizer | KinoSearch1 | Not stand-alone |
| 10: KinoSearch::Analysis::Stopalizer | KinoSearch | Not stand-alone |
| 11: Lucy::Analysis::SnowballStopFilter | Lucy | Not stand-alone. Supports 13 languages |
| 12: Perl::Critic::Policy::Documentation::PodSpelling | Perl::Critic | Not stand-alone. Uses Pod::Spell |
| 13: Pod::Weaver::Plugin::StopWords | Pod::Weaver | Not stand-alone. User may provide a stopword list |
| 14: Pod::Wordlist | Pod::Spell | Not stand-alone. Built-in stopword list is Perl-specific |
| 15: Search::Glimpse::Index | Search::Glimpse | Not stand-alone. Also, requires a Glimpse server |
| 16: Search::Indexer::Incremental::MD5 | Search::Indexer::Incremental::MD5 | Not stand-alone. User may provide a stopword list, or use a built-in Perl-specific list |
| 17: Search::Tokenizer | Search::Tokenizer | Has an option to accept the Lingua::StopWords list |
| 18: Search::Tools::QueryParser | Search::Tools | Not stand-alone. User may provide a stopword list |
| 19: Test::Spelling | Test::Spelling | Perl-specific words via Pod::Spell. User may add words |
| 20: Text::DeDuper | Text::DeDuper | User may provide a stopword list |
| 21: Text::Language::Guess | Text::Language::Guess | Uses Lingua::StopWords |
| 22: Text::Similarity::Overlaps | Text::Similarity | Not stand-alone. User must provide a stopword file |
| 23: UMLS::SenseRelate::TargetWord | UMLS::SenseRelate | Not stand-alone. Has option to disregard an (apparently) undocumented list of stopwords |
| 24: WAIT::Filter | WAIT | Apparently contains a built-in list of freeWAIS-sf stopwords |
| 25: WordNet-Similarity | WordNet-Similarity | Not stand-alone. User may provide a stopword file |
Modules are excluded if they are not stand-alone, or if they require the user to supply the stopword list.
Lastly, modules are excluded if they use one of the other modules listed in this report.
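The three exclusion rules can be expressed as a small predicate. The Python sketch below is only an illustration: in this report the decision is recorded by hand through the include flag, so the `Candidate` fields, the `exclusion_reason` function, and the `Example::*` module names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Candidate:
    name: str
    stand_alone: bool = True
    user_supplies_list: bool = False
    uses_module: Optional[str] = None  # another stop word module it relies on


def exclusion_reason(candidate, reported_names):
    """Return why a candidate is excluded, or None if it qualifies."""
    if not candidate.stand_alone:
        return "Not stand-alone"
    if candidate.user_supplies_list:
        return "User must supply the stopword list"
    if candidate.uses_module in reported_names:
        return "Uses " + candidate.uses_module
    return None


# Names of the modules included in this report.
REPORTED = {
    "Lingua::EN::StopWordList",
    "Lingua::EN::StopWords",
    "Lingua::StopWords",
}

# Hypothetical candidates, one per rule, plus one that qualifies.
candidates = [
    Candidate("Example::Indexer", stand_alone=False),
    Candidate("Example::Deduper", user_supplies_list=True),
    Candidate("Example::Guesser", uses_module="Lingua::StopWords"),
    Candidate("Example::WordList"),
]
reasons = {c.name: exclusion_reason(c, REPORTED) for c in candidates}
```

The rules are checked in the order in which they are stated, so a module that fails more than one rule is reported under the first rule it fails.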
| Author | |
| Date | 2012-08-20 |
| OS | Debian V 6.0.4 |
| Perl | 5.14.2 |