MonadicLatentSemanticAnalysis


This package provides a monad-like implementation of the following main sequence of Latent Semantic Analysis (LSA) steps:

  1. ingest a collection of documents;
  2. create a document-term matrix (a linear vector space representation);
  3. facilitate the creation of term-paragraph matrices or other breakdowns;
  4. apply different types of term-weighting functions;
  5. extract topics using non-negative matrix factorization (NNMF) or singular value decomposition (SVD) with the required parameters;
  6. provide topic interpretation;
  7. produce the corresponding statistical thesauri;
  8. provide various statistics over the document collection.
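To make steps 2, 4, 5, and 6 more concrete, here is a minimal sketch that uses only built-in Wolfram Language functions. It is not the package's implementation, and it rests on several assumptions: the toy corpus and stop-word list are made up; plain TF-IDF stands in for whichever term-weight functions `LSAMonApplyTermWeightFunctions` applies; and NNMF is computed with the standard Lee-Seung multiplicative update rules, which may differ from the package's own NNMF routine. The names `nnmf`, `dtm`, `weighted`, and `topTerms` are hypothetical.

```mathematica
(* Hypothetical toy corpus standing in for a real document collection. *)
docs = {
   "The economy needs lower tariffs and a stronger economy",
   "Education and welfare programs for every family",
   "Banking reform will help the economy",
   "Freedom requires strong arms and strong allies"};
stopWords = {"the", "and", "a", "for", "will", "every"};

(* Steps 1-2: tokenize, drop stop words, build a dense document-term matrix. *)
tokens = DeleteCases[ToLowerCase[TextWords[#]], Alternatives @@ stopWords] & /@ docs;
terms = Union[Flatten[tokens]];
dtm = Lookup[Counts[#], terms, 0] & /@ tokens;   (* rows: documents, columns: terms *)

(* Step 4: TF-IDF weighting (an assumption; the package offers several weight functions). *)
df = Total[Unitize[dtm]];                         (* number of documents containing each term *)
idf = Log[N[Length[dtm]]/df];
weighted = (# idf) & /@ dtm;

(* Step 5: NNMF with Lee-Seung multiplicative updates, so that weighted ≈ w.h. *)
nnmf[v_?MatrixQ, k_Integer, steps_Integer] :=
  Module[{w, h, eps = 10.^-9},
   w = RandomReal[{0, 1}, {Length[v], k}];
   h = RandomReal[{0, 1}, {k, Last[Dimensions[v]]}];
   Do[
    h = h (Transpose[w].v)/(Transpose[w].w.h + eps);
    w = w (v.Transpose[h])/(w.h.Transpose[h] + eps),
    {steps}];
   {w, h}];

SeedRandom[42];
{w, h} = nnmf[weighted, 2, 200];

(* Step 6: interpret each topic by its three highest-weighted terms. *)
topTerms = Reverse[terms[[Ordering[#, -3]]]] & /@ h
```

Each row of `h` is a topic expressed over `terms`, and each row of `w` holds one document's topic weights. Because the factors start from random values, the topic order and the exact weights depend on the seed and on the number of update steps.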
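
Here is an example that applies the package's pipeline to the presidential nomination acceptance speeches:
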
  (* Get text data. *)
  speeches = ResourceData[ResourceObject["Presidential Nomination Acceptance Speeches"]];
  texts = Normal[speeches[[All, "Text"]]];

  (* Run the main pipeline: document-term matrix, term weighting, and NNMF topic extraction. *)
  res =
    LSAMonUnit[texts]⟹
    LSAMonMakeDocumentTermMatrix[{}, Automatic]⟹
    LSAMonApplyTermWeightFunctions[]⟹
    LSAMonExtractTopics["MinNumberOfDocumentsPerTerm" -> 5, "NumberOfTopics" -> 20, Method -> "NNMF", "MaxSteps" -> 6, "PrintProfilingInfo" -> True];

  (* Show the statistical thesaurus in two different ways: echo the raw value, then print it as a table. *)
  res⟹
    LSAMonExtractStatisticalThesaurus[{"arms", "banking", "economy", "education", "freedom", "tariff", "welfare"}, 6]⟹
    LSAMonRetrieveFromContext["statisticalThesaurus"]⟹
    LSAMonEchoValue⟹
    LSAMonEchoStatisticalThesaurusTable[];
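One way to think about a statistical thesaurus entry is as the nearest neighbors of a term in topic space. The sketch below assumes exactly that: each term is represented by its column in a topics-by-terms matrix `h`, and neighbors are found by cosine distance with the built-in `Nearest`. The matrix values are made up for illustration, and `thesaurusEntry` is a hypothetical helper; the package's `LSAMonExtractStatisticalThesaurus` may use a different representation or distance function.

```mathematica
(* Hypothetical topic matrix: rows are topics, columns are terms. *)
terms = {"economy", "tariff", "banking", "education", "welfare", "arms"};
h = {
   {0.9, 0.8, 0.7, 0.0, 0.1, 0.0},
   {0.0, 0.1, 0.0, 0.9, 0.8, 0.0},
   {0.1, 0.0, 0.2, 0.0, 0.1, 0.9}};

(* Each term's vector is its column of h. *)
termVectors = AssociationThread[terms -> Transpose[h]];

(* Nearest function over the term vectors that returns term labels, using cosine distance. *)
nf = Nearest[Values[termVectors] -> Keys[termVectors], DistanceFunction -> CosineDistance];

(* Hypothetical helper: the n terms closest to a given word, the word itself included. *)
thesaurusEntry[word_String, n_Integer] := nf[termVectors[word], n];

thesaurusEntry["economy", 3]
```

The first entry returned is the query term itself, since its distance to itself is zero.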