Document clustering software tools | Information retrieval
Document clustering tools group documents by subject so that large, unordered result lists are easier to manage. Clustering typically relies on MeSH terms, UMLS dictionaries, GO terms, titles, affiliations, keywords, authors, standard vocabularies, extracted terms, or any combination of these, including semantic annotation.
Facilitates collaborative, targeted proteomics projects that use the Skyline software. Panorama helps store and organize curated results contained in Skyline documents. These curated results can be aggregated and exported as chromatogram libraries. The software is delivered as part of LabKey Server, an open-source biomedical research data management system, and can be installed locally.
A web-based tool that processes articles retrieved from PubMed and displays multiple aspects of the articles to the user, according to pre-defined categories such as the "most important" words found in titles or abstracts; topics; journals; authors; publication years; and affiliations. Anne O'Tate is an integrated, generic tool for summarization, drill-down and browsing of PubMed search results that accommodates a wide range of biomedical users and needs.
Shares, validates, and documents mock community data resources. mockrobiota includes data set and sample metadata, expected composition data, and links to raw data for each mock community data set. It does not supply physical sample materials directly, but the data set metadata included for each mock community indicate whether physical sample materials are available. The tool currently requires expected observation data in the form of sequence annotations, e.g., taxonomy or gene annotations, but also references sequences in the form of accession numbers.
Performs pattern-based classification using discriminative patterns, which are the prefix paths from the root to the nodes of tree-based models. DPClass performs as well as previous state-of-the-art algorithms, offers strong interpretability by relying on only a very limited number of discriminative patterns, and predicts new data extremely fast. It achieves even better accuracy when using only the top 20 discriminative patterns.
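The prefix-path idea can be illustrated with a minimal Python sketch. It is not DPClass's implementation: the nested-dictionary tree layout, the feature names, and the helper functions `prefix_paths` and `matches` are all hypothetical, and the pattern selection step (such as keeping only the top 20 patterns) is omitted. The sketch only shows how every root-to-node path in a decision tree can be read as a conjunction of conditions and turned into a binary feature.

```python
def prefix_paths(node, prefix=()):
    """Yield the condition sequence leading from the root to every non-root node.

    Assumed (hypothetical) tree layout: an internal node is a dict with
    "feature", "threshold", "left" and "right"; a leaf is a dict with "label".
    """
    if "feature" not in node:  # leaf: no further splits below this node
        return
    f, t = node["feature"], node["threshold"]
    branches = (((f, "<=", t), node["left"]), ((f, ">", t), node["right"]))
    for condition, child in branches:
        path = prefix + (condition,)
        yield path
        yield from prefix_paths(child, path)


def matches(path, sample):
    """Return True if the sample satisfies every condition on the path."""
    return all(
        sample[f] <= t if op == "<=" else sample[f] > t
        for f, op, t in path
    )


# Toy tree with made-up features, for illustration only.
tree = {
    "feature": "age", "threshold": 30,
    "left": {
        "feature": "income", "threshold": 50,
        "left": {"label": "A"},
        "right": {"label": "B"},
    },
    "right": {"label": "C"},
}

patterns = list(prefix_paths(tree))
sample = {"age": 25, "income": 60}
# One binary feature per pattern: 1 if the sample satisfies the pattern.
pattern_features = [int(matches(p, sample)) for p in patterns]
```

In a full pipeline, binary vectors like `pattern_features` would feed a downstream classifier after the most discriminative patterns have been selected.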
Contains a semisupervised multi-instance (MI) learning method for multilabel classification. MLTM is a multilabel topic model that discovers a set of M topics (latent structure in the data) and predicts bag-level and instance-level class labels by learning the associations between class labels and the discovered topics. It uses Markov chain Monte Carlo (MCMC) methods for inference. MCMC methods are typically serial and generally do not scale well to large data sets.
An approach based on the graph clustering algorithm MCL and relevant resources from PubMed. PuReD-MCL avoids using natural language processing (NLP) techniques directly; instead, it takes advantage of existing resources available from PubMed. PuReD-MCL then clusters documents efficiently using the MCL graph clustering algorithm, which is based on graph flow simulation.
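The flow simulation behind MCL can be sketched in a few lines of plain Python. This is a simplified illustration, not PuReD-MCL's code: the function names, the default inflation value of 2.0, the fixed iteration count, and the toy adjacency matrix are assumptions, and the sketch treats nodes as documents and edges as hypothetical "related document" links. It alternates expansion (squaring the column-stochastic matrix, which spreads flow) with inflation (raising entries to a power and renormalizing, which strengthens strong flows and weakens weak ones).

```python
def normalize_columns(m):
    """Rescale each column so that it sums to 1 (column-stochastic matrix)."""
    n = len(m)
    out = [[0.0] * n for _ in range(n)]
    for j in range(n):
        total = sum(m[i][j] for i in range(n))
        for i in range(n):
            out[i][j] = m[i][j] / total if total else 0.0
    return out


def matmul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [
        [sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


def mcl(adj, inflation=2.0, iterations=50, eps=1e-6):
    """Simplified Markov clustering on a symmetric 0/1 adjacency matrix."""
    n = len(adj)
    # Add self-loops, then turn the matrix into a column-stochastic one.
    m = [[float(adj[i][j]) + (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    m = normalize_columns(m)
    for _ in range(iterations):
        m = matmul(m, m)                                  # expansion
        m = [[x ** inflation for x in row] for row in m]  # inflation
        m = normalize_columns(m)
    # Each row that still carries flow defines a cluster of the columns it reaches.
    clusters = set()
    for i in range(n):
        members = frozenset(j for j in range(n) if m[i][j] > eps)
        if members:
            clusters.add(members)
    return sorted(sorted(c) for c in clusters)


# Toy input: documents 0-2 and 3-5 form two groups joined by one link (2-3).
related = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
clusters = mcl(related)
```

Higher inflation values generally produce finer clusters. A production implementation would also use sparse matrices, pruning, and a convergence check rather than a fixed number of iterations.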
Offers a platform dedicated to cluster editing of experimental information. PEACE combines several methods and algorithms, including data reduction techniques and lower and upper bounds. It is standalone software that addresses the NP-complete cluster editing problem and can solve instances on large graphs requiring several thousand edge modifications.
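For readers unfamiliar with cluster editing, the problem asks for the smallest number of edge insertions and deletions that turn a graph into a set of disjoint cliques. The Python sketch below illustrates only the problem definition on a toy graph, using exhaustive search over all partitions. It is not how PEACE works (PEACE relies on data reduction and bounding techniques), and the function names are hypothetical. The search is exponential and only usable for a handful of nodes.

```python
from itertools import combinations


def editing_cost(n, edges, labels):
    """Count edge insertions plus deletions needed so that two nodes are
    adjacent exactly when they share a cluster label."""
    edge_set = {frozenset(e) for e in edges}
    cost = 0
    for u, v in combinations(range(n), 2):
        same_cluster = labels[u] == labels[v]
        has_edge = frozenset((u, v)) in edge_set
        if same_cluster != has_edge:
            cost += 1  # missing edge inside a cluster, or edge between clusters
    return cost


def partitions(n):
    """Yield every partition of n nodes as a list of cluster labels
    (restricted growth strings, so each partition appears exactly once)."""
    def rec(prefix, max_label):
        if len(prefix) == n:
            yield list(prefix)
            return
        for label in range(max_label + 2):
            prefix.append(label)
            yield from rec(prefix, max(max_label, label))
            prefix.pop()
    yield from rec([], -1)


def best_cluster_editing(n, edges):
    """Brute-force optimum: return (minimum cost, cluster labels)."""
    best = min(partitions(n), key=lambda labels: editing_cost(n, edges, labels))
    return editing_cost(n, edges, best), best


# Toy graph: two triangles joined by the single edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4), (4, 5), (3, 5)]
cost, labels = best_cluster_editing(6, edges)
# Deleting the bridge (2, 3) is the only edit needed:
# cost == 1, labels == [0, 0, 0, 1, 1, 1]
```

Because the number of partitions grows as the Bell numbers, practical solvers avoid enumeration and instead shrink the instance with reduction rules and prune the search with lower and upper bounds.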