Bo Segerman research group

Bioinformatic methods and tools for analysis of variability in large data sets with microbial whole-genome sequences

Whole-genome sequencing has become a standard analysis in microbiology research and is increasingly used in surveillance programs and outbreak investigations. Very large amounts of data are generated. A typical whole-genome sequence can be divided into a conserved "core genome" part and a variable "accessory genome" part that is found in only some isolates. The total gene set in the species or clade is called the pan-genome. A pan-genome can be used as a reference when analyzing variability in sequence data from a large number of isolates. Variability exists at several levels, for example mutations in the core or accessory genome, variability in gene content, variability in gene order, variability in repetitiveness, and variability in mobile genetic elements. Some of the variability in genome data sets also has technical rather than biological causes.
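The relationship between pan-genome, core genome and accessory genome can be illustrated with a minimal sketch. It assumes the strictest possible definition (a core gene is present in every isolate) and that each isolate is already reduced to a set of gene names. The function name, isolate labels and gene lists are hypothetical and chosen only for illustration; they do not describe the group's own tools.

```python
def partition_pan_genome(gene_sets):
    """Split the genes of a collection of isolates into pan, core and accessory sets.

    gene_sets: dict mapping an isolate name to an iterable of gene names.
    Assumption: "core" means present in every isolate (strict definition).
    """
    isolates = [set(genes) for genes in gene_sets.values()]
    if not isolates:
        return set(), set(), set()
    pan = set().union(*isolates)          # every gene seen in any isolate
    core = set.intersection(*isolates)    # genes shared by all isolates
    accessory = pan - core                # genes found in only some isolates
    return pan, core, accessory


# Hypothetical toy data: three isolates with partly overlapping gene content.
isolates = {
    "isolate_A": {"gyrA", "rpoB", "recA", "blaTEM"},
    "isolate_B": {"gyrA", "rpoB", "recA"},
    "isolate_C": {"gyrA", "rpoB", "recA", "tetA"},
}

pan, core, accessory = partition_pan_genome(isolates)
print("core:", sorted(core))
print("accessory:", sorted(accessory))
```

With real data, a gene can appear to be missing from an isolate for technical reasons such as an incomplete assembly, so a strict intersection like this one would tend to underestimate the core genome.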

The research is bioinformatically oriented and focuses primarily on methods for analyzing the large data sets generated in surveillance programs for infectious microorganisms.

Specific focus areas:

  • Bioinformatic methods to analyze different types of variability in large microbiological sequence data sets.
  • Structured storage of data describing microbiological genome variability at different levels.
  • Analysis of variability in genomic regions that assemble poorly.
  • Distinguish between variability caused by technical errors and true evolutionary events.
  • Explore machine learning methods as tools for analyzing large datasets with microbial genomics data.
  • Implement methods/tools that can be used practically in microbial surveillance programs/outbreak investigations to visualize and interpret variability.

Last modified: 2022-02-28