Course introduction
Modern experimental platforms generate large, often noisy data sets that must be processed with appropriate analytical and statistical methods. High-confidence data interpretation relies on the correct application of methods such as statistical modeling and pattern recognition. Furthermore, proper visualization helps in presenting and understanding the results. This course introduces students to the main concepts of biostatistics, data analysis and visualization, so that they understand the principles of designing and applying workflows for a given data type. The course consists of a theoretical and a practical part, with the objective of providing a general understanding of data analysis and of the application of bioinformatics tools.
Among currently available software suites, the R scripting language has become very popular for biostatistics and the analysis of large data sets, as it (i) provides a vast number of statistical tools, (ii) allows the analysis to be adapted to any experimental design, (iii) offers simple commands that operate on entire data sets, (iv) provides a wide range of methods for data visualization and (v) has a large and active community of researchers developing new tools. However, users need to acquire scripting skills to take advantage of its many features.
The course introduces students to basic R scripting, data visualization and the basic statistical models needed to handle data from modern high-throughput experiments.
Content
The course covers the following main topics:
- basic probability
- different types of data modeling
- basic statistical models
- data visualization
- data interpretation
- basic multivariate analysis
Prerequisites
Students taking the course are expected to:
- Have knowledge of statistics
- Understand the basic principles of molecular biology
Learning outcomes
The learning objectives of the course are that the student demonstrates the ability to:
- independently analyze biological data sets
- work with large amounts of data and carry out standard statistical analyses to identify relevant features
- use standard algorithms for multivariate analysis
- design scripts for detailed visualization of their results
- apply tools for data interpretation
- objectively discuss data analysis methods as applied in, e.g., publications