Scripting and Data Analysis in R Language
Code     Completion  Credits  Range  Language
17VSADR  Z           2        2C     Czech
 Course guarantor:
 Lubomír Štěpánek
 Lecturer:
 Lubomír Štěpánek
 Tutor:
 Lubomír Štěpánek
 Supervisor:
 Department of Biomedical Informatics
 Synopsis:

The course is aimed at students interested in the R programming language and environment, and in data science more broadly, since R is widely used for data science applications. R is not only a programming language designed for statistical computing and graphics, but also a Turing-complete, general-purpose programming language suitable for solving complex tasks. The advantages of R over commercial systems such as MATLAB include (i) open-source distribution: R is free both in the sense of costing no money ("free as in beer") and in the sense of placing no restrictions on source code editing or commercial use ("free as in speech"). In addition, (ii) a large online community has formed around R, ready to help and answer users' questions; (iii) R web applications are easy to develop; and (iv) TeX documents can be typeset in a user-friendly way directly from R code. The syntax of the R language is simple, intuitive and quite similar to that of MATLAB. According to recent kaggle.com worldwide statistics, R has become the most popular programming language for data analysis, data science and machine learning; R can be seen as the lingua franca of data science. The class is practice-based and focused on problem-solving, number-crunching exercises and real-data analyses carried out through hands-on R programming and scripting; assigned tasks progress from easy to difficult.
 Requirements:

The course has no formal prerequisites. No prior experience with R is necessary, although some familiarity with procedural or even scripting programming languages such as MATLAB, Octave or Python would be helpful.
 Syllabus of lectures:
 Syllabus of tutorials:

1st: Introduction, installation, overview of R data types and structures; basic operations, numbers, vectors and simple manipulation.
2nd: More on data types in R, data structures and their manipulation. Matrices, data frames, lists.
3rd: Loading external data into R. Saving data from R to a file. Data (pre)processing.
4th: Functions in R. Useful built-in functions. User-defined functions in R.
5th: R as a programming language. Scoping, if statement, loops, for-do, while-do, repeat-until. Warnings. Errors. Flow control. The R apply() function.
6th: Elements of statistics and data analysis in R. Probability distributions. Measures of average and variability. Hypothesis testing in R.
7th: Advanced statistics and data analysis in R. Linear models, including generalized linear models (GLM). Linear regression. Logistic regression. Survival analysis.
8th: Selected advanced statistical methods in R, both linear and nonlinear. Cluster analysis. Discriminant analysis. Time series. Jackknife. Bootstrap.
9th: Selected methods of machine learning in R. Naïve Bayes classifier. Support Vector Machine (SVM). Cross-validation (CV). Principal Component Analysis (PCA). Decision trees. Random forests. Neural networks. Association rules.
10th: Graphical outputs in R. Low-level and high-level graphical commands. Displaying multivariate data. Parameters of plots and diagrams.
11th: Overview of plots and diagrams in R and how to save a plot to a file. Choosing the most appropriate type of chart.
12th: Text processing in R. Handling and processing strings in R. Regular expressions in R. Tokenization, n-gramming. Embedding TeX code within R code. How to include R code, analysis results and plots produced by R in TeX code and typeset a PDF.
13th: Building web applications with R and the Shiny package. Components of a web application built with R. Using HTML, CSS and JavaScript to build an R web application.
14th: Review of the topics covered in the course. End-of-course summary.
 Study Objective:
 Study materials:
 Note:
 Further information:
 No timetable has been prepared for this course
 The course is a part of the following study plans: