# Techniques for data analysis

A.Y. 2020/2021

Learning objectives

The main aim of this course is to provide suitable tools to quantitatively describe one or more phenomena from different domains (e.g. economic, social, political, administrative, historical, juridical).

This can be done by organizing the raw data in frequency tables, representing them graphically and using suitable indices. Furthermore, when the observed data come from (partial) sample surveys, statistical inference is needed: starting from partial knowledge of the phenomenon, the results can be extended to the whole population in probabilistic terms.

For this reason, the course will present the elementary tools of the calculus of probability and of estimation theory.

Basic mathematical notions are essential to understand and apply these statistical tools.

Expected learning outcomes

At the end of this course, students will know and understand the main mathematical and statistical techniques presented in class. They will be able to carry out a descriptive analysis of a dataset, highlighting the main features of the variables of interest. Furthermore, they will be able to draw significant and reasonable conclusions consistent with the context to which the analysis belongs (e.g. economic, social, political, administrative, historical and juridical).

Through several examples based on real data, we will show how to approach an analysis, how to interpret the results and how to draw conclusions consistent with the context to which the variables belong. Students' understanding and their ability to present results will be assessed through an analysis similar to those presented in class. The mathematical and statistical techniques covered will provide the foundations for the further developments and deeper analyses that students may encounter later in their studies.

**Lesson period:**
Second trimester

**Assessment methods:** Exam

**Assessment result:** grade out of 30, officially recorded

Course syllabus and organization

### Single session

The first module of Mathematics will be taught remotely with synchronous lessons through the Microsoft Teams platform.

If necessary due to the health emergency, the second part of the course, on data analysis techniques, will also be offered remotely with synchronous lessons through the Microsoft Teams platform.

The exam will consist of a written test lasting one hour and 15 minutes, with 3 exercises and 6 multiple-choice questions: 1 exercise and 2 questions on the Mathematics part, and the rest on the Data Analysis part. If necessary, the exam may be held online through the exam.net platform.

In December there will be a partial test on the Mathematics part; students who pass it will only need to take the Data Analysis part in the official exam sessions. If necessary, the partial test may be held online through the exam.net platform.

**Course syllabus**

Mathematics - Probability and random variables:

1) Random experiment, outcomes and events, and definition of probability. Some elementary concepts of the calculus of probability. Definition of independent events.

2) Functions of one variable. General concepts and characteristics of some types of function (linear, quadratic, some transcendental functions). Notes on the concepts of limit and continuity. Differential calculus and simple applications. Notes on the concepts of integral and primitive. Examples of calculating definite integrals.

3) Continuous and discrete random variables: probability law; use of derivatives and integrals to define the probability density function and the (cumulative) distribution function; expected value, mode, median and variance of a random variable. Definition of independence between random variables.

4) The central limit theorem and the law of large numbers.

5) The Bernoulli, Binomial and Normal random variables.
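As an illustration only (a sketch under stated assumptions, not taken from the course material), the short Python program below touches on points 3) to 5): it computes the probability law, expected value and variance of a Binomial random variable, seen as the number of successes in n independent Bernoulli trials, compares an exact probability with the Normal approximation suggested by the central limit theorem, and simulates the law of large numbers. The parameters (n = 100, p = 0.5, 10,000 simulated trials, seed 1) and the function names are hypothetical choices made for the example.

```python
import math
import random
from statistics import NormalDist


def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Binomial(n, p): successes in n independent Bernoulli(p) trials."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def binomial_mean_variance(n: int, p: float) -> tuple[float, float]:
    """Expected value n*p and variance n*p*(1 - p) of a Binomial(n, p)."""
    return n * p, n * p * (1 - p)


n, p = 100, 0.5  # hypothetical parameters
expected_value, variance = binomial_mean_variance(n, p)

# Exact P(X <= 55), obtained by summing the probability law
exact = sum(binomial_pmf(k, n, p) for k in range(56))

# Normal approximation suggested by the central limit theorem (with continuity correction)
approx = NormalDist(mu=expected_value, sigma=math.sqrt(variance)).cdf(55.5)

# Law of large numbers: the relative frequency of successes in many
# independent Bernoulli(p) trials settles near p
rng = random.Random(1)
trials = 10_000
rel_freq = sum(rng.random() < p for _ in range(trials)) / trials

print(f"E[X] = {expected_value}, Var[X] = {variance}")
print(f"P(X <= 55): exact {exact:.4f}, Normal approximation {approx:.4f}")
print(f"relative frequency of successes over {trials} trials: {rel_freq:.4f}")
```

The continuity correction (evaluating the Normal distribution function at 55.5 rather than 55) is one common choice when a discrete variable is approximated by a continuous one; the course may present the approximation differently.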

Data Analysis Techniques - Descriptive Statistics:

1) Classification of statistical phenomena (types of characters, i.e. statistical variables) and frequency distributions (absolute, relative and cumulative frequencies).

2) Graphical representations: bar graph, stick graph, frequency histogram.

3) Calculation of the mode, median and sample mean when the data are grouped in a frequency table. Theorems and properties of the mean.

4) Some indices of variability: range, difference and interquartile difference, variance and standard deviation. The coefficient of variation.

5) Contingency tables and bivariate analysis: definitions of absolute and relative joint frequency distributions and of marginal and conditional distributions; Pearson's (chi-square) index of independence; dependence in mean; covariance and the linear correlation coefficient; the simple linear regression model (least squares method; goodness of fit and coefficient of determination; prediction).
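The following Python sketch illustrates some of the descriptive indices listed above on hypothetical data: mode, median, mean, variance, standard deviation and coefficient of variation for data grouped in a frequency table; Pearson's chi-square index for a two-way contingency table; and covariance, linear correlation coefficient, least squares line and coefficient of determination. It assumes variances computed with divisor n and takes the median as the first value whose cumulative relative frequency reaches 0.5; the function names are hypothetical, and the conventions used in the course may differ.

```python
import math


def frequency_table_summary(table: dict[float, int]) -> dict:
    """Descriptive indices for data grouped in a frequency table {value: absolute frequency}."""
    n = sum(table.values())
    mean = sum(x * f for x, f in table.items()) / n
    # Variance with divisor n (a common descriptive-statistics convention)
    var = sum(f * (x - mean) ** 2 for x, f in table.items()) / n
    sd = math.sqrt(var)
    # Mode: the value with the highest absolute frequency
    mode = max(table, key=table.get)
    # Median: first value whose cumulative relative frequency reaches 0.5
    cumulative = 0
    median = None
    for x in sorted(table):
        cumulative += table[x]
        if cumulative / n >= 0.5:
            median = x
            break
    return {"n": n, "mean": mean, "variance": var, "sd": sd,
            "cv": sd / mean, "mode": mode, "median": median}


def pearson_chi_square(counts: list[list[int]]) -> float:
    """Pearson's chi-square index: 0 when the two characters are empirically independent."""
    n = sum(sum(row) for row in counts)
    row_totals = [sum(row) for row in counts]
    col_totals = [sum(col) for col in zip(*counts)]
    chi2 = 0.0
    for i, row in enumerate(counts):
        for j, observed in enumerate(row):
            # Joint frequency expected under independence
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return chi2


def simple_linear_regression(x: list[float], y: list[float]) -> dict:
    """Least squares line y = a + b*x, with covariance, correlation and R^2."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / n
    var_x = sum((xi - mx) ** 2 for xi in x) / n
    var_y = sum((yi - my) ** 2 for yi in y) / n
    r = cov / math.sqrt(var_x * var_y)  # linear correlation coefficient
    b = cov / var_x                     # slope
    a = my - b * mx                     # intercept
    # In simple linear regression with an intercept, R^2 equals r^2
    return {"a": a, "b": b, "cov": cov, "r": r, "r2": r ** 2}


# Hypothetical data, for illustration only
print(frequency_table_summary({0: 4, 1: 10, 2: 8, 3: 3}))
print(pearson_chi_square([[10, 20], [30, 40]]))
fit = simple_linear_regression([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(fit, "prediction at x = 6:", fit["a"] + fit["b"] * 6)
```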

Data Analysis Techniques - Inferential Statistics:

1) Point estimation: definition of an unbiased estimator; the "standard error" as a measure of an estimator's precision; the sample mean and variance; the sample proportion.

2) Confidence intervals (with Normal observations and known or unknown variance). Confidence intervals for a proportion.

3) General definitions of hypothesis testing and of the p-value. Hypothesis testing for means, with Normal observations and known or unknown variance.

4) Hypothesis testing for proportions.
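A minimal Python sketch of the inferential tools listed above, assuming Normal observations for the intervals and tests on the mean and a large-sample Normal approximation for proportions. The data, function names and parameter values are hypothetical. Since the Python standard library has no Student's t distribution, the t quantile is passed in explicitly, as it would be read from statistical tables (about 2.262 for a 95% interval with 9 degrees of freedom).

```python
import math
from statistics import NormalDist, mean, stdev

STANDARD_NORMAL = NormalDist()  # mu = 0, sigma = 1


def z_quantile(level: float) -> float:
    """Quantile z such that P(-z <= Z <= z) = level for Z ~ N(0, 1)."""
    return STANDARD_NORMAL.inv_cdf(1 - (1 - level) / 2)


def ci_mean_known_variance(data: list[float], sigma: float, level: float = 0.95) -> tuple[float, float]:
    """Interval for the mean of Normal observations with known standard deviation sigma."""
    se = sigma / math.sqrt(len(data))  # standard error of the sample mean
    xbar, z = mean(data), z_quantile(level)
    return xbar - z * se, xbar + z * se


def ci_mean_unknown_variance(data: list[float], t_quantile: float) -> tuple[float, float]:
    """Interval for the mean when sigma is unknown; t_quantile comes from t tables with n - 1 d.f."""
    # stdev() is the square root of the sample variance with divisor n - 1
    se = stdev(data) / math.sqrt(len(data))
    xbar = mean(data)
    return xbar - t_quantile * se, xbar + t_quantile * se


def z_test_mean_known_variance(data: list[float], mu0: float, sigma: float) -> tuple[float, float]:
    """Test of H0: mu = mu0 against H1: mu != mu0; returns the statistic and two-sided p-value."""
    z = (mean(data) - mu0) / (sigma / math.sqrt(len(data)))
    return z, 2 * (1 - STANDARD_NORMAL.cdf(abs(z)))


def ci_proportion(successes: int, n: int, level: float = 0.95) -> tuple[float, float]:
    """Large-sample (Normal approximation) interval for a proportion."""
    p_hat = successes / n
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    z = z_quantile(level)
    return p_hat - z * se, p_hat + z * se


def z_test_proportion(successes: int, n: int, p0: float) -> tuple[float, float]:
    """Test of H0: p = p0 against H1: p != p0; returns the statistic and two-sided p-value."""
    p_hat = successes / n
    z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)
    return z, 2 * (1 - STANDARD_NORMAL.cdf(abs(z)))


# Hypothetical data, for illustration only
sample = [9.8, 10.2, 10.4, 9.9, 10.1, 10.3, 9.7, 10.0, 10.5, 10.1]
print(ci_mean_known_variance(sample, sigma=0.3))
print(ci_mean_unknown_variance(sample, t_quantile=2.262))  # t quantile 0.975, 9 d.f.
print(z_test_mean_known_variance(sample, mu0=10.0, sigma=0.3))
print(ci_proportion(60, 100))
print(z_test_proportion(60, 100, p0=0.5))
```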

**Prerequisites for admission**

The Mathematics program covered in any higher education institution provides a sufficient basis for following the course. A general review is recommended.

**Teaching methods**

The lecturers teach at the blackboard, generally without slides. This makes the lectures more interactive and lets them adapt to the needs of the class, both in the pace of presentation and in the depth at which concepts are studied.

Non-attending students can find everything in the reference material indicated (textbook and any handouts on ARIEL).

After each new concept is introduced, several numerical examples are presented to help students fully understand its meaning and practise the calculations.

Students' comments and requests for clarification during lectures and exercise sessions are always welcome, as they make the lectures livelier and more useful for everyone.

**Teaching Resources**

Mathematics - Probability and random variables:

"Introduzione all'inferenza statistica" by Ferrari, Nicolini and Tommasi, Giappichelli Editore - Torino (2009) - CHAPTERS: 1 and 2; in addition, some lecture notes written by the professor will be available on the ARIEL platform.

Data Analysis Techniques - Descriptive Statistics:

Two lecture notes written by the professor will be available on the ARIEL platform.

Data Analysis Techniques - Inferential Statistics:

"Introduzione all'inferenza statistica" by Ferrari, Nicolini and Tommasi, Giappichelli Editore - Torino (2009) - CHAPTERS: 3-4; in addition, a lecture note written by the professor on "point estimation" will be available on the ARIEL platform.

**Assessment methods and Criteria**

The "Mathematics and data analysis" exam consists of a written test lasting one hour and 15 minutes. It comprises 3 exercises and 6 multiple-choice questions on the topics listed in the syllabus (1 exercise and 2 questions for each of its parts). The exam is graded from 0 to 30 and is passed with a score of at least 18.

To carry out the written test you need to bring a calculator with you.

SECS-S/01 - STATISTICS - University credits: 6

SECS-S/06 - MATHEMATICAL METHODS OF ECONOMICS, FINANCE AND ACTUARIAL SCIENCES - University credits: 3

Lessons: 60 hours

Professors:
Manicone Francescopaolo, Tommasi Chiara
