Package 'Routliers'

Title: Robust Outliers Detection
Description: Detecting outliers using robust methods, i.e. the Median Absolute Deviation (MAD) for univariate outliers; Leys, Ley, Klein, Bernard, & Licata (2013) <doi:10.1016/j.jesp.2013.03.013> and the Mahalanobis-Minimum Covariance Determinant (MMCD) for multivariate outliers; Leys, C., Klein, O., Dominicy, Y. & Ley, C. (2018) <doi:10.1016/j.jesp.2017.09.011>. The better-known but less robust Mahalanobis distance method is also provided, for comparison purposes only.
Authors: Olivier Klein [aut], Marie Delacre [aut, cre]
Maintainer: Marie Delacre <[email protected]>
License: MIT + file LICENSE
Version: 0.0.0.3
Built: 2025-02-28 05:17:34 UTC
Source: https://github.com/mdelacre/routliers


Data collected the day after the terrorist attacks in Brussels (which took place on the morning of 22 March 2016), assessing the Sense of Coherence, anxiety and depression symptoms of 2077 subjects (1056 were in Brussels during the terrorist attacks, and 1021 were not).

Description

The Sense of Coherence was assessed with the SOC-13 (Antonovsky, 1987), a 13-item questionnaire answered on a 7-point Likert scale. Anxiety and depression were assessed with the HSCL-25 (Derogatis, Lipman, Rickels, Uhlenhuth & Covi, 1974). Subjects indicated on a 4-point Likert scale how much they were bothered or upset by each problem during the last 14 days (1 = not at all; 2 = a little; 3 = quite a bit; 4 = a lot).

Usage

data(Attacks)

Format

A data frame with 2077 rows and 46 variables:

age

age of participants, in years

presencebxl

were participants present in Brussels during the terrorist attacks; 1 = yes; -1 = no

genre

participant gender, 1 = female; -1 = male

soc1

You have the feeling that you do not really care about what goes on around you: 1 = Very rarely or rarely; 7 = Often

soc1r

item1 reversed

soc2

Has it happened in the past that you were surprised by the behaviour of people you thought you knew very well?: 1 = Never; 7 = Always

soc2r

item2 reversed

soc3

Has it happened that people you were counting on disappointed you?: 1 = Never; 7 = Always

soc3r

sense of coherence, item3 reversed

soc4

Until now, your life: 1 = Has had no clear goals or purpose at all; 7 = Has had very clear goals and purpose

soc5

Do you have the feeling that you are being treated unfairly?: 1 = Very often; 7 = Very rarely or never

soc6

Do you have the feeling that you are in an unfamiliar situation and do not know what to do?: 1 = Very often; 7 = Very rarely or never

soc7

Doing the things you do every day is: 1 = A source of pleasure and satisfaction; 7 = A source of deep suffering and boredom

soc7r

item7 reversed

soc8

Do you have confused ideas or feelings?: 1 = Very often; 7 = Very rarely or never

soc9

Do you ever have inner feelings that you would rather not have?: 1 = Very often; 7 = Very rarely or never

soc10

Many people (even those with a strong character) sometimes feel like losers. Have you ever felt this way in the past?: 1 = Never; 7 = Very often

soc10r

item10 reversed

soc11

When something happens, you generally find that: 1 = You overestimate or underestimate its importance; 7 = You see things in the right proportion

soc12

Do you have the feeling that the things you do in your daily life have little meaning?: 1 = Very often; 7 = Very rarely or never

soc13

You have the feeling that you are not sure you can keep yourself under control: 1 = Very often; 7 = Very rarely or never

hsc1

Headache

hsc2

Trembling

hsc3

Tiredness or dizziness

hsc4

Nervousness, inner restlessness

hsc5

Sudden fear for no particular reason

hsc6

Continually fearful or anxious

hsc7

Racing heartbeat

hsc8

Feeling tense, stressed

hsc9

Anxiety or panic attack

hsc10

So restless that it is difficult to sit still

hsc11

Lack of energy, everything goes more slowly than usual

hsc12

Blames oneself easily

hsc13

Cries easily

hsc14

Thinks about killing oneself

hsc15

Poor appetite

hsc16

Sleep problems

hsc17

Feeling of hopelessness when thinking about the future

hsc18

Discouraged, gloomy

hsc19

Feeling of loneliness

hsc20

Loss of sexual interest and desire

hsc21

Feeling of being trapped or held prisoner

hsc22

Restless or worries a lot

hsc23

No interest in anything

hsc24

Feeling that everything is tiring

hsc25

Feeling of being useless

Details

The questionnaire items were administered in French.
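
The reversed items (soc1r, soc2r, soc3r, soc7r, soc10r) can be illustrated with a small sketch. It assumes the usual reverse-scoring rule for a 7-point scale (reversed = 8 - original); the documentation does not state how the reversed columns were actually computed, and reverse_item is a hypothetical helper that is not part of the package.

```r
# Hypothetical helper (not part of Routliers): reverse-score a Likert item.
# Assumes the common rule reversed = (min + max) - x, i.e. 8 - x on a 1-7 scale.
reverse_item <- function(x, min_score = 1, max_score = 7) {
  (min_score + max_score) - x
}

soc1_example <- c(1, 4, 7, 2)   # made-up responses, for illustration only
soc1r_example <- reverse_item(soc1_example)
soc1r_example
```

If this assumption holds for the dataset, a check such as all(Attacks$soc1r == 8 - Attacks$soc1, na.rm = TRUE) would be expected to return TRUE.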


Study five of Rogers, T. & Milkman, K. L. (2016). Reminders through association. Psychological Science, 27, 973-986.

Description

Participants answer many questions in an 11-page survey. For 5 questions (indicated by $$ at the beginning of the question), they are told that there is a correct answer and that they will earn $0.06 if they provide it. At the beginning of the experiment, they are also told that they will earn a $0.60 bonus if they choose answer E on the last question (whether or not it is the correct answer).

Usage

data(Intention)

Format

age

age

choice

Did participants choose to have a reminder? (1 = yes; 0 = no). Note that in conditions 2 and 4, participants had no choice; therefore, 0 is coded for all subjects in these two conditions

Condition

Condition 1 = free-reminder-through-association condition: participants read that they can choose to have (for free) an image of an elephant (presented on screen) that would appear at the bottom of page 11 as a reminder to select answer E; Condition 2 = non condition: no reminders; Condition 3 = costly-reminder-through-association condition: participants read that if they pay $0.03, an image of an elephant (presented on screen) would appear at the bottom of page 11 as a reminder to select answer E; Condition 4 = forced-reminder-through-association condition: participants read that an image of an elephant (presented on screen) would appear at the bottom of page 11 as a reminder to select answer E.

correct

Did participants earn $0.60 bonus? (1 = yes; 0 = no)

dup

No available information

fee_for_reminder

How much was paid for a reminder? ($0.00 or $0.03)

filter_.

No available information

final_problem

Money earned for answering E on the last question: $0.00 (if E was not selected) or $0.60 (if E was selected)

gender

Gender; 0 = male; 1 = female

id

participant id

plus

Money earned at the beginning ($0.06 for all participants)

problem1

First question for which participants earn a $0.03 bonus if they provide the correct answer

problem2

Second question for which participants earn a $0.03 bonus if they provide the correct answer

problem3

Third question for which participants earn a $0.03 bonus if they provide the correct answer

problem4

Fourth question for which participants earn a $0.03 bonus if they provide the correct answer

problem5

Fifth question for which participants earn a $0.03 bonus if they provide the correct answer

Total_Amount_Earned

Intention$final_problem minus Intention$fee_for_reminder. There are 4 possible outcomes: (1) $-0.03, if a reminder was paid for and answer E was not selected on the last question; (2) $0.00, if no reminder was paid for and answer E was not selected on the last question; (3) $0.57, if a reminder was paid for and answer E was selected on the last question; (4) $0.60, if no reminder was paid for and answer E was selected on the last question

Total_Amount_Earned_if.forced.to.pay.for.cue

Equals Intention$Total_Amount_Earned in all conditions but one: in condition 1 (free-reminder-through-association condition), Intention$Total_Amount_Earned_if.forced.to.pay.for.cue = Intention$Total_Amount_Earned - 0.03
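
The four possible values of Total_Amount_Earned follow directly from the subtraction described above. The sketch below enumerates them using only the amounts given in this help page; the object name outcomes is illustrative.

```r
# Enumerate every combination of the two components (amounts taken from the
# variable descriptions above; this does not read the Intention dataset).
outcomes <- expand.grid(final_problem = c(0, 0.60),
                        fee_for_reminder = c(0, 0.03))

# Total_Amount_Earned = final_problem - fee_for_reminder
outcomes$Total_Amount_Earned <- outcomes$final_problem - outcomes$fee_for_reminder

# Sorted from the lowest to the highest possible amount
outcomes[order(outcomes$Total_Amount_Earned), ]
```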


Replication of Experiments Evaluating Impact of Psychological Distance on Moral Judgment (Eyal, Liberman & Trope, 2008; Gong & Medin, 2012) Study 2

Description

For 6 scenarios, participants have to evaluate the wrongness of actions on a scale ranging from 1 (not ok) to 5 (completely ok). Contributors: Biljana Jokic, Iris Zezelj. OSF link: https://osf.io/8wqvc/

Usage

data(Morality)

Format

a data frame with 145 rows and 10 columns

number

participant id

Orig_rep

Is participant English or Serbian?

social_distance

Is the person in the scenario someone participants know (i.e. colleague, neighbor) ?

swing_r

A girl pushing another kid off a swing because she really wants to use it before going home

flag_r

A woman cutting up a national flag into small pieces and using it to clean her house

hands_r

A man eating his food with his hands, like most of his family members, also in public, after he washes them

mother_r

A loving man who promised his dying mother that he would visit her grave every week but didn't keep his promise because he was very busy

kiss_r

Two cousins kissing each other passionately on the mouth, in secret, because they are in love

dog_r

Eating our dog that was hit by a car in front of our house and was killed

mean_judge_r

average judgment across all scenarios


MAD function to detect outliers

Description

Detecting univariate outliers using the robust median absolute deviation

Usage

outliers_mad(x, b, threshold, na.rm)

Arguments

x

vector of values from which we want to compute outliers

b

constant depending on the assumed distribution underlying the data; it equals 1/Q(0.75), where Q(0.75) is the 0.75 quantile of that distribution. When the normal distribution is assumed, the constant 1.4826 is used (it makes the MAD and SD of normal distributions comparable).

threshold

the number of MADs from the median beyond which a value is considered an outlier

na.rm

set whether Missing Values should be excluded (na.rm = TRUE) or not (na.rm = FALSE) - defaults to TRUE

Value

Returns Call, median, MAD, limits of acceptable range of values, number of outliers

Examples

#### Run outliers_mad
x <- runif(150,-100,100)
outliers_mad(x, b = 1.4826,threshold = 3,na.rm = TRUE)

#### Results can be stored in an object.
data(Intention)
res1=outliers_mad(Intention$age)
# Moreover, a list of elements can be extracted from the function,
# such as all the extremely high values,
# That will be sorted in ascending order
#### The function should be applied to a dimension (scale score) rather than to isolated items
data(Attacks)
SOC <- rowMeans(Attacks[,c("soc1r","soc2r","soc3r","soc4","soc5","soc6",
"soc7r","soc8","soc9","soc10r","soc11","soc12","soc13")])
res=outliers_mad(x = SOC)
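
The arguments b and threshold can be illustrated with a minimal base R sketch of the median +/- threshold * MAD rule described by Leys et al. (2013). It is an illustration under stated assumptions, not the package's internal code; the data and the object names (center, mad_value, limits) are made up for the example.

```r
x <- c(2, 3, 3, 4, 4, 5, 5, 6, 40)   # made-up data with one extreme value
threshold <- 3

# b = 1/Q(0.75); for the normal distribution this is about 1.4826
b <- 1 / qnorm(0.75)

# MAD = b * median of the absolute deviations from the median
center <- median(x)
mad_value <- b * median(abs(x - center))

# Acceptable range: median +/- threshold * MAD
limits <- c(lower = center - threshold * mad_value,
            upper = center + threshold * mad_value)

# Values outside the acceptable range
outliers <- x[x < limits["lower"] | x > limits["upper"]]
outliers
```

The value mad_value computed this way matches stats::mad(x, constant = b), which can serve as a cross-check.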

mahalanobis function to detect outliers

Description

Detecting multivariate outliers using the Mahalanobis distance

Usage

outliers_mahalanobis(x, alpha, na.rm)

Arguments

x

matrix of bivariate values from which we want to compute outliers

alpha

nominal type I error probability (by default .01)

na.rm

set whether Missing Values should be excluded (na.rm = TRUE) or not (na.rm = FALSE) - defaults to TRUE

Value

Returns Call, Max distance, number of outliers

Examples

#### Run outliers_mahalanobis
data(Attacks)
SOC <- rowMeans(Attacks[,c("soc1r","soc2r","soc3r","soc4","soc5","soc6","soc7r",
"soc8","soc9","soc10r","soc11","soc12","soc13")])
HSC <- rowMeans(Attacks[,22:46])
res <- outliers_mahalanobis(x = cbind(SOC,HSC), na.rm = TRUE)
# A list of elements can be extracted from the function,
# such as the position of outliers in the dataset
# and the coordinates of outliers
res$outliers_pos
res$outliers_val
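
The classical approach can be sketched with base R alone. The sketch assumes that the cutoff is the chi-square quantile at 1 - alpha with as many degrees of freedom as there are columns; this is a common choice and is not confirmed here as the package's exact rule. The vectors c1 and c2 are the small example vectors used in the plotting examples of this manual.

```r
c1 <- c(1, 4, 3, 6, 5, 2, 1, 3, 2, 4, 7, 3, 6, 3, 4, 6)
c2 <- c(1, 3, 4, 6, 5, 7, 1, 4, 3, 7, 50, 8, 8, 15, 10, 6)
x <- cbind(c1, c2)
alpha <- 0.01

# Squared Mahalanobis distance of each row from the sample mean,
# using the ordinary (non-robust) sample covariance matrix
d2 <- mahalanobis(x, center = colMeans(x), cov = cov(x))

# Assumed cutoff: chi-square quantile with df = number of columns
cutoff <- qchisq(1 - alpha, df = ncol(x))

# Row positions whose squared distance exceeds the cutoff
flagged <- which(d2 > cutoff)
flagged
```

Because the mean and covariance are themselves computed from all rows, extreme observations inflate them, which is why this method is described as less robust than the MCD approach.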

MCD function to detect outliers

Description

Detecting multivariate outliers using the Minimum Covariance Determinant approach

Usage

outliers_mcd(x, h, alpha, na.rm)

Arguments

x

matrix of bivariate values from which we want to compute outliers

h

proportion of dataset to use in order to compute sample means and covariances

alpha

nominal type I error probability (by default .01)

na.rm

set whether Missing Values should be excluded (na.rm = TRUE) or not (na.rm = FALSE) - defaults to TRUE

Value

Returns Call, Max distance, number of outliers

Examples

#### Run outliers_mcd
# The default is to use 75% of the dataset to compute sample means and covariances
# This proportion equals 1 - breakdown point (i.e. h = .75 <--> breakdown point = .25)
# This breakdown point is recommended by Leys et al. (2018)
data(Attacks)
SOC <- rowMeans(Attacks[,c("soc1r","soc2r","soc3r","soc4","soc5","soc6","soc7r",
"soc8","soc9","soc10r","soc11","soc12","soc13")])
HSC <- rowMeans(Attacks[,22:46])
res <- outliers_mcd(x = cbind(SOC,HSC), h = .75)
res

# Moreover, a list of elements can be extracted from the function,
# such as the position of outliers in the dataset
# and the coordinates of outliers
res$outliers_pos
res$outliers_val
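
The role of h can be illustrated with MASS::cov.mcd(), which computes MCD estimates of location and scatter. This is a sketch under assumptions: it is not claimed to reproduce outliers_mcd exactly, and the chi-square cutoff is a common choice rather than a confirmed detail of the package.

```r
library(MASS)   # recommended package shipped with R; provides cov.mcd()

c1 <- c(1, 4, 3, 6, 5, 2, 1, 3, 2, 4, 7, 3, 6, 3, 4, 6)
c2 <- c(1, 3, 4, 6, 5, 7, 1, 4, 3, 7, 50, 8, 8, 15, 10, 6)
x <- cbind(c1, c2)
h <- 0.75
alpha <- 0.01

set.seed(1)   # cov.mcd() may rely on random subsampling for larger datasets
# Robust location and scatter based on a subset of floor(n * h) observations
fit <- cov.mcd(x, quantile.used = floor(nrow(x) * h))

# Squared distances computed from the robust estimates
d2_robust <- mahalanobis(x, center = fit$center, cov = fit$cov)

# Assumed cutoff: chi-square quantile with df = number of columns
cutoff <- qchisq(1 - alpha, df = ncol(x))
flagged <- which(d2_robust > cutoff)
flagged
```

Since the robust center and covariance are estimated from the most concentrated part of the data, extreme rows have little influence on them and tend to receive much larger distances than under the classical approach.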

Plotting function for the mad

Description

plotting data and highlighting univariate outliers detected with the outliers_mad function

Usage

plot_outliers_mad(res, x, pos_display = FALSE)

Arguments

res

result of the outliers_mad function from which we want to create a plot

x

data on which the outliers_mad function was run

pos_display

set whether the position of outliers in the dataset should be displayed on the graph (pos_display = TRUE) or not (pos_display = FALSE)

Value

None

Examples

#### Run outliers_mad and perform plot_outliers_mad on the result
data(Intention)
res=outliers_mad(Intention$age)
plot_outliers_mad(res,x=Intention$age)

### when the number of outliers is small, one can display the outliers position in the dataset
x=c(rnorm(10),3)
res2=outliers_mad(x)
plot_outliers_mad(res2,x,pos_display=TRUE)

Plotting function for the Mahalanobis distance approach

Description

plotting data and highlighting multivariate outliers detected with the mahalanobis distance approach

Usage

plot_outliers_mahalanobis(res, x, pos_display = FALSE)

Arguments

res

result of the outliers_mahalanobis function from which we want to create a plot

x

matrix of multivariate values from which we want to compute outliers. Last column of the matrix is considered as the DV in the regression line.

pos_display

set whether the position of outliers in the dataset should be displayed on the graph (pos_display = TRUE) or not (pos_display = FALSE)

Details

Plots the data and highlights multivariate outliers detected with the Mahalanobis distance approach. Additionally, the plot shows two regression lines: the first is fitted to all data and the second to all observations except the detected outliers. This makes it possible to see how much the outliers influence the regression line.

Value

None

Examples

#### Run plot_outliers_mahalanobis
data(Attacks)
SOC <- rowMeans(Attacks[,c("soc1r","soc2r","soc3r","soc4","soc5","soc6",
"soc7r","soc8","soc9","soc10r","soc11","soc12","soc13")])
HSC <- rowMeans(Attacks[,22:46])
res <- outliers_mahalanobis(x = cbind(SOC,HSC))
plot_outliers_mahalanobis(res, x = cbind(SOC,HSC))
# it's also possible to display the position of the multivariate outliers on the graph
# preferably, when the number of multivariate outliers is not too high
c1 <- c(1,4,3,6,5,2,1,3,2,4,7,3,6,3,4,6)
c2 <- c(1,3,4,6,5,7,1,4,3,7,50,8,8,15,10,6)
res2 <- outliers_mahalanobis(x = cbind(c1,c2))
plot_outliers_mahalanobis(res2, x = cbind(c1,c2),pos_display = TRUE)

# When no outliers are detected, only one regression line is displayed
c3 <- c(1,4,3,6,5)
c4 <- c(1,3,4,6,5)
res3 <- outliers_mahalanobis(x = cbind(c3,c4))
plot_outliers_mahalanobis(res3,x = cbind(c3,c4))
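
The comparison of the two regression lines can be reproduced numerically with lm(). The sketch treats the last column (c2) as the DV, as described for the x argument, and assumes for illustration that row 11 (the one with c2 = 50) is the detected outlier; dat, fit_all and fit_clean are illustrative names.

```r
c1 <- c(1, 4, 3, 6, 5, 2, 1, 3, 2, 4, 7, 3, 6, 3, 4, 6)
c2 <- c(1, 3, 4, 6, 5, 7, 1, 4, 3, 7, 50, 8, 8, 15, 10, 6)
dat <- data.frame(c1, c2)

flagged <- 11   # assumption for this sketch: position of the detected outlier

# Regression line including all data
fit_all <- lm(c2 ~ c1, data = dat)

# Regression line including all observations but the flagged one
fit_clean <- lm(c2 ~ c1, data = dat[-flagged, ])

# Intercepts and slopes side by side
rbind(all_data = coef(fit_all), without_outliers = coef(fit_clean))
```

A large difference between the two slopes indicates that the flagged observations strongly influence the fitted line.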

Plotting function for the MCD

Description

Plots the data and highlights multivariate outliers detected with the MCD function. Additionally, the plot shows two regression lines: the first is fitted to all data and the second to all observations except the detected outliers. This makes it possible to see how much the outliers influence the regression line.

Usage

plot_outliers_mcd(res, x, pos_display = FALSE)

Arguments

res

result of the outliers_mcd function from which we want to create a plot

x

matrix of multivariate values from which we want to compute outliers. Last column of the matrix is considered as the DV in the regression line.

pos_display

set whether the position of outliers in the dataset should be displayed on the graph (pos_display = TRUE) or not (pos_display = FALSE)

Value

None

Examples

#### Run plot_outliers_mcd
data(Attacks)
SOC <- rowMeans(Attacks[,c("soc1r","soc2r","soc3r","soc4","soc5","soc6",
"soc7r","soc8","soc9","soc10r","soc11","soc12","soc13")])
HSC <- rowMeans(Attacks[,22:46])
res <- outliers_mcd(x = cbind(SOC,HSC),na.rm=TRUE,h=.75)
plot_outliers_mcd(res,x = cbind(SOC,HSC))

# it's also possible to display the position of the multivariate outliers on the graph
# preferably, when the number of multivariate outliers is not too high
c1 <- c(1,4,3,6,5,2,1,3,2,4,7,3,6,3,4,6)
c2 <- c(1,3,4,6,5,7,1,4,3,7,50,8,8,15,10,6)
res2 <- outliers_mcd(x = cbind(c1,c2),na.rm=TRUE)
plot_outliers_mcd(res2, x=cbind(c1,c2),pos_display=TRUE)

# When no outliers are detected, only one regression line is displayed
c3 <- c(1,2,3,1,4,3,5,5)
c4 <- c(1,2,3,1,5,3,5,5)
res3 <- outliers_mcd(x = cbind(c3,c4),na.rm=TRUE)
plot_outliers_mcd(res3,x=cbind(c3,c4),pos_display=TRUE)