I'm quite new to R and R packages in general. Is there a clean solution to the problem below? I've imported my data in .csv format, as you can see in the following picture:
https://dl.dropboxusercontent.com/u/23801982/1234.jpg
The data are grouped by entity, year and month, and cover the 4 parameters shown in the following columns. I also produced a box plot for one of them, e.g. the Abstractions column:
https://dl.dropboxusercontent.com/u/23801982/1234566.jpg
Now I'm trying to identify the outliers, which I did with the boxplot.stats command.
But because the data are grouped, I don't know how to exclude the outliers from the results and export the remaining data to a new file (e.g. .txt or .csv). I also saw a manual way to calculate them from the IQR, but I don't think it fits the exportable dataset I need.
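For reference, the rule that boxplot.stats applies can be reproduced by hand, which may help reconcile it with the manual IQR approach. This is a minimal sketch on a small made-up vector x (not the data above): boxplot.stats takes the lower and upper hinges from fivenum(), not quantile(), and flags values lying more than coef = 1.5 times the hinge spread beyond either hinge.

```r
# Hypothetical example data: ten ordinary values and one extreme value
x <- c(1:10, 50)

coef <- 1.5                  # default 'coef' argument of boxplot.stats()
h <- fivenum(x)              # Tukey five-number summary; h[2] and h[4] are the hinges
spread <- h[4] - h[2]        # hinge spread, the "IQR" that boxplot.stats() uses
lower <- h[2] - coef * spread
upper <- h[4] + coef * spread

manual_out <- x[x < lower | x > upper]
manual_out                   # 50
boxplot.stats(x)$out         # 50, the same value
```

Because IQR() and quantile() use a different default interpolation than fivenum(), fences built from IQR(x) can flag slightly different points for some sample sizes.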
The code I used so far is:
rm(list = ls())
library("gdata")
s1 <- read.csv("C:\\Users\\G\\Documents\\R\\Projects\\20141125.csv", header = TRUE)
boxplot(s1$Abstractions ~ s1$Entity, col = "green", srt = 45)
boxplot.stats(s1$Abstractions)
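One possible way to see which entity each outlier belongs to, instead of pooling all rows, is to use the value that boxplot() itself returns: with plot = FALSE it draws nothing and returns a list whose out component holds the outlying values and whose group component gives the index of the box each one came from. The data frame below is a made-up stand-in for s1 with only the Entity and Abstractions columns; the numbers are illustrative, not from the original file.

```r
# Hypothetical stand-in for s1 (only the two columns used here)
s1 <- data.frame(
  Entity       = rep(c("A", "B"), each = 6),
  Abstractions = c(10, 11, 12, 13, 12, 100,   # entity A: 100 is unusual
                   50, 52, 51, 53, 49, 5)     # entity B: 5 is unusual
)

# Same formula as the plot, but only compute the statistics
b <- boxplot(Abstractions ~ Entity, data = s1, plot = FALSE)

# One row per outlier, labelled with the entity (box) it belongs to
outliers <- data.frame(Entity = b$names[b$group], Abstractions = b$out)
print(outliers)
```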
Thank you
You are looking at the right function, boxplot.stats.
To see what a function in R does, you can use
?functionName
So try
?boxplot.stats
and you will see that it returns the outlier values in a component called out:
Value:
List with named components as follows:
stats: a vector of length 5, containing the extreme of the lower
whisker, the lower ‘hinge’, the median, the upper ‘hinge’ and
the extreme of the upper whisker.
n: the number of non-‘NA’ observations in the sample.
conf: the lower and upper extremes of the ‘notch’ (‘if(do.conf)’).
See the details.
out: the values of any data points which lie beyond the extremes
of the whiskers (‘if(do.out)’).
Note that ‘$stats’ and ‘$conf’ are sorted in _in_creasing order,
unlike S, and that ‘$n’ and ‘$out’ include any ‘+- Inf’ values.
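A quick check of the components described above, using a made-up vector that includes one NA so the behaviour of n is visible:

```r
x <- c(1:10, 50, NA)   # hypothetical sample: one extreme value and one missing value
bs <- boxplot.stats(x)

bs$stats   # whisker ends, hinges and median: 1.0 3.5 6.0 8.5 10.0
bs$n       # 11: the NA is not counted
bs$out     # 50, the only value beyond the whiskers
```

Note that the upper whisker end in stats is 10, the largest non-outlying value, not 50.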
So, to remove the outliers, you can do something like this:
outliersValue <- boxplot.stats(x)$out
x[!x %in% outliersValue]
where x is your data as a numeric vector (e.g. s1$Abstractions).
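Since the goal is to export a dataset rather than a bare vector, the same test can index the rows of a data frame, so the other columns stay aligned with the filtered values. A minimal sketch, assuming a hypothetical data frame s1 with made-up values:

```r
# Hypothetical data frame standing in for s1
s1 <- data.frame(
  Entity       = rep("A", 6),
  Abstractions = c(10, 11, 12, 13, 12, 100)
)

x <- s1$Abstractions
outliersValue <- boxplot.stats(x)$out       # 100

# Vector version, as above
x_clean <- x[!x %in% outliersValue]

# Data-frame version: drop whole rows whose value is an outlier
s1_clean <- s1[!s1$Abstractions %in% outliersValue, ]
```

Because the match is on values, an outlying value that also occurs elsewhere in the column is removed there too.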
The %in% operator checks, for each element of x, whether it exists in outliersValue. The ! operator negates that result, so in this case the expression returns TRUE for the elements of x that are not found in outliersValue, and only those are kept.
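A small made-up example of the two operators on their own. Note that %in% binds more tightly than !, so !x %in% outliersValue means !(x %in% outliersValue).

```r
x <- c(3, 7, 7, 42)      # hypothetical values
outliersValue <- 42

x %in% outliersValue      # FALSE FALSE FALSE  TRUE
!x %in% outliersValue     #  TRUE  TRUE  TRUE FALSE
x[!x %in% outliersValue]  # 3 7 7
```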
I hope you find this useful. Happy R-ing
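For the grouped case in the question, one possible extension (not part of the answer above) is to apply the same test separately within each entity with ave() and then write the remaining rows out with write.csv(). The data frame and the output file name below are hypothetical; in practice s1 would be the data read with read.csv().

```r
# Hypothetical stand-in for s1
s1 <- data.frame(
  Entity       = rep(c("A", "B"), each = 6),
  Abstractions = c(10, 11, 12, 13, 12, 100,
                   50, 52, 51, 53, 49, 5)
)

# TRUE for rows that are NOT outliers within their own Entity.
# ave() returns a numeric vector, so the logical result is converted back.
keep <- as.logical(ave(
  s1$Abstractions, s1$Entity,
  FUN = function(v) !v %in% boxplot.stats(v)$out
))

s1_clean <- s1[keep, ]

# Hypothetical output path; replace with a path in your project folder
out_file <- file.path(tempdir(), "abstractions_no_outliers.csv")
write.csv(s1_clean, out_file, row.names = FALSE)
```

With these made-up numbers, the pooled call boxplot.stats(s1$Abstractions)$out finds no outliers at all, while the per-entity version drops 100 from A and 5 from B. Further grouping vectors (e.g. year and month columns) can be passed to ave() before FUN, and write.table(s1_clean, file, sep = "\t", row.names = FALSE) would give a tab-separated .txt file instead.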