I am using pandas to read a text file filled with numerical data. I need to search two columns for user-specified values (a constant integer value and an approximate float value) and return the rows that uniquely identify each dataset. I can currently return the rows containing a specified integer value in a column using:
import pandas as pd
integer = some integer
df = pd.read_csv("...")
array = df[(df[i] == integer)]
This successfully returns all rows containing a particular integer value and assigns them to a DataFrame named "array".
However, I am unable to return rows that contain a specified floating-point value using the same method. It simply returns an empty DataFrame, although I know that a row containing my test value is present in the data.
Additionally, I don't want to search for an exact float value; I need to search for an approximate float value in the columns. For example, if the nominal user-specified value is '.6', the experimental value may actually have been '.59993' or '.60004'. So the user should be able to input '.6' and have Python search for values slightly greater or less than .6.
Here is an example of what I have tried:
import pandas as pd
some_float = .6
df = pd.read_csv("...")
array = df[(df[i] <= some_float+.01 and >= some_float-.01)]
All attempts at using the various operators have resulted in a ValueError: "The truth value of a Series is ambiguous. Use a.empty, a.any() or a.all()." However, this may be partly because the float values are not being read at all. Ultimately, the dataset "extracted" from the initial DataFrame will be uniquely identified by a specified integer value and a float value.
Thanks
I tend to check whether the absolute value of the difference is less than some tolerance:
>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame({"A": np.random.random(1000), "B": np.arange(1000) % 4})
>>> some_float = 0.6
>>> abs_tol = 0.001
>>> df[(df.A - some_float).abs() < abs_tol]
            A  B
66   0.600845  2
180  0.600577  0
922  0.599571  2
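An exact `==` on floats tends to come back empty because the value stored in the file rarely equals the typed literal bit for bit, which is why a tolerance helps. The same tolerance check can also be written with `np.isclose`. Here is a minimal sketch on a small hand-made frame (the data below is made up for illustration, not taken from the question's file):

```python
import numpy as np
import pandas as pd

# Illustrative data only: two values near 0.6, two clearly away from it.
df = pd.DataFrame({"A": [0.59993, 0.60004, 0.61, 0.25], "B": [0, 1, 0, 0]})

some_float = 0.6
abs_tol = 0.001

# np.isclose tests |a - b| <= atol + rtol * |b|; rtol=0 leaves only the
# absolute tolerance in play.
mask = np.isclose(df["A"], some_float, rtol=0, atol=abs_tol)
close_rows = df[mask]
print(close_rows)
```

Setting `rtol=0` is a choice made here so that the result depends on `abs_tol` alone; leaving the default relative tolerance in place would widen the window slightly.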
Or if you want both the float and int comparison:
>>> df[((df.A - some_float).abs() < abs_tol) & (df.B == 0)]
            A  B
180  0.600577  0
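As for the ValueError in the question: Python's `and` asks for a single True/False from a whole Series, which pandas refuses to guess. The element-wise operator is `&`, and each comparison needs its own parentheses because `&` binds more tightly than `<=` and `>=`. A sketch of the range test from the question written that way, wrapped in a hypothetical helper (`select_rows` and its parameter names are my own, and the frame is made-up data):

```python
import pandas as pd


def select_rows(df, int_col, int_value, float_col, float_value, tol=0.01):
    """Rows where int_col equals int_value and float_col lies within tol of float_value."""
    # Two element-wise comparisons combined with &, each in its own parentheses.
    in_range = (df[float_col] >= float_value - tol) & (df[float_col] <= float_value + tol)
    return df[in_range & (df[int_col] == int_value)]


# Illustrative data only.
df = pd.DataFrame({"A": [0.59993, 0.60004, 0.75, 0.6], "B": [0, 1, 0, 2]})
result = select_rows(df, "B", 0, "A", 0.6)
print(result)
```

The range part could equally be written as `df[float_col].between(float_value - tol, float_value + tol)`, which includes both endpoints by default.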