I have data from hospital on patients deaths. The data is not arranged properly, the data is as in link below
All the dates should in either column DOA (H Column) or DOD (I Column) or MB (J Column) and the rest of text should arranged in a separate column. Can anyone please help me to clean this data, as I have more than 5000 observations. enter link description here
+-------+-----------+-------------+---------------+---------+-------------------+----------------------------------------------------+------------------+---------------------------------+------------+-----------------------+
| Sl.NO | District | State P No | Age In Years | Sex | Symptoms | Co-Morbidities | DOA | DOD | MB Date | Notes |
+=======+===========+=============+===============+=========+===================+====================================================+==================+=================================+============+=======================+
| 10 | X4 | 6553 | 53 | F | Fever | Cold | Cough | Thyroid disease | 10-06-2020 | 20-06-2020 |
+-------+-----------+-------------+---------------+---------+-------------------+----------------------------------------------------+------------------+---------------------------------+------------+-----------------------+
| 11 | X5 | 8872 | 62 | M | Fever | Diabetes Mellitus | 16-06-2020 | 16-06-2020 | 21-06-2020 | |
+-------+-----------+-------------+---------------+---------+-------------------+----------------------------------------------------+------------------+---------------------------------+------------+-----------------------+
| 12 | X5 | 8880 | 55 | M | Pneumonia | Respiratory distress Obese, Chronic Alcoholic | 18-06-2020 | 20-06-2020 | 21-06-2020 | |
+-------+-----------+-------------+---------------+---------+-------------------+----------------------------------------------------+------------------+---------------------------------+------------+-----------------------+
| 13 | X2 | 9149 | 70 | M | Loss of Appetite | Weakness, Hypertension | 18-06-2020 | 18-06-2020 | 21-06-2020 | |
+-------+-----------+-------------+---------------+---------+-------------------+----------------------------------------------------+------------------+---------------------------------+------------+-----------------------+
| 14 | X3 | 9150 | 46 | M | Weakness | Convulsions, Hypertension | 17-06-2020 | 18-06-2020 | 21-06-2020 | |
+-------+-----------+-------------+---------------+---------+-------------------+----------------------------------------------------+------------------+---------------------------------+------------+-----------------------+
| 15 | X4 | 7732 | 60 | Female | Fever | Cough | Breathlessness | uncontrolled Diabetes Mellitus | 17-06-2020 | 22-06-2020 |
+-------+-----------+-------------+---------------+---------+-------------------+----------------------------------------------------+------------------+---------------------------------+------------+-----------------------+
| 16 | X5 | 9237 | 90 | M | Asymptomatic | Hypertension | | 20-06-2020 | 22-06-2020 | Died at his residence |
+-------+-----------+-------------+---------------+---------+-------------------+----------------------------------------------------+------------------+---------------------------------+------------+-----------------------+
You can obtain your desired output using Power Query
, available in Windows Excel 2010+ and Office 365 Excel
Data => Get&Transform => From Table/Range
Home => Advanced Editor
Applied Steps
window, to better understand the algorithm and stepsI have assumed, based on your data sample, that when there is an error, it is because there are three co-morbidities that have been split on the commas.
That being the case, I have tested to see if the DOD
column contains a date.
If that logic doesn't work for all, or if it is in error, it can be easily changed.
すべての処理の魔法は、Table.Group
関数の引数で発生します
Mコード
let
Source = Excel.CurrentWorkbook(){[Name="Table3"]}[Content],
//Replace any blanks ("") with null for easier processing down the road
#"Replaced Value" = Table.ReplaceValue(Source,"",null,Replacer.ReplaceValue,{"Sl.NO", "District", "State P No", "Age In Years", "Sex", "Symptoms", "Co-Morbidities", "DOA", "DOD", "MB Date", "Notes"}),
//Group the table by the first six columns
#"Grouped Rows" = Table.Group(#"Replaced Value", {"Sl.NO", "District", "State P No", "Age In Years", "Sex", "Symptoms"}, {
//Extract the columns to be processed using the logic shown
{"Co-Morbidities", each
if Value.Is([DOD]{0}, type datetime)
then [#"Co-Morbidities"]{0}
else Text.Combine({[#"Co-Morbidities"]{0},[DOA]{0}, [DOD]{0}},", "), type text},
{"DOA", each
if Value.Is([DOD]{0}, type datetime)
then DateTime.Date([DOA]{0})
else if [DOA]{0} = null then null
else DateTime.Date([#"MB Date"]{0}), type date},
{"DOD", each
if Value.Is([DOD]{0}, type datetime)
then DateTime.Date([DOD]{0})
else DateTime.Date([Notes]{0}), type date},
{"MB Date", each
if Value.Is([DOD]{0}, type datetime)
then DateTime.Date([#"MB Date"]{0})
else null, type date},
{"Notes", each
if Value.Is([Notes]{0}, type datetime)
then null
else [Notes]{0}}
})
in
#"Grouped Rows"
この記事はインターネットから収集されたものであり、転載の際にはソースを示してください。
侵害の場合は、連絡してください[email protected]
コメントを追加