Note: This column originally appeared in The Manila Times on June 2, 2020.
Politicians, and anyone who has been involved in a political campaign, already do contact tracing. They go house to house. They know where their voters are and where the opposition's support lies. They build databases of voters and understand the math of assembling a vote base, capturing the undecided and even winning over their opponent's supporters. So it is surprising that, with the coronavirus disease 2019 (Covid-19) pandemic, local government units (LGUs) are unable to determine where their vulnerable populations and persons with physical challenges are.
Now that the National Capital Region (Metro Manila) has shifted to general community quarantine (GCQ), the lead in the battle against Covid-19 shifts to local governments at the level of the barangay (village). The problem with this effort is that there is no mandatory data analyst position for the province or the city. The function is often lodged with the planning or the health department. For some, it remains in the archaic mold of electronic data processing, certainly a thing of the past in an age of networked platforms.
One cannot do contact tracing without appreciating how data is managed. Under the GCQ, contact tracing becomes the weapon to contain the spread. The idea is to limit the contacts of infected people; in doing so, transmission is limited, and isolation and quarantine become the behavioral redefinition needed to battle the virus. Contact tracing is done manually, just as in campaigns. Comparing the contact list with the database of the LGU gives one the strategic sense to visualize risks and deal with them.
Data is the controlling factor today. With data, one can crunch numbers to gain actionable information, and with that, decision-making is made easier since the obvious stands out. The problem with data, though, is that one needs to recognize its kind before one can use it to extract insights. Dirty data leads one to erroneous conclusions, interpretations and inferences. Good data can lead one to many conclusions that aid in solving problems or identifying solutions and options. It can also build scenarios and enhance horizon planning.
Data science is an “interdisciplinary field focused on extracting knowledge from data sets, which are typically large (see big data). The field encompasses analysis, preparing data for analysis and presenting findings to inform high-level decisions in an organization. As such, it incorporates skills from computer science, mathematics, statistics, information visualization, graphic design and business.” Turing Awardee Jim Gray imagined data science as a “fourth paradigm” of science (empirical, theoretical, computational and, now, data-driven). He said “everything about science is changing because of the impact of information technology and the data deluge.” The deluge is staggering; that is why, without a uniform coding book and protocols, analytics will vary from one analyst to another, leading to confusion for a public not trained to appreciate data.
The data deluge is one major problem we have today. Some would often use the phrase garbage in, garbage out, or GIGO. This is true to a certain degree, since whatever is inputted and analyzed influences the outcome. But data management entails quality data, not just picking data from a veritable dump and using it to suit one’s agenda. The poor quality of data has to be taken into consideration, and this is not even at the coding level.
But poorly written source text certainly makes the job of any analyst harder. From GIGO, attempts have been made to consider QIGO, or quality in, garbage out: quality data goes in, yet garbage still comes out. The ideal would be QIQO, or quality in, quality out.
QIQO is a phrase that refers to the fact that the quality of the inputs usually determines the quality of the output. Quality in, quality out is a more optimistic take on the concept of garbage in, garbage out. In practice, it means that as long as the data going into an application or analytical model is good, the resulting work done by the application or model will be accurate.
Take the case of the data dump on Covid-19: one has to clean the data at every point in time. You cannot just copy and paste, since there is no agreed-upon coding book, and yet the hope is that data can be posted in real time. Given 5,000 data points, one can be down to 1,686 clean records after cleaning that can be used for analysis. There is no real time there. Because the Department of Health (DoH) has not defined its coding protocol, different protocols yield different results.
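The cleaning step described above can be sketched in a few lines of Python. This is only an illustration: the field names ("case_id", "age", "residence") are hypothetical, not the actual column headers of any DoH data drop.

```python
import csv
from io import StringIO

# A toy stand-in for a raw data dump, with the kinds of defects that
# cleaning must catch: duplicate rows and records missing key fields.
raw = StringIO(
    "case_id,age,residence\n"
    "C001,34,Quezon City\n"
    "C001,34,Quezon City\n"  # exact duplicate
    "C002,,Manila\n"         # missing age
    "C003,51,\n"             # missing residence
    "C004,29,Makati\n"
)

seen = set()
clean = []
for row in csv.DictReader(raw):
    if row["case_id"] in seen:                  # drop duplicate case IDs
        continue
    if not row["age"] or not row["residence"]:  # drop incomplete records
        continue
    seen.add(row["case_id"])
    clean.append(row)

print(len(clean))  # 2 usable records out of 5 raw rows
```

Out of five raw rows, only two survive, which is the shrinkage the column describes, from 5,000 data points down to 1,686 usable records, in miniature.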
Coding is an analytical process in which data are categorized to facilitate analysis. One purpose of coding is “to transform the data into a form suitable for computer-aided analysis. This categorization of information is an important step, for example, in preparing data for computer processing with statistical software. Prior to coding, an annotation scheme is defined. It consists of codes or tags. During coding, coders manually add codes into data where required features are identified. The coding scheme ensures that the codes are added consistently across the data set and allows for verification of previously tagged data.” Some studies will employ “multiple coders working independently on the same data.”
This also “minimizes the chance of errors from coding and is believed to increase the reliability of data.”
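A shared coding book is easy to picture in code. The sketch below assumes a hypothetical symptom codebook (the codes and terms are invented for illustration): two coders using the same book must produce identical codes, while terms outside the book are flagged for review rather than guessed at.

```python
# A hypothetical coding book mapping free-text entries to agreed codes.
# Synonyms (including Filipino terms) collapse to the same code.
CODEBOOK = {
    "fever": "SYMPT-01",
    "lagnat": "SYMPT-01",
    "cough": "SYMPT-02",
    "ubo": "SYMPT-02",
}

def code_entry(text):
    """Tag one free-text entry; unknown terms are flagged, not guessed."""
    return CODEBOOK.get(text.strip().lower(), "UNCODED")

records = ["Fever", "ubo", "headache"]

# Two coders working independently from the same book agree by construction.
coder_a = [code_entry(r) for r in records]
coder_b = [code_entry(r) for r in records]
agreement = sum(a == b for a, b in zip(coder_a, coder_b)) / len(records)
```

The point is the one the column makes: consistency comes from the scheme, not the coder, and independent coders let you verify it.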
Looking at the problem of the DoH is essentially looking at how government data is made and used. Consider the ZIP code, which could have been adopted early on and would have made coding a lot easier.
The ZIP code is used by the Philippine Postal Corp., or PhilPost, to simplify the distribution of mail. While its function is similar to that of ZIP codes in the United States, its form and usage are quite different. The use of ZIP codes in the Philippines is not mandatory; however, it is highly recommended by PhilPost. Unlike US ZIP codes, our code is a four-digit number representing one of two things: in Metro Manila, a barangay within a city or city district (as in the case of Manila); and outside Metro Manila, a municipality or city. Usually, more than one code is issued for areas within Metro Manila; provincial areas are issued one code for each municipality and city, with some rare exceptions such as Dasmariñas in Cavite, which has three ZIP codes (4114, 4115 and 4126); Los Baños in Laguna, which has two ZIP codes (4030 and 4031 for the University of the Philippines Los Baños); and Angeles, which has two ZIP codes (2009 and 2024 for Barangay Balibago). We should revisit this and make its use mandatory. And then we can build things properly in terms of spatial data.
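To see how a mandatory ZIP code would help spatial data, consider this small sketch. The ZIP-to-area table uses only codes the column itself cites; the case records are invented for illustration. With every record keyed by ZIP, aggregating cases by locality becomes a one-liner.

```python
from collections import Counter

# ZIP codes drawn from the column's own examples.
ZIP_AREAS = {
    "4114": "Dasmariñas, Cavite",
    "4115": "Dasmariñas, Cavite",
    "4126": "Dasmariñas, Cavite",
    "4030": "Los Baños, Laguna",
    "4031": "UP Los Baños, Laguna",
    "2009": "Angeles, Pampanga",
    "2024": "Barangay Balibago, Angeles",
}

# Hypothetical case reports, each tagged with a ZIP code at intake.
cases = ["4114", "4115", "4030", "2024", "4114"]

# Aggregation by locality falls out of the shared spatial key.
by_area = Counter(ZIP_AREAS[z] for z in cases)
print(by_area["Dasmariñas, Cavite"])  # 3
```

Without a standard key like this, every LGU and agency spells place names its own way, and the cleaning problem described earlier returns.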
Covid-19 has shown what big data and a high level of technology can do. Countries have deployed these tools. China tapped into big data, machine learning and other digital tools as the virus spread through the nation in order to track and contain the outbreak. The lessons learned there have continued to spread across the world as other countries fight the virus and use digital technology to develop real-time forecasts and arm healthcare professionals and government decision-makers with intelligence they can use to predict the impact of the coronavirus. QIQO leads to actionable information, and that makes decision-making better as countries battle a phantom virus.