(C) Common Dreams
This story was originally published by Common Dreams and is unaltered.
The Fingerprints of Fraud: Evidence from Mexico’s 1988 Presidential Election [1]
Francisco Cantú
Date: 2019-08-15
The structure of the rest of the paper is as follows. The second section provides a brief contextual background for the 1988 Mexican elections, describing the structural and institutional conditions for this event as well as the main irregularities documented in the literature. The third section defines the conditions in which aggregation fraud is more likely to occur, providing qualitative evidence from the case study. The fourth section describes the methodology and presents the results of the classification of all the images in the database. Using this classification as the dependent variable, the fifth section proposes the theoretical expectations and explores the determinants of this fraud technology. Finally, the sixth section summarizes the findings and provides suggestions for future research.
The final contribution of this article is the documentation of an overlooked electoral irregularity in an oft-cited case that epitomizes how incumbents control non-democratic elections (Chernykh and Svolik 2015; Levitsky and Way 2010; Schedler 2002a). Prior research on the 1988 election in Mexico had focused on its consequences for the country’s gradual democratization process (Bruhn 1997; Eisenstadt 2004; Greene 2007; Magaloni 2006). Nevertheless, to this date, there is little comprehensive evidence of the existence and scope of fraud in this election. This paper analyzes for the first time the results from all the polling stations that were open on July 6, 1988, and it shows that most of the electoral irregularities took place at the district councils.
I document the extent of aggregation fraud in the election by using a novel database with images of more than 50,000 vote tallies available for the election. Applying Convolutional Neural Networks (CNN)—a computer-aided detection system used for image-recognition problems—I identify blatant alterations in about a third of the vote tallies in the country. A complementary analysis suggests that these alterations were more likely to occur in tallies from polling stations where the opposition was absent and in the jurisdictions of governors who had either personal ties to the official candidate or expertise in leading electoral operations for the ruling party.
This paper explores the role of electoral institutions in concealing manipulation using new data on the 1988 presidential election in Mexico. This election is often taken as an example of the way hegemonic parties rely on fraud despite their overwhelming control of the electoral administration. Nevertheless, the methods and scope of electoral manipulation in this event remain unknown. I focus on the opportunities to alter the vote tallies after an electoral reform that allowed district officials to amend the results, preventing any legal objection from the opposition. While these provisions yielded the formal opportunities to manually alter the results, the official candidate’s surprising lack of popularity forced the incumbent party to rely on the governors of each state, who each had the ultimate task of coordinating and monitoring the electoral operation. I analyze the variation of fraud at the sub-national level by considering the governors’ electoral experience and personal ties to the presidential candidate. Working at the interface between formal and informal politics, I look for the constraints and opportunities involved in manipulating the election results during the vote-aggregation process.
In hegemonic party regimes, the dilemma between encouraging electoral competition and trying to curb the outcome is particularly relevant. The stability of these regimes depends upon their capacity to balance concessions to the opposition with the fine control of electoral institutions. However, while incumbent parties can achieve these goals by tailoring electoral rules to their benefit (Díaz-Cayeros and Magaloni 2004; Higashijima and Chang 2015; Levitsky and Way 2010), they often end up relying on fraud (Birch 2012; Little 2015; Rozenas 2015; Simpser 2013). If hegemonic parties contravene the rules they created in the first place, the role of electoral institutions in concealing electoral irregularities is unclear. Do electoral rules in non-democratic regimes shape the opportunities for fraud, or are they a mere façade for electoral manipulation?
Electoral authorities resumed the public vote count 3 days later, on July 10, when the official vote tabulation took place in each of the country’s 300 district councils. Later that day, officials announced the victory of the PRI’s Carlos Salinas with 50.4% of the vote, followed by Cárdenas with 31.1% and Clouthier with 17.1%. These results sparked multiple protests from opposition parties and citizens across the country. The confrontation over the official results, however, gradually weakened in part because of disagreements within the opposition (Gómez Tagle 1990; Magaloni 2010). This allowed the ratification of Salinas’s victory by the Chamber of Deputies on September 10, 1988.
Doubts about the legitimacy of the process escalated on election night after electoral authorities suddenly stopped publishing the results. With only 2% of the vote tallies counted on election night, the preliminary results showed the PRI’s imminent defeat in the Mexico City metropolitan area and a very narrow vote margin between Salinas and Cárdenas (Molinar 1991). These results triggered the anxiety of President Miguel de la Madrid, who, as he acknowledges in his memoirs, instructed election officials to interrupt the public vote count (de la Madrid 2004, 816). A few minutes later, the screens at the Ministry of Interior went blank, an event that electoral authorities justified as a technical problem caused by an overload on telephone lines (Castañeda 2000). Skeptical about the official explanation, opposition representatives urged election officials to continue with the public vote count after finding a computer in the building’s basement that continued to receive electoral results (Valdés Zurita and Piekarewicz 1990). The sudden interruption of public information and the refusal of electoral authorities to release further results caused this incident to be referred to as the “crash of the system,” suggesting that the interruption of the vote count allowed federal election officials in Mexico City to manipulate the final results.
As soon as the voting started on July 6, 1988, opposition parties and news agencies gave accounts of wide-ranging irregularities taking place throughout the country. The incidents included, for example, polling stations opening with an undue delay (New York Times 1988), stolen and stuffed ballot boxes (La Jornada 1988b), and destroyed ballots marked for Cárdenas (Los Angeles Times 1988). Later that day, all opposition candidates signed a letter documenting these and other irregularities (such as absent election officials, inflated voter rolls, and voters casting multiple ballots) and asked election officials to “reestablish the legality of the electoral process” (Cárdenas, Clouthier, and Ibarra 1989).
The 1988 presidential race pitted the PRI’s candidate Carlos Salinas against two main candidates campaigning from opposite sides of the ideological spectrum. Footnote 1 On the left, a number of small parties and civic organizations created the Democratic National Front (FDN) to endorse Cuauhtémoc Cárdenas’s candidacy. Cárdenas, who had led a splinter group out of the PRI a year earlier, aimed his campaign toward an electorate frustrated by declining living standards and governmental corruption (Bruhn 1997). On the right, the National Action Party (PAN) nominated Manuel Clouthier, whose campaign targeted middle-class voters disappointed with the country’s economic policies (Shirk 2001). Facing unequal campaign resources and biased media coverage (Lawson 2002; Reding 1988), both opposition candidates focused on mobilizing the protest vote and emphasizing that a PRI defeat was the first step toward democratizing the country (Domínguez and McCann 1996).
And yet, the most critical weakening factor for the regime may have sprung from within the PRI itself. In the early 1980s, a group of party members with more technical skills than political experience began occupying top positions in the federal administration (Camp 2014). The gradual influence of this group within the party faced hostility from the traditional political bosses, who opposed the new pro-market policies promoted by the government (Langston 2017). The intra-party disagreements escalated in 1987 when a handful of prominent PRI members spoke out against the government’s orthodox measures to deal with the economic crisis and the lack of democracy within the party. When the president and party authorities did not attend to the demands, the dissident group left the PRI a year before the presidential election; this was the most critical split in the party since 1940 (Magaloni 2006).
By the second half of the 1980s, however, the PRI’s invincibility began to wane. The popularity of the official party gradually fell as a new generation of urban citizens, unfamiliar with the country’s economic boom 30 years earlier, reached voting age (Craig and Cornelius 1995). The erosion of the regime’s public support intensified with the financial crisis of the early 1980s, which saw it lose support from popular sectors and business people (Bruhn 1997; Haber et al. 2008). Discontent with the government and the official party became evident during the 1985 legislative election, where the PRI’s vote share dropped to a new low of 64% (Molinar 1991).
For most of the twentieth century, elections in Mexico were an instrument for the official party to “rule perpetually and rule with consent” (Przeworski et al. 2000, 26). Although multiparty elections were held uninterruptedly, a complex system of formal institutions and informal arrangements enabled the Institutional Revolutionary Party (PRI) to win all the Senate, gubernatorial, and presidential elections from 1929 to 1988 (Johnson 1978; Langston 2017; Scott 1964). The strength of the official party relied on the legitimacy gained by competing in elections and the uneven playing field for the opposition parties (Levitsky and Way 2010; Schedler 2002a, 37).
The most straightforward way to verify the validity of these anecdotes and evaluate the prevalence of such alterations would be to compare the votes in every ballot box with the results reported by election authorities. Unfortunately, this comparison turns out to be impossible, as authorities only published the results at the district level and the government destroyed the ballots in 1992 (Magaloni 2006). Nevertheless, a close inspection of the stored tallies for the 1988 election reveals several instances of altered vote numbers, as Figure 1 shows. The examples at the top present crossed-out numbers as well as inconsistencies in ink color and handwriting. Meanwhile, the images at the bottom illustrate those altered tallies involving number insertions that have irregular slants and different pressure. Section C.2 in the Appendix provides additional examples of tallies with blatant alterations that changed the vote totals by significant amounts. The next section presents quantitative evidence for this irregularity and estimates the overall prevalence of the altered tallies in the election.
In polling station number 2, the PRI obtained 232 votes, as it appears in the certified copy provided to the political parties. However, Mr. Carlos Olvera, the president of the Electoral Committee in the District, submitted an apparent altered tally during the official vote count on Sunday the 10th, recording 1,422 instead of 232 votes. (…) In polling station number 3, the PRI actually got 184 votes, but the altered tally gives it 2,488. The real vote tally of polling station number 4 shows 154 votes for the PRI, but the false tally shows 720. Meanwhile, the real number of votes for the Popular Socialist Party was 240 but the false tally gave it only 140 (Senado de la República 1988, 115).
The amendments to the tallies’ vote totals became evident when opposition representatives compared the results they recorded at the polling stations on Election Day with the few official results published at the polling-station level. Consider the following quote from a member of the Popular Socialist Party (PPS) describing the discrepancies between the results recorded by the party representatives at the polling stations and those reported by electoral authorities:
Interviews with two representatives of the Mexican Socialist Party (PMS) in the Federal Electoral Commission (CFE) at the time confirmed this particular story. One of them recalls that the stenographic records in that district described the demand from all opposition parties to examine the discrepancy of the results, but the motion was turned down by the majority of PRI votes at the council. Both representatives later compared the results in the district and found a difference between the total number of votes for president and Congress of more than 70,000 votes. Footnote 5
An official would page through the pile of precinct tallies one by one, calling out in a loud voice—in Spanish, cantando—the votes for each candidate as a secretary wrote the totals onto the district spreadsheet. (…) Each time Salinas’s votes from a precinct were read out loud, the PAN representative complained, the district committee secretary was adding a zero to Salinas’s total on the spreadsheet, changing 73 votes for Salinas to 730 votes, for instance. (p. 172)
The fact that the PRI had the majority of votes in every district council made it impossible for the opposition to prevent any irregularities from occurring during the district tabulation. For example, Preston and Dillon (2004) describe the manipulation of vote tallies in the Second District of Puebla:
Qualitative evidence suggests the way in which aggregation fraud took place during the tabulation of the votes a few days after Election Day. Óscar de Lassé, chief of staff in the Ministry of Interior (1982–8), admits the deliberate suspension of the public vote count, but corroborates that the official results announced by the ministry were based on what they received from the 300 district councils a week after Election Day. In his own words, “if (the results) were amended, those amendments occurred in the district councils, and not in the Ministry of Interior” (Anaya 2008, 263). José Newman, director of the National Electoral Registry in 1988, confirms that the tallies were unavailable to officials in Mexico City before the announcement of the results. He also acknowledges the amendment of the tallies as a common practice at the time. This strategy entailed, for example, having poll workers fill in the tallies while exerting low pressure with their writing instruments so the numbers could be later modified outside the polling stations. Footnote 4
The incentives for aggregation fraud in this election were shaped by an electoral reform in 1987 that shifted the control of the electoral process to the district councils. Footnote 3 On the one hand, the new electoral code recognized for the first time the legal standing of party representatives; expulsion of such representatives from a polling station constituted a reason to nullify the votes of the precinct (Barquín 1987, 52). This addition to the electoral code addressed one of the most reported irregularities since 1940 (Simpser and Hernández Company 2014), and it strengthened the role of opposition parties to monitor the process, witness the tabulation, and document the electoral outcome of the polling stations. On the other hand, the law entitled district-level authorities to modify the results of any voting precinct in their jurisdiction (Klesner 1997, 44). In the case that opposition parties objected to any amendment during the district vote count, the new code also provided the PRI with the default majority of votes in every district council, outnumbering those from the opposition by 12 to 19 seats (Valdés Zurita and Piekarewicz 1990). In other words, the electoral reform gave the district councils the opportunity to recount the results with the assent of the official party, which, unlike the case in many polling stations, had the absolute majority for any decision. As Gómez-Tagle (1993, 87–8) concludes, these conditions suggest that the greatest “adjustments” to the results should occur in the district councils.
Before presenting the evidence of this irregularity for the case study, it is important to understand the institutional context for the opportunities of aggregation fraud in the 1988 election. Beginning at 6 p.m. on Election Day, poll workers counted the ballots and filled the vote tally in the presence of party representatives, who signed and got a carbon copy of the tally sheet. Once the vote count concluded, poll workers delivered the electoral material to one of the country’s 300 district councils, where election officials reported the preliminary results via telephone to the Ministry of Interior in Mexico City (Valdés Zurita and Piekarewicz 1990). Despite the interruption of the national vote count, district councils continued receiving the tallies that were used 3 days later for the official vote tabulation.
The literature on electoral manipulation provides multiple accounts on how aggregation fraud is accomplished. Caro (1991), for example, offers an astonishing description of how the Democratic political machine in southern Texas altered a tally in Jim Wells County to give Lyndon B. Johnson 200 extra votes and flip the result of the 1948 Senate primary election. In a study of the 2003 presidential election in Nigeria, Beber and Scacco (2012) find a similar handwriting style across multiple tally sheets and demonstrate that the last digits in the vote totals significantly deviated from the uniform distribution, a pattern suggesting the alteration of the electoral results. Myagkov, Ordeshook, and Shakin (2009) detail the inflation of vote returns in contemporary Russian elections and describe the incentives for local bosses to falsify the tallies under their jurisdiction. Callen and Long (2015) compare the reported results of a random sample of polling stations at several stages of the 2010 parliamentary elections in Afghanistan and find discrepancies in the vote results in 78% of the observations.
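The last-digit check mentioned above for the Nigerian tallies can be illustrated with a minimal sketch. It is not Beber and Scacco's implementation: it assumes a plain Pearson chi-square test of the last digits against a uniform distribution over 0 to 9, and the function names and the sample vote totals are hypothetical.

```python
from collections import Counter

# Critical value of the chi-square distribution with 9 degrees of freedom
# (ten digit categories minus one) at the 5% significance level.
CHI2_CRITICAL_DF9_05 = 16.919


def last_digit_chi_square(vote_totals):
    """Pearson chi-square statistic of the last digits against a uniform distribution."""
    digits = [abs(int(v)) % 10 for v in vote_totals]
    counts = Counter(digits)
    expected = len(digits) / 10.0  # each digit should appear equally often
    return sum((counts.get(d, 0) - expected) ** 2 / expected for d in range(10))


def looks_non_uniform(vote_totals):
    """True when the last digits deviate from uniformity at the 5% level."""
    return last_digit_chi_square(vote_totals) > CHI2_CRITICAL_DF9_05


# Hypothetical vote totals, for illustration only.
uniform_like = list(range(100, 300))          # every last digit appears 20 times
rounded = [v - v % 10 for v in uniform_like]  # every total ends in 0

print(last_digit_chi_square(uniform_like), looks_non_uniform(uniform_like))
print(last_digit_chi_square(rounded), looks_non_uniform(rounded))
```

A large statistic only signals that the digits are unlikely to be uniform; it does not by itself show who changed the numbers or why.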
While there were multiple irregularities alleged for the 1988 election in Mexico, this paper focuses on identifying the alteration of the vote tallies by officials when the vote totals from polling stations were added up. This irregularity, referred to in other works as aggregation fraud (Callen and Long 2015), is a prevalent problem in many modern elections and is a top concern of election observers and international election experts. Footnote 2 Aggregation fraud is usually performed by a reduced number of middle-level officials with the expertise to carry out manipulations and who have close links with the candidates (Callen and Long 2015). In the case of the 1988 election in Mexico, the existence of this irregularity implies that the vote counts of the PRI’s candidate were inflated at the district councils after electoral authorities received the results from the polling stations and before the officials reported the district vote totals to the Ministry of Interior in Mexico City. The occurrence of fraud in the 1988 election brings into view an overlooked hypothesis for how electoral manipulation was carried out in this case.
ANALYSIS
This section introduces a methodology to identify alterations to the vote results reported in the tally sheets. To accomplish this task, I apply a CNN, a computer algorithm able to learn visual patterns from previously labeled examples and then classify new unlabeled images (LeCun et al. 1990). CNNs emulate the functioning of the brain’s visual system, which transforms sensory information into conceptual understanding. The architecture of CNN models consists of a set of layers, each of which applies nonlinear transformations that extract different features from the image. The first layer receives the image input, the intermediate layers compress multiple representations of the original inputs, and the last layer provides a prediction output (Buduma 2017).
For the specific goal of this paper, the proposed method complements recent developments in electoral forensics, which employs statistical tests to identify anomalous patterns in election data (Mebane 2015). The strength of the approach described below is to identify not only the existence of potential irregularities but also the source behind the oddities in the vote results as well as its geographic location. Furthermore, computerized classification increases the reliability of the labels by not depending on factors such as the coder’s focus or commitment to the task (Hoque, el Kaliobly, and Picard 2009; Grimmer and King 2011). In other words, this approach does away with the potential impatience and inattention of human coders were they to be assigned the tedious exercise of classifying thousands of tallies.
Notwithstanding the CNN’s advantages, it is worth mentioning the limitations of the method. On the one hand, since the model is trained to identify alterations of the vote numbers, it may misclassify cases with non-intentional errors or benign amendments as altered tallies. I mitigate this concern in three ways. First, when training the model, I intentionally include images of tallies with benign adjustments as examples of non-altered tallies. This strategy allows the model to glean the features that distinguish each type of amendment. Second, the label classification takes a conservative approach to minimize the number of false positive cases in the analysis. Finally, I verify the inferences of the model by testing its accuracy on a different database. I describe in detail each of these approaches below.
On the other hand, the irregularities identified by the CNN are not exhaustive. In other words, it can also be the case that the model overlooks irregularities that did not involve any modification of the numbers originally registered in the vote tallies, such as voters casting multiple votes, vote suppression, or the replacement of the original tally. Footnote 6 This approach, therefore, estimates the lower limit for the irregularities that occurred in the election, and its results may complement alternative approaches for analyzing the data.
I describe below the classification of the vote tallies in four stages. First, I collected, organized, and pre-processed the tally images and their respective vote results. Second, I inspected a subset of images and identified those with potential alterations in their numbers. Third, I used the labeled images to train and fine-tune the CNN model. Finally, I used the trained model to label the rest of the images in the database.
Data Collection
This paper presents new data from more than 53,000 polling stations opened on July 6, 1988, whose respective vote tally sheets are stored at the National Archive in Mexico City. The data collection and digitization process produced two databases. The first one contains the images of all the vote tallies from the 1988 election. Footnote 7 With the help of two research assistants, I photographed, digitally edited, and organized by electoral district every vote tally available in the archive. To minimize the noise of the images during the classification stage, I manually cropped every picture to include only the area of the image that contains the vote returns, as the examples in Figure 1 illustrate. The second database includes the vote returns at the polling station level for every candidate. This information was entered by a team of professional data coders and double-supervised by the coding team manager and me. The data-entry process proved impossible for a handful of images with faded writing or inadequate contrast. The total number of observations in the database, thus, is 53,249. As Table A in the Appendix shows, these vote totals are very similar to the official total votes reported at the national and district level. The resemblance validates the information of my database and suggests that any electoral manipulation occurred before officials compiled the results from the vote tallies. Table B in the Appendix provides descriptive statistics of the database.
Data Splitting
The image database was divided into three parts: a training set, a validation set, and a test set. The first two sets came from a sample of 1,050 images that were manually labeled as either “with alterations” or “without alterations,” ending up with 525 images for each class. The training set contains 900 of these images, which I use as inputs to fit the model. The remaining 150 images constitute the validation set, which I use to verify the accuracy of the model. Finally, the test set contains almost 52,300 unlabeled images that help me to estimate the overall rate of aggregation fraud.
The selection of labeled examples follows two common strategies for efficient training: class balance and active learning. The first strategy makes sure that all classes in the training set are represented by a similar number of examples (Buda, Maki, and Mazurowski 2018). Class balance prevents skewing the predictions of the model toward the label with more training instances (Japkowicz and Stephen 2002). This is a recurrent issue in situations where the positive cases represent a minority of all cases, such as detecting cancerous cells (Wahab, Khan, and Lee 2017), locating oil spills (Kubat, Holte, and Matwin 1998), or identifying fraudulent bank operations (Chan and Stolfo 1998). Therefore, the training set includes the same number of instances for the “with alterations” and “without alterations” classes.
The second strategy, active learning, consists of selecting the most useful instances of each class to train the model (Settles 2009). This approach is suitable when the labeled instances are very difficult, time-consuming, or expensive to obtain. The selection of cases was then based on two criteria: informativeness and representativeness.
The former considers how much the instances help the classifier to improve its performance, whereas the latter examines how well the instances represent the overall input patterns of the entire dataset. Informativeness and representativeness are seldom achieved simultaneously, and researchers often need to choose which criterion to prioritize at the cost of the other (Huang, Jin, and Zhou 2014). In this case, I focus on the informativeness of the instances for the “with alterations” class by picking those instances of irregularities that are backed up by primary and secondary sources and that better represent examples of blatant irregularities. In contrast, the selection of cases for the “without alterations” class includes instances of clean tallies that represent the entire database plus some informative examples containing benign alterations.
The selection of instances for the “with alterations” class used information from interviews with the director of the National Electoral Registry in 1988 and two representatives of the PMS during the presidential election, as well as the stenographic record of the debates in the Chamber of Deputies to certify the election (Senado de la República 1988). This information helped me to locate the districts where aggregation fraud had been reported. I then selected those images showing alterations suggested by the primary sources, such as the cross-outs or number insertions illustrated in Figure 1. Therefore, my priority when picking the instances for this class was to choose those most likely to show the model the types of irregularities reported by the witnesses. To address the lack of representativeness of this class, I increased the number of training cases by picking examples from other districts showing similar patterns of manipulation.
The examples labeled as “without alterations” are images of tallies with no flagrant modifications in their numbers.
To make sure that the model only distinguishes deliberate alterations on the tally, this set also includes two types of exceptional cases. First, I incorporate images of tallies showing benign amendments or accidental errors, such as misplaced numbers or marginal corrections to a candidate’s vote totals. These examples force the model to distinguish among different adjustments on the tally. Second, I also include images where a candidate gets all the votes in the polling station but there are no clear patterns of alterations in their numbers. Section C.4 in the Appendix provides a few examples for each case.
I verified the reliability of the labels in two different tests. The first one used crowdsourcing to compare my labels with those provided by 200 respondents recruited through Amazon’s Mechanical Turk (MTurk) for an online survey fielded in February 2017. The survey asked respondents to identify tallies they perceived as altered from a random sample of 10 images. A second check recruited four students at the University of Houston, who were asked to identify altered tallies from a random sample of 50 images. In both tests, subjects were never informed of the labels I had assigned to each of the images. The details of each experiment are available in the Appendix. In both tests, the subjects’ choices show a substantial agreement with the original labeling. Footnote 8
Classifier Training
The training stage consists of repeated passes of the training examples through the network illustrated in Figure 2. Footnote 9 This stage allows the model to absorb the information from the images and calibrate its inferences for each label. The training process comprises three steps: feature extraction, classification, and model evaluation.
Feature Extraction
For the computer to analyze the images, it first transforms each picture into a numerical array of size 227 (height) × 227 (width) × 3 (RGB color channels), where every number in the array represents a specific pixel value of the image. The array passes through a first convolutional layer, which contains 32 filters, or neurons. A filter is also a numerical array, of size 3 × 3 × 3, and represents a basic visual feature, such as a straight line, an edge, or a curve. Each filter slides across every 3 × 3 pixel area of the image searching for shapes similar to the one it represents. At every position, the filter multiplies its array element-wise with the pixel values of the image area and sums the products into a single number. Larger values mark the regions of the image whose shapes resemble the one in the filter. After sliding across each region of the picture, the 32 filters produce the same number of representations of the same input image. The resultant representations are then used as inputs for the second convolutional layer, which also contains 32 filters. These filters slide across each representation searching for more complex features, such as the combination of curves or straight lines. The process repeats through four more convolutional layers, each of them gradually looking for higher-level features of the images in larger regions of the pixel space. The outputs from the last convolutional layer are flattened into a unidimensional vector for the “learning” phase.
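The sliding-filter operation described above can be sketched in a few lines. This is an illustrative pure-Python version, not the author's code: the function name `convolve_valid`, the stride of 1, and the absence of padding are assumptions, since the text does not specify them, and the toy image is far smaller than the 227 × 227 × 3 arrays used in the paper.

```python
def convolve_valid(image, kernel):
    """Slide `kernel` over `image` (stride 1, no padding) and return a 2-D feature map.

    image:  nested lists of shape H x W x C (C color channels per pixel)
    kernel: nested lists of shape kH x kW x C
    """
    height, width = len(image), len(image[0])
    k_h, k_w, channels = len(kernel), len(kernel[0]), len(kernel[0][0])
    feature_map = []
    for top in range(height - k_h + 1):
        row = []
        for left in range(width - k_w + 1):
            # Multiply the filter element-wise with the image patch
            # and sum the products into a single number.
            total = 0.0
            for i in range(k_h):
                for j in range(k_w):
                    for c in range(channels):
                        total += image[top + i][left + j][c] * kernel[i][j][c]
            row.append(total)
        feature_map.append(row)
    return feature_map


# Toy 4 x 5 RGB image: two dark columns on the left, three bright columns on the right.
image = [[[0.0] * 3 if col < 2 else [1.0] * 3 for col in range(5)] for _ in range(4)]

# Hypothetical 3 x 3 x 3 filter that responds to dark-to-bright vertical edges.
vertical_edge = [[[-1.0] * 3, [0.0] * 3, [1.0] * 3] for _ in range(3)]

feature_map = convolve_valid(image, vertical_edge)
for row in feature_map:
    print(row)
```

The feature map is large where the patch contains the dark-to-bright edge and zero where the patch is uniformly bright. Under the same assumptions, a 227 × 227 input and a 3 × 3 filter would yield a 225 × 225 map per filter.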
Classification
This step feeds the extracted image features into a fully connected neural network, which learns the patterns most likely to predict each label. The network learns which features distinguish each category through a procedure called backpropagation (Rumelhart, Hinton, and Williams 1988), which consists of four steps. First, after the image passes through the entire network, the model estimates the probabilities that the tally belongs to each label. Second, the model compares its prediction with the image’s label and estimates its prediction error given a loss function. Third, to minimize the loss, the image passes back through the network, allowing the model to estimate the error derivatives of each unit, that is, the change in the loss as the weight of a hidden unit is modified. Finally, the model updates the weights of the units and repeats the process with the next image in the training set.
For this gradual learning to happen, the model visits the images of the entire training set multiple times; each full pass is called an epoch. After completing every epoch, I check the general accuracy of the model using the images of the validation set. I repeat this process for as many epochs as necessary, until the estimated loss in the validation set stops decreasing.
The model faces two types of misclassification: labeling as “with alterations” those tallies with no clear patterns of manipulation (Error Type I) or labeling as “without alterations” those tallies with potentially altered features (Error Type II). Given the political sensitivity of misclassifying unaltered tallies, I chose to minimize the first error type. In other words, the classifier labels a tally as altered only when its probability of belonging to this category is at least twice its probability of belonging to the non-altered category.
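As a minimal sketch, the two-to-one decision rule can be written as a short function. The name `label_tally` and the `ratio` parameter are hypothetical and introduced only for illustration; the probabilities would come from the trained network.

```python
def label_tally(p_altered, p_unaltered, ratio=2.0):
    """Label a tally as altered only if p_altered is at least `ratio` times p_unaltered."""
    if p_altered >= ratio * p_unaltered:
        return "with alterations"
    # Close calls default to the non-altered label to limit false positives.
    return "without alterations"


print(label_tally(0.70, 0.30))  # with alterations
print(label_tally(0.60, 0.40))  # without alterations
```

If the two probabilities sum to one, this rule is equivalent to requiring a probability of alteration of at least two-thirds, rather than the usual one-half.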
This conservative approach thus labels a tally as “without alterations” when its estimated probabilities are too close to call, which minimizes the number of false positives in the model.
Model Evaluation
I evaluate the predictions of the model using a 20-fold Monte Carlo cross-validation (Johansson and Ringnér 2007). Every fold randomly picks 900 labeled images to train the model and verifies its accuracy using the remaining 150 labeled images. After registering the accuracy of the fold, all images are again randomly assigned to either the training or the validation set, and the model is trained again from scratch. The accuracy is then averaged over folds; the results are shown in Table 1. The overall accuracy rate of the CNN model is 89%, and its accuracy varies across classes: whereas 85% of the tallies with alterations are correctly classified, the accuracy rate for the tallies without alterations is 93%. This difference reflects the priority given to minimizing the number of false positives at the cost of increasing the number of false negatives.
I further validate the model inferences using the tallies for the 2015 legislative election in Mexico. While the vote-counting procedures and technology were very similar to those of the 1988 election, the process differed in its impartiality: poll workers were randomly selected, representatives of all parties witnessed the ballot counting at every polling station, and the reasons to open a ballot box in a district council were stipulated in the electoral code. Moreover, the images of all tallies filled at the polling stations were available online 24 hours after the polls closed. There are no concerns about irregularities during the vote count or the integrity of the tallies.
Therefore, this test can help us infer the rate of false positives that the model produces in a clean election (Footnote 10). I used a computer script to download all the pictures and crop the tally area with the vote numbers (Footnote 11). This pre-processing was necessary to make the images as similar as possible to the training cases. The classifier labels the 2015 tallies as “with alterations” only 5% of the time, which is within the expected measurement error. Many of the misclassified cases correspond to tallies that were slightly misplaced on the website, so the cropped images include features alien to the training set. Figure C.10 in the Appendix shows a few of these examples.
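The Monte Carlo cross-validation used in the model evaluation above can be sketched as follows. This is an illustration under stated assumptions, not the article's code: the function names are hypothetical, the toy labels are synthetic, and `train_majority_baseline` is a placeholder standing in for training the CNN from scratch. Only the split sizes (900 training and 150 validation images) and the 20 repetitions come from the text.

```python
import random
from statistics import mean


def monte_carlo_splits(n_items, n_train, n_folds, seed=0):
    """Yield (train_idx, val_idx) pairs, redrawing a fresh random split each fold."""
    rng = random.Random(seed)
    indices = list(range(n_items))
    for _ in range(n_folds):
        shuffled = indices[:]
        rng.shuffle(shuffled)
        yield shuffled[:n_train], shuffled[n_train:]


def cross_validate(items, labels, train_fn, n_train, n_folds, seed=0):
    """Train from scratch on each split and average the validation accuracy."""
    accuracies = []
    for train_idx, val_idx in monte_carlo_splits(len(items), n_train, n_folds, seed):
        model = train_fn([items[i] for i in train_idx],
                         [labels[i] for i in train_idx])
        correct = sum(model(items[i]) == labels[i] for i in val_idx)
        accuracies.append(correct / len(val_idx))
    return mean(accuracies), accuracies


def train_majority_baseline(train_items, train_labels):
    """Placeholder for CNN training: always predicts the most common training label."""
    majority = max(set(train_labels), key=train_labels.count)
    return lambda item: majority


# Synthetic stand-ins for the 1,050 labeled images (900 + 150 per fold).
items = list(range(1050))
labels = [i % 2 for i in items]

mean_accuracy, fold_accuracies = cross_validate(
    items, labels, train_majority_baseline, n_train=900, n_folds=20)
print(len(fold_accuracies), round(mean_accuracy, 3))
```

Unlike k-fold cross-validation, the validation sets of different folds may overlap here, because every fold draws its split independently.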
[END]
---
[1] Url:
https://www.cambridge.org/core/journals/american-political-science-review/article/fingerprints-of-fraud-evidence-from-mexicos-1988-presidential-election/8F3C1BCA4C53FE85EA48E51321E339E9
Published and (C) by Common Dreams
Content appears here under this condition or license: Creative Commons CC BY-NC-ND 3.0.