The Positive Disruption of Big Data Jury Venire Selection

Seminar Paper, 2019

23 Pages, Grade: 1,0

Maike Heideke (Author)


Table of Contents

1 Introduction

2 Key Concepts
2.1 Big Data
2.2 Algorithm
2.3 Positive Disruption

3 Big Data Jury Venire Selection
3.1 Context and Outline of Hypothetical Jury Venire Selection Algorithm
3.2 Social Good – The “Bright Side”
3.3 A “Dark Side” Diagnosis of Algorithmic Big Data Jury Venire Selection
3.3.1 Computational Violations of Privacy
3.3.2 Information Asymmetry and Lack of Information
3.3.3 Social Exclusion and Discrimination
3.4 Prescription for Positive Disruption
3.5 Evaluation and Summary

4 Conclusion



Appendix A

This paper looks at an algorithmically-led decision process that is designed to select an (almost) perfectly demographically representative cross-section for a Jury Venire using Big Data. Using the framework of Lepri et al. (2017), this paper identifies that “dark sides” such as privacy violations, informational opacity and discrimination are likely to apply to such a (yet) hypothetical Big Data Jury Venire selection process. In answer to the question of how this selection process could be positively disrupted, this paper finds that policies akin to those of Lepri et al. (2017) would address the majority of the problems identified. Further research will be required to illuminate further potential dark sides, to define more general, positively disruptive policies, and to specify policy suggestions.

Keywords: Algorithm – Big Data – Positive Disruption – Jury Selection – Privacy – Transparency – Discrimination

1 Introduction

In recent cases, algorithmically-led decision processes have proven their potential to harm, and have spurred discussion regarding ethical use, fairness and legality (compare Future of Privacy Forum 2017: 3). Those problematic cases involved, for example, criminal risk assessments used in parole decisions (Monahan and Skeem 2014: 158) and predictive policing (Executive Office of the President 2016: 20-21). Already, algorithmically-led decision processes are expanding to credit scoring based on social network data (Wei et al. 2018: 234), and in the future potentially to the selection of the Jury Venire for US courts (Ferguson 2016: 936, 942, 1006).

Focusing on data-driven decision-making around social good provision, Lepri et al. (2017: 13-17) propose a set of factors that enable a positive disruption of algorithmically-led decision processes. This paper aims to augment Lepri et al.’s (2017) investigation by adding the case of Big Data Jury Venire selection, as laid out by Ferguson (2016), to the cases they examine. Accordingly, this paper will evaluate whether Lepri et al.’s (2017: 13-17) set of policies would suffice to positively disrupt the Jury Venire selection algorithm and would address the potential harms at hand. The research question posed is:

How can Big Data Jury Venire Selection be positively disrupted?

Consequently, this paper starts off by clarifying underlying key concepts, before moving on to introducing and then analysing the (yet) hypothetical Big Data Jury Venire selection algorithm in the mode of Lepri et al. (2017: 4, 8, 9-13). This includes locating the function and type of the algorithmically-led selection process, its key areas of impact and the problem it seeks to address, as well as assessing whether and to what extent “dark sides” such as discrimination prevail. Identifying the applicable “dark sides” is followed by an examination of whether the policies presented by Lepri et al. (2017: 13-17) might also positively disrupt the Big Data Jury Venire selection process.

The finding of this paper is that the identified potential harms of a Big Data Jury Venire selection process could be positively disrupted by policies similar to the ones suggested in Lepri et al. (2017: 13-17). However, problems of design and the need for public explanation of the process will have to be addressed, as otherwise those problems will persist in the selection process.

2 Key Concepts

This section will clarify key concepts, beginning with Big Data, which leads on to algorithms, followed by positive disruption in the sense of Lepri et al. (2017).

2.1 Big Data

Big Data is a term that requires clarification as it is not uniformly employed, even in the literature presented within this paper.

Ferguson (2016: 935) links Big Data to newly available data sets that log intricate details of individual digital action. Further, Ferguson (2016: 959, 961) singles out “Bright” Data as a subset of Big Data which has a narrower focus on personally identifiable individuals as well as groups. Ferguson (2016: 963) points to findings that personal transactional data can reveal personal preferences and other specifically private information like health and location. What is more, Ferguson (ibidem) connects Big Data with the technologies necessary to generate and collect it: Big Data only has value through processing technologies that read out the searchable data trails that highly digitally connected individuals leave behind (Ferguson 2016: 959-960).

Boyd and Crawford (2012: 663) provide a concept of Big Data that encompasses the former aspects: Big Data as a phenomenon, shaped by technological possibilities, the analytical approach, and mythology.

2.2 Algorithm

Mythology around Big Data – that is, the belief that larger sets of data offer greater insights and have an “aura of truth” (Boyd and Crawford 2012: 663) – combined with an increasing demand for evidence-based decisions (Lepri et al. 2017b: 612), might explain the temptation for decision-makers to employ tools like algorithms that promise to draw upon Big Data.

As with Big Data, there is no single definition of what constitutes an algorithm. However, Kurgalin and Borzunov (2018: 329-330, see Appendix A) point out that all definitions explicitly or implicitly point to five properties of an algorithm that were first summarised by Markov (as cited in ibidem):

Firstly, an algorithm is a “problem-solving process”. This implies that an algorithm is set up in a scenario where there is a defined problem that is sought to be solved. For this task, an algorithm is designed to execute consecutive, separate steps. The rest of the Markov properties could be seen as an outline of good practice in coding: uniquely defined operations, no superfluous commands, as simple and short as possible. However, the property “directedness” shows that the algorithm architect should know what the result will look like.

Algorithms might employ data to determine some already realised property – for example guilt or innocence in a court scenario – or to extrapolate and make predictions, such as risk assessments. Hybrid forms – algorithms that are both backward- and forward-looking – are increasingly used as supplementary information in US courts (Hyatt and Bergstrom 2011: 266; Monahan and Skeem 2014: 158).1

Ferguson (2016: 959) categorises the output as insights that segment, target and predict.2 More generally, the output generated by an algorithm can be distinguished as classifying, prioritising, associating or filtering (Diakopoulos 2015: 400-402).3 A typology need not be distinct and exclusive; as in this paper’s case (see section 3.1), an algorithm can combine several of these functions that work hand in hand.

This paper limits its investigation to static algorithms, as distinct from machine learning.

2.3 Positive Disruption

Lepri et al. (2017: 13) see the benefits that Big Data and the respective algorithms hold by enabling, for example, policy design and implementation that would benefit a broader majority of the population. However, they (ibidem) also acknowledge the potential harm that data-driven social good decisions can bring. As a response to this tension, Lepri et al. (ibidem) suggest a positive disruption approach to Big Data-driven decisions. Positively disrupting algorithmically-led decisions means implementing data policies to ensure the beneficial functions that algorithms can provide, whilst keeping harms within bounds.

Their positively disruptive policy prescriptions are human-centric, as humans are both the actors and the subjects in the scenarios they consider. The same applies to this paper’s focus on the Jury Venire selection for (US) courts in civil and criminal trials. Their policy suggestions will be discussed in greater depth in section 3.4 of this paper. Specifically, this paper seeks to evaluate whether their policy catalogue would positively disrupt the “dark sides” of the Big Data Jury Venire selection process identified in section 3.3.

3 Big Data Jury Venire Selection

This section provides an outline of the hypothetical Jury Venire selection algorithm in the sense of Ferguson (2016). This algorithm will be placed into context, as well as into relation to social goods. Thereafter, this section will look at the three “dark sides” that Lepri et al. (2017) identify, and check to what extent they apply to the Jury Venire selection algorithm. This illumination of the potential “dark sides” will be followed by the respective policy prescriptions as brought forward by Lepri et al. (2017). The investigation in this section will close by summing up the results to answer the research question of how a Big Data Jury Venire selection algorithm can be positively disrupted.

3.1 Context and Outline of Hypothetical Jury Venire Selection Algorithm

Why is there a jury in some court cases in the first place? In brief, a jury is commonly seen to reflect a community conscience, which ensures not only legal but moral legitimacy of the verdict (Ferguson 2016: 978). Further, the jury is a control on judiciary power, and helps form civic thinking (ibidem: 977-978).

So far, the Jury Venire has been composed by randomly selecting a pool of individuals from the list of registered voters – sometimes supplemented by lists from driver’s licenses, tax information and similar sources – and informing them by sending out letters to the associated addresses (Ferguson 2016: 943-944). Those lists are incomplete, and often contain outdated information (ibidem: 956-957).

This is the step where a Big Data-fed algorithm could step in:

The algorithm would use, for example, insurance data for more up-to-date addresses, and then filter out individuals who are not eligible (such as people who were convicted in the past). The selection of this Jury Venire could even allow controlling for (debatably) desirable properties of this jury, such as being demographically representative. The following step – choosing the jury from this Jury Venire by eliminating jurors in a so-called Voir Dire – would remain in the hands of lawyers.

The typology of the algorithm with respect to its function in the sense of Diakopoulos (2015: 401-402; compare section 2.2) can be placed as follows: The first step of this selection algorithm represents a classification algorithm, that is, an algorithm that categorises or segments entities according to their features. From the resulting classified data set, the demographic distribution, and hence the necessary relative representation of each segment, can be determined. Thereafter, the classified data is filtered in the sense of Diakopoulos (2015: 402); this step could include filtering out ineligible individuals. From this classified and filtered data, jurors are summoned by randomly drawing the respective relative number of jurors from each segment. The result would be an (almost) perfectly demographically representative cross-section as a Jury Venire. The specification of the segments – the demographically relevant characteristics – would need to be set by the architects of the algorithm. This algorithm would be static, and if information deemed relevant had to be inferred from the available data points, this algorithm would be backward-looking (see section 2.2). This hypothetical segmenting selection algorithm, henceforth referred to as the Big Data Jury Venire selection process, will be the centre of discussion in sections 3.3–3.4.
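
The three steps just described (classification, filtering, proportional random draw) can be sketched in code. The following is only a minimal illustration under stated assumptions, not an implementation proposed by Ferguson (2016) or Diakopoulos (2015): the function name select_venire, the callbacks segment_of and is_eligible, the record fields, and the largest-remainder rounding of seats are all hypothetical choices made for this example.

```python
import random
from collections import defaultdict


def select_venire(population, segment_of, is_eligible, venire_size, seed=None):
    """Classify, filter, then draw proportionally from each segment.

    All names and the rounding rule are illustrative assumptions.
    """
    rng = random.Random(seed)

    # Step 1 (classification): group every person by demographic segment.
    segments = defaultdict(list)
    for person in population:
        segments[segment_of(person)].append(person)

    # Seats per segment follow the segment's share of the whole population.
    # Largest-remainder rounding keeps the total equal to venire_size.
    total = len(population)
    quotas = {s: venire_size * len(m) / total for s, m in segments.items()}
    seats = {s: int(q) for s, q in quotas.items()}
    leftover = venire_size - sum(seats.values())
    by_remainder = sorted(quotas, key=lambda s: quotas[s] - seats[s], reverse=True)
    for s in by_remainder[:leftover]:
        seats[s] += 1

    # Step 2 (filtering): drop ineligible individuals within each segment.
    # Step 3 (random draw): sample the segment's seat count from the rest.
    venire = []
    for s, members in segments.items():
        eligible = [p for p in members if is_eligible(p)]
        venire.extend(rng.sample(eligible, min(seats[s], len(eligible))))
    return venire


# Hypothetical population: 60 people in segment "A", 40 in segment "B",
# with every tenth person marked as ineligible.
people = [
    {"id": i, "group": "A" if i < 60 else "B", "eligible": i % 10 != 0}
    for i in range(100)
]
venire = select_venire(
    people,
    segment_of=lambda p: p["group"],
    is_eligible=lambda p: p["eligible"],
    venire_size=10,
    seed=1,
)
```

In this toy example the venire of ten contains six members of segment "A" and four of segment "B", mirroring the 60/40 split of the population. The choice of segment_of is exactly the design decision that the architects of the algorithm would have to make.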

3.2 Social Good – The “Bright Side”

A jury – that is, the jurors that were not eliminated from the Jury Venire – arguably does not by itself constitute a social good. However, a jury has features that can be seen to enable the existing (American) system of justice:

The jury has instrumental value by helping to track the truth through providing diverse standpoints (Ferguson 2016: 976). This effect is furthered by their (supposed) impartiality which helps to select facts as they are converted into evidence (ibidem).

A jury also has a procedural value in its participatory aspect: citizens constituting the jury represent a direct check on judicial power (ibidem: 977-978). Such citizen participation can also serve civic education (ibidem: 975, 979-980). Further, the jury brings the so-called community conscience to court, which legitimises the verdict on both moral and legal grounds (ibidem: 977-978). Hence, a jury can be seen to enable a social good: legitimate verdicts combined with a check on judicial powers enable the provision of a system of justice.

Pointing towards the core contribution of the Jury Venire selection algorithm, table 2 in Lepri et al. (2017: 8) could be augmented as follows:

[Figure not included in this excerpt]

Table 1: Key Area of a Jury Venire Selection Algorithm, addition to the summary table 2 in Lepri et al. (2017: 4)

The Big Data Jury Venire selection process could improve the smooth functioning of the (American) court system through the following aspects:

The unique contribution of the selection algorithm which utilises Bright Big Data would originate from the efficiency with which it selects the Jury Venire with the desired properties:

The algorithm could be designed to make a soft selection – only eliminating individuals that are ineligible with certainty (for example because of prior convictions [Ferguson 2016: 993]) – and thereafter randomly draw from the entire remaining pool. Such a soft selection algorithm could make running the selection more (cost-)efficient.
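
For contrast with a segmenting design, the soft selection can be sketched as a single filter followed by a simple random draw. This is a minimal sketch under assumptions: the function name soft_select, the predicate is_certainly_ineligible and the record fields are hypothetical and introduced here only for illustration.

```python
import random


def soft_select(population, is_certainly_ineligible, venire_size, seed=None):
    """Remove only the certainly ineligible, then draw at random (no segments)."""
    rng = random.Random(seed)
    # Keep everyone who is not ineligible with certainty.
    pool = [p for p in population if not is_certainly_ineligible(p)]
    # Simple random draw without replacement from the remaining pool.
    return rng.sample(pool, min(venire_size, len(pool)))
```

Because no segments are defined, the demographic composition of the resulting Jury Venire is left to chance and will only approximate that of the population.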

But as introduced in section 3.1, the algorithm could also be designed to provide an almost perfectly demographically representative Jury Venire by strictly segmenting the population before randomly selecting from each segment. Such a segmenting Big Data Jury Venire selection process would provide a (depending on the set parameters) heterogeneous and, most of all, representative Jury Venire. Such a representative group would provide the benefits listed above, including greater legitimacy and more widely dispersed participation. If heterogeneity increases with representativeness, the trial might benefit in terms of the efficiency of the results, as heterogeneous groups tend to be better truth-trackers (Ferguson 2016: 976; see section 3.2). Further, distributive fairness4 – albeit possibly at the expense of procedural fairness4 – as understood by, for example, Grigic-Hlaca et al. (2018: 1) could be improved.

One demand on the court system is that court administrators must improve the jury yield (Ferguson 2016: 967). The ratio of participants relative to invited individuals continues to be comparatively low (Ferguson 2016: 941, 957), and sent-out letters alone prove to be undeliverable up to 15% of the time (Mize et al. 2007: 22). More up-to-date addresses within Bright Big Data would improve the jury yield by increasing the number of letters reaching the selected jurors (see table 1).
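
As a rough illustration of why undeliverable letters matter, the jury yield can be written as a ratio with an upper bound. The symbols below are introduced here for illustration only and do not appear in Ferguson (2016) or Mize et al. (2007); the bound assumes that an undelivered summons can never lead to participation.

```latex
\[
\text{yield} \;=\; \frac{N_{\text{participating}}}{N_{\text{invited}}} \;\le\; 1 - u,
\qquad
u = 0.15 \;\Rightarrow\; \text{yield} \le 0.85
\]
```

Here u denotes the share of undeliverable letters. Under this assumption, at the reported worst case of 15%, no more than 85% of those invited could participate even if every recipient responded.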

Secondly, an improved jury yield could be achieved by diverting information from the Big Data Jury Venire process for research. To achieve this, more data on the jurors could be collected and attributed. Combined with respective research, (Bright) Big Data could unveil the underlying reasons for non-participation, and corresponding incentives could be designed to ensure greater citizen participation (Ferguson 2016: 941, 957). More widespread participation would expand civic education and solidify legitimacy (compare sections 3.1-3.2).

3.3 A “Dark Side” Diagnosis of Algorithmic Big Data Jury Venire Selection

Summing up the prior section, a Jury Venire can be seen to enable the working of the (American) system of justice. However, the introduction of a segmenting selection algorithm could undermine this contribution, and raise concerns about privacy, transparency and discrimination. In the upcoming sections 3.3.1–3.3.3, those concerns will be further laid out. Specifically, it will be diagnosed whether the “dark sides” discussed in Lepri et al. (2017: 8-13) apply, and if so, to what extent. This evaluation is a necessary prerequisite for this investigation; before answering the research question of how the Big Data Jury Venire selection can be positively disrupted, clarification is required of whether and where there is a need for positive disruption.

3.3.1 Computational Violations of Privacy

A computational violation of privacy means that inferences about otherwise undisclosed, private information are made using newly available behavioural data and computational possibilities (compare Lepri et al. 2017: 9).

Such a computational violation of privacy might occur indirectly, as the data would be sourced from external, non-court providers. Further, providing courts with access to (Bright) Big Data makes not only indirect but also direct computational violations of privacy a possibility.

So far, courts have stuck with “dim” data – that is, basic and often outdated information as described in section 3.1. This is in line with the jurors’ expectation of being treated as anonymous numbers (Ferguson 2016: 982). Jurors only have to disclose further private information after being summoned, in the Voir Dire, in order to determine their fit for the case. On the surface, the jurors’ expectations regarding the treatment of their privacy seem to be met. However, Voir Dire can be lengthy and invasive, and wealthier litigants will invest in investigating the potential jurors beforehand, for example by googling them or driving by their residence (ibidem: 937, 983, 985-986).

Moving this privacy invasion to a more digital and systematic level would disappoint the jurors’ expectations of anonymity to begin with. Especially when the collection, kind and source of the data are not transparent, Big Data Jury Venire selection could lead to distrust and a generalised sense of privacy invasion and surveillance (Ferguson 2016: 982-983, 992). This would in turn decrease the system’s legitimacy and participation rate (Ferguson 2016: 983, 987).

Ferguson (2016: 993) concludes that the issue of privacy invasion lies not in how the court utilises the information, but is a problem independent of this situation. However, this paper would like to underline the situation-specific threat that might be experienced when the court system has access to Bright Big Data. The potential threat of the available information being used in another context might undermine the check on judicial power that the jury is supposed to exercise (Ferguson 2016: 983, 993).


1 Monahan and Skeem (2014: 162) point out that some assessment results are transformed into a single-value score. This raises the question of the magnitude of the impact of such a score on judicial decision making: what if the court’s judgement does not even fall within the confidence interval of the seemingly objective assessment?

2 Ferguson’s (2016: 959) mention of insights points towards an important debate that criticises the over-valuation of insights, that is, of correlations from (opaque) data over causation.

3 Similarly, Mau (2017) distinguishes between sorting, rating, and ranking algorithms.

4 Grigic-Hlaca et al. (2018: 1) define distributive fairness as fairness regarding the outcome, and procedural fairness as fairness regarding the process or the means.

University of Hamburg
Interdisciplinary Seminar in Politics and Philosophy: Ethics, Politics and Epistemology of Big Data
Maike Heideke (Author), 2019, The Positive Disruption of Big Data Jury Venire Selection, Munich, GRIN Verlag,

