Bulk Culling

Bulk Culling by Custodian, Dates, Deduplication, DeNISTing, File Types, Domains, etc. Legal search and review should begin with bulk culling, meaning the elimination of files from review. Some of this bulk culling, DeNISTing for example, is technical rather than legal work, but it must still be supervised by a lawyer. Most bulk culling, even when carried out by technologists rather than lawyers, must be specified by lawyers. Further, some bulk culling can only be carried out after the data is loaded into a search and review database; examples are similarity searches, which are a kind of deduplication, and tested keyword filtering. For a complete description of culling, see the 25-page PDF article by Ralph Losey entitled License to Kull. It divides culling into two stages: the first is the Bulk Culling described on this page, which requires legal judgment, and the second is the CAR stage, which uses predictive coding. The article explains the Two-Filter method of culling that is illustrated in the diagram below and introduced on this page.


The first filter shown in the above diagram is the best practice of Bulk Culling. The second filter refers to the best practice of Computer Assisted Review using predictive coding.

  • Bulk Culling by Custodian. The primary way to cull ESI is to limit it by custodian. This is usually accompanied by an attorney's preliminary ranking of the importance of each witness and of the ESI in their custody. In most cases the custodians whose ESI is collected for preservation purposes are ranked into at least two categories: Class A custodians, the key players whose ESI must certainly be reviewed for possible production; and Class B custodians, who may have relevant discoverable ESI (which is why their ESI was preserved by collection), but whose ESI is expected to be less probative, repetitive, and possibly unnecessary. Larger or more complex cases typically have more classifications, e.g. Classes A through D, with Class A to be reviewed for the first-round production and Class D unlikely ever to be reviewed at all, but still preserved by collection and available for review should that prove necessary.
    • Since, as described in the Productions page of EDBP, the best practice is to segment productions by phases, the first phase typically covers the Class A custodians (assuming no accessibility issues). Review of less important custodians is deferred until after the first production is made. Such phasing not only improves efficiency and reduces costs, but also helps the requesting party make subsequent discovery requests much more focused, because those requests are knowledge based.
    • Disclosure of custodians and rankings is common as part of cooperative dialogues, and so is allowing the requesting party substantial input on the selection of the custodians for first review and production. It is also a best practice for both requesting and responding parties to reserve rights regarding further discovery and production requests, including, without limitation, objections to any future discovery, rights to seek cost-shifting, rights to expand discovery, etc. This is examined further in the Cooperation step.
  • Bulk Culling by Date Range. In every case attorneys should also consider the date ranges of relevant information. Is there a beginning and end date for relevant evidence? This requires careful legal and factual analysis of the causes of action pleaded. Often, in addition to the outside parameters, there are also one or more key time periods within the larger date range. In some cases it may be appropriate to limit the first round of review to these key periods.
  • Bulk Culling by Deduplication. This is a technologist's function to carry out, but the decision on the extent and order of deduplication is a legal task. It is a best practice in most cases not only to do vertical deduplication within each custodian, but also to do full global, or horizontal, deduplication across all custodians. In addition, the result of deduplication depends upon which custodians are loaded first. For that reason attorneys should specify that the most important custodian's ESI be loaded first, and that the order of ranking be followed in loading. The technical deduplication can be expanded upon through an attorney's legal judgment and careful use of the similarity search functions. When a group of emails is identified in the database as irrelevant by any other bulk culling technique, the bulk coding of irrelevant documents can often be safely expanded by use of find-similar searches.
  • Bulk Culling by File Types. This is again a technologist's function to carry out, but the decision of which file types to exclude should be made by an attorney based on their knowledge and evaluation of the facts of the particular case. When in doubt, file types are usually allowed into the review platform and then subjected to later, more focused culling using the review software.
  • Bulk Culling by Email Domains. This is a way to build a kind of custom spam filter. You select domains, subdomains, and sometimes even individual addresses in a domain, that cannot be relevant. Examples are newsletters, bulletins, and other notices and reports from outside sources, such as the NY Times, eBay, or trade periodicals, and sometimes also internal sources, such as the IT Department. Some software has built-in spam filters that can assist, and most will give you a list of all domains so that you can easily pick the ones that cannot possibly have relevant evidence.
  • Bulk Culling by Keyword. This is a generally inaccurate filtering method and should be used sparingly for that reason. This kind of culling should only be used with carefully tested and verified keywords and should not be based on conjecture alone. Still, it has its place in modern practice in smaller cases, where it is necessary to reduce data volume up front for budgetary reasons, or in cases of any size, where very large information collections are involved. The NSA’s use of keyword search of metadata of all email and telephony in the world is an example of the latter application. Keyword culling can also be done after the data is loaded into the review platform. This is the safest approach because the keyword effectiveness can be tested.
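The point made above about deduplication, that the outcome depends on which custodian is loaded first, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the `Document` record, the `custodian_rank` mapping (0 for the most important custodian), and the use of a SHA-256 hash of raw content as the duplicate test. Real review platforms usually hash normalized email fields and attachments rather than raw bytes, so treat this as a sketch of the idea, not as any vendor's method.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class Document:
    custodian: str   # hypothetical custodian name
    path: str        # hypothetical source path
    content: bytes   # raw content used for hashing in this sketch


def content_hash(doc):
    """Return a SHA-256 hex digest of the document content."""
    return hashlib.sha256(doc.content).hexdigest()


def global_dedupe(documents, custodian_rank):
    """Keep one master copy per hash across all custodians.

    Documents are processed in attorney-specified rank order (lower rank
    number first), so the master copy is attributed to the most important
    custodian who holds it. The other custodians holding the same content
    are recorded rather than silently discarded.
    """
    # sorted() is stable, so the original order is kept within a custodian.
    ordered = sorted(documents, key=lambda d: custodian_rank[d.custodian])
    masters = {}           # hash -> master Document
    other_custodians = {}  # hash -> other custodians holding a duplicate
    for doc in ordered:
        h = content_hash(doc)
        if h not in masters:
            masters[h] = doc
            other_custodians[h] = []
        elif (doc.custodian != masters[h].custodian
              and doc.custodian not in other_custodians[h]):
            other_custodians[h].append(doc.custodian)
        # A duplicate within the master's own custodian (vertical
        # deduplication) is simply dropped.
    return masters, other_custodians
```

Recording the other custodians for each master copy is a design choice of this sketch. It keeps a trail showing which lower-ranked custodians also held a document that was removed from their set as a duplicate.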

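Culling by email domain, as described in the list above, can be sketched in the same hedged way. The snippet below assumes each message is a dictionary with a bare sender address under a hypothetical `"from"` key, and that the attorney supplies the list of excluded domains after looking over the tally. The domain names in the usage note are placeholders, not recommendations.

```python
from collections import Counter


def sender_domain(address):
    """Return the lower-cased domain part of a bare email address."""
    return address.rsplit("@", 1)[-1].strip().lower()


def domain_counts(messages):
    """Tally messages per sender domain so an attorney can review the list."""
    return Counter(sender_domain(m["from"]) for m in messages)


def cull_by_domain(messages, excluded_domains):
    """Split messages into (kept, culled) using attorney-approved domains.

    A message is culled when its sender domain equals an excluded domain
    or is a subdomain of one.
    """
    excluded = {d.lower() for d in excluded_domains}
    kept, culled = [], []
    for m in messages:
        dom = sender_domain(m["from"])
        if any(dom == d or dom.endswith("." + d) for d in excluded):
            culled.append(m)
        else:
            kept.append(m)
    return kept, culled
```

For example, calling `cull_by_domain(messages, ["example.com"])` would set aside mail from `example.com` and `news.example.com` but keep mail from `notexample.com`. The culled messages are returned rather than deleted, so the decision can be checked and reversed.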

Again, for greater detail on Bulk Culling, see Losey’s 25-page article, License to Kull. Please let us know if you have any suggestions for defensible bulk culling.

7 thoughts on “Bulk Culling”


  2. The benefits of global deduplication are clear in terms of reducing the data set for hosting and review. But, when it is carried out, then later loaded custodians will have incomplete files, right? Any thoughts on reconciling global deduplication when our practice is to generally produce complete responsive families? I ask because when producing for lower ranked (but arguably still important) custodians, we will be producing partial families where the original “master” duplicate existed in a higher ranked and previously loaded custodian. Does that make sense?

