Predictive coding facilitates document review by using a human-reviewed set of training documents to make predictions on the entire population. On a population's Predictive Coding page, administrators can easily access the documents in a population that are most likely to be responsive. After users begin reviewing these documents, predictive coding uses an algorithm called Continuous Active Learning (CAL) to check for newly reviewed documents at regular intervals, and uses those documents' scores to refine the predictive model. The model is then used to reprioritize documents for review. When you are happy with the performance of the model, you can stop the review.
To create and train predictive models to use for coding, use the Predictive Models functionality, available under Analytics on the Case Home page. For more information about the differences between CAL and Predictive Models, see Compare standard predictive coding and Continuous Active Learning.
You must perform several preliminary steps before you can use CAL for predictions. For more information, see Preliminary steps for predictive coding.
The typical workflow to perform predictive coding using CAL includes the following steps:
1. Prepare for predictive coding:
   a. Create a binder of documents.
   b. Create a population from the binder, and create a random sample from the population.
   c. Perform a traditional human review of the random sample. CAL will use this sample to train your predictive model.
2. Continuously train and predict:
   a. Configure training.
   b. Create and train the model.
      CAL automatically trains the model and updates its predictions at a specified time interval. All coded documents are used to train the model.
   c. Review the highest ranked documents.
      The newly reviewed documents automatically update the model's training.
   d. Assess the review progress by measuring the recall to date.
   e. Decide whether you are happy with the performance:
      - If no, continue reviewing the highest ranked documents, and then reassess the review progress.
      - If yes, stop the review.
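The workflow above can be sketched as a short simulation. This is an illustrative toy, not Ringtail's implementation: documents are random numbers, "relevance" is a simple threshold standing in for the human reviewer's mark, and the "model" ranks documents by closeness to the mean of the positives found so far.

```python
import random

# Toy sketch of the Continuous Active Learning loop described above.
# All names and logic here are illustrative, not Ringtail's API.
random.seed(42)
population = [random.random() for _ in range(1000)]

def is_relevant(d):
    """Stands in for the human reviewer's mark."""
    return d > 0.7

# Step 1: seed the model with a human-reviewed random sample.
reviewed = {}                              # doc -> True/False mark
for doc in random.sample(population, 50):
    reviewed[doc] = is_relevant(doc)

def train(marks):
    """Return a scoring function centered on the known positives."""
    positives = [d for d, m in marks.items() if m]
    center = sum(positives) / len(positives) if positives else 0.5
    return lambda d: -abs(d - center)      # higher score = closer to positives

while True:
    score = train(reviewed)                # step 2b: retrain on all coded docs
    todo = sorted((d for d in population if d not in reviewed),
                  key=score, reverse=True)
    if not todo:
        break
    for doc in todo[:50]:                  # step 2c: review the top-ranked batch
        reviewed[doc] = is_relevant(doc)
    found = sum(reviewed.values())
    total = sum(map(is_relevant, population))
    if found / total >= 0.9:               # steps 2d-2e: stop at target recall
        break

print(f"reviewed {len(reviewed)} of {len(population)} docs, "
      f"found {found} of {total} positives")
```

The point of the sketch is the loop shape: only a fraction of the population needs review because each retrained model pushes likely positives to the top of the queue.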
Use the following procedure to configure training for predictive coding and generate a predictive model.
1. On the Case Home page, under Analysis, click Populations.
2. Click the name of a population.
3. In the navigation pane, click Predictive Coding.
4. On the Predictive Coding page, click Configure training.
5. In the Training field [Pick List] list, select the pick list field that Ringtail will compare with human reviewers' marks on training set documents.
6. In the Positive list, select one or more values that the model should consider a "positive" mark made by a human reviewer.
7. In the Negative list, select one or more values that the model should consider a "negative" mark made by a human reviewer.
   Note: You cannot designate the same values as both positive and negative.
8. To specify how frequently CAL training occurs, in the Time interval list, select one of the following options:
   - Hourly
   - 4 Hours
   - 12 Hours (default)
   - Daily
9. Click Save.
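The configuration collected in steps 5 through 8 amounts to a small set of values. As a sketch only (the field and value names are hypothetical examples, not Ringtail's internal representation), it could be captured and validated like this:

```python
# Hypothetical sketch of a CAL training configuration; names are examples
# only, not Ringtail's internal format.
config = {
    "training_field": "Responsiveness",    # pick list field (step 5)
    "positive_values": {"Responsive"},     # step 6
    "negative_values": {"Nonresponsive"},  # step 7
    "time_interval_hours": 12,             # step 8 (12 Hours is the default)
}

# A value cannot be designated both positive and negative (see the note above).
overlap = config["positive_values"] & config["negative_values"]
assert not overlap, f"values marked both positive and negative: {overlap}"

# Only the four intervals offered in the UI are valid.
assert config["time_interval_hours"] in (1, 4, 12, 24)
```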
After you configure training, Ringtail creates a new CAL model automatically and trains it using the reviewed documents in the population. The new model scores the entire population as soon as training is completed, giving the highest scores to the documents that are most likely to be relevant. At this point, administrators can identify the documents that are most likely to be relevant, and assign them for review.
CAL checks for newly reviewed documents automatically and uses them to refine the model. It then re-ranks the documents using the scores from the updated model. After initial training is complete, you can choose to disable this feature by clearing the Active (enable Continuous Active Learning) check box on the population's Predictive Coding page. You can manually check for newly reviewed documents and update the model's prediction by clicking Run training at the top of the Predictive Coding page.
After creating and training the model, users can begin reviewing the documents that are most likely to be responsive.
Administrators can identify documents that are most likely to be responsive in the following ways.
Use the following procedure to find documents using the links on the Predictive Coding page.
1. To access the Predictive Coding page for a population, do the following:
   a. On the Case Home page, under Analysis, click Populations.
   b. Click the name of a population.
   c. In the navigation pane, click Predictive Coding.
2. Under the Population heading on the left or the Sample heading on the right, click any of the links in the Positives, Negatives, or Unreviewed columns.
Important: By default, documents appear in descending CAL score order, from most positive to most negative.
Use the following procedure to find documents using advanced search.
1. Access the Search page. For more information, see Perform an advanced search.
2. In the Select a field box, select [RT] CAL - PopulationName_Score, where PopulationName is the name of a population.
3. In the Select a value box, select has a value.
4. Click Search.
   The list of documents appears in the List pane.
5. Sort the results in descending order of CAL score, from most positive to most negative. For information about how to add the CAL score as a column in the List pane, see Configure columns in the List pane.
Tip: On the Documents page, administrators can add unreviewed documents to a phase of a workflow, or start reviewing documents in the Map. For more information about these methods, see Create assignments: add and remove documents in a phase and Review documents in the Map pane.
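The descending-score ordering described above can be shown with a few lines of code. The record layout and the field name below are illustrative; the name only mimics the "[RT] CAL - PopulationName_Score" pattern:

```python
# Illustrative only: sort search hits by CAL score, most positive first,
# mirroring the ordering described above. Not Ringtail's API.
hits = [
    {"doc_id": "DOC-003", "[RT] CAL - Pop1_Score": -0.42},
    {"doc_id": "DOC-001", "[RT] CAL - Pop1_Score": 0.91},
    {"doc_id": "DOC-002", "[RT] CAL - Pop1_Score": 0.15},
]
hits.sort(key=lambda h: h["[RT] CAL - Pop1_Score"], reverse=True)
print([h["doc_id"] for h in hits])   # → ['DOC-001', 'DOC-002', 'DOC-003']
```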
The predictive coding results that appear on the Predictive Coding page, including recall and precision achieved to date, are continually updated based on the reviewed sample. The review may be considered complete when recall is sufficient.
Because the ratio of positives to negatives may decline over one or more review batches, draw a random sample from the remaining unreviewed documents before you consider the review complete. Doing so provides a reasonable estimate of the number of positive documents that have not yet been identified, which supports a defensible decision about whether the review is complete. A declining ratio may mean that most of the positives that can be found have been found. It may also indicate that the CAL model has found most of a certain type of responsive document; adding another random sample may help the model identify unrelated types of responsive documents.
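The sample-based estimate described above is simple arithmetic. The numbers below are made up for illustration:

```python
# Worked example (illustrative numbers): estimate how many positives remain
# among unreviewed documents from a random sample drawn from them.
unreviewed_total = 20_000   # documents not yet reviewed
sample_size = 400           # random sample drawn from the unreviewed set
sample_positives = 6        # positives found when the sample is reviewed

rate = sample_positives / sample_size           # observed positive rate
projected_remaining = rate * unreviewed_total   # point estimate
print(f"~{projected_remaining:.0f} positives projected in unreviewed docs")
```

If the projection is small relative to the positives already found, that supports a defensible decision to stop; if it is large, further review or another training sample is warranted.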
The following describes the elements that are available on the Predictive Coding page.

Continually prioritize and train
  The Active (enable Continuous Active Learning) option is selected by default. Continuous Active Learning (CAL) checks for newly reviewed documents at the specified time interval, and uses those scores to refine the prediction.
  Note: One training job runs at a time. If a previous job is still running when the next job is scheduled to start, the new job is postponed until the next training interval.
  After initial training is complete, you can disable the Continuous Active Learning feature by clearing the check box.

Last processed
  The date and time that Ringtail collected the most recent review data and used it to refine the model. If the Active (enable Continuous Active Learning) option is selected, Ringtail also lists the date and time of the next review data collection.

Document score field
  The field that contains the predicted scores for the documents. To set security for the field, click the field name, click Security in the navigation pane, and then select an option. For more information, see Set security for fields.

(Graph)
  A visual representation of the distribution of scores for the population. The scale at the bottom of the graph displays the range of predicted document scores, from -1 to +1.

Training field [Pick List]
  The pick list field and its associated positive and negative values that Ringtail compares with human reviewers' marks on training set documents.

Population
  For each training field value, human-reviewed documents in the population fall into one of the following categories:
  - Positive: The number of documents that the reviewers marked as relevant (responsive or privileged, depending on context).
  - Negative: The number of documents that the reviewers marked as not relevant (nonresponsive or not privileged, depending on context).
  - Unreviewed: The number of documents that have not yet been reviewed.
  Note: To open the documents on the Documents page, click a number. Documents open in descending order of CAL score.

Sample
  Select a sample from the Select a sample box to view the following information about it:
  - For each training field value, human-reviewed documents in the selected sample fall into one of the following categories:
    - Positive: The number of documents that the reviewers marked as positive.
    - Negative: The number of documents that the reviewers marked as negative.
    - Unreviewed: The number of documents that have not yet been reviewed.
    Note: To open the documents on the Documents page, click a number. Documents open in descending order of CAL score.
  - Confidence level: The percentage probability that the confidence interval contains the true value of the quantity being estimated. For example, predictive coding may state with 95% confidence that the interval between 82% and 92% contains the actual value of achieved recall.
    Note: Changing this percentage affects the rest of the data in this area.
  - Projected positives in population: The estimated range of true positives in the whole population, based on the sample.
  - Recall to date: An estimate of the percentage of relevant documents found so far by the reviewers, by any means. This estimate is based on the known number of relevant documents found so far and the sample-based estimate of the total number of relevant documents in the population.
  - Recall worst case: A worst-case version of the Recall to date estimate, taking into account the potential impact of any unreviewed documents in the sample.
  - Precision to date: The percentage of reviewed documents that were marked positive.
The review conflicts feature allows you to review documents with a predicted score that differs significantly from the human reviewer's mark. These documents are called conflict documents. For example, if the predicted score of a document is strongly negative, but the human reviewer marked the document as positive, it would be considered a strong conflict document. By reviewing conflict documents, you can confirm the human reviewers' marks, and then allow the model to retrain itself and improve its predictions.
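The conflict check described above can be sketched as a pair of threshold comparisons. The thresholds mirror the False negatives below and False positives above sliders; the data layout and values are hypothetical, not Ringtail's:

```python
# Hypothetical sketch: flag documents whose predicted score disagrees
# strongly with the human reviewer's mark. Not Ringtail's implementation.
docs = [
    {"id": "DOC-1", "score": -0.85, "mark": "positive"},  # strong conflict
    {"id": "DOC-2", "score": 0.90,  "mark": "positive"},  # agreement
    {"id": "DOC-3", "score": 0.75,  "mark": "negative"},  # strong conflict
    {"id": "DOC-4", "score": -0.60, "mark": "negative"},  # agreement
]

false_neg_below = -0.5   # slider: marked positive but scored below this
false_pos_above = 0.5    # slider: marked negative but scored above this

conflicts = [d["id"] for d in docs
             if (d["mark"] == "positive" and d["score"] < false_neg_below)
             or (d["mark"] == "negative" and d["score"] > false_pos_above)]
print(conflicts)   # → ['DOC-1', 'DOC-3']
```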
Use the following procedure to review conflicts within a population.
1. To access the Predictive Coding page for a population, do the following:
   a. On the Case Home page, under Analysis, click Populations.
   b. Click the name of a population.
   c. In the navigation pane, click Predictive Coding.
2. Click Review Conflicts.
3. Do any of the following:
   - In the False negatives below area, adjust the slider to find false negative documents that are below the specified score. The number of documents to be reviewed again appears in parentheses.
   - In the False positives above area, adjust the slider to find false positive documents that are above the specified score. The number of documents to be reviewed again appears in parentheses.
   Note: Ringtail evaluates all coded documents in the population for false negatives and false positives.
4. Click OK.
The Documents page opens. You can now review the conflict documents.
Ringtail uses the newly reviewed documents to refine the model the next time that CAL training occurs. For more information, see Create and train the model.