You must perform several preliminary steps before you can make predictions. For more information, see Preliminary steps for predictive coding.
When you perform standard predictive coding, you typically iterate the process multiple times to refine the predictive model. After you are happy with the performance of the model, you can then apply codes to the target population.
The typical workflow to perform standard predictive coding includes the following steps:
1.Prepare for predictive coding:
a.Create a binder of documents.
b.Create a population from the binder, and define the comparison sample.
c.Create a binder that includes the population and excludes the comparison sample.
d.Create a seed set.
e.Perform a traditional human review of the seed set and comparison sample.
2.Train the model:
a.Create a new model.
b.Train the model.
c.Score the comparison sample and assess performance.
d.Decide whether you are done training:
□If no, create a new version of the model with additional training documents, and then repeat the previous training steps.
□If yes, start predicting.
3.Predict:
a.Add the scrubbed population and create a validation sample.
b.Score and review the validation sample, and assess performance.
c.Decide whether you are happy with the performance:
□If no, create a new version of the model with additional training documents, and then repeat the previous training and prediction steps.
□If yes, score the population and set the threshold with the validation sample. Then, start coding.
4.Code:
a.Apply codes to the target population.
b.Create a validation report.
Predictive coding involves building a model you can use to predict whether a given document is responsive. The model is trained using human reviewers’ marks on a set of documents and the features of those documents. An algorithm determines which features are likely to indicate responsiveness and which are not. After the model has been trained, you can apply it to the features of unreviewed documents to predict whether they are responsive.
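This topic does not document the application's actual algorithm or feature set, but the general idea can be sketched in a few lines. The following hypothetical example (using scikit-learn purely for illustration, not the product's implementation) trains a small classifier on reviewer marks and scores an unreviewed document:

```python
# Hypothetical sketch only: the application's actual features and algorithm
# are not documented in this topic. This illustrates the general idea of
# training on reviewer marks and then scoring unreviewed documents.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Reviewed training set: document text plus the human reviewer's mark.
reviewed_docs = [
    ("Quarterly revenue forecast attached for the merger.", 1),   # positive
    ("Lunch order for Friday, please reply by noon.", 0),         # negative
    ("Draft term sheet and diligence checklist enclosed.", 1),    # positive
    ("Office printer is out of toner again.", 0),                 # negative
]
texts, marks = zip(*reviewed_docs)

# Turn document text into numeric features and fit a simple classifier.
vectorizer = TfidfVectorizer()
X_train = vectorizer.fit_transform(texts)
model = LogisticRegression().fit(X_train, marks)

# Score an unreviewed document; a higher probability suggests "responsive".
unreviewed = ["Updated merger diligence checklist attached."]
scores = model.predict_proba(vectorizer.transform(unreviewed))[:, 1]
print(scores)
```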
To create a predictive model, you must choose a training set, also referred to as a "seed" set. A training set is a static set of documents that is used to train a predictive model or model version. You can create the training set from a random sample of documents or by manually assembling a binder of documents (for example, a judgmental sample). The training set must be coded by human reviewers before you train the model.
Use one of the following procedures to choose a training set:
●If you do not have documents that have already been human reviewed, you can create a random sample from a population. For more information, see Work with populations and samples.
Note: This method often involves taking a sample of the target population that you want to predict. For more information about creating a target population and sample, see Add a population for prediction.
●If you have documents that have already been human reviewed and marked, create a binder of the marked documents. For more information, see Create a binder.
After the training set has been reviewed, you can create a predictive model.
Use the following procedure to create a predictive model.
1.On the Case Home page, under Analysis, click Predictive Models.
2.Click Add.
3.In the Name box, type a name for the model.
4.In the Description box, type an optional description of the model.
5.In the Training set list, select the source of the training set, either Sample or Binder. In the list that appears, select the appropriate sample or binder.
6.In the Template list, select Standard or Standard + people.
Note: The template contains a predefined set of parameters that govern the behavior of a predictive model. The Standard + people template gives more weight to individuals associated with a document (for example, the To and From fields and addresses found in email messages). Selecting this template means that documents that have people in common are considered more similar for the purposes of the training set, even if the documents do not share many concepts.
7.In the Training field [Pick List] list, select a pick list field that the model will reference when observing the human reviewers' marks on the documents in the training set.
8.In the Positive list, select one or more values that the model should consider a positive mark made by a human reviewer. For example, you can configure the values responsive or privileged as positive marks.
9.In the Negative list, select one or more values that the model should consider a negative mark made by a human reviewer. For example, you can configure the values nonresponsive or not privileged as negative marks.
Note: You cannot designate the same values as both negative and positive.
10.If you want the application to automatically start the training process when you click Save, select the Train model on save check box. If your training set is not yet human reviewed, clear this check box.
11.By default, the predictive model discards a portion of the documents with negative scores, so that an approximately equal number of positives and negatives are used in training. If you want to retain all documents, clear the Enable balancing check box. A brief sketch of the balancing idea follows this procedure.
12.Click Save.
Note: To determine the human reviewers' progress, you can see how many documents have been marked negative or positive on the model’s Properties page. For more information, see View predictive models.
13.Submit the training set for human review, if it has not already been completely reviewed.
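The Enable balancing option in step 11 keeps roughly equal numbers of positives and negatives by discarding some negatives. The application's exact sampling method is not described here; the following is a minimal sketch of the general idea, using made-up document IDs:

```python
import random

# Hypothetical sketch of negative downsampling; the application's exact
# balancing method is not described in this topic.
random.seed(0)
positives = [f"pos_{i}" for i in range(200)]
negatives = [f"neg_{i}" for i in range(1800)]

# Keep all positives and a random subset of negatives of roughly equal size.
balanced_negatives = random.sample(negatives, k=min(len(negatives), len(positives)))
training_docs = positives + balanced_negatives
print(len(positives), len(balanced_negatives), len(training_docs))  # 200 200 400
```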
You can train a model at any time, but we recommend that you wait until the human reviewers have completed most, if not all, of their review of the training set. The table at the bottom of the Properties page provides information about the number of negative, positive, and unreviewed documents in the training set at any given time.
Use the following procedure to train a model.
1.On the Case Home page, under Analysis, click Predictive Models.
2.You can train the model in one of the following ways:
oIn the row for the model, click the Train button.
oClick the name of the model. In the navigation pane, click Properties, and then click Train.
Note: You cannot retrain a model after you add a population to the model for prediction.
3.To refresh the training status, on the Predictive Models page, click the Refresh button at the bottom of the page.
To create a prediction, select a population to predict.
Use the following procedure to add a population for prediction.
1.On the Case Home page, under Analysis, click Predictive Models.
2.Click the name of a model.
3.On the Predictions page, click Add.
4.In the Source list, select a source for the population, either Binder or Saved Search.
5.In the list that appears, select the name of a binder or saved search.
6.To allow the initial training job to run faster, select the Extract text to prepare population for training check box.
7.In some situations, you may want to include all documents in the population, regardless of whether they have been used as part of the training set, or you may want to test the model against a specific sample. To include training documents, click Advanced Options in the Add document population dialog box, select the Include training documents check box, and then click Save.
8.Click Save.
The new population appears in the Predictions list with the word scrubbed added to the population name.
After you add a population to the Predictions page, you can start the scoring process for any sample that is associated with that population.
Use the following procedure to score a sample.
1.On the Case Home page, under Analysis, click Predictive Models.
2.Click the name of a model.
3.On the Predictions page, click the Score button in the row for the sample that you want to score.
After scoring is complete, the sample displays a check mark in the Scored column on the Predictions page.
After a sample has been both scored and human reviewed, you can evaluate the predicted scores of a sample using the Scores graph and the Projections table.
When the human review of the sample is complete, you can evaluate the quality of the model’s prediction using the Scores graph.
Note: If you change the display options in the Scores graph, the changes are retained per user across all predictive coding graphs.
Use the following procedure to view and interpret the data on the Scores graph.
1.On the Case Home page, under Analysis, click Predictive Models.
2.Click the name of a model.
3.On the Predictions page, click the name of a sample.
4.To select the data to display, click Display Options, and then select any of the following options:
oPrecision: An orange line appears on the graph to represent precision.
oRecall: A purple line appears on the graph to represent recall.
oAccuracy: A green line appears on the graph to represent accuracy.
oDocument count: Text boxes appear on the graph containing information about projected positive and negative document counts.
5.Click OK.
An example of the Scores graph is shown in the following figure.
The Scores graph can contain the following information (a brief sketch of the threshold arithmetic follows this list):
oThe gray area at the bottom of the graph displays the distribution of projected scores across the population's documents. Scores range from -1 to +1. Scores near -1 or +1 are stronger predictions. Scores near 0 are weaker, less certain predictions. The height of the gray area at any given point indicates how many documents received that score.
oThe Threshold slider bar at the top of the graph indicates the user-defined threshold that separates positive documents from negative documents. Documents with a score that is greater than or equal to the threshold score are considered positive. Documents with a score that is less than the threshold score are considered negative.
oThe Documents not reviewed number at the top of the graph indicates the number of documents in the sample that have not been marked as positive or negative by human reviewers.
oThe purple line represents recall.
oThe orange line represents precision.
oThe green line represents accuracy.
oThe Negative document counts list on the left side of the user-defined threshold includes the total number of documents that are projected to be negative, the projected number of false negative documents, and the projected number of true negative documents.
oThe Positive document counts list on the right side of the user-defined threshold includes the total number of documents that are projected to be positive, the projected number of true positive documents, and the projected number of false positive documents.
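As a rough illustration of the threshold arithmetic described above, the following sketch splits a set of hypothetical document scores in the -1 to +1 range into positive and negative groups:

```python
# Hypothetical scores in the range -1 to +1; the threshold is user-defined.
scores = [-0.9, -0.62, -0.3, -0.05, 0.1, 0.4, 0.55, 0.8, 0.95]
threshold = 0.1

# Documents at or above the threshold are treated as positive,
# documents below it as negative.
positives = [s for s in scores if s >= threshold]
negatives = [s for s in scores if s < threshold]
print(len(positives), len(negatives))  # 5 4
```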
You can analyze projected document counts and percentages for the target population at different levels of recall using the Projections table. The data in the Projections table is based on how the model's scores compared to the human review of the sample.
Use the following procedure to view and interpret the data in the Projections table.
1.On the Case Home page, under Analysis, click Predictive Models.
2.Click the name of a model.
3.On the Predictions page, click the name of a validation sample.
4.On the Scores page, click Projections.
The Projections table contains the following information.
oRecall: Levels of recall for the population, starting at 65%. Recall is the percentage of documents that reviewers marked as positive that also received predicted positive marks from the model. The higher the recall, the lower the proportion of positive documents that the model’s prediction missed.
oProjected true positives: The number of documents in the population that are projected to be true positives.
oProjected true positives plus false positives: The total number of documents in the population that are projected to be positive. That is, the sum of true positives plus false positives.
oProjected positives in population: The percentage of documents in the population that are projected to be positive.
oPrecision: The percentage of documents with positive predicted marks that actually received positive marks from the human reviewers.
oAccuracy: The percentage of documents in the population for which the predicted score is expected to agree with the human reviewer’s marks. This value is the sum of true positives and true negatives, expressed as a percentage of all documents in the population.
oThreshold: The user-defined threshold that separates positive documents from negative documents. Positive documents have a score that is greater than or equal to the threshold score. Negative documents have a score that is less than the threshold score.
oAdditional true positives vs. 65% recall: The number of additional true positive documents that would be identified for each level of recall, compared to a baseline recall of 65%.
oAdditional false positives vs. 65% recall: The number of additional false positive documents that would be identified for each level of recall, compared to a baseline recall of 65%.
oConfidence level: The probability that the actual value of recall falls within the estimated range. For example, a 95% confidence level means that if you drew 100 independent, random samples from a population, and then calculated the expected range of recall for each sample, the expected range would contain the true value of recall about 95 times out of 100. A worked sketch of these metrics follows the table.
Note: Changing the confidence level affects the rest of the data in the table.
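The recall, precision, and accuracy values in the Projections table follow the standard definitions based on true and false positives and negatives. A worked sketch with made-up counts:

```python
# Made-up counts for illustration; in the application these come from
# comparing the model's scores against the human review of the sample.
tp, fp, tn, fn = 800, 200, 2700, 300

recall = tp / (tp + fn)                      # share of reviewer positives the model found
precision = tp / (tp + fp)                   # share of predicted positives that are correct
accuracy = (tp + tn) / (tp + fp + tn + fn)   # share of all documents predicted correctly

print(f"recall={recall:.2%} precision={precision:.2%} accuracy={accuracy:.2%}")
# recall=72.73% precision=80.00% accuracy=87.50%
```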
Sometimes, the predicted score of a document disagrees with the human reviewer's mark. A document with a discrepancy between the predicted score and the human reviewer's mark is called a conflict document. A strong conflict document has a score that differs greatly from the human reviewer's mark. For example, if the score of a document is strongly negative, but the human reviewer marked the document as positive, the document is considered a strong conflict document.
Conflict documents include documents that are false negatives and false positives. A false negative is a document that the model predicted with a negative code, but that the human reviewer marked with a positive code. A false positive is a document that the model predicted with a positive code, but that the human reviewer marked with a negative code.
By reviewing conflict documents, you can confirm the human reviewers' marks, and then retrain the model and improve its predictions.
Note: Retraining a model deletes all of that model's previous scores.
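The False negatives below and False positives above sliders in the following procedure select conflict documents by score. A minimal sketch of that selection logic, using hypothetical scores, marks, and slider values:

```python
# Hypothetical document scores and reviewer marks ("positive"/"negative").
docs = [
    {"id": 1, "score": -0.85, "mark": "positive"},   # strong conflict (false negative)
    {"id": 2, "score": -0.20, "mark": "positive"},   # mild disagreement, not selected
    {"id": 3, "score": 0.70, "mark": "negative"},    # false positive
    {"id": 4, "score": 0.90, "mark": "positive"},    # agreement
]

fn_below = -0.5   # assumed "False negatives below" slider value
fp_above = 0.5    # assumed "False positives above" slider value

false_negatives = [d for d in docs if d["mark"] == "positive" and d["score"] < fn_below]
false_positives = [d for d in docs if d["mark"] == "negative" and d["score"] > fp_above]
print([d["id"] for d in false_negatives], [d["id"] for d in false_positives])  # [1] [3]
```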
Use the following procedure to review conflicts within a sample set.
1.On the Case Home page, under Analysis, click Predictive Models.
2.Click the name of a model.
3.On the Predictions page, click the name of the sample in the Population column.
4.On the Scores page, click Review Conflicts.
5.In the False negatives below area, adjust the slider to find false negative documents that are below the specified score. The number of documents to be reviewed again appears in parentheses.
6.In the False positives above area, adjust the slider to find false positive documents that are above the specified score. The number of documents to be reviewed again appears in parentheses.
7.Click OK.
The Documents page opens. You can now review the conflict documents.
8.If any of the human marks are changed as a result of the review, refresh the Scores page to update the data that appears in the graph.
You can take the following actions to refine a predictive model:
●Review conflicts. For more information, see Review conflicts: find false positives and false negatives.
●To add more training documents to the model, you can do either of the following:
oCreate a new version of a model based on the existing one.
oCreate a new model based on an existing model or version.
The review conflicts feature allows you to review documents with a predicted score that differs significantly from the human reviewer's mark. These documents are called conflict documents. For example, if the predicted score of a document is strongly negative, but the human reviewer marked the document as positive, it is considered a strong conflict document.
Conflict documents include documents that are false negatives and false positives. A false negative is a document that the model predicted with a negative code, but that the human reviewer marked with a positive code. A false positive is a document that the model predicted with a positive code, but that the human reviewer marked with a negative code.
By reviewing conflict documents, you can confirm the human reviewers' marks, and then retrain the model to improve its predictions.
Note: Retraining a model deletes all of that model's previous scores.
Use the following procedure to review conflicts within a training set.
1.On the Case Home page, under Analysis, click Predictive Models.
2.Click the name of a model.
3.In the navigation pane, click Properties.
4.Click Review Conflicts.
5.In the False negatives below area, adjust the slider to find false negative documents that are below the specified score. The number of documents to be reviewed again appears in parentheses.
6.In the False positives above area, adjust the slider to find false positive documents that are above the specified score. The number of documents to be reviewed again appears in parentheses.
7.Click OK.
The Documents page opens. You can now review the conflict documents.
8.After the review of conflict documents is complete, retrain the model. For more information, see Train a model.
After a sample has been scored, you can create a new version of a model by adding documents to the training set. Adding documents gives the model more human reviewed examples from which to learn.
Note: You can only create a new version of a model from a previous version that has been trained. For information about training a model, see Train a model.
Use the following procedure to add a new version of a model.
1.On the Case Home page, under Analysis, click Predictive Models.
2.Click the name of a model.
3.On the Predictions page, click the name of a sample.
4.On the Scores page, click New Version.
The Name box in the New version dialog box is pre-populated with the name of the original model and the new version number.
5.In the Description box, type an optional description for the version.
6.In the Training document source list, select a source for the training documents that you want to add to the new version. The choices are Binder, Population, or Sample.
7.Select a source from the second list.
8.By default, active learning is enabled. Active learning uses the current version of the model to determine which documents can potentially lead to the greatest improvement in the model's performance. Clear the check box to disable active learning. A selection sketch follows this procedure.
9.In the Size box, type the number of documents that you want to add to the training set.
Note: If active learning is enabled, fewer documents may be added to the training set than the number that you specify, because active learning only selects documents that could significantly improve the model.
10.Click Save.
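This topic does not describe how active learning chooses documents. One common approach, shown here purely as an illustrative assumption, is uncertainty sampling: selecting the unreviewed documents whose current scores are closest to 0, because those are the model's least certain predictions.

```python
# Illustrative assumption only: uncertainty sampling picks the documents the
# current model is least certain about (scores closest to 0 on the -1 to +1 scale).
unreviewed = {"doc_a": 0.92, "doc_b": -0.05, "doc_c": 0.12, "doc_d": -0.71, "doc_e": 0.03}
requested_size = 3

# Rank by distance from zero and take the least certain documents first.
selected = sorted(unreviewed, key=lambda d: abs(unreviewed[d]))[:requested_size]
print(selected)  # ['doc_e', 'doc_b', 'doc_c']
```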
You can train the new version at any time, but we recommend that you wait until the human reviewers have completed most, if not all, of their review of the documents that were added to the training set. The table at the bottom of the Properties page provides information about the number of negative, positive, and unreviewed documents in the training set at any given time. For more information about the Properties page, see View and edit predictive models.
Use the following procedure to train a model.
1.On the Case Home page, under Analysis, click Predictive Models.
2.You can train the version in one of the following two ways:
oIn the row for the model version, click the Train button.
oClick the plus sign next to a model to see the model versions, and then click the link in the Version column of the model version. In the navigation pane, click Properties, and then click Train.
A message appears stating that the training request was submitted successfully.
Note: You cannot retrain a version after you add a population to the model for prediction.
3.To refresh the training status, on the Predictive Models page, click the Refresh button at the bottom of the page.
All available models and their associated versions are displayed on the Predictive Models page. The most recent version appears in collapsed form in the list. Click the plus sign next to a model row to view all versions of the model.
The following table describes the information available for model versions.
Note: Changing the recall percentage will impact the rest of the data in this area.
Column | Description
Status icons | Hover over the icon to view information about the status.
Version | The version number of the model.
Training set | The total number of documents used to train the model version.
Yield | The ratio of eliminated false positive documents to additional training documents that were added in different versions of a predictive model.
Precision | The percentage of the predicted positives in the comparison sample that are true positives, which means that the human reviewers also marked them as positive. The higher this percentage, the lower the proportion of documents that are incorrectly identified as positive.
Projected Positives | The number of documents that are projected to be positive in the parent population of the comparison sample at the selected recall percentage.
Confidence level | The probability that the actual value of recall falls within the estimated range. For example, a 95% confidence level means that if you drew 100 independent, random samples from a population, and then calculated the expected range of recall for each sample, the expected range would contain the true value of recall about 95 times out of 100.
You can use a comparison sample as a standard measure to compare the rate of improvement between model versions. A comparison sample is a human-reviewed sample that is created at the beginning of the predictive coding process and that is used to evaluate the model as the model is iteratively improved. The comparison can help you decide when to stop refining the model and begin using the model for predictions. Comparisons are also made at a particular recall percentage, which you can adjust.
For each model version that you train and score against the comparison sample, you can see how many additional false positive documents have been eliminated by the training documents that you added to that version's training set. The ratio of eliminated false positive documents to additional training documents that were added is called the version's yield. Yield is displayed as both a numerical ratio and a horizontal bar, as shown in the following figure.
For example, in Version 1 of a model, assume that the parent population of the comparison sample is estimated to contain 12,000 positive documents, of which 3,000 documents are likely to be false positives. You create and train a new version of the model, Version 2, with an additional 100 training documents. After Version 2 is trained and scored, Version 2 projects 10,000 positive documents, of which 2,500 are false positives. In this example, by adding an additional 100 documents to the version's training set, you allow Version 2 to eliminate 500 false positive documents. Version 2's yield is 5:1, because for every document that you added to the training set, the application eliminated five additional false positive documents from the predicted population.
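The yield arithmetic from the example above can be written out directly:

```python
# Numbers from the example above: Version 2 adds 100 training documents and
# reduces projected false positives from 3,000 to 2,500.
added_training_docs = 100
false_positives_v1 = 3000
false_positives_v2 = 2500

eliminated_false_positives = false_positives_v1 - false_positives_v2
yield_ratio = eliminated_false_positives / added_training_docs
print(f"Yield: {yield_ratio:.0f}:1")  # Yield: 5:1
```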
Use the following procedure to compare model versions.
1.On the Case Home page, under Analysis, click Predictive Models.
2.On the Predictive Models page, click the plus sign next to a model.
3.Click Select a sample to compare versions, as shown in the following figure.
4.In the Sample list, select the sample to use to compare the versions.
Note: The application will compare all model versions against the comparison sample that you choose. This sample can be any sample that you think would be beneficial to use as a comparison.
5.Leave the Score all versions of the model against this sample check box selected to score all of the model versions against the selected sample.
Note: You cannot view comparison data for versions that have not been scored against the comparison sample.
6.Click OK.
7.At the bottom of the versions list, locate the Versions compared against [selected sample] with [recall list] phrase and select a recall percentage.
Note: Select a recall percentage ranging from 65-95%. You can observe different performance results at different recall levels.
A notification appears in the versions list if the versions have not yet been trained or scored, as shown in the following figure.
Any version that has been trained and scored against the comparison sample displays a yield bar and numerical ratio in the Yield column, as shown in the following figure.
Note: When the yield drops below 1:1, the bar turns red to indicate that it is no longer economical to create more versions.
Note: You cannot revert to an earlier version of a model if a later version has applied codes.
Use the following procedure to revert to an earlier version of a model.
1.On the Case Home page, under Analysis, click Predictive Models.
2.On the Predictive Models page, click the plus sign next to a model and then select the earlier version from the list.
3.Click Properties.
4.Click Revert to this version.
Caution: Reverting to an earlier version of a model deletes all subsequent versions of the model, as well as their training sets and document scores.
5.Click OK.
You can create a separate new model based on an existing model or version. This option allows you to build on documents that were already reviewed as part of your original model version by adding documents from a different binder or population to the existing training set.
Use the following procedure to add a new model.
1.On the Case Home page, under Analysis, click Predictive Models.
2.Click the name of a model.
3.On the Predictions page, click the name of a sample.
4.On a prediction's Scores page, click New Model.
5.In the New model dialog box, in the Name box, enter a name for the model.
6.In the Description box, optionally type a description for the model.
7.In the Training document source list, select a source for additional training documents: Binder, Population, or Sample.
8.Select a source from the list.
9.The Use active learning check box is selected by default. Leave it selected to help improve the training set. Active learning adds documents to the training set for human review that could significantly improve the model.
10.In the Size box, type the number of documents you want to add to the training set.
Note: If Use active learning is selected, the number of documents added to the training set may be fewer than requested because active learning will only select documents that could significantly improve the model.
11.Click Save.
You can train the new model at any time, but we recommend that you wait until the human reviewers have completed most, if not all, of their review of the documents in the training set. The table at the bottom of the Properties page provides information about the number of negative, positive, and unreviewed documents in the training set at any given time.
Use the following procedure to train a new model.
1.On the Case Home page, under Analysis, click Predictive Models.
2.You can train the model in one of the following ways:
oIn the row for the model, click the Train button.
oClick the name of the model. In the navigation pane, click Properties, and then click Train.
A message appears stating that the training request was submitted successfully.
Note: You cannot retrain a model after you add a population to the model for prediction.
3.To refresh the training status displayed on the Predictive Models page, click the Refresh button at the bottom of the page.
A validation sample is created and reviewed at the end of the predictive coding process to make a defensible evaluation of the performance of the model against the population.
After a validation sample has been scored and human reviewed, you can evaluate the predicted scores using the Scores graph and the Projections table. For more information about interpreting scores, see Evaluate the scores.
Note: If the validation sample has not achieved the desired performance, you can use the Review Conflicts feature to review documents for which the model's score and the human reviewer's mark disagree. You can then retrain the model. For more information, see Review conflicts: find false positives and false negatives.
After a validation sample has been scored and interpreted, and the case lead has identified a threshold that results in suitable recall and precision percentages, you can use the model to score an entire population. If you have not already added a population to your model for prediction, see Add a population for prediction.
Note: This process may take a considerable amount of time to complete.
You can score a population in the following ways:
●On the Predictions page, click the Score button in the row for the population that you want to score.
●On the Predictions page, select the check boxes next to the populations that you want to score, and then click Score.
Note: The application also scores all samples of a population when it scores the population.
Click the name of the population on the Predictions page to check the progress of the scoring process on the Properties page. When scoring is complete, a check mark appears in the population's Scored column on the Predictions page.
When the scoring process is complete, you can evaluate the scores on the Scores page.
You can evaluate the scores of a population in one of two ways:
●On the Predictions page, click the name of the population.
●On the Predictions page, click the View scores button in the row for the population.
After the population has been scored, you can overlay a human-reviewed sample on the Scores graph to evaluate how well the model is performing.
Use the following procedure to overlay a validation sample on the target population in the Scores graph.
1.On the Case Home page, under Analysis, click Predictive Models.
2.Click the name of a model.
3.On the Predictions page, select the target population.
4.On the Scores page, click Display Options.
Note: Changes to the display options are retained per user. Click Display Options to select an overlay sample and display or hide elements on the Scores graph.
5.Select a sample from the Overlay sample list and make any additional changes to the displayed elements. Click OK.
The Scores graph refreshes with the new display options.
oThe gray area at the bottom of the graph displays the distribution of scores across the population's documents. Scores range from -1 to +1. Scores near -1 or +1 are stronger predictions. Scores near 0 are weaker, less certain predictions. The height of the gray area at any given point indicates how many documents received that score.
oThe Threshold slider bar at the top of the graph represents the threshold. Documents with a score that is greater than or equal to the threshold score are considered positive. Documents with a score that is less than the threshold score are considered negative.
oAfter a validation sample has been selected, the following information is available on the graph.
□The information at the top left of the graph contains the name of the validation sample that was overlaid, as well as the percentage of the sample documents that were human reviewed with a positive or negative mark.
□The purple line represents recall.
□The orange line represents precision.
□The green line represents accuracy.
□The Negative document counts list on the left side of the user-defined threshold includes the total number of documents that are considered to be negative, the number of false negative documents, and the number of true negative documents.
□The Positive document counts list on the right side of the user-defined threshold includes the total number of documents that are considered to be positive, the number of true positive documents, and the number of false positive documents.
After a population has been scored and its scores evaluated, you can apply predicted codes to the target population. The applied code contains a Yes/No value for each document, based on where the threshold is set. Any documents with a score greater than or equal to the threshold are coded as positive.
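Applying codes amounts to assigning each document a Yes or No value based on the threshold. A minimal sketch with hypothetical document IDs and scores:

```python
# Hypothetical scores and a user-defined threshold.
scores = {"DOC-001": 0.83, "DOC-002": 0.41, "DOC-003": -0.27, "DOC-004": 0.05}
threshold = 0.40

# Documents at or above the threshold receive "Yes"; the rest receive "No".
applied_code = {doc_id: ("Yes" if score >= threshold else "No")
                for doc_id, score in scores.items()}
print(applied_code)
# {'DOC-001': 'Yes', 'DOC-002': 'Yes', 'DOC-003': 'No', 'DOC-004': 'No'}
```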
Use the following procedure to apply codes to a target population.
1.On the Case Home page, under Analysis, click Predictive Models.
2.Click the name of a model.
3.On the Scores graph, move the Threshold slider bar to achieve the desired recall and precision percentages. To access the Scores graph, see Evaluating a sample's scores.
4.Click Apply.
5.In the Name box, type a name for the applied code field. Or, select an existing applied code field from the list.
6.Click Save.
The field name appears in the Applied codes list on the prediction's Properties page.
Note: You can create multiple applied code fields, each with different threshold settings.
A validation report allows you to document and defend the predictive coding process. The validation report records the final results of a prediction and its applied code. You can create a printable validation report for each applied code field that is created.
Use the following procedure to create a validation report.
1.On the Case Home page, under Analysis, click Predictive Models.
2.Click the name of a model.
3.On the Predictions page, select the name of a population.
4.In the navigation pane, click Report.
5.In the Applied code list, select the name of the applied code. For the selected applied code, the report displays the following information (a recall-range sketch follows this procedure):
oThreshold: Positive documents have a score that is greater than or equal to the indicated threshold score. Negative documents have a score that is less than the threshold score.
oConfidence level: Three industry-standard levels of confidence: 90%, 95%, and 99%. Select a lower confidence level to estimate a tighter range of recall, or select a higher confidence level for a wider range.
oTotal: The total number of documents in the population.
oNegative: The number of documents in the population that were not coded with the selected applied code.
oPositive: The number of documents in the population that were coded with the selected applied code.
oTrue positives (TP): The number of documents that the model scored as positive and that the human reviewer marked as positive.
oTrue negatives (TN): The number of documents that the model scored as negative and that the human reviewer marked as negative.
oFalse negatives (FN): The number of documents that the model scored as negative, but that the human reviewer marked as positive.
oFalse positives (FP): The number of documents that the model scored as positive, but that the human reviewer marked as negative.
oNumber of documents in the population: The total number of documents in the population and the number of unreviewed documents in the population. Click the link to open the unreviewed documents on the Documents page.
Note: Unreviewed documents may affect the reported recall and precision.
oRecall: The range of recall for the population.
oPrecision: The range of precision at the indicated levels of recall.
6.To print the report, click Print and then follow the instructions.
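The report's recall range depends on the selected confidence level. The application's exact statistical method is not documented here; as an illustrative assumption, the following sketch computes a Wilson score interval for recall from hypothetical validation-sample counts:

```python
import math

# Illustrative assumption: a Wilson score interval for recall estimated from a
# reviewed validation sample. The application's exact method is not documented here.
tp, fn = 180, 20              # hypothetical counts from the validation sample
z_values = {90: 1.645, 95: 1.960, 99: 2.576}

n = tp + fn                   # reviewer-positive documents in the sample
p = tp / n                    # observed recall
for level, z in z_values.items():
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    print(f"{level}% confidence: recall between {center - half:.1%} and {center + half:.1%}")
```

As the document notes, a higher confidence level produces a wider estimated range of recall.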
You can view and edit models and model properties on the Predictive Models page.
You can enable group leaders to access the Predictive Models page. For more information, see Grant administrative access.
Use the following procedure to view predictive models.
1.On the Case Home page, under Analysis, click Predictive Models.
Note: The Predictive Models link appears in the list if the Predictive Models feature and the Populations and Samples feature are enabled. For more information, see Administrative access.
All available models appear on the Predictive Models page in reverse chronological order, that is, most recent first. The following table describes the information on the Predictive Models page.
Column | Description
Expand button | Indicates that the model has at least one additional version.
Status icons | Hover over the icon to view information about the status.
Name | The name of the predictive model.
Template | A predefined set of parameters that governs the behavior of a predictive model. Options include Standard and Standard + people. Standard + people gives more weight to individuals associated with a document (for example, the To and From fields and addresses found in email messages).
Balanced | A check mark in this column indicates that the application discards a portion of the documents with negative scores, so that an approximately equal number of positives and negatives are used in training.
Training set | A set of documents used as the training set for the model. A training set can be built from a sample or a binder.
Last modified | The date and time that a user last made changes to the model.
2.To view the properties of a model, and to view any predictions associated with the model, click the name of a model. In the navigation pane, click Properties.
The following table describes the information on the model's Properties page.
Column | Description
Name | The name of the model. You can edit the model name. When you edit the name of a model version, the version number that appears next to the Name box is appended to the model name.
Description | An optional description of the model.
Template | The predefined set of parameters that govern the behavior of the model.
Enable balancing | If Yes, the application discards a portion of the documents with negative scores, so that an approximately equal number of positives and negatives are used in training.
Training set | The training set used to create the model. Note: Click the Training set link to open the documents in the training set on the Documents page.
Last trained | The date and time that the model was last trained.
Training field [Pick List] | The pick list field used by the model to determine positive and negative marks made by human reviewers.
Training set codes | This table shows the current results from the human review of the documents in the training set. The Training documents row contains the number of documents in the training set marked with a positive code, the number marked with a negative code, the number not yet reviewed with either a positive or negative code, the total number of documents in the training set, and the percentage of training set documents marked with either a positive or negative code. The rows that follow display the values in the training field pick list. A check mark in the appropriate column indicates whether the value is defined in the model as positive, negative, or unreviewed.
You can change the name of a model or edit its description on the model's Properties page.
Use the following procedure to rename a model.
1.On the Case Home page, under Analysis, click Predictive Models.
2.Click the name of a model.
3.In the navigation pane, click Properties.
4.Change the text in the Name box, and then click Save.
Use the following procedure to change a model's description.
1.On the Case Home page, under Analysis, click Predictive Models.
2.Click the name of a model.
3.In the navigation pane, click Properties.
4.Change the text in the Description box, and then click Save.
Use the following procedure to delete a model.
Caution: Deleting a model also deletes all of its versions.
1.On the Case Home page, under Analysis, click Predictive Models.
2.Select the check box next to the models to delete.
3.Click Delete, and then click OK.
Use the following procedure to view a model's predictions.
1.On the Case Home page, under Analysis, click Predictive Models.
2.Click the name of a model.
On the Predictions page, all populations that were added to the model as predictions appear, along with their associated samples.
The following table describes the information on the Predictions page.
Note: The order of the columns from left to right reflects the workflow that you follow to predict a population.
Column | Description
Status icons | Hover over the icon to view information about the status.
Population | All populations and samples that have been added to the model for prediction. ●Populations and samples in blue text have been scored. ●Populations and samples in gray text have not been scored. ●Population and sample names that end with the word scrubbed indicate that predictive coding has removed this model's training set documents from the population.
Reviewed | The percentage of a sample’s documents that human reviewers have marked with a positive or negative code.
Scored | A check mark in this column indicates that the sample or population has been scored in a prediction.
Coded | After a population has been coded with one or more applied codes, these columns contain the following information about the population and its associated samples: ●Field: The name of the applied code. ●Recall: Achieved recall for the selected applied code in a validation sample. ●Precision: Achieved precision for the selected applied code in a validation sample. Note: Unreviewed documents in a sample may impact the sample's reported recall and precision.
Predictions appear as populations on the Predictions page.
Use the following procedure to delete a population from the Predictions page.
1.To open the Predictions page, do the following:
a.On the Case Home page, under Analysis, click Predictive Models.
b.Click the name of a model.
2.On the Predictions page, select the check box next to the populations to delete.
Note: The population will not be deleted from the case.
3.Click Delete, and then click OK.