Biometric Data Cleansing – Army SBIR|STTR Program
Artificial Intelligence/Machine Learning, ASA(ALT), Phase I

Biometric Data Cleansing

Release Date: 03/30/2021
Solicitation: 21.4
Open Date: 04/14/2021
Topic Number: A214-019
Application Due Date: 05/18/2021
Duration: 6 months
Close Date: 05/18/2021
Amount Up To: $256K

Topic Objective 

Resolve biometric data issues in the current authoritative biometrics database, the Department of Defense’s Automated Biometric Identification System (DoD ABIS), through the development of a machine learning software application that identifies errors and improves data quality, thereby increasing the speed and accuracy of responses to match requests.

Description  

For the purposes of this SBIR topic, biometrics refers to face, finger, iris, palm and latent images resident in the authoritative biometric database systems. Cleansing refers to the deduplication of identity records, image identification (where “image” covers each and every modality), image-to-field-type association, image rotation analysis, image quality analysis, image spoofing detection, missing image analysis and identification of the highest-quality image for each biometric.
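Deduplication of identity records can be treated as a clustering problem. If a matcher reports that two records belong to the same person, the two records join one group, and such links chain transitively. The Python sketch below uses a union-find (disjoint-set) structure to build these groups. It is one possible approach under stated assumptions. The pairwise scores, record IDs and threshold are hypothetical inputs and are not part of DoD ABIS.

```python
from collections import defaultdict


class UnionFind:
    """Disjoint-set forest with path compression and union by size."""

    def __init__(self):
        self.parent = {}
        self.size = {}

    def find(self, x):
        # Unseen records start as their own singleton cluster.
        if x not in self.parent:
            self.parent[x] = x
            self.size[x] = 1
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: re-point every node on the path at the root.
        while self.parent[x] != root:
            self.parent[x], x = root, self.parent[x]
        return root

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        if self.size[ra] < self.size[rb]:  # attach the smaller tree under the larger
            ra, rb = rb, ra
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]


def duplicate_clusters(record_ids, pair_scores, threshold):
    """Group identity records whose pairwise match score meets `threshold`.

    pair_scores: iterable of (id_a, id_b, score) tuples from a matcher (hypothetical).
    Returns only clusters with more than one record, each sorted by ID.
    """
    uf = UnionFind()
    for rid in record_ids:
        uf.find(rid)
    for a, b, score in pair_scores:
        if score >= threshold:
            uf.union(a, b)
    groups = defaultdict(list)
    for rid in record_ids:
        groups[uf.find(rid)].append(rid)
    return sorted(sorted(g) for g in groups.values() if len(g) > 1)


records = ["A", "B", "C", "D", "E"]
scores = [("A", "B", 0.97), ("B", "C", 0.95), ("D", "E", 0.40)]
print(duplicate_clusters(records, scores, threshold=0.9))  # [['A', 'B', 'C']]
```

Union-find links records transitively. As a result, a single spurious high score can merge two otherwise unrelated clusters. A production tool would likely send large or chained clusters to an examiner instead of merging them automatically.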

DoD ABIS, the authoritative database, is required to accept all encounter submissions from across the DoD for inclusion in the biometric data set. Much of this input comes from aging legacy collection systems that are nearing obsolescence. These systems neither block incorrect inputs nor filter submissions in any meaningful way by image capture quality. As a result, the authoritative database contains a large number of biometric records with errors, missing data or low-quality images. This poor-quality data consumes valuable computing resources. It also produces a higher number of “yellow” matches, each of which requires a human examiner to review the request and manually determine whether the submitted biometric data matches a record in the database.
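One way to model a “yellow” match is as a matcher score that falls between two decision thresholds. Such a score is high enough to be a possible match but too low to be accepted automatically. The sketch below assumes this model. The names of the bands other than YELLOW, the threshold values and the scores are illustrative assumptions, not ABIS definitions.

```python
from enum import Enum


class Band(Enum):
    MATCH = "match"        # confident match, resolved automatically (assumed)
    YELLOW = "yellow"      # ambiguous, needs a human examiner
    NO_MATCH = "no_match"  # confident non-match, resolved automatically (assumed)


def band(score, low, high):
    """Place a matcher score into one of three bands (assumed two-threshold model)."""
    if not low < high:
        raise ValueError("low must be strictly below high")
    if score >= high:
        return Band.MATCH
    if score < low:
        return Band.NO_MATCH
    return Band.YELLOW


def yellow_rate(scores, low, high):
    """Fraction of scores that would be routed to manual adjudication."""
    scores = list(scores)
    if not scores:
        return 0.0
    return sum(band(s, low, high) is Band.YELLOW for s in scores) / len(scores)


print(yellow_rate([0.95, 0.50, 0.60, 0.10], low=0.4, high=0.8))  # 0.5
```

Under this model, cleansing helps in two ways. It pushes genuine matches above the upper threshold and clear non-matches below the lower one. Both effects shrink the share of requests that examiners must adjudicate.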

This issue will remain unresolved until all legacy collection systems are displaced from the force. For that reason, a long-term solution (3-5 year time horizon) is required. The solution must first cleanse the existing data and then continue to assess all incoming data, so that every image in the database is ranked and associated with a data quality score. As the old systems are removed from use in the field, data quality will improve through the enforcement of collection quality thresholds, which can be set on current COTS/GOTS (commercial/government off-the-shelf) collection systems.
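The continuous assessment step can be sketched as a per-modality lookup. Each incoming image receives a quality score, and that score is compared with thresholds specific to its modality. DoD ABIS must accept every encounter submission, so this sketch labels images instead of rejecting them. The normalized 0 to 1 score, the threshold values and the label names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    accept: float  # score at or above this: fit for automated matching
    review: float  # score at or above this (but below accept): flag for review


# Hypothetical per-modality thresholds on a normalized 0..1 quality score.
THRESHOLDS = {
    "finger": Thresholds(accept=0.60, review=0.30),
    "face": Thresholds(accept=0.70, review=0.40),
    "iris": Thresholds(accept=0.65, review=0.35),
}


def triage(modality, quality):
    """Label (never reject) an incoming image by its quality score."""
    t = THRESHOLDS.get(modality)
    if t is None:
        return "unassessed"  # no quality model configured for this modality yet
    if not 0.0 <= quality <= 1.0:
        raise ValueError(f"quality must be in [0, 1], got {quality}")
    if quality >= t.accept:
        return "accept"
    if quality >= t.review:
        return "review"
    return "low_quality"


for modality, q in [("finger", 0.82), ("finger", 0.45), ("face", 0.20), ("palm", 0.90)]:
    print(modality, q, triage(modality, q))
```

Keeping the thresholds in a table per modality makes it possible to add modalities incrementally. This fits a plan that starts with a single-modality prototype.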

This will greatly improve system matching results and reduce the number and types of requests that require manual examination by the operations division examination team. In turn, it will reduce the examination team’s backlog and greatly increase the team’s ability to respond to the smaller number of inquiries that still require manual adjudication by a human examiner.

Phase I 

The objective of this phase is to design a concept for a set of machine learning tools that rate, rank and associate a data quality score with all images in the database. Required Phase I deliverables include a determination of the feasibility of developing a prototype in Phase II, along with a preliminary design of a prototype that can rate and rank at least one of the biometric image modalities. Phase I deliverables also include a plan for the practical deployment of the proposed software applications, with phases for the design, development and testing of a full suite of machine learning software for the initial data cleansing. Define the proposed concept and develop key technological milestones for the design, development and testing of a continuous “triage” capability that can rate, rank and categorize biometric data across all biometric modalities.
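Rating and ranking a single modality reduces to sorting by a quality score, and so does identifying the highest-quality image for each biometric. In this sketch, the `Image` fields and quality values are assumptions for illustration only. In practice the score would come from a quality model built for that modality.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Image:
    image_id: str
    identity_id: str
    modality: str   # e.g. "finger", "face", "iris"
    position: str   # e.g. finger position code; "" when not applicable
    quality: float  # from a per-modality quality model (assumed)


def rank_modality(images: List[Image], modality: str) -> List[Image]:
    """Images of one modality, best quality first; ties broken by image_id."""
    selected = [im for im in images if im.modality == modality]
    return sorted(selected, key=lambda im: (-im.quality, im.image_id))


def best_per_biometric(images: List[Image], modality: str) -> Dict[Tuple[str, str], str]:
    """Map (identity_id, position) to the ID of its highest-quality image."""
    best: Dict[Tuple[str, str], str] = {}
    for im in rank_modality(images, modality):
        best.setdefault((im.identity_id, im.position), im.image_id)  # first seen = best
    return best


images = [
    Image("i1", "P1", "finger", "2", 0.55),
    Image("i2", "P1", "finger", "2", 0.81),
    Image("i3", "P1", "finger", "7", 0.40),
    Image("i4", "P2", "finger", "2", 0.62),
    Image("i5", "P1", "face", "", 0.99),
]
print([im.image_id for im in rank_modality(images, "finger")])
print(best_per_biometric(images, "finger"))
```

The list is sorted once, and the first image seen for each key is kept. Because ties are broken by `image_id`, the selection is deterministic.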

Phase II 

The Phase II objective is to realign system performance based on the output and metrics associated with the data reclassification conducted in Phase I. Required Phase II deliverables include a functional prototype that can rate and rank at least one of the biometric image modalities. Demonstrate the prototype in accordance with the demonstration success criteria developed in Phase I for a single-modality prototype. Required Phase II deliverables also include expanding the Phase I prototype design to cover all biometric modalities found within encounter-based records of the DoD ABIS authoritative database. Phase II deliverables will also include:
- a detailed plan for testing the cleansing of additional modalities
- a milestone proposal for the complete cleansing of the existing data set
- milestone plan elements for the software applications that will continuously “triage” all new biometric data submitted to the authoritative database
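Covering all modalities in encounter-based records includes two of the cleansing checks named in the Description: image-to-field-type association and missing image analysis. Both can be expressed as rules over the images in an encounter. This sketch rests on three assumptions:
- “field type” corresponds to ANSI/NIST-ITL record types (Type-10 face, Type-13 latent, Type-14 fingerprint, Type-15 palm, Type-17 iris)
- `predicted_modality` comes from a hypothetical image classifier
- every Type-14 image is a single finger with a position code from 1 to 10 (slap and plain-thumb impressions are ignored)

```python
# ANSI/NIST-ITL record types carrying the modalities named in this topic.
RECORD_TYPE_MODALITY = {
    10: "face",    # Type-10: facial and other body-part images
    13: "latent",  # Type-13: latent friction-ridge images
    14: "finger",  # Type-14: fingerprint images
    15: "palm",    # Type-15: palm print images
    17: "iris",    # Type-17: iris images
}
TENPRINT_POSITIONS = set(range(1, 11))  # individual finger positions 1..10


def check_encounter(images):
    """Return issue strings for one encounter.

    Each image is a dict with 'record_type', 'predicted_modality' (from a
    hypothetical classifier) and, for fingerprints, 'finger_position'.
    """
    issues = []
    fingers_present = set()
    for i, img in enumerate(images):
        declared = RECORD_TYPE_MODALITY.get(img["record_type"])
        if declared is None:
            issues.append(f"image {i}: unhandled record type {img['record_type']}")
            continue
        # Image-to-field-type association: does the content match its record type?
        if img["predicted_modality"] != declared:
            issues.append(
                f"image {i}: stored as Type-{img['record_type']} ({declared}) "
                f"but looks like {img['predicted_modality']}"
            )
        if declared == "finger":
            fingers_present.add(img.get("finger_position"))
    # Missing image analysis: only checked when the encounter has any fingerprints.
    missing = sorted(TENPRINT_POSITIONS - fingers_present)
    if fingers_present and missing:
        issues.append(f"missing finger positions: {missing}")
    return issues


encounter = [{"record_type": 14, "predicted_modality": "finger", "finger_position": p}
             for p in range(1, 10)]
encounter.append({"record_type": 10, "predicted_modality": "iris"})
for issue in check_encounter(encounter):
    print(issue)
```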

Phase III 

The desired end state is the successful development of a machine learning algorithm that can rate, rank and categorize biometric data across all biometric modalities, which will directly impact the success of two Army Programs of Record (POR). Collection devices that can check biometric quality at the point of capture are not yet fielded across the Army and the DoD. Until they are, the biometric database will continue to be contaminated with poor-quality images and other data quality issues that hinder the performance of biometric search and matching algorithms.

A resident set of machine learning tools that assesses the biometric quality of records as they enter the authoritative repository will raise overall data quality. It will also increase the timeliness and accuracy of responses to biometric match requests from all DoD users who access the authoritative biometrics repository (ABIS 1.2/3 Army POR).

Until the authoritative database is cleansed of incomplete, incorrect and poor-quality data, human examiners will have to adjudicate match requests that would otherwise be resolved by the transaction manager and current matching algorithms. Without the data cleansing described herein, system performance metrics will show only limited improvement, regardless of investment in new hardware and software.

The proposed solution should incorporate COTS/GOTS components to the maximum extent possible to speed the development and deployment of a complete software solution. That solution should address both the current data in the authoritative database and the ongoing “triage” of data from legacy Army devices still in use for biometric capture.

Submission Information  

To submit full proposal packages, and for more information, visit the DSIP Portal. 