Dr. Ji Ma Develops State-of-the-Art Machine Learning Classifier

RGK Center assistant professor Ji Ma recently developed a state-of-the-art machine-learning classifier that automates the coding of nonprofit organizations and, in doing so, remaps the U.S. nonprofit sector. The goal of the study was to find the best machine-learning algorithm from an extensive collection of parameters, increasing the accuracy of nonprofit organizational data while decreasing the human labor needed to gather and update this information. This type of machine learning can improve researchers’ productivity when working with large data sets that classify nonprofits under the National Taxonomy of Exempt Entities (NTEE). The NTEE is a popular classification system that scholars use as a coding schema to operationalize their primary constructs.

In the article, titled “Automated coding using machine-learning and remapping the U.S. nonprofit sector: A guide and benchmark” and accepted by Nonprofit and Voluntary Sector Quarterly, Professor Ma explores how to apply the best machine-learning methods within a nonprofit studies context. As it stands, the NTEE fails to accurately describe nonprofits’ programs, which are often diverse and spread across several social service domains. The data used in the NTEE is pulled from nonprofit organizations’ Forms 990, and only some organizations have completed the program descriptions in these forms. Professor Ma also explains that much of the data used in the NTEE is outdated because of the large amount of human labor it would take to maintain accurate records of organizations that are often in flux. Accurate data about the nonprofit sector is needed to better understand the state of the country’s social sector. It is especially important when it informs funding decisions for the sector, such as the recent policy decision to include nonprofits in PPP loans and other COVID-19 relief efforts. A more accurate data set describing the breadth and depth of the social sector can also help nonprofits better understand their local nonprofit environment, facilitate collaborations, and avoid overlap in services.

Through his research, Ma developed a classifier that reliably automates the coding process using the NTEE as a schema, an important advance for research that uses Big Data to study the social sector. He achieved 90% overall accuracy for classifying nonprofits into nine broad categories and 88% for classifying them into 25 major groups, and he remapped the U.S. nonprofit sector to more accurately reflect the number of organizations and the work they do. For example, he found that the current number of “philanthropy, voluntarism, and grantmaking foundations” registered with the IRS is significantly inflated and does not reflect these organizations’ actual activities, because the code is assigned by institutional type rather than organizational purpose. These advancements are important for asking or reexamining fundamental questions of nonprofit studies.

Professor Ma’s research involved collecting and processing nonprofits’ mission statements and program descriptions from Forms 990, 990-EZ, and 990-PF, codifying the words used in these texts, and then using this data to classify nonprofits into groups with a state-of-the-art machine-learning algorithm. By remapping the nonprofit sector into these more accurate and updated groups, Ma provides a more accurate description of the sector, one that can serve as an important instrument for asking or reexamining fundamental questions of nonprofit studies. The key takeaway of Ma’s research is that machine-learning algorithms can approximate human coders and substantially improve a researcher’s productivity. Ma explains that “social scientists who want to apply computational methods in their research should be cautiously confident” in machine learning’s ability to accurately process and code this data. He also developed a Python package for classifying texts using NTEE codes; it is free for public use and can be modified for use in sectors other than the social sector. He encourages other computational social scientists, and even practitioners, to adjust the workflow presented in the paper and apply it to other domains of inquiry. Future iterations of this project include a current effort to develop a multilingual version to support research on global civil society.
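To make the workflow described above more concrete, the sketch below trains a very small bag-of-words classifier on a handful of made-up mission statements and uses it to label new ones. It is only an illustration of the general idea of turning text into word counts and predicting a category. It is not Professor Ma’s classifier or his Python package, and the naive Bayes method, the toy texts, the three category labels, and all function names are assumptions chosen for brevity.

```python
import math
import re
from collections import Counter, defaultdict

# Made-up training texts with illustrative labels (an assumption for this
# sketch); real inputs would be mission statements and program
# descriptions taken from Form 990 filings.
TRAIN = [
    ("provides free tutoring and scholarships to high school students", "Education"),
    ("operates a charter school and after school literacy programs", "Education"),
    ("runs a community clinic offering primary medical care", "Health"),
    ("supports patients with cancer through hospital care and treatment", "Health"),
    ("protects wetlands and restores wildlife habitat", "Environment"),
    ("plants trees and conserves rivers and forests", "Environment"),
]


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def train(examples):
    """Count documents per label and word occurrences per label."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in examples:
        class_counts[label] += 1
        for tok in tokenize(text):
            word_counts[label][tok] += 1
            vocab.add(tok)
    return class_counts, word_counts, vocab


def predict(model, text):
    """Return the label with the highest naive Bayes log-probability."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, float("-inf")
    for label in sorted(class_counts):
        score = math.log(class_counts[label] / total_docs)
        # Add-one (Laplace) smoothing so unseen words do not zero out a label.
        denom = sum(word_counts[label].values()) + len(vocab)
        for tok in tokenize(text):
            if tok in vocab:  # words never seen in training are skipped
                score += math.log((word_counts[label][tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def accuracy(model, examples):
    """Share of examples whose predicted label matches the given label."""
    correct = sum(predict(model, text) == label for text, label in examples)
    return correct / len(examples)


model = train(TRAIN)

# A few held-out, made-up statements with the labels a human coder might assign.
HELD_OUT = [
    ("a clinic providing medical care to patients", "Health"),
    ("restores forests and rivers", "Environment"),
    ("tutoring for school students", "Education"),
]
```

Calling `predict(model, "a clinic providing medical care to patients")` on this toy data returns `"Health"`, and `accuracy(model, HELD_OUT)` compares predictions against human-assigned labels in the same spirit as the overall accuracy figures reported in the article, though on a scale far too small to be meaningful.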

Ma’s research resonates with other data measurement and analytics projects supported by the RGK Center, including the courses Ma teaches, “Data Management and Research Life Cycle” and “Linked Open Data and Computational Social Science Methods,” which are part of the nonprofit studies portfolio course offerings. In the 2019-2020 school year, Ma also organized a data science speaker series, and he works with other RGK Center faculty to host the annual Civic Data Hackathon. The importance of strong data management practices in the nonprofit sector aligns with other data-centered work at the Center, including the CONNECT project. CONNECT seeks to activate learning from data and to build both capacity for and interest in data, measurement, and evaluation within community organizations by matching UT graduate students with local nonprofit organizations that face data measurement challenges. Building strong data measurement systems, both at the federal level and within organizations themselves, strengthens the potential for collaboration in the sector, avoids duplication of services, and improves the impact of the sector as a whole.

Nov. 2, 2020