HYDERABAD, India / SAN FRANCISCO (Reuters) – For the past year, a team of as many as 260 contract workers in Hyderabad, India has waded through millions of Facebook Inc pictures, status updates and other content posted since 2014.
FILE PHOTO: The Facebook logo is reflected in glasses in this picture illustration taken on April 1, 2019. REUTERS/Akhtar Soomro/Illustration/File Photo
The workers label elements according to five "dimensions," as Facebook calls them.
These include the subject matter of the post: is it food, for example, or a selfie or an animal? What is the occasion: a daily activity or a major life event? And what was the author's intent: to plan an event, to inspire, to make a joke?
The work aims to understand how the types of things users post to its services are changing, Facebook said. It can help the company develop new features, potentially increasing usage and ad revenue.
Details of the effort were provided by several employees at outsourcing firm Wipro Ltd over several months. The workers spoke on condition of anonymity for fear of retaliation from the Indian firm. Facebook later confirmed many details about the project. Wipro declined to comment and referred all questions to Facebook.
The Wipro work is one of about 200 content labeling projects that Facebook has under way at any time, employing thousands of people globally, company officials told Reuters. Many projects aim to "train" the software that determines what shows up in users' news feeds and powers the artificial intelligence that underlies many other features.
The labeling work has not previously been reported.
"It's an important part of what you need," said Nipun Mathur, director of product management for AI at Facebook. "I don't see the need going away."
The content labeling program could raise new privacy issues for Facebook, according to legal experts consulted by Reuters. The company faces regulatory probes around the world over an unrelated set of alleged privacy abuses involving the sharing of user data with business partners.
The Wipro workers said they get a window into people's lives as they view a vacation photo or a post memorializing a deceased family member. Facebook confirmed that some posts, including screenshots and those with comments, may contain usernames.
The company said its legal and privacy teams must sign off on all labeling work, adding that it recently introduced an auditing system "to ensure that privacy expectations are followed and parameters in place work as expected."
But a former Facebook privacy manager, speaking on condition of anonymity, expressed concern that users' posts were being reviewed without their explicit permission. The European Union's year-old General Data Protection Regulation (GDPR) sets strict rules on how businesses collect and use personal information and in many cases requires specific consent.
"One of the most important parts of the GDPR is purpose limitation," said John Kennedy, a partner at the law firm Wiggin and Dana who has worked on outsourcing, privacy and AI.
If the intent is to look at posts to improve the accuracy of services, that should be stated explicitly, Kennedy said. Using an external vendor for the work may also require consent, he said.
A Facebook spokeswoman said: "We make it clear in our data policy that we use the information people give to Facebook to improve their experience and that we can work with service providers to help in this process."
U.S. Senator Mark Warner, a Democrat and leading critic of social media, told Reuters in a statement that large platforms are increasingly "taking more and more data from users, for wider and wider applications, without corresponding compensation to the user."
Warner said he is drafting legislation that would require Facebook to "disclose the value of users' data and tell users exactly how their data is being recognized."
Human-powered content labeling, also referred to as "data annotation," is a growing industry as companies seek to harness data for AI training and other purposes.
Self-driving car companies such as Alphabet Inc's Waymo have labelers identify traffic lights and pedestrians in videos to improve their AI. Voice assistant developers, including Amazon.com Inc, have annotators review customer audio to improve AI's ability to understand speech.
Facebook launched the Wipro project in April last year. The Indian firm received a $4 million contract and formed a team of about 260 labelers, according to the workers. Last year the work consisted of analyzing posts from the previous five years.
After that was completed in December, the team was cut to around 30 and switched to labeling each month's posts from the previous month. The work is expected to last through at least the end of 2019, they said.
Facebook confirmed the staffing changes but declined to comment on financial details.
The company said the analysis was ongoing, so it could not share any conclusions from the labeling or any resulting product decisions. It has not told labelers the purpose or results of the project, and the workers said all they have gleaned from their limited view is that selfies are becoming increasingly popular.
The Wipro labelers and Facebook said the posts are a random selection of text-based status updates, shared links, event postings, uploads to the Stories feature, videos and photos, including user-shared screenshots of chats on Facebook's various messaging apps. The posts come from Facebook and Instagram users globally, in languages including English, Hindi and Arabic.
Each item goes to two labelers to check accuracy, and to a third if they disagree, Facebook said. Workers said they see an average of 700 items per day. Facebook said the target average is lower.
Facebook-certified labelers in Timisoara, Romania, and Manila, Philippines, are working on the same project.
Among Facebook's other labeling projects, a Hyderabad employee of outsourcing provider Cognizant Technology Solutions Corp said he and at least 500 colleagues look for sensitive topics or profane language in Facebook videos.
The goal is to train an automated Facebook tool that lets advertisers avoid sponsoring videos that are, for example, adult or political in nature, Facebook said. Cognizant did not respond to a request for comment.
Another labeling application involved the social network's Marketplace shopping feature, where rules for automatically categorizing new listings were first developed by having labelers and product experts categorize some existing listings, Facebook's Mathur said.
Facebook users cannot opt out of having their data labeled.
At Wipro, the posts being reviewed are not just public posts but also those shared privately with a limited set of a user's friends. That ensures the sample reflects the full range of activity on Facebook and Instagram, said Karen Courington, director of product support operations at Facebook.
Facebook's data policy does not explicitly mention manual review.
"We provide information and content to vendors and service providers that support our business, such as providing technical infrastructure services, analyzing how our products are used, providing customer service, facilitating payments, or conducting surveys," the policy states.
Europe's GDPR also requires companies to delete user data on request. Facebook said it has technology that routinely synchronizes labeled posts with both deletion requests and changes to privacy settings.
Facebook and other companies are testing techniques to limit the need for outsourced labeling, partly to analyze more data faster and more cheaply. For example, Facebook has drawn AI training data for news feeds and for photo descriptions for blind people from hashtags on Instagram posts, Mathur said.
"We are trying to minimize the amount of things we send out," he said.
Reporting by Munsif Vengattil in Hyderabad and Paresh Dave in San Francisco; Additional reporting by Douglas Busvine in Frankfurt; Editing by Patrick Graham, Jonathan Weber and Edwina Gibbs