Truth Without Trust in Federated Learning

Photo by Bernard Hermant on Unsplash

TL;DR:

As our lives become increasingly intertwined with Big Tech corporations, concerns about their access to our sensitive data continue to grow. Federated learning, a technique that allows machine learning models to be trained across many separate computers, presents an opportunity for these corporations to exploit that data.

However, the concept of trustlessness, made possible by the blockchain technology utilized by FLock, could provide a solution by removing the human component from decision-making. In this article, we explore the potential of trustlessness as a means to reduce corruption, limit the power of Big Tech corporations, and improve the safety and dependability of AI-assisted technologies, and we consider the implications of removing trust from the equation.

The current day and age is characterized by the unprecedented influence of Big Tech corporations. These organizations shape our lives and have a direct impact on how we function as a society. They are involved in our work, our leisure, and even our healthcare. And at this point in the 21st century, we consider them a fact of life.

However, as these companies have gained further power and significance in our world, numerous questions and concerns have been raised about them. In particular, people have growing worries over their expansive access to our personal and sensitive data. This issue is especially pressing in the context of federated learning, a technique used to train certain machine learning and AI models. In a nutshell, federated learning allows models to be trained locally on many separate devices, with only the resulting model updates being shared with a central server.
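
To make this concrete, here is a minimal sketch of the federated averaging idea: each simulated client trains on its own private data, and only the resulting weights are sent back to be averaged. The toy linear model, helper names, and NumPy setup are illustrative assumptions, not FLock’s implementation.

```python
import numpy as np

def local_update(weights, X, y, lr=0.1, epochs=5):
    """One client's local training: plain gradient descent on a linear model."""
    w = weights.copy()
    for _ in range(epochs):
        grad = X.T @ (X @ w - y) / len(y)  # gradient of mean squared error
        w -= lr * grad
    return w

def federated_round(global_w, clients):
    """One round of federated averaging: clients train locally, server averages."""
    local_weights = [local_update(global_w, X, y) for X, y in clients]
    return np.mean(local_weights, axis=0)  # raw (X, y) never leaves a client

# Three simulated clients, each holding its own private dataset
rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])
clients = []
for _ in range(3):
    X = rng.normal(size=(50, 2))
    y = X @ true_w + rng.normal(scale=0.1, size=50)
    clients.append((X, y))

global_w = np.zeros(2)
for _ in range(20):
    global_w = federated_round(global_w, clients)
print(global_w)  # approaches true_w, yet the server never sees any raw data
```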

But with federated learning comes a big concern: models trained this way often involve sensitive user data, and even when that data is not strictly needed to train them, these methods can still make it easier for Big Tech corporations to gain access to it. And once they have that data, they can use it in any number of ways. It could be used to train other machine learning models, or simply be sold and distributed to third parties. Federated learning is often viewed as a great way of keeping user data safe because the training occurs on local devices, but this level of safety is heavily predicated on the corporations who control and deploy the model not exploiting it. It also relies on there being no malicious actors who try to leak data via a model inversion attack. This is where somebody gains access to a machine learning model and queries it to reconstruct specific user data or sensitive attributes. It could also happen if a company creates a backdoor that covertly allows somebody else to perform this attack.

And this is just the tip of the iceberg when it comes to malicious behaviors. In an interview, Rui Sun, Product Manager at FLock and Ph.D. student in Engineering at Newcastle University, explained that when trying to steal user data, “there are two potential roles that a malicious company could play: aggregator or client”.

Rui Sun, Product Manager at FLock and Ph.D. student in Engineering at Newcastle University

When acting as an aggregator, they can “disclose the model information and its prediction results, initialize a fake training image with a fake label (known as a poisoning attack), feed the data into a model to get delta w1 client data, which is then used to get delta w2 user data. The aggregator then works to minimize the distance between these two types of data, as this will mean that client data from the user-side will be more accurate”, and therefore sensitive.
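
What Rui Sun is describing is a gradient-matching (or gradient inversion) attack: the aggregator starts from fake data and optimizes it until its gradient matches the one a client submitted. The sketch below shows the simplest case of why a submitted update can leak the underlying sample, using a toy linear model with squared-error loss; for deeper models the attacker would run the iterative distance-minimization described above. The model and setup here are illustrative assumptions, not FLock’s code.

```python
import numpy as np

# Toy single-layer model: prediction = w.x + b, squared-error loss.
# A client computes its gradient on one private sample and submits it.
rng = np.random.default_rng(0)
w, b = rng.normal(size=5), 0.3              # shared model, known to the aggregator
x_real, y_real = rng.normal(size=5), 2.0    # the client's private sample

residual = w @ x_real + b - y_real          # prediction error on the sample
grad_w = residual * x_real                  # dLoss/dw, part of the submitted update
grad_b = residual                           # dLoss/db

# Aggregator-side reconstruction, using nothing but the submitted gradient:
x_recovered = grad_w / grad_b               # equals x_real whenever grad_b != 0
y_recovered = w @ x_recovered + b - grad_b  # recover the label as well

print(np.allclose(x_recovered, x_real))     # True: the gradient leaked the sample
print(np.isclose(y_recovered, y_real))      # True
```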

The situation is different if the bad actor is a client. Rui Sun noted that “as far as I know, it is extremely hard for malicious clients to steal data from other clients by using the global model. But they can negatively impact global model performance by using data-poisoning attacks such as label-flipping, poisoning samples, and out-of-distribution samples (samples from outside the input distribution), or model-poisoning attacks such as random weights, optimization methods, and information leakage (where the objective is not to compromise the global model, but the communication among the attackers through a secure protocol).”
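
Of the client-side attacks listed, label-flipping is the easiest to picture: a malicious client deliberately mislabels its local data before training, so the update it submits pulls the global model in the wrong direction. A toy sketch, assuming binary labels and hypothetical helper names:

```python
import numpy as np

def flip_labels(y, flip_fraction=1.0, num_classes=2, seed=0):
    """Return a copy of y with a fraction of labels replaced by a wrong class."""
    rng = np.random.default_rng(seed)
    y_poisoned = y.copy()
    idx = rng.choice(len(y), size=int(flip_fraction * len(y)), replace=False)
    y_poisoned[idx] = (y_poisoned[idx] + 1) % num_classes  # e.g. 0 -> 1, 1 -> 0
    return y_poisoned

y_clean = np.array([0, 1, 1, 0, 1])
y_poisoned = flip_labels(y_clean)
print(y_poisoned)  # [1 0 0 1 0] -- the malicious client trains on these instead
```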

With giants such as Google, Facebook/Meta, Amazon, and Microsoft using federated learning, it is understandable why there is cause for concern. These companies all have question marks next to their names when it comes to data privacy and management, contributing to a breach of trust between them and the public. However, with federated learning being so useful for training machine learning and AI models, we need to look into ways of resolving this issue.

What we need is a way to use federated learning without having to deal with the trust issues that Big Tech has caused. Repairing the foundation of trust between these corporations and the individual is the most sensible solution, but this could take decades, as Big Tech has a tainted reputation, and it often takes longer to restore trust than it does to breach it. This is the dilemma we have on our hands: how can Big Tech engage in federated learning without triggering concerns about trust?

Re-Evaluating the Concept of Trust

Trusting Big Tech is a big ask. The poor track record of industry giants like Google and Meta makes this very hard for people to do. But what if there was a way to circumvent this need for trust altogether?

Usually, we view trust as one of the basic principles that underpin our interactions. We trust our friends to care about us. We trust our boss to pay us. We trust store owners to exchange our money for goods. The notion of trust is the bedrock of human interaction. However, advancements in technology and cryptography provide us with an alternative to this, one which has only properly been explored for the last decade or so.

This concept is called trustlessness: two or more parties are able to interact with each other without having to concern themselves over whether anybody is reliable, honest, or decent. This is made possible because the systems that these people use essentially prevent them from being dishonest or indecent, utilizing mathematical algorithms to streamline and manage the flow of transactions, data, and interactions, rather than relying on a third party or intermediary to handle the upkeep of everything.

Trustlessness is a core principle in blockchain technology, and is arguably the reason why the blockchain industry has flourished over its lifespan. It was the primary element that enticed people to join the industry. To understand this, look no further than 2008, the year that Bitcoin was created. Prior to the release of Bitcoin, the world had seen one of its most severe economic crises in recent history, plunging many countries into a deep recession. The causes and culprits of the crisis are hard to fully map, but one widely accepted fact is that it heavily involved the (arguably intentional) mismanagement of finances at the hands of corporations. The companies that handled our money had failed us, and had contributed to huge losses that the average person suffered from and felt on a very personal level.

Naturally, this created a tremendous breach of trust between the individual and the financial custodians of the world. Bitcoin was released as an alternative system, allowing people to transact and manage their money without needing to trust any third parties or intermediaries. The idea was that custodians could be replaced with code: you would not need to trust anybody to treat your money with respect, because you could rely on the programming and architecture of the system to handle that for you. And what made this distinct from any previous system was that you did not need to worry about who ran it, because it was decentralized, meaning it was distributed around the world in a way that prevented singular control from emerging.

Over a decade later, this idea of circumventing trust has become more and more popular, and we have learned how to apply it to more fields than just finance. In fact, trustlessness and decentralization can be applied to federated learning, where they can reduce the worries people have about Big Tech.

To do this, a federated machine learning model can be trained on local devices, with the raw data staying on those devices and only the gradients being shared. People do not need to worry about whether their data is being sold or intercepted, because the system can be designed to compress and mask the shared updates, making the underlying raw data practically impossible to recover; one standard technique with this property is sketched below. Not only this, but activity within the federated learning model can be audited on the blockchain, providing a trail that can be followed, so that people can be sure about what is happening to their data. In other words, people do not need to trust that corporations will treat their data fairly, because code can prevent them from acting suspiciously, and because users can see how the system itself works by viewing the code and audit trails themselves.
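
The article does not spell out exactly how shared updates would be compressed and masked, so the sketch below uses one standard technique with this property, pairwise-masked secure aggregation: each pair of clients agrees on a random mask that cancels when the server sums everyone’s updates, so the aggregator only ever sees the aggregate and never any individual contribution. This is a generic illustration under those assumptions, not FLock’s actual protocol.

```python
import numpy as np

def masked_updates(updates, seed=42):
    """Add cancelling pairwise masks to each client's update vector."""
    rng = np.random.default_rng(seed)
    n, dim = len(updates), len(updates[0])
    masked = [u.astype(float).copy() for u in updates]
    for i in range(n):
        for j in range(i + 1, n):
            mask = rng.normal(size=dim)  # in practice derived from a shared secret
            masked[i] += mask            # client i adds the pairwise mask
            masked[j] -= mask            # client j subtracts the same mask
    return masked

updates = [np.array([1.0, 2.0]), np.array([3.0, -1.0]), np.array([0.5, 0.5])]
masked = masked_updates(updates)
print(masked[0])                  # looks like noise to the aggregator
print(sum(masked), sum(updates))  # the sums match: the masks cancel out
```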

This is how trustlessness works in FLock. Zhipeng Wang, Lead Research Engineer in Blockchain and Cryptography at FLock and Ph.D. Computing student at Imperial College London, explained further how FLock utilizes blockchain technology to maximize safety and efficiency. He noted that “FLock v1 leverages blockchain incentive mechanisms to reward honest participants and slash malicious ones, which can prevent rational companies from behaving in a non-intended manner. FLock v2 will further leverage zero-knowledge proofs to enhance the privacy of data and models in FL”.

Zhipeng Wang, Lead Research Engineer in Blockchain and Cryptography at FLock and Ph.D. Computing student at Imperial College London

Additionally, he discussed the nature of governance in this type of project. “Governance plays an important role in FLock’s ecosystem. Specifically, FLock will launch its tokens, which will be distributed among the participants in the Flock community. Holding the tokens means having the voting power to determine any updates and change proposals from the community. Moreover, the value of Flock tokens will also affect participants’ willingness to join the FL setting, which is critical for the whole FLock ecosystem.”

In other words, FLock is an incentive-driven ecosystem, with that incentive being a token provided to members of its global and distributed community. These community members will have the ability to vote on crucial matters within the ecosystem by staking said tokens, which keeps the project flowing and evolving in a direction that the people want. This creates an inherent dissolving of hierarchy, giving power to collectives of people rather than just a select few. Honesty is highly valued within the project: those who are proven to act honestly will be provided with more rewards in the form of FLock’s token, whereas malicious participants with known or suspected ulterior motives will have their tokens or rewards slashed, disincentivizing them from staying in the ecosystem while simultaneously weakening their voting power.
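
As a rough illustration of that reward-and-slash dynamic, here is a toy, off-chain sketch: participants judged honest gain stake (and with it voting power), while those judged malicious lose part of theirs. The names, numbers, and judging mechanism are hypothetical; FLock’s actual on-chain logic will differ.

```python
from dataclasses import dataclass

@dataclass
class Participant:
    name: str
    stake: float             # tokens staked into the protocol

    @property
    def voting_power(self) -> float:
        return self.stake     # voting weight proportional to stake

def settle_round(participants, honest, reward=10.0, slash_fraction=0.5):
    """Reward participants judged honest this round; slash the rest."""
    for p in participants:
        if p.name in honest:
            p.stake += reward
        else:
            p.stake -= slash_fraction * p.stake

pool = [Participant("alice", 100.0), Participant("bob", 100.0)]
settle_round(pool, honest={"alice"})        # assume "bob" was flagged as malicious
for p in pool:
    print(p.name, p.stake, p.voting_power)  # alice: 110.0, bob: 50.0
```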

What Does It Mean to Be Free From Trust?

Trustlessness is an exciting concept as it can prevent companies from acting in wrongful ways by limiting their control. Instead of worrying about whether Big Tech will harvest data through their federated learning models, people can be reassured as they are simply unable to. There is no need to place your faith in these corporations because the technology makes such activities highly unlikely.

Trustlessness is a commonly used term in the blockchain space, and is often hailed as a solution to the lack of confidence that people have towards corporate bodies. We have defined trustlessness as a scenario where you do not need to be concerned about the reliability or decency of other parties due to code ensuring that their actions are authentic and dependable. But what exactly does it mean for a system to exist where trust becomes almost entirely unimportant?

As mentioned earlier, trust is such a fundamental aspect of human living. It feels unusual to simply banish it. Trust is one of the primary ways that we come to learn about the world. We learn about other people’s experiences, and these affect our view of reality. We hear what people say about their own thoughts, emotions, and desires, and we use this as a way of creating an idea of that person in our heads. We are only given direct access to a small amount of this life, which is limited mostly to what we personally experience of the outside world, along with our internal thought processes. Everything else is gained from trusting other people.

In this sense, it is understandable that trust is a fundamental aspect of the study of knowledge, which is known as epistemology. Trust and knowledge have a very intimate relationship due to how subjective and limited our personal experiences are, and so we must employ trust if we want to engage with aspects of the world that are unknowable to us personally. However, there has always been a problem with trust, which is that there are many circumstances where it does not provide us with genuine truths.

Despite the two words being so similar and often being discussed together, there is nothing about trust that explicitly leads to truth. This is simply because it is possible to trust somebody who is unreliable or malicious. Typically, we would call these people untrustworthy, and so what they tell us would be taken with a pinch of salt. In other words, we would use our rationality to discern how much we should rely on them. But even with people who have a good track record of being accurate and reliable, there can still be circumstances where they provide the wrong information or treat us badly. The philosopher David Hume speaks about this issue, which is often called the problem of induction. In a nutshell, it is the argument that we cannot rely on future situations functioning like past ones. In other words, even if somebody has a trustworthy track record, this does not mean they will not be misleading or compromised in the future.

In the context of our discussion, this can occur in a corporate and centralized setting, where decision-makers at the top are regularly switched out with other people. The fluid motion of people within an organization means that trust can never be entirely gained because there is no consistency in the people involved. We can never know if they will support us or use us. In this sense, it may always be irrational to trust a corporation.

And this gets to the crux of what makes trustlessness so important. While trust is necessary for interpersonal relationships and on an individual level, when it comes to trust on a corporate level, this may never truly be possible to attain. And so, if we ever want to wholeheartedly interact with a corporation, then we need to engage in a way that requires a minimal amount of trust.

With this in mind, trustlessness is not the eradication of trust, as it is not possible to trust corporations anyway. But rather, trustlessness is the eradication of the implicit need to act as if we do trust these corporations. To give an example, many people do not trust Google to treat their data fairly, but they still act as if they do trust Google by their continued usage. However, this continued usage often occurs because companies like Google are synonymous with 21st-century life, and so it is often impossible to avoid these tech corporations entirely, even if you do not trust them. Therefore, trustlessness is not actually removing trust from a situation, but rather removing the necessity to act as if you do trust. Projects like FLock are then able to create more reliable circumstances without the need to even feign trust.

The epistemic upshot of this is that sidestepping trust might be necessary to create reliable and dependable tools. We might need trust in some parts of life, but trust is more of a band-aid for the fact that there is only a limited amount we can know first-hand in this world. Trust allows us to fill in our gaps of knowledge and engage with other people in an open way. Centralized systems are fraught with pitfalls, and so if there is a trustless alternative then it should be taken.

Trustlessness can also improve the epistemic value of machine learning models as a whole. A federated learning technique that prevents raw data and updates from being interpreted, through compression and masking, also makes it harder for malicious hackers to seize control of the model and alter its results or the labeling of its data. This kind of tampering is the poisoning attack that Rui Sun spoke about earlier, and it causes an AI or machine learning model to provide inaccurate results or behave unusually. With trustless technologies, a machine learning model can be relied upon more by the end user, because they do not have to worry about inaccuracies arising through tampering.

Removing the Human Component

There is a shadow looming in this discussion. Instead of trying to find solutions that allow us to trust Big Tech, we have opted for a method of engaging with them which requires no trust whatsoever. We are not trying to form a better relationship with these corporations, but rather sanctioning them by limiting their capabilities through blockchains and code. We are essentially removing the human component from Big Tech by finding ways of utilizing their tools without having individuals on their end make any decisions.

In some ways, we could view this as an abandoning of the relationship between people and Big Tech. Instead of encouraging these organizations to restore our faith in them, we are building around them. You could say that it is an admission of the present lack of trust, and a distancing from the complications that occur when trying to rebuild that bridge of trust.

At its core, this is a question of reducing corruption. There are two ways of doing this. One way is to create a situation where corporate executives and leaders are trained to be more ethical, moral, and decent so that they do not misuse tools like federated machine learning models. Along with this, there would need to be a way for more transparency to emerge so that end users and citizens could peek behind the curtain and gain an intimate knowledge of what happens within the boardrooms of these tech companies (because first-hand knowledge is more rational to rely on than trusting others).

The other way is to create systems that stop corporations from doing harm or acting maliciously. This is what trustless and decentralized models do. No one option is better than the other, but the fact that trustlessness is gaining such traction in the world is perhaps a telling sign that we are worried that these corporations are actually irredeemable. Creating trustlessness is by no means easy, but it might be easier than reforming corrupt individuals and industries. Or, at least this might be how it feels right now.

It is becoming harder and harder to reprimand tech corporations and their leadership. Their unimaginable wealth gives them very strong armor against regulations, so it can feel futile to try and change them for the better. Rather, it makes more sense to create trustless systems and simply circumvent them, or perhaps even encourage them to adopt those systems by offering benefits such as the speed gains from gradient compression, as with FLock.

This can either be viewed as a negative statement on the nature of the world right now, or it can be viewed as a way in which technology can triumph against human disorder and indecency. It all depends on whether you wish to embrace trustlessness or try to rebuild trust.

What Does a Trustless Future Look Like?

However, it does not need to be an either/or situation. It may be possible to use trustless tools like FLock to encourage Big Tech to act more fairly in the future. For instance, if the world turns to trustless models, then it could act as a wake-up call for these companies, telling them that the public is exhausted by them and that we are actively looking for ways to reduce the involvement of the decision-makers and custodians in their executive team. And if they do not clean up their act, then we could one day replace them with systems that do not have such a poor history.

In other words, trustlessness could indicate to them that they must become more transparent and moral in their activity. This is likely to become more and more important with the rise of AI chatbots and large language models.

In recent months, the battle to create commercial AI services has heated up to unprecedented levels. These services could have huge implications, because people are very much open and ready to give away highly sensitive data to them. The release of the current market leader, ChatGPT by OpenAI, has revealed that people will readily input health conditions, financial information, legal situations, and political views in exchange for advice and guidance. The world is being unbelievably open right now. This does not mean that anybody trusts these services, but rather that people simply want to use them. A bad actor at the top, or a malicious hacker, could potentially harvest this information for their own gains.

While ChatGPT does not currently use federated learning, it is very possible that a future AI chat service will. If these tools were built on a trustless and decentralized system, they could eradicate these potential issues and provide people with a more dependable service that they feel safer using. And if the choice between a centralized AI service and a decentralized one were placed before an end user, they would likely choose the decentralized one, provided it is just as sophisticated, as it would mean they do not need to worry about what will happen to their sensitive info.

This is what the future of trustlessness could look like! Most AI-assisted technologies can be made in a trustless way, and if those options are presented to the public, then people are more likely to pick them. This would no doubt put pressure on Big Tech to address their relationship with the public and force them to become more decent in their actions.

Of course, it is not just limited to chatbots. The best use cases revolve around medical and financial data, as these are so sensitive that they can affect people’s quality of life. If data like this gets into the wrong hands, people can be manipulated by corporations through ads, or even through social media feeds that are altered to reflect their circumstances in ways that push them toward certain decisions. This is what happened with Facebook and Cambridge Analytica, where political data was used to alter social feeds and create polarization and political hostility as a tool to affect power structures.

Another huge perk of a trustless future in the world of AI is that it could lead to a reduction in the underground selling of data. If trustlessness became the norm, there would be significantly less data available to sell and distribute maliciously. People may choose to sell their own data themselves, as this type of information is likely to always be lucrative, but there would be far less data available from sources that profit off users without their proper consent. This is both because trustless tools and systems would provide a strong framework for data protection, and because trustless systems would become favored over centralized ones as people feel safer using them.

Final Words

As our suspicions regarding Big Tech grow greater and greater, we are feeling the urge to look for new methods that limit their behavior without reducing our access to them. One of the best ways to do this is by implementing trustless technologies into the tech industry. A decentralized network or protocol that can operate without the need for a figurehead or concentration of power is perfect for this day and age where we have rightfully lost confidence in Big Tech corporations.

With decentralization often comes trustlessness, as it is a huge concept in the blockchain space, albeit one that does not always get examined. A fantastic example of trustless technology is FLock’s blockchain-based federated learning network. By decentralizing the training process, it is able to train AI models in a way that protects raw data from hacks, while also limiting Big Tech’s ability to engage with that raw data themselves. This distancing is extremely important, as it allows the public user base to interact with these AI models and tools without the fear that their data may be siphoned off by bad actors or exploitative companies who want to access or steal people’s information for a profit.

In truth, things should not be like this. We should not feel such discomfort and anxiety over these corporations having access to our data, but the sad truth is that these companies have earned our distrust, and so it makes complete sense that we have developed these methods of operating. Perhaps in the future these corporations will repair their fractured relationship with us, but until then it is arguably a necessity to engage with trustless technologies as a way of both protecting user data and clearly signifying to tech giants that we are no longer as comfortable with their unprecedented involvement as we used to be. We want something safer.
