Potato Chat 2, starchy boogaloo 🥔💬

Last updated 08/22/2025:

UPDATE: Christian Peter has continued this research into areas I have not addressed. His post is worth a read, and the upcoming LEAPP parser is exciting news!

Check out his writeup here: http://cp-df.com/en/blog/potato.html

In the last post I brought up some behaviors that I was able to test and observe within the potato chat application. While I am not done testing the application’s behaviors (there are many… so many), I have also been attempting to process the various databases and logs to make some meaningful headway into documenting user activity through the app. The two main databases I have been working with are:

~/Shared/AppGroup/<App GUID>/Documents/tgdata.db
~/Shared/AppGroup/<App GUID>/Documents/shareDialogList.db

shareDialogList.db is rather straightforward, so we will start there. The database is composed of a single table:

Within the userInfosJson column, there is JSON data (🤯). I wish the rest of the app were this straightforward! For the test device being used, these JSON BLOBs were observed to contain information in the following format:

typeId 1:

[ {
  "accessHash" : 7480976858030710324,
  "firstName" : "Jay",
  "lastName" : "Walker",
  "fileUrl" : "",
  "userId" : 94148535
}, {
  "accessHash" : 15157466708146318,
  "firstName" : "Potato",
  "lastName" : "",
  "fileUrl" : "2_-5681047949067121967_175922129_5805541144254817307_1",
  "userId" : 777000
} ]

Within the first BLOB, the names and IDs of the test device’s contacts are presented. The “Potato” contact was attached to the account by default, whereas the ChatGPT bot and the two other user accounts were added manually.

accessHash has not yet been deciphered.

firstName and lastName are fairly straightforward.

fileUrl is populated when a user has a profile picture associated with their profile. Similar structures are also seen within tgdata.db, where one of the long integers was shown to translate to a hex ID for a media file. The entire value has not yet been decoded.

userId is the decimal value of the user ID, which will be found in hex (LE) later in tgdata.db.
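A minimal sketch of turning the typeId 1 JSON into a userId lookup table with Python's standard json module. This is not the script mentioned later in the post; the function name build_user_lookup is hypothetical, and the sample string is trimmed from the JSON above.

```python
import json

def build_user_lookup(user_infos_json: str) -> dict:
    """Map userId -> "First Last" for a typeId 1 userInfosJson value."""
    lookup = {}
    for entry in json.loads(user_infos_json):
        if "userId" not in entry:
            continue  # skip anything that does not look like a user entry
        name = f"{entry.get('firstName', '')} {entry.get('lastName', '')}".strip()
        lookup[entry["userId"]] = name
    return lookup

sample = """[ {"accessHash" : 7480976858030710324, "firstName" : "Jay",
  "lastName" : "Walker", "fileUrl" : "", "userId" : 94148535},
 {"accessHash" : 15157466708146318, "firstName" : "Potato", "lastName" : "",
  "fileUrl" : "2_-5681047949067121967_175922129_5805541144254817307_1",
  "userId" : 777000} ]"""

print(build_user_lookup(sample))  # {94148535: 'Jay Walker', 777000: 'Potato'}
```

Python's json module keeps integers exact, so large values like accessHash survive intact. Tools that read JSON numbers as 64-bit floats (JavaScript's JSON.parse, for example) would silently round them, which is worth keeping in mind when comparing these values across tools.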

typeId 2:

[ {
  "groupId" : 83934061,
  "typeId" : 3,
  "title" : "Supra chat",
  "fileUrl" : "",
  "falgs" : 17538,
  "accessHash" : 7483205870531160386
}, {
  "typeId" : 1,
  "userId" : 94148535
}, {
  "groupId" : 83933982,
  "typeId" : 3,
  "title" : "How to use potato chat?",
  "fileUrl" : "2_-5069546109012343639_242406825_5448241800819710781_1",
  "falgs" : 17410,
  "accessHash" : 7483205870467082386
} ]

Breaking down the entries here:

  • groupId: 83934061 is the decimal value of the *first portion* of the group ID which is later found in hex (LE) in the tgdata.db database. More on this later.
  • typeId: 3 appears in instances of channels and groups. The other, shorter entries of “typeId: 1” relate to individual one-on-one chats between the various chat accounts.
  • title: “Supra chat” is the group chat occupied by the three test devices.
  • fileUrl: populates when a group photo is declared. Supra chat was not assigned one when it was created; however, the “How to use potato chat?” channel was.
  • falgs: The significance of this number is currently unknown, although converting the channel’s value (17410) to hex gives 0x02 44 (LE). If this were a set of packed bit flags, only three flags would be raised. Based on the (assumed) typo of “flags”, this seems like a reasonable direction to look in the future.
  • accessHash: unknown at this time; however, the group and channel I created share the same first 10 digits (7483205870), leaving 531160386 and 467082386 as the unique portions of the number.
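To poke at the flags idea, here is a quick way to list which bits are set in each falgs value. Treating falgs as a bit field is still only the working assumption from the bullet above, and set_bits is a hypothetical helper name.

```python
def set_bits(value: int) -> list:
    """Return the positions (0 = least significant) of every bit set in value."""
    return [bit for bit in range(value.bit_length()) if (value >> bit) & 1]

for title, falgs in (("Supra chat", 17538), ("How to use potato chat?", 17410)):
    le_hex = falgs.to_bytes(2, "little").hex(" ").upper()
    print(f"{title}: {falgs} -> 0x{le_hex} (LE), bits set: {set_bits(falgs)}")
```

Under that assumption, the channel's 17410 (0x02 44 LE) has bits 1, 10 and 14 set, while Supra chat's 17538 (0x82 44 LE) has the same three plus bit 7 (0x80). What that extra bit means is not known.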

The user IDs are correlated to usernames in the first row, as well as in tgdata.db. The user accounts/IDs in my test data are as follows:

User             Decimal      Hex (LE)
Jay Walker       94148535     0xB7 97 9C 05
Forrest Cook     94276828     0xDC 8C 9E 05
Billy Johnson    94277446     0x46 8F 9E 05
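Converting between the decimal IDs and the little-endian hex in the table is a one-liner with int.to_bytes and int.from_bytes. The sketch below assumes user IDs fit in an unsigned 32-bit integer (true for the values seen so far) and also builds the byte pattern you could search for inside raw tgdata.db BLOBs; id_to_le_hex, le_hex_to_id and find_user_id are hypothetical names.

```python
def id_to_le_hex(user_id: int) -> str:
    """94148535 -> '0xB7 97 9C 05' (4-byte little-endian, as in the table above)."""
    return "0x" + user_id.to_bytes(4, "little").hex(" ").upper()

def le_hex_to_id(le_hex: str) -> int:
    """'0xB7 97 9C 05' -> 94148535."""
    raw = bytes.fromhex(le_hex.removeprefix("0x"))  # fromhex skips the spaces
    return int.from_bytes(raw, "little")

def find_user_id(blob: bytes, user_id: int) -> list:
    """Offsets in blob where the 4-byte LE form of user_id occurs."""
    needle = user_id.to_bytes(4, "little")
    offsets, start = [], 0
    while (hit := blob.find(needle, start)) != -1:
        offsets.append(hit)
        start = hit + 1
    return offsets

for name, uid in (("Jay Walker", 94148535), ("Forrest Cook", 94276828),
                  ("Billy Johnson", 94277446)):
    print(f"{name:<16} {uid:<12} {id_to_le_hex(uid)}")
```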

I wrote a script to pull out the data and format it nicely:

Moving on to the tgdata.db database, there were 93 tables in the database being tested. Unless I miscounted. Which, since there are 93 tables, is entirely possible. I will be focusing on tables found to contain information relevant to my testing so far; additional tables may be included at a later date if/when the research progresses.
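If you'd rather not count by hand, SQLite keeps its own list of tables in sqlite_master. A minimal sketch using Python's built-in sqlite3 module; opening the file read-only through a URI is a precaution I'd suggest for working copies of evidence, and the path in the commented usage is a placeholder.

```python
import sqlite3
from pathlib import Path

def open_read_only(db_path: str) -> sqlite3.Connection:
    """Open a SQLite file read-only so the working copy is not modified."""
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    return sqlite3.connect(uri, uri=True)

def list_tables(con: sqlite3.Connection) -> list:
    """Names of all tables, straight from SQLite's own schema table."""
    rows = con.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return [name for (name,) in rows]

# con = open_read_only("tgdata.db")   # placeholder path to a working copy
# print(len(list_tables(con)))
```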

Channel_messages_v32 by far holds the most significance for my initial investigation. It is also the table that has some of the more complicated information. This table holds the group chat messages. The view you are treated to in DB Browser for SQLite is:

cid (head-canon: “Chat ID”) corresponds with the group chat that a particular row belongs to. The depicted value of -4378901357, written as a signed 64-bit integer in little-endian hex, is 0x93 44 FF FA FE FF FF FF. When the first four bytes of this 8-byte series are read as a 32-bit signed integer (LE), the decimal value is -83934061. Dropping the negative sign, this absolute value matches the “groupId” for Supra chat as seen earlier in shareDialogList.db. The full hex cid value is also seen in multiple positions within the data BLOB in the same row.
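The cid-to-groupId arithmetic above can be checked with Python's struct module. This sketch assumes cid is stored as a signed 64-bit integer (which is what the 0xFF padding suggests); cid_to_group_id is a hypothetical name.

```python
import struct

def cid_to_group_id(cid: int) -> int:
    """Pack cid as a signed 64-bit LE integer, read its first 4 bytes back
    as a signed 32-bit LE integer, and drop the sign."""
    raw = struct.pack("<q", cid)               # 8 bytes, two's complement
    first_four = struct.unpack("<i", raw[:4])[0]
    return abs(first_four)

cid = -4378901357
print(struct.pack("<q", cid).hex(" ").upper())  # 93 44 FF FA FE FF FF FF
print(cid_to_group_id(cid))                     # 83934061
print(cid % 2**64)                              # 18446744069330650259, the unsigned reading
```

As an arithmetic observation from this single chat, cid also equals -(2**32 + groupId), since 4294967296 + 83934061 = 4378901357. Whether that holds for other chats is untested.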

mid (head-canon: “Message ID”) corresponds to the chronological position of that particular message in the group chat. By ordering the table by cid and then by mid, an ordered list of messages from each chat is achieved.
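In SQL terms, that ordering is an ORDER BY on both columns. A sketch with Python's sqlite3; the in-memory table is a stand-in holding only cid, mid and data (not the real schema), and the rows other than the cid and message number from this post are placeholders. SQLite table names are case-insensitive, so channel_messages_v32 matches regardless of how the name is capitalized. messages_in_order is a hypothetical name.

```python
import sqlite3

def messages_in_order(con, cid=None):
    """Yield (cid, mid, data) ordered by chat, then by message position.
    Pass a cid to restrict the output to a single chat."""
    sql = "SELECT cid, mid, data FROM channel_messages_v32"
    params = ()
    if cid is not None:
        sql += " WHERE cid = ?"
        params = (cid,)
    sql += " ORDER BY cid, mid"
    yield from con.execute(sql, params)

# Stand-in table with placeholder rows, just to show the ordering.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE channel_messages_v32 (cid INTEGER, mid INTEGER, data BLOB)")
con.executemany("INSERT INTO channel_messages_v32 VALUES (?, ?, ?)",
                [(-4378901357, 14, b""), (-4378901357, 2, b""), (-1, 5, b"")])
for row in messages_in_order(con, cid=-4378901357):
    print(row[:2])
```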

sort_key and transparent_sort_key both begin with the 8-byte hex representation seen in the corresponding cid. The remaining 9 bytes of these two BLOBs are largely the same in each row (nearly identical, offset by one 0x00 byte) and are currently undocumented.
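One possible reading of those trailing bytes, offered strictly as a hypothesis: in the single ‘sk’ value shown in the parsed output further down (93 44 FF FA FE FF FF FF 00 67 DC 44 95 00 00 00 0E), the 9 bytes after the cid decode as a 0x00 byte, then 0x67DC4495 = 1742488725 read big-endian (the same value as that post's ‘d’ timestamp), then 0x0000000E = 14 read big-endian (the same as its ‘i’ message number). That is one row, and transparent_sort_key is not covered, so treat the layout below as untested; split_sort_key is a hypothetical name.

```python
import struct

def split_sort_key(sort_key: bytes) -> dict:
    """Hypothetical layout: <q cid, one 0x00 byte, >I timestamp, >I message index."""
    if len(sort_key) != 17:
        raise ValueError(f"expected 17 bytes, got {len(sort_key)}")
    cid = struct.unpack_from("<q", sort_key, 0)[0]
    pad, timestamp, index = struct.unpack_from(">BII", sort_key, 8)
    return {"cid": cid, "pad": pad, "timestamp": timestamp, "index": index}

sk = bytes.fromhex("93 44 FF FA FE FF FF FF 00 67 DC 44 95 00 00 00 0E")
print(split_sort_key(sk))
```

Big-endian fields would make the raw bytes sort in time order, which would fit a column named sort_key, but that is speculation until more rows are checked.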

Data, as you might imagine, has the most significant information in this table. The column holds BLOB data, which appears to be some type of encoded key-value structure similar to a protobuf, but without any headers or .proto definitions to follow along with. Thus far, attempts to run protobuf decoders against the BLOBs have been unsuccessful. The individual BLOBs are each structured similarly at the outset, with some consistent patterns emerging. After the initial portion of the data, however, differences arise between the various types of messages sent through the chat. Attempts to map out the BLOBs are still underway, but there has been some success. This is currently the largest area of need in the research: completing the parsing of these BLOBs would open up the chats significantly.

Let’s take a look at a relatively simple BLOB from the database. It’ll help to clear up the patterns observed. This is a BLOB from a simple text post into the “Supra Chat” group:

In the ASCII, some plain English can be seen, which mirrors the content of the text post. The patterns leading up to this have information to provide beyond the message, however. The general structure observed in this BLOB is:

1 Byte: Declares how many bytes are in the ASCII key that follows
* Byte(s): ASCII key
1 Byte: Data type is declared
   0x01 = String
   0x02 = 4 byte integer
   0x03 = 8 byte integer
   0x06 = Varint (variable length integer)
1 (or more if varint) Byte(s): Byte length of the value for the previously declared key (variable-length types only; the fixed-size integers have no length byte)
* Byte(s) comprising the value for the key
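Here is a minimal sketch of a walker for that structure. It is a simplified stand-in, not the script referenced later in the post. Assumptions, taken from or inferred from the description above: 0x02 values are always 4 bytes, 0x03 values always 8 bytes, and only 0x01 (String) and 0x06 (Varint) carry a length prefix, which I treat as a protobuf-style LEB128 varint (7 bits per byte, high bit means "more bytes follow"). For lengths under 128 that is the same single byte described above. Any other type byte stops the walk rather than guessing. The BLOB in the usage lines is hand-built from this description, not copied from a real row.

```python
from typing import Iterator, Tuple

FIXED_SIZES = {0x02: 4, 0x03: 8}            # Int (4 bytes), Int (8 bytes)
LENGTH_PREFIXED = {0x01: "Str", 0x06: "Varint"}

def read_varint(buf: bytes, pos: int) -> Tuple[int, int]:
    """Decode an (assumed) LEB128 varint at pos; return (value, new position)."""
    value, shift = 0, 0
    while True:
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, pos
        shift += 7

def walk_blob(buf: bytes) -> Iterator[Tuple[str, int, bytes]]:
    """Yield (key, type byte, raw value bytes) for each entry in a data BLOB."""
    pos = 0
    while pos < len(buf):
        key_len = buf[pos]
        key = buf[pos + 1 : pos + 1 + key_len].decode("ascii")
        pos += 1 + key_len
        type_byte = buf[pos]
        pos += 1
        if type_byte in FIXED_SIZES:
            size = FIXED_SIZES[type_byte]
        elif type_byte in LENGTH_PREFIXED:
            size, pos = read_varint(buf, pos)
        else:
            raise ValueError(f"unknown type 0x{type_byte:02X} for key {key!r}")
        end = pos + size
        if end > len(buf):
            raise ValueError(f"value for {key!r} runs past the end of the BLOB")
        yield key, type_byte, buf[pos:end]
        pos = end

# Hand-built example following the layout above.
text = "Full moon over homewood".encode("ascii")
blob = (b"\x01i\x02" + (14).to_bytes(4, "little")
        + b"\x01t\x01" + bytes([len(text)]) + text
        + b"\x02md\x06\x00"
        + b"\x01d\x02" + (1742488725).to_bytes(4, "little"))
for key, type_byte, raw in walk_blob(blob):
    print(key, hex(type_byte), raw.hex(" "))
```

Raising on an unknown type byte is deliberate: with nested values like ‘md’ and ‘cpr’ still undocumented, a loud failure is easier to investigate than a walker that silently drifts out of alignment.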

This pattern repeats throughout the BLOB. Below is the same BLOB highlighted and then also processed through a script:

As an aside – I never want to highlight a BLOB again. 7 clicks per block, 80 blocks and four varint bytes = 560 clicks for this lovely image that hurts your eyeballs.

Back on track:

  • The light blue announces the length of the key that follows.
  • The light yellow is the ASCII key
  • The light green announces the data type for the associated value
  • In the case of a variable length data type, the dark green declares the length of the value.
  • The pink is the value for the previously declared key.

To automate the process, I produced a python script which will iterate through this pattern and output some formatted text. This is what the output for this BLOB looks like:

i, data type 'Int':
Hex: 0E 00 00 00
Decimal (LE): 14
*Message 14 within the group chat.

sk, data type 'Varint':
Hex: 93 44 FF FA FE FF FF FF 00 67 DC 44 95 00 00 00 0E
ASCII: �D������g�D�
*First four bytes = Group ID: 83934061

pts, data type 'Int':
Hex: 00 00 00 00
Decimal (LE): 0

unr, data type 'Int':
Hex: 00 00 00 00
Decimal (LE): 0
*Unread = False

out, data type 'Int':
Hex: 01 00 00 00
Decimal (LE): 1
*Outgoing = True

ds, data type 'Int':
Hex: 00 00 00 00
Decimal (LE): 0

fi, data type 'Int':
Hex: DC 8C 9E 05 00 00 00 00
Decimal (LE): 94276828
*User ID: 94276828

ti, data type 'Int':
Hex: 93 44 FF FA FE FF FF FF
Decimal (LE): 18446744069330650259
*First four bytes = Group ID: 83934061

ci, data type 'Int':
Hex: 93 44 FF FA FE FF FF FF
Decimal (LE): 18446744069330650259
*First four bytes = Group ID: 83934061

t, data type 'Str':
ASCII: Full moon over homewood

d, data type 'Int':
Hex: 95 44 DC 67
Decimal (LE): 1742488725
*Date: 2025-03-20 16:38:45+00:00

md, data type 'Varint':
Hex:
ASCII:

rd, data type 'Int':
Hex: 00 00 00 00
Decimal (LE): 0

ri, data type 'Int':
Hex: 00 00 00 00 00 00 00 00
Decimal (LE): 0

lt, data type 'Int':
Hex: 00 00 00 00
Decimal (LE): 0

f, data type 'Int':
Hex: 00 00 00 00 00 00 00 00
Decimal (LE): 0

sqi, data type 'Int':
Hex: 00 00 00 00
Decimal (LE): 0

sqo, data type 'Int':
Hex: 00 00 00 00
Decimal (LE): 0

cpr, data type 'Varint':
Hex: 09 74 74 6C 50 65 72 69 6F 64 04 31 00 00 00 54 47 4D 65 73 73 61 67 65 54 74 6C 50 65 72 69 6F 64 43 6F 6E 74 65 6E 74 50 72 6F 70 65 72 74 79 00 09 74 74 6C 50 65 72 69 6F 64 02 00 00 00 00
ASCII:  ttlPeriod1TGMessageTtlPeriodContentProperty     ttlPeriod

Note that some of the fields have been interpreted. In cases where interpretation is taking place, I have placed asterisks next to the interpretations. These values have so far been consistent in content across different BLOBs, with ‘d’ always holding a date, ‘sk’ holding a group ID, ‘fi’ holding the user ID and ‘t’ holding text (if present). The ‘md’ key in particular becomes cumbersome with media posts because additional nested BLOBs can be found within the value here. ‘md’ is also where you will see media reference “URLs” as seen in the shareDialogList.db. These “URLs” will come into play when associating a media file to a post.
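The asterisked interpretations can be reproduced from the raw value bytes. A sketch, assuming you already have each key's raw bytes from your own parser; interpret is a hypothetical name, and decoding ‘t’ as UTF-8 is an assumption (only ASCII text has been observed so far).

```python
import struct
from datetime import datetime, timezone

def interpret(key: str, raw: bytes):
    """Apply the interpretations noted above to one key's raw value bytes.
    Returns None for keys whose meaning is still unknown."""
    if key == "d":
        # 4-byte little-endian Unix timestamp, shown in UTC
        return datetime.fromtimestamp(int.from_bytes(raw, "little"), tz=timezone.utc)
    if key in ("unr", "out"):
        return int.from_bytes(raw, "little") != 0  # unread / outgoing flags
    if key in ("i", "fi"):
        return int.from_bytes(raw, "little")       # message number / user ID
    if key in ("sk", "ti", "ci"):
        # first four bytes as a signed 32-bit LE integer, sign dropped = group ID
        return abs(struct.unpack_from("<i", raw, 0)[0])
    if key == "t":
        return raw.decode("utf-8", errors="replace")
    return None

print(interpret("d", bytes.fromhex("9544DC67")))           # 2025-03-20 16:38:45+00:00
print(interpret("ti", bytes.fromhex("9344FFFAFEFFFFFF")))  # 83934061
```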

Take this BLOB from an outgoing image post for example:

The size of the ‘md’ value is declared by a “varint”, so it is a variable-length payload which appears to encompass a large amount of undocumented information. This is not even close to the most complicated version of ‘md’ that I have observed so far in testing; some post types have nested BLOBs. Within this variable-length payload, the image that was posted to the chat is referenced multiple times, as is another instance of the timestamp:

Of particular frustration is the variety of ways media is referenced. For example, the first blue highlight matches the hex of the actual filename that was posted, except it is shown here in little endian. Reading the hex values LE gives us: 0x32 AC 60 7B 67 DC 44 82, which matches the filename located in ~/Shared/AppGroup/<App GUID>/Documents/files/ :

Next, in orange, the same timestamp as seen earlier in the BLOB is repeated. This timestamp is consistent with when the image was sent to the chat. Finally, the last two blue highlights are a decimal representation in ASCII of the hex found in the filename, except the math only checks out if you interpret the hex big endian!

3651399481030362242 in decimal is 0x32 AC 60 7B 67 DC 44 82 interpreted as BE or 0x82 44 DC 67 7B 60 AC 32 interpreted as LE. I should dig around some more to see if they stored a middle finger emoji in Nibble Reverse for me, given the way this is going so far.
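All three representations fall out of a single 64-bit integer, which may make the inconsistency easier to live with: the filename is that integer written as 16 hex digits (the usual big-endian way of writing a number), the first blue highlight is the same integer stored as 8 raw bytes little-endian, and the last two highlights are its decimal digits in ASCII. The sketch below generates the byte patterns so they can be searched for in a BLOB. media_id_forms and find_media_id are hypothetical helpers, and treating the ID as unsigned, with a lowercase filename, are assumptions.

```python
def media_id_forms(media_id: int) -> dict:
    """The representations of one 64-bit media ID described above."""
    return {
        "filename_hex": f"{media_id:016x}",              # name on disk (lowercase assumed)
        "le_bytes": media_id.to_bytes(8, "little"),      # raw bytes inside 'md'
        "ascii_decimal": str(media_id).encode("ascii"),  # decimal digits inside 'md'
    }

def find_media_id(blob: bytes, media_id: int) -> dict:
    """Offset of the first occurrence of each in-BLOB form (-1 if absent)."""
    forms = media_id_forms(media_id)
    return {name: blob.find(forms[name]) for name in ("le_bytes", "ascii_decimal")}

media_id = 3651399481030362242
forms = media_id_forms(media_id)
print(forms["filename_hex"])        # 32ac607b67dc4482
print(forms["le_bytes"].hex(" "))   # 82 44 dc 67 7b 60 ac 32
print(int("32ac607b67dc4482", 16))  # 3651399481030362242, filename back to decimal
```

Purely as an arithmetic observation: the low 32 bits of this ID, 0x67DC4482, equal 1742488706, which as a Unix timestamp is 2025-03-20 16:38:26 UTC, 19 seconds before the text post decoded above. Whether these IDs embed a timestamp is untested.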

Application Logs

Application logs are found in the ~/Shared/AppGroup/<App GUID>/Documents/ directory. They are named “application-<#>.log” and represent a rolling cache of application data, as discussed in the previous post. Digging into these logs, there is a chronological record of what the app is doing. There is a lot of plain English, but there is also a lot of noise. It is important to note that times found here are in device local time. A few key phrases have been identified in the logs thus far as being significant.

Starting with user ID <User ID> activated
Activating user

These lines indicate which user account is logged in when the app is launching. This number matches the user ID found elsewhere, such as in tgdata.db.

Joining watcher to the watchers of

This line signifies a chat is being joined. The data that follows this line includes a “cid” which matches the cid value found in tgdata.db and identifies the chat being joined. There are different variations of this line as well; another observed version identified a user ID instead of the group. This user ID version was found very shortly after the cid version.
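A rough sketch of a key-phrase scan over the application-<#>.log files. Only the phrases quoted above come from the logs; everything else (reading as UTF-8 with replacement characters, sorting files by the number in the name, the exact regexes, and the scan_logs / scan_lines / log_number names) is an assumption. Because timestamps here are device local time, the sketch returns each matching line untouched rather than trying to parse or convert its time.

```python
import re
from pathlib import Path

# Phrases quoted in this post; the (\d+) capture is an assumption about what follows.
PATTERNS = {
    "user_activated": re.compile(r"Starting with user ID (\d+) activated"),
    "activating_user": re.compile(r"Activating user"),
    "joining_watcher": re.compile(r"Joining watcher to the watchers of"),
}

def log_number(path: Path) -> int:
    """application-3.log -> 3; anything unnumbered sorts first."""
    match = re.search(r"application-(\d+)\.log$", path.name)
    return int(match.group(1)) if match else -1

def scan_lines(lines):
    """Yield (label, captured value or None, line) for every matching line."""
    for line in lines:
        for label, pattern in PATTERNS.items():
            match = pattern.search(line)
            if match:
                value = match.group(1) if pattern.groups else None
                yield label, value, line.rstrip("\n")

def scan_logs(documents_dir: str):
    """Scan every application-<#>.log in the Documents directory, in numeric order."""
    for path in sorted(Path(documents_dir).glob("application-*.log"), key=log_number):
        with path.open(encoding="utf-8", errors="replace") as handle:
            for label, value, line in scan_lines(handle):
                yield path.name, label, value, line
```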

TGModernGalleryVideoPlayerView playVideo
PTVideoLoader didFinishLoad isExist videoId = <Decimal number>

These lines appear when a video is first played through the application from within a chat. The long decimal number can be converted to hex and this value will match the folder and file values found in ~/Documents/files/ and ~/Documents/video/. For example:

videoId = 4227633935802619330
converted to big endian hex = 0x3AAB92AA67E6D5C2
Corresponding folder/files:
   ~/Documents/files/video-remote-3aab92aa67e6d5c2/
   ~/Documents/video/remote3aab92aa67e6d5c2.mov
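The same conversion as a small helper (0x3AAB92AA67E6D5C2 is 4227633935802619330 in decimal). The two path templates are copied from the example above; whether every played video produces both, zero-padding to 16 hex digits, and wrapping negative IDs as unsigned 64-bit are all assumptions, and video_paths is a hypothetical name.

```python
def video_paths(video_id: int) -> list:
    """Map a log videoId to the folder/file names observed under ~/Documents."""
    hex_id = f"{video_id % 2**64:016x}"  # unsigned 64-bit, 16 lowercase hex digits
    return [
        f"~/Documents/files/video-remote-{hex_id}/",
        f"~/Documents/video/remote{hex_id}.mov",
    ]

for path in video_paths(4227633935802619330):
    print(path)
```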

The research is very much still ongoing, and I would love for this to break through into something actually useful for people. As it stands, I have scripts available to parse the databases for posts from a specific group, posts from a specific user, or simply to pull and parse all BLOBs and organize them by group chat. I am also working on application log parsing and on an all-encompassing script to pipe everything out to an Excel document. Please check them out and give me some feedback if you’ve got ideas or if you need to correct something I’ve written about so far.

Check out the scripts and sample BLOBs/Databases here:

https://github.com/Whee30/AppParsers/tree/main/Potato

Known areas of need:

~/Shared/AppGroup//Documents/misc/remotefiles/
   data.mdb and lock.mdb:

Opening data.mdb in a text editor, there are references here to files in the same "URL" format seen in shareDialogList.db and tgdata.db. I need to identify a way to process these files.
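As a stopgap, the "URL" strings can be pulled out of data.mdb with a byte-level regex. The pattern is inferred from only the two fileUrl samples in this post (five underscore-separated integers, some negative), so it may miss other variants, and find_file_urls is a hypothetical name. Separately, data.mdb and lock.mdb are the default file names used by LMDB (Lightning Memory-Mapped Database); if that is what these are, an LMDB reader would be the more complete route, but that identification is based on the names alone.

```python
import re

# Five underscore-separated integers, e.g. 2_-5681047949067121967_175922129_5805541144254817307_1
FILE_URL = re.compile(rb"\d+_-?\d+_-?\d+_-?\d+_\d+")

def find_file_urls(raw: bytes) -> list:
    """Return unique fileUrl-style strings in order of first appearance."""
    seen = {}
    for match in FILE_URL.finditer(raw):
        seen.setdefault(match.group().decode("ascii"), None)
    return list(seen)

# with open("data.mdb", "rb") as handle:   # placeholder path to a working copy
#     for url in find_file_urls(handle.read()):
#         print(url)
```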

tgdata.db - plenty to decode here still. The surface has only been scratched.

application logs - Need to find more pertinent data points. There are IP connection logs in here; it would be interesting to see whether other chat participants’ IPs are being exposed or whether they are simply Potato's server IPs.


Digital Forensics and general nerdery. Learning bit by bit (heh) and fighting off imposter syndrome. Learning python, adapting it to my work and overcomplicating simple processes most of the time.