With web scraping becoming increasingly popular, how can we get at the valuable data inside mobile apps? Today, I'll show you how to use Wireshark to capture packets from a phone. The method is general, so it applies equally if you use Fiddler or any other packet capture tool.
Wireshark is a powerful, free, open-source network packet analyzer. It can capture many kinds of network packets and display their details. Wireshark is a desktop application, so how do we use it to capture a phone's network traffic? Wireshark captures packets by using WinPcap as an interface to exchange data directly with the network card, so we only need to route the phone's traffic through the computer's network card. The same applies to other capture software: the phone and the computer must be on the same network. If you use an Apple device, you might need to install a certificate; check this in the General settings. This article mainly covers the capture process for Android.
1. Use 360 Free WiFi to put the phone and the computer on the same network
360 Free WiFi uses the laptop's wireless network card to create a WiFi hotspot, and the phone gets online by connecting to it. Once connected, open Wireshark, start capturing, and immediately tap the news section in the box app on the phone so that it refreshes the news list.
At this point, you can see the phone's requests and their protocols in the capture tool. Some may wonder why 360 Free WiFi is needed. Normally you would have to set an IP address to capture packets, but with 360 Free WiFi the computer and the phone share an IP, which avoids that hassle.
The content of the first packet is:
GET /apiNewsList.php?action=c HTTP/1.1\r\nHost: box.dwstatic.com\r\n
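The request line and the Host header shown by Wireshark are enough to rebuild the full URL. A minimal sketch, assuming plain HTTP as in this capture; `url_from_capture` is a hypothetical helper name, not part of any tool mentioned here:

```python
def url_from_capture(raw_request: str, scheme: str = "http") -> str:
    """Rebuild a URL from the request line and Host header of a captured request."""
    # Wireshark displays line endings as literal "\r\n" text; normalize both forms.
    text = raw_request.replace("\\r\\n", "\n").replace("\r\n", "\n")
    lines = [line.strip() for line in text.split("\n") if line.strip()]

    # The request line looks like: GET /path?query HTTP/1.1
    _method, path, _version = lines[0].split(" ", 2)

    # Find the Host header (header names are case-insensitive).
    host = None
    for line in lines[1:]:
        name, _, value = line.partition(":")
        if name.strip().lower() == "host":
            host = value.strip()
            break
    if host is None:
        raise ValueError("no Host header in captured request")
    return f"{scheme}://{host}{path}"


captured = r"GET /apiNewsList.php?action=c HTTP/1.1\r\nHost: box.dwstatic.com\r\n"
print(url_from_capture(captured))
```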
We can try accessing this URL in a browser to see if it's the data we need:
The format is JSON, and after transcoding from UCS-2 to ANSI:
[{"type":"newsWithHeader","tag":"headlineNews","name":"Headlines"},{"type":"news","tag":"newsVideo","name":"Video"},{"type":"news","tag":"upgradenews","name":"Events"},{"type":"album","tag":"beautifulWoman","name":"Glamour Shots"},{"type":"album","tag":"jiongTu","name":"Funny Pics"},{"type":"album","tag":"wallpaper","name":"Wallpaper"}]
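If the response body arrives as UCS-2 bytes, Python can decode it and parse the JSON directly instead of transcoding by hand. A minimal sketch, assuming little-endian UCS-2 (decoded here with the `utf-16-le` codec) and using a shortened copy of the response above; `decode_categories` is a hypothetical helper name:

```python
import json


def decode_categories(body: bytes, encoding: str = "utf-16-le") -> dict:
    """Decode the response bytes and map each category tag to its display name."""
    text = body.decode(encoding)
    # Drop a byte order mark if the server sent one.
    text = text.lstrip("\ufeff")
    categories = json.loads(text)
    return {item["tag"]: item["name"] for item in categories}


# Shortened sample of the category response, re-encoded to simulate UCS-2 bytes.
sample = (
    '[{"type":"newsWithHeader","tag":"headlineNews","name":"Headlines"},'
    '{"type":"news","tag":"newsVideo","name":"Video"},'
    '{"type":"album","tag":"wallpaper","name":"Wallpaper"}]'
)
tags = decode_categories(sample.encode("utf-16-le"))
print(tags)
```

If the server actually sends UTF-8, passing `encoding="utf-8"` is the only change needed.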
The response appears to be the list of categories in the top navigation bar of the box app's news section, not the news list data we were looking for. So let's continue with the next data packet:
GET /apiNewsList.php?action=l&newsTag=headlineNews&p=1 HTTP/1.1\r\nHost: box.dwstatic.com\r\n
Try accessing this URL:
http://box.dwstatic.com/apiNewsList.php?action=l&newsTag=headlineNews&p=1
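The same URL can be generated for other tags and pages instead of being typed by hand. A minimal sketch: the parameter names come from the captured request, but their meaning (`newsTag` as the category tag, `p` as the page number) is an assumption, and `news_list_url` is a hypothetical helper:

```python
from urllib.parse import urlencode

BASE_URL = "http://box.dwstatic.com/apiNewsList.php"


def news_list_url(tag: str, page: int = 1) -> str:
    """Build a news list URL for a tag and a page number."""
    # Parameter names are taken from the capture; that "p" is the page number
    # and "newsTag" is the category tag is an assumption.
    query = urlencode({"action": "l", "newsTag": tag, "p": page})
    return f"{BASE_URL}?{query}"


print(news_list_url("headlineNews", 1))
```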
The data received after parsing and formatting is:
{
  "totalRecord": "11225",
  "totalPage": 449,
  "data": [
    {
      "id": "23727",
      "title": "17th Update: Five New Skin Illustrations ",
      "content": "September 17th Update: Five New Skin Illustrations ",
      "weight": "64",
      "time": "1442456005",
      "readCount": "76977",
      "ymz_id": null,
      "photo": "http://m1.dwstatic.com/mbox/article_img/shouji_ac75a4c4f67a7983455c6bdebd67a611.jpg",
      "artId": "23727",
      "commentSum": "111",
      "commentUrl": "1509/306410856768&aid=23727&uniqid=b84ebe1a9e890dbe418dbb5b551ff291&gochannel=lol",
      "hasVideo": 0,
      "destUrl": "http://box.dwstatic.com/unsupport.php?lolboxAction=toNewsDetail&newsId=23727",
      "type": "news"
    }
(the entries after this one are not listed)
That's it, this is the data we needed.
http://box.dwstatic.com/apiNewsList.php?action=l&newsTag=headlineNews&p=1
This is the data resource for the news list in the LOL box.
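Most values in the news list response are strings, so a little cleaning makes them easier to work with. A minimal sketch using one record copied from the response shown earlier; treating `time` as a Unix timestamp in seconds is an assumption, and `clean_record` is a hypothetical helper:

```python
import json
from datetime import datetime, timezone


def clean_record(record: dict) -> dict:
    """Keep a few fields and convert string numbers and the timestamp."""
    return {
        "id": int(record["id"]),
        "title": record["title"].strip(),
        # Assumption: "time" is a Unix timestamp in seconds.
        "published": datetime.fromtimestamp(int(record["time"]), tz=timezone.utc),
        "reads": int(record["readCount"]),
        "photo": record["photo"],
    }


# One record copied from the response (other fields left out for brevity).
sample = json.loads("""
{"totalRecord": "11225", "totalPage": 449, "data": [
  {"id": "23727",
   "title": "17th Update: Five New Skin Illustrations ",
   "time": "1442456005",
   "readCount": "76977",
   "photo": "http://m1.dwstatic.com/mbox/article_img/shouji_ac75a4c4f67a7983455c6bdebd67a611.jpg"}
]}
""")
rows = [clean_record(r) for r in sample["data"]]
print(rows[0]["title"], rows[0]["published"].date())
```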
Similarly, to capture data from any other app, you only need to find and retrieve its URLs step by step as I did.
If you are comfortable with Python, you can use it for data cleaning: crawl the link with requests, do some simple processing, and you will get exactly the resources you want. Also, remember the enterprise website solution, where we covered how to use the BT panel; it comes in handy now.
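A crawl with requests might look like the following. This is a minimal sketch under stated assumptions: `fetch_json` and `crawl_titles` are hypothetical helpers, the `p` parameter is assumed to be the page number, and the endpoint may no longer respond as it did in the capture. The fetch function is passed in as an argument so the loop can be tried without network access.

```python
def fetch_json(url: str, timeout: float = 10.0):
    """Fetch a URL with requests and return the parsed JSON body."""
    import requests  # third-party package: pip install requests

    response = requests.get(url, timeout=timeout)
    response.raise_for_status()  # raise on HTTP error status codes
    return response.json()


def crawl_titles(pages, fetch=fetch_json):
    """Collect the stripped titles from the given page numbers."""
    base = "http://box.dwstatic.com/apiNewsList.php?action=l&newsTag=headlineNews&p={}"
    titles = []
    for page in pages:
        payload = fetch(base.format(page))
        for record in payload.get("data", []):
            titles.append(record["title"].strip())
    return titles
```

Calling `crawl_titles(range(1, 4))` would request the first three pages with the default fetcher.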
On the right side of the BT panel is a scheduled tasks section, which can run scripts automatically. Upload the prepared Python script to the server and create a scheduled task that runs it automatically every day.
Once everything is set, save the task, run it once, and open its log to check that it is working correctly.
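For the log to be useful, the script should report what it did and exit with a non-zero status on failure. A minimal sketch, assuming the task log captures the script's standard output; `run_job` is a hypothetical wrapper and `job` stands in for whatever function the script actually runs:

```python
import logging
import sys


def run_job(job) -> int:
    """Run a job callable, log the outcome, and return a process exit code."""
    logging.basicConfig(
        level=logging.INFO,
        stream=sys.stdout,
        format="%(asctime)s %(levelname)s %(message)s",
    )
    try:
        result = job()
    except Exception:
        # logging.exception also records the traceback, so it shows up in the log.
        logging.exception("job failed")
        return 1
    logging.info("job finished: %s", result)
    return 0
```

Ending the script with `sys.exit(run_job(main))`, where `main` is the script's own entry function, passes the result on as the exit status.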
By now, an email containing the document compiled by the Python script should arrive in your inbox, and it will keep arriving daily, like a personal document assistant. Perfect!
What? You're asking how to send an email to yourself using Python?
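As a starting point, Python's standard library can do this with `smtplib` and `email.message`. A minimal sketch, assuming a mail provider that accepts SMTP over implicit TLS; the host, port, and credentials are placeholders to be replaced with your provider's values, and both function names are hypothetical:

```python
import smtplib
from email.message import EmailMessage


def build_message(sender: str, recipient: str, subject: str, body: str) -> EmailMessage:
    """Create a plain-text email message."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content(body)
    return msg


def send_message(msg: EmailMessage, host: str, port: int, username: str, password: str) -> None:
    """Send the message over SMTP with implicit TLS."""
    # Host, port, and credentials depend on the mail provider (placeholders here).
    with smtplib.SMTP_SSL(host, port) as server:
        server.login(username, password)
        server.send_message(msg)


msg = build_message("me@example.com", "me@example.com", "Daily news", "Today's list goes here.")
print(msg["Subject"])
```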