
Data Extraction Must Be Ethical

2022-03-07 13:39:06
The World of English (英語世界), 2022, Issue 10

By Julius Černiauskas (朱利葉斯·切爾尼奧斯卡斯); Chinese translation by Yun Tian (云天)


The internet is currently undergoing a phenomenon similar to the gold rushes of the early eighteenth century, specifically when it comes to data extraction. With data now dubbed the “new oil” by some analysts because of its value, the field is still open to small and large players alike, which has led to some unprofessional activities that extend all the way to the acquisition of password-protected data.

While many websites do contain defensive measures such as IP bans, the invisible conflicts between scrapers[1] and servers are ongoing and gaining in intensity, due to increased competition and economic factors. Most people don’t realise these conflicts are taking place between e-commerce stores, although they are happily taking advantage of the low prices found on aggregator websites[2] like Expedia, Google Shopping, PriceGrabber and Skyscanner.

[1] scraper: a program or script that automatically collects information from the World Wide Web according to set rules. “Scraping” and “crawling” below both refer to collecting data from the web.
[2] aggregator website: a website that collects popular content from other websites by manual or technical means, then sorts the related links and content into categories and presents them as its own content.

Ethical web scraping: the importance of intention

Tools can be used for positive and negative purposes, and web scraping is no exception. A fairly common scenario is the scraping of personal data for marketing purposes. Hundreds of millions of users agree to release their data through terms of service agreements on e-commerce sites, whether they realise it or not. The issue with the exposed data, however, is that it has been extracted by social media agencies and used by now-defunct websites that created profiles and listed personal details without user permission.

As a result, web scraping is increasingly being subjected to negative press, which has increased public awareness of the value and privacy of personal data. There is nothing inherently unethical about web scraping, as it automates activities that people often do manually. The main difference is that web scraping does this on a much bigger scale, using bots to crawl numerous websites and extract huge amounts of information in seconds.

Extracting publicly available data requires proxies[3]. In short, proxies act as intermediaries between the web scraper and the web server. Employing proxies allows data requests to be distributed evenly to the web server, ensuring that the data is requested at a fair rate, as well as providing anonymity to the requesting party.

[3] proxy: a special network service that lets a client connect to a server through it.
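As a rough illustration of the idea of spreading requests evenly and keeping the rate fair, the following minimal sketch pairs each URL with a proxy in round-robin order and pauses between requests. The proxy addresses, the `shop.example` URLs, the function name `schedule_requests` and the `min_interval` parameter are all hypothetical placeholders; the sketch only builds a request plan and does not contact any server.

```python
import itertools
import time
from collections import Counter

# Hypothetical proxy endpoints (placeholders, not real servers).
PROXIES = [
    "http://proxy-a.example:8080",
    "http://proxy-b.example:8080",
    "http://proxy-c.example:8080",
]


def schedule_requests(urls, proxies, min_interval=1.0, sleep=time.sleep):
    """Pair each URL with a proxy in round-robin order.

    A pause of `min_interval` seconds is inserted between consecutive
    requests so that the overall request rate stays modest.
    """
    rotation = itertools.cycle(proxies)
    plan = []
    for index, url in enumerate(urls):
        if index > 0:
            sleep(min_interval)  # throttle: wait before the next request
        plan.append((url, next(rotation)))
    return plan


# Hypothetical target URLs; min_interval=0.0 only so the demo runs instantly.
urls = [f"https://shop.example/item/{n}" for n in range(6)]
plan = schedule_requests(urls, PROXIES, min_interval=0.0)

# Count how many requests each proxy would carry.
usage = Counter(proxy for _, proxy in plan)
for proxy, count in usage.items():
    print(proxy, count)
```

With six URLs and three proxies, each proxy carries the same number of requests. A real scraper would pass each `(url, proxy)` pair to its HTTP client and would choose a longer interval suited to the target site.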

The consequences of unethical scraping

Unethical scraping uses data extraction in a way that may compromise[4] privacy and result in server overload.

[4] compromise: to endanger or harm.

While many websites try to prevent it through IP bans, this is becoming futile[5] due to the use of proxies and their ability to circumvent[6] server restrictions by simulating human behaviour. The end results can be server overloads that cost online businesses money, reduced internet transparency and more distrust from the public with respect to privacy issues.

[5] futile: in vain; pointless.
[6] circumvent: to evade (a rule or restriction).

A web scraping code of ethics is necessary

Web scraping has many benefits that depend upon the availability of a free and transparent internet. I believe it would benefit the entire tech space if we adopted a few guidelines in order to make the landscape fair for everyone:

1. Scrape publicly available web pages only

2. Study the target website’s legal documents to determine whether you would be legally accepting its terms of service and, if so, whether your scraping would breach those terms

3. Make reasonable requests for data in order to ensure that server function is not compromised, as it would be under a DDoS attack[7]

4. Respect privacy concerns of source websites with regard to any data obtained

5. Make use of proxies procured in an ethical manner

[7] DDoS attack: distributed denial-of-service attack, a type of network attack that aims to exhaust the target machine’s network and system resources until it is overloaded and can no longer serve requests.
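Parts of guidelines 1 and 3 can be approximated in code. The sketch below is one possible approach, not something the article prescribes: it uses a site's `robots.txt` file, which the article does not mention, as a machine-readable hint about which pages the site is willing to have crawled and how fast. It is no substitute for reading the terms of service. The `robots.txt` content, the bot name and the `shop.example` URLs are hypothetical, and the file is parsed from a string so that nothing is downloaded.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; a real scraper would fetch the file
# from the target site instead of hard-coding it.
ROBOTS_TXT = """\
User-agent: *
Disallow: /account/
Crawl-delay: 2
"""

USER_AGENT = "example-research-bot"  # hypothetical bot name


def build_policy(robots_text):
    """Parse robots.txt text into a RobotFileParser."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser


def allowed_urls(parser, urls, user_agent=USER_AGENT):
    """Keep only the URLs that robots.txt permits for this user agent."""
    return [url for url in urls if parser.can_fetch(user_agent, url)]


def polite_delay(parser, user_agent=USER_AGENT, default=1.0):
    """Seconds to wait between requests: the site's Crawl-delay if it
    declares one, otherwise an assumed default."""
    delay = parser.crawl_delay(user_agent)
    return float(delay) if delay is not None else default


policy = build_policy(ROBOTS_TXT)
candidates = [
    "https://shop.example/products",
    "https://shop.example/account/settings",
]
to_fetch = allowed_urls(policy, candidates)
wait_seconds = polite_delay(policy)
print(to_fetch, wait_seconds)
```

Here the account page is filtered out because it falls under the disallowed path, and the delay between requests follows the value the site declares. Guidelines 2, 4 and 5 concern legal review, privacy and how proxies are sourced, which code alone cannot settle.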

Not all proxies are equal

It is commonly known that some proxies operating today are not ethically sourced, with many obtained through applications that people have downloaded onto their devices. Whether these individuals are aware that their device is being used is difficult to ascertain. What is certain is that it is not ethical to use such devices as proxies when their owners consented to misleading or confusing terms of service that unwittingly turned their device into a participant in a residential proxy network.

Ethical practices lead to increased fairness and accountability

Some aspects of modern web scraping activity lack clarity, and a code of ethics is needed to bring order to the industry. If those in the industry can come together in agreement over a professional approach to web scraping, it will help to maintain a fair, open and free internet that will benefit both businesses and consumers. We are still in the early stages of discovering the full potential of data scraping in different industries, so let’s take advantage of this golden opportunity to drive innovation and create growth in the most ethical way possible.
