Thursday 17 October 2019

Data Project

As certain previous posts demonstrate, the content of various homeopaths' websites and social media has been analysed for certain problem content. See this example about vaccination. This is a laborious and very time-consuming process.

Being able to show how prevalent CEASE therapy was among homeopaths was very useful in getting media interest and getting regulators to act (even if homeopathy associations don't want to). The amount of effort involved is considerable. Part of the CEASE story is that, even after various actions and many media reports, homeopaths are still making claims and putting autistic children at risk of harm. It's an ongoing process and it will likely continue until open promotion of CEASE therapy and other associated harmful practices disappears from the UK.

Anything that would speed up the process would be a good thing. Most readers will not be interested in the technical details.

Whilst it may seem strange to reveal strategy and tactics, there are various good reasons for doing so. To be blunt, if homeopaths are aware that highly questionable claims will be detected more easily and that they can be monitored on an ongoing basis, they may be more reluctant to make them. Their associations may find it more difficult to dismiss concerns if presented with concrete evidence of the prevalence of problem claims. And of course, more evidence allows a more compelling case to be made to regulatory bodies.


Web scraping

What is web scraping?
Without going into too much technical detail, websites are made up of structured information. Sometimes that information can be cut and pasted into a spreadsheet and used directly, but more often not. Web scraping is a way of extracting relevant structured information from a website in a format that can be used in other computer programs. 

There are plenty of tools available that claim to be able to scrape websites without the need for coding. These can work very well for certain types of website with very well structured data, but homeopathy related websites tend not to be well structured. It is necessary to resort to coding. Because different websites structure data in different ways, different websites can require very different approaches.

Once written, a web scraping program can be set to run on a scheduled basis.
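As a rough illustration of what scraping involves, the sketch below uses only Python's standard library to pull names and website addresses out of a membership list. The HTML layout, the class name "member" and the sample entries are all invented for the example. Real association websites will be structured differently, and in practice the page would be downloaded rather than held in a string.

```python
from html.parser import HTMLParser

# Invented sample page: real membership lists will be laid out differently.
SAMPLE_HTML = """
<ul>
  <li class="member"><a href="https://example.org/a-smith">A. Smith</a></li>
  <li class="member"><a href="https://example.org/b-jones">B. Jones</a></li>
  <li class="advert"><a href="https://example.org/advert">Buy remedies</a></li>
</ul>
"""


class MemberListParser(HTMLParser):
    """Collect (name, url) pairs from links inside <li class="member">."""

    def __init__(self):
        super().__init__()
        self.members = []
        self._in_member = False
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "li":
            self._in_member = attrs.get("class") == "member"
        elif tag == "a" and self._in_member:
            self._href = attrs.get("href")
            self._text = []

    def handle_data(self, data):
        # Only keep text that sits inside a member's link.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.members.append(("".join(self._text).strip(), self._href))
            self._href = None
        elif tag == "li":
            self._in_member = False


def scrape_members(html):
    """Return a list of (name, url) tuples found in the page."""
    parser = MemberListParser()
    parser.feed(html)
    return parser.members
```

Calling scrape_members(SAMPLE_HTML) picks up the two member entries and ignores the advert. Each real website would need its own version of the parser, which is where most of the effort goes.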

Association websites
Along with lots of extraneous information, homeopathy membership association websites will have membership lists. It would be strange not to. How they are presented varies, as does how easy they are to use. Some association websites do allow content beyond basic details.

Capturing members over a period of time allows for tracking of numbers. As a side note, because of the way that some websites work(ed), using Wayback Machine it is sometimes possible to obtain historical data. This is proving very useful in charting the decline in membership numbers of the various homeopathy associations.
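Once membership lists have been captured on different dates, tracking numbers is simple set arithmetic. The sketch below is only an illustration: the snapshot dates and names are placeholders, and it assumes names have already been tidied up so that the same person is spelled the same way in each capture.

```python
from datetime import date


def compare_snapshots(earlier, later):
    """Return members who left and joined between two captured lists."""
    earlier, later = set(earlier), set(later)
    return {
        "left": sorted(earlier - later),
        "joined": sorted(later - earlier),
    }


# Invented snapshots keyed by capture date; names are placeholders.
snapshots = {
    date(2018, 10, 1): ["A. Smith", "B. Jones", "C. Brown"],
    date(2019, 10, 1): ["A. Smith", "D. Green"],
}

# Membership count at each capture date, oldest first.
counts = [(day, len(set(names))) for day, names in sorted(snapshots.items())]

# Who left and who joined between the two captures.
changes = compare_snapshots(*(snapshots[day] for day in sorted(snapshots)))
```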

Directory listings
There are a number of specialist directory websites in the UK that cover CAM practitioners. Examples include Natural Therapy Pages, but non-specialist business directory websites such as Yell.com also list homeopaths. However, some of these websites pick up their data from another external source. The homeopaths probably don't know that they are listed on them.

Some homeopaths don't have their own websites and will use directory listings as their primary online marketing. 

These websites can pick up homeopaths that are not members of an association.

"Training" websites
cease-therapy.com is the most obvious example of a website that lists practitioners who claim to have certain training, but there are others. Some of the homeopathic colleges have lists of graduates.

These websites can also pick up some homeopaths who are not members of any association.

Information
The information of most interest is names and website addresses. Postal and email addresses and telephone numbers are of much less interest, but can be useful in determining whether, say, two similar names are variations of the same name or actually two different people.
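One way this could work is sketched below: names are normalised and compared for similarity, and two records are only treated as the same person if the names are close and a contact detail matches. The record layout, the list of titles stripped out and the 0.8 threshold are all assumptions made for the example and would need tuning against real data.

```python
import re
from difflib import SequenceMatcher

TITLES = {"mr", "mrs", "ms", "miss", "dr"}


def normalise_name(name):
    """Lower-case a name and strip titles, punctuation and extra spaces."""
    name = re.sub(r"[^\w\s]", " ", name.lower())
    return " ".join(w for w in name.split() if w not in TITLES)


def name_similarity(a, b):
    """Similarity between 0 and 1 for two names after normalising."""
    return SequenceMatcher(None, normalise_name(a), normalise_name(b)).ratio()


def probably_same_person(rec_a, rec_b, threshold=0.8):
    """True if the names are similar AND an email or phone number matches."""
    similar = name_similarity(rec_a["name"], rec_b["name"]) >= threshold
    shared = any(
        rec_a.get(key) and rec_a.get(key) == rec_b.get(key)
        for key in ("email", "phone")
    )
    return similar and shared
```

Requiring a shared contact detail is deliberately cautious. It is better to leave two records unmerged than to wrongly combine two different people.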

Web spidering
A web spider (also known as a web crawler) is a program that searches the internet for websites and their pages and stores the addresses in a database. Strictly speaking, a web spider doesn't have to capture the content of web pages but most web spiders do.

In practical terms, what is intended is to build a web spider that, given a list of homeopaths' websites, will look for all the pages in each website that are linked to from the homepage and capture their content. Whilst a web spider could follow links outside of those websites, this can result in huge numbers of web pages being captured.
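A minimal sketch of that kind of restricted spider follows. It captures the homepage plus every page on the same site that the homepage links to, and ignores links that lead elsewhere. To keep the example self-contained, downloads are replaced by a made-up dictionary of pages. The function names and addresses are invented for illustration, and a real spider would also need error handling and polite delays between requests.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkParser(HTMLParser):
    """Collect the href of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def internal_links(base_url, html):
    """Return absolute links on the page that stay on the same host."""
    parser = LinkParser()
    parser.feed(html)
    host = urlparse(base_url).netloc
    found = []
    for href in parser.links:
        url = urldefrag(urljoin(base_url, href)).url  # drop #fragments
        if urlparse(url).netloc == host and url not in found:
            found.append(url)
    return found


def spider_site(homepage, fetch):
    """Capture the homepage plus every same-site page it links to.

    `fetch` is any function that takes a URL and returns HTML text, so a
    real downloader can be swapped in for the fake one used below.
    """
    pages = {homepage: fetch(homepage)}
    for url in internal_links(homepage, pages[homepage]):
        if url not in pages:
            pages[url] = fetch(url)
    return pages


# Fake website used in place of real downloads (all addresses invented).
FAKE_SITE = {
    "https://example.org/": '<a href="/about">About</a> '
                            '<a href="https://elsewhere.example.com/">Out</a> '
                            '<a href="/about#top">About again</a>',
    "https://example.org/about": "<p>About this practice</p>",
}

captured = spider_site("https://example.org/", FAKE_SITE.__getitem__)
```

Here the external link is skipped and the two links to the "about" page are treated as one, so only two pages are captured.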

Change detection
Obviously, if web pages are being captured on a regular basis, it is possible to track changes to them. Indeed, in order to reduce storage requirements it is better to store only webpages that change and track the date on which they have changed.
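One simple way of doing this, sketched below, is to store a hash (a short fingerprint) of each page and only keep a new copy when the hash differs from the last stored one. The in-memory dictionary stands in for whatever database ends up being used, and the function names are made up for the example.

```python
import hashlib


def content_hash(text):
    """Fingerprint page text; identical text always gives the same hash."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def record_capture(history, url, text, captured_on):
    """Store a page only if it differs from the last stored version.

    `history` maps each URL to a list of (date, hash, text) versions.
    Returns True if a new version was stored.
    """
    versions = history.setdefault(url, [])
    digest = content_hash(text)
    if versions and versions[-1][1] == digest:
        return False  # unchanged since the last stored version
    versions.append((captured_on, digest, text))
    return True
```

One caveat is that pages containing things like visitor counters or today's date would register as changed on every capture, so some cleaning of the page before hashing may turn out to be necessary.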

Most homeopaths' websites don't change very often, if at all. In many cases, it looks as if some homeopaths decide they need one, set one up and never do anything with it again. Indeed, some websites looked at have remained essentially unchanged for over a decade, based on the look of them and also some of the specific wording used. Copyright dates can also be a giveaway.

Saving pdfs of changed web pages is very likely to be useful if complaints are to be made.

Content analysis
Effectively, you have your own mini-Google to play with, restricted only to those websites of interest. Whilst Google does have a Custom Search tool, it's limited to ten websites and the results would have to be scraped.

Being able to search selected webpages for certain keywords would drastically reduce the amount of work involved in analysis.

It would be possible to build lists of similar words/phrases. For example, vaccine and vaccination are obviously linked, but so is immunisation. It's known that anti-vaccination homeopaths sometimes use the phrase "natural immunity". "Informed choice" and "informed decision" are also used.
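A sketch of how such grouped lists might be searched is given below. The groups shown are only illustrative, using phrases mentioned in this post, and the real lists would be built up over time. The search ignores case and matches whole words only.

```python
import re

# Illustrative groups only; the real lists would be built up over time.
KEYWORD_GROUPS = {
    "vaccination": [
        "vaccine", "vaccination", "immunisation", "natural immunity",
        "informed choice", "informed decision",
    ],
    "other_therapies": ["cranio-sacral therapy", "reiki", "reflexology"],
}


def find_groups(text, groups=KEYWORD_GROUPS):
    """Return {group: [phrases found]} for a page's text, ignoring case."""
    hits = {}
    for group, phrases in groups.items():
        found = [
            phrase for phrase in phrases
            if re.search(r"\b" + re.escape(phrase) + r"\b", text, re.IGNORECASE)
        ]
        if found:
            hits[group] = found
    return hits
```

Because only whole words are matched, "vaccine" would not match "vaccines". Plurals and spelling variants (immunization, craniosacral) would have to be added to the lists or handled with looser patterns.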

Examples of the kind of things that simple key word/phrase searches could determine are -
  • Whether common non-homeopathic therapies are offered such as cranio-sacral therapy, reiki and reflexology to name a few.
  • The institution a homeopath studied at. The qualification abbreviation used can narrow this down and sometimes the institution is explicitly named.
  • Protected titles and bogus qualifications. Some titles are restricted to regulated professions, yet some lay homeopaths will describe themselves as "homeopathic physicians". Some titles like doctor aren't protected but their use by the non-medically qualified can be misleading. There are also a few homeopaths with qualifications from diploma mills.
  • Any other association the homeopath may belong to. Again, abbreviations can be indicative and sometimes they are explicitly named.
  • Conditions "treated". Of course, homeopaths don't treat diseases, they "treat the person", whatever that means.

Links
Links to external websites can be very revealing. Obviously, links to certain websites such as cease-therapy.com, Arnica Group and Informed Parent are of concern, but analysis is likely to find more problem websites.
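A simple starting point, sketched below, is to count how many homeopaths' websites link to each external website. Frequently linked websites that aren't already known about can then be looked at by hand. The homeopath addresses in the sample data are invented, and the input is assumed to be the list of links already captured from each site.

```python
from collections import Counter
from urllib.parse import urlparse


def host_of(url):
    """Return the lower-cased host of a URL without any leading 'www.'."""
    host = urlparse(url).netloc.lower()
    return host[4:] if host.startswith("www.") else host


def external_domains(site_url, links):
    """Return the set of other hosts that a site's links point to."""
    own = host_of(site_url)
    hosts = {host_of(link) for link in links}
    # Relative links and mailto: links have no host and are dropped.
    return {host for host in hosts if host and host != own}


def rank_linked_domains(sites):
    """Count how many sites link to each external host, most common first.

    `sites` maps a site's homepage URL to the list of links found on it.
    """
    counts = Counter()
    for site_url, links in sites.items():
        counts.update(external_domains(site_url, links))
    return counts.most_common()


# Invented sample data: two made-up homeopath sites and their links.
SAMPLE_SITES = {
    "https://homeopath-one.example/": [
        "https://www.cease-therapy.com/",
        "https://homeopath-one.example/contact",
        "mailto:someone@homeopath-one.example",
    ],
    "https://homeopath-two.example/": [
        "http://cease-therapy.com/practitioners",
        "https://other.example/",
    ],
}

ranking = rank_linked_domains(SAMPLE_SITES)
```

Each external website is counted once per homeopath, however many times it is linked to, so the ranking reflects how widespread a link is rather than how often one person repeats it.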

Text mining
Text mining is a huge area encompassing all sorts of different concepts and techniques. Many will not be applicable to this project. Homeopaths' websites tend not to contain that much text and thus there is a practical limit to what can be inferred from them. Also, some of the techniques require a lot of computing power. But there are some simple things that can be done.

To give a practical example - homeopaths are neither qualified nor competent to give advice on vaccination but some explicitly offer it. Both "vaccination" and "advice" are quite common words on homeopathy websites. There are a number of ways that the explicit offer of advice on vaccination could be phrased and it is unlikely that they could all be determined. However, the closer the two words are to each other in the text, the more likely that the offer is being made. A proximity search is required. Another example where a proximity search might return good results is when a homeopath claims to specialise in the treatment of particular conditions.
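A basic proximity search can be sketched in a few lines. The version below splits the text into words and reports every place where the two words of interest are within a given number of word positions of each other. The function name and the default distance of five are arbitrary choices for the example, and it only matches exact words, so variants such as "vaccinations" would need to be searched for separately.

```python
import re


def proximity_matches(text, word_a, word_b, max_gap=5):
    """Find places where two words occur close together.

    Returns (position_a, position_b) pairs of word indexes whose
    positions differ by at most `max_gap`.
    """
    words = re.findall(r"[a-z']+", text.lower())
    pos_a = [i for i, w in enumerate(words) if w == word_a]
    pos_b = [i for i, w in enumerate(words) if w == word_b]
    return [(a, b) for a in pos_a for b in pos_b if abs(a - b) <= max_gap]


SAMPLE_TEXT = (
    "I offer advice on vaccination and travel. Homeopathy has a long "
    "history. Vaccination is discussed elsewhere."
)

close = proximity_matches(SAMPLE_TEXT, "advice", "vaccination", max_gap=3)
```

In the sample text, only the first "vaccination" is near "advice", so one match is returned. Pages with matches would still need to be read by a person, but far fewer pages would need reading.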


Social media
Capturing social media content can be trickier than simple web spidering. This is especially true for Facebook. One of the issues with social media content is that there can be no clear differentiation between posts that are purely personal and those that relate to homeopathy etc. Frequent users of Facebook can generate a lot of content that is irrelevant for the purposes of this project.

The business of the muddying of the personal and the professional is a problem. If a social media account is used to promote a homeopath's business in any way, it can be considered advertising. 

Social media content also doesn't change in the same way that website content does. Once a post is made, it is extremely unlikely to change (except, perhaps, to be deleted). Rather new posts are made. Because of the way that social media works, over time, older posts become less visible - although this does depend on how frequently someone posts. More sophisticated users of social media can employ automated tools to regularly make the same post so that key messages remain visible to visitors. 

However, historical posts are of interest because they can, for example, reveal anti-vaccination views even if visitors to a page are unlikely to see them.

Content analysis of links will likely reveal the most important social media platforms. But it is worth pointing out that not every homeopath will link to their social media from their website, assuming that they have a website. Indeed some don't and use social media as their principal online marketing platform. Determining whether a homeopath has an actual social media presence on a particular platform can be complicated by some platforms automatically creating placeholder accounts from directory-type information, accounts that the homeopath may not even know exist.

Social network analysis
Again, like text mining, this is a big subject. The two are often used together.

SNA has been used in research looking at the spread of anti-vaccination messages via Facebook. There has also been research looking at content and reasons for anti-vaccination sentiment. Neither is the approach that is intended to be taken. It is links between homeopaths and also links to known anti-vaccination accounts that are of interest. Membership of certain groups too. For example, on LinkedIn there is a CEASE therapy group. There are several closed Facebook groups too but changes to Facebook mean that it's no longer possible, quite rightly, to determine who is a member.

Limitations and priorities
The resources that this project has are very limited. One person who is somewhat rusty at coding and slow to say the least. A motley assortment of elderly hardware and a domestic broadband connection. There is no budget to buy whizzy software tools or new hardware (although it may be possible to get hold of yet more elderly hardware). Nor is there any budget for cloud-based computing.

The scope of the project has to be very tightly controlled. Whilst it could be extended to cover practitioners other than homeopaths or countries beyond the UK, this would consume resources. There is also a temptation to use more advanced techniques, especially in content analysis. Key word/phrase searches might be good enough for most purposes.

Web scraping and spidering are both legal in the UK but there are some limitations on what can be done with the data obtained. The legalities also vary in terms of who is processing the data and who has access to it. It makes collaboration difficult. Scraping/spidering a site too often can lead to being blocked but this can be overcome.

The primary goal of the project is to track those who have ever offered CEASE therapy and the related Homeopathic Detox Therapy. Tracking anti-vaccination sentiment is an important secondary goal.

Reporting findings
The intention is to initially report any findings to any relevant membership organisation. There is no expectation that they will do anything but it does give them the chance to act. If they do act, any results of that action should become apparent. If they don't act...

In the case of homeopaths who belong to either the Society of Homeopaths (SoH) or another Professional Standards Authority (PSA) accredited register, the PSA will also be made aware of any results.

Some homeopaths do belong to statutorily regulated professions. Their regulators will be made aware but they are unlikely to do very much. Doctors in particular also have professional associations, generally associated with particular specialties. This type of association can be involved in the setting of educational standards for the specialty. Other associations such as the British Medical Association are more like trade unions. 

The Advertising Standards Authority (ASA) is likely to be very interested in some of the results. It's reasonable to assume that the ASA would like membership organisations to be more proactive in getting their members to comply with advertising rules. Indeed, the SoH did make much of cooperating with the ASA, although sampling exercises suggest that the SoH has either done very little or is ineffective in what it has done.

In the current climate, media are very interested in the prevalence of anti-vaccination sentiment. Whilst the importance of homeopaths in spreading anti-vaccination misinformation is unknown, some homeopaths are involved with groups/individuals who appear very influential in the UK anti-vaccination sphere. There are stories that need to be told. Media reporting can be very useful in getting regulators to act or getting Government to change policy.

Most of the above is thinking about prevalence of problems but it is possible that analysis will turn up individual websites that contain marketing claims that are beyond the pale or even evidence of illegal practices. 

Of course, this blog will publish various analyses as results become available but it must be stressed that placing details of problem homeopaths and their websites into the public domain is not a first resort.

One possibility is submitting a paper to a relevant academic journal.

Potential Impacts
Homeopaths, their supporters and associations are not very tech savvy. Nor do they seem to be very good at identifying emerging potential risks (or alternatively, they are very good at ignoring them). A more automated approach to monitoring homeopaths' marketing claims is very definitely a threat.

Some homeopaths should have noticed that Facebook is flagging their accounts as promoting anti-vaccination misinformation. 

UK homeopathy is in decline. Yet more negative media reporting could have the effect of further marginalising homeopathy and reducing what little public demand there is. True, it is unlikely to have any effect on the small core of "true believers" but it does make attempts to promote homeopathy to the general public much more difficult.

Although some aspects of this project are technically fiddly, none of it is rocket science. The associations could implement similar website monitoring themselves, if they wanted to.

