Sunday, March 7, 2021

 Cadence vs. Solidity  


This article gives a feature comparison between two smart contract programming languages, Solidity and Cadence. Solidity is the main smart contract language on the Ethereum blockchain as well as on other layer-1 blockchains such as Binance's BSC.

 

Cadence is a new smart contract programming language for use on the Flow Blockchain. Cadence introduces new features to smart contract programming that help developers ensure that their code is safe, secure, clear, and approachable. 


Flow Blockchain is the product of Dapper Labs, the company behind the CryptoKitties blockchain game. Led by founders Roham Gharegozlou, Dieter Shirley, and Mikhael Naayem, the team developed Flow as a platform for blockchain-based games and digital collectibles. At the time of this writing (March 6th, 2021), the Flow application NBA Top Shot had 266K DAU (daily active users) and 122K DTU (daily transacting users) over the past 24 hours, surpassing both Uniswap and PancakeSwap as the dapp with the highest DAU, according to https://dappradar.com/rankings


For Solidity developers who want to try a safer language, this article can be a starting point for comparing the two, especially their security features. For smart contract security auditors, your job will get a lot easier with Cadence. The first table compares general programming features, and the second table compares security features. I also provide background information on some of the features.



Table 1: General Features Comparison between Cadence and Solidity

| General Features | Cadence | Solidity |
| --- | --- | --- |
| Blockchain | Flow | Ethereum, BSC |
| Compiled vs. interpreted | Currently interpreted; planned to be compiled to run on the Move VM | Compiled to run on the EVM |
| Multiple inheritance | No | Yes |
| Function overloading | No | Yes |
| Syntax | Inspired by Swift and Rust | Inspired by JavaScript |
| Argument labels | Yes; labels describe the meaning of function arguments | No; comments are used instead |
| Floating-point arithmetic | Not supported, to keep execution deterministic | No native floating-point types; fractional arithmetic needs additional work for security and determinism |
| Wrapping of the native token | FLOW is itself a smart contract and can be imported directly without a wrapper | ETH must be wrapped into WETH to be used in smart contracts |
| Ownership queries (ownerOf() vs. tokensOfOwnerByIndex(owner, index)) | Supports "what tokens does this account own?", i.e. tokensOfOwnerByIndex(owner, index); see the sketch below | Supports "who owns this token?", i.e. ownerOf(tokenId) |
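As a rough illustration of the last row, the sketch below (plain Python, not Cadence or Solidity; all names and addresses are invented) shows why the two query directions differ: a single global token-to-owner mapping answers "who owns this token?" directly, while per-account storage answers "what do I own?" directly.

```python
# A plain-Python illustration (not real Cadence or Solidity) of the two
# query directions in the last row of Table 1.

# Ethereum-style ledger: one global mapping from token ID to owner.
token_owner = {101: "0xAlice", 102: "0xBob", 103: "0xAlice"}

def owner_of(token_id):
    # "Who owns this token?" is a direct lookup.
    return token_owner[token_id]

def tokens_of_owner(owner):
    # "What tokens does this owner have?" requires scanning the whole mapping
    # (or maintaining a separate enumerable index, as ERC-721 Enumerable does).
    return sorted(tid for tid, o in token_owner.items() if o == owner)

# Flow-style storage: each account holds its own collection of tokens,
# so "what do you own?" is a direct read of that account's storage.
account_storage = {"0xAlice": {101, 103}, "0xBob": {102}}

def tokens_of_owner_by_index(owner, index):
    return sorted(account_storage[owner])[index]

print(owner_of(101))                           # 0xAlice
print(tokens_of_owner("0xAlice"))              # [101, 103]
print(tokens_of_owner_by_index("0xAlice", 1))  # 103
```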


Table 2: Security Features Comparison between Cadence and Solidity

| Security Features | Cadence | Solidity |
| --- | --- | --- |
| Type system | Strongly typed; does not allow implicit or explicit type conversion | Statically typed; allows both implicit and explicit type conversion |
| Access control | Capability-based security, i.e. "what you have" (via resource types and capabilities); see the note and sketch below | Based on "who you are" (via access control lists or msg.sender) |
| Resource-oriented programming | Yes; linear types with object capabilities | No; traditional data structures and account-based access control |
| Pre-conditions | Built in for functions and transactions; can be specified in interfaces for enhanced security | Not built in; function modifiers are a workaround |
| Post-conditions | Built in for functions and transactions; can be specified in interfaces for enhanced security | Not built in; emitting and processing events is a workaround |
| Default function visibility | Private and capability-based | Public (see the Parity multisig wallet hack via initWallet) |
| Underflow/overflow protection | Yes | Requires OpenZeppelin's SafeMath library or Solidity version 0.8+ |
| Reentrancy protection | Yes; a one-time-use reference can be created from a capability, and references cannot be stored: they are lost at the end of transaction execution, which prevents reentrancy attacks | Requires OpenZeppelin's ReentrancyGuard |
| Declaring variables without initializing | Not allowed (more secure) | Allowed (less secure) |
| Fallback functions | None; pre-conditions and post-conditions can be used instead | Yes; can cause undesired security problems |
| Modifier functionality | Not needed; capability-based security covers access control | Yes |
| Upgradeability | Transparent upgradeability built in | Proxy-based (memory layout issues) or data separation (complex data types can cause problems during upgrades); see https://blog.trailofbits.com/2018/09/05/contract-upgrade-anti-patterns/ |
| Transactions sent to the wrong address | An account must hold a copy of the token's resource type in its storage, which ensures that funds cannot be lost by being sent to the wrong address | Loss of funds |
| Security audit effort | Dramatically reduced | Many static analysis, symbolic execution, property testing, and formal verification tools exist, but a lot of manual review is still needed; labor-intensive and expensive |

A note on access control: if an Ethereum ERC-20 fungible token allows minting, it probably has a "minter address" and only allows mint() to be called if msg.sender == minter_address. In Cadence, you would instead create a Minter resource object with a mint() method on it. Only someone with access to the Minter object can call mint(), and no further access controls are necessary; you control (or transfer) ownership of the Minter object instead of recording an authorized address.
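To make the "what you have" vs. "who you are" distinction concrete, here is a minimal, illustrative sketch in plain Python rather than real Cadence or Solidity (the ACLToken, CapToken, and Minter names are invented for this example):

```python
# Identity-based access control ("who you are"): every call re-checks the
# caller's address against a stored authorized address.
class ACLToken:
    def __init__(self, minter_address):
        self.minter_address = minter_address
        self.total_supply = 0

    def mint(self, caller, amount):
        if caller != self.minter_address:    # the msg.sender == minter_address check
            raise PermissionError("caller is not the minter")
        self.total_supply += amount


# Capability-based access control ("what you have"): minting power is an object.
# Whoever holds a reference to it can mint; nobody else can, and no address
# check is needed inside mint().
class CapToken:
    def __init__(self):
        self.total_supply = 0


class Minter:
    def __init__(self, token):
        self._token = token

    def mint(self, amount):
        self._token.total_supply += amount


acl_token = ACLToken(minter_address="0xAdmin")
acl_token.mint(caller="0xAdmin", amount=100)      # ok
# acl_token.mint(caller="0xMallory", amount=100)  # would raise PermissionError

cap_token = CapToken()
minter = Minter(cap_token)   # possession of `minter` IS the permission
minter.mint(100)             # ok; transferring `minter` transfers the right
```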



Statically vs. dynamically typed

Statically-typed programming languages do type checking (i.e., the process of verifying and enforcing the constraints of types on values) at compile-time, whereas dynamically-typed languages do type checks at runtime.


Examples

Statically-typed: C, C++, Java.


Dynamically-typed: Perl, Ruby, Python, PHP, JavaScript.


Strongly vs. weakly typed

Weakly-typed languages make implicit conversions between unrelated types, whereas strongly-typed languages don't allow them.


Python is a strongly-typed language:
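For example, Python will not implicitly convert between unrelated types such as strings and integers; the conversion has to be explicit (a minimal illustration):

```python
count = "1"               # a str
# count + 1 raises: TypeError: can only concatenate str (not "int") to str
total = int(count) + 1    # the conversion must be made explicitly
print(total)              # 2
```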


Compiled Languages

Compiled languages are converted directly into machine code that the processor can execute. As a result, they tend to be faster and more efficient to execute than interpreted languages. They also give the developer more control over hardware aspects, like memory management and CPU usage.


Compiled languages need a "build" step – they need to be manually compiled first. You need to "rebuild" the program every time you make a change. Think of a friend translating a recipe for you: with a compiled language, the whole translation is written out before it reaches you, so if the original author decides to use a different kind of olive oil, the entire recipe has to be translated again and resent to you.


Examples of pure compiled languages are C, C++, Erlang, Haskell, Rust, and Go.


Interpreted Languages

Interpreters run through a program line by line and execute each command. Here, if the author decides he wants to use a different kind of olive oil, he could scratch the old one out and add the new one. Your translator friend can then convey that change to you as it happens.


Interpreted languages were once significantly slower than compiled languages. But, with the development of just-in-time compilation, that gap is shrinking.


Examples of common interpreted languages are PHP, Ruby, Python, and JavaScript.


A Small Caveat

Most programming languages can have both compiled and interpreted implementations – the language itself is not necessarily compiled or interpreted. However, for simplicity’s sake, they’re typically referred to as such.


Python, for example, can be executed as either a compiled program or as an interpreted language in interactive mode. On the other hand, most command line tools, CLIs, and shells can theoretically be classified as interpreted languages.


Advantages and disadvantages


Advantages of compiled languages

Programs that are compiled into native machine code tend to be faster than interpreted code. This is because the process of translating code at run time adds to the overhead, and can cause the program to be slower overall.


Disadvantages of compiled languages

The most notable disadvantages are:


Additional time needed to complete the entire compilation step before testing

Platform dependence of the generated binary code


Advantages of interpreted languages

Interpreted languages tend to be more flexible, and often offer features like dynamic typing and smaller program size. Also, because interpreters execute the source program code themselves, the code itself is platform independent.


Disadvantages of interpreted languages

The most notable disadvantage is typical execution speed compared to compiled languages.



Tuesday, May 12, 2015

Machine Learning as a Service


The machine learning (ML) industry continues to grow apace, and several tools have emerged which give access to advanced learning algorithms with minimal effort. Whether for personal or business gain, machine learning is becoming a service industry, available on-demand, for everyone.

This 'feed in data, get answer back' approach can certainly be a nice alternative to implementing a full ML solution. There is no implementation fuss, and you can trust the tools you use, as they are built by some of the best in the ML field.

ML can be of use to anyone. However, if those with limited knowledge in the field are benefiting from this industry, how much more can those with experience in ML gain from these services?

With this question in mind, I set out to try some machine learning services. My goal is to understand what advantages these services bring compared to a home-made solution.

Others can benefit from this knowledge, so I am sharing what I learn with you in this series.

I am currently experimenting with three services: Amazon Machine Learning Service, Google Prediction API, and Microsoft Azure Machine Learning Studio.

There are other machine learning services available. These three, however, also provide a variety of other services such as storage and deployment, making them possible all-in-one solutions for many applications.

My main focus for now is on the services' functionalities. In this post, we are covering data uploading and preprocessing. I will soon post on model training and model evaluation as well.

Other aspects such as integration and scalability are not going to be covered, though they may be in the future.

All tests were performed using the services' consoles. Some functionalities are available only when using the services' API, and such cases will be identified.

Most of the comparison results are presented in tables like the one shown below. I believe this is a better way for readers to process the benchmarking results than wordy descriptions.

Aspects in need of further clarification will be described in more detail.

I will summarise what aspects of data sourcing and preprocessing should be considered when deciding upon a service. I will also present some of my thoughts on the matter.

Data Sourcing

Below is a summary of the aspects I considered during data sourcing. These include data sources, formats, and maximum size, as well as supported data types.

[Table: data sourcing comparison across the three services]

All three services can train models on uploaded text files. Both AWS and MS Azure can also read data from tables in their storage services.

AWS supports the largest datasets for batch training.

Google supports update calls, which one can use to incrementally train the model - that is, to do online training.

MS Azure supports the widest variety of data sources and formats.

There is no clear winner for now, as each service has its strengths and weaknesses.
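As a rough illustration of what incremental (online) training looks like outside of Google's service, here is a small scikit-learn sketch using partial_fit; the data is invented, and this shows only the concept, not the Prediction API's update call itself:

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

# Two hypothetical batches of data arriving on different days.
X_day1, y_day1 = np.array([[0.1, 1.0], [0.9, 0.2], [0.4, 0.6]]), np.array([0, 1, 0])
X_day2, y_day2 = np.array([[0.8, 0.1], [0.2, 0.9]]), np.array([1, 0])

model = SGDClassifier()

# First batch: the full set of classes must be declared up front.
model.partial_fit(X_day1, y_day1, classes=np.array([0, 1]))

# Later batches update the same model without retraining from scratch,
# which is the essence of online training via update calls.
model.partial_fit(X_day2, y_day2)

print(model.predict(X_day2))
```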

Data Preprocessing

The table below lists whether certain data preprocessing operations can be performed using these services. The operations covered here are commonly used, but this is not an exhaustive list of all preprocessing techniques.

Keep in mind that you can perform most, if not all these operations using Python or some other language before sending the data to the service. What is being assessed here is whether these operations can be performed within the service.

[Table: data preprocessing comparison across the three services]

It may happen that some transformations are performed behind the scenes, before the actual training takes place. In this table we are referring to explicitly applying the transformations on the data.

In AWS, most operations are performed using the so-called recipes. Recipes are JSON-like scripts used to transform the data before feeding it to a machine learning model.

All the above transformations other than data visualization, data splitting, and missing value imputation are done using recipes. For instance, quantile_bin(session_length, 5) would discretize the session_length variable into 5 bins. You can also apply the operations to groups of variables; groups themselves are also defined in the recipes.
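For readers who prefer to do this step before uploading, a rough Python equivalent of that binning operation is sketched below (assuming pandas is available and that session_length is a numeric column in your dataset; this is not the Amazon ML recipe syntax itself):

```python
import pandas as pd

# Hypothetical dataset with a numeric session_length column.
df = pd.DataFrame({"session_length": [12, 45, 3, 88, 27, 63, 9, 150, 71, 34]})

# Discretize session_length into 5 quantile-based bins, similar in spirit to
# quantile_bin(session_length, 5) in an Amazon ML recipe.
df["session_length_bin"] = pd.qcut(df["session_length"], q=5, labels=False)

print(df)
```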

Missing value imputation is also indicated as being possible within AWS. Although the transformation is not directly implemented, one can train a simple model - a linear regression, for instance - to predict the missing values. This model can then be chained with the main model. For this reason, I consider AWS as allowing missing value imputation.
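The same idea is easy to reproduce by hand. Below is a minimal scikit-learn sketch of model-based imputation with invented column names (age has missing values and income is used to predict them); it illustrates the approach rather than the AWS pipeline:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical data: 'age' has missing values, 'income' is fully observed.
df = pd.DataFrame({
    "income": [30, 45, 52, 61, 70, 85, 90, 110],
    "age":    [25, 31, np.nan, 40, np.nan, 50, 52, 60],
})

observed = df["age"].notna()

# Train a simple model to predict the missing values from the observed rows...
imputer = LinearRegression()
imputer.fit(df.loc[observed, ["income"]], df.loc[observed, "age"])

# ...then fill in the gaps before feeding the data to the main model.
df.loc[~observed, "age"] = imputer.predict(df.loc[~observed, ["income"]])
print(df)
```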

In MS Azure, transformations are applied sequentially using the built-in modules. The binning example above could be done using the 'Quantize Data' module. One can choose which variable or variables are affected.

R and Python scripts can also be included to apply custom transformations.

When using Google, most of the data processing will have to be done before feeding the data to the service.

Strings with more than one word are separated into multiple features within Google Prediction API. 'load the data' would be split into 'load', 'the', and 'data'. This type of processing is common in Natural Language Processing (NLP) applications such as document summarization and text translation.

You may choose to do all the data processing before sending the data to any of these services. Though this may mean more work, it is also a way to give you more control - you know exactly what you are doing to your data.

Aspects to Consider

Which service works best for your application? For now, the short answer really is 'it depends'. A number of factors need to be considered:

These services support data loaded from their own storage services, so how you store your data can prove to be a decisive factor.

Can you handle batch training? If yes, evaluate the typical size of your dataset. On the other hand, if your dataset is really large, or if you want to keep updating the model as you go, consider online training.

If you implement data transformation tools on your side, not having any built-in transformations may not be a problem at all.

If that is not a possibility, know which transformations you need to perform on your data, and understand whether the service you choose offers them. Pay special attention to missing values and text features, as typical application data are sure to have both.

Final Thoughts

Personally, I found MS Azure's flexibility both in data sourcing and preprocessing attractive. I did not use the custom R or Python scripts, mostly because I did not need to.

However, I do like to know exactly what I am doing to the data I feed a model with. Although I was able to quickly transform data using MS Azure, I would still do the data transformation using my own tools. This gives me full control, and allows me to exploit my data's specific traits to perform operations in the most efficient way.

Google provides what I believe to be a key feature in ML applications: incremental training. It allows you to use virtually infinite data. It takes the weight of assessing when to retrain a model off your shoulders.

When it comes to data processing, Amazon lies somewhere between the other two: it has some functionalities, but not many. But given how recent this service is - it was launched little more than a month ago - I see potential. If the service continues to evolve, it may become a very versatile tool.

Data processing is just the beginning, though. I find it too early to make a final decision.

Credit Source: Inês Almeida

Sunday, April 12, 2015

Big Data for Security to defend against APT and Zero day

According to Gartner, big data will change cyber security in network monitoring, identity management, fraud detection, governance, and compliance. I list the following 8 companies (in no particular order of preference) that are using big data to defeat zero-day and APT attacks. Big data and cyber security are at the growth stage of the hype cycle, and I believe there are at least 50 other companies (big or small, or even startups in stealth mode) working on a new killer app that uses machine learning, AI, deep nets, and big data to keep ahead of hackers. So I welcome any comments; please add your preferred tools or products in a comment.


1: Niara

Niara is making use of big data techniques and Hadoop. "The core intellectual property of Niara is in the collection, storage and analysis of the data," Ramachandran said. "We have been at work for 16 months building the platform."

While some of the components in Niara's platform are open-source, the big challenge has been in aligning an entire application stack to be able to handle the scale that is needed, Ramachandran said. "You have to be very smart about how you process data and how you move it around," Ramachandran said.


2: IBM QRadar Security Intelligence Platform and IBM Big Data Platform


IBM QRadar Security Intelligence Platform and IBM Big Data Platform provide a comprehensive, integrated approach that combines real-time correlation for continuous insight, custom analytics across massive structured and unstructured data, and forensic capabilities for irrefutable evidence. The combination can help you address advanced persistent threats, fraud and insider threats.

The IBM solution is designed to answer questions you could never ask before, by widening the scope and scale of investigation. You can now analyze a greater variety of data – such as DNS transactions, emails, documents, social media data, full packet capture data and business process data – over years of activity. By analyzing structured, enriched security data alongside unstructured data from across the enterprise, the IBM solution helps find malicious activity hidden deep in the masses of an organization's data.



3: Cyphort


The Cyphort Advanced Threat Defense Platform detects advanced malware, prioritizes remediation and automates containment. Cyphort customers benefit from early and reliable detection and fast remediation of breaches across their infrastructure. Our unique approach combines best-in-class malware detection with the knowledge of threat severity, value of targeted user and assets, and malware lifecycle to prioritize threats that matter to you while suppressing the noise. The Cyphort platform is a network-based solution that is designed to be deployed across the entire organization cost effectively. Flexibility to deploy as hardware, software and virtual machine makes Cyphort an ideal solution for large and distributed organizations. 

4: Teradata

www.teradata.com/Cyber-Security-Analytics


5: Intel Security Connected System:

For Intel, "intelligence awareness" translates to a new security product architecture that weaves the existing portfolio of McAfee products, including everything from PC software to data center firewalls, into a data collection backbone feeding a centralized repository used to correlate security anomalies from, across multiple systems

6: Sqrrl
Sqrrl is the Big Data Analytics company that lets organizations pinpoint and react to unusual activity by uncovering hidden connections in their data. Sqrrl Enterprise is Sqrrl's linked data analysis platform that gives analysts a way to visually investigate these connections, allowing them to rapidly understand their surrounding contexts and take action. At the core of Sqrrl's architecture are a variety of Big Data technologies, including Hadoop, link analysis, machine learning, Data-Centric Security, and advanced visualization. 

7: Platfora and MapR Technology 

Platfora provided a wide range of capabilities for preparing the data for analysis which considerably reduced data preparation time. After completing the preparation of the data, the emphasis shifted to developing and understanding the data using a variety of visualization techniques.

8: Splunk

While Splunk can certainly address the tier-1 needs of reduction and correlation, Splunk was designed to support a new paradigm of data discovery. This shift rejects a data reduction strategy in favor of a data inclusion strategy. This supports analysis of very large datasets through data indexing and MapReduce functionality pioneered by Google. This gives Splunk the ability to collect data from virtually any available data source without normalization at collection time and analyze security incidents using analytics and statistical analysis.